# Chaitanya's Random Pages

## March 26, 2014

### Inverse variance weighting form of the conditional covariance of multivariate Gaussian vectors

Filed under: mathematics — ckrao @ 5:48 am

Let $X$ and $V$ be independent zero-mean real Gaussian vectors of respective lengths $m$ and $n$ with respective invertible covariance matrices $\Sigma_X$ and $\Sigma_V$. Let $C$ be a full-rank $n \times m$ matrix and define $Y$ by $\displaystyle Y = CX + V.\quad\quad(1)$

If we are given the vector $X$ we know that $Y$ will be Gaussian with mean $CX$ and covariance $\Sigma_V$.

However, suppose we are given $Y$ and wish to find the conditional distribution of $X|Y$. Here we may think of $X$ as a hidden variable and $Y$ as the observed variable. In this case the result is a little more involved. If we recall the results of this earlier blog post, $(X^T,Y^T)$ is jointly Gaussian and so $X|Y$ is Gaussian with mean $\displaystyle E[X|Y] = E[X] + \text{cov}(X,Y)(\text{cov}(Y))^{-1}(Y - E[Y])\quad\quad(2)$

and covariance $\displaystyle \text{cov}(X|Y) = \text{cov}(X) - \text{cov}(X,Y)\text{cov}(Y)^{-1}\text{cov}(Y,X).\quad\quad(3)$

(Here $\text{cov}(A,B) := E[AB^T]$ is the cross-covariance of the zero-mean vectors $A$ and $B$, while $\text{cov}(A):= \text{cov}(A,A)$.)

Using the fact that $E(X) = 0$, $E(Y) = 0$, $\text{cov}(X) = \Sigma_X$, $\text{cov}(X,Y) = E[X(CX+V)^T] = E[XX^T]C^T = \Sigma_X C^T$

and $\text{cov}(Y) = C\Sigma_X C^T + \Sigma_V$, (2) and (3) become \begin{aligned} E[X|Y] &= \Sigma_X C^T(C\Sigma_X C^T + \Sigma_V)^{-1}Y,\quad\quad&(4)\\ \text{cov}(X|Y) &= \Sigma_X - \Sigma_X C^T(C\Sigma_X C^T + \Sigma_V)^{-1}C\Sigma_X. \quad\quad&(5)\end{aligned}
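As a numerical illustration, (4) and (5) can be evaluated directly. The following is a minimal NumPy sketch; the dimensions, the random test matrices and the names `random_spd` and `gain` are illustrative assumptions rather than part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2  # lengths of X and Y (arbitrary illustrative choice)

def random_spd(k):
    """Return a random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))   # maps the hidden X to the observed Y
y = rng.standard_normal(n)        # one made-up observed value of Y

cov_Y = C @ Sigma_X @ C.T + Sigma_V
gain = Sigma_X @ C.T @ np.linalg.inv(cov_Y)   # Sigma_X C^T cov(Y)^{-1}
mean_post = gain @ y                          # equation (4)
cov_post = Sigma_X - gain @ C @ Sigma_X       # equation (5)
```

Observing $Y$ can only reduce uncertainty about $X$, so `Sigma_X - cov_post` should come out positive semidefinite.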

In this post we also derive the following alternative expressions (also described here), in which the covariances appear as inverse matrices. \boxed{ \begin{aligned} E[X|Y] &= \Sigma C^T \Sigma_V^{-1} Y\quad\quad&(6)\\ \text{cov}(X|Y) &= \Sigma,\quad\quad&(7)\\\text{where}&&\\ \Sigma &:= (\Sigma_X^{-1} + C^T \Sigma_V^{-1} C)^{-1}.\quad\quad&(8)\end{aligned} }
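Before deriving (6)-(8), it may be reassuring to see that they agree numerically with (4) and (5). This is only a sketch on randomly generated test matrices (the seed, sizes and variable names are assumptions made for the example).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3

def random_spd(k):
    """Return a random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Covariance form, equations (4) and (5)
K = Sigma_X @ C.T @ np.linalg.inv(C @ Sigma_X @ C.T + Sigma_V)
mean_cov_form = K @ y
cov_cov_form = Sigma_X - K @ C @ Sigma_X

# Inverse-covariance form, equations (6)-(8)
Sigma_V_inv = np.linalg.inv(Sigma_V)
Sigma = np.linalg.inv(np.linalg.inv(Sigma_X) + C.T @ Sigma_V_inv @ C)
mean_inv_form = Sigma @ C.T @ Sigma_V_inv @ y

print(np.allclose(mean_cov_form, mean_inv_form),
      np.allclose(cov_cov_form, Sigma))
```

The two forms invert matrices of different sizes: (4)-(5) need the inverse of an $n \times n$ matrix, while (8) needs $\Sigma_X^{-1}$, $\Sigma_V^{-1}$ and the inverse of an $m \times m$ sum, which can be convenient when $\Sigma_V$ is diagonal.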

Note that in the scalar case $y = cx + v$ with variances $\sigma_x^2$ and $\sigma_v^2$, (5) and (7) become the identity $\displaystyle \sigma_x^2 - \frac{\sigma_x^4c^2}{c^2 \sigma_x^2 + \sigma_v^2} = (\sigma_x^{-2} + c^2\sigma_v^{-2})^{-1}.\quad\quad(9)$
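The scalar identity (9) is easy to check with exact rational arithmetic. The values $c = 2$, $\sigma_x^2 = 3$, $\sigma_v^2 = 5$ and the function names below are arbitrary choices for illustration.

```python
from fractions import Fraction

def posterior_var_cov_form(c, var_x, var_v):
    """Left side of (9): the scalar version of (5)."""
    return var_x - (var_x**2 * c**2) / (c**2 * var_x + var_v)

def posterior_var_inv_form(c, var_x, var_v):
    """Right side of (9): the scalar version of (7)-(8)."""
    return 1 / (1 / var_x + c**2 / var_v)

c, var_x, var_v = Fraction(2), Fraction(3), Fraction(5)
lhs = posterior_var_cov_form(c, var_x, var_v)
rhs = posterior_var_inv_form(c, var_x, var_v)
print(lhs, rhs)  # prints 15/17 15/17
```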

In the case where $y$ is a scalar, $C$ is a row vector, $y$ is a noisy weighted sum of the elements of the vector $X$, and (8) becomes $(\Sigma_X^{-1} + \sigma_v^{-2}C^TC)^{-1}$, the inverse of a sum of inverse (co)variance terms (inverse-variance weighting).
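The name inverse-variance weighting is perhaps easiest to see in a closely related special case, which is an assumption of this sketch rather than something stated above: a scalar $x$ observed through several independent noisy measurements $y_i = x + v_i$, so that $C$ is a column of ones and $\Sigma_V$ is diagonal. Then (8) is the reciprocal of a sum of reciprocal variances, and (6) weights each measurement by its inverse variance. All numbers below are made up.

```python
import numpy as np

var_x = 4.0                          # prior variance of the scalar x
var_v = np.array([1.0, 2.0, 0.5])    # noise variances of the three measurements
y = np.array([1.2, 0.7, 1.0])        # made-up measurement values

C = np.ones((3, 1))                  # each measurement is y_i = x + v_i
Sigma_V_inv = np.diag(1.0 / var_v)

# Equations (8) and (6) with Sigma_X = [[var_x]]
Sigma = np.linalg.inv(np.array([[1.0 / var_x]]) + C.T @ Sigma_V_inv @ C)
mean_post = Sigma @ C.T @ Sigma_V_inv @ y

# The same quantities written as explicit inverse-variance sums
var_explicit = 1.0 / (1.0 / var_x + np.sum(1.0 / var_v))
mean_explicit = var_explicit * np.sum(y / var_v)
```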

One can check algebraically that the expressions (4),(6) and (5),(7) are equivalent in the matrix case, or we may proceed as follows.

Let $V = \left[ \begin{array}{cc} V_{11} & V_{12}\\ V_{21} & V_{22} \end{array} \right] = \left[ \begin{array}{cc}E(XX^T) & E(XY^T)\\ E(YX^T) & E(YY^T) \end{array} \right]$ be the covariance matrix of the joint vector $\left[ \begin{array}{c} X\\ Y \end{array} \right]$ (the matrix $V$ with blocks $V_{ij}$ is not to be confused with the noise vector $V$ in (1)). Then since $Y = CX + V$, with $X$ and the noise $V$ independent and zero-mean, we have $V_{11} = \Sigma_X$ and \begin{aligned} V_{12} &= V_{21}^T\\ &= E(XY^T)\\ &= E(X(CX + V)^T)\\ &= E(XX^T) C^T + E(XV^T)\\ &= \Sigma_X C^T. \quad\quad(10) \end{aligned}
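The blocks of the joint covariance can be assembled and cross-checked numerically. In the sketch below the joint covariance is called `V_joint` to keep it apart from the noise vector, and the check uses the observation (not made in the text) that the stacked vector of $X$ and $Y$ is a linear map `T` applied to the stacked vector of $X$ and the noise; these names and the random test data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 2

def random_spd(k):
    """Return a random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))

# Blocks of the joint covariance of [X; Y]
V11 = Sigma_X
V12 = Sigma_X @ C.T                      # equation (10)
V21 = V12.T
V22 = C @ Sigma_X @ C.T + Sigma_V
V_joint = np.block([[V11, V12], [V21, V22]])

# Cross-check: [X; Y] = T [X; noise] with T = [[I, 0], [C, I]], and
# [X; noise] has block-diagonal covariance diag(Sigma_X, Sigma_V).
T = np.block([[np.eye(m), np.zeros((m, n))], [C, np.eye(n)]])
D = np.block([[Sigma_X, np.zeros((m, n))], [np.zeros((n, m)), Sigma_V]])
print(np.allclose(V_joint, T @ D @ T.T))
```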

Then as the Gaussian vector $\left[ \begin{array}{c} X\\ Y \end{array} \right]$ is zero-mean with covariance $V$, the joint pdf of $X$ and $Y$ is proportional to $\exp \left(-\frac{1}{2} [X^T Y^T]V^{-1}\left[ \begin{array}{c} X\\ Y \end{array} \right] \right)$.

The key step now is to make use of the following identity (also see this explanation) based on completion of squares: $\displaystyle \exp \left(-\frac{1}{2} [X^T Y^T]V^{-1}\left[ \begin{array}{c} X\\ Y \end{array} \right] \right) = \exp\left( -\frac{1}{2} (X^T - Y^TA^T) S_{22}^{-1}(X-AY)\right) \exp\left( -\frac{1}{2} Y^T V_{22}^{-1} Y\right),\quad\quad(11)$

where $A$ and $S_{22}$ are matrices, defined similarly to $A$ and $s$ in the scalar equation $\displaystyle ax^2 + 2bxy + cy^2 = a(x-Ay)^2 + sy^2.$
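For the scalar equation, matching coefficients gives $A = -b/a$ and $s = c - b^2/a$. A small exact-arithmetic sketch confirms this at an arbitrary point; the helper name `complete_square` and the test values are hypothetical.

```python
from fractions import Fraction

def complete_square(a, b, c):
    """Return (A, s) with a*x^2 + 2*b*x*y + c*y^2 == a*(x - A*y)^2 + s*y^2."""
    A = -b / a
    s = c - b * b / a
    return A, s

a, b, c = Fraction(2), Fraction(-3), Fraction(7)
A, s = complete_square(a, b, c)

x, y = Fraction(5, 3), Fraction(-4, 7)   # arbitrary test point
lhs = a * x**2 + 2 * b * x * y + c * y**2
rhs = a * (x - A * y)**2 + s * y**2
print(A, s, lhs == rhs)  # prints 3/2 5/2 True
```

In the matrix version, $a$ plays the role of the top-left block of $V^{-1}$ and $b$ that of its top-right block.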

We will show that $A = V_{12}V_{22}^{-1}$ and $S_{22} = V_{11} - V_{12}V_{22}^{-1}V_{21}$ ($S_{22}$ is the Schur complement of $V_{22}$ in $V$, also discussed in this previous blog post).

The second term on the right side of (11) is proportional to the pdf $p(Y)$ of $Y$ (which is Gaussian with covariance $V_{22}$), so the first term must be proportional to the conditional pdf $p(X|Y)$. We are left to find $S_{22} = \text{cov}(X|Y)$ and $AY = E(X|Y)$.
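The identity (11) can be spot-checked by comparing the exponents on both sides for the claimed $A$ and $S_{22}$. This is a sketch on a random positive-definite matrix (called `V_joint` here, an assumed name) and a random test point, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 2
M = rng.standard_normal((m + n, m + n))
V_joint = M @ M.T + (m + n) * np.eye(m + n)   # a random SPD joint covariance

V11, V12 = V_joint[:m, :m], V_joint[:m, m:]
V21, V22 = V_joint[m:, :m], V_joint[m:, m:]

A = V12 @ np.linalg.inv(V22)
S22 = V11 - V12 @ np.linalg.inv(V22) @ V21    # Schur complement of V22

x = rng.standard_normal(m)
y = rng.standard_normal(n)
z = np.concatenate([x, y])

# Exponents on the two sides of (11), without the common factor -1/2
lhs = z @ np.linalg.solve(V_joint, z)
rhs = (x - A @ y) @ np.linalg.solve(S22, x - A @ y) + y @ np.linalg.solve(V22, y)
print(np.isclose(lhs, rhs))
```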

From (11), $S_{22}^{-1}$ is the top-left block of $V^{-1}$ while $-S_{22}^{-1}A$ is the top-right block of $V^{-1}$. To find these blocks, consider the block matrix equation $\displaystyle V \left[ \begin{array}{c} r\\s \end{array} \right] = \left[ \begin{array}{c} a\\b \end{array} \right] \quad \Rightarrow \quad \left[ \begin{array}{c} r\\s \end{array} \right] = V^{-1} \left[ \begin{array}{c} a\\b \end{array} \right]. \quad\quad(12)$

This is the same as the system of equations \displaystyle \begin{aligned} V_{11} r + V_{12} s &= a, \quad\quad&(13)\\ V_{21} r + V_{22} s &= b. \quad \quad&(14)\\ \end{aligned}

Multiplying (13) by $V_{21}V_{11}^{-1}$ gives $\displaystyle V_{21}r + V_{21}V_{11}^{-1}V_{12}s = V_{21}V_{11}^{-1}a.$

Subtracting this from (14) gives \begin{aligned} (V_{22} - V_{21}V_{11}^{-1}V_{12})s &= b - V_{21}V_{11}^{-1}a\\ \Rightarrow s &= (V_{22} - V_{21}V_{11}^{-1}V_{12})^{-1}b - (V_{22} - V_{21}V_{11}^{-1}V_{12})^{-1}V_{21}V_{11}^{-1}a.\quad \quad(15)\end{aligned}

Since $\left[ \begin{array}{c} r\\s \end{array} \right] = V^{-1} \left[ \begin{array}{c} a\\b \end{array} \right]$ the coefficients of $a$ and $b$ in (15) are the bottom-left and bottom-right blocks of $V^{-1}$ respectively. We may then write $S_{11} := V_{22} - V_{21}V_{11}^{-1}V_{12}$ and so from (15) $\displaystyle S_{11} s = b - V_{21}V_{11}^{-1}a.\quad \quad(16)$

Also we have from (13) $\displaystyle r = V_{11}^{-1} a - V_{11}^{-1} V_{12}s.\quad\quad (17)$

Using (16) this becomes \begin{aligned} r &= V_{11}^{-1} a - V_{11}^{-1} V_{12} S_{11}^{-1}(b - V_{21}V_{11}^{-1}a)\\ &=(V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1} )a - V_{11}^{-1} V_{12} S_{11}^{-1}b.\quad\quad(18) \end{aligned}
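The elimination in (13)-(17) is effectively a block solver, and it can be compared against a direct solve of (12). The function name `solve_by_blocks` and the random test system are assumptions made for this sketch.

```python
import numpy as np

def solve_by_blocks(V11, V12, V21, V22, a, b):
    """Solve the system (13)-(14) for r and s using (16) and (17)."""
    V11_inv_a = np.linalg.solve(V11, a)
    V11_inv_V12 = np.linalg.solve(V11, V12)
    S11 = V22 - V21 @ V11_inv_V12                  # Schur complement of V11
    s = np.linalg.solve(S11, b - V21 @ V11_inv_a)  # equation (16)
    r = V11_inv_a - V11_inv_V12 @ s                # equation (17)
    return r, s

rng = np.random.default_rng(4)
m, n = 3, 2
M = rng.standard_normal((m + n, m + n))
V_joint = M @ M.T + (m + n) * np.eye(m + n)        # random SPD test matrix
a, b = rng.standard_normal(m), rng.standard_normal(n)

r, s = solve_by_blocks(V_joint[:m, :m], V_joint[:m, m:],
                       V_joint[m:, :m], V_joint[m:, m:], a, b)
direct = np.linalg.solve(V_joint, np.concatenate([a, b]))
print(np.allclose(np.concatenate([r, s]), direct))
```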

Note that analogous to (16) we could have written $\displaystyle S_{22} r = a - V_{12}V_{22}^{-1} b \Rightarrow r = S_{22}^{-1} a -S_{22}^{-1}V_{12}V_{22}^{-1}b .\quad\quad(19)$

Comparing coefficients of $a$, $b$ of (18) and (19) gives $\displaystyle S_{22}^{-1} = V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1}\quad\quad(20)$

and $\displaystyle V_{11}^{-1} V_{12} S_{11}^{-1} = S_{22}^{-1} V_{12}V_{22}^{-1}.\quad\quad(21)$
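Identities (20) and (21) hold for any invertible symmetric block matrix with invertible diagonal blocks, so they can be checked on random data. The sketch below also compares them with the corresponding blocks of the inverse computed directly; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2
M = rng.standard_normal((m + n, m + n))
V_joint = M @ M.T + (m + n) * np.eye(m + n)   # random SPD test matrix
V11, V12 = V_joint[:m, :m], V_joint[:m, m:]
V21, V22 = V_joint[m:, :m], V_joint[m:, m:]

inv = np.linalg.inv
S11 = V22 - V21 @ inv(V11) @ V12              # Schur complement of V11
S22 = V11 - V12 @ inv(V22) @ V21              # Schur complement of V22

lhs_20 = inv(S22)
rhs_20 = inv(V11) + inv(V11) @ V12 @ inv(S11) @ V21 @ inv(V11)
lhs_21 = inv(V11) @ V12 @ inv(S11)
rhs_21 = inv(S22) @ V12 @ inv(V22)

# Top-left block of the inverse should equal both sides of (20);
# top-right block should equal minus both sides of (21).
V_inv = inv(V_joint)
print(np.allclose(lhs_20, rhs_20), np.allclose(lhs_21, rhs_21))
```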

In the same way that $S_{22} = \text{cov}(X|Y)$, $S_{11} = \text{cov}(Y|X)$, which for $Y = CX + V$ is simply $\Sigma_V$ as we saw before.
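That $S_{11}$ reduces to $\Sigma_V$ for this model also follows from substituting the blocks, since $C\Sigma_X C^T + \Sigma_V - C\Sigma_X\Sigma_X^{-1}\Sigma_X C^T = \Sigma_V$. A quick numerical confirmation, on assumed random inputs, might look like this:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 2

def random_spd(k):
    """Return a random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))

V11 = Sigma_X
V12 = Sigma_X @ C.T
V22 = C @ Sigma_X @ C.T + Sigma_V

# S11 = V22 - V21 V11^{-1} V12, with V21 = V12^T
S11 = V22 - V12.T @ np.linalg.solve(V11, V12)
print(np.allclose(S11, Sigma_V))
```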

Hence from (20), \begin{aligned} \Sigma^{-1} = S_{22}^{-1} &= V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1}\\ &= \Sigma_X^{-1} + \Sigma_X^{-1}\Sigma_X C^T \Sigma_V^{-1} C \Sigma_X \Sigma_X^{-1}\\ &= \Sigma_X^{-1} + C^T \Sigma_V^{-1} C \end{aligned}

and from the $b$-coefficient of (18), \begin{aligned} S_{22}^{-1}A &= V_{11}^{-1} V_{12} S_{11}^{-1}\\ \Rightarrow A &= S_{22} V_{11}^{-1} V_{12} S_{11}^{-1}\\ &= S_{22} \Sigma_X^{-1} \Sigma_X C^T \Sigma_V^{-1}\\ &= \Sigma C^T \Sigma_V^{-1}. \end{aligned}

Hence $E[X|Y] = AY = \Sigma C^T \Sigma_V^{-1} Y$ as desired, and equations (6)-(8) have been verified.
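As a final sanity check, a Monte Carlo simulation can be used: if (6)-(8) are right, the residual $X - AY$ should have covariance close to $\Sigma$ and be uncorrelated with $Y$. The small model below (two hidden components, one observation), the sample size and the tolerances one would apply are all assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
Sigma_X = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_V = np.array([[1.0]])
C = np.array([[1.0, -1.0]])
N = 200_000                                    # number of simulated samples

X = rng.multivariate_normal(np.zeros(2), Sigma_X, size=N)     # rows are samples
noise = rng.multivariate_normal(np.zeros(1), Sigma_V, size=N)
Y = X @ C.T + noise                            # equation (1), one row per sample

Sigma_V_inv = np.linalg.inv(Sigma_V)
Sigma = np.linalg.inv(np.linalg.inv(Sigma_X) + C.T @ Sigma_V_inv @ C)  # (8)
A = Sigma @ C.T @ Sigma_V_inv                  # E[X|Y] = A Y, equation (6)

resid = X - Y @ A.T                            # X - E[X|Y] for each sample
emp_cov = resid.T @ resid / N                  # should be close to Sigma
emp_cross = resid.T @ Y / N                    # should be close to zero
print(np.round(emp_cov - Sigma, 3), np.round(emp_cross, 3))
```

Agreement here is only up to sampling error, which shrinks roughly like $1/\sqrt{N}$.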