Chaitanya's Random Pages

March 26, 2014

Inverse variance weighting form of the conditional covariance of multivariate Gaussian vectors

Filed under: mathematics — ckrao @ 5:48 am

Let X and V be independent zero-mean real Gaussian vectors of respective lengths m and n with respective invertible covariance matrices \Sigma_X and \Sigma_V. Let C be a full-rank n \times m matrix and define Y (a vector of length n) by

\displaystyle Y = CX + V.\quad\quad(1)

If we are given the vector X, we know that Y will be Gaussian with mean CX and covariance \Sigma_V.

However, suppose we are given Y and wish to find the conditional distribution of X|Y. Here we may think of X as a hidden variable and Y as the observed variable. In this case the result is a little more involved. Recalling the results of this earlier blog post, since (X^T,Y^T)^T is jointly Gaussian, X|Y is Gaussian with mean

\displaystyle E[X|Y] = E[X] + \text{cov}(X,Y)(\text{cov}(Y))^{-1}(Y - E[Y])\quad\quad(2)

and covariance

\displaystyle \text{cov}(X|Y) = \text{cov}(X) - \text{cov}(X,Y)\text{cov}(Y)^{-1}\text{cov}(Y,X).\quad\quad(3)

(Here \text{cov}(A,B) := E[(A - E[A])(B - E[B])^T] is the cross-covariance of A and B, which reduces to E[AB^T] for the zero-mean vectors considered here, while \text{cov}(A) := \text{cov}(A,A).)

Using the facts that E[X] = 0, E[Y] = 0, \text{cov}(X) = \Sigma_X,

\text{cov}(X,Y) = E[X(CX+V)^T] = E[XX^T]C^T + E[XV^T] = \Sigma_X C^T

(since X and V are independent and zero-mean, E[XV^T] = 0) and \text{cov}(Y) = C\Sigma_X C^T + \Sigma_V, (2) and (3) become

\begin{aligned} E[X|Y] &= \Sigma_X C^T(C\Sigma_X C^T + \Sigma_V)^{-1}Y,\quad\quad&(4)\\ \text{cov}(X|Y) &= \Sigma_X - \Sigma_X C^T(C\Sigma_X C^T + \Sigma_V)^{-1}C\Sigma_X. \quad\quad&(5)\end{aligned}
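A minimal NumPy sketch of (4) and (5), assuming C is stored as an array of shape (n, m) so that CX has the length n of Y. The function name posterior_covariance_form is hypothetical, chosen only for illustration. It solves a linear system with the n \times n matrix \text{cov}(Y) rather than forming that inverse explicitly.

```python
import numpy as np

def posterior_covariance_form(Sigma_X, C, Sigma_V, y):
    """Return (E[X|Y=y], cov(X|Y)) using equations (4) and (5)."""
    cov_Y = C @ Sigma_X @ C.T + Sigma_V        # cov(Y) = C Sigma_X C^T + Sigma_V
    cov_XY = Sigma_X @ C.T                     # cov(X, Y) = Sigma_X C^T
    # gain = cov(X,Y) cov(Y)^{-1}; cov_Y is symmetric, so solve cov_Y G = cov_XY^T and transpose
    gain = np.linalg.solve(cov_Y, cov_XY.T).T
    mean = gain @ y                            # equation (4)
    cov = Sigma_X - gain @ C @ Sigma_X         # equation (5)
    return mean, cov

# Scalar example: sigma_x^2 = 4, c = 1, sigma_v^2 = 1, observed y = 2
mean, cov = posterior_covariance_form(np.array([[4.0]]), np.array([[1.0]]),
                                      np.array([[1.0]]), np.array([2.0]))
print(mean, cov)
```

For this scalar example the mean should come out as 1.6 and the variance as 0.8, up to floating-point rounding.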

In this post we also derive the following alternative expressions (also described here), in which the covariance matrices appear through their inverses:

\boxed{ \begin{aligned} E[X|Y] &= \Sigma C^T \Sigma_V^{-1} Y\quad\quad&(6)\\ \text{cov}(X|Y) &= \Sigma,\quad\quad&(7)\\\text{where}&&\\ \Sigma &:= (\Sigma_X^{-1} + C^T \Sigma_V^{-1} C)^{-1}.\quad\quad&(8)\end{aligned} }
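The boxed form can be sketched in the same way. This is again a hypothetical helper (posterior_information_form), assuming NumPy arrays with C of shape (n, m). Apart from \Sigma_V^{-1}, which is cheap when the noise covariance is diagonal, the matrices inverted here are m \times m, so this form tends to be convenient when X is short compared with Y.

```python
import numpy as np

def posterior_information_form(Sigma_X, C, Sigma_V, y):
    """Return (E[X|Y=y], cov(X|Y)) using equations (6)-(8)."""
    Sigma_V_inv = np.linalg.inv(Sigma_V)
    precision = np.linalg.inv(Sigma_X) + C.T @ Sigma_V_inv @ C   # Sigma^{-1}, from (8)
    Sigma = np.linalg.inv(precision)                            # cov(X|Y) = Sigma, (7)
    mean = Sigma @ C.T @ Sigma_V_inv @ y                        # equation (6)
    return mean, Sigma

# Scalar example: sigma_x^2 = 4, c = 1, sigma_v^2 = 1, observed y = 2
mean, Sigma = posterior_information_form(np.array([[4.0]]), np.array([[1.0]]),
                                         np.array([[1.0]]), np.array([2.0]))
print(mean, Sigma)
```

Here the precision is 1/4 + 1 = 1.25, so the variance should be 0.8 and the mean 0.8 \times 2 = 1.6, up to rounding.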

Note that in the scalar case y = cx + v with variances \sigma_x^2 and \sigma_v^2, (5) and (7) become the identity

\displaystyle \sigma_x^2 - \frac{\sigma_x^4c^2}{c^2 \sigma_x^2 + \sigma_v^2} = (\sigma_x^{-2} + c^2\sigma_v^{-2})^{-1}.\quad\quad(9)
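A quick numerical check of (9) in plain Python. Both sides also reduce to the single fraction \sigma_x^2\sigma_v^2/(c^2\sigma_x^2 + \sigma_v^2); the test triples below are arbitrary illustrative values.

```python
def lhs(sx2, sv2, c):
    """Left side of (9), the scalar form of (5)."""
    return sx2 - sx2**2 * c**2 / (c**2 * sx2 + sv2)

def rhs(sx2, sv2, c):
    """Right side of (9), the scalar form of (7)-(8)."""
    return 1.0 / (1.0 / sx2 + c**2 / sv2)

# (sigma_x^2, sigma_v^2, c) triples chosen only for illustration
cases = [(4.0, 1.0, 1.0), (2.0, 0.5, -3.0), (0.1, 10.0, 0.7)]
for sx2, sv2, c in cases:
    common = sx2 * sv2 / (c**2 * sx2 + sv2)   # both sides reduce to this fraction
    print(sx2, sv2, c, lhs(sx2, sv2, c), rhs(sx2, sv2, c), common)
```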

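In the case where X = x is a scalar (m = 1), C is an n \times 1 column of ones and \Sigma_V = \text{diag}(\sigma_{v_1}^2, \ldots, \sigma_{v_n}^2) is diagonal, each element y_i = x + v_i of Y is a noisy measurement of x with independent noise. Then (8) becomes the inverse of a sum of inverse variances, \Sigma = (\sigma_x^{-2} + \sum_{i=1}^n \sigma_{v_i}^{-2})^{-1}, and (6) gives E[x|Y] = \Sigma \sum_{i=1}^n y_i/\sigma_{v_i}^2, weighting each measurement by the inverse of its noise variance (inverse-variance weighting).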
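A plain-Python sketch of this special case, assuming x \sim N(0, \sigma_x^2) and measurements y_i = x + v_i with independent v_i \sim N(0, \sigma_{v_i}^2). The function name ivw_posterior and the numbers are illustrative. With a very large prior variance the result approaches the familiar inverse-variance weighted average \sum_i (y_i/\sigma_{v_i}^2) / \sum_i \sigma_{v_i}^{-2}.

```python
def ivw_posterior(prior_var, noise_vars, ys):
    """Posterior mean and variance of a scalar x observed as y_i = x + v_i.

    Corresponds to (6)-(8) with C a column of ones and Sigma_V diagonal.
    """
    precision = 1.0 / prior_var + sum(1.0 / s for s in noise_vars)  # (8): add inverse variances
    var = 1.0 / precision                                           # (7)
    mean = var * sum(y / s for y, s in zip(ys, noise_vars))         # (6): inverse-variance weights
    return mean, var

# Two measurements of x with noise variances 1 and 4, and an almost flat prior
mean, var = ivw_posterior(1e12, [1.0, 4.0], [1.0, 2.0])
print(mean, var)   # close to the weighted average 1.2, with variance close to 0.8
```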

One can check algebraically (for example, using the Woodbury matrix identity) that expressions (4), (6) and (5), (7) are equivalent in the matrix case, or we may proceed as follows.
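The equivalence can also be spot-checked numerically. The sketch below assumes NumPy and randomly generated, purely illustrative matrices with C of shape (n, m); it compares the gain matrices multiplying Y in (4) and (6), and the covariances in (5) and (7).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                                  # lengths of X and Y

def random_spd(k):
    """A random k x k symmetric positive-definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))
inv = np.linalg.inv

# (4)-(5): gain and covariance built from cov(Y) = C Sigma_X C^T + Sigma_V
gain_45 = Sigma_X @ C.T @ inv(C @ Sigma_X @ C.T + Sigma_V)
cov_5 = Sigma_X - gain_45 @ C @ Sigma_X

# (6)-(8): gain and covariance built from Sigma^{-1} = Sigma_X^{-1} + C^T Sigma_V^{-1} C
Sigma = inv(inv(Sigma_X) + C.T @ inv(Sigma_V) @ C)
gain_6 = Sigma @ C.T @ inv(Sigma_V)

print(np.allclose(gain_45, gain_6), np.allclose(cov_5, Sigma))
```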

Let V = \left[ \begin{array}{cc} V_{11} & V_{12}\\ V_{21} & V_{22} \end{array} \right] = \left[ \begin{array}{cc}E(XX^T) & E(XY^T)\\ E(YX^T) & E(YY^T) \end{array} \right] be the covariance matrix of the joint vector \left[ \begin{array}{c} X\\ Y \end{array} \right]. (Here V with blocks V_{ij} denotes this joint covariance matrix, not to be confused with the noise vector V in (1).) Then V_{11} = \Sigma_X and, since Y = CX + V,

\begin{aligned}  V_{12} &= V_{21}^T\\  &= E(XY^T)\\  &= E(X(CX + V)^T)\\  &= E(XX^T) C^T + E(XV^T)\\  &= \Sigma_X C^T, \quad\quad(10)  \end{aligned}

using E(XV^T) = 0 for independent zero-mean X and V. Similarly, V_{22} = E(YY^T) = C\Sigma_X C^T + \Sigma_V.
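Equation (10), together with V_{22} = C\Sigma_X C^T + \Sigma_V, can be sanity-checked by simulation. This sketch assumes NumPy; the matrices are arbitrary illustrative values with m = 2 and n = 3, and the sample averages only approximate the expectations, so agreement is up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000                                       # number of simulated (X, Y) pairs
Sigma_X = np.array([[2.0, 0.5], [0.5, 1.0]])      # illustrative, m = 2
Sigma_V = np.diag([0.5, 1.0, 0.25])               # illustrative, n = 3
C = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])

X = rng.multivariate_normal(np.zeros(2), Sigma_X, size=N)   # one sample of X per row
V = rng.multivariate_normal(np.zeros(3), Sigma_V, size=N)   # independent noise samples
Y = X @ C.T + V                                             # Y = CX + V, row by row

V12_hat = X.T @ Y / N                             # sample average of X Y^T
V22_hat = Y.T @ Y / N                             # sample average of Y Y^T
print(np.round(V12_hat, 2))
print(Sigma_X @ C.T)                              # value predicted by (10)
```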

Then as the Gaussian vector \left[ \begin{array}{c} X\\ Y \end{array} \right] is zero-mean with covariance V, the joint pdf of X and Y is proportional to \exp \left(-\frac{1}{2} [X^T Y^T]V^{-1}\left[ \begin{array}{c} X\\ Y \end{array} \right] \right).

The key step now is to make use of the following identity (also see this explanation) based on completion of squares:

\displaystyle  \exp \left(-\frac{1}{2} [X^T Y^T]V^{-1}\left[ \begin{array}{c} X\\ Y \end{array} \right] \right)  = \exp\left( -\frac{1}{2} (X^T - Y^TA^T) S_{22}^{-1}(X-AY)\right) \exp\left( -\frac{1}{2} Y^T V_{22}^{-1} Y\right),\quad\quad(11)

where A and S_{22} are matrices, defined similarly to A and s in the scalar equation

\displaystyle ax^2 + 2bxy + cy^2 = a(x-Ay)^2 + sy^2.
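Matching the xy and y^2 terms gives A = -b/a and s = c - b^2/a (for a \neq 0). A small plain-Python check with arbitrary illustrative coefficients follows. In the matrix version, a plays the role of the top-left block S_{22}^{-1} of V^{-1} and b that of its top-right block, so in both cases A = -a^{-1}b.

```python
def complete_square(a, b, c):
    """Return (A, s) with a x^2 + 2 b x y + c y^2 == a (x - A y)^2 + s y^2, for a != 0."""
    A = -b / a           # xy terms: -2 a A = 2 b
    s = c - b * b / a    # y^2 terms: a A^2 + s = c
    return A, s

a, b, c = 2.0, -0.6, 1.5
A, s = complete_square(a, b, c)
for x, y in [(1.0, 2.0), (-0.3, 0.7), (4.0, -1.0)]:
    lhs = a * x**2 + 2 * b * x * y + c * y**2
    rhs = a * (x - A * y) ** 2 + s * y**2
    print(x, y, lhs, rhs)
```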

We will show that A = V_{12}V_{22}^{-1} and S_{22} = V_{11} - V_{12}V_{22}^{-1}V_{21} (S_{22} is the Schur complement of V_{22} in V, also discussed in this previous blog post).

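The second factor on the right side of (11) is proportional to the Gaussian pdf p(Y) of Y, whose covariance is V_{22}. Since p(X,Y) = p(X|Y)p(Y), the first factor must be proportional to the conditional pdf p(X|Y); that is, X|Y is Gaussian with mean AY and covariance S_{22}. We are left to find S_{22} = \text{cov}(X|Y) and AY = E(X|Y) in terms of \Sigma_X, C and \Sigma_V.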

From (11), S_{22}^{-1} is the top-left block of V^{-1} while -S_{22}^{-1}A is the top-right block of V^{-1}. To find these blocks, consider the block matrix equation
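These block claims can be spot-checked numerically before deriving them. The sketch below assumes NumPy and random illustrative matrices; it assembles the joint covariance from V_{11} = \Sigma_X, V_{12} = \Sigma_X C^T and V_{22} = C\Sigma_X C^T + \Sigma_V, then compares the top blocks of V^{-1} with S_{22}^{-1} and -S_{22}^{-1}A for A = V_{12}V_{22}^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 3

def random_spd(k):
    """A random k x k symmetric positive-definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_X, Sigma_V = random_spd(m), random_spd(n)
C = rng.standard_normal((n, m))

# Blocks of the joint covariance of [X; Y]
V11, V12 = Sigma_X, Sigma_X @ C.T
V21, V22 = V12.T, C @ Sigma_X @ C.T + Sigma_V
V_inv = np.linalg.inv(np.block([[V11, V12], [V21, V22]]))

A = V12 @ np.linalg.inv(V22)                 # claimed A = V12 V22^{-1}
S22_inv = np.linalg.inv(V11 - A @ V21)       # inverse of the Schur complement of V22 in V

print(np.allclose(V_inv[:m, :m], S22_inv))         # top-left block
print(np.allclose(V_inv[:m, m:], -S22_inv @ A))    # top-right block
```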

\displaystyle V \left[ \begin{array}{c} r\\s \end{array} \right] = \left[ \begin{array}{c} a\\b \end{array} \right] \quad \Rightarrow \quad \left[ \begin{array}{c} r\\s \end{array} \right] = V^{-1} \left[ \begin{array}{c} a\\b \end{array} \right]. \quad\quad(12)

This is the same as the system of equations

\displaystyle \begin{aligned}  V_{11} r + V_{12} s &= a, \quad\quad&(13)\\  V_{21} r + V_{22} s &= b. \quad \quad&(14)\\  \end{aligned}

Multiplying (13) on the left by V_{21}V_{11}^{-1} gives

\displaystyle V_{21}r + V_{21}V_{11}^{-1}V_{12}s = V_{21}V_{11}^{-1}a.

Subtracting this from (14) gives

\begin{aligned} (V_{22} - V_{21}V_{11}^{-1}V_{12})s &= b - V_{21}V_{11}^{-1}a\\ \Rightarrow s &= (V_{22} - V_{21}V_{11}^{-1}V_{12})^{-1}b - (V_{22} - V_{21}V_{11}^{-1}V_{12})^{-1}V_{21}V_{11}^{-1}a.\quad \quad(15)\end{aligned}

Since \left[ \begin{array}{c} r\\s \end{array} \right] = V^{-1} \left[ \begin{array}{c} a\\b \end{array} \right], the coefficients of a and b in (15) are the bottom-left and bottom-right blocks of V^{-1} respectively. Writing S_{11} := V_{22} - V_{21}V_{11}^{-1}V_{12} (the Schur complement of V_{11} in V), (15) gives

\displaystyle S_{11} s = b - V_{21}V_{11}^{-1}a.\quad \quad(16)

Also we have from (13)

\displaystyle r = V_{11}^{-1} a - V_{11}^{-1} V_{12}s.\quad\quad (17)

Substituting s = S_{11}^{-1}(b - V_{21}V_{11}^{-1}a) from (16) into (17), this becomes

\begin{aligned}  r &= V_{11}^{-1} a - V_{11}^{-1} V_{12} S_{11}^{-1}(b - V_{21}V_{11}^{-1}a)\\  &=(V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1} )a - V_{11}^{-1} V_{12} S_{11}^{-1}b.\quad\quad(18)  \end{aligned}
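The elimination in (15)-(18) can be checked against a direct linear solve of (12). This sketch assumes NumPy and uses a random symmetric positive-definite V (not necessarily one arising from (1)) with illustrative block sizes m = 2 and n = 3.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 3
M = rng.standard_normal((m + n, m + n))
V = M @ M.T + (m + n) * np.eye(m + n)         # random SPD matrix, partitioned as in (12)
V11, V12, V21, V22 = V[:m, :m], V[:m, m:], V[m:, :m], V[m:, m:]
a, b = rng.standard_normal(m), rng.standard_normal(n)

V11_inv = np.linalg.inv(V11)
S11_inv = np.linalg.inv(V22 - V21 @ V11_inv @ V12)   # inverse of S11, the Schur complement of V11

s = S11_inv @ (b - V21 @ V11_inv @ a)                # (15), written as in (16)
r = V11_inv @ a - V11_inv @ V12 @ s                  # (17), equivalently (18)

rs = np.linalg.solve(V, np.concatenate([a, b]))      # direct solution of (12)
print(np.allclose(r, rs[:m]), np.allclose(s, rs[m:]))
```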

Analogously to (16), eliminating s from (13) using (14), we could have written

\displaystyle S_{22} r = a - V_{12}V_{22}^{-1} b \Rightarrow r = S_{22}^{-1} a -S_{22}^{-1}V_{12}V_{22}^{-1}b .\quad\quad(19)

Comparing coefficients of a, b of (18) and (19) gives

\displaystyle S_{22}^{-1} = V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1}\quad\quad(20)

and

\displaystyle V_{11}^{-1} V_{12} S_{11}^{-1} = S_{22}^{-1} V_{12}V_{22}^{-1}.\quad\quad(21)
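Identity (20) is one form of the Woodbury matrix identity. Both (20) and (21) hold for any symmetric positive-definite V partitioned as above, which the following NumPy sketch spot-checks with a random, purely illustrative matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 2, 3
M = rng.standard_normal((m + n, m + n))
V = M @ M.T + (m + n) * np.eye(m + n)        # random SPD matrix, partitioned as in (12)
V11, V12, V21, V22 = V[:m, :m], V[:m, m:], V[m:, :m], V[m:, m:]
inv = np.linalg.inv

S11 = V22 - V21 @ inv(V11) @ V12             # Schur complement of V11
S22 = V11 - V12 @ inv(V22) @ V21             # Schur complement of V22

eq20 = np.allclose(inv(S22), inv(V11) + inv(V11) @ V12 @ inv(S11) @ V21 @ inv(V11))
eq21 = np.allclose(inv(V11) @ V12 @ inv(S11), inv(S22) @ V12 @ inv(V22))
print(eq20, eq21)
```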

In the same way that S_{22} = \text{cov}(X|Y), we have S_{11} = \text{cov}(Y|X) (swapping the roles of X and Y), which for Y = CX + V is simply \Sigma_V, as we saw before.
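This can also be confirmed directly from the blocks, using V_{11} = \Sigma_X, V_{12} = V_{21}^T = \Sigma_X C^T and V_{22} = C\Sigma_X C^T + \Sigma_V:

\displaystyle S_{11} = V_{22} - V_{21}V_{11}^{-1}V_{12} = (C\Sigma_X C^T + \Sigma_V) - C\Sigma_X \Sigma_X^{-1} \Sigma_X C^T = \Sigma_V.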

Hence from (20),

\begin{aligned}  \Sigma^{-1} = S_{22}^{-1} &= V_{11}^{-1} + V_{11}^{-1} V_{12} S_{11}^{-1} V_{21} V_{11}^{-1}\\  &= \Sigma_X^{-1} + \Sigma_X^{-1}\Sigma_X C^T \Sigma_V^{-1} C \Sigma_X \Sigma_X^{-1}\\  &= \Sigma_X^{-1} + C^T \Sigma_V^{-1} C  \end{aligned}

and, since -S_{22}^{-1}A is the top-right block of V^{-1}, comparing it with the b-coefficient of (18) gives

\begin{aligned}  S_{22}^{-1}A &= V_{11}^{-1} V_{12} S_{11}^{-1}\\  \Rightarrow A &= S_{22} V_{11}^{-1} V_{12} S_{11}^{-1}\\  &= S_{22} \Sigma_X^{-1} \Sigma_X C^T \Sigma_V^{-1}\\  &= \Sigma C^T \Sigma_V^{-1}.  \end{aligned}

Hence E[X|Y] = AY = \Sigma C^T \Sigma_V^{-1} Y as desired, and equations (6)-(8) have been verified.
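As an end-to-end sanity check, one can simulate (1) and regress X on Y. Because E[X|Y] is linear in Y here, ordinary least squares (with no intercept, since everything is zero-mean) should approximately recover the matrix \Sigma C^T \Sigma_V^{-1} from (6), and the residual covariance should approach \Sigma from (7)-(8). The sketch assumes NumPy; the matrices and sample size are illustrative, and agreement is only up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000                                       # number of simulated (X, Y) pairs
Sigma_X = np.array([[2.0, 0.5], [0.5, 1.0]])      # illustrative, m = 2
Sigma_V = np.diag([0.5, 1.0, 0.25])               # illustrative, n = 3
C = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])

X = rng.multivariate_normal(np.zeros(2), Sigma_X, size=N)
Y = X @ C.T + rng.multivariate_normal(np.zeros(3), Sigma_V, size=N)

# Least-squares fit of X on Y (rows are samples): X ~ Y B, so B^T estimates A
B, *_ = np.linalg.lstsq(Y, X, rcond=None)
A_hat = B.T
resid = X - Y @ B
Sigma_hat = resid.T @ resid / N                   # residual covariance, estimates cov(X|Y)

inv = np.linalg.inv
Sigma = inv(inv(Sigma_X) + C.T @ inv(Sigma_V) @ C)   # equation (8)
A = Sigma @ C.T @ inv(Sigma_V)                       # coefficient of Y in (6)
print(np.round(A_hat, 3))
print(np.round(A, 3))
```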
