Chaitanya's Random Pages

December 30, 2011

The distribution of Melbourne’s maximum temperatures

Filed under: climate and weather — ckrao @ 11:26 pm

Here I have generated histograms by month of the maximum temperatures of my home city Melbourne (Australia) for the forty-year period 1971-2010. The temperatures are collected into bins of size 2 degrees Celsius. The most frequent temperature range varies from 14-16°C in the cooler months (Jun-Sep) to 20-22°C in the warmer months (Dec-Mar). The moderating influence of the Southern Ocean is the primary cause of this relatively small difference. However, in the summer months hot winds from the interior of the continent skew the distribution and push the mean maximum closer to 25°C. Curiously, in January the maximum has been as likely to be 36-38°C as 30-32°C.

One can see how bunched up the maximum temperatures are in the winter (never too cold) compared with the summer (sometimes too hot!). Since the data is collected over a forty-year period, simply divide the y value by 40 to see on how many days per month a particular range is achieved. For example, in the month of May the maximum temperature has been between 16 and 18°C on between 8 and 9 days per month.

Finally here is the above data aggregated.

Here we see the most common range is 14-16°C, which has occurred on about 57 days per year (47 of these between May and Sep); 81% of the time the maximum has been between 12 and 26°C (and 17% of the time above 26°C). The overall range is 7.0 to 46.4°C. On average the maximum temperature has exceeded 40°C 1.5 times per year and been below 10°C 0.8 times per year.
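If you would like to reproduce this sort of binning yourself, here is a minimal Python sketch. The original observations aren't included with the post, so the daily maxima below are synthetic stand-ins (an assumption purely for illustration); with real data the same histogram call applies.

```python
import numpy as np

# The original observations aren't bundled here, so we synthesise
# stand-in daily maxima (an assumption) just to show the mechanics.
rng = np.random.default_rng(0)
t_max = rng.normal(loc=19.5, scale=6.0, size=40 * 365)   # 40 years of daily maxima

# Collect the maxima into 2-degree-Celsius bins, as in the histograms above.
edges = np.arange(6.0, 50.0, 2.0)            # bin edges 6, 8, ..., 48 °C
counts, _ = np.histogram(t_max, bins=edges)

# Dividing by the 40 years converts counts over the period into days per year.
days_per_year = counts / 40.0
```

With the real forty-year record in `t_max`, `days_per_year` gives directly the "divide the y value by 40" figures quoted above.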

The data for this post comes from

The results of Fuss and Carlitz for bicentric quadrilaterals

Filed under: mathematics — ckrao @ 4:47 am

Any triangle has both an incircle (tangent to its sides) and circumcircle (through its vertices).

The distance d between the centres O, I of these circles is related to their radii r (for the incircle) and R (for the circumcircle) via Euler’s theorem in geometry:

\displaystyle d^2 = R^2 - 2Rr

or the equivalent formula

\displaystyle \frac{1}{R-d} + \frac{1}{R+d} = \frac{1}{r}.
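Euler's relation is easy to sanity-check numerically. The sketch below uses the standard formulas r = K/s and R = abc/4K and the barycentric form of the incentre; the particular triangle is an arbitrary choice.

```python
import numpy as np

# Numerical check of Euler's theorem d^2 = R^2 - 2Rr on an arbitrary triangle.
A = np.array([0.0, 0.0])
B = np.array([4.0, 0.0])
C = np.array([1.0, 3.0])

a, b, c = np.linalg.norm(B - C), np.linalg.norm(C - A), np.linalg.norm(A - B)
s = (a + b + c) / 2
K = np.sqrt(s * (s - a) * (s - b) * (s - c))   # area via Heron's formula
R = a * b * c / (4 * K)                        # circumradius
r = K / s                                      # inradius
I = (a * A + b * B + c * C) / (a + b + c)      # incentre (barycentric weights)

# Circumcentre: the point equidistant from A, B and from A, C (linear system).
M = 2 * np.array([B - A, C - A])
rhs = np.array([B @ B - A @ A, C @ C - A @ A])
O = np.linalg.solve(M, rhs)

d = np.linalg.norm(O - I)
print(d**2, R**2 - 2 * R * r)                  # the two sides agree
```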

A convex quadrilateral however need not have an incircle or circumcircle in general. For it to have an incircle (i.e. a tangential quadrilateral), its pairs of opposite sides must have the same sum. For it to have a circumcircle (i.e. a cyclic quadrilateral), its opposite angles must be supplementary. One that satisfies both of these properties is called a bicentric quadrilateral.

Remarkably, if we define d, r and R for a bicentric quadrilateral analogously to the triangular case, we have formulas similar to Euler’s due to Nicolaus Fuss and Leonard Carlitz:

Carlitz: \displaystyle d^2 = R^2 - 2Rr\mu,

where \displaystyle \mu = \sqrt{\frac{(ab+cd)(ad + bc)}{(a+c)^2(ac+bd)}} and a,b,c,d are the quadrilateral’s side lengths.

Fuss: \displaystyle \frac{1}{(R-d)^2} + \frac{1}{(R+d)^2} = \frac{1}{r^2}

Isn’t it cool how similar these formulas look for the triangular and quadrilateral cases?! Both of these results (and indeed Euler’s theorem) follow from the intersecting chords theorem. Fuss’s theorem can be shown by proving that both sides of the equation are equal to \displaystyle \frac{1}{AI^2} + \frac{1}{CI^2}. Carlitz’s identity can be shown via the preliminary result that \mu = IE/BE, where E is the intersection of line AI with the circumcircle. A beautiful proof of Fuss’s theorem due to Salazar (using little more than a short angle chase and the theorems of Pythagoras and Apollonius) is found in [1] while that of Carlitz can be seen from p. 154 of [2].

By Poncelet’s porism, if we start with two circles of radius r and R with distance d between their centres such that the Fuss equation \frac{1}{(R-d)^2} + \frac{1}{(R+d)^2} = \frac{1}{r^2} is satisfied, then a bicentric quadrilateral with those parameters may be constructed: simply start at any point on the circumcircle and successively draw tangents to the incircle to generate the other three vertices of the quadrilateral. This was how I drew the above bicentric quadrilateral.
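Both identities can also be checked numerically without carrying out the full Poncelet construction. A convenient test case is a right kite (a kite with right angles at the two vertices where unequal sides meet): its main diagonal is a circumdiameter by Thales' theorem, and since its sides come in the order a, b, b, a the tangential condition a + b = b + a holds automatically, so it is bicentric. Here is a sketch; the particular kite chosen is arbitrary.

```python
import numpy as np

# Right kite ABCD: A, C on a diameter of the circumcircle, B on the circle,
# D its mirror image, so the angles at B and D are right (Thales).
R, theta = 1.0, np.pi / 3                     # circumradius, position of B
A, C = np.array([0.0, 0.0]), np.array([2 * R, 0.0])
B = np.array([R + R * np.cos(theta), R * np.sin(theta)])
D = np.array([B[0], -B[1]])

a = np.linalg.norm(B - A)                     # sides: AB = DA = a, BC = CD = b
b = np.linalg.norm(C - B)
area = 0.5 * np.linalg.norm(C - A) * np.linalg.norm(D - B)   # kite area = pq/2
r = area / (a + b)                            # tangential: r = area / semiperimeter
I_x = r * a / B[1]                            # incentre on the axis: dist to AB = r
d = abs(I_x - R)                              # distance between the two centres

# Fuss: 1/(R-d)^2 + 1/(R+d)^2 = 1/r^2
print(1 / (R - d)**2 + 1 / (R + d)**2, 1 / r**2)

# Carlitz: with sides (a, b, b, a), mu simplifies to sqrt(a^2+b^2)/(a+b)
mu = np.sqrt((a * b + b * a) * (a * a + b * b) / ((a + b)**2 * (a * b + b * a)))
print(d**2, R**2 - 2 * R * r * mu)
```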


[1] A. Bogomolny, Fuss’ Theorem from Interactive Mathematics Miscellany and Puzzles, Accessed 30 December 2011.

[2] O. Calin, Euclidean and Non-Euclidean Geometry: a metric approach, available at

December 15, 2011

Australia in close test match finishes

Filed under: cricket,sport — ckrao @ 10:59 am

Following Australia’s recent narrow loss to New Zealand I had a look at this page and found that Australia features in 15 out of the 17 closest finishes (measured by runs) in a test match!

One can add to these the only two test match ties, both also involving Australia!

Here are the recent closest results (by runs) for Australia together with intermediate scores in their final innings. (By recent, I mean after 1929!)

lost by 1 run vs WI, 92/3 (7/74 chasing 184)
lost by 2 runs vs Eng, 2005 (7/137 chasing 282)
lost by 3 runs vs Eng, 82/3 (9/218 chasing 292)
lost by 5 runs vs SA, 93/4 (8/75 chasing 117)
lost by 7 runs vs NZ, 11/2 (9/199 chasing 241)
lost by 12 runs vs Eng, 98/9 (3/130 chasing 175)
lost by 12 runs vs Ind, 04/5 (7/58 chasing 107)

All were losses, but in all but one case Australia was way behind and did well to come as close as it did!

The first recent case of a close win for Australia was in 92/3 versus Sri Lanka (16 runs). That time it was Sri Lanka’s turn to collapse (2/127 chasing 181) after a first innings lead of 291!

On the plus side for Australia, its narrowest wins (after 1977) while chasing have all come against South Africa in South Africa:

won by 2 wickets vs SA, 96/7 (target 270)
won by 2 wickets vs SA, 05/6 (target 292)
won by 2 wickets vs SA, 11/2 (target 310)


December 5, 2011

The MMSE estimator is orthogonal projection

Filed under: mathematics — ckrao @ 9:36 pm

Suppose we are given the value of the random vector Y from which we wish to estimate the value of another random vector X. In other words, given Y we want to find a function f such that f(Y) is a good approximation of X. How might we go about this? If our definition of “good approximation” is the MMSE (minimum mean squared error) E \|f(Y)-X\|^2 and the random vectors X and Y have finite second order moments (i.e. E \|X\|^2 = E \sum_i |X_i|^2 is finite), then the language of Hilbert spaces helps us greatly.

Let the dimensions of X and Y be m and n respectively (they need not be equal). The convenience of the metric \|f(Y) - X\|^2 is that we may minimise it by finding the component functions  f_1(Y), \ldots, f_m(Y) that minimise each term E|f_i(Y) - X_i|^2 of the sum separately, and then our required f is f(Y) = [f_1(Y), \ldots, f_m(Y)]^T.

Imagine a component X_i (now a random variable as opposed to a random vector) living in the space of random variables with finite second order moment (this space is denoted by L^2(\cal{F}) where \cal{F} is a collection of events whose probability can be measured). The space of measurable functions of Y from \mathbb{C}^n to \mathbb{C} forms a subspace of L^2(\cal{F}) which we denote by L^2(\sigma(Y)). (By measurable we mean that we can still compute the probability of events based on values of the function.) If X_i is in this subspace, it means that X_i can be entirely determined from knowledge of Y, and an error of zero is possible. If X_i is not in the subspace, we imagine X_i as an arrow from 0 sticking out from the subspace.

With inner product on L^2(\cal{F}) given by \langle U,V \rangle = E UV^* (noting that \langle U,U \rangle = 0 if and only if U = 0 with probability one), we see that E|f_i(Y) - X_i|^2 is minimised when the error e_i = f_i(Y) - X_i is orthogonal to any measurable function of Y. In our Hilbert space we think of f_i(Y) as the projection of X_i onto the subspace L^2(\sigma(Y)):

\displaystyle \langle g(Y), f_i(Y) - X_i \rangle = 0 for any g(Y) \in L^2(\sigma(Y)) \quad \quad (*)

Note that if this were not true, but instead there existed g(Y) \in L^2(\sigma(Y)) with \langle g(Y),g(Y) \rangle = E|g(Y)|^2 = 1 (i.e. normalised) and \langle f_i(Y) - X_i, g(Y) \rangle = \alpha \neq 0, we may write h(Y) = f_i(Y) - \alpha g(Y) \in L^2(\sigma(Y)) and so

\begin{array}{lcl} E|h(Y) - X_i|^2 &=& E|f_i(Y) - \alpha g(Y) - X_i|^2 \\ &=& E| f_i(Y) - X_i|^2 + |\alpha|^2 E |g(Y)|^2 - \alpha^* \langle f_i(Y) - X_i , g(Y) \rangle - \alpha \langle g(Y), f_i(Y) - X_i \rangle \\&=& E|f_i(Y) - X_i|^2 + |\alpha|^2 - |\alpha|^2 - |\alpha|^2 \\ &<& E| f_i(Y) - X_i|^2,\end{array}

contradicting the minimality of E|f_i(Y) - X_i|^2. Hence the orthogonality condition (*) must hold. Such a projection is unique in the almost-sure sense (i.e. any other random variable is equal to f_i(Y) with probability one).

By (*), f_i(Y) - X_i is also orthogonal to any constant random variable, or in other words, Ec(f_i(Y) - X_i) = 0 for any constant c showing that the error f_i(Y) - X_i has zero mean (worth remembering: zero-mean random variables are orthogonal to constants!). In vector notation, with e := f(Y) - X,

Ee = 0  or Ef(Y) = EX \quad \quad (1)

For each j, f_j(Y) is in L^2(\sigma(Y)) so by (*), \langle f_j(Y), f_i(Y) - X_i \rangle = 0. Switching to vector notation, this gives us

\displaystyle {\rm Cov}(f(Y), f(Y)-X) = Ef(Y)(f(Y) - X)^* = 0 \quad \quad (2),

where \Sigma_{UV} := {\rm Cov}(U,V) := E(U-E(U))(V-E(V))^* (we shall use the notation \Sigma_U = {\rm Cov}(U) to mean {\rm Cov}(U,U)).

This is the orthogonality condition in our vector case. By Pythagoras’s theorem, the minimum error is

\sum_i E|f_i(Y) - X_i|^2 = \sum_i \left(E|X_i|^2 - E|f_i(Y)|^2\right) = {\rm tr}({\rm Cov}(X) - {\rm Cov}(f(Y))),

where {\rm tr} represents the sum of the diagonal entries of a square matrix (the trace).

Linear Case

If we take the example of a linear estimator of the form f(Y) = AY + b (A a matrix, b a vector), we may use (1) and (2) to identify the A and b that minimise E\|f(Y) - X\|^2. By (1), AEY + b = EX, from which b = EX - AEY and f(Y) = A(Y-EY) + EX.

Note that any component of Y is in itself a linear transform of Y. By (*), it must be orthogonal to any component of the error vector f(Y) - X, giving us

\displaystyle 0 = E(f(Y)-X)Y^* = E[(A(Y-EY) + EX - X)Y^*] = A\Sigma_Y - \Sigma_{XY}.

If {\rm Cov}(Y) is invertible, this gives A = \Sigma_{XY}\Sigma_Y^{-1}, leading to the following expressions:

\displaystyle f(Y) = EX + \Sigma_{XY}\Sigma_Y^{-1}(Y-EY) \quad \quad (3)

\begin{array} {lcl} {\rm Cov}(f(Y) - X) &=& {\rm Cov}(X) - {\rm Cov}(EX + {\rm Cov}(X,Y){\rm Cov}(Y)^{-1}(Y-EY))\\ &=& {\rm Cov}(X) - {\rm Cov}(X,Y){\rm Cov}(Y)^{-1}{\rm Cov}(Y-EY) {\rm Cov}(Y)^{-1}{\rm Cov}(Y,X)\\ &=& \Sigma_X - \Sigma_{XY}\Sigma_Y^{-1} \Sigma_{YX} \quad \quad (4) \end{array}

(here using the fact that {\rm Cov}(AX) = A{\rm Cov}(X)A^*),

E\|f(Y) - X\|^2 = {\rm tr}({\rm Cov}(f(Y) - X)) = {\rm tr}(\Sigma_X - \Sigma_{XY}\Sigma_Y^{-1} \Sigma_{YX}) \quad \quad (5).
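To sanity-check (1)-(3) numerically, here is a short Python sketch; the dimensions and the mixing used to correlate X and Y are arbitrary assumptions, not anything from the post. Because A is built from the same sample covariances as the data, the sample versions of (1) and (2) hold to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.standard_normal((n, 2))                    # the vector to estimate
H = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]])
Y = X @ H.T + 0.5 * rng.standard_normal((n, 3))    # a correlated observation

S = np.cov(X.T, Y.T)                               # joint sample covariance (5x5)
Sxy, Sy = S[:2, 2:], S[2:, 2:]
A = Sxy @ np.linalg.inv(Sy)
f = X.mean(0) + (Y - Y.mean(0)) @ A.T              # formula (3)
e = f - X

print(np.abs(e.mean(0)).max())                     # (1): zero-mean error
print(np.abs(np.cov(f.T, e.T)[:2, 2:]).max())      # (2): Cov(f(Y), e) = 0
```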

Note that this precise argument is made in least squares problems: if we wish to find a matrix W so that \|x - Wy\|^2 is minimised and Wy is only permitted to lie in the column space of some matrix C, we find the projection of x onto that column space. This gives the orthogonality condition 0 = C^* (x - Wy) = C^* (x - Cz) (writing Wy = Cz for some z, since Wy is in the column space of C), from which

\displaystyle Wy = Cz = C(C^*C)^{-1}C^*x.

In the special case when C =c is a single column vector, the projection operator has the attractive form (cc^*)/(c^*c) which is simply the outer product cc^* if c has unit length.
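A toy numerical version of this projection (the matrix C and vector x are arbitrary illustrations):

```python
import numpy as np

C = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # subspace = column space of C
x = np.array([1.0, 3.0, 2.0])

P = C @ np.linalg.inv(C.T @ C) @ C.T       # projection matrix C(C*C)^{-1}C*
p = P @ x                                  # closest point to x in col(C)

print(C.T @ (x - p))                       # orthogonality: residual perp. to C

# Rank-one case: projecting onto a single column c.
c = np.array([1.0, 2.0, 2.0])
P1 = np.outer(c, c) / (c @ c)              # the (cc*)/(c*c) form
```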

Conditional Expectation

The post so far has made no mention of the conditional expectation E[X|Y], which is in fact what the minimum mean squared error estimator turns out to be. In brief, E[X|Y] is defined as a measurable function of Y satisfying

\displaystyle E[g(Y)E[X|Y]] = E[g(Y)X] \quad \quad (+)

for any measurable function g(Y). This is a measure-theoretic definition that has the advantage of unifying the discrete and continuous cases, and avoids division-by-zero possibilities that can arise when dealing with probability densities. (Actually conditional expectation is first defined with respect to a set of events called a sigma algebra which may be considered as an information source for that random variable. The larger the set, the more information we have about that random variable.)

If expectation is thought of as an averaging process, then conditional expectation is an average with respect to information or uncertainty. Formula (+) says that this average should be equal to the unconditioned average on any measurable set (to see this take g(Y) to be 1 on that set and 0 otherwise). There are details missing here, but the interested reader is encouraged to see the references for more.

In the Hilbert space of random variables with finite second moment, this condition is equivalent to (X-E[X|Y]) being orthogonal to g(Y), so E[X|Y] is our best estimate f(Y) found earlier. Hence the conditional expectation E[X_i|Y] can be viewed as the vector projection of X_i onto L^2(\sigma(Y)), i.e. the best estimate of X_i in the expected least squares sense. Conditional probability may be defined in terms of conditional expectation via P(A|B) = E[1_A|B], where 1_A is the random variable equal to 1 if our event A occurs, and 0 otherwise.
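The defining property (+) can be seen concretely in a small discrete example (the joint pmf below is made up purely for illustration):

```python
import numpy as np

# Joint pmf p[x, y] of X, Y each taking values in {0, 1} (made-up numbers).
p = np.array([[0.1, 0.3],     # row x = 0
              [0.2, 0.4]])    # row x = 1
xvals = np.array([0.0, 1.0])

pY = p.sum(axis=0)                       # marginal pmf of Y
condE = (xvals @ p) / pY                 # E[X | Y = y] for y = 0, 1

g = np.array([2.0, -3.0])                # an arbitrary function g(Y)
lhs = np.sum(g * condE * pY)             # E[ g(Y) E[X|Y] ]
rhs = np.sum(g * (xvals @ p))            # E[ g(Y) X ]
print(lhs, rhs)                          # equal, as (+) requires
```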

Gaussian Case

In the particular case of X and Y being jointly Gaussian vectors, it turns out that the best linear estimate is also the best overall estimate in the least squares sense. To see this, let f(Y) be the best linear estimate of X given Y. The error (X-f(Y)) is zero-mean and uncorrelated with Y, and since uncorrelated and independent are equivalent notions in the Gaussian world, (X-f(Y)) is independent of Y. Hence given Y=y the conditional distribution of (X-f(Y)) does not depend on y: it is zero-mean Gaussian with covariance given in (4). The distribution of X (= (X-f(Y)) + f(Y)) given Y=y is therefore Gaussian with mean f(y) (the linear estimate of X) and the same covariance matrix. As a result the conditional expectation E[X|Y=y] is equal to the mean of that Gaussian distribution, which is f(y), and as this is true for all y, E[X|Y] = f(Y).

For example, if Y = HX + V, where X is zero-mean Gaussian, H is a deterministic n by m matrix and V is a zero-mean n-dimensional Gaussian noise vector uncorrelated with X, we have

\displaystyle \Sigma_{XY} ={\rm Cov}(X,HX +V) = {\rm Cov}(X)H^* + {\rm Cov}(X,V) = \Sigma_X H^*

\displaystyle \Sigma_Y = {\rm Cov}(HX+V) = H\Sigma_X H^*+ \Sigma_V,

and as X and Y are jointly Gaussian, our MMSE estimator is also the best linear estimator:

\displaystyle E[X|Y] = f(Y) =\Sigma_{XY}\Sigma_Y^{-1}Y = \Sigma_X H^*(H\Sigma_X H^* + \Sigma_V)^{-1} Y.

This solution is used in a wide variety of linear estimation applications, ranging from regression analysis in statistics to communication theory (estimating signals passing through a channel H and corrupted by noise V).
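Here is a quick Monte Carlo sanity check of this estimator; the particular H, noise level and dimensions are arbitrary assumptions. The empirical squared error should match the trace formula (5) for a large sample size.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 400_000, 2, 3                       # samples, dim of X, dim of Y
H = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]])
Sx = np.eye(m)                                # X ~ N(0, I)
Sv = 0.25 * np.eye(k)                         # V ~ N(0, 0.25 I)

X = rng.standard_normal((n, m))
V = 0.5 * rng.standard_normal((n, k))
Y = X @ H.T + V                               # the model Y = HX + V

Sxy = Sx @ H.T                                # Sigma_X H*
Sy = H @ Sx @ H.T + Sv                        # H Sigma_X H* + Sigma_V
A = Sxy @ np.linalg.inv(Sy)
f = Y @ A.T                                   # E[X|Y] for this zero-mean model

mse_empirical = np.mean(np.sum((f - X)**2, axis=1))
mse_theory = np.trace(Sx - Sxy @ np.linalg.inv(Sy) @ Sxy.T)   # formula (5)
print(mse_empirical, mse_theory)
```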


[1] Williams, Probability with Martingales, Cambridge University Press, 2001.

[2] Hajek, Notes for ECE 534: An Exploration of Random Processes for Engineers, July 2011, available here.
