Chaitanya's Random Pages

June 30, 2017

The ballot problem and Catalan’s triangle

Filed under: Uncategorized — ckrao @ 10:15 pm

The ballot problem asks for the probability that candidate A is always ahead of candidate B during a tallying process if they respectively end up with p and q votes where p > q. For example if p = 2, q = 1 there are 3 ways in which the three votes are counted (AAB, ABA, BAA) but the only favourable outcome in which A remains ahead throughout is occurs if the tally appears as AAB. Hence the probability A remains ahead is 1/3.

If there are no restrictions, the number of ways the votes are tallied is the binomial coefficient \binom{p+q}{p}. The number of favourable outcomes (the numerator of the desired probability) in which A remains ahead can counted recursively in a similar way to Pascal’s triangle (each number the sum of the two neighbours above it) except no number may appear to the left of the vertical midline, as illustrated below. For example, the second element of the fifth row (3) corresponds to the case p = 3, q = 1 (AAAB, AABA, ABAA). More generally, dividing into the cases where the final vote is A or B, the number of ways N_{p,q} in which A remains ahead of B is equal to N_{p-1,q} + N_{p,q-1} where N_{p,q} = 0 if q \geq p. This sequence appears as A008313 in the OEIS and is the reversed form of Catalan’s triangle.


A way of generating the general term is making use of a beautiful reflection principle that gives a 1-1 correspondence between the number of tallies leading to a tie at some point and the number of tallies in which the first vote goes to candidate B: simply interchange A with B for all votes up to and including that tie. This amounts to reflecting the random walk about the midline, as illustrated below with the blue path corresponding to ABAA and the the red path BAAA.


Since p > q, the probability candidate A always leads is 1 minus the probability the sequence ties at some point. But the bijection above shows an equal number of these start with A and with B, so our desired probability is

\displaystyle 1 - 2 \text{Pr(sequence starts with B)} = 1 - 2\frac{q}{p+q} = \frac{p-q}{p+q}.

The numbers in the triangle are also formed by differences of adjacent entries of Pascal’s triangle, namely row p+q has terms of the form

\displaystyle \begin{aligned} N_{p,q} &= \binom{p+q}{p}\frac{p-q}{p+q}\\ &= \frac{(p+q-1)!(p-q)}{p!q!}\\&= \binom{p+q-1}{q}-\binom{p+q-1}{p}.\end{aligned}

This can be interpreted as the number of unrestricted sequences with p As and q Bs of length (p+q) that start with A minus the corresponding number that start with B, again following from the reflection principle.

As an aside, looking at the bottom row above we see N_{8,6} = N_{10,4} = 429, or equivalently

\displaystyle \binom{13}{4} - \binom{13}{3} = \binom{13}{6} - \binom{13}{5} = 429.

Finally we note that the Catalan numbers arise from the following parts of the triangle above:

  • as entries in the first column (counting Dyck paths)
  • as the sum of squares of each row
  • as the sum of entries in NE-SW diagonals

Catalan’s triangle can be generalised to a trapezium in which we count the number of strings consisting of n As and k Bs such that in every initial segment of the string the number of Bs does not exceed the number of As by m or more.

Blog at

%d bloggers like this: