Chaitanya's Random Pages

August 30, 2014

Waiting times and the size biased distribution

Filed under: mathematics — ckrao @ 12:48 pm

Suppose buses arrive at a stop after independent random intervals of T minutes, where T is uniformly distributed between 10 and 20. It is natural to wonder how long one is expected to wait from some random point in time t until the first bus arrives. On the one hand the bus could arrive immediately; on the other hand, if time t falls just after the previous bus left, one could wait as long as 20 minutes for the next bus. Interestingly this waiting time W is no longer uniformly distributed.

In general if interarrival times T have (cumulative) distribution function F_T and mean \mathbb{E}[T], the following is true.

The probability density of waiting time is f_W(x) = \text{Pr}(T \geq x)/\mathbb{E}[T] = (1-F_T(x))/\mathbb{E}[T].\quad\quad(1)

Applying this formula to the above bus example, we have \mathbb{E}[T] = 15 and

\displaystyle F_T(x) = \begin{cases}0, & x < 10\\  (x-10)/10, & 10 \leq x \leq 20\\  1, & x > 20\end{cases}

Using (1), the probability density function of waiting time f_W(x) is plotted below.

[Figure: plot of the waiting-time density f_W(x) for the bus example]

We see that the density function is a rectangle (centred at x = 5 and carrying 2/3 of the total probability mass) appended to a triangle (with centroid at x=40/3 and 1/3 of the total probability mass). From this the expected waiting time is \mathbb{E}[W] = 5 \times (2/3) + 40/3 \times (1/3) = 70/9 \approx 7.78 minutes. This is more than the 15/2 = 7.5 minutes (half the mean interarrival time) that one might initially expect.
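
As a sanity check, this expected value can be verified by simulation. Below is a minimal Python sketch (assuming NumPy is available) that generates a long stream of uniformly distributed interarrival times, inspects the process at random times, and records the wait until the next bus; the empirical mean comes out near 70/9 ≈ 7.78 rather than 7.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# A long stream of bus interarrival times T ~ U(10, 20) minutes.
gaps = rng.uniform(10, 20, size=1_000_000)
arrivals = np.cumsum(gaps)

# Inspect the process at random times (kept well inside the simulated horizon)
# and record the wait until the next bus arrives.
t = rng.uniform(0, arrivals[-1] * 0.9, size=200_000)
idx = np.searchsorted(arrivals, t, side='right')   # first arrival after t
waits = arrivals[idx] - t

print(waits.mean())   # close to 70/9 ≈ 7.78, not 7.5
```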

To prove (1), we shall first look at the length of the particular inter-arrival period that contains a given point t; the waiting time is then the part of this period after time t. We make use of the following result from alternating renewal process theory [1]:

Proposition: Consider a system that alternates between on and off states where:

  • the on durations X_1, X_2, \ldots are independent and identically distributed according to random variable X, and
  • the off durations Y_1, Y_2, \ldots are independent and identically distributed according to random variable Y.

(However it may be that X_n and Y_n are dependent in general.)

Let p(t) be the probability that the system is in the on state at time t. Then if X + Y has finite mean and is non-lattice (i.e. does not take values only in integer multiples of some positive number), then

\displaystyle \lim_{t \rightarrow \infty} p(t) = \frac{\mathbb{E}[X]}{\mathbb{E}[X+Y]}.\quad\quad(2)

This result is easily understood: the probability the system is on at a given time is the mean on time divided by the mean total cycle time. However it is based on the key renewal theorem, a limit theorem which is not so straightforward to prove.
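
To make the proposition concrete, here is a small simulation sketch (hypothetical on/off durations, NumPy assumed) that estimates the probability of finding such a system on at a randomly chosen time and compares it with \mathbb{E}[X]/\mathbb{E}[X+Y].

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cycle: on for X ~ Exp(mean 2), then off for Y ~ U(0, 3), repeated i.i.d.
n_cycles = 200_000
on = rng.exponential(2.0, n_cycles)
off = rng.uniform(0.0, 3.0, n_cycles)

cycle_ends = np.cumsum(on + off)   # end time of each on/off cycle
on_ends = cycle_ends - off         # time at which each on period ends

# Probability of finding the system on at a random inspection time t.
t = rng.uniform(0, cycle_ends[-1] * 0.9, size=100_000)
idx = np.searchsorted(cycle_ends, t, side='right')  # cycle containing t
is_on = t < on_ends[idx]

print(is_on.mean(), 2.0 / (2.0 + 1.5))   # empirical vs E[X]/E[X+Y] = 4/7
```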

Now consider a random process with independent and identically distributed interarrival times T_1, T_2, \ldots, distributed as a common random variable T with distribution function F_T. Consider the length of the particular interarrival period T^* that contains some point t. We shall determine the distribution of T^*. We define an on-off system as follows.

  • If the length of an interarrival period exceeds some given x > 0, let our system be on for that entire interarrival period (i.e. X_{n} = T_n, Y_{n} = 0).
  • If the length of the interarrival period is at most x, let our system be off for that interarrival period (i.e. X_{n} = 0, Y_{n} = T_n).

[Figure: the renewal period containing the inspection time t]

Then by this definition the conditions of the above proposition are met and so

\begin{aligned} \text{Pr}(T^* > x) &= \text{Pr(system is on at time t)}\\  &= \frac{\mathbb{E}[X_n]}{\mathbb{E}[X_n+Y_n]}\quad\text{(by (2))}\\ &= \frac{\mathbb{E}[T_n \mid T_n > x]\text{Pr}(T_n > x)}{\mathbb{E}[T_n]}\\  &= \frac{\int_x^{\infty} t\ \ dF_T(t)}{\mathbb{E}[T]}.\quad\quad(3)\end{aligned}

Equivalently, \text{Pr}(T^* \leq x) = \frac{\int_0^{x} t\ \ dF_T(t)}{\mathbb{E}[T]}, and if T has probability density f_T, differentiating both sides of this equation gives the probability density function of the length of interarrival period containing a point t:

\displaystyle f_{T^*}(x) = xf_T(x)/\mathbb{E}[T].\quad\quad (4)

Intuitively, longer periods are more likely to contain the point t, with probability in direct proportion to the length of the interval. Hence the original density f_T(x) is scaled by the length x and then normalized by \int_0^{\infty} x f_T(x)\ dx = \mathbb{E}[T] to make xf_T(x)/\mathbb{E}[T] a valid probability density. T^* defined by (3) is known as the size-biased transform of T.
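
The size-biased effect can be observed directly by simulation: in the following sketch (bus example again, NumPy assumed), the interval containing a random inspection time has empirical mean close to \mathbb{E}[T^2]/\mathbb{E}[T] = 140/9 \approx 15.56 rather than \mathbb{E}[T] = 15.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bus example again: interarrival times T ~ U(10, 20).
gaps = rng.uniform(10, 20, size=1_000_000)
arrivals = np.cumsum(gaps)

# Length of the interarrival interval that contains a random inspection time t.
t = rng.uniform(0, arrivals[-1] * 0.9, size=200_000)
idx = np.searchsorted(arrivals, t, side='right')
containing_length = gaps[idx]

# Size-biased mean E[T^2]/E[T] = 140/9 ≈ 15.56, versus the plain mean E[T] = 15.
print(containing_length.mean())
```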

Note that the length of the particular interarrival interval T^* is stochastically larger than any of the identically distributed interarrival times T. In other words,

\displaystyle \text{Pr}(T^* > x) \geq \text{Pr}(T > x).\quad\quad(5)

This is the so-called inspection paradox, which can be proved by the Chebyshev correlation inequality (a form of rearrangement inequality), which states that when f and g are functions of the random variable X with the same monotonicity (i.e. both non-increasing or both non-decreasing), then f(X) and g(X) are non-negatively correlated:

\displaystyle \mathbb{E} [f(X)g(X)] \geq \mathbb{E}[f(X)] \mathbb{E}[g(X)].\quad\quad(6)

Applying this result with the non-decreasing functions f(T) = T and g(T) = {\bold 1}_{(T > x)} (which is 1 when T > x and 0 otherwise):

\begin{aligned} \text{Pr}(T^* > x) &= \frac{\int_x^{\infty} t\ \ dF_T(t)}{\mathbb{E}[T]}\\ &= \frac{\mathbb{E}[T {\bold 1}_{(T>x)}]}{\mathbb{E}[T]}\\ &\geq \frac{\mathbb{E}[T] \mathbb{E}[{\bold 1}_{(T>x)}]}{\mathbb{E}[T]}\\ &= \mathbb{E}[{\bold 1}_{(T>x)}]\\ &= \text{Pr}(T > x).\quad\quad(7)\end{aligned}
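
For the bus example both tail probabilities are available in closed form, so (5) can be checked directly; a small sketch (NumPy assumed):

```python
import numpy as np

# Bus example, 10 <= x <= 20: exact tails of T and of the size-biased T*.
x = np.linspace(10, 20, 101)
tail_T = (20 - x) / 10                        # Pr(T > x)
tail_Tstar = (20**2 - x**2) / (2 * 10 * 15)   # (1/E[T]) * integral_x^20 (t/10) dt

print(bool(np.all(tail_Tstar >= tail_T)))     # True: (5) holds pointwise
```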

The inspection paradox is a form of sampling bias, where results are skewed because the sampling procedure differs from the one intended. Another example is the friendship paradox: in a graph of acquaintances whose vertices have differing degrees (numbers of friends), the average degree of a friend is greater than the average degree of a randomly chosen vertex. For one thing a friend can never have degree zero, while a randomly chosen person might! The degree of a friend is an example of sampling from a size-biased distribution.
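
A toy illustration of the friendship paradox (the graph below is made up purely for the example):

```python
# A small hypothetical acquaintance graph as an adjacency list.
friends = {
    'a': ['b'],
    'b': ['a', 'c', 'd', 'e'],
    'c': ['b'],
    'd': ['b', 'e'],
    'e': ['b', 'd'],
}

degree = {v: len(nbrs) for v, nbrs in friends.items()}

# Average degree of a randomly chosen person.
avg_degree = sum(degree.values()) / len(degree)

# Average degree of a "friend": every entry of every friend list is equally
# likely, so high-degree people are counted in proportion to their degree
# (a size-biased sample of the degree distribution).
friend_degrees = [degree[f] for nbrs in friends.values() for f in nbrs]
avg_friend_degree = sum(friend_degrees) / len(friend_degrees)

print(avg_degree, avg_friend_degree)   # 2.0 versus 2.6
```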

How about the distribution of the waiting time W for the next arrival? For fixed x > 0 we consider another on-off system where this time the system is in an off state during the last x units of time of an interarrival interval, and in the on state otherwise. That is,

  • Y_n = \min(x,T_n),
  • X_n = T_n - Y_n.

[Figure: the waiting period W as the final part of the interarrival interval containing t]

Then we have

\begin{aligned} \text{Pr}(W \leq x) &= \text{Pr(system is off at time t)}\\  &= \frac{\mathbb{E}[Y_n]}{\mathbb{E}[X_n+Y_n]} \quad\quad \text{(by (2))}\\  &= \frac{\mathbb{E}[\min(x,T_n)]}{\mathbb{E}[T_n]}\\  &= \frac{\int_0^{x} t\ \ dF_T(t) + \int_x^{\infty} x\ \ dF_T(t) }{\mathbb{E}[T_n]}\\  &= \frac{ \left[tF_T(t) \right]_{0}^x - \int_0^xF_T(t)\ dt + x(1-F_T(x))}{\mathbb{E}[T_n]}\quad \quad \text{(integrating the first term by parts)}  \\&= \frac{xF_T(x) + \int_0^x (1-F_T(t))\ dt - xF_T(x)}{\mathbb{E}[T_n]}\\  &= \frac{\int_0^x (1-F_T(t))\ dt }{\mathbb{E}[T_n]}.\quad\quad(8)\end{aligned}

Alternatively this form can be found by imagining our time t to be uniformly distributed within the current interarrival interval, thus partitioning the inspected interarrival interval T^* into (1-X)T^* and XT^*, where X \sim U(0,1). Hence the distribution of W = XT^* is found by integrating over the distribution of T^* and using the fact that dF_{T^*}(t) = \frac{t}{\mathbb{E}[T]}\ dF_T(t) from (3):

\begin{aligned} \text{Pr}(W \leq x) &= \text{Pr}(XT^* \leq x)\\&= \int_0^{\infty} \text{Pr}(Xt \leq x \mid T^* = t)\ dF_{T^*}(t)\\&= \int_0^{\infty} \text{Pr}(X \leq x/t)\ dF_{T^*}(t)\\&=\int_0^x 1\ dF_{T^*}(t) + \int_x^{\infty} (x/t)\ dF_{T^*}(t)\\&=\int_0^x \frac{t}{\mathbb{E}[T]}\ dF_T(t) + x\int_x^{\infty} \frac{1}{t}\frac{t}{\mathbb{E}[T]}\ dF_T(t)\\&= \frac{\left[tF_T(t)\right]_0^x - \int_0^x F_T(t)\ dt + x(1-F_T(x))}{\mathbb{E}[T]} \quad\quad \text{(integrating the first term by parts)}\\&= \frac{x - \int_0^xF_T(t)\ dt}{\mathbb{E}[T]}\\&= \frac{\int_0^x (1-F_T(t))\ dt }{\mathbb{E}[T]}.\quad\quad(9)\end{aligned}

Differentiating both sides of (8) or (9) with respect to x leads to (1), the probability density function of waiting time f_W(x) = (1-F_T(x))/\mathbb{E}[T].
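
As a check of (8) beyond the uniform case, the following sketch (assuming NumPy and SciPy are available, and using a Gamma interarrival distribution purely as an illustration) compares the empirical distribution of simulated waiting times with \int_0^x (1-F_T(t))\ dt / \mathbb{E}[T].

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(3)

# Illustrative (made-up) interarrival distribution: T ~ Gamma(shape=2, scale=3),
# so E[T] = 6 and F_T is no longer uniform.
T = stats.gamma(a=2, scale=3)

# Simulate a long arrival stream and the waits seen at random inspection times.
gaps = T.rvs(size=1_000_000, random_state=rng)
arrivals = np.cumsum(gaps)
t = rng.uniform(0, arrivals[-1] * 0.9, size=200_000)
idx = np.searchsorted(arrivals, t, side='right')   # index of next arrival
waits = arrivals[idx] - t

# Theoretical CDF of W from (8): (1/E[T]) * integral_0^x (1 - F_T(u)) du.
def cdf_W(x):
    return quad(lambda u: T.sf(u), 0, x)[0] / T.mean()

for x in (2.0, 5.0, 10.0):
    print(x, (waits <= x).mean(), cdf_W(x))   # empirical vs formula (8)
```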

Finally, if we wish to find any moments \mathbb{E}[W^k] we write W = XT^*, where X \sim U(0,1) and X is independent of the size-biased variable T^*:

\begin{aligned}\mathbb{E}[W^k] &= \mathbb{E}[(XT^*)^k]\\ &= \mathbb{E}[X^k] \mathbb{E}[(T^*)^k]\\&= \mathbb{E}[X^k]\int_0^{\infty}\frac{x}{\mathbb{E}[T]} (x^k)\ dF_T(x)\quad\quad \text{(}dF_{T^*}(x)\text{ related to }dF_T(x)\text{ via (3))}\\ &= \frac{\mathbb{E}[X^k]\mathbb{E}[T^{k+1}]}{\mathbb{E}[T]}\\ &= \frac{\left(\int_0^1 x^k\ dx\right) \mathbb{E}[T^{k+1}]}{\mathbb{E}[T]}\\ &= \frac{\mathbb{E}[T^{k+1}]}{(k+1)\mathbb{E}[T]}.\quad\quad(10)\end{aligned}

In particular

\begin{aligned} \mathbb{E}[W] &= \mathbb{E}[T^2]/(2\mathbb{E}[T])\\ &\geq (\mathbb{E}[T])^2/(2\mathbb{E}[T])\\ &= \mathbb{E}[T]/2\quad\quad(11)\end{aligned}.

Hence we can say that the mean waiting time for the next arrival from any given point in time is more than half the mean interarrival time, unless \mathbb{E}[T^2] = (\mathbb{E}[T])^2 (i.e. zero variance in T, meaning T is deterministic).

For example, in the initial bus example where T \sim U(10,20):

  • \mathbb{E}[T] = 15,
  • \mathbb{E}[T^2] = \int_{10}^{20} (t^2/10)\ dt = (20^3-10^3)/30 = 700/3 and
  • \mathbb{E}[T^3] = \int_{10}^{20} (t^3/10)\ dt = (20^4-10^4)/40 = 3750,

so \mathbb{E}[W] = \mathbb{E}[T^2]/(2\mathbb{E}[T]) = 700/90 = 70/9 \approx 7.78 > \mathbb{E}[T]/2 = 7.5 and \mathbb{E}[W^2] = \mathbb{E}[T^3]/(3\mathbb{E}[T]) = 3750/45 = 250/3 \approx 83.33. Hence the waiting time has mean \approx 7.78 minutes and standard deviation \sqrt{250/3-(70/9)^2} \approx 4.78 minutes.
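
These values can be reproduced symbolically, for example with SymPy (a minimal sketch, assuming SymPy is installed):

```python
import sympy as sp

t = sp.symbols('t')
f_T = sp.Rational(1, 10)   # density of T ~ U(10, 20) on [10, 20]

E_T  = sp.integrate(t * f_T, (t, 10, 20))      # 15
E_T2 = sp.integrate(t**2 * f_T, (t, 10, 20))   # 700/3
E_T3 = sp.integrate(t**3 * f_T, (t, 10, 20))   # 3750

E_W  = E_T2 / (2 * E_T)                        # 70/9  (formula (10), k = 1)
E_W2 = E_T3 / (3 * E_T)                        # 250/3 (formula (10), k = 2)
std_W = sp.sqrt(E_W2 - E_W**2)

print(E_W, E_W2, std_W, float(std_W))          # standard deviation ≈ 4.78
```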

References

[1] S. Ross, Stochastic Processes, John Wiley & Sons, 1996.

[2] R. Arratia and L. Goldstein, Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent? Available at arxiv.org/pdf/1007.3910.

[3] This answer by Did on math.stackexchange.
