Chernoff Bound

In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function.

The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential (e.g. sub-Gaussian). It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.

The bound is commonly named after Herman Chernoff who described the method in a 1952 paper, though Chernoff himself attributed it to Herman Rubin. In 1938 Harald Cramér had published an almost identical concept now known as Cramér's theorem.

It is a sharper bound than the first- or second-moment-based tail bounds such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, when applied to sums the Chernoff bound requires the random variables to be independent, a condition that is not required by either Markov's inequality or Chebyshev's inequality.

The Chernoff bound is related to the Bernstein inequalities. It is also used to prove Hoeffding's inequality, Bennett's inequality, and McDiarmid's inequality.

Generic Chernoff bounds

[Figure: Two-sided Chernoff bound for a chi-square random variable]

The generic Chernoff bound for a random variable X is attained by applying Markov's inequality to e^{tX} (which is why it is sometimes called the exponential Markov or exponential moments bound). For positive t this gives a bound on the right tail of X in terms of its moment-generating function M(t) = E[e^{tX}]:

    \Pr(X \ge a) = \Pr\left(e^{tX} \ge e^{ta}\right) \le \frac{\operatorname{E}\left[e^{tX}\right]}{e^{ta}} = M(t)\, e^{-ta}, \qquad t > 0

Since this bound holds for every positive t, we may take the infimum:

    \Pr(X \ge a) \le \inf_{t > 0} M(t)\, e^{-ta}

Performing the same analysis with negative t we get a similar bound on the left tail:

    \Pr(X \le a) = \Pr\left(e^{tX} \ge e^{ta}\right) \le M(t)\, e^{-ta}, \qquad t < 0

and

    \Pr(X \le a) \le \inf_{t < 0} M(t)\, e^{-ta}

The quantity M(t) e^{−ta} can be expressed as the expectation value E[e^{tX}] e^{−ta}, or equivalently E[e^{t(X − a)}].
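
A rough numerical sketch of this optimization (Python, assuming NumPy and SciPy are available; the threshold a = 2 is arbitrary): for a standard normal variable, M(t) = e^{t²/2}, the infimum is attained at t = a, and the bound is e^{−a²/2}.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    a = 2.0                                     # right-tail threshold, Pr(X >= a)
    M = lambda t: np.exp(t ** 2 / 2)            # MGF of the standard normal

    # Minimize M(t) * exp(-t*a) over t > 0 to obtain the Chernoff bound.
    res = minimize_scalar(lambda t: M(t) * np.exp(-t * a),
                          bounds=(1e-9, 50.0), method="bounded")
    print("Chernoff bound:", res.fun)           # ~ exp(-a**2/2) = 0.1353...
    print("exact tail    :", norm.sf(a))        # ~ 0.0228, below the bound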

Properties

The exponential function is convex, so by Jensen's inequality E[e^{tX}] ≥ e^{t·E[X]}. It follows that the bound on the right tail is greater than or equal to one when a ≤ E[X], and therefore trivial; similarly, the left bound is trivial for a ≥ E[X]. We may therefore combine the two infima and define the two-sided Chernoff bound:

C(a) = \inf_{t} M(t)\, e^{-ta},
which provides an upper bound on the folded cumulative distribution function of X (folded at the mean, not the median).

The logarithm of the two-sided Chernoff bound is known as the rate function (or Cramér transform) I = −log C. It is equivalent to the Legendre–Fenchel transform or convex conjugate of the cumulant generating function K, defined as:

K(t) = \log M(t) = \log \operatorname{E}\left[e^{tX}\right].
The moment generating function is log-convex, so by a property of the convex conjugate, the Chernoff bound must be log-concave. The Chernoff bound attains its maximum at the mean, C(E[X]) = 1, and is invariant under translation: C_{X+k}(a) = C_X(a − k).
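
For example, for a normal random variable with mean μ and variance σ² the cumulant generating function is K(t) = μt + σ²t²/2, and the Legendre–Fenchel transform can be carried out explicitly:

    I(a) = \sup_{t}\left(ta - K(t)\right) = \sup_{t}\left(ta - \mu t - \tfrac{1}{2}\sigma^{2}t^{2}\right) = \frac{(a-\mu)^{2}}{2\sigma^{2}}, \qquad C(a) = e^{-\frac{(a-\mu)^{2}}{2\sigma^{2}}},

with the supremum attained at t = (a − μ)/σ².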

The Chernoff bound is exact if and only if X is a single concentrated mass (degenerate distribution). The bound is tight only at or beyond the extremes of a bounded random variable, where the infima are attained for infinite t. For unbounded random variables the bound is nowhere tight, though it is asymptotically tight up to sub-exponential factors ("exponentially tight").[citation needed] Individual moments can provide tighter bounds, at the cost of greater analytical complexity.

In practice, the exact Chernoff bound may be unwieldy or difficult to evaluate analytically, in which case a suitable upper bound on the moment (or cumulant) generating function may be used instead (e.g. a sub-parabolic CGF giving a sub-Gaussian Chernoff bound).
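
For instance, if the CGF satisfies K(t) ≤ μt + σ²t²/2 for all t (a sub-parabolic CGF), the same optimization as in the Gaussian case gives the sub-Gaussian bounds

    \Pr(X \ge \mu + s) \le e^{-\frac{s^2}{2\sigma^2}}, \qquad \Pr(X \le \mu - s) \le e^{-\frac{s^2}{2\sigma^2}}, \qquad s \ge 0.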

Exact rate functions and Chernoff bounds for common distributions

For a random variable with cumulant generating function K(t), the rate function is I(a) = sup_t (ta − K(t)) and the two-sided Chernoff bound is C(a) = e^{−I(a)}.

Normal distribution N(μ, σ²):
    K(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2, \qquad I(a) = \frac{(a-\mu)^2}{2\sigma^2}, \qquad C(a) = e^{-\frac{(a-\mu)^2}{2\sigma^2}}
Bernoulli distribution with parameter p (detailed below):
    K(t) = \log\left(1 - p + p e^{t}\right), \qquad I(a) = D(a \,\|\, p), \qquad C(a) = e^{-D(a \,\|\, p)}
Standard Bernoulli, p = 1/2 (H is the binary entropy function):
    K(t) = \log\frac{1 + e^{t}}{2}, \qquad I(a) = \log 2 - H(a), \qquad C(a) = \tfrac{1}{2} e^{H(a)}
Rademacher distribution:
    K(t) = \log\cosh t, \qquad I(a) = \log 2 - H\left(\tfrac{1+a}{2}\right), \qquad C(a) = \tfrac{1}{2} e^{H\left(\frac{1+a}{2}\right)}
Gamma distribution with shape k and scale θ:
    K(t) = -k\log(1 - \theta t), \qquad I(a) = \frac{a}{\theta} - k - k\log\frac{a}{k\theta}, \qquad C(a) = \left(\frac{a}{k\theta}\right)^{k} e^{k - a/\theta}
Chi-squared distribution with k degrees of freedom:
    K(t) = -\tfrac{k}{2}\log(1 - 2t), \qquad I(a) = \tfrac{1}{2}\left(a - k - k\log\frac{a}{k}\right), \qquad C(a) = \left(\frac{a}{k}\right)^{k/2} e^{(k-a)/2}
Poisson distribution with mean λ:
    K(t) = \lambda\left(e^{t} - 1\right), \qquad I(a) = a\log\frac{a}{\lambda} - a + \lambda, \qquad C(a) = \left(\frac{\lambda}{a}\right)^{a} e^{a - \lambda}
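
A hedged numerical check of one of these entries (Python, assuming NumPy and SciPy; λ and a are arbitrary): for a Poisson variable with mean λ, the numerically optimized bound should match the closed form (λ/a)^a e^{a−λ} listed above.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import poisson

    lam, a = 3.0, 5.0                             # Poisson mean and tail point a > lam
    M = lambda t: np.exp(lam * (np.exp(t) - 1))   # Poisson MGF

    res = minimize_scalar(lambda t: M(t) * np.exp(-t * a),
                          bounds=(1e-9, 20.0), method="bounded")
    closed_form = (lam / a) ** a * np.exp(a - lam)
    print(res.fun, closed_form)                   # both ~ 0.575
    print(poisson.sf(a - 1, lam))                 # exact tail Pr(X >= 5) ~ 0.185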

Lower bounds from the MGF

Using only the moment generating function, a lower bound on the tail probabilities can be obtained by applying the Paley–Zygmund inequality to e^{tX}, yielding:

\Pr(X > a) \ge \sup_{t > 0 \,:\, M(t) \ge e^{ta}} \left(1 - \frac{e^{ta}}{M(t)}\right)^{2} \frac{M(t)^{2}}{M(2t)}
(a bound on the left tail is obtained for negative t). Unlike the Chernoff bound however, this result is not exponentially tight.

Theodosopoulos constructed a tight(er) MGF-based lower bound using an exponential tilting procedure.

For particular distributions (such as the binomial) lower bounds of the same exponential order as the Chernoff bound are often available.

Sums of independent random variables

When X is the sum of n independent random variables X1, ..., Xn, the moment generating function of X is the product of the individual moment generating functions, giving that:

    \Pr(X \ge a) \le \inf_{t > 0} e^{-ta}\, \operatorname{E}\left[\prod_i e^{tX_i}\right] = \inf_{t > 0} e^{-ta} \prod_i \operatorname{E}\left[e^{tX_i}\right] \qquad (1)

and:

    \Pr(X \le a) \le \inf_{t < 0} e^{-ta} \prod_i \operatorname{E}\left[e^{tX_i}\right]

Specific Chernoff bounds are attained by calculating the moment-generating function E[e^{tX_i}] for specific instances of the random variables Xi.

When the random variables are also identically distributed (iid), the Chernoff bound for the sum reduces to a simple rescaling of the single-variable Chernoff bound. That is, the Chernoff bound for the average of n iid variables is equivalent to the nth power of the Chernoff bound on a single variable (see Cramér's theorem).
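
A brief sketch of this rescaling (Python, assuming NumPy and SciPy; the numbers are illustrative): for a sum of n iid standard normals, the optimized bound on Pr(X1 + ... + Xn ≥ na) equals the single-variable bound at a raised to the n-th power.

    import numpy as np
    from scipy.optimize import minimize_scalar

    n, a = 10, 0.5                                # bound Pr(X_1 + ... + X_n >= n*a)
    M = lambda t: np.exp(t ** 2 / 2)              # MGF of a single standard normal

    single = minimize_scalar(lambda t: M(t) * np.exp(-t * a),
                             bounds=(1e-9, 50.0), method="bounded").fun
    summed = minimize_scalar(lambda t: M(t) ** n * np.exp(-t * n * a),
                             bounds=(1e-9, 50.0), method="bounded").fun
    print(single ** n, summed)                    # both ~ exp(-n*a**2/2) = 0.2865...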

Sums of independent bounded random variables

Chernoff bounds may also be applied to general sums of independent, bounded random variables, regardless of their distribution; this is known as Hoeffding's inequality. The proof follows a similar approach to the other Chernoff bounds, but applying Hoeffding's lemma to bound the moment generating functions (see Hoeffding's inequality).

    Hoeffding's inequality. Suppose X1, ..., Xn are independent random variables taking values in [a,b]. Let X denote their sum and let μ = E[X] denote the sum's expected value. Then for any t > 0,
      \Pr(X \le \mu - t) \le e^{-\frac{2t^2}{n(b-a)^2}},
      \Pr(X \ge \mu + t) \le e^{-\frac{2t^2}{n(b-a)^2}}.
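
A minimal usage sketch (Python; the parameters are illustrative, not from the source): one hundred variables supported on [0, 1] and a deviation of 10 from the mean of their sum give a bound of e^{−2}.

    import math

    n, a, b, t = 100, 0.0, 1.0, 10.0              # 100 variables in [0, 1], deviation 10
    bound = math.exp(-2 * t ** 2 / (n * (b - a) ** 2))
    print(bound)                                  # exp(-2) ~ 0.135 bounds Pr(X >= mu + t)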

Sums of independent Bernoulli random variables

The bounds in the following sections for Bernoulli random variables are derived by using that, for a Bernoulli random variable Xi with probability p of being equal to 1,

    \operatorname{E}\left[e^{tX_i}\right] = (1 - p)e^{0} + p e^{t} = 1 + p\left(e^{t} - 1\right) \le e^{p\left(e^{t} - 1\right)}.

One can encounter many flavors of Chernoff bounds: the original additive form (which gives a bound on the absolute error) or the more practical multiplicative form (which bounds the error relative to the mean).

Multiplicative Chernoff bound. Suppose X1, ..., Xn are independent random variables taking values in {0, 1}. Let X denote their sum and let μ = E[X] denote the sum's expected value. Then for any δ > 0,

    \Pr(X \ge (1+\delta)\mu) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}

A similar proof strategy can be used to show that for 0 < δ < 1

    \Pr(X \le (1-\delta)\mu) \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}

The above formula is often unwieldy in practice, so the following looser but more convenient bounds are often used, which follow from the inequality 2δ/(2 + δ) ≤ log(1 + δ) from the list of logarithmic inequalities:

    \Pr(X \ge (1+\delta)\mu) \le e^{-\frac{\delta^2\mu}{2+\delta}}, \qquad \delta \ge 0
    \Pr(X \le (1-\delta)\mu) \le e^{-\frac{\delta^2\mu}{2}}, \qquad 0 < \delta < 1
    \Pr(|X - \mu| \ge \delta\mu) \le 2e^{-\frac{\delta^2\mu}{3}}, \qquad 0 < \delta < 1

Notice that the bounds are trivial for δ = 0.
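
A quick comparison of the exact multiplicative bound, the looser convenient bound, and a Monte Carlo estimate (Python, assuming NumPy; the parameters are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, delta = 1000, 0.1, 0.5                  # X = sum of 1000 Bernoulli(0.1), mu = 100
    mu = n * p

    exact_bound = (np.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
    loose_bound = np.exp(-delta ** 2 * mu / (2 + delta))
    empirical = (rng.binomial(n, p, size=200_000) >= (1 + delta) * mu).mean()
    print(exact_bound, loose_bound, empirical)    # both bounds hold; the true tail is far smaller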

The following theorem is due to Wassily Hoeffding and hence is called the Chernoff–Hoeffding theorem.

    Chernoff–Hoeffding theorem. Suppose X1, ..., Xn are i.i.d. random variables, taking values in {0, 1}. Let p = E[X1] and ε > 0.
      \Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon}\left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-n D(p+\varepsilon \,\|\, p)}
      \Pr\left(\frac{1}{n}\sum X_i \le p - \varepsilon\right) \le e^{-n D(p-\varepsilon \,\|\, p)}
    where
      D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}
    is the Kullback–Leibler divergence between Bernoulli distributed random variables with parameters x and y respectively. If p ≥ 1/2, then D(p + ε ‖ p) ≥ ε²/(2p(1 − p)), which means
      \Pr\left(\frac{1}{n}\sum X_i > p + \varepsilon\right) \le \exp\left(-\frac{\varepsilon^2 n}{2p(1-p)}\right)

A simpler bound follows by relaxing the theorem using D(p + ε ‖ p) ≥ 2ε², which follows from the convexity of D(p + ε ‖ p) and the fact that

    \frac{d^2}{d\varepsilon^2} D(p+\varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^2}{d\varepsilon^2}\left(2\varepsilon^2\right).
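
Relaxing the right-tail bound of the theorem with this inequality gives, for example,

    \Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le e^{-2\varepsilon^{2} n}.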

This result is a special case of Hoeffding's inequality. Sometimes, the bounds

    D\left((1+x)p \,\|\, p\right) \ge \tfrac{1}{4}x^2 p, \qquad -\tfrac{1}{2} \le x \le \tfrac{1}{2},
    D\left((1-x)p \,\|\, p\right) \ge \tfrac{1}{2}x^2 p, \qquad 0 \le x \le 1,

which are stronger for p < 1/8, are also used.

Applications

Chernoff bounds have very useful applications in set balancing and packet routing in sparse networks.

The set balancing problem arises while designing statistical experiments. Typically, given the features of each participant in the experiment, we need to divide the participants into two disjoint groups so that each feature is roughly balanced between them.

Chernoff bounds are also used to obtain tight bounds for permutation routing problems which reduce network congestion while routing packets in sparse networks.

Chernoff bounds are used in computational learning theory to prove that a learning algorithm is probably approximately correct, i.e. with high probability the algorithm has small error on a sufficiently large training data set.

Chernoff bounds can be effectively used to evaluate the "robustness level" of an application/algorithm by exploring its perturbation space with randomization. The use of the Chernoff bound permits one to abandon the strong—and mostly unrealistic—small perturbation hypothesis (the perturbation magnitude is small). The robustness level can be, in turn, used either to validate or reject a specific algorithmic choice, a hardware implementation or the appropriateness of a solution whose structural parameters are affected by uncertainties.

A simple and common use of Chernoff bounds is for "boosting" of randomized algorithms. If one has an algorithm that outputs a guess that is the desired answer with probability p > 1/2, then one can get a higher success rate by running the algorithm n = log(1/δ) 2p/(p − 1/2)² times and outputting a guess that is output by more than n/2 runs of the algorithm. (There cannot be more than one such guess.) Assuming that these algorithm runs are independent, the probability that more than n/2 of the guesses are correct is equal to the probability that the sum of independent Bernoulli random variables Xk that are 1 with probability p is more than n/2. This can be shown to be at least 1 − δ via the multiplicative Chernoff bound (Corollary 13.3 in Sinclair's class notes, μ = np):

    \Pr\left[X > \frac{n}{2}\right] \ge 1 - e^{-n\left(p - \frac{1}{2}\right)^{2}/(2p)} \ge 1 - \delta
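
A small simulation sketch of this boosting argument (Python, assuming NumPy; p and δ are illustrative, and n follows the expression reconstructed above):

    import math
    import numpy as np

    rng = np.random.default_rng(1)
    p, delta = 0.6, 0.01                          # per-run success probability, target failure rate
    n = math.ceil(math.log(1 / delta) * 2 * p / (p - 0.5) ** 2)

    # Fraction of 10,000 trials in which a majority of the n independent runs is correct.
    correct_runs = rng.binomial(n, p, size=10_000)
    print(n, (correct_runs > n / 2).mean())       # majority correct in at least ~ 1 - delta of trials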

Matrix Chernoff bound

Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables. The following version of the inequality can be found in the work of Tropp.

Let M1, ..., Mt be independent matrix-valued random variables such that Mi ∈ ℂ^(d1×d2) and E[Mi] = 0. Let us denote by ‖M‖ the operator norm of the matrix M. If ‖Mi‖ ≤ γ holds almost surely for all i ∈ {1, ..., t}, then for every ε > 0

    \Pr\left(\left\|\frac{1}{t}\sum_{i=1}^{t} M_i\right\| > \varepsilon\right) \le (d_1 + d_2)\exp\left(-\frac{3\varepsilon^2 t}{8\gamma^2}\right)

Notice that in order to conclude that the deviation from 0 is bounded by ε with high probability, we need to choose a number of samples t proportional to the logarithm of d1 + d2. In general, unfortunately, a dependence on log(min(d1, d2)) is inevitable: take, for example, a diagonal random sign matrix of dimension d × d. The operator norm of the sum of t independent samples is precisely the maximum deviation among d independent random walks of length t. In order to achieve a fixed bound on the maximum deviation with constant probability, it is easy to see that t should grow logarithmically with d in this scenario.
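
A sketch of this diagonal example (Python, assuming NumPy; d and t are arbitrary): for diagonal sign matrices the operator norm of the average is the largest absolute average among d independent ±1 sequences, which is on the order of sqrt(2 log(d) / t).

    import numpy as np

    rng = np.random.default_rng(2)
    d, t = 1000, 200
    signs = rng.choice([-1.0, 1.0], size=(t, d))      # row i holds the diagonal of the i-th sample
    operator_norm = np.abs(signs.mean(axis=0)).max()  # norm of the averaged diagonal matrix
    print(operator_norm, np.sqrt(2 * np.log(d) / t))  # comparable magnitudes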

The following theorem can be obtained by assuming M has low rank, in order to avoid the dependency on the dimensions.

Theorem without the dependency on the dimensions

Let 0 < ε < 1 and M be a random symmetric real matrix with ‖E[M]‖ ≤ 1 and ‖M‖ ≤ γ almost surely. Assume that each element on the support of M has at most rank r. Set

    t = \Omega\left(\frac{\gamma\log(\gamma/\varepsilon^2)}{\varepsilon^2}\right)

If r ≤ t holds almost surely, then

    \Pr\left(\left\|\frac{1}{t}\sum_{i=1}^{t} M_i - \operatorname{E}[M]\right\| > \varepsilon\right) \le \frac{1}{\mathbf{poly}(t)}

where M1, ..., Mt are i.i.d. copies of M.

Sampling variant

The following variant of Chernoff's bound can be used to bound the probability that a majority in a population will become a minority in a sample, or vice versa.

Suppose there is a general population A and a sub-population B ⊆ A. Denote the relative size of the sub-population (|B|/|A|) by r.

Suppose we pick an integer k and a random sample S ⊂ A of size k. Denote the relative size of the sub-population within the sample (|B ∩ S|/|S|) by rS.

Then, for every fraction d ∈ [0,1]:

    \Pr\left(r_S < (1-d)\cdot r\right) < \exp\left(-r\cdot d^2\cdot \frac{k}{2}\right)

In particular, if B is a majority in A (i.e. r > 0.5) we can bound the probability that B will remain a majority in S (rS > 0.5) by taking d = 1 − 1/(2r):

    \Pr\left(r_S > 0.5\right) > 1 - \exp\left(-r\cdot\left(1 - \frac{1}{2r}\right)^{2}\cdot\frac{k}{2}\right)

This bound is of course not tight at all. For example, when r = 0.5 we get a trivial bound Prob > 0.
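
A numerical sketch of the majority-preservation bound (Python; r and k are illustrative), plugging d = 1 − 1/(2r) into the expression above:

    import math

    r, k = 0.6, 200                               # sub-population is 60% of A, sample of size 200
    d = 1 - 1 / (2 * r)
    lower_bound = 1 - math.exp(-r * d ** 2 * k / 2)
    print(lower_bound)                            # Pr(r_S > 0.5) exceeds ~ 0.81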

Proofs

Multiplicative form

Following the conditions of the multiplicative Chernoff bound, let X1, ..., Xn be independent Bernoulli random variables, whose sum is X, each having probability pi of being equal to 1. For a Bernoulli variable:

    \operatorname{E}\left[e^{tX_i}\right] = (1 - p_i)e^{0} + p_i e^{t} = 1 + p_i\left(e^{t} - 1\right) \le e^{p_i\left(e^{t} - 1\right)}

So, using (1) with a = (1 + δ)μ for any δ > 0 and where μ = p1 + ... + pn = E[X],

    \Pr(X > (1+\delta)\mu) \le \inf_{t \ge 0} e^{-t(1+\delta)\mu} \prod_{i=1}^{n}\operatorname{E}\left[e^{tX_i}\right] \le \inf_{t \ge 0} \exp\left(-t(1+\delta)\mu + \sum_{i=1}^{n} p_i\left(e^{t}-1\right)\right) = \inf_{t \ge 0} \exp\left(\mu\left(e^{t}-1\right) - t(1+\delta)\mu\right)

If we simply set t = log(1 + δ) so that t > 0 for δ > 0, we can substitute and find

    \exp\left(\mu\left(\delta - (1+\delta)\log(1+\delta)\right)\right) = \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}

This proves the result desired.

Chernoff–Hoeffding theorem (additive form)

Let q = p + ε. Taking a = nq in (1), we obtain:

    \Pr\left(\frac{1}{n}\sum X_i \ge q\right) \le \inf_{t > 0} e^{-nqt} \prod_i \operatorname{E}\left[e^{tX_i}\right] = \inf_{t > 0}\left(e^{-qt}\operatorname{E}\left[e^{tX_1}\right]\right)^{n}

Now, knowing that Pr(Xi = 1) = p, Pr(Xi = 0) = 1 − p, we have

    \left(e^{-qt}\operatorname{E}\left[e^{tX_1}\right]\right)^{n} = \left(e^{-qt}\left(p e^{t} + (1-p)\right)\right)^{n}

Therefore, we can easily compute the infimum, using calculus:

    \frac{d}{dt}\left(e^{-qt}\left(p e^{t} + (1-p)\right)\right) = e^{-qt}\left((1-q)p e^{t} - q(1-p)\right)

Setting the equation to zero and solving, we have

    (1-q)p e^{t} = q(1-p)

so that

    e^{t} = \frac{q(1-p)}{(1-q)p}

Thus,

    t = \log\left(\frac{q(1-p)}{(1-q)p}\right)

As q = p + ε > p, we see that t > 0, so our bound is satisfied on t. Having solved for t, we can plug back into the equations above to find that

    \log\left(e^{-qt}\left(p e^{t} + (1-p)\right)\right) = -q\log\frac{q(1-p)}{(1-q)p} + \log\frac{1-p}{1-q} = -q\log\frac{q}{p} - (1-q)\log\frac{1-q}{1-p} = -D(q \,\|\, p)

We now have our desired result, that

    \Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le e^{-n D(p+\varepsilon \,\|\, p)}

To complete the proof for the symmetric case, we simply define the random variable Yi = 1 − Xi, apply the same proof, and plug it into our bound.

Alex JonesSeven deadly sinsUsher (musician)Rajiv Gandhi International Cricket StadiumAndrew HubermanHouse of the DragonThree-BodyThe Three-Body Problem (novel)Tom HollandEurovision Song Contest 2024UEFA Champions LeagueDerek DraperThe BeatlesCaitlin ClarkCasino gameSelf-immolation of Aaron BushnellDonald TrumpRachin RavindraJennifer LopezIslamic StateKeira KnightleySpain2003 Angola Boeing 727 disappearanceGoogleHarvey WeinsteinNatalie PortmanChaturbateList of Indian Premier League records and statisticsContinuous truss bridgeState of PalestineDuffy (singer)FIFA Men's World RankingGmailCincinnati riots of 1884Cultural RevolutionTaylor SwiftAlbert EinsteinInterstellar (film)Luna SnowBilly MagnussenElvis PresleyDwayne JohnsonCarrie FisherNullJohnny DeppJohnny CashX-Men '97Jenna OrtegaThe Zone of Interest (film)List of ethnic slursCillian MurphyTenebraeFacebookMarina AbramovićSiddharth (actor)Lady GagaBody Cam (film)The Rookie (TV series)Olivier GiroudTupac ShakurCuba Gooding Jr.Joe KeeryDune (novel)Josh PeckNicholas GalitzineGermanySacha Baron CohenFort CarrollCrucifixion of JesusLiam CunninghamRobert F. Kennedy Jr.Challengers (film)BaltimoreMichael JordanPeriodic tableRohit SharmaLara TrumpAll That🡆 More