**Accurate Methods for the Statistics of Surprise and Coincidence. Ted Dunning. Computational Linguistics 1993.**

- Ideas
- "ordinary words are 'rare', any statistical work with texts must deal with the reality of rare events ... Unfortunately, the foundational assumption of most common statistical analyses used in computational linguistics is that the events being analyzed are relatively common."
- Counting a word *w* can be viewed as a series of Bernoulli trials: each token is tested to see if it is *w*. Assuming a uniform probability *p* that a token is *w*, the count is distributed binomially and, for *np(1-p)* > 5 (where *n* = number of tokens), it is distributed almost normally. But this is not true when *np(1-p)* < 1.
- Given outcomes *k*, propose a model with parameters *w*. The likelihood of the parameter value *w* is *P(k|w)*. A hypothesis is a subset of the parameter space *W*.
- The likelihood ratio of the hypothesis (parameter subspace) *W0* is LR = max_{*w* \in *W0*} *P(k|w)* / max_{*w* \in *W*} *P(k|w)*.
- Fact: -2 log LR ~ \chi^2(dim *W* - dim *W0*).
- References
- More info on parametric and distribution-free tests: Bradley (1968), and Mood, Graybill, and Boes (1974).
- Likelihood ratio tests: Mood et al. (1974)
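As a minimal sketch of the likelihood-ratio machinery above (not the paper's own code), consider testing whether a word occurs at the same rate in two texts. Under H0 both texts share one rate *p* (dim *W0* = 1); under the full space *W* each text has its own rate (dim *W* = 2), so -2 log LR is compared against \chi^2 with 1 degree of freedom. The function name `llr` and the example counts are illustrative assumptions.

```python
import math

def xlogy(x, y):
    # Convention 0 * log(0) = 0, so zero counts contribute nothing.
    return 0.0 if x == 0 else x * math.log(y)

def binom_log_likelihood(k, n, p):
    # Binomial log-likelihood, dropping the C(n, k) term,
    # which cancels in the likelihood ratio.
    return xlogy(k, p) + xlogy(n - k, 1 - p)

def llr(k1, n1, k2, n2):
    """-2 log LR for H0: one shared rate p (dim W0 = 1)
    vs the full space W: separate rates p1, p2 (dim W = 2)."""
    p = (k1 + k2) / (n1 + n2)      # MLE under H0
    p1, p2 = k1 / n1, k2 / n2      # MLEs under W
    return -2 * (binom_log_likelihood(k1, n1, p)
                 + binom_log_likelihood(k2, n2, p)
                 - binom_log_likelihood(k1, n1, p1)
                 - binom_log_likelihood(k2, n2, p2))

# Hypothetical data: a word seen 110 times in 2000 tokens of one text
# vs 30 times in 2000 tokens of another. Since dim W - dim W0 = 1,
# compare the statistic to the chi^2(1) 95% critical value, 3.84.
print(llr(110, 2000, 30, 2000))
```

Because the statistic uses the exact binomial likelihoods rather than a normal approximation, it remains usable even when *np(1-p)* < 1, which is the rare-event regime the paper is concerned with.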