How the English distributions were produced.
Where 50.txt contains all common English words from SCOWL (size <= 50):
charstats 50.txt > words.txt
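The output format of charstats is not shown in these notes. As a rough illustration of the kind of tally involved, here is a minimal Python sketch, assuming the tool counts single letters and adjacent letter pairs ("doubles") across the word list. The name `char_stats` and the sample words are hypothetical, not part of the actual tool.

```python
from collections import Counter

def char_stats(words):
    """Count single letters and adjacent letter pairs across a word list.

    Assumption: "doubles" means any two adjacent letters within a word.
    """
    singles = Counter()
    doubles = Counter()
    for word in words:
        word = word.strip().lower()
        # Each character counts once as a single.
        singles.update(word)
        # zip pairs each letter with its successor: "the" -> "th", "he"
        doubles.update(a + b for a, b in zip(word, word[1:]))
    return singles, doubles

# Tiny stand-in for the contents of 50.txt
singles, doubles = char_stats(["the", "then", "tee"])
print(singles.most_common())
print(doubles.most_common())
```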
Then hand-edited words.txt to get all single letters and necessary
doubles (those where the single doesn't appear much more than
1. Strip all very low counts (those below the lowest single-letter count).
2. Remove doubles where the sum of the individual letters is higher than the double.
The logic here is that the natural occurrence of the individual items
is enough to cover the doubles.
**NOTE**: Preserve the "Doubles" however!
3. Keep doubles, strip isolated singles (makes some words impossible,
4. Now you have raw distributions; convert them to rates as desired.
- Convert to a fixed string of all available letters.
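The first and last steps above can be sketched in Python as follows. This is one possible reading, not the procedure actually used: it assumes the counts are plain token-to-count mappings, that "rates" means each count divided by the total, and that the "fixed string" is a pool in which each token is repeated in proportion to its rate. The function names, the pool size, and the sample counts are all hypothetical. The pool is a list rather than a literal string so that two-letter doubles keep their boundaries.

```python
def strip_low_counts(singles, doubles):
    """Step 1: drop entries counted less often than the rarest single letter."""
    floor = min(singles.values())
    merged = {**singles, **doubles}
    return {tok: n for tok, n in merged.items() if n >= floor}

def to_rates(counts):
    """Step 4: turn raw counts into rates that sum to 1."""
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def to_pool(rates, size=1000):
    """Repeat each token in proportion to its rate (at least one copy each)."""
    pool = []
    for tok, rate in sorted(rates.items()):
        pool.extend([tok] * max(1, round(rate * size)))
    return pool

# Made-up counts, for illustration only
singles = {"e": 60, "t": 30, "q": 2}
doubles = {"qu": 8, "zx": 1}

counts = strip_low_counts(singles, doubles)  # "zx" falls below the floor of 2
rates = to_rates(counts)
pool = to_pool(rates, size=100)
```

Drawing from the result with `random.choice(pool)` then yields tokens at roughly the computed rates.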