Most English words are short • Ross Churchley

Based on corpus data, over half of the words in a typical page of English text have four or fewer letters, and the average word length is slightly less than five.

Length of…	Mean	Median	Mode
Unique words	7.52	7	7
Words weighted by frequency	4.95	4	3

Source code

import pandas

wd = pandas.read_csv(
    "eng-ca_web_2002_1M-words.txt",
    sep="\t",  # tab-separated; passed by keyword, as pandas 2.0+ requires
    header=None,
    names=["Rank", "Word", "Frequency"],
)

wd.Word = wd.Word.apply(str).str.lower()
wd = (
    wd.query("Frequency > 5")  # Remove rare words,
    .query("Word.str.match('^[A-Za-z]*$')", engine="python")  # words with non-letters,
    .query("Word.str.contains('[aeiouy]')", engine="python")  # abbreviations without vowels,
    .groupby("Word")  # and duplicates
    .sum()
)

wd["Length"] = pandas.Series(wd.index, index=wd.index).str.len()

sm = wd.groupby("Length")

result = pandas.DataFrame({
    "NormalizedCount": sm.Frequency.count() / sm.Frequency.count().sum(),
    "NormalizedFrequency": sm.Frequency.sum() / sm.Frequency.sum().sum(),
})
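
The listing stops at the two normalized distributions, so here is a minimal sketch of one way the mean, median, and mode in the table could be read off a distribution like those. The helper name `summarize` and the toy numbers are assumptions made for illustration; they are not part of the original analysis or the corpus data.

```python
import pandas


def summarize(distribution):
    """Summarize a Series indexed by word length whose values sum to 1.

    Hypothetical helper, not part of the original listing.
    """
    distribution = distribution.sort_index()
    lengths = distribution.index.to_series()

    # Mean: each length weighted by its share of the total.
    mean = (lengths * distribution).sum()

    # Median: the smallest length at which the cumulative share reaches one half.
    # (Simplification: an exact tie at 0.5 is not averaged with the next length.)
    cumulative = distribution.cumsum()
    median = cumulative[cumulative >= 0.5].index[0]

    # Mode: the length with the largest share.
    mode = distribution.idxmax()

    return {"Mean": mean, "Median": median, "Mode": mode}


# Toy distribution with made-up shares for lengths 1 through 5.
toy = pandas.Series([0.1, 0.2, 0.3, 0.25, 0.15], index=[1, 2, 3, 4, 5])
summary = summarize(toy)
print(summary)
```

Under these assumptions, calling `summarize(result.NormalizedCount)` would describe unique words and `summarize(result.NormalizedFrequency)` would describe words weighted by frequency.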