Here are frequency lists comparable to the Gutenberg ones, but based on 29,213,800 words from TV and movie scripts and transcripts.
Here's a fuller explanation of how the list was generated and its limitations: Wiktionary:Frequency lists/TV/2006/explanation.
Here are the top 100 words (from TV scripts) in alphabetical order:
Here they are in frequency order:
From the 10000th to the 40000th:
That'll probably be it. It's a third of all the unique words. The rest were used 5 or fewer times each.
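The cutoff described above (keeping words used more than 5 times) can be sketched as follows. This is a minimal illustration, not the procedure actually used for these lists; the function name `split_by_min_count` and the sample counts are invented for the example.

```python
from collections import Counter

def split_by_min_count(counts, min_count):
    """Keep words seen at least min_count times; also return the share of unique words kept."""
    kept = {word: n for word, n in counts.items() if n >= min_count}
    share = len(kept) / len(counts) if counts else 0.0
    return kept, share

# Invented sample counts, for illustration only.
counts = Counter({"you": 9, "the": 8, "okay": 6, "hath": 2, "thee": 1, "quoth": 1})

# "Used 5 or fewer times" is dropped, so the threshold is 6.
kept, share = split_by_min_count(counts, min_count=6)
print(sorted(kept), share)
```

With real data, `share` would be the fraction of unique words that survive the cutoff, which the note above puts at roughly a third.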
These lists give the most frequent words found by a simple, straightforward frequency count of all the books on Project Gutenberg. The list of books was downloaded in July 2005 and synchronized with rsync monthly thereafter. The words are mostly English, with other languages represented to a lesser extent. Many Project Gutenberg books are scanned once their copyright expires, typically editions published before 1923, so the language does not represent modern usage. For example, "hath" is listed as the 534th most common word. Also, the Project Gutenberg boilerplate warning appears in each of the 24,000+ books, so its text is counted in every one of them.
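A straight frequency count of this kind can be sketched in a few lines. This is an assumption-laden illustration rather than the script used for the lists: the tokenization rule (lowercase ASCII letters with internal apostrophes) and the name `count_words` are choices made for the example, and the sample sentences are invented.

```python
import re
from collections import Counter

# Assumed tokenization: runs of ASCII letters, allowing internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def count_words(texts):
    """Count case-folded word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts

# Invented sample texts, for illustration only.
sample = ["The cat hath the hat.", "The hat is the cat's."]
counts = count_words(sample)
top = counts.most_common(2)
print(top)
```

A count like this makes no attempt to strip repeated boilerplate or to separate languages, which is why the caveats above apply.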
Here are the top 100 words (from Project Gutenberg texts) in alphabetical order:
The 2,000 most common words in contemporary fiction can be found here:
See the list on Simple English Wiktionary.
See the list on Simple English Wiktionary.
From Max Havelaar (numbers in parentheses denote occurrences):
Frequency lists from http://wortschatz.uni-leipzig.de/html/wliste.html, used with authorization from the laboratory.
Note: these indicative lists still require some cleanup, because:
This list does not unify inflected words (nouns or adjectives with plural or feminine marking, or conjugated verbs), and it does not recognize the auxiliaries of verbs in compound tenses as part of the conjugated verb, but treats auxiliaries separately for each inflected form.
From the works of Ana Bugarín presented at the ILG Lexicographical Symposium 2006 (basic form):
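The unification that is missing here could look like the following sketch, which merges the counts of inflected forms under a base form. The lookup table `lemma_of`, the function name `merge_inflections`, and all counts are hypothetical and exist only to illustrate the idea; a real cleanup would need a proper morphological dictionary, and this sketch does not address compound tenses.

```python
from collections import Counter

def merge_inflections(form_counts, lemma_of):
    """Sum the counts of inflected forms under their base form.

    Forms missing from lemma_of are kept under their own spelling.
    """
    merged = Counter()
    for form, n in form_counts.items():
        merged[lemma_of.get(form, form)] += n
    return merged

# Hypothetical forms and counts, for illustration only.
form_counts = {"petit": 5, "petite": 4, "petits": 2, "a": 7, "ont": 3, "chat": 1}
# Hypothetical hand-written mapping from inflected form to base form.
lemma_of = {"petite": "petit", "petits": "petit", "a": "avoir", "ont": "avoir"}

merged = merge_inflections(form_counts, lemma_of)
print(merged.most_common())
```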
From the CORGA corpus (lexical items):
The remaining entries, up to the 1000th position, are available at the CORGA corpus.
Top 100,000 words in Hungarian text: http://mokk.bme.hu/resources/webcorpus
Hungarian frequency list 1-10000
The 100 most frequent Icelandic verbs according to the verb webpage.
Icelandic verb frequency list 1-100
Top 1000 Italian words from subtitles:
Top 10000 Spanish words from subtitles: