How Many Words Does It Take To Understand A Language?

A study by Mark Davies at Brigham Young University in the USA found that, when learning a language, knowing just the 1000 most frequently used words should be enough to understand around 88% of all oral speech.

In the study, words are broken down into lexemes, the most basic form of a word. For example, the word ‘plant’ is both a noun and a verb, each with a different meaning. For the purposes of Davies's study, a word such as ‘plant’ counts twice in the list of frequently used words because it has two lexemes: both need to be learned, so the word is in effect learned twice. In the case of a verb such as ‘to be’, however, all of its forms (am, is, are, etc.) count as one lexeme because they share the same overall meaning.
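As a rough illustration of this counting rule, the sketch below tallies lexemes as (base form, part of speech) pairs. The hand-tagged sample sentence, the tag names and the function name count_lexemes are assumptions made for this example, not taken from the study.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (surface form, base form, part of speech).
# A real word list would come from an automatically tagged corpus.
tagged_tokens = [
    ("I", "I", "PRON"),
    ("am", "be", "VERB"),
    ("planting", "plant", "VERB"),
    ("a", "a", "DET"),
    ("plant", "plant", "NOUN"),
    ("that", "that", "PRON"),
    ("is", "be", "VERB"),
    ("green", "green", "ADJ"),
]

def count_lexemes(tokens):
    """Count each (base form, part of speech) pair as one lexeme."""
    return Counter((base, pos) for _surface, base, pos in tokens)

lexeme_counts = count_lexemes(tagged_tokens)

# 'am' and 'is' fall under the single lexeme ('be', 'VERB'), while
# 'plant' the noun and 'plant' the verb are counted separately.
for lexeme, count in lexeme_counts.most_common():
    print(lexeme, count)
```

Keying the counter on the pair rather than on the spelling is what lets ‘plant’ appear twice in the list while am, is and are collapse into one entry.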

The body of text from which the words are drawn also matters when compiling a list of the 1000 most frequently used words. If you compile your list primarily from Shakespeare’s plays, many of the words will not be relevant to modern society.

The range of a word, meaning how widely it is spread across the sources, also needs to be taken into account. If you use five books to compile your list and one of them is a specialist medical journal, you may find words such as ‘haemoglobin’ and ‘tibia’ cropping up regularly even though they are not often used in conversation.
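One simple way to guard against this is to require that a word appears in a minimum number of sources before it can enter the list. The sketch below is only an illustration of that idea: the function name rank_with_range, the min_range threshold and the toy sources are assumptions, and the study may well have handled range differently.

```python
from collections import Counter

def rank_with_range(sources, min_range):
    """Rank words by total frequency, keeping only those that
    occur in at least `min_range` of the sources."""
    total = Counter()        # occurrences across all sources
    source_range = Counter() # number of sources containing the word
    for words in sources:
        total.update(words)
        source_range.update(set(words))
    kept = [w for w in total if source_range[w] >= min_range]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda w: (-total[w], w))

# Toy sources (hypothetical): two general texts and one medical text.
sources = [
    ["the", "dog", "ran", "the", "park"],
    ["the", "cat", "sat", "the", "mat"],
    ["the", "tibia", "tibia", "tibia", "haemoglobin", "tibia"],
]

print(rank_with_range(sources, min_range=1))  # 'tibia' ranks second
print(rank_with_range(sources, min_range=2))  # 'tibia' is filtered out
```

With no range requirement, ‘tibia’ ranks near the top purely because one source repeats it; requiring two sources removes it from the list.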

The overall findings of Davies's study showed that learning the 1000 most frequently used words of a language can allow a person to understand 76.0% of all non-fiction writing, 79.6% of all fiction writing, and 87.8% of all oral speech. Learning the 2000 most frequently used words increases a person’s understanding to 84% for non-fiction, 86.1% for fiction, and 92.7% for oral speech. Learning 3000 words increases it further, to 88.2% for non-fiction, 89.6% for fiction, and 94.0% for oral speech.
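Percentages like these are usually read as coverage: the share of running words in a text that belong to the known list. The sketch below shows that calculation on a toy sentence. It is an assumption about how such figures can be computed, not a description of the study's actual procedure, and the names top_words and coverage are made up for the example.

```python
from collections import Counter

def top_words(corpus_tokens, n):
    """Return the set of the n most frequent words in a corpus."""
    return {word for word, _count in Counter(corpus_tokens).most_common(n)}

def coverage(known_words, text_tokens):
    """Fraction of the running words in a text that are in known_words."""
    if not text_tokens:
        return 0.0
    hits = sum(1 for token in text_tokens if token in known_words)
    return hits / len(text_tokens)

# Toy corpus (hypothetical); a real list needs millions of words.
corpus = "the cat sat on the mat the end".split()

known = top_words(corpus, 1)          # just the single most frequent word
print(known)
print(coverage(known, corpus))        # share of the text that word covers
```

Because common words repeat so often, a short list covers a large share of running text, which is why each additional 1000 words adds less than the previous 1000.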

Learning the top 1000 words of a language should enable a person to converse with a native speaker. The conversation is unlikely to flow completely smoothly, as there will still be words that need to be explained or looked up in a dictionary. However, knowing only the 1000 most frequently used words should be enough to follow the general gist of the conversation.