+vocabulary as the 6of12 list is for American English. (Or, alternately, it may
+simply mean that my choice of sources was too narrow.)
+
+The criteria for inclusion in this list were basically those of the 2of12inf
+list. In particular, inflections are included for all words, but hyphenated
+words, contractions, phrases, proper names and abbreviations are all excluded.
+One important difference between the two is the way in which inflections were
+determined for inclusion. The 2of12inf list includes some inflections found in
+one (or even none) of its sources. Further, as discussed in detail above, it
+includes plurals for words which are not normally considered to have plurals.
+The 2of4brif list differs in both of these regards. It includes only
+inflections endorsed by two or more of the sources, specifically excluding any
+plural forms for nouns listed as uncountable.
+
+The 2of4brif list includes no signature words as such. I made a small number of
+adjustments for consistency, such as making sure that -ise and -ize spellings
+were equally represented, and adding plurals for ordinal numbers. (Why
+fourteenth would be defined as a fraction, but not seventeenth, I must simply
+regard as a mystery.) These edits were so few, and so clearly harmless, that I
+have not marked them.
+
+Prospective users of the 2of4brif list should realize that it was compiled by
+an American. If my sources contained any glaring errors (and most dictionaries
+have a few), I might well not have noticed, and perpetuated them in the list.
+The fact that two citations were required is some protection against such an
+event, but no guarantee.
+
+As the 2of4brif list is very similar in makeup to the 2of12inf list, a user who
+wants a larger, more international list than either could reasonably merge the
+two. If you do this, you should remove the unusual plurals (marked with a "%")
+from the 2of12inf list in the process, for consistency.
+
+Note that I have deprecated the 2of4brif list. I believe that any applications
+of this list would be better off using the 3of6game list in its place.
+
+The 3of6 lists
+
+The lists 3of6game and 3of6all are new with version 6 of 12dicts. Both were
+derived from a set of six advanced learner's ESL dictionaries. The dictionaries
+can be broken down as follows:
+
+ • One strongly American-oriented dictionary.
+ • Two somewhat British-oriented dictionaries.
+ • Three international dictionaries, one from an American publisher, two from
+ a British publisher.
+
+This provided a good balance between British and American usage. My goal was to
+produce lists that contained blancmange and swede as well as applesauce and
+boysenberry. Note that some of the British dictionaries include words from
+Australian, Indian, African and Caribbean English, and a fraction of this
+vocabulary made it into the 3of6 lists.
+
+In previous versions of 12dicts, I asked users to tell me what they were doing
+with the lists. The most common answer was that they were used to supply the
+vocabulary for a word game. The 3of6game list was designed to fulfill this
+purpose. It contains only the sort of words likely to be used in a word game
+(no hyphenated words, proper names, abbreviations, contractions or phrases),
+but does contain inflections. In general, words must appear in three of the
+sources to be included. The rules, however, do provide for a number of
+(annotated) exceptions, including uncommon inflections and words whose most
+common form is either hyphenated or phrasal. Details are below.
+
+The 3of6all list is a larger list, basically containing any kind of word you
+can imagine, if found in three of the sources. As with 3of6game, some additional
+words were added as exceptions, but there are not as many of them, as the goal
+of this list is to be as faithful as reasonable to the sources.
+
+Both the 3of6game and 3of6all lists contain signature words/phrases. The
+3of6game list also contains neologisms, as game players are likely to want to
+play recently coined or popularized words.
+
+The 3of6game list
+
+The 3of6game list contains words which are listed in 3 of the 6 advanced
+learner's dictionaries described above. Only words suitable for play in most
+word games are included, excluding hyphenated words, multi-word phrases,
+capitalized words, abbreviations and contractions. There are no restrictions on
+length - in particular, it contains four one-letter words: a, x (a verb meaning
+to cross out), I and O, the last two of which are included despite their
+capitalization (which is an English spelling phenomenon entirely disconnected
+from logic). In certain cases, words are present in this list despite being
+listed in fewer than three sources. This serves the purpose of offering game
+players more words in situations where lexicographers differ about what word
+forms are correct. Some exceptional situations are:
+
+ • A word is one of a set of close variants, none of which is present in three
+ of the sources. These words are marked with a "^" suffix. An example is the
+ word aqualung, which is sometimes capitalized or hyphenated.
+ • The word is a British spelling of an American word listed in three sources,
+ or an American spelling of a British word from three sources. These words
+ are marked with a "&" suffix. Examples include prolog, an American form of
+ the British prologue, and hyaena, a British spelling of the American hyena.
+ • A word is a plural of a word which only two of the sources describe as
+ countable, such as boyhoods. Similarly, adjectival inflections are added if
+ as few as two of the sources attest to it, as with frillier and frilliest.
+ • A word is an unusual inflection of a word where at least three sources
+ agree that some inflection is called for, such as the less common plural
+ planetaria of planetarium.
+ • A word is an inflection for a word used as an unusual part of speech, whose
+ meaning is closely related to a more common meaning. Examples are the verb
+ forms autopsied and autopsying, whose meanings are closely related to the
+ common meaning of the noun autopsy.
+ • A word is an unhyphenated form of a word normally hyphenated or written
+ phrasally, such as ballgame, which is more commonly written ball game.
+
+Words not present in three of the source dictionaries are marked with the "$"
+suffix character if the "^" and "&" annotations do not apply.
+
+The 3of6game list includes both signature words and neologisms, marked with a
+"+" or "!" respectively. There are 520 signature words for this list,
+representing words that I feel "ought to be" included. Each signature word is
+present in at least one of the source dictionaries. Virtually all of these
+words are American English, as I am not qualified to tell whether an interesting
+Britishism like tosspot is used often enough to justify its addition as a
+signature word. Note that the presence of annotations allows a user to remove
+these extra words if she finds their addition unjustified.
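+
+As a minimal sketch of such filtering, the Python below strips the annotation
+suffixes described above and drops signature words and neologisms. It assumes
+one entry per line with the annotation characters appended directly to the
+word; the function names and the sample entries are hypothetical and are not
+copied from the list itself.
+
```python
# Annotation suffix characters described for the 3of6game list.
ANNOTATION_CHARS = "^&$+!"


def split_annotations(entry):
    """Split an entry such as 'aqualung^' into ('aqualung', '^')."""
    word = entry.rstrip(ANNOTATION_CHARS)
    return word, entry[len(word):]


def filter_words(entries, drop="+!"):
    """Return bare words, omitting entries that carry any marker in `drop`.

    By default this removes signature words ('+') and neologisms ('!'),
    while keeping (and un-annotating) everything else.
    """
    kept = []
    for entry in entries:
        word, marks = split_annotations(entry.strip())
        if word and not set(marks) & set(drop):
            kept.append(word)
    return kept


# Illustrative input only; the annotations shown are assumptions.
sample = ["aqualung^", "prolog&", "boyhoods$", "sampleword+", "newcoinage!"]
print(filter_words(sample))
```
+
+Passing a different drop string (for example "$^&+!") would keep only the
+words attested by three sources.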
+
+The 3of6game list could be combined with the 2of12inf list (minus the
+uncountable plurals) and/or 2of4brif if a larger list is required. Note that
+because 2of12inf is very strongly American, such a combination will be less
+balanced between American and British English than 3of6game itself.
+
+The 3of6all list
+
+The 3of6all list contains words which are listed in three of the six advanced
+learner's dictionaries. In contrast to the 3of6game list, no words are
+excluded, not even abbreviations, prefixes or suffixes. Most words have their
+inflections included. An exception is made for phrasal verbs and other verb
+phrases, whose inflections are completely predictable from the initial word of
+the phrase.
+
+The 3of6all list contains many phrasal verbs, such as let down, take after,
+sound off and make out, whose meanings are often quite hard for inexperienced
+students of English to guess. Phrasal verbs are marked by the ";" suffix
+character. Only four of the six source dictionaries provide phrasal verb
+information in an easy-to-collect way. For this reason, I put a phrasal verb
+into the 3of6all list even if I found it in only two of the sources.
+
+The 3of6all list contains some other words present in fewer than three of the
+dictionaries, though not as many as 3of6game. All such words are marked. The
+cases where this occurs are as follows:
+
+ • As described for the 3of6game list, a word is one of a set of close
+ variants, none of which is present in three of the sources. These words are
+ marked with a "^" suffix. For this list, in addition to differences in
+ hyphenation or single/multi-word format, variants only in capitalization or
+ (for abbreviations) the presence or absence of a period are considered
+ close.
+ • As described for the 3of6game list, a word is a British spelling of an
+ American word listed in three sources, or an American spelling of a British
+ word from three sources. These words are marked with a "&" suffix.
+ • A few other words present in fewer than three of the dictionaries are
+ added. Usually, this occurs when a word is found by three sources to have
+ the same part of speech, but the sources fail to agree on the spelling of
+ the inflection(s). An example is the word Grammy, whose plural is claimed
+ by two sources to be Grammies, and by two others to be Grammys. These words
+ are annotated with the "$" suffix.
+
+There is one other situation where an annotation suffix is used. This occurs
+when a word is shown by a majority of the sources as being used only in a few
+specific phrases, even though other dictionaries may give it a regular
+definition. An example is the word bated, which is shown by most of the sources
+as used only in the phrase with bated breath. In this case, the word is flagged
+with a ">" suffix. A search on a word so flagged will reveal the key phrase(s)
+elsewhere in the list.
+
+Recall that, sometimes, a word may have more than one suffix. An abbreviation
+shown with the ":" suffix (indicating the absence of a final period) may be
+followed by another suffix, and the combination ">^" appears upon occasion.
+
+The 3of6all list contains signature phrases, but no neologisms. The signature
+phrases are marked with the "+" suffix. The 629 3of6all signatures are all
+basic conversational idioms and common connective phrases, like I told you so,
+in front of and on the other hand. Though these phrases often show up in the
+sources in lists of idioms, they generally do not appear as separate headwords,
+which kept me from easily recording their presence. I believe, however, that
+all of these phrases are extremely common, and deserve to be included in this
+list.
+
+The 5d+2a list
+
+I created the 5d+2a list (originally called 5desk) in an attempt to do a better
+/usr/dict/words (the failings of which were a large part of my motivation for
+doing 12dicts in the first place). The sorts of words admitted are the same
+sorts that /usr/dict/words traditionally contains. Though somewhat larger in
+size than many versions of /usr/dict/words, this is still a short word list,
+striving for inclusion of words one is likely to encounter rather than the
+complete jargon of every possible scientific, artistic or occult endeavor.
+
+The original 5desk list was assembled primarily from five "desk dictionaries".
+It was augmented by words from five minor sources, including a "vocabulary
+builder" and a collection of proper names. It excluded prefixes, suffixes,
+phrases, hyphenated words, contractions and most abbreviations and acronyms.
+There was no requirement for multiple listings; all qualifying words from each
+of the sources were included. Inflections of included words were not included
+themselves except when irregular, or separately defined. Variant and
+non-American spellings were not excluded, and no signature words were added.
+
+Words commonly considered to be abbreviations/acronyms were included if they
+contained at least one upper case character, and were defined with an explicit
+part of speech. This excluded items like Mr and Feb, which are abbreviations in
+the classic sense, but allowed words like DNA and ATM, which are used far more
+frequently than that which they abbreviate. While there is a trend in modern
+dictionaries to list such words as nouns (or occasionally verbs, adverbs,
+etc.), it is a trend in progress, and rather inconsistently applied. For this
+reason, the set of such words in the 5desk list is somewhat incoherent,
+including SPCA but not PETA, AIDS but not SAD, KGB but not CIA, and PDQ but not
+ASAP.
+
+When version 6 of 12dicts was released, the 5desk list was augmented by adding
+qualifying words from two advanced learner's ESL dictionaries, and as a result
+renamed to 5d+2a.txt. Both of the additional dictionaries had a strongly
+international vocabulary, causing the new list to have a less American and more
+cosmopolitan character. This increased the size of the list by about 20% to
+about 68,000 words.
+
+One class of commonly-used words is regrettably absent from the 5desk list,
+because I was unable to find a satisfactory source for them. This is the class
+of commercial names such as Exxon, Tylenol, Pepsi and Chevy. This is probably
+forgivable, as this class of names is as ephemeral and transitory as teenage
+slang. The one-time household words Kool, Ovaltine, Philco and Ipana serve now
+only as answers to trivia questions, with modern wonders like Starbucks,
+Google, Ritalin and TiVo taking their place on the tongues of the trendy.
+
+The 5d+2a list contains no signature words. I did take the liberty of adding
+the personal names of around thirty well-known individuals, mostly statesmen
+and politicians. Though the original 5desk list contained many such names from
+all periods of human history, I have not found a useful source to bring the
+list into the twenty-first century. At the same time, I felt that distributing
+a list full of names that did not include Cheney and Obama was not reasonable.
+So I compromised by adding a few names whose historical significance was clear
+to me, until such time as a better source than my own memories of the last 15
+years can be found.
+
+The 5d+2a list has clearly moved beyond any "core vocabulary" concept. It
+includes quite esoteric words (ogee, pleonastic), very uncommon spellings
+(thiamine, yuppy), and obscure geographical and historical names (Paricutin,
+Nevelson). Like the traditional /usr/dict/words, it is frequently inconsistent
+and arbitrary, but I hope at the least I have avoided including spelling
+errors, and overlooking the stuff of everyday conversation. Perhaps it will be
+useful as a compromise between basic lists such as 3esl, and truly massive
+lists like Mendel Cooper's ENABLE.
+
+The lemmatized 12dicts lists
+
+Version 6 of 12dicts provides three lemmatized lists combining words from the
+2of12inf, 3of6game and 2of4brif lists. The word "lemmatized" is a rare word,
+which you will find in none of these lists, but what it means is that these
+lists are formatted as a collection of word sets, called lemmas (or lemmata, if
+you're into irregular plurals), each set composed of a headword and some number
+(possibly zero) of closely related words. Two of these lists were introduced in
+version 5 of 12dicts, but they have undergone major revisions since then.
+
+The three lists are 2+2+3lem (originally 2+2lemma), 2+2+3frq (originally
+2+2gfreq) and 2+2+3cmn. 2+2+3lem simply arranges the words of the three source
+lists into lemmas and lists them alphabetically by headword. 2+2+3frq arranges
+the same lemmas by approximate order of their frequency of usage, computed with
+the help of a frequency list obtained from Brigham Young University (BYU),
+omitting those words and lemmas whose usage is so small that they fail to show
+up in the BYU material. 2+2+3cmn extracts a subset of the lemmas of 2+2+3lem,
+namely those lemmas with a certain minimum level of usage (approximately the
+level of the word butterscotch), and lists them alphabetically by headword.
+This is yet another attempt in 12dicts to generate a core English vocabulary.
+
+The advantage of a lemmatized presentation of words is that it puts related
+words together, even when spellings differ greatly, as for be, are, is and
+were. A moderate disadvantage is that the same word can appear in more than one
+lemma, such as putting, which is present in the lemmas headed by both put and
+putt. Overall, I find the lemmatized format to be clearer and more useful than
+a simple alphabetized list, and I rather wish I had released the other lists
+which include inflections in that format.
+
+The following table summarizes the contents of each of the lists in the
+Lemmatized directory, ordered by size in words:
+┌─────────────────┬────────┬────────┬────────┐
+│ │2+2+3cmn│2+2+3frq│2+2+3lem│
+├─────────────────┼────────┼────────┼────────┤
+│Size (Words) │25,000 │34,000 │84,000 │
+├─────────────────┼────────┼────────┼────────┤
+│Number of Sources│21 │21 │21 │
+├─────────────────┼────────┼────────┼────────┤
+│American English │Y │Y │Y │
+├─────────────────┼────────┼────────┼────────┤
+│British English │Some │Some │Y │
+├─────────────────┼────────┼────────┼────────┤
+│Ordinary words │Y │Y │Y │
+├─────────────────┼────────┼────────┼────────┤
+│Inflections │Some │Some │Y │
+├─────────────────┼────────┼────────┼────────┤
+│Hyphenations │Some │Some │Y │
+├─────────────────┼────────┼────────┼────────┤
+│Phrases │– │– │– │
+├─────────────────┼────────┼────────┼────────┤
+│Names │Some │Some │– │
+├─────────────────┼────────┼────────┼────────┤
+│Abbreviations │Some │Some │– │
+├─────────────────┼────────┼────────┼────────┤
+│Acronyms │Some │Some │– │
+├─────────────────┼────────┼────────┼────────┤
+│Prefixes/Suffixes│– │– │– │
+├─────────────────┼────────┼────────┼────────┤
+│Signature words │Y │* │* │
+├─────────────────┼────────┼────────┼────────┤
+│Neologisms │A few │A few │Y │
+├─────────────────┼────────┼────────┼────────┤
+│Annotations │Y │Y │Y │
+└─────────────────┴────────┴────────┴────────┘
+
+A * in the "Signature Words" row means that signature words associated with
+some other list may be present, but there are no signature words associated
+specifically with that list.
+
+The 2+2+3lem list
+
+The list 2+2+3lem.txt contains the words in the 2of12inf, 2of4brif and 3of6game
+lists. Also, the new words from the neol2016.txt list have been added, marked
+with a "!" if they would not have otherwise been included. (Marking the new
+words permits them to be removed if it is preferred for this list to be in
+synch with the other 12dicts lists.) Furthermore, some high-frequency
+hyphenated words from 2of12.txt and 3of6all have been added. These words were
+originally added to the lemmatized frequency list (see below), and I liked the
+results so much that I added them to this list as well. Finally, British forms
+of words in the 2of12inf list not already in the other lists have been added.
+Words marked with a % in the 2of12inf list ("Scrabble plurals") have however
+been omitted.
+
+In the previous version of 12dicts, the 2+2+3lem list was called 2+2lemma. The
+only significant changes were the addition of new words, and switching from "+"
+to "!" to mark neologisms in the list.
+
+The 2+2+3lem list is not formatted as a simple list of words. It is composed of
+entries of 1 or 2 lines each. The first line contains a headword, and the
+second line, which is indented if present, contains an alphabetized list of
+related words. A simple example:
+
+funny
+ funnier, funnies, funniest, funnily, funniness
+
+The list of related words contains three sorts of entries.
+
+ 1. Inflections.
+
+ 2. Variant spellings.
+
+ 3. Words formed with certain suffixes.
+
+In addition to true variant spellings such as grey for gray and thru for
+through, item 2 also includes words which, though pronounced differently, are
+clearly variants of the headword. Thus, hooray is considered a variant of
+hurrah (but mere synonyms like furze and gorse remain independent).
+
+Item 3 is based on a small list of suffixes, producing closely and consistently
+related words. These suffixes are -ful, -ish, -less, -like, -ly, -most and
+-ness. -ally is also allowed, if there is no -al word to apply the -ly suffix
+to. (For instance, basically is considered to be derived from basic, because
+there is no word basical.) When one of these suffixes is used in an unusual
+way, the resulting word is considered independent. For instance, likely is not
+considered to be derived from like, nor bashful from bash. There are some
+rather difficult questions here, such as how closely slavish is related to
+slave, or sluggish to slug. In general, I have chosen the course of least
+surprise by treating such pairs as independent.
+
+Here are some other notes on the determination of what words are related.
+
+Certain uses of the suffixes -ed and -s are treated as inflections, even though
+technically they are not. Thus, talented is treated as derived from talent, and
+optics from optic.
+
+Words ending with the suffix -ability/ibility are treated as relatives of the
+corresponding -able/ible word.
+
+Sometimes, the choice of which variant to treat as the headword is somewhat
+arbitrary. I have consistently chosen an American spelling over a British
+spelling here. This has some effect on the number of headwords. I treat cheque
+as a variant of check, whereas, to an observer with a British bias, they would
+no doubt be separate headwords.
+
+No distinction is made of different meanings of the same word, even when they
+are so different that dictionaries list them separately. wind the noun and wind
+the verb are considered as a single word, as are second the adjective, second
+the noun and second the verb.
+
+It may sometimes happen that two different words have the same inflection
+(putting derives both from putt and put; holier relates to holey as well as
+holy), or that an inflection is a headword in its own right (as with wound, the
+past tense of wind, or crooked, the past tense of crook). These situations are
+noted in the 2+2+3lem list as cross-references to the alternate headword. There
+are two specific situations which might not be obvious where inflections are
+treated as different words. These occur when a present participle or a -ness
+word has a plural inflection, as with meaning and weakness. Such words are
+always made headwords, even when the relationship to the original root is very
+close. Here is an example showing how cross-references are indicated:
+
+base
+ based, baseless, basely, baseness, baser, bases -> [basis], basest, basing
+
+Almost always, a given word has only one cross-reference - the biggest
+exception is the incredible tangle shown in the example below:
+
+slue -> [slough]
+ slew -> [slay, slew, slough], slewed, slewing, slews -> [slew, slough],
+ slued, slues -> [slough], sluing
+
+where 4 uncommon words mostly pronounced sloo have become thoroughly confused.
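+
+The entry format described in this section (a headword line, an optional
+indented line of related words, and "-> [...]" cross-references) could be read
+along the following lines. This is only a sketch: the function names are
+hypothetical, it assumes that related words are separated by commas and that
+any continuation lines are indented, and it makes no attempt to handle
+annotation suffixes such as "!".
+
```python
import re

# Split on commas that are not inside a [...] cross-reference group.
_TOP_LEVEL_COMMA = re.compile(r",\s*(?![^\[]*\])")


def parse_word(token):
    """Split 'bases -> [basis]' into ('bases', ['basis'])."""
    word, arrow, refs = token.partition("->")
    targets = []
    if arrow:
        targets = [t.strip() for t in refs.strip(" []").split(",")]
    return word.strip(), targets


def parse_lemmas(lines):
    """Group lines into entries: headword, its cross-references, related words."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        if line[0].isspace() and entries:
            # Indented line: related words for the most recent headword.
            tokens = _TOP_LEVEL_COMMA.split(line.strip())
            entries[-1]["related"].extend(
                parse_word(t) for t in tokens if t.strip()
            )
        else:
            head, refs = parse_word(line)
            entries.append({"headword": head, "refs": refs, "related": []})
    return entries


sample = [
    "base",
    "    based, baseless, basely, baseness, baser, bases -> [basis], basest, basing",
    "slue -> [slough]",
    "    slew -> [slay, slew, slough], slewed, slewing",
]
for entry in parse_lemmas(sample):
    print(entry["headword"], entry["refs"], entry["related"])
```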
+
+The 2+2+3frq list
+
+In the previous version of 12dicts, there was a file called 2+2gfreq.txt. This
+file has been completely replaced by a new implementation of the same idea.
+Like the older list, the 2+2+3frq list presents the lemmas of 2+2+3lem in bands
+of lemmas with about the same frequency of use. However, there are the
+following major differences from what was done before:
+
+ • In the previous version, word frequency information was obtained from data
+ collected from the World Wide Web supplied by Google. This data was very
+ voluminous, but was quite distorted by the Web's emphasis on computerese,
+ pornography and marketing. I am now using a commercial word frequency
+ database, supplied by Brigham Young University, based on its Corpus of
+ Contemporary American English (COCA). This data is less voluminous than the
+ Google data, but is far more balanced and seemingly trustworthy. It has
+ some other advantages, discussed below.
+ • High-frequency hyphenated words from 2of12 and 3of6all have been added.
+ I liked the effect of this so much that I added the same words to the
+ 2+2+3lem list.
+ • A certain number of high frequency abbreviations, contractions and
+ capitalized words were added. Some of these words were not to be found in
+ any other 12dicts list, for which reason I did not also add them to
+ 2+2+3lem.
+ • The list was shortened by omitting all lemmas which did not appear at all
+ in the BYU data.
+ • Individual lemmas were shortened by omitting very infrequent words and all
+ regular inflections, except when they were used frequently as a part of
+ speech different from the headword, such as disappointed as an adjective
+ rather than a verb form.
+
+The lemmas of 2+2+3frq are grouped into bands by the combined number of
+occurrences in the BYU data of the words in the lemmas. Band 21 contains lemmas
+whose words together appear between 16 and 31 times in the BYU data. Each
+earlier band covers counts twice as large as those of the band after it; that
+is, each lemma in band 20 appears in the BYU data between 32 and 63 times, and
+so on. The first band contains the three lemmas most frequently used in the
+English language (according to BYU), namely the, be (plus its inflections) and
+to. As already noted, some words are found in multiple lemmas. One helpful
+aspect of the BYU data is that it separates frequency data for a word by parts
+of speech, and notes the base word for inflected words. This often allows the
+frequency counts for a word like building to be accumulated under the correct
+lemma (either build or building). In the event that the BYU data is unable to
+completely resolve the appropriate lemma for a word, its frequency count is
+divided equally among the various candidates.
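+
+The band arithmetic just described can be sketched as follows. The function
+name is hypothetical. Two things here are assumptions rather than anything
+stated above: that counts below 16 fall outside every band, and that any count
+large enough to land before band 1 is simply placed in band 1.
+
```python
def band_for_count(count, last_band=21, floor_count=16):
    """Return the band for a lemma's combined occurrence count.

    Band 21 covers 16..31, band 20 covers 32..63, and each earlier band
    doubles the range again. Returns None for counts below the lowest band.
    """
    if count < floor_count:
        return None
    # Number of whole doublings of the floor count that fit into `count`.
    doublings = int(count // floor_count).bit_length() - 1
    # Assumption: clamp to band 1 rather than produce band 0 or below.
    return max(1, last_band - doublings)


for c in (16, 31, 32, 63, 64):
    print(c, band_for_count(c))
```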
+
+2+2+3frq is divided into bands by lines like this:
+
+----- 5 -----
+
+The lemmas in each band are presented in alphabetical order, not by the
+frequency of the individual lemma.
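+
+A reader for this layout might look like the sketch below, which collects
+headwords under the band announced by the most recent separator line. The
+function name is hypothetical, the separator pattern is inferred from the
+single example above, and the sample band contents are invented purely to show
+the format.
+
```python
import re

# Matches separator lines such as "----- 5 -----".
BAND_LINE = re.compile(r"^-+\s*(\d+)\s*-+$")


def headwords_by_band(lines):
    """Map each band number to the headword lines that follow its separator."""
    bands = {}
    current = None
    for line in lines:
        text = line.rstrip()
        if not text:
            continue
        match = BAND_LINE.match(text)
        if match:
            current = int(match.group(1))
            bands.setdefault(current, [])
        elif current is not None and not text[0].isspace():
            # Unindented line: a headword. Indented lines hold related words.
            bands[current].append(text)
    return bands


# Invented sample; real band membership is not implied.
sample = [
    "----- 5 -----",
    "alpha",
    "    alphas",
    "beta",
    "----- 6 -----",
    "gamma",
]
print(headwords_by_band(sample))
```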
+
+Note that because the BYU data was extracted from a corpus of American English,
+the 2+2+3frq file tilts in an American direction, though some British words
+like bloke, colour and lorry have made it through.
+
+A useful attribute of the BYU data is that it, unlike the Google data, includes
+hyphenated words, as well as some abbreviations, contractions and capitalized
+words. The two cases are rather different. The inclusion of hyphenated words is
+explicitly intended. However, the BYU documentation states that proper names
+have been excluded where possible, while admitting that, in many cases, the
+software processing the data was unable to be sure whether a word was a proper
+name or not, in which case the word was included. The effect is that there are
+many words generally considered to be proper names present, notably the names
+of months of the year and days of the week, plus those of religions,
+nationalities and ideologies. You will not find names like linda, picasso,
+vladivostok, microsoft or rumpelstiltskin in the data, but you will find
+november, buddhist, peruvian and marxist, to the extent that I wonder if BYU
+might have used a different definition of "proper name" than the one I was
+taught in school. As for abbreviations, the BYU documentation makes no mention
+of them, but there are some very familiar abbreviations in the data. There are
+not a lot of them, which makes me wonder whether their presence was intentional
+or a processing error. Either way, I have no reason to doubt their frequency
+counts.
+
+I decided that I wanted to add high-frequency hyphenated words, proper names
+and abbreviations to the frequency list, as I consider this data to be very
+interesting. When I did so, I discovered in band 17 the words atlantean and
+klingon. I really don't think that these words have anywhere close to the same
+frequency as armband and carpool, which are also present in band 17. This makes
+me suspect that, for words of this frequency or less, the BYU data is starting
+to become less reliable. For this reason, I decided to stop adding hyphenated
+words, capitalized words, contractions and abbreviations after band 17.
+
+In the case of hyphenated words, I added them to the 2+2+3frq list only if they
+were present in either 2of12.txt or 3of6all.txt. I also added these words to
+the 2+2+3lem list. In the case of abbreviations and capitalized words, there
+were not all that many of them, and some of them were not present in any other
+12dicts list, such as Americanist, Thatcherism and, of course, Klingon. For
+this reason, when I added capitalized words, contractions and abbreviations to
+2+2+3frq, I parenthesized them to indicate that their presence had nothing to
+do with any source but the BYU data. The same consideration led me to omit
+these words from the 2+2+3lem list.
+
+The 2+2+3frq list is considerably smaller than the previous 2+2gfreq list due
+to my decision to drop lemmas which were absent from the BYU data, especially
+since the BYU data was considerably less voluminous and so left out many more
+words than the Google data. In addition, I observed that many high-frequency
+lemmas contained unusual spellings and archaic forms that were not present in
+the BYU data, such as cocoanut, iodin and didst, and decided to drop
+non-headwords from the lemmas unless their frequency was at or above the level
+of band 17. A similar decision was made to drop regular inflections from the
+lemmas in the 2+2+3frq list unless they had high frequency with a different
+part of speech, for example, loving as an adjective or fighting as a noun.
+Finally, I chose to drop the word/lemma cross-references from the 2+2+3frq
+list, replacing them with a * indicating that a word was to be found under
+another headword (though it might have been suppressed if it was a regular
+inflection).
+
+As an example of how this works out in practice, here is the lemma for time
+from 2+2+3lem:
+
+time
+ timed, timeless, timelessly, timelessness, times, timing -> [timing]
+
+and here is the condensed version from 2+2+3frq.
+
+time
+ timed, timeless
+
+The words timelessly and timelessness are not used often enough (according to
+BYU) to mention in the frequency list, while the word times was not frequently
+used except as a form of time, and, while the word timing was frequently used
+as a noun, its counts were collected under the lemma timing rather than time.
+
+The 2+2+3cmn list
+
+The 2+2+3cmn list is a relatively simple transformation of the 2+2+3frq list,
+in yet another attempt to produce a "core English" word list. It is composed of
+the lemmas of the 2+2+3frq list from bands 1 through 17, sorted in alphabetical
+order by headword. Minor formatting differences are that the "!" is removed
+from neologisms, and the parentheses are removed from capitalized words,
+abbreviations and contractions.
+
+I have added 67 signature words to 2+2+3cmn, which are abbreviations,
+contractions and capitalized words (mostly contractions) which I know to be
+extremely high frequency, but which were not present in the BYU data, words
+such as can't, Mr. and DVD. These words are marked with a + to indicate their
+absence from the 2+2+3frq source data.
+
+Like 2+2+3frq, 2+2+3cmn tilts strongly in the direction of American English.
+
+Because all the words of 2+2+3cmn are of moderately high frequency (assuming
+the BYU data is to be trusted), it probably is a better claimant than either
+2of5core or 3esl to truly representing a core English vocabulary, at least of
+the American variety.
+
+Specialized 12dicts lists
+
+The following table summarizes the contents of each of the lists in the Special
+directory, ordered by size in words:
+┌─────────────────┬────────┬────────┬───────┐
+│                 │neol2016│2of5core│6phrase│
+├─────────────────┼────────┼────────┼───────┤
+│Size (Words)     │600     │4,700   │22,000 │
+├─────────────────┼────────┼────────┼───────┤
+│Number of Sources│0       │5       │6      │
+├─────────────────┼────────┼────────┼───────┤
+│American English │Y       │Y       │Y      │
+├─────────────────┼────────┼────────┼───────┤
+│British English  │A little│Y       │Y      │
+├─────────────────┼────────┼────────┼───────┤
+│Ordinary words   │Y       │Y       │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Inflections      │Y       │–       │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Hyphenations     │Y       │A few   │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Phrases          │Y       │A few   │Y      │
+├─────────────────┼────────┼────────┼───────┤
+│Names            │Y       │A few   │A few  │
+├─────────────────┼────────┼────────┼───────┤
+│Abbreviations    │Y       │A few   │A few  │
+├─────────────────┼────────┼────────┼───────┤
+│Acronyms         │Y       │A few   │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Prefixes/Suffixes│–       │–       │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Signature words  │–       │–       │*      │
+├─────────────────┼────────┼────────┼───────┤
+│Neologisms       │Y       │–       │–      │
+├─────────────────┼────────┼────────┼───────┤
+│Annotations      │Y       │N       │Y      │
+└─────────────────┴────────┴────────┴───────┘
+
+A * in the "Signature words" row means that signature words associated with
+some other list may be present, but there are no signature words associated
+specifically with that list.
+
+The neol2016 list
+
+The neol2016 list is a very simple list of new or newly recognized words, as
+described above. It consists of three parts, separated by blank lines.
+
+The first part lists regular (non-hyphenated, non-capitalized) words together
+with their inflections and variants, laid out similarly to the 2+2+3lem list.
+It includes plurals for uncountable nouns, marked with a "%" suffix. These
+words (except for the uncountable plurals) have been pre-added to the 2of12inf
+and 3of6game lists, suffixed with "!", allowing them to be easily removed if
+desired.
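+
+For readers who want to undo the pre-added neologisms, a minimal Python sketch
+follows. It assumes one entry per line, with the "!" marker as the last
+character of the entry. The function names and the sample entries are invented
+for illustration.
+
```python
def drop_neologisms(entries):
    """Remove every entry that carries the "!" neologism marker."""
    return [e for e in entries if not e.rstrip().endswith("!")]


def unmark_neologisms(entries):
    """Keep every entry, stripping trailing whitespace and any "!" marker."""
    return [e.rstrip().rstrip("!") for e in entries]


# Invented sample entries, not taken from the real 2of12inf list.
sample = ["selfie!", "selfies!", "time", "times"]
without_new_words = drop_neologisms(sample)
with_plain_words = unmark_neologisms(sample)
print(without_new_words)
print(with_plain_words)
```
+
+The first function gives the list as it would be without the neologisms. The
+second keeps them but makes them indistinguishable from the older entries.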
+
+The second part of the file is a small set of words for which additional
+inflections have been added. This part of the file is in the same format as
+the first part. These inflections have also been added to the 2of12inf and
+3of6game lists.
+
+The third part of the file contains new words and phrases which are not regular
+words: hyphenated words, multi-word phrases, proper names, abbreviations and
+acronyms. These words have not been pre-added to any other list.
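+
+Since the three parts are separated only by blank lines, splitting the file is
+straightforward. The Python sketch below is one way to do it. The function
+name split_parts and the sample text are hypothetical, and the sample entries
+are invented rather than copied from the real file.
+
```python
def split_parts(text):
    """Split text into groups of consecutive non-blank lines."""
    parts, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the part collected so far.
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts


# Invented sample in the assumed layout: headword, then indented inflections.
sample = "selfie\n    selfies\n\nvape\n    vaped\n\ne-cigarette\n"
parts = split_parts(sample)
regular, extra_inflections, irregular = parts
print(len(parts), regular, extra_inflections, irregular)
```
+
+Runs of several blank lines are treated as a single separator, so stray empty
+lines do not produce empty parts.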
+
+In all cases, users are encouraged to add some or all of these words to any of
+the other lists, as they feel appropriate.
+
+The 2of5core list
+
+Five of the six advanced learner's ESL dictionaries from which the 3of6 lists
+were compiled mark a subset of their words as being important words which every
+student of English should master. These subsets vary widely from dictionary to
+dictionary. As one of the original goals of the 12dicts project was to compile
+a list representing the English core vocabulary, I thought it would be
+interesting to combine these lists. My original thought was to provide a list
+that was simply the union of the marked subsets for each source. However, one
+particular dictionary had at least twice as many words in its subset as any of
+the others, and in many cases the words seemed to me to be poorly chosen. (Do
+moor and cash flow seem like key English language concepts to you?) So, when
+assembling my list, I chose to require that every word be marked as important
+by at least two of the sources. The result was the 2of5core list, which
+contains about 4,700 words.
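+
+The "at least two of the sources" rule can be sketched in Python as below.
+This is an illustration of the selection rule only, not the procedure
+actually used. The function name core_words and the five tiny sample sets are
+made up.
+
```python
from collections import Counter


def core_words(marked_sets, min_sources=2):
    """Return words marked as important by at least min_sources sources."""
    counts = Counter()
    for words in marked_sets:
        # set() ensures each source counts a word at most once.
        counts.update(set(words))
    return sorted(w for w, n in counts.items() if n >= min_sources)


# Invented sample: one set of "important" words per dictionary.
sources = [
    {"time", "water", "moor", "cash flow"},
    {"time", "water"},
    {"time", "house"},
    {"house"},
    {"time"},
]
core = core_words(sources)
print(core)
```
+
+With min_sources=1 the same function yields the plain union of the subsets,
+which was the approach originally considered.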
+
+While most words selected in this way were the same in American and British
+English, some belonged to one variant or the other. In some cases, a word
+appeared in two forms, such as center and centre. When I observed that a word
+was present in two forms, I combined them into a single line, for example
+center/centre. No other changes were made to the list.
+
+Because of the way in which the list was constructed, it seems somewhat
+haphazard. You may want to check out the Oxford 3000™, a list of 3,000 words
+available from Oxford University Press. It is a core vocabulary created by
+lexicographers and, to my eye, superior to the 2of5core list.
+
+The 6phrase list
+
+When I was compiling the 3of6all list, I noticed something interesting. There
+were an extraordinary number of phrases listed by only one of the sources. Many
+of these were extremely common phrases, which I would expect most experienced
+English speakers to understand. So, naturally, I decided to compile them all
+into a list.
+
+The 6phrase list contains all multi-word phrases from any of the six advanced
+learner's dictionaries which were used as sources for 3of6all, all 22,000 of
+them. The list does not include inflections, except in a few cases where a
+plural cannot easily be guessed from the words in a phrase. Usually, this
+happens for phrases of non-English origin, such as eau de cologne, whose plural
+is eaux de cologne. The list includes phrasal verbs, which are suffixed by the
+";" character, as in the 3of6all list. Unlike the other lists, which use
+lexicographical ordering, this list is sorted so that all phrases starting
+with the same word are grouped together.
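+
+To show why the ordering matters, here is a small Python sketch. It compares
+a letter-by-letter ordering, which ignores spaces and punctuation, with a
+word-by-word ordering that keeps phrases sharing a first word together. The
+exact sort rules used by the 12dicts lists are not given here, so both key
+functions are assumptions, and the sample phrases are invented.
+
```python
def letter_by_letter_key(phrase):
    """Compare phrases as if spaces and punctuation were not there."""
    return "".join(ch for ch in phrase.lower() if ch.isalnum())


def first_word_key(phrase):
    """Compare by first word, then by the remaining words (non-empty input)."""
    words = phrase.lower().split()
    return (words[0], words[1:])


# Invented sample phrases.
phrases = ["ice cream", "iceberg lettuce", "ice age", "ice pack"]
by_letter = sorted(phrases, key=letter_by_letter_key)
by_word = sorted(phrases, key=first_word_key)
print(by_letter)
print(by_word)
```
+
+In the letter-by-letter result, iceberg lettuce falls between ice age and ice
+cream. In the word-by-word result, the three ice phrases stay together.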
+
+You will observe that the same phrase will often be repeated several times in
+the list, with slightly different spelling, capitalization and/or hyphenation.
+No attempt was made to edit the list to remove or reduce such "clutter".
+
+The 6phrase list includes the 3of6all signature phrases. These are not marked
+with a suffix.
+
+In contrast to most of the other lists, the 6phrase list has no applications
+that I can think of. But I find it rather interesting, which is
+why I'm bothering to include it. At the very least, it may serve as an
+illustration of the incredible richness of the English language, without even
+venturing into vocabulary too esoteric to be included in a learner's
+dictionary.