Unofficial Alternate 12Dicts package (Alt12Dicts) Files by Alan Beale Packaged by Kevin Atkinson Revision 4 August 6, 2004 The files contained in this archive are the result of a rather extensive conversation between me (Kevin Atkinson) and Alan Beale, the author of the 12Dicts package. I can be contacted at kevina@gnu.org and Alan Beale can be contacted at biljir@pobox.com. This archive contains almost all the information in the official 12Dicts package but in a different format as well as a good deal of additional information. However it is not meant as a replacement for the official 12Dicts package. It simply offers the information in a different way. This package corresponds to Version 4.0 of the official 12Dicts package. The latest version of this package and the official 12Dicts package can be found at http://aspell.sourceforge.net/wl/. The file README-orig contains the original Readme file distributed with the official 12Dicts package. README-infl contains the Readme file for 2of12infl.txt and finally README-agid contains the Readme for AGID which 2of12infl.txt is based on. All of these files have been explicitly placed in the Public Domain by Alan Beale. 2of12full.txt description: The file 2of12full.txt contains the all words appearing in more than than one of Alan Beale's source dictionaries. Each line contains four numbers, being the total number of dictionaries, the non-variant entries, the variant entries, and the non-American entries. Counts of zero are replaced by hyphens. For instance, the entry 7: - 2# 5& aeroplane indicates that the word "aeroplane" is listed in 7 of the dictionaries. None list it as a primary American word, 2 list it as a variant form, and 5 list it as a non-American word. Note that words may be marked with a "&" for either of 2 reasons. They may represent a non-American spelling of an American word, such as "aeroplane" or "gaol", or they may represent a word not normally used in American English, such as "bloke" or "lorry". Words marked with a colon (":") after it are abbrivations which are entirely lower-case and alphabetic. This file contains almost all the information found in the normal 12Dicts package except for the marking of "second class", the inclusion of "signature words" which did not appear in at least two dictionaries. A second class word is a word that that an inflection which was defined in the same entry as the base word, is a derived word (-ly, -ness or -er/or) which was not defined in a separate entry, or appeared in a list of undefined words with a common prefix, such as un- or re-. signature.txt description: The file signature.txt contains a list of signature words. Signature words are words are words which failed are not in at least 6 dictionaries but Alan Beale thought should be included at the 6of12 level (see README-orig). Examples of some of the sorts of words are included are: 1. Words of the same category as other included words. An example is the astrological sign "Cancer", which alone of all the astro- logical signs fails to appear in 6 or more of the dictionaries. Similarly added were the omitted holidays "Thanksgiving" and "Valentine's Day". 2. Vulgarities, sexual terms and insults. Some such words were already included, but most of the source dictionaries were quite squeamish about them. These words are very widely known indeed; I hold that any list of "common" words which does not include the infamous f-word is simply discredited thereby. Some may feel that it would have been better to leave some or all of these terms unmentioned. Nevertheless, the expression of blasphemy, unwarranted contempt, and perverse lust, whether in words or in deeds, is a very human trait. Suppressing the evidence of these aspects of the human condition in our language makes no more sense than excluding "leprosy", "gangrene" and "dementia", no matter how unpleasant they may be to contemplate. 3. Conventional conversational phrases so common as to be practically invisible to native speakers. Examples are "thank you", "good night", "uh-huh", "of course" and "gesundheit". 4. Sports terminology, especially for football and baseball. signature2.txt description: The file signature2.txt contains inflections of irregular verbs not explicitly mentioned in 2 source dictionaries, such as "outfought" and "reheard". variants.txt description: The variants.txt file contains a subset of the words appearing in at least one of the 12 source dictionaries marked as variants or non-American. This list contains only the words which are spelling variants, words which represent different ways of saying the same thing (such as "henceforward" as a variant of "henceforth") and non-American words without a similar American form (such as "telly") have been removed. Each entry is followed by a tab, and a notation indicating which of several classes the word falls into. To describe the classes, it is best to do a little algebra. Let NV be the total number of non-variants, A the number of American variants, B the number of non-American variants, and V=A+B. Then the following annotations are to be interpreted as follows: #! - A >= B, NV = 0 &! - A < B, NV = 0 # - A >= B, V > NV & - A < B, V > NV #? - A >= B, 0.65*NV < V <= NV &? - A < B, 0.65*NV < V <= NV Simplifying, the choice between # and & indicates which variety of variant dominates, while ! and ? indicate a stronger or weaker than average agreement on variance. Additional notes on the list from Alan: I should note a couple other characteristics of this file. First of all, there are cases where spellings exist which are clearly variants of one another, but where this is not recognized by the source dictionaries. An example is the pair "levelheaded" and "level-headed". These are clearly the same word, but none of my sources lists both of them. I have chosen not to go beyond the source dictionaries and put such words on the variants list, even in obvious cases like this one. I should also note that there are cases where the question of whether 2 words are spelling variants or actually different words is not easy to answer. For instance, consider the pairs "lengthways"/"lengthwise" or "toward"/"towards". I've simply made whatever decision seemed best to me in cases like this ("lengthways" is a variant, "towards" is not), but recognize that any other observer (who could bring himself to care) would be likely to occasionally disagree. abbr.txt description: This file contains (almost) all the abbreviations and acronyms from the 12Dicts sources. Abbreviations which also in a list of common personal names (of about the same completeness as the ESL dictionaries) are marked with a tilda ("~") after it. There are still likely to be some abbreviations not marked with a tilda that match less common names. Additional notes from Alan: For words containing upper-case, I [Alan Beale] had not recorded whether a word was an abbreviation, so I was forced to remove the non-abbreviations from the list by hand. Because of the need to remove non-abbreviations, I limited myself to consideration of upper-case words of 6 or fewer characters. It is possible that a small number of acronyms or abbreviations longer than 6 characters might have been missed. variant-notes.txt description: The file variant-notes.txt contains some additional notes on questionable variants sent to me when I pointed out that nought was not marked as a variant. 2of12full.txt description: See README-infl 2of4brif.txt, 3esl.txt, and 5desk.txt description: These files are identical to the orignal files in the 12Dicts package. See README-orig for more info.