X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=r%2Fspecial%2FREADME;fp=r%2Fspecial%2FREADME;h=4ef9efa7515bd7710931b7e59e318321672eaf0b;hb=b13ea8a082364672c6de2b010e558211ff52ec9a;hp=0000000000000000000000000000000000000000;hpb=01534a94130c1f5a3a230cf4fe18365a235ba271;p=deb_pkgs%2Fscowl.git
diff --git a/r/special/README b/r/special/README
new file mode 100644
index 0000000..4ef9efa
--- /dev/null
+++ b/r/special/README
@@ -0,0 +1,136 @@
+This directory contains numerous special word lists which I have
+created myself.
+
+abbreviations:
+
+A list of commonly used abbreviations and acronyms, especially in
+conversational text such as email.
+
+abbreviations-also:
+
+Some additional abbreviations which I did not want to include at the
+lower word list sizes.
+
+frequent:
+
+A combination of the two top 1000 lists found in the mwords package.
+I carefully went through and weeded out words which were artifacts
+of the corpus used.
+
+letters:
+
+A list of single letters and their inflected forms.
+
+names.from_alan_beale:
+
+A list of names (version 5.2) sent to me by Alan Beale:
+
+  I have a large list of proper names, whose origins are in the
+  linux-words proper names, but which both removes a lot of (what I
+  considered to be) junk entries, and adds a lot of names of various
+  sorts, notably names of commercial products and noteworthy
+  historical personages.
+
+never-abbreviations:
+
+A list of words that I do not consider abbreviations.
+
+never-variant:
+
+A list of words I do not consider variants.
+
+not-possessive:
+
+A list of nouns which should not take a possessive form with "'s".
+
+proper-names:
+
+A list of additional proper names.
+
+roman-numerals:
+
+A list of Roman numerals originally extracted from the ispell word
+lists.
+
+signature.35:
+
+A small list of words that I thought really ought to be at the 35 level.
+
+signature.??:
+
+Additional words to add at the level given by the file suffix.
+
+marco-alan.??:
+
+Words Marco A.G.Pinto proposed to add that Alan Beale also thought
+should be added. Words with 3 stars or more (see
+app.aspell.net:/lookup-freq) are added at the 60 level and all others
+at the 70 level.
+
+extra.60:
+
+Non-signature words suggested for inclusion by others that are
+recognized by most dictionaries but are not all that common.
+
+macro-alan-manual.70:
+
+Words Marco A.G.Pinto and Alan Beale thought should be added that I
+(Kevin Atkinson) for one reason or another didn't want to add at the
+60 level. The most likely reasons are that the word is too similar to
+a more common word, or is a compound word that is normally spelled as
+two words or with a hyphen.
+
+macro.80:
+
+Words that Marco added to en_GB that are not in one of the above lists.
+
+unix-terms:
+
+A list of commonly used Unix terms often used as regular English words
+by geeks.
+
+variant:
+
+A list of words which the 12dicts package does not consider variants
+but I do.
+
+not-upper:
+
+Normal words that just happen to start with an uppercase letter and
+have no relation to a proper name, for example "OK" and "AWOL".
+
+2800-ptr:
+
+Words from "2800 Personality Trait Descriptors" (1967), see
+https://sourceforge.net/p/wordlist/issues/60/
+
+neol2015.txt:
+
+Draft version of Alan Beale's latest neologism list. See
+http://wordlist.aspell.net/12dicts-readme-r5/ for more details on the
+format of the list.
+
+neol2015.poss:
+
+Possessive forms for words in neol2015.txt.
+
+exclude.??:
+
+Words to exclude up to the specified level (and hence bump them to the
+next level). Used mostly for obscure words that are very similar to a
+far more common word and hence could mask a misspelling of the more
+common word.
+
+hacker-exclude:
+
+List of words found in the hacker category that are not found anywhere
+in the Google Books corpus (1980-2008) and thus should in all likelihood
+not be included, given that "words" such as FTPing and grepped are
+in the corpus.
+
+prefixes:
+
+Common prefixes that are often used with a hyphen, for example
+"multi-". Note that "pre" is left out as it is too close to "per", which
+is more common.