X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;ds=sidebyside;f=7.1%2Fr%2Fuk-freq-class%2Fnotes.txt;fp=7.1%2Fr%2Fuk-freq-class%2Fnotes.txt;h=bc7af0a25d51acebe0e252c587f34fab6388040b;hb=01534a94130c1f5a3a230cf4fe18365a235ba271;hp=0000000000000000000000000000000000000000;hpb=7b14ba883fb1046508c44be37b4c6ba5da5feacf;p=deb_pkgs%2Fscowl.git

diff --git a/7.1/r/uk-freq-class/notes.txt b/7.1/r/uk-freq-class/notes.txt
new file mode 100644
index 0000000..bc7af0a
--- /dev/null
+++ b/7.1/r/uk-freq-class/notes.txt
@@ -0,0 +1,68 @@
+      UK English Wordlist With Frequency Classification
+
+This wordlist is primarily intended to be useful for
+checking spelling. Editorial policy is conservative.
+
+Principal omissions:
+
+  - words requiring a capital letter
+  - abbreviations
+  - slang
+
+Colloquialisms and archaisms are generally excluded. A rare
+word similar to a common word may be excluded. Both -ise and
+-ize spellings are included.
+
+The character set is: lowercase letters, hyphen, apostrophe.
+Words which can be spelt with accents occur here in their
+plain form.
+
+If this wordlist is to be used with ispell the following
+lines may be appropriate for the affix file:
+
+  boundarychars [---]
+  boundarychars '
+  wordchars [a-z] [A-Z]
+
+The commonest words are labelled 16 and the least common 0.
+
+Coverage of common words should be good, but note the
+categories excluded.
+
+      Brian Kelk bck22@bckelk.uklinux.net
+      April 2002
+
+
+Here are bits of a brief conversation I had with the author:
+
+From: Brian Kelk
+Date: Sat, 08 Jul 2000 20:27:21 +0100
+
+> I was wondering what the copyright status of your "UK English Wordlist
+> With Frequency Classification" word list as it seems to be lacking any
+> copyright notice. Also, how did you arrive at the "Frequency
+> Classification".
+
+There were many many sources in total, but any text marked
+"copyright" was avoided. Locally-written documentation was one
+source. An earlier version of the list resided in a filespace
+called PUBLIC on the University mainframe, because it was
+considered public domain.
+
+Briefly about frequency: rather than counting occurrences of
+a word this classification is more along the lines of counting
+the number of texts in which the word occurs. That way you
+get some noise immunity, which you very much need. It's based
+on maybe 5-10 million words of text on the Cambridge mainframe
+in the 1980s. I had in mind that it might be useful for ranking
+possible corrections ...
+
+Date: Tue, 11 Jul 2000 19:31:34 +0100
+
+> So are you saying your word list is also in the public domain?
+
+That is the intention.
+
+
+
+
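The frequency note in the conversation above describes counting the number of texts in which a word occurs, rather than its total occurrences, and suggests using the result to rank possible corrections. The following is a minimal Python sketch of that idea only. It is not the author's procedure: the tokenizer, the function names `document_frequency` and `rank_corrections`, and the sample texts are all hypothetical, and the mapping from counts to the 0-16 labels is not described in the notes, so it is not attempted here.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase letters, hyphen, apostrophe,
# mirroring the character set described in the notes.
WORD_RE = re.compile(r"[a-z][a-z'-]*")


def document_frequency(texts):
    """Count, for each word, the number of texts that contain it."""
    df = Counter()
    for text in texts:
        # set() makes each text contribute at most 1 per word, so a word
        # repeated many times in one odd text does not inflate its count.
        df.update(set(WORD_RE.findall(text.lower())))
    return df


def rank_corrections(candidates, df):
    """Order candidates so words found in more texts come first
    (ties broken alphabetically)."""
    return sorted(candidates, key=lambda w: (-df[w], w))


# Illustrative sample texts (made up for this sketch).
texts = [
    "the cat sat on the mat",
    "the dog sat",
    "zyzzyva zyzzyva zyzzyva zyzzyva",
]
df = document_frequency(texts)
ranked = rank_corrections(["mat", "sat", "cat"], df)
```

In this sample, "zyzzyva" occurs four times but in only one text, so it scores the same as "cat". That is the kind of noise immunity the note refers to.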