X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;ds=sidebyside;f=6%2Fr%2Fuk-freq-class%2Fnotes.txt;fp=6%2Fr%2Fuk-freq-class%2Fnotes.txt;h=0000000000000000000000000000000000000000;hb=b13ea8a082364672c6de2b010e558211ff52ec9a;hp=bc7af0a25d51acebe0e252c587f34fab6388040b;hpb=01534a94130c1f5a3a230cf4fe18365a235ba271;p=deb_pkgs%2Fscowl.git

diff --git a/6/r/uk-freq-class/notes.txt b/6/r/uk-freq-class/notes.txt
deleted file mode 100644
index bc7af0a..0000000
--- a/6/r/uk-freq-class/notes.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-       UK English Wordlist With Frequency Classification
-
-This wordlist is primarily intended to be useful for
-checking spelling. Editorial policy is conservative.
-
-Principal omissions:
-
- - words requiring a capital letter
- - abbreviations
- - slang
-
-Colloquialisms and archaisms are generally excluded. A rare
-word similar to a common word may be excluded. Both -ise and
--ize spellings are included.
-
-The character set is: lowercase letters, hyphen, apostrophe.
-Words which can be spelt with accents occur here in their
-plain form.
-
-If this wordlist is to be used with ispell the following
-lines may be appropriate for the affix file:
-
-    boundarychars [---]
-    boundarychars '
-    wordchars [a-z] [A-Z]
-
-The commonest words are labelled 16 and the least common 0.
-
-Coverage of common words should be good, but note the
-categories excluded.
-
-    Brian Kelk   bck22@bckelk.uklinux.net
-    April 2002
-
-
-Here are bits of a brief conversation I had with the author:
-
-From: Brian Kelk
-Date: Sat, 08 Jul 2000 20:27:21 +0100
-
-> I was wondering what the copyright status of your "UK English Wordlist
-> With Frequency Classification" word list as it seems to be lacking any
-> copyright notice. Also, how did you arrive at the "Frequency
-> Classification".
-
-There were many many sources in total, but any text marked
-"copyright" was avoided. Locally-written documentation was one
-source. An earlier version of the list resided in a filespace
-called PUBLIC on the University mainframe, because it was
-considered public domain.
-
-Briefly about frequency: rather than counting occurrences of
-a word this classification is more along the lines of counting
-the number of texts in which the word occurs. That way you
-get some noise immunity, which you very much need. It's based
-on maybe 5-10 million words of text on the Cambridge mainframe
-in the 1980s. I had in mind that it might be useful for ranking
-possible corrections ...
-
-Date: Tue, 11 Jul 2000 19:31:34 +0100
-
-> So are you saying your word list is also in the public domain?
-
-That is the intention.
-
-
-
-
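
The notes above describe the frequency classification as counting the number of texts in which a word occurs rather than counting its occurrences, and mention ranking possible corrections as an intended use. The sketch below only illustrates that distinction; it is not the author's procedure. The tokenizer (built from the stated character set), the function names, and the sample texts are all assumptions, and the mapping from counts to the 0-16 labels is not described in the notes, so it is not attempted here.

```python
from collections import Counter
import re

# Assumed tokenizer: lowercase letters with internal hyphens or apostrophes,
# following the character set stated in the notes. The tokenization actually
# used for the list is not documented.
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def occurrence_counts(texts):
    """Count every occurrence of every word across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def text_counts(texts):
    """Count the number of texts in which each word occurs at least once."""
    counts = Counter()
    for text in texts:
        # A set removes repeats, so each text contributes at most 1 per word.
        counts.update(set(WORD_RE.findall(text.lower())))
    return counts


def rank_corrections(candidates, counts):
    """Order candidates so words with higher counts come first (ties A-Z)."""
    return sorted(candidates, key=lambda word: (-counts[word], word))


# Hypothetical sample: one text repeats an otherwise rare word many times.
texts = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "quark quark quark quark quark",
]
occ = occurrence_counts(texts)
per_text = text_counts(texts)

by_occurrence = rank_corrections(["quark", "sat", "cat"], occ)
by_text = rank_corrections(["quark", "sat", "cat"], per_text)
```

In this sample a single text that repeats "quark" pushes it to the top of the occurrence-based ranking, while the per-text count places it below "sat", which appears in two texts. That is one reading of the "noise immunity" the author mentions.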