X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=current%2Fr%2Fvarcon%2FREADME;fp=current%2Fr%2Fvarcon%2FREADME;h=0000000000000000000000000000000000000000;hb=b13ea8a082364672c6de2b010e558211ff52ec9a;hp=c69c206c5819fd4a4ec5af9e123efcaf6855cd2e;hpb=01534a94130c1f5a3a230cf4fe18365a235ba271;p=deb_pkgs%2Fscowl.git

diff --git a/current/r/varcon/README b/current/r/varcon/README
deleted file mode 100644
index c69c206..0000000
--- a/current/r/varcon/README
+++ /dev/null
@@ -1,385 +0,0 @@
-Variant Conversion Info (VarCon)
-
-Revision 5.1 (SVN Revision 161)
-
-January 6, 2011
-
-Copyright 2000-2011 by Kevin Atkinson (kevina@gnu.org)
-
-This package contains information to convert between American,
-British, and Canadian spellings and vocabulary as well as other
-variant information.
-
-The latest version can be found at http://wordlist.sourceforge.net/.
-
-The main data file is varcon.txt. It contains information on the
-preferred American, British, and Canadian spelling of a word as well
-as other variant information.
-
-Each line contains a mapping between the various spellings of a word.
-Words are tagged to indicate where the spelling is used, and each
-word/tag pair is separated with a " / ". For example in the line:
-  A Cv: acknowledgment / Av B C: acknowledgement
-"acknowledgment" and "acknowledgement" are two spellings of the same
-word and "A", "Cv", "B", etc. are the tags. Tags are separated by
-spaces and the group of tags is separated from the word with a ": ".
-Here, "acknowledgment" is the preferred American spelling (as
-indicated by the "A") of the word, and "acknowledgement" is the
-preferred Canadian and British spelling ("B" and "C"). However the
-American spelling is sometimes used in Canada (as indicated by "Cv",
-where the lowercase "v" indicates a variant form) and the British
-spelling is sometimes used in America (as indicated by the "Av").
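The line format described above can be sketched in code. The following is a minimal Python illustration, not part of the VarCon distribution: the function names parse_entry and preferred are hypothetical, and the sketch assumes a plain entry line with no usage information or notes attached.

```python
def parse_entry(line):
    """Parse one varcon.txt mapping line into (tags, word) pairs.

    Assumes the plain form described above: pairs separated by " / ",
    tags separated by spaces, and ": " between the tag group and the word.
    """
    pairs = []
    for chunk in line.strip().split(" / "):
        tag_group, word = chunk.split(": ", 1)
        pairs.append((tag_group.split(), word.strip()))
    return pairs


def preferred(pairs, category):
    """Return the word carrying the bare category tag (no variant indicator)."""
    for tags, word in pairs:
        if category in tags:
            return word
    return None


entry = parse_entry("A Cv: acknowledgment / Av B C: acknowledgement")
print(preferred(entry, "A"))  # acknowledgment
print(preferred(entry, "C"))  # acknowledgement
```

Because the tags are kept as a list, a lookup for "C" matches only the bare "C" tag and not the variant tag "Cv".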
-
-More generally each tag consists of a spelling category (for example
-"A"), possibly followed by a variant indicator. The spelling
-categories are as follows:
-  A: American
-  B: British "ise" spelling
-  Z: British "ize" spelling or OED preferred spelling
-  C: Canadian
-  _: Other (Variant info based on American dictionaries, never used
-     with any of the above).
-and the variant tags are as follows:
-  .: equal
-  v: variant
-  V: seldom used variant
-  -: possible variant, should generally not be used
-  x: improper variant (should not be used)
-
-The "." or equal variant tags are reserved for special cases when
-there is little agreement between dictionaries or when I think the
-dictionary is wrong. The "v" indicator is used for most words marked
-as variants in the dictionary. However, some variants will be demoted
-to a "V", for example if the variant is marked as "also" by
-Merriam-Webster, or if only some dictionaries acknowledge the
-existence of the variant. "-" is used when the variant is generally not
-listed in the dictionary but I could find some evidence of its use, or
-when it is marked as an archaic spelling for the word. The "x"
-is used when the spelling is almost universally considered a
-misspelling, and is only included for completeness.
-
-If there are no tags with the 'Z' spelling category on the line then
-'B' implies 'Z'. Similarly if there are no 'C' tags then 'Z' implies
-'C'.
-
-For ease of reading and maintaining the data file, each line is
-grouped in a cluster of closely related words. Each cluster is
-uniquely identified by a headword, which is generally the American
-spelling of the word on the first line of the cluster. Each cluster is
-started with a '#' and is followed by the headword with some
-additional information after it.
-For example the cluster for
-acknowledgment is:
-  # acknowledgment <verified> (level 35)
-  A Cv: acknowledgment / Av B C: acknowledgement
-  A Cv: acknowledgments / Av B C: acknowledgements
-  A Cv: acknowledgment's / Av B C: acknowledgement's
-The "<verified>" tag will be explained later, and "(level 35)"
-indicates what level in SCOWL (see http://wordlist.sourceforge.net)
-the headword is found in. The levels generally mean the following:
-  <= 35: Very common word
-  <= 70: Can be found in the dictionary
-     80: Likely a valid word, can likely be found in an
-         unabridged dictionary
-   > 80: May not even be a legal word
-
-Sometimes the spelling of a word depends on the usage. If so the word
-is listed more than once within a cluster, with any usage information
-being indicated after a " | ". For example here is part of the cluster
-for prize:
-  A B: prize | reward
-  A B: prizes | reward
-  A C: prize / B: prise | otherwise
-  A C: prizes / B: prises | otherwise
-which indicates that the preferred spelling of prize is always with a
-"z" when meaning a reward, but otherwise is spelled with an "s" in
-British English. In the example above a brief definition of the word
-is given, but often no such attempt is made, and the definition simply
-consists of a number, for example:
-  A B: sake | :1
-  A C: sake / Av B Cv: saki | :2
-
-Sometimes part-of-speech (POS) info is given to help distinguish which
-form is used. For example:
-  A B C: practice / AV Cv: practise | <N>
-  A Cv: practice / AV B C: practise | <V>
-POS info is always given in the form "<POS>" and if a definition
-is also given the POS info is always first. The POS tags used are as
-follows:
-  <N>: Noun
-  <V>: Verb
-  <Adj>: Adjective
-  <Adv>: Adverb
-
-A "(-)" before the definition indicates a rarely used or archaic form
-of a word, for example:
-  A B: bark | :1
-  A: bark / Av B: barque | (-) ship
-
-A "--" indicates a note rather than a definition.
-This is generally used to indicate that the spelling of the plural
-form does not depend on the spelling of the root word, for example:
-  _: cabby / _.: cabbie
-  _: cabbies | -- plural
-
-Misc. notes on a particular form of a word are given after a "#" on
-the same line. Misc. notes for the cluster are given at the end of
-the cluster and are prefixed with "##", for example:
-  # coloration (level 50)
-  A B C: coloration / B. Cv: colouration
-  A B C: colorations / B. Cv: colourations
-  A B C: coloration's / B. Cv: colouration's
-  ## OED has coloration as the preferred spelling and discolouration as a
-  ## variant for British English for some reason
-In the notes ODE (not to be confused with OED) stands for Oxford
-Dictionary of English, "Ox" is used for any Oxford dictionary, and
-"M-W" for Merriam-Webster.
-
-Earlier versions of varcon contained numerous errors. With version
-5.0 a massive effort has been made to correct many of these errors.
-Clusters that have undergone some form of verification (and likely
-correction) are marked with "<verified>". As of version 5.0, most
-clusters with headwords in common usage (SCOWL level 35 and
-below) should now be checked, as well as many others. No effort was
-made to check clusters with headwords in SCOWL level 80 and above;
-many of those entries are unlikely to be in the dictionary anyway.
-
-The file variant-also.tab contains additional mappings between various
-spellings of a word which are not yet in varcon.txt. No attempt is
-made to distinguish the primary form of a word. The file
-variant-infl.tab is like variant-also.tab except that it is created
-automatically from the AGID inflection database. The file
-variant-wroot.tab is like variant-infl.tab except that it also
-includes the root form of the word.
-
-The file voc.tab is similar to varcon.txt but converts between
-vocabulary instead of spelling. Unlike varcon.txt it is a simple
-tab-separated file with the fields corresponding to the American,
-British, and Canadian words.
-If more than one word is often used to describe
-the same thing the words are separated with commas. The last column
-contains additional notes on when the word is used. Unlike varcon.txt
-it is generally not suitable for automatic conversion.
-
-The "make-variant" Perl script will combine varcon.txt,
-variant-also.tab, and variant-infl.tab into one huge mapping and will
-output the result to "variant.tab". If the "no-infl" option is given
-then variant-infl.tab will not be included.
-
-The "split" script will split out the information in varcon.txt into
-several word lists named as follows:
-  <spelling category>[-v<variant level>][-uncommon].lst
-where <spelling category> is one of: american, british, british_z,
-canadian, common, or other. "common" is used for words which appear in
-varcon.txt, yet are used in all versions of English, such as "prize",
-and "other" is used for the "_" spelling category. The mapping from
-the variant indicators in varcon.txt to the numeric variant level is
-as follows:
-  v => 0
-  V => 1
-  - => 2
-"-uncommon" is used for forms marked with "(-)" as already described.
-
-The "translate" Perl script will translate a text file from one
-spelling to another. Its usage is:
-
-translate [<options>] <from> <to>
-  <options> is any of
-    -?,-h,--help     this screen
-    -m,--mark        mark words where the translation is questionable
-    -i,--include     include words where the translation is questionable
-    <table>          is the file name of the translation array,
-                     defaults to "abbc.tab".
-  <from> and <to> are one of: american, british, british_z, or canadian.
-british-ise and british-ize can also be used.
-
-Text is read in from standard input and is written to standard output.
-Words are marked with a '?' before and after the questionable word
-when the "--mark" option is enabled.
-
-The file varcon.pm contains some library routines for parsing
-varcon.txt and is used by many of the scripts above.
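As a rough illustration of the kind of parsing such routines perform, the sketch below reads clusters in the format described earlier. It is written in Python rather than Perl, it is not the varcon.pm API, and every function name in it is hypothetical. It also assumes that per-form notes are introduced by " # " and that an implied tag keeps the variant indicator of the tag it is derived from.

```python
import re

# Cluster header: "# headword ... (level N)"; the headword ends at "<" or "(".
HEADER = re.compile(r"^#\s+(?P<headword>[^<(]+?)\s*(?:[<(].*)?$")
LEVEL = re.compile(r"\(level (\d+)\)")


def parse_entry_line(line):
    """Split an entry into spellings, usage info (after " | ") and note (after " # ")."""
    body, _, note = line.partition(" # ")
    body, _, usage = body.partition(" | ")
    spellings = []
    for chunk in body.split(" / "):
        tag_group, word = chunk.split(": ", 1)
        spellings.append((tag_group.split(), word.strip()))
    return {"spellings": spellings,
            "usage": usage.strip() or None,
            "note": note.strip() or None}


def parse_clusters(text):
    """Group varcon.txt-style lines into clusters, one per '#' header line."""
    clusters, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):  # cluster-level note
            if current is not None:
                current["notes"].append(line[2:].strip())
            continue
        header = HEADER.match(line)
        if header:
            level = LEVEL.search(line)
            current = {"headword": header.group("headword"),
                       "level": int(level.group(1)) if level else None,
                       "entries": [], "notes": []}
            clusters.append(current)
        elif current is not None:
            current["entries"].append(parse_entry_line(line))
    return clusters


def expand_implied(spellings):
    """Apply 'B implies Z' when no Z tag is present, then 'Z implies C' when no C tag is."""
    result = [(list(tags), word) for tags, word in spellings]
    for src, dst in (("B", "Z"), ("Z", "C")):
        present = any(t[0] == dst for tags, _ in result for t in tags)
        if not present:
            for tags, _ in result:
                # list(tags) is a snapshot, so extending tags here is safe.
                tags.extend(dst + t[1:] for t in list(tags) if t[0] == src)
    return result


sample = """# prize (level 35)
A B: prize | reward
A C: prize / B: prise | otherwise
"""
for cluster in parse_clusters(sample):
    for entry in cluster["entries"]:
        print(cluster["headword"], entry["usage"], expand_implied(entry["spellings"]))
```

Keeping the implication step separate from parsing means the raw tags from the file stay available, which is useful when checking the data itself rather than converting text.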
-
-If you discover any errors in these mappings or have suggestions for
-additions please file a bug report, which you can find instructions
-for at http://wordlist.sourceforge.net/, or alternatively email me
-directly at kevina@gnu.org, but I will likely tell you to file a bug
-report so that I don't forget about it.
-
-SOURCE:
-
-These mappings were compiled from numerous sources.
-
-The file abc.tab was originally created from the American and British
-word lists found in the Ispell distribution and the Canadian word list
-created by Garst R. Reese:
-
-  What I have discovered is that Canadian is a modification of British.
-  Canadians use ize ization, izing izable like Americans, and gram instead
-  of gramme. The one exception I found was practise. It does not go to
-  practize. Otherwise they use British spelling. So, what I am currently
-  checking books with is an edited version of British, where I have
-  changed all occurrences of ise to ize, isab to izab, isation to ization,
-  ising to izing, and gramme to gram except I allow programme, which is
-  sometimes proper unless you are talking about a computer program. I did
-  bunches of greps to be sure these substitutions would work as expected.
-
-Many other words have been added to abc.tab which were not in the
-original Ispell word lists.
-
-Many different web sources were consulted when creating the tables.
-They include:
-
-  The American-British British-American Dictionary
-    http://www.peak.org/~jeremy/dictionary/dictionary.html
-  American and British Spelling Differences
-    http://www.peak.org/~jeremy/dictionary/spellcat.html
-  Dave (VE7CNV)'s Truly Canadian Dictionary of Canadian Spelling
-    http://www.luther.bc.ca/~dave7cnv/cdnspelling/cdnspelling.html
-  Canadian Spelling Convention
-    http://imej.wfu.edu/articles/1999/1/02/demo/tutorial/canas.html
-  Cornerstone's Canadian English Page
-    http://www.web.net/cornerstone/cdneng.htm
-  Inter-Play Translation: British/Canadian/American Spelling
-    http://www.inter-play.com/translation/spel-ukus.htm
-  Inter-Play Translation: British/Canadian/American Vocabulary
-    http://www.inter-play.com/translation/voc-ukus.htm
-
-As well as several online dictionaries:
-
-  Merriam-Webster: http://www.m-w.com/
-  American Heritage: http://www.bartleby.com/61/
-  Cambridge (ESL): http://dictionary.cambridge.org/
-
-In version 5.0 a massive effort was made to correct the numerous
-errors in VarCon.
-The primary sources used for verification were:
-
-  Merriam-Webster: http://www.m-w.com/
-  Free version of Oxford Dictionaries Online:
-    http://www.oxforddictionaries.com/
-  Oxford dictionaries available via Oxford Reference Online
-  (subscription service, http://www.oxfordreference.com/):
-    The New Oxford American Dictionary (2nd edition, 2006)
-      and sometimes: The Oxford American Dictionary of Current English (2002)
-    The Concise Oxford English Dictionary (11th edition revised, 2008)
-      and sometimes: The Oxford Dictionary of English (2nd edition revised, 2005)
-    The Canadian Oxford Dictionary (2004)
-
-I also used the Tysto UK vs US spelling list available at:
-  http://www.tysto.com/articles05/q1/20050324uk-us.shtml
-to make sure I didn't leave out any information in VarCon; however, any
-additions from his lists were verified using the dictionaries
-mentioned above as his lists contained numerous errors (such as
-including archaic spellings of words).
-
-I also made indirect use of Luke's Canadian, British and American
-Spelling page available at:
-  http://www.lukemastin.com/testing/spelling/cgi-bin/database.cgi?database=spelling
-but only to perform some initial verification; in the end I rechecked
-his data using the dictionaries above. (However, his data is, by far,
-more accurate than Tysto's.)
-
-CHANGELOG:
-
-From Revision 5.0 to Revision 5.1 (January 6, 2011)
-
-  - Corrected numerous errors after running various forms
-    of verification on varcon.txt.
-
-  - Reordered the clusters in varcon.txt so that they are
-    mostly in alphabetic order based on the headword.
-
-From Revision 4.1 to Revision 5.0 (December 27, 2010)
-
-  - Completely new format for the main table which, in addition to
-    providing the preferred spelling of a word for various forms of
-    English, also records variant and other information. To reflect
-    this change, the file was renamed from abbc.tab to
-    varcon.txt.
-
-  - Massive effort to verify the variant information against
-    authoritative sources (mainly Oxford dictionaries). Most entries
-    for common words (SCOWL level 35 and below) have been checked
-    against at least a British and Canadian dictionary.
-
-  - Added variant information for numerous other words, even when
-    there is no difference between the various forms of English.
-
-  - Other changes corresponding to the new format.
-
-From Revision 4 to Revision 4.1 (August 10, 2004)
-
-  - Fixed various errors in abbc.tab
-
-  - Removed clause 4 from the Ispell copyright with permission of Geoff
-    Kuenning.
-
-From Revision 3 to Revision 4 (August 7, 2004)
-
-  - Added a column to "abc.tab" for the British "ize" spelling and
-    renamed the file to abbc.tab.
-  - Added verb forms of prize/prise to abbc.tab, removed from
-    variant-also.tab
-
-From Revision 2 to Revision 3 (January 2, 2003)
-
-  - Added an option for not including variant-infl.tab for the
-    make-variant Perl script
-  - Added the file variant-wroot.tab
-  - Added a few entries given to me by Francis Bond and Edward Betts
-
-From Revision 1 to Revision 2 (January 27, 2001)
-
-  - Removed all "B" markers because I could not find any evidence for
-    them
-  - Corrected a few Canadian entries, especially those with the "B"
-    markers
-  - Added some more entries by trying fixed changes (such as ize to
-    ise) to words in SCOWL and hand-checking over the ones with semi-common
-    words in them.
-  - Added variant-infl.tab
-
-COPYRIGHT:
-
-Copyright 2000-2010 by Kevin Atkinson
-
-Permission to use, copy, modify, distribute and sell this array, the
-associated software, and its documentation for any purpose is hereby
-granted without fee, provided that the above copyright notice appears
-in all copies and that both that copyright notice and this permission
-notice appear in supporting documentation. Kevin Atkinson makes no
-representations about the suitability of this array for any
-purpose.
-It is provided "as is" without express or implied warranty.
-
-Since the original word lists come from the Ispell distribution:
-
-Copyright 1993, Geoff Kuenning, Granada Hills, CA
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-
-1. Redistributions of source code must retain the above copyright
-   notice, this list of conditions and the following disclaimer.
-2. Redistributions in binary form must reproduce the above copyright
-   notice, this list of conditions and the following disclaimer in the
-   documentation and/or other materials provided with the distribution.
-3. All modifications to the source code must be clearly marked as
-   such. Binary redistributions based on modified source code
-   must be clearly marked as modified versions in the documentation
-   and/or other materials provided with the distribution.
-(clause 4 removed with permission from Geoff Kuenning)
-5. The name of Geoff Kuenning may not be used to endorse or promote
-   products derived from this software without specific prior
-   written permission.
-
-THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-SUCH DAMAGE.