--- /dev/null
+Variant Conversion Info (VarCon)
+
+Revision 5.1 (SVN Revision 161)
+
+January 6, 2011
+
+Copyright 2000-2011 by Kevin Atkinson (kevina@gnu.org)
+
+This package contains information to convert between American,
+British, and Canadian spellings and vocabulary, as well as other
+variant information.
+
+The latest version can be found at http://wordlist.sourceforge.net/.
+
+The main data file is varcon.txt. It contains information on the
+preferred American, British, and Canadian spelling of a word as well
+as other variant information.
+
+Each line contains a mapping between the various spellings of a word.
+Words are tagged to indicate where the spelling is used, and
+word/tag pairs are separated by a " / ". For example, in the line:
+ A Cv: acknowledgment / Av B C: acknowledgement
+"acknowledgment" and "acknowledgement" are two spellings of the same
+word and "A", "Cv", "B", etc. are the tags. Tags are separated by
+spaces and the group of tags is separated from the word with a ": ".
+Here, "acknowledgment" is the preferred American spelling (as
+indicated by the "A") of the word, and "acknowledgement" is the
+preferred Canadian and British spelling ("B" and "C"). However, the
+American spelling is sometimes used in Canada (as indicated by "Cv",
+where the lowercase "v" indicates a variant form) and the British
+spelling is sometimes used in America (as indicated by the "Av").
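+
+As a rough illustration of the line format described above, the
+following Python sketch splits a simple data line into its word/tag
+pairs. The function name parse_entries is hypothetical and not part of
+this package, and the sketch ignores any usage information or notes
+that may follow the spellings on a line.
+

```python
def parse_entries(line):
    """Split one simple varcon.txt data line into (tags, word) pairs.

    Hypothetical helper for illustration only; it does not handle
    usage information after " | " or notes after "#".
    """
    entries = []
    # Word/tag pairs are separated by " / ".
    for chunk in line.strip().split(" / "):
        # The group of tags is separated from the word by ": ".
        tags, word = chunk.split(": ", 1)
        # Individual tags are separated by spaces.
        entries.append((tags.split(), word))
    return entries


if __name__ == "__main__":
    print(parse_entries("A Cv: acknowledgment / Av B C: acknowledgement"))
```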
+
+More generally, each tag consists of a spelling category (for example
+"A"), possibly followed by a variant indicator. The spelling
+categories are as follows:
+ A: American
+ B: British "ise" spelling
+ Z: British "ize" spelling or OED preferred spelling
+ C: Canadian
+ _: Other (Variant info based on American dictionaries, never used
+ with any of the above).
+and the variant tags are as follows:
+ .: equal
+ v: variant
+ V: seldom used variant
+ -: possible variant, should generally not be used
+ x: improper variant (should not use)
+
+The "." or equal variant tags are reserved for special cases when
+there is little agreement between dictionaries or when I think the
+dictionary is wrong. The "v" indicator is used for most words marked
+as variants in the dictionary. However, some variants are demoted
+to a "V", for example if the variant is marked as "also" by
+Merriam-Webster, or if only some dictionaries acknowledge the
+existence of the variant. "-" is used when the variant is generally
+not listed in the dictionary but I could find some evidence of its
+use, or when it is marked as an archaic spelling of the word. The "x"
+is used when the spelling is almost always considered a
+misspelling, and is only included for completeness.
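+
+The tag structure described above (one spelling category character,
+optionally followed by one variant indicator) can be sketched in
+Python as follows. The names CATEGORIES, VARIANTS, and split_tag are
+hypothetical and are not taken from varcon.pm.
+

```python
# Spelling categories and variant indicators as listed in this README.
CATEGORIES = {
    "A": "American",
    "B": 'British "ise" spelling',
    "Z": 'British "ize" spelling or OED preferred spelling',
    "C": "Canadian",
    "_": "Other",
}
VARIANTS = {
    ".": "equal",
    "v": "variant",
    "V": "seldom used variant",
    "-": "possible variant",
    "x": "improper variant",
}


def split_tag(tag):
    """Split a tag such as "Cv" into (category, indicator).

    The indicator is None for a preferred spelling (a bare category).
    Hypothetical helper for illustration only.
    """
    category, indicator = tag[:1], tag[1:]
    if category not in CATEGORIES:
        raise ValueError("unknown spelling category in tag: %r" % tag)
    if indicator and indicator not in VARIANTS:
        raise ValueError("unknown variant indicator in tag: %r" % tag)
    return category, indicator or None
```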
+
+If there are no tags with the 'Z' spelling category on the line then
+'B' implies 'Z'. Similarly, if there are no 'C' tags then 'Z' implies
+'C'.
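+
+One possible reading of these implication rules, sketched in Python.
+The function preferred_by_category is hypothetical. It only looks at
+bare tags (preferred spellings), and it assumes that a 'Z' implied by
+'B' can in turn imply 'C'. How variant indicators carry over is not
+covered here.
+

```python
def preferred_by_category(entries):
    """Map each spelling category to its preferred word on one line.

    `entries` is a list of (tags, word) pairs, for example
    [(["A", "C"], "prize"), (["B"], "prise")].
    Hypothetical helper; an interpretation, not the package's own code.
    """
    # Categories that appear on the line at all, with or without a
    # variant indicator.
    present = {tag[0] for tags, _ in entries for tag in tags}

    preferred = {}
    for tags, word in entries:
        for tag in tags:
            if len(tag) == 1:  # bare category means preferred spelling
                preferred.setdefault(tag, word)

    # No 'Z' tags on the line: 'B' implies 'Z'.
    if "Z" not in present and "B" in preferred:
        preferred["Z"] = preferred["B"]
    # No 'C' tags on the line: 'Z' implies 'C'.
    if "C" not in present and "Z" in preferred:
        preferred["C"] = preferred["Z"]
    return preferred
```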
+
+For ease of reading and maintaining the data file, lines are
+grouped into clusters of closely related words. Each cluster is
+uniquely identified by a headword, which is generally the American
+spelling of the word on the first line of the cluster. Each cluster
+starts with a '#' followed by the headword and some
+additional information. For example, the cluster for
+acknowledgment is:
+ # acknowledgment <verified> (level 35)
+ A Cv: acknowledgment / Av B C: acknowledgement
+ A Cv: acknowledgments / Av B C: acknowledgements
+ A Cv: acknowledgment's / Av B C: acknowledgement's
+The "<verified>" tag will be explained later, and "(level 35)"
+indicates what level in SCOWL (see http://wordlist.sourceforge.net)
+the headword is found in. The levels generally mean the following:
+ <= 35: Very common word
+ <= 70: Can be found in the dictionary
+ 80: Likely a valid word, can likely be found in an
+ unabridged dictionary
+ > 80: May not even be a legal word
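+
+A cluster header of the form shown above could be recognized with a
+regular expression along these lines. This is only a sketch: the
+pattern HEADER_RE and the function parse_header are hypothetical, and
+the sketch assumes every header carries a "(level N)" part as in the
+example.
+

```python
import re

# "# headword [<verified>] (level N)", modeled on the example header.
HEADER_RE = re.compile(
    r"^#\s+(?P<headword>.+?)"
    r"(?P<verified>\s+<verified>)?"
    r"\s+\(level\s+(?P<level>\d+)\)\s*$"
)


def parse_header(line):
    """Return (headword, verified, level), or None if not a header.

    Hypothetical helper for illustration only.
    """
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return (
        m.group("headword"),
        m.group("verified") is not None,
        int(m.group("level")),
    )


if __name__ == "__main__":
    print(parse_header("# acknowledgment <verified> (level 35)"))
```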
+
+Sometimes the spelling of a word depends on the usage. If so, the word
+is listed more than once within a cluster, with any usage information
+being indicated after a " | ". For example, here is part of the cluster
+for prize:
+ A B: prize | reward
+ A B: prizes | reward
+ A C: prize / B: prise | otherwise
+ A C: prizes / B: prises | otherwise
+which indicates that the preferred spelling of prize is always with a
+"z" when meaning a reward, but otherwise it is spelled with an "s" in
+British English. In the example above a brief definition of the word
+is given, but often no such attempt is made, and the definition simply
+consists of a number, for example:
+ A B: sake | :1
+ A C: sake / Av B Cv: saki | :2
+
+Sometimes part-of-speech (POS) info is given to help distinguish which
+form is used. For example:
+ A B C: practice / AV Cv: practise | <N>
+ A Cv: practice / AV B C: practise | <V>
+POS info is always given in the form "<POS>" and if a definition
+is also given then the POS info is always first. The POS tags used are as
+follows:
+ <N>: Noun
+ <V>: Verb
+ <Adj>: Adjective
+ <Adv>: Adverb
+
+A "(-)" before the definition indicates a rarely used or archaic form
+of a word, for example:
+ A B: bark | :1
+ A: bark / Av B: barque | (-) ship
+
+A "--" indicates a note rather than a definition. This is generally
+used to indicate that the spelling of the plural form does not depend on
+the spelling of the root word, for example:
+ _: cabby / _.: cabbie
+ _: cabbies | -- plural
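+
+The usage information after " | " could be picked apart roughly as
+follows. This Python sketch assumes the order suggested by the
+descriptions above (an optional "<POS>", then an optional "(-)", then
+either a "--" note or a definition). The names USAGE_RE and
+parse_usage are hypothetical, and notes after "#" on the same line
+are not handled.
+

```python
import re

# Assumed order: optional "<POS>", optional "(-)", then either a
# "-- note" or a definition (a short phrase or a number such as ":1").
USAGE_RE = re.compile(
    r"^(?:<(?P<pos>[A-Za-z]+)>\s*)?"
    r"(?P<uncommon>\(-\)\s*)?"
    r"(?:--\s*(?P<note>.*)|(?P<definition>.*))$"
)


def parse_usage(line):
    """Return (spellings, info) for one data line.

    `spellings` is the part before " | "; `info` describes the part
    after it. Hypothetical helper for illustration only.
    """
    spellings, _, usage = line.strip().partition(" | ")
    m = USAGE_RE.match(usage.strip())
    info = {
        "pos": m.group("pos"),
        "uncommon": m.group("uncommon") is not None,
        "note": m.group("note") or None,
        "definition": m.group("definition") or None,
    }
    return spellings, info


if __name__ == "__main__":
    print(parse_usage("A: bark / Av B: barque | (-) ship"))
```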
+
+Misc. notes on a particular form of a word are given after a "#" on
+the same line. Misc. notes for the cluster are given at the end of
+the cluster and are prefixed with "##", for example:
+ # coloration <verified> (level 50)
+ A B C: coloration / B. Cv: colouration
+ A B C: colorations / B. Cv: colourations
+ A B C: coloration's / B. Cv: colouration's
+ ## OED has coloration as the prefered spelling and discolouration as a
+ ## variant for British Engl or some reason
+In the notes ODE (not to be confused with OED) stands for Oxford
+Dictionary of English, "Ox" is used for any Oxford dictionary, and
+"M-W" for Merriam-Webster.
+
+Earlier versions of varcon contained numerous errors. With version
+5.0 a massive effort has been made to correct many of these errors.
+Clusters that have undergone some form of verification (and likely
+correction) are marked with "<verified>". As of version 5.0, most
+clusters with headwords in common usage (SCOWL level 35 and
+below) should now be checked, as well as many others. No effort was
+made to check clusters with headwords in SCOWL level 80 and above;
+many of those entries are unlikely to be in the dictionary anyway.
+
+The file variant-also.tab contains additional mappings between various
+spellings of a word which are not yet in varcon.txt. No attempt is
+made to distinguish the primary form of a word. The file
+variant-infl.tab is like variant-also.tab except that it is created
+automatically from the AGID inflection database. The file
+variant-wroot.tab is like variant-infl.tab except that it also
+includes the root form of the word.
+
+The file voc.tab is similar to varcon.txt but converts between
+vocabulary instead of spelling. Unlike varcon.txt it is a simple
+tab-separated file with the fields corresponding to the American, British,
+and Canadian words. If more than one word is often used to describe
+the same thing the words are separated with commas. The last column
+contains additional notes on when the word is used. Unlike varcon.txt
+it is generally not suitable for automatic conversion.
+
+The "make-variant" Perl script will combine varcon.txt,
+variant-also.tab, and variant-infl.tab into one huge mapping and will
+output the result to "variant.tab". If the "no-infl" option is given
+then variant-infl.tab will not be included.
+
+The "split" script will split out the information in varcon.txt into
+several word lists named as follows:
+ <spelling>[-v<variant level>][-uncommon].lst
+where <spelling> is one of: american, british, british_z, canadian,
+common, or other. "common" is used for words which appear in
+varcon.txt, yet are used in all versions of English, such as "prize",
+and "other" is used for the "_" spelling category. The mapping from
+the variant indicators in varcon.txt to the numeric variant level is
+as follows:
+ v => 0
+ V => 1
+ - => 2
+"-uncommon" is used for forms marked with "(-)" as already described.
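+
+As an illustration of the naming scheme above, this Python sketch
+builds a list file name from a tag. It is not the "split" script
+itself. The function list_name is hypothetical, and the sketch assumes
+that a bare category (preferred spelling) gets no "-v" part. The "."
+and "x" indicators are rejected because no level is documented for
+them here, and the "common" list is not covered.
+

```python
# Category-to-name mapping follows the <spelling> names listed above.
SPELLING_NAMES = {
    "A": "american",
    "B": "british",
    "Z": "british_z",
    "C": "canadian",
    "_": "other",
}
# Variant indicator to numeric variant level, as documented above.
VARIANT_LEVELS = {"v": 0, "V": 1, "-": 2}


def list_name(tag, uncommon=False):
    """Build "<spelling>[-v<variant level>][-uncommon].lst" for a tag.

    Hypothetical helper; assumes preferred spellings get no "-v" part.
    """
    category, indicator = tag[:1], tag[1:]
    name = SPELLING_NAMES[category]
    if indicator:
        if indicator not in VARIANT_LEVELS:
            raise ValueError("no documented level for %r" % indicator)
        name += "-v%d" % VARIANT_LEVELS[indicator]
    if uncommon:  # form marked with "(-)"
        name += "-uncommon"
    return name + ".lst"
```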
+
+The "translate" Perl script will translate a text file from one
+spelling to another. Its usage is:
+
+translate <options> [<translation array>] <from> <to>
+<options> is any of
+ -?,-h,--help this screen
+ -m,--mark mark words where the translation is questionable
+ -i,--include include words where the translation is questionable
+<translation array> is the file name of the translation array,
+ defaults to "abbc.tab".
+<from> and <to> are one of: american, british, british_z, or canadian.
+british-ise and british-ize can also be used.
+
+Text is read from standard input and written to standard output.
+When the --mark option is enabled, each questionable word is marked
+with a '?' before and after it.
+
+The file varcon.pm contains some library routines for parsing
+varcon.txt and is used by many of the scripts above.
+
+If you discover any errors in these mappings or have suggestions for
+additions, please file a bug report, which you can find instructions
+for at http://wordlist.sourceforge.net/, or alternatively email me
+directly at kevina@gnu.org, but I will likely tell you to file a bug
+report so that I don't forget about it.
+
+SOURCE:
+
+These mappings were compiled from numerous sources.
+
+The abc.tab was originally created from the American and British word
+lists found in the Ispell distribution and the Canadian word list
+created by Garst R. Reese <reese@isn.net>:
+
+ What I have discovered is that Canadian is a modification of British.
+ Canadians use ize ization, izing izable like Americans, and gram instead
+ of gramme. The one exception I found was practise. It does not go to
+ practize. Otherwise they use British spelling. So, what I am currently
+ checking books with is a an edited version of British, where I have
+ changed all occurrences of ise to ize, isab to izab, isation to ization,
+ ising to izing, and gramme to gram except I allow programme, which is
+ sometimes proper unless you are talking about a computer program. I did
+ bunches of greps to be sure these substitutions would work as expected.
+
+Many other words have been added to abc.tab which were not in the
+original Ispell word lists.
+
+Many different web sources were consulted when creating the tables. They
+include:
+
+ The American-British British-American Dictionary
+ http://www.peak.org/~jeremy/dictionary/dictionary.html
+ American and British Spelling Differences
+ http://www.peak.org/~jeremy/dictionary/spellcat.html
+ Dave (VE7CNV)'s Truly Canadian Dictionary of Canadian Spelling
+ http://www.luther.bc.ca/~dave7cnv/cdnspelling/cdnspelling.html
+ Canadian Spelling Convention
+ http://imej.wfu.edu/articles/1999/1/02/demo/tutorial/canas.html
+ Cornerstone's Canadian English Page
+ http://www.web.net/cornerstone/cdneng.htm
+ Inter-Play Translation: British/Canadian/American Spelling
+ http://www.inter-play.com/translation/spel-ukus.htm
+ Inter-Play Translation: British/Canadian/American Vocabulary
+ http://www.inter-play.com/translation/voc-ukus.htm
+
+As well as several online dictionaries:
+
+ Merriam-Webster: http://www.m-w.com/
+ American Heritage: http://www.bartleby.com/61/
+ Cambridge (ESL): http://dictionary.cambridge.org/
+
+In version 5.0 a massive effort was made to correct the numerous
+errors in VarCon. The primary sources used for verification were:
+
+ Merriam-Webster: http://www.m-w.com/
+ Free version of Oxford Dictionaries Online:
+ http://www.oxforddictionaries.com/
+ Oxford dictionaries available via Oxford Reference Online
+ (subscription service, http://www.oxfordreference.com/):
+ The New Oxford American Dictionary (2nd edition, 2006)
+ and sometimes: The Oxford American Dictionary of Current English (2002)
+ The Concise Oxford English Dictionary (11th edition revised, 2008)
+ and sometimes: The Oxford Dictionary of English (2nd edition revised, 2005)
+ The Canadian Oxford Dictionary (2004)
+
+I also used Tysto's UK vs US spelling list available at:
+ http://www.tysto.com/articles05/q1/20050324uk-us.shtml
+to make sure I didn't leave out any information in VarCon. However, any
+additions from his lists were verified using the dictionaries
+mentioned above, as his lists contained numerous errors (such as
+including archaic spellings of words).
+
+I also made indirect use of Luke's Canadian, British and American
+Spelling page available at:
+ http://www.lukemastin.com/testing/spelling/cgi-bin/database.cgi?database=spelling
+but only to perform some initial verification; in the end I rechecked
+his data using the dictionaries above. (However, his data is, by far,
+more accurate than Tysto's.)
+
+CHANGELOG:
+
+From Revision 5.0 to Revision 5.1 (January 6, 2011)
+
+ - Corrected numerous errors after running various forms
+ of verification on varcon.txt.
+
+ - Reordered the clusters in varcon.txt so that they are
+ mostly in alphabetic order based on the headword.
+
+From Revision 4.1 to Revision 5.0 (December 27, 2010)
+
+ - Completely new format for the main table which, in addition to
+ providing the preferred spelling of a word for various forms of
+ English, also records variant and other information. To reflect
+ this change, the file was renamed from abbc.tab to
+ varcon.txt.
+
+ - Massive effort to verify the variant information against
+ authoritative sources (mainly Oxford dictionaries). Most entries
+ for common words (SCOWL level 35 and below) have been checked
+ against at least a British and Canadian dictionary.
+
+ - Added variant information for numerous other words, even when
+ there is no difference between the various forms of English.
+
+ - Other changes corresponding to the new format.
+
+From Revision 4 to Revision 4.1 (August 10, 2004)
+
+ - Fixed various errors in abbc.tab
+
+ - Removed clause 4 from the Ispell copyright with permission of Geoff
+ Kuenning.
+
+From Revision 3 to Revision 4 (August 7, 2004)
+
+ - Added a column to "abc.tab" for the British "ize" spelling and
+ renamed the file to abbc.tab.
+ - Added verb forms of prize/prise to abbc.tab, removed from
+ variant-also.tab
+
+From Revision 2 to Revision 3 (January 2, 2003)
+
+ - Added an option for not including variant-infl.tab for the
+ make-variant perl script
+ - Added the file variant-wroot.tab
+ - Added a few entries given to me by Francis Bond and Edward Betts
+
+From Revision 1 to Revision 2 (January 27, 2001)
+
+ - Removed all "B" markers because I could not find any evidence for
+ them
+ - Corrected a few Canadian entries, especially those with the "B"
+ markers
+ - Added some more entries by trying fixed changes (such as ize to
+ ise) to words in SCOWL and hand-checking over the ones with semi-common
+ words in them.
+ - Added variant-infl.tab
+
+COPYRIGHT:
+
+Copyright 2000-2010 by Kevin Atkinson
+
+Permission to use, copy, modify, distribute and sell this array, the
+associated software, and its documentation for any purpose is hereby
+granted without fee, provided that the above copyright notice appears
+in all copies and that both that copyright notice and this permission
+notice appear in supporting documentation. Kevin Atkinson makes no
+representations about the suitability of this array for any
+purpose. It is provided "as is" without express or implied warranty.
+
+Since the original word lists come from the Ispell distribution:
+
+Copyright 1993, Geoff Kuenning, Granada Hills, CA
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+3. All modifications to the source code must be clearly marked as
+ such. Binary redistributions based on modified source code
+ must be clearly marked as modified versions in the documentation
+ and/or other materials provided with the distribution.
+(clause 4 removed with permission from Geoff Kuenning)
+5. The name of Geoff Kuenning may not be used to endorse or promote
+ products derived from this software without specific prior
+ written permission.
+
+THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGE.