-Variant Conversion Info (VARCON)
-
-Revision 4.1
-
-August 10, 2004
-
-Copyright 2000-2004 by Kevin Atkinson (kevina@gnu.org)
-
-This package contains tables to convert between American, British, and
-Canadian spellings and vocabulary as well as well as a table listing the
-equivalent forms of other variants.
-
-The latest version can be found at http://wordlist.sourceforge.net/.
-
-The abbc.tab contains mappings between American, British with "ise"
-spelling, British with "ize" spelling, and Canadian spellings. The
-fields are separated by a tab character and have the Unix EOL
-character. The first four columns are the spellings respectively.
-The last column is used to mark words where the American or British
-spelling is also used in the British or American spelling but only
-when the word has a certain meaning. American words that are also
-used in British / Canadian spellings are marked with a "A" while
-British words that are also used in American / Canadian spellings are
-marked with a "B".
-
-The file voc.tab is like abbc.tab except that it converts between
-vocabulary instead of spelling. If more than one word if often uses to
-describe the same thing the words are separated with commas. The last
-column contains additional notes on when the word is used. Unlike
-abbc.tab it is generally not suitable for automatic conversion.
+Variant Conversion Info (VarCon)
+
+Version 2017.08.24
+
+Copyright 2000-2016 by Kevin Atkinson (kevina@gnu.org) and Benjamin
+Titze (btitze@protonmail.ch).
+
+This package contains information to convert between American,
+British, Canadian, and Australian spellings and vocabulary as well as
+other variant information.
+
+The latest version can be found at http://wordlist.aspell.net/.
+
+The main data file is varcon.txt. It contains information on the
+preferred American, British, and Canadian spelling of a word as well
+as other variant information.
+
+Each line contains a mapping between the various spellings of a word.
+Words are tagged to indicate where the spelling is used, and each
+word/tag pair is separated with a " / ". For example in the line:
+ A Cv: acknowledgment / Av B C: acknowledgement
+"acknowledgment" and "acknowledgement" are two spellings of the same
+word and "A", "Cv", "B", etc. are the tags. Tags are separated by
+spaces and the group of tags is separated from the word with a ": ".
+Here, "acknowledgment" is the preferred American spelling (as
+indicated by the "A") of the word, and "acknowledgement" is the
+preferred Canadian and British spelling ("B" and "C"). However, the
+American spelling is sometimes used in Canada (as indicated by "Cv",
+where the lowercase "v" indicates a variant form) and the British
+spelling is sometimes used in America (as indicated by the "Av").
+
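The line format described above can be sketched in Python. This is only an illustration, not the code in varcon.pm; the function names parse_entries and split_tag are hypothetical, and the sketch assumes the line carries no usage information or notes.

```python
def split_tag(tag):
    """Split a tag such as "Cv" into (category, variant indicator).

    The first character is the spelling category; anything after it is
    the variant indicator, which is empty for a preferred spelling.
    """
    return tag[0], tag[1:]


def parse_entries(line):
    """Split one data line into a list of (tags, word) pairs.

    Assumes the plain form "TAGS: word / TAGS: word" with no usage
    information or notes on the line.
    """
    entries = []
    for part in line.strip().split(" / "):
        tags, word = part.split(": ", 1)
        entries.append((tags.split(), word.strip()))
    return entries


if __name__ == "__main__":
    for tags, word in parse_entries(
        "A Cv: acknowledgment / Av B C: acknowledgement"
    ):
        print(word, [split_tag(t) for t in tags])
```

Keeping the tags as a list of strings and splitting them on demand keeps the parsed form close to the text of the data file.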
+More generally each tag consists of a spelling category (for example
+"A") possibly followed by a variant indicator. The spelling
+categories are as follows:
+ A: American
+ B: British "ise" spelling
+ Z: British "ize" spelling or OED preferred spelling
+ C: Canadian
+ D: Australian
+ _: Other (Variant info based on American dictionaries, never used
+ with any of the above).
+and the variant tags are as follows:
+ .: equal
+ v: variant
+ V: seldom used variant
+ -: possible variant, should generally not be used
+ x: improper variant (should not use)
+
+The "." or equal variant tags are reserved for special cases when
+there is little agreement between dictionaries or when I think the
+dictionary is wrong. The "v" indicator is used for most words marked
+as variants in the dictionary. However, some variants will be demoted
+to a "V", for example if the variant is marked as "also" by
+Merriam-Webster, or if only some dictionaries acknowledge the
+existence of the variant. "-" is used when the variant is generally not
+listed in the dictionary but I could find some evidence of its use, or
+when it is marked as an archaic spelling for the word. The "x"
+is used when the spelling is almost universally considered a
+misspelling, and is only included for completeness.
+
+For Australian English "v" was used for variants that are widely used,
+but not preferred, and "V" for all "-or" (vs. "-our") variants and
+variants considered "chiefly US".
+
+If there are no tags with the 'Z' spelling category on the line then
+'B' implies 'Z'. Similarly if there are no 'C' tags then 'Z' implies
+'C'. If there are no 'D' tags then 'B' implies 'D'.
+
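The implication rules above can be sketched as follows. This is an illustrative reading of the rules rather than the actual implementation: the function name expand_implied_tags is hypothetical, entries are assumed to be (tags, word) pairs for a single line, and carrying the variant indicator over to the implied tag is an assumption.

```python
def expand_implied_tags(entries):
    """Add the implied 'Z', 'C' and 'D' tags to the entries of one line.

    entries is a list of (tags, word) pairs, for example
    [(["A", "C"], "prize"), (["B"], "prise")].
    """
    # Spelling categories that appear anywhere on the line.
    present = {tag[0] for tags, _ in entries for tag in tags}
    expanded = []
    for tags, word in entries:
        tags = list(tags)
        if "Z" not in present:   # no 'Z' tags on the line: 'B' implies 'Z'
            tags += ["Z" + t[1:] for t in tags if t[0] == "B"]
        if "C" not in present:   # no 'C' tags on the line: 'Z' implies 'C'
            tags += ["C" + t[1:] for t in tags if t[0] == "Z"]
        if "D" not in present:   # no 'D' tags on the line: 'B' implies 'D'
            tags += ["D" + t[1:] for t in tags if t[0] == "B"]
        expanded.append((tags, word))
    return expanded


if __name__ == "__main__":
    print(expand_implied_tags([(["A", "C"], "prize"), (["B"], "prise")]))
```

The 'Z' rule is applied before the 'C' rule so that a 'Z' tag implied by 'B' can in turn imply 'C'.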
+For ease of reading and maintaining the data file, each line is
+grouped in a cluster of closely related words. Each cluster is
+uniquely identified by a headword, which is generally the American
+spelling of the word on the first line of the cluster. Each cluster
+starts with a '#' followed by the headword with some
+additional information after it. For example the cluster for
+acknowledgment is:
+ # acknowledgment <verified> (level 35)
+ A Cv: acknowledgment / Av B C: acknowledgement
+ A Cv: acknowledgments / Av B C: acknowledgements
+ A Cv: acknowledgment's / Av B C: acknowledgement's
+The "<verified>" tag will be explained later, and "(level 35)"
+indicates what level in SCOWL (see http://wordlist.sourceforge.net)
+the headword is found in. The levels generally mean the following:
+ <= 35: Very common word
+ <= 70: Can be found in the dictionary
+ 80: Likely a valid word, can likely be found in an
+ unabridged dictionary
+ > 80: May not even be a legal word
+
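A cluster header line of the form shown above could be parsed roughly like this. It is a sketch only: the name parse_header and the returned dictionary layout are hypothetical, and it assumes that "<verified>" and "(level N)" are both optional and appear in that order.

```python
import re

# "# headword", then an optional "<flag>", then an optional "(level N)".
HEADER = re.compile(
    r"^# (?P<headword>.+?)"
    r"(?: <(?P<flag>[^>]+)>)?"
    r"(?: \(level (?P<level>\d+)\))?\s*$"
)


def parse_header(line):
    """Return header fields as a dict, or None if line is not a header."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    level = m.group("level")
    return {
        "headword": m.group("headword"),
        "verified": m.group("flag") == "verified",
        "level": int(level) if level is not None else None,
    }


if __name__ == "__main__":
    print(parse_header("# acknowledgment <verified> (level 35)"))
```

Because the pattern requires a space directly after the first '#', lines that begin with "##" are not mistaken for headers.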
+Sometimes the spelling of a word depends on the usage. If so the word
+is listed more than once within a cluster, with any usage information
+being indicated after a " | ". For example here is part of the cluster
+for prize:
+ A B: prize | reward
+ A B: prizes | reward
+ A C: prize / B: prise | otherwise
+ A C: prizes / B: prises | otherwise
+which indicates that the preferred spelling of prize is always with a
+"z" when meaning a reward, but otherwise is spelled with an "s" in
+British English. In the example above a brief definition of the word
+is given, but often no such attempt is made, and the definition simply
+consists of a number, for example:
+ A B: sake | :1
+ A C: sake / Av B Cv: saki | :2
+
+Sometimes part-of-speech (POS) info is given to help distinguish which
+form is used. For example:
+ A B C: practice / AV Cv: practise | <N>
+ A Cv: practice / AV B C: practise | <V>
+POS info is always given in the form "<POS>" and if a definition
+is also given the POS info is always first. The POS tags used are as
+follows:
+ <N>: Noun
+ <V>: Verb
+ <Adj>: Adjective
+ <Adv>: Adverb
+
+A "(-)" before the definition indicates a rarely used or archaic form
+of a word, for example:
+ A B: bark | :1
+ A: bark / Av B: barque | (-) ship
+
+A "--" indicates a note rather than a definition. This is generally
+used to indicate that the spelling of the plural form does not depend on
+the spelling of the root word, for example:
+ _: cabby / _.: cabbie
+ _: cabbies | -- plural
+
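The usage information after the " | " can be picked apart along these lines. This is a sketch under assumptions: parse_usage and its result keys are hypothetical names, the pieces are assumed to appear in the order POS, "(-)", then either a "--" note or a definition, and any trailing "#" note on the line is not handled.

```python
import re


def parse_usage(line):
    """Extract the usage information that follows " | " on a data line."""
    info = {"pos": None, "uncommon": False, "note": None, "definition": None}
    if " | " not in line:
        return info
    usage = line.split(" | ", 1)[1].strip()
    m = re.match(r"<(\w+)>\s*", usage)   # POS info such as "<N>" comes first
    if m:
        info["pos"] = m.group(1)
        usage = usage[m.end():]
    if usage.startswith("(-)"):          # rarely used or archaic form
        info["uncommon"] = True
        usage = usage[3:].strip()
    if usage.startswith("--"):           # a note rather than a definition
        info["note"] = usage[2:].strip()
    elif usage:
        info["definition"] = usage       # brief definition or ":N" number
    return info


if __name__ == "__main__":
    print(parse_usage("A: bark / Av B: barque | (-) ship"))
    print(parse_usage("A B C: practice / AV Cv: practise | <N>"))
```
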
+Misc. notes on a particular form of a word are given after a "#" on
+the same line. Misc. notes for the cluster are given at the end of
+the cluster and are prefixed with "##", for example:
+ # coloration <verified> (level 50)
+ A B C: coloration / B. Cv: colouration
+ A B C: colorations / B. Cv: colourations
+ A B C: coloration's / B. Cv: colouration's
+ ## OED has coloration as the preferred spelling and discolouration as a
+ ## variant for British English for some reason
+In the notes ODE (not to be confused with OED) stands for Oxford
+Dictionary of English, "Ox" is used for any Oxford dictionary, and
+"M-W" for Merriam-Webster.
+
+Earlier versions of varcon contained numerous errors. With version
+5.0 a massive effort has been made to correct many of these errors.
+Clusters that have undergone some form of verification (and likely
+correction) are marked with "<verified>". As of version 5.0, most
+clusters with headwords in common usage (SCOWL level 35 and
+below) should now be checked, as well as many others. No effort was
+made to check clusters with headwords in SCOWL level 80 and above;
+many of those entries are unlikely to be in the dictionary anyway.
The file variant-also.tab contains additional mappings between various
-spellings of a word which are not necessarily region dependent. Only
-mappings that are not listed in abbc.tab are included in this mapping.
-No attempt is made to distinguish the primary form of a word. The
-file variant-infl.tab is like variant-also.tab except that it is
-created automatically from the AGID inflection database. The file
+spellings of a word which are not yet in varcon.txt. No attempt is
+made to distinguish the primary form of a word. The file
+variant-infl.tab is like variant-also.tab except that it is created
+automatically from the AGID inflection database. The file
variant-wroot.tab is like variant-infl.tab except that it also
includes the root form of the word.
-Both the translation array and variant table includes a lot of strange
-affixations of words which I have no intention of removing as some
-people might find them useful.
-
-The "make-variant" Perl script will combine abbc.tab, variant-also.tab,
-and variant-infl.tab into one huge mapping and will output the result
-to "variant.tab". If the "no-infl" option is given than
-variant-infl.tab will not be included.
-
-The "split" script will create 5 words lists from abbc.tab:
-american.lst, british.lst, british_z.lst, canadian.lst, and
-common.lst. The common.lst file contains words that were marked in
-the last column as explained above and the other four contain all the
-possible words that might be used by that particular country, included
-some of the words marked in the last column.
+The file voc.tab is similar to varcon.txt but converts between
+vocabulary instead of spelling. Unlike varcon.txt it is a simple tab
+separated file with the fields corresponding to the American, British,
+and Canadian words. If more than one word is often used to describe
+the same thing the words are separated with commas. The last column
+contains additional notes on when the word is used. Unlike varcon.txt
+it is generally not suitable for automatic conversion.
+
+The "make-variant" Perl script will combine varcon.txt,
+variant-also.tab, and variant-infl.tab into one huge mapping and will
+output the result to "variant.tab". If the "no-infl" option is given
+then variant-infl.tab will not be included.
+
+The "split" script will split out the information in varcon.txt into
+several word lists named as follows:
+ <spelling>[-v<variant level>][-uncommon].lst
+where <spelling> is one of: american, british, british_z, canadian,
+common, or other. "common" is used for words which appear in
+varcon.txt, yet are used in all versions of English, such as "prize",
+and "other" is used for the "_" spelling category. The mapping from
+the variant indicators in varcon.txt to the numeric variant level is
+as follows:
+ v => 0
+ V => 1
+ - => 2
+"-uncommon" is used for forms marked with "(-)" as already described.
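
The naming scheme for the word lists can be illustrated with a small helper. This is not the "split" script itself; list_name is a hypothetical function that only builds a file name from the pattern and mapping given above.

```python
# Variant indicator -> numeric variant level, as listed above.
VARIANT_LEVEL = {"v": 0, "V": 1, "-": 2}

SPELLINGS = {"american", "british", "british_z", "canadian", "common", "other"}


def list_name(spelling, variant="", uncommon=False):
    """Build a name of the form <spelling>[-v<level>][-uncommon].lst."""
    if spelling not in SPELLINGS:
        raise ValueError("unknown spelling: %r" % spelling)
    name = spelling
    if variant:
        # Only "v", "V" and "-" have a level in the mapping above;
        # any other indicator raises KeyError here.
        name += "-v%d" % VARIANT_LEVEL[variant]
    if uncommon:
        name += "-uncommon"
    return name + ".lst"


if __name__ == "__main__":
    print(list_name("american"))
    print(list_name("british", "V"))
    print(list_name("canadian", "-", uncommon=True))
```
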
The "translate" Perl script will translate a text file from one
spelling to another. Its usage is:
Words are marked with a '?' before and after the questionable word
when the option is enabled.
-If you discover any errors in these mappings, besides the strange
-affixations, or have suggestions for additions be sure and let me know
-at kevina@gnu.org.
+The file varcon.pm contains some library routines for parsing
+varcon.txt and is used by many of the scripts above.
+
+If you discover any errors in these mappings or have suggestions for
+additions please file a bug report at
+https://github.com/kevina/wordlist/issues, or alternatively email me
+directly at kevina@gnu.org, but I will likely tell you to file a bug
+report so that I don't forget about it.
SOURCE:
Many other words have been added to abc.tab which were not in the
original Ispell word lists.
-Many different web sources were consuled when crating the tables. They
+Many different web sources were consulted when creating the tables. They
include:
The American-British British-American Dictionary
Inter-Play Translation: British/Canadian/American Vocabulary
http://www.inter-play.com/translation/voc-ukus.htm
-As well as several online dicionaries:
+As well as several online dictionaries:
Merriam-Webster: http://www.m-w.com/
American Heritage: http://www.bartleby.com/61/
Cambridge (ESL): http://dictionary.cambridge.org/
+In version 5.0 a massive effort was made to correct the numerous errors
+in VarCon. The primary sources used for verification were:
+
+ Merriam-Webster: http://www.m-w.com/
+ Free version of Oxford Dictionaries Online:
+ http://www.oxforddictionaries.com/
+ Oxford dictionaries available via Oxford Reference Online
+ (subscription service, http://www.oxfordreference.com/):
+ The New Oxford American Dictionary (2nd edition, 2006)
+ and sometimes: The Oxford American Dictionary of Current English (2002)
+ The Concise Oxford English Dictionary (11th edition revised, 2008)
+ and sometimes: The Oxford Dictionary of English (2nd edition revised, 2005)
+ The Canadian Oxford Dictionary (2004)
+
+I also used the Tysto UK vs US spelling list available at:
+ http://www.tysto.com/articles05/q1/20050324uk-us.shtml
+to make sure I didn't leave out any information in VarCon. However, any
+additions from his lists were verified using the dictionaries
+mentioned above as his lists contained numerous errors (such as
+including archaic spellings of words).
+
+I also made indirect use of Luke's Canadian, British and American
+Spelling page available at:
+ http://www.lukemastin.com/testing/spelling/cgi-bin/database.cgi?database=spelling
+but only to perform some initial verification; in the end I rechecked
+his data using the dictionaries above. (However, his data is, by far,
+more accurate than Tysto's.)
+
+In Version 2016.11.20 Benjamin Titze added support for Australian English.
+The primary sources for this addition were:
+
+ The Macquarie Dictionary: https://www.macquariedictionary.com.au/
+ Style Manual: For Authors, Editors and Printers, 6th Edition. DCITA.
+ University of Technology Sydney Publications Style Guide:
+ http://www.gsu.uts.edu.au/publications/styleguide/spelling.html
+ Style Manual, Department of Treasury and Finance, Tasmania:
+ http://conference.tasa.org.au/wp-content/uploads/2015/03/Style-Manual.pdf
+ Editor Australia - Style Guide:
+ http://www.editoraustralia.com/styleguide_spelling.html
+ Webster in Australia (history of "our"/"or" spelling variants):
+ http://blogs.usyd.edu.au/elac/2008/01/webster_in_australia.html
+
+
CHANGELOG:
+From 2016.11.20 to 2017.08.24
+
+ - Typo fixes thanks to Jakub Wilk
+
+From 2016.06.26 to 2016.11.20
+
+ - New Australian spelling category thanks to the work of Benjamin
+ Titze.
+
+ - Various other fixes.
+
+From 2016.01.19 to 2016.06.26
+
+ - Fix plural of "bus".
+
+From 2015.08.24 to 2016.01.19
+
+ - Undo the effects of PERL_UNICODE in the translate script.
+
+ - Other minor fixes and new entries.
+
+From 2014.02.15 to 2015.08.24 (Aug 24, 2015)
+
+ - Added entry for Koran/Koranic.
+
+ - Tweaked "adviser" cluster.
+
+ - Fix formatting problems.
+
+From 2015.01.28 to 2014.02.15 (February 15, 2015)
+
+ - Various new entries
+
+From 2014.11.17 to 2015.01.28 (January 28, 2015)
+
+ - Minor adjustments to a few entries (analytic, amid)
+
+ - Added entry for shareable
+
+ - Remove a junk entry (ted/taed).
+
+From 2014.08.11 to 2014.11.17 (November 17, 2014)
+
+ - Fix typos in README
+
+ - Enhancement to VarCon translate script. It will now, by default,
+ filter clusters with a SCOWL level > 80. This behavior can be
+ controlled with the new "--thresh" option.
+
+ - Remove a few junk entries.
+
+From Revision 5.1 to Version 2014.08.11 (August 8, 2014)
+
+ - Various corrections. Most of them minor. Two notable exceptions:
+
+ - Added an entry for furor as the correct British spelling is furore
+
+ - Fixed racket entries as Canadians still use racquet even
+ though it is a British English spelling (at least according to
+ the Oxford dictionaries)
+
+ - Other minor changes.
+
+From Revision 5.0 to Revision 5.1 (January 6, 2010)
+
+ - Corrected numerous errors after running various forms
+ of verification on varcon.txt.
+
+ - Reordered the clusters in varcon.txt so that they are
+ mostly in alphabetic order based on the headword.
+
+From Revision 4.1 to Revision 5.0 (December 27, 2010)
+
+ - Completely new format for the main table which, in addition to
+ providing the preferred spelling of a word for various forms of
+ English, also records variant and other information. To reflect
+ this change, the name of the file was renamed from abbc.tab to
+ varcon.txt.
+
+ - Massive effort to verify the variant information against
+ authoritative sources (mainly Oxford dictionaries). Most entries
+ for common words (SCOWL level 35 and below) have been checked
+ against at least a British and Canadian dictionary.
+
+ - Added variant information for numerous other words, even when
+ there is no difference between the various forms of English.
+
+ - Other changes corresponding to the new format.
+
From Revision 4 to Revision 4.1 (August 10, 2004)
- - Fixed various errors ib abbc.tab
+ - Fixed various errors in abbc.tab
- Removed clause 4 from the Ispell copyright with permission of Geoff
Kuenning.
COPYRIGHT:
-Copyright 2000-2004 by Kevin Atkinson
+Copyright 2000-2016 by Kevin Atkinson
Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.
+Copyright 2016 by Benjamin Titze
+
+Permission to use, copy, modify, distribute and sell this array, the
+associated software, and its documentation for any purpose is hereby
+granted without fee, provided that the above copyright notice appears
+in all copies and that both that copyright notice and this permission
+notice appear in supporting documentation. Benjamin Titze makes no
+representations about the suitability of this array for any
+purpose. It is provided "as is" without express or implied warranty.
+
Since the original words lists come from the Ispell distribution:
Copyright 1993, Geoff Kuenning, Granada Hills, CA