Variant Conversion Info (VarCon)
-Revision 5.1 (SVN Revision 161)
+Version 2016.11.20
-January 6, 2011
-
-Copyright 2000-2011 by Kevin Atkinson (kevina@gnu.org)
+Copyright 2000-2016 by Kevin Atkinson (kevina@gnu.org) and Benjamin
+Titze (btitze@protonmail.ch).
This package contains information to convert between American,
-British, and Canadian spellings and vocabulary as well and other
-variant information.
+British, Canadian, and Australian spellings and vocabulary as well as
+other variant information.
-The latest version can be found at http://wordlist.sourceforge.net/.
+The latest version can be found at http://wordlist.aspell.net/.
The main data file is varcon.txt. It contains information on the
preferred American, British, and Canadian spelling of a word as well
as other variant information.
Each line contains a mapping between the various spellings of a word.
-Words are tageed to indicate where the spelling is used, and each
+Words are tagged to indicate where the spelling is used, and each
word/tag pair is separated with a " / ". For example in the line:
A Cv: acknowledgment / Av B C: acknowledgement
"acknowledgment" and "acknowledgement" are two spellings of the same
-word and "A", "Cv", "B", etc are the tags. Tags are seperated by
-spaces and the group of tags is seperated from the word with a ": ".
+word and "A", "Cv", "B", etc are the tags. Tags are separated by
+spaces and the group of tags is separated from the word with a ": ".
Here, "acknowledgment" is the preferred American spelling (as
indicated by the "A") of the word, and "acknowledgement" is the
preferred Canadian and British spelling ("B" and "C"). However the
categories are as follows:
A: American
B: British "ise" spelling
- Z: British "ize" spelling or OED prefered Spelling
+ Z: British "ize" spelling or OED preferred spelling
C: Canadian
+ D: Australian
_: Other (Variant info based on American dictionaries, never used
with any of the above).
and the variants tags are as follows:
is used when the spelling is almost generally considered a
misspelling, and is only included for completeness.
-If there are no tags with the 'Z' spelling category on the line than
-'B' implies 'Z'. Similarly if there are no 'C' tags than 'Z' implies
-'C'.
+For Australian English "v" was used for variants that are widely used,
+but not preferred, and "V" for all "-or" (vs. "-our") variants and
+variants considered "chiefly US".
+
+If there are no tags with the 'Z' spelling category on the line then
+'B' implies 'Z'. Similarly if there are no 'C' tags then 'Z' implies
+'C'. If there are no 'D' tags then 'B' implies 'D'.
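
The line format and the implication rules above can be sketched in Python as follows. This is only an illustration, not the code used by the VarCon scripts: the function names are hypothetical, and it assumes that the rules chain (B implies Z, which in turn implies C) and that a variant marker such as "v" carries over to the implied tag.

```python
def parse_line(line):
    """Split one varcon.txt data line into a list of (tags, word) entries."""
    # Drop any usage note after " | "; only the spellings matter here.
    body = line.split(" | ", 1)[0]
    entries = []
    for part in body.split(" / "):
        tag_str, word = part.split(": ", 1)
        entries.append((tag_str.split(), word.strip()))
    return entries


def expand_implied(entries):
    """Add the tags implied by the B -> Z, Z -> C and B -> D rules."""
    # The first character of a tag is its spelling category.
    present = {tag[0] for tags, _ in entries for tag in tags}
    expanded = []
    for tags, word in entries:
        new_tags = list(tags)
        # Rules run in order, so a Z tag added by the first rule can
        # trigger the second one (assumption: the rules chain).
        for src, dst in (("B", "Z"), ("Z", "C"), ("B", "D")):
            if dst in present:
                continue  # the line tags this category explicitly
            for tag in list(new_tags):
                if tag[0] == src:
                    # Assumption: the variant marker is carried over.
                    new_tags.append(dst + tag[1:])
        expanded.append((new_tags, word))
    return expanded


entries = parse_line("A Cv: acknowledgment / Av B C: acknowledgement")
for tags, word in expand_implied(entries):
    print(" ".join(tags) + ": " + word)
```

For the example line, "acknowledgement" picks up the implied "Z" and "D" tags, while "C" is left alone because the line already tags it explicitly.
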
For ease of reading and maintaining the data file, each line is
grouped in a cluster of closely related words. Each cluster is
-uniquely identifed by a headword, which is generally the American
+uniquely identified by a headword, which is generally the American
spelling of word on the first line of the cluster. Each cluster is
started with a '#' and is followed by the headword with some
additional information after it. For example the cluster for
A Cv: acknowledgments / Av B C: acknowledgements
A Cv: acknowledgment's / Av B C: acknowledgement's
The "<verified>" tag will be explained latter, and "(level 35)"
-indictated what level in SCOWL (see http://wordlist.sourceforge.net)
-the headword is found in. The levels generaly mean the following:
+indicates what level in SCOWL (see http://wordlist.sourceforge.net)
+the headword is found in. The levels generally mean the following:
<= 35: Very common word
<= 70: Can be found in the dictionary
80: Likely a valid word, can likely be found in an
Sometimes the spelling of a word depends on the usage. If so the word
is listed more than once within a cluster, with any usage information
-being indicated after a " | ". For example here is part of the cluser
+being indicated after a " | ". For example here is part of the cluster
for prize:
A B: prize | reward
A B: prizes | reward
A B: sake | :1
A C: sake / Av B Cv: saki | :2
-Sometimes part-of-speach (POS) info is given to help distinguish which
+Sometimes part-of-speech (POS) info is given to help distinguish which
form is used. For example:
A B C: practice / AV Cv: practise | <N>
A Cv: practice / AV B C: practise | <V>
<Adj>: Adjective
<Adv>: Adverb
-A "(-)" before the definition indicated a rarly used or archaic form
+A "(-)" before the definition indicates a rarely used or archaic form
of a word, for example:
A B: bark | :1
A: bark / Av B: barque | (-) ship
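
The notes that follow " | " in the examples above (usage text, a sense number such as ":1", a POS tag such as "<N>", and the "(-)" marker) could be pulled apart with a small helper like the one below. This is a rough sketch rather than the parser used by the VarCon scripts: the function name and the returned dictionary keys are hypothetical, and it assumes that "(-)" comes first and that a POS tag, when present, comes before any remaining text.

```python
import re


def parse_note(line):
    """Return the rare/archaic flag, POS tag and remaining note text."""
    if " | " not in line:
        return {"rare": False, "pos": None, "text": ""}
    note = line.split(" | ", 1)[1].strip()
    # "(-)" marks a rarely used or archaic form.
    rare = note.startswith("(-)")
    if rare:
        note = note[3:].strip()
    # A POS tag looks like <N>, <V>, <Adj> or <Adv>.
    pos = None
    match = re.match(r"<(\w+)>\s*", note)
    if match:
        pos = match.group(1)
        note = note[match.end():]
    return {"rare": rare, "pos": pos, "text": note}


print(parse_note("A: bark / Av B: barque | (-) ship"))
print(parse_note("A B C: practice / AV Cv: practise | <N>"))
```

Sense numbers such as ":1" are left in the "text" field untouched, since the README does not say how they should be interpreted beyond distinguishing usages.
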
The file voc.tab is similar to varcon.txt but converts between
vocabulary instead of spelling. Unlike varcon.tab it is a simple tab
-seperated file with the fields correspoding to the American, British,
+separated file with the fields corresponding to the American, British,
and Canadian words. If more than one word if often used to describe
the same thing the words are separated with commas. The last column
contains additional notes on when the word is used. Unlike varcon.txt
common, or other. "common" is used for words which appear in
varcon.txt, yet are used in all versions of english, such as "prize",
and "other" is used for the "_" spelling category. The mapping from
-the variant indicators in varcon.txt to the numberic variant level is
+the variant indicators in varcon.txt to the numeric variant level is
as follows:
v => 0
V => 1
varcon.txt and is used by many of the scripts above.
If you discover any errors in these mappings or have suggestions for
-additions please file a bug report, which you can find instructions
-for at http://wordlist.sourceforge.net/, or alternativly email me
+additions please file a bug report at
+https://github.com/kevina/wordlist/issues, or alternatively email me
directly at kevina@gnu.org, but I will likely tell you to file a bug
report so that I don't forget about it.
Many other words have been added to abc.tab which were not in the
original Ispell word lists.
-Many different web sources were consuled when crating the tables. They
+Many different web sources were consulted when creating the tables. They
include:
The American-British British-American Dictionary
Inter-Play Translation: British/Canadian/American Vocabulary
http://www.inter-play.com/translation/voc-ukus.htm
-As well as several online dicionaries:
+As well as several online dictionaries:
Marriam-Webster: http://www.m-w.com/
American Heritage: http://www.bartleby.com/61/
Cambridge (ESL): http://dictionary.cambridge.org/
In version 5.0 a massive effort to correct the numerous errors in
-VarCon was done. The primary sources used for verification where:
+VarCon was done. The primary sources used for verification were:
Marriam-Webster: http://www.m-w.com/
Free version of Oxford Dictionaries Online:
but only to perform some initial verification, in the end I rechecked
his data using the dictionaries above. (However, his data is, by far,
more accurate than Tysto's)
+
+In Version 2016.11.20 Benjamin Titze added support for Australian English.
+The primary sources for this addition were:
+
+ The Macquarie Dictionary: https://www.macquariedictionary.com.au/
+ Style Manual: For Authors, Editors and Printers, 6th Edition. DCITA.
+ University of Technology Sydney Publications Style Guide:
+ http://www.gsu.uts.edu.au/publications/styleguide/spelling.html
+ Style Manual, Department of Treasury and Finance, Tasmania:
+ http://conference.tasa.org.au/wp-content/uploads/2015/03/Style-Manual.pdf
+ Editor Australia - Style Guide:
+ http://www.editoraustralia.com/styleguide_spelling.html
+ Webster in Australia (history of "our"/"or" spelling variants):
+ http://blogs.usyd.edu.au/elac/2008/01/webster_in_australia.html
+
CHANGELOG:
+From 2016.06.26 to 2016.11.20
+
+ - New Australian spelling category thanks to the work of Benjamin
+ Titze.
+
+ - Various other fixes.
+
+From 2016.01.19 to 2016.06.26
+
+ - Fix plural of "bus".
+
+From 2015.08.24 to 2016.01.19
+
+ - Undo the effects of PERL_UNICODE in the translate script.
+
+ - Other minor fixes and new entries.
+
+From 2015.02.15 to 2015.08.24 (Aug 24, 2015)
+
+ - Added entry for Koran/Koranic.
+
+ - Tweaked "adviser" cluster.
+
+ - Fix formatting problems.
+
+From 2015.01.28 to 2015.02.15 (February 15, 2015)
+
+ - Various new entries
+
+From 2014.11.17 to 2015.01.28 (January 28, 2015)
+
+ - Minor adjustments to a few entries (analytic, amid)
+
+ - Added entry for shareable
+
+ - Remove a junk entry (ted/taed).
+
+From 2014.08.11 to 2014.11.17 (November 17, 2014)
+
+ - Fix typos in README
+
+ - Enhancement to VarCon translate script. It will now, by default,
+ filter clusters with a SCOWL level > 80. This behavior can be
+ controlled with the new "--thresh" option.
+
+ - Remove a few junk entries.
+
+From Revision 5.1 to Version 2014.08.11 (August 8, 2014)
+
+ - Various corrections. Most of them minor. Two notable exceptions:
+
+ - Added an entry for furor as the correct British spelling is furore
+
+ - Fixed racket entries as Canadians still use racquet even
+ though it is a British English spelling (at least according to
+ the Oxford dictionaries)
+
+ - Other minor changes.
+
From Revision 5.0 to Revision 5.1 (January 6, 2010)
- Corrected numerous errors after running various forms
COPYRIGHT:
-Copyright 2000-2010 by Kevin Atkinson
+Copyright 2000-2016 by Kevin Atkinson
Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.
+Copyright 2016 by Benjamin Titze
+
+Permission to use, copy, modify, distribute and sell this array, the
+associated software, and its documentation for any purpose is hereby
+granted without fee, provided that the above copyright notice appears
+in all copies and that both that copyright notice and this permission
+notice appear in supporting documentation. Benjamin Titze makes no
+representations about the suitability of this array for any
+purpose. It is provided "as is" without express or implied warranty.
+
Since the original words lists come from the Ispell distribution:
Copyright 1993, Geoff Kuenning, Granada Hills, CA