GNU Aspell mkchardata perl script and Unicode data file
----------------------------------------------------------------------

The mkchardata perl script reads one or more textual reference tables
and converts them into Aspell character data files.  Its usage is

  mkchardata [--no-ascii] unicode.dat <textual reference table(s)>

It will convert each textual reference table to an Aspell character
data file.  It expects the table to be in the form

  0x?? 0x???? # ...

Where 0x?? is the 8-bit character value in hex and 0x???? is the
Unicode value.  Anything after the '#' is ignored.  The table may
alternatively have the form:

  =?? U+???? ...
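
As an illustration only, the two line forms above could be recognized
roughly as follows.  This is a minimal Python sketch, not the code
mkchardata itself uses (that script is written in Perl); the function
name parse_table_line and both regular expressions are assumptions,
and the ?? after '=' is assumed to be two hex digits.

```python
import re

# Assumed patterns for the two line forms described in this README.
# Form 1:  0x?? 0x???? # ...
# Form 2:  =?? U+???? ...
HEX_FORM = re.compile(r"^\s*0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})\b")
EQ_FORM = re.compile(r"^\s*=([0-9A-Fa-f]{2})\s+U\+([0-9A-Fa-f]{4})\b")


def parse_table_line(line):
    """Return (byte_value, code_point), or None if the line holds no mapping.

    Only the start of the line is matched, so a trailing '# ...' comment
    or character name is ignored.
    """
    for pattern in (HEX_FORM, EQ_FORM):
        match = pattern.match(line)
        if match:
            return int(match.group(1), 16), int(match.group(2), 16)
    return None
```

Lines that match neither form, such as comment-only lines, return None
in this sketch; how the real script treats them is not covered here.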

The upper 128 characters 0x80-0xFF may be mapped to anything.

As of Aspell 0.51 the following characters may also be remapped:

  02-1F (  2- 31) # Control characters
  41-5A ( 65- 90) # Uppercase Latin alphabet
  61-7A ( 97-122) # Lowercase Latin alphabet

Together with the upper 128 characters, this gives you a total of
210 characters to work with (128 + 30 + 26 + 26).
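
To make the ranges concrete, here is a small Python sketch that checks
whether a given 8-bit value falls in one of the remappable ranges
listed in this README.  The names REMAPPABLE_RANGES and is_remappable
are hypothetical and are not part of Aspell or mkchardata.

```python
# Inclusive ranges taken from this README (Aspell 0.51 and later).
REMAPPABLE_RANGES = [
    (0x02, 0x1F),  # control characters
    (0x41, 0x5A),  # uppercase Latin alphabet
    (0x61, 0x7A),  # lowercase Latin alphabet
    (0x80, 0xFF),  # upper 128 characters
]


def is_remappable(byte_value):
    """Return True if byte_value lies in one of the ranges above."""
    return any(low <= byte_value <= high for low, high in REMAPPABLE_RANGES)
```

For example, is_remappable(0x41) is true (uppercase 'A'), while
is_remappable(0x20) is false (the space character).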

If any of the Latin ASCII letters are remapped, then the option
--no-ascii should be given so that ALL the remaining letters are also
mapped to the private use area.  If you don't do this you may get
"interesting" results when trying to spell check a document that
contains words written using those letters, i.e., it may flag part of
a word as incorrect.

The Unicode data file was created in 1999 and only goes up to 0xFFFF.
I unfortunately lost the script which created it from the official
Unicode data.  In order to extend/update it I will need to rewrite the
script from scratch.  If the existing Unicode data file is not
sufficient for your needs please let me know at kevina@gnu.org and I
will see what I can do.

