by Christian Wittern [Email: chris at chibs.edu.tw]
The following is a short outline of my experiences with using Chinese characters in A99/Alcarta (hereafter A99) databases. It should be pointed out that I am using a Traditional Chinese version of Windows, but I have also done some tests on an English version with Chinese extensions.
Although I deal explicitly only with Chinese data in the Traditional Chinese (also known as Big5) encoding, the method described here generally also applies to Simplified Chinese (also known as GB or Guobiao), Japanese, or Korean.
An example database is available for download [Filesize 2.4MB], where the issues discussed here can be studied. It is a bibliographical database of (canonical) Chinese Buddhist texts that cross-references a number of canonical collections. Some of the entries also contain the Sanskrit or Pali title.
Here is a screenshot of the database:
D.APT and O.APT
The codes used for Chinese characters are in the same area as characters with umlaut, diacritical marks and other special characters. Since A99 databases maintain compatibility with existing DOS databases and the codepoints used for these characters in DOS and Windows differ, A99 does a translation on reading and writing to the database files.
This translation is controlled through the files D.APT and O.APT. Since the codes of the Chinese characters are the same in DOS and Windows environments, any translation done here would damage the characters and render the result unintelligible. Therefore, these two tables have to be disabled for any database using Chinese. This is most effectively done by prepending two space characters to every line beginning with a p or o in these files. It is recommended to do this only to copies of these files placed in the same directory as the database with Chinese characters, since the copies of these files in the A99 and Allegro program directories might be needed by other databases.
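The edit described above is mechanical, so it can be scripted. The following is only a minimal sketch in Python, not part of A99 or allegro: the function name is hypothetical, and the sample table lines are invented placeholders rather than real D.APT content.

```python
def disable_translation(table_text: str) -> str:
    """Prefix two spaces to every line that begins with 'p' or 'o'.

    Lines that already start with a space (or anything else) are left
    untouched, so running the function twice does no further harm.
    """
    out = []
    for line in table_text.splitlines(keepends=True):
        if line.startswith(("p", "o")):
            line = "  " + line
        out.append(line)
    return "".join(out)


# Invented placeholder lines, NOT copied from a real D.APT or O.APT file.
sample = "  a comment line\np .132 .228\no .228 .132\n"
print(disable_translation(sample), end="")
```

To apply this to a real file, read the copy of D.APT or O.APT in the database directory in binary-safe fashion (for example with `encoding="latin-1"`), pass the text through the function, and write it back, so that no byte values are altered on the way.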
DISPHEAD.RTF
A99 displays database records in an RTF window. This enables the allegrologist to use different fonts, colors and the like for the display. The next step is thus to add an entry for the font used to display Chinese to the file DISPHEAD.RTF. This file is usually placed in the A99 program directory. I recommend maintaining only one copy of this file, to make the maintenance of parameter files more straightforward. The RTF standard requires a list of available fonts in the header of an RTF document. A large number of fonts can be defined here (Microsoft's own RTF writers, e.g. MS Word, used to list every font available on a given system), but again, maintenance is easier if one restricts the fonts to those needed. Within an RTF document, these fonts are referred to by the number assigned to them in this list. The entry
{\f0\froman\fprq2\fcharset136 MingLiU;}
thus adds the font with the name MingLiU as the first entry (\f0) to this list. Within the display parameter files, a sequence such as
#320 p"{\f0 " P"}"
would select this font for the display of the content of category #320. In RTF, formatting blocks are enclosed in curly brackets '{}'. The sequence \fprq2\fcharset136 was added in later extensions to RTF; it tells the system which character set the font uses. For the sake of completeness: for Japanese the entry has to be, e.g.,
{\f41\froman\fcharset128\fprq1 MS Mincho;}
and for Simplified Chinese
{\f191\fmodern\fcharset134\fprq1 NSimSun;}
Unfortunately, the A99 RTF window apparently can't understand this yet. I still recommend using this information, just in case A99 gets updated:-)
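To show how the three font entries above fit into an RTF font table, here is a small Python sketch that assembles them. It is an illustration only: the helper names are hypothetical, and the output is a bare `{\fonttbl ...}` group, not the actual contents of DISPHEAD.RTF. The `\fcharset` values 128, 134 and 136 are the ones quoted in the text.

```python
# RTF \fcharset values for the encodings mentioned in the text.
CHARSET_SHIFTJIS = 128   # Japanese
CHARSET_GB2312 = 134     # Simplified Chinese
CHARSET_BIG5 = 136       # Traditional Chinese


def font_entry(index, family, pitch, charset, name):
    """Build one font table entry, e.g. {\\f0\\froman\\fprq2\\fcharset136 MingLiU;}."""
    return "{\\f%d\\f%s\\fprq%d\\fcharset%d %s;}" % (
        index, family, pitch, charset, name)


def font_table(entries):
    """Wrap a list of entries in an RTF {\\fonttbl ...} group."""
    return "{\\fonttbl" + "".join(entries) + "}"


entries = [
    font_entry(0, "roman", 2, CHARSET_BIG5, "MingLiU"),
    font_entry(41, "roman", 1, CHARSET_SHIFTJIS, "MS Mincho"),
    font_entry(191, "modern", 1, CHARSET_GB2312, "NSimSun"),
]
print(font_table(entries))
```

The order of `\fprq` and `\fcharset` inside an entry is not significant in RTF, which is why the sketch can emit them in one fixed order.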
D-RTF.cPT
Some Chinese characters contain codes that have a special function in RTF. The following table lists them:
| Character | Decimal code | Hexadecimal code | Escape |
| \ | 092 | 5C | p .92 "\'5c" |
| { | 123 | 7B | p .123 "\'7b" |
| } | 125 | 7D | p .125 "\'7d" |
I placed the lines listed in the 'Escape' column in the file D-RTF.cPT to make them available to all parameter files using RTF. These lines have the effect that wherever these characters should not have their special RTF function, they are translated to a special RTF representation that ensures their safe handling.
With the above changes in place, there is not much left to do. Fields that contain Chinese characters need to be wrapped in the RTF commands as already mentioned above:
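The effect of these three escape lines can be reproduced outside allegro. The Python sketch below is an illustration under stated assumptions, not the allegro implementation: the function name is hypothetical, and it works on raw Big5 bytes. It uses the character 功, whose second Big5 byte is 0x5C, the same value as the backslash.

```python
# Byte values that have a special function in RTF, mapped to the
# \'hh escapes from the table above.
RTF_SPECIAL = {
    0x5C: b"\\'5c",   # backslash
    0x7B: b"\\'7b",   # opening brace
    0x7D: b"\\'7d",   # closing brace
}


def escape_rtf_bytes(data: bytes) -> bytes:
    """Replace every 0x5C, 0x7B and 0x7D byte with its RTF hex escape."""
    return b"".join(RTF_SPECIAL.get(b, bytes([b])) for b in data)


encoded = "功".encode("big5")      # two bytes; the second one is 0x5C
print(encoded)
print(escape_rtf_bytes(encoded))
```

Without the escape, an RTF reader would take the second byte of such a character as the start of a control word and garble everything that follows.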
#320 p"{\f0 " P"}"
The use of pre- and postfixes ensures that the RTF control characters in them (the braces and the backslash) are not subjected to the remapping through the p table, whereas the content of the field is.
In theory, the selection of different fonts for German and Chinese should ensure the peaceful coexistence of content in both scripts. This is exactly how it works on my Traditional Chinese system. However, on my English test environment, it turned out that a system that just installs the fonts for Chinese (as they are, for example, available for free download from Microsoft's website or come bundled with Office97), without native support from within the system (I will call this a mixed system hereafter), is not able to display this correctly. (The reason is this: the Chinese fonts I know of also display all the characters needed for standard English. Consequently, they report this to the system when it asks which character sets the font supports. The English system notices that the font supports English and, satisfied with this, does not inquire further. The information in the RTF font header, as described above, was meant to explicitly select a specific character set. Since A99 apparently does not currently use this information, it cannot display the characters correctly.)
The bottom line is: a system extension for the display is needed. I tested the following two systems:
The main drawback of using this kind of setup is that the special German characters and other characters with diacritical marks are erroneously displayed as Chinese characters. (I remember that some systems, like China Star or Twinbridge, do have an option to turn this off for certain fonts, but none of these systems are currently available to me for testing.)
The solution I am using in my canonical database avoids this problem in a different way: I am (mis)using the replacement method known to allegrologists as the 'V14 replacement method' for these special characters. This was necessary since most of the diacritical characters needed for transcribed Sanskrit and Pali were available in neither the DOS nor the Windows character set. In my database, a long vowel a (a with a macron) is encoded as &amacron&, with '&' selected as the V14 function character (i5=&). There is a normalization record for amacron, with a parametrization which makes sure that a construct like "{\f8 A}" ends up as the replacement that will be used by A99. The font listed as f8 contains, in the position of the letter 'A', the small a with a macron above. A side effect of using characters in the range of the English alphabet for this is that there is no interference with Chinese characters.
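The replacement step can be pictured as a simple lookup. The Python sketch below only mimics the idea; it is not how allegro implements V14 replacement, and the function and table names are hypothetical. Only the `&amacron&` encoding and the `{\f8 A}` replacement are taken from the text.

```python
import re

# Display replacements: entity name -> RTF fragment. Only 'amacron'
# comes from the text; further entries would follow the same pattern.
DISPLAY_REPLACEMENTS = {"amacron": r"{\f8 A}"}


def expand_entities(text, table, marker="&"):
    """Replace &name& sequences with the table entry for 'name'.

    Unknown names are left unchanged.
    """
    pattern = re.compile(re.escape(marker) + r"(\w+)" + re.escape(marker))
    # A function is used as the replacement so that backslashes in the
    # RTF fragment are inserted literally.
    return pattern.sub(lambda m: table.get(m.group(1), m.group(0)), text)


print(expand_entities("Mah&amacron&y&amacron&na", DISPLAY_REPLACEMENTS))
```

Because the replacement consists only of ASCII characters plus a font switch, none of its bytes can collide with the lead or trail bytes of a Chinese character.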
I am not yet entirely satisfied with this method, though. Unfortunately, as it turned out, the current allegro system apparently does not allow the use of two different registers for V14 replacement in two different parameter files. If this were possible, I would use one in the index parameter file to normalize e.g. the amacron to a, and another in the display parameter file to select the appropriate display font with the method described here. As can be seen in the index parameter file, I currently have to use a string replace command like "_\f8__" to get rid of the unwanted font selection for the index, which is a really ugly hack. Any information about a better way to handle this would be much appreciated.
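What is wished for here, one replacement register per purpose, can be stated compactly. The following Python sketch only illustrates the desired behaviour; it does not correspond to any existing allegro feature, and the register names are hypothetical.

```python
# One replacement table per purpose (hypothetical names).
REGISTERS = {
    "display": {"&amacron&": r"{\f8 A}"},   # select the diacritics font
    "index": {"&amacron&": "a"},            # normalize to the plain letter
}


def expand(text: str, purpose: str) -> str:
    """Apply the replacement register chosen for the given purpose."""
    for entity, replacement in REGISTERS[purpose].items():
        text = text.replace(entity, replacement)
    return text


title = "Mah&amacron&y&amacron&na"
print(expand(title, "display"))
print(expand(title, "index"))
```

With separate registers, the index form never contains a font selection in the first place, so nothing has to be stripped out afterwards.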
Christian Wittern, Taipei.