These are some SIL contributions to research and development in the area of multilingual computing:
- The first requirement discussed in The Nature of Linguistic Data and the Requirements of a Computing Environment for Linguistic Research is that: "The data are multilingual, so the computing environment must be able to keep track of what language each datum is in, and then display and process it accordingly."
- Multilingual Data Processing in the CELLAR Environment describes six facets of multilingual computing and how SIL's CELLAR system supports each.
- Fonts in CyberSpace is SIL's guide to finding fonts on the Internet. Includes pointers to fonts for more than 40 non-Roman writing systems.
- The Non-Roman Script Initiative facilitates the use of non-Roman and complex scripts in linguistic study, translation, literacy, and publishing.
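The first requirement quoted above, that each datum carry its language so it can be processed accordingly, can be sketched in a few lines. This is a minimal illustration only, not how CELLAR or any SIL software works: the `Datum` class and `to_lower` function are hypothetical names, and the language tags are assumed to be ISO 639 two-letter codes. Turkish is used because its dotted and dotless "i" make lowercasing language-dependent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A piece of text tagged with the language it is in (hypothetical type)."""
    text: str
    lang: str  # assumed to be an ISO 639 two-letter code, e.g. "en" or "tr"


def to_lower(datum: Datum) -> str:
    """Lowercase a datum using the convention of its own language."""
    if datum.lang == "tr":
        # Turkish: dotted capital I lowercases to "i", plain capital I to dotless "ı".
        text = datum.text.replace("\u0130", "i").replace("I", "\u0131")
        return text.lower()
    # Default behaviour for every other language tag in this sketch.
    return datum.text.lower()


if __name__ == "__main__":
    word = "I\u015eIK"  # the Turkish word for "light", in capitals
    print(to_lower(Datum(word, "tr")))
    print(to_lower(Datum(word, "en")))
```

The same string yields different results depending on its tag, which is why the language of a datum has to be recorded alongside the datum itself.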
About multilingual computing in general:
- The journal Multilingual Communications and Technology is published six times a year. For information, send mail to firstname.lastname@example.org.
- Knowledge Computing's Multilingual PC Directory
- Two-letter codes for the representation of names of languages (from ISO 639:1988).
Resources for software developers:
On developing a truly multilingual World Wide Web:
- The World Wide Web Consortium's (W3C) site on internationalization and localization: Non-western Character Sets, Languages, and Writing Systems
Fundamental to the problem of multilingual computing is the problem of character encoding and rendering. Below is a glossary of key terms that arise in this area; basic definitions are supplemented with pointers to further information resources.
ASCII: American Standard Code for Information Interchange. A standard character set that maps character codes 0 through 127 onto control functions, punctuation marks, digits, upper case letters, lower case letters, and other symbols.
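The grouping of ASCII codes described above can be made concrete with a short classifier. This is a sketch only: the function name `ascii_category` and the category labels are chosen here for illustration, and the space character is counted with punctuation and symbols.

```python
from collections import Counter


def ascii_category(code: int) -> str:
    """Classify an ASCII code (0 through 127) into one of the groups named above."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII defines only codes 0 through 127")
    if code < 32 or code == 127:
        return "control function"
    ch = chr(code)
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "upper case letter"
    if ch.islower():
        return "lower case letter"
    return "punctuation or symbol"  # includes the space character in this sketch


if __name__ == "__main__":
    # Tally how many of the 128 codes fall into each group.
    for name, count in Counter(ascii_category(c) for c in range(128)).items():
        print(f"{name}: {count}")
```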
Character: The minimal unit of encoding for text files. A character typically corresponds to a single graphic sign of a writing system, like a letter of the alphabet or a punctuation mark.
Some sources that discuss concepts and terminology:
These sources describe the contents of particular character sets:
Diacritic: A small mark (such as an accent mark) added above, below, before, or after a base character to modify its pronunciation or significance.
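One way to see the relationship between a base character and its added marks is to decompose accented letters with Python's standard `unicodedata` module. The sketch below assumes the Unicode convention in which such a mark is encoded as a combining character following its base; the function name `split_base_and_diacritics` is hypothetical.

```python
import unicodedata


def split_base_and_diacritics(text: str):
    """Return a list of (base character, [names of attached combining marks])."""
    result = []
    # NFD normalization splits precomposed letters into base + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and result:
            # Nonzero combining class: attach the mark to the preceding base.
            result[-1][1].append(unicodedata.name(ch))
        else:
            result.append((ch, []))
    return result


if __name__ == "__main__":
    # U+00E9 is "e" with an acute accent; U+00F1 is "n" with a tilde.
    for base, marks in split_base_and_diacritics("\u00e9\u00f1a"):
        print(base, marks)
```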
- Fonts in CyberSpace, SIL's guide to finding fonts on the Internet. Includes pointers to fonts for 44 non-Roman writing systems.
- Microsoft Typography Home Page explains how TrueType and OpenType fonts work.
- Fonts and Text (from Win32 Software Developers Kit documentation) explains the Windows font system. Keep turning pages to the right to first find definitions of the basic concepts and then see descriptions of the system functions that comprise the font system.
Rendering: The process of converting a stream of encoded characters (that is, character codes) to their correct graphic appearance on a terminal or printer.
The seminal work on encoding versus rendering is:
- Becker, Joseph D. (1984) Multilingual word processing. Scientific American, 251(1):96-107.
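The difference between encoding and rendering can be illustrated with a toy renderer that turns a stream of character codes into a list of glyphs. Everything here is hypothetical: the glyph names (`glyph_XXXX`, `fi_ligature`, `fl_ligature`) and the ligature table are invented for the example and do not correspond to any real font system. The point is only that the number of glyphs drawn need not equal the number of characters encoded.

```python
# Hypothetical table: pairs of encoded characters drawn as a single glyph.
LIGATURES = {
    ("f", "i"): "fi_ligature",
    ("f", "l"): "fl_ligature",
}


def render(text: str) -> list:
    """Map encoded characters to glyph names, merging known ligature pairs."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in LIGATURES:
            # Two character codes, one graphic shape.
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            # Default: one glyph per character, named by its code in hex.
            glyphs.append(f"glyph_{ord(text[i]):04X}")
            i += 1
    return glyphs


if __name__ == "__main__":
    print(render("fin"))
```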
Unicode: A character set that attempts to include every character from all the major writing systems of the world. It uses two bytes (16 bits) to encode each character. In its current version (2.0), the Unicode standard contains 38,885 distinct coded characters from 25 scripts (including the International Phonetic Alphabet).
- The Unicode Consortium's Unicode Home Page.
- The book for the Unicode Standard is: The Unicode Standard, Worldwide Character Encoding, Version 2.0, 1996. Reading, MA: Addison-Wesley. ISBN 0-201-48345-9.
- The Unicode Standard (gives general information and describes basic principles)
- A list of all the characters in the Unicode set as SGML entities (but without pictures of sample glyphs).
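The two-byte encoding described in the Unicode entry above can be shown directly. The sketch below is an illustration under one assumption: it handles only characters whose code fits in 16 bits and rejects anything else. The function name `two_byte_code` is hypothetical.

```python
def two_byte_code(ch: str) -> bytes:
    """Return the 16-bit code of a single character as two bytes, high byte first."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("character does not fit in the 16-bit form shown here")
    return code.to_bytes(2, "big")


if __name__ == "__main__":
    # "A" is a basic Latin letter; U+0259 is the schwa of the
    # International Phonetic Alphabet.
    for ch in ("A", "\u0259"):
        print(f"U+{ord(ch):04X}", two_byte_code(ch).hex())
```

For characters in this range the result matches what Python's built-in `"utf-16-be"` codec produces.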
WorldScript: A subcomponent of the Macintosh operating system (version 7.1 and later) that gives programs access to script interface systems for multiple non-Roman writing systems.
Some relevant publications:
- Davis, Mark E. (1987) The Macintosh script system. Newsletter for Asian and Middle Eastern Languages on Computer, 2(1&2):9-24.