Note: Japanese spice names written in kanji are also found in the Chinese Index.
Japanese writing is a very serious candidate for the title of the most complicated writing system in the universe. Actually, it is a mixed system consisting of four different scripts that are combined according to rules and personal preferences. One of these four is the Latin alphabet, known as rōmaji [ローマ字], which is used mainly for stylistic effect, although its use is standard for international abbreviations. There are several systems for transcribing Japanese into Latin letters; the Hepburn system, which I use, marks long vowels with a macron, and this is hardly ever seen in real Japanese usage. In this romanization, consonants are typically pronounced as they would be in English, and vowels as they would be in most languages except English.
The most complicated part of that bundle is the Chinese logographic script, called kanji [漢字, かんじ, カンジ] in Japanese, which is used for Chinese loanwords and much of the basic Japanese vocabulary. The application of Chinese logograms to Japanese language is governed by many subtle rules (and even more exceptions).
As a rule, a kanji may represent either its Chinese sound value (taken, of course, from historic Chinese and adapted to modern Japanese) or its Chinese meaning. This can be illustrated with the logogram 草, meaning "grass, herb" and spoken cǎo in Chinese. In a Japanese context, it may represent the Japanese word kusa "herb" or the syllable sō as the Japanese sound equivalent of cǎo. Examples are kusa-mochi [草餅], a sweet flavoured with dried mugwort herb, and kanzō [甘草] "sweet herb", that is, licorice. […] Sichuan pepper and wasabi.
In addition, Japanese also has two syllabaries known as kana. Both are capable of representing any Japanese text, but documents written in kana alone are very unusual; normally, their use is restricted to those parts of the language that the kanji cannot handle. They are, thus, supplemental scripts.
The two syllabaries are called hiragana [平仮名, ひらがな, ヒラガナ] and katakana [片仮名, かたかな, カタカナ]. Basically, hiragana is employed for grammatical suffixes, and may act as a substitute for kanji if the writer wants to avoid the latter. Such avoidance is actually the default for some words, as many kanji are rare and/or known only to people with higher education. Katakana are used for everything else, particularly elements alien to the language: unassimilated borrowings, neologisms, foreign names and quotations from foreign languages. They are also used for emphasis, both in running text and in section titles. The inherent freedom of the system allows for very specific stylistic effects.
Note that the preceding summary of Japanese writing habits is very simplified and omits most of what is really interesting. I shall now turn to a closer technical description of how the syllabaries really work.
The two kana systems are largely isomorphic, with an almost perfect one-to-one correspondence between their signs. Thus, most of what can be said about their structure and orthography applies to both equally. I will give a broad overview only, and will therefore rarely have to distinguish between the two. As a typographic convention, the syllable signs will be referred to by names in capital letters, and their pronunciation is given in lowercase. Occasionally, the symbols C and V will be used for an arbitrary consonant or vowel, respectively.
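The near one-to-one correspondence between the two syllabaries is mirrored in Unicode, where the hiragana letters U+3041 to U+3096 and the katakana letters U+30A1 to U+30F6 sit in parallel at a fixed offset of 0x60. The following is a minimal sketch in Python 3 under that assumption; the helper names `hira_to_kata` and `kata_to_hira` are hypothetical, and katakana-only signs such as the length bar are simply passed through.

```python
# Hiragana and katakana letters occupy parallel Unicode ranges.
KANA_OFFSET = 0x30A1 - 0x3041  # == 0x60

def hira_to_kata(text: str) -> str:
    """Convert hiragana letters to katakana; everything else passes through."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )

def kata_to_hira(text: str) -> str:
    """Convert katakana letters to hiragana; the length bar ー (U+30FC) is untouched."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )

print(hira_to_kata("ひらがな"))  # ヒラガナ
print(kata_to_hira("カンジ"))    # かんじ
```

Kanji and Latin letters fall outside both ranges and are left alone, so mixed text can be converted without pre-filtering.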
The kana inventory consists of pure vowel signs and simple open syllables of the type consonant+vowel, CV. Japanese has five vowels, A I U E and O, and eight basic consonants, K S T N H M Y R, plus (with some restrictions) W. The voiced consonants G Z and D are derived from their unvoiced counterparts K S and T by adding a diacritical mark called dakuten [濁点, だくてん, ダクテン]. Moreover, B and P are considered variants of H; they are written as H with the diacritical marks dakuten and handakuten [半濁点, はんだくてん, ハンダクテン], respectively. Including the secondary consonants, this gives a grid of 5×15 syllable signs; however, a few obsolete combinations are missing from hiragana (YI, YE and WU), and there are additional signs for syllabic N (which is very frequent in the language) and for the rare syllable VU.
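How dakuten and handakuten derive the voiced rows from the unvoiced ones can be tried out with Unicode's combining marks U+3099 and U+309A: appending one to a base kana and normalizing to NFC yields the precomposed sign whenever Unicode has one. This is a sketch assuming Python's standard `unicodedata` module; `add_mark` is a hypothetical helper name.

```python
import unicodedata

DAKUTEN = "\u3099"     # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
HANDAKUTEN = "\u309A"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

def add_mark(kana: str, mark: str) -> str:
    """Attach (han)dakuten to a base kana and return the precomposed sign."""
    composed = unicodedata.normalize("NFC", kana + mark)
    if len(composed) != 1:
        # No precomposed sign exists, e.g. M, N, R or Y kana with dakuten.
        raise ValueError(f"{kana!r} has no precomposed form with this mark")
    return composed

# K->G, S->Z, T->D, H->B with dakuten; H->P with handakuten.
print([add_mark(k, DAKUTEN) for k in "かさたは"])  # ['が', 'ざ', 'だ', 'ば']
print(add_mark("は", HANDAKUTEN))                   # ぱ
print(add_mark("ウ", DAKUTEN))                      # ヴ (VU)
```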
Most languages could not reasonably be written with that inventory, as no signs for consonant clusters exist. In Japanese, however, the only consonant cluster is ts, which can appear only in the syllable tsu and is basically a historic oddity. Rather, a typical word will just consist of an alternating sequence of consonants and vowels (two vowels in a row are quite common, though). Three more phenomena which look like consonant clusters (but are not) may appear in Japanese:
- Gemination (doubling of a consonant)
- This is indicated by a smaller version of the TU kana, written immediately before the kana whose consonant is to be geminated. N and M cannot be geminated.
- Syllabic N (an N that constitutes a syllable for itself)
- This simply has a kana of its own, whose pronunciation depends on the environment. For example, it assimilates to following stops or M, and it is pronounced as a velar nasal word-finally. This is the only sign outside of the regular kana grid. Note that the term "syllable" is inaccurate here and should be replaced by "mora", but I won’t explain that here.
- Palatalization (Y-sound attached to a consonant).
- Syllables of the form CYV do occur in Japanese. They are written with two kana: first the kana CI, followed by a small version of the YV kana. For example, KYO would appear as KI plus small YO. This works only with the vowels A O U. Note the exceptions enumerated below.
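The three phenomena above (small TU for gemination, syllabic N, and CI plus small YV) are exactly what a kana-to-rōmaji converter has to look ahead for. The sketch below is a hypothetical Python illustration, not a complete transliterator: its table covers only a few hiragana rows, the irregular readings shi, chi, tsu and ji are hard-coded, long vowels are not merged, and Hepburn's apostrophe after syllabic N before a vowel is ignored.

```python
# Partial, hypothetical table: only a few hiragana rows, enough for the examples.
BASE = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "が": "ga", "じ": "ji",
}
SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}  # small YA, YU, YO
SMALL_TU = "っ"                             # marks gemination
SYLLABIC_N = "ん"

def to_romaji(kana: str) -> str:
    out = []
    geminate = False
    i = 0
    while i < len(kana):
        ch = kana[i]
        if ch == SMALL_TU:
            geminate = True      # double the consonant of the next kana
            i += 1
            continue
        if ch == SYLLABIC_N:
            out.append("n")      # one spelling for all its pronunciations
            i += 1
            continue
        syl = BASE[ch]           # KeyError for kana outside the sample table
        if i + 1 < len(kana) and kana[i + 1] in SMALL_Y:
            # CI + small YV -> CYV: drop the i; sh, ch, j need no extra y
            stem = syl[:-1]
            glide = "" if stem in ("sh", "ch", "j") else "y"
            syl = stem + glide + SMALL_Y[kana[i + 1]]
            i += 1
        if geminate:
            # Hepburn writes a doubled ch as tch
            syl = ("t" if syl.startswith("ch") else syl[0]) + syl
            geminate = False
        out.append(syl)
        i += 1
    return "".join(out)

print(to_romaji("まっちゃ"))    # matcha
print(to_romaji("さんしょう"))  # sanshou
```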
There are some irregularities in the system, mostly arising from sound shifts: Some kana, though part of the regular grid, have unexpected pronunciation. The following list includes all such cases as long as only the common signs are considered.
- The SI sign is pronounced shi, and a hypothetical spoken syllable si is not representable in kana. The combinations SI plus small YA YU YO represent spoken sha shu sho. There is no she.
- Analogous to the S-series, ZI is spoken ji, and the combinations of ZI with small YA YU YO signify ja ju jo. Again, there is no je.
- The signs TI and TU are spoken chi and tsu, respectively; again, spoken ti and tu cannot be represented in writing (they don’t occur, anyway). The combinations of TI with small YA YU YO are spoken cha chu cho. A syllable che cannot exist.
- Somewhat analogous to the T-series: DI is read ji and DU is read zu. These signs DI and DU are rarely used, because by default, ji is written ZI and zu is written ZU. If you are confused, look up the Z-series above. Orthography demands the usage of DI and DU in a number of cases, though. The combination of DI plus small YA YU YO should give ja ju jo, but hardly ever occurs, as these syllables are written with ZI plus small YV instead.
- HU is spoken fu. Note that f cannot appear in any other syllable.
- The W-series is defective: only WA is common. Hiragana also has WI WE and WO (spoken as pure vowels), and katakana has the rarely used WO (spoken o) and obsolete signs for the rest. When wo is needed in foreign words like "water", it is approximated by U plus small O.
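The irregularities listed above amount to a small exception table on top of an otherwise regular grid. The following hypothetical Python sketch maps the capitalized kana names used in this text to their spoken form; DU is given its usual modern Hepburn reading zu, and the table and function names are illustrative only.

```python
# Grid name -> spoken syllable (Hepburn) for the irregular kana.
IRREGULAR = {"SI": "shi", "ZI": "ji", "TI": "chi", "TU": "tsu",
             "DI": "ji", "DU": "zu", "HU": "fu", "WO": "o"}
# Onsets that are already palatal, so no extra y is written before the vowel.
PALATAL_ONSET = {"SI": "sh", "ZI": "j", "TI": "ch", "DI": "j"}

def pronounce(name: str, small_y: str = "") -> str:
    """Spoken form of a kana given by grid name, optionally with small YA/YU/YO."""
    name, small_y = name.upper(), small_y.upper()
    if not small_y:
        # Regular signs are read exactly as named, e.g. KA -> ka.
        return IRREGULAR.get(name, name.lower())
    if not name.endswith("I") or small_y not in ("YA", "YU", "YO"):
        raise ValueError("palatalization needs a CI kana plus small YA, YU or YO")
    onset = PALATAL_ONSET.get(name, name[:-1].lower() + "y")
    return onset + small_y[-1].lower()

print(pronounce("TU"), pronounce("HU"), pronounce("KA"))  # tsu fu ka
print(pronounce("TI", "YO"), pronounce("KI", "YO"))       # cho kyo
```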
The syllables described as "unrepresentable" above can actually be written, in katakana only, with special digraphs involving small versions of the vowel signs: a small vowel overrides the vowel of the preceding katakana. These combinations are very unusual, though, and not even used for the everyday transcription of foreign expressions; they are mostly academic tools for cases where closer fidelity to the foreign sounds is required. Nevertheless, there are a few examples of such spellings in this index.
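The rule "a small vowel overrides the vowel of the preceding katakana" can be stated in a few lines. This is a hypothetical Python sketch over a handful of sample katakana; it also covers the U plus small vowel spelling for w-sounds, such as wo, mentioned for the W-series.

```python
# Small katakana vowels and the vowel each one imposes.
SMALL_VOWEL = {"ァ": "a", "ィ": "i", "ゥ": "u", "ェ": "e", "ォ": "o"}
# A handful of base katakana with their normal readings (sample only).
BASE = {"テ": "te", "デ": "de", "ト": "to", "チ": "chi", "シ": "shi",
        "ジ": "ji", "フ": "fu", "ツ": "tsu", "ウ": "u"}

def digraph(base: str, small: str) -> str:
    """Reading of base katakana + small vowel: the small vowel replaces the base vowel."""
    reading, vowel = BASE[base], SMALL_VOWEL[small]
    if reading == "u":
        return "w" + vowel   # U + small vowel approximates w + vowel, e.g. wo
    return reading[:-1] + vowel

print(digraph("テ", "ィ"), digraph("チ", "ェ"), digraph("フ", "ァ"))  # ti che fa
```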
The common Japanese approach to foreign borrowings, however, is adaptation to the Japanese sound system. A foreign syllable che, for example, would perhaps be spoken cho and written TI plus small YO instead of the exotic TI plus small E. Intractable consonant clusters in foreign words are resolved by inserting a vowel (u if possible, o otherwise) wherever necessary, yielding charming results like ōrusupaisu for allspice (remember, L=R). The index contains quite a few spice names that are basically English in a Japanese kimono; I have found them on Japanese web sites, but I do not know how frequently they are used among Japanese speakers.
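The vowel-insertion rule can be sketched on a simplified phonemic spelling of the foreign word. The Python below is a hypothetical illustration under strong assumptions: one Latin letter per sound (no sh, ch or th digraphs), long vowels written doubled, l merged into r, o inserted after t and d and u elsewhere, and a consonant n left alone as syllabic N. Real loanword adaptation has many more special cases.

```python
VOWELS = set("aeiou")

def adapt(phonemes: str) -> str:
    """Break up consonant clusters by inserting u (o after t or d)."""
    s = phonemes.lower().replace("l", "r")  # Japanese has no l/r contrast
    out = []
    for i, ch in enumerate(s):
        out.append(ch)
        if ch in VOWELS:
            continue
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if nxt in VOWELS:
            continue                # consonant already followed by a vowel
        if ch == "n":
            continue                # kept as syllabic N
        out.append("o" if ch in "td" else "u")
    return "".join(out)

print(adapt("oolspais"))  # oorusupaisu, i.e. ōrusupaisu
print(adapt("mint"))      # minto
```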
Japanese has syllables with long vowels, which are written CV followed by an extra V vowel sign. However, most instances of long o actually derive from an old diphthong ou and are thus spelled CO followed by U. This applies to Japanese words, whether written in hiragana or katakana, but foreign words written in katakana use a vowel length sign instead that looks like a horizontal bar (in the traditional vertical writing direction, it is a vertical bar).
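When such spellings are romanized in Hepburn style, the doubled vowel, the CO+U spelling and the katakana length bar all end up as a macron. The sketch below is a hypothetical Python post-processing step on a naive romanization in which the bar ー is still present. It follows one common convention (ii stays ii, ei is not merged) and, lacking word boundaries, will wrongly merge an o+u that straddles two morphemes.

```python
MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}
LENGTH_BAR = "ー"  # U+30FC, katakana vowel length mark

def mark_long_vowels(romaji: str) -> str:
    out = []
    for ch in romaji:
        prev = out[-1] if out else ""
        if ch == LENGTH_BAR and prev in MACRON:
            out[-1] = MACRON[prev]   # the bar lengthens the preceding vowel
        elif prev in MACRON and prev != "i" and (ch == prev or prev + ch == "ou"):
            out[-1] = MACRON[prev]   # doubled vowel or ou -> macron; ii is kept
        else:
            out.append(ch)
    return "".join(out)

print(mark_long_vowels("koーhiー"))  # kōhī
print(mark_long_vowels("shouga"))    # shōga
```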
The Unicode standard encodes every kana as a separate code point, although decomposition into the basic kana plus a combining dakuten (or handakuten) is possible. The small kana also all have their own code points. Therefore, the script is rather easy to handle in a Unicode context. A subtle complication arises when fixed-width fonts are used: each kana takes double the width of a Latin letter. There are separate code points for half-width kana (rarely used), and also for double-width Latin letters, which should be used if those are to fit into a fixed-width Japanese text.
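All three Unicode points above can be inspected with Python's standard `unicodedata` module: NFD splits a voiced kana into base kana plus combining mark, NFKC folds half-width katakana and full-width Latin letters back to their ordinary forms, and the East Asian Width property tells which characters take two cells. This is a sketch; the helper names are hypothetical, and `cell_width` assumes precomposed (NFC) text because it does not treat combining marks as zero-width.

```python
import unicodedata

def decompose(kana: str) -> str:
    """Canonical decomposition: voiced kana -> base kana + combining (han)dakuten."""
    return unicodedata.normalize("NFD", kana)

def normalize_width(text: str) -> str:
    """NFKC maps half-width katakana to normal kana and full-width Latin to ASCII."""
    return unicodedata.normalize("NFKC", text)

def cell_width(text: str) -> int:
    """Columns needed in a fixed-width font: Wide/Fullwidth characters take two."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1 for c in text)

for ch in decompose("ぱ"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+306F HIRAGANA LETTER HA
# U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

print(normalize_width("ｶﾀｶﾅ"), normalize_width("ＡＢＣ"))  # カタカナ ABC
```

Note that NFKC also folds many other compatibility characters, so it is best applied deliberately rather than to all text.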
The collation sequence of Japanese is evident from the kana table: it starts with the vowels (A I U E and O), followed by the consonants K S T N H M Y R and W. This order clearly betrays Indian influences. The dakuten and handakuten are ignored in sorting, meaning that K/G, S/Z, T/D and H/B/P are conflated (at least in the first pass of sorting). Small kana sort like the normal-sized ones, which makes the sorting principle difficult to grasp when looking at the rōmaji alone.
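Because the Unicode hiragana block happens to be laid out in this traditional order, a first-pass sort key can be built by folding katakana to hiragana, stripping the combining dakuten and handakuten after NFD, and mapping small kana to their normal-sized counterparts. The following is a hypothetical Python sketch of that first pass only: it does not break ties between, say, KA and GA, and it ignores the length bar.

```python
import unicodedata

SMALL_TO_FULL = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")
MARKS = {"\u3099", "\u309A"}  # combining dakuten and handakuten

def kana_sort_key(word: str) -> str:
    # 1. Fold katakana to hiragana, so both syllabaries sort alike.
    hira = "".join(chr(ord(c) - 0x60) if 0x30A1 <= ord(c) <= 0x30F6 else c
                   for c in word)
    # 2. Strip (han)dakuten so that K/G, S/Z, T/D and H/B/P are conflated.
    base = "".join(c for c in unicodedata.normalize("NFD", hira) if c not in MARKS)
    # 3. Small kana sort like their normal-sized counterparts.
    return base.translate(SMALL_TO_FULL)

words = ["ぱん", "はな", "ばら", "かき", "がき"]
print(sorted(words, key=kana_sort_key))
# ['かき', 'がき', 'はな', 'ばら', 'ぱん']
```

Python's `sorted` is stable, so words with identical first-pass keys keep their input order; a real collation would add a second pass to order them.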