The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese:
A Review Article of Some Recent Dictionaries and Current Lexicographical Projects

By Victor H. Mair

As a working Sinologist, each time I look up a word in my Webster's or Kenkyusha's I experience a sharp pang of deprivation. Having slaved over Chinese dictionaries arranged in every imaginable order (by K'ang-hsi radical, left-top radical, bottom-right radical, left-right split, total stroke count, shape of successive strokes, four-corner, three-corner, two-corner, kuei-hsieh, ts'ang-chieh, telegraphic code, rhyme tables, "phonetic" keys, and so on ad nauseam), I have become deeply envious of specialists in those languages, such as Japanese, Indonesian, Hindi, Persian, Russian, Turkish, Korean, Vietnamese, and so forth, which possess alphabetically arranged dictionaries. Even Zulu, Swahili, Akkadian (Assyrian), and now Sumerian have alphabetically ordered dictionaries for the convenience of scholars in these areas of research.

Webmaster's note: This essay was instrumental in leading to production of the ABC Chinese-English Comprehensive Dictionary, which is by far my favorite Mandarin-English dictionary.

It is a source of continual regret and embarrassment that, in general, my colleagues in Chinese studies consult their dictionaries far less frequently than do those in other fields of area studies. But this is really not due to any glaring fault of their own and, in fact, they deserve more sympathy than censure. The difficulties are so enormous that very few students of Chinese are willing to undertake integral translations of texts, preferring instead to summarize, paraphrase, excerpt and render into their own language those passages which are relatively transparent. Only individuals with exceptional determination, fortitude, and stamina are capable of returning again and again to the search for highly elusive characters in a welter of unfriendly lexicons. This may be one reason why Western Sinology lags so far behind Indology (where is our Böthlingk and Roth or Monier-Williams?), Greek studies (where is our Liddell and Scott?), Latin studies (Oxford Latin Dictionary), Arabic studies (Lane's, disappointing in its arrangement by "roots" and its incompleteness but grand in its conception and scope), and other classical disciplines. Incredibly, many Chinese scholars with advanced degrees do not even know how to locate items in supposedly standard reference works or do so only with the greatest reluctance and deliberation. For those who do make the effort, the number of hours wasted in looking up words in Chinese dictionaries and other reference tools is absolutely staggering. What is most depressing about this profligacy, however, is that it is completely unnecessary. I propose, in this article, to show why.

First, a few definitions are required. What do I mean by an "alphabetically arranged dictionary"? I refer to a dictionary in which all words (tz'u) are interfiled strictly according to pronunciation. This may be referred to as a "single sort/tier/layer alphabetical" order or series. I most emphatically do not mean a dictionary arranged according to the sounds of initial single graphs (tzu), i.e. only the beginning syllables of whole words. With the latter type of arrangement, more than one sort is required to locate a given term. The head character must first be found and then a separate sort is required for the next character, and so on. Modern Chinese languages and dialects are as polysyllabic as the vast majority of other languages spoken in the world today (De Francis, 1984). In my estimation, there is no reason to go on treating them as variants of classical Chinese, which is an entirely different type of language. Having dabbled in all of them, I believe that the difference between classical Chinese and modern Chinese languages is at least as great as that between Latin and Italian, between classical Greek and modern Greek, or between Sanskrit and Hindi. Yet no one confuses Italian with Latin, modern Greek with classical Greek, or Sanskrit with Hindi. As a matter of fact, there are even several varieties of pre-modern Chinese, just as with Greek (Homeric, Horatian, Demotic, Koine), Sanskrit (Vedic, Prakritic, Buddhist Hybrid), and Latin (Ciceronian, Low, Ecclesiastical, Medieval, New, etc.). If we can agree that there are fundamental structural differences between modern Chinese languages and classical Chinese, perhaps we can see the need for devising appropriately dissimilar dictionaries for their study.
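The contrast between a single-sort order and an arrangement by head syllable can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of any dictionary discussed here: the seven sample words, the use of toneless pinyin spellings, and the two function names. Real dictionaries of the second type group by head character rather than by bare syllable, and must also deal with tones and homophones, all of which this sketch ignores.

```python
# Each entry is one word, split into toneless pinyin syllables.
# The word list is a hypothetical sample chosen for illustration only.
ENTRIES = [
    ("dian", "shi"),
    ("di", "qiu"),
    ("dian",),
    ("di", "fang"),
    ("dian", "nao"),
    ("di",),
    ("dian", "hua"),
]


def single_sort(entries):
    """Interfile whole words by their full spelling, ignoring syllable breaks."""
    return sorted("".join(word) for word in entries)


def head_syllable_sort(entries):
    """Order by the first syllable, then separately by each later syllable."""
    # Tuples compare element by element, which mimics the tiered lookup:
    # find the head syllable first, then search again for the next one.
    return ["".join(word) for word in sorted(entries)]


if __name__ == "__main__":
    print("single sort:  ", single_sort(ENTRIES))
    print("head syllable:", head_syllable_sort(ENTRIES))
```

In the single-sort list every word beginning with the letters "dian" precedes "difang", exactly as in an English dictionary. In the tiered list all words under the head syllable "di" are exhausted before "dian" begins, so the reader must first locate the head entry and then search again within it.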

One of the most salient distinctions between classical Chinese and Mandarin is the high degree of polysyllabicity of the latter vis-a-vis the former. There was indeed a certain percentage of truly polysyllabic words in classical Chinese, but these were largely loanwords from foreign languages, onomatopoeic borrowings from the spoken language, and dialectical expressions of restricted currency. Conversely, if one were to compile a list of the 60,000 most commonly used words and expressions in Mandarin, one would discover that more than 92% of these are polysyllabic. Given this configuration, it seems odd, if not perverse, that Chinese lexicographers should continue to insist on ordering their general purpose dictionaries according to the sounds or shapes of the first syllables of words alone.

Even in classical Chinese, the vast majority of lexical items that need to be looked up consist of more than one character. The number of entries in multiple character phrase books (e.g., P'ien-tzu lei-pien [approximately 110,000 entries in 240 chüan], P'ei-wen yün-fu [roughly 560,000 items in 212 chüan]) far exceeds those in the largest single character dictionaries (e.g., Chung-hua ta tzu-tien [48,000 graphs in four volumes], K'ang-hsi tzu-tien [49,030 graphs]). While syntactically and grammatically many of these multisyllabic entries may not be considered as discrete (i.e. bound) units, they still readily lend themselves to the principle of single-sort alphabetical searches. Furthermore, a large proportion of graphs in the exhaustive single character dictionaries were only used once in history or are variants and miswritten forms. Many of them are unpronounceable and the meanings of others are impossible to determine. In short, most of the graphs in such dictionaries are obscure and arcane. Well over two-thirds of the graphs in these comprehensive single character dictionaries would never be encountered in the entire lifetime of even the most assiduous Sinologist (unless, of course, he himself were a lexicographer). This is not to say that large single character dictionaries are unnecessary as a matter of record. It is, rather, only to point out that what bulk they do have is tremendously deceptive in terms of frequency of usage.

Just to give one example, only 622 characters account for 90% of the total running text of Lao She's Rickshaw Boy (Lo-t'o hsiang-tzu) and 1,681 graphs account for 99%. Altogether there are 107,360 characters in Rickshaw Boy but only 2,413 different graphs. Compare this with the 660,273 total characters in the four volumes of Mao Tse-tung's Selected Works, which are composed of only 2,981 different graphs. The figure is actually not much different for the bulk of classical Chinese writings (Brooks). In 700 of the best-known T'ang poems, a considerable number by a variety of poets, there are no more than 3,856 different graphs (based on Stimson). It is generally acknowledged that a passive command of about 5,500 characters is sufficient for reading the overwhelming majority of literary texts. Five to six thousand distinct graphs are certainly quite enough for anyone to cope with, but they are a far cry from fifty to sixty thousand.
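Coverage figures of this kind (so many distinct graphs accounting for such a percentage of running text) can be computed by counting each character, ranking the characters by frequency, and accumulating the counts until the target share is reached. The following is only a minimal sketch of that arithmetic. The function name and the toy input string are assumptions made for illustration, and nothing here reproduces the procedure behind the counts cited above.

```python
from collections import Counter


def graphs_needed(text, coverage):
    """Return how many of the most frequent distinct characters are
    needed to cover the given fraction (0.0-1.0) of the running text."""
    # Count every non-whitespace character in the text.
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    covered = 0
    # Walk down the frequency ranking, accumulating running-text coverage.
    for needed, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered >= coverage * total:
            return needed
    return 0  # empty text


if __name__ == "__main__":
    sample = "aaaaabbbcd"  # toy stand-in for a real text
    print(len(set(sample)), "distinct graphs in", len(sample), "characters")
    print(graphs_needed(sample, 0.5), "graph(s) cover half of the text")
```

Run over a full text such as a novel, the same loop would yield the number of graphs required for 90% or 99% coverage, though the result depends on how punctuation, numerals, and variant forms are treated.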

Functional literacy (the ability to read newspapers, letters, signs, and so forth) in today's world requires that an individual command a knowledge of no more than 1,500-2,000 graphs (cf. Ho, p. 33). Not surprisingly, this figure is approximately the same as the number of jōyō or tōyō kanji (characters approved for common use by the Japanese Ministry of Education). It would appear that the mind of the common man rebels at the memorization of larger numbers of graphs. Two or three years out of high school, most Japanese -- including those who go on to college -- can only reproduce about 500-700 graphs. This number goes down in successive years as they increasingly resort to kana or romaji to express themselves. Even the most highly literate Chinese scholars can almost never recognize more than 10,000 characters and the person who can accurately produce as many as 5,000 is exceedingly rare. It is a simple fact that the written vocabulary of modern Chinese texts consists largely of words that can be written down using no more than 3,500 different characters....