Lost in translation, all 8,000 characters of it

April 28th, 2009

The New York Times getting it wrong? Preposterous notion, right? And so I thought also – until reading this story recently.

The story is by now so well known that it’s featured on Wikipedia. A Chinese woman called Ma Cheng has a given name which uses an unusual character, which is visually composed of three repetitions of the character for “horse”. This character was recorded in a 1710 AD character dictionary as a variant form of a character meaning “gallop”. This variant form has fallen out of use over the last 300 years, and today is only found in comprehensive character dictionaries, with the 1710 publication being the most recent source. For a rough analogy, imagine having a þ in your name.

As might be expected, having an obsolete character in her name has caused Ms Ma some difficulty over the years. For one thing, modern typefaces and computer character sets rarely feature the character. In earlier years, authorities would write in the character “cheng” by hand on documents such as ID cards. However, with the move towards full digitisation, it is becoming more and more difficult to solve the problem.

A quirky story so far, but not too far out of the ordinary. The NY Times report takes a turn towards the dark alley of dystopia, however, when it turns to what it claimed was an 8,000-character list of permissible characters. The Chinese government, it said, citing a Chinese newspaper report, had been developing this list in recent years, not just for standardising naming use, but for ordinary usage as well. A Chinese linguistics official was quoted, via the state mouthpiece Xinhua, as saying that 8,000 characters (compared to the 85,000 in existence, and the roughly 30,000 in ordinary or literary usage) was ‘enough to convey most concepts’. Disturbing whiffs of doublespeak, newspeak, and the Thought Police?

I certainly thought it sounded just a little too shockingly Orwellian. So I went digging a little. The NY Times referenced two other sources for the statements about the 8,000-character “permissible word list”: a Xinhua news piece which it linked to, and a report from “another Chinese newspaper”.

First up, the Xinhua report. Headline? “Official refutes report that China will limit number of characters for new names”. Quite the opposite of the “limiting language to 8,000 characters” claim, it seems.

So how did this all come about? The Xinhua article cites – and refutes – a report by the Guangzhou-based Yangcheng Evening News that claimed that baby names would be limited to an 8,000 character list. It also offers another clue: an 8,000-character list of simplified characters, which a government official says “in combination, could convey almost any concept in any field”. So is it true? Is the Communist government embarking on a campaign to control thought by limiting the tools of thought?

Well, no. A little more digging shows that the “8,000-character list” that everyone’s getting so excited about is in fact the “Chart of Commonly Used Characters in Modern Chinese” (Chinese: 现代汉语通用字表), a list of commonly used, standard-form characters issued by the Chinese government. The list is not new: it was first issued in 1965, and last updated in 1988, at which time it contained 7,000 characters. It is due to be updated again this year, and is expected to contain 8,000 characters. Nor is it exclusive. The purpose of the list is to prescribe one standard form for the listed characters for printing and educational purposes. It does not exclude the use of other characters. To burst the bubble of those getting worked up about it – there is no “writing police” going around snipping out non-list characters wherever they see them. While the list was originally the basis of computer character sets for Chinese, modern-day character sets are far larger, and most computers today can support more than 30,000 characters. Computers used by the Department of Public Security in China can support around 76,000.
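For the curious, the character-set point is easy to check on an ordinary computer. The snippet below is only a rough sketch: the helper name `encodable_in` is made up for illustration, the list of encodings is my own pick from Python's standard codecs, and I am assuming the character in Ms Ma's name is the one Unicode encodes at U+299E2. It asks each encoding whether it can represent a given character at all.

```python
# Hypothetical helper for illustration: which encodings can represent a character?
# The codec names are standard Python codecs; the selection is my own.
CODECS = ["gb2312", "gbk", "big5", "gb18030", "utf-8"]


def encodable_in(char, codecs=CODECS):
    """Return a dict mapping each codec name to True if it can encode char."""
    result = {}
    for name in codecs:
        try:
            char.encode(name)
            result[name] = True
        except UnicodeEncodeError:
            # The codec has no byte sequence for this character.
            result[name] = False
    return result


MA = "\u9a6c"          # 马, the common simplified character for "horse"
CHENG = "\U000299E2"   # ASSUMPTION: the rare triple-horse character in Ms Ma's name

for label, ch in (("ma", MA), ("cheng", CHENG)):
    print(label, encodable_in(ch))
```

On this assumption, the older and smaller encodings should reject the rare character while the larger ones accept it, which is exactly the gap between an everyday character list and what a modern system can store.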

From there, several iterations of sensationalism and inapt translation turned a typesetting standard for commonly used characters into an effort to implement newspeak. The first culprit is the Yangcheng Evening News. This front-page story from 11 April 2009 reported on the upcoming 8,000-character version of the Commonly Used Character List, and bore the startling headline “In future, names must be chosen from amongst 8,000 characters”. A close reading of the story, though, shows that this claim is not repeated in the body. What the story has are two disparate facts: 1. the new list to be released this year will have 8,000 characters, and 2. one of the experts involved in compiling the list expresses a view that characters used in naming “must” be regulated – one day. Hardly the same as an official stating that people will be forced to choose names from the 8,000-character list.

That’s sensationalising Round One. “The government will be issuing a new, 8,000-character version of the Commonly Used Character List” combined with “An expert believes character choice for personal names must be regulated in future” has been conflated into “The government will restrict character choice for personal names to 8,000 characters”.

Next comes the Xinhua report. While setting out to refute the Yangcheng story, it only adds confusion by quoting an official who gets ahead of himself and says that the 8,000-character list will be “enough to convey most concepts”. Bad writing Round Two: so now the government believes 8,000 characters should be enough for anybody. It takes a bit of logical juggling to work out the intended meaning, but it seems that Xinhua is saying that the 8,000-character list will not be the limit of character choice for personal names, but that in ordinary usage it should be enough for most communications.

Then we have sensationalisation Round Three: whoever was reading these stories at the New York Times chose to ignore the part of the Xinhua story refuting the Yangcheng story, and conflated the last quote into the story. So now we have the story in all its monstrous splendour: the Chinese government is developing an 8,000 character list; choice of characters for personal names will be restricted to this list; in fact, this list will apply to ordinary usage of language as well.

This is not a simple case of bad writing and misreading. It’s the temptation of sensationalism. The Chinese government planning some horrible cultural lobotomy is a much better story than the Chinese government planning to issue a new set of printers’ standards. Feeding into it are prejudices about what an authoritarian and theoretically socialist government (i.e. the opposite of the American ideal) would do to its own people.

Does this prove that the Chinese government is not out to disembowel the language in a bid for ultimate thought control? Well, no. But there certainly isn’t much real evidence of it. Hopefully, everyone can sleep soundly tonight, safe in the knowledge that the right of Chinese citizens to give their children weird and unpronounceable names is, for the moment, alive and well.

Events, Random facts, Reviews, Technology, The Sydney Grind

  1. April 28th, 2009 at 23:19 | #1

    Although I have to wonder why it is so unreasonable for the government to implement a character set that covers all characters, ancient or otherwise.

  2. Jerry
    April 29th, 2009 at 00:01 | #2

    So many horses in one name!

  3. Tommy Chen
    April 29th, 2009 at 10:43 | #3

    It’s surprisingly hard to find out how many Chinese characters are supported by the standard character sets. What I can find is that GB18030-2005, the Chinese national standard, contains 70,244 characters, while the national standard CNS11643 in Taiwan supports 76,067 characters. But most people use GB18030-2000 / Big5 encoding, each of which contains fewer than 30,000 characters.

    I think the main problem is that nobody really knows how many characters there *are*, mainly due to the presence of variant characters which were used in one area or at one time, and faded into obscurity afterwards. For example, some of the characters in the expanded sets are characters found only in scripture from Dunhuang, a desert oasis in Western China which was a thriving Silk Road pilgrimage site some 1000 years ago.

    The current count for the number of characters for which meaning is known, according to the Hanyu Da Zidian (“Comprehensive Chinese Character Dictionary”) from mainland China, is 54,678. The Ministry of Education in Taiwan compiled a “Dictionary of Variant Characters”, which collected a total of 106,230 characters, including both “standard” and “variant” characters.

  4. legallyasian
    April 29th, 2009 at 17:35 | #4

    Perhaps the mistake was merely due to a lack of competent translation skills on the part of the NY Times journalist, as opposed to a conspiracy to defame the Chinese government. Or perhaps that is too naive an explanation?

    The government switched from traditional to simplified text and replaced many duplicate words to eradicate confusion when the communists decided that education should be offered to the masses, including peasants in rural areas. Words were made easier to write and recognise so that farmers would have an opportunity to quickly learn these skills. The move to simplify the language back then was not an example of the government’s autocratic nature but rather its attempt to provide access to literacy skills for its citizens, whilst also simplifying a language that had become largely archaic for its time. This action is hardly something that the US or the NY Times could criticise. If it were repeated in modern times perhaps it might be considered heavy-handed, but if the Chinese language were to be further simplified today, it would make the language easier for foreigners to learn, and therefore less exclusive. Wouldn’t such an action to ‘dumb down’ the language only create an advantage for foreigners? If this were true, why is the NY Times complaining…

  5. Tommy Chen
    April 29th, 2009 at 18:22 | #5

    The NY Times were by no means the only people guilty of sensationalism in this episode. The Yangcheng Evening News were pretty bad with their headline that had nothing to do with the story. It didn’t help that China has a thriving cyber-media community and blogosphere where journalistic standards are, well, n/a. When I was doing my detective work, I found some pretty hilarious and exaggerated Chinese-language headlines as the “government limiting name choice” story spread through the internet, e.g. “Congratulations to the Chinese people on losing another human right: the right to choose a name”.

  6. John lee
    June 12th, 2009 at 20:59 | #6

    lol, I note though that every name requires government authorisation in Germany. To protect the child.

    So even inoffensive names may be banned as first names because they’re also last names.
