Typing Vietnamese, Part 2: The Vietnamese Diaspora, Unicode and the Ubiquity of Unikey

This is part 2 of our two-part series on the history of Vietnamese-centric typing technologies. Part 1 can be accessed here.

Voices from the diasporas: Early virtual communities and the emergence of VIQR

After 1975, there was a significant wave of Vietnamese migration to North America, Europe, Hong Kong, China and Australia. In the United States, the Vietnamese immigrant population, which was once only several thousand, increased to 245,025 in 1980. By 1990, the number had more than doubled to 593,213 and, by 2000, it had reached 1,122,528. Displaced from their home country and otherized within their new nations, they faced an increasing need to reconnect to aspects of their native identity and culture. The Vietnamese language was a very concrete way to achieve this, as exemplified by Anh Tran's observation in Vietnamese Language Education in the United States that, several years after 1975, there was a surge in Vietnamese language schools in the US.

Efforts to maintain connections to Vietnam through language were playing out during a time of major technological advances. The computer industry was shifting from mainframes to personal microcomputers. The Altair 8800, an early microcomputer kit made by MITS, appeared in 1975, and in 1981 IBM introduced the mass-produced IBM PC, which resembled a modern-day PC. The computer gradually became a more personal and individualized device.

The Vietnamese diaspora gained fairly early access to these computer advances in the United States in the 1990s, thanks to a large number of Vietnamese immigrants, especially women, working as low-level technicians in Silicon Valley and, later, as engineers in the information technology industry. Vietnamese were IT pioneers in Australia as well. For example, students at the Australian National University worked on a project that eventually brought internet connectivity to Vietnam.

In the book Transnationalizing Vietnam: Community, Culture, and Politics in the Diaspora, Kieu Linh Caroline Valverde introduces computer programmer Tin Le, a member of a group of Vietnamese American computer scientists who worked on establishing links via wide area networks. In 1986, they created an email list called Vietnet to connect members of Vietnamese diasporas via electronic communication. In an interview with Valverde, Tin Le said: “It was pretty hard to connect, especially in regions where few Vietnamese resided. We wanted to talk to each other and reach out to one another.”

The administrators of Vietnet later moved the mailing list to a Usenet newsgroup, a type of discussion forum, called soc.culture.vietnamese (SCV). Both the Vietnet mailing list and SCV predated the World Wide Web, relying on the smaller networks that came before it. The Google archive of the newsgroup discussion suggests that soc.culture.vietnamese was created as early as April 1991. One can find nearly everything Vietnam-related on soc.culture.vietnamese: Vietnamese poems, lyrics, recipes, advertisements, searches for relatives, academic project announcements, as well as discussions of larger issues.

Computers at that time supported only the ASCII (American Standard Code for Information Interchange) character encoding standard. Its codes could represent only the unaccented letters of the English alphabet, with no diacritics. To communicate with each other, Vietnet and SCV members followed a set of rules that used characters available in ASCII to stand for Vietnamese diacritical marks. The set included ( .+ ^ ? and ). The rules were often collectively called quy ước Vietnet (Vietnet convention), quy ước SCV (SCV convention) or quy ước VIQR (VIQR convention, in which VIQR is short for Vietnamese Quoted-Readable). The VIQR conventions became the de facto standard for many Vietnamese online citizens during the heyday of newsgroups and forums and are still used by a modest proportion of the population today.
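To make the idea concrete, here is a minimal Python sketch of how VIQR-style text can be turned back into accented Vietnamese. The marker table follows the commonly documented VIQR convention rather than anything spelled out in this article, and the function name `viqr_to_unicode` is purely illustrative. The sketch also ignores VIQR's escape rules, so punctuation typed right after a vowel (a sentence-ending period, for example) would be misread as a tone mark, and a doubled "d" in a non-Vietnamese word would become "đ".

```python
import unicodedata

# VIQR marker -> Unicode combining character (mapping assumed from the
# commonly documented VIQR convention, not taken from the article).
MARKS = {
    "(": "\u0306",  # breve, as in ă
    "^": "\u0302",  # circumflex, as in â, ê, ô
    "+": "\u031B",  # horn, as in ơ, ư
    "'": "\u0301",  # acute (sắc)
    "`": "\u0300",  # grave (huyền)
    "?": "\u0309",  # hook above (hỏi)
    "~": "\u0303",  # tilde (ngã)
    ".": "\u0323",  # dot below (nặng)
}
VOWELS = set("aeiouyAEIOUY")


def viqr_to_unicode(text: str) -> str:
    """Convert VIQR-style ASCII into precomposed Unicode (simplified)."""
    out = []
    after_vowel = False  # True while a marker may still attach to a vowel
    for ch in text:
        if after_vowel and ch in MARKS:
            out.append(MARKS[ch])  # attach the mark to the preceding vowel
        elif ch in "dD" and out and out[-1] in ("d", "D"):
            # "dd" stands for đ; keep the case of the first letter
            out[-1] = "đ" if out[-1] == "d" else "Đ"
            after_vowel = False
        else:
            out.append(ch)
            after_vowel = ch in VOWELS
    # Fold each base letter plus its combining marks into one code point
    return unicodedata.normalize("NFC", "".join(out))


print(viqr_to_unicode("Tie^'ng Vie^.t"))  # Tiếng Việt
```

Read the other way around, the example shows why VIQR text looked cryptic to newcomers: the word "Việt" reaches the reader as "Vie^.t" unless software or practice decodes it.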

A Usenet post listing some Vietnamese names that sound derogatory in English, for parents to avoid when naming their child in order to prevent mockery by Americans. Screenshot via Google Groups.

The flux of technological solutions

The handy VIQR conventions were only a temporary solution, however, as the Vietnamese texts displayed were unrecognizable to the uninitiated. There remained a need to establish standard Vietnamese character encoding for web pages and fonts, which is why during the late 1980s and the early 1990s, a plethora of software packages, character encodings and Vietnamese fonts entered the cybersphere. While some of these solutions worked well, the large number of them created yet another problem. As Kim An Lieberman explains in Asian America.Net: Ethnicity, Nationalism, and Cyberspace, “The problem has not become how to put Vietnamese on the internet, but which Vietnamese to use.”

One popular encoding standard and input method produced during this time was the VNI standard, developed by Ho Thanh Viet, a Vietnamese software engineer who was living in Westminster, California, at the time. In 1987, Viet proposed using numerical keys to represent diacritical marks. The input method was popularized and commercialized by Viet and his company VNI Software via a package that included a font and word processor designed for the MS-DOS operating system. The method took off and became the standard for dot matrix printing, which improved the landscape of Vietnamese-language newspapers in the US. VNI was even adopted by Microsoft in its Windows 95 operating system in the 1990s. However, VNI Software sued Microsoft over unauthorized use, forcing the tech giant to remove it. Today, VNI is taught in computer textbooks and used by many Vietnamese in Vietnam.

It was also during this time that The Unicode Consortium was created. Established in 1987 in Silicon Valley with members belonging to many technology companies such as Apple, Xerox, Sun Microsystems, IBM and Microsoft, the consortium aimed to create a universal standard for encoding and displaying every language including Vietnamese. It enlarged the 8-bit standard often found in character encoding at the time to a 16-bit character set in order to increase the number of characters it could hold.

For Vietnamese, the consortium's original plan was to assign a code to each diacritical mark instead of assigning a code to each precomposed combination. The reason for this decision was that Unicode wanted to save space and avoid encoding anything that could be created by combining two or more characters that were already assigned codes. However, doing this would prove to be problematic. According to a memo written by the non-profit Viet-Std Group, whose aim was to standardize Vietnamese for computers: “The heavy use of diacritical marks in Vietnamese text calls for a keyboard input scheme that does not require extra keystrokes such as a special ‘compose’ key to generate accented letters.” Dr. Ngo Dinh Hoc, one of the Viet-Std members, noted that the practice was unfair as French and German enjoyed the privilege of having every precomposed character encoded in the Unicode set.
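The difference between the two approaches is still visible in Unicode today, which supports both. The short Python sketch below uses the standard library's `unicodedata` module to compare the letter "ệ" as a single precomposed code point with the same letter built from a base "e" plus two combining marks. The helper name `code_points` is only illustrative.

```python
import unicodedata

precomposed = "\u1ec7"  # "ệ" stored as one precomposed code point
# NFD splits the letter into a base character plus combining marks
decomposed = unicodedata.normalize("NFD", precomposed)


def code_points(s: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(precomposed))  # ['U+1EC7']
print(code_points(decomposed))   # ['U+0065', 'U+0323', 'U+0302']

# The two strings look identical on screen but do not compare equal
print(precomposed == decomposed)  # False

# NFC recombines the pieces into the precomposed form
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

One letter versus three code points gives a sense of what was at stake for a language in which most syllables carry at least one diacritic.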

Viet-Std Group asked the Unicode committee to reconsider, but the Unicode Consortium rejected the request on the grounds that Vietnamese didn't have a nationwide character encoding, and that there was therefore no need to ensure compatibility as there was for other Latin-based languages. Not accepting Unicode's argument, Viet-Std Group developed its own character encoding standard, VISCII (Vietnamese Standard Code for Information Interchange), in 1992. VISCII was a modified ASCII character set, in which the least problematic characters in the original ASCII were replaced with Vietnamese diacritical marks.

In 1993, Unicode finally agreed to encode every character in Vietnamese. From that year onwards, more typing conventions entered the cybersphere. The non-profit Vietnam Professionals Society (VPS) released VPSKey, its own input method software designed for Windows 3.1, in 1993. In the same year, Vietnam's Ministry of Science, Technology and Environment (now the Ministry of Science and Technology) issued TCVN 5712, an 8-bit national standard character encoding for Vietnamese. The TCVN 5712 character encoding was called VSCII (Vietnam's Standard Code for Information Interchange) and included three versions: VN1, VN2, and VN3. The first was a modified ASCII set and the other two utilized the extended ASCII. TCVN 5712 was widely used in northern Vietnam.

TCVN 5712 code charts: VN1 (left); VN2 (middle); VN3 (right). Images via Van Ban Phap Luat.

Web pages could finally properly display Vietnamese and users could write Vietnamese on the web if the output and the input were compatible with each other. However, typing and reading Vietnamese on computers remained a headache because the plethora of solutions allowed different web pages to use different encodings and fonts that were incompatible with one another. Therefore, users not equipped with the right tools were unable to neatly read and write Vietnamese.
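The incompatibility described above comes down to the same byte meaning different letters under different 8-bit encodings. The Python sketch below illustrates this with two codecs that ship with Python, `cp1258` (a Windows code page for Vietnamese) and `latin-1`. They are stand-ins chosen for convenience and are an assumption of this example, not the historical encodings named in the article.

```python
text = "đi đâu"

# Store the text as bytes using one 8-bit encoding
raw = text.encode("cp1258")

# Reading the bytes with the matching encoding recovers the text
right = raw.decode("cp1258")

# Reading the same bytes with a different 8-bit encoding silently
# produces other letters instead of raising an error
wrong = raw.decode("latin-1")

print(right)  # đi đâu
print(wrong)  # ði ðâu
```

Nothing in the bytes themselves says which table to use, which is why a page written with one font and encoding could appear as gibberish on a machine set up for another.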

Moving toward a unified standard: the story of Vietkey, WinVNKey and Unikey

Software and word processors continued to use 7-bit and 8-bit Vietnamese character encodings until Windows 2000 included Unicode support for Vietnamese. WinVNKey was the first computer program to allow users to type Vietnamese on Windows 3.0, an early version of Windows that ran on top of MS-DOS. WinVNKey was designed and offered for free by TriChlor, a non-profit group that promoted the use of VISCII as a unified standard. WinVNKey started to support Unicode in 2000 after its developers recognized its potential. The project was taken over by Ngo Dinh Hoc, who was working with Unicode and designing a Vietnamese keyboard driver for Macintosh at the time. The program then became multilingual input method software supporting more than 30 languages that don't usually translate well into a computerized environment. Nom characters and the languages of Vietnam's ethnic minorities were also included.

A notable WinVNKey equivalent is Vietkey. It was developed in 1991 and released in 1997 by Vietkey Group, a company based in Vietnam and founded by Đặng Minh Tuấn, who was a young engineer at the Ministry of Defence at the time. The program was first offered as freeware and later commercialized in conjunction with the company's other products. Vietkey supported Vietnamese, English, French, German and Russian. There was a version compatible with the Linux operating system as well. Just like the team behind WinVNKey, Tuấn was an advocate for a universal character encoding for typing Vietnamese. Tuấn and Vietkey offered Unicode support in 1997 and fine-tuned it into more efficient software in 2000. However, the fact that Vietkey was not freely accessible was a drawback for many.

Seeing the need for widely accessible software that supported Unicode other than Vietkey, Phạm Kim Long, a graduate student in Prague at the time, had the idea to develop his own input method software, which resulted in Unikey, released in 2000. The compact freeware is now ubiquitous among Vietnamese computer users. Long had been toying with the idea since 1991, when he and his classmates at the Hanoi University of Science and Technology challenged each other to write the most lightweight Vietnamese typing program in assembly language. Long won the challenge with LittleVNKey, a program only 2 kilobytes in size. However, LittleVNKey did not support Unicode.

In 2000, Long decided to work on a Vietnamese input program with Unicode support after seeing online conversations about Windows 2000's multilingual support, which included Vietnamese. He spent two days writing the program and released the first version of Unikey online. He then spent the next four months receiving feedback and fine-tuning the software. In 2006, through a Việt Kiều friend, Phạm Kim Long gave Apple the rights to integrate the software into its operating system. Unikey remains free and accessible today.

Although Long and Tuấn are the two people most often credited with making Vietnamese compatible with modern computers, the development of typing technologies is much more multifaceted, and reflects the sociocultural and historical needs of a population yearning for connection with the world and with itself.

This article was originally published in 2018.
