Romanization

Methods of romanization include transliteration, for representing written text; transcription, for representing the spoken word; and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes (the units of sound that distinguish meaning in speech), and stricter phonetic transcription, which records speech sounds with precision.

Methods

There are many consistent or standardized romanization systems, which can be classified by their characteristics. A particular system's characteristics may make it better suited to some applications than to others, and those applications can make contradictory demands: document retrieval, linguistic analysis, easy readability, and faithful representation of pronunciation.

Source, or donor language – A system may be tailored to romanize text from a particular language, from a group of languages, or from any language written in a particular writing system. A language-specific system typically preserves features of that language such as pronunciation, while a general system may be better suited to cataloguing international texts.
Target, or receiver language – Most systems are intended for an audience that speaks or reads a particular language. (So-called international romanization systems for Cyrillic text, for example, are based on central European alphabets such as the Czech and Croatian alphabets.)
Simplicity – Because the basic Latin alphabet has fewer letters than many other writing systems, digraphs, diacritics, or special characters are needed to represent every letter of the source script in Latin script. This affects how easily romanized text can be created, stored and transmitted digitally, reproduced, and read.
Reversibility – Whether the original text can be restored from the converted text. Some reversible systems also allow a simplified version that is not reversible.
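The simplicity and reversibility points can be illustrated with Python's standard `unicodedata` module. A letter with a diacritic, such as ū, may be stored either as one precomposed code point (NFC) or as a base letter followed by a combining mark (NFD), so the same romanized word can have different lengths in storage. Stripping the combining marks produces a simplified form, but that step is irreversible because distinct letters such as s and ṣ collapse into one. This is a minimal sketch; the helper names are illustrative and not part of any romanization standard.

```python
import unicodedata


def nfc_nfd_lengths(text: str) -> tuple[int, int]:
    """Code-point counts of the composed (NFC) and decomposed (NFD) forms."""
    return (len(unicodedata.normalize("NFC", text)),
            len(unicodedata.normalize("NFD", text)))


def strip_diacritics(text: str) -> str:
    """Decompose, then drop combining marks. This loses information."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(nfc_nfd_lengths("jūjutsu"))   # (7, 8)
print(strip_diacritics("jūjutsu"))  # jujutsu
# Two distinct transliterated letters become identical once simplified:
print(strip_diacritics("\u1e63") == strip_diacritics("s"))  # True (ṣ vs s)
```

Storing text in one agreed normalization form (commonly NFC) avoids spurious mismatches when romanized strings are compared or searched.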

Transliteration

If the romanization attempts to transliterate the original script, the guiding principle is a one-to-one mapping of characters in the source script onto the target script, with less emphasis on how the result sounds when pronounced according to the reader's language. For example, the Nihon-shiki romanization of Japanese allows the informed reader to reconstruct the original Japanese kana syllables with 100% accuracy, but requires additional knowledge for correct pronunciation.
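The one-to-one principle is what makes a transliteration reversible. The sketch below uses a small, hand-picked subset of hiragana with their Nihon-shiki spellings (illustrative, not a complete table) and decodes the romanization back by always taking the longest matching unit. Because every spelling in this subset is distinct and none is a prefix of another, the round trip is exact.

```python
# Illustrative subset of hiragana -> Nihon-shiki spellings; not a complete table.
NIHON_SHIKI = {
    "か": "ka", "さ": "sa", "し": "si", "す": "su", "ず": "zu",
    "た": "ta", "ち": "ti", "つ": "tu", "ぢ": "di", "づ": "du",
    "じ": "zi", "な": "na", "む": "mu", "う": "u",
}
# Reverse lookup; only valid because the mapping is one-to-one.
REVERSE = {roman: kana for kana, roman in NIHON_SHIKI.items()}
assert len(REVERSE) == len(NIHON_SHIKI), "mapping must be one-to-one"
LONGEST = max(len(roman) for roman in REVERSE)


def to_latin(kana: str) -> str:
    """Transliterate kana character by character."""
    return "".join(NIHON_SHIKI[ch] for ch in kana)


def to_kana(latin: str) -> str:
    """Restore kana by always taking the longest matching romanized unit."""
    out: list[str] = []
    i = 0
    while i < len(latin):
        for size in range(LONGEST, 0, -1):
            unit = latin[i:i + size]
            if unit in REVERSE:
                out.append(REVERSE[unit])
                i += len(unit)
                break
        else:
            raise ValueError(f"cannot decode {latin[i:]!r}")
    return "".join(out)


word = "ちぢむ"                # chijimu, "to shrink"
latin = to_latin(word)         # "tidimu"
assert to_kana(latin) == word  # exact round trip
```

Adding ん to this sketch would need extra handling, since its spelling n is a prefix of na, ni, and so on.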

Transcription

Phonemic

Most romanizations are intended to enable the casual reader who is unfamiliar with the original script to pronounce the source language reasonably accurately. Such romanizations follow the principle of phonemic transcription and attempt to render the significant sounds (phonemes) of the original as faithfully as possible in the target language. The popular Hepburn romanization of Japanese is an example of a transcriptive romanization designed for English speakers.
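Because Hepburn spells sounds rather than kana, kana pairs that are pronounced alike in modern standard Japanese receive the same spelling, so the original kana cannot always be recovered. A minimal sketch, using a small illustrative subset of Hepburn spellings (not a full table), finds such collisions:

```python
from collections import defaultdict

# Illustrative subset of hiragana -> Hepburn spellings; not a complete table.
HEPBURN = {
    "し": "shi", "ち": "chi", "つ": "tsu",
    "じ": "ji", "ぢ": "ji", "ず": "zu", "づ": "zu",
}


def collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Return each romanized spelling shared by more than one source character."""
    groups: dict[str, list[str]] = defaultdict(list)
    for source, roman in table.items():
        groups[roman].append(source)
    return {roman: chars for roman, chars in groups.items() if len(chars) > 1}


print(collisions(HEPBURN))  # {'ji': ['じ', 'ぢ'], 'zu': ['ず', 'づ']}
```

An empty result from the same check is a necessary (though not sufficient) condition for a table to be reversible.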

Phonetic

A phonetic conversion goes one step further and attempts to depict all phones in the source language, sacrificing legibility if necessary by using characters or conventions not found in the target script. In practice such a representation almost never tries to represent every possible allophone—especially those that occur naturally due to coarticulation effects—and instead limits itself to the most significant allophonic distinctions. The International Phonetic Alphabet is the most common system of phonetic transcription.

Trade-off

For most language pairs, building a usable romanization involves a trade-off between the two extremes. Pure transcriptions are generally not possible, as the source language usually contains sounds and distinctions not found in the target language, but which must be shown for the romanized form to be comprehensible. Furthermore, because of diachronic and synchronic variation, no written language represents any spoken language with perfect accuracy, and the way a script is read aloud can vary greatly between languages. In modern times the chain of transcription usually runs: spoken foreign language, written foreign language, written native language, spoken (read) native language. Reducing the number of these steps, i.e. removing one or both written stages, usually leads to more accurate pronunciation. In general, outside a limited audience of scholars, romanizations tend to lean more towards transcription. As an example, consider the Japanese martial art 柔術: the Nihon-shiki romanization zyûzyutu may allow someone who knows Japanese to reconstruct the kana syllables じゅうじゅつ, but most native English speakers, or rather readers, would find it easier to guess the pronunciation from the Hepburn version, jūjutsu.
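The 柔術 example can be reproduced with a toy romanizer that carries both spellings side by side. This is a sketch under stated assumptions: the table covers only the kana in じゅうじゅつ, and it treats う after a u-final syllable as a long vowel, which is a simplification (such a う is not always a lengthening, for instance across a morpheme boundary).

```python
# Illustrative subset: kana -> (Nihon-shiki, Hepburn). Not a complete table.
SYLLABLES = {
    "じゅ": ("zyu", "ju"),
    "じ": ("zi", "ji"),
    "つ": ("tu", "tsu"),
    "う": ("u", "u"),
}
# Long u: circumflex in Nihon-shiki (as in zyûzyutu), macron in Hepburn.
LONG_U = {"nihon": "\u00fb", "hepburn": "\u016b"}  # û, ū


def romanize(kana: str, system: str) -> str:
    if system not in LONG_U:
        raise ValueError(f"unknown system: {system!r}")
    col = 0 if system == "nihon" else 1
    out: list[str] = []
    i = 0
    while i < len(kana):
        # Prefer a two-kana unit such as じゅ over a single kana.
        pair = kana[i:i + 2]
        chunk = pair if pair in SYLLABLES else kana[i]
        if chunk not in SYLLABLES:
            raise ValueError(f"kana not in this sketch's table: {chunk!r}")
        if chunk == "う" and out and out[-1].endswith("u"):
            # Simplification: う after a u-syllable is read as vowel length.
            out[-1] = out[-1][:-1] + LONG_U[system]
        else:
            out.append(SYLLABLES[chunk][col])
        i += len(chunk)
    return "".join(out)


print(romanize("じゅうじゅつ", "nihon"))    # zyûzyutu
print(romanize("じゅうじゅつ", "hepburn"))  # jūjutsu
```

The two outputs differ only in the spelling columns and the long-vowel mark; the segmentation logic is shared, which is why the same kana can be served by either system.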

Romanization of specific writing systems

Arabic

The Arabic alphabet is used to write Arabic, Persian, Urdu, Pashto and Sindhi as well as numerous other languages in the Muslim world, particularly African and Asian languages without alphabets of their own. Romanization standards include the following:

Deutsche Morgenländische Gesellschaft (1936): Adopted by the International Convention of Orientalist Scholars in Rome. It is the basis for the very influential Hans Wehr dictionary (ISBN 0-87950-003-4).
BS 4280 (1968): Developed by the British Standards Institution.
SATTS (1970s): A one-for-one substitution system, a legacy from the Morse code era.
UNGEGN (1972)
DIN 31635 (1982): Developed by the Deutsches Institut für Normung (German Institute for Standardization).
ISO 233 (1984): Transliteration.
Qalam (1985): A system that focuses on preserving the spelling, rather than the pronunciation, and uses mixed case.
ISO 233-2 (1993): Simplified transliteration.
Buckwalter transliteration (1990s): Developed at Xerox by Tim Buckwalter; does not require unusual diacritics.
ALA-LC (1997)
Arabic chat alphabet
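Buckwalter transliteration's one-for-one ASCII design amounts to a plain character substitution table whose inverse is also a table. The sketch below covers only a few letters; the values are given from the published scheme as best recalled and should be checked against an authoritative Buckwalter table before being relied on.

```python
# Partial Buckwalter table (a few letters only); check against the full scheme.
BUCKWALTER = {
    "\u0627": "A",  # alif
    "\u0628": "b",  # baa
    "\u062A": "t",  # taa
    "\u062B": "v",  # thaa
    "\u062D": "H",  # Haa
    "\u0630": "*",  # dhaal
    "\u0634": "$",  # shiin
    "\u0639": "E",  # ayn
    "\u0643": "k",  # kaaf
    "\u0644": "l",  # laam
    "\u0645": "m",  # miim
}
# One-for-one, so the inverse table is also a plain dictionary.
FROM_BUCKWALTER = {ascii_ch: arabic for arabic, ascii_ch in BUCKWALTER.items()}


def to_buckwalter(text: str) -> str:
    return "".join(BUCKWALTER[ch] for ch in text)


def from_buckwalter(text: str) -> str:
    return "".join(FROM_BUCKWALTER[ch] for ch in text)


word = "\u0643\u062A\u0627\u0628"  # كتاب, kitāb "book", written without vowel marks
print(to_buckwalter(word))         # ktAb
assert from_buckwalter(to_buckwalter(word)) == word
```

Using case (H vs h) and ASCII punctuation ($, *) instead of diacritics is what lets the scheme stay reversible in plain ASCII.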

Persian

Consonants
Unicode	Persian letter	IPA	DMG (1969)	ALA-LC (1997)	BGN/PCGN (1958)	EI (1960)	EI (2012)	UN (1967)	UN (2012)	Pronunciation
U+0627	ا	ʔ, ∅	ʾ, —	ʼ, —				ʾ		- as in uh-oh
U+0628	ب	b	b							B as in Bob
U+067E	پ	p	p							P as in pet
U+062A	ت	t	t							T as in tall
U+062B	ث	s	s̱	s̱	s̄	t͟h	ṯ	s̄	s	S as in sand
U+062C	ج	dʒ	ǧ	j	j	d͟j	j	j		J as in jam
U+0686	چ	tʃ	č	ch	ch	č		ch	č	Ch as in Charlie
U+062D	ح	h	ḥ	ḥ	ḩ/ḥ	ḥ		ḩ	h	H as in holiday
U+062E	خ	x	ḫ	kh	kh	k͟h	ḵ	kh	x	somewhat resembling German Ch
U+062F	د	d	d							D as in Dave
U+0630	ذ	z	ẕ	ẕ	z̄	d͟h	ḏ	z̄	z	Z as in zero
U+0631	ر	r	r							R as in rabbit
U+0632	ز	z	z							Z as in zero
U+0698	ژ	ʒ	ž	zh	zh	z͟h	ž	zh	ž	S as in television or G as in genre
U+0633	س	s	s							S as in Sam
U+0634	ش	ʃ	š	sh	sh	s͟h	š	sh	š	Sh as in sheep
U+0635	ص	s	ṣ	ṣ	ş/ṣ	ṣ		ş	s	S as in Sam
U+0636	ض	z	ż	z̤	ẕ	ḍ	ż	ẕ	z	Z as in zero
U+0637	ط	t	ṭ	ṭ	ţ/ṭ	ṭ		ţ	t	t as in tank
U+0638	ظ	z	ẓ	ẓ	z̧/ẓ	ẓ	ẓ	z̧	z	Z as in zero
U+0639	ع	ʕ	ʿ	ʻ	ʼ	ʻ	ʻ	ʿ	ʿ	_____
U+063A	غ	ɢ~ɣ	ġ	gh	gh	g͟h	ḡ	gh	q	somewhat resembling French R
U+0641	ف	f	f							F as in Fred
U+0642	ق	ɢ~ɣ	q			ḳ		q		somewhat resembling French R
U+06A9	ک	k	k							C as in card
U+06AF	گ	ɡ	g							G as in go
U+0644	ل	l	l							L as in lamp
U+0645	م	m	m							M as in Michael
U+0646	ن	n	n							N as in name
U+0648	و	v~w	v				v, w	v		V as in vision
U+0647	ه	h	h	h	h	h		h	h	H as in hot
U+0629	ة	∅, t	—	h	—	t	h	—	—
U+06CC	ی	j	y							Y as in Yale
U+0621	ء	ʔ, ∅	ʾ	ʼ				ʾ
U+0623	أ	ʔ, ∅	ʾ	ʼ				ʾ
U+0624	ؤ	ʔ, ∅	ʾ	ʼ				ʾ
U+0626	ئ	ʔ, ∅	ʾ	ʼ				ʾ
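The consonant table shows how much a system's choices affect reversibility. DMG (1969) gives each of the s-like, z-like, t-like and h-like letters its own spelling, while UN (2012) collapses them. The sketch below copies those rows from the table, treating an empty UN (2012) cell as repeating the DMG value (an assumption about how the table was laid out), and groups letters that end up with the same romanization.

```python
from collections import defaultdict

# Rows copied from the consonant table: (letter, DMG 1969, UN 2012).
# An empty UN (2012) cell is assumed to repeat the DMG value.
ROWS = [
    ("\u062A", "t", "t"),        # ت
    ("\u0637", "\u1e6d", "t"),   # ط  DMG ṭ
    ("\u062B", "s\u0331", "s"),  # ث  DMG s̱
    ("\u0633", "s", "s"),        # س
    ("\u0635", "\u1e63", "s"),   # ص  DMG ṣ
    ("\u062D", "\u1e25", "h"),   # ح  DMG ḥ
    ("\u0647", "h", "h"),        # ه
    ("\u0630", "\u1e95", "z"),   # ذ  DMG ẕ
    ("\u0632", "z", "z"),        # ز
    ("\u0636", "\u017c", "z"),   # ض  DMG ż
    ("\u0638", "\u1e93", "z"),   # ظ  DMG ẓ
]


def merged_letters(column: int) -> dict[str, list[str]]:
    """Map each romanization shared by several letters to those letters.
    column 1 = DMG (1969), column 2 = UN (2012)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in ROWS:
        groups[row[column]].append(row[0])
    return {roman: letters for roman, letters in groups.items() if len(letters) > 1}


print(merged_letters(1))          # {}  (every letter stays distinct)
print(sorted(merged_letters(2)))  # ['h', 's', 't', 'z']
```

The simpler UN (2012) spellings are easier to type and read, at the cost of no longer identifying the original letter.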

Vowels
Unicode	Final	Medial	Initial	Isolated	IPA	DMG (1969)	ALA-LC (1997)	BGN/PCGN (1958)	EI (2012)	UN (1967)	UN (2012)	Pronunciation
U+064E	ـَ	ـَ	اَ	اَ	æ	a	a	a	a	a	a	A as in cat
U+064F	ـُ	ـُ	اُ	اُ	o	o	o	o	u	o	o	O as in go
U+0648 U+064F	ـوَ	ـوَ	—	—	o	o	o	o	u	o	o	O as in go
U+0650	ـِ	ـِ	اِ	اِ	e	e	i	e	e	e	e	E as in ten
U+064E U+0627	ـَا	ـَا	آ	آ	ɑː~ɒː	ā	ā	ā	ā	ā	ā	O as in hot
U+0622	ـآ	ـآ	آ	آ	ɑː~ɒː	ā, ʾā	ā, ʼā	ā	ā	ā	ā	O as in hot
U+064E U+06CC	ـَی	—	—	—	ɑː~ɒː	ā	á	á	ā	á	ā	O as in hot
U+06CC U+0670	ـیٰ	—	—	—	ɑː~ɒː	ā	á	á	ā	ā	ā	O as in hot
U+064F U+0648	ـُو	ـُو	اُو	اُو	uː, oː	ū	ū	ū	u, ō	ū	u	U as in actual
U+0650 U+06CC	ـِی	ـِیـ	اِیـ	اِی	iː, eː	ī	ī	ī	i, ē	ī	i	Y as in happy
U+064E U+0648	ـَو	ـَو	اَو	اَو	ow~aw	au	aw	ow	ow, aw	ow	ow	O as in go
U+064E U+06CC	ـَی	ـَیـ	اَیـ	اَی	ej~aj	ai	ay	ey	ey, ay	ey	ey	Ay as in play
U+064E U+06CC	ـیِ	—	—	—	–e, –je	–e, –ye	–i, –yi	–e, –ye	–e, –ye	–e, –ye	–e, –ye	Ye as in yes
U+06C0	ـهٔ	—	—	—	–je	–ye	–ʼi	–ye	–ye	–ye	–ye	Ye as in yes

Georgian

Georgian letter	IPA	National system (2002)	BGN/PCGN (1981–2009)	ISO 9984 (1996)	ALA-LC (1997)	Unofficial system	Kartvelo translit	NGR2
ა	/ɑ/	a	a	a	a	a	a	a
ბ	/b/	b	b	b	b	b	b	b
გ	/ɡ/	g	g	g	g	g	g	g
დ	/d/	d	d	d	d	d	d	d
ე	/ɛ/	e	e	e	e	e	e	e
ვ	/v/	v	v	v	v	v	v	v
ზ	/z/	z	z	z	z	z	z	z
ჱ	/eɪ/		ey	ē	ē	é	ej	ẽ
თ	/tʰ/	t	tʼ	t̕	tʻ	T or t	t	t / t̊
ი	/i/	i	i	i	i	i	i	i
კ	/kʼ/	kʼ	k	k	k	k	ǩ	k̉
ლ	/l/	l	l	l	l	l	l	l
მ	/m/	m	m	m	m	m	m	m
ნ	/n/	n	n	n	n	n	n	n
ჲ	/i/, /j/		j	y	y		j	ĩ
ო	/ɔ/	o	o	o	o	o	o	o
პ	/pʼ/	pʼ	p	p	p	p	p̌	p̉
ჟ	/ʒ/	zh	zh	ž	ž	J, zh or j	ž	g̃
რ	/r/	r	r	r	r	r	r	r
ს	/s/	s	s	s	s	s	s	s
ტ	/tʼ/	tʼ	t	t	t	t	t̆	t̉
ჳ	/w/			w	w		ŭ	f̃
უ	/u/	u	u	u	u	u	u	u
ფ	/pʰ/	p	pʼ	p̕	pʻ	p or f	p	p / p̊
ქ	/kʰ/	k	kʼ	k̕	kʻ	q or k	q or k	k / k̊
ღ	/ʁ/	gh	gh	ḡ	ġ	g, gh or R	g, gh or R	q̃
ყ	/qʼ/	qʼ	q	q	q	y	q	q
შ	/ʃ/	sh	sh	š	š	sh or S	š	x
ჩ	/t͡ʃ(ʰ)/	ch	chʼ	č̕	čʻ	ch or C	č	c̃
ც	/t͡s(ʰ)/	ts	tsʼ	c̕	cʻ	c or ts	c	c
ძ	/d͡z/	dz	dz	j	ż	dz or Z	ʒ	d̃
წ	/t͡sʼ/	tsʼ	ts	c	c	w, c or ts	ʃ	c̉
ჭ	/t͡ʃʼ/	chʼ	ch	č	č	W, ch or tch	ʃ̌	j̉
ხ	/χ/	kh	kh	x	x	x or kh (rarely)	x	k̃
ჴ	/q/, /qʰ/		qʼ	ẖ	x̣		q̌	q̊
ჯ	/d͡ʒ/	j	j	ǰ	j	j	-	j
ჰ	/h/	h	h	h	h	h	h	h
ჵ	/oː/			ō	ō		ȯ	h̃
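The National system and BGN/PCGN columns differ mainly in where the apostrophe goes: the National system marks the ejectives (კ kʼ, ტ tʼ, პ pʼ), while BGN/PCGN marks the aspirates (ქ kʼ, თ tʼ, ფ pʼ). A minimal sketch using a hand-copied subset of the table shows the effect on Georgia's native name, საქართველო. The apostrophe is assumed to be U+02BC MODIFIER LETTER APOSTROPHE, and the dictionary and function names are illustrative.

```python
APOS = "\u02bc"  # assumed: modifier letter apostrophe, as printed in the table

# Subset of the table: letter -> (National system 2002, BGN/PCGN 1981-2009).
GEORGIAN = {
    "ა": ("a", "a"),
    "ე": ("e", "e"),
    "ვ": ("v", "v"),
    "თ": ("t", "t" + APOS),   # aspirated /tʰ/: BGN/PCGN marks it
    "კ": ("k" + APOS, "k"),   # ejective /kʼ/: National system marks it
    "ლ": ("l", "l"),
    "ო": ("o", "o"),
    "რ": ("r", "r"),
    "ს": ("s", "s"),
    "ქ": ("k", "k" + APOS),   # aspirated /kʰ/: BGN/PCGN marks it
}
SYSTEMS = {"national": 0, "bgn": 1}


def romanize(text: str, system: str) -> str:
    col = SYSTEMS[system]
    return "".join(GEORGIAN[ch][col] for ch in text)


name = "საქართველო"  # Georgia's native name
print(romanize(name, "national"))  # sakartvelo
print(romanize(name, "bgn"))       # sakʼartʼvelo
```

Because the two systems assign the apostrophe to opposite consonant series, a romanized kʼ cannot be interpreted without knowing which system produced it.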

Overview and summary

Romanized	IPA	Greek	Cyrillic	Amazigh	Hebrew	Arabic	Persian	Katakana	Hangul	Bopomofo
A	a	A	А	ⴰ	ַ, ֲ, ָ	َ, ا	ا, آ	ア	ㅏ	ㄚ
AE	ai̯/ɛ	ΑΙ							ㅐ
AI	ai				י ַ					ㄞ
B	b	ΜΠ, Β	Б	ⴱ	בּ	ﺏ ﺑ ﺒ ﺐ	ﺏ ﺑ		ㅂ	ㄅ
C	k/s	Ξ								ㄘ
CH	ʧ	TΣ̈	Ч		צ׳		چ		ㅊ	ㄔ
CHI	ʨi							チ
D	d	ΝΤ, Δ	Д	ⴷ, ⴹ	ד	ﺩ — ﺪ, ﺽ ﺿ ﻀ ﺾ	د		ㄷ	ㄉ
DH	ð	Δ			דֿ	ﺫ — ﺬ
DZ	ʣ	ΤΖ	Ѕ
E	e/ɛ	Ε, ΑΙ	Э	ⴻ	, ֱ, י ֵֶ, ֵ, י ֶ			エ	ㅔ	ㄟ
EO	ʌ								ㅓ
EU	ɯ								ㅡ
F	f	Φ	Ф	ⴼ	פ (or its final form ף )	ﻑ ﻓ ﻔ ﻒ	ﻑ			ㄈ
FU	ɸɯ							フ
G	ɡ	ΓΓ, ΓΚ, Γ	Г	ⴳ, ⴳⵯ	ג		گ		ㄱ	ㄍ
GH	ɣ	Γ	Ғ	ⵖ	גֿ, עֿ	ﻍ ﻏ ﻐ ﻎ	ق غ
H	h	Η	Һ	ⵀ, ⵃ	ח, ה	ﻩ ﻫ ﻬ ﻪ, ﺡ ﺣ ﺤ ﺢ	ه ح ﻫ		ㅎ	ㄏ
HA	ha							ハ
HE	he							ヘ
HI	hi							ヒ
HO	ho							ホ
I	i/ɪ	Η, Ι, Υ, ΕΙ, ΟΙ	И, І	ⵉ	ִ, י ִ	دِ		イ	ㅣ	ㄧ
IY	ij					دِي
J	ʤ	TZ̈	ДЖ, Џ	ⵊ	ג׳	ﺝ ﺟ ﺠ ﺞ	ج		ㅈ	ㄐ
JJ	ʦ͈/ʨ͈								ㅉ
K	k	Κ	К	ⴽ, ⴽⵯ	כּ	ﻙ ﻛ ﻜ ﻚ	ک		ㅋ	ㄎ
KA	ka							カ
KE	ke							ケ
KH	x	X	Х	ⵅ	כ, חֿ (or its final form ך )	ﺥ ﺧ ﺨ ﺦ	خ
KI	ki							キ
KK	k͈								ㄲ
KO	ko							コ
KU	kɯ							ク
L	l	Λ	Л	ⵍ	ל	ﻝ ﻟ ﻠ ﻞ	ل		ㄹ	ㄌ
M	m	Μ	М	ⵎ	מ (or its final form ם )	ﻡ ﻣ ﻤ ﻢ	م		ㅁ	ㄇ
MA	ma							マ
ME	me							メ
MI	mi							ミ
MO	mo							モ
MU	mɯ							ム
N	n	Ν	Н	ⵏ	נ (or its final form ן )	ﻥ ﻧ ﻨ ﻦ	ن	ン	ㄴ	ㄋ
NA	na							ナ
NE	ne							ネ
NG	ŋ								ㅇ
NI	ɲi							ニ
NO	no							ノ
NU	nɯ							ヌ
O	o	Ο, Ω	О		, ֳ, וֹֹ		ُا	オ	ㅗ
OE	ø								ㅚ
P	p	Π	П		פּ		پ		ㅍ	ㄆ
PP	p͈								ㅃ
PS	ps	Ψ
Q	q	Θ		ⵇ	ק	ﻕ ﻗ ﻘ ﻖ	غ ق			ㄑ
R	r	Ρ	Р	ⵔ, ⵕ	ר	ﺭ — ﺮ	ر		ㄹ	ㄖ
RA	ɾa							ラ
RE	ɾe							レ
RI	ɾi							リ
RO	ɾo							ロ
RU	ɾɯ							ル
S	s	Σ	С	ⵙ, ⵚ	ס, שׂ	ﺱ ﺳ ﺴ ﺲ, ﺹ ﺻ ﺼ ﺺ	س ث ص		ㅅ	ㄙ
SA	sa							サ
SE	se							セ
SH	ʃ	Σ̈	Ш	ⵛ	שׁ	ﺵ ﺷ ﺸ ﺶ	ش			ㄕ
SHCH	ʃʧ		Щ
SHI	ɕi							シ
SO	so							ソ
SS	s͈								ㅆ
SU	sɯ							ス
T	t	Τ	Т	ⵜ, ⵟ	ט, תּ, ת	ﺕ ﺗ ﺘ ﺖ, ﻁ ﻃ ﻄ ﻂ	ت ط		ㅌ	ㄊ
TA	ta							タ
TE	te							テ
TH	θ	Θ			תֿ	ﺙ ﺛ ﺜ ﺚ
TO	to							ト
TS	ʦ	ΤΣ	Ц		צ (or its final form ץ )
TSU	ʦɯ							ツ
TT	t͈								ㄸ
U	u	ΟΥ, Υ	У	ⵓ	, וֻּ	دُ		ウ	ㅜ	ㄩ
UI	ɰi								ㅢ
UW	uw					دُو
V	v	B	В		ב		و
W	w	Ω		ⵡ	ו, וו	ﻭ — ﻮ
WA	wa							ワ	ㅘ
WAE	wɛ								ㅙ
WE	we							ヱ	ㅞ
WI	y/ɥi							ヰ	ㅟ
WO	wo							ヲ	ㅝ
X	x/ks	Ξ, Χ								ㄒ
Y	j	Υ, Ι, ΓΙ	Й, Ы, Ј	ⵢ	י	ﻱ ﻳ ﻴ ﻲ	ی
YA	ja		Я					ヤ	ㅑ
YAE	jɛ								ㅒ
YE	je		Е, Є						ㅖ
YEO	jʌ								ㅕ
YI	ji		Ї
YO	jo		Ё					ヨ	ㅛ
YU	ju		Ю					ユ	ㅠ
Z	z	Ζ	З	ⵣ, ⵥ	ז	ﺯ — ﺰ, ﻅ ﻇ ﻈ ﻆ	ز ظ ذ ض			ㄗ
ZH	ʐ/ʒ	Ζ̈	Ж		ז׳		ژ			ㄓ
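Read as a lookup table, the Cyrillic column of the overview gives a simple one-way romanizer. The sketch below copies a few rows (the subset and function name are illustrative; individual romanization systems differ in detail). Because several outputs are digraphs, different inputs can produce the same string: Ц gives TS, and so does Т followed by С, so the result cannot always be reversed. Some systems add a separator for such cases.

```python
# A few rows of the overview's Cyrillic column (illustrative subset).
CYRILLIC = {
    "\u0410": "A",     # А
    "\u0412": "V",     # В
    "\u0401": "YO",    # Ё
    "\u0420": "R",     # Р
    "\u0421": "S",     # С
    "\u0422": "T",     # Т
    "\u0423": "U",     # У
    "\u0425": "KH",    # Х
    "\u0426": "TS",    # Ц
    "\u0429": "SHCH",  # Щ
}


def romanize(text: str) -> str:
    # The table lists capitals, so fold the input to upper case first.
    return "".join(CYRILLIC[ch] for ch in text.upper())


print(romanize("\u0425\u0440\u0443\u0449\u0451\u0432"))  # Хрущёв -> KHRUSHCHYOV
# Two different inputs, one output: Ца and Тса both give TSA.
print(romanize("\u0426\u0430"), romanize("\u0422\u0441\u0430"))  # TSA TSA
```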
