KOI-8 code table. KOI8-R encoding

- Departure (@ComradZampolit) August 17, 2017

How does KOI8-R work?

KOI8-R is an eight-bit code page designed to encode the letters of Cyrillic alphabets. Its developers placed the Russian letters in the upper half of the table so that the position of each Cyrillic letter corresponds to its phonetic analogue in the English alphabet in the lower half. If you remove the eighth bit from every character of a text written in this encoding, you get text that reads like Latin transliteration.
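A minimal Python sketch of this property, assuming Python 3 and its built-in `koi8_r` codec (the function name `strip_eighth_bit` is illustrative). It encodes a word, clears the eighth bit of every byte and reads the result as ASCII. Note that the letter case comes out inverted: KOI8-R keeps lowercase Cyrillic in C0-DF, which maps onto the uppercase Latin range, and uppercase Cyrillic in E0-FF, which maps onto lowercase Latin.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-R, clear bit 7 of every byte, decode the rest as ASCII."""
    raw = text.encode("koi8_r")
    # b & 0x7F keeps the low seven bits and drops the eighth bit (value 128)
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_eighth_bit("Привет"))         # pRIWET
print(strip_eighth_bit("Русский Текст"))  # rUSSKIJ tEKST
```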

This information interchange code was used in the 1970s on ES EVM (Unified System) series computers, and from the mid-1980s it was used in the first Russified versions of the Unix operating system.

Encoding meant assigning each character a unique code from 00000000 to 11111111. A person tells characters apart by their shapes, while the computer tells them apart by their codes.

Is the encoding still used today?

No. It mattered on old eight-bit computers; today Unicode, in its various encoding forms, is mainly used.


Alternative encoding

The "Alternative encoding" is based on code page CP437: all the Western European characters in its upper half were replaced with Cyrillic letters, while the pseudographic (box-drawing) characters were left intact. As a result, programs that draw text-mode windows keep their appearance, and Cyrillic characters can still be used in them.

Historically there were many variants of the Alternative encoding, but they differ only in the 0xF0-0xFF range (240-255). The final standard became IBM's CP866, support for which was added in MS-DOS 6.22 (before that, all sorts of home-made patches were used). The Alternative encoding is still alive and very popular in DOS and OS/2 environments. File names in the FAT file system are also recorded in this encoding. CP866 is still used in the console of Russified Windows NT-family systems.

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	А 0410	Б 0411	В 0412	Г 0413	Д 0414	Е 0415	Ж 0416	З 0417	И 0418	Й 0419	К 041A	Л 041B	М 041C	Н 041D	О 041E	П 041F
9.	Р 0420	С 0421	Т 0422	У 0423	Ф 0424	Х 0425	Ц 0426	Ч 0427	Ш 0428	Щ 0429	Ъ 042A	Ы 042B	Ь 042C	Э 042D	Ю 042E	Я 042F
A.	а 0430	б 0431	в 0432	г 0433	д 0434	е 0435	ж 0436	з 0437	и 0438	й 0439	к 043A	л 043B	м 043C	н 043D	о 043E	п 043F
B.	░ 2591	▒ 2592	▓ 2593	│ 2502	┤ 2524	╡ 2561	╢ 2562	╖ 2556	╕ 2555	╣ 2563	║ 2551	╗ 2557	╝ 255D	╜ 255C	╛ 255B	┐ 2510
C.	└ 2514	┴ 2534	┬ 252C	├ 251C	─ 2500	┼ 253C	╞ 255E	╟ 255F	╚ 255A	╔ 2554	╩ 2569	╦ 2566	╠ 2560	═ 2550	╬ 256C	╧ 2567
D.	╨ 2568	╤ 2564	╥ 2565	╙ 2559	╘ 2558	╒ 2552	╓ 2553	╫ 256B	╪ 256A	┘ 2518	┌ 250C	█ 2588	▄ 2584	▌ 258C	▐ 2590	▀ 2580
E.	р 0440	с 0441	т 0442	у 0443	ф 0444	х 0445	ц 0446	ч 0447	ш 0448	щ 0449	ъ 044A	ы 044B	ь 044C	э 044D	ю 044E	я 044F
F.	Ё 0401	ё 0451	Є 0404	є 0454	Ї 0407	ї 0457	Ў 040E	ў 045E	° 00B0	∙ 2219	· 00B7	√ 221A	№ 2116	¤ 00A4	■ 25A0	NBSP 00A0

ISO 8859-5 is an 8-bit encoding from the ISO 8859 series for writing Cyrillic. It is hardly used in Russia. In general, ISO 8859-5 is not a very convenient encoding, since it lacks many needed characters, such as the em dash (U+2014), guillemets («»), the degree sign (°), etc.

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	0080	0081	0082	0083	0084	0085	0086	0087	0088	0089	008A	008B	008C	008D	008E	008F
9.	0090	0091	0092	0093	0094	0095	0096	0097	0098	0099	009A	009B	009C	009D	009E	009F
A.	NBSP 00A0	Ё 0401	Ђ 0402	Ѓ 0403	Є 0404	Ѕ 0405	І 0406	Ї 0407	Ј 0408	Љ 0409	Њ 040A	Ћ 040B	Ќ 040C	SHY 00AD	Ў 040E	Џ 040F
B.	А 0410	Б 0411	В 0412	Г 0413	Д 0414	Е 0415	Ж 0416	З 0417	И 0418	Й 0419	К 041A	Л 041B	М 041C	Н 041D	О 041E	П 041F
C.	Р 0420	С 0421	Т 0422	У 0423	Ф 0424	Х 0425	Ц 0426	Ч 0427	Ш 0428	Щ 0429	Ъ 042A	Ы 042B	Ь 042C	Э 042D	Ю 042E	Я 042F
D.	а 0430	б 0431	в 0432	г 0433	д 0434	е 0435	ж 0436	з 0437	и 0438	й 0439	к 043A	л 043B	м 043C	н 043D	о 043E	п 043F
E.	р 0440	с 0441	т 0442	у 0443	ф 0444	х 0445	ц 0446	ч 0447	ш 0448	щ 0449	ъ 044A	ы 044B	ь 044C	э 044D	ю 044E	я 044F
F.	№ 2116	ё 0451	ђ 0452	ѓ 0453	є 0454	ѕ 0455	і 0456	ї 0457	ј 0458	љ 0459	њ 045A	ћ 045B	ќ 045C	§ 00A7	ў 045E	џ 045F

KOI-8 (information interchange code, 8 bits), or KOI8, is an eight-bit character encoding standard designed to encode the letters of Cyrillic alphabets. There is also a seven-bit version, KOI-7. Both KOI-7 and KOI-8 are described in GOST 19768-74 (no longer in force).

The KOI-8 developers placed the Russian letters in the upper half of the extended ASCII table so that the positions of the Cyrillic characters correspond to their phonetic analogues in the English alphabet in the lower half. This means that if you remove the eighth bit from every character of a text written in KOI-8, you still get "readable" text, albeit in Latin letters and with the letter case inverted. For example, the words «Русский Текст» ("Russian Text") turn into "rUSSKIJ tEKST". As a side effect, the Cyrillic letters ended up not in alphabetical order.



KOI-8 became the first standardized Russian encoding on the Internet.

The IETF published several RFCs describing KOI-8 variants:

RFC 1489 - KOI8-R (letters of the Russian alphabet);
RFC 2319 - KOI8-U (letters of the Ukrainian alphabet);
RFC 1345 - ISO-IR-111 (with an error in the definition of the main range).

In the tables below, the number in each cell is the hexadecimal Unicode code point of the character.

KOI8-R encoding (Russian)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	─ 2500	│ 2502	┌ 250C	┐ 2510	└ 2514	┘ 2518	├ 251C	┤ 2524	┬ 252C	┴ 2534	┼ 253C	▀ 2580	▄ 2584	█ 2588	▌ 258C	▐ 2590
9.	░ 2591	▒ 2592	▓ 2593	⌠ 2320	■ 25A0	∙ 2219	√ 221A	≈ 2248	≤ 2264	≥ 2265	NBSP 00A0	⌡ 2321	° 00B0	² 00B2	· 00B7	÷ 00F7
A.	═ 2550	║ 2551	╒ 2552	ё 0451	╓ 2553	╔ 2554	╕ 2555	╖ 2556	╗ 2557	╘ 2558	╙ 2559	╚ 255A	╛ 255B	╜ 255C	╝ 255D	╞ 255E
B.	╟ 255F	╠ 2560	╡ 2561	Ё 0401	╢ 2562	╣ 2563	╤ 2564	╥ 2565	╦ 2566	╧ 2567	╨ 2568	╩ 2569	╪ 256A	╫ 256B	╬ 256C	© 00A9
C.	ю 044E	а 0430	б 0431	ц 0446	д 0434	е 0435	ф 0444	г 0433	х 0445	и 0438	й 0439	к 043A	л 043B	м 043C	н 043D	о 043E
D.	п 043F	я 044F	р 0440	с 0441	т 0442	у 0443	ж 0436	в 0432	ь 044C	ы 044B	з 0437	ш 0448	э 044D	щ 0449	ч 0447	ъ 044A
E.	Ю 042E	А 0410	Б 0411	Ц 0426	Д 0414	Е 0415	Ф 0424	Г 0413	Х 0425	И 0418	Й 0419	К 041A	Л 041B	М 041C	Н 041D	О 041E
F.	П 041F	Я 042F	Р 0420	С 0421	Т 0422	У 0423	Ж 0416	В 0412	Ь 042C	Ы 042B	З 0417	Ш 0428	Э 042D	Щ 0429	Ч 0427	Ъ 042A
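Every cell in these tables is simply what one byte decodes to, so a row can be regenerated and checked with Python's built-in `koi8_r` codec. A minimal sketch; the helper name `table_row` is hypothetical, and it assumes a codec that defines every byte in the requested row (true for KOI8-R).

```python
def table_row(codec: str, row: int) -> list:
    """Return the 16 cells of one table row (row 0xC covers bytes C0-CF)
    as 'character code-point' strings, in the same layout as the tables here."""
    cells = []
    for col in range(16):
        char = bytes([row * 16 + col]).decode(codec)
        cells.append(f"{char} {ord(char):04X}")
    return cells


# Print row C of KOI8-R; its first cells are ю 044E, а 0430, б 0431
print("\t".join(table_row("koi8_r", 0xC)))
```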

Other variants

Only the table rows that differ from KOI8-R are shown; everything else is identical.

KOI8-U encoding (Russian-Ukrainian)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
A.	═ 2550	║ 2551	╒ 2552	ё 0451	є 0454	╔ 2554	і 0456	ї 0457	╗ 2557	╘ 2558	╙ 2559	╚ 255A	╛ 255B	ґ 0491	╝ 255D	╞ 255E
B.	╟ 255F	╠ 2560	╡ 2561	Ё 0401	Є 0404	╣ 2563	І 0406	Ї 0407	╦ 2566	╧ 2567	╨ 2568	╩ 2569	╪ 256A	Ґ 0490	╬ 256C	© 00A9
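To see which bytes actually differ between two KOI-8 variants, you can compare Python's built-in codecs byte by byte. A minimal sketch assuming the standard `koi8_r` and `koi8_u` codecs; the helper `differing_bytes` is hypothetical. Assuming Python's `koi8_u` follows RFC 2319, it should report the positions of the Ukrainian letters in the rows above (A4, A6, A7, AD, B4, B6, B7, BD).

```python
def differing_bytes(codec_a: str, codec_b: str) -> dict:
    """Map every upper-half byte (80-FF) that decodes differently in the two
    codecs to the pair of characters it produces."""
    diff = {}
    for code in range(0x80, 0x100):
        char_a = bytes([code]).decode(codec_a)
        char_b = bytes([code]).decode(codec_b)
        if char_a != char_b:
            diff[code] = (char_a, char_b)
    return diff


for code, (r_char, u_char) in differing_bytes("koi8_r", "koi8_u").items():
    print(f"{code:02X}: KOI8-R {r_char}  KOI8-U {u_char}")
```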

KOI8-RU encoding (Russian-Belarusian-Ukrainian)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
A.	═ 2550	║ 2551	╒ 2552	ё 0451	є 0454	╔ 2554	і 0456	ї 0457	╗ 2557	╘ 2558	╙ 2559	╚ 255A	╛ 255B	ґ 0491	ў 045E	╞ 255E
B.	╟ 255F	╠ 2560	╡ 2561	Ё 0401	Є 0404	╣ 2563	І 0406	Ї 0407	╦ 2566	╧ 2567	╨ 2568	╩ 2569	╪ 256A	Ґ 0490	Ў 040E	© 00A9

KOI8-C encoding (Central Asian)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	ғ 0493	җ 0497	қ 049B	ҝ 049D	ң 04A3	ү 04AF	ұ 04B1	ҳ 04B3	ҷ 04B7	ҹ 04B9	һ 04BB	▀ 2580	ә 04D9	ӣ 04E3	ө 04E9	ӯ 04EF
9.	Ғ 0492	Җ 0496	Қ 049A	Ҝ 049C	Ң 04A2	Ү 04AE	Ұ 04B0	Ҳ 04B2	Ҷ 04B6	Ҹ 04B8	Һ 04BA	⌡ 2321	Ә 04D8	Ӣ 04E2	Ө 04E8	Ӯ 04EE
A.	NBSP 00A0	ђ 0452	ѓ 0453	ё 0451	є 0454	ѕ 0455	і 0456	ї 0457	ј 0458	љ 0459	њ 045A	ћ 045B	ќ 045C	ґ 0491	ў 045E	џ 045F
B.	№ 2116	Ђ 0402	Ѓ 0403	Ё 0401	Є 0404	Ѕ 0405	І 0406	Ї 0407	Ј 0408	Љ 0409	Њ 040A	Ћ 040B	Ќ 040C	Ґ 0490	Ў 040E	Џ 040F

KOI8-T encoding (Tajik)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	қ 049B	ғ 0493	‚ 201A	Ғ 0492	„ 201E	… 2026	† 2020	‡ 2021		‰ 2030	ҳ 04B3	‹ 2039	Ҳ 04B2	ҷ 04B7	Ҷ 04B6
9.	Қ 049A	‘ 2018	’ 2019	“ 201C	” 201D	• 2022	– 2013	— 2014		™ 2122		› 203A
A.		ӯ 04EF	Ӯ 04EE	ё 0451	¤ 00A4	ӣ 04E3	¦ 00A6	§ 00A7				« 00AB	¬ 00AC	SHY 00AD	® 00AE
B.	° 00B0	± 00B1	² 00B2	Ё 0401		Ӣ 04E2	¶ 00B6	· 00B7		№ 2116		» 00BB				© 00A9

KOI8-O, KOI8-S encoding (Slavonic, pre-reform orthography)

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	Ђ 0402	Ѓ 0403	¸ 00B8	ѓ 0453	„ 201E	… 2026	† 2020	§ 00A7	€ 20AC	¨ 00A8	Љ 0409	‹ 2039	Њ 040A	Ќ 040C	Ћ 040B	Џ 040F
9.	ђ 0452	‘ 2018	’ 2019	“ 201C	” 201D	• 2022	– 2013	— 2014	£ 00A3	· 00B7	љ 0459	› 203A	њ 045A	ќ 045C	ћ 045B	џ 045F
A.	NBSP 00A0	ѵ 0475	ѣ 0463	ё 0451	є 0454	ѕ 0455	і 0456	ї 0457	ј 0458	® 00AE	™ 2122	« 00AB	ѳ 0473	ґ 0491	ў 045E	´ 00B4
B.	° 00B0	Ѵ 0474	Ѣ 0462	Ё 0401	Є 0404	Ѕ 0405	І 0406	Ї 0407	Ј 0408	№ 2116	¢ 00A2	» 00BB	Ѳ 0472	Ґ 0490	Ў 040E	© 00A9

ISO-IR-111, KOI8-E encoding

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
A.	NBSP 00A0	ђ 0452	ѓ 0453	ё 0451	є 0454	ѕ 0455	і 0456	ї 0457	ј 0458	љ 0459	њ 045A	ћ 045B	ќ 045C	SHY 00AD	ў 045E	џ 045F
B.	№ 2116	Ђ 0402	Ѓ 0403	Ё 0401	Є 0404	Ѕ 0405	І 0406	Ї 0407	Ј 0408	Љ 0409	Њ 040A	Ћ 040B	Ќ 040C	¤ 00A4	Ў 040E	Џ 040F

KOI8-Unified encoding, KOI8-F

The KOI8-Unified encoding (KOI8-F) was proposed by FingerTip Software.

	.0	.1	.2	.3	.4	.5	.6	.7	.8	.9	.A	.B	.C	.D	.E	.F
8.	─ 2500	│ 2502	┌ 250C	┐ 2510	└ 2514	┘ 2518	├ 251C	┤ 2524	┬ 252C	┴ 2534	┼ 253C	▀ 2580	▄ 2584	█ 2588	▌ 258C	▐ 2590
9.	░ 2591	‘ 2018	’ 2019	“ 201C	” 201D	• 2022	– 2013	— 2014	© 00A9	™ 2122	NBSP 00A0	» 00BB	® 00AE	« 00AB	· 00B7	¤ 00A4
A.	NBSP 00A0	ђ 0452	ѓ 0453	ё 0451	є 0454	ѕ 0455	і 0456	ї 0457	ј 0458	љ 0459	њ 045A	ћ 045B	ќ 045C	ґ 0491	ў 045E	џ 045F
B.	№ 2116	Ђ 0402	Ѓ 0403	Ё 0401	Є 0404	Ѕ 0405	І 0406	Ї 0407	Ј 0408	Љ 0409	Њ 040A	Ћ 040B	Ќ 040C	Ґ 0490	Ў 040E	Џ 040F

Non-Cyrillic variants of KOI-8

In some CMEA (Comecon) countries, KOI-8 modifications were created for national Latin-based alphabets. The basic idea was the same: when the eighth bit is "cut off", the text should remain more or less readable.


Hello, dear readers of the blog. Today we will talk about where krakozyabry (mojibake: garbled, unreadable characters) come from on websites and in programs, what text encodings exist, and which of them should be used. We will look in detail at the history of their development, from basic ASCII and its extended versions CP866, KOI8-R and Windows-1251 to the modern Unicode Consortium encodings UTF-16 and UTF-8.

Some may find this information unnecessary, but you would be surprised how many questions I get specifically about krakozyabry (unreadable sets of characters). Now I will be able to point everyone to this article and let them track down their own mistakes. So get ready to absorb the information and try to follow the story.

ASCII - the basic text encoding for the Latin alphabet

Text encodings developed together with the IT industry itself, and over that time they have undergone quite a few changes. Historically, it all started with EBCDIC, a name that sounds rather unpleasant when pronounced in Russian, which made it possible to encode the letters of the Latin alphabet, Arabic numerals and punctuation marks, along with control characters.

Still, the starting point for modern text encodings should be considered the famous ASCII (American Standard Code for Information Interchange, usually pronounced "Aski" in Russian). It describes the first 128 characters most commonly used by English-speaking users: Latin letters, Arabic numerals and punctuation marks.

These 128 ASCII characters also include some service symbols such as brackets, hash signs, asterisks, etc. You can see them for yourself in any ASCII table.

These 128 characters from the original version of ASCII became the standard: you will find them in any other encoding, and always in the same positions.
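One way to list the printable part of ASCII without a screenshot is a short Python sketch like the one below (the helper name `ascii_row` is illustrative). It prints rows 2 to 7 of the table (codes 20-7F) in the same "character code" layout as the tables in this article; rows 0 and 1 hold control characters and are skipped.

```python
def ascii_row(row: int) -> list:
    """Cells for one row of the basic ASCII table (row 2 covers codes 20-2F)."""
    cells = []
    for col in range(16):
        code = row * 16 + col
        char = chr(code)
        # 0x7F (DEL) is a control code, so only its hex value is shown
        cells.append(f"{char} {code:02X}" if char.isprintable() else f"{code:02X}")
    return cells


for row in range(2, 8):  # codes 00-1F (rows 0-1) are control characters
    print("\t".join(ascii_row(row)))
```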

But one byte of information can hold not 128 but as many as 256 different values (two to the eighth power is 256), so after the basic version of ASCII a number of extended ASCII encodings appeared, which, in addition to the 128 basic characters, could also encode the characters of a national alphabet (for example, Russian).

Here it is probably worth saying a bit more about the number systems used in these descriptions. First, as you all know, a computer works only with numbers in the binary system, that is, with zeros and ones (Boolean algebra, if anyone studied it at university or school). Each of the eight bits of a byte stands for a power of two, from two to the zeroth power for the rightmost bit up to two to the seventh for the leftmost.

It is easy to see that such a structure allows only 256 combinations of zeros and ones. Converting a number from binary to decimal is simple: just add up the powers of two at every position that holds a one.

In our example (the byte 11101001), that gives 1 (two to the zeroth power) plus 8 (two to the third), plus 32 (two to the fifth), plus 64 (two to the sixth), plus 128 (two to the seventh), for a total of 233 in decimal. As you can see, it is all very simple.
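The rule "add up the powers of two wherever there is a one" can be written out directly. A minimal Python sketch (the function name `binary_to_decimal` is illustrative); Python's built-in `int(..., 2)` does the same conversion and is shown for comparison.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the powers of two at every position that holds a 1."""
    total = 0
    for power, bit in enumerate(reversed(bits)):  # rightmost bit is 2**0
        if bit == "1":
            total += 2 ** power
    return total


print(binary_to_decimal("11101001"))  # 233 = 1 + 8 + 32 + 64 + 128
print(int("11101001", 2))             # 233, the built-in equivalent
print(2 ** 8)                         # 256 possible values in one byte
```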

But if you look at the ASCII table, you will see that the characters are given in hexadecimal. For example, the asterisk corresponds to hexadecimal 2A in ASCII. You probably know that the hexadecimal number system uses, in addition to the Arabic digits, the Latin letters A (meaning ten) through F (meaning fifteen).

To convert binary numbers to hexadecimal, there is a simple, visual method. Each byte of information is split into two halves of four bits, as shown in the screenshot above. Each half-byte can encode only sixteen values (two to the fourth power), which is exactly one hexadecimal digit.

Note that in the left half of the byte the powers must again be counted from zero, not as shown in the screenshot. After some simple arithmetic we find that the screenshot encodes the number E9 (1110 is E, 1001 is 9). I hope my reasoning and the solution to this little puzzle were clear. Now let's get back to the actual topic: text encodings.
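The half-byte method can be sketched in a few lines of Python (the names `HEX_DIGITS` and `byte_to_hex` are illustrative). Each four-bit half is evaluated on its own, with powers counted from 2 to the zeroth power again, and turned into one hexadecimal digit.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(bits: str) -> str:
    """Split an 8-bit string into two 4-bit halves and map each to one hex digit."""
    high, low = bits[:4], bits[4:]
    # int(half, 2) evaluates each half independently, starting again at 2**0
    return HEX_DIGITS[int(high, 2)] + HEX_DIGITS[int(low, 2)]


print(byte_to_hex("11101001"))  # E9  (1110 -> E, 1001 -> 9)
print(format(233, "02X"))       # E9, the same byte from its decimal value
print(format(ord("*"), "02X"))  # 2A, the asterisk's ASCII code
```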

Extended versions of ASCII - CP866 and KOI8-R encodings with pseudographics

So, we started with ASCII, the starting point for the development of all modern encodings (Windows-1251, Unicode, UTF-8).

Initially it contained only 128 characters: the Latin alphabet, Arabic numerals and a few other things. The extended versions, however, could use all 256 values that fit into one byte of information, which made it possible to add the letters of one's own language to ASCII.

Here we need to digress once more to explain why text encodings are needed and why they matter so much. The characters on your screen are formed from two things: sets of vector shapes (glyphs) for all kinds of characters, which are stored in font files, and a code that lets the program pull exactly the right character out of that set of shapes (the font file) and insert it in the right place.

Fonts are responsible for the vector shapes, while the operating system and the programs running in it are responsible for the encoding. In other words, any text on your computer is a sequence of bytes, and in a single-byte encoding each byte encodes exactly one character of the text.

A program that displays this text on the screen (a text editor, a browser, etc.) reads the code of each successive character while parsing and looks up the corresponding vector shape in the font file used to display the document. It is all simple and straightforward.

So, to encode any character we need (for example, one from a national alphabet), two conditions must be met: the glyph for that character must exist in the font being used, and the character must be encodable in one byte in an extended ASCII encoding. That is why so many such variants exist. For Russian alone there are several varieties of extended ASCII.

For example, CP866 appeared first: it was an extended version of ASCII in which Russian letters could be used.

Its first half (codes 0-127) fully matched basic ASCII (128 Latin characters, digits and assorted other symbols), shown in the screenshot above, while the second half of the CP866 table, shown in the screenshot below, made it possible to encode another 128 characters: Russian letters and assorted pseudographics.

Note that the row labels start at 8, because rows 0 to 7 belong to the basic part of ASCII (see the first screenshot). So the Russian letter "Ь" has code 9C in CP866 (it sits at the intersection of row 9 and column C in hexadecimal), which fits in one byte, and if a suitable font with Russian characters is available, this letter will be displayed in the text without problems.
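Finding a character's row and column in the CP866 table is just a matter of splitting its one-byte code into two hexadecimal digits. A minimal sketch using Python's built-in `cp866` codec; the helper name `cp866_cell` is illustrative.

```python
def cp866_cell(char: str) -> tuple:
    """Return (row, column) of a character in the CP866 table, as hex digits."""
    code = char.encode("cp866")[0]  # single-byte encoding: exactly one byte
    return f"{code >> 4:X}", f"{code & 0x0F:X}"


print(cp866_cell("Ь"))  # ('9', 'C')
print(cp866_cell("М"))  # ('8', 'C')
print(cp866_cell("р"))  # ('E', '0')
```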

Where did so many pseudographic characters in CP866 come from? The point is that this encoding for Russian text was developed in those distant years when graphical operating systems were not as widespread as they are now. In DOS and similar text-mode operating systems, pseudographics made it possible to at least somewhat vary the design of text, which is why CP866 and all its relatives among the extended versions of ASCII are full of them.

CP866 was distributed by IBM, but a number of other encodings were also developed for Russian characters. KOI8-R, for example, is of the same type (extended ASCII).

It works on the same principle as the CP866 described above: each character of text is encoded by a single byte. The screenshot shows the second half of the KOI8-R table, because the first half fully matches basic ASCII, shown in the first screenshot of this article.

One notable feature of KOI8-R is that the Russian letters in its table are not in alphabetical order, unlike, for example, CP866.

If you look at the very first screenshot (the basic part included in all extended encodings), you will notice that in KOI8-R the Russian letters occupy the same table cells as the corresponding Latin letters in the first half of the table. This was done to make it easy to switch from Russian characters to Latin ones by discarding just one bit (two to the seventh power, or 128).
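A practical consequence of the non-alphabetical KOI8-R layout is that sorting text by raw byte values gives the wrong order. A minimal Python sketch with a hypothetical word list, comparing Unicode order, CP866 byte order and KOI8-R byte order (CP866 keeps а-я in sequence, though Ё and ё sit apart at F0-F1).

```python
words = ["вода", "царь", "банк", "арбуз"]  # hypothetical sample words

by_unicode = sorted(words)
by_cp866 = sorted(words, key=lambda w: w.encode("cp866"))
by_koi8r = sorted(words, key=lambda w: w.encode("koi8_r"))

print(by_unicode)  # ['арбуз', 'банк', 'вода', 'царь']
print(by_cp866)    # ['арбуз', 'банк', 'вода', 'царь']
print(by_koi8r)    # ['арбуз', 'банк', 'царь', 'вода'], since ц is C3 but в is D7
```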

Windows-1251 - a modern version of ASCII, and why krakozyabry appear

Text encodings developed further as graphical operating systems gained popularity and the need for pseudographics in them disappeared. As a result, a whole group of encodings arose that were, in essence, still extended versions of ASCII (each character of text is encoded with just one byte of information), but without pseudographic characters.

They belong to the so-called ANSI encodings, named after the American National Standards Institute. The variant supporting Russian was still commonly referred to as Cyrillic. One example is Windows-1251.

It compared favorably with the earlier CP866 and KOI8-R in that the positions of the pseudographic characters were taken by the missing characters of Russian typography (except the stress mark), as well as by characters used in Slavic languages close to Russian (Ukrainian, Belarusian, etc.).

Because of this abundance of encodings for Russian, font makers and software makers constantly had headaches, and you, dear readers, often ran into those notorious krakozyabry whenever the encoding used for a text was mixed up.

They very often appeared when sending and receiving e-mail, which led to the creation of very complex transcoding tables that still could not solve the problem at its root, and users often avoided Russian encodings such as CP866, KOI8-R or Windows-1251 in their correspondence altogether to escape the notorious krakozyabry.

In essence, the krakozyabry that appear instead of Russian text are the result of reading the text with an encoding that does not match the one in which the message was originally encoded.

For example, if you try to display characters encoded in CP866 using the Windows-1251 code table, you get exactly these krakozyabry (a meaningless set of characters) completely replacing the text of the message.
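The mix-up is easy to reproduce with Python's built-in `cp866`, `cp1251` and `koi8_r` codecs. A minimal sketch: the bytes are correct, only the table used to read them is wrong, so as long as no bytes were lost the text can be repaired by reversing the steps.

```python
original = "Привет"

# CP866 bytes read through the Windows-1251 table
garbled = original.encode("cp866").decode("cp1251")
print(garbled)  # ЏаЁўҐв

# Re-encode with the wrong table, decode with the right one
repaired = garbled.encode("cp1251").decode("cp866")
print(repaired)  # Привет

# KOI8-R bytes read as Windows-1251 give a different kind of garbage
koi_garbled = original.encode("koi8_r").decode("cp1251")
print(koi_garbled)  # рТЙЧЕФ
```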

A similar situation very often occurs on websites, forums or blogs, when text with Russian characters is mistakenly saved in an encoding other than the site's default, or in a text editor that adds characters to the code that are invisible to the naked eye.

In the end, many people grew tired of this situation with many encodings and constantly appearing krakozyabry, and the preconditions emerged for a new universal encoding that would replace all the existing ones and finally solve the problem of unreadable text at its root. In addition, there was the problem of languages such as Chinese, which have far more than 256 characters.

Unicode - universal encodings UTF-8, 16 and 32

These thousands of characters of East Asian languages could not be described in the single byte of information allotted to a character in the extended versions of ASCII. As a result, a consortium called Unicode (the Unicode Consortium) was founded with the participation of many IT industry leaders (software makers, hardware makers, font creators) who were interested in a universal text encoding.

One of the encoding forms published by the Unicode Consortium is UTF-32. The number in the encoding's name is the number of bits used to encode one character: 32 bits are 4 bytes of information, which UTF-32 needs to encode every single character.

As a result, the same text file encoded in extended ASCII and in UTF-32 will be four times larger in the latter case. That is bad, but in return 32 bits could in principle distinguish two to the thirty-second power of different characters (billions, which covers any real need with a huge margin).
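The fourfold growth is easy to check. A minimal Python sketch with a hypothetical sample string; it uses the `utf-32-le` codec because plain `utf-32` in Python also prepends a 4-byte byte order mark, which would skew the comparison.

```python
text = "Hello, Привет"                 # hypothetical sample text, 13 characters
single_byte = text.encode("cp1251")   # one byte per character
utf32 = text.encode("utf-32-le")      # four bytes per character, no byte order mark

print(len(text), len(single_byte), len(utf32))  # 13 13 52
```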

However, countries whose languages belong to the European group had no need at all for such a huge number of characters. Yet with UTF-32 their text documents became four times larger. That in turn increased Internet traffic and the volume of stored data. This is a lot, and nobody could afford such waste.

Unicode then gained UTF-16, which turned out to be so successful that its 16-bit range was adopted as the basic space for all the characters we use. It uses two bytes to encode each character of this basic space. Let's see what it looks like.

In Windows, follow the path "Start" - "Programs" - "Accessories" - "System Tools" - "Character Map". This opens a table with the glyphs of every font installed on your system. If you select the Unicode character set under "Advanced view", you can see the full range of characters included in each font separately.

Clicking on any character shows its two-byte code in UTF-16, written as four hexadecimal digits.
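The four hexadecimal digits shown in Character Map are the character's 16-bit code. The Python sketch below produces the same code for a character from the basic range. The helper name `utf16_code` is our own.

```python
def utf16_code(ch: str) -> str:
    """Return the UTF-16 code of a single basic-range character as four hex digits."""
    units = ch.encode("utf-16-be")  # big-endian, no BOM
    if len(units) != 2:
        raise ValueError(f"{ch!r} does not fit into a single 16-bit unit")
    return units.hex().upper()


print(utf16_code("A"))  # 0041
print(utf16_code("Ж"))  # 0416
```

For these characters the result equals the code point, so `format(ord("Ж"), "04X")` gives the same four digits.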

How many characters can UTF-16 encode with 16 bits? 65,536 (2 to the 16th power), and this number was taken as the basic space of Unicode. UTF-16 can also encode characters beyond this range by using a pair of 16-bit units (a surrogate pair). This extends the code space to about 1.1 million code points, with U+10FFFF as the highest.
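The arithmetic behind these numbers can be checked directly. Below is a sketch of how a code point above U+FFFF is split into a surrogate pair. The function name is hypothetical. The emoji U+1F600 is just a convenient example of a character outside the basic space.

```python
BASIC_SPACE = 2 ** 16    # 65,536 codes reachable with one 16-bit unit
SUPPLEMENTARY = 2 ** 20  # 1,048,576 codes reachable with a surrogate pair
TOTAL_CODE_POINTS = BASIC_SPACE + SUPPLEMENTARY  # 1,114,112 = 0x110000


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")                        # D83D DE00
print("\U0001F600".encode("utf-16-be").hex().upper())  # D83DDE00
```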

Even this successful Unicode encoding did not satisfy everyone. People who wrote, for example, programs only in English saw their documents double in size after switching from extended ASCII to UTF-16. ASCII uses one byte per character, while UTF-16 uses two bytes for the same character.
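The size differences described above can be measured directly. The sketch below compares a one-byte code page with UTF-16 and UTF-32. It uses the `-le` codec variants so that no BOM inflates the count. The function name is illustrative.

```python
def encoded_sizes(text: str, legacy: str) -> dict[str, int]:
    """Byte size of the same text in a one-byte code page, UTF-16 and UTF-32 (no BOM)."""
    return {
        legacy: len(text.encode(legacy)),
        "utf-16": len(text.encode("utf-16-le")),  # -le variants omit the BOM
        "utf-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("Hello, world", "ascii"))  # {'ascii': 12, 'utf-16': 24, 'utf-32': 48}
print(encoded_sizes("Привет, мир", "cp1251"))  # {'cp1251': 11, 'utf-16': 22, 'utf-32': 44}
```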

To satisfy everyone, a variable-length encoding was adopted. It is called UTF-8. Despite the eight in its name (which refers to the 8-bit unit), it really does have a variable length. In the original design, each text character could be encoded as a sequence of one to six bytes.

In practice, UTF-8 uses only one to four bytes. Since RFC 3629 (2003), longer sequences are not allowed, because every Unicode code point (up to U+10FFFF) fits into four bytes. All basic Latin characters are encoded in one byte, just as in good old ASCII.
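The variable length comes from a fixed bit layout. The leading bits of the first byte tell how many bytes follow, and every continuation byte starts with `10`. Below is a hand-written encoder, checked against Python's own. The function name is ours. It shows "A" taking 1 byte, "Ж" 2 bytes, Georgian "ა" 3 bytes and an emoji 4 bytes.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the UTF-8 bit layout (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx: ASCII, one byte
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


for ch in "AЖა\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # matches Python's built-in encoder
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) {encoded.hex(' ').upper()}")
```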

Notably, if a text contains only basic Latin characters, even programs that do not understand Unicode will still read it correctly in UTF-8. In other words, the basic part of ASCII carried over into this Unicode encoding unchanged.
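The sketch below simulates a program that knows only the Windows-1251 code page reading UTF-8 bytes. The helper name is hypothetical. Latin text survives unchanged, while Cyrillic turns into krakozyabry.

```python
def legacy_view(text: str, legacy_encoding: str = "cp1251") -> str:
    """Show how a program that knows only a one-byte code page reads UTF-8 bytes."""
    return text.encode("utf-8").decode(legacy_encoding, errors="replace")


print(legacy_view("Hello, world"))  # Hello, world  (ASCII bytes are identical in UTF-8)
print(legacy_view("Привет"))        # РџСЂРёРІРµС‚
```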

In UTF-8, Cyrillic characters are encoded in two bytes and Georgian characters, for example, in three. By creating UTF-16 and UTF-8, the Unicode Consortium solved the main problem: fonts now share a single code space. Font manufacturers only have to fill that space with glyphs as far as their resources and capabilities allow.

Character Map shows that different fonts support different numbers of characters, and some Unicode fonts can be quite large. However, fonts no longer differ by being made for different encodings. They differ by how completely the font manufacturer has filled the single code space with glyphs.

Krakozyabry instead of Russian letters - how to fix

Let's now see how krakozyabry appear instead of text, or, in other words, how the correct encoding for Russian text is chosen. The encoding is actually set in the program you use to create or edit the text, or the code that contains text fragments.

To create and edit text files, I personally use Notepad++, which in my opinion is very good. It can highlight the syntax of a good hundred programming and markup languages, and it can be extended with plugins.

The Notepad++ top menu has an "Encoding" item. There you can convert the existing text to the encoding your site uses by default.

For a site on Joomla 1.5 and above, and for a WordPress blog, choose UTF-8 without BOM to avoid krakozyabry. What is the BOM prefix?
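The same conversion can be scripted. Below is a rough sketch, assuming the source files are in Windows-1251. Both function names are hypothetical. Python's `"utf-8"` codec never writes a BOM, so the result is "UTF-8 without BOM".

```python
from pathlib import Path


def recode_to_utf8(raw: bytes, source_encoding: str = "cp1251") -> bytes:
    """Re-encode raw file contents as UTF-8 without BOM."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")  # the plain "utf-8" codec adds no BOM


def convert_file(src: Path, dst: Path, source_encoding: str = "cp1251") -> None:
    """Convert a file on disk; bytes are used so line endings stay exactly as they were."""
    dst.write_bytes(recode_to_utf8(src.read_bytes(), source_encoding))
```

To clean a file that is already UTF-8 but carries a BOM, pass `source_encoding="utf-8-sig"`. That codec drops the signature while decoding.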

When UTF-16 was developed, it was decided to allow a character code to be written either in direct byte order (for example, 0A15) or in reverse (150A). So that programs could tell which order to read the codes in, the BOM (Byte Order Mark, also called the signature) was invented. It consists of extra bytes at the very beginning of the document: two bytes in UTF-16, FE FF or FF FE.
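The sketch below shows how the BOM records byte order. The same letter "Ж" (code 0416) is written both ways, each with its matching BOM from Python's `codecs` module. The helper name is ours.

```python
import codecs


def utf16_with_bom(text: str, big_endian: bool = True) -> bytes:
    """Encode text as UTF-16 in the chosen byte order, prefixed with the matching BOM."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # FE FF, then the bytes
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")      # FF FE, then the bytes


be = utf16_with_bom("Ж", big_endian=True)
le = utf16_with_bom("Ж", big_endian=False)
print(be.hex(" ").upper())  # FE FF 04 16
print(le.hex(" ").upper())  # FF FE 16 04
# The generic "utf-16" decoder reads the BOM to pick the byte order.
assert be.decode("utf-16") == le.decode("utf-16") == "Ж"
```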

UTF-8 has only one byte order, so it does not need a BOM. Adding the signature (the notorious three extra bytes EF BB BF at the beginning of the document) simply prevents some programs from reading the code correctly. Therefore, when saving files in UTF-8, always choose the option without BOM (without signature). That way you protect yourself from krakozyabry in advance.
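If a file already carries the UTF-8 signature, the three bytes can be detected and removed. Below is a small sketch using `codecs.BOM_UTF8` from the standard library. The helper name is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 signature (EF BB BF) if the data has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = "<html>".encode("utf-8-sig")  # what a BOM-adding editor produces
print(with_bom[:3].hex(" ").upper())     # EF BB BF
print(strip_utf8_bom(with_bom))          # b'<html>'
```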

Notably, some Windows programs cannot save text in UTF-8 without BOM. One example is the notorious Windows Notepad in the versions of that time. It saves the document in UTF-8 but still adds the signature (three extra bytes) to the beginning. These bytes are always the same, since UTF-8 has only one byte order. On servers, however, this small detail can cause a problem: krakozyabry appear.

Therefore, never use the standard Windows Notepad to edit your site's documents if you want to avoid krakozyabry. I consider the already mentioned Notepad++ editor the best and simplest option: it has practically no drawbacks and consists of advantages alone.

When choosing an encoding in Notepad++, you can also convert text to UCS-2, which is essentially UTF-16 limited to the basic 16-bit range. You can also encode text in ANSI, which for the Russian language means the Windows-1251 encoding described above. Where does this information come from?

It is stored in the registry of your Windows operating system, which specifies the encoding used for ANSI and the one used for OEM (for Russian, OEM is CP866). If you set another default language on your computer, these encodings are replaced with the ANSI and OEM counterparts for that language.
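For a Russian-language system this means that the same letter has two different single-byte codes, depending on whether a program uses the ANSI or the OEM code page. Below is a sketch using the mapping stated above. The dictionary and function names are illustrative.

```python
# ANSI and OEM code pages of a Russian-language Windows installation.
RUSSIAN_CODE_PAGES = {"ANSI": "cp1251", "OEM": "cp866"}


def byte_code(ch: str, page: str) -> int:
    """Return the single-byte code of a character in the ANSI or OEM code page."""
    return ch.encode(RUSSIAN_CODE_PAGES[page])[0]


for page in RUSSIAN_CODE_PAGES:
    print(page, hex(byte_code("Ж", page)))  # ANSI 0xc6, then OEM 0x86
```

This mismatch is why text written by a graphical (ANSI) program can turn into krakozyabry in the console, which uses the OEM code page.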

After you save a document in the encoding you need in Notepad++, or open a document from your site for editing, the name of its encoding is shown in the lower right corner of the editor.

To avoid krakozyabry, it also helps to declare the encoding in the header of the source code of every page on the site. That way the server or local host will not misinterpret it.

In XML-based markup languages (unlike plain HTML), a special XML declaration specifies the text encoding, for example <?xml version="1.0" encoding="UTF-8"?>.

Before parsing the code, the browser reads this declaration to find out which version is used and how to interpret the character codes of the document. Notably, if you save the document in Unicode, the XML declaration can be omitted. The encoding is then taken to be UTF-8 if there is no BOM, or UTF-16 if there is a UTF-16 BOM.
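The detection order can be sketched as a small function. It checks for a BOM first, then an `encoding="..."` attribute in a declaration at the very start, and otherwise falls back to UTF-8. This is a simplified illustration with a hypothetical name. It ignores UTF-32 and other edge cases a real XML parser handles.

```python
import codecs
import re

# encoding="..." inside an XML declaration at the very start of the data
_DECL_RE = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document: BOM, then declaration, else UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    match = _DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


doc = '<?xml version="1.0" encoding="windows-1251"?><a>Привет</a>'.encode("cp1251")
print(doc.decode(sniff_xml_encoding(doc)))
```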

In an HTML document, the encoding is specified with a meta element placed between the opening and closing HEAD tags:

... ...

This entry differs considerably from the form accepted earlier, but it fully complies with the HTML5 standard that is gradually being adopted. Every browser in use today interprets it correctly.

In theory, the META element that declares the encoding of an HTML document should be placed as high as possible in the document head. Then, by the time the browser reaches the first character outside the basic ASCII range (which is read correctly in any encoding), it already knows how to interpret the codes of such characters.
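The HTML5 specification makes this concrete: the encoding declaration must appear within the first 1024 bytes of the document. Below is a rough checker for a page's bytes. It is an illustrative sketch with a hypothetical name. It only recognizes the short `<meta charset=...>` form, not the older `http-equiv` form.

```python
import re
from typing import Optional

# Only the short <meta charset="..."> form is recognized (quotes optional).
_META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_charset(html: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared within the first `limit` bytes, or None."""
    match = _META_CHARSET.search(html[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Blog</title>'
print(declared_charset(page))  # utf-8
```

A declaration pushed below a long block of scripts or comments would return `None` here. That is a sign it should be moved higher in the head.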

Good luck! See you soon on the pages of the blog.