Windows-1251
Windows-1251 is an 8-bit character encoding designed to cover languages that use the Cyrillic script, such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic and Macedonian.
| MIME / IANA | windows-1251 | 
|---|---|
| Alias(es) | cp1251 (Code page 1251) | 
| Language(s) | Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Bosnian Cyrillic, Macedonian, Rotokas, Rusyn, English | 
| Created by | Microsoft | 
| Standard | WHATWG Encoding Standard | 
| Classification | extended ASCII, Windows-125x | 
| Other related encoding(s) | Amiga-1251, KZ-1048, RFC 1345's "ECMA-Cyrillic" | 
On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and the most used of the single-byte encodings supporting Cyrillic. As of January 2024, 0.3% of all websites use Windows-1251.[1][2] It is used mostly for Russian, yet only a small minority of Russian websites use it: 94.6% of Russian (.ru) websites use UTF-8,[3][4][5] with this legacy 8-bit encoding a distant second. In Linux, the encoding is known as cp1251.[6] IBM uses code page 1251 (CCSID 1251, and CCSID 5347 with the euro sign added) for Windows-1251.[7][8][9][10][11][12][13]
Windows-1251 and KOI8-R (or its Ukrainian variant KOI8-U) are much more commonly used than ISO 8859-5 (which is used by less than 0.0004% of websites).[14] Whereas Windows-1252 is closely related to ISO 8859-1, Windows-1251 is not closely related to ISO 8859-5.
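The difference in layout can be checked with the codecs that ship with Python. The following is a minimal sketch; the sample letters and variable names are chosen only for illustration.

```python
# Byte assigned to the Cyrillic capital letter А (U+0410) in three encodings.
cyrillic_a = "\u0410"
a_in_cp1251 = cyrillic_a.encode("cp1251")
a_in_iso8859_5 = cyrillic_a.encode("iso8859_5")
a_in_koi8_r = cyrillic_a.encode("koi8_r")
print(a_in_cp1251.hex(), a_in_iso8859_5.hex(), a_in_koi8_r.hex())

# By contrast, Windows-1252 and ISO 8859-1 place a letter such as é
# (U+00E9) at the same byte.
e_acute_cp1252 = "\u00e9".encode("cp1252")
e_acute_latin1 = "\u00e9".encode("latin-1")
print(e_acute_cp1252 == e_acute_latin1)

# Decoding Windows-1251 bytes with a different Cyrillic codec succeeds
# without an error but yields the wrong letters.
text = "Привет"
garbled = text.encode("cp1251").decode("koi8_r")
print(garbled != text)
```

Because all three Cyrillic encodings assign letters to the upper half of the byte range, a wrong guess rarely raises an error; it silently produces unreadable text.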
Unicode (e.g. UTF-8) is preferred to Windows-1251 and other Cyrillic encodings in modern applications, especially on the Internet, where UTF-8 is the dominant encoding for web pages. (For further discussion of Unicode's complete coverage of 436 Cyrillic letters/code points, including Old Cyrillic, and of why single-byte character encodings such as Windows-1251 and KOI8-R cannot provide this, see Cyrillic script in Unicode.)
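The coverage gap can be illustrated with a short Python sketch. The helper name `encodable` and the sample strings are assumptions made for this example only.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

russian = "Привет"
kazakh = "\u049a\u0430\u0437\u0430\u049b"  # "Қазақ": Қ and қ are not in Windows-1251
old_cyrillic = "\u0462"                    # Ѣ (yat), an Old Cyrillic letter

for sample in (russian, kazakh, old_cyrillic):
    print(sample, encodable(sample, "cp1251"), encodable(sample, "utf-8"))
```

Only the Russian sample fits in Windows-1251, while UTF-8 can encode all three.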
Character set
    
The following table shows Windows-1251. The row label gives the high hexadecimal digit of each byte and the column header gives the low digit.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI | 
| 1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US | 
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / | 
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? | 
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | 
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ | 
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | 
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL | 
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Љ | ‹ | Њ | Ќ | Ћ | Џ | 
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | љ | › | њ | ќ | ћ | џ | 
| Ax | NBSP | Ў | ў | Ј | ¤ | Ґ | ¦ | § | Ё | © | Є | « | ¬ | SHY | ® | Ї | 
| Bx | ° | ± | І | і | ґ | µ | ¶ | · | ё | № | є | » | ј | Ѕ | ѕ | ї | 
| Cx | А | Б | В | Г | Д | Е | Ж | З | И | Й | К | Л | М | Н | О | П | 
| Dx | Р | С | Т | У | Ф | Х | Ц | Ч | Ш | Щ | Ъ | Ы | Ь | Э | Ю | Я | 
| Ex | а | б | в | г | д | е | ж | з | и | й | к | л | м | н | о | п | 
| Fx | р | с | т | у | ф | х | ц | ч | ш | щ | ъ | ы | ь | э | ю | я | 
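As a small illustration of reading the table, the sketch below decodes six bytes with Python's built-in `cp1251` codec and then repeats the conversion by arithmetic. The arithmetic relies on the observation that rows Cx to Fx hold the letters А to я in the same order as the Unicode range U+0410 to U+044F; the helper name `basic_letter` is hypothetical.

```python
# Bytes for the word "Привет", read off the Cx-Fx rows of the table.
data = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])
text = data.decode("cp1251")

def basic_letter(byte: int) -> str:
    """Map a byte in 0xC0-0xFF to its letter using the contiguous layout."""
    if not 0xC0 <= byte <= 0xFF:
        raise ValueError("byte is outside the contiguous А-я range")
    return chr(0x0410 + (byte - 0xC0))

by_arithmetic = "".join(basic_letter(b) for b in data)
print(text, by_arithmetic, text == by_arithmetic)
```

The shortcut does not apply to rows 8x to Bx, where letters such as Ё and ё sit outside the contiguous block and need a lookup table.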
Kazakh variants
    
An altered version of Windows-1251 was standardised in Kazakhstan as Kazakh standard STRK1048, and is known by the label KZ-1048. It differs in the rows shown below:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Љ | ‹ | Њ | Қ | Һ | Џ | 
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | љ | › | њ | қ | һ | џ | 
| Ax | NBSP | Ұ | ұ | Ә | ¤ | Ө | ¦ | § | Ё | © | Ғ | « | ¬ | SHY | ® | Ү | 
| Bx | ° | ± | І | і | ө | µ | ¶ | · | ё | № | ғ | » | ә | Ң | ң | ү | 
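The positions where KZ-1048 departs from Windows-1251 can also be listed programmatically. This sketch assumes a Python version that ships the `kz1048` codec (Python 3.5 or later); the helper name `decode_byte` is hypothetical.

```python
def decode_byte(byte: int, encoding: str) -> str:
    """Decode a single byte, yielding U+FFFD for an unassigned position."""
    return bytes([byte]).decode(encoding, errors="replace")

# Compare the upper half of the byte range in both encodings.
differences = {}
for byte in range(0x80, 0x100):
    win = decode_byte(byte, "cp1251")
    kz = decode_byte(byte, "kz1048")
    if win != kz:
        differences[byte] = (win, kz)

for byte, (win, kz) in differences.items():
    print(f"0x{byte:02X}: {win} -> {kz}")
```

All differences fall in rows 8x to Bx, so text made up only of the basic Russian letters in rows Cx to Fx decodes identically under both encodings.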
Code Page 1174 is another variant created for the Kazakh language, which matches Windows-1251 for the Russian subset of the Cyrillic letters. It differs from KZ-1048 by moving the Cyrillic letter Shha from 8E/9E to 8A/9A.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Һ | ‹ | Њ | Қ | Ћ | Џ | 
| 9x | ђ | ‘ | ’ | “ | ” | • | – | — | | ™ | һ | › | њ | қ | ћ | џ | 
| Ax | NBSP | Ұ | ұ | Ә | ¤ | Ө | ¦ | § | Ё | © | Ғ | « | ¬ | SHY | ® | Ү | 
| Bx | ° | ± | І | і | ө | µ | ¶ | · | ё | № | ғ | » | ә | Ң | ң | ү | 
Amiga variant
    
| MIME / IANA | Amiga-1251 | 
|---|---|
| Alias(es) | Ami1251 | 
| Language(s) | English, Russian | 
| Classification | extended ASCII | 
| Based on | Windows-1251, ISO-8859-1, ISO-8859-15 | 
Russian Amiga OS systems used a version of code page 1251 which matches Windows-1251 for the Russian subset of the Cyrillic letters, but otherwise mostly follows ISO-8859-1. This version is known as Amiga-1251,[18] under which name it is registered with the IANA.[19]
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8x | XXX | XXX | BPH | NBH | IND | NEL | SSA | ESA | HTS | HTJ | VTS | PLD | PLU | RI | SS2 | SS3 | 
| 9x | DCS | PU1 | PU2 | STS | CCH | MW | SPA | EPA | SOS | XXX | SCI | CSI | ST | OSC | PM | APC | 
| Ax | NBSP | ¡ | ¢ | £ | €[a] | ¥ | ¦ | § | Ё | © | №[b] | « | ¬ | SHY | ® | ¯ | 
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ё | ¹ | º | » | ¼ | ½ | ¾ | ¿ | 
- [a] Matching ISO-8859-15; at a different location than in Windows-1251
- [b] Present in Windows-1251, but at a different location (absent from ISO-8859-1/15)
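Python has no built-in codec for Amiga-1251, but the description above suggests how a decoder could be assembled. The following is only a sketch under stated assumptions: it starts from ISO-8859-1, applies the four replacements visible in rows Ax and Bx, and assumes that bytes 0xC0 to 0xFF hold А to я exactly as in Windows-1251. Rows 8x and 9x are left as the ISO-8859-1 control codes, which does not model the positions marked XXX. The names `AMIGA_1251_TABLE` and `decode_amiga1251` are hypothetical.

```python
# Start from ISO-8859-1, where each byte value equals its code point.
AMIGA_1251_TABLE = [chr(b) for b in range(256)]

# Replacements visible in rows Ax and Bx of the table above.
AMIGA_1251_TABLE[0xA4] = "\u20ac"  # euro sign
AMIGA_1251_TABLE[0xA8] = "\u0401"  # Ё
AMIGA_1251_TABLE[0xAA] = "\u2116"  # numero sign
AMIGA_1251_TABLE[0xB8] = "\u0451"  # ё

# Assumption: 0xC0-0xFF carry А-я in the same order as Windows-1251.
for b in range(0xC0, 0x100):
    AMIGA_1251_TABLE[b] = chr(0x0410 + (b - 0xC0))

def decode_amiga1251(data: bytes) -> str:
    """Decode bytes with the sketched Amiga-1251 lookup table."""
    return "".join(AMIGA_1251_TABLE[b] for b in data)

print(decode_amiga1251(bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])))
```

For a production decoder, the mapping table in the IANA registration would be the authoritative source rather than this reconstruction.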
See also
    
- Latin script in Unicode
- Unicode
- Universal Character Set
- European Unicode subset (DIN 91379)
 
- UTF-8
References
    
1. "Historical trends in the usage of character encodings, January 2024". Retrieved 2024-01-01.
2. "Frequently Asked Questions".
3. "Distribution of Character Encodings among websites that use .ru". w3techs.com. Retrieved 2024-01-01.
4. "Distribution of Character Encodings among websites that use Russian". w3techs.com. Retrieved 2023-01-16.
5. "Distribution of Character Encodings among websites that use Russian Federation". w3techs.com. Retrieved 2021-11-05.
6. "cp1251(7) - Linux manual page". man7.org. Retrieved 2018-07-01.
7. "Code page 1251 information document". Archived from the original on 2016-03-03.
8. "CCSID 1251 information document". Archived from the original on 2014-11-29.
9. "CCSID 5347 information document". Archived from the original on 2014-11-29.
10. Code Page CPGID 01251 (PDF), IBM
11. Code Page CPGID 01251 (txt), IBM
12. International Components for Unicode (ICU), ibm-1251_P100-1995.ucm, 2002-12-03
13. International Components for Unicode (ICU), ibm-5347_P100-1998.ucm, 2002-12-03
14. "Usage Statistics of Character Encodings for Websites". w3techs.com. Archived from the original on 2012-05-30.
15. Steele, Shawn (1998). CP1251 to Unicode table. Unicode Consortium. CP1251.TXT.
16. Whistler, Ken (2007). KZ-1048 to Unicode. Unicode Consortium. KZ1048.TXT.
17. ibm-1174_X100-2007.ucm, IBM
18. Malyshev, Michael (2003). "Amiga-1251 to Unicode table". Registration of new charset [Amiga-1251]. IANA.
19. "Character Sets". IANA.
Further reading
    
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". Retrieved 2020-06-24.
External links
    
- Windows 1251 reference chart
- IANA Charset Name Registration
- Unicode mappings of windows 1251 with "best fit"
- Universal Cyrillic decoder, an online program that may help recover unreadable Cyrillic text in broken Windows-1251 or other character encodings.
