Why Do English Characters Need Fewer Bytes to Represent Them Than Characters in Other Alphabets?

Author: Geoffrey Carr. Last modified: 2023-12-17 10:53

While most of us have probably never stopped to think about it, alphabetical characters do not all take the same number of bytes to represent. But why is that? Today's SuperUser Q&A post has the answers to a curious reader's question.

Today's Question & Answer session comes to us courtesy of SuperUser, a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

Partial ASCII chart screenshot courtesy of Wikipedia.

The Question

SuperUser reader khajvah wants to know why different alphabets take up different amounts of disk space when saved:


When I put ‘a’ in a text file and save it, it makes it 2 bytes in size. But when I put a character like ‘ա’ (a letter from the Armenian alphabet) in, it makes it 3 bytes in size.
What is the difference between alphabets on a computer? Why does English take up less space when saved?

Letters are letters, right? Maybe not! What is the answer to this alphabetical mystery?

The Answer

SuperUser contributors Doktoro Reichard and Ernie have the answer for us. First up, Doktoro Reichard:


One of the first encoding schemes to be developed for use in mainstream computers is the ASCII (American Standard Code for Information Interchange) standard. It was developed in the 1960s in the United States.

The English alphabet uses part of the Latin alphabet (for instance, there are few accented words in English). There are 26 individual letters in that alphabet, not considering case. Any scheme that aims to encode the English alphabet would also have to include the individual digits and punctuation marks.
The 1960s was also a time when computers did not have the amount of memory or disk space that we have now. ASCII was developed to be a standard representation of a functional alphabet across all American computers. At the time, the decision to make every ASCII character 8 bits (1 byte) long was made due to technical details of the time (the Wikipedia article mentions the fact that perforated tape held 8 bits in a position at a time). In fact, the original ASCII scheme can be transmitted using 7 bits, and the eighth could be used for parity checks. Later developments expanded the original ASCII scheme to include several accented, mathematical, and terminal characters.
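
To make the "7 bits plus a parity bit" remark concrete, here is a minimal Python sketch. The helper name `with_even_parity`, the choice of even (rather than odd) parity, and placing the parity bit in the top position are assumptions made for illustration; the answer does not say which convention any particular system used.

```python
def with_even_parity(ch: str) -> int:
    """Pack a 7-bit ASCII code and an even-parity bit into one 8-bit value.

    Assumption for this sketch: the parity bit is the most significant bit
    and is chosen so that the total number of 1 bits in the byte is even.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 when the 7-bit code has an odd number of 1s
    return (parity << 7) | code


for ch in "aA":
    print(ch, format(ord(ch), "07b"), format(with_even_parity(ch), "08b"))
```

The letter 'a' (code 97) has three 1 bits, so the parity bit is set. 'A' (code 65) has two, so the eighth bit stays 0.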

With the recent increase of computer usage across the world, more and more people from different languages had access to a computer. That meant that, for each language, new encoding schemes had to be developed, independently from other schemes, which would conflict if read from different language terminals.
Unicode came into being as a solution to the existence of different terminals by merging all possible meaningful characters into a single abstract character set.
UTF-8 is one way to encode the Unicode character set. It is a variable-width encoding (i.e. different characters can have different sizes) and it was designed for backwards compatibility with the former ASCII scheme. As such, the ASCII character set will remain one byte in size whilst any other characters are two or more bytes in size. UTF-16 is another way to encode the Unicode character set. In comparison to UTF-8, characters are encoded as either a set of one or two 16-bit code units.
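
The variable-width behavior described above can be checked with Python's built-in `str.encode`. This is only an illustrative sketch: the sample characters and the helper name `encoded_sizes` are choices made here, not part of the original answer.

```python
# Sample characters: Latin 'a', accented 'e', Armenian 'ayb', euro sign, an emoji.
samples = ["a", "\u00e9", "\u0561", "\u20ac", "\U0001F600"]


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8 = ch.encode("utf-8")
    # "utf-16-le" is used so that no byte order mark is prepended to the output.
    utf16 = ch.encode("utf-16-le")
    return len(utf8), len(utf16) // 2  # each UTF-16 code unit is 2 bytes


for ch in samples:
    n8, n16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: {n8} UTF-8 byte(s), {n16} UTF-16 code unit(s)")
```

The ASCII letter needs one UTF-8 byte, the Armenian letter needs two, and the emoji needs four UTF-8 bytes or two UTF-16 code units.
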
As stated in other comments, the ‘a’ character occupies a single byte while ‘ա’ occupies two bytes, denoting a UTF-8 encoding. The extra byte in the original question was due to the existence of a newline character at the end.
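
The file sizes from the question can be reproduced with a short Python sketch. It assumes the editor saved the file as UTF-8 without a byte order mark and appended a single line feed; the helper name `saved_size` is made up for this example.

```python
import os
import tempfile


def saved_size(text: str) -> int:
    """Write text to a temporary UTF-8 file and return its size in bytes."""
    # newline="\n" stops Python from translating the line feed on Windows.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", newline="\n", delete=False
    ) as f:
        f.write(text)
        path = f.name
    try:
        return os.path.getsize(path)
    finally:
        os.remove(path)


print(saved_size("a\n"))        # 1 byte for 'a' plus 1 for the newline
print(saved_size("\u0561\n"))   # 2 bytes for the Armenian letter plus 1 for the newline
```

Under those assumptions the two files come out at 2 and 3 bytes, matching what the reader observed.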

Followed by the answer from Ernie:


1 byte is 8 bits, and can thus represent up to 256 (2^8) different values.
For languages that require more possibilities than this, a simple 1 to 1 mapping can not be maintained, so more data is needed to store a character.
Note that generally, most encodings use the first 7 bits (128 values) for ASCII characters. That leaves the 8th bit, or 128 more values for more characters. Add in accented characters, Asian languages, Cyrillic, etc. and you can easily see why 1 byte is not sufficient for holding all characters.
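
A quick way to see the "eighth bit" point is to print the bit patterns of the encoded bytes. The sketch below uses UTF-8 as the example encoding, which is an assumption made here; the helper name `utf8_bits` is also hypothetical.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return the UTF-8 bytes of a character as 8-character bit strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


print(2 ** 8)               # number of distinct values one byte can hold
print(utf8_bits("a"))       # a single byte whose top bit is 0 (plain ASCII)
print(utf8_bits("\u0561"))  # two bytes, each with the top bit set
```

In UTF-8, a byte whose top bit is 0 is always an ASCII character, and every byte of a multi-byte character has the top bit set. That rule is what lets ASCII text remain valid, unchanged UTF-8.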

Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.
