What the heck-a-roo-nie is UNICODE?

Posted:
in macOS edited January 2014
I have heard it being used and have just nodded my head, but really, what is it?



From what I do not know, it has something to do with international characters... anyone?




Comments

  • Reply 1 of 13
    defiantdefiant Posts: 4,876member
    http://www.unicode.org/

    What is Unicode ? (http://www.unicode.org/unicode/standard/WhatIsUnicode.html)



    one search in google !
  • Reply 2 of 13
    nebagakidnebagakid Posts: 2,692member
    [quote]Originally posted by Defiant:
    http://www.unicode.org/
    What is Unicode ? (http://www.unicode.org/unicode/standard/WhatIsUnicode.html)
    one search in google ![/quote]



    Sure, if you want to do it THAT way
  • Reply 3 of 13
    defiantdefiant Posts: 4,876member
    [quote]Originally posted by Nebagakid:
    Sure, if you want to do it THAT way[/quote]



    how should I understand that ?
  • Reply 4 of 13
    spartspart Posts: 2,060member
    [quote]Originally posted by Defiant:
    how should I understand that ?[/quote]



    It's A.K.A. the easy, obvious way.



  • Reply 5 of 13
    Unicode is basically a way of allowing a computer to know what kind of character to render on the screen, using Latin symbols to define those characters. The same thing is true of Latin text: you can give a series of symbols which maps to a particular character (&quot; is a quotation mark, for instance).



    If you are an English-speaking person, you normally use a Western Latin character set. In HTML you define what character set the page uses (or else it just assumes the default for your browser), so an English page normally has this in the headers:



    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">



    Chinese, for example, uses a couple of different character sets; Big5 and GB2312 are two I know of. Someone help me out here, but the way I understand it, Unicode (UTF-8) is basically a superset of a lot of different character sets, all defined in one set, so you can use characters from both of those Chinese character sets at the same time... So, for instance, to let the browser know in HTML that the page should be rendered as Chinese, you would write:



    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">



    That way it translates combinations of Latin symbols into the appropriate Chinese characters.



    I just last week had my first foray into Unicode: I am doing a website which will have a Chinese version. Chinese having such a huge set of characters means you can't just type a letter and have it appear on screen as a Chinese character. So you have to define a string of symbols which the browser will know to turn into one Chinese symbol.



    Actually it is pretty cool. I was busy randomly typing away nonsense Chinese gibberish in Fireworks and looking at HTML pages I had made with placeholder Chinese symbols in them.



    Check this out - make a post and type these characters in it somewhere:



    &quot;



    then when you read your post that will be a quotation mark.
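The entity trick described above can be tried outside a browser as well. What follows is only a minimal sketch, assuming Python 3 and its standard `html` module; `to_numeric_reference` is a made-up helper name, not something from this thread. It shows that `&quot;` is an HTML spelling for one character, and that any Unicode character can be spelled the same way using its number.

```python
import html

# Named entity: the six characters &quot; stand in for one quotation mark.
named = html.unescape("&quot;")

# Numeric character reference: the decimal number is a Unicode code point.
# 20013 (hex 4E2D) is the Chinese character for "middle".
numeric = html.unescape("&#20013;")


def to_numeric_reference(ch: str) -> str:
    """Hypothetical helper: spell one character as &#NNNN; by hand."""
    return f"&#{ord(ch)};"


print(named, numeric, to_numeric_reference("\u4e2d"))
```

Note that these references are an HTML convention layered on top of Unicode. The charset declared in the meta tag is a separate matter: it tells the browser how to turn the page's raw bytes into characters.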



    That is a pretty simplistic explanation, and might be a bit off, but that is how I understand it...



    [ 07-28-2002: Message edited by: The Pie Man ]
  • Reply 6 of 13
    Crap double post.



    [ 07-28-2002: Message edited by: The Pie Man ]
  • Reply 7 of 13
    frawgzfrawgz Posts: 547member
    Since everything to a computer is just a series of 1's and 0's, you need a system of encoding those bits so your computer can decode them into text for you. Unicode is one such system, while ASCII is another. The big difference is that Unicode is capable of holding way, way more information and thus is well suited to international character sets. (Chinese alone has on the order of tens of thousands of characters, so imagine adding all the other non-Western character sets in there, including Klingon.) ASCII, on the other hand, can handle little more than the typical Western alphabet and some accompanying symbols.
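To make the "bits in, text out" idea concrete, here is a small sketch, assuming Python 3. The helper name `fits_in_ascii` is invented for illustration. It reads eight bits as a number, looks that number up as a character, and then checks which strings ASCII can hold at all.

```python
# Eight bits, read as a number, then looked up in a character table.
bits = "01000001"
value = int(bits, 2)   # the bits as an integer
letter = chr(value)    # the character with that number


def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character is in the ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(value, letter)
print(fits_in_ascii("hello"), fits_in_ascii("\u4f60\u597d"))
```

The second string is a two-character Chinese greeting. ASCII has no numbers assigned to those characters, so the encode step fails.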
  • Reply 8 of 13
    Basically, Unicode is a way to map a value to the appropriate glyph. The main advantage that Unicode has is that it has character mappings for just about every language you can imagine (Klingon not included; the application was rejected) in one character set. This means that you can mix Japanese, Chinese, Latin, and Russian in one string. All of this is possible because Unicode encodings allow up to 4 bytes for each character (4 bytes could hold about 4.3 billion values, although Unicode itself caps the range at 1,114,112 code points), as opposed to ASCII, which is one byte to one character (128 characters in strict 7-bit ASCII, 256 in the 8-bit extended sets).
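As a rough illustration of mixing scripts in one string, the sketch below (assuming Python 3, whose `str` type holds Unicode code points) builds a four-character string and prints the value Unicode assigns to each character. The sample characters are my own picks, not from the post above.

```python
import sys

# One string holding Japanese hiragana, a Chinese character,
# a Latin letter, and a Russian (Cyrillic) letter.
mixed = "\u3042\u4e2dA\u044f"

# Each character maps to one number, conventionally written U+XXXX.
code_points = [f"U+{ord(ch):04X}" for ch in mixed]

# The largest code point Python (and Unicode) allows.
highest = sys.maxunicode

print(mixed, code_points)
print(hex(highest), highest + 1, "possible code points")
```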
  • Reply 9 of 13
    Sweet sweet Unicode. To best understand what's going on, it's important to know how text is represented in ASCII. Each character is stored in one byte (eight bits), so a single-byte character set is limited to 256 characters. ASCII itself only defines characters 0-127; for historical reasons, characters 128-255 vary from platform to platform, which is why, for instance, we can't see fractions in Windows Word files -- fractions don't exist in the Mac version of the extended set.
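The fraction problem can be reproduced in a few lines. This is a sketch under assumptions: it uses Python 3's built-in `cp1252` and `mac_roman` codecs as stand-ins for the Windows and classic Mac character sets mentioned above, and `can_encode` is a made-up helper.

```python
raw = b"\xbd"  # one byte with value 189, in the range above ASCII

as_windows = raw.decode("cp1252")   # Windows Western code page
as_mac = raw.decode("mac_roman")    # classic Mac OS Western character set


def can_encode(ch: str, codec: str) -> bool:
    """Hypothetical helper: does this character set contain the character?"""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(as_windows, as_mac)  # same byte, two different characters
print(can_encode("\u00bd", "cp1252"), can_encode("\u00bd", "mac_roman"))
```

The same byte is the one-half fraction on one platform and something unrelated on the other, and the Mac set has no slot for the fraction at all.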



    So, as computing spread, each culture created one or more encodings to represent their writing systems. In China, there are at least two encoding systems each for Traditional and Simplified Chinese. All of these different encodings make viewing and moving text around a nightmare.
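A quick way to see the nightmare, assuming Python 3 and its bundled `gb2312` and `big5` codecs (two of the Chinese encodings of the kind described above): encode the same two characters both ways, then read one set of bytes with the wrong table.

```python
text = "\u4e2d\u6587"  # the two characters of the word for "Chinese (language)"

as_gb2312 = text.encode("gb2312")  # an encoding used for Simplified Chinese
as_big5 = text.encode("big5")      # an encoding used for Traditional Chinese
as_utf8 = text.encode("utf-8")     # a Unicode encoding

# Reading Big5 bytes as if they were GB2312 gives the wrong characters.
# errors="replace" substitutes U+FFFD wherever the bytes are not valid at all.
misread = as_big5.decode("gb2312", errors="replace")

print(as_gb2312.hex(), as_big5.hex(), as_utf8.hex())
print(text, "misread as", misread)
```

Nothing in the bytes themselves says which table was used, which is why the charset has to be declared or guessed.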



    Enter Unicode. Please note that Unicode isn't an encoding -- it's just a gigantic list of glyphs (characters) that aims to include every writing system in every culture on Earth. And there's room left over for more, so when the Vulcans show up, we can add their writing system too.



    UTF-8 is an encoding of Unicode (UTF stands for Unicode Transformation Format), and it's becoming the most popular because it is very cleverly backwards compatible with ASCII. It uses from one to six bytes for each glyph, as needed. UTF-8 is a planet-wide solution for the encoding problem.
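The backwards compatibility claim is easy to check. A minimal sketch, assuming Python 3 (the sample characters are arbitrary picks): plain ASCII text produces identical bytes under both encodings, and other characters take more bytes as needed.

```python
# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_bytes = "plain text".encode("ascii")
utf8_bytes = "plain text".encode("utf-8")

# Bytes needed per character: Latin letter, accented letter,
# Chinese character, emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

print(ascii_bytes == utf8_bytes)
for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s)")
```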



    Of course, Unicode isn't perfect and will always need updating. In Chinese, for instance, new characters (hanzi) are being added all the time. But it's the best solution at the moment.
  • Reply 10 of 13
    I should point out that UTF-8 only allows for 1-4 byte sequences, not 6. Six-byte sequences are improperly formed and therefore illegal.
  • Reply 11 of 13
    [quote]Originally posted by graphiteman:
    I should point out that UTF-8 only allows for 1-4 byte sequences, not 6. Six-byte sequences are improperly formed and therefore illegal.[/quote]



    Nope, six bytes are legal for glyphs U-04000000 - U-7FFFFFFF.
  • Reply 12 of 13
    Though, I should add that four bytes are the most used at the moment. See, for instance, http://www.unicode.org/unicode/reports/tr17/tr17-2 :



    UTF-8 (used only with Unicode/10646: mix of one to six 8 bit quantities; in practice only one to four, because of the actual range of integers used for encoding Unicode/10646.)
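Both sides of the four-versus-six argument can be seen in one hand-written encoder. The sketch below assumes Python 3, and `utf8_encode_original` is a hypothetical helper written for this thread, not a library function. It follows the original one-to-six byte layout, which is where the U-04000000 to U-7FFFFFFF range comes from. Since Unicode code points stop at U+10FFFF, real text never needs more than four bytes, and later revisions of the UTF-8 definition (RFC 3629) restrict it to four outright.

```python
def utf8_encode_original(cp: int) -> bytes:
    """Hypothetical helper: encode one code point with the original
    1-to-6 byte UTF-8 layout. It does not reject surrogates."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point out of range for the original scheme")
    if cp < 0x80:
        return bytes([cp])  # 1 byte, identical to ASCII
    # (upper limit, lead-byte marker, number of continuation bytes)
    layouts = [
        (0x7FF, 0xC0, 1),
        (0xFFFF, 0xE0, 2),
        (0x1FFFFF, 0xF0, 3),
        (0x3FFFFFF, 0xF8, 4),
        (0x7FFFFFFF, 0xFC, 5),
    ]
    for limit, marker, n_cont in layouts:
        if cp <= limit:
            # The lead byte carries the marker plus the top bits.
            lead = marker | (cp >> (6 * n_cont))
            # Each continuation byte is 10xxxxxx with the next 6 bits.
            tail = [0x80 | ((cp >> (6 * i)) & 0x3F)
                    for i in range(n_cont - 1, -1, -1)]
            return bytes([lead] + tail)
    raise AssertionError("unreachable")


# Within Unicode's range the result matches Python's own encoder.
for cp in (0x41, 0xE9, 0x4E2D, 0x1F600, 0x10FFFF):
    assert utf8_encode_original(cp) == chr(cp).encode("utf-8")

# Beyond U+10FFFF only the old layout applies; chr() refuses such values.
print(len(utf8_encode_original(0x10FFFF)), "bytes for U+10FFFF")
print(len(utf8_encode_original(0x4000000)), "bytes for U-04000000")
```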
  • Reply 13 of 13
    naghanagha Posts: 71member
    Unicode is the solution to a problem that plagues all languages on this planet, including English. This problem is familiar to me because I also speak Farsi, and it's been impossible to exchange email easily on the 'net because of differences between how Farsi has been handled on *nix, Mac, DOS & Windows. The most chaotic situation has been on the Windows/DOS side of the fence.



    With Unicode, everyone will settle on one standard and I'll finally be able to exchange manuscripts with Windows people on the other side of the planet. Sadly, the Mac is unheard of in the Middle East.