
Iñtërnâtiônàlizætiøn

I meant to do this ages ago, but Russell’s post has reminded me: must test whether this site can cope with iñtërnâtiônàlizætiøn. You can read more about this in Sam Ruby’s Survival guide to i18n.

Now, for some Japanese, Hindi, etc.!

これは日本語のテキスト (“This is Japanese text.”)

देखें हिन्दी कैसी नजर आती है। अरे वाह ये तो नजर आती है। (“Let’s see how Hindi looks. Oh wow, it actually shows up!”)

So far, so good. But I wonder if my feeds are broken.

Update: they are… kind of. The Japanese text comes through OK, but the Hindi does not. Weird. I shall double-check the UTF-8 encoding I use everywhere.
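One way to do that double-check is to push the test strings through a minimal feed round trip: serialise them into an XML item as UTF-8, parse the bytes back and compare. The sketch below uses Python 3.8+ and the standard `xml.etree.ElementTree`. The RSS-like structure and the helper name `roundtrip_item` are illustrative assumptions, not how this site actually builds its feeds. It also shows one common way text gets mangled, UTF-8 bytes being decoded as ISO-8859-1. That is an assumed failure mode, not a diagnosis of this feed.

```python
import xml.etree.ElementTree as ET

# The same test strings used in this post
JAPANESE = "これは日本語のテキスト"
HINDI = "देखें हिन्दी कैसी नजर आती है। अरे वाह ये तो नजर आती है।"


def roundtrip_item(text: str) -> str:
    """Hypothetical helper: wrap `text` in a minimal RSS-style item,
    serialise it as UTF-8 bytes, parse those bytes and return the text."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "description").text = text
    # Bytes output, with an XML declaration naming utf-8
    data = ET.tostring(rss, encoding="utf-8", xml_declaration=True)
    return ET.fromstring(data).find("channel/item/description").text


for sample in (JAPANESE, HINDI):
    assert roundtrip_item(sample) == sample, "feed round trip lost characters"

    utf8_bytes = sample.encode("utf-8")
    print(f"{len(sample)} code points -> {len(utf8_bytes)} UTF-8 bytes")

    # Assumed failure mode: some stage reads UTF-8 bytes as ISO-8859-1.
    # That never raises an error; it silently produces mojibake.
    mangled = utf8_bytes.decode("iso-8859-1")
    assert mangled != sample
    # The damage is reversible as long as nothing else touched the text
    assert mangled.encode("iso-8859-1").decode("utf-8") == sample
```

If a round trip like this passes but a feed reader still shows garbage, a plausible next suspect is the HTTP `Content-Type` charset disagreeing with the encoding named in the XML declaration.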

Further reading: Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Comments

  1. Must test comments too:

    これは日本語のテキスト

    देखें हिन्दी कैसी नजर आती है। अरे वाह ये तो नजर आती है। (Ben Poole)
  2. Hindi okay, Japanese broken at my end. Canadian okay, eh? (Stan Rogers)
  3. Kewl, thanks for testing, Stan: at least the Canadian is coming through OK, eh? (Ben Poole)
  4. see, now everyone went and said "use iso-8859-1 encoding in ur feeds!", and i was using (at the time) utf-8. but some german char or something (umlauted perhaps?) went and "broke" my feed.

    this was a while back. now i'm using steve's code and i haven't even looked at it.

    BUT, since then i've been studying up on character sets here and there when i have the time. i came to the conclusion, however half-baked, that xml parsers have to have something built in to handle each charset. just saying utf-8 is fine, but then the xml parser on the other end has to have all that fancy stuff built into it. or, somewhere on the computer those character sets have to be loaded. or something.

    but if you specify a particular charset like iso-8859-1 or whatever, then the xml parser can look at it and say, ok, this limited charset i can understand. or i can at least try to understand.

    but it depends on the parser. i'm thinking they aren't all equal. or aren't all implemented equally. or, if they are sitting on top of some framework like .net or whatever, they might still have to have a charset actually installed on the machine somewhere. i'm totally guessing here.

    maybe?

    well anyway… this is a sticky one. i think basically it has something to do with not just how correct your feed is, but how robust the xml parser is in whatever client happens to consume your feed.

    as a practical matter most of the chars coming out of my particular app (in this case my blog) fall within that iso spec mentioned above, and most if not all of my readers will use chars from that spec as well. if someone did type some japanese letters into a comment or something, i'd have no idea what they meant. for me, i think what joel spolsky is up to goes way beyond what i need to do. i can understand that if i were developing software that might be used by anyone, i'd need to make sure it handled as many languages (in the user interface) as possible on windows or whatever. but me, as a blogger, i have a limited audience ("people like me", as julian's dad put it) and a limited number of chars that mean anything to me.

    yada… sorry for the length, it just happens to be something i've thought about, and i wonder what others think too.

    :-) (jonvon)
  5. well now i'm reading that excellent sam ruby article. which of course i should have done before typing all that stuff. very cool… thanks for the link. (jonvon)

Comments on this post are now closed.
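On the question raised in comment 4: the XML 1.0 specification requires every conforming processor to accept UTF-8 and UTF-16, while support for other encodings such as ISO-8859-1 is optional (though widespread). So a conforming parser is guaranteed to handle UTF-8 and is not guaranteed to handle ISO-8859-1. The practical limitation of ISO-8859-1 is that it cannot represent Japanese or Hindi at all. The Python sketch below uses the standard library's expat-based parser as one example consumer; the German sample phrase and the helper `encode_item` are illustrative assumptions. It also shows a fallback: numeric character references (e.g. `&#2342;` for द) let an ISO-8859-1 document carry any Unicode character.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape

SAMPLES = {
    "german": "Grüße aus München",  # illustrative umlaut/eszett text
    "japanese": "これは日本語のテキスト",
    "hindi": "देखें हिन्दी कैसी नजर आती है।",
}


def encode_item(text: str, encoding: str, errors: str = "strict") -> Optional[bytes]:
    """Hypothetical helper: build a one-element XML document that declares
    `encoding`. Returns None if `text` cannot be encoded in strict mode."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>\n'.encode("ascii")
    try:
        body = f"<item>{escape(text)}</item>".encode(encoding, errors)
    except UnicodeEncodeError:
        return None
    return declaration + body


for name, text in SAMPLES.items():
    for enc in ("utf-8", "iso-8859-1"):
        data = encode_item(text, enc)
        if data is None:
            print(f"{name}: not representable in {enc}")
            continue
        # The parser reads the declaration and decodes accordingly
        assert ET.fromstring(data).text == text
        print(f"{name}: {enc} OK, {len(data)} bytes")

# Fallback: character references carry Hindi through an ISO-8859-1 document
refs = encode_item(SAMPLES["hindi"], "iso-8859-1", errors="xmlcharrefreplace")
assert refs is not None and ET.fromstring(refs).text == SAMPLES["hindi"]
```

Whether every feed reader's parser is this accommodating is another matter, which is jonvon's point about parsers not all being equal.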

About

I’m a software architect / developer / general IT wrangler specialising in web, mobile web and middleware using things like node.js, Java, C#, PHP, HTML5 and more.

Best described as a simpleton, but kindly. You can read more here.
