My recycled tweets for 2010-01-03


2 Responses to “My recycled tweets for 2010-01-03”


  1. I think my Erdös Number is 5

    Two things.  First, I’ve gotten it down to 4.  Second, “Erdös” is not right.  It’s not an umlaut, of course; it’s a double acute (hits self in head with a heavy Hungarian book!)

    So, quick poll: do either or both of these work for you?  #1: Erdős.  #2: Erdős.  Could you post your browser and platform?  The vowel should look like this:

    (Quick shout-out to the almost-indispensable fileformat.info.  It’s also how I looked up the katakana in [the current incarnation of] my site name/link in the upper-left page corner.)

    WordPress is declaring my pages as UTF-8, and while ő is in the Basic Multilingual Plane, its codepoint is U+0151 (337 in decimal), which is above 127, so UTF-8 encodes it as two bytes (Unicode normalization can also decompose it into two code points: a plain o plus a combining double acute).  AFAIK, WordPress has to round-trip the character through both PHP and MySQL, and I’m rapidly running out of knowing what I’m talking about.

    I’ll note a fun “you might be a geek” story, though: when I learned that Unicode had no native way to extend the codepoints indefinitely, I actually cried.  It would be so trivial.  Say, for instance, that four bytes pegged to 255 would trigger the parser to look at the next byte, which would specify from 1 to 128 more bytes to represent the character.  You could even align it on 4-byte boundaries: the next byte would specify between 3 and 511 more bytes, in 4-byte increments.  Yes, this would mean that some theoretical characters could take half a kilobyte to represent, but there could be [hold on scribble scribble] a thousand million googol codepoints.

    But as it stands now, there are just over a million (1,114,112, from U+0000 through U+10FFFF), and I cried because I knew that the Unicode Consortium would end up turning away perfectly lovable graphemes that never hurt other characters in their lives, regardless of language, code plane, or rendering color.  And you know what?  I was right.  Klingon and Shavian so totally belong in Unicode, and they aren’t there!

    Gah.

    OK, little rant.

  2. Thomas says:

    Shavian’s been in Unicode for a few years now.
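The character details discussed in the comments above (ö versus ő, how many UTF-8 bytes each takes, the size of the Unicode codespace, and whether Shavian is encoded) can be checked with a short Python 3 sketch that uses only the standard library. Python is an arbitrary choice here, and the helper names `describe`, `fits`, and `in_codespace` are hypothetical. The Latin-1 check is included only as an assumption about one common place round-trips go wrong (a layer that falls back to ISO-8859-1); it is not a diagnosis of this blog's WordPress, PHP, or MySQL setup.

```python
import sys
import unicodedata


def fits(ch: str, encoding: str) -> bool:
    """Return True if `ch` can be encoded in `encoding` without error."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def describe(ch: str) -> dict:
    """Summarize how one character is identified, stored, and decomposed (hypothetical helper)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",   # hex notation, as on fileformat.info
        "decimal": ord(ch),                # the same value in decimal
        "name": unicodedata.name(ch),      # official Unicode character name
        "utf8_bytes": ch.encode("utf-8"),  # bytes as stored or sent under UTF-8
        "nfd": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
        # Assumption for illustration: could a Latin-1 fallback layer hold it?
        "in_latin1": fits(ch, "latin-1"),
    }


def in_codespace(cp: int) -> bool:
    """True if `cp` is a valid Unicode code point (0 through 0x10FFFF)."""
    try:
        chr(cp)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # o with diaeresis (German umlaut) vs. o with Hungarian double acute
    for ch in ("\u00f6", "\u0151"):
        print(describe(ch))

    # 17 planes x 65,536 code points each: the fixed codespace
    print(17 * 0x10000, hex(sys.maxunicode))
    print(in_codespace(0x10FFFF), in_codespace(0x110000))

    # First letter of the Shavian block, U+10450
    print(unicodedata.name("\U00010450"))
```

Both characters need two UTF-8 bytes, so byte length alone does not tell them apart; the difference the sketch surfaces is that ö also exists in Latin-1 while ő does not. Code points above 0x10FFFF are rejected by `chr`, which reflects the fixed codespace the first comment laments.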
