- So, OK, yes, I was bored, but I think my Erdös Number is 5. #ErdosNumber #PaulErdos http://www.ams.org/mathscinet/collaborationDistance.html
- Oh, IMPOSSIBLY clever @xkcd: http://xkcd.com/645/ (Reverse Polish Sausage) #xkcd

I think my Erdös Number is 5
Two things. First, I’ve gotten it down to 4. Second, “Erdös” is not right. It’s not an umlaut, of course; it’s a double acute (hits self in head with a heavy Hungarian book!).
So, quick poll: do either or both of these work for you? #1: Erdős. #2: Erdős. Could you post your browser and platform? The vowel should look like this:
(Quick shout-out to the almost-indispensable fileformat.info. It’s also how I looked up the katakana in [the current incarnation of] my site name/link in the upper-left page corner.)
WordPress is declaring my pages as UTF-8, and while ő is in the Basic Multilingual Plane, its codepoint is 337 (U+0151), which means UTF-8 has to encode it as two bytes. AFAIK, WordPress has to round-trip the character through both PHP and MySQL, and I’m rapidly running out of knowing what I’m talking about.
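For anyone who wants to poke at the character itself, here is a minimal Python sketch (Python is my choice for illustration; nothing here is WordPress, PHP, or MySQL code). It looks at ő from three angles: its codepoint, its UTF-8 bytes, and its canonical decomposition into a plain "o" plus a combining double acute. The variable names are made up for the example.

```python
import unicodedata

o_double_acute = "\u0151"  # ő, the vowel in Erdős
o_umlaut = "\u00f6"        # ö, the wrong vowel

# Official Unicode names, to tell the two apart.
name_right = unicodedata.name(o_double_acute)
name_wrong = unicodedata.name(o_umlaut)
print(name_right)  # LATIN SMALL LETTER O WITH DOUBLE ACUTE
print(name_wrong)  # LATIN SMALL LETTER O WITH DIAERESIS

# One codepoint (337 decimal), two bytes once encoded as UTF-8.
codepoint = ord(o_double_acute)
utf8_bytes = o_double_acute.encode("utf-8")
print(codepoint, utf8_bytes)  # 337 b'\xc5\x91'

# NFD normalization splits it into two codepoints:
# U+006F (o) followed by U+030B (combining double acute accent).
decomposed = unicodedata.normalize("NFD", o_double_acute)
decomposed_hex = [hex(ord(ch)) for ch in decomposed]
print(decomposed_hex)  # ['0x6f', '0x30b']
```

So "two bytes" and "two characters" are both true of ő, but in different senses: the first is about UTF-8 encoding, the second about normalization.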
I’ll note a fun “you might be a geek” story, though: when I learned that Unicode had no native way to extend the codepoints indefinitely, I actually cried. It would be so trivial. Say, for instance, that four bytes pegged to 255 would trigger the parser to look at the next character, which would specify from 1 to 128 more bytes to represent the character. You could even align it on 4-byte boundaries: the next character would specify between 3 and 511 bytes, in 4-byte increments. Yes, this would mean that some theoretical characters could take half a k to represent, but there could be [hold on scribble scribble] a thousand million googol codepoints. But as it stands now, there are just over a million, and I cried because I knew that the Consortium would end up turning away perfectly lovable graphemes that never hurt other characters in their lives, regardless of language, code plane, or rendering color. And you know what? I was right. Klingon and Shavian so totally belong in Unicode, and they aren’t there!
Gah.
OK, little rant.
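The scribbled arithmetic in the rant can be redone in a few lines of Python. This is only a rough sketch under one reading of the idea: the escape sequence is followed by a payload of up to 128 (or, in the aligned variant, up to 511) bytes, and every byte value in that payload is usable. `payload_codepoints` is a hypothetical helper, not part of any real encoding.

```python
# Unicode as it stands: 17 planes of 65,536 codepoints (U+0000..U+10FFFF).
PLANES = 17
CODEPOINTS_PER_PLANE = 2 ** 16
unicode_codepoints = PLANES * CODEPOINTS_PER_PLANE
print(f"{unicode_codepoints:,}")  # 1,114,112

def payload_codepoints(payload_bytes: int) -> int:
    """Distinct values that fit in a payload of the given length
    (hypothetical scheme: all 256 values of every byte are allowed)."""
    return 256 ** payload_bytes

# "A thousand million googol" = 10**9 * 10**100 = 10**109.
thousand_million_googol = 10 ** 9 * 10 ** 100

# How many decimal digits the count has for the two payload sizes.
for size in (128, 511):
    digits = len(str(payload_codepoints(size)))
    print(size, "bytes:", digits, "decimal digits")

beats_estimate = payload_codepoints(128) > thousand_million_googol
print(beats_estimate)  # True
```

Under that reading, even the 128-byte version overshoots a thousand million googol by a wide margin, so the estimate in the rant was, if anything, modest.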
Shavian’s been in Unicode for a few years now.