Hashing T9 typing

Having gotten a humorous wrong-word error in a text message typed with T9 on a cell phone's number pad, I started to wonder about collisions of this sort in general.  Which keystroke sequences match the most words?

Well, I turned to my trusty bigwordlist.txt, a big dictionary file I pieced together from several sources (particularly orchy), and wrote a Perl script to analyze it.

If a phone were to use this dictionary (there are many reasons it shouldn’t, chiefly that many of the words are far rarer than others), there would be more than 20,000 collisions: places where the phone would have to guess, whether with a stupid algorithm (“pick the alphabetically first”, say), something more sophisticated (“rank by frequency of occurrence in the wild”), or something very sophisticated that takes grammar into account.
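
The original script was Perl and isn't shown, so here is a minimal Python sketch of the same idea, under a few assumptions: the standard ITU E.161 keypad letter layout, a one-word-per-line file, and "collision" meaning a key sequence shared by two or more words. Each word is "hashed" to the digits you would press, and words that land in the same bucket are the ones the phone has to choose between. The helper names (`t9_key`, `group_by_key`, `collisions`, `report`) are hypothetical.

```python
from collections import defaultdict

# Standard phone keypad letters (ITU E.161 layout): 2=abc ... 9=wxyz.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Invert once so each letter looks up its digit directly.
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def t9_key(word):
    """Digit string typed for `word`, or None if it contains non-letters."""
    word = word.strip().lower()
    if not word or any(ch not in LETTER_TO_KEY for ch in word):
        return None
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def group_by_key(words):
    """'Hash' each word to its key sequence; bucket words that share one."""
    buckets = defaultdict(set)
    for raw in words:
        key = t9_key(raw)
        if key is not None:
            buckets[key].add(raw.strip().lower())
    return buckets


def collisions(buckets):
    """Key sequences where the phone has to guess among two or more words."""
    return {key: sorted(ws) for key, ws in buckets.items() if len(ws) > 1}


def report(path):
    """Print the collision count and the worst sequence for a word-per-line file."""
    with open(path, encoding="utf-8") as f:
        clashes = collisions(group_by_key(f))
    print(len(clashes), "ambiguous key sequences")
    if clashes:
        key, words = max(clashes.items(), key=lambda kv: len(kv[1]))
        print(key, len(words), words)
```

Calling `report("bigwordlist.txt")` would print the counts for a file like the one above. Counting every extra word in a bucket, rather than each ambiguous sequence once, would give a larger number, so the exact total depends on which definition you pick.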

Here are some facts I found:

* The most troublesome sequence is 2666.  That can stand for ammo, amon, anno, anon, bonn, bono, boom, boon, cmon, comm, como, comp, conn, coom, coon, or coop (16 possibilities).
* The most collisions for two-, three-, and five-letter words are for 66, 466, and 46637, with 13 possibilities each.  (That “66”, i.e. “[mno][mno]”, shows up a lot, yes?)
* My mom’s allergic to shrimp, so would that make them a “non-mom-nom”?  Spell it out.
* A lot of long medical words collide, because “a” and “c” share the 2 key, so the endings “-ia” and “-ic” are typed identically (“hypercholesterolemia”, “hypercholesterolemic”).  After those, the endings “-ser”/“-ses” and “-zer”/“-zes” in verb forms cause a lot of collisions, since “r” and “s” share the 7 key.  The longest nontrivial pair looks to be “unreasonableness” and “unseasonableness” at 16 letters, but I’m not sure those are standard usages.
* Any requests for more info?  Raw files?
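
A few of the facts above can be spot-checked without the full dictionary. This is a small sketch assuming the standard keypad layout; the names `t9`, `same_keys`, and `LISTED_2666` are hypothetical. Run over the 2666 list as written, it suggests that “comp” and “coop” actually type as 2667, since p sits on the 7 key, so that list may contain two slips.

```python
import string

# Digits for a..z in order on a standard keypad: abc=2, def=3, ..., wxyz=9.
KEYS = "22233344455566677778889999"
TO_KEYS = str.maketrans(string.ascii_lowercase, KEYS)


def t9(word):
    """Digits typed for a lowercase-able ASCII word."""
    return word.lower().translate(TO_KEYS)


def same_keys(*words):
    """True if every word is typed with the same key sequence."""
    return len({t9(w) for w in words}) == 1


LISTED_2666 = ["ammo", "amon", "anno", "anon", "bonn", "bono", "boom", "boon",
               "cmon", "comm", "como", "comp", "conn", "coom", "coon", "coop"]

print(same_keys("hypercholesterolemia", "hypercholesterolemic"))  # True
print(same_keys("unreasonableness", "unseasonableness"))          # True
print(same_keys("non", "mom", "nom"), t9("mom"))                  # True 666
print([w for w in LISTED_2666 if t9(w) != "2666"])                # ['comp', 'coop']
```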

3 Responses to “Hashing T9 typing”

  1. Karina says:

    I don’t have this problem, my phone has an alternate qwerty keypad, which is all I ever use :D

  2. Oooh, look at Miss Digital over there!  All I’ve got are wooden scissors.

  3. Karina says:

    You know, aside from watching the first season of The Riches, I’m really not that familiar with Eddie Izzard. :|
