Sort by Library of Congress call number in Perl

In redesigning my book collection page this evening, I ran across the need for a routine to sort by Library of Congress call number.  This is actually nontrivial, as the following are all valid numbers:

  • DA870.F64
  • DK602.3.B76 1996
  • Q335.P416 1994
  • QA76.73.P22W35 1991
  • RS75.P5

To add even more complexity, some number fields are sorted in strict ascending order (e.g., in “DK602.3.B76 1996″ the bold number would come after 9, after 80, after 600 but before 603) and some are sorted as decimals (e.g., in “Q335.P416 1994″ the bold number would come after 3000 and after 35 but before 4161.)  I wrote some Perl code for this, and it understands all call number forms that I am aware of.  If you stumbled upon this page looking for something like this, here it is:

sub locsort ($a,$b)
{
    @a = ($a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)(?: (\d\d\d\d))?/);
    @b = ($b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)(?: (\d\d\d\d))?/);

    return
	$a[0] cmp $b[0]
	    ||
	$a[1] <=> $b[1]
	    ||
	$a[2] cmp $b[2]
	    ||
	"0.$a[3]" <=> "0.$b[3]"
	    ||
	$a[4] cmp $b[4]
	    ||
	"0.$a[5]" <=> "0.$b[5]"
	    ||
	$a[6] <=> $b[6]
	    ;
}

10 Responses to “Sort by Library of Congress call number in Perl”

Read below or add a comment...

  1. Thanks for posting this code!! I had a little trouble getting it to work, but then I figured out that the backslashes before the d’s didn’t show up in your code sample. In any case, here’s the script as I got it to work, with a little extra context. I added another element to the sort, because sometimes our LC numbers have additions after the year. I’ve included some of these in the example. Now…will the backslashes show?

    Geoff
    >>>>

    my @lcList = (‘DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);

    $beforeList = join (“n”, @lcList);
    print “Before:n$beforeListnn”;

    my @result = sort by_lc_number @lcList;
    $afterList = join (“n”, @result);

    print “nAfter:n$afterListnn”;

    sub by_lc_number {

    # print “A passed: $anB passed: $bn”;

    $a =~ /^([A-Z]+)(d+(?:.d+)?).?([A-Z]*)(d*).?([A-Z]*)(d*)( (?:d{4})?)?(.*)?/;
    @a = ($1,$2,$3,$4,$5,$6,$7,$8);
    $b =~ /^([A-Z]+)(d+(?:.d+)?).?([A-Z]*)(d*).?([A-Z]*)(d*)( (?:d{4})?)?(.*)?/;
    @b = ($1,$2,$3,$4,$5,$6,$7,$8);

    # $resultA = join (“::”, @a);
    # $resultB = join (“::”, @b);
    # print “A parsed: $resultAnB parsed: $resultBn”;

    return
    $a[0] cmp $b[0]
        ||
    $a[1] <=> $b[1]
        ||
    $a[2] cmp $b[2]
        ||
    “0.$a[3]” <=> “0.$b[3]”
        ||
    $a[4] cmp $b[4]
        ||
    “0.$a[5]” <=> “0.$b[5]”
        ||
    $a[6] <=> $b[6]
        ||
    $a[7] cmp $b[7]
        ;
    }

  2. Thanks!! You’re missing the backslashes before the d’s. Did WordPress strip them out?

  3. Here’s an example with a little more context. Hopefully it’s useful for newbies! I’ve added one more element to the sort, because our library sometimes makes additions after the year.

    >>>>

    my @lcList = (‘DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);

    $beforeList = join (“n”, @lcList);
    print “Before:n$beforeListnn”;

    my @result = sort by_lc_number @lcList;
    $afterList = join (“n”, @result);

    print “nAfter:n$afterListnn”;

    sub by_lc_number {

    # print “A passed: $anB passed: $bn”;

    $a =~ /^([A-Z]+)(d+(?:.d+)?).?([A-Z]*)(d*).?([A-Z]*)(d*)( (?:d{4})?)?(.*)?/;
    @a = ($1,$2,$3,$4,$5,$6,$7,$8);
    $b =~ /^([A-Z]+)(d+(?:.d+)?).?([A-Z]*)(d*).?([A-Z]*)(d*)( (?:d{4})?)?(.*)?/;
    @b = ($1,$2,$3,$4,$5,$6,$7,$8);

    # $resultA = join (“::”, @a);
    # $resultB = join (“::”, @b);
    # print “A parsed: $resultAnB parsed: $resultBn”;

    return
    $a[0] cmp $b[0]
        ||
    $a[1] <=> $b[1]
        ||
    $a[2] cmp $b[2]
        ||
    “0.$a[3]” <=> “0.$b[3]”
        ||
    $a[4] cmp $b[4]
        ||
    “0.$a[5]” <=> “0.$b[5]”
        ||
    $a[6] <=> $b[6]
        ||
    $a[7] cmp $b[7]
        ;
    }

  4. Yes, it sure did.  It stripped out the backslashes before the dots, too.  Thanks for catching that.

  5. Tom Clark says:

    I’m a newbie looking forward to making this work.  If it does it’ll save me a ton of time at my library.

    Question:  Shouldn’t the phrase:

          # print “A passed: $anB passed: $bn”;

    use ‘parsed’ instead of ‘passed’ ? 

    or conversely for the subsequent phrase

          # print “A parsed: $resultAnB parsed: $resultBn”;

    Perhaps I don’t yet understand enough about Perl.

    TC

  6. Tom, I don’t think so.  The first line, commented out, shows what was “passed” into the script.  The second line, also commented out, shows what one gets after the line is chopped up, or “parsed”.  But that’s Geoff’s modification of my code, not mine.

  7. Tom Clark says:

    I get it…sorry…like I said, a newbie.  Thanks for responding.

    TC

  8. Josh Purinton says:

    An LC Call No.is composed of a “call letter” [A-Z]+ followed optionally by a call number [0-9]+(\.[0-9]+)?, followed by zero or more cutter numbers ([A-Z]+[0-9]+). Everything is sorted ASCII-betically except the integer part of the call number, which is sorted numerically. Therefore, you can just zero-pad the integer part of the call number, and sort the resulting strings using Perl’s built-in cmp operator. (The code below does some extra work to strip non-letters and non-digits, but this shouldn’t matter if all your LC numbers are properly formatted to begin with.)

    my @lc_list = (‘DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);

    @lc_list = lc_sort(@lc_list);

    print map “$_\n”, @lc_list;

    sub lc_sort {
      my @sorted = sort map make_sortable_lc($_).”=”.$_, @_;
      foreach (@sorted) {
        s/^[^=]*=//;
      }
      return @sorted;
    }

    sub make_sortable_lc {
      local ($_) = @_;

      s/([0-9])[.]([0-9])/$1:$2/;
      s/[^:A-Z0-9]//g;
      s/(.):(.)/$1.$2/;
      s/^([A-Z]*)([0-9]+)/sprintf “%s%09d”, $1, $2/e;

      return $_;
    }

  9. Well, Josh, I suppose if you wanted to be “efficient” or “elegant” or “clever” about it, you could use your method.

    (Thanks.  I’m feeling a bit foolish for my original solution.)

Trackbacks

  1. [...] needed a function that will compare Library of Congress call numbers. I found a function in PERL to do this on Joshua McGees’s site, but I needed it in PHP and our call numbers use a space instead of a period for some of the [...]



Leave A Comment...

CommentLuv badge