Books + Tags and Graph Theory

SnakOpen Source / Volunteer for LibraryThing

Bliv bruger af LibraryThing, hvis du vil skrive et indlæg

Books + Tags and Graph Theory

Dette emne er markeret som "i hvile"—det seneste indlæg er mere end 90 dage gammel. Du kan vække emnet til live ved at poste et indlæg.

1MMcM
okt 3, 2006, 7:59 pm

(Well, maybe not so much theory, but how else to distinguish from graphs you draw?)

Tim's blog post mention of Erdős number reminded me of something I fooled around with briefly. Maybe somebody here can come up with a way to make it useful or interesting.

Tags relate similar books. Conversely, if one has several areas of interest, then one is likely drawn to a book that covers their overlap. For example, if someone has books on Italy and cookbooks, then it is a likely that they have an Italian Cooking book. So, books join tags.

Technically, there is a graph with nodes of two types with arcs between unlike or labelled arcs between nodes.

You can close over the joining links to divide the library up into the smallest number of pieces. I found that 90% of my books were in the one big piece. 5% didn't have any tags at all (oops). And the remaining 5% are tens of tags that are only assigned to a few books and rarely combined with other tags (in particular a subsuming tag -- again, oops). Moreover, there was no single book whose absense would divide the library radically, like into two big pieces. So the joining was pretty robust. I don't (yet) use all-encompassing tags like fiction / non-fiction or read / unread; they would probably be excluded for someone who did.

You can find the shortest path between two given books, where all the intermediate books share at least one tag with each neighbor.

You can find longest paths that wander around never visiting the same book or tag twice. I got to around 150 books/tags, which is about a third of the tags I define. Or you could look for cycles, where a path following those rules returns to its start.

A big problem is that all these searches seem to be computationally intractable, tending to be O(n!) or O(m^n) or like that. Even a moderate size library explodes and what you find is somewhat by luck. A factor of a few is knocked off by introducing the equivalence class for books that have the same tags in any order and joining them; but that's hardly a dent.

Another issue is that to make the results appealing you'd really like to lay out the book covers. But that's one of the pieces of information missing from the tab export, due to concerns about Amazon's IP.

If this sounds like programs in Clocksin and Mellish Programming in Prolog, well it did to me too. I did a first rough cut in Prolog. But the free implementations I could find for XP didn't seem to have enough performance, and I'd have had to move to using lots more difference lists and less backtracking. So I just went with Common Lisp, which does that more obviously. It hardly matters, though, since there is very little code to it right now.

Ideas, anyone?