Jun 30 2008
wikipedia semi-tech question
I would give my eyeteeth and all my pocket money right now to be able to search across the text of all revisions of a single page: a search bounded to one page title but covering every diff. Does anyone know of such a thing?
7 responses so far
I bet the answer is “download a complete history dump and get clever with SQL.”
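A minimal sketch of the dump route. It assumes the full-history XML export layout: each `<page>` holds a `<title>` followed by `<revision>` elements with `<id>`, `<timestamp>` and `<text>`. It swaps the SQL idea for a streaming XML scan with Python's standard library, on the assumption that the revision text comes as XML. `search_page_history` and the sample data are hypothetical, and a real dump would be compressed (e.g. opened with `bz2.open` instead of `io.BytesIO`).

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def search_page_history(dump: IO[bytes], title: str, needle: str) -> Iterator[Tuple[str, str]]:
    """Yield (revision_id, timestamp) for each revision of `title` whose text contains `needle`.

    Hypothetical helper: streams the dump with iterparse so the whole history
    never has to sit in memory at once.
    """
    current_title = None
    needle = needle.lower()
    for _, elem in ET.iterparse(dump, events=("end",)):
        tag = _local(elem.tag)
        if tag == "title":
            # <title> closes before any <revision> of the same page.
            current_title = elem.text
        elif tag == "revision":
            if current_title == title:
                # Direct children only, so "id" is the revision id, not the contributor's.
                fields = {_local(child.tag): (child.text or "") for child in elem}
                if needle in fields.get("text", "").lower():
                    yield fields.get("id", ""), fields.get("timestamp", "")
            elem.clear()  # drop this revision's text once it has been checked
        elif tag == "page":
            current_title = None
            elem.clear()


# Tiny made-up dump; the namespace URI is illustrative only.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Eyetooth</title>
    <revision><id>1</id><timestamp>2008-01-01T00:00:00Z</timestamp><text>A canine tooth.</text></revision>
    <revision><id>2</id><timestamp>2008-02-01T00:00:00Z</timestamp><text>Also called an eyetooth.</text></revision>
  </page>
  <page>
    <title>Pocket money</title>
    <revision><id>3</id><timestamp>2008-03-01T00:00:00Z</timestamp><text>eyetooth trivia</text></revision>
  </page>
</mediawiki>"""

print(list(search_page_history(io.BytesIO(sample), "Eyetooth", "eyetooth")))
# [('2', '2008-02-01T00:00:00Z')]
```

Only the one title is ever matched, so the search stays bounded to a single page even though the scan walks the whole file.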
sure… I was just hoping someone with, say, toolserver access had already done it
I am not that clever with SQL (yet!)
that’s a good idea, phoebe, why don’t you do that…!
hmmm… all implications that you should do this aside, this might be within my technical capacity.
first big issue i see, though: building this as a third-party tool means a whole lotta http calls just to fill the database, even for one page. this would make a lot more sense built from the inside, by someone with query access to the WP servers. do you have that kind of access? i sure don’t…
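For the third-party route, here is a rough sketch of what those HTTP calls look like against MediaWiki's `api.php`. It assumes today's JSON response shape (`formatversion=2`, `rvslots=main`); the 2008 API returned a different layout. Each batch of revisions costs one request, and the loop keeps merging the server's `continue` values into the next request until the history runs out. `iter_revisions`, `search_history`, the User-Agent string and the batch size of 50 are illustrative choices, and `fetch` is injectable so the logic can be tried offline with a fake.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List, Tuple

API = "https://en.wikipedia.org/w/api.php"


def http_get_json(params: Dict[str, str]) -> dict:
    """One round trip to api.php (hypothetical helper; not called in the demo below)."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "page-history-search-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_revisions(title: str, fetch: Callable[[Dict[str, str]], dict] = http_get_json) -> Iterator[Tuple[int, str, str]]:
    """Yield (revid, timestamp, wikitext) for every revision of `title`, following continuation."""
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "ids|timestamp|content", "rvslots": "main",
        "rvlimit": "50", "format": "json", "formatversion": "2",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", []):
            for rev in page.get("revisions", []):
                text = rev.get("slots", {}).get("main", {}).get("content", "")
                yield rev["revid"], rev["timestamp"], text
        if "continue" not in data:
            break
        # Merge the server's continuation values into the next request.
        params = {**params, **data["continue"]}


def search_history(title: str, needle: str, fetch: Callable[[Dict[str, str]], dict] = http_get_json) -> List[Tuple[int, str]]:
    """Case-insensitive substring search over every revision of one page."""
    needle = needle.lower()
    return [(revid, ts) for revid, ts, text in iter_revisions(title, fetch) if needle in text.lower()]


def fake_fetch(params: Dict[str, str]) -> dict:
    """Offline stand-in for api.php: two batches of one revision each."""
    if "rvcontinue" not in params:
        return {"continue": {"rvcontinue": "20080201|2", "continue": "||"},
                "query": {"pages": [{"title": "Eyetooth", "revisions": [
                    {"revid": 1, "timestamp": "2008-01-01T00:00:00Z",
                     "slots": {"main": {"content": "A canine tooth."}}}]}]}}
    return {"query": {"pages": [{"title": "Eyetooth", "revisions": [
        {"revid": 2, "timestamp": "2008-02-01T00:00:00Z",
         "slots": {"main": {"content": "Also called an eyetooth."}}}]}]}}


print(search_history("Eyetooth", "eyetooth", fetch=fake_fetch))
# [(2, '2008-02-01T00:00:00Z')]
```

On a page with thousands of revisions, every batch pulls full wikitext over the wire, which is the cost the comment is pointing at and why a dump or server-side access looks more attractive.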
let me introduce you to the great and glorious toolserver
http://tools.wikimedia.de/
but no, I don’t have an account on it or anywhere else. Reddragdiva’s suggestion to download a dump and code against it is probably right. If you want to start writing tools/hacks like this, the wikitech list is probably the place to begin…
e.g.:
https://wiki.toolserver.org/view/Query_service
though it’s not practical for the long term, you could test it out.
ohhh… eyeteeth! i am so there.
if the tools are easy to write, maybe sasha and i could hack something out.