Jun 30 2008

wikipedia semi-tech question

Published by at 11:44 am under Uncategorized

I would give my eyeteeth and all my pocket money right now to be able to search the text of every revision of a single page: a search bounded to one page title but spanning its whole history (every diff). Does anyone know of such a thing?


7 Responses to “wikipedia semi-tech question”

  1. reddragdiva says:

    I bet the answer is “download a complete history dump and get clever with SQL.”

  2. brassratgirl says:

    sure… I was just hoping someone with, say, toolserver access had already done it :) I am not that clever with SQL (yet!)

  3. kenllama says:

    that’s a good idea, phoebe, why don’t you do that…!

  4. kenllama says:

    hmmm… all implications that you should do this aside, this might be within my technical capacity.

    first big issue i see, though: building this as a third-party tool means a whole lotta HTTP calls just to build the database, even for one page. this would make a lot more sense built from the inside, by someone with query access to the WP servers. do you have that kind of access? i sure don't…

  5. brassratgirl says:

    let me introduce you to the great and glorious toolserver :)
    http://tools.wikimedia.de/

    but no, I don’t have an account on it or anywhere else. Reddragdiva’s suggestion to download a dump and write the code against it is probably right. If you want to start writing tools/hacks like this, the wikitech list is probably the place to begin…

  6. brassratgirl says:

    e.g.:
    https://wiki.toolserver.org/view/Query_service

    though it’s not practical for the long term, you could test it out there.

  7. kitty_scarboro says:

    ohhh..eyeteeth! i am so there.

    if the tools are easy to write, maybe sasha and i could hack something out.