Dec 11 2010

How about we start with just a little of the world’s information?

Sometimes — generally when I should be doing something more productive, like sleeping — I like to think about big systems at scale, and the difference large numbers of people taking small actions would make on any given problem.

It’s true that big systems can be made through small actions, a truth that has not always been self-evident. Take Wikipedia, a glorious and endlessly surprising collection of everything that’s great about being human — the stories, the histories, the languages, the science and tech, and the collaboration to collect it all in one place. To collect information well even against the barriers of time and space and poor recordkeeping is always a somewhat surprising accomplishment, which may be why so many librarian species in science fiction are aliens, not human. (But I digress). Wikipedia exists because many people take small actions to support it. And I don’t just mean editing; you may have seen the banners this month, asking for money — the average donation this fall to the Wikimedia Foundation, which hosts Wikipedia, is $30 or so, and with (we hope) an even 500,000 such donations we’ll raise enough to support the traffic and servers of the world’s fifth largest website, and the staff of 50 or so to maintain that infrastructure. Please give if you can. (But I digress again).

But the thing is, the reason the project’s worth supporting in the first place is that Wikipedia is an unsurpassed educational resource at scale, with a miraculous 17 million articles across 279 languages, all of which is due entirely to individuals making incremental and small edits (12 million a month!). And I’m not even counting Commons files, or Wiktionary definitions! It is also, unsurprisingly, very much a work in progress, and that project’s to-do list can be a little overwhelming.

Take unreferenced articles, a particular pet peeve of mine. We all agree: articles being referenced is a good thing. References help keep the accuracy of the text up and give the interested reader somewhere else to go. There are at the moment 277,689 unreferenced articles on the English Wikipedia, with a backlog stretching back four years. (Yes, we dutifully count them all, or at least the ones that are tagged as being unreferenced). Add to that another 73,867 articles where someone thinks there need to be better sources, or thinks a source needs to be checked to make sure it says what is claimed. And then, hey! if that’s not enough to keep you busy for a while, how about the [[citation needed]] category, the category of articles that have a statement or two that could really stand to have a footnote added, just so we can all be sure the claim really has its origin in fact. There are 243,991 articles in that category, which ought to occupy a rainy Sunday afternoon nicely.

All told that’s somewhere right around 600,000 articles in English alone (leaving aside the inevitable duplication between some articles in these categories, and the fact that there are probably articles that need work that aren’t yet tagged) that could use some help with references. By comparison, that’s an order of magnitude more articles than any print encyclopedia has in total. In other words: many lifetimes of work for any individual, and more work than most full-time editorial shops could even contemplate. And yes, we have many thousands of volunteers, but not so many steadily active editors as you might think — and it often takes time and skill to track down references. The problem seems insurmountable, doesn’t it?

Let’s try a thought experiment. There are, according to the ALA, 149,521 librarians working in this country. That’s not counting special librarians; the ALA also counts around 8,500 special libraries. That’s also ignoring librarians in the rest of the English-speaking world. Even if you just count the 149,521 though — let’s round up to 150,000 for the sake of argument — that means that if each and every librarian pitched in and referenced four — just four — English Wikipedia articles we could knock the backlog out. Hell, we could probably do it in a week — or even a couple of days, assuming everyone had time, no overlap with one another, pre-existing editing skills, and good luck with their research. But let’s call it a week.
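For anyone who wants to check my back-of-the-envelope math, here is a minimal sketch in Python. The counts are the ones quoted in this post; the variable names are just mine, and the calculation assumes (unrealistically) that no article sits in more than one backlog category and that no two librarians pick the same article.

```python
import math

# Backlog counts quoted in this post (English Wikipedia, December 2010).
# Assumption: the three categories do not overlap, which overstates the total a little.
backlog = {
    "unreferenced": 277_689,
    "needs better or checked sources": 73_867,
    "citation needed": 243_991,
}

# ALA count of librarians quoted above, before rounding up to 150,000.
LIBRARIANS = 149_521

# Total articles needing reference work, and the share per librarian,
# rounded up so the whole backlog is covered.
total_articles = sum(backlog.values())
articles_each = math.ceil(total_articles / LIBRARIANS)

print(f"Articles needing references: {total_articles:,}")
print(f"Articles per librarian: {articles_each}")
```

The tagged total comes out a bit under the 600,000 I rounded to, and the per-librarian share still rounds up to four.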

In this hypothetical week — we’ll call it the Week of Boldly Referencing — not all the librarians in the U.S. would be able to participate, of course. (Let’s be realistic). But many librarians from other places would participate, which would be great — Wikipedia’s coverage is global. Some of the most obscure topics could be tackled by academic librarians, with their access to special collections, research databases, and troves of journal articles. Others could be handled by anyone with access to a solid literature reference work, or biographies collection — checking author dates of birth and death, for instance, and fixing up bibliographies. I can imagine government docs librarians going to town, adding statistics citations and linking in to the mysterious depths of government research websites. And all the librarians I know are passionate about some hobby or another; no one ever claimed you had to stick just to your job description when editing.

Of course, many people would need to be trained how to edit, or would need some time to experiment on their own; fortunately, being computer-literate is more or less part of the job description for the profession. There would be some culture clashes, but I think we can all agree that building a solid bibliography — on any and every topic of human endeavor — is a worthwhile occupation. And imagine the ads: “Wikipedia Wants You!” Maybe we can get Nancy Pearl to pose as Uncle Sam.

——————————

Other than logistics, is there a reason this hasn’t happened yet? Over in the Wikimedia world, I along with many others have been thinking about the relationship between authors and publishers, and archivists and librarians and Wikipedia: the people who make stuff, and the people who collect it and make it accessible to others. We haven’t necessarily paid attention to all of these roles in the past when thinking about the future of Wikimedia. As I noted below, I attended the Charleston Conference back in October, and one of the questions there was “how can publishers help with Wikipedia?”

How can they? I have a ready answer when librarians ask this question: I tell them that not only can they help in fixing our factchecking backlogs, they can serve a role as educators, teaching people how to evaluate information in the context of (and on) Wikipedia; and I note that there are even some technical aspects of making metadata freely accessible, improving cataloging, and helping make seamless discovery systems that will benefit the average Wikipedia user who wants to find a referenced book or article. I sometimes mention [[special:booksources]]; always mention “cite this page”, and often talk about systematic projects to add references — projects that haven’t always gone as smoothly as one might desire, or been as large as what I imagine above, but that do overall have a good track record (note to Wikipedians: we should improve our documentation of these efforts). I am thinking of projects to add links to online special collections, for instance.

But publishers? It turns out that my answer is much the same on a lot of counts: publishers often have access to the best information about a topic via their own authors, and certainly they have access to the best metadata for the many thousands of articles about books and other media that we have. Publishers, like librarians, often have the kind of personalities and professional skills that make them great Wikipedia contributors. And certainly decisions to support open publishing, open access, and making information available online help Wikipedia, as they do all other efforts to disseminate high-quality knowledge broadly.

Of course there is a potential conflict of interest in publishers contributing, as there is with any self-interested party (which is all of us, in one sphere or another) editing Wikipedia. But I think to focus too much on the pitfalls and potential violations of neutrality ignores this fundamental truth: Wikipedia is a tertiary source, and as such both publishers and the libraries that collect their work are an integral, hidden, foundational part of the project. If we are to build the best possible source in Wikipedia, incorporating the world’s knowledge on all the world’s topics in all the world’s languages, we must necessarily — and we do! — rely on a far greater repository of knowledge and investigation about the world than is held solely and privately by our 100,000 collective current contributors. We need an ongoing and growing army of researchers, both amateur and professional; an army of summarizers to go along with an army of writers and community-builders; and we also need for the world of knowledge production to continue as it has — for researchers and writers to continue their work and continue publishing, building the foundation, and making this work as open as possible so that others can draw from it.  Being a librarian in the world’s largest research library system has taught me something about the scale of recorded knowledge; Wikipedia’s accomplishments are certainly exciting and impressive, but we have still only just scraped the surface.

But we can only do more if we can somehow get to the rest of the world’s knowledge, and on that front librarians and publishers and archivists can all help.

Though I do dream of the Week of Boldly Referencing — I’ll make t-shirts! — the focus in talking to librarians and publishers about contributing should be on systemic institutional approaches; while professionals in both groups certainly make able editors as well, the institutions that libraries and publishing houses represent are in general well-suited to corporate projects. Wikipedia has not as a whole been terribly well set up for these kinds of projects in the past. Nonetheless, there is a strong and exciting model for partnership projects in the recent work that is being done around “GLAMS” — Galleries, Libraries, Archives and Museums — and Wikipedia, particularly with museum and archive partnerships. There was a conference on the subject in London a few weeks ago, and another in France; there museum professionals and Wikimedians met to discuss their issues and the future of such projects as image donations to Wikimedia Commons. And there are many small projects to “free knowledge” that are ongoing all the time. We live in an exciting future.

How do you think we can best collaborate with cultural institutions and libraries?

5 Responses to “How about we start with just a little of the world’s information?”

  1. Librarians are no doubt a useful resource, but I was thinking that a specific teaching exercise for students would be to ask them to investigate and perhaps provide sourcing for an article of their choosing, from those that need it.

  2. John Broughton says:

    It hasn’t happened because the Wikimedia editing interface is *terrible*. It needs to be WYSIWYG, with tables and templates and footnotes *not* embedded (in full) in the text, in the same way that image information is *not*. This isn’t rocket science, just really good programming, and prototypes and examples exist. But the Foundation needs to stop tinkering with the current UI (half a million dollars so far?) and do something radical to make editing easy, because librarians and publishers and others aren’t going to spend lots of time learning the current user editing interface.

    And it hasn’t happened because there are no browser plug-ins that give editors a single-click way of generating a Wikipedia citation, though this is technically possible (there is at least one bot that will generate a cite from a naked URL, for example; other tools can generate complete cites via DOI or PMID). To be really effective, the Foundation would need to work with major online publishers, define metatags for title, author, publication date, etc., that publishers could follow if they wanted to make it easy to cite/link to their content. And the Foundation needs to do this because there is no way that a community of volunteers can organize, let alone formally meet with publishers, to do this.

    And it hasn’t happened because the Wikipedia community has no standards for when/how edits by publishers (content owners) are acceptable and when they are not. Again the Foundation could (and should, in my opinion) take the lead, issuing a general policy statement after consultation with (language) communities, then leaving the individual communities to decide whether/how to implement that policy.

  3. John, let us hope that “something easy to make editing easy” doesn’t mean making editing even more difficult on mobile devices and other low end browsers. There’s been a free PHP parser called LIME available for years and other people have made various parsers, too, but until someone at the Foundation decides to make it happen, none of the developers are even going to look at WYSIWYG because they let themselves get so far behind code review that they’re still trying to work their way out of cathedral mode. And a browser plug-in? No thanks, that’s just more platform dependence which will disadvantage low end client users in economically disadvantaged areas where our geographical coverage is the worst.

    Paid editing in general is a disaster. A lot of Foundation officials don’t even know about the Google medical articles paid editing collaboration, which was productive as far as it went, but ended up failing in some pretty spectacular ways when the experts weren’t willing to commit their recommended edits because of the liability issue which arises in the context of paid professional editing as opposed to volunteer collaboration.

  4. er, “something radical to make editing easy” sorry.

  5. The most up to date in Witch methods: Magic Wicca wands!