Most online transcription projects allow you to see or even review other people’s transcriptions, but Zooniverse projects (Operation War Diary, Ancient Lives, AnnoTate and Shakespeare’s World, to name a few) ask you to transcribe on your own. Rather than generating transcriptions and waiting for each one to be vetted by an expert, we try to harness what James Surowiecki calls ‘the wisdom of the crowd’ (in this case, multiple transcribers and their aggregated responses) to identify what is on a page without manually reviewing each one. If our small research team had the time to manually review every transcription, we would probably have the time to transcribe everything ourselves! But the dataset is much too large for that.
Multiple individuals’ responses are aggregated using two different algorithms. The first is a clustering algorithm, which uses the blue dots to identify where a line or word is. The second is MAFFT, a multiple sequence alignment program traditionally used to align amino acid or nucleotide sequences, which has also been deployed by our friends over on Notes from Nature (see blog).
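To make the two steps a little more concrete, here is a minimal sketch in Python. It is not the code our aggregation actually runs: `cluster_lines` is a hypothetical stand-in that groups dot positions by vertical distance (the `tolerance` value is an assumption), and `consensus` assumes the transcriptions have already been aligned to equal length, with `-` marking a gap as in MAFFT's output, and simply takes a majority vote per column.

```python
from collections import Counter


def cluster_lines(y_positions, tolerance=10.0):
    """Group dot y-coordinates into lines (hypothetical, greedy 1D clustering).

    Values are sorted, and a new cluster starts whenever the gap to the
    previous value exceeds `tolerance` (an assumed pixel threshold).
    """
    clusters = []
    for y in sorted(y_positions):
        if clusters and y - clusters[-1][-1] <= tolerance:
            clusters[-1].append(y)
        else:
            clusters.append([y])
    return clusters


def consensus(aligned, gap="-"):
    """Majority vote per column over equal-length, already-aligned strings."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("aligned transcriptions must all have the same length")
    result = []
    for column in zip(*aligned):
        char, _count = Counter(column).most_common(1)[0]
        if char != gap:  # a winning gap means most volunteers saw nothing here
            result.append(char)
    return "".join(result)


# Dots from three volunteers marking the starts of two lines on a page.
print(cluster_lines([102, 98, 250, 101, 247]))
# [[98, 101, 102], [247, 250]]

# Three aligned readings of the same line; each mistake is outvoted.
print(consensus(["to be or not", "to be or n-t", "to he or not"]))
# to be or not
```

The point of the sketch is only that independent mistakes rarely coincide, so a per-position vote tends to recover the right reading once the individual transcriptions have been lined up.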
Aggregating multiple people’s responses minimizes the burden of transcription, as well as the burden of accuracy, on the individual. It’s unlikely that any two people, much less three, five or a dozen volunteers, reading the same word or line will make exactly the same mistakes. But asking multiple people to do each page independently has its perils, as we have discovered on other projects: if a page is dense or the handwriting is hard to read, the average person will transcribe a bit of the top third of the page but often won’t complete the whole page, because they don’t have the time or inclination (life happens!).
Many moons ago, when I was first thinking through how to make text transcription efficient in a system that relies upon multiple independent transcribers who can’t see one another’s work, it occurred to me that we could use a visual cue to show that a line or word had already been completed. Hence the grey dots that started to appear this month.
Grey dots mean that the section or line they surround has been fully retired, though there may be more to do elsewhere on the page. If you see a page where every line is encompassed by grey dots, you should click ‘I’m done’ and ‘yes, everything is transcribed’. Once three people have said a page is complete, either because they did the whole thing themselves or because they saw the grey dots, the page will stop showing up in the transcription interface.
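The page retirement rule described above can be sketched roughly as follows. This is an illustration only: the `Page` class and its method names are hypothetical, and counting each volunteer at most once is our assumption about what ‘three people’ means in practice.

```python
from dataclasses import dataclass, field

# From the post: a page retires once three people say it is complete.
RETIREMENT_THRESHOLD = 3


@dataclass
class Page:
    """Hypothetical record of who has declared a page fully transcribed."""

    complete_votes: set = field(default_factory=set)

    def mark_complete(self, volunteer_id):
        # A set ignores repeat votes from the same volunteer (an assumption).
        self.complete_votes.add(volunteer_id)

    @property
    def retired(self):
        return len(self.complete_votes) >= RETIREMENT_THRESHOLD


page = Page()
for volunteer in ["ann", "ann", "ben"]:
    page.mark_complete(volunteer)
print(page.retired)  # False: only two distinct volunteers so far

page.mark_complete("cat")
print(page.retired)  # True: three distinct volunteers agree
```

Whether a volunteer finished the page themselves or just saw grey dots on every line, the vote counts the same way here, which matches how the rule is described in the text.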