Data Futures’ Annotation of Scientific Literature Changes Preservation Game

Since 2016, the IMCC’s Data Futures project has worked with the repositories section at CERN in Geneva. The outputs of this collaboration, now the hasdai partnership, which develops long-term preservation and scientific annotation standards, are being integrated with Zenodo (see

An international consortium ( has formed around this partnership and developed a Free and Open Source Software platform, InvenioRDM, which multiple institutions in Europe and the U.S. are now deploying. CERN has begun re-implementing Zenodo itself on the RDM platform, and new projects supporting annotation of scientific literature at scale across the biodiversity, medical and social sciences and humanities communities were presented at iPRESS-2021. Two services based on these developments are now publicly available:

First, as species loss accelerates, it is increasingly recognized that historic literature will become a crucial source of information. Significant progress has been made in automatically extracting ‘taxonomic treatments’ from publications going back as far as 1758, and legislation in many countries now recognizes such treatments as free of any copyright restrictions that may have applied to the original documents. In a collaboration with the editorial board of The European Journal of Taxonomy and Plazi, CERN and Data Futures have converted the taxonomic treatments of the complete Journal to WADM (Web Annotation Data Model) annotations accessible in an InvenioRDM repository, making each reported species independently discoverable. More information can be found at
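To give a rough idea of what a taxonomic treatment expressed as a WADM annotation might look like, here is a minimal Python sketch. It uses only the general vocabulary of the W3C Web Annotation Data Model (`Annotation`, `TextualBody`, `TextQuoteSelector`); it is not the schema actually used in the European Journal of Taxonomy repository. The function name, the example.org URLs, the placeholder taxon name and the quoted text are all hypothetical and chosen purely for illustration.

```python
import json

# Standard JSON-LD context published for the W3C Web Annotation Data Model.
WADM_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_treatment_annotation(annotation_id, article_url, taxon_name, treatment_text):
    """Build a minimal Web Annotation pointing at one treatment in an article.

    Hypothetical helper: the real service may structure bodies and
    selectors quite differently.
    """
    return {
        "@context": WADM_CONTEXT,
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "describing",
        # The body carries the taxon name as a plain-text tag.
        "body": {
            "type": "TextualBody",
            "value": taxon_name,
            "purpose": "tagging",
            "format": "text/plain",
        },
        # The target identifies the article and the quoted passage within it.
        "target": {
            "source": article_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": treatment_text,
            },
        },
    }


# All values below are placeholders, not real records.
annotation = make_treatment_annotation(
    annotation_id="https://example.org/annotations/1",
    article_url="https://example.org/articles/123",
    taxon_name="Examplus hypotheticus",
    treatment_text="Examplus hypotheticus sp. nov. Diagnosis: ...",
)

print(json.dumps(annotation, indent=2))
```

Because each annotation is a self-contained record with its own identifier, a repository can index and serve it separately from the article it points to, which is one plausible way a reported species becomes discoverable on its own.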

Second, in a parallel project with the Voltaire Foundation, CERN and Data Futures have developed InvenioRDM-based annotation and preservation infrastructure tailored to the eighteenth-century manuscripts relating to Voltaire. Comprising letters, maps, and drafts of books and plays, this important new data resource will provide services to the research community as well as public presentation of Voltaire’s papers, many of which will be made accessible for the first time. The current pilot service is now available at and extensions to address more than twenty thousand such documents and books are now being planned.

We are happy to provide more information via email to

Written by on Monday, posted in News
