
Reference Parsing

One of the most fundamental challenges for any journal lies in dealing with references, especially if you intend to deliver them in a machine-readable format such as JATS.

There are many reasons to convert your references to a machine-readable format, not the least of which is that it can make for a friendlier experience for your human readers.

Automated reference parsing with Anystyle

If you have unstructured, plain-text references, a good tool to use is the anystyle.io web app, which parses references with a machine learning model based on conditional random fields.

Preparation of input data

Anystyle is not a magic bullet. In all likelihood, you will have to do some manual correction to the reference lists in the token editor. The reference lists you give it should be:

Tag clarifications

Anystyle uses its token editor to match parts of your reference lists to the structured data that it outputs. It is important to note that these _tokens_ do not map 1:1 onto the outputs. That is to say, do not assume that a token shown in the Anystyle editor corresponds directly to either a) a field in the output you will get from the app or b) the scholarly reference terms you are acquainted with. To learn how the app processes data, it is best to parse some reference lists and compare the results with the original documents before proceeding.
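One way to compare parsed results with the original documents is to check that every parsed value actually occurs in the reference it came from. The following is a minimal Python sketch of that idea, not part of Anystyle. The function names and the sample reference are hypothetical, and the field names (title, container-title, author, family) are assumed to follow the citeproc/json conventions.

```python
import re

# Fields that describe the record rather than quote the reference text.
SKIPPED_KEYS = {"type", "id", "language"}


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore layout."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fields_missing_from_source(raw_reference, parsed):
    """Return the parsed fields whose text does not occur in the raw reference."""
    source = normalize(raw_reference)
    missing = []
    for key, value in parsed.items():
        if key in SKIPPED_KEYS:
            continue
        if isinstance(value, str):
            if normalize(value) not in source:
                missing.append(key)
        elif key == "author":
            # Only family names are checked; given names are often abbreviated.
            for person in value:
                family = person.get("family", "")
                if family and normalize(family) not in source:
                    missing.append("author:" + family)
    return missing


# Hypothetical reference and parse result, with one deliberate mismatch.
raw = "Smith, J. (2001). Parsing references. Journal of Examples, 12(3), 45-67."
parsed = {
    "type": "article-journal",
    "author": [{"family": "Smith", "given": "J."}],
    "title": "Parsing references",
    "container-title": "Journal of Samples",  # does not match the source text
    "volume": "12",
}
print(fields_missing_from_source(raw, parsed))
```

A check like this only catches values that were altered or invented; it cannot tell you whether a value was assigned to the wrong field, so reading a sample of the results by eye is still worthwhile.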

Export

Upon completion, Anystyle will give you a few formats to export your references to: citeproc/json, xml, and bibtex. While all of these have their uses, citeproc/json is the most versatile and complete format. Choose this option.
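To illustrate why a structured export is useful, here is a minimal Python sketch that turns one citeproc/json record into a JATS reference element. It is an illustration under stated assumptions rather than a complete converter: the function name csl_to_jats_ref and the sample record are hypothetical, the record is assumed to use the common citeproc/json fields (author, title, container-title, volume, issued with date-parts), and every reference is treated as a journal article. Check the shape of your own export before relying on it.

```python
import json
import xml.etree.ElementTree as ET

# Assumed mapping from citeproc/json fields to JATS elements (journal articles only).
FIELD_MAP = (
    ("title", "article-title"),
    ("container-title", "source"),
    ("volume", "volume"),
)


def csl_to_jats_ref(item, ref_id):
    """Build a JATS <ref> element from one citeproc/json record."""
    ref = ET.Element("ref", {"id": ref_id})
    # Simplification: the publication type is hard-coded.
    citation = ET.SubElement(ref, "element-citation", {"publication-type": "journal"})

    authors = item.get("author", [])
    if authors:
        group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
        for person in authors:
            name = ET.SubElement(group, "name")
            ET.SubElement(name, "surname").text = person.get("family", "")
            ET.SubElement(name, "given-names").text = person.get("given", "")

    for csl_key, jats_tag in FIELD_MAP:
        if item.get(csl_key):
            ET.SubElement(citation, jats_tag).text = str(item[csl_key])

    # Assumes the year is stored as issued -> date-parts -> [[year, ...]].
    date_parts = item.get("issued", {}).get("date-parts", [])
    if date_parts and date_parts[0]:
        ET.SubElement(citation, "year").text = str(date_parts[0][0])
    return ref


# Hypothetical export containing a single record.
exported = json.loads("""
[{"type": "article-journal",
  "author": [{"family": "Smith", "given": "J."}],
  "title": "Parsing references",
  "container-title": "Journal of Examples",
  "volume": "12",
  "issued": {"date-parts": [[2001]]}}]
""")

ref = csl_to_jats_ref(exported[0], "ref1")
print(ET.tostring(ref, encoding="unicode"))
```

Fields missing from a record are simply skipped, so an incomplete parse still produces well-formed XML that can be corrected by hand afterwards.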

Using the structured data

Implementing the structured data in your journal depends on what kind of output formats you intend to use. It is useful to consider, for instance, if you want to: