Pandoc is a powerful text conversion tool that lets you write scientific documents entirely in Markdown and convert them into properly formatted PDFs, web documents, LaTeX, or even Docx files. With filters, pandoc extends Markdown so that you can cite previously published works in the text, number figures, equations, tables, and so on, and cross-reference them within the document.

I have personally found two filters very useful: pandoc-citeproc and pandoc-crossref. Citeproc looks for references in the text of the form @referencetag and formats them in the indicated style (such as APA or IEEE), both in the text and in the reference block at the end of the document. The crossref filter, on the other hand, gives us a proper way of inserting equations, figures, tables, and listings (code blocks) so that they are automatically numbered and can be referenced through custom tags.

Pandoc-citeproc needs to be pointed to a BibLaTeX file [1] that contains the information about the cited works and, optionally, a CSL file that determines the reference style. Both can be specified directly in the Markdown file, using the YAML block at the beginning of the document, as follows:
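A minimal sketch of such a YAML block is shown here. The metadata fields bibliography and csl are the ones pandoc reads; the file names references.bib and ieee.csl are only placeholders for your own files.

```yaml
---
# Placeholder file names; point these at your own files.
bibliography: references.bib
csl: ieee.csl
---
```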

and the .bib file should contain the information for each reference, such as:
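The entry below only illustrates the shape of a record. Every field value is a made-up placeholder rather than a real publication, and the key referencetag is the one that would be cited in the text as @referencetag.

```bibtex
% Placeholder entry: not a real publication.
@article{referencetag,
  author  = {Doe, Jane and Roe, Richard},
  title   = {A Placeholder Title},
  journal = {Journal of Placeholder Studies},
  year    = {2000},
  volume  = {1},
  pages   = {1--10}
}
```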

Then the command pandoc -s paper.md --filter pandoc-citeproc -t html returns the text converted from Markdown to HTML, with the references included:

In this way, it is fairly simple to write and manage the document and to present it in the format required for colleagues to collaborate or for submission for publication (most likely LaTeX).

However, in this framework, creating and maintaining the references file (.bib) and the tags of the cited works is left to be done manually or by third-party tools, such as the reference managers Zotero or Mendeley.

Because most recent publications have a digital object identifier (DOI) [2], it is possible to use this identifier as the citation tag in our documents. Doing so guarantees that every citation refers to a unique document, unlike the usual author-year tags, where an author may have several publications in the same year. This also opens the door to further automation, as there are reliable web services that provide the citation information for any given DOI, such as https://dx.doi.org/.

This concept gave rise to a new pandoc filter called doi2bib. The filter uses the bibliography file (.bib only) specified in the YAML configuration: it searches for all references of the form @DOI:XXX.XXX and updates this file accordingly. This means that any new reference is added automatically, using the reliable information that doi.org provides in the correct format.

This tool offers the following benefits:

• The specified file can be an empty file, an existing .bib file, or a file that does not exist yet.

• Only new references need to be downloaded, so it does not add significantly to the compilation time.

• Several documents can share the same .bib file, or you can use a global one for all your documents.

• If all your references use this format, a new file containing only the current citations, in order of citation, can be generated simply by changing the bibliography file specified in the document.
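The behaviour described above can be sketched roughly in Python. This is not the filter's actual source, only an illustration under stated assumptions: the regular expression for the @DOI: tag, the helper names (find_doi_citations, missing_dois, fetch_bibtex), and the idea of detecting an existing entry by searching the .bib text for the DOI are all hypothetical. The one external fact relied upon is that doi.org supports content negotiation and returns BibTeX when asked with the Accept header application/x-bibtex.

```python
import re
import urllib.request

# Assumed citation syntax: @DOI:<prefix>/<suffix>, e.g. @DOI:10.1000/xyz123
DOI_CITATION = re.compile(r"@DOI:(10\.\d{4,9}/[^\s\];,]+)")


def find_doi_citations(markdown_text):
    """Return the DOIs cited in the text, in order of first appearance."""
    seen = []
    for match in DOI_CITATION.finditer(markdown_text):
        # Drop a sentence-ending period that the pattern may have captured.
        doi = match.group(1).rstrip(".")
        if doi not in seen:
            seen.append(doi)
    return seen


def missing_dois(cited, bib_text):
    """Keep only the DOIs that do not already appear in the .bib contents."""
    lowered = bib_text.lower()
    return [doi for doi in cited if doi.lower() not in lowered]


def fetch_bibtex(doi):
    """Ask doi.org for a BibTeX record via HTTP content negotiation."""
    request = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Only the DOIs returned by missing_dois would need a network request through fetch_bibtex, which is why repeated compilations stay fast; the fetched records would then be appended to the .bib file.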

To use this filter, just download the latest build from GitHub and place it in the same path as your pandoc executable. The filter can then be run with the command pandoc -s paper.md --filter pandoc-doi2bib --filter pandoc-citeproc -o paper.pdf

which results in:

I'm using this framework for my thesis and prospective publications, so I hope it might help others as well.

1. Other file types, such as BibTeX, JSON, or YAML, can be used as well.

2. A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO). Source: Wikipedia.