A Python CLI tool for converting JATS (Journal Article Tag Suite) XML files to Markdown format, with support for extracting peer review comments and author responses.
jats parses JATS XML files from scientific publishers (bioRxiv, eLife, etc.) and converts them to clean, readable Markdown. It's particularly useful for working with preprint manuscripts and their associated peer review materials.
- Convert JATS XML articles to Markdown
- Extract peer review comments and author responses from multi-article XML files
- Support for bioRxiv manifest files (optional metadata)
- Organize reviews and responses by revision round
- Simple CLI interface with stdout or file output
Requires Python 3.10 or later. Install with uv:

```bash
cd jats
uv pip install -e .
```

or with pip:

```bash
cd jats
pip install -e .
```

Convert a JATS XML file to Markdown:
```bash
# Output to stdout
jats convert article.xml

# Output to file
jats convert article.xml -o article.md

# With bioRxiv manifest file (optional)
jats convert article.xml -m manifest.xml -o article.md
```

Extract peer review comments and author responses from JATS XML files that include sub-articles (common in eLife and some bioRxiv articles):
```bash
# Extract reviews and responses to separate files
jats convert article.xml -o article.md -r output_base

# Creates:
# - output_base_reviews.md (all review comments, organized by round)
# - output_base_responses.md (all author responses, organized by round)
```

The `-r` flag extracts sub-articles with the following JATS `article-type` values:
- Review comments: `decision-letter`, `referee-report`, `editor-report`, `reviewer-report`
- Author responses: `author-comment`, `reply`
Reviews and responses are automatically grouped by revision round using the JATS4R `peer-review-revision-round` metadata; sub-articles without it are assigned to round 1.
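The classification and grouping rules above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not the code in jats' `parser.py`: the function names are hypothetical, and it assumes the round number sits in a `<custom-meta>` element inside the sub-article's own `<front-stub>` or `<front>`, which is where JATS4R recommends placing it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Article types taken from the lists above.
REVIEW_TYPES = {"decision-letter", "referee-report", "editor-report", "reviewer-report"}
RESPONSE_TYPES = {"author-comment", "reply"}


def revision_round(sub_article: ET.Element) -> int:
    """Return the peer-review-revision-round value, or 1 if it is absent."""
    # Assumption: the metadata lives in the sub-article's own <front-stub>/<front>,
    # so custom-meta belonging to nested sub-articles is not picked up by mistake.
    for container in (sub_article.find("front-stub"), sub_article.find("front")):
        if container is None:
            continue
        for meta in container.iter("custom-meta"):
            if (meta.findtext("meta-name") or "").strip() == "peer-review-revision-round":
                value = (meta.findtext("meta-value") or "").strip()
                if value.isdigit():
                    return int(value)
    return 1


def group_sub_articles(root: ET.Element) -> tuple[dict[int, list[str]], dict[int, list[str]]]:
    """Group sub-article ids into reviews and responses, keyed by revision round."""
    reviews: dict[int, list[str]] = defaultdict(list)
    responses: dict[int, list[str]] = defaultdict(list)
    for sub in root.iter("sub-article"):
        kind = sub.get("article-type", "")
        if kind in REVIEW_TYPES:
            reviews[revision_round(sub)].append(sub.get("id", ""))
        elif kind in RESPONSE_TYPES:
            responses[revision_round(sub)].append(sub.get("id", ""))
        # Any other article type is ignored.
    return dict(reviews), dict(responses)


# Hypothetical input: one round-2 referee report and one untagged author response.
SAMPLE = """<article>
  <sub-article article-type="referee-report" id="sa1">
    <front-stub>
      <custom-meta-group>
        <custom-meta>
          <meta-name>peer-review-revision-round</meta-name>
          <meta-value>2</meta-value>
        </custom-meta>
      </custom-meta-group>
    </front-stub>
  </sub-article>
  <sub-article article-type="author-comment" id="sa2"/>
</article>"""

reviews, responses = group_sub_articles(ET.fromstring(SAMPLE))
print(reviews)    # {2: ['sa1']}
print(responses)  # {1: ['sa2']}
```

The author response has no round metadata, so it falls back to round 1, matching the default described above.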
```bash
# Convert an article to Markdown
jats convert 2023.01.01.12345.xml -o paper.md
```

```bash
# Convert main article and extract reviews
jats convert elife-12345-v1.xml -o paper.md -r elife-12345-v1

# Output files:
# - paper.md (main article)
# - elife-12345-v1_reviews.md (peer review comments)
# - elife-12345-v1_responses.md (author responses)
```

```bash
# manifest.xml provides additional metadata
jats convert article.xml -m manifest.xml -o article.md
```

jats expects XML that follows the JATS standard. This format is used by:
- bioRxiv and medRxiv preprint servers
- eLife journal
- PubMed Central (PMC)
- Many other scientific publishers
A typical JATS XML file contains:
- `<front>`: article metadata (title, authors, abstract)
- `<body>`: main article content organized in sections
- `<back>`: references, acknowledgments, etc.
- `<sub-article>`: optional peer review materials (eLife, some bioRxiv)
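To make this layout concrete, the sketch below parses a hand-written, heavily trimmed JATS skeleton with Python's standard `xml.etree.ElementTree` and reports which parts are present. The `describe` function and the sample document are illustrative assumptions, not part of jats; real articles carry far more metadata inside `<front>`.

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical JATS skeleton showing the four top-level parts.
SKELETON = """<article article-type="research-article">
  <front>
    <article-meta>
      <title-group><article-title>An example title</article-title></title-group>
      <abstract><p>One-sentence abstract.</p></abstract>
    </article-meta>
  </front>
  <body><sec><title>Introduction</title><p>Text.</p></sec></body>
  <back><ref-list/></back>
  <sub-article article-type="referee-report" id="sa1"/>
</article>"""


def describe(root: ET.Element) -> dict[str, object]:
    """Summarise which top-level JATS parts are present."""
    return {
        "title": root.findtext("front/article-meta/title-group/article-title"),
        "has_abstract": root.find("front/article-meta/abstract") is not None,
        "body_sections": len(root.findall("body/sec")),
        "has_back": root.find("back") is not None,
        # <sub-article> elements are children of <article>, after <back>.
        "sub_articles": [s.get("article-type") for s in root.findall("sub-article")],
    }


print(describe(ET.fromstring(SKELETON)))
```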
bioRxiv articles may include an optional manifest.xml file that provides:
- Collection/category information
- Version history
- Links to published versions
- Peer review URLs
jats converts JATS XML to clean, readable Markdown with:
- Article title as H1 heading
- Authors with affiliations
- Abstract
- Body sections with appropriate heading levels
- Inline figures with captions
- References (when available)
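One way to derive "appropriate heading levels" is a recursive walk over nested `<sec>` elements, one `#` deeper per nesting step. The sketch below is hypothetical and not jats' `converter.py`: it assumes top-level sections map to H2 because the article title already uses H1, caps levels at H6, and flattens inline markup such as `<italic>` to plain text.

```python
import xml.etree.ElementTree as ET


def text_of(elem: ET.Element) -> str:
    """Flatten an element's text (including inline children), collapsing whitespace."""
    return " ".join("".join(elem.itertext()).split())


def sections_to_markdown(parent: ET.Element, level: int = 2) -> list[str]:
    """Emit a heading per nested <sec> and plain paragraphs per <p>, in document order."""
    lines: list[str] = []
    for child in parent:
        if child.tag == "sec":
            title = child.find("title")
            if title is not None:
                lines.append("#" * min(level, 6) + " " + text_of(title))
            # Subsections are one heading level deeper.
            lines.extend(sections_to_markdown(child, level + 1))
        elif child.tag == "p":
            lines.append(text_of(child))
    return lines


BODY = """<body>
  <sec><title>Results</title>
    <p>Overview.</p>
    <sec><title>Imaging</title><p>Details.</p></sec>
  </sec>
</body>"""

print("\n\n".join(sections_to_markdown(ET.fromstring(BODY))))
```

For the sample body this yields `## Results`, the overview paragraph, `### Imaging`, then the details paragraph.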
When using -r, peer review materials are extracted to separate Markdown files:
Reviews file (`*_reviews.md`):

```markdown
# Revision Round 1

## Reviewer 1

[Review content...]

---

## Reviewer 2

[Review content...]
```

Responses file (`*_responses.md`):
```markdown
# Revision Round 1

## Author Response

[Response content...]
```

For development:

```bash
# Install development dependencies
uv pip install -e ".[dev]"

# Run tests
pytest
```

Project layout:

```
jats/
├── jats/
│ ├── __init__.py
│ ├── main.py # CLI entry point
│ ├── parser.py # JATS XML parsing
│ ├── converter.py # Markdown conversion
│ └── models.py # Data models
├── tests/
│ ├── test_*.py # Test files
│ └── *.xml # Test fixtures
├── pyproject.toml # Package configuration
└── README.md
```
See DEVELOPMENT.md for detailed development documentation and code style guide.
Currently, `<supplementary-material>` elements (such as source data files for figures) are excluded from the Markdown output. These typically appear as:

```xml
<supplementary-material id="fig6sdata1">
  <label>Figure 6—source data 1.</label>
  <caption>
    <title>PDF files containing original western blots...</title>
  </caption>
  <media mimetype="application" mime-subtype="zip" xlink:href="..."/>
</supplementary-material>
```

Future Enhancement: Add support for extracting and linking to source data files, including:
- Source data download links
- Separate source data manifest
- Integration with figure references
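As a rough idea of what source data links could look like, the sketch below turns each `<supplementary-material>` into a Markdown list item. It is a hypothetical illustration of the proposed enhancement, not existing jats behaviour; the function name, the sample file name `fig6-data1.zip`, and the fallback to an `xlink:href` on the element itself are all assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xlink prefix to its namespace URI in braces.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical input modelled on the snippet above; the href is made up.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <supplementary-material id="fig6sdata1">
    <label>Figure 6 source data 1.</label>
    <caption><title>PDF files containing original western blots.</title></caption>
    <media mimetype="application" mime-subtype="zip" xlink:href="fig6-data1.zip"/>
  </supplementary-material>
</article>"""


def source_data_links(root: ET.Element) -> list[str]:
    """Render each <supplementary-material> as a Markdown list item linking to its file."""
    links: list[str] = []
    for supp in root.iter("supplementary-material"):
        label = " ".join((supp.findtext("label") or "").split())
        title = " ".join((supp.findtext("caption/title") or "").split())
        media = supp.find("media")
        # Prefer the nested <media> link; fall back to an href on the element itself.
        href = media.get(XLINK_HREF) if media is not None else supp.get(XLINK_HREF)
        if href:
            text = " ".join(part for part in (label, title) if part) or href
            links.append(f"- [{text}]({href})")
    return links


print("\n".join(source_data_links(ET.fromstring(SAMPLE))))
```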
- JATS Documentation
- JATS4R (JATS for Reuse) - Recommendations for peer review tagging
- bioRxiv JATS XML
- eLife JATS XML
This project is released under the MIT License.
For issues or questions, please open an issue on GitHub.