PaperTrack builds Obsidian-friendly Markdown reports for arXiv daily submissions and journal issues, then marks papers already present in a local Zotero library.
The journal workflow treats each publisher's own pages as the source of truth:
- ACS journals such as JCTC use ACS issue pages as the primary source.
- AIP journals such as JCP use AIP issue pages as the primary source.
- CrossRef is a fallback only. For JCP, it can fill missing abstracts by DOI, but it does not decide which articles belong to an issue.
Requirements:
- Python 3.10+
- Zotero desktop (optional but recommended)
- Obsidian with the Dataview plugin, if you want the generated task queries
Install locally:

```bash
pip install -e ".[dev]"
```

The project depends on `curl_cffi` for publisher pages, plus `pyzotero`, `beautifulsoup4`, `lxml`, `feedparser`, `jinja2`, and `requests`.
arXiv reports:

```bash
python main.py \
  --category chem-ph,quant-ph \
  --time 2026.04,2026.03 \
  --data_dir /home/ansatz/data/obsidian/1/papertrack_datas
```

Latest journal issue:

```bash
python main.py \
  --source journal \
  --journal jcp \
  --data_dir /home/ansatz/data/obsidian/1/papertrack_datas
```

Backfill journal issues:

```bash
python main.py \
  --source journal \
  --journal jcp \
  --backfill \
  --from_year 2018 \
  --data_dir /home/ansatz/data/obsidian/1/papertrack_datas
```

Explicit issue:

```bash
python main.py \
  --source journal \
  --journal jcp \
  --volume 164 \
  --issue 16 \
  --year 2026 \
  --data_dir /home/ansatz/data/obsidian/1/papertrack_datas
```

`run.sh` uses the local conda environment:

```bash
/home/ansatz/soft/miniconda3/bin/conda run -n arxiv python main.py ...
```

`--data_dir` is the root. PaperTrack separates sources under that root:
```text
papertrack_datas/
├── arxiv/
│   ├── chem-ph/2026/04/01.md
│   └── quant-ph/2026/04/01.md
├── acs/
│   └── jctc/22/8.md
└── aip/
    └── jcp/164/16.md
```
Journal output is always:

```text
{data_dir}/{provider}/{journal_slug}/{volume}/{issue}.md
```
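A minimal sketch of how that path could be assembled, assuming a journal's `categories.toml` table is available as a plain dict. The helper names `resolve_provider` and `journal_report_path` are hypothetical, and the rule that an `acs_code` key implies the `acs` provider is an assumption drawn from the JCTC and JCP configuration examples, not confirmed PaperTrack behavior.

```python
from pathlib import Path


def resolve_provider(journal: dict) -> str:
    """Hypothetical helper: choose the provider directory for a journal config."""
    # Assumption: an explicit `provider` key wins, and an `acs_code` implies ACS.
    if "provider" in journal:
        return journal["provider"]
    if "acs_code" in journal:
        return "acs"
    raise ValueError(f"no provider for journal {journal.get('slug')!r}")


def journal_report_path(data_dir, journal: dict, volume, issue) -> Path:
    """Build {data_dir}/{provider}/{journal_slug}/{volume}/{issue}.md."""
    return (
        Path(data_dir)
        / resolve_provider(journal)
        / journal["slug"]
        / str(volume)
        / f"{issue}.md"
    )


jcp = {"name": "Journal of Chemical Physics", "slug": "jcp", "provider": "aip"}
print(journal_report_path("papertrack_datas", jcp, 164, 16))
```

On a POSIX system the usage line prints `papertrack_datas/aip/jcp/164/16.md`, matching the tree above.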
JCTC is configured as an ACS journal:

```toml
[journals.jctc]
name = "Journal of Chemical Theory and Computation"
issn = "1549-9618"
slug = "jctc"
acs_code = "jctcce"
```

Auto-discovery uses ACS list-of-issues pages. Article metadata comes from the ACS issue TOC page.
JCP is configured as an AIP journal:

```toml
[journals.jcp]
name = "Journal of Chemical Physics"
issn = "0021-9606"
slug = "jcp"
provider = "aip"
```

For each issue, PaperTrack:
- Fetches the AIP issue page, for example `https://pubs.aip.org/aip/jcp/issue/164/16`.
- Extracts the official issue article list and, for each article, its DOI, title, authors, page, year, TOC image, and `data-articleid`.
- Fetches abstracts from AIP's official issue-page AJAX endpoint, `https://pubs.aip.org/PlatformArticle/ArticleAbstractAjax`.
- If any AIP article still lacks an abstract, fills only that missing `abstract` field from CrossRef by matching the DOI.
PaperTrack does not visit each AIP article page. CrossRef is never allowed to add articles to an AIP issue or replace AIP metadata.
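The CrossRef rule can be sketched as a merge that only ever writes into empty `abstract` fields. This is a hypothetical `fill_missing_abstracts` function, not PaperTrack's code; the CrossRef lookup is injected as a plain callable so no particular CrossRef client API is assumed, and the DOIs in the check below are placeholders.

```python
from typing import Callable, Optional


def fill_missing_abstracts(
    articles: list[dict],
    crossref_abstract: Callable[[str], Optional[str]],
) -> list[dict]:
    """Fill only empty `abstract` fields, looking each one up by DOI.

    The AIP article list is never extended or reordered, and no field other
    than `abstract` is touched.
    """
    filled = []
    for article in articles:
        article = dict(article)  # copy so the caller's AIP data stays unchanged
        if not article.get("abstract") and article.get("doi"):
            fallback = crossref_abstract(article["doi"])
            if fallback:
                article["abstract"] = fallback
        filled.append(article)
    return filled
```

Because the loop iterates over the AIP list, a DOI that exists only in CrossRef can never appear in the output.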
| Flag | Default | Description |
|---|---|---|
| `--source` | `arxiv` | `arxiv` or `journal` |
| `--data_dir` | `/home/ansatz/data/obsidian/1/papertrack_datas/` | Output root |
| `--debug` | `false` | Enable debug logging |
| `--time` | `1949.10` | Comma-separated arXiv months in `YYYY.MM` format; the default `1949.10` is a sentinel for the current month |
| `--category` | `quant-ph` | Comma-separated arXiv categories |
| `--output_format` | `category/year/month/day` | arXiv directory layout |
| `--journal` | empty | Journal key from `categories.toml`, e.g. `jctc` or `jcp` |
| `--volume` | empty | Explicit journal volume |
| `--issue` | empty | Explicit journal issue |
| `--year` | `0` | Publication year for explicit journal issue mode |
| `--backfill` | `false` | Process all discovered journal issues |
| `--from_year` | `0` | Start year for backfill/discovery |
Journal auto mode stores processed `(volume, issue)` pairs in `.papertrack_state.json` in the current working directory.

On every journal auto run, PaperTrack checks previously processed issue files by relative path:

```text
{journal_slug}/{volume}/{issue}.md
```
If a processed file is missing or the generated report structure is damaged, that issue is rebuilt even if it is not the latest issue.
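A sketch of that rebuild check, under loudly stated assumptions: the JSON layout `{"jcp": [["164", "16"], ...]}` is hypothetical (the real state file may differ), `base_dir` is whatever directory the relative path is resolved against, and `is_valid_report` is a caller-supplied format check. Both function names are illustrative.

```python
import json
from pathlib import Path


def load_processed(state_file: Path) -> dict[str, set[tuple[str, str]]]:
    """Read processed (volume, issue) pairs per journal slug.

    Hypothetical layout: {"jcp": [["164", "16"], ...]}; the real file may differ.
    """
    if not state_file.exists():
        return {}
    raw = json.loads(state_file.read_text(encoding="utf-8"))
    return {slug: {(str(v), str(i)) for v, i in pairs} for slug, pairs in raw.items()}


def issues_to_rebuild(processed, base_dir: Path, is_valid_report) -> list[str]:
    """Relative paths of processed issues whose report is missing or damaged."""
    stale = []
    for slug, pairs in processed.items():
        for volume, issue in sorted(pairs):
            rel = f"{slug}/{volume}/{issue}.md"
            report = base_dir / rel
            if not report.is_file() or not is_valid_report(report.read_text(encoding="utf-8")):
                stale.append(rel)
    return stale
```

Every processed issue is checked, not just the latest, which is what lets an older damaged report be rebuilt.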
The format check intentionally ignores Obsidian task edits. Marking a paper as complete, changing task state, or editing the Dataview task block does not count as corruption. The required skeleton is:
- `## collected`
- `## not collected`
- at least one `### ...` paper entry
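A minimal sketch of a skeleton check that ignores task edits. The function name `has_report_skeleton` is hypothetical; matching the two headings as exact whole lines and not enforcing their order are assumptions.

```python
def has_report_skeleton(markdown: str) -> bool:
    """Check only the generated skeleton; task checkboxes and Dataview edits are ignored."""
    lines = [line.strip() for line in markdown.splitlines()]
    has_collected = "## collected" in lines
    has_not_collected = "## not collected" in lines
    # A bare "###" (no title) does not count as a paper entry.
    has_entry = any(line.startswith("### ") for line in lines)
    return has_collected and has_not_collected and has_entry
```

Since only headings are inspected, ticking `- [ ]` to `- [x]` or rewriting a Dataview block leaves the result unchanged.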
arXiv reports are queried one calendar day at a time. Re-running a month re-fetches each day and rewrites the day file; if an old file is missing or its generated skeleton is broken, that day is rebuilt without trying to preserve old IDs.
Zotero is optional. If Zotero is unavailable, all papers are rendered under `not collected`.
For arXiv entries, matching uses three layers:
- arXiv DOI, such as `10.48550/arXiv.2502.07673`
- External publisher DOI from arXiv metadata
- arXiv ID extracted from a Zotero URL
For journal entries, matching uses the publisher DOI.
Enable the Zotero local API: in Zotero, open Settings -> Advanced -> Miscellaneous and turn on "Allow other applications on this computer to communicate with Zotero".
Configuration lives in `src/papertrack/categories.toml`.

Add arXiv categories under `[arxiv.<key>]`.
Add ACS journals with `acs_code`:

```toml
[journals.example_acs]
name = "Example ACS Journal"
issn = "0000-0000"
slug = "example"
acs_code = "abcd"
```

Add AIP-style journals with `provider = "aip"`:
```toml
[journals.example_aip]
name = "Example AIP Journal"
issn = "0000-0000"
slug = "example"
provider = "aip"
```

Run tests:
```bash
pytest -q
```

Run syntax checks:

```bash
python -m compileall -q src tests
```

Current verification commands used during development:

```bash
/home/ansatz/soft/miniconda3/bin/conda run -n arxiv pytest -q
/home/ansatz/soft/miniconda3/bin/conda run -n arxiv python -m compileall -q src tests
```