An interactive tool for cleaning and formatting BibTeX entries according to specific guidelines.
- Interactive Processing: Review and approve each change before applying
- Citation Tracking: Scans TeX files to identify which BibTeX entries are actually used
- Smart Formatting: Automatically fixes common BibTeX issues:
  - Protects capital letters in titles (acronyms, proper nouns, etc.)
  - Fixes entry types (e.g., @misc to @article for arXiv papers)
  - Cleans up fields (removes abstracts, unnecessary publishers, etc.)
  - Standardizes arXiv entries with proper journal fields
  - Fixes page ranges to use double dashes
- Diff Display: Shows clear before/after comparisons with color highlighting
- Manual Editing: Option to manually edit entries in your text editor
- Safe Operation: Never overwrites original files; saves to new file
- Validation: Ensures output BibTeX is properly formatted and parseable
- Clone or download this tool into the bibtext_cleanup folder
- Install required Python packages:

    cd bibtext_cleanup
    pip install -r requirements.txt

Basic usage:

    python cleanup_tool.py paper.tex references.bib

Specify output file:

    python cleanup_tool.py paper.tex references.bib -o cleaned_refs.bib

Alternative syntax:

    python cleanup_tool.py --tex main.tex --bib refs.bib --output refs_clean.bib

When processing each entry, you can:
- [a] Accept changes - Apply the suggested formatting
- [s] Skip entry - Keep the original formatting
- [e] Edit manually - Open in text editor for custom changes
- [d] Show detailed diff - Display the differences again
- [v] View side-by-side - Show original and formatted versions
- [q] Quit - Save progress and exit
- [x] Exit - Exit without saving
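
To give a feel for what the [d] option displays, here is a minimal sketch of a before/after diff built with Python's standard `difflib` module. The function name `entry_diff` and the sample entry are hypothetical and chosen for illustration; the tool's own diff code may work differently, and the color highlighting is left out.

```python
import difflib


def entry_diff(original: str, formatted: str) -> str:
    """Return a unified diff between two BibTeX entry strings.

    Hypothetical helper for illustration; not taken from the tool's source.
    """
    diff = difflib.unified_diff(
        original.splitlines(),
        formatted.splitlines(),
        fromfile="original",
        tofile="formatted",
        lineterm="",  # inputs have no trailing newlines, so emit none
    )
    return "\n".join(diff)


# Made-up sample entry: only the page range differs.
before = "@article{smith2020,\n  pages = {10-20},\n}"
after = "@article{smith2020,\n  pages = {10--20},\n}"
print(entry_diff(before, after))
```

Removed lines start with `-` and added lines with `+`, so a terminal front end could color them by checking each line's first character.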
The tool follows these BibTeX formatting guidelines:
- Entry Types:
  - Uses @article for journal papers and arXiv preprints
  - Uses @inproceedings for conference papers
  - Uses @book for books
- Capitalization:
  - Protects acronyms (NASA, IEEE, etc.) with braces
  - Protects proper nouns and CamelCase words
  - Protects capitals after colons and periods
- Field Cleaning:
  - Removes abstract fields for readability
  - Removes publisher field from articles
  - Removes location information from booktitle/journal fields
  - Ensures page ranges use double dashes (--)
- arXiv Entries:
  - Converts @misc to @article
  - Adds journal field: "arXiv preprint arXiv:XXXX.XXXXX"
  - Ensures URL field is present
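
A few of the guidelines above can be sketched in a handful of lines. The code below is an illustration under stated assumptions, not the tool's actual implementation. All three function names are hypothetical. Entries are modeled as plain dicts with an `ENTRYTYPE` key, and an arXiv entry is assumed to carry its identifier in an `eprint` field. Only acronyms and CamelCase words are handled; proper nouns and capitals after colons or periods would need extra rules.

```python
import re


def fix_page_range(pages: str) -> str:
    """Replace a hyphen run or en dash between two digits with '--'."""
    return re.sub(r"(\d)\s*(?:-+|\u2013)\s*(\d)", r"\1--\2", pages)


def protect_capitals(title: str) -> str:
    """Brace words that have a capital after their first letter.

    This catches acronyms (NASA) and CamelCase words (PyTorch). A word
    that directly touches an existing brace is left alone.
    """
    pattern = r"(?<!\{)\b([A-Za-z]+[A-Z][A-Za-z]*)\b(?!\})"
    return re.sub(pattern, r"{\1}", title)


def standardize_arxiv(entry: dict) -> dict:
    """Return a copy of an arXiv @misc entry converted to @article.

    Assumption: arXiv entries are recognized by an 'eprint' field.
    """
    fixed = dict(entry)
    eprint = entry.get("eprint")
    if entry.get("ENTRYTYPE") != "misc" or not eprint:
        return fixed  # not an arXiv @misc entry; leave unchanged
    fixed["ENTRYTYPE"] = "article"
    fixed["journal"] = f"arXiv preprint arXiv:{eprint}"
    # Keep an existing URL; otherwise point at the arXiv abstract page.
    fixed.setdefault("url", f"https://arxiv.org/abs/{eprint}")
    return fixed


print(fix_page_range("123-145"))
print(protect_capitals("A Survey of NASA Data in PyTorch"))
print(standardize_arxiv({"ENTRYTYPE": "misc", "ID": "x", "eprint": "1706.03762"}))
```

Each function returns a new value and leaves its input untouched, which fits the review-then-accept workflow: the original entry remains available for the diff and for the skip option.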
- cleanup_tool.py - Main script
- bibtex_parser.py - BibTeX parsing module
- citation_scanner.py - TeX file citation scanner
- bibtex_formatter.py - Formatting rules implementation
- interactive_cli.py - User interface components
- requirements.txt - Python dependencies
- README.md - This file
- The tool only processes entries that are actually cited in your TeX file
- Original files are never modified; results are saved to a new file
- The output file is validated to ensure proper BibTeX formatting
- You can interrupt the process at any time and save partial progress
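
Restricting processing to cited entries relies on collecting citation keys from the TeX source. The sketch below shows one way this could be done with a regular expression. The names `CITE_PATTERN` and `cited_keys` are hypothetical, and the real citation_scanner.py may use a different approach. This version does not skip commented-out lines or follow `\input` files.

```python
import re

# Matches \cite, \citep, \citet, \textcite, etc., with an optional star
# and optional [..] arguments, capturing the comma-separated key list.
CITE_PATTERN = re.compile(
    r"\\[a-zA-Z]*cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}"
)


def cited_keys(tex_source: str) -> set:
    """Collect the citation keys used in a TeX source string."""
    keys = set()
    for match in CITE_PATTERN.finditer(tex_source):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key != "*":  # ignore the \nocite{*} wildcard
                keys.add(key)
    return keys


sample = r"See \cite{a, b} and \citep[p.~3]{c}; also \textcite*{d}."
print(sorted(cited_keys(sample)))
```

With the key set in hand, filtering a parsed bibliography comes down to a membership test on each entry's key.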
- Python 3.6+
- colorama (for colored terminal output)
- bibtexparser (optional, for enhanced parsing)