@erinshek
Contributor

@erinshek erinshek commented Apr 9, 2025

Description

This PR adds support for converting CSV files to Markdown tables. The converter handles various CSV formats and edge cases while maintaining proper table structure.

Features

  • Convert CSV files to properly formatted Markdown tables
  • Support for text/csv and application/csv MIME types
  • UTF-8 encoding with error replacement for invalid characters
  • Preserve table structure with headers and data rows
  • Handle edge cases:
    • Empty cells
    • Rows with mismatched column counts
    • Special characters in content
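The behaviour listed above can be sketched roughly as follows. This is a minimal illustration built on the standard library only, not the CsvConverter code from this PR: the function name `csv_to_markdown`, the choice to pad short rows and truncate long ones, and the pipe escaping are all assumptions made for the example.

```python
import csv
import io


def csv_to_markdown(data: bytes) -> str:
    """Convert CSV bytes to a Markdown table (hypothetical helper, illustration only)."""
    # Decode as UTF-8; invalid byte sequences become U+FFFD instead of raising.
    text = data.decode("utf-8", errors="replace")
    # Parse with the csv module and skip completely blank lines.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""

    def cell(value: str) -> str:
        # Assumed handling of special characters: escape pipes and flatten
        # newlines so that cell content cannot break the table layout.
        return value.replace("|", "\\|").replace("\n", " ").strip()

    header = rows[0]
    width = len(header)
    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows[1:]:
        # Pad short rows with empty cells and drop extra cells in long rows,
        # so every data row has as many columns as the header.
        fixed = (row + [""] * width)[:width]
        lines.append("| " + " | ".join(cell(c) for c in fixed) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown(b"Name,Age,City\nJohn,30\nAlice,25,London,extra\n"))
```

In this sketch the header row fixes the column count, so a mismatched row never changes the shape of the table. Unlike the example in this description, the sketch does not pad cells to align column widths.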

Example

Input CSV:

Name,Age,City
John,30,New York
Alice,25,London

Output Markdown:

| Name  | Age | City     |
|-------|-----|----------|
| John  | 30  | New York |
| Alice | 25  | London   |

Usage

Command Line

markitdown input.csv > output.md

Python Library

from markitdown import MarkItDown

converter = MarkItDown()
result = converter.convert("input.csv")
print(result.markdown)

Changes

  • Add new CsvConverter class
  • Register converter in MarkItDown class
  • Add CSV converter to available converters
  • Fix Azure Document Intelligence dependency handling

Testing

  • Tested with various CSV formats
  • Verified UTF-8 encoding handling
  • Checked edge cases with empty cells and mismatched columns
  • Validated Markdown table formatting
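For reference, the UTF-8 check can be reproduced in isolation. The snippet below is a small sketch of what "error replacement" means in Python's decoder; the helper name `decode_csv_bytes` and the sample bytes are made up for illustration and are not taken from the PR's test files.

```python
def decode_csv_bytes(data: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# Latin-1 encoded sample: the bytes 0xEB and 0xFC are not valid UTF-8 here.
raw = b"Name,City\nZo\xeb,Z\xfcrich\n"

# Strict decoding rejects the input...
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...while replacement decoding keeps the row and column structure intact.
text = decode_csv_bytes(raw)
print(strict_failed)  # True
print(repr(text))     # 'Name,City\nZo\ufffd,Z\ufffdrich\n'
```

Because only the offending bytes are replaced, the commas and newlines survive and the file still parses into the same number of rows and columns.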

Related Issues

Closes #1144

- Add new CsvConverter class to convert CSV files to Markdown tables
- Support text/csv and application/csv MIME types
- Handle UTF-8 encoded files with error replacement
- Preserve table structure with headers and data rows
- Handle edge cases like empty cells and mismatched columns
- Fix Azure Document Intelligence dependency handling
- Register CsvConverter in MarkItDown class
@erinshek
Contributor Author

erinshek commented Apr 9, 2025

@microsoft-github-policy-service agree

@erinshek
Contributor Author

Hi, @afourney

Could you please take a look at the changes I’ve made? I tried to address the issue mentioned in #1144. It should now be possible to convert a CSV file to Markdown.

If you notice anything that could be improved, feel free to let me know—I'm happy to make adjustments.

@afourney
Member

afourney commented Apr 13, 2025

This is pretty solid.

Note that #1171 also adds CSV support but requires pandas (as xlsx and xls currently do), so I prefer to base basic CSV support on this PR (#1176).

The only thing missing is a test vector, which should be added here (along with a test file in tests/test_files):
https://github.com/microsoft/markitdown/blob/main/packages/markitdown/tests/_test_vectors.py

Perhaps see how it is done in #1171.

Also, I would like to credit you both on the release notes.

EDIT: I went ahead and fixed these while also fixing the pre-commit format. No further action needed.

@afourney afourney merged commit 8576f1d into microsoft:main Apr 13, 2025
3 checks passed
@erinshek
Contributor Author

erinshek commented Apr 13, 2025

Thank you

azhao25 pushed a commit to azhao25/markitdown that referenced this pull request Oct 16, 2025
…t#1176)

* feat: Add CSV to Markdown table converter

- Add new CsvConverter class to convert CSV files to Markdown tables
- Support text/csv and application/csv MIME types
- Preserve table structure with headers and data rows
- Handle edge cases like empty cells and mismatched columns
- Fix Azure Document Intelligence dependency handling
- Register CsvConverter in MarkItDown class

----

Thanks also to @benny123tw who submitted a very similar PR in microsoft#1171

Development

Successfully merging this pull request may close these issues.

fail to convert CSV table