MarkItDown Dify Plugin

A powerful document conversion plugin for Dify based on MarkItDown 0.0.2a1, designed to convert various file formats into Markdown with high accuracy and reliability.

Overview

This plugin serves as an excellent alternative to traditional document extraction nodes, offering robust file conversion capabilities within the Dify ecosystem. It leverages MarkItDown's plugin-based architecture to provide seamless conversion of multiple file formats to Markdown.

Supported File Formats

Documents
- PDF files (.pdf)
- Microsoft Word documents (.doc, .docx)
- Microsoft PowerPoint presentations (.ppt, .pptx)
- Microsoft Excel spreadsheets (.xls, .xlsx)
- HTML files (.html, .htm)
Media Files
- Images with EXIF metadata and OCR support
- Audio files with metadata and speech transcription
Data Formats
- CSV files
- JSON documents
- XML files
- Text files
Archives
- ZIP files (with automatic content iteration)

Usage in Dify

Parameters

The plugin accepts the following parameters in the Dify interface:

files:
  type: files
  required: false
  description: "Array of files to be converted to Markdown"

Response Format

The plugin provides three types of response formats for maximum flexibility:

JSON Response (New)

{
  "status": "success|error",
  "total_files": 2,
  "successful_conversions": 2,
  "results": [
    {
      "filename": "example1.pdf",
      "original_format": "pdf",
      "markdown_content": "# Content...",
      "status": "success"
    },
    {
      "filename": "example2.docx",
      "original_format": "docx",
      "markdown_content": "# Content...",
      "status": "success"
    }
  ]
}

Error response example:

{
  "filename": "failed.pdf",
  "original_format": "pdf",
  "error": "Error message",
  "status": "error"
}

Blob Response
- Returns raw markdown content as a blob
- Includes mime_type metadata: "text/markdown"
- Useful for direct content processing

Text Response (Legacy)

Single File:

[Markdown content of the file]

Multiple Files:

==================================================
File 1: example1.pdf
==================================================

[Markdown content of file 1]

==================================================
File 2: example2.docx
==================================================

[Markdown content of file 2]

Example Usage in Prompts

Please convert the attached files to Markdown format.
{@markitdown files=["document.pdf", "presentation.pptx"]}

Features

Batch Processing: Process multiple files in a single request
Clear File Separation: When processing multiple files, content is clearly separated with headers and dividers
Format Preservation: Maintains important formatting elements in the Markdown output
Error Handling: Provides clear error messages for failed conversions
Automatic Cleanup: Temporary files are automatically managed and cleaned up

Best Practices

File Size: While there's no strict limit, it's recommended to keep individual files under 50MB for optimal performance
Batch Processing: You can process multiple files simultaneously, but consider limiting batches to 5-10 files
Format Support: When possible, use standard file formats from the supported list for best results
Error Handling: Always check for error messages in the response when processing critical documents

Integration Example

Here's how to integrate the plugin into your Dify workflow:

Document Analysis Flow

Input: {files} -> MarkItDown Plugin -> LLM Analysis

Content Extraction Flow

Input: {files} -> MarkItDown Plugin -> Text Extraction -> Database Storage

Advantages Over Traditional Document Extraction

Format Support: Broader range of supported file formats
Metadata Preservation: Retains important metadata from source files
Structured Output: Consistently formatted Markdown output
Batch Processing: Efficient handling of multiple files
Clear Separation: Better organization of multiple file contents
Error Handling: Comprehensive error reporting and handling

Technical Notes

Based on MarkItDown 0.0.2a1
Maintains backward compatibility with 0.0.1a3
Implements plugin-based architecture for extensibility
Automatic temporary file management
Thread-safe processing for concurrent requests

Limitations

Network connectivity required for some conversion operations
Processing time may vary based on file size and complexity
Some advanced formatting may be simplified in the Markdown output

Support

For issues and feature requests, please create an issue in the repository or contact the plugin maintainer.

Note: This plugin is based on MarkItDown 0.0.2a1 and may receive updates as the base library evolves to its first non-alpha release.

Response Types Use Cases

JSON Response
- Workflow automation and data processing
- Status tracking and error handling
- Structured data extraction
- API integrations
Blob Response
- Direct content processing
- Raw markdown handling
- Stream processing
- File system operations
Text Response
- Human readable output
- Legacy system compatibility
- Simple content viewing
- Direct LLM input

contact：[email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
_assets		_assets
provider		provider
tools		tools
.difyignore		.difyignore
.env.example		.env.example
.gitignore		.gitignore
GUIDE.md		GUIDE.md
PRIVACY.md		PRIVACY.md
README.md		README.md
main.py		main.py
manifest.yaml		manifest.yaml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MarkItDown Dify Plugin

Overview

Supported File Formats

Usage in Dify

Parameters

Response Format

Example Usage in Prompts

Features

Best Practices

Integration Example

Advantages Over Traditional Document Extraction

Technical Notes

Limitations

Support

Response Types Use Cases

About

Uh oh!

Releases

Packages

Languages

DavideDelbianco/markitdown-dify-plugin

Folders and files

Latest commit

History

Repository files navigation

MarkItDown Dify Plugin

Overview

Supported File Formats

Usage in Dify

Parameters

Response Format

Example Usage in Prompts

Features

Best Practices

Integration Example

Advantages Over Traditional Document Extraction

Technical Notes

Limitations

Support

Response Types Use Cases

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages