[Draft] Exploring ways to allow Optional dependencies #1079

afourney · 2025-02-28T19:58:21Z

No description provided.

packages/markitdown/src/markitdown/converters/_pptx_converter.py

afourney · 2025-03-01T07:13:23Z

@Zahlii
@robfitzgerald
@casperdcl
@AdrianVollmer

LMK what you think about this approach. The main thing is that I want the converters to print a useful message when they think they could handle the file but don't have the right dependencies.

(alternatively, you could simply not register the converters at all... but then I worry about discoverability for people not reading docs)
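A minimal sketch of the approach described above, not the code from this PR: the converter stays registered, records a failed import instead of raising at load time, and reports a helpful error only when it is asked to convert a file it recognises. The names `MissingDependencyError`, `OptionalDependencyConverter`, and the `extra` argument are hypothetical, chosen for illustration.

```python
import importlib
from typing import Optional


class MissingDependencyError(Exception):
    """Hypothetical error: the converter recognises the file but lacks a dependency."""


class OptionalDependencyConverter:
    """Hypothetical converter that is always registered, even without its dependency."""

    def __init__(self, extension: str, module_name: str, extra: str) -> None:
        self.extension = extension
        self.module_name = module_name
        self.extra = extra  # name of the pip extra to suggest (an assumption)
        self._import_error: Optional[ImportError] = None
        try:
            # Attempt the optional import once; remember the failure instead of raising.
            self._module = importlib.import_module(module_name)
        except ImportError as exc:
            self._module = None
            self._import_error = exc

    def accepts(self, filename: str) -> bool:
        # Recognising the file type must not require the optional dependency.
        return filename.lower().endswith(self.extension)

    def convert(self, filename: str) -> str:
        if self._import_error is not None:
            # Fail late, with a message that says how to fix the problem.
            raise MissingDependencyError(
                f"{filename} looks like a {self.extension} file, but the optional "
                f"dependency '{self.module_name}' is not installed. "
                f"Try installing the [{self.extra}] extra."
            ) from self._import_error
        # Placeholder for the real conversion work.
        return f"converted {filename} using {self.module_name}"
```

Compared with not registering the converter at all, this keeps the format discoverable: the user learns which extra to install at the moment the file is encountered.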

Zahlii · 2025-03-01T07:23:34Z

I think this will work fine for me - I didn’t check all of the dependencies in detail (maybe there are some more that could be marked as optional?) but at least those that currently lead to CVE warnings are now optional.

afourney · 2025-03-01T07:35:10Z

> I think this will work fine for me - I didn’t check all of the dependencies in detail (maybe there are some more that could be marked as optional?) but at least those that currently lead to CVE warnings are now optional.

There's certainly a balance here. Some are needed just to help identify what filetypes are in use... and those need to be around in all cases. Others help with some of the fall-back converters (plain text etc.). And more generally, HTML-related dependencies are used by many converters (because the libraries might output HTML rather than markdown). Plus it's generally nice to be able to convert web pages out of the box, I'd argue. I think I'm satisfied with this first cut at organization.

casperdcl

lgtm

casperdcl · 2025-03-01T07:39:04Z

README.md

+* `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files
+* `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription


Personal preference...

Suggested change

-* `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files
-* `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription
+* `[audio]` Installs dependencies for audio transcription of wav and mp3 files
+* `[youtube]` Installs dependencies for fetching YouTube video transcription

afourney · 2025-03-01

Agree shorter is better. However, we can still get metadata for YouTube (title, video description, etc.) even if the transcription library is not installed. Likewise, we can get metadata for audio files (runtime, track title, artist, album, etc.) even when transcription is not enabled. To this end, I felt it was worth the added characters for precision.
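To illustrate the distinction drawn above, here is a rough sketch, assuming a converter that always reports metadata and adds a transcript only when a transcription backend is available. The function `describe_audio` and its parameters are hypothetical and do not come from the PR.

```python
from typing import Callable, Dict, Optional


def describe_audio(
    metadata: Dict[str, str],
    transcribe: Optional[Callable[[], str]] = None,
) -> str:
    """Render metadata always; add a transcript only if a backend was supplied."""
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    if transcribe is None:
        # The transcription extra is missing, but the metadata is still useful.
        lines.append("(transcript unavailable: install the [audio-transcription] extra)")
    else:
        lines.append("Transcript: " + transcribe())
    return "\n".join(lines)
```

Under this reading, the extra gates only the transcript, which is the argument for the longer `-transcription` suffix.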

casperdcl · 2025-03-01

Strongly disagree; developer precision has nothing to do with user experience.

AdrianVollmer · 2025-03-03T09:07:25Z

I like it

afourney · 2025-03-03T17:05:33Z

K, merging to main... but I'll hold off on publishing to PyPI or cutting a release until we've had a chance to shake-test this a little more and address a few other potentially API-breaking changes in the works.

afourney added 7 commits February 28, 2025 08:55

Renamed exception.

2af4ba8

Merge branch 'main' into optional_dependencies

0f63a7e

Exploring ways to enable optional dependencies. Starting with pptx.

7d2e0bd

Merge main.

df80df0

Merge branch 'main' into optional_dependencies

10da043

Fix CLI tests.... have them install [all]

b9487b6

Added .docx to optional dependencies

98698a6

afourney mentioned this pull request Mar 1, 2025

optional dependencies #103

Open

gagb reviewed Mar 1, 2025

packages/markitdown/src/markitdown/converters/_pptx_converter.py (outdated)

afourney added 5 commits February 28, 2025 20:28

Reuse error messages for missing dependencies.

e5dc512

Added xlsx and xls

8362df8

Added pdfs

11ffd2e

Added Ole files.

a2cf8ee

Updated READMEs, and finished remaining feature-categories.

53feead

afourney marked this pull request as ready for review March 1, 2025 06:53

afourney requested a review from gagb March 1, 2025 07:03

Move openai to hatch-test environment.

a056192

casperdcl approved these changes Mar 1, 2025

afourney merged commit c5cd659 into main Mar 3, 2025
3 checks passed

afourney deleted the optional_dependencies branch March 3, 2025 17:06
