[Draft] Exploring ways to allow Optional dependencies #1079
packages/markitdown/src/markitdown/converters/_pptx_converter.py
@Zahlii LMK what you think about this approach. The main thing is that I want the converters to print a useful message when they think they could handle the file but don't have the right dependencies. (Alternatively, you could simply not register the converters at all... but then I worry about discoverability for people not reading the docs.)
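A minimal sketch of the approach described in the comment above, not the code in this PR: the converter stays registered, claims files by extension, and raises an actionable error only when conversion is attempted without the optional package. The names `MissingDependencyError`, `OptionalDependencyConverter`, `accepts`, `convert`, and the extra name passed in are all hypothetical.

```python
import importlib
from typing import Optional


class MissingDependencyError(RuntimeError):
    """Hypothetical error: the converter recognizes the file but lacks a package."""


class OptionalDependencyConverter:
    """Hypothetical converter that stays registered even without its dependency."""

    def __init__(self, extension: str, module_name: str, extra: str) -> None:
        self.extension = extension
        self.module_name = module_name
        self.extra = extra  # assumed name of the pip extra that provides the module
        self._import_error: Optional[ImportError] = None
        try:
            # Attempt the optional import once and remember any failure.
            self._module = importlib.import_module(module_name)
        except ImportError as exc:
            self._module = None
            self._import_error = exc

    def accepts(self, filename: str) -> bool:
        # Claim the file by extension alone, so users learn the converter exists.
        return filename.lower().endswith(self.extension)

    def convert(self, filename: str) -> str:
        if self._import_error is not None:
            raise MissingDependencyError(
                f"{type(self).__name__} recognized '{filename}' but the optional "
                f"package '{self.module_name}' is not installed. "
                f"Try: pip install 'markitdown[{self.extra}]'"
            ) from self._import_error
        # Real conversion logic would go here; this sketch only reports success.
        return f"converted {filename} using {self.module_name}"
```

Deferring the failure to `convert` is what keeps the converter discoverable: a user who never reads the docs still gets a message naming the missing package and a suggested install command.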
I think this will work fine for me - I didn’t check all of the dependencies in detail (maybe there are some more that could be marked as optional?) but at least those that currently lead to CVE warnings are now optional.
There's certainly a balance here. Some are needed just to help identify what file types are in use, and those need to be around in all cases. Others help with some of the fall-back converters (plain text, etc.). More generally, HTML-related dependencies are used by many converters (because the libraries might output HTML rather than Markdown). Plus, I'd argue it's generally nice to be able to convert web pages out of the box. I think I'm satisfied with this first cut at organization.
casperdcl
left a comment
lgtm
* `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files
* `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription
Personal preference...
Suggested change:
- * `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files
- * `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription
+ * `[audio]` Installs dependencies for audio transcription of wav and mp3 files
+ * `[youtube]` Installs dependencies for fetching YouTube video transcription
Agree shorter is better. However, we can still get metadata for YouTube (title, video descriptions etc.) even if the transcription library is not installed. Likewise we can get metadata for audio files (runtime, track title, artist, album, etc) even when transcription is not enabled. To this end, I felt it was worth the added characters for precision.
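To illustrate the distinction drawn above, here is a rough sketch in which metadata is always rendered and a transcript is appended only when a transcriber is available. The function `describe_audio` and its parameters are hypothetical and are not part of this PR.

```python
from typing import Callable, Dict, Optional


def describe_audio(
    metadata: Dict[str, str],
    transcribe: Optional[Callable[[], str]] = None,
) -> str:
    """Render metadata always; append a transcript only when a transcriber exists."""
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    if transcribe is not None:
        # The transcription extra is installed, so include the transcript section.
        lines.append("")
        lines.append("### Transcript")
        lines.append(transcribe())
    return "\n".join(lines)
```

Under this reading, the `-transcription` suffix tells users that the extra controls only the transcript section, not whether audio or YouTube inputs are handled at all.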
Strongly disagree; developer precision has nothing to do with user experience.
I like it
OK, merging to main... but I'll hold off on releasing to PyPI or cutting a release until we've had a chance to shake-test this a little more and address a few other potentially API-breaking changes in the works.