Skip to content

Conversation

@severo
Copy link
Collaborator

@severo severo commented Aug 31, 2021

A lot of URL use the query strings, for example
http://opus.nlpl.eu/download.php?f=Bianet/v1/moses/en-ku.txt.zip, we
must not remove it when trying to detect the protocol. We thus remove it
only in the case of the query string being ?dl=1 which occurs on dropbox
and dl.orangedox.com. Also: add unit tests.

See #2843 for the original
discussion.

A lot of URL use the query strings, for example
http://opus.nlpl.eu/download.php?f=Bianet/v1/moses/en-ku.txt.zip, we
must not remove it when trying to detect the protocol. We thus remove it
only in the case of the query string being ?dl=1 which occurs on dropbox
and dl.orangedox.com. Also: add unit tests.

See #2843 for the original
discussion.
@severo severo requested a review from lhoestq August 31, 2021 13:40
severo added a commit to huggingface/dataset-viewer that referenced this pull request Aug 31, 2021
See huggingface/datasets#2856. Also:
transformers 4.10.0 has just been released, no need for the github
dependency anymore.
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix :)
LGTM

merging since the CI failure isn't related to this PR and has been fixed on master

@lhoestq lhoestq merged commit a3ee723 into master Aug 31, 2021
@lhoestq lhoestq deleted the fix-get-extraction-protocol branch August 31, 2021 14:22
mattstern31 added a commit to mattstern31/datasets-server-storage-admin that referenced this pull request Nov 11, 2023
See huggingface/datasets#2856. Also:
transformers 4.10.0 has just been released, no need for the github
dependency anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants