Fix download_demo for data.zip files
#2699
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## feature-branch-download-demo #2699 +/- ##
================================================================
- Coverage 98.16% 96.88% -1.29%
================================================================
Files 74 74
Lines 7896 7923 +27
================================================================
- Hits 7751 7676 -75
- Misses 145 247 +102
This Pull Request is not linked to an issue. To ensure our community is able to accurately track resolved issues, please link any issue that will be closed by this PR!
c3bdfa2 to 62ee13b
32da919 to 74d3cc5
74d3cc5 to af00578
amontanez24
left a comment
Can we make latin-1 a constant and explain why that encoding is special?
    try:
        data[table_name] = pd.read_csv(io.BytesIO(file_), low_memory=False)
    except UnicodeDecodeError:
        data[table_name] = pd.read_csv(io.BytesIO(file_), low_memory=False, encoding='latin-1')
    except Exception as e:
        skipped_files.append(f'{filename}: {e}')
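On why latin-1 works as a last-resort fallback: it maps each of the 256 possible byte values to exactly one code point, so decoding with it cannot raise UnicodeDecodeError, whereas UTF-8 rejects many byte sequences. The trade-off is that text actually written in some other encoding will decode without error but may show the wrong characters. A minimal sketch in plain Python (illustrative only, not code from this PR):

```python
# Every possible byte value decodes under latin-1 and round-trips unchanged.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin-1')
round_trips = text.encode('latin-1') == all_bytes

# The same kind of input can be invalid UTF-8: 0xE9 is 'é' in latin-1,
# but in UTF-8 it starts a multi-byte sequence that is never completed here.
sample = b'caf\xe9'
try:
    sample.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

latin1_text = sample.decode('latin-1')
```

This is presumably the property a comment next to the constant would need to state.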
This and the previous bit of reading seem very similar. Could we move them into their own function, something like read data?
That's actually tricky. One approach for example would be to substitute lines 241-244 with:
    def _read_csv_with_fallback(filepath_or_buffer, **kwargs):
        """Read a CSV with a fallback encoding on UnicodeDecodeError."""
        try:
            return pd.read_csv(filepath_or_buffer, **kwargs)
        except UnicodeDecodeError:
            kwargs = {**kwargs, 'encoding': FALLBACK_ENCODING}
            return pd.read_csv(filepath_or_buffer, **kwargs)

But this doesn't work because you need to rewind the BytesIO for the second call, and implementing that into this function makes it confusing.
Did you have some other approach in mind?
Yeah, I had that in mind, but I see why it won't work. It's okay to go ahead with your implementation. I think anything more would just be overcomplicating a simple process.
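For reference, the rewind problem discussed in this thread goes away if a helper takes the raw bytes rather than an open buffer, because each attempt can then build its own BytesIO starting at position 0. The sketch below is a hypothetical alternative, not what the PR implements: the function name read_csv_bytes_with_fallback is made up for illustration, and FALLBACK_ENCODING is assumed to hold the 'latin-1' value from the diff.

```python
import io

import pandas as pd

# Constant name taken from the review thread; the value mirrors the
# 'latin-1' literal in the diff (an assumption about the final code).
FALLBACK_ENCODING = 'latin-1'


def read_csv_bytes_with_fallback(raw_bytes, **kwargs):
    """Hypothetical helper: parse CSV bytes, retrying once with a fallback encoding."""
    try:
        return pd.read_csv(io.BytesIO(raw_bytes), **kwargs)
    except UnicodeDecodeError:
        # A fresh BytesIO starts at position 0, so no explicit rewind is needed.
        fallback_kwargs = {**kwargs, 'encoding': FALLBACK_ENCODING}
        return pd.read_csv(io.BytesIO(raw_bytes), **fallback_kwargs)
```

This only helps where the caller already holds the file contents as bytes, as with the zip members in the snippet under review. For file paths, the simpler version from the thread would already work, since a path has no read position to rewind.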
CU-86b6xp7a0, Resolve #2688
CU-86b6xrcah, Resolve #2690