Enable ruff's unspecified-encoding (PLW1514) rule and fix violations #3319
Default unsafe-fix is to use `encoding="locale"`
Setting as ready for review to test
PEP 597 hints at UTF-8 becoming the default encoding in the future, so we are pre-emptively applying it here. Xref https://peps.python.org/pep-0597/#prepare-to-change-the-default-encoding-to-utf-8
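As a rough illustration of why an explicit encoding matters, the sketch below writes a small UTF-8 file and reads it back twice: once with `encoding="utf-8"` and once with `cp1252`, which is used here only as a stand-in for a non-UTF-8 locale encoding (an assumption for the example, not something taken from this PR). The file name and text are made up.

```python
import tempfile
from pathlib import Path

text = "Caf\u00e9\n"  # contains one non-ASCII character

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "example.txt"  # hypothetical file name
    path.write_text(text, encoding="utf-8")

    # Explicit UTF-8: the result does not depend on the platform locale.
    with path.open(encoding="utf-8") as file:
        as_utf8 = file.read()

    # cp1252 stands in for a locale encoding that is not UTF-8.
    with path.open(encoding="cp1252") as file:
        as_cp1252 = file.read()

print(as_utf8 == text)    # True
print(as_cp1252 == text)  # False: the two UTF-8 bytes decode as two characters
```

Omitting `encoding` leaves the choice to the platform, which is the situation the lint rule is meant to flag.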
weiji14
left a comment
The main changes to check are in the plot.py and plot3d.py files, since they are user-facing. The changes to the tests/test_*.py files should be ok.
  elif kind == "file" and str(data).endswith(".gmt"):  # OGR_GMT file
      try:
-         with Path(which(data)).open() as file:
+         with Path(which(data)).open(encoding="utf-8") as file:
This assumes that *.gmt files are always encoded in UTF-8. Is this the case, or should we choose encoding="locale" instead?
To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?
*.gmt files can use any encoding, so there is no way to make it work in all cases. I guess encoding="utf-8" is the best choice.
> To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?
It usually works, but will fail if the file uses a different encoding and contains comments on the first line.
Actually, we're assuming that the @G record is always on the first line but that's not always true.
Ok, will go with utf-8 then. Users can still manually override the style if needed, in case utf-8 encoding doesn't work and the style isn't automatically applied.
> Actually, we're assuming that the @G record is always on the first line but that's not always true.
From https://docs.generic-mapping-tools.org/6.5/reference/ogrgmt-format.html#the-ogr-gmt-format, it says:
> The first comment line must specify the version of the OGR/GMT data format, to allow for future changes or enhancements to be supported by future GMT programs. This document describes v1.0.
and the examples on the page below seem to suggest that # @VGMT1.0 is followed immediately by @GP... on the same line. Are there cases where @GP... goes onto the second line?
ogr2ogr also produces OGR_GMT files with # @VGMT1.0 followed by @GP, so it should be fine, although users can manually put @GP on the second line and even add comments at the beginning of the file.
Anyway, those are rare cases and it's safe to ignore them.
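For reference, the rare cases discussed in this thread could be handled along the following lines. This is only a sketch, not the code in this PR: `find_ogrgmt_geometry`, its `max_lines` limit, and the regular expression are hypothetical, and it assumes the `@G` record sits somewhere in the leading `#` comment lines. It returns `None` when the file is not valid UTF-8, which corresponds to the case where the style is not applied automatically.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: "@G" followed by the geometry name, e.g. "@GMULTIPOINT".
_GEOMETRY = re.compile(r"@G(\w+)")


def find_ogrgmt_geometry(path, max_lines=20):
    """Return the geometry name from the first @G record in the leading
    comment lines of an OGR_GMT file, or None if it cannot be determined."""
    try:
        with Path(path).open(encoding="utf-8") as file:
            for _, line in zip(range(max_lines), file):
                if not line.startswith("#"):
                    break  # the leading comment block has ended
                match = _GEOMETRY.search(line)
                if match:
                    return match.group(1)
    except UnicodeDecodeError:
        return None  # file is not UTF-8; caller falls back to a manual style
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "sample.gmt"  # made-up file for illustration
    sample.write_text(
        "# a user comment\n# @VGMT1.0\n# @GMULTIPOINT\n1 2\n", encoding="utf-8"
    )
    geometry = find_ogrgmt_geometry(sample)

print(geometry)
```

Stopping at the first non-comment line keeps the read cheap, in line with the goal of parsing only the header rather than the whole file.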
Description of proposed changes
Enable ruff's unspecified-encoding (PLW1514) rule to check for uses of `open` and related calls without an explicit `encoding` argument. Note that this is a preview mode feature.
This lint rule tells us to consider using the `encoding` parameter to enforce a specific encoding. PEP 597 recommends using `encoding="locale"` on Python 3.10 and later, though we could also use `encoding="utf-8"` perhaps.
For this PR, we went for setting `encoding="utf-8"`.
References:
Addresses #2741 (comment)
Reminders
`make format` and `make check` to make sure the code follows the style guide.
`doc/api/index.rst`.
Slash Commands
You can write slash commands (`/command`) in the first line of a comment to perform specific operations. The supported slash command is:
`/format`: automatically format and lint the code