New Yahoo rules #1
This PR fixes two bugs:
is_quote_header = self.QUOTE_HDR_REGEX.match(line) is not None
is_quoted = self.QUOTED_REGEX.match(line) is not None
is_header = is_quote_header or self.HEADER_REGEX.match(line) is not None
stripped_line = line.strip()
What do you think about changing HEADER_REGEX from r'^\*?(From|Sent|To|Subject):\*? .+' to r'^\s*\*?(From|Sent|To|Subject):\*? .+' instead?
The other expressions don't seem to care about leading whitespace.
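To make the proposal concrete, here is a minimal sketch comparing the two patterns quoted above. The sample lines and the `is_header` helper are hypothetical and exist only for illustration; they are not taken from the library.

```python
import re

# Pattern as quoted in the comment above.
HEADER_REGEX = re.compile(r'^\*?(From|Sent|To|Subject):\*? .+')
# Proposed variant that tolerates leading whitespace.
HEADER_REGEX_WS = re.compile(r'^\s*\*?(From|Sent|To|Subject):\*? .+')


def is_header(pattern, line):
    """Hypothetical helper: True if the line matches the given header pattern."""
    return pattern.match(line) is not None


# Made-up sample lines, not real Yahoo output.
samples = [
    "From: alice@example.com",
    "   From: alice@example.com",
    "*Subject:* Hello",
    "  Not a header line",
]

for line in samples:
    print(repr(line), is_header(HEADER_REGEX, line), is_header(HEADER_REGEX_WS, line))
```

Only the indented `From:` line is classified differently by the two patterns; the other samples get the same result from both.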
I saw that Yahoo! started behaving weirdly - they add extra spaces everywhere, then they add what appear to be random |s where the original message should be (are they trying to recreate the table cells using |s?). See for example:
|
|
|
| New Message from Alexandru on Sailo |
|
|
|
|
| Ahoy Alexandru, |
|
|
|
|
| Alexandru has sent you a message regarding a trip aboard the X-Yachts Xp 38. |
|
|
I figured it was safer, for parsing purposes, to ignore leading and trailing whitespace on every line.
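As a rough sketch of that idea, the snippet below strips each line before testing it against the header pattern. The padded input lines are assumptions modelled on the sample above (the exact whitespace Yahoo emits is not shown here), and `normalize` is a hypothetical helper, not part of the library.

```python
import re

# Header pattern as quoted earlier in this conversation.
HEADER_REGEX = re.compile(r'^\*?(From|Sent|To|Subject):\*? .+')


def normalize(line):
    """Hypothetical helper: drop leading and trailing whitespace only."""
    return line.strip()


# Assumed Yahoo-style input: extra padding plus stray '|' characters.
raw_lines = [
    "   | Ahoy Alexandru, |  ",
    "  |",
    "   From: Alexandru   ",
]

normalized = [normalize(line) for line in raw_lines]
header_flags = [HEADER_REGEX.match(line) is not None for line in normalized]

for line, flag in zip(normalized, header_flags):
    print(repr(line), flag)
```

Note that `strip()` removes only the padding; the `|` characters survive and would need separate handling.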
This may break cases where someone adds leading spaces intentionally, for example list items or indented code.
I agree there is little chance that the spaces will matter in our use case, but for a general-purpose library it's not a good idea.
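A small sketch of the concern, using a made-up message body: stripping every line flattens list nesting and code indentation, while stripping only trailing whitespace leaves the indentation intact. The `body` text is hypothetical.

```python
# Made-up message body with intentional indentation.
body = "\n".join([
    "Steps:",
    "  - first item",
    "    - nested item",
    "def f():",
    "    return 1",
])

# Stripping both sides of every line loses the nesting.
fully_stripped = "\n".join(line.strip() for line in body.splitlines())

# Stripping only the right side keeps leading indentation.
right_stripped = "\n".join(line.rstrip() for line in body.splitlines())

print(fully_stripped)
print(right_stripped == body)
```

After full stripping, the nested list item and the function body start in column zero, so the original structure can no longer be recovered from the text.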
I agree an email may contain whitespace that's no accident. However, you're gonna have a hard time distinguishing between user intent and email servers' vagaries.
I updated the code to reflect the change that's been observed in Yahoo! behavior. Indeed, it's fair that I don't make assumptions about the other lines.