-
Notifications
You must be signed in to change notification settings - Fork 46
Parser refactoring #237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser refactoring #237
Conversation
cejaekl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks solid to me. I tried poking holes in it with a bit of ad hoc testing, and failed to find any bugs. 😄
lib/money/parser/fuzzy.rb
Outdated
| end | ||
|
|
||
| def last_digits_decimals?(digits, marks, currency) | ||
| # Thousands marks are always different from decimal marks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Grouping marks, not Thousands marks. In locales that adopt the "Chinese" 4-digit grouping system, one group is a myriad, not a thousand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL! This was already present in the old code, GH diff might make it look like a new file but it's really just whitespace changes.
lib/money/parser/locale_aware.rb
Outdated
| return unless currency | ||
|
|
||
| normalized_input = input | ||
| .tr('-0-9.,', '-0-9.,') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also convert 、 (U+3001) and 、 (U+FF64) to , (U+002C)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured I'd start small, fullwidth digits I've seen being used on Shopify Payments onboarding forms by Japanese merchants. Based on https://www.localeplanet.com/icu/ja-JP/index.html dot and comma seemed likely to be sent as fullwidth as well.
Anything else that was not 100% equivalent my hunch was to wait and see first how people are actually interacting with the system - but I will also readily admit I do not have a lot of cultural context to make a prediction. Do you think these characters to be likely used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think these characters to be likely used?
Well, that will depend on how the user's IME behaves.
I gave it a try just now (Mac OS, Japanese hiragana input, into a Chrome web page's text input field). Typing
100,000,000
and then hitting Enter left this in the text field:
100,000、000
So, yes, I think it's likely you'll see 、 or 、 instead of , or ,, at least some of the time.
6a21786 to
4739937
Compare
|
What do you think of moving the parsing functionality to a new gem? A bit like rubyMoney which split its parsing to https://github.com/RubyMoney/monetize |
lib/money/parser/simple.rb
Outdated
| currency = Money::Helpers.value_to_currency(currency) | ||
| return unless currency | ||
|
|
||
| amount = BigDecimal(input, exception: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings in this format are already handled by calling money directly, ex: Money.new("123.10", "USD"). Maybe we can make this slightly different by stripping white spaces and handling string with trailing dots "123."?
I would also not completely trust BigDecimal to raise exception, for example I don't think we should allow the exponential notation
> BigDecimal("10.10E2")
=> 0.101e4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Money directly adds the complexity of an app potentially having opted into deprecated behaviour and returning 0 instead of nil for invalid input, this doesn't work with the expectations from our parameter type coercion gem that is used in Shopify core and other apps.
This parser is meant for well-formatted data so I am wary of normalizing here to paper over bugs from other people. We can loosen restrictions later if we learn it's needed (the opposite is much harder).
Good call on the scientific notation, I added something for that 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opted into deprecated behaviour and returning 0 instead of nil
If we're already expecting a well formatted string, could we get them to not opt-in to that deprecated behaviour, instead of adding a new parser? IMO nil -> 0 should already never be happening and is fairly easy to do manually if needed Money.new(amount || 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. Are you suggesting to remove the deprecation for invalid input for Money.new?
Assuming an app hasn't opted in to the deprecated path, Money.new accepts more input than we want for APIs. Money.new will raise on invalid input where our typed parameter code (see related PR) and GraphQL types need a nil returned. We could rescue for those cases but that's less performant for most inputs.
I don't understand how Money.new(amount || 0) is of use, but if you meant Money::Parser::Simple.parse(amount) || 0 then yes that's another reason to prefer returning nil.
1d056e2 to
8387fe5
Compare
| if SIGNED_DECIMAL_MATCHER.match?(coerced) && (amount = BigDecimal(coerced, exception: false)) | ||
| Money.new(amount, currency) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should avoid using an assignment on the second part of the boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our style guide allows it. I'll look into adding our Rubocop rules to this repo because linters can do this better than humans 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a fair bit of work: #240 will circle back after this and some other outstanding PRs are merged to minimize disruption.
| if SIGNED_DECIMAL_MATCHER.match?(coerced) && (amount = BigDecimal(coerced, exception: false)) | ||
| Money.new(amount, currency) | ||
| elsif strict | ||
| raise ArgumentError, "unable to parse input=\"#{input}\" currency=\"#{currency}\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth adding a custom error IMO, would make this easier to rescue
|
Left comment about additional test |
Unlike the existing legacy MoneyParser, this one doesn't use complicated logic to try and guess if a comma or dot is used as a thousands or decimal separator. There are cases where it can never properly parse numbers, especially for currencies with 3 digits. This new parser fixes that.
We already had two and introduced two more (with different return semantics). It's better to make an informed choice for every callsite than implicitly relying on global configuration and making a change everywhere. If a change everywhere is desired for simple apps, it's as simple as replacing 'Money.parse' with 'Money::Parser::Fuzzy.parse' to keep the same behaviour.
8387fe5 to
de68c3a
Compare
Railties allow us to hook into Action View's loading process and do nothing if we're not in a Rails app. Testing them in isolation is not trivial. The behaviour we setup here however is, and the core logic is already well tested, so skipping over a few lines of glue code is an acceptable tradeoff.
de68c3a to
4b478f2
Compare
This PR does a few things based on a couple of use cases that emerged in Shopify:
First it introduces a parser that is locale-aware and can properly parse amounts with 3 digits, for the currencies Bahraini dinar, Iraqi dinar, Jordanian dinar, Kuwaiti dinar, Libyan dinar, Omani rial and Tunisian dinar. This parser also normalizes fullwidth characters, this is useful for e.g. Japanese users where their phone keyboards might lead them to entering these characters. In the future normalizing other digits from the Unicode Character Category 'Number, Decimal Digit' could be possible but for now we're just solving the use case we've actually run into.
Second, it introduces a 'simple' parser for API cases (which can also handle 3 decimal currencies) in order to be used in conjunction Shopify's gem that adds type coercion to controller parameters. We had a recent issue where the Fuzzy parser was used in a scenario like this, multiplying amounts by a 1000 , and requiring an extensive data cleanup effort.