Dropped unreachable failure code.#5564
Dropped unreachable failure code.#5564hporten wants to merge 1 commit intoportfolio-performance:masterfrom
Conversation
The wkn attribute is matched with \w{9} or .{9} and therefore
cannot have a length different from 9.
| .match("^(?<month>[\\w]{3}) (?<day>[\\d]{1,2}) (?<name>.*) (?<shares>[\\.,\\d]+) (?<wkn>(?!Qualified).{9}) (Qualified )?Dividend (?<amount>[\\.,\\d]+)$") // | ||
| .match("^[\\w]{3} [\\d]{2} .* [\\w]{9} (NRA Withhold|Foreign Withholding) \\((?<tax>[\\.,\\d]+)\\)$") // | ||
| .assign((t, v) -> { | ||
| v.put("date", v.get("day") + " " + v.get("month") + " " + v.get("year")); | ||
| v.put("currency", CurrencyUnit.USD); | ||
|
|
||
| t.setDateTime(asDate(v.get("date"))); | ||
| t.setShares(asShares(v.get("shares"))); | ||
| t.setAmount(asAmount(v.get("amount")) - asAmount(v.get("tax"))); | ||
| t.setCurrencyCode(asCurrencyCode(v.get("currency"))); | ||
|
|
||
| // if CUSIP lenght != 9 | ||
| if (v.get("wkn").length() != 9) | ||
| v.markAsFailure("CUSIP is maybe incorrect. " + t.getDateTime() + " " + t.getSecurity()); | ||
| else | ||
| t.setSecurity(getOrCreateSecurity(v)); | ||
| t.setSecurity(getOrCreateSecurity(v)); |
There was a problem hiding this comment.
I agree that the length check is not needed here. The string must be exactly 9 characters to match in the first place.
Here is it matching with a dot (.) that matches any character and later the string is trimmed to lookup up an instrument with this CUSIP. So it could be an invalid CUSIP. However, your commit does not change this behavior.
There was a problem hiding this comment.
Yeah, the whitespace inclusion is not pretty. As mentioned in #5563, the usage of \w would be a more precise. But in light of unknown surroundings of whitespace, the patterns would need to be embedded in \s* on both sides.
Other PP code uses (?[A-Z0-9]{9})$ for matching CUSIPs. But according to the Wikipedia article '*', '@' and '#' are allowed as well. A Stackoverflow answer therefore recommends this pattern:
^[0-9]{3}[a-zA-Z0-9]{2}[a-zA-Z0-9*@#]{3}[0-9]$
But given that the current change is only about failures vs. skips only, I did not dare to dive deeper. A more sophisticated matching would be up for someone with more example documents in hand. And due to the thumb-down that the pull request got already, I do not mind postponing the the whole change entirely.
1.) Corrected placeholder value for error message 2.) Do what the comment says and reject all lengths other than 9 3.) Fixed typo in comment. It is questionable to match CUSIPS with "(?<wkn>.*)" btw. Particularly as the dot also matches whitespace that later has to be dropped with trim() again. \w would be better but without more test cases we have to guess whether it needs to be surrounded with \s.
The wkn attribute is matched with \w{9} or .{9} and therefore cannot have a length different from 9.