Skip to content

Dropped unreachable failure code.#5564

Open
hporten wants to merge 1 commit intoportfolio-performance:masterfrom
hporten:duplicated_checks
Open

Dropped unreachable failure code.#5564
hporten wants to merge 1 commit intoportfolio-performance:masterfrom
hporten:duplicated_checks

Conversation

@hporten
Copy link
Contributor

@hporten hporten commented Mar 12, 2026

The wkn attribute is matched with \w{9} or .{9} and therefore cannot have a length different from 9.

The wkn attribute is matched with \w{9} or .{9} and therefore
cannot have a length different from 9.
Comment on lines 143 to +153
.match("^(?<month>[\\w]{3}) (?<day>[\\d]{1,2}) (?<name>.*) (?<shares>[\\.,\\d]+) (?<wkn>(?!Qualified).{9}) (Qualified )?Dividend (?<amount>[\\.,\\d]+)$") //
.match("^[\\w]{3} [\\d]{2} .* [\\w]{9} (NRA Withhold|Foreign Withholding) \\((?<tax>[\\.,\\d]+)\\)$") //
.assign((t, v) -> {
v.put("date", v.get("day") + " " + v.get("month") + " " + v.get("year"));
v.put("currency", CurrencyUnit.USD);

t.setDateTime(asDate(v.get("date")));
t.setShares(asShares(v.get("shares")));
t.setAmount(asAmount(v.get("amount")) - asAmount(v.get("tax")));
t.setCurrencyCode(asCurrencyCode(v.get("currency")));

// if CUSIP lenght != 9
if (v.get("wkn").length() != 9)
v.markAsFailure("CUSIP is maybe incorrect. " + t.getDateTime() + " " + t.getSecurity());
else
t.setSecurity(getOrCreateSecurity(v));
t.setSecurity(getOrCreateSecurity(v));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the length check is not needed here. The string must be exactly 9 characters to match in the first place.

Here is it matching with a dot (.) that matches any character and later the string is trimmed to lookup up an instrument with this CUSIP. So it could be an invalid CUSIP. However, your commit does not change this behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the whitespace inclusion is not pretty. As mentioned in #5563, the usage of \w would be a more precise. But in light of unknown surroundings of whitespace, the patterns would need to be embedded in \s* on both sides.

Other PP code uses (?[A-Z0-9]{9})$ for matching CUSIPs. But according to the Wikipedia article '*', '@' and '#' are allowed as well. A Stackoverflow answer therefore recommends this pattern:

^[0-9]{3}[a-zA-Z0-9]{2}[a-zA-Z0-9*@#]{3}[0-9]$

But given that the current change is only about failures vs. skips only, I did not dare to dive deeper. A more sophisticated matching would be up for someone with more example documents in hand. And due to the thumb-down that the pull request got already, I do not mind postponing the the whole change entirely.

hporten referenced this pull request Mar 13, 2026
1.) Corrected placeholder value for error message
2.) Do what the comment says and reject all lengths other than 9
3.) Fixed typo in comment.

It is questionable to match CUSIPS with "(?<wkn>.*)" btw. Particularly
as the dot also matches whitespace that later has to be dropped with
trim() again. \w would be better but without more test cases we have
to guess whether it needs to be surrounded with \s.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants