PDF Extractor for Oberbank documents#5559
iAppDeveloper88 wants to merge 10 commits into portfolio-performance:master from
new AssertImportActions().check(results, "EUR");

// check security
var security = results.stream().filter(SecurityItem.class::isInstance).findFirst()
This is the old assertion style; please use the newer one. See https://github.com/portfolio-performance/portfolio/blob/master/CONTRIBUTING.md#pdf-importers for an example. You can also check other (recent) test methods, e.g. in Scalable.
var pdfTransaction = new Transaction<BuySellEntry>();

// var firstRelevantLine = new
Please remove unnecessary comments.
})

// Is type --> "Verkauf" change from BUY to SELL
.section("type").optional() //

// Wertpapiernummer Bezeichnung Nominale/Stück
// CA09228F1036 BlackBerry Ltd. Zugang Stk . 14,00
// Registered Shares o.N.
// Kurs 19,098 EUR Kurswert EUR 267,37
I think this line isn't necessary for the security, is it?
section -> section //
        .attributes("isin", "name", "nameContinued", "local", "shares") //
        .find("^Wertpapiernummer Bezeichnung Nominale/St.ck$") //
        .match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) (?<local>Stk)\\s*\\.\\s+(?<shares>[\\.,\\d]+)$") //
Maybe add an Abgang line to the comment above as an example as well.
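To check the pattern above against both movement types, a small standalone test with plain `java.util.regex` can help. This is a sketch outside the project's extractor framework: the class and method names are hypothetical, and the `Abgang` sample line is made up by mirroring the `Zugang` example, since no real outbound document appears in this excerpt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone check of the security-line pattern from the PR
class SecurityLineRegexCheck
{
    // Pattern copied from the extractor: ISIN, name, movement type, unit and share count
    static final Pattern SECURITY_LINE = Pattern.compile(
                    "^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) (?<local>Stk)\\s*\\.\\s+(?<shares>[\\.,\\d]+)$");

    // Returns {isin, name, shares}, or null when the line does not match
    static String[] parse(String line)
    {
        Matcher m = SECURITY_LINE.matcher(line);
        if (!m.matches())
            return null;
        return new String[] { m.group("isin"), m.group("name"), m.group("shares") };
    }

    public static void main(String[] args)
    {
        String[] samples = {
                        "CA09228F1036 BlackBerry Ltd. Zugang Stk . 14,00", // taken from the PR
                        "CA09228F1036 BlackBerry Ltd. Abgang Stk . 14,00", // hypothetical outbound line
                        "AT000B127337 Oberbank AG Zugang EUR 8.000,00" // bond line, expected not to match
        };
        for (String s : samples)
        {
            String[] r = parse(s);
            System.out.println(r == null ? "no match: " + s : String.join(" | ", r));
        }
    }
}
```

The third sample shows why the bond documents need a pattern of their own: the nominal amount is prefixed by a currency instead of `Stk .`.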
// Kurs 19,098 EUR Kurswert EUR 267,37
// @formatter:on
section -> section //
        .attributes("isin", "name", "nameContinued", "local", "shares") //
Please look at other extractors (like ScalableCapitalPDFExtractor). It's important to split the sections the same way, even if that adds code duplication. Shares should be their own section.
I had it like that at first, but I changed it because ignoring the shares in this regex, only to repeat the exact same regex again for them, looked so wrong. But I understand that it's important for consistency.
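The layout the reviewer asks for can be illustrated outside the framework: two independent patterns run over the same line, one capturing only the security attributes and one capturing only the share count. This is a plain `java.util.regex` sketch; the class name and the `toBigDecimal` helper are hypothetical stand-ins, not the project's own section API or number parser.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one pattern per concern, applied to the same line
class SplitSectionsSketch
{
    // Security section: ISIN and name are captured, the share count is only matched
    static final Pattern SECURITY = Pattern.compile(
                    "^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) Stk\\s*\\.\\s+[\\.,\\d]+$");

    // Shares section: only the share count is captured
    static final Pattern SHARES = Pattern.compile(
                    "^[A-Z]{2}[A-Z0-9]{9}[0-9] .* (Zugang|Abgang) Stk\\s*\\.\\s+(?<shares>[\\.,\\d]+)$");

    // German number format: "." groups thousands, "," is the decimal mark
    static BigDecimal toBigDecimal(String german)
    {
        return new BigDecimal(german.replace(".", "").replace(",", "."));
    }

    static String isin(String line)
    {
        Matcher m = SECURITY.matcher(line);
        return m.matches() ? m.group("isin") : null;
    }

    static BigDecimal shares(String line)
    {
        Matcher m = SHARES.matcher(line);
        return m.matches() ? toBigDecimal(m.group("shares")) : null;
    }

    public static void main(String[] args)
    {
        String line = "CA09228F1036 BlackBerry Ltd. Zugang Stk . 14,00";
        System.out.println(isin(line) + " / " + shares(line));
    }
}
```

One upside of the duplication is that each pattern can be adjusted on its own if only one part of the line changes format in a later document.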
// @formatter:off
// Kupon 4,55 % jährlich Stückzinsen f. 166 Tage EUR 165,55
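The coupon line in the comment above can be matched in the same style. The pattern itself is not part of this excerpt, so the sketch below is only an assumption about how it might look: it uses `.` for the umlauts in `jährlich` and `Stückzinsen`, as the PR's other patterns do for `Stück` and `Durchführungsanzeige`, and captures the accrued interest (Stückzinsen) amount with its currency. Class, record and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse "Kupon <rate> % jährlich Stückzinsen f. <days> Tage <CUR> <amount>"
class KuponLineSketch
{
    record Kupon(String rate, int days, String currency, String amount) {}

    // "." stands in for "ä" and "ü", following the PR's other patterns (e.g. "St.ck")
    static final Pattern KUPON = Pattern.compile(
                    "^Kupon (?<rate>[\\.,\\d]+) % j.hrlich St.ckzinsen f\\. (?<days>\\d+) Tage (?<currency>[A-Z]{3}) (?<amount>[\\.,\\d]+)$");

    static Optional<Kupon> parse(String line)
    {
        Matcher m = KUPON.matcher(line);
        if (!m.matches())
            return Optional.empty();
        return Optional.of(new Kupon(m.group("rate"), Integer.parseInt(m.group("days")),
                        m.group("currency"), m.group("amount")));
    }

    public static void main(String[] args)
    {
        // Sample line from the PR, with the umlauts written as Unicode escapes
        String line = "Kupon 4,55 % j\u00e4hrlich St\u00fcckzinsen f. 166 Tage EUR 165,55";
        parse(line).ifPresentOrElse(System.out::println, () -> System.out.println("no match"));
    }
}
```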
I think it would make sense to merge a first version as soon as all the single-transaction documents are done. I don't think it will take that long to finish anyway.

I just noticed that there is already a DreiBankenEDVPDFExtractor. Oberbank is part of the "3-Banken-Gruppe", but that extractor only covers BKS Bank.
They seem to be independent. There are similar cases already, like Audi-Bank and Volkswagen-Bank, which are connected and use similar patterns. However, they might diverge in the future, so IMO it is better to keep them as separate classes.
2463df1 to f64f7cc
Nirus2000 left a comment
Hello,
here are my comments on this importer:
Rename the importer to OberbankAG.....
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankPDFExtractor.java (outdated, resolved)
section -> section //
        .attributes("isin", "name", "nameContinued") //
        .match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) Stk\\s*\\.\\s+[\\.,\\d]+$") //
        .match("^(?<nameContinued>.*)$") //
        .assign((t, v) -> {
            t.setSecurity(getOrCreateSecurity(v));
        }),
// @formatter:off
// Wertpapiernummer Bezeichnung Nominale/Stück
// AT000B127337 Oberbank AG Zugang EUR 8.000,00
// Nachr. Anleihe 2023-2031
// @formatter:on
section -> section //
        .attributes("isin", "name", "nameContinued") //
        .match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) [A-Z]{3}\\s+[\\.,\\d]+$") //
        .match("^(?<nameContinued>.*)$") //
        .assign((t, v) -> {
            t.setSecurity(getOrCreateSecurity(v));
        }))
The currency needed to create the security is missing; see the getOrCreateSecurity function.
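One possible way to supply the missing currency, sketched with plain `java.util.regex`: for bond documents the currency sits right before the nominal amount on the ISIN line, and for share documents it could be read from the `Kurs` line shown in the earlier example. This assumes that `getOrCreateSecurity(v)` looks up a currency among the parsed values; the exact key it expects should be checked in the project's base extractor class. Class and method names are hypothetical, and the map only imitates the parsed attribute values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of where a security currency could come from
class SecurityCurrencySketch
{
    // Bond line: the currency precedes the nominal amount
    static final Pattern BOND_LINE = Pattern.compile(
                    "^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) (?<currency>[A-Z]{3})\\s+[\\.,\\d]+$");

    // Share documents (assumption): the currency follows the price on the "Kurs" line
    static final Pattern KURS_LINE = Pattern.compile("^Kurs [\\.,\\d]+ (?<currency>[A-Z]{3}) .*$");

    // Imitates the attribute map an extractor section would collect; empty if no match
    static Map<String, String> bondAttributes(String line)
    {
        Map<String, String> v = new HashMap<>();
        Matcher m = BOND_LINE.matcher(line);
        if (m.matches())
        {
            v.put("isin", m.group("isin"));
            v.put("name", m.group("name"));
            v.put("currency", m.group("currency"));
        }
        return v;
    }

    static Optional<String> kursCurrency(String line)
    {
        Matcher m = KURS_LINE.matcher(line);
        return m.matches() ? Optional.of(m.group("currency")) : Optional.empty();
    }

    public static void main(String[] args)
    {
        System.out.println(bondAttributes("AT000B127337 Oberbank AG Zugang EUR 8.000,00"));
        System.out.println(kursCurrency("Kurs 19,098 EUR Kurswert EUR 267,37").orElse("no currency"));
    }
}
```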
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankPDFExtractor.java (outdated, resolved)
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankPDFExtractor.java (outdated, resolved)
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankPDFExtractor.java (resolved)
private void addDeliveryInOutBoundTransaction()
{
    final var type = new DocumentType("(Durchf.hrungsanzeig\\s*e\\s+" //
                    + "(Freier Erhalt" //
                    + "|Freie Lieferung))");
    this.addDocumentTyp(type);

    var pdfTransaction = new Transaction<PortfolioTransaction>();

    // Delivery inbound and outbound documents have multiple pages. The
    // first two start with the same line,
    // e.g.: Durchführungsanzeig e Freier Erhalt
    //
    // Repeated occurrences must be ignored to prevent the creation of
    // duplicate blocks.
    var startsWith = Pattern.compile("^Durchf.hrungsanzeig\\s*e\\s+(Freie Lieferung|Freier Erhalt)$");
    var splittingStrategy = (SplittingStrategy) lines -> {
        var blockIdentifiers = new HashSet<String>();

        // first: find the start of the blocks
        var blockStarts = new ArrayList<Integer>();

        for (var ii = 0; ii < lines.length; ii++)
        {
            var matcher = startsWith.matcher(lines[ii]);
            if (matcher.matches() && blockIdentifiers.add(lines[ii]))
                blockStarts.add(ii);
        }

        // second: convert to line spans
        var spans = new ArrayList<LineSpan>();
        for (var ii = 0; ii < blockStarts.size(); ii++)
        {
            int startLine = blockStarts.get(ii);
            var endLine = ii + 1 < blockStarts.size() ? blockStarts.get(ii + 1) - 1 : lines.length - 1;
            spans.add(new LineSpan(startLine, endLine));
        }
        return spans;
    };
Nope... look at other importers to see how to fix this problem.
See .rangeAs or the document class.
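For comparison with the approach the reviewer suggests, here is what the PR's custom splitting strategy does, re-created as a standalone sketch. `Span` is a hypothetical stand-in for the project's `LineSpan`, and the page contents are placeholders; the header pattern is copied from the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.regex.Pattern;

// Standalone re-creation of the PR's splitting logic (hypothetical names)
class FirstOccurrenceSplitSketch
{
    record Span(int start, int end) {}

    static final Pattern STARTS_WITH = Pattern.compile(
                    "^Durchf.hrungsanzeig\\s*e\\s+(Freie Lieferung|Freier Erhalt)$");

    static List<Span> split(String[] lines)
    {
        // A header line starts a block only the first time its exact text is seen
        var seen = new HashSet<String>();
        var starts = new ArrayList<Integer>();
        for (int i = 0; i < lines.length; i++)
        {
            if (STARTS_WITH.matcher(lines[i]).matches() && seen.add(lines[i]))
                starts.add(i);
        }

        // Each block runs until the line before the next block start, or to the end
        var spans = new ArrayList<Span>();
        for (int i = 0; i < starts.size(); i++)
        {
            int end = i + 1 < starts.size() ? starts.get(i + 1) - 1 : lines.length - 1;
            spans.add(new Span(starts.get(i), end));
        }
        return spans;
    }

    public static void main(String[] args)
    {
        // Header with the umlaut written as a Unicode escape; page contents are placeholders
        String header = "Durchf\u00fchrungsanzeig e Freier Erhalt";
        String[] twoPages = { header, "page 1 ...", header, "page 2 ..." };
        System.out.println(split(twoPages));
    }
}
```

Because the deduplication is keyed on the exact header text across the whole input, a file containing two separate `Freier Erhalt` notices would have the second one folded into the first block instead of starting its own. That may be part of why the reviewer points to the project's existing range handling instead.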
.section("isin", "name", "nameContinued") //
        .match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) Stk\\s*\\.\\s+[\\.,\\d]+$") //
        .match("^(?<nameContinued>.*)$") //
        .assign((t, v) -> {
            t.setSecurity(getOrCreateSecurity(v));
        })
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankPDFExtractor.java (outdated, resolved)
name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/OberbankAGPDFExtractor.java (resolved)
// @formatter:on
.section("note").optional() //
        .match("^(?<note>Auftrags-Nr\\. \\d+)-[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}$") //
        .assign((t, v) -> t.setNote(trim(v.get("note"))))
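The note pattern keeps only the order number and drops the trailing date. A quick standalone check follows; the sample line is hypothetical (built to fit the pattern, not taken from a real Oberbank document), and `String.trim()` stands in for the extractor's own `trim` helper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone check of the order-number note pattern from the PR
class OrderNumberNoteSketch
{
    // Captures "Auftrags-Nr. <digits>" and requires a "-dd.dd.yyyy" suffix
    static final Pattern NOTE = Pattern.compile(
                    "^(?<note>Auftrags-Nr\\. \\d+)-[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}$");

    static Optional<String> note(String line)
    {
        Matcher m = NOTE.matcher(line);
        return m.matches() ? Optional.of(m.group("note").trim()) : Optional.empty();
    }

    public static void main(String[] args)
    {
        // Hypothetical sample line
        System.out.println(note("Auftrags-Nr. 12345678-01.02.2024").orElse("no match"));
    }
}
```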
Thank you for your comments, @Nirus2000. I have already resolved most of them and added a question to some. I will rename the importer after I have fixed all the issues from your review.
24b64a3 to 7f82160
…split and cancellation
3746e65 to 96322b4
Started working on issue #5548
So far, buy and sell documents are working. I would appreciate some short feedback on whether the implementation is OK before I continue with the other documents.
This is just a draft, so I did not check formatting and comments too much.