Skip to content

PDF Extractor for Oberbank documents#5559

Open
iAppDeveloper88 wants to merge 10 commits intoportfolio-performance:masterfrom
iAppDeveloper88:feat-pdf-oberbank
Open

PDF Extractor for Oberbank documents#5559
iAppDeveloper88 wants to merge 10 commits intoportfolio-performance:masterfrom
iAppDeveloper88:feat-pdf-oberbank

Conversation

@iAppDeveloper88
Copy link

Started working on issue #5548
So far buy and sell documents are working. I would appreciate a short feedback if the implementation is OK before i continue with the other documents.
This is just a draft, so I did not check formatting and comments too much.

Copy link
Contributor

@gamue gamue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing. I added a few comments from what I know that should be adjusted.

Personally think it'll be fine to merge the new extractor step-by-step, however I'm not a team-member, so not my decision ;)

new AssertImportActions().check(results, "EUR");

// check security
var security = results.stream().filter(SecurityItem.class::isInstance).findFirst()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the old assertion style, please use the newer one. See https://github.com/portfolio-performance/portfolio/blob/master/CONTRIBUTING.md#pdf-importers for an example. You can also check other (recent) tests-methods e.g. in Scalable


var pdfTransaction = new Transaction<BuySellEntry>();

// var firstRelevantLine = new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove not necessary comments

})

// Is type --> "Verkauf" change from BUY to SELL
.section("type").optional() //
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that really optional?

// Wertpapiernummer Bezeichnung Nominale/Stück
// CA09228F1036 BlackBerry Ltd. Zugang Stk . 14,00
// Registered Shares o.N.
// Kurs 19,098 EUR Kurswert EUR 267,37
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think this line isn't necessary for the security, or?

section -> section //
.attributes("isin", "name", "nameContinued", "local", "shares") //
.find("^Wertpapiernummer Bezeichnung Nominale/St.ck$") //
.match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) (?<local>Stk)\\s*\\.\\s+(?<shares>[\\.,\\d]+)$") //
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add Abgang to the comment above as well as example

// Kurs 19,098 EUR Kurswert EUR 267,37
// @formatter:on
section -> section //
.attributes("isin", "name", "nameContinued", "local", "shares") //
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see other extractors (like ScalableCapitalPDFExtractor). It's important to have the sections split in the same way, even that adds code-duplication. shares should be it's own section

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it like that first but I changed it as it looked so wrong ignoring shares in this regex just have to the exact same one again for it. But I understand that its important for consistency.


// @formatter:off
// Kupon 4,55 % jährlich Stückzinsen f. 166 Tage EUR 165,55

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary blank line

@iAppDeveloper88
Copy link
Author

I think it would make sense to merge a first version as soon as all the single transaction documents are done. I don't think it will take that long to finish that anyway.

@iAppDeveloper88
Copy link
Author

I just noticed that there is already a DreiBankenEDVPDFExtractor. Oberbank is part of the "3-Banken-Gruppe", but this PDFExtractor is only covering BKS Bank.
Should the logic for Oberbank be included in DreiBankenEDVPDFExtractor instead of creating a new PDFExtractor as the documents are very similar (but not identical)?

@gamue
Copy link
Contributor

gamue commented Mar 9, 2026

Die „3 Banken Gruppe“ besteht mit Oberbank, BTV Vier Länder Bank AG und BKS Bank aus drei selbstständigen und unabhängigen Regionalbanken. Die 3 Banken sind im Verbund in allen österreichischen Bundesländern und im angrenzenden Ausland aktiv.

They sound to be independent. There are similar cases already like Audi-Bank and Volkswagen-Bank, which have connections and use similar patterns. however they might split in the future or similar, so better to keep it as own classes imo.
Maybe even rename the other one to BksBank when you say it's just for this atm

Copy link
Member

@Nirus2000 Nirus2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
my comments to this importer
rename the importer to OberbankAG.....

Comment on lines +96 to +114
section -> section //
.attributes("isin", "name", "nameContinued") //
.match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) Stk\\s*\\.\\s+[\\.,\\d]+$") //
.match("^(?<nameContinued>.*)$") //
.assign((t, v) -> {
t.setSecurity(getOrCreateSecurity(v));
}),
// @formatter:off
// Wertpapiernummer Bezeichnung Nominale/Stück
// AT000B127337 Oberbank AG Zugang EUR 8.000,00
// Nachr. Anleihe 2023-2031
// @formatter:on
section -> section //
.attributes("isin", "name", "nameContinued") //
.match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) [A-Z]{3}\\s+[\\.,\\d]+$") //
.match("^(?<nameContinued>.*)$") //
.assign((t, v) -> {
t.setSecurity(getOrCreateSecurity(v));
}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing currency to create a security ... see getOrCreateSecurity function

Comment on lines +293 to +331
private void addDeliveryInOutBoundTransaction()
{
final var type = new DocumentType("(Durchf.hrungsanzeig\\s*e\\s+" //
+ "(Freier Erhalt" //
+ "|Freie Lieferung))");
this.addDocumentTyp(type);

var pdfTransaction = new Transaction<PortfolioTransaction>();

// Delivery inbound and outbound documents have multiple pages. The
// first two start with the same line,
// e.g.: Durchführungsanzeig e Freier Erhalt
//
// Repeated occurrences must be ignored to prevent the creation of
// duplicate blocks.
var startsWith = Pattern.compile("^Durchf.hrungsanzeig\\s*e\\s+(Freie Lieferung|Freier Erhalt)$");
var splittingStrategy = (SplittingStrategy) lines -> {
var blockIdentifiers = new HashSet<String>();

// first: find the start of the blocks
var blockStarts = new ArrayList<Integer>();

for (var ii = 0; ii < lines.length; ii++)
{
var matcher = startsWith.matcher(lines[ii]);
if (matcher.matches() && blockIdentifiers.add(lines[ii]))
blockStarts.add(ii);
}

// second: convert to line spans
var spans = new ArrayList<LineSpan>();
for (var ii = 0; ii < blockStarts.size(); ii++)
{
int startLine = blockStarts.get(ii);
var endLine = ii + 1 < blockStarts.size() ? blockStarts.get(ii + 1) - 1 : lines.length - 1;
spans.add(new LineSpan(startLine, endLine));
}
return spans;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope... look at other importers to fix this problem.
See .rangeAs or in the document class

Comment on lines +363 to +368
.section("isin", "name", "nameContinued") //
.match("^(?<isin>[A-Z]{2}[A-Z0-9]{9}[0-9]) (?<name>.*) (Zugang|Abgang) Stk\\s*\\.\\s+[\\.,\\d]+$") //
.match("^(?<nameContinued>.*)$") //
.assign((t, v) -> {
t.setSecurity(getOrCreateSecurity(v));
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing currency

// @formatter:on
.section("note").optional() //
.match("^(?<note>Auftrags-Nr\\. \\d+)-[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}$") //
.assign((t, v) -> t.setNote(trim(v.get("note"))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trim() for what?

@iAppDeveloper88
Copy link
Author

Thank you for your comments @Nirus2000. I resolved most of them already; to some of them i added a question. I will rename the importer after i fixed all issues from your review.
The TODO is because i have some more dividend documents (for "Thesaurierende und Ausschüttende Fonds/ETFs"). It wont be there in my final pull request

@iAppDeveloper88 iAppDeveloper88 marked this pull request as ready for review March 19, 2026 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants