@Shohail-Ismail Shohail-Ismail commented Aug 25, 2025

Description

Adds support for backfilling German solar generation data from ENTSOE for the last 5 years and saving the results to a CSV.

Key changes:

  • Introduced _fetch_de_window for fetching arbitrary time windows from ENTSOE
  • Added fetch_de_data_range(start, end, chunk_hours) to handle multi-year ranges via chunked requests
  • Added a __main__ entrypoint in fetch_de_data.py to run a 5-year backfill directly as a script
  • Normalised solar filter to accept both B16 and A-10Y1001A1001A83H psr codes, covering both live API and test fixtures
  • CSV output written to data/de_solar/germany_solar_generation.csv

This PR keeps the existing 24-hour fetch_de_data() API intact (used elsewhere) while adding range support as a separate function.
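
For readers following along, here is a minimal sketch of how a multi-year range can be split into fixed-size windows for chunked requests. It is not the PR's actual code: `iter_windows` is a hypothetical helper name, and only the `start`, `end` and `chunk_hours` parameters come from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def iter_windows(
    start: datetime, end: datetime, chunk_hours: int = 24 * 7
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous (window_start, window_end) pairs covering [start, end).

    Hypothetical helper: the name and the default chunk size are assumptions.
    """
    if chunk_hours <= 0:
        raise ValueError("chunk_hours must be positive")
    step = timedelta(hours=chunk_hours)
    cursor = start
    while cursor < end:
        # Clamp the final window so it never runs past the requested end
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    begin = datetime(2020, 1, 1, tzinfo=timezone.utc)
    finish = datetime(2020, 1, 3, 12, tzinfo=timezone.utc)
    for window_start, window_end in iter_windows(begin, finish, chunk_hours=24):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window would then be passed to the per-window fetch, with the results concatenated afterwards.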

Tests

  • Extended unit tests with:
    • Range fetch returning rows and correct schema
    • Range fetch handling empty windows gracefully
  • Patched ENTSOE_API_KEY and session calls in tests to avoid real network calls
  • Manual dry-run of the main script verified CSV creation and row counts
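
As an illustration of the patching approach described above, here is a sketch using only the standard library. `fetch_text` is a hypothetical stand-in for the real fetch function, and the URL is a placeholder; the PR's tests may be structured differently.

```python
import os
from unittest.mock import MagicMock, patch


def fetch_text(session, url):
    """Hypothetical stand-in for a fetch function that reads the key from the environment."""
    api_key = os.environ["ENTSOE_API_KEY"]
    response = session.get(url, params={"securityToken": api_key}, timeout=30)
    response.raise_for_status()
    return response.text


def test_fetch_uses_patched_key_and_session():
    # A MagicMock session means no real HTTP request is ever made
    fake_session = MagicMock()
    fake_session.get.return_value.text = "<xml/>"

    # patch.dict restores the original environment when the block exits
    with patch.dict(os.environ, {"ENTSOE_API_KEY": "dummy-key"}):
        assert fetch_text(fake_session, "https://example.invalid/api") == "<xml/>"

    fake_session.get.assert_called_once()
    assert fake_session.get.call_args.kwargs["params"]["securityToken"] == "dummy-key"
```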

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation (docstrings, inline comments)
  • I have added tests that prove my feature is effective and that my changes work
  • I have checked my code and corrected any misspellings

Closes #88

Shohail-Ismail commented Aug 26, 2025

Completed a few extra minor corrections for the PR. If there are any further changes you'd like me to make, @peterdudfield, please let me know.

Shohail-Ismail commented Sep 10, 2025

@peterdudfield Would appreciate a review when you're free.

Shohail-Ismail commented Sep 15, 2025

Updates:

Code

  • Moved backfill entrypoint into a new script de_export.py, which fetches from 01/01/2020 to yesterday and writes results to data/de_solar/germany_solar_generation.csv
  • Corrected bidding zone to '10Y1001A1001A82H' in API requests and fixed solar PSR filter to accept both B16 and 'A-10Y1001A1001A83H' as per ENTSOE's new API docs

Tests

  • Modified XML schema to fit the new API docs better, and accordingly adjusted the tests
  • Fixed test_http_error to raise properly on 500 responses

@@ -0,0 +1 @@
target_datetime_utc,solar_generation_kw,tso_zone

Could you remove this CSV? It's only the headings, and I think it adds confusion. The code should add headings if needed.
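
One way to have the code add headings only when needed is sketched below. The column names are taken from the CSV header in the diff above; `append_rows` is a hypothetical helper name, not something from the PR.

```python
import csv
from pathlib import Path

# Column names as they appear in the committed CSV header
FIELDNAMES = ["target_datetime_utc", "solar_generation_kw", "tso_zone"]


def append_rows(path, rows):
    """Append rows to a CSV, writing the header only for a new or empty file.

    Hypothetical helper for illustration.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

With this pattern the repository does not need to ship a header-only file, since the first write creates it.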

from solar_consumer.data.fetch_de_data import fetch_de_data_range


def main():

Could you move this to a scripts folder, please?

@peterdudfield

Thanks, I'm going to ask @zakwatts to have a look at this.

zakwatts commented Nov 3, 2025

Thanks @Shohail-Ismail, I've just requested an API key from ENTSOE so I can test this out. It might take a day or two for them to get back to me so I'll update you in due course.

@Shohail-Ismail

All good @zakwatts, just pushed the changes you requested. Let me know of any updates.

zakwatts commented Dec 22, 2025

Hi @Shohail-Ismail, I've had another review and think what you have written might be quite tricky for others to maintain.

Instead of creating an XML parser, we could use the existing one from entsoe-py to significantly reduce the lines of code and improve readability.

This library handles the chunking, XML parsing and mapping quite well, and it is well maintained from what I can tell.

It should allow us to go from around 300 lines of code to ~30 in the pull data function.

zakwatts commented Dec 22, 2025

You could do something like:

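import os

from entsoe import EntsoePandasClient

# Read the API key from the environment rather than hard-coding it
API_KEY = os.getenv("ENTSOE_API_KEY")

client = EntsoePandasClient(api_key=API_KEY)


def fetch_de_solar_data(start, end):
    # start and end should be timezone-aware pandas Timestamps; psr_type 'B16' is solar
    df = client.query_generation(country_code='DE', start=start, end=end, psr_type='B16')
    return df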

Might need to modify the final formats, and check the country code, but I think this could help simplify things quite a bit!

@zakwatts zakwatts self-requested a review December 22, 2025 14:50
…ts client, replacing ENTSOE HTTP/XML parsing
Shohail-Ismail commented Jan 20, 2026

Thanks for your review and suggestions @zakwatts.

I've replaced EntsoeFileClient with EntsoePandasClient, and all ENTSOE access is now delegated to the latter. The core call is essentially the same (query_generation(country_code="DE", …)); the additions are a wrapper that preserves the existing setup (for easier maintenance) and chunking for large ranges.

Additionally, the output format is normalised to UTC timestamps, kW values and a tso_zone column; the corresponding tests are updated and two new tests added:

  • Ensuring that timezone-naive datetime indices returned by the client are normalised to UTC, so the output keeps the expected datetime64[ns, UTC] dtype.
  • Ensuring that sub-hourly (e.g. 15-minute) generation data is preserved 1:1 and not implicitly resampled or aggregated.
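
The normalisation described above could look roughly like the sketch below. It is an illustration rather than the PR's code: `normalise_generation` is a hypothetical name, it assumes the client returns values in MW (hence the factor of 1000), and it assumes timezone-naive timestamps are already in UTC.

```python
import pandas as pd


def normalise_generation(series: pd.Series, zone: str = "DE") -> pd.DataFrame:
    """Return one output row per input row, with UTC timestamps and kW values.

    Hypothetical helper. Assumes input values are in MW and that
    timezone-naive timestamps are already expressed in UTC.
    """
    index = pd.DatetimeIndex(series.index)
    if index.tz is None:
        index = index.tz_localize("UTC")  # label naive timestamps as UTC
    else:
        index = index.tz_convert("UTC")  # convert aware timestamps to UTC
    # No resampling: every input timestamp maps to exactly one output row
    return pd.DataFrame(
        {
            "target_datetime_utc": index,
            "solar_generation_kw": series.to_numpy(dtype=float) * 1000.0,
            "tso_zone": zone,
        }
    )
```

Because the function never calls a resample or groupby step, 15-minute input stays at 15-minute resolution.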

Regarding the CI failure: as far as I can tell, it comes from a live integration test against the Elia API (Belgium) that fails intermittently. The fix seems relatively simple (add a skip guard to integration/test_fetch_be_forecast.py unless an env flag is set), though I'm not sure whether it is within the scope of this PR.
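
A skip guard of that kind might look like the following sketch, written with the standard library's unittest. The flag name `RUN_LIVE_INTEGRATION_TESTS` and the test class are hypothetical; with pytest, `pytest.mark.skipif` would play the same role.

```python
import os
import unittest

# Hypothetical flag name: live tests run only when it is explicitly set to "1"
RUN_LIVE = os.getenv("RUN_LIVE_INTEGRATION_TESTS") == "1"


class LiveForecastTest(unittest.TestCase):
    @unittest.skipUnless(RUN_LIVE, "live API test; set RUN_LIVE_INTEGRATION_TESTS=1 to run")
    def test_live_api(self):
        # The real network call and its assertions would go here
        self.assertTrue(True)
```

CI would then skip the live call by default, and a scheduled job could opt in by setting the flag.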


Would appreciate a review and any further thoughts when you have some time @zakwatts, thanks.

Development

Successfully merging this pull request may close these issues.

Collect 5 years of german solar data
