feat(ingestion/airflow) Add Airflow 3.x support #13790
base: master
from datahub.utilities._markupsafe_compat import MARKUPSAFE_PATCHED

assert MARKUPSAFE_PATCHED
Dangerous use of assert - low severity
When Python runs in optimized mode, assert statements are stripped and never execute. Optimized mode is enabled with the -O command-line flag or the PYTHONOPTIMIZE environment variable, and some production deployments use it. Any safety check implemented with assert is then skipped.
Remediation: Raise an exception instead of using assert.
View details in Aikido Security
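A minimal sketch of the suggested remediation, not the code in this PR: the check is written as an explicit raise, so it still runs under `python -O`. The helper name `require_markupsafe_patched` and the choice of `RuntimeError` are assumptions for illustration; the flag is passed in as an argument so the snippet runs without DataHub installed.

```python
def require_markupsafe_patched(patched: bool) -> None:
    """Fail loudly if the compatibility patch was not applied.

    Hypothetical helper: unlike `assert patched`, this check is not
    removed when Python runs in optimized mode.
    """
    if not patched:
        raise RuntimeError("markupsafe compatibility patch was not applied")


# In the plugin, the argument would be the imported MARKUPSAFE_PATCHED flag.
require_markupsafe_patched(True)
```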
dagrun: "DagRun" = _get_dagrun_from_task_instance(task_instance)
task = task_instance.task
assert task is not None
Dangerous use of assert - low severity
When Python runs in optimized mode, assert statements are stripped and never execute. Optimized mode is enabled with the -O command-line flag or the PYTHONOPTIMIZE environment variable, and some production deployments use it. Any safety check implemented with assert is then skipped.
Remediation: Raise an exception instead of using assert.
View details in Aikido Security
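Here the assert also narrows an optional value for the type checker, so a plain `if` is not the only thing to preserve. One possible approach, sketched with a hypothetical `require_not_none` helper (not part of the plugin), keeps both the runtime check and the narrowing:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def require_not_none(value: Optional[T], what: str) -> T:
    """Return `value` unchanged, or raise ValueError if it is None.

    Hypothetical helper: the check survives `python -O`, and the
    return type tells type checkers the result is not None.
    """
    if value is None:
        raise ValueError(f"{what} must not be None")
    return value


# Stand-in usage; in the listener this might wrap task_instance.task.
task_id = require_not_none("extract_orders", "task_id")
```

Only `None` is rejected, so falsy values such as `0` or an empty string pass through unchanged.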
logger.debug(f"Completed emitting all DataFlow MCPs for {dataflow.urn}")

if dag.dag_id == _DATAHUB_CLEANUP_DAG:
    assert self.graph
Dangerous use of assert - low severity
When Python runs in optimized mode, assert statements are stripped and never execute. Optimized mode is enabled with the -O command-line flag or the PYTHONOPTIMIZE environment variable, and some production deployments use it. Any safety check implemented with assert is then skipped.
Remediation: Raise an exception instead of using assert.
View details in Aikido Security
    f"DataHub listener got notification about dag run start for {dag_run.dag_id}"
)

assert dag_run.dag_id
Dangerous use of assert - low severity
When Python runs in optimized mode, assert statements are stripped and never execute. Optimized mode is enabled with the -O command-line flag or the PYTHONOPTIMIZE environment variable, and some production deployments use it. Any safety check implemented with assert is then skipped.
Remediation: Raise an exception instead of using assert.
View details in Aikido Security
Summary
Adds support for Apache Airflow 3.x through a version-specific architecture: Airflow 2.x and 3.x each get their own implementation, which removes the need for complex compatibility shims and keeps the code for both versions maintainable.
🏗️ Architecture: Separate Implementations for Clean Type Safety
The plugin now has separate implementations for Airflow 2.x and 3.x (instead of version conditionals scattered throughout the code):
Benefits:
🚀 Key Technical Improvements
📦 Installation (IMPORTANT)
Users MUST specify the appropriate extra when installing:
For Airflow 2.x (2.7+)
pip install 'acryl-datahub-airflow-plugin[plugin-v2]'
For Airflow 3.x (3.0+)
pip install 'acryl-datahub-airflow-plugin[plugin-v2-airflow3]'
For Airflow 3.0.x specifically (pydantic issue)
pip install 'acryl-datahub-airflow-plugin[plugin-v2-airflow3]' 'pydantic>=2.11.8'
Why different extras? Airflow 2.x and 3.x have different OpenLineage dependencies:
Installing without the appropriate extra leaves the OpenLineage dependencies missing, so lineage extraction will not work.
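The mapping from Airflow version to install command can be sketched as a small function. This is an illustration of the rules stated above, not tooling shipped with the plugin; the function name `pip_requirements_for` is hypothetical, and it assumes the version string starts with numeric `major.minor` components.

```python
from typing import List

PACKAGE = "acryl-datahub-airflow-plugin"


def pip_requirements_for(airflow_version: str) -> List[str]:
    """Map an Airflow version string to the pip requirements listed above."""
    parts = airflow_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major == 2 and minor >= 7:
        return [f"{PACKAGE}[plugin-v2]"]
    if major == 3:
        requirements = [f"{PACKAGE}[plugin-v2-airflow3]"]
        if minor == 0:
            # Airflow 3.0.x also needs the pydantic pin noted above.
            requirements.append("pydantic>=2.11.8")
        return requirements
    raise ValueError(f"Unsupported Airflow version: {airflow_version}")


print(pip_requirements_for("3.0.2"))
```

Each returned string is one argument to `pip install`; quote them in a shell, since the square brackets can be interpreted as glob patterns.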
🔧 Airflow 3.x Specific Changes
API & Configuration
Database Access Restrictions
Hook & Parameter Changes
✅ Testing & Compatibility
Test Coverage:
Compatibility Matrix:
📚 Documentation
Updated Files
Migration Guide Covers
Installation Method Change:
Users must now specify the appropriate extra when installing the plugin. Installing with just pip install acryl-datahub-airflow-plugin (without extras) will NOT work: the OpenLineage dependencies will be missing.
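One way to catch a missing extra early is to check that the expected modules are importable before relying on lineage extraction. The sketch below is an assumption-laden illustration: `missing_modules` is a hypothetical helper, and the module name `openlineage` in the usage line is a placeholder, since the exact packages each extra installs are not listed here.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the names from `module_names` that cannot be located."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Placeholder module name; substitute whatever your chosen extra installs.
not_found = missing_modules(["openlineage"])
if not_found:
    print(f"Missing modules (was the plugin installed with an extra?): {not_found}")
```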
For Airflow 2.x users:
Before (may have worked):
pip install acryl-datahub-airflow-plugin
Now (required):
pip install 'acryl-datahub-airflow-plugin[plugin-v2]'
For Airflow 3.x users:
Required:
pip install 'acryl-datahub-airflow-plugin[plugin-v2-airflow3]'
Other Changes for Airflow 3.x:
🎯 Code Quality Improvements
📝 Additional Notes