User Story
- As a data platform engineer, I want all of the system metadata collected so that we can improve data discovery and power data governance.
Description/Context
Now that OpenMetadata is deployed, we need to populate it with metadata from every platform component. Ingestion is handled by the OpenMetadata ingestion library (https://docs.open-metadata.org/latest/deployment/ingestion/external). Most data sources can be covered by the connector workflows (https://docs.open-metadata.org/latest/connectors): clicking a connector and selecting the "Run The Connector Externally" link displays its YAML configuration.
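As a rough illustration, the YAML on a connector's "Run The Connector Externally" page maps to a nested dictionary with `source`, `sink`, and `workflowConfig` sections. The sketch below builds one for a Trino metadata run. It is a sketch under assumptions: the service name, hosts, catalog, and token are placeholders, and the exact `serviceConnection` fields (especially authentication for Starburst Galaxy) vary between OpenMetadata releases, so they should be copied from the connector page rather than from here.

```python
from typing import Any, Dict


def build_trino_metadata_config(
    service_name: str,
    trino_host_port: str,
    username: str,
    catalog: str,
    om_host_port: str,
    jwt_token: str,
) -> Dict[str, Any]:
    """Return a dict mirroring the external-ingestion YAML for Trino metadata.

    Section layout follows the documented YAML; all values are placeholders and
    connection fields should be checked against the installed release.
    """
    return {
        "source": {
            "type": "trino",
            "serviceName": service_name,
            "serviceConnection": {
                "config": {
                    "type": "Trino",
                    "hostPort": trino_host_port,
                    "username": username,
                    "catalog": catalog,
                    # Authentication is deliberately omitted: its field layout
                    # differs between OpenMetadata versions (assumption).
                }
            },
            # DatabaseMetadata = plain schema/table metadata extraction.
            "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
        },
        # metadata-rest pushes extracted entities to the OpenMetadata API.
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": om_host_port,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        },
    }


# Hypothetical values, for illustration only.
config = build_trino_metadata_config(
    "starburst_galaxy", "example.galaxy.starburst.io:443", "ingest-bot",
    "my_catalog", "http://localhost:8585/api", "<jwt-token>",
)
```

Keeping the config in Python rather than a static YAML file makes it easy to inject secrets at run time from the orchestrator.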
Acceptance Criteria
Metadata from the following systems is ingested into our OpenMetadata deployment and kept regularly updated:
- Trino (Starburst Galaxy)
- dbt
- Dagster
- Redash
- Superset
- S3
- Iceberg
- Airbyte
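"Regularly updated" can be made checkable by recording the last successful ingestion per system and flagging any system that falls outside a freshness window. A minimal sketch follows; where the timestamps come from (for example, Dagster run history) and the one-day window in the example are assumptions, not decisions recorded in this issue.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Systems named in the acceptance criteria.
REQUIRED_SOURCES = [
    "Trino (Starburst Galaxy)", "dbt", "Dagster", "Redash",
    "Superset", "S3", "Iceberg", "Airbyte",
]


def stale_sources(
    last_success: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
    required: Optional[List[str]] = None,
) -> List[str]:
    """Return required sources with no successful ingestion within max_age.

    A source missing from last_success counts as stale (never ingested).
    Results keep the order of the required list.
    """
    required = REQUIRED_SOURCES if required is None else required
    return [
        name for name in required
        if name not in last_success or now - last_success[name] > max_age
    ]


# Example: everything ran 6 hours ago except Airbyte (3 days) and S3 (never).
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
last = {name: now - timedelta(hours=6) for name in REQUIRED_SOURCES}
last["Airbyte"] = now - timedelta(days=3)
del last["S3"]
print(stale_sources(last, now, timedelta(days=1)))  # ['S3', 'Airbyte']
```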
Lineage information from the following systems is ingested and maintained in OpenMetadata:
- Trino (Starburst Galaxy)
- dbt
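For Trino, lineage would typically be a separate workflow that reuses the metadata connection but reads query history, while dbt lineage comes from dbt's own artifacts through the dbt ingestion and is not covered here. The helper below derives a lineage config from a metadata config like the Trino one above. It assumes the `<connector>-lineage` source-type convention and a `DatabaseLineage` source config with `queryLogDuration` (in days); both should be confirmed on the Trino connector page for the release in use.

```python
import copy
from typing import Any, Dict


def to_lineage_config(metadata_config: Dict[str, Any], query_log_days: int = 1) -> Dict[str, Any]:
    """Derive a query-log lineage config from a metadata ingestion config.

    Assumes the '<connector>-lineage' source type and the DatabaseLineage
    sourceConfig; verify both against the connector documentation.
    """
    config = copy.deepcopy(metadata_config)  # never mutate the caller's dict
    source = config["source"]
    if not source["type"].endswith("-lineage"):
        source["type"] = f"{source['type']}-lineage"
    # Replace the metadata source config; connection and sink are reused.
    source["sourceConfig"] = {
        "config": {"type": "DatabaseLineage", "queryLogDuration": query_log_days}
    }
    return config
```

Deriving the lineage run from the metadata config keeps a single source of truth for connection details, so credentials only need to be maintained in one place.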
Profiling and data quality information is collected from the following sources:
- Trino
- Iceberg
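Profiling in OpenMetadata runs as its own workflow against the same service connection, with the source config switched to `Profiler` and an `orm-profiler` processor added. The sketch below derives that config from a metadata config. The optional `profileSample` percentage is an assumption about the profiler options (useful for large Iceberg tables) and should be checked against the profiler documentation; data quality test suites are configured separately and are not covered here.

```python
import copy
from typing import Any, Dict, Optional


def to_profiler_config(
    metadata_config: Dict[str, Any], profile_sample: Optional[float] = None
) -> Dict[str, Any]:
    """Derive a profiler workflow config from a metadata ingestion config.

    profile_sample (assumed option name) is a percentage of rows to profile.
    """
    if profile_sample is not None and not 0 < profile_sample <= 100:
        raise ValueError("profile_sample must be a percentage in (0, 100]")
    config = copy.deepcopy(metadata_config)  # never mutate the caller's dict
    profiler: Dict[str, Any] = {"type": "Profiler"}
    if profile_sample is not None:
        profiler["profileSample"] = profile_sample
    config["source"]["sourceConfig"] = {"config": profiler}
    # The processor step computes table and column metrics before the sink.
    config["processor"] = {"type": "orm-profiler", "config": {}}
    return config
```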
Plan/Design
For most sources, we should be able to use the MetadataWorkflow object to manage ingestion from the out-of-the-box connectors (https://docs.open-metadata.org/latest/deployment/ingestion/external). More detailed or custom metadata ingestion will be implemented as custom Dagster assets. All execution will be orchestrated through Dagster pipelines.
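The external-ingestion docs drive a workflow by calling `create(config)`, then `execute()`, `raise_from_status()`, `print_status()`, and `stop()`. A minimal sketch of a reusable runner a Dagster asset could call is shown below. The workflow class is injected (in production it would be something like `MetadataWorkflow`, whose import path has moved between releases), so the runner itself does not depend on `openmetadata-ingestion` being installed; the asset and `build_config` names in the comment are hypothetical.

```python
from typing import Any, Dict, Protocol


class IngestionWorkflow(Protocol):
    """Lifecycle methods the external-ingestion docs call on a workflow."""

    def execute(self) -> None: ...
    def raise_from_status(self) -> None: ...
    def print_status(self) -> None: ...
    def stop(self) -> None: ...


def run_workflow(workflow_cls: Any, config: Dict[str, Any]) -> None:
    """Create, run, and always stop an OpenMetadata-style workflow."""
    workflow: IngestionWorkflow = workflow_cls.create(config)
    try:
        workflow.execute()
        workflow.raise_from_status()  # surface step failures as an exception
        workflow.print_status()
    finally:
        workflow.stop()  # release connections even when the run fails


# Hypothetical Dagster usage (not executed here):
# from dagster import asset
# from metadata.workflow.metadata import MetadataWorkflow
#
# @asset
# def trino_metadata() -> None:
#     run_workflow(MetadataWorkflow, build_config())
```

Letting `raise_from_status()` propagate means a failed ingestion marks the Dagster run as failed, and the `finally` block guarantees `stop()` runs either way.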