@mabdh mabdh commented Nov 25, 2021

  • Add basic integration with the Google logadmin API
  • Put the usage feature behind a feature flag; usage is pre-calculated before the BQ tables and columns are extracted
  • Integrate with the main bigquery flow
  • Calculate table usage
  • Calculate common join usage
  • Parse each SQL query and extract the following information:
    • Join conditions (detect the ON/USING keywords and store each condition as a plain string)
      • e.g. ON t1.column1 = t2.column3 AND t1.column4 = t2.column5
    • Filter conditions (detect the WHERE/HAVING keywords and store each condition as a plain string)
      • e.g. WHERE t1.column1 IS TRUE AND event_timestamp > TIMESTAMP("2021-02-33")
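
The keyword detection described above can be sketched roughly as follows. This is a minimal Python illustration, not the parser used in this PR: the function names, the terminator keyword list, and the regex approach are all assumptions made for the example. It does not handle subqueries, comments, or keywords that appear inside string literals.

```python
import re

# Keywords assumed to end a condition clause (hypothetical, non-exhaustive).
_TERMINATORS = (
    r"JOIN|INNER|LEFT|RIGHT|FULL|CROSS|WHERE|GROUP\s+BY|HAVING|"
    r"ORDER\s+BY|LIMIT|WINDOW|QUALIFY|UNION"
)


def _extract_clauses(sql, keywords):
    """Return every clause that starts with one of `keywords`, as a plain string.

    A clause runs from its keyword up to the next terminator keyword,
    a semicolon, or the end of the query.
    """
    pattern = re.compile(
        rf"\b(?:{keywords})\b.*?(?=\b(?:{_TERMINATORS})\b|;|$)",
        re.IGNORECASE | re.DOTALL,
    )
    # Collapse runs of whitespace so multi-line clauses become one line.
    return [" ".join(m.group(0).split()) for m in pattern.finditer(sql)]


def extract_join_conditions(sql):
    """Collect ON/USING clauses as plain strings."""
    return _extract_clauses(sql, "ON|USING")


def extract_filter_conditions(sql):
    """Collect WHERE/HAVING clauses as plain strings."""
    return _extract_clauses(sql, "WHERE|HAVING")


if __name__ == "__main__":
    query = """
        SELECT *
        FROM dataset.table_1 t1
        JOIN dataset.table_2 t2
          ON t1.column1 = t2.column3 AND t1.column4 = t2.column5
        WHERE t1.column1 IS TRUE
    """
    print(extract_join_conditions(query))
    print(extract_filter_conditions(query))
```

Storing the clause text as-is keeps the extractor simple, at the cost of treating differently formatted but equivalent conditions as distinct strings.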

Example Output

  {
        "_index": "table",
        "_type": "_doc",
        "_id": "bigquery::project_id/dataset_id/table_id",
        "_score": 1.0,
        "_source": {
          "urn": "bigquery::project_id/dataset_id/table_id",
          "name": "table_id",
          "service": "bigquery",
          "description": "",
          "data": {
            "preview": {},
            "profile": {
              "common_join": [
                {
                  "conditions": [
                    "ON target.column_1 = source.column_1 and target.param_name = source.param_name and DATE(target.event_timestamp) = DATE(source.event_timestamp)"
                  ],
                  "count": 1,
                  "urn": "bigquery::project_id_2/dataset_id_2/table_id_2"
                }
              ],
              "filter_conditions": [
                "WHERE t.param_3 = 'the_param' AND t.column_1 = \"280481a2-2384-4b81-aa3e-214ac60b31db\" AND event_timestamp >= TIMESTAMP(\"2021-10-29\", \"UTC\") AND event_timestamp < TIMESTAMP(\"2021-11-22T02:01:06Z\")"
              ],
              "usage_count": 1
            },
            "properties": {
              "attributes": {
                "dataset": "dataset_id",
                "full_qualified_name": "project_id:dataset_id.table_id",
                "partition_field": "event_timestamp",
                "project": "project_id",
                "type": "TABLE"
              },
              "labels": {
                "owner": "owner_name"
              }
            },
            "resource": {
              "name": "table_id",
              "service": "bigquery",
              "urn": "bigquery::project_id/dataset_id/table_id"
            },
            "schema": {
              "columns": [...
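
For illustration, the `usage_count` and `common_join` fields in the profile above could be aggregated along these lines. This is a rough Python sketch under stated assumptions, not the code in this PR: `build_profiles` and the shape of the input records are hypothetical, and the sketch simplifies by treating every pair of tables referenced in the same query as joined.

```python
from collections import defaultdict


def build_profiles(query_records):
    """Aggregate per-table usage and common-join statistics.

    `query_records` is a hypothetical input: an iterable of dicts with
    "tables" (table URNs referenced by one query) and "join_conditions"
    (plain-string conditions extracted from that query).
    """
    usage = defaultdict(int)
    # joins[table_urn][other_urn] -> {"count": int, "conditions": set of str}
    joins = defaultdict(
        lambda: defaultdict(lambda: {"count": 0, "conditions": set()})
    )

    for record in query_records:
        tables = sorted(set(record["tables"]))
        for urn in tables:
            usage[urn] += 1  # one usage per query that references the table
            for other in tables:
                if other == urn:
                    continue
                # Assumption: tables appearing in the same query are joined.
                entry = joins[urn][other]
                entry["count"] += 1
                entry["conditions"].update(record.get("join_conditions", []))

    return {
        urn: {
            "usage_count": count,
            "common_join": [
                {
                    "urn": other,
                    "count": entry["count"],
                    "conditions": sorted(entry["conditions"]),
                }
                for other, entry in sorted(joins[urn].items())
            ],
        }
        for urn, count in usage.items()
    }


if __name__ == "__main__":
    records = [
        {
            "tables": ["bigquery::p/d/a", "bigquery::p/d/b"],
            "join_conditions": ["ON a.id = b.id"],
        },
        {"tables": ["bigquery::p/d/a"], "join_conditions": []},
    ]
    print(build_profiles(records))
```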

TODO

  • Synchronize table proto with proton (PR is here)

@mabdh mabdh force-pushed the bigquery-usage-with-sqlparser branch from 0b3c4b4 to de4da97 Compare November 25, 2021 12:02
@mabdh mabdh changed the title Bigquery usage with sqlparser (collect join conditions and filter conditions) feat(bigquery usage): bq usage with sqlparser (collect join conditions and filter conditions) Nov 25, 2021
@mabdh mabdh linked an issue Nov 26, 2021 that may be closed by this pull request
@mabdh mabdh changed the title feat(bigquery usage): bq usage with sqlparser (collect join conditions and filter conditions) feat(bigquery): extract common join conditions and filter conditions usage Nov 26, 2021
@mabdh mabdh changed the title feat(bigquery): extract common join conditions and filter conditions usage feat: add bigquery table usage, common join conditions, and filter conditions extractor Nov 26, 2021
@mabdh mabdh changed the title feat: add bigquery table usage, common join conditions, and filter conditions extractor feat: add bigquery metadata usage extractor Nov 26, 2021
@mabdh mabdh mentioned this pull request Nov 26, 2021
@mabdh mabdh marked this pull request as ready for review November 26, 2021 09:32
@mabdh mabdh force-pushed the bigquery-usage-with-sqlparser branch from 66a30b2 to d241816 Compare November 26, 2021 11:04
@mabdh mabdh added the enhancement New feature or request label Nov 29, 2021
@mabdh mabdh force-pushed the bigquery-usage-with-sqlparser branch from b02861f to 97af926 Compare November 30, 2021 11:03
@StewartJingga StewartJingga self-requested a review December 1, 2021 10:14

@StewartJingga StewartJingga left a comment

LGTM

@mabdh mabdh merged commit 32ea001 into main Dec 1, 2021
@mabdh mabdh deleted the bigquery-usage-with-sqlparser branch December 1, 2021 10:16

Linked issue: Add metadata extractor for bigquery_usage