Skip to content

(Postgres DB impact)Airflow query causing long running active sessions and causing high CPU utilization #56088

@hctrd

Description

@hctrd

Apache Airflow version

3.0.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Seeing long running active sessions in the postgres database and causing high CPU utilization.

Caused by this query:

SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id
FROM serialized_dag
WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id
FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id
WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 101033, 101014, 100993, 100975, 100956, 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 99305, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272) AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid);

looping (i.e correlation) and and sequential scan due to serialized_dag.id present inside the inner sql block is causing the query to run for long..

What you think should happen instead?

Take out "AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid)” from inner SQL and add this condition to outer/main/parent sql. This will help to reduce looping (i.e avoid correlation) and avoid sequential scan and use index scan which helps to seed-up up the execution.

We have seen significant improvement from 4+ hours execution time to 35ms with same result set as per our analysis/observation.

SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id
FROM serialized_dag
WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id
FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id
WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 101033, 101014, 100993, 100975, 100956, 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 99305, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272)) AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid;

How to reproduce

Connect airflow to postgres and have it query multiple dag ids and observe postgres metrics.

Operating System

linux

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.27.3
apache-airflow-providers-fab==2.3.0
apache-airflow-providers-mysql==6.3.2
apache-airflow-providers-odbc==4.10.1
apache-airflow-providers-postgres==6.2.1
apache-airflow-providers-sendgrid==4.1.2
apache-airflow-providers-standard==1.4.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

Airflow: 3.0.3

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Labels

affected_version:3.0Issues Reported for 3.0area:MetaDBMeta Database related issues.area:corekind:bugThis is a clearly a bugpriority:highHigh priority bug that should be patched quickly but does not require immediate new release

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions