Skip to content

Commit 11f64b3

Browse files
authored
feat: lint pattern to extract and process logs (capture invalid sql) (#6407)
## 📝 Summary Captures log information and formats it as a lint error if possible: <img width="858" height="308" alt="image" src="https://github.com/user-attachments/assets/38a9a9bf-7856-4641-8841-c19cf4d8d49c" /> Added a `SQLErrorMetadata(TypedDict)` we can potentially surface elsewhere: https://github.com/marimo-team/marimo/pull/6407/files#diff-ddd9a90856776e58792b931fec7c9a68366432991b6e9b0957f5731d0ae0aed6R22 A bit more churn here because manually generating the graph opposed to going through `cell_manager` gives cell level granularity for log and error collection. Commit history is a bit messy because it has just been in the backlog, but let me know if you'd appreciate a rebase
1 parent fe873a7 commit 11f64b3

File tree

22 files changed

+864
-84
lines changed

22 files changed

+864
-84
lines changed

development_docs/adding_lint_rules.md

Lines changed: 23 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -309,7 +309,11 @@ Bad: `"Multiple definition error"`
309309

310310
### Fixability
311311

312-
TODO: Right now, the only fixable errors are via re-serialization.
312+
Safely fixable errors are applied by default via re-serialization. However,
313+
rules may implement unsafe fixes (mutating the notebook structure) that require
314+
the `--unsafe-fixes` flag. To do this, implement an `async def
315+
apply_unsafe_fixes(self, notebook, diagnostics) -> Notebook` method in your
316+
rule class, and inherit from `UnsafeFixRule` instead of `LintRule`.
313317

314318
## Common Patterns
315319

@@ -388,6 +392,22 @@ Before submitting your rule:
388392
- [ ] Error messages are clear and actionable
389393
- [ ] Performance is reasonable for large notebooks
390394

391-
## Example: Simple Rule Implementation
395+
## Examples
392396

393-
Here's a complete example of a simple rule that checks for syntax errors: https://github.com/marimo-team/marimo/pull/6384
397+
### Simple Rule Implementation
398+
399+
Example of a simple rule that checks for syntax errors: https://github.com/marimo-team/marimo/pull/6384
400+
401+
### Rule Implementation with `--unsafe-fixes`
402+
403+
Some rules may have "fixes" that mutate the notebook structure.
404+
An example of a rule that mutates the notebook structure (i.e. an unsafe fix) by removing empty cells: https://github.com/marimo-team/marimo/pull/6398
405+
406+
### Rule Implementation with Log Context
407+
408+
Some parsing issues result in log warnings or errors.
409+
An example of a rule that hooks into issued logs can be found here:
410+
411+
**Note**: Resist intentionally adding log statements such that they trigger
412+
lint rules. Log statements should be added only when they provide useful
413+
context, on notebook startup.

docs/guides/lint_rules/index.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,8 @@ These are style and formatting issues.
4343
| [MF002](rules/parse_stdout.md) | parse-stdout | Parse captured stdout during notebook loading ||
4444
| [MF003](rules/parse_stderr.md) | parse-stderr | Parse captured stderr during notebook loading ||
4545
| [MF004](rules/empty_cells.md) | empty-cells | Empty cells that can be safely removed. | ⚠️ |
46+
| [MF005](rules/sql_parse_error.md) | sql-parse-error | SQL parsing errors during dependency analysis ||
47+
| [MF006](rules/misc_log_capture.md) | misc-log-capture | Miscellaneous log messages during processing ||
4648

4749
## Legend
4850

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# MF006: misc-log-capture
2+
3+
**Formatting** ❌ Not Fixable
4+
5+
MF006: Miscellaneous log messages during processing.
6+
7+
## What it does
8+
9+
Captures warning and error level log messages that aren't handled by
10+
other specific log rules and creates diagnostics to surface them.
11+
12+
## Why is this bad?
13+
14+
Unhandled log messages may indicate:
15+
- Unexpected issues during notebook processing
16+
- Configuration problems
17+
- Library warnings that affect execution
18+
- Performance or resource issues
19+
20+
## Examples
21+
22+
**Triggered by:**
23+
- General warnings from imported libraries
24+
- Configuration issues
25+
- Unexpected errors during processing
26+
27+
## References
28+
29+
- [Understanding Errors](https://docs.marimo.io/guides/understanding_errors/)
30+
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
# MF005: sql-parse-error
2+
3+
**Formatting** ❌ Not Fixable
4+
5+
MF005: SQL parsing errors during dependency analysis.
6+
7+
## What it does
8+
9+
Captures SQL parsing error logs and creates diagnostics pointing to
10+
problematic SQL statements in cells.
11+
12+
## Why is this bad?
13+
14+
SQL parsing failures can lead to:
15+
- Incorrect dependency analysis for SQL-using cells
16+
- Missing dataframe references in dependency graph
17+
- Reduced effectiveness of reactive execution
18+
- Potential runtime errors when SQL is executed
19+
20+
## Examples
21+
22+
**Triggered by:**
23+
- Invalid SQL syntax in cell code
24+
- Unsupported SQL dialects or extensions
25+
- Complex SQL that exceeds parser capabilities
26+
27+
## References
28+
29+
- [Understanding Errors](https://docs.marimo.io/guides/understanding_errors/)
30+
- [SQL Support](https://docs.marimo.io/guides/sql/)
31+

examples/sql/connect_to_motherduck.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -212,7 +212,7 @@ def _(MONTHS, duckdb, mo):
212212
"""
213213
SELECT DISTINCT type as 'HN Type'
214214
FROM sample_data.hn.hacker_news
215-
WHERE score NOT NULL AND descendants NOT NULL
215+
WHERE score IS NOT NULL AND descendants IS NOT NULL
216216
LIMIT 10;
217217
"""
218218
).df()
@@ -259,8 +259,7 @@ def _(hn_type_select, mo, month_list):
259259
AND
260260
MONTH(timestamp) in ({month_list})
261261
AND
262-
descendants NOT NULl
263-
262+
descendants IS NOT NULL
264263
)
265264
266265
SELECT

examples/third_party/motherduck/embeddings/embeddings_explorer.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,13 +16,17 @@
1616

1717
import marimo
1818

19-
__generated_with = "0.15.5"
19+
__generated_with = "0.16.0"
2020
app = marimo.App(width="medium")
2121

2222
with app.setup:
2323
# Use this notebook to follow along with the tutorial at
2424
# https://motherduck.com/blog/MotherDuck-Visualize-Embeddings-Marimo/
2525
import marimo as mo
2626

27+
28+
29+
30+
2731
if __name__ == "__main__":
2832
app.run()

marimo/_ast/sql_utils.py

Lines changed: 67 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
# Copyright 2025 Marimo. All rights reserved.
22

3-
from typing import Literal, Optional, Union
3+
import ast
4+
import re
5+
from typing import Callable, Literal, Optional, TypedDict, Union
46

57
from marimo import _loggers
68
from marimo._dependencies.dependencies import DependencyManager
@@ -17,6 +19,20 @@
1719
]
1820

1921

22+
class SQLErrorMetadata(TypedDict):
23+
"""Structured metadata for SQL parsing errors."""
24+
25+
lint_rule: str
26+
error_type: str
27+
clean_message: str # Just the meaningful error without SQL trace
28+
node_lineno: int
29+
node_col_offset: int
30+
sql_statement: str # Truncated if needed
31+
sql_line: Optional[int] # 0-based line within SQL
32+
sql_col: Optional[int] # 0-based column within SQL
33+
context: str
34+
35+
2036
def classify_sql_statement(
2137
sql_statement: str, dialect: Optional[SQLGLOT_DIALECTS] = None
2238
) -> Union[SQL_TYPE, Literal["unknown"]]:
@@ -33,7 +49,14 @@ def classify_sql_statement(
3349
with _loggers.suppress_warnings_logs("sqlglot"):
3450
expression_list = parse(sql_statement, dialect=dialect)
3551
except ParseError as e:
36-
LOGGER.debug(f"Unable to parse SQL. Error: {e}")
52+
log_sql_error(
53+
LOGGER.debug,
54+
message="Failed to parse SQL statement for classification.",
55+
exception=e,
56+
rule_code="MF005",
57+
node=None,
58+
sql_content=sql_statement,
59+
)
3760
return "unknown"
3861

3962
for expression in expression_list:
@@ -52,3 +75,45 @@ def classify_sql_statement(
5275
return "DQL"
5376

5477
return "unknown"
78+
79+
80+
def log_sql_error(
81+
logger: Callable[..., None],
82+
*,
83+
message: str,
84+
exception: BaseException,
85+
rule_code: str,
86+
node: Optional[ast.expr] = None,
87+
sql_content: str = "",
88+
context: str = "",
89+
) -> None:
90+
"""Utility to log SQL-related errors with consistent metadata."""
91+
# Parse SQL position from exception message if available
92+
sql_line = None
93+
sql_col = None
94+
95+
exception_msg = str(exception)
96+
line_col_match = re.search(r"Line (\d+), Col: (\d+)", exception_msg)
97+
if line_col_match:
98+
sql_line = int(line_col_match.group(1)) - 1 # Convert to 0-based
99+
sql_col = int(line_col_match.group(2)) - 1 # Convert to 0-based
100+
101+
# Truncate long SQL content
102+
truncated_sql = sql_content
103+
if sql_content and len(sql_content) > 200:
104+
truncated_sql = sql_content[:200] + "..."
105+
106+
# Create metadata using TypedDict
107+
metadata: SQLErrorMetadata = {
108+
"lint_rule": rule_code,
109+
"error_type": type(exception).__name__,
110+
"clean_message": exception_msg.split("\n", 1)[0],
111+
"node_lineno": node.lineno if node else 0,
112+
"node_col_offset": node.col_offset if node else 0,
113+
"sql_statement": truncated_sql,
114+
"sql_line": sql_line,
115+
"sql_col": sql_col,
116+
"context": context,
117+
}
118+
119+
logger(message, exception, extra=metadata)

marimo/_ast/sql_visitor.py

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -250,6 +250,7 @@ def find_sql_defs(sql_statement: str) -> SQLDefs:
250250
LOGGER.warning(
251251
"Unexpected number of parts in CREATE TABLE: %s",
252252
parts,
253+
extra={"parts": parts},
253254
)
254255

255256
if is_table:
@@ -497,7 +498,6 @@ def find_sql_refs(sql_statement: str) -> set[SQLRef]:
497498
DependencyManager.sqlglot.require(why="SQL parsing")
498499

499500
from sqlglot import exp, parse
500-
from sqlglot.errors import ParseError
501501
from sqlglot.optimizer.scope import build_scope
502502

503503
def get_ref_from_table(table: exp.Table) -> Optional[SQLRef]:
@@ -507,7 +507,6 @@ def get_ref_from_table(table: exp.Table) -> Optional[SQLRef]:
507507
catalog_name = table.catalog or None
508508

509509
if table_name is None:
510-
LOGGER.warning("Table name cannot be found in the SQL statement")
511510
return None
512511

513512
# Check if the table name looks like a URL or has a file extension.
@@ -520,12 +519,9 @@ def get_ref_from_table(table: exp.Table) -> Optional[SQLRef]:
520519
table=table_name, schema=schema_name, catalog=catalog_name
521520
)
522521

523-
try:
524-
with _loggers.suppress_warnings_logs("sqlglot"):
525-
expression_list = parse(sql_statement, dialect="duckdb")
526-
except ParseError as e:
527-
LOGGER.error(f"Unable to parse SQL. Error: {e}")
528-
return set()
522+
# May raise a ParseError
523+
with _loggers.suppress_warnings_logs("sqlglot"):
524+
expression_list = parse(sql_statement, dialect="duckdb")
529525

530526
refs: set[SQLRef] = set()
531527

marimo/_ast/visitor.py

Lines changed: 47 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212

1313
from marimo import _loggers
1414
from marimo._ast.errors import ImportStarError
15+
from marimo._ast.sql_utils import log_sql_error
1516
from marimo._ast.sql_visitor import (
1617
SQLDefs,
1718
SQLKind,
@@ -26,8 +27,10 @@
2627

2728
LOGGER = _loggers.marimo_logger()
2829

30+
2931
Name = str
3032

33+
3134
Language = Literal["python", "sql"]
3235

3336

@@ -632,6 +635,18 @@ def visit_Call(self, node: ast.Call) -> ast.Call:
632635
and sql
633636
):
634637
import duckdb # type: ignore[import-not-found,import-untyped,unused-ignore] # noqa: E501
638+
639+
# Import ParseError outside try block so we can except it
640+
# Use a Union approach to handle both cases
641+
ParseErrorType: type[Exception]
642+
if DependencyManager.sqlglot.has():
643+
from sqlglot.errors import ParseError as SQLGLOTParseError
644+
645+
ParseErrorType = SQLGLOTParseError
646+
else:
647+
# Fallback when sqlglot is not available
648+
ParseErrorType = Exception
649+
635650
# TODO: Handle other SQL languages
636651
# TODO: Get the engine so we can differentiate tables in diff engines
637652

@@ -655,7 +670,14 @@ def visit_Call(self, node: ast.Call) -> ast.Call:
655670
# We catch base exceptions because we don't want to
656671
# fail due to bugs in duckdb -- users code should
657672
# be saveable no matter what
658-
LOGGER.warning("Unexpected duckdb error %s", e)
673+
log_sql_error(
674+
LOGGER.warning,
675+
message=f"Unexpected duckdb error {e}",
676+
exception=e,
677+
node=node,
678+
rule_code="MF005",
679+
sql_content=sql,
680+
)
659681
self.generic_visit(node)
660682
return node
661683

@@ -667,7 +689,15 @@ def visit_Call(self, node: ast.Call) -> ast.Call:
667689
except duckdb.ProgrammingError:
668690
sql_defs = SQLDefs()
669691
except BaseException as e:
670-
LOGGER.warning("Unexpected duckdb error %s", e)
692+
log_sql_error(
693+
LOGGER.warning,
694+
message=f"Unexpected duckdb error {e}",
695+
exception=e,
696+
node=node,
697+
rule_code="MF005",
698+
sql_content=sql,
699+
context="sql_defs_extraction",
700+
)
671701
sql_defs = SQLDefs()
672702

673703
defined_names = set()
@@ -699,13 +729,23 @@ def visit_Call(self, node: ast.Call) -> ast.Call:
699729

700730
sql_refs: set[SQLRef] = set()
701731
try:
732+
# Take results
702733
sql_refs = find_sql_refs_cached(statement.query)
703-
except (duckdb.ProgrammingError, duckdb.IOException):
704-
LOGGER.debug(
705-
"Error parsing SQL statement: %s", statement.query
734+
except (
735+
duckdb.ProgrammingError,
736+
duckdb.IOException,
737+
ParseErrorType,
738+
BaseException,
739+
) as e:
740+
# Use first_arg (SQL string node) for accurate positioning
741+
log_sql_error(
742+
LOGGER.error,
743+
message=f"Error parsing SQL statement: {e}",
744+
exception=e,
745+
node=first_arg,
746+
rule_code="MF005",
747+
sql_content=statement.query,
706748
)
707-
except BaseException as e:
708-
LOGGER.warning("Unexpected duckdb error %s", e)
709749

710750
for ref in sql_refs:
711751
name = ref.qualified_name

0 commit comments

Comments
 (0)