feat: add field-level access restrictions, config file support, and sensitive field scanner by drharunyuksel · Pull Request #14 · ergut/mcp-bigquery-server

drharunyuksel · 2026-03-23T12:22:51Z

Why This Matters

Data warehouses often contain highly sensitive information — patient records, social security numbers, financial data, personal contact details, and authentication secrets. When an AI agent has direct access to query a BigQuery data warehouse, there is no human in the loop to prevent it from reading sensitive columns. A simple query like SELECT * FROM patients could expose thousands of PII/PHI records in a single response.

LLM inference happens in the cloud. When the agent runs a query, the results are sent to the LLM provider's servers (Anthropic, OpenAI, etc.) for processing — they leave your network. BigQuery IAM controls who can reach your data; field restrictions control what the AI agent surfaces into LLM responses. These are different protection boundaries.

This PR gives administrators fine-grained control over which tables and columns an AI agent can access, ensuring sensitive data stays protected while still allowing the AI to perform useful analytical queries on non-sensitive fields.

Security Model: Cooperative Guardrails, Not a SQL Firewall

The field restrictions and table allowlists in this server are designed as cooperative guardrails for AI agents, not as a hard security boundary against adversarial attackers.

When the agent encounters a restriction error, it reads the guidance in the error message and reformulates its query — using aggregate functions, EXCEPT clauses, or simply dropping the restricted field. In practice, AI agents cooperate immediately and consistently.

This system uses regex-based SQL analysis to detect restricted field usage. We performed penetration testing during development and fixed several bypass vectors (struct-alias expansion, comma-join evasion, implicit SELECT *). The enforcement logic is designed to fail closed (block ambiguous queries rather than allow them). Could a deliberately crafted adversarial query still slip through? Possibly — but AI agents don't write adversarial queries. They write straightforward SQL to answer the user's question. The only time we saw data leak was during our own manual penetration testing with intentionally crafted bypass queries that no AI agent would produce in normal operation.

For environments requiring strict compliance guarantees, combine these guardrails with BigQuery's native column-level security and authorized views.

Addressing Review Feedback

This update addresses all three points from your review:

1. SQL Parsing Robustness

"Regex-based SQL parsing can be bypassed through CTEs, subqueries, or aliasing. I would like to understand how robust it is against these patterns."

We performed extensive penetration testing and fixed several bypass vectors:

- Struct-alias bypass (SELECT t FROM users AS t): returns the entire row as a STRUCT; now detected and blocked
- Comma-join evasion (FROM table1, table2): the second table was invisible to the old regex; now correctly extracted
- CTE chains (WITH a AS (...), b AS (SELECT * FROM a)): restricted field references inside CTEs are detected
- Alias shadowing (FROM restricted_table AS safe SELECT safe.restricted_field): aliases are resolved back to real tables
- Implicit SELECT * (FROM table |> LIMIT 10): no SELECT clause means all columns are returned; now treated as a violation
- Fail-closed design: if extractReferencedTables returns empty on a data query, the query is rejected rather than allowed

2. Test Coverage

"I would prefer to have test coverage before we merge. Could you please add tests, especially for the edge cases?"

Added 92 unit tests in src/sql-enforcement.test.ts using vitest, covering:

- Unit tests for all SQL parsing helpers
- Cooperative guardrail tests (standard query patterns)
- Adversarial bypass tests (struct-alias, nested CTEs, comma-joins, alias shadowing, subqueries)
- BigQuery pipe syntax penetration tests (EXTEND, SET, DROP, RENAME, AGGREGATE)
- allowedTables enforcement tests (allowlist, fail-closed, CTE filtering, INFORMATION_SCHEMA exemption)

3. allowedTables Feature

"An allowedTables list in config.json would let users restrict the agent to a specific subset."

Implemented as a full protection mode. Users set protectionMode: "allowedTables" with an allowedTables array. Queries against any unlisted table are rejected immediately. Per-table field restrictions can optionally be added via preventedFieldsInAllowedTables. INFORMATION_SCHEMA queries are always allowed for schema discovery.

What's Included

Protection Modes

The server now supports three protection modes, configured via protectionMode in config.json:

| Mode | Description | When active |
| --- | --- | --- |
| `off` | No protection: all tables and fields accessible | No `--config-file` flag, or explicit `"protectionMode": "off"` |
| `allowedTables` | Table allowlist: only listed tables can be queried | Explicit `"protectionMode": "allowedTables"` |
| `autoProtect` | Auto-scans for sensitive fields, enforces `preventedFields` | Explicit `"protectionMode": "autoProtect"`, or a config file without a `protectionMode` key (backward compatibility) |
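As a rough illustration of how the three modes in the table could be modeled, the sketch below uses a TypeScript discriminated union. The type names and `resolveProtectionMode` are hypothetical and not taken from `src/index.ts`; only the config keys and the defaulting rules come from the description above. Rejecting an unrecognized mode is an assumption of this sketch.

```typescript
// Hypothetical types: the key names follow the PR description,
// everything else is illustrative.
type FieldMap = Record<string, string[]>;

type OffConfig = { protectionMode: "off" };
type AllowedTablesConfig = {
  protectionMode: "allowedTables";
  allowedTables: string[];
  preventedFieldsInAllowedTables?: FieldMap;
};
type AutoProtectConfig = {
  protectionMode: "autoProtect";
  preventedFields: FieldMap;
};
type ProtectionConfig = OffConfig | AllowedTablesConfig | AutoProtectConfig;

// rawConfig is the parsed config file, or undefined when no --config-file was passed.
function resolveProtectionMode(
  rawConfig: Record<string, unknown> | undefined
): ProtectionConfig {
  if (rawConfig === undefined) {
    return { protectionMode: "off" }; // no flag: simple/off mode
  }
  switch (rawConfig.protectionMode) {
    case "off":
      return { protectionMode: "off" };
    case "allowedTables":
      return {
        protectionMode: "allowedTables",
        allowedTables: (rawConfig.allowedTables as string[] | undefined) ?? [],
        preventedFieldsInAllowedTables:
          rawConfig.preventedFieldsInAllowedTables as FieldMap | undefined,
      };
    case undefined: // key absent: backward-compatible default
    case "autoProtect":
      return {
        protectionMode: "autoProtect",
        preventedFields: (rawConfig.preventedFields as FieldMap | undefined) ?? {},
      };
    default:
      // Assumption: unknown modes are rejected rather than silently ignored.
      throw new Error(`Unknown protectionMode: ${String(rawConfig.protectionMode)}`);
  }
}
```

Because each variant carries its own fields, code that switches on `protectionMode` can only read `allowedTables` after narrowing to that mode.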

Field-Level Access Restrictions

Define preventedFields (in autoProtect mode) or preventedFieldsInAllowedTables (in allowedTables mode) to block the AI agent from accessing sensitive columns:

{
  "protectionMode": "autoProtect",
  "maximumBytesBilled": "1000000000",
  "preventedFields": {
    "healthcare.patients": ["first_name", "last_name", "ssn", "date_of_birth", "email"],
    "billing.transactions": ["credit_card_number", "bank_account"]
  }
}

When the agent tries to query a restricted field:

SELECT first_name, last_name, diagnosis FROM healthcare.patients

The server blocks the query and returns a clear, instructive error:

Restricted fields detected — table "healthcare.patients" has restricted columns:
"first_name", "last_name", "ssn", "date_of_birth", "email".
You can only use these columns inside ["count", "countif", "avg", "sum"]
aggregate functions or exclude them with SELECT * EXCEPT (...).

The error lists ALL restricted fields for the table (not just the violated ones), so the agent can fix the query in one try without a retry loop.
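A minimal sketch of that design choice: the message is built from the table's full restricted-column list rather than from the columns the rejected query happened to touch. `buildRestrictionError` is a hypothetical helper, and the wording only approximates the server's message shown above.

```typescript
// Aggregates named in the error message above.
const ALLOWED_AGGREGATES = ["count", "countif", "avg", "sum"];

// Hypothetical helper: takes every restricted column for the table,
// not just the ones the rejected query referenced.
function buildRestrictionError(table: string, allRestricted: string[]): string {
  const columns = allRestricted.map((c) => `"${c}"`).join(", ");
  const aggregates = ALLOWED_AGGREGATES.map((a) => `"${a}"`).join(", ");
  return [
    `Restricted fields detected: table "${table}" has restricted columns:`,
    `${columns}.`,
    `You can only use these columns inside [${aggregates}]`,
    `aggregate functions or exclude them with SELECT * EXCEPT (...).`,
  ].join("\n");
}
```

With the whole list in one response, the agent can rewrite the query once instead of discovering restricted columns one rejection at a time.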

Supported query patterns:

| Query Pattern | Behavior |
| --- | --- |
| `SELECT restricted_col FROM table` | Blocked with error message |
| `SELECT * FROM table` | Blocked (would expose restricted fields) |
| `SELECT * EXCEPT(restricted_cols) FROM table` | Allowed |
| `COUNT(restricted_col)`, `AVG(...)`, `SUM(...)`, `COUNTIF(...)` | Allowed (aggregates don't expose individual values) |
| `MIN(restricted_col)`, `MAX(restricted_col)` | Blocked (returns actual individual values) |
| `SELECT non_restricted_col FROM table` | Allowed |
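To make the aggregate and EXCEPT rows of the query-pattern table concrete, here is a heavily simplified sketch of a regex-based check. It is not the logic in `src/sql-enforcement.ts`: `findDirectFieldReferences` is a hypothetical name, it ignores table and alias resolution and bare `SELECT *` detection, and it only handles aggregate calls without nested parentheses.

```typescript
// Aggregates that do not expose individual values (MIN/MAX deliberately absent).
const SAFE_AGGREGATES = ["count", "countif", "avg", "sum"];

// Escape a string for literal use inside a RegExp.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the restricted fields that are referenced outside an allowed
// aggregate call or an EXCEPT(...) list.
function findDirectFieldReferences(sql: string, restricted: string[]): string[] {
  // Columns inside EXCEPT(...) are being excluded, so drop those lists first.
  let stripped = sql.replace(/\bEXCEPT\s*\([^)]*\)/gi, " ");

  // Drop calls to allowed aggregates, e.g. COUNT(ssn) or COUNTIF(ssn IS NOT NULL).
  const aggregateCall = new RegExp(
    `\\b(?:${SAFE_AGGREGATES.join("|")})\\s*\\([^()]*\\)`,
    "gi"
  );
  stripped = stripped.replace(aggregateCall, " ");

  // Anything still mentioning a restricted field counts as direct access.
  return restricted.filter((field) =>
    new RegExp(`\\b${escapeRegExp(field)}\\b`, "i").test(stripped)
  );
}
```

Note that this simplified check flags a restricted field anywhere in the statement, not only in the SELECT list.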

Table-Level Restrictions (`allowedTables` mode)

{
  "protectionMode": "allowedTables",
  "maximumBytesBilled": "10000000000",
  "allowedTables": [
    "analytics.page_views",
    "analytics.sessions",
    "reporting.daily_summary"
  ],
  "preventedFieldsInAllowedTables": {
    "analytics.page_views": ["user_ip", "user_agent"]
  }
}

Queries against any unlisted table are rejected immediately. INFORMATION_SCHEMA queries are always allowed for schema discovery.
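The allowlist rule, the INFORMATION_SCHEMA exemption, and the fail-closed behavior described earlier could be combined roughly as follows. This is a sketch under assumptions: `checkAllowedTables` is a hypothetical name, the list of referenced tables is assumed to come from a separate extraction step, and table names are compared exactly.

```typescript
type TableCheck = { allowed: true } | { allowed: false; reason: string };

// referenced: table names extracted from the query by some other step.
function checkAllowedTables(referenced: string[], allowedTables: string[]): TableCheck {
  // Fail closed: if no table could be extracted, reject rather than allow.
  if (referenced.length === 0) {
    return { allowed: false, reason: "Could not determine which tables the query reads" };
  }
  const allowlist = new Set(allowedTables);
  for (const table of referenced) {
    // Schema discovery stays available in every mode.
    if (/\bINFORMATION_SCHEMA\b/i.test(table)) continue;
    if (!allowlist.has(table)) {
      return { allowed: false, reason: `Table "${table}" is not in allowedTables` };
    }
  }
  return { allowed: true };
}
```

Returning a reason alongside the verdict lets the server pass actionable guidance back to the agent, in the same spirit as the field-restriction error.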

Automated Sensitive Field Scanner

The scanner automatically discovers sensitive columns across all BigQuery datasets by querying INFORMATION_SCHEMA.COLUMNS with configurable SQL LIKE patterns. It runs on server startup when the config is stale (based on the lastScannedAt timestamp). The merge is additive-only: manually added restrictions are never removed.
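A minimal sketch of the two rules in this paragraph, the additive-only merge and the staleness check. The function names are hypothetical, and the handling of a missing or unparseable timestamp (treated as stale) and of a frequency of 0 (treated as disabled) are assumptions based on the commit messages in this PR.

```typescript
type PreventedFields = Record<string, string[]>;

// Additive-only merge: scan results are unioned into the existing map,
// so entries added by hand are never dropped.
function mergeAdditive(existing: PreventedFields, scanned: PreventedFields): PreventedFields {
  const merged: PreventedFields = {};
  for (const [table, fields] of Object.entries(existing)) {
    merged[table] = [...fields];
  }
  for (const [table, fields] of Object.entries(scanned)) {
    const current = merged[table] ?? [];
    merged[table] = Array.from(new Set([...current, ...fields]));
  }
  return merged;
}

// Staleness check against an ISO timestamp stored after the last scan.
function isScanStale(
  lastScannedAt: string | undefined,
  frequencyDays: number,
  now: Date = new Date()
): boolean {
  if (frequencyDays <= 0) return false; // assumption: 0 disables scheduled scans
  if (lastScannedAt === undefined) return true; // never scanned
  const last = Date.parse(lastScannedAt);
  if (Number.isNaN(last)) return true; // unreadable timestamp: scan again
  const dayMs = 24 * 60 * 60 * 1000;
  return now.getTime() - last >= frequencyDays * dayMs;
}
```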

First startup — running sensitive field scan...
Scanning all datasets for sensitive fields...
Found 1166 sensitive column(s) across 278 table(s)
Scan complete: config updated with 278 tables.

Graceful Startup on Missing Config

When --config-file points to a missing file, the server no longer crashes silently. It starts and returns an actionable error on every query:

Config file not found: /path/to/config.json. Your MCP server is configured with
--config-file, which requires a valid config file. To fix this: (1) create a config
file at the path above (see the example in the repository), or (2) correct the path
in --config-file, or (3) remove the --config-file flag from your MCP server settings
to run without protection.

Without --config-file, the server runs in simple/off mode. It no longer auto-discovers config.json in the working directory, avoiding collisions with unrelated config files in user projects.
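The startup behavior described here can be sketched as a small state resolution step. This is illustrative only: `resolveStartupState` and `assertQueryable` are hypothetical names, the file-existence check is injected as a parameter to keep the sketch self-contained, and the error text is a shortened approximation of the message above.

```typescript
type StartupState =
  | { kind: "off" }
  | { kind: "configured"; configPath: string }
  | { kind: "configMissing"; error: string };

// configFileFlag is the value of --config-file, or undefined when the flag is absent.
function resolveStartupState(
  configFileFlag: string | undefined,
  fileExists: (path: string) => boolean
): StartupState {
  // No flag: run in simple/off mode and never look for config.json on disk.
  if (configFileFlag === undefined) {
    return { kind: "off" };
  }
  if (fileExists(configFileFlag)) {
    return { kind: "configured", configPath: configFileFlag };
  }
  // Flag given but file missing: keep the server up, block every query.
  return {
    kind: "configMissing",
    error:
      `Config file not found: ${configFileFlag}. ` +
      "Create the file, correct the path in --config-file, " +
      "or remove the flag to run without protection.",
  };
}

// Called before each query so a missing config yields an actionable error.
function assertQueryable(state: StartupState): void {
  if (state.kind === "configMissing") {
    throw new Error(state.error);
  }
}
```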

Backward Compatibility

- No breaking changes. Without --config-file, the server behaves identically to v1.0.3.
- Existing config files without protectionMode default to autoProtect.
- The scanner only runs in autoProtect mode.

Files Changed

| File | Description |
| --- | --- |
| `src/index.ts` | Protection mode system, config loading, graceful startup |
| `src/sql-enforcement.ts` | New: extracted SQL enforcement module (field + table restrictions) |
| `src/sql-enforcement.test.ts` | New: 92 unit tests |
| `src/sensitive-field-scanner.ts` | `lastScannedAt` timestamp for scan freshness |
| `config.json.example` | Combined all modes into single example file |
| `README.md` | Security model, protection modes, updated query patterns |
| `package.json` | Added vitest, test scripts |
| `tsconfig.json` | Excluded test files from compilation |

Manual Test Results

Test 1: Simple/Off Mode (no `--config-file` flag)

Setup: Remove --config-file flag from MCP settings, config file may or may not exist.
Expected: Server runs in simple/off mode — all queries execute without protection.
Result: ✅ SELECT address FROM myproject.users returns data

Test 2: `autoProtect` Mode

Setup: --config-file flag pointing to valid config with protectionMode: "autoProtect".
Expected: Auto-scans for sensitive fields on first startup, blocks restricted fields with guidance.
Result: ✅ address column detected as sensitive and blocked with aggregate guidance

Test 3: `allowedTables` Mode

Setup: --config-file flag pointing to valid config with protectionMode: "allowedTables", allowedTables containing only myproject.users, and preventedFieldsInAllowedTables restricting address, email, first_name, last_name.
Expected: Only listed tables are queryable; restricted fields within allowed tables are blocked.
Result: ✅ Non-allowed table myproject.internal.restricted_table blocked; first_name in myproject.users blocked with guidance; id (unrestricted) allowed

Test 4: Off Mode (with config)

Setup: --config-file flag pointing to valid config with protectionMode: "off".
Expected: All protection bypassed, queries execute normally.
Result: ✅ SELECT address FROM myproject.users returns data

Test 5: Missing Config File (with flag)

Setup: --config-file flag points to non-existent path.
Expected: Server starts but all queries are blocked with helpful error message directing user to fix the path or remove the flag.
Result: ✅ Error returned: "Config file not found... To fix this: (1) create a config file..., (2) correct the path in --config-file, or (3) remove the --config-file flag"

Test 6: Simple Mode (no flag, config exists)

Setup: No --config-file flag, valid config file exists on disk but is ignored.
Expected: Server ignores existing config, runs in simple/off mode.
Result: ✅ SELECT address FROM myproject.users returns data

Test Plan

- 92 unit tests pass (npm test)
- TypeScript builds cleanly (npm run build)
- Manual test 1 (simple/off mode, no flag): all queries allowed
- Manual test 2 (autoProtect mode): sensitive fields auto-discovered and blocked
- Manual test 3 (allowedTables mode): table allowlist + field restrictions enforced
- Manual test 4 (off mode, with config): all protection bypassed
- Manual test 5 (missing config file): server starts, queries return helpful error
- Manual test 6 (no flag with config on disk): config ignored, simple mode
- Backward compatible: config without protectionMode defaults to autoProtect

Introduce configurable field restriction system to prevent querying of sensitive columns. Users can specify a JSON file mapping table names to restricted column names. The system validates restriction file accessibility during startup and enforces restrictions at query time by parsing SQL statements and blocking queries that contain restricted fields. This provides an additional security layer for organizations needing to limit access to specific data columns while allowing broader table access.

Add support for using restricted columns within aggregate functions like count, sum, avg, min, and max. The field restriction enforcement now distinguishes between direct column access and aggregated usage, allowing sensitive fields to be used in statistical queries while preventing direct access to individual values. Also adds support for detecting SELECT * queries and table aliases when enforcing restrictions, and updates the error message to clearly communicate the allowed aggregate functions.

…tions Enhance the field restriction system to support SELECT * EXCEPT (...) syntax, allowing users to exclude specific sensitive columns when using star (*) in their queries. This adds flexibility to query writing while maintaining security by preventing access to restricted fields. The implementation includes parsing of EXCEPT clauses, tracking of star usages with their qualifiers and excluded columns, and validation logic to ensure restricted fields are properly excluded or aggregated. The error message has also been updated to inform users about the EXCEPT option.

…w_data

- Add shared scanner module (sensitive-field-scanner.ts) with BigQuery INFORMATION_SCHEMA scan, merge logic, and daily staleness check
- Add standalone CLI script (scan-sensitive-fields.ts) for manual runs
- Integrate daily scan into server startup: runs on first connection of the day, skips on subsequent starts
- Add scan-fields npm script
- Update config.json with 278 tables of sensitive field restrictions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- config.json contains environment-specific HIPAA field mappings, should not be committed
- Add config.template.json as a starting point for new users
- The scan-fields script auto-populates config.json on first run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add sensitiveFieldPatterns to config for user-extensible LIKE patterns
- Add sensitiveFieldScanFrequencyDays to config (default: 1, set 0 to disable)
- Scanner reads both settings from config, falls back to defaults if missing
- Standalone CLI always runs regardless of frequency setting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ensitive field scanner

- Add config.json support for centralized server configuration (optional; server uses safe defaults without a config file)
- Add field-level access restrictions via preventedFields config to block queries from accessing sensitive columns (PII/PHI)
- Support SELECT *, SELECT * EXCEPT, and aggregate functions in field restriction enforcement
- Add automated sensitive field scanner that discovers sensitive columns by querying BigQuery INFORMATION_SCHEMA.COLUMNS
- Add configurable scan patterns and frequency
- Add standalone CLI tool: npm run scan-fields
- Auto-scan on server startup with configurable frequency
- Move maximumBytesBilled from per-query parameter to server config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tion Add comprehensive documentation for field-level access restrictions, automated sensitive field scanner, custom patterns, and configurability. Also add AGENTS.md to gitignore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Conflicts: # .gitignore # README.md # src/index.ts

Prevents the scanner from creating an unexpected config.json when the user never provided one. The scan now only runs if a config file already exists or was explicitly passed via --config-file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The extractSelectClause function only matched standard SQL (SELECT ... FROM) pattern. Pipe syntax queries (FROM table |> SELECT *) bypassed the star detection entirely, allowing restricted fields to be returned via SELECT *. Now extracts SELECT clauses from both standard and pipe syntax queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…alse positives

- extractSelectClause now detects SELECT clauses in pipe syntax (|> SELECT) in addition to standard SQL (SELECT ... FROM)
- Strip EXCEPT clauses before scanning for direct field references, preventing false positives when restricted fields appear in EXCEPT lists

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove MIN/MAX from allowed aggregates: they return actual individual values (e.g. MIN(name) leaks a real name). Only COUNT, COUNTIF, AVG, SUM are now allowed on restricted fields.
- Strip SQL comments and string literals before checking field references, preventing false positives when restricted field names appear in comments (-- patient_name) or strings ('Dr patient_name Clinic').
- Support complex aggregate expressions like COUNTIF(field IS NOT NULL) and COUNT(DISTINCT field) which were previously blocked as false positives.
- Detect implicit SELECT * in pipe syntax queries with no SELECT clause (e.g. FROM table |> LIMIT 10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
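For readers following the comment and string-literal stripping mentioned in this commit, a single-pass approach could look like the sketch below. It is an illustration, not the code in the PR: `stripCommentsAndStrings` is a hypothetical name, and triple-quoted and raw string literals are not handled.

```typescript
// One alternation so that whichever token starts first wins: a "--" inside a
// string is not treated as a comment, and a quote inside a comment is ignored.
const SQL_TOKEN =
  /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|`[^`]*`|--[^\n]*|#[^\n]*|\/\*[\s\S]*?\*\//g;

function stripCommentsAndStrings(sql: string): string {
  return sql.replace(SQL_TOKEN, (match) => {
    const first = match.charAt(0);
    if (first === "`") return match; // quoted identifiers carry table names: keep
    if (first === "'") return "''"; // string literal: keep an empty placeholder
    if (first === '"') return '""';
    return " "; // comment: replace with whitespace
  });
}
```

Running the field check on the stripped text avoids false positives such as a restricted column name that appears only inside `'Dr patient_name Clinic'`.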

…CP client support

- Add fork notice at the top linking back to upstream
- Replace Smithery/npx quick install with clone-and-build instructions pointing to this fork (upstream package lacks security features)
- Add warning that Smithery/npx installs the original without field restrictions
- Update all clone URLs from ergut/ to drharunyuksel/
- Replace "Claude Desktop only" references with "any MCP-compatible client"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove sponsorship section (not applicable to this fork)
- Update author section to credit original author Salih Ergüt and add Harun Yüksel as fork maintainer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ergut

Hi Harun, thank you for this PR. It's clear you put a lot of thought and effort into it, and the documentation is really well done.

After reviewing it, I have a question about the core use case. This MCP server runs locally over stdio, so the person configuring the restricted fields is the same person (or agent) running the queries. Anyone can just edit the config file or skip it entirely. What scenario are you thinking of where this provides real protection?

If this were a remote MCP server where an admin deploys it for other users, I could see the value. But as a local server, I'm not sure field restrictions can be meaningfully enforced on the client side.

I want to keep the server focused and minimal, so I'd need a clear use case before adding this. If you see an angle I'm missing, I'm happy to hear it.

Thanks again for the contribution.

Align fork README with PR branch content. Adds the 'Which Setup Is Right for You?' table and explains the two deployment modes clearly: Simple Mode (npx/Smithery, no config) and Protected Mode (with --config-file for PHI/PII environments). Also explains why LLM inference in the cloud makes field restrictions meaningful even for local server deployments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Renamed 'Developer Setup' to 'Local Build' to clarify it's a valid deployment option (not just for contributors). Added separate config examples for Simple Mode (no --config-file) and Protected Mode (with --config-file), so all three deployment methods consistently support both modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

drharunyuksel · 2026-03-27T09:29:49Z

Hi @ergut

Thanks for the thoughtful review. You're right to push on this.

On the use case:

The threat model isn't "a human bypassing the config file." It's about what happens when an AI agent queries BigQuery autonomously. LLM inference happens in the cloud: when the agent runs a query, the results are sent to the LLM provider's servers (Anthropic, OpenAI, etc.) for processing. They leave your network.

Consider a healthcare data analyst using Claude or Cursor with this MCP server to explore patient data. They ask something innocent like "how many patients were admitted last month?" The AI agent autonomously writes and runs SELECT * FROM patients. That query returns thousands of rows containing names, emails, dates of birth, SSNs, and medical record numbers, and every single one of those values is sent to Anthropic's or OpenAI's cloud servers to generate the next response. The data has now left your network and been processed by a third-party cloud provider. Under HIPAA, that's a reportable data breach.

This isn't a hypothetical. It's the default behavior of any AI agent with unrestricted BigQuery access. The agent isn't malicious: it's just doing its job. But there's no human in the loop reviewing each query before it executes.

BigQuery IAM controls who can reach your data. Field restrictions control what the AI agent surfaces into LLM responses. These are completely different protection boundaries. IAM can't prevent the AI from reading a column it has permission to access. Field restrictions can.

The AI agent interacts only through MCP tools: it has no filesystem access to modify or skip the config file. So the restrictions are genuinely enforceable against the agent, even in a local stdio setup.

On the two deployment modes:

We updated the README to clearly define two modes:

- Simple Mode: No config.json → server starts with 1GB query limit, no field restrictions. Anyone who clones the repo gets this by default since config.json is in .gitignore and only config.json.example ships with the repo.
- Protected Mode: config.json present (via --config-file or auto-discovered in the working directory) → field restrictions enforced, scanner runs on startup.

The two modes don't conflict. We tested both locally with Node.js running the same compiled dist/index.js:

- Simple Mode: SELECT address FROM users → returned data freely
- Protected Mode: SELECT address FROM users → blocked with a clear error message
- Protected Mode safe alternative: SELECT * EXCEPT(address, ...) → returned data with sensitive fields excluded

On minimalism:

The feature is fully opt-in. Without a config.json, the server behaves identically to v1.0.3: zero behavior change for existing users. The scanner only runs when a config file is present and stale. No new required dependencies.

Happy to adjust anything if you see a simpler way to structure this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

npx @ergut/mcp-bigquery-server installs the upstream package which does not yet support --config-file. Updated Option 2 to use the local fork build with node dist/index.js. Added a note linking to the open PR. Will revert to npx once the PR is merged and a new version is published. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ergut

Hi Harun, thanks for the detailed explanation. I had the threat model wrong. I was thinking about a human bypassing the config, not the agent itself being the untrusted party. The point about query results being sent to LLM provider servers is a real concern, especially in regulated environments. That makes the use case clear.

A few things before we move forward:

The SQL parsing logic is now load-bearing from a privacy standpoint. Regex-based SQL parsing can be bypassed through CTEs, subqueries, or aliasing. Before merging, I would like to understand how robust it is against these patterns.

Given the large amount of new code, I would prefer to have test coverage before we merge. Could you please add tests, especially for the edge cases?

One extension worth considering: table-level restrictions. I have datasets with many tables where only a few are relevant for analysis. An allowedTables list in config.json would let users restrict the agent to a specific subset. It is a simpler enforcement problem than field restrictions and would be useful on its own. Would you be open to adding that?

preventedFields was empty and the file existed with a fresh mtime, causing the staleness check to skip the scan entirely — exposing PHI/PII fields for up to 24 hours after initial deployment. Now the scan always runs when preventedFields is empty, populating protection immediately on first startup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lds enforcement

- Extract SQL enforcement logic into src/sql-enforcement.ts module
- Add three protection modes: off, allowedTables, autoProtect
- Add restrictedFields support with aggregate-only access control
- Add comprehensive test suite (92 tests) including pipe syntax support
- Update config.json.example with all three protection modes
- Improve error messages with actionable fix guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace isEmpty + file mtime check with a lastScannedAt timestamp written to config.json after each scan. This ensures the first-run scan always executes regardless of preventedFields content — fixing the case where users copy config.json.example with placeholder entries and the scan is incorrectly skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…covery

- Server now starts even when --config-file points to a missing file, returning an actionable error on every query instead of crashing silently
- Without --config-file flag, server runs in off mode: no longer auto-discovers config.json in working directory to avoid collisions with unrelated config files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Auto-discovery of config.json in working directory was removed, so the README now states that --config-file must be passed explicitly to enable protection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

drharunyuksel · 2026-04-01T12:32:49Z

Hi @ergut, thanks for the clear feedback. Here's what we've done to address each point:

1. SQL Parsing Robustness

First, an important framing point: these field restrictions are cooperative guardrails for AI agents, not a SQL firewall. The threat model is straightforward — when an AI agent queries BigQuery, the results are sent to the LLM provider's servers (Anthropic, OpenAI, etc.). Field restrictions prevent the agent from inadvertently including sensitive columns (PII, PHI, secrets) in those results. When the agent encounters a restriction error, it reads the guidance in the error message and reformulates its query. In practice, AI agents cooperate immediately and consistently.

That said, we took robustness seriously. We ran extensive penetration testing and found (and fixed) several bypass vectors:

- Struct-alias bypass: SELECT t FROM users AS t returns the entire row as a STRUCT. Now detected and blocked.
- Comma-join evasion: FROM table1, table2 made the second table invisible to the old regex. Fixed with comma-aware extraction.
- CTE chains and subqueries: restricted field references inside CTEs and nested subqueries are now caught.
- Alias shadowing: FROM restricted_table AS safe SELECT safe.restricted_field is resolved back to the real table.
- Implicit SELECT *: FROM table |> LIMIT 10 (no SELECT clause) returns all columns. Now treated as a violation.

Are there still edge cases where a very unusual SQL construct could slip through? Possibly — regex-based parsing has limits. But here's the thing: AI agents don't write unusual SQL and they don't try to hack or penetrate the database. They write straightforward queries to answer the user's question. The only time we saw restricted data leak through was during our own manual penetration testing, where we intentionally crafted bypass queries like SELECT t FROM users AS t — queries that no AI agent would produce in normal operation. In real usage, the agent hits a restriction, reads the error guidance, and reformulates its query — every time.

We've added a "Security Model" section to the README that's transparent about this:

This system uses regex-based SQL analysis to detect restricted field usage. We performed penetration testing during development and fixed several bypass vectors. However, regex-based parsing cannot guarantee coverage of every possible SQL construct. The enforcement logic is designed to fail closed, but it is not equivalent to a database-level security policy.

For environments requiring strict compliance guarantees, we recommend combining these guardrails with BigQuery's native column-level security and authorized views.

2. Test Coverage

Added 92 unit tests using vitest in src/sql-enforcement.test.ts. Coverage includes:

- Unit tests for all SQL parsing helpers
- Cooperative guardrail tests (standard query patterns, aggregates, EXCEPT)
- Adversarial bypass tests (struct-alias, nested CTEs, comma-joins, alias shadowing)
- BigQuery pipe syntax penetration tests (EXTEND, SET, DROP, RENAME, AGGREGATE)
- allowedTables enforcement tests (allowlist, fail-closed, CTE filtering, INFORMATION_SCHEMA exemption)

All SQL enforcement logic has been extracted into a dedicated src/sql-enforcement.ts module for testability.

3. allowedTables

Implemented as a full protection mode. The server now supports three modes via protectionMode in config.json:

- off: no restrictions
- allowedTables: only listed tables can be queried, with optional per-table field restrictions via preventedFieldsInAllowedTables
- autoProtect: the original behavior (auto-scan + preventedFields)

Backward compatible: existing configs without protectionMode default to autoProtect.

We also improved startup resilience — if --config-file points to a missing file, the server starts but blocks all queries with an actionable error message instead of crashing silently. Without --config-file, it runs in simple/off mode.

I've updated both the PR description and the README to reflect all of this — including the security model rationale, protection modes documentation, corrected query pattern tables (MIN/MAX now listed as blocked), manual test results (6 scenarios, all passing), and the complete test plan. Please take a look.

feat: add protection modes, SQL robustness fixes, and test coverage

ergut

Hi @drharunyuksel, thanks for the thorough update. The test coverage is solid and the adversarial cases are well thought out: CTEs, struct-alias bypass, comma-joins, alias shadowing, pipe syntax variants are all covered. The discriminated union design for the protection modes is clean too. Good work overall.

A few things to address before we merge:

The sensitive field scanner interpolates patterns from the config file directly into SQL without validation. If someone puts a crafted string in sensitiveFieldPatterns, it gets injected into the INFORMATION_SCHEMA query. The impact is limited but it should be fixed, either by validating that patterns match a safe format before use, or by using parameterized queries.

The scanner hardcodes region-us and location: 'US' in sensitive-field-scanner.ts. Any non-US deployment will silently fail the auto-scan on startup, leaving preventedFields empty. The config.location is already threaded through the rest of the codebase so it just needs to be passed into the scanner as well.

The --maximum-bytes-billed CLI flag is silently ignored when a config file is present. loadConfiguration reads the value from the file only and the CLI value is discarded. The fix is to apply the CLI value on top of what loadConfiguration returns, if it is set.

One design question worth discussing: field names appearing in WHERE or ORDER BY clauses currently block the query even though the field is not being returned. For example SELECT id FROM users WHERE email = 'x@example.com' would be blocked. Is this intentional? If so, it should be documented clearly since it will surprise users.

…n support, CLI override

- Validate sensitiveFieldPatterns against safe-character allowlist before SQL interpolation
- Thread --location CLI flag through scanner (main server + manual scan-fields script)
- CLI --maximum-bytes-billed now overrides config file value (applied after config reload)
- Fail closed in autoProtect when scanner fails and preventedFields is empty
- Document WHERE/ORDER BY blocking behavior in README query pattern table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: address upstream PR ergut#14 review feedback

drharunyuksel · 2026-04-08T08:00:19Z

Hi @ergut, thanks for the detailed review. Here's what we've done to address each point:

1. SQL Injection in Scanner

Fixed. scanSensitiveFields now validates every pattern against a safe-character allowlist (/^[a-zA-Z0-9_%.\-]+$/) before building the SQL. Patterns containing quotes, semicolons, spaces, or any other character that could break out of a LIKE clause are rejected with a clear error message. The scanner won't run if any pattern fails validation.

We chose input validation over parameterized queries since LIKE patterns have a very constrained format — there's no legitimate reason for a pattern to contain ', ;, or whitespace.
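For readers following along, the validation step can be sketched roughly as below. Only the regular expression comes from the comment above; the helper names `assertSafePatterns` and `buildLikeClause`, the error wording, and the `LOWER(column_name)` predicate are assumptions made for illustration and may not match the actual code in sensitive-field-scanner.ts.

```typescript
// Allowlist quoted in the comment above; everything else here is illustrative.
const SAFE_PATTERN = /^[a-zA-Z0-9_%.\-]+$/;

// Hypothetical helper: throws on the first pattern that fails the allowlist.
function assertSafePatterns(patterns: string[]): void {
  for (const pattern of patterns) {
    if (!SAFE_PATTERN.test(pattern)) {
      throw new Error(
        `Unsafe sensitiveFieldPatterns entry ${JSON.stringify(pattern)}: ` +
          "only letters, digits, '_', '%', '.', and '-' are allowed"
      );
    }
  }
}

// Hypothetical helper: builds the LIKE predicate only after validation passes,
// so no quote, semicolon, or whitespace can reach the SQL text.
function buildLikeClause(patterns: string[]): string {
  assertSafePatterns(patterns);
  return patterns
    .map((p) => `LOWER(column_name) LIKE '${p.toLowerCase()}'`)
    .join(" OR ");
}
```

Validating the whole list before building any SQL matches the described behavior that the scanner does not run if any single pattern fails.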

2. Hardcoded US Region in Scanner

Fixed. scanSensitiveFields and runDailyScanIfNeeded now accept a location parameter, which is passed from config.location (the existing --location CLI flag, defaulting to 'US'). Both the INFORMATION_SCHEMA region in the SQL (region-{location}) and the BigQuery query location option now use this value. Non-US deployments will scan the correct regional INFORMATION_SCHEMA.
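A minimal sketch of how one location value can drive both the region qualifier and the query location option. The `ScanRequest` shape, the `buildScanRequest` name, the selected columns, and the extra check on the location string are hypothetical; lowercasing the location for the qualifier is an assumption based on the previously hardcoded `region-us`.

```typescript
// Hypothetical request shape; the real scanner's query text is not shown in this thread.
interface ScanRequest {
  query: string;
  location: string;
}

// Hypothetical helper that derives both location-dependent values from one input.
function buildScanRequest(projectId: string, location: string = "US"): ScanRequest {
  // Assumption: location is interpolated too, so restrict it to letters, digits, and '-'.
  if (!/^[a-zA-Z0-9-]+$/.test(location)) {
    throw new Error(`Invalid location: ${JSON.stringify(location)}`);
  }
  // Assumption: region qualifiers are written in lowercase, so "US" becomes "region-us".
  const region = `region-${location.toLowerCase()}`;
  const query =
    "SELECT table_schema, table_name, column_name " +
    `FROM \`${projectId}\`.\`${region}\`.INFORMATION_SCHEMA.COLUMNS`;
  // The same location is returned so the caller can pass it as the query location option.
  return { query, location };
}
```

Deriving both values in one place avoids the earlier failure mode where the SQL text and the job location could disagree.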

3. `--maximum-bytes-billed` CLI Flag Override

Fixed. After loadConfiguration() returns, the CLI --maximum-bytes-billed value (if provided) now overrides the config file value. This follows the standard convention where CLI flags take precedence over config file settings.
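The precedence rule can be illustrated with a small sketch. The `ServerConfig` shape and the `applyCliOverrides` helper are hypothetical stand-ins; the only behavior taken from the comment above is that a CLI value, when provided, replaces whatever `loadConfiguration()` returned.

```typescript
// Hypothetical subset of the server configuration, for illustration only.
interface ServerConfig {
  maximumBytesBilled?: string;
}

// Hypothetical helper: apply the CLI value on top of the file-based config.
function applyCliOverrides(
  fileConfig: ServerConfig,
  cliMaximumBytesBilled?: string
): ServerConfig {
  // No CLI flag given: keep the config file value untouched.
  if (cliMaximumBytesBilled === undefined) {
    return fileConfig;
  }
  // CLI flag given: it wins over the config file value.
  return { ...fileConfig, maximumBytesBilled: cliMaximumBytesBilled };
}
```

Because the commit notes that the override is applied after config reload, a helper like this would need to run again each time the file is re-read, so the CLI value is not lost.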

4. WHERE / ORDER BY Blocking

This is intentional. The full SQL query text is sent to the LLM provider as part of the conversation — so SELECT id FROM users WHERE email = 'patient@example.com' means the restricted value appears in the prompt sent to the cloud, even though BigQuery doesn't return it in the results. Additionally, if the agent is writing a WHERE filter on a restricted field, it means the agent already has or is probing for that value.

We've added this to the README with a clear note in the query pattern table explaining why WHERE, ORDER BY, and other non-SELECT references are blocked.

Additional Hardening

We ran an internal adversarial code review and found two more issues beyond the four review items. Both are now fixed:

5. autoProtect fails closed on scanner failure

Previously, if the scanner failed in autoProtect mode (bad pattern, network error, permissions issue) and preventedFields was empty, the server would silently serve queries with no restrictions — effectively unprotected. Now, when the scanner throws and preventedFields is empty, the server blocks all queries with an actionable error message explaining how to fix it. If existing preventedFields were already populated from a previous scan, those continue to work normally.

Note: a successful scan that finds zero sensitive columns is fine — it means the user's tables don't have matching column names yet. The fail-closed behavior only activates on scanner failure, not on empty results.
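The three cases described above (scanner failure with no restrictions, scanner failure with existing restrictions, and a successful scan with zero hits) can be summarized in a short sketch. The `ScanOutcome` and `Enforcement` types and the `resolveEnforcement` function are hypothetical, and merging newly found fields into the existing list on success is an assumption rather than confirmed behavior.

```typescript
// Hypothetical result of one scanner run.
type ScanOutcome =
  | { status: "ok"; fields: string[] }
  | { status: "failed"; error: string };

// Hypothetical enforcement decision derived from the scan outcome.
type Enforcement =
  | { mode: "enforce"; preventedFields: string[] }
  | { mode: "blockAll"; reason: string };

function resolveEnforcement(scan: ScanOutcome, existing: string[]): Enforcement {
  if (scan.status === "ok") {
    // A successful scan is fine even when it finds nothing.
    // Assumption: new findings are merged into the existing list without duplicates.
    const added = scan.fields.filter((f) => existing.indexOf(f) === -1);
    return { mode: "enforce", preventedFields: existing.concat(added) };
  }
  if (existing.length > 0) {
    // Scanner failed, but restrictions from a previous scan still apply.
    return { mode: "enforce", preventedFields: existing };
  }
  // Scanner failed and nothing is restricted: fail closed.
  return {
    mode: "blockAll",
    reason: `Sensitive field scan failed (${scan.error}) and preventedFields is empty`,
  };
}
```

Keeping the empty-result case separate from the failure case is what stops a legitimately clean warehouse from being locked out.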

6. Manual scanner location support

The standalone npm run scan-fields script was still hardcoded to region-us. It now accepts --location <region> and passes it to scanSensitiveFields, consistent with how the main server handles --location. Without the flag, it defaults to US.

Manual Test Results

We tested all changes against a live BigQuery instance using the local build, running the MCP server from a clean test directory (simulating a fresh install).

| # | Scenario | Result |
|---|----------|--------|
| 1 | Simple mode: no config file, no `--config-file` flag. Queries ran without restrictions. | Pass |
| 2 | autoProtect mode: empty `preventedFields`, scanner ran on first startup, discovered sensitive fields across datasets. Restricted field → blocked, `SELECT * EXCEPT(restricted)` → allowed. | Pass |
| 3 | allowedTables mode: allowed table → allowed, disallowed table → rejected, restricted field on allowed table → blocked, aggregate on restricted field → allowed. | Pass |
| 4 | `--maximum-bytes-billed` CLI override: config file set to 10GB, CLI set to 1 byte. Query rejected by BigQuery billing limit → CLI overrode config file. | Pass |
| 5 | SQL injection pattern validation: malicious pattern `"'; DROP TABLE foo; --"` in config. Scanner rejected it and did not run. | Pass |
| 6 | Fail-closed on scanner failure: scanner failed (bad pattern) with empty `preventedFields`. All queries blocked with actionable error message. | Pass |
| 7 | Scanner failure with existing restrictions: scanner failed but pre-populated `preventedFields` continued to work. Non-restricted → allowed, restricted → blocked. | Pass |
| 8 | CLI override in autoProtect mode: config file set to 10GB, CLI set to 1 byte in autoProtect mode. CLI override persists after config reload. | Pass |

ergut

Hi @drharunyuksel, all four points are addressed and verified. The pattern validation, region threading, CLI override, and fail-closed behavior are all in the code and the tests pass. The WHERE/ORDER BY design decision makes sense and is now documented.

This is a well-executed contribution. Merging now.

drharunyuksel · 2026-04-08T21:22:03Z

Thank you @ergut! I appreciate the thorough review process. The three rounds of feedback made the implementation significantly more robust. Happy to contribute more in the future.

ergut · 2026-04-09T06:11:21Z

Thanks @drharunyuksel! You were patient throughout the process and delivered a high quality result. Hope to see more from you.

drharunyuksel and others added 25 commits October 9, 2025 11:37

feat(query): add field restrictions for rpm_compliance_snapshot

61ab039

feat(config): implement field restrictions and configurable query limits

464221b

feat: Add PII column configuration for drchrono patients

e3da45a

feat: add PII column configuration for financial_dashboard.Mankato_ra…

85acde4

…w_data

chore: rename config.template.json to config.json.example

f140376

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: add .claude/ to gitignore

66c92e1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: add AGENTS.md to gitignore

381cc70

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'hy/feat-field-restrictions-and-scanner'

07ac66e

# Conflicts: # .gitignore # README.md # src/index.ts

Merge branch 'hy/feat-field-restrictions-and-scanner'

67b13df

docs: update authorship and remove upstream sponsorship

73f7893

- Remove sponsorship section (not applicable to this fork) - Update author section to credit original author Salih Ergüt and add Harun Yüksel as fork maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update support links to point to this fork

22eab26

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: fix remaining npx reference in command line example

e7c9141

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ergut reviewed Mar 26, 2026

drharunyuksel and others added 2 commits March 27, 2026 11:44

drharunyuksel force-pushed the hy/feat-field-restrictions-and-scanner branch from 562daa6 to a4f067e Compare March 27, 2026 09:36

drharunyuksel and others added 2 commits March 27, 2026 12:48

chore: align README whitespace with PR branch

8f6a445

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ergut reviewed Mar 28, 2026

drharunyuksel and others added 5 commits March 29, 2026 10:39

drharunyuksel force-pushed the hy/feat-field-restrictions-and-scanner branch from 7dc6613 to f345ca4 Compare April 1, 2026 12:19

drharunyuksel mentioned this pull request Apr 1, 2026

feat: add protection modes, SQL robustness fixes, and test coverage drharunyuksel/mcp-bigquery-server#1

Merged

9 tasks

Merge pull request #1 from drharunyuksel/feat/add_allowed_tables_config

dd911ef

feat: add protection modes, SQL robustness fixes, and test coverage

ergut reviewed Apr 3, 2026

drharunyuksel added a commit to drharunyuksel/mcp-bigquery-server that referenced this pull request Apr 8, 2026

Merge pull request #2 from drharunyuksel/fix/pr14-review-feedback

df1c972

fix: address upstream PR ergut#14 review feedback

ergut approved these changes Apr 8, 2026

ergut merged commit 8781a84 into ergut:main Apr 8, 2026

ergut merged 36 commits into ergut:main from drharunyuksel:hy/feat-field-restrictions-and-scanner
