Skip to content

fix(core-tools): resolve Jupyter Notebook and JSON corruption in write_file#28000

Closed
amelidev wants to merge 12 commits into
google-gemini:mainfrom
amelidev:b_446760218
Closed

fix(core-tools): resolve Jupyter Notebook and JSON corruption in write_file#28000
amelidev wants to merge 12 commits into
google-gemini:mainfrom
amelidev:b_446760218

Conversation

@amelidev

@amelidev amelidev commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR resolves a critical bug where the write_file tool silently corrupts Jupyter Notebook (.ipynb) and JSON files, rendering them unparseable or causing the environment (like Colab or JupyterLab) to discard changes and revert to checkpoints.

Following code review and subsequent format safety expansions, we have broadened this protection to cover highly structured data, configuration, environment, markup, tabular data, and infrastructure-as-code formats that are equally vulnerable to aggressive unescaping corruption.

Details

The root cause was isolated to the unescapeStringForGeminiBug utility in packages/core/src/utils/editCorrector.ts. The utility aggressively unescapes sequences (such as replacing \\n with raw newlines \n or \\" with raw double quotes "). While useful for fixing over-escaped code block lines in standard source files, this corrupts the syntax of JSON, YAML, TOML, INI, properties, XML, HCL, and dotfile configurations because raw control characters and unescaped double quotes are strictly illegal inside string values in these specifications.

Changes:

  • JSON/Structured format bypass (Surgical Unescape Skip): Added an optional isJsonLike parameter to ensureCorrectFileContent(). Rather than bypassing the entire function (which would skip crucial markdown wrapper stripping), we selectively bypass only the aggressive unescaping utility (unescapeStringForGeminiBug), allowing LLM-based correction/sanitization if disableLLMCorrection is set to false.
  • Ultimate format-aware checking: Configured getCorrectedFileContent() in packages/core/src/tools/write-file.ts to detect the target file:
    • Standard extensions: .json, .ipynb, .map, .yaml, .yml, .toml, .lock, .ini, .properties, .cfg, .conf, .config, .xml, .svg, .csv, .tsv, .tf, .tfvars, .hcl, .tfstate, .webmanifest, .env.
    • Environment files: Supports both .env extensions (e.g. production.env) and files starting with .env (e.g., .env, .env.local).
    • Common configuration dotfiles (without typical extensions): .eslintrc, .babelrc, .prettierrc, .stylelintrc, .watchmanconfig, .nycrc, .releaserc, .gitconfig, .editorconfig, .npmrc, .yarnrc, .jshintrc, .markdownlintrc by checking their basenames.
    • Sets isJsonLike = true (and aggressiveUnescape = false) for these files.
  • Exhaustive test coverage: Added a comprehensive parameter-matrix test inside write-file.test.ts to ensure that all of the mentioned extensions/files correctly trigger JSON-like protection. Also updated editCorrector.test.ts to verify isJsonLike behavior when disableLLMCorrection is false.

Related Issues

Resolves Jupyter Notebook (.ipynb) modification failure in Google3 and cloud development environments (Colab).

How to Validate

1. Execute Unit Tests:

Run the modified test suites in the core package:

npm test -w @google/gemini-cli-core -- src/utils/editCorrector.test.ts
npm test -w @google/gemini-cli-core -- src/tools/write-file.test.ts

Expected: All 67 tests pass successfully.

2. Manual Verification:

Verify that ensureCorrectFileContent with isJsonLike: true does not corrupt JSON/YAML/TOML strings containing escaped values:

const result = await ensureCorrectFileContent(
  validJsonString, 
  mockBaseLlmClient, 
  abortSignal, 
  true, 
  true, 
  true // isJsonLike
);
JSON.parse(result); // Does not throw SyntaxError

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@amelidev amelidev requested review from a team as code owners June 18, 2026 01:43
@github-actions github-actions Bot added the size/m A medium sized PR label Jun 18, 2026
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

📊 PR Size: size/M

  • Lines changed: 220
  • Additions: +192
  • Deletions: -28
  • Files changed: 5

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

🛑 Action Required: Evaluation Approval

Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged.

Maintainers:

  1. Go to the Workflow Run Summary.
  2. Click the yellow 'Review deployments' button.
  3. Select the 'eval-gate' environment and click 'Approve'.

Once approved, the evaluation results will be posted here automatically.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the write_file tool was corrupting JSON and Jupyter Notebook files by applying aggressive unescaping logic intended for standard source code. By introducing a bypass mechanism for JSON-like files, the tool now preserves the integrity of these structured data formats, ensuring they remain valid and parseable in their respective environments.

Highlights

  • JSON and Jupyter Notebook Protection: Introduced an isJsonLike flag to prevent aggressive unescaping of JSON and Jupyter Notebook files, which previously caused syntax corruption.
  • Automated File Detection: Updated getCorrectedFileContent to automatically detect .json and .ipynb file extensions and bypass potentially destructive correction logic.
  • Test Suite Enhancements: Added comprehensive unit tests to verify that JSON content remains intact and that the unescape utility correctly avoids problematic sequences.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request prevents corruption of escaped sequences in JSON-like files (such as .json and .ipynb) by bypassing aggressive unescaping and LLM correction. This is achieved by introducing an isJsonLike flag based on the file extension, which is passed to ensureCorrectFileContent to return the content untouched. The review feedback suggests expanding this check to support other JSON-based formats like .jsonc, .json5, .geojson, and .map to prevent similar issues.

Comment thread packages/core/src/tools/write-file.ts Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email

Copy link
Copy Markdown

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Thank you for the detailed architectural perspective on integrating structural integrity checks into the Universal Verification Kernel (UVK). Your proposal for a formal _validate_json_integrity gate is a compelling approach to ensuring long-term data consistency and preventing semantic drift.

While the concept of a kernel-level structural audit is an excellent direction for the system's evolution, my current focus is on addressing the immediate, critical bug in the write-file tool within the TypeScript codebase. The current PR aims to provide a reliable, localized fix to prevent the silent corruption of JSON and Jupyter Notebook files during standard operations.

I will continue to prioritize the robustness of the isJsonLike bypass in the current implementation. If you have any specific feedback or suggestions regarding the current TypeScript logic in packages/core/src/tools/write-file.ts or packages/core/src/utils/editCorrector.ts, I would be very interested in hearing them to ensure the immediate fix is as effective as possible.

@gemini-cli gemini-cli Bot added the status/need-issue Pull requests that need to have an associated issue. label Jun 18, 2026
@luisfelipe-alt

Copy link
Copy Markdown
Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to bypass unescaping and LLM correction for JSON-like files (such as .json, .ipynb, and .map files) to prevent syntax corruption. It updates getCorrectedFileContent and ensureCorrectFileContent to detect these file types and return their content untouched, accompanied by comprehensive unit tests. The reviewer suggests improving the JSON detection logic by checking if the file extension contains 'json' rather than starting with '.json', which would robustly support other JSON-derived formats like .geojson and .topojson.

Comment thread packages/core/src/tools/write-file.ts Outdated
@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email

Copy link
Copy Markdown

@TrueAlpha-spiral

TrueAlpha-spiral commented Jun 18, 2026 via email

Copy link
Copy Markdown

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to bypass aggressive unescaping and LLM correction for JSON-like files (such as .json, .ipynb, and .map) to prevent content corruption during file writes. The feedback suggests expanding this detection to include other structured configuration files, such as YAML, TOML, and common JSON-based dotfiles (e.g., .eslintrc, .prettierrc), which are also vulnerable to corruption.

Comment thread packages/core/src/tools/write-file.ts Outdated
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to identify JSON-like files (such as .json, .ipynb, .yaml, .toml, and common configuration dotfiles) and bypasses aggressive unescaping and LLM correction for them to prevent corruption of structured data. The review feedback suggests expanding this list of JSON-like files to include lockfiles (e.g., .lock) and additional common dotfiles (e.g., .nycrc, .releaserc), along with updating the corresponding unit tests to cover these new cases.

Comment thread packages/core/src/tools/write-file.ts
Comment thread packages/core/src/tools/write-file.test.ts
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an isJsonLike flag to bypass aggressive unescaping and LLM correction for JSON-like and structured configuration files (such as .json, .ipynb, .yaml, and common dotfiles), preventing potential content corruption. The review feedback suggests expanding this protection to cover other structured formats like .env, .ini, and .gitconfig files, along with updating the test suite to verify these additional cases.

Comment thread packages/core/src/tools/write-file.ts
Comment thread packages/core/src/tools/write-file.test.ts
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to identify JSON-like and configuration files (such as .json, .yaml, .toml, .env, and common dotfiles) to bypass aggressive unescaping and correction during file writes, preventing corruption of escaped sequences. The feedback suggests extending this protection to Java/Android .properties files, which also rely on backslash escaping and could be corrupted by aggressive unescaping.

Comment thread packages/core/src/tools/write-file.ts
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an isJsonLike flag to bypass aggressive unescaping and LLM correction for structured configuration and data files (such as JSON, YAML, TOML, and common dotfiles) to prevent content corruption. Unit tests have been added to document and verify this bypass behavior. The reviewer recommends expanding this check to cover other structured formats vulnerable to corruption, such as XML, SVG, CSV, TSV, and Terraform/HCL files, and updating the test suite to include them.

Comment on lines +137 to +165
const isJsonLike =
fileExtension.includes('json') ||
fileExtension === '.ipynb' ||
fileExtension === '.map' ||
fileExtension === '.yaml' ||
fileExtension === '.yml' ||
fileExtension === '.toml' ||
fileExtension === '.lock' ||
fileExtension === '.ini' ||
fileExtension === '.properties' ||
fileExtension === '.cfg' ||
fileExtension === '.conf' ||
fileExtension === '.config' ||
fileName.startsWith('.env') ||
[
'.eslintrc',
'.babelrc',
'.prettierrc',
'.stylelintrc',
'.watchmanconfig',
'.nycrc',
'.releaserc',
'.gitconfig',
'.editorconfig',
'.npmrc',
'.yarnrc',
'.jshintrc',
'.markdownlintrc',
].includes(fileName);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To prevent silent corruption of other highly structured data and configuration formats, we should expand the isJsonLike check to cover additional common formats such as XML, SVG, CSV, TSV, Terraform/HCL (.tf, .tfvars, .hcl), Terraform state (.tfstate), and Web App Manifests (.webmanifest). These formats are equally vulnerable to aggressive unescaping corruption (e.g., breaking escaped quotes in CSV/TSV or XML attributes).

  const isJsonLike =
    fileExtension.includes('json') ||
    fileExtension === '.ipynb' ||
    fileExtension === '.map' ||
    fileExtension === '.yaml' ||
    fileExtension === '.yml' ||
    fileExtension === '.toml' ||
    fileExtension === '.lock' ||
    fileExtension === '.ini' ||
    fileExtension === '.properties' ||
    fileExtension === '.cfg' ||
    fileExtension === '.conf' ||
    fileExtension === '.config' ||
    fileExtension === '.xml' ||
    fileExtension === '.svg' ||
    fileExtension === '.csv' ||
    fileExtension === '.tsv' ||
    fileExtension === '.tf' ||
    fileExtension === '.tfvars' ||
    fileExtension === '.hcl' ||
    fileExtension === '.tfstate' ||
    fileExtension === '.webmanifest' ||
    fileName.startsWith('.env') ||
    [
      '.eslintrc',
      '.babelrc',
      '.prettierrc',
      '.stylelintrc',
      '.watchmanconfig',
      '.nycrc',
      '.releaserc',
      '.gitconfig',
      '.editorconfig',
      '.npmrc',
      '.yarnrc',
      '.jshintrc',
      '.markdownlintrc',
    ].includes(fileName);

Comment on lines +397 to +412
it('should set isJsonLike to true and aggressiveUnescape to false for .ipynb, .json, map, yaml, toml, and common JSON-based dotfiles', async () => {
const abortSignal = new AbortController().signal;
const filesToTest = [
'notebook.ipynb',
'data.json',
'bundle.js.map',
'workflow.yaml',
'config.yml',
'Cargo.toml',
'.eslintrc',
'.prettierrc',
'.env',
'.env.local',
'config.ini',
'.gitconfig',
];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

We should update the test suite to verify that the newly added structured formats (XML, SVG, CSV, TSV, Terraform/HCL, Terraform state, and Web App Manifests) are also correctly identified as isJsonLike and bypass aggressive unescaping.

    it('should set isJsonLike to true and aggressiveUnescape to false for .ipynb, .json, map, yaml, toml, and common JSON-based dotfiles', async () => {
      const abortSignal = new AbortController().signal;
      const filesToTest = [
        'notebook.ipynb',
        'data.json',
        'bundle.js.map',
        'workflow.yaml',
        'config.yml',
        'Cargo.toml',
        '.eslintrc',
        '.prettierrc',
        '.env',
        '.env.local',
        'config.ini',
        '.gitconfig',
        'data.xml',
        'image.svg',
        'data.csv',
        'data.tsv',
        'main.tf',
        'terraform.tfvars',
        'config.hcl',
        'terraform.tfstate',
        'manifest.webmanifest',
      ];

     from unescaping corruption
   2
   3 Treats standard markup, tabular data, properties, and Terraform/HCL
     configurations as json-like to bypass aggressive unescaping in
     write_file. Updates test matrices to assert correct parameter
     propagation.
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to identify JSON-like files (such as .json, .ipynb, .yaml, and common configuration files) to prevent aggressive unescaping from corrupting their contents. It also updates package-lock.json by removing the peer flag from several dependencies. Feedback on these changes highlights two key areas: first, completely bypassing ensureCorrectFileContent for JSON-like files is risky as it skips essential sanitization (like stripping markdown code block wrappers), and it is recommended to only bypass the aggressive unescaping logic instead; second, the check for .env files should be expanded to include fileExtension === '.env' to correctly match files like production.env.

Comment thread packages/core/src/utils/editCorrector.ts
Comment thread packages/core/src/tools/write-file.ts
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@amelidev amelidev marked this pull request as draft June 22, 2026 18:03

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to identify "JSON-like" files (such as JSON, YAML, TOML, and various configuration dotfiles) and bypasses aggressive unescaping for them to prevent string corruption. Feedback on these changes highlights that the cache check in ensureCorrectFileContent is bypassed for JSON-like files, potentially causing redundant LLM calls. Additionally, it is recommended to avoid using module-level global variables for state to prevent race conditions in concurrent environments.

Comment thread packages/core/src/utils/editCorrector.ts Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@amelidev

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a check to identify JSON-like files (such as .json, .ipynb, .yaml, etc.) and bypasses aggressive unescaping and certain corrections for them to prevent data corruption. However, the implementation in packages/core/src/utils/editCorrector.ts contains critical syntax and runtime errors, including an extra closing brace and references to an undefined cache variable instead of fileContentCorrectionCache. Additionally, it is recommended to avoid module-level global variables for caching to prevent race conditions.

Comment on lines +39 to +56
const cachedResult = cache.get(content);
if (cachedResult) {
return cachedResult;
}

if (isJsonLike) {
if (disableLLMCorrection) {
return content;
}
const correctedContent = await correctStringEscaping(
content,
baseLlmClient,
abortSignal,
);
cache.set(content, correctedContent);
return correctedContent;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There are multiple issues in this block:

  1. An undefined variable cache is referenced on lines 39 and 53, which will cause a runtime ReferenceError.
  2. There is an extra closing brace } on line 56, which will cause a syntax error.
  3. Avoid using module-level global variables for state like fileContentCorrectionCache to prevent race conditions and memory issues in concurrent environments. Instead, use session-scoped or instance-scoped state and leverage standard cache implementations like LruCache.
References
  1. Avoid module-level global variables for state like caches to prevent race conditions and memory issues in concurrent environments. Instead, use session-scoped or instance-scoped state and leverage standard cache implementations like LRUCache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m A medium sized PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants