docs: use centralized sensitive file check script #1120

Open
nilanjan-sikdar wants to merge 7 commits into PalisadoesFoundation:develop from nilanjan-sikdar:feat/Detect-sensitive-files-docs

@nilanjan-sikdar nilanjan-sikdar commented Feb 7, 2026

What kind of change does this PR introduce?

Decoupling Configuration from YAML Workflows

Issue Number:

Fixes #1119

Did you add tests for your changes?

No

Snapshots/Videos:

If relevant, did you update the documentation?

Summary

Refactored the Check-Sensitive-Files job in .github/workflows/pull-request.yml to use the centralized Python script from the PalisadoesFoundation/.github repository.

Moved the sensitive file regex patterns into a new configuration file at .github/workflows/config/sensitive_files.txt for better maintainability and centralization.

Does this PR introduce a breaking change?

No

Other information

Have you read the contributing guide?

Yes

Summary by CodeRabbit

  • Chores
    • CI/CD workflow now pulls shared pipeline scripts from a central location to streamline maintenance.
    • Replaced many shell-based pattern checks with a Python-driven sensitive-file validator invoked only when files changed and still skippable via PR label.
    • Added a curated configuration of sensitive filename patterns used by the validator.
    • Workflow now prepares a Python environment and exits early when there are no relevant changes.

coderabbitai bot commented Feb 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds .github/workflows/config/sensitive_files.txt, which contains seven sensitive-file regex patterns and ends without a trailing newline, and updates .github/workflows/pull-request.yml to check out the centralized CI scripts, set up Python, compute the changed files via git diff --name-only, and invoke sensitive_file_check.py with the new config.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Sensitive Files Configuration: .github/workflows/config/sensitive_files.txt | Adds new config file listing seven sensitive-file regex patterns (^\.github/, ^package\.json$, ^sidebars\.js$, ^docusaurus\.config\.js$, ^babel\.config\.js$, ^CODEOWNERS$, ^LICENSE$). File ends with no newline. |
| Workflow Refactor — Sensitive File Check: .github/workflows/pull-request.yml | Replaces inline Bash sensitive-file detection with: checkout of centralized CI scripts, Python setup (setup-python@v5), compute ALL_CHANGED_FILES via git diff --name-only, and call sensitive_file_check.py with the new config and changed files. Retains PR-label gating and early exits when no base/head or no changed files. |

Sequence Diagram(s)

sequenceDiagram
  participant PR as Pull Request
  participant Runner as GitHub Actions Runner
  participant Repo as Repository
  participant Central as .github-central
  participant Py as sensitive_file_check.py

  PR->>Runner: trigger pull-request workflow
  Runner->>Repo: checkout repository
  Runner->>Central: checkout centralized CI scripts into `.github-central`
  Runner->>Runner: setup Python environment
  Runner->>Runner: compute ALL_CHANGED_FILES (git diff --name-only)
  alt ALL_CHANGED_FILES non-empty
    Runner->>Py: run with config `.github/workflows/config/sensitive_files.txt` and changed files
    Py->>Py: read regex config, match patterns against changed files
    Py-->>Runner: return exit code/report (matches found or none)
    Runner-->>PR: step pass/fail based on script result
  else no changed files
    Runner-->>PR: skip sensitive-file check
  end
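For illustration, the matching step in the diagram above could look roughly like the following Python sketch. It is not the actual sensitive_file_check.py from PalisadoesFoundation/.github, whose interface is not shown in full in this thread. The names load_patterns and find_sensitive are hypothetical, skipping lines that start with # is an assumption, and so is the exit-code convention. The patterns are the seven listed in the Changes summary.

```python
import re
from typing import Iterable, List

# The seven patterns reported in the Changes summary, one regex per line.
CONFIG_TEXT = r"""
^\.github/
^package\.json$
^sidebars\.js$
^docusaurus\.config\.js$
^babel\.config\.js$
^CODEOWNERS$
^LICENSE$
"""


def load_patterns(config_text: str) -> List[re.Pattern]:
    """Compile one regex per non-empty line (comment skipping is an assumption)."""
    patterns = []
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def find_sensitive(files: Iterable[str], patterns: List[re.Pattern]) -> List[str]:
    """Return the changed files that match at least one sensitive pattern."""
    return [name for name in files if any(p.search(name) for p in patterns)]


changed = [
    ".github/workflows/pull-request.yml",
    "docs/intro.md",
    "package.json",
    "src/package.json",
]
matches = find_sensitive(changed, load_patterns(CONFIG_TEXT))
print(matches)  # ['.github/workflows/pull-request.yml', 'package.json']

# Assumed convention: a non-zero exit code fails the workflow step.
exit_code = 1 if matches else 0
```

Because the patterns are anchored with ^, src/package.json is not flagged while the top-level package.json is.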

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | The PR partially addresses issue #1119 requirements. Configuration file is added (sensitive_files.txt) and workflow is refactored, but the required Python script under .github/workflows/scripts is not present in the provided changes. | Add the Python script at .github/workflows/scripts/sensitive_file_check.py that implements argparse-based file checking and integrates with the workflow as required by issue #1119. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: using a centralized sensitive file check script instead of inline patterns. |
| Description check | ✅ Passed | The description follows the template, provides issue reference (#1119), explains the refactoring approach, and indicates no breaking changes or tests were added. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: configuration file addition and workflow refactoring align with issue #1119 objectives to decouple configuration from YAML workflows. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 55-57: The sensitive-files config currently flags all Markdown and
text files via the patterns "\.md$", "\.txt$", and "\.TXT$", which causes false
positives for this docs repo; remove those three patterns from
.github/workflows/config/sensitive_files.txt or replace them with specific
filename anchors (for example use "^README\.md$" or other exact paths) so only
truly sensitive documentation files are protected.
- Around line 42-47: The regex patterns like ".*.pem$", ".*.key$", ".*.cert$",
".*.password$", ".*.secret$", and ".*.credentials$" are ambiguous because the
dot is unescaped; decide whether you intend to match all files with those
extensions or only dot-prefixed hidden files, then update each pattern
accordingly — for all files replace ".*.ext$" with ".*\.ext$" (e.g.,
".*\.pem$"), or for only dot-prefixed files use "^\..*\.ext$" (e.g.,
"^\..*\.pem$") and apply the same change to the other listed patterns.
- Around line 5-6: Remove application-specific patterns that don't exist in this
Docusaurus docs repo: delete the entries `vitest.config.js$` and `src/App.tsx$`
and audit the rest of .github/workflows/config/sensitive_files.txt to keep only
files relevant to this project (e.g., retain pnpm-lock.yaml, Docusaurus config
files, and eslint.config.mjs if present) and remove Python, Vite, Docker, Yarn,
package-lock.json, .node-version, schema.graphql, index.html, and other
non-applicable patterns; ensure the final list matches actual repository files
so the sensitive list reflects real artifacts.

In @.github/workflows/pull-request.yml:
- Around line 136-137: Replace the direct execution of the script with an
explicit Python interpreter call: stop relying on chmod +x and shebangs for
.github-central/.github/workflows/scripts/sensitive_file_check.py and instead
invoke it via python3 (or python) with the same arguments and the
"${ALL_CHANGED_FILES[@]}" array; update the workflow step that currently runs
the script directly to call the interpreter so execution is robust across
runners and file-permission states.
- Around line 86-91: The workflow currently checks out the centralized CI/CD
repo with "repository: PalisadoesFoundation/.github" and "ref: main", which is a
supply-chain risk; update the checkout step to pin "ref" to a specific commit
SHA or an explicit release tag (instead of "main") so the action always uses a
known good commit — locate the checkout step that uses "actions/checkout@v4" and
replace the ref value with the chosen commit SHA or tag, and consider
documenting the chosen SHA/tag in the workflow or repository README for future
updates.
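
The unescaped-dot comment above can be checked directly with Python's re module. This is a small sketch using made-up file names; it shows how the three variants mentioned in the comment (`.*.pem$`, `.*\.pem$`, and `^\..*\.pem$`) behave differently.

```python
import re

loose = re.compile(r".*.pem$")          # unescaped dot: matches any character
strict = re.compile(r".*\.pem$")        # escaped dot: requires a literal "."
hidden_only = re.compile(r"^\..*\.pem$")  # only names that start with "."

# Hypothetical file names, chosen to expose the differences.
names = ["certs/server.pem", "xpem", ".local.pem", "README.md"]

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]
hidden_hits = [n for n in names if hidden_only.search(n)]

print(loose_hits)   # ['certs/server.pem', 'xpem', '.local.pem']
print(strict_hits)  # ['certs/server.pem', '.local.pem']
print(hidden_hits)  # ['.local.pem']
```

The loose form also flags xpem, which has no .pem extension at all; that is the ambiguity the comment asks the author to resolve.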

Contributor

@palisadoes palisadoes left a comment

Please fix

Image

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/pull-request.yml:
- Around line 118-137: Rename the GitHub Actions step named "Get Changed
Unauthorized files" and remove the unused id "changed-unauth-files" to reflect
its actual behavior of detecting and validating sensitive files; for example
change the step name to "Check for sensitive file changes" and delete the id
line so it's clear this step simply runs the sensitive_file_check.py script
(referencing the python script path
.github-central/.github/workflows/scripts/sensitive_file_check.py and the config
.github/workflows/config/sensitive_files.txt) and ensure no other steps rely on
the removed id.
- Around line 131-137: Replace the use of the generic `python` command with
`python3` when invoking the sensitive file check script: update the command that
calls ".github-central/.github/workflows/scripts/sensitive_file_check.py" (the
line that currently runs `python ...`) to use `python3` so the workflow
explicitly runs Python 3 and avoids environments where `python` is missing or
points to an older interpreter.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/pull-request.yml:
- Around line 130-136: The process substitution using mapfile with git diff can
hide git diff failures because the exit code is lost; modify the workflow to run
git diff into a temporary variable or file first, check its exit status, and
only then populate ALL_CHANGED_FILES (the array used by mapfile) from that
captured output; specifically run git diff --name-only --diff-filter=ACMR
"$BASE_SHA" "$HEAD_SHA" and verify its return code, exit or fail the job if it
failed, then feed the captured output into mapfile (or pass the file path)
before invoking python3
.github-central/.github/workflows/scripts/sensitive_file_check.py with --files
"${ALL_CHANGED_FILES[@]}" so failures are not silently swallowed.
- Around line 133-135: The sensitive_files.txt config lost patterns during
migration; update .github/workflows/config/sensitive_files.txt to restore the
original SENSITIVE_PATTERNS from the previous inline list by adding patterns for
".github/" (entire directory), "package.json", "sidebar.js$", ".gitignore",
".prettierignore", ".prettierrc", and "CNAME$" in addition to the existing
entries (docusaurus.config.js, babel.config.js, CODEOWNERS, LICENSE) so the
Python sensitive_file_check script sees all original sensitive targets; ensure
patterns are formatted consistently with the existing entries and include
anchors (e.g., trailing $) where shown.
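
The workflow step itself is Bash, but the principle in the first comment above (do not let a failed git diff pass silently) can be sketched in Python as well. The function name changed_files and its injectable run parameter are hypothetical and exist only to make the sketch testable; the git flags are the ones quoted in the comment.

```python
import subprocess
from typing import Callable, List


def changed_files(
    base_sha: str,
    head_sha: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """List changed paths; raise CalledProcessError if git diff fails."""
    result = run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base_sha, head_sha],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises instead of being swallowed
    )
    return result.stdout.splitlines()


# Stand-in runners so the behaviour can be shown without a git repository.
def fake_success(cmd, **kwargs):
    return subprocess.CompletedProcess(cmd, 0, stdout="a.md\nb.md\n", stderr="")


def fake_failure(cmd, **kwargs):
    raise subprocess.CalledProcessError(128, cmd)


print(changed_files("base", "head", run=fake_success))  # ['a.md', 'b.md']

try:
    changed_files("base", "head", run=fake_failure)
    failure_detected = False
except subprocess.CalledProcessError:
    failure_detected = True
print(failure_detected)  # True
```

With check=True the failure surfaces as an exception, which is the Python counterpart of checking the exit status of git diff before populating ALL_CHANGED_FILES.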

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Line 11: The file .github/workflows/config/sensitive_files.txt is missing a
POSIX trailing newline; open that file and add a single newline character at the
end so the final pattern "^CNAME$" ends with a trailing newline (ensure the file
ends with '\n'), then save the file.
- Line 7: The sensitive_files pattern currently matches '^sidebar\.js$' which is
incorrect for this repo's Docusaurus config; update the pattern in
.github/workflows/config/sensitive_files.txt to match the real filename by
replacing '^sidebar\.js$' with '^sidebars\.js$' (or include both patterns if you
want to cover either naming), ensuring the change targets the pattern string
that currently reads '^sidebar\.js$'.

In @.github/workflows/pull-request.yml:
- Around line 112-116: The "Set up Python" workflow step currently uses
actions/setup-python@v5 with python-version: 3.11 (and runs when if:
steps.check-labels.outputs.skip != 'true'); to avoid the ~5–10s overhead on
ubuntu-latest, either remove this step entirely if your jobs work with the
runner's preinstalled Python 3, or make it conditional only when a specific
Python version is required (e.g., gate it behind a new input/label or an
existing condition) so you keep actions/setup-python@v5 only when you truly need
python-version: 3.11.
- Around line 130-136: The current use of mapfile with here-string causes
ALL_CHANGED_FILES to always contain one empty element when DIFF_OUTPUT is empty,
so replace the flow around DIFF_OUTPUT, mapfile and the python3 call: first
check DIFF_OUTPUT (the variable populated by git diff) for emptiness and skip
invoking sensitive_file_check.py if empty; when feeding DIFF_OUTPUT into mapfile
(which populates ALL_CHANGED_FILES) use a safe input method that does not append
a trailing newline (e.g., use printf '%s' or a proper process substitution) so
filenames aren’t quoted with a stray newline before calling python3
.github-central/.github/workflows/scripts/sensitive_file_check.py with the
ALL_CHANGED_FILES array.
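
The empty-element pitfall in the last comment has a close analogue in Python, shown here only as an illustration since the workflow itself uses Bash. Splitting empty output on a newline yields one empty string, while splitlines() yields an empty list. The helper name parse_changed_files is hypothetical.

```python
from typing import List


def parse_changed_files(diff_output: str) -> List[str]:
    """Turn captured `git diff --name-only` output into a list of paths."""
    # splitlines() returns [] for empty input and drops the trailing newline;
    # the filter also discards any blank lines inside the output.
    return [line for line in diff_output.splitlines() if line]


naive = "".split("\n")          # one empty "file name", like the here-string case
safe = parse_changed_files("")  # no elements at all

print(naive)  # ['']
print(safe)   # []
print(parse_changed_files("docs/a.md\ndocs/b.md\n"))  # ['docs/a.md', 'docs/b.md']
```

An empty list makes the "no changed files" early exit a simple truthiness check before the checker is invoked.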

@nilanjan-sikdar
Author

@palisadoes PTAL

Contributor

@palisadoes palisadoes left a comment

  1. Please apply the changes
  2. Do more research on the purpose and usage of the script. Your work shows that you have not done so.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/pull-request.yml (1)

118-125: ⚠️ Potential issue | 🟡 Minor

Orphaned step output — any_changed=false is written but never consumed.

This step has no id, so echo "any_changed=false" >> $GITHUB_OUTPUT on Line 123 is dead code. No downstream step can reference it. Either remove the write or assign an id if the output is intended for use.

Suggested cleanup
           # Skip if not in PR context
           if [ -z "${{ github.event.pull_request.base.sha }}" ]; then
-            echo "any_changed=false" >> $GITHUB_OUTPUT
+            echo "Not in PR context, skipping sensitive file check."
             exit 0
           fi
🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 1-7: Add missing regex patterns for the sensitive files
.gitignore, .prettierignore, .prettierrc, CNAME, pnpm-lock.yaml, and
eslint.config.mjs by adding one anchored regex per line (e.g., ^\.gitignore$
etc.); remove any redundant duplicate dollar signs so all patterns end with a
single $ (not $$); ensure each pattern is on its own line and that the file ends
with a trailing newline. Target the list of existing symbols in the file (e.g.,
.github/, package.json, sidebars.js, docusaurus.config.js, babel.config.js,
CODEOWNERS, LICENSE) when normalizing the anchors and adding the six new
patterns.
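
The checks requested above (single trailing $, valid regex per line, trailing newline) could be automated with a small lint step. The following is only a sketch under stated assumptions: lint_config is a hypothetical helper, not part of the centralized script, and it treats every non-empty line as one pattern.

```python
import re
from typing import List


def lint_config(text: str) -> List[str]:
    """Return a list of human-readable problems found in the config text."""
    problems = []
    if text and not text.endswith("\n"):
        problems.append("missing trailing newline")
    for number, line in enumerate(text.splitlines(), start=1):
        pattern = line.strip()
        if not pattern:
            continue
        if pattern.endswith("$$"):
            problems.append(f"line {number}: duplicate '$' anchor")
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"line {number}: invalid regex ({exc})")
    return problems


bad = "^LICENSE$$\n^CNAME$"
good = "^\\.gitignore$\n^\\.prettierrc$\n^CNAME$\n"

print(lint_config(bad))   # ['missing trailing newline', "line 1: duplicate '$' anchor"]
print(lint_config(good))  # []
```

A pattern such as ^LICENSE$$ still compiles and matches the same names, so the duplicate anchor is a consistency issue rather than a functional bug.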

@nilanjan-sikdar
Author

@palisadoes PTAL
Regarding the changes to pull-request.yml: I applied the automated suggestions from CodeRabbit, which may have caused the noise.
Regarding the sensitive files: I limited the list strictly to what was already defined in the existing YAML file, as I updated on the admin. Sorry for the inconvenience I caused.

@nilanjan-sikdar
Author

@palisadoes Sorry, I don't want to spam, but if you have time, please review and tell me if any other changes are needed in the PR.

@github-actions

This pull request did not get any activity in the past 10 days and will be closed in 360 days if no update occurs. Please verify it has no conflicts with the develop branch and rebase if needed.

@github-actions github-actions bot added the no-pr-activity No pull request activity label Feb 27, 2026

Development

Successfully merging this pull request may close these issues.

Docs - Python: Script to detect sensitive files with configuration file

2 participants