
add a backend for running codespell as a linter #22989

Open
cburroughs wants to merge 4 commits into pantsbuild:main from cburroughs:csb/codespell

@cburroughs
Contributor

To quote from https://github.com/codespell-project/codespell: "Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code". The intent is to have few enough false positives that it can be used as a linter. When I run it on a $DAYJOB repo, it picks up all sorts of embarrassments like:

```
lastest ==> latest, last
worfklow ==> workflow
imapct ==> impact
Nmber ==> Number
```
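The low false-positive rate comes from how codespell works: per its README, it does not check words against a complete dictionary, but instead looks for a set of known common misspellings. The toy sketch below illustrates only that lookup idea. The four-entry `MISSPELLINGS` table and the `find_misspellings` helper are hypothetical and built from the examples above; this is neither codespell's code nor its dictionary format.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical excerpt: known misspelling -> suggested corrections.
# codespell ships a much larger dictionary in its own format.
MISSPELLINGS: Dict[str, List[str]] = {
    "lastest": ["latest", "last"],
    "worfklow": ["workflow"],
    "imapct": ["impact"],
    "nmber": ["number"],
}

WORD_RE = re.compile(r"[A-Za-z]+")


def find_misspellings(text: str) -> List[Tuple[int, str, List[str]]]:
    """Return (line_number, word, suggestions) for every known misspelling."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            suggestions = MISSPELLINGS.get(word.lower())
            if suggestions:
                # Mirror a leading capital from the original word.
                if word[0].isupper():
                    suggestions = [s.capitalize() for s in suggestions]
                hits.append((lineno, word, suggestions))
    return hits


sample = "Check the lastest worfklow.\nNmber of runs: 3\n"
for lineno, word, fixes in find_misspellings(sample):
    print(f"{lineno}: {word} ==> {', '.join(fixes)}")
```

Any word missing from the table is left alone, even if it is a real typo. That trade-off is what makes this kind of tool quiet enough to run as a linter.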

LLM Disclosure: I wrote this as an experiment in leaning hard on Claude to understand Pants backends. The vast majority of the code was generated by Claude. I had it iterate on the code, then made minor cleanup edits myself. My first prompt was:

```
> We are going to create a new backend for https://pypi.org/project/codespell  in the Pants build system under
src/python/pants/backend/tools/backend.  Look at these SHAs for examples of adding new backends
f6e51c2873d51df2b63853a0b8db13b4e94292f3
cb63bba66817677a1dcb862c150e6fc7ca9f96dd
9465d3d75091d7ca44bbfc492c09e3c6d418a4e8
```

I went back and forth a bunch on partitioning strategies. It seemed to me that users expect to have multiple config files and for things to Just Work. It was less clear whether the expected behavior is "use the nearest config file" or "magically merge them". So I went with per-config partitioning, but learned in the process that most backends use a single partition.
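The "use the nearest config file" reading of per-config partitioning can be sketched in plain Python. This is a minimal illustration, not the PR's `partition_inputs`. The helper names and the `CONFIG_NAMES` lookup order are hypothetical. The sketch also assumes any file with one of those names is a codespell config, whereas a real `setup.cfg` or `pyproject.toml` may not contain codespell settings at all.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional, Set

# Assumed lookup order within a single directory; not taken from the PR.
CONFIG_NAMES = (".codespellrc", "setup.cfg", "pyproject.toml")


def nearest_config(path: str, configs: Set[str]) -> Optional[str]:
    """Return the config file in the closest ancestor directory of `path`, if any."""
    for parent in PurePosixPath(path).parents:
        for name in CONFIG_NAMES:
            # At the repo root, PurePosixPath(".") / name collapses to just `name`.
            candidate = str(parent / name)
            if candidate in configs:
                return candidate
    return None


def partition_by_nearest_config(
    files: Iterable[str], configs: Set[str]
) -> Dict[Optional[str], List[str]]:
    """Group files so that each group runs with exactly one config (None = no config)."""
    partitions: Dict[Optional[str], List[str]] = defaultdict(list)
    for f in sorted(files):
        partitions[nearest_config(f, configs)].append(f)
    return dict(partitions)


configs = {".codespellrc", "src/legacy/setup.cfg"}
files = ["src/legacy/sub/util.py", "README.md", "src/app/main.py", "src/legacy/old.py"]
print(partition_by_nearest_config(files, configs))
```

Under "nearest wins", files under `src/legacy/` use only `src/legacy/setup.cfg`, and nothing from the root `.codespellrc` is inherited. The "magically merge them" alternative would instead need to layer settings from every ancestor config. Either way, each resulting partition becomes one codespell invocation.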

  • partition_inputs is long and hard to follow...
  • But! It is almost identical to yamllint's. I think we are lacking a good abstraction for config-based partitioning, and if we had one, partition_inputs could be much simpler.
  • codespell uses different flags depending on the format of the config file, which adds some more incidental conditionals.
  • But on the third hand, I don't think a human would bother supporting such a complicated partitioning strategy just to check some words.
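The format-dependent flags mentioned above come from codespell's CLI. Its README documents `--config` for an INI-style file (`setup.cfg` or `.codespellrc`) and `--toml` for a `pyproject.toml`. Here is a minimal sketch of that conditional. Both helper names are hypothetical, and the bare `codespell` argv is only illustrative, since a Pants backend would run the tool from its own resolved environment.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def config_args(config_path: str) -> List[str]:
    """Choose the codespell flag that matches the config file's format (sketch)."""
    if PurePosixPath(config_path).suffix == ".toml":
        # pyproject.toml-style config is passed with --toml.
        return ["--toml", config_path]
    # INI-style config (setup.cfg, .codespellrc) is passed with --config.
    return ["--config", config_path]


def codespell_argv(config_path: str, files: Sequence[str]) -> List[str]:
    """Build a hypothetical argv for one partition: one config, many files."""
    return ["codespell", *config_args(config_path), *files]


print(codespell_argv("src/legacy/setup.cfg", ["src/legacy/old.py"]))
print(codespell_argv("pyproject.toml", ["README.md"]))
```

Isolating the format check in one small function keeps the branching out of the partitioning logic. This is one way to contain the "incidental conditionals" described in the bullet.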

This is a real backend that I'd like to land and use, but I'm neutral on which partitioning strategy is best to keep.

@cburroughs cburroughs self-assigned this Jan 8, 2026


```python
class CodespellRequest(LintFilesRequest):
    tool_subsystem = Codespell  # type: ignore[assignment]
```
Contributor Author

We do this in a bunch of places:

```
src/python/pants/backend/adhoc/code_quality_tool.py:        class CodeQualityProcessingRequest(LintFilesRequest):
src/python/pants/backend/adhoc/code_quality_tool.py-            tool_subsystem = CodeQualityToolInstance  # type: ignore[assignment]
--
src/python/pants/backend/project_info/regex_lint.py:class RegexLintRequest(LintFilesRequest):
src/python/pants/backend/project_info/regex_lint.py-    tool_subsystem = RegexLintSubsystem  # type: ignore[assignment]
--
src/python/pants/backend/tools/codespell/rules.py:class CodespellRequest(LintFilesRequest):
src/python/pants/backend/tools/codespell/rules.py-    tool_subsystem = Codespell  # type: ignore[assignment]
--
src/python/pants/backend/tools/trufflehog/rules.py:class TrufflehogRequest(LintFilesRequest):
src/python/pants/backend/tools/trufflehog/rules.py-    tool_subsystem = Trufflehog  # type: ignore[assignment]
--
src/python/pants/backend/tools/yamllint/rules.py:class YamllintRequest(LintFilesRequest):
src/python/pants/backend/tools/yamllint/rules.py-    tool_subsystem = Yamllint  # type: ignore[assignment]
```

@cburroughs
Contributor Author

What this currently looks like in this repo: https://gist.github.com/cburroughs/30e3752899a39b769abe44b5435092be

@cburroughs cburroughs marked this pull request as ready for review January 8, 2026 20:58
@cburroughs
Contributor Author

#21937 has an example of sophisticated partitioning

@cburroughs cburroughs changed the title add a backend for running codespell a linter add a backend for running codespell as a linter Jan 21, 2026
@cburroughs
Contributor Author

Thanks for the contribution. We've just branched for 2.31.x, so if this pull request is merged now it will come out in 2.32.x; please move the release notes updates to docs/notes/2.32.x.md if that's appropriate.
