Skip to content

Conversation

@huonw
Copy link
Contributor

@huonw huonw commented Sep 23, 2022

The setup cache includes symlinks to things outside the cache, such as the (system) Python executable. Thus, the cache should only be reused on machines where those parts of the system exactly match up. This switches from trying to compute an appropriate cache key externally to letting the pants bootstrap script do it itself, via the new PANTS_BOOTSTRAP_TOOLS=1 ./pants bootstrap-cache-key (pantsbuild/setup#128).

This is a breaking change, in that it requires updating the checked in pants scripts to support the new tool.

Fixes #5

The setup cache includes symlinks to things outside the cache, such as
the (system) Python executable. Thus, the cache should only be reused
on machines where those parts of the system exactly match up.
PANTS_VERSION=$(grep -r '^pants_version\s*=' pants.toml)
echo "::set-output name=pants_version::$PANTS_VERSION"
PYTHON_VERSION=$(which python)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, pyenv will break this, as which will return a path to a shim, which can be different underlying interpreters at different times.

The only reliable thing to do is actually run the python interpreter and get it to report its version.

But a bigger issue here is that the version returned by which python may not be the one the pants script selects, I think. That script has its own interpreter selection logic. And folks can override that logic anyway.

So we may need to have a way for Pants itself to report the interpreter it's running on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hm, I think there's potentially two dimensions of issues for a CI cache here:

  1. the symlinks to the chosen Python executable (installed somewhere on the system outside Pants' control) in ~/.cache/pants/setup/.../install/bin/
  2. the python version matching

The setup-python action for GHA hits both, since the system Python file path ends up containing the exact python version, and thus changes to the GHA runner config can lead to those symlinks breaking.

So we may need to have a way for Pants itself to report the interpreter it's running on.

Maybe I misunderstand what you're imagining, but I think this will be too late for the setup cache? As in, CI will need to be able to extract this info before bootstrapping pants, or else it'll need to bootstrap before it can determine the cache to use to save on the bootstrapping time...


Some possible options here (just me brainstorming):

  1. make sure this init-pants action to works with the current behaviour of the setup-python action, without necessarily handling cases that might come up in other environments (e.g. pyenv/local dev), meaning the which-based approach of this PR as written may be okay
  2. adjust to something that'll handle more cases of python being a shim
    Suggested change
    PYTHON_VERSION=$(which python)
    PYTHON_VERSION=$(python -c 'import sys; print(sys.executable)')
  3. something even fancier, like having the pants shell script able to report this sort of info before bootstrapping (e.g. ./pants --bootstrap-cache-key or something, that'd print some 'opaque' string like pants_version_2.13.0-python_path_/usr/... and whatever other info is required), which could package up the TOML file parsing too. I don't know how easy it is to propagate changes to that script though.

Since this init-pants action will only ever run in GitHub actions, 1 seems fine to me, but I'm more than happy to take guidance about other approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that whatever python is on the path here may not be the one the pants script selects.

So yeah, we may have to have a ./pants cmd that returns the full path to intepreter it selects. This would actually be useful in general, when debugging pants bootstrapping issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, your 3. seems like the way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I've proposed a possible step towards 3 in pantsbuild/setup#128. I'm imagining this action is then updated to do something like:

    - name: Get Pants bootstrap cache key
      id: pants_version
      shell: bash
      run: |
        if ! grep 'PANTS_BOOTSTRAP_TOOLS' ./pants; then
            echo "This action requires a newer ./pants script. <update instructions here...>"
            exit 1
        fi
        
        PANTS_BOOTSTRAP_CACHE_KEY=$(PANTS_BOOTSTRAP_TOOLS=1 ./pants bootstrap-cache-key)
        echo "::set-output name=pants_bootstrap_cache_key::$PANTS_BOOTSTRAP_CACHE_KEY"

    - name: Cache Pants Setup
      id: cache-pants-setup
      uses: actions/cache@v2
      with:
        path: |
          ~/.cache/pants/setup
        key: pants-setup-${{ steps.pants_version.outputs.pants_bootstrap_cache_key }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made that update, and validated on our repo (seems to work, including the too-old pants script detection).

I've also added a drive-by fix to not run the base-commit look-up when it isn't going to be used, since it was breaking in our repo (due to a lack of gh script, I think).

benjyw pushed a commit to pantsbuild/setup that referenced this pull request Oct 7, 2022
This expands on #128 to add another bootstrap tool subcommand: `bootstrap-version`. This is designed to just be a simple number that's bumped whenever there's a feature scripts might need to query for. The version number is also automatically checked against the (numeric) value of the `PANTS_BOOTSTRAP_TOOLS` key.

For instance, suppose the bootstrap tools already existed (at version 1), and then `bootstrap-cache-key` command was added, the version number would be bumped to 2, and then a consumer (such as pantsbuild/actions#6) could run something like:

```shell
PANTS_BOOTSTRAP_CACHE_KEY=$(PANTS_BOOTSTRAP_TOOLS=2 ./pants bootstrap-cache-key)
```
@huonw huonw changed the title Use the full Python path for the setup cache Use ./pants bootstrap-cache-key for more reliable setup caching Oct 9, 2022
@benjyw benjyw merged commit d692e94 into pantsbuild:main Oct 10, 2022
@huonw huonw deleted the bugfix/5-full-python-path branch October 11, 2022 01:05
cognifloyd added a commit to StackStorm/st2 that referenced this pull request Oct 22, 2022
Github updated the default minor version of python 3.9.x
The cache key did not include the minor version which broke things for us.

Luckily, this was already fixed in pantsbuild/actions#6.
The fix requires an update to the `./pants` entrypoint/bootstrap script
which was updated in pantsbuild/setup#128.

updated `./pants` with:
curl -L -O https://static.pantsbuild.org/setup/pants && chmod +x ./pants
cognifloyd added a commit to StackStorm/st2 that referenced this pull request Oct 24, 2022
update GHA pants-init action to fix build

Github updated the default minor version of python 3.9.x
The cache key did not include the minor version which broke things for us.

Luckily, this was already fixed in pantsbuild/actions#6.
The fix requires an update to the `./pants` entrypoint/bootstrap script
which was updated in pantsbuild/setup#128.

updated `./pants` with:
curl -L -O https://static.pantsbuild.org/setup/pants && chmod +x ./pants
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

init-pants action needs use exact Python version/path in setup cache key

2 participants