feat: python notebooks template repo#1195
Conversation
| import sys | ||
|
|
||
| MINIMUM_MAJOR_VERSION = 3 | ||
| MINIMUM_MINOR_VERSION = 3 |
There was a problem hiding this comment.
This is current with AI platform (updated 12 days ago), but will do!
| sys.version_info.major < MINIMUM_MAJOR_VERSION | ||
| and sys.version_info.minor < MINIMUM_MINOR_VERSION | ||
| ): | ||
| print("Error: Python version less than 3.5") |
There was a problem hiding this comment.
Maybe dynamically generate the string to reflect the constants?
There was a problem hiding this comment.
Ah I understand that this might be old code that was inherited from several people.
There was a problem hiding this comment.
I'm fixing it in next commit!
| @@ -0,0 +1,40 @@ | |||
| steps: | |||
There was a problem hiding this comment.
Can you make sure you have the most up-to-date version of this folder? I see this file is using an older version.
| nbconvert>=6.0 | ||
| papermill | ||
| numpy | ||
| pandas No newline at end of file |
|
Question, I see you are including Does that mean this will overwrite any changes in the downstream repos? Regarding non-code files, such as MD, READMES, .gitignore: I imagine different teams can pick and choose what to integrate from synthtool. They might want their own README's but just want the notebook testing pipeline in .cloud-build. |
|
shall we include just .github and .cloud-build in this PR? @ivanmkc |
…htool into python-notebooks-templates updating pr
|
|
||
| if ( | ||
| sys.version_info.major < MINIMUM_MAJOR_VERSION | ||
| and sys.version_info.minor < MINIMUM_MINOR_VERSION |
There was a problem hiding this comment.
This might need to be or? Wouldn't 2.7 result in the else statement?
Alternatively, you could flip the if/else and check that major and minor are >= expected value. (Probably worth checking for == 3 for the major since who knows what Python 4 will be if that ever happens.
There was a problem hiding this comment.
Done in latest commit!
| @@ -0,0 +1,8 @@ | |||
| steps: | |||
There was a problem hiding this comment.
Probably not 100% necessary, but YAML supports comments, so we really should have a license header here.
| @@ -0,0 +1,20 @@ | |||
| # Google Cloud [PRODUCT] Notebooks | |||
There was a problem hiding this comment.
Probably best to omit this file for now. Or perhaps template it (see other files with .j2 extension)
There was a problem hiding this comment.
Removed with recent commit!
ivanmkc
left a comment
There was a problem hiding this comment.
LGTM except that I flagged some files to remove.
| @@ -0,0 +1,31 @@ | |||
| --- | |||
There was a problem hiding this comment.
This seems unrelated to notebook testing so perhaps remove it?
| @@ -0,0 +1,20 @@ | |||
| --- | |||
There was a problem hiding this comment.
This seems unrelated to notebook testing so perhaps remove it?
| @@ -0,0 +1,12 @@ | |||
| Fixes #<issue_number_goes_here> | |||
There was a problem hiding this comment.
This seems unrelated to notebook testing so perhaps remove it?
There was a problem hiding this comment.
sgtm! removed in latest commit
| else: | ||
| manager.delete(resource) | ||
|
|
||
| print("") |
There was a problem hiding this comment.
Does this print statement need to be here?
There was a problem hiding this comment.
It was probably to add a newline for formatting purposes. Feel free to remove.
There was a problem hiding this comment.
removing, but depending we can add back formatting if it makes things confusing.
| print("") | ||
|
|
||
|
|
||
| is_dry_run = False |
There was a problem hiding this comment.
Will this setting of the dry run override any time it's set to true?
There was a problem hiding this comment.
This is hard-coded, but could be changed to accept an arg. There was no need at the time for an arg, so it was left hard-coded.
There was a problem hiding this comment.
@ivanmkc I've rewritten the cleanup file and updated the source files for the repo in the last two commits if you want to take a look!
| for notebook in notebooks: | ||
| print(f"Running notebook: {notebook}") | ||
|
|
||
| # TODO: Handle cases where multiple notebooks have the same name |
There was a problem hiding this comment.
Is there an issue open tracking this todo somewhere?
There was a problem hiding this comment.
Not at the moment, could track in the synthtool issue tracker.
There was a problem hiding this comment.
agreed, removing and let's track via issues.
| stderr_file=sys.stderr, | ||
| ) | ||
|
|
||
| except Exception: |
There was a problem hiding this comment.
I'm a little skittish about catching ALL exceptions, but at least we're raising them so I think I feel better about that. Do other @googleapis/python-samples-owners have opinions here?
There was a problem hiding this comment.
If an exception isn't caught, then Cloud Build will fail on that step, basically failing all the notebooks (even ones that would have otherwise passed.)
Also, the cleanup and debug steps at the end of the Cloud Build pipeline will never run, thus the user wouldn't be able to debug the error.
There was a problem hiding this comment.
Actually, this whole file seems out of date, it should be matching this file: https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/ExecuteNotebook.py
| @@ -0,0 +1 @@ | |||
| google-cloud-aiplatform No newline at end of file | |||
There was a problem hiding this comment.
For most other requirements files, we prefer to pin to specific versions and have them updated by renovate bot. (You can add this non-standard filename to the renovate.json). Is there a reason we're not doing that here?
There was a problem hiding this comment.
Ooh, I think we just weren't aware of the other standards for synthtool. I'll look into changing this in the PR.
| args: | ||
| - -c | ||
| - 'python3 -m pip install -U -r .cloud-build/cleanup/cleanup-requirements.txt && python3 .cloud-build/cleanup/cleanup.py' | ||
| timeout: 86400s No newline at end of file |
There was a problem hiding this comment.
Is there a reason why we have such a high timeout?
| # limitations under the License. | ||
| steps: | ||
| # Show the gcloud info and check if gcloud exists | ||
| - name: ${_PYTHON_IMAGE} |
There was a problem hiding this comment.
Are the cloud build variables needed documented somewhere for our internal reference? I'm not sure if we do this well in general or not - @dinagraves opinions welcome here 😁
There was a problem hiding this comment.
These would generally be in trigger definition, which doesn't have a lot of visibility outside of the project itself -- unless you're using terraform. However, I would suggest adding defaults in the cloudbuild.yaml - these can be overwritten by anyone when they run it, but it'll provide more context and a good default.
There was a problem hiding this comment.
Yup, adding defaults would work fine @loferris
Also, +1 on documentation.
FYI $_BASE_BRANCH is injected by Cloud Build (see https://cloud.google.com/build/docs/configuring-builds/substitute-variable-values#using_default_substitutions)
$_FORCE_BASE_BRANCH is used because sometimes that variable is not populated (may be a bug), when you rerun a build.
| args: | ||
| - -c | ||
| - 'echo "Download executed notebooks with this command: \"mkdir -p artifacts && gsutil rsync -r gs://${_GCS_ARTIFACTS_BUCKET}/test-artifacts/PR_${_PR_NUMBER}/BUILD_${BUILD_ID} artifacts/\"" && if [ "$(ls -A /workspace/${BUILD_ID}/failure | wc -l)" -ne 0 ]; then exit 1; else exit 0; fi' | ||
| timeout: 86400s No newline at end of file |
There was a problem hiding this comment.
flagging another very high timeout
There was a problem hiding this comment.
It's 24 hours, VertexAI notebooks can take many hours to run.
| @@ -0,0 +1,8 @@ | |||
| ipython>=7.0 | |||
There was a problem hiding this comment.
similar to above comment, can we pin to exact versions and use renovate to keep up with it? or is that less of an issue with a cloud build script compared to with samples maintenance?
There was a problem hiding this comment.
I think that's possible, although I've never used renovate
There was a problem hiding this comment.
@leahecole I think my main question here is if this template will be incorporated into repos downstream, wouldn't that mean that the template itself should contain its own renovate bot so its usable downstream? And if I do that, will that interfere with the renovate set up for this repo? I can add the requirements to this repo's requirements.txt at project root but I'm not sure how that would be passed to all the downstream repos using this template.
There was a problem hiding this comment.
Good question. Will it be incorporated into samples libraries with owlbot, which already have renovatebot enabled? If so, we would want to modify this template. I believe that adding a new renovate.json template would make our 🤖 friend and the folks who maintain it extremely confused.
Or, are these notebook templates going to be used ad-hoc in random repos? If it's going to be used in random repos under the GoogleCloudPlatform and googleapis orgs, we do have renovatebot enabled at the org level and the way to bring it to life would be to create a new renovate.json template (like that other one I linked).
There was a problem hiding this comment.
Other than python-docs-samples I'm not sure any of the other repos already have owlbot enabled, though that would be our next step! However, both of the main repos currently are under GCP not googleapis.
Would the suggestion be to add a renovate.json to the root of this template, make sure the right file extensions are flagged, and then leave blank (?) requirements files?
There was a problem hiding this comment.
They all probably have it, but similar to renovate, it can magically be turned on with the addition of its config file (in owlbots case, owlbot.py)
Given that it sounds like the main repos are in the GCP org, I think that we actually might be covered - renovate will match any requirements.txt file now that I think about it, so as long as it's in a repo that has a renovate.json we are good!
There was a problem hiding this comment.
awesome, so I've gone ahead and made sure all dependencies have specific versions and made sure the file names are easy for renovatebot to parse. see latest two commits!
ivanmkc
left a comment
There was a problem hiding this comment.
Blocking on execute_notebooks.
It should match: https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/ExecuteNotebook.py
Can you double-check that all the other files match https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/**, unless there are intentional changes?
| @@ -0,0 +1,160 @@ | |||
| #!/usr/bin/env python | |||
There was a problem hiding this comment.
This file also looks out-of-date, it should match https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/ExecuteChangedNotebooks.py
There was a problem hiding this comment.
Will do - I think it's just out of date because the PR has been in flight for a minute.
ivanmkc
left a comment
There was a problem hiding this comment.
execute_changed_notebooks.py should match
https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/ExecuteChangedNotebooks.py
| entrypoint: /bin/sh | ||
| args: | ||
| - -c | ||
| - 'python3 -m pip freeze && python3 .cloud-build/ExecuteChangedNotebooks.py --test_paths_file "${_TEST_PATHS_FILE}" --base_branch "${_FORCED_BASE_BRANCH}" --output_folder ${BUILD_ID} --variable_project_id ${PROJECT_ID} --variable_region ${_GCP_REGION}' |
There was a problem hiding this comment.
ExecuteChangedNotebooks.py -> execute_changed_notebooks.py
ivanmkc
left a comment
There was a problem hiding this comment.
Not sure how to add it in this repo, but can we add CI tests to test that all of this works?
| # limitations under the License. | ||
| steps: | ||
| # Show the gcloud info and check if gcloud exists | ||
| - name: ${_PYTHON_IMAGE} |
There was a problem hiding this comment.
These would generally be in trigger definition, which doesn't have a lot of visibility outside of the project itself -- unless you're using terraform. However, I would suggest adding defaults in the cloudbuild.yaml - these can be overwritten by anyone when they run it, but it'll provide more context and a good default.
| entrypoint: /bin/sh | ||
| args: | ||
| - -c | ||
| - 'gcloud config list' |
There was a problem hiding this comment.
Is this going to work? I guess I don't know what your python image is. Usually I would use the gcloud SDK image for this. If you're using something custom, I would add some comments about it.
There was a problem hiding this comment.
It works. The image has the following requirements: https://docs.google.com/document/d/1MjbMDrm4GvfLJfqnPmdTUS8dueKdRAUsn5bmm2aCoZo/edit?resourcekey=0-LE_XfwNqTclOKG8hbFnxrQ#bookmark=id.412ps3ifvobk
There was a problem hiding this comment.
leaving as is for now
| args: | ||
| - -c | ||
| - 'gcloud config list' | ||
| # Check the Python version |
There was a problem hiding this comment.
Do all of these need to be separate steps? Since you're using the shell entrypoint, you could combine some of them, unless you want them to be distinct failures.
There was a problem hiding this comment.
@ivanmkc what was your thinking on combining these?
There was a problem hiding this comment.
They are kept separate for readability.
| entrypoint: /bin/sh | ||
| args: | ||
| - -c | ||
| - 'python3 -m pip freeze && python3 .cloud-build/ExecuteChangedNotebooks.py --test_paths_file "${_TEST_PATHS_FILE}" --base_branch "${_FORCED_BASE_BRANCH}" --output_folder ${BUILD_ID} --variable_project_id ${PROJECT_ID} --variable_region ${_GCP_REGION}' |
There was a problem hiding this comment.
I would recommend splitting this up on separate lines since it's quite long. See example here: https://github.com/dinagraves/cloud-build-empathy-solution/blob/main/cloudbuild.yaml#L36
There was a problem hiding this comment.
changed in last commit - I followed that pattern for the cited example as I didn't see more multi-line commands.
| - 'python3 -m pip freeze && python3 .cloud-build/ExecuteChangedNotebooks.py --test_paths_file "${_TEST_PATHS_FILE}" --base_branch "${_FORCED_BASE_BRANCH}" --output_folder ${BUILD_ID} --variable_project_id ${PROJECT_ID} --variable_region ${_GCP_REGION}' | ||
| env: | ||
| - 'IS_TESTING=1' | ||
| # Manually copy artifacts to GCS |
There was a problem hiding this comment.
You can use the built-in artifacts feature: https://github.com/GoogleCloudPlatform/cloud-build-samples/blob/main/python-example-flask/cloudbuild.yaml#L54
There was a problem hiding this comment.
changed in latest commit!
There was a problem hiding this comment.
Hi Dinah - I had to revert this change as it wasn't working for our particular setup!
Agreed, let's add it as an issue for a followup PR and I'll investigate with Synthtool folks. |
|
@dinagraves the current commit reflects some adjustments based on your comments and @ivanmkc's input! would love any additional feedback before I ping folks on approvals! |
chingor13
left a comment
There was a problem hiding this comment.
Is this used by multiple repositories? If not, we can just manage the files in the one python-notebooks repository rather than indirectly managing the files here.
|
@chingor13 It's being used in multiple repositories and hopefully many more. |
bcoe
left a comment
There was a problem hiding this comment.
I'm very much in support of codifying how notebooks are created.
A few non blocking questions:
- it looks like unlike other templates this will use Cloud Build for testing, rather than kokoro?
- where will the environment variables be set, such as project name, for running the notebook tests.
- what project will the notebook tests run against?
I'm going to make one more commit to fix a little bug I found on my testing branch before merging! |
|
I just came here to catch up on the comments and am thrilled this is merged!! Great work @loferris !!! 😁 |
|
(and great work many many reviewers!! 😀 ) |
Adding a template repo for Python notebooks, updating common.py