Enhanced GitHub Token Integration for Docker Builds #1322
base: dev
@@ -0,0 +1,166 @@
# GitHub Token Integration

This document describes the enhanced GitHub token integration feature that automatically handles GitHub API rate limiting during Docker builds.

## Overview

The GitHub token integration automatically detects GitHub API calls in Dockerfiles and pre-fetches the required data using authenticated API requests. This provides several benefits:

- **Higher rate limits**: Authenticated requests get 5,000 requests/hour vs 60 for unauthenticated
- **More reliable builds**: No more build failures due to rate limiting
- **Better caching**: Uses `COPY` instead of `ADD` for improved Docker layer caching
- **Automatic fallback**: Falls back to the `--no-github-api` workaround if needed
## How It Works

1. **Detection**: The build system scans Dockerfiles for `ADD https://api.github.com/...` lines
2. **Authentication**: Uses a GitHub token from environment variables if available
3. **Pre-fetching**: Fetches commit hashes and other data using authenticated API calls
4. **File creation**: Creates temporary JSON files with the fetched data
5. **Dockerfile modification**: Replaces `ADD` with `COPY` instructions
6. **Build integration**: Passes commit hashes as build arguments
7. **Cleanup**: Automatically removes temporary files after build completion
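The detection step (1) can be sketched as a regex scan over the Dockerfile text. This is an illustrative approximation of what the preprocessor matches, not the exact implementation; the helper name is hypothetical:

```python
import re

# Matches lines like:
#   ADD https://api.github.com/repos/<owner>/<repo>/git/refs/heads/<branch> <dest>
GITHUB_API_PATTERN = re.compile(
    r'ADD https://api\.github\.com/repos/([^/]+/[^/]+)/git/refs/heads/(\S+)\s+(\S+)'
)

def find_github_api_calls(dockerfile_text: str):
    """Return (owner_repo, branch, dest_path) tuples for each GitHub API ADD line."""
    return GITHUB_API_PATTERN.findall(dockerfile_text)

sample = "ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json"
matches = find_github_api_calls(sample)
```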
## Environment Variables

The system supports multiple environment variable names for GitHub tokens:

- `GITHUB_TOKEN` (primary)
- `GITHUB_PAT` (Personal Access Token)
- `GH_TOKEN` (GitHub CLI style)

Set any of these variables before running builds:

```bash
export GITHUB_TOKEN=ghp_your_token_here
./build.sh sudonim
```
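The lookup order above can be sketched as a simple first-match scan; this is a minimal illustration of the fallback priority, not the exact code (the function name is hypothetical):

```python
import os

# Priority order matches the documented variable names
TOKEN_VARS = ('GITHUB_TOKEN', 'GITHUB_PAT', 'GH_TOKEN')

def resolve_github_token(env=None):
    """Return the first GitHub token found in the environment, or None."""
    env = os.environ if env is None else env
    for name in TOKEN_VARS:
        if env.get(name):
            return env[name]
    return None
```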
## Usage Examples

### Basic Usage (with token)

```bash
# Set your GitHub token
export GITHUB_TOKEN=ghp_your_token_here

# Build normally - GitHub API calls will be handled automatically
./build.sh sudonim
./build.sh mlc vllm
```

### Fallback Usage (without token)

```bash
# If no token is set, the system will warn you but continue
./build.sh sudonim

# Or explicitly disable GitHub API usage
./build.sh --no-github-api sudonim
```

### Multiple Packages

```bash
# Build multiple packages with automatic GitHub API handling
./build.sh --multiple sudonim mlc vllm

# Or chain them together
./build.sh sudonim mlc vllm
```
## Build Arguments

When GitHub API calls are pre-processed, the commit hashes are automatically added as build arguments:

- `GITHUB_DUSTY_NV_SUDONIM_COMMIT`: Latest commit SHA for dusty-nv/sudonim
- `GITHUB_DUSTY_NV_MLC_COMMIT`: Latest commit SHA for dusty-nv/mlc
- etc.

These can be used in Dockerfiles or build scripts for version tracking.
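The naming convention can be sketched as below, assuming every non-alphanumeric character in `owner/repo` (slash, hyphen) is normalized to an underscore, which is what produces the names listed above; the helper is illustrative:

```python
import re

def commit_build_arg_name(owner_repo: str) -> str:
    """Map 'owner/repo' to its commit build-arg name, e.g.
    dusty-nv/sudonim -> GITHUB_DUSTY_NV_SUDONIM_COMMIT."""
    safe = re.sub(r'[^A-Za-z0-9]', '_', owner_repo).upper()
    return f'GITHUB_{safe}_COMMIT'
```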
## Technical Details

### File Structure

During preprocessing, the system creates:

```
package_directory/
├── Dockerfile                     # Original Dockerfile
├── Dockerfile.with-github-data    # Modified Dockerfile (temporary)
└── .github-api-temp/              # Temporary data directory
    ├── dusty_nv_sudonim_main.json
    └── dusty_nv_mlc_main.json
```
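The temporary filenames follow the same normalization as the build-arg names: `owner/repo` plus the branch, with slashes and hyphens replaced by underscores. A minimal sketch under that assumption (the helper name is hypothetical):

```python
def temp_json_name(owner_repo: str, branch: str) -> str:
    """Build the cached-data filename used under .github-api-temp/,
    e.g. dusty-nv/sudonim + main -> dusty_nv_sudonim_main.json."""
    safe = owner_repo.replace('/', '_').replace('-', '_')
    return f'{safe}_{branch}.json'
```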
### Dockerfile Transformation

**Before (original):**

```dockerfile
ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json
```

**After (processed):**

```dockerfile
COPY .github-api-temp/dusty_nv_sudonim_main.json /tmp/sudonim_version.json
```
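The transformation itself is a plain string substitution: the exact `ADD` line is rebuilt from the parsed fields and swapped for a `COPY` of the pre-fetched file. A self-contained sketch (function name illustrative):

```python
def rewrite_add_line(content: str, owner_repo: str, branch: str,
                     dest: str, data_file: str) -> str:
    """Replace a GitHub API ADD instruction with a COPY of the pre-fetched JSON."""
    old = f'ADD https://api.github.com/repos/{owner_repo}/git/refs/heads/{branch} {dest}'
    new = f'COPY .github-api-temp/{data_file} {dest}'
    return content.replace(old, new)

before = 'ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json'
after = rewrite_add_line(before, 'dusty-nv/sudonim', 'main',
                         '/tmp/sudonim_version.json', 'dusty_nv_sudonim_main.json')
```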
### Error Handling

- **Token missing**: The system continues with unauthenticated requests (may hit rate limits)
- **API failures**: Falls back to the original Dockerfile behavior
- **Preprocessing errors**: Logs warnings and continues with the original approach

## Troubleshooting

### Rate Limit Errors

If you still see rate limit errors:

1. **Check the token**: Verify your GitHub token is set correctly
2. **Token permissions**: Ensure the token has `repo` access for private repositories
3. **Token expiration**: GitHub tokens can expire; generate a new one if needed

### Build Failures

If builds fail with GitHub API issues:

1. **Use the fallback**: Add the `--no-github-api` flag
2. **Check logs**: Look for GitHub API preprocessing messages
3. **Verify connectivity**: Ensure your system can reach GitHub's API

### Debug Mode

Enable verbose logging to see detailed GitHub API processing:

```bash
./build.sh --verbose --log-level=debug sudonim
```
## Migration from --no-github-api

The `--no-github-api` flag is still supported and works as before. The new integration:

- **Enhances** the existing system rather than replacing it
- **Automatically** handles GitHub API calls when possible
- **Falls back** to the original workaround when needed
- **Maintains** backward compatibility

## Future Enhancements

Potential improvements for future versions:

- **Caching**: Cache GitHub API responses to reduce API calls
- **Batch processing**: Process multiple repositories in single API calls
- **Webhook integration**: Trigger rebuilds on repository updates
- **Rate limit monitoring**: Track and report API usage

## Contributing

To contribute to this feature:

1. **Test thoroughly**: Ensure your changes work with various GitHub API scenarios
2. **Handle errors gracefully**: Always provide fallback behavior
3. **Update documentation**: Keep this document current with any changes
4. **Follow patterns**: Use the existing logging and error handling patterns
@@ -28,6 +28,7 @@
```python
    split_container_name, query_yes_no, needs_sudo, sudo_prefix,
    get_env, get_dir, get_repo_dir
)
from .network import preprocess_dockerfile_for_github_api

_NEWLINE_ = " \\\n"  # used when building command strings
```
@@ -179,8 +180,28 @@ def build_container(
```python
    if 'dockerfile' in pkg:
        cmd = f"{sudo_prefix()}DOCKER_BUILDKIT=0 docker build --network=host" + _NEWLINE_
        cmd += f" --tag {container_name}" + _NEWLINE_

        dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])
        github_build_args = {}

        # Try to pre-process GitHub API calls if not explicitly disabled
        if not no_github_api:
            try:
                processed_dockerfile, github_build_args = preprocess_dockerfile_for_github_api(dockerfilepath, pkg['path'])
                if github_build_args:
                    # Merge with existing build args
                    if 'build_args' not in pkg:
                        pkg['build_args'] = {}
                    pkg['build_args'].update(github_build_args)
                    dockerfilepath = processed_dockerfile
                    log_info(f"Pre-processed GitHub API calls for {package}")
            except Exception as e:
                log_warning(f"Failed to pre-process GitHub API calls for {package}: {e}")
                # Fall back to the original Dockerfile
                dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])

        # Fall back to the no_github_api workaround if preprocessing was disabled
        # or left the original Dockerfile in place (read the file once, not per check)
        if no_github_api or dockerfilepath == os.path.join(pkg['path'], pkg['dockerfile']):
            with open(dockerfilepath, 'r') as fp:
                data = fp.read()
            if 'ADD https://api.github.com' in data:
```
@@ -189,9 +210,9 @@ def build_container(
```python
                os.system(f"sed 's|^ADD https://api.github.com|#[minus-github-api]ADD https://api.github.com|' -i {dockerfilepath_minus_github_api}")
                cmd += f" --file {os.path.join(pkg['path'], pkg['dockerfile'] + '.minus-github-api')}" + _NEWLINE_
            else:
                cmd += f" --file {dockerfilepath}" + _NEWLINE_
        else:
            cmd += f" --file {dockerfilepath}" + _NEWLINE_

        cmd += f" --build-arg BASE_IMAGE={base}" + _NEWLINE_
```
@@ -280,6 +301,31 @@ def build_container(
```python
    log_success(f'✅ <b>`jetson-containers build {repo_name}`</b> ({name})')
    log_success(f'⏱️ Total build time: {total_duration:.1f} seconds ({total_duration/60:.1f} minutes)')
    log_success('=====================================================================================')

    # Clean up temporary GitHub API files
    try:
        for pkg in packages:
            pkg_info = find_package(pkg)
```
+306
to
+308
|
||||||||||||||||||
| try: | |
| for pkg in packages: | |
| pkg_info = find_package(pkg) | |
| # Cache package info for all packages to avoid repeated find_package() calls | |
| package_infos = {pkg: find_package(pkg) for pkg in packages} | |
| try: | |
| for pkg in packages: | |
| pkg_info = package_infos[pkg] |
@@ -5,7 +5,7 @@
```python
import time
from typing import Dict, Literal

from .logging import log_error, log_warning, log_verbose, log_info


def handle_text_request(url, retries=3, backoff=5):
```
@@ -69,6 +69,19 @@ def handle_json_request(url: str, headers: Dict[str, str] = None,
```python
    return None


def get_github_token():
    """Get a GitHub token from environment variables, with fallbacks"""
    token = os.environ.get('GITHUB_TOKEN') or \
            os.environ.get('GITHUB_PAT') or \
            os.environ.get('GH_TOKEN')

    if not token:
        log_warning("No GitHub token found. API calls will be unauthenticated and may hit rate limits.")
        log_info("Set the GITHUB_TOKEN, GITHUB_PAT, or GH_TOKEN environment variable for higher rate limits.")

    return token


@functools.lru_cache(maxsize=None)
def github_api(url: str):
    """
```
@@ -80,7 +93,7 @@ def github_api(url: str):
```python
    Returns:
        dict or None: The parsed JSON response data as a dictionary, or None if an error occurs.
    """
    github_token = get_github_token()
    headers = {'Authorization': f'token {github_token}'} if github_token else None
    request_url = f'https://api.github.com/{url}'
```
@@ -144,3 +157,77 @@ def get_json_value_from_url(url: str, notation: str = None):
```python
        return None

    return data
```
```python
def preprocess_dockerfile_for_github_api(dockerfile_path: str, pkg_path: str):
    """
    Pre-process a Dockerfile to replace GitHub API calls with pre-fetched data.

    This function:
    1. Detects GitHub API calls in the Dockerfile
    2. Pre-fetches the data using authenticated API calls
    3. Creates temporary files with the fetched data
    4. Modifies the Dockerfile to use COPY instead of ADD

    Args:
        dockerfile_path (str): Path to the original Dockerfile
        pkg_path (str): Path to the package directory

    Returns:
        tuple: (modified_dockerfile_path, build_args_dict), or (original_path, None) if no changes were made
    """
    import re
    import json
    import os

    with open(dockerfile_path, 'r') as fp:
        content = fp.read()

    # Find all GitHub API calls
    github_api_pattern = r'ADD https://api\.github\.com/repos/([^/]+/[^/]+)/git/refs/heads/([^\s]+)\s+([^\s]+)'
    matches = re.findall(github_api_pattern, content)

    if not matches:
        return dockerfile_path, None

    # Create a temporary directory for the pre-fetched data
    temp_dir = os.path.join(pkg_path, '.github-api-temp')
    os.makedirs(temp_dir, exist_ok=True)

    modified_content = content
    build_args = {}

    for owner_repo, branch, dest_path in matches:
        log_info(f"Pre-fetching GitHub data for {owner_repo}/{branch}")

        # Fetch the commit hash using the authenticated API
        commit_sha = github_latest_commit(owner_repo, branch)

        if commit_sha:
            # Normalize 'owner/repo' so filenames and build-arg names contain
            # no slashes or hyphens (e.g. dusty-nv/sudonim -> dusty_nv_sudonim)
            safe_name = owner_repo.replace('/', '_').replace('-', '_')

            # Create a temporary file with the commit data
            temp_file = os.path.join(temp_dir, f"{safe_name}_{branch}.json")
            with open(temp_file, 'w') as f:
                json.dump({"sha": commit_sha, "ref": f"refs/heads/{branch}"}, f)

            # Replace ADD with COPY
            old_line = f'ADD https://api.github.com/repos/{owner_repo}/git/refs/heads/{branch} {dest_path}'
            new_line = f'COPY .github-api-temp/{os.path.basename(temp_file)} {dest_path}'
            modified_content = modified_content.replace(old_line, new_line)

            # Add a build arg for the commit SHA
            build_args[f'GITHUB_{safe_name.upper()}_COMMIT'] = commit_sha

            log_info(f"Successfully pre-fetched commit {commit_sha[:8]} for {owner_repo}/{branch}")
        else:
            log_warning(f"Failed to fetch commit for {owner_repo}/{branch}, keeping the original ADD line")

    if modified_content != content:
        # Write the modified Dockerfile
        modified_dockerfile = dockerfile_path + '.with-github-data'
        with open(modified_dockerfile, 'w') as fp:
            fp.write(modified_content)

        log_info(f"Created modified Dockerfile: {modified_dockerfile}")
        return modified_dockerfile, build_args

    return dockerfile_path, None
```
**Review comment:** The file is being opened and read multiple times unnecessarily. The content should be read once and stored in a variable to avoid repeated I/O operations.