Skip to content

Conversation

@tokk-nv
Copy link
Contributor

@tokk-nv tokk-nv commented Aug 24, 2025

Problem Statement

This PR addresses the persistent and frustrating GitHub API rate limiting issue that plagues Docker builds:

ADD failed: failed to GET https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main with status 403 Forbidden: {"message":"API rate limit exceeded for 216.228.112.22. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)","documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}

Current Impact:

Solution

Implement intelligent GitHub token integration that automatically handles API rate limiting while maintaining all existing functionality.

Key Features

🔑 Enhanced GitHub Token Detection

  • Support for multiple environment variables: GITHUB_TOKEN, GITHUB_PAT, GH_TOKEN
  • Intelligent fallback with helpful warnings when tokens are missing
  • Clear guidance on rate limit implications

🎯 Automatic Dockerfile Pre-processing

  • Detects ADD https://api.github.com/... lines in Dockerfiles
  • Pre-fetches commit data using authenticated API calls (5000 requests/hour vs 60)
  • Transforms ADD instructions to COPY with pre-fetched data
  • Injects commit hashes as build arguments for version tracking

🔄 Seamless Fallback System

  • Success path: Uses enhanced approach with pre-fetched data
  • Fallback path: Automatically falls back to existing --no-github-api workaround
  • Explicit bypass: --no-github-api flag still works exactly as before
  • No build failures due to GitHub API issues

♻️ Automatic Cleanup

  • Removes temporary files after builds
  • Maintains a clean package directory state
  • No file accumulation over time

User Experience

With Valid Token (Recommended)

export GITHUB_TOKEN=ghp_your_token_here
./build.sh sudonim  # Works automatically with enhanced caching

Without Token (Graceful Fallback)

./build.sh sudonim  # Automatically falls back with helpful warnings

Explicit Fallback (Always Works)

./build.sh --no-github-api sudonim  # Original behavior unchanged

Technical Implementation

Dockerfile Transformation

Before:

ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json

After:

COPY .github-api-temp/dusty-nv_sudonim_main.json /tmp/sudonim_version.json

Build Arguments

Automatically adds commit hashes as build arguments:

- Add get_github_token() function with multiple environment variable support
- Implement preprocess_dockerfile_for_github_api() for automatic GitHub API call handling
- Integrate GitHub token authentication into build_container() function
- Replace ADD https://api.github.com calls with COPY instructions using pre-fetched data
- Add automatic cleanup of temporary files after builds
- Maintain backward compatibility with --no-github-api flag
- Add comprehensive documentation and test script
- Support multiple token environment variables: GITHUB_TOKEN, GITHUB_PAT, GH_TOKEN

This enhancement provides higher rate limits (5000 vs 60 requests/hour) and more reliable builds by pre-fetching GitHub data using authenticated API calls instead of relying on Docker's ADD instruction during build time.
- Add cleanup logic for Dockerfile.minus-github-api files
- Ensures both types of temporary files are removed after builds
- Prevents accumulation of temporary files in package directories
- Maintains clean package directory state after all build scenarios
@tokk-nv
Copy link
Contributor Author

tokk-nv commented Aug 30, 2025

Tested following on test-github-token-integration branch (merged on the current upstream dev).

# Make sure no GitHub token is set
unset GITHUB_TOKEN GITHUB_PAT GH_TOKEN
   
# Try building a package that uses GitHub API calls
./build.sh pytorch
   
# Set your GitHub token (replace with your actual token)
export GITHUB_TOKEN=ghp_your_actual_token_here
   
# Build the same package
./build.sh pytorch

@tokk-nv tokk-nv marked this pull request as ready for review August 30, 2025 05:18
@OriNachum OriNachum requested a review from Copilot September 5, 2025 15:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements enhanced GitHub token integration for Docker builds to address persistent GitHub API rate limiting issues. The solution automatically detects GitHub API calls in Dockerfiles, pre-fetches data using authenticated requests, and provides graceful fallback behavior.

  • Adds automatic GitHub token detection from multiple environment variables (GITHUB_TOKEN, GITHUB_PAT, GH_TOKEN)
  • Implements Dockerfile preprocessing to replace ADD instructions with COPY using pre-fetched GitHub data
  • Provides seamless fallback to existing --no-github-api workaround when token is unavailable or preprocessing fails

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test_github_token_integration.py Comprehensive test script for validating GitHub token detection and Dockerfile preprocessing functionality
jetson_containers/network.py Core implementation of GitHub token detection and Dockerfile preprocessing logic
jetson_containers/container.py Integration of GitHub API preprocessing into the container build pipeline with cleanup
jetson_containers/build.py Updated help text for --no-github-api flag to reflect enhanced functionality
docs/github-token-integration.md Complete documentation for the new GitHub token integration feature

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])

# Fall back to no_github_api logic if preprocessing failed or was disabled
if no_github_api or (dockerfilepath == os.path.join(pkg['path'], pkg['dockerfile']) and 'ADD https://api.github.com' in open(dockerfilepath, 'r').read()):
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is being opened and read multiple times unnecessarily. The content should be read once and stored in a variable to avoid repeated I/O operations.

Copilot uses AI. Check for mistakes.
with open(dockerfile_path, 'r') as fp:
content = fp.read()

# Find all GitHub API calls
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular expression pattern is complex and could benefit from documentation explaining what each capture group represents (owner/repo, branch, destination path).

Copilot uses AI. Check for mistakes.
Comment on lines +306 to +308
try:
for pkg in packages:
pkg_info = find_package(pkg)
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup loop calls find_package() for each package, which may be inefficient if this function performs expensive operations. Consider reusing the package information already loaded during the build process.

Suggested change
try:
for pkg in packages:
pkg_info = find_package(pkg)
# Cache package info for all packages to avoid repeated find_package() calls
package_infos = {pkg: find_package(pkg) for pkg in packages}
try:
for pkg in packages:
pkg_info = package_infos[pkg]

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant