166 changes: 166 additions & 0 deletions docs/github-token-integration.md
@@ -0,0 +1,166 @@
# GitHub Token Integration

This document describes the enhanced GitHub token integration feature that automatically handles GitHub API rate limiting during Docker builds.

## Overview

The GitHub token integration automatically detects GitHub API calls in Dockerfiles and pre-fetches the required data using authenticated API requests. This provides several benefits:

- **Higher rate limits**: Authenticated requests get 5,000 requests/hour, versus 60/hour for unauthenticated requests
- **More reliable builds**: No more build failures due to rate limiting
- **Better caching**: Uses `COPY` instead of `ADD` for improved Docker layer caching
- **Automatic fallback**: Falls back to the `--no-github-api` workaround if needed

## How It Works

1. **Detection**: The build system scans Dockerfiles for `ADD https://api.github.com/...` lines
2. **Authentication**: Uses GitHub token from environment variables if available
3. **Pre-fetching**: Fetches commit hashes and other data using authenticated API calls
4. **File creation**: Creates temporary JSON files with the fetched data
5. **Dockerfile modification**: Replaces `ADD` with `COPY` instructions
6. **Build integration**: Passes commit hashes as build arguments
7. **Cleanup**: Automatically removes temporary files after build completion
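The detection in step 1 can be sketched as a regex scan over the Dockerfile (a minimal sketch mirroring the pattern used by the preprocessing code; the capture groups are owner/repo, branch, and destination path):

```python
import re

# Matches: ADD https://api.github.com/repos/<owner>/<repo>/git/refs/heads/<branch> <dest>
# Group 1: owner/repo, group 2: branch name, group 3: destination path in the image.
GITHUB_API_PATTERN = r'ADD https://api\.github\.com/repos/([^/]+/[^/]+)/git/refs/heads/([^\s]+)\s+([^\s]+)'

line = "ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json"
matches = re.findall(GITHUB_API_PATTERN, line)
print(matches)  # [('dusty-nv/sudonim', 'main', '/tmp/sudonim_version.json')]
```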

## Environment Variables

The system supports multiple environment variable names for GitHub tokens:

- `GITHUB_TOKEN` (primary)
- `GITHUB_PAT` (Personal Access Token)
- `GH_TOKEN` (GitHub CLI style)

Set any of these variables before running builds:

```bash
export GITHUB_TOKEN=ghp_your_token_here
./build.sh sudonim
```
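A minimal sketch of this fallback order (the helper name `resolve_github_token` is illustrative; the actual helper in `jetson_containers/network.py` is `get_github_token`):

```python
import os

def resolve_github_token(env=None):
    """Return the first GitHub token found, honoring the documented precedence."""
    if env is None:
        env = os.environ
    for name in ('GITHUB_TOKEN', 'GITHUB_PAT', 'GH_TOKEN'):
        token = env.get(name)
        if token:
            return token
    return None  # caller falls back to unauthenticated requests

print(resolve_github_token({'GH_TOKEN': 'ghp_example'}))  # ghp_example
```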

## Usage Examples

### Basic Usage (with token)

```bash
# Set your GitHub token
export GITHUB_TOKEN=ghp_your_token_here

# Build normally - GitHub API calls will be automatically handled
./build.sh sudonim
./build.sh mlc vllm
```

### Fallback Usage (without token)

```bash
# If no token is set, the system will warn you but continue
./build.sh sudonim

# Or explicitly disable GitHub API usage
./build.sh --no-github-api sudonim
```

### Multiple Packages

```bash
# Build multiple packages with automatic GitHub API handling
./build.sh --multiple sudonim mlc vllm

# Or chain them together
./build.sh sudonim mlc vllm
```

## Build Arguments

When GitHub API calls are pre-processed, the commit hashes are automatically added as build arguments:

- `GITHUB_DUSTY_NV_SUDONIM_COMMIT`: Latest commit SHA for dusty-nv/sudonim
- `GITHUB_DUSTY_NV_MLC_COMMIT`: Latest commit SHA for dusty-nv/mlc
- etc.

These can be used in Dockerfiles or build scripts for version tracking.
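The naming scheme can be sketched as follows (a minimal sketch; the helper `commit_build_arg_name` and the `example/repo` repository are illustrative):

```python
def commit_build_arg_name(owner_repo: str) -> str:
    """Derive the build-arg name carrying a repository's pre-fetched commit SHA."""
    return f"GITHUB_{owner_repo.replace('/', '_').upper()}_COMMIT"

print(commit_build_arg_name('example/repo'))  # GITHUB_EXAMPLE_REPO_COMMIT
```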

## Technical Details

### File Structure

During preprocessing, the system creates:

```
package_directory/
├── Dockerfile # Original Dockerfile
├── Dockerfile.with-github-data # Modified Dockerfile (temporary)
└── .github-api-temp/ # Temporary data directory
├── dusty_nv_sudonim_main.json
└── dusty_nv_mlc_main.json
```

### Dockerfile Transformation

**Before (original):**
```dockerfile
ADD https://api.github.com/repos/dusty-nv/sudonim/git/refs/heads/main /tmp/sudonim_version.json
```

**After (processed):**
```dockerfile
COPY .github-api-temp/dusty_nv_sudonim_main.json /tmp/sudonim_version.json
```
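The rewrite can be sketched as a small helper that produces both the `COPY` line and the JSON payload written into the temp directory (a sketch; `rewrite_add_line` and `example/repo` are illustrative names, and the payload shape mirrors the `{"sha": ..., "ref": ...}` file the preprocessor writes):

```python
def rewrite_add_line(owner_repo, branch, dest_path, commit_sha, temp_dir='.github-api-temp'):
    """Build the COPY replacement line and the JSON payload for one pre-fetched ref."""
    payload = {"sha": commit_sha, "ref": f"refs/heads/{branch}"}
    temp_file = f"{owner_repo.replace('/', '_')}_{branch}.json"
    copy_line = f"COPY {temp_dir}/{temp_file} {dest_path}"
    return copy_line, payload

line, payload = rewrite_add_line('example/repo', 'main', '/tmp/version.json', 'abc1234')
print(line)  # COPY .github-api-temp/example_repo_main.json /tmp/version.json
```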

### Error Handling

- **Token missing**: System continues with unauthenticated requests (may hit rate limits)
- **API failures**: Falls back to original Dockerfile behavior
- **Preprocessing errors**: Logs warnings and continues with original approach

## Troubleshooting

### Rate Limit Errors

If you still see rate limit errors:

1. **Check token**: Verify your GitHub token is set correctly
2. **Token permissions**: Ensure token has `repo` access for private repositories
3. **Token expiration**: GitHub tokens can expire; generate a new one if needed
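To check your remaining quota directly, GitHub exposes a `rate_limit` endpoint; a minimal sketch using only the standard library (`rate_limit_request` is an illustrative helper):

```python
import urllib.request

def rate_limit_request(token=None):
    """Build a (optionally authenticated) request against GitHub's rate-limit endpoint."""
    headers = {'Authorization': f'token {token}'} if token else {}
    return urllib.request.Request('https://api.github.com/rate_limit', headers=headers)

req = rate_limit_request('ghp_example')
print(req.get_header('Authorization'))  # token ghp_example
```

Passing the returned request to `urllib.request.urlopen()` yields JSON whose `resources.core.remaining` field shows how many core API calls you have left.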

### Build Failures

If builds fail with GitHub API issues:

1. **Use fallback**: Add `--no-github-api` flag
2. **Check logs**: Look for GitHub API preprocessing messages
3. **Verify connectivity**: Ensure your system can reach GitHub's API

### Debug Mode

Enable verbose logging to see detailed GitHub API processing:

```bash
./build.sh --verbose --log-level=debug sudonim
```

## Migration from --no-github-api

The `--no-github-api` flag is still supported and works as before. The new integration:

- **Enhances** the existing system rather than replacing it
- **Automatically** handles GitHub API calls when possible
- **Falls back** to the original workaround when needed
- **Maintains** backward compatibility

## Future Enhancements

Potential improvements for future versions:

- **Caching**: Cache GitHub API responses to reduce API calls
- **Batch processing**: Process multiple repositories in single API calls
- **Webhook integration**: Trigger rebuilds on repository updates
- **Rate limit monitoring**: Track and report API usage

## Contributing

To contribute to this feature:

1. **Test thoroughly**: Ensure your changes work with various GitHub API scenarios
2. **Handle errors gracefully**: Always provide fallback behavior
3. **Update documentation**: Keep this document current with any changes
4. **Follow patterns**: Use the existing logging and error handling patterns
2 changes: 1 addition & 1 deletion jetson_containers/build.py
@@ -53,7 +53,7 @@

parser.add_argument('--simulate', action='store_true', help="print out the build commands without actually building the containers")
parser.add_argument('--push', type=str, default='', help="repo or user to push built container image to (no push by default)")
parser.add_argument('--no-github-api', action='store_true', help="disalbe Github API use to force rebuild on new git commits")
parser.add_argument('--no-github-api', action='store_true', help="disable GitHub API use to force rebuild on new git commits (fallback when no token is available)")

parser.add_argument('--log-dir', '--logs', type=str, default=None, help="sets the directory to save container build logs to (default: jetson-containers/logs)")
parser.add_argument('--log-level', type=str, default=None, choices=LogConfig.levels, help="sets the logging verbosity level")
54 changes: 50 additions & 4 deletions jetson_containers/container.py
@@ -28,6 +28,7 @@
split_container_name, query_yes_no, needs_sudo, sudo_prefix,
get_env, get_dir, get_repo_dir
)
from .network import preprocess_dockerfile_for_github_api

_NEWLINE_=" \\\n" # used when building command strings

@@ -179,8 +180,28 @@ def build_container(
if 'dockerfile' in pkg:
cmd = f"{sudo_prefix()}DOCKER_BUILDKIT=0 docker build --network=host" + _NEWLINE_
cmd += f" --tag {container_name}" + _NEWLINE_
if no_github_api:
dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])

dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])
github_build_args = {}

# Try to pre-process GitHub API calls if not explicitly disabled
if not no_github_api:
try:
processed_dockerfile, github_build_args = preprocess_dockerfile_for_github_api(dockerfilepath, pkg['path'])
if github_build_args:
# Merge with existing build args
if 'build_args' not in pkg:
pkg['build_args'] = {}
pkg['build_args'].update(github_build_args)
dockerfilepath = processed_dockerfile
log_info(f"Pre-processed GitHub API calls for {package}")
except Exception as e:
log_warning(f"Failed to pre-process GitHub API calls for {package}: {e}")
# Fall back to original Dockerfile
dockerfilepath = os.path.join(pkg['path'], pkg['dockerfile'])

# Fall back to no_github_api logic if preprocessing failed or was disabled
if no_github_api or (dockerfilepath == os.path.join(pkg['path'], pkg['dockerfile']) and 'ADD https://api.github.com' in open(dockerfilepath, 'r').read()):
> **Copilot AI** (Sep 5, 2025): The file is being opened and read multiple times unnecessarily. The content should be read once and stored in a variable to avoid repeated I/O operations.
with open(dockerfilepath, 'r') as fp:
data = fp.read()
if 'ADD https://api.github.com' in data:
@@ -189,9 +210,9 @@
os.system(f"sed 's|^ADD https://api.github.com|#[minus-github-api]ADD https://api.github.com|' -i {dockerfilepath_minus_github_api}")
cmd += f" --file {os.path.join(pkg['path'], pkg['dockerfile'] + '.minus-github-api')}" + _NEWLINE_
else:
cmd += f" --file {os.path.join(pkg['path'], pkg['dockerfile'])}" + _NEWLINE_
cmd += f" --file {dockerfilepath}" + _NEWLINE_
else:
cmd += f" --file {os.path.join(pkg['path'], pkg['dockerfile'])}" + _NEWLINE_
cmd += f" --file {dockerfilepath}" + _NEWLINE_

cmd += f" --build-arg BASE_IMAGE={base}" + _NEWLINE_

@@ -280,6 +301,31 @@ def build_container(
log_success(f'✅ <b>`jetson-containers build {repo_name}`</b> ({name})')
log_success(f'⏱️ Total build time: {total_duration:.1f} seconds ({total_duration/60:.1f} minutes)')
log_success('=====================================================================================')

# Clean up temporary GitHub API files
try:
for pkg in packages:
pkg_info = find_package(pkg)
> **Copilot AI** (Sep 5, 2025), on lines +306 to +308: The cleanup loop calls find_package() for each package, which may be inefficient if this function performs expensive operations. Consider reusing the package information already loaded during the build process.
>
> Suggested change:
>
>     # Cache package info for all packages to avoid repeated find_package() calls
>     package_infos = {pkg: find_package(pkg) for pkg in packages}
>     try:
>         for pkg in packages:
>             pkg_info = package_infos[pkg]
if 'path' in pkg_info:
temp_dir = os.path.join(pkg_info['path'], '.github-api-temp')
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir)
log_debug(f"Cleaned up temporary GitHub API directory: {temp_dir}")

# Also clean up modified Dockerfiles
modified_dockerfile = os.path.join(pkg_info['path'], pkg_info['dockerfile'] + '.with-github-data')
if os.path.exists(modified_dockerfile):
os.remove(modified_dockerfile)
log_debug(f"Cleaned up modified Dockerfile: {modified_dockerfile}")

# Clean up fallback Dockerfiles
fallback_dockerfile = os.path.join(pkg_info['path'], pkg_info['dockerfile'] + '.minus-github-api')
if os.path.exists(fallback_dockerfile):
os.remove(fallback_dockerfile)
log_debug(f"Cleaned up fallback Dockerfile: {fallback_dockerfile}")
except Exception as e:
log_warning(f"Failed to clean up temporary GitHub API files: {e}")

log_success('=====================================================================================')

return name
91 changes: 89 additions & 2 deletions jetson_containers/network.py
@@ -5,7 +5,7 @@
import time
from typing import Dict, Literal

from .logging import log_error, log_warning, log_verbose
from .logging import log_error, log_warning, log_verbose, log_info


def handle_text_request(url, retries=3, backoff=5):
@@ -69,6 +69,19 @@ def handle_json_request(url: str, headers: Dict[str, str] = None,
return None


def get_github_token():
"""Get GitHub token from environment variables with fallbacks"""
token = os.environ.get('GITHUB_TOKEN') or \
os.environ.get('GITHUB_PAT') or \
os.environ.get('GH_TOKEN')

if not token:
log_warning("No GitHub token found. API calls will be unauthenticated and may hit rate limits.")
log_info("Set GITHUB_TOKEN, GITHUB_PAT, or GH_TOKEN environment variable for higher rate limits.")

return token


@functools.lru_cache(maxsize=None)
def github_api(url: str):
"""
@@ -80,7 +93,7 @@ def github_api(url: str):
Returns:
dict or None: The parsed JSON response data as a dictionary, or None if an error occurs.
"""
github_token = os.environ.get('GITHUB_TOKEN')
github_token = get_github_token()
headers = {'Authorization': f'token {github_token}'} if github_token else None
request_url = f'https://api.github.com/{url}'

@@ -144,3 +157,77 @@ def get_json_value_from_url(url: str, notation: str = None):
return None

return data


def preprocess_dockerfile_for_github_api(dockerfile_path: str, pkg_path: str):
"""
Pre-process Dockerfile to replace GitHub API calls with pre-fetched data.

This function:
1. Detects GitHub API calls in Dockerfiles
2. Pre-fetches the data using authenticated API calls
3. Creates temporary files with the fetched data
4. Modifies the Dockerfile to use COPY instead of ADD

Args:
dockerfile_path (str): Path to the original Dockerfile
pkg_path (str): Path to the package directory

Returns:
tuple: (modified_dockerfile_path, build_args_dict) or (original_path, None) if no changes
"""
import re
import json
import os

with open(dockerfile_path, 'r') as fp:
content = fp.read()

# Find all GitHub API calls
> **Copilot AI** (Sep 5, 2025): The regular expression pattern is complex and could benefit from documentation explaining what each capture group represents (owner/repo, branch, destination path).
github_api_pattern = r'ADD https://api\.github\.com/repos/([^/]+/[^/]+)/git/refs/heads/([^\s]+)\s+([^\s]+)'
matches = re.findall(github_api_pattern, content)

if not matches:
return dockerfile_path, None

# Create a temporary directory for pre-fetched data
temp_dir = os.path.join(pkg_path, '.github-api-temp')
os.makedirs(temp_dir, exist_ok=True)

modified_content = content
build_args = {}

for owner_repo, branch, dest_path in matches:
log_info(f"Pre-fetching GitHub data for {owner_repo}/{branch}")

# Fetch the commit hash using authenticated API
commit_sha = github_latest_commit(owner_repo, branch)

if commit_sha:
# Create a temporary file with the commit data
temp_file = os.path.join(temp_dir, f"{owner_repo.replace('/', '_')}_{branch}.json")
with open(temp_file, 'w') as f:
json.dump({"sha": commit_sha, "ref": f"refs/heads/{branch}"}, f)

# Replace ADD with COPY
old_line = f'ADD https://api.github.com/repos/{owner_repo}/git/refs/heads/{branch} {dest_path}'
new_line = f'COPY .github-api-temp/{os.path.basename(temp_file)} {dest_path}'
modified_content = modified_content.replace(old_line, new_line)

# Add build arg for the commit SHA
build_args[f'GITHUB_{owner_repo.replace("/", "_").upper()}_COMMIT'] = commit_sha

log_info(f"Successfully pre-fetched commit {commit_sha[:8]} for {owner_repo}/{branch}")
else:
log_warning(f"Failed to fetch commit for {owner_repo}/{branch}, keeping original ADD line")

if modified_content != content:
# Write modified Dockerfile
modified_dockerfile = dockerfile_path + '.with-github-data'
with open(modified_dockerfile, 'w') as fp:
fp.write(modified_content)

log_info(f"Created modified Dockerfile: {modified_dockerfile}")
return modified_dockerfile, build_args

return dockerfile_path, None