
v2.1.24#680

Merged
joellidin merged 4 commits into main from dev
Jan 11, 2026

Conversation

@joellidin
Collaborator

@joellidin joellidin commented Jan 11, 2026

  • (hparams) Lower anneal peak LR to 0.25
  • (neurons) Switch anneal mode to shard 2
  • Bump run version

Description

Related Issue(s)

  • Closes #[issue number]

Type of Change

  • Feature (adding new functionality)
  • Fix (resolving a bug or issue)
  • Docs (documentation updates)
  • Refactor (code changes that don't affect functionality)
  • Maintenance (dependency updates or other maintenance)
  • Tests (adding or improving tests)
  • Breaking change (fix or feature with incompatible API changes)
  • Other: _____

Branch Naming

  • My branch follows the project's naming convention (e.g., feature/add-new-capability)

Commit Messages

  • My commits are small, atomic, and have proper commit messages
  • Commit messages are in imperative mood with a capitalized summary under 50 chars

Code Quality

  • I've performed a self-review of my code
  • I've added appropriate docstrings following the project's conventions
  • I've added proper logging where necessary (without trailing periods)
  • I've applied linting and formatting with Ruff
  • My code generates no new warnings

Testing

  • I've added tests for new functionality or bug fixes
  • All tests pass locally with my changes
  • Test coverage has not decreased

Documentation

  • I've updated documentation to reflect my changes
  • I've updated comments in hard-to-understand areas

If this is a breaking change

Screenshots/Examples

Additional Notes

Summary by CodeRabbit

  • Documentation

    • Updated partial-migration example to reference current anneal shard configuration.
  • Configuration

    • Adjusted anneal mode peak learning rate factor from 0.3 to 0.25.
  • Refactor

    • Modified initial shard selection during anneal mode initialization.
  • Chores

    • Version bumped to 2.1.24.


Reduce peak_lr_factor from 0.3 to 0.25 for improved training stability.
Update all neurons to use shard 2 instead of shard 0 for anneal mode.

- Change anneal shard in miner and validator
- Clarify sharded_dataset.py comment
- Update docs example to use shard 2
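
Based on the commit summary above, the startup change presumably looks something like the following sketch. The names `dataset_manager`, `anneal_mode`, `current_shard`, and `current_shard_epoch` are taken from the PR summary; the surrounding class scaffolding here is assumed for illustration and is not the project's actual code:

```python
# Sketch of the anneal-mode shard selection described in this PR.
# The DatasetManager scaffolding is hypothetical; only the variable
# names follow the PR summary.

ANNEAL_SHARD = 2  # previously 0


class DatasetManager:
    def __init__(self, anneal_mode: bool):
        self.anneal_mode = anneal_mode


def select_initial_shard(dataset_manager: DatasetManager) -> tuple[int, int]:
    """Return (current_shard, current_shard_epoch) for neuron startup."""
    if dataset_manager.anneal_mode:
        # In anneal mode, lock to a single fixed shard.
        return ANNEAL_SHARD, 0
    # Normal mode: start from the first shard.
    return 0, 0


print(select_initial_shard(DatasetManager(anneal_mode=True)))   # (2, 0)
print(select_initial_shard(DatasetManager(anneal_mode=False)))  # (0, 0)
```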
@coderabbitai

coderabbitai bot commented Jan 11, 2026

Walkthrough

The PR modifies anneal mode to initialize with shard 2 instead of shard 0 across miner and validator components, updates the documentation example to reflect this change, adjusts the anneal mode hyperparameter peak_lr_factor from 0.3 to 0.25, and increments the package version.

Changes

Cohort / File(s) / Summary

  • Anneal mode shard initialization (neurons/miner.py, neurons/validator.py): changes the initial dataset shard from 0 to 2 during anneal mode startup; sets current_shard = 2 and current_shard_epoch = 0 when dataset_manager.anneal_mode is active
  • Documentation and comments (docs/shared_sharded_dataset.md, src/tplr/sharded_dataset.py): updates the partial-migration example to reference shard 2 (anneal_000002.npy) instead of shard 0; adjusts the inline comment from "we stay on shard 0" to "we stay on one shard"
  • Configuration and versioning (hparams/hparams.json, src/tplr/__init__.py): reduces anneal_mode.peak_lr_factor from 0.3 to 0.25; bumps the package version from 2.1.23 to 2.1.24

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • shivam-MBZUAI
  • amiiir-sarfi

Poem

🐰 From shard zero we hop to shard two,
Anneal mode dances in a different view!
Peak_lr_factor turns down with a bound,
Version 2.1.24—new magic we've found! ✨

🚥 Pre-merge checks: ❌ 3 failed (1 warning, 2 inconclusive)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check ❓ Inconclusive — The title 'v2.1.24' is a version bump indicator that lacks specificity about the actual changes introduced in this pull request. Resolution: consider a more descriptive title that highlights the main changes, such as 'Lower anneal peak LR and switch to shard 2'.
  • Description check ❓ Inconclusive — The description provides bullet-point summaries of changes but lacks rationale and context; most template sections are left unchecked or incomplete. Resolution: expand the description with the 'why' behind the changes, particularly the learning rate adjustment and the shard switch, and fill in relevant checklist items and related issues if applicable.



@codecov

codecov bot commented Jan 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (57.74%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #680   +/-   ##
=======================================
  Coverage   57.74%   57.74%           
=======================================
  Files          27       27           
  Lines        4977     4977           
=======================================
  Hits         2874     2874           
  Misses       2103     2103           
Files with missing lines Coverage Δ
src/tplr/__init__.py 100.00% <100.00%> (ø)
src/tplr/sharded_dataset.py 22.43% <ø> (ø)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
neurons/validator.py (1)

1261-1273: Verify shard-2 artifacts are deployed; hardcoding is acceptable if configuration isn't needed.

Shard-2 artifacts (anneal_000002.npy / sample_ids_anneal_000002.npy) are properly documented in the dataset setup guide, and no stale references to shard 0 in anneal mode exist in the codebase. If these files are guaranteed to be available in all deployment environments, the hardcoding is safe for this PR.

Optional: make anneal shard configurable (consistent with existing anneal_config pattern)

The codebase already uses anneal_config.get(key, default) extensively. Consider adding shard_index as a configuration option:

--- a/neurons/validator.py
+++ b/neurons/validator.py
@@ -1266,7 +1266,7 @@ class Validator:
         # In anneal mode, always use shard 2
         if self.dataset_manager.anneal_mode:
-            current_shard = 2
+            current_shard = anneal_config.get("shard_index", 2)
             shard_epoch = 0
🧹 Nitpick comments (1)
neurons/miner.py (1)

432-447: Miner/validator shard selection is now consistent (anneal shard 2).

Main thing to double-check is that shard 2 is universally present/accessible for anneal-mode datasets (otherwise miners will fail early on startup).

Optional: match validator and read shard index from hparams
--- a/neurons/miner.py
+++ b/neurons/miner.py
@@
-        # In anneal mode, always use shard 2
+        # In anneal mode, lock to a single shard (default: 2)
         if self.dataset_manager.anneal_mode:
-            current_shard = 2
+            current_shard = int(anneal_config.get("shard_index", 2))
             current_shard_epoch = 0
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a948591 and 1fd65b4.

📒 Files selected for processing (6)
  • docs/shared_sharded_dataset.md
  • hparams/hparams.json
  • neurons/miner.py
  • neurons/validator.py
  • src/tplr/__init__.py
  • src/tplr/sharded_dataset.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🔇 Additional comments (4)
hparams/hparams.json (1)

3-11: Anneal LR peak reduction looks fine; verify effective LR peak + stability.

Given outer_learning_rate: 0.4, confirm the new anneal peak (outer_lr * peak_lr_factor) is the intended magnitude and doesn’t under-train during anneal runs.
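
Assuming the effective anneal peak is outer_learning_rate × peak_lr_factor, as the comment above implies, the change can be checked with simple arithmetic:

```python
outer_learning_rate = 0.4  # quoted in the review comment above

# Effective anneal peak LR before and after this PR.
old_peak = outer_learning_rate * 0.30  # previous peak_lr_factor
new_peak = outer_learning_rate * 0.25  # peak_lr_factor after this PR

print(f"old anneal peak LR: {old_peak:.3f}")  # 0.120
print(f"new anneal peak LR: {new_peak:.3f}")  # 0.100
```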

src/tplr/sharded_dataset.py (1)

395-402: Comment update matches new anneal behavior.

src/tplr/__init__.py (1)

23-23: Version bump to 2.1.24 is consistent with the PR title.

docs/shared_sharded_dataset.md (1)

149-154: Documentation correctly reflects shard 2 migration.

The updated section accurately documents the testing workflow with shard 2, aligning with the code changes that now initialize anneal mode using shard 2 instead of shard 0. The rclone commands and file references (anneal_000002.npy) are correct.
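
The shard artifact names cited above follow a zero-padded pattern. As a sketch, a hypothetical helper for building them (the six-digit padding is inferred from anneal_000002.npy / sample_ids_anneal_000002.npy; this function does not exist in the repository):

```python
def anneal_shard_paths(shard: int) -> tuple[str, str]:
    """Build the data and sample-id artifact names for an anneal shard.

    Hypothetical helper; the naming pattern is inferred from the
    file names cited in the review above.
    """
    data = f"anneal_{shard:06d}.npy"
    sample_ids = f"sample_ids_anneal_{shard:06d}.npy"
    return data, sample_ids


print(anneal_shard_paths(2))
# ('anneal_000002.npy', 'sample_ids_anneal_000002.npy')
```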

@joellidin joellidin merged commit 3dfeec1 into main Jan 11, 2026
7 of 8 checks passed
This was referenced Jan 17, 2026