Add SAVED_CHECKPOINT event to Checkpoint handler #3440
vfdev-5 merged 9 commits into pytorch:master
|
@JeevanChevula thanks for the PR. However, let's rework the API of the new feature you are working on:
# checkpoint.py
class CheckpointEvents(EventEnum):
    SAVED_CHECKPOINT = "saved_checkpoint"

class Checkpoint(...):
    SAVED_CHECKPOINT = CheckpointEvents.SAVED_CHECKPOINT
    ...

from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, global_step_from_engine

trainer = ...
evaluator = ...
# Setup Accuracy metric computation on evaluator.
# evaluator.state.metrics contain 'accuracy',
# which will be used to define ``score_function`` automatically.
# Run evaluation on epoch completed event
# ...

to_save = {'model': model}
handler = Checkpoint(
    to_save, '/tmp/models',
    n_saved=2, filename_prefix='best',
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer)
)
evaluator.add_event_handler(Events.COMPLETED, handler)

# ---- New API with Checkpoint.SAVED_CHECKPOINT event: -----
# We should pass the engine and the checkpoint instance to the attached handlers.
@evaluator.on(Checkpoint.SAVED_CHECKPOINT)
def notify_when_saved(eval_engine, chkpt_handler):
    assert eval_engine is evaluator
    assert chkpt_handler is handler
    print("Saved checkpoint:", chkpt_handler.last_checkpoint)
# ---- End of New API with Checkpoint.SAVED_CHECKPOINT event: -----

trainer.run(data_loader, max_epochs=10)
> ["best_model_9_accuracy=0.77.pt", "best_model_10_accuracy=0.78.pt", ]
Let me know what you think. |
|
Thanks for the suggestion. I’ll work on updating the PR to follow the API approach you mentioned with |
|
Implementation Note: Implemented EventEnum-based SAVED_CHECKPOINT event as requested. However, Ignite's event system only supports single-parameter handlers - the originally requested two-parameter signature (handler(engine, checkpoint_handler)) failed during event firing and registration. Current implementation uses single parameter with checkpoint access via engine._current_checkpoint_handler. All 61 core tests pass, confirming functionality works without breaking existing features. The 3 distributed test errors are pre-existing infrastructure issues unrelated to this change. |
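To make the single-parameter approach described in this note concrete, here is a minimal, self-contained sketch. It does not use Ignite at all: `ToyEngine`, `ToyCheckpoint` and `ToyCheckpointEvents` are hypothetical stand-ins written only for illustration, and only the attribute name `_current_checkpoint_handler` and the `"saved_checkpoint"` value come from the discussion above. The actual implementation in the PR may differ.

```python
from collections import defaultdict
from enum import Enum


class ToyCheckpointEvents(Enum):
    # Hypothetical stand-in for the EventEnum-based event discussed in the PR.
    SAVED_CHECKPOINT = "saved_checkpoint"


class ToyEngine:
    """Hypothetical engine that calls handlers with a single argument (itself)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._current_checkpoint_handler = None

    def on(self, event):
        # Decorator that registers a handler for the given event.
        def decorator(fn):
            self._handlers[event].append(fn)
            return fn

        return decorator

    def fire_event(self, event):
        # Every handler receives only the engine.
        for fn in self._handlers[event]:
            fn(self)


class ToyCheckpoint:
    """Hypothetical checkpoint handler; it records a filename instead of saving."""

    SAVED_CHECKPOINT = ToyCheckpointEvents.SAVED_CHECKPOINT

    def __init__(self):
        self.last_checkpoint = None

    def __call__(self, engine, filename):
        self.last_checkpoint = filename  # pretend the file was written
        engine._current_checkpoint_handler = self  # expose the handler via the engine
        engine.fire_event(self.SAVED_CHECKPOINT)


evaluator = ToyEngine()
handler = ToyCheckpoint()
seen = []


@evaluator.on(ToyCheckpoint.SAVED_CHECKPOINT)
def notify_when_saved(engine):
    # Single-parameter handler: reach the checkpoint instance through the engine.
    seen.append(engine._current_checkpoint_handler.last_checkpoint)


handler(evaluator, "best_model_1.pt")
print(seen)
```

The trade-off in this sketch is that the handler signature stays uniform, at the cost of reading a private-looking attribute on the engine to reach the checkpoint instance.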
vfdev-5
left a comment
Thanks for working on this PR @JeevanChevula
I left a few more comments to improve the PR
|
Pushing the current implementation with working SAVED_CHECKPOINT event functionality. I will add proper Google-style docstrings with version directives by Monday, per the contributing guidelines. |
|
@JeevanChevula please rebase your PR branch; it now contains some extra commits |
d500fc8 to d81faa9
d81faa9 to fe4942d
fe4942d to 25e6adf
Updated docstring for CheckpointEvents class to clarify event trigger.
|
@JeevanChevula here is how the docs are rendered for this PR: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/handlers. Thanks for all the updates you made recently. The last thing I think we still have to do is to reorganize the docs on CheckpointEvents a bit. The example you wrote in handlers.rst (https://github.com/pytorch/ignite/pull/3440/files#diff-fed20f17bf0c40747938f76730fe5cdc467c919bd185eeb9fe0870b861379681R38) should be moved into |
|
Hi! Thank you for the detailed feedback on the documentation reorganization. I want to make sure I implement the changes exactly as you envision them. Three specific questions for clarification:
3. SAVED_CHECKPOINT attribute documentation
I’d like to follow your vision precisely, so confirming these points will help me make the changes correctly. |
|
Here are details on your questions:
Point 1: After the last existing example (the “Customise the save_handler” section) but before the .. versionchanged:: notes.
Point 2: Remove the entire section completely (since the example will be moved to the docstring).
Point 3: Add it in an Attributes: section inside the Checkpoint docstring.
Make sure to check the rendered docs (on Netlify: https://deploy-preview-3440--pytorch-ignite-preview.netlify.app/) |
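As a rough illustration of the layout described in these three points, the docstring could be arranged along the following lines. This is only a sketch: `CheckpointDocLayoutSketch` is a hypothetical class, the wording is placeholder text, `X.Y.Z` is not a real version number, and placing Attributes: before Examples: is an assumption that the discussion does not settle.

```python
class CheckpointDocLayoutSketch:
    """Placeholder summary line (hypothetical class, layout illustration only).

    Attributes:
        SAVED_CHECKPOINT: placeholder description of the event fired after a
            checkpoint has been saved.

    Examples:
        Customise the ``save_handler``:

        (existing example elided in this sketch)

        Attach a handler to the ``SAVED_CHECKPOINT`` event:

        (the example moved from handlers.rst would go here)

    .. versionchanged:: X.Y.Z
        Placeholder note; the real version number is not stated in this thread.
    """

    # Placeholder value mirroring the event name used in the discussion.
    SAVED_CHECKPOINT = "saved_checkpoint"


doc = CheckpointDocLayoutSketch.__doc__
print(doc.index("Attributes:") < doc.index("Examples:") < doc.index(".. versionchanged::"))
```

The point of the sketch is the ordering: the moved example sits after the last existing example and before the version notes, as requested in Point 1.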
…VED_CHECKPOINT attribute
|
Hi! I’ve applied the requested changes:
On my Windows machine I wasn’t able to fully render the docs locally due to a Sphinx subprocess issue in my environment. I may be missing a dependency; I'll revisit this when I’m back. In the meantime, the Netlify preview should reflect these changes; if anything doesn’t look right, I’m happy to adjust. I’ll be OOO (Sep 26 – Oct 5, IST). Please feel free to leave comments or push small cleanups; I’ll pick up any remaining fixes as soon as I return. Thanks! |
vfdev-5
left a comment
LGTM, thanks @JeevanChevula !
Fixes #934
This PR adds a "saved_checkpoint" event that fires after successful checkpoint saves.
Usage: