Several fixes to worker logic #57
Status: Open. C-Achard wants to merge 22 commits into jaap/add_compatibility_tests from cy/pre-release-2.0-review1.
Commits (22):
- 4e8a0e5 Fix worker stop, timestamp handling, and imports (C-Achard)
- ec86f28 Add WorkerState enum for worker lifecycle (C-Achard)
- cad824c Improve DLCLive processor lifecycle and safety (C-Achard)
- d5ea95a Restart worker if dead and clear stop event (C-Achard)
- a4d21f7 Mark worker FAULTED on queue errors (C-Achard)
- 76d9a30 Force-terminate frozen camera threads on stop (C-Achard)
- 16b0d2a Count unique cameras when starting multi-camera (C-Achard)
- ae1754c Improve VideoRecorder shutdown and writer loop (C-Achard)
- 29f2604 Refactor writer thread error handling and stats (C-Achard)
- 64f820e Revert DLCLive imports due to unguarded torch (C-Achard)
- 9b6e1c8 Improve camera thread cleanup and recorder finalization (C-Achard)
- acd15da Update dlc_processor.py (C-Achard)
- 369908e Add lifecycle management & robust writer shutdown (C-Achard)
- 91dff83 Improve lifecycle handling and stale-writer cleanup (C-Achard)
- 2b0a359 Cleanup stale camera workers on start/shutdown (C-Achard)
- d0999a7 Make stop join timeout configurable and add test (C-Achard)
- 5c06b5c Fix locking, camera check, and writer finalization (C-Achard)
- b4e57d6 Add background reaper for stalled DLC worker (C-Achard)
- 5ec6266 Cache _queue to local variable in VideoRecorder (C-Achard)
- 6be4047 Prevent configure while processor running (C-Achard)
- 71bb1aa Cleanup abandoned recorder and set pending reset (C-Achard)
- 60da106 Refactor VideoRecorder.stop lifecycle handling (C-Achard)
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
```diff
@@ -14,6 +14,8 @@
 from typing import Any
 
 import numpy as np
+
+# from dlclive import DLCLive
 from PySide6.QtCore import QObject, Signal
 
 from dlclivegui.config import DLCProcessorSettings, ModelType
```
```diff
@@ -22,9 +24,6 @@
 logger = logging.getLogger(__name__)
 
-# Enable profiling
-ENABLE_PROFILING = True
-
 try:  # pragma: no cover - optional dependency
     from dlclive import (
         DLCLive,  # type: ignore
```
```diff
@@ -34,10 +33,22 @@
     DLCLive = None  # type: ignore[assignment]
 
+
+# Enable profiling to get more detailed timing metrics for debugging and optimization.
+ENABLE_PROFILING = True
+
 
 class PoseBackends(Enum):
     DLC_LIVE = auto()
 
 
+class WorkerState(Enum):
+    STOPPED = auto()
+    STARTING = auto()
+    RUNNING = auto()
+    STOPPING = auto()
+    FAULTED = auto()
+
+
 @dataclass
 class PoseResult:
     pose: np.ndarray | None
```

> Reviewer comment on lines +35 to +36 (Contributor): "still needed?"
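The WorkerState enum added above drives the processor's lifecycle. As a minimal illustration of the intended transitions, here is a hedged sketch; the `ALLOWED` transition table and the `Lifecycle` wrapper are assumptions for illustration only, since the PR enforces transitions implicitly inside `_start_worker_locked`/`_stop_worker` rather than through an explicit table:

```python
import threading
from enum import Enum, auto

class WorkerState(Enum):
    STOPPED = auto()
    STARTING = auto()
    RUNNING = auto()
    STOPPING = auto()
    FAULTED = auto()

# Hypothetical transition table (not in the PR): which moves are sensible.
ALLOWED = {
    WorkerState.STOPPED: {WorkerState.STARTING},
    WorkerState.STARTING: {WorkerState.RUNNING, WorkerState.STOPPING, WorkerState.FAULTED},
    WorkerState.RUNNING: {WorkerState.STOPPING, WorkerState.FAULTED},
    WorkerState.STOPPING: {WorkerState.STOPPED, WorkerState.FAULTED},
    WorkerState.FAULTED: {WorkerState.STOPPED},
}

class Lifecycle:
    """Tiny guard around state transitions, protected by a lock."""

    def __init__(self) -> None:
        self.state = WorkerState.STOPPED
        self._lock = threading.Lock()

    def transition(self, new: WorkerState) -> bool:
        # Reject transitions the table does not allow; return success.
        with self._lock:
            if new in ALLOWED[self.state]:
                self.state = new
                return True
            return False
```

Holding the same lock for every read-modify-write of `state` is what prevents two threads (e.g. the GUI thread calling stop and the worker marking itself FAULTED) from interleaving.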
```diff
@@ -135,11 +146,15 @@ class DLCLiveProcessor(QObject):
     def __init__(self) -> None:
         super().__init__()
+        # DLCLive instance and config
         self._settings = DLCProcessorSettings()
         self._dlc: Any | None = None
         self._processor: Any | None = None
+        # Worker thread and queue
         self._queue: queue.Queue[Any] | None = None
         self._worker_thread: threading.Thread | None = None
+        self._state = WorkerState.STOPPED
+        self._lifecycle_lock = threading.Lock()
         self._stop_event = threading.Event()
         self._initialized = False
```
```diff
@@ -169,7 +184,12 @@ def configure(self, settings: DLCProcessorSettings, processor: Any | None = None
     def reset(self) -> None:
         """Stop the worker thread and drop the current DLCLive instance."""
-        self._stop_worker()
+        stopped = self._stop_worker()
+        if not stopped:
+            logger.warning(
+                "Reset requested but worker thread is still alive; skipping DLCLive reset to avoid potential issues."
+            )
+            return
         self._dlc = None
         self._initialized = False
         with self._stats_lock:
```
```diff
@@ -186,22 +206,34 @@ def reset(self) -> None:
             self._processor_overhead_times.clear()
 
     def shutdown(self) -> None:
-        self._stop_worker()
+        stopped = self._stop_worker()
+        if not stopped:
+            logger.warning(
+                "Shutdown requested but worker thread is still alive; DLCLive instance may not be fully released."
+            )
+            return
         self._dlc = None
         self._initialized = False
 
     def enqueue_frame(self, frame: np.ndarray, timestamp: float) -> None:
-        # Start worker on first frame
-        if self._worker_thread is None:
-            self._start_worker(frame.copy(), timestamp)
-            return
+        frame_c = frame.copy()
+        enq_time = time.perf_counter()
+
+        with self._lifecycle_lock:
+            if self._state in (WorkerState.STOPPING, WorkerState.FAULTED) or self._stop_event.is_set():
+                return
+            t = self._worker_thread
+            if t is None or not t.is_alive():
+                self._start_worker_locked(frame_c, timestamp)
+                return
+
+            q = self._queue  # snapshot under lock
 
         # As long as worker and queue are ready, ALWAYS enqueue
-        if self._queue is None:
+        if q is None:
             return
 
         try:
-            self._queue.put_nowait((frame.copy(), timestamp, time.perf_counter()))
+            q.put_nowait((frame_c, timestamp, enq_time))
             with self._stats_lock:
                 self._frames_enqueued += 1
         except queue.Full:
```
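The enqueue path above snapshots `self._queue` into a local variable under the lock so that a concurrent shutdown setting the attribute to None cannot race the `put_nowait`. A stripped-down sketch of that pattern follows; the `Producer` class and its names are illustrative, not the PR's code:

```python
import queue
import threading

class Producer:
    def __init__(self) -> None:
        # Bounded queue: at most one pending item, like the PR's maxsize=1.
        self._queue: queue.Queue | None = queue.Queue(maxsize=1)
        self._lock = threading.Lock()
        self.enqueued = 0
        self.dropped = 0

    def close(self) -> None:
        # Shutdown path: drop the queue reference under the lock.
        with self._lock:
            self._queue = None

    def submit(self, item) -> None:
        with self._lock:
            q = self._queue  # snapshot under lock; safe even if close() runs next
        if q is None:
            return  # already shut down: silently drop
        try:
            # Never block the producer (e.g. a camera callback thread).
            q.put_nowait(item)
            self.enqueued += 1
        except queue.Full:
            self.dropped += 1  # drop the frame instead of stalling capture
```

Dropping frames on a full bounded queue keeps latency flat: the consumer always sees the freshest work rather than a growing backlog.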
```diff
@@ -259,12 +291,13 @@ def get_stats(self) -> ProcessorStats:
             avg_processor_overhead=avg_proc_overhead,
         )
 
-    def _start_worker(self, init_frame: np.ndarray, init_timestamp: float) -> None:
+    def _start_worker_locked(self, init_frame: np.ndarray, init_timestamp: float) -> None:
+        # lifecycle_lock must already be held
        if self._worker_thread is not None and self._worker_thread.is_alive():
            return
 
        self._queue = queue.Queue(maxsize=1)
-        self._stop_event.clear()
+        self._state = WorkerState.STARTING
        self._worker_thread = threading.Thread(
            target=self._worker_loop,
            args=(init_frame, init_timestamp),
```
```diff
@@ -273,19 +306,34 @@ def _start_worker(self, init_frame: np.ndarray, init_timestamp: float) -> None:
         )
         self._worker_thread.start()
 
-    def _stop_worker(self) -> None:
-        if self._worker_thread is None:
-            return
-
-        self._stop_event.set()
-
-        # Just wait for the timed get() loop to observe the flag and drain
-        self._worker_thread.join(timeout=2.0)
-        if self._worker_thread.is_alive():
-            logger.warning("DLC worker thread did not terminate cleanly")
-
-        self._worker_thread = None
-        self._queue = None
+    def _start_worker(self, init_frame: np.ndarray, init_timestamp: float) -> None:
+        with self._lifecycle_lock:
+            self._start_worker_locked(init_frame, init_timestamp)
+
+    def _stop_worker(self) -> bool:
+        with self._lifecycle_lock:
+            t = self._worker_thread
+            if t is None:
+                self._state = WorkerState.STOPPED
+                self._stop_event.clear()
+                return True
+            self._state = WorkerState.STOPPING
+            self._stop_event.set()
+
+        t.join(timeout=2.0)
+        if t.is_alive():
+            qsize = self._queue.qsize() if self._queue is not None else -1
+            logger.warning("DLC worker thread did not terminate cleanly (qsize=%s)", qsize)
+            with self._lifecycle_lock:
+                self._state = WorkerState.FAULTED
+            return False
+
+        with self._lifecycle_lock:
+            self._worker_thread = None
+            self._queue = None
+            self._state = WorkerState.STOPPED
+            self._stop_event.clear()
+            return True
 
     @contextmanager
     def _timed_processor(self):
```
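The reworked `_stop_worker` is a cooperative shutdown: set the event, join with a timeout, and report success as a bool so that callers like `reset()` can skip tearing down shared state when the thread is wedged. A self-contained sketch of that contract; the timeout value and names are assumptions:

```python
import threading
import time

def run(stop_event: threading.Event) -> None:
    # Worker observes the flag on every iteration instead of blocking forever.
    while not stop_event.is_set():
        time.sleep(0.01)

class Worker:
    def __init__(self) -> None:
        self._stop_event = threading.Event()
        self._thread: threading.Thread | None = threading.Thread(
            target=run, args=(self._stop_event,), daemon=True
        )
        self._thread.start()

    def stop(self, timeout: float = 2.0) -> bool:
        t = self._thread
        if t is None:
            return True  # idempotent: already stopped
        self._stop_event.set()
        t.join(timeout=timeout)
        if t.is_alive():
            # Thread did not exit in time; leave references intact so the
            # caller can decide what is safe to release.
            return False
        self._thread = None
        self._stop_event.clear()
        return True
```

Returning False instead of raising matches the PR's approach: the caller logs a warning and declines to free the DLCLive instance a still-running thread may be using.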
```diff
@@ -328,6 +376,8 @@ def _process_frame(
         Single source of truth for: inference -> (optional) processor timing -> signal emit -> stats.
         Updates: frames_processed, latency, processing timeline, profiling metrics.
         """
+        if self._dlc is None:
+            raise RuntimeError("DLCLive instance is not initialized.")
         # Time GPU inference (and processor overhead when present)
         with self._timed_processor() as proc_holder:
             inference_start = time.perf_counter()
```
```diff
@@ -377,8 +427,6 @@ def _process_frame(
     def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
         try:
             # -------- Initialization (unchanged) --------
-            if DLCLive is None:
-                raise RuntimeError("The 'dlclive' package is required for pose estimation.")
             if not self._settings.model_path:
                 raise RuntimeError("No DLCLive model path configured.")
```
```diff
@@ -403,7 +451,18 @@ def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
             if self._settings.device is not None:
                 options["device"] = self._settings.device
 
-            self._dlc = DLCLive(**options)
+            try:
+                if DLCLive is None:
+                    raise RuntimeError(
+                        "DLCLive class is not available. Ensure the dlclive package is installed and can be imported."
+                    )
+                self._dlc = DLCLive(**options)
+            except Exception as exc:
+                with self._lifecycle_lock:
+                    self._state = WorkerState.FAULTED
+                raise RuntimeError(
+                    f"Failed to initialize DLCLive with model '{self._settings.model_path}': {exc}"
+                ) from exc
 
             # First inference to initialize
             init_inference_start = time.perf_counter()
```
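The import revert plus the in-worker check follow the standard optional-dependency pattern: attempt the import at module load, fall back to None, and raise a descriptive error only when the feature is actually used. A generic sketch of the pattern; `json` stands in here for a heavyweight optional package such as dlclive (json itself is always importable, so the fallback branch is shown for shape only):

```python
try:  # pragma: no cover - optional dependency
    import json  # stand-in for an optional package whose import may fail
except ImportError:
    json = None  # type: ignore[assignment]

def parse_config(text: str) -> dict:
    """Fail at call time with a clear message, not at import time."""
    if json is None:
        raise RuntimeError("The 'json' module is required to parse config files.")
    return json.loads(text)
```

This is exactly why the PR moved the `DLCLive is None` check next to the constructor call: importing the GUI no longer drags in torch, and users without dlclive only hit the error when they start pose estimation.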
```diff
@@ -416,6 +475,8 @@ def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
             self._initialized = True
             self.initialized.emit(True)
+            with self._lifecycle_lock:
+                self._state = WorkerState.RUNNING
 
             total_init_time = time.perf_counter() - init_start
             logger.info(
```
```diff
@@ -435,14 +496,24 @@ def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
                 self.initialized.emit(False)
                 return
 
+            q = (
+                self._queue
+            )  # Assign to local to avoid issues if self._queue is set to None during shutdown while loop is still running.
+            if q is None:
+                logger.warning("Worker started without a queue; exiting")
+                with self._lifecycle_lock:
+                    self._state = WorkerState.FAULTED
+                self.error.emit("Worker started without a queue")
+                return
+
             # -------- Main processing loop: stop-flag + timed get + drain --------
             # NOTE: We never exit early unless _stop_event is set.
             while True:
                 # If stop requested, only exit when queue is empty
                 if self._stop_event.is_set():
-                    if self._queue is not None:
+                    if q is not None:
                         try:
-                            frame, ts, enq = self._queue.get_nowait()
+                            frame, ts, enq = q.get_nowait()
                         except queue.Empty:
                             # NOW it is safe to exit
                             break
```
```diff
@@ -455,18 +526,24 @@ def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
                             self.error.emit(str(exc))
                         finally:
                             try:
-                                self._queue.task_done()
+                                q.task_done()
                             except ValueError:
                                 pass
                         continue  # check stop_event again WITHOUT breaking
 
                 # Normal operation: timed get
                 try:
                     wait_start = time.perf_counter()
-                    item = self._queue.get(timeout=0.05)
+                    item = q.get(timeout=0.05)
                     queue_wait_time = time.perf_counter() - wait_start
                 except queue.Empty:
                     continue
+                except Exception as exc:
+                    logger.exception("Error getting item from queue", exc_info=exc)
+                    with self._lifecycle_lock:
+                        self._state = WorkerState.FAULTED
+                    self.error.emit(str(exc))
+                    break
 
                 try:
                     frame, ts, enq = item
```
```diff
@@ -476,7 +553,7 @@ def _worker_loop(self, init_frame: np.ndarray, init_timestamp: float) -> None:
                     self.error.emit(str(exc))
                 finally:
                     try:
-                        self._queue.task_done()
+                        q.task_done()
                     except ValueError:
                         pass
```
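The main loop's "stop-flag + timed get + drain" shape guarantees no enqueued frame is silently discarded on shutdown: once the stop event is set, the loop keeps pulling with `get_nowait()` until the queue is empty and only then exits, while the timed `get` keeps the flag check responsive during normal operation. A condensed sketch of just that control flow, with illustrative names:

```python
import queue
import threading

def consume(q: queue.Queue, stop: threading.Event, out: list) -> None:
    while True:
        if stop.is_set():
            try:
                item = q.get_nowait()  # drain whatever is still queued
            except queue.Empty:
                break  # NOW it is safe to exit
            out.append(item)
            q.task_done()
            continue  # re-check the stop flag without breaking
        try:
            # Timed get: wakes up every 50 ms to notice a stop request
            # even if no work arrives.
            item = q.get(timeout=0.05)
        except queue.Empty:
            continue
        out.append(item)
        q.task_done()
```

The `continue` after each drained item (instead of `break`) is the subtle part: it loops back to the `stop.is_set()` branch so draining and exiting stay in one code path.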
```diff
@@ -513,6 +590,10 @@ def enqueue(self, frame, ts):
         self._proc.enqueue_frame(frame, ts)
 
     def configure(self, settings: DLCProcessorSettings, scanned_processors: dict, selected_key) -> bool:
+        with self._proc._lifecycle_lock:
+            if self._proc._state != WorkerState.STOPPED:
+                raise RuntimeError("Cannot configure DLCLiveProcessor while it is running. Please stop it first.")
+
         processor = None
         if selected_key is not None and scanned_processors:
             try:
```
```diff
@@ -526,11 +607,9 @@ def configure(self, settings: DLCProcessorSettings, scanned_processors: dict, se
     def start(self):
         self._proc.reset()
         self.active = True
-        self.initialized = False
 
     def stop(self):
         self.active = False
-        self.initialized = False
         self._proc.reset()
         self._last_pose = None
```