Added
generate_on_context() — Third latent primitive on LlamaCppConnector: autoregressive generation on a caller-owned context, with streaming via token_callback, n_ctx-aware capacity checking, extra_stop_strings for custom stop sequences, and generated_ids included in the return tuple. Completes the create/think/generate primitive set alongside create_inference_context() and run_latent_steps().
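A minimal sketch of how the new primitive composes with create_inference_context(). This is a self-contained stub, not the real connector: the parameter names beyond those in the entry above (prompt_ids, max_new_tokens) and the placeholder token decoding are assumptions for illustration.

```python
class LlamaCppConnector:
    """Stub mirroring the described API shape; not the real implementation."""

    def create_inference_context(self, n_ctx=512):
        # Caller-owned context; the real version wraps a llama.cpp context.
        return {"n_ctx": n_ctx, "closed": False}

    def generate_on_context(self, ctx, prompt_ids, max_new_tokens=8,
                            token_callback=None, extra_stop_strings=()):
        # n_ctx awareness: check capacity before generating.
        if len(prompt_ids) + max_new_tokens > ctx["n_ctx"]:
            raise ValueError("prompt + generation would exceed n_ctx")
        generated_ids, text = [], ""
        for i in range(max_new_tokens):
            tok_id, tok_text = 100 + i, f"tok{i} "  # placeholder "decoding"
            generated_ids.append(tok_id)
            text += tok_text
            if token_callback:
                token_callback(tok_text)  # streaming hook
            if any(s in text for s in extra_stop_strings):
                break  # custom stop string hit
        return text, generated_ids

conn = LlamaCppConnector()
ctx = conn.create_inference_context(n_ctx=64)
pieces = []
text, ids = conn.generate_on_context(ctx, [1, 2, 3],
                                     token_callback=pieces.append,
                                     extra_stop_strings=["tok3"])
```

The shape to note is the return tuple carrying generated_ids alongside the text, and the callback firing per token before stop strings are checked.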
tokenize(add_bos=True) — Optional add_bos parameter on tokenize() across all connectors (ABC, LlamaCpp, HuggingFace, vLLM). Default False preserves backward compatibility. Use True when tokenizing for manual decoding onto a fresh context.
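A toy sketch of the add_bos semantics. The tokenizer body and the BOS id (1) are placeholders; only the flag's behavior — off by default, prepend a single BOS when True — follows the entry above.

```python
BOS_ID = 1  # placeholder; the real id comes from the loaded model

def tokenize(text, add_bos=False):
    # Toy tokenizer: one id per whitespace-separated piece.
    ids = [len(w) + 2 for w in text.split()]
    return [BOS_ID] + ids if add_bos else ids

plain = tokenize("hello world")                   # backward-compatible default
with_bos = tokenize("hello world", add_bos=True)  # for decoding onto a fresh context
```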
Changed
_generate_on_think_ctx refactored — Now delegates to generate_on_context() for the generation loop. Context lifecycle (free/keep) is still managed by the wrapper. No behavior change for existing callers.
Fixed
run_latent_steps docstring — Fixed a duplicated Args section and an incorrect documented default (stated 10; the actual default is 20).
_generate_on_think_ctx n_cur scoping — Fixed a potential UnboundLocalError in the finally block if generate_on_context raised.
Closed context validation — generate_on_context now raises ValueError when given a closed LlamaCppInferenceContext instead of segfaulting.
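A sketch of the closed-context guard: raise ValueError up front rather than letting a freed native context segfault. The class body and error message here are illustrative stand-ins.

```python
class LlamaCppInferenceContext:
    """Stub context; the real one wraps a native llama.cpp context."""

    def __init__(self):
        self.closed = False

    def free(self):
        self.closed = True  # real version releases the native context

def generate_on_context(ctx, prompt_ids):
    # Validate before touching native state — a freed context would segfault.
    if ctx.closed:
        raise ValueError("context has been freed; create a new one")
    return list(prompt_ids)  # stand-in for actual generation

ctx = LlamaCppInferenceContext()
ctx.free()
try:
    generate_on_context(ctx, [1, 2])
except ValueError as e:
    err = str(e)
```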
HF add_bos semantics — tokenize(add_bos=True) on HuggingFace now prepends only the BOS token, rather than all special tokens via add_special_tokens=True.