
v0.6.1

Latest


@SStas released this 05 Apr 00:21
· 1 commit to main since this release

Added

  • generate_on_context() — Third latent primitive on LlamaCppConnector. Autoregressive generation on a caller-owned context with streaming via token_callback, n_ctx awareness for capacity checking, extra_stop_strings for custom stops, and generated_ids in the return tuple. Completes the create/think/generate primitive set alongside create_inference_context() and run_latent_steps().
  • tokenize(add_bos=True) — Optional add_bos parameter on tokenize() across all connectors (ABC, LlamaCpp, HuggingFace, vLLM). Default False preserves backward compatibility. Use True when tokenizing for manual decoding onto a fresh context.
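
The create/think/generate flow described above can be sketched as follows. The method names come from these notes, but the exact signatures, parameter defaults, and return shapes are assumptions for illustration; the stub below only mimics the documented behavior (streaming callback, n_ctx capacity check, closed-context validation, generated_ids in the return tuple).

```python
# Illustrative stand-in for the LlamaCppConnector primitives described
# above. Not the real implementation; signatures are assumed.

class FakeConnector:
    """Minimal stub mimicking the create/think/generate primitive set."""

    def create_inference_context(self, n_ctx=512):
        # Caller-owned context with a fixed capacity (n_ctx).
        return {"n_ctx": n_ctx, "tokens": [], "closed": False}

    def run_latent_steps(self, ctx, steps=20):
        # Latent "think" steps mutate the caller-owned context in place.
        ctx["tokens"].extend([0] * steps)
        return ctx

    def generate_on_context(self, ctx, prompt_ids, token_callback=None,
                            extra_stop_strings=(), max_new_tokens=8):
        # Closed-context validation: raise instead of crashing.
        if ctx["closed"]:
            raise ValueError("context is closed")
        # n_ctx awareness: refuse to overflow the context window.
        needed = len(ctx["tokens"]) + len(prompt_ids) + max_new_tokens
        if needed > ctx["n_ctx"]:
            raise ValueError("context capacity exceeded")
        generated_ids = []
        for i in range(max_new_tokens):
            tok = i + 1  # stand-in for a sampled token id
            generated_ids.append(tok)
            if token_callback:
                token_callback(tok)  # streaming hook
        text = " ".join(str(t) for t in generated_ids)
        return text, generated_ids  # generated_ids in the return tuple


conn = FakeConnector()
ctx = conn.create_inference_context(n_ctx=64)
conn.run_latent_steps(ctx, steps=4)
seen = []
text, ids = conn.generate_on_context(ctx, [7, 8], token_callback=seen.append)
```

The point of the sketch is the division of labor: the context is created and owned by the caller, latent steps and generation both operate on it, and streaming happens through the callback while the full id list is still returned at the end.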

Changed

  • _generate_on_think_ctx refactored — Now delegates to generate_on_context() for the generation loop. Context lifecycle (free/keep) still managed by the wrapper. No behavior change for existing callers.
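
The delegation pattern in this refactor can be sketched roughly as below; function bodies are stand-ins, but the shape matches the notes: the wrapper keeps owning the free/keep lifecycle while the generation loop lives in generate_on_context.

```python
# Hypothetical sketch of the refactor: generation is delegated, but the
# wrapper still decides whether the context is freed or kept.

def generate_on_context(ctx, prompt):
    # Stand-in for the shared generation loop.
    return prompt.upper()

def _generate_on_think_ctx(ctx, prompt, keep_context=False):
    try:
        # Delegate the generation loop; no behavior change for callers.
        return generate_on_context(ctx, prompt)
    finally:
        if not keep_context:
            ctx["freed"] = True  # lifecycle stays with the wrapper

ctx = {"freed": False}
out = _generate_on_think_ctx(ctx, "hello")
```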

Fixed

  • run_latent_steps docstring — Fixed duplicate Args section and incorrect default value (docstring said 10; the actual default is 20).
  • _generate_on_think_ctx n_cur scoping — Fixed potential UnboundLocalError in finally block if generate_on_context raised.
  • Closed context validation — generate_on_context raises ValueError on a closed LlamaCppInferenceContext instead of segfaulting.
  • HF add_bos semantics — tokenize(add_bos=True) on HuggingFace now prepends only the BOS token (not all special tokens via add_special_tokens=True).
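
The n_cur scoping fix follows a standard Python pattern worth spelling out: a variable assigned only inside a try block is unbound in the finally clause if an exception fires before the assignment. A minimal sketch (names from the notes; bodies are stand-ins):

```python
# Illustration of the UnboundLocalError fix: bind n_cur before the try
# block so the finally clause is safe on the error path too.

def _generate_on_think_ctx(fail=False):
    n_cur = 0  # previously assigned only inside try; now bound up front
    try:
        if fail:
            raise ValueError("generate_on_context raised")
        n_cur = 42  # stand-in for the generation cursor
        return n_cur
    finally:
        last_position = n_cur  # safe: n_cur is bound on both paths

ok = _generate_on_think_ctx()
try:
    _generate_on_think_ctx(fail=True)
    raised = False
except ValueError:
    raised = True
```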