
v0.6.1

Latest


@SStas released this 05 Apr 00:21
· 1 commit to main since this release

Added

  • generate_on_context() — Third latent primitive on LlamaCppConnector. Autoregressive generation on a caller-owned context with streaming via token_callback, n_ctx awareness for capacity checking, extra_stop_strings for custom stops, and generated_ids in the return tuple. Completes the create/think/generate primitive set alongside create_inference_context() and run_latent_steps().
  • tokenize(add_bos=True) — Optional add_bos parameter on tokenize() across all connectors (ABC, LlamaCpp, HuggingFace, vLLM). Default False preserves backward compatibility. Use True when tokenizing for manual decoding onto a fresh context.
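
The create/think/generate flow described above can be sketched as follows. The method names come from these notes, but the exact signatures, parameter defaults, and return shapes are assumptions for illustration; the stub below only mimics the documented behavior (streaming callback, n_ctx capacity check, closed-context validation, generated_ids in the return tuple).

```python
# Illustrative stand-in for the LlamaCppConnector primitives described
# above. Not the real implementation; signatures are assumed.

class FakeConnector:
    """Minimal stub mimicking the create/think/generate primitive set."""

    def create_inference_context(self, n_ctx=512):
        # Caller-owned context with a fixed capacity (n_ctx).
        return {"n_ctx": n_ctx, "tokens": [], "closed": False}

    def run_latent_steps(self, ctx, steps=20):
        # Latent "think" steps mutate the caller-owned context in place.
        ctx["tokens"].extend([0] * steps)
        return ctx

    def generate_on_context(self, ctx, prompt_ids, token_callback=None,
                            extra_stop_strings=(), max_new_tokens=8):
        # Closed-context validation: raise instead of crashing.
        if ctx["closed"]:
            raise ValueError("context is closed")
        # n_ctx awareness: refuse to overflow the context window.
        needed = len(ctx["tokens"]) + len(prompt_ids) + max_new_tokens
        if needed > ctx["n_ctx"]:
            raise ValueError("context capacity exceeded")
        generated_ids = []
        for i in range(max_new_tokens):
            tok = i + 1  # stand-in for a sampled token id
            generated_ids.append(tok)
            if token_callback:
                token_callback(tok)  # streaming hook
        text = " ".join(str(t) for t in generated_ids)
        return text, generated_ids  # generated_ids in the return tuple


conn = FakeConnector()
ctx = conn.create_inference_context(n_ctx=64)
conn.run_latent_steps(ctx, steps=4)
seen = []
text, ids = conn.generate_on_context(ctx, [7, 8], token_callback=seen.append)
```

The point of the sketch is the division of labor: the context is created and owned by the caller, latent steps and generation both operate on it, and streaming happens through the callback while the full id list is still returned at the end.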

Changed

  • _generate_on_think_ctx refactored — Now delegates to generate_on_context() for the generation loop. Context lifecycle (free/keep) still managed by the wrapper. No behavior change for existing callers.
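
The delegation pattern in this refactor can be sketched roughly as below; function bodies are stand-ins, but the shape matches the notes: the wrapper keeps owning the free/keep lifecycle while the generation loop lives in generate_on_context.

```python
# Hypothetical sketch of the refactor: generation is delegated, but the
# wrapper still decides whether the context is freed or kept.

def generate_on_context(ctx, prompt):
    # Stand-in for the shared generation loop.
    return prompt.upper()

def _generate_on_think_ctx(ctx, prompt, keep_context=False):
    try:
        # Delegate the generation loop; no behavior change for callers.
        return generate_on_context(ctx, prompt)
    finally:
        if not keep_context:
            ctx["freed"] = True  # lifecycle stays with the wrapper

ctx = {"freed": False}
out = _generate_on_think_ctx(ctx, "hello")
```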

Fixed

  • run_latent_steps docstring — Fixed duplicate Args section and incorrect default value (docstring said 10; the actual default is 20).
  • _generate_on_think_ctx n_cur scoping — Fixed potential UnboundLocalError in finally block if generate_on_context raised.
  • Closed context validation — generate_on_context raises ValueError on a closed LlamaCppInferenceContext instead of segfaulting.
  • HF add_bos semantics — tokenize(add_bos=True) on HuggingFace now prepends only the BOS token (not all special tokens via add_special_tokens=True).
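
The n_cur scoping fix follows a standard Python pattern worth spelling out: a variable assigned only inside a try block is unbound in the finally clause if an exception fires before the assignment. A minimal sketch (names from the notes; bodies are stand-ins):

```python
# Illustration of the UnboundLocalError fix: bind n_cur before the try
# block so the finally clause is safe on the error path too.

def _generate_on_think_ctx(fail=False):
    n_cur = 0  # previously assigned only inside try; now bound up front
    try:
        if fail:
            raise ValueError("generate_on_context raised")
        n_cur = 42  # stand-in for the generation cursor
        return n_cur
    finally:
        last_position = n_cur  # safe: n_cur is bound on both paths

ok = _generate_on_think_ctx()
try:
    _generate_on_think_ctx(fail=True)
    raised = False
except ValueError:
    raised = True
```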