embedding: add raw option for --embd-output-format by SamMalayek · Pull Request #16541 · ggml-org/llama.cpp

SamMalayek · 2025-10-12T19:57:25Z

This adds support for a new `--embd-output-format raw` option, which outputs embeddings as plain space-separated floats, without JSON formatting or `embedding N:` prefixes.

This is useful for downstream vector pipelines and scripting, e.g. when piping directly into NumPy or other vector processing tools.

Existing formats (json, json+, etc.) remain unchanged.
Default behavior is unaffected.
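As an illustration of the downstream use case, here is a minimal sketch of a consumer that reads raw output back into vectors. It assumes one embedding per line with values separated by whitespace; the description above only guarantees space-separated floats, so the line layout is an assumption, and `parse_raw_embeddings` is a hypothetical helper, not part of llama.cpp.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of llama.cpp).
// Parses text in the ASSUMED raw layout: one embedding per line,
// values separated by whitespace. Blank lines are skipped.
std::vector<std::vector<float>> parse_raw_embeddings(const std::string & text) {
    std::vector<std::vector<float>> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::vector<float> row;
        float value;
        while (fields >> value) {
            row.push_back(value);
        }
        if (!row.empty()) {
            // Embeddings produced by one model are expected to share a dimension.
            assert(rows.empty() || rows.front().size() == row.size());
            rows.push_back(std::move(row));
        }
    }
    return rows;
}
```

Because the parser takes a string, it can be fed from a file or from captured standard output without caring how the text was produced.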



SamMalayek · 2025-10-27T20:02:35Z

@ggerganov gentle ping


CISC · 2025-10-28T09:01:58Z

Please update this

llama.cpp/examples/embedding/README.md

Lines 33 to 40 in 280d97b

    
    ### --embd-output-format $'string'$
    | $'string'$ | description                  |  |
    |------------|------------------------------|--|
    | ''         | same as before               | (default)
    | 'array'    | single embeddings            | $[[x_1,...,x_n]]$
    |            | multiple embeddings          | $[[x_1,...,x_n],[x_1,...,x_n],...,[x_1,...,x_n]]$
    | 'json'     | openai style                 |
    | 'json+'    | add cosine similarity matrix |

and this

llama.cpp/common/arg.cpp

Lines 3249 to 3255 in 280d97b

    
    add_opt(common_arg(
        {"--embd-output-format"}, "FORMAT",
        "empty = default, \"array\" = [[],[]...], \"json\" = openai style, \"json+\" = same \"json\" + cosine similarity matrix",
        [](common_params & params, const std::string & value) {
            params.embd_out = value;
        }
    ).set_examples({LLAMA_EXAMPLE_EMBEDDING}));
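The handler above only stores the option string in `params.embd_out`; the output code still has to branch on that value. Below is a minimal sketch of what a raw formatter called from such a branch could look like. It is not the PR's actual implementation: `format_raw` is a hypothetical helper, and the flat `n_rows * n_embd` buffer layout, the `%.6f` precision, and the one-embedding-per-line layout are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's code).
// Renders a flat buffer of n_rows * n_embd floats as plain text:
// values separated by single spaces, one embedding per line.
// The "%.6f" precision is an assumption made for this sketch.
std::string format_raw(const std::vector<float> & emb, int n_embd) {
    assert(n_embd > 0);
    const std::size_t dim = static_cast<std::size_t>(n_embd);
    assert(emb.size() % dim == 0);

    std::string out;
    char buf[64];
    for (std::size_t i = 0; i < emb.size(); ++i) {
        std::snprintf(buf, sizeof(buf), "%.6f", emb[i]);
        out += buf;
        // End the line after the last value of each embedding.
        out += ((i + 1) % dim == 0) ? '\n' : ' ';
    }
    return out;
}
```

Returning a string instead of printing directly keeps the sketch independent of the logging mechanism, so the caller can decide where the text goes.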

SamMalayek · 2025-10-28T09:19:40Z


Updated docs. Thanks for pointing this out! I should have looked around the codebase and tooling more, but I actually use this raw flag for my project and wanted this pushed quickly. I also knew it was something this package really needed.

Disclaimer: revisions per CR over the years at Amazon: 1.2-1.3 (hundreds of CRs).

SamMalayek · 2025-10-28T10:08:48Z

One unrelated CI test — test_ctx_shift_disabled_short_prompt[-1-120-True] — failed with assert 248 == 120, which appears to be a nondeterministic failure in the context-shift tests (something I may look into for my second contribution to this project).
Could you please re-run the CI when convenient? Everything else appears to be passing cleanly.

CISC · 2025-10-28T10:48:28Z

> One unrelated CI test, test_ctx_shift_disabled_short_prompt[-1-120-True], failed with assert 248 == 120, which appears to be a nondeterministic failure in the context-shift tests (something I may look into for my second contribution to this project).

~~Don't bother, this seems to be some ccache issue, it has leaked from another branch.~~ Nvm, model changed. :)


SamMalayek · 2025-11-01T22:44:04Z

Thanks again for the reviews. All of the changes were non-functional, but good for consistency.

Currently, the embeddings CLI endpoint lives in examples, has no automated tests, and is provided as-is (it is somewhat sloppy). The PR below is the next phase:

#16923

* Add --embd-output-format raw for plain numeric embedding output

  This new option outputs embeddings as raw space-separated floats, without JSON or 'embedding N:' prefixes. Useful for downstream vector pipelines and scripting.
* Move raw output handling into format handling section
* Move raw output handling into else-if block with other format handlers
* Use LOG instead of printf for raw embedding output
* docs: document 'raw' embedding output format in arg.cpp and README

SamMalayek requested a review from ggerganov as a code owner October 12, 2025 19:57

github-actions bot added the examples label Oct 12, 2025

Add --embd-output-format raw for plain numeric embedding output

cd96be7

This new option outputs embeddings as raw space-separated floats, without JSON or 'embedding N:' prefixes. Useful for downstream vector pipelines and scripting.

SamMalayek force-pushed the feature/raw-embedding-output branch from 0d10ee4 to cd96be7 Compare October 12, 2025 21:44

danbev reviewed Oct 13, 2025

examples/embedding/embedding.cpp

Move raw output handling into format handling section

c667120

SamMalayek requested a review from danbev October 13, 2025 23:21

danbev approved these changes Oct 22, 2025

examples/embedding/embedding.cpp

Move raw output handling into else-if block with other format handlers

883e07a

SamMalayek force-pushed the feature/raw-embedding-output branch from 24b850b to 883e07a Compare October 22, 2025 06:21

ggerganov reviewed Oct 27, 2025

examples/embedding/embedding.cpp

Use LOG instead of printf for raw embedding output

ce7b187

SamMalayek force-pushed the feature/raw-embedding-output branch from 7b99865 to ce7b187 Compare October 27, 2025 20:15

SamMalayek added 2 commits October 28, 2025 02:06

Merge branch 'master' into feature/raw-embedding-output

3696e28

docs: document 'raw' embedding output format in arg.cpp and README

252563d

CISC approved these changes Oct 28, 2025


ggerganov merged commit 1c1409e into ggml-org:master Oct 28, 2025
66 of 67 checks passed

ggerganov merged 6 commits into ggml-org:master from SamMalayek:feature/raw-embedding-output
