IBX-11003: Describe taxonomy suggestions in developer doc #2964
Changes from 6 commits
81b965c
1676481
1204dba
23b1ae7
bdd7b10
75f7591
5ddbd03
8d93bcc
f9eefc4
cd0a9ec
0e1da7b
f73f225
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -112,3 +112,150 @@ | |
| php bin/console ibexa:taxonomy:remove-orphaned-content tags --dry-run | ||
| php bin/console ibexa:taxonomy:remove-orphaned-content tags --force | ||
| ``` | ||
|
|
||
| ## Taxonomy suggestions | ||
|
|
||
| Once the feature is [enabled](#enable-taxonomy-suggestions), editors no longer have to manually browse taxonomy trees to select [product categories]([[= user_doc =]]/pim/work_with_product_categories/#assign-product-categories-by-editing-product-details) or [tags]([[= user_doc =]]/content_management/create_edit_content_items/#add-taxonomy-entries). | ||
| Instead, they can pick from suggestions that an AI service generates based on selected fields, such as the name and description of the product or content item. | ||
|
|
||
| Taxonomy suggestions build on existing [AI Actions](ai_actions_guide.md) functionality. | ||
| The `Ibexa\Taxonomy\Embedding\TaxonomyEmbeddingFieldProvider` uses an existing taxonomy tree as reference: for each path in the taxonomy tree, it generates a multi-dimensional vector (an embedding) and stores it in the search index. | ||
|
|
||
| For performance reasons, embeddings for the taxonomy tree entries are generated only in two cases: | ||
|
|
||
| - when the database is reindexed, for example, after you enable the feature | ||
| - when an individual taxonomy entry is created or modified, its embedding is updated | ||
|
|
||
| When the editor creates or edits a content item or a product, they can request that the application suggests tags or product categories to be associated with the item. | ||
| When this happens, the `Ibexa\Taxonomy\ActionHandler\TextToTaxonomyActionHandler` requests that an embedding be generated based on selected fields, for example, name and description. | ||
|
|
||
|
|
||
| !!! note "Field selection" | ||
|
|
||
| You select the actual text fields, whose values are used as source for the embedding generation, when you create an [AI action](https://doc.ibexa.co/projects/userguide/en/latest/ai_actions/work_with_ai_actions/#create-ai-actions-that-use-ibexa-connect) that uses the `openai-text-to-taxonomy-entries` handler. | ||
|
|
||
| The search engine then compares the generated embedding with the taxonomy path embeddings stored in its index. | ||
| It selects the three best-matching taxonomy paths and presents them to the editor as suggestions. | ||
|
||
| The user can accept the suggestions, reject them, or request a new set of suggestions directly from the user interface. | ||
|
|
||
| ### Enable Taxonomy suggestions | ||
|
|
||
| Taxonomy suggestions are built into the product and do not require additional installation. | ||
| However, before you can enable them, make sure the following prerequisites are fulfilled: | ||
|
|
||
| - [Search engine](search_engines.md): Taxonomy suggestions require a search engine that supports vector search. | ||
| The feature has been tested to work with Elasticsearch or Solr 9.8.1+. | ||
|
|
||
| - [AI Actions](ai_actions.md): To be able to process embeddings, Taxonomy suggestions require that you have the [AI Actions configured](configure_ai_actions.md#configure-access-to-openai-optional) to support the OpenAI service. | ||
|
|
||
|
|
||
| #### Enable taxonomy embedding indexing | ||
|
|
||
| Enable embedding indexing for taxonomy branches by changing the default setting from `false` to `true`: | ||
|
|
||
| ```yaml | ||
| ibexa: | ||
|     system: | ||
|         default: | ||
|             taxonomy: | ||
|                 search: | ||
|                     index_embeddings: true | ||
| ``` | ||
|
|
||
| Toggle this setting at any time to enable or disable taxonomy suggestions. | ||
|
||
|
|
||
| If you are happy with the default settings, clear the cache and reindex the database. | ||
|
|
||
| ```shell | ||
| php bin/console ibexa:reindex | ||
| ``` | ||
|
|
||
| ```shell | ||
| php bin/console cache:clear | ||
| ``` | ||
|
|
||
| #### Configure AI action | ||
|
|
||
| Once you enable the Taxonomy suggestions feature, you must [configure an AI action]([[= user_doc =]]/ai_actions/work_with_ai_actions/#create-ai-actions-that-control-taxonomy-suggestions) that handles the generation of embeddings for newly created or edited content items or products. | ||
|
|
||
| That's where you decide which fields from which content type are used as input for embedding generation, how many suggestions are presented to the editor, and so on. | ||
|
|
||
|
|
||
| Once you do it, your users are able to assign tags and/or product categories by using suggestions provided by an AI engine. | ||
|
||
|
|
||
| ### Customize Taxonomy suggestions | ||
|
|
||
| You can modify the default behavior of the Taxonomy suggestions model by changing various settings. | ||
|
|
||
| #### Change default number of suggestions | ||
|
|
||
| By default, the system returns three suggestions. | ||
| You can change the default number if needed by altering the following setting: | ||
|
|
||
| ```yaml hl_lines="4" | ||
| ibexa: | ||
|     taxonomy: | ||
|         text_to_taxonomy: | ||
|             default_suggested_taxonomies_limit: 5 | ||
| ``` | ||
|
|
||
| You can also override this setting per AI action by editing its configuration. | ||
|
|
||
| #### Change default fields parsed when generating suggestions | ||
|
|
||
| The following setting decides which fields are used to generate suggestions by default. | ||
| You can change the default setting, if needed. | ||
|
|
||
| ```yaml hl_lines="6,7,8" | ||
| ibexa: | ||
|     system: | ||
|         default: | ||
|             content_type_field_type_groups: | ||
|                 configurations: | ||
|                     vectorizable_fields: | ||
|                         - ezstring | ||
|                         - eztext | ||
| ``` | ||
|
|
||
| This way you can limit field selection to meaningful text fields and avoid unsupported field types. | ||
| As with the number of suggestions, you can override this setting per AI action by editing its configuration. | ||
|
|
||
| ### Change the embedding generation model | ||
|
It needs to be mentioned that suggestions are less precise the more data is in the input. But maybe it is already mentioned in the manuals of the models that we are suggesting 🤔
Added tip in cd0a9ec
||
|
|
||
| By default, the system comes with a set of OpenAI models listed in its configuration, and a setting that allows you to choose the default model that should be used with the Taxonomy suggestions feature. | ||
|
|
||
| ```yaml hl_lines="20" | ||
| ibexa: | ||
|     system: | ||
|         default: | ||
|             embedding_models: | ||
|                 text-embedding-3-small: | ||
|                     name: text-embedding-3-small | ||
|                     dimensions: 1536 | ||
|                     field_suffix: 3small | ||
|                     embedding_provider: ibexa_openai | ||
|                 text-embedding-3-large: | ||
|                     name: text-embedding-3-large | ||
|                     dimensions: 3072 | ||
|                     field_suffix: 3large | ||
|                     embedding_provider: ibexa_openai | ||
|                 text-embedding-ada-002: | ||
|                     name: text-embedding-ada-002 | ||
|                     dimensions: 1536 | ||
|                     field_suffix: ada002 | ||
|                     embedding_provider: ibexa_openai | ||
|             default_embedding_model: text-embedding-ada-002 | ||
| ``` | ||
|
|
||
| This is also where you can change the name of the model used by the provider, the embedding's dimensions, and other settings. | ||
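As an illustration, switching the default model to `text-embedding-3-small` could look like the sketch below. It assumes that `default_embedding_model` sits directly under the `default` scope, next to `embedding_models`, and that the model entries listed above remain unchanged. Because embeddings are generated during reindexing, a reindex is presumably needed afterwards so that the stored taxonomy embeddings match the newly selected model.

```yaml
ibexa:
    system:
        default:
            # Assumption: the value must match one of the keys defined under embedding_models
            default_embedding_model: text-embedding-3-small
```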
|
|
||
| ### Extending Taxonomy suggestions | ||
|
|
||
| You can extend the feature by replacing parts of the default code, for example, by exploring one of the following ideas. | ||
|
|
||
| #### Replace the embedding provider | ||
|
|
||
| By default, the system uses the `ibexa_openai` connector. | ||
| You can add your own embedding provider if needed. To do it: | ||
|
|
||
| - Implement the `EmbeddingProviderInterface` | ||
| - Register the service with the `ibexa.embedding_provider` tag | ||
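The registration step could be sketched as a standard Symfony service definition, for example in `config/services.yaml`. The class name `App\Embedding\CustomEmbeddingProvider` is hypothetical, and the tag may require additional attributes (such as an identifier) that are not described here, so treat this as a starting point rather than a verified configuration.

```yaml
services:
    # Hypothetical class that implements EmbeddingProviderInterface
    App\Embedding\CustomEmbeddingProvider:
        tags:
            # Tag name taken from the step above; extra tag attributes, if any, are not shown
            - { name: ibexa.embedding_provider }
```

A model entry under `embedding_models` would then presumably point to the new provider through its `embedding_provider` key, in the same way the default entries point to `ibexa_openai`. How the provider identifier is derived is not covered in this section.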
|
|
||
| #### Extend the AI action form | ||
|
|
||
| You can extend the `TextToTaxonomyOptionsType` AI action form by inheriting from `AbstractActionConfigurationOptions`. | ||
The feature needs to be enabled for both
The first sentence says exactly that.
Here, we are stressing that right after you enable the feature, you trigger the reindex and that's when the embeddings are calculated for the first time.
Ok, for me it was like two cases. In the first case, we need to enable the feature. In the second, we have no information on whether it should be enabled or not. When an individual taxonomy entry is created or modified, then embedding will be calculated only when the feature is enabled.