Description
Current Configuration
I'm currently using the following embedder configuration for a medium-sized Android project:
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  },
  "retriever": {
    "top_k": 20
  },
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  }
}
LLM Model: GPT-4o
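For readers unfamiliar with word-based splitting, the following is a minimal sketch of what split_by "word" with chunk_size 350 and chunk_overlap 100 plausibly does. It is an assumption about the behavior, not the tool's actual splitter: the function name split_by_word and the synthetic input are hypothetical, and a real splitter may treat whitespace and file boundaries differently.

```python
def split_by_word(text: str, chunk_size: int = 350, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    stride = chunk_size - chunk_overlap  # 250 new words per chunk with the values above
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 1,000-word "file" stands in for a real source file (hypothetical input).
sample = " ".join(f"w{i}" for i in range(1000))
chunks = split_by_word(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

Under these assumptions each chunk contributes only 250 new words, and the boundaries fall wherever the word count dictates, so a class or method can be cut in the middle regardless of its structure.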
Problem Description
When reading through my private Android project with this configuration, parsing and retrieval appear to miss significant content. The responses seem incomplete and lack important context from the codebase.
Questions
- Embedding Dimensions: Is 256 dimensions too low for code embeddings? Should I increase this for better semantic understanding of Android code?
- Chunk Size: Is chunk_size: 350 words appropriate for Android projects (Java/Kotlin files with classes, methods, XML layouts, etc.)? Should this be adjusted?
- Retriever Settings: Is top_k: 20 sufficient, or should it be increased for better context coverage?
- Model Selection: Would switching to text-embedding-3-large provide better results for code understanding?
- Android-Specific Considerations: Are there recommended configurations specifically optimized for Android projects that better handle:
  - Java/Kotlin code
  - XML layouts and resources
  - Gradle build files
  - AndroidManifest.xml
  - Multi-module project structure
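To put the top_k question in numbers, here is a rough back-of-the-envelope sketch. The 500,000-word project size is a made-up figure for illustration, not a measurement of this project, and the function retrieval_budget is hypothetical. It also treats the whole codebase as one stream of words, whereas a real indexer would most likely split file by file.

```python
import math


def retrieval_budget(total_words: int, chunk_size: int = 350,
                     chunk_overlap: int = 100, top_k: int = 20) -> dict:
    """Estimate how many chunks get indexed and how much text top_k can return."""
    stride = chunk_size - chunk_overlap
    if total_words <= chunk_size:
        n_chunks = 1 if total_words > 0 else 0
    else:
        # One full first chunk, then one more chunk per stride of remaining words.
        n_chunks = math.ceil((total_words - chunk_size) / stride) + 1
    retrieved = min(top_k, n_chunks)
    return {
        "chunks_indexed": n_chunks,
        "chunks_retrieved": retrieved,
        "max_words_retrieved": retrieved * chunk_size,  # upper bound, overlaps included
        "share_of_chunks": retrieved / n_chunks if n_chunks else 0.0,
    }


# Hypothetical project size; replace with a real word count to get a meaningful estimate.
budget = retrieval_budget(total_words=500_000)
print(budget)
```

With these assumed numbers, a single query can see at most 20 of roughly 2,000 chunks, which may be one reason answers feel incomplete when a question spans many files.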
Expected Behavior
The embedder should capture and retrieve comprehensive context from the Android codebase, including:
- Class implementations and their relationships
- Method implementations and logic
- Resource files (layouts, strings, etc.)
- Build configuration
- Project architecture and dependencies
Environment
- Project Type: Medium-sized Android project
- Primary Languages: Java/Kotlin
- Project Structure: Multi-module (assumed)
- Current Model: GPT-4o with OpenAI text-embedding-3-small
Requested Recommendations
What embedder configuration would you recommend for optimal performance with Android projects? Specifically:
- Optimal dimensions value
- Recommended chunk_size and chunk_overlap
- Appropriate top_k value
- Alternative embedding models if applicable
- Any Android-specific tuning parameters
Thank you!