Incomplete/Missing Content in Retrieval for Android Project - Embedder Configuration Recommendations #375

## Current Configuration

I'm currently using the following embedder configuration for a medium-sized Android project:

```json
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  },
  "retriever": {
    "top_k": 20
  },
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  }
}
```

**LLM Model**: GPT-4o
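
To make the `text_splitter` settings concrete, here is a minimal Python sketch of how I understand word-based splitting with overlap. It is an assumption about the behavior, not the tool's actual implementation, and `split_by_word` is a hypothetical helper name. With `chunk_size: 350` and `chunk_overlap: 100`, each chunk would start 250 words after the previous one.

```python
def split_by_word(text: str, chunk_size: int = 350, chunk_overlap: int = 100) -> list[str]:
    """Split text into overlapping word windows.

    Hypothetical helper for illustration only; the real splitter may differ.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()  # whitespace tokenization, ignores code structure
    step = chunk_size - chunk_overlap  # 250 words with the settings above
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # A 1000-word dummy file: w0 w1 ... w999
    sample = " ".join(f"w{i}" for i in range(1000))
    chunks = split_by_word(sample)
    print(len(chunks), [len(c.split()) for c in chunks])
```

If splitting really works like this, a window can begin or end in the middle of a method or an XML element, which might be one reason the retrieved context looks incomplete.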

## Problem Description

When I query my private Android project with this configuration, parsing and retrieval appear to miss significant content. The responses are incomplete and lack important context from the codebase.

## Questions

1. **Embedding Dimensions**: Is 256 dimensions too low for code embeddings? Should I increase this for better semantic understanding of Android code?

2. **Chunk Size**: Is `chunk_size: 350` words appropriate for Android projects (Java/Kotlin files with classes, methods, XML layouts, etc.)? Should this be adjusted?

3. **Retriever Settings**: Is `top_k: 20` sufficient, or should it be increased for better context coverage?

4. **Model Selection**: Would switching to `text-embedding-3-large` provide better results for code understanding?

5. **Android-Specific Considerations**: Are there recommended configurations specifically optimized for Android projects that better handle:
   - Java/Kotlin code
   - XML layouts and resources
   - Gradle build files
   - AndroidManifest.xml
   - Multi-module project structure
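
For context on questions 1 and 3, this is a rough Python sketch of how I picture `dimensions` and `top_k` interacting. It rests on two assumptions: that a shortened embedding behaves like the full vector truncated and then L2-normalized, and that the retriever ranks chunks by cosine similarity. The names `shorten` and `top_k` are hypothetical, and the vectors are toy values, not real embeddings.

```python
import math


def shorten(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale to unit length (assumed behavior)."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def top_k(query: list[float], chunks: list[list[float]], k: int) -> list[int]:
    """Return indices of the k chunks most similar to the query.

    For unit-length vectors the dot product equals cosine similarity.
    """
    scores = [sum(q * c for q, c in zip(query, chunk)) for chunk in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings", shortened to 2 dimensions.
    full_vectors = [[0.0, 1.0, 5.0], [1.0, 0.0, 5.0], [3.0, 4.0, 12.0]]
    stored = [shorten(v, 2) for v in full_vectors]
    query = shorten([1.0, 0.0, 9.0], 2)
    print(top_k(query, stored, k=2))
```

Under these assumptions, anything the dropped dimensions encoded is invisible to the ranking, and any chunk ranked below `top_k` never reaches the LLM, so I am unsure which of the two settings is the bigger limit here.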

## Expected Behavior

The embedding and retrieval pipeline should capture and surface comprehensive context from the Android codebase, including:
- Class implementations and their relationships
- Method implementations and logic
- Resource files (layouts, strings, etc.)
- Build configuration
- Project architecture and dependencies

## Environment

- **Project Type**: Medium-sized Android project
- **Primary Languages**: Java/Kotlin
- **Project Structure**: Multi-module (assumed)
- **Current Models**: GPT-4o (LLM) with OpenAI `text-embedding-3-small` (embeddings)

## Requested Recommendations

What embedder configuration would you recommend for optimal performance with Android projects? Specifically:

- Optimal `dimensions` value
- Recommended `chunk_size` and `chunk_overlap`
- Appropriate `top_k` value
- Alternative embedding models if applicable
- Any Android-specific tuning parameters

Thank you!
