Skip to content

Conversation

@njosh13
Copy link

@njosh13 njosh13 commented Nov 10, 2025

Hey! So I added a new endpoint /generate_cypher that does pretty much what /text_to_cypher does, but stops right after generating the Cypher query - it doesn't execute anything and is not a streaming endpoint.

This is useful when you want to:

  • Preview what query the AI is going to generate before it runs
  • Validate the Cypher syntax manually
  • Integrate just the query generation part into your own workflows
  • Have more control over when and how queries get executed

Basically it does all the same stuff - connects to the graph, discovers the schema, sends it to the AI model with your question - but then just returns the generated Cypher as a JSON response instead of running it.

All the heavy lifting is shared with the existing code, I just added a new helper function generate_cypher_only() that skips the execution step. Its a super straightforward addition that gives users more flexibility in how they use the text-to-cypher functionality.

Please review it and let me know what you think.I added a Dockerfile.local for locally testing with the new changes.
This basically solves the issue #17 that I created.

Summary by CodeRabbit

  • New Features

    • Added an API endpoint to generate Cypher queries for preview/validation (returns a clean query string).
  • Documentation

    • Expanded example env settings with additional model/provider defaults and guidance for provider-specific API keys (OpenAI, Anthropic, Groq, Kimi K2).
  • Chores (Deployment)

    • Improved local Docker runtime: multi-stage build, non-root runtime user, supervisor-based entrypoint, ports exposed, and runtime directory/permission adjustments.

@coderabbitai
Copy link

coderabbitai bot commented Nov 10, 2025

Walkthrough

Added a multi-stage Dockerfile.local to build and run the Rust app on a FalkorDB runtime, and implemented a new POST /generate_cypher endpoint in src/main.rs that generates and returns Cypher queries (no execution), with OpenAPI docs and environment examples updated.

Changes

Cohort / File(s) Summary
Docker Configuration
Dockerfile.local
New multi-stage Dockerfile: builder stage compiles the Rust app; runtime stage uses FalkorDB base, installs supervisord, creates non-root appuser, copies binary and templates, prepares /var/lib/falkordb/import and /tmp/falkordb-import, exposes ports 8080 and 3001, adds supervisord config and entrypoint script, and sets ENTRYPOINT.
API Implementation
src/.../main.rs
Added CypherResponse struct and new public POST endpoint generate_cypher (/generate_cypher) that accepts TextToCypherRequest and returns generated Cypher text only; added setup_genai_client, generate_cypher_from_schema, and generate_cypher_only helpers; refactored existing generation flow to use these helpers; updated Actix-web App registration and OpenAPI ApiDoc to include the new path and response schema.
Environment Example
.env.example
Expanded model/key examples and guidance: added OpenAI, Anthropic, Groq, and Kimi K2 (via Groq) examples; introduced provider-specific API key variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY) and notes about model mapping and optional per-request key usage.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant API as /generate_cypher
    participant GenAI as GenAI Client
    participant Service as Service Resolver

    Client->>API: POST TextToCypherRequest
    Note over API: apply env defaults and model mapping
    API->>GenAI: setup_genai_client(model, optional key)
    GenAI-->>API: client ready
    API->>Service: resolve service target for model
    Service-->>API: target info
    API->>API: generate_cypher_only(request, client, model)
    Note over API: query model, sanitize output → Cypher string
    API-->>Client: 200 CypherResponse { query }
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay extra attention to:
    • setup_genai_client logic for model mapping and per-request key handling.
    • Correctness and sanitization in generate_cypher_from_schema / generate_cypher_only.
    • OpenAPI/ApiDoc additions matching request/response types.
    • Dockerfile supervisord configuration, entrypoint script permissions, and filesystem ownership.

Possibly related issues

Poem

🐇 I hopped through code and stitched a route,
I asked the model for Cypher's fruit,
Returned a string — no DB bite,
Packaged in Docker, snug and tight. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a new non-streaming endpoint for Cypher generation, which is the core purpose of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
Dockerfile.local (3)

2-2: Consider pinning the Rust version for reproducible builds.

Using rust:alpine without a version tag means builds will use whatever version is "latest", which can lead to inconsistent builds and unexpected breakage when new Rust versions are released.

Apply this diff to pin to a specific Rust version:

-FROM rust:alpine AS builder
+FROM rust:1.83-alpine AS builder

26-28: Simplify user creation logic.

The error suppression with 2>/dev/null || true hides potential issues. If the user exists but with different properties (e.g., different shell, home directory), this will silently continue with the existing user, which might not match your requirements.

Apply this diff for a cleaner approach:

-# Create a non-root user for security (if not already exists)
-RUN groupadd -g 1000 appuser 2>/dev/null || true && \
-    useradd -m -s /bin/bash -u 1000 -g appuser appuser 2>/dev/null || true
+# Create a non-root user for security
+RUN if ! getent group appuser >/dev/null; then groupadd -g 1000 appuser; fi && \
+    if ! getent passwd appuser >/dev/null; then useradd -m -s /bin/bash -u 1000 -g appuser appuser; fi

39-49: Simplify and clarify permission setup.

The permission setup has redundancy: both /var/lib/falkordb and /var/lib/FalkorDB (different cases) are being configured. This suggests either a case-sensitivity issue or unnecessary duplication.

Apply this diff to simplify:

-# Create import directory and set permissions
-RUN mkdir -p /var/lib/FalkorDB/import && \
-    mkdir -p /tmp/falkordb-import && \
-    chown -R appuser:appuser /var/lib/FalkorDB/import && \
-    chmod -R 755 /var/lib/FalkorDB/import && \
-    chown -R appuser:appuser /tmp/falkordb-import && \
-    chmod -R 755 /tmp/falkordb-import && \
-    chown -R appuser:appuser /var/lib/falkordb && \
-    chmod -R 755 /var/lib/falkordb && \
-    chown -R appuser:appuser /var/lib/FalkorDB && \
-    chmod -R 755 /var/lib/FalkorDB
+# Create import directories and set permissions
+RUN mkdir -p /var/lib/FalkorDB/import /tmp/falkordb-import && \
+    chown -R appuser:appuser /var/lib/FalkorDB /tmp/falkordb-import && \
+    chmod -R 755 /var/lib/FalkorDB /tmp/falkordb-import

Note: Verify which path FalkorDB actually uses (FalkorDB vs falkordb).

src/main.rs (1)

1229-1272: Remove unused parameter and consider reusing existing logic.

Two observations:

  1. The _service_target parameter (line 1233) is unused and should be removed from the function signature.

  2. Lines 1250-1269 duplicate logic from the existing generate_cypher_query function (lines 1160-1185). Consider refactoring to reuse that logic.

Apply these changes:

 /// Generate only the Cypher query without executing it
 async fn generate_cypher_only(
     request: &TextToCypherRequest,
     client: &genai::Client,
-    _service_target: &genai::ServiceTarget,
     model: &str,
 ) -> Result<String, String> {
     tracing::info!("Processing generate_cypher request: {request:?}");
 
     let falkordb_connection = request
         .clone()
         .falkordb_connection
         .unwrap_or_else(|| AppConfig::get().falkordb_connection.clone());
 
     // Get schema for the graph
     let schema = match get_graph_schema_string(&falkordb_connection, &request.graph_name).await {
         Ok(schema) => schema,
         Err(e) => return Err(format!("Failed to discover schema: {}", e)),
     };
 
-    // Generate Cypher query
-    tracing::info!("Generating Cypher query using schema ...");
-    let genai_chat_request = generate_create_cypher_query_chat_request(&request.chat_request, &schema);
-    
-    // Make the actual request to the model
-    let chat_response = match client.exec_chat(model, genai_chat_request, None).await {
-        Ok(response) => response,
-        Err(e) => return Err(format!("Chat request failed: {}", e)),
-    };
-
-    let query = chat_response
-        .content_text_into_string()
-        .unwrap_or_else(|| String::from(""));
-
-    if query.trim().is_empty() || query.trim() == "NO ANSWER" {
-        tracing::warn!("No query generated from AI model");
-        return Err("No valid query was generated".into());
-    }
-
-    let clean_query = query.replace('\n', " ").replace("```", "").trim().to_string();
-    tracing::info!("Generated Cypher query: {}", clean_query);
-
-    Ok(clean_query)
+    // Use a dummy channel for generate_cypher_query (it only sends optional progress)
+    let (tx, _rx) = tokio::sync::mpsc::channel(1);
+    
+    // Reuse existing logic
+    generate_cypher_query(request, &schema, client, model, &tx)
+        .await
+        .ok_or_else(|| "Failed to generate Cypher query".to_string())
 }

Alternatively, if you prefer to keep the non-streaming path separate, at minimum remove the unused _service_target parameter:

 async fn generate_cypher_only(
     request: &TextToCypherRequest,
     client: &genai::Client,
-    _service_target: &genai::ServiceTarget,
     model: &str,
 ) -> Result<String, String> {

And update the call site on line 1094:

-    match generate_cypher_only(&request, &client, &service_target, model).await {
+    match generate_cypher_only(&request, &client, model).await {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86f2199 and 6f7893e.

📒 Files selected for processing (2)
  • Dockerfile.local (1 hunks)
  • src/main.rs (6 hunks)
🔇 Additional comments (4)
src/main.rs (3)

266-269: LGTM!

The CypherResponse struct is well-defined and consistent with other response types in the codebase.


1889-1922: LGTM!

The OpenAPI documentation updates correctly register the new endpoint and response type. The changes are consistent with the existing pattern.


1949-1963: LGTM!

The endpoint registration is correctly placed and follows the established pattern for service registration in this application.

Dockerfile.local (1)

58-59: Both files exist in the repository—no action needed.

The files supervisord.conf and entrypoint.sh are present in the repository, so the Dockerfile COPY commands on lines 58-59 will succeed without issue.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/main.rs (2)

314-348: Consider returning the resolved service target from the helper.

The helper function sets up the client and model but doesn't return the service_target. This causes the generate_cypher endpoint to resolve the service target separately (line 1063), performing validation work that could be consolidated in the helper.

Consider updating the function signature to return the service target as well:

-fn setup_genai_client(request: &mut TextToCypherRequest) -> Result<(genai::Client, String), String> {
+async fn setup_genai_client(request: &mut TextToCypherRequest) -> Result<(genai::Client, String, genai::ServiceTarget), String> {
     let config = AppConfig::get();
 
     // Apply defaults from .env file if values are not provided
     if request.model.is_none() {
         request.model.clone_from(&config.default_model);
     }
 
     if request.key.is_none() {
         request.key.clone_from(&config.default_key);
     }
 
     // Ensure we have a model after applying defaults
     let model = request.model.as_ref()
         .ok_or_else(|| "Model must be provided either in request or as DEFAULT_MODEL in .env file".to_string())?
         .clone();
 
     let client = request.key.as_ref().map_or_else(genai::Client::default, |key| {
         let key = key.clone();
         let auth_resolver = AuthResolver::from_resolver_fn(
             move |model_iden: ModelIden| -> Result<Option<AuthData>, genai::resolver::Error> {
                 let ModelIden {
                     adapter_kind,
                     model_name,
                 } = model_iden;
                 tracing::info!("Using custom auth provider for {adapter_kind} (model: {model_name})");
                 Ok(Some(AuthData::from_single(key.clone())))
             },
         );
         genai::Client::builder().with_auth_resolver(auth_resolver).build()
     });
 
+    let service_target = client.resolve_service_target(&model).await
+        .map_err(|e| format!("Failed to resolve service target: {e}"))?;
+
-    Ok((client, model))
+    Ok((client, model, service_target))
 }

This would eliminate the duplicate service target resolution in both endpoints.


1063-1070: Service target resolved but unused.

The service target is resolved (line 1063) but assigned to _service_target, indicating it's intentionally unused. While this serves as validation that the model is accessible, it adds latency to the request.

If validation is the only goal, consider whether this is necessary or if the validation could happen during the actual chat request in generate_cypher_only.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f7893e and 3ce567c.

📒 Files selected for processing (2)
  • Dockerfile.local (1 hunks)
  • src/main.rs (10 hunks)
🔇 Additional comments (9)
Dockerfile.local (2)

1-67: Well-structured multi-stage Dockerfile.

The Dockerfile follows best practices with a multi-stage build, non-root user creation, and appropriate permission settings. The conditional user/group creation logic prevents conflicts if the user already exists in the base image.


59-60: No action needed—both files exist in the repository.

The verification confirms that supervisord.conf and entrypoint.sh are present in the repository. The COPY commands in the Dockerfile will execute successfully without build failures.

src/main.rs (7)

266-269: LGTM!

The CypherResponse struct is well-defined and follows the existing patterns in the codebase for API response types.


998-1013: Good refactoring to use the shared helper.

The endpoint now uses setup_genai_client to reduce code duplication, which addresses the previous review feedback.


1139-1164: Well-designed helper function.

The generate_cypher_from_schema function appropriately extracts the core Cypher generation logic without progress tracking, making it reusable for both streaming and non-streaming endpoints.


1179-1189: Excellent refactoring to eliminate duplication.

The function now delegates to generate_cypher_from_schema while maintaining the SSE progress reporting. This reduces code duplication and improves maintainability.


1234-1259: Clean implementation of non-streaming generation.

The function correctly handles schema discovery (with caching via get_graph_schema_string) and delegates to generate_cypher_from_schema for the actual generation. The implementation is straightforward and appropriate for the non-streaming use case.


1877-1909: OpenAPI documentation properly updated.

The OpenAPI specification correctly includes the new /generate_cypher endpoint and CypherResponse schema, ensuring the Swagger UI will display the new functionality.


1939-1939: LGTM!

The new endpoint is properly registered with the Actix-web application.

@njosh13
Copy link
Author

njosh13 commented Nov 12, 2025

Hey @barakb could you help me and take a look at this?
I would appreciate your feedback.
Ps: I have also added support for moonshot ai kimi-k2.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/main.rs (1)

1186-1187: Review the query sanitization approach.

The current sanitization blindly replaces all newlines with spaces and removes triple backticks. This could break valid Cypher queries that contain newlines in string literals.

Consider a more robust approach:

  • Use a regex to extract Cypher from code blocks first (e.g., look for cypher... or ...)
  • Preserve newlines within string literals
  • Only normalize whitespace between Cypher tokens

Example improvement:

let clean_query = {
    // First try to extract from code blocks
    let code_block_regex = Regex::new(r"```(?:cypher|opencypher)?\s*\n?(.*?)\n?```").unwrap();
    let extracted = code_block_regex
        .captures(&query)
        .and_then(|caps| caps.get(1))
        .map(|m| m.as_str())
        .unwrap_or(&query);
    
    // Normalize whitespace while preserving structure
    extracted.trim().to_string()
};

Additionally, consider logging a warning when sanitization is applied so you can monitor if AI responses require cleanup.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce567c and 9a3550e.

📒 Files selected for processing (2)
  • .env.example (1 hunks)
  • src/main.rs (11 hunks)
🔇 Additional comments (12)
.env.example (3)

5-14: LGTM! Clear documentation for supported models.

The expanded model examples and Groq/Kimi K2 guidance provide helpful context for users configuring the application.


17-22: LGTM! Well-structured provider-specific configuration.

The optional API keys section with auto-detection explanation is clear and helpful.


26-37: LGTM! Comprehensive Kimi K2 setup instructions.

The detailed configuration steps with examples make it easy for users to set up Kimi K2 via Groq.

src/main.rs (9)

13-16: LGTM! Imports support the new model mapping functionality.

The AdapterKind and ModelMapper imports are necessary for routing Kimi K2 models to the Groq adapter.


268-271: LGTM! Appropriate response structure.

The CypherResponse struct is well-defined for returning generated Cypher queries.


316-374: LGTM! Code duplication successfully eliminated.

The setup_genai_client helper function effectively addresses the DRY concern raised in previous review comments. The implementation correctly:

  • Applies configuration defaults
  • Routes Kimi K2 models to Groq adapter via ModelMapper
  • Conditionally sets up authentication when a key is provided
  • Returns clear error messages

Note: The function returns (genai::Client, String) and delegates service target resolution to each endpoint, which makes sense given their different error handling requirements (SSE vs HTTP responses).


1024-1056: LGTM! Successfully refactored to use the helper function.

The text_to_cypher endpoint now cleanly uses setup_genai_client, with appropriate SSE-based error handling.


1088-1096: Clarify the need for service target resolution.

The service target is resolved and validated but then discarded (stored in _service_target). While this validates the model configuration, the generate_cypher_only function doesn't use it. Is this intentional validation, or can this step be removed?

If the validation is intentional and necessary, consider adding a comment explaining why:

-    // Handle service target resolution errors
-    let _service_target = match client.resolve_service_target(&model).await {
+    // Validate service target resolution (ensures model is properly configured)
+    let _service_target = match client.resolve_service_target(&model).await {

Otherwise, if not needed, remove the resolution:

-    // Handle service target resolution errors
-    let _service_target = match client.resolve_service_target(&model).await {
-        Ok(target) => target,
-        Err(e) => {
-            return Ok(HttpResponse::BadRequest().json(ErrorResponse {
-                error: format!("Failed to resolve service target: {e}"),
-            }));
-        }
-    };
-
     // Generate Cypher query without executing it or streaming the response
     match generate_cypher_only(&request, &client, &model).await {

1203-1213: LGTM! Good refactoring to share core logic.

The function now delegates to generate_cypher_from_schema, eliminating code duplication while maintaining the same interface for SSE-based progress reporting.


1258-1283: LGTM! Clean separation of concerns.

The generate_cypher_only function appropriately handles non-streaming Cypher generation. The function delegates schema retrieval and query generation to existing helpers, maintaining good code reuse.


1900-1933: LGTM! Complete OpenAPI documentation.

The new endpoint and response type are properly registered in the OpenAPI specification.


1960-1974: LGTM! Endpoint properly registered.

The generate_cypher service is correctly added to the Actix-web application alongside the existing endpoints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant