Conversation
🧪 CI InsightsHere's what we observed from your CI run for 408a499. 🟢 All jobs passed!But CI Insights is watching 👀 |
Summary of ChangesHello @tae898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers native Python bindings for ArcadeDB, bridging the gap between the Java-based multi-model database and the Python ecosystem. The new arcadedb_embedded package allows developers to seamlessly embed ArcadeDB within their Python applications, offering a high-performance, Pythonic API for various data models including graph, document, and vector. This significantly simplifies deployment for data science, AI/ML, and web development, supported by comprehensive documentation, extensive testing, and automated build/CI processes. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is an impressive and substantial pull request that introduces comprehensive Python bindings for ArcadeDB. The structure of the new bindings/python directory is well-organized, including production code, a thorough test suite, an extensive documentation site, and a robust Docker-based build system. The separation into three distributions (headless, minimal, full) is a thoughtful approach to cater to different user needs. The overall quality of the code, tests, and documentation is very high. I've identified a few medium-severity issues related to maintainability and correctness in the build scripts and documentation that would further improve the quality of this contribution.
| # Python bindings - generated version file | ||
| bindings/python/src/arcadedb_embedded/_version.py |
There was a problem hiding this comment.
The path bindings/python/src/arcadedb_embedded/_version.py appears to be redundant and incorrect. Since this .gitignore file is located in bindings/python/, paths are relative to this directory. The specified path would incorrectly resolve to bindings/python/bindings/python/src/arcadedb_embedded/_version.py from the repository root. The entry on line 2, src/arcadedb_embedded/_version.py, is sufficient and correct.
bindings/python/Dockerfile.build
Outdated
|
|
||
| # Extract version and copy JARs | ||
| RUN echo "📌 Building distribution: ${DISTRIBUTION}" && \ | ||
| export ARCADEDB_VERSION=$(python3 extract_version.py) && \ |
There was a problem hiding this comment.
The call to extract_version.py here is likely incorrect. It's called without arguments, so the script will try to find pom.xml at ../../pom.xml relative to its own location. Inside the container, the script is at /build/extract_version.py, so it will look for /pom.xml, which doesn't exist. This will cause the ARCADEDB_VERSION variable to be empty, and the log message on the next line will be misleading. The correct path to pom.xml is /arcadedb/pom.xml, as used in a later RUN command. Please provide the correct path here to ensure the build log is accurate.
export ARCADEDB_VERSION=$(python3 extract_version.py /arcadedb/pom.xml) && \
| RUN echo '#!/usr/bin/env python3\n\ | ||
| import arcadedb_embedded as arcadedb\n\ | ||
| import tempfile\n\ | ||
| import shutil\n\ | ||
| import os\n\ | ||
| \n\ | ||
| print("🎮 Testing ArcadeDB Python bindings...")\n\ | ||
| print(f"📦 Version: {arcadedb.__version__}")\n\ | ||
| \n\ | ||
| temp_dir = tempfile.mkdtemp()\n\ | ||
| db_path = os.path.join(temp_dir, "test_db")\n\ | ||
| \n\ | ||
| try:\n\ | ||
| with arcadedb.create_database(db_path) as db:\n\ | ||
| print("✅ Database created")\n\ | ||
| \n\ | ||
| with db.transaction():\n\ | ||
| db.command("sql", "CREATE DOCUMENT TYPE TestDoc")\n\ | ||
| db.command("sql", "INSERT INTO TestDoc SET name = '\''docker_test'\'', value = 123")\n\ | ||
| print("✅ Transaction committed")\n\ | ||
| \n\ | ||
| result = db.query("sql", "SELECT FROM TestDoc")\n\ | ||
| for record in result:\n\ | ||
| print(f"✅ Query result: {record.get_property('\''name'\'')} = {record.get_property('\''value'\'')}")\n\ | ||
| \n\ | ||
| print("🎉 All tests passed!")\n\ | ||
| finally:\n\ | ||
| if os.path.exists(temp_dir):\n\ | ||
| shutil.rmtree(temp_dir)\n\ | ||
| ' > /test/test_install.py && chmod +x /test/test_install.py |
There was a problem hiding this comment.
Embedding a multi-line Python script directly into the Dockerfile using echo makes it difficult to read, maintain, and lint. For better maintainability, I recommend moving this script into a separate file, such as tests/smoke_test.py, and using a COPY instruction to add it to the Docker image. This will make the test script much easier to manage.
|
@tae898 really impressive contribution! A few questions:
|
Thanks! 1. This is not really a driver but bindingsThe python wheel includes the JARs and can talk to the DB directly via Java APIs, instead of HTTP. It's not a simple client. I didn't base my code on other work, although I got a lot of help from VS Copilot. 2. Java API, HTTP, and gRPC are all possible.From the PR (https://github.com/ArcadeData/arcadedb/blob/d7c9590f88181cfcaca51b5a8aa4d63b4404a76f/bindings/python/tests/test_server_patterns.py) you can see that I test managing the DB via both direct Java API calls and HTTP REST APIs, where the former is much faster than the latter, as you can expect. I think the main usage of this python bindings is for direct Java API calls, not HTTP REST APIs, but since ArcadeDB's Java supports both, I also made it that it supports both. Once the server is up and running with server = arcadedb.create_server(
root_path=root_path,
root_password="test12345"
)
server.start()The web UI (http://localhost:2480/) is open and you can do all the UI stuff there. I see that the gRPC support is also added here. I try to test this but I couldn't get it working, cuz AFAIK, the gRPC client is not implemented yet and its JAR doesn't exist in the docker image yet, right? Once it's up there, I can also test it, although I still think the main usage of this python bindings is direct Java API calls. RemarksI told @robfrank that I'll do a bit more tests, e.g., more data, import CSVs, JSONs, etc. I'll let both of you know when I am more comfortable with this PR. After the python bindings is done, I'll move on to Arcade-RAG |
84e4c6d to
ffbeaf7
Compare
95bc751 to
408a499
Compare
|
I'll squash my commits into one commit from now on. |
There was a problem hiding this comment.
I didn't really want to touch the files that are not necessarily related to my python-embedded PR, but this was kinda necessary to format the files properly.
Python Embedded Bindings for ArcadeDB
What does this PR do?
This PR introduces native Python bindings for ArcadeDB that embed the Java database engine directly in Python processes using JPype. It provides a Pythonic API to ArcadeDB's multi-model database capabilities (Graph, Document, Key/Value, Vector, Time Series) with three distribution options tailored to different use cases.
Key Additions:
arcadedb_embedded) with ~3,200 lines of production codeMotivation
ArcadeDB is a Java-based multi-model database, but Python is the dominant language in data science, AI/ML, and modern web development. This integration enables:
Why Three Distributions?
Related Issues
#2662
Architecture Decisions:
Technical Overview
Package Structure
Build System
Docker-based multi-stage builds ensure reproducibility:
Single command builds all three distributions:
CI/CD Workflows
Three GitHub Actions workflows added to
.github/workflows/:test-python-bindings.yml: Runs pytest on every push/PRrelease-python-packages.yml: Builds and publishes to PyPI when release tag contains "python"deploy-python-docs.yml: Builds and deploys MkDocs to GitHub PagesAPI Coverage
The bindings provide ~85% coverage of Java API features relevant to Python developers:
Testing
41 tests, 100% passing across all distributions:
Test categories:
Additional Notes
Documentation
Comprehensive documentation site built with MkDocs (Material theme):
Live site:
https://humemai.github.io/arcadedb/latest/Examples
Added
examples/basic.pydemonstrating:Dependencies
Minimal Python dependencies:
jpype1>=1.5.0(JVM integration)numpy>=1.20.0(for vector operations)pytest,pytest-cov,black,isort,mypyJava dependencies: All bundled in wheel (no external JARs needed)
Installation Requirements
Backward Compatibility
This PR adds a new
bindings/python/directory with no changes to existing Java code or other bindings. It's completely isolated and won't affect existing functionality.Performance Considerations
Known Limitations
Checklist
mvn clean packagecommandAdditional Testing Completed
examples/basic.pyruns successfully