Add Python bindings for ArcadeDB by tae898 · Pull Request #2696 · ArcadeData/arcadedb

tae898 · 2025-10-23T13:16:07Z

Previous discussion (outdated - click to expand)

Python Embedded Bindings for ArcadeDB

What does this PR do?

This PR introduces native Python bindings for ArcadeDB that embed the Java database engine directly in Python processes using JPype. It provides a Pythonic API to ArcadeDB's multi-model database capabilities (Graph, Document, Key/Value, Vector, Time Series) with three distribution options tailored to different use cases.

Key Additions:

Complete Python package (arcadedb_embedded) with ~3,200 lines of production code
Three distribution variants: headless (~94MB), minimal (~97MB with Studio UI), full (~158MB with Gremlin + GraphQL)
Comprehensive test suite: 41 tests across 6 test files, 1,847 lines of test code, 100% passing
Full documentation site: 63 markdown files with MkDocs, including API reference, user guides, and examples
Automated build system: Docker-based multi-stage builds for all three distributions
CI/CD workflows: Automated testing, building, PyPI publishing, and docs deployment via GitHub Actions

Motivation

ArcadeDB is a Java-based multi-model database, but Python is the dominant language in data science, AI/ML, and modern web development. This integration enables:

Embedded database access: Run ArcadeDB directly in Python processes without external servers
Simplified deployment: Self-contained wheels with all JARs bundled (just needs JRE 11+)
AI/ML integration: Native vector storage with HNSW indexing for embeddings
Developer experience: Pythonic API with context managers, type hints, and proper error handling
Multi-model flexibility: Access Graph, Document, Key/Value, Vector, and Time Series models from Python

Why Three Distributions?

Headless (production): Core database only, minimal size, no UI dependencies
Minimal (development): Adds Studio web UI (~2MB overhead) for visual debugging
Full (Gremlin users): Adds Gremlin graph traversal language and GraphQL support

Related Issues

#2662

Architecture Decisions:

Embedded vs Client/Server: Chose embedded mode as primary use case (client mode via HTTP is also supported)
Three packages vs one: Allows users to choose minimal dependencies based on their needs
MkDocs for docs: Material theme provides excellent UX and search functionality

Technical Overview

Package Structure

bindings/python/
├── src/arcadedb_embedded/        # Main package (316 lines core.py + 7 other modules)
│   ├── __init__.py               # Public API exports
│   ├── core.py                   # Database and DatabaseFactory
│   ├── server.py                 # ArcadeDBServer for HTTP mode (225 lines)
│   ├── results.py                # ResultSet and Result wrappers
│   ├── transactions.py           # TransactionContext manager
│   ├── vector.py                 # Vector search and HNSW indexing (142 lines)
│   ├── importer.py               # CSV, JSON, JSONL, Neo4j import (726 lines)
│   ├── exceptions.py             # ArcadeDBError exception
│   └── jvm.py                    # JVM lifecycle management
├── tests/                        # 41 tests across 6 files (1,847 lines total)
│   ├── test_core.py              # 13 tests: CRUD, transactions, queries, graphs, vectors
│   ├── test_server.py            # 6 tests: HTTP API, Studio, configuration
│   ├── test_concurrency.py       # 4 tests: File locking, thread safety, multi-process
│   ├── test_server_patterns.py   # 4 tests: Embedded + HTTP best practices
│   ├── test_importer.py          # 13 tests: CSV, JSON, JSONL, Neo4j import
│   └── test_gremlin.py           # 1 test: Gremlin query language (full only)
├── docs/                         # 63 markdown files (15,000+ lines)
│   ├── getting-started/          # Installation, quickstart, distributions
│   ├── guide/                    # User guides (core, server, vectors, import, graphs)
│   ├── api/                      # API reference for all modules
│   └── development/              # Testing, contributing, architecture, troubleshooting
├── build-all.sh                  # Unified Docker build script for all distributions
├── Dockerfile.build              # Multi-stage Docker build (177 lines)
├── setup_jars.py                 # Copies JARs to package based on distribution (172 lines)
├── extract_version.py            # Extracts version from parent pom.xml (61 lines)
├── write_version.py              # Writes _version.py during build (41 lines)
├── pyproject.toml                # Python package configuration
└── mkdocs.yml                    # Documentation site configuration
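
The internals of extract_version.py are not shown in this thread, so the following is only a rough sketch of what reading the version from the parent pom.xml might involve. It assumes the POM uses the standard Maven namespace and that a `-SNAPSHOT` suffix should become a PEP 440 `.dev0` release. Both are assumptions, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Standard Maven POM namespace (assumed to be used by the parent pom.xml).
MAVEN_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def read_pom_version(pom_xml: str) -> str:
    """Return the text of the top-level <version> element of a Maven POM."""
    root = ET.fromstring(pom_xml)
    node = root.find("m:version", MAVEN_NS)
    if node is None or not node.text:
        raise ValueError("no top-level <version> element found")
    return node.text.strip()


def to_pep440(maven_version: str) -> str:
    """Map 'X.Y.Z-SNAPSHOT' to 'X.Y.Z.dev0'; leave plain versions untouched."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(-SNAPSHOT)?", maven_version)
    if match is None:
        raise ValueError(f"unsupported version string: {maven_version!r}")
    base, snapshot = match.groups()
    return f"{base}.dev0" if snapshot else base


# Illustrative POM content, not copied from the repository.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>25.10.1-SNAPSHOT</version>
</project>"""

print(to_pep440(read_pom_version(SAMPLE_POM)))
```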

Build System

Docker-based multi-stage builds ensure reproducibility:

Stage 1: Build Java components with Maven (all modules)
Stage 2: Build Python wheel with specific JAR subset based on distribution
Stage 3: Run pytest test suite in isolated environment
Stage 4: Export built wheel for distribution

Single command builds all three distributions:

cd bindings/python && ./build-all.sh

CI/CD Workflows

Three GitHub Actions workflows added to .github/workflows/:

test-python-bindings.yml: Runs pytest on every push/PR
release-python-packages.yml: Builds and publishes to PyPI when release tag contains "python"
deploy-python-docs.yml: Builds and deploys MkDocs to GitHub Pages

API Coverage

The bindings provide ~85% coverage of Java API features relevant to Python developers:

Feature	Coverage	Notes
Database CRUD	✅ 100%	create, open, drop, exists
Queries	✅ 100%	SQL, Cypher, Gremlin (full), MongoDB syntax
Transactions	✅ 100%	Context manager pattern
Schema	✅ 100%	Document types, vertex types, edge types
Indexes	✅ 90%	LSM, full-text, HNSW vector
Server Mode	✅ 100%	HTTP API + Studio UI
Vector Search	✅ 100%	HNSW similarity search
Data Import	✅ 100%	CSV, JSON, JSONL, Neo4j
Graph API	⚠️ 60%	Basic operations (Python-relevant subset)
Gremlin	⚠️ 70%	Query execution (full dist only)
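
The "context manager pattern" listed for transactions can be illustrated without the bindings installed. The sketch below uses a hypothetical in-memory stand-in (`InMemoryStore` is not part of arcadedb_embedded) to show the commit-on-success, discard-on-error behaviour that a `with db.transaction():` block is expected to provide.

```python
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a database; not the arcadedb_embedded API."""

    def __init__(self):
        self.rows = []        # committed data
        self._pending = None  # uncommitted writes of the open transaction

    def insert(self, row):
        if self._pending is None:
            raise RuntimeError("insert() requires an open transaction")
        self._pending.append(row)

    @contextmanager
    def transaction(self):
        self._pending = []
        try:
            yield self
            self.rows.extend(self._pending)  # commit on normal exit
        finally:
            self._pending = None  # on error, pending writes are dropped


store = InMemoryStore()

with store.transaction():
    store.insert({"name": "alice"})

try:
    with store.transaction():
        store.insert({"name": "bob"})
        raise ValueError("simulated failure")
except ValueError:
    pass  # the second transaction was rolled back

print(store.rows)
```

Only the first insert survives, because the exception inside the second block skips the commit step.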

Testing

41 tests, 100% passing across all distributions:

✅ Headless: 34 passed, 7 skipped (server/Gremlin tests)
✅ Minimal: 38 passed, 3 skipped (Gremlin tests)
✅ Full: 41 passed, 0 skipped

Test categories:

Core operations: Database lifecycle, queries, transactions, schema
Server mode: HTTP endpoints, Studio UI, configuration
Concurrency: Thread safety, file locking, multi-process isolation
Vector search: HNSW indexing, similarity queries, distance metrics
Data import: CSV, JSON, JSONL, Neo4j graph import
Graph operations: Vertices, edges, traversals
Gremlin: Graph query language (full distribution only)

Additional Notes

Documentation

Comprehensive documentation site built with MkDocs (Material theme):

Getting Started: Installation guide, 5-minute quickstart, distribution comparison
User Guide: Database operations, queries, transactions, vectors, import, graphs, server mode
API Reference: Detailed documentation for all 8 modules
Development: Testing guide, architecture overview, contributing, troubleshooting
Java API Coverage: Comparison table showing what's implemented

Live site: https://humemai.github.io/arcadedb/latest/

Examples

Added examples/basic.py demonstrating:

Database creation and cleanup
Schema definition
Transactions
Queries with multiple languages (SQL, Cypher)
Graph operations (vertices, edges)
Vector search with HNSW
Data import from CSV/JSON

Dependencies

Minimal Python dependencies:

Required: jpype1>=1.5.0 (JVM integration)
Optional: numpy>=1.20.0 (for vector operations)
Dev: pytest, pytest-cov, black, isort, mypy

Java dependencies: All bundled in wheel (no external JARs needed)

Installation Requirements

Python 3.8 - 3.12
Java Runtime Environment (JRE)
That's it! Everything else is bundled.

Backward Compatibility

This PR adds a new bindings/python/ directory with no changes to existing Java code or other bindings. It's completely isolated and won't affect existing functionality.

Performance Considerations

Direct JVM integration: JPype provides near-native performance
No serialization overhead: Direct Java object access in Python
Transaction batching: Pythonic context managers ensure proper transaction handling
Lazy result iteration: ResultSet provides memory-efficient iteration over large result sets

Known Limitations

Java required: Cannot run without JRE installed
Single process: File-based locking prevents multiple processes from accessing the same database file (use server mode for multi-process access)
JVM startup time: First database operation incurs ~1-2 second JVM initialization
Memory: JVM requires additional memory overhead (~100-200MB base)

Checklist

I have run the build using mvn clean package command
- ✅ All Java modules build successfully
- ✅ Docker-based Python build tested for all three distributions
My unit tests cover both failure and success scenarios
- ✅ 41 tests covering happy path and error cases
- ✅ Transaction rollback on errors
- ✅ File locking edge cases
- ✅ Invalid query handling
- ✅ Missing JAR error handling
- ✅ Concurrency edge cases

Additional Testing Completed

Distribution builds: All three distributions build cleanly via Docker
Test coverage: 100% test pass rate across all distributions
Documentation: Full docs build without errors via MkDocs
Example code: examples/basic.py runs successfully
CI workflows: GitHub Actions workflows validated (test, release, docs deploy)
Package metadata: PyPI metadata complete (classifiers, keywords, URLs)

tae898 · 2025-10-23T13:17:02Z

Previous discussion (outdated - click to expand)

I'm sorry to keep removing & creating PRs. I swear I won't do it again. It's just that I'm trying to find the best way to make this.

The last PR (#2686) got broken somehow, so this PR is made again.

gemini-code-assist · 2025-10-23T13:17:26Z

Summary of Changes

Hello @tae898, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a significant new feature: native Python bindings for ArcadeDB. It allows Python developers to seamlessly integrate and interact with the powerful Java-based multi-model database directly within their Python applications. The implementation provides a Pythonic interface to ArcadeDB's diverse data models, ensuring high performance through direct JVM integration. This initiative aims to enhance the developer experience for Python users, particularly in data science and AI/ML fields, by offering flexible deployment options and robust tooling for building and managing applications.

Highlights

Native Python Bindings: Introduces native Python bindings for ArcadeDB, embedding the Java database engine directly into Python processes using JPype, offering a Pythonic API for multi-model capabilities (Graph, Document, Key/Value, Vector, Time Series).
Multiple Distribution Options: Provides three distinct distribution variants: a minimal 'headless' (~94MB) for production, a 'minimal' (~97MB) including the Studio UI for development, and a 'full' (~158MB) with Gremlin and GraphQL support.
Comprehensive Ecosystem: Includes a complete Python package (~3,200 lines of code), a comprehensive test suite (41 tests, 100% passing), a full documentation site (63 Markdown files), an automated Docker-based build system, and CI/CD workflows for testing, publishing, and docs deployment.
Enhanced Developer Experience: Motivated by Python's dominance in data science, these bindings enable embedded database access, simplified deployment (self-contained wheels), native AI/ML integration (vector storage with HNSW), and a Pythonic API with context managers and type hints.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This is a massive and impressive pull request that introduces comprehensive Python bindings for ArcadeDB. The work is well-structured, including a robust Docker-based build system, an extensive test suite, and exceptionally detailed documentation. My review focuses on improving the maintainability of the build scripts and correcting some inconsistencies and potential issues within the documentation to ensure clarity and correctness for future users and contributors.

gemini-code-assist · 2025-10-23T13:19:58Z

bindings/python/docs/development/architecture.md

+- `Database` instances are **NOT thread-safe**
+- Each thread needs its own `Database` instance
+- Transactions are thread-local


The documentation incorrectly states that Database instances are not thread-safe. This contradicts the implementation shown in tests/test_concurrency.py::test_thread_safe_operations, where a single Database instance is successfully and safely shared across multiple threads. The underlying Java engine is thread-safe, and this is a key feature. The documentation should be corrected to reflect that Database instances are indeed thread-safe and can be shared across threads within the same process.

Suggested change

-- `Database` instances are **NOT thread-safe**
-- Each thread needs its own `Database` instance
-- Transactions are thread-local
+- `Database` instances are **thread-safe**
+- A single `Database` instance can be shared across multiple threads
+- Transactions are thread-local
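
To make the combination "shared instance, thread-local transactions" concrete, here is a minimal sketch using only the standard library. `SharedHandle` is a hypothetical stand-in and does not reflect how the Java engine or the bindings implement this. It only shows how one object can be shared while each thread keeps its own pending writes.

```python
import threading


class SharedHandle:
    """Hypothetical stand-in: one shared object, per-thread transaction state."""

    def __init__(self):
        self._local = threading.local()  # each thread sees its own attributes
        self._lock = threading.Lock()
        self.committed = []

    def begin(self):
        self._local.pending = []

    def write(self, value):
        self._local.pending.append(value)

    def commit(self):
        with self._lock:  # only the shared list needs locking
            self.committed.extend(self._local.pending)
        self._local.pending = []


def worker(handle, worker_id):
    handle.begin()
    for i in range(3):
        handle.write((worker_id, i))
    handle.commit()


handle = SharedHandle()
threads = [threading.Thread(target=worker, args=(handle, n)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(handle.committed))
```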

gemini-code-assist · 2025-10-23T13:19:58Z

bindings/python/docs/development/testing/test-core.md

+        CREATE HNSW INDEX Document.embedding
+        ON Document(embedding)
+        WITH m=16, ef=128, efConstruction=128
+    """)
+
+    # Insert vectors
+    with db.transaction():
+        db.command("sql", "INSERT INTO Document SET name = 'doc1', embedding = [1.0, 0.0, 0.0]")
+        db.command("sql", "INSERT INTO Document SET name = 'doc2', embedding = [0.9, 0.1, 0.0]")
+        db.command("sql", "INSERT INTO Document SET name = 'doc3', embedding = [0.0, 1.0, 0.0]")
+
+    # Similarity search
+    result = db.query("sql", """
+        SELECT name, cosine_similarity(embedding, [1.0, 0.0, 0.0]) as similarity
+        FROM Document
+        ORDER BY similarity DESC
+        LIMIT 2
+    """)
+
+    docs = list(result)
+    assert docs[0].get_property("name") == "doc1"  # Closest match
+    assert docs[1].get_property("name") == "doc2"  # Second closest
+```


This documentation block demonstrates creating a vector index and searching using SQL syntax (CREATE HNSW INDEX, cosine_similarity). This is confusing and likely incorrect in the context of the Python bindings, which use the db.create_vector_index() and index.find_nearest() methods as shown in the test code. The SQL-based vector search syntax is not the primary or documented way to perform this operation via the Python API and may not be supported in the embedded engine. This section should be revised to accurately reflect the Python API's usage.
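
Whichever API ends up in the docs, the ranking asserted in the snippet (doc1 first, doc2 second) can be checked by hand. The following brute-force sketch computes cosine similarity in plain Python for the three vectors from the example. It is not an HNSW search and does not use the bindings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors taken from the INSERT statements in the documentation snippet.
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

# Sort document names by similarity to the query, most similar first.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(docs[name], query),
    reverse=True,
)
print(ranked[:2])
```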

gemini-code-assist · 2025-10-23T13:19:58Z

bindings/python/Dockerfile.build

+RUN export ARCADEDB_VERSION=$(python3 extract_version.py --format=pep440 /arcadedb/pom.xml) && \
+    echo "📦 Python package version: ${ARCADEDB_VERSION}" && \
+    case ${DISTRIBUTION} in \
+        headless) \
+            PACKAGE_NAME="arcadedb-embedded-headless" && \
+            DESCRIPTION="ArcadeDB embedded Python bindings - Headless distribution (excludes Gremlin, GraphQL, MongoDB/Redis wire protocols, and Studio)" \
+            ;; \
+        minimal) \
+            PACKAGE_NAME="arcadedb-embedded-minimal" && \
+            DESCRIPTION="ArcadeDB embedded Python bindings - Minimal distribution (excludes Gremlin, GraphQL, MongoDB/Redis wire protocols)" \
+            ;; \
+        full) \
+            PACKAGE_NAME="arcadedb-embedded" && \
+            DESCRIPTION="ArcadeDB embedded Python bindings - Full distribution (includes Gremlin, GraphQL, MongoDB/Redis wire protocols, and Studio)" \
+            ;; \
+        *) \
+            PACKAGE_NAME="arcadedb-embedded" && \
+            DESCRIPTION="ArcadeDB embedded Python bindings" \
+            ;; \
+    esac && \
+    sed -i 's|^name = .*|name = "'"${PACKAGE_NAME}"'"|' pyproject.toml && \
+    sed -i 's|^version = .*|version = "'"${ARCADEDB_VERSION}"'"|' pyproject.toml && \
+    sed -i 's|^description = .*|description = "'"${DESCRIPTION}"'"|' pyproject.toml && \
+    python3 -m build --wheel && \
+    echo "✅ Wheel built successfully!" && \
+    ls -lh dist/


This RUN command is very long and performs multiple distinct tasks (version extraction, modifying pyproject.toml with sed, and building the wheel). For better readability, maintainability, and to leverage Docker's layer caching more effectively, consider splitting this into smaller, more focused RUN commands. Alternatively, this logic could be moved into a helper shell script that is COPY'd and executed.
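
One possible shape for such a helper, written in Python since the build already ships Python scripts, is sketched below. The package names come from the case statement above. The function names and the idea of patching the TOML text with regular expressions are assumptions that mirror the three sed calls, not code from the PR.

```python
import re

# Package names and the fallback are taken from the case statement above.
PACKAGE_NAMES = {
    "headless": "arcadedb-embedded-headless",
    "minimal": "arcadedb-embedded-minimal",
    "full": "arcadedb-embedded",
}


def set_field(toml_text: str, key: str, value: str) -> str:
    """Rewrite every line that starts with `key = `, like the sed calls."""
    pattern = re.compile(rf"^{re.escape(key)} = .*$", re.MULTILINE)
    new_line = f'{key} = "{value}"'
    # A function replacement avoids backslash/group escapes in `value`.
    return pattern.sub(lambda _match: new_line, toml_text)


def patch_pyproject(toml_text, distribution, version, description):
    """Return pyproject.toml text with name, version and description set."""
    name = PACKAGE_NAMES.get(distribution, "arcadedb-embedded")
    fields = (("name", name), ("version", version), ("description", description))
    for key, value in fields:
        toml_text = set_field(toml_text, key, value)
    return toml_text


# Illustrative input, not the real pyproject.toml.
sample = 'name = "placeholder"\nversion = "0.0.0"\ndescription = "placeholder"\n'
print(patch_pyproject(sample, "headless", "25.10.1", "Headless distribution"))
```

A script like this could be added to the image with COPY and run in its own RUN step, which keeps the Dockerfile short and lets the logic be unit-tested outside Docker.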

gemini-code-assist · 2025-10-23T13:19:58Z

bindings/python/Dockerfile.build

+RUN echo '#!/usr/bin/env python3\n\
+import arcadedb_embedded as arcadedb\n\
+import tempfile\n\
+import shutil\n\
+import os\n\
+\n\
+print("🎮 Testing ArcadeDB Python bindings...")\n\
+print(f"📦 Version: {arcadedb.__version__}")\n\
+\n\
+temp_dir = tempfile.mkdtemp()\n\
+db_path = os.path.join(temp_dir, "test_db")\n\
+\n\
+try:\n\
+    with arcadedb.create_database(db_path) as db:\n\
+        print("✅ Database created")\n\
+        \n\
+        with db.transaction():\n\
+            db.command("sql", "CREATE DOCUMENT TYPE TestDoc")\n\
+            db.command("sql", "INSERT INTO TestDoc SET name = '\''docker_test'\'', value = 123")\n\
+        print("✅ Transaction committed")\n\
+        \n\
+        result = db.query("sql", "SELECT FROM TestDoc")\n\
+        for record in result:\n\
+            print(f"✅ Query result: {record.get_property('\''name'\'')} = {record.get_property('\''value'\'')}")\n\
+    \n\
+    print("🎉 All tests passed!")\n\
+finally:\n\
+    if os.path.exists(temp_dir):\n\
+        shutil.rmtree(temp_dir)\n\
+' > /test/test_install.py && chmod +x /test/test_install.py


Embedding a large, multi-line script directly into the Dockerfile using echo makes it difficult to read, maintain, and lint. It would be a better practice to store this script in a separate file (e.g., docker/test_install.py) and use a COPY instruction to add it to the image. This would significantly improve the readability and maintainability of the Dockerfile.

gemini-code-assist · 2025-10-23T13:19:59Z

bindings/python/docs/development/release.md

+2. Click **Draft a new release**
+3. Click **Choose a tag** dropdown
+
+2. Click **Choose a tag** → Type `vX.Y.Z-python` → **Create new tag**


This line appears to be a duplicate of the instruction in the preceding step. It should be removed to improve the clarity of the release process documentation.

gemini-code-assist · 2025-10-23T13:19:59Z

bindings/python/docs/getting-started/quickstart.md

+
+        # Count
+        result = db.query("sql", "SELECT count(*) as total FROM Person")
+        total = result[0].get_property('total')


The code result[0] suggests that the ResultSet object supports direct indexing. However, based on the API documentation and other examples, ResultSet is an iterator, so accessing it by index will raise an error. To get the first item, either convert it to a list first (list(result)[0]) or, more efficiently, call result.next().

Suggested change

-        total = result[0].get_property('total')
+        total = result.next().get_property('total')
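
The difference between an iterator and an indexable sequence can be reproduced with a small stand-in. `FakeResultSet` below is hypothetical and only mimics the iterator behaviour described in this comment. It is not the ResultSet class from the bindings.

```python
class FakeResultSet:
    """Hypothetical iterator-style result set; not the real ResultSet class."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._rows)

    def next(self):
        # Convenience alias mirroring the `result.next()` call suggested above.
        return self.__next__()


result = FakeResultSet([{"total": 3}])

try:
    result[0]  # no __getitem__, so indexing is not supported
except TypeError as exc:
    print("indexing failed:", type(exc).__name__)

first = result.next()
print(first["total"])
```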

gemini-code-assist · 2025-10-23T13:19:59Z

bindings/python/docs/guide/graphs.md

+    alice = result_alice.next()._java_result
+    bob = result_bob.next()._java_result


The example for creating edges uses ._java_result, which is a private attribute. Exposing private attributes in documentation is not a good practice as it indicates a leaky abstraction and can be brittle if the internal implementation changes. A public method like as_java() on the Result object would provide a more stable and explicit API. Alternatively, the Result wrapper could proxy the newEdge method.

Suggested change

-    alice = result_alice.next()._java_result
-    bob = result_bob.next()._java_result
+    alice = result_alice.next().as_java()
+    bob = result_bob.next().as_java()
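
A minimal sketch of the proposed accessor is shown below. The `Result` class here is a hypothetical simplification, and `as_java` is the method name suggested in this comment, not an existing method confirmed by the PR.

```python
class Result:
    """Hypothetical wrapper sketch; `as_java` is the accessor proposed above."""

    def __init__(self, java_result):
        self._java_result = java_result  # private: may change between releases

    def as_java(self):
        """Return the wrapped object through a stable, public method."""
        return self._java_result


wrapped = object()  # placeholder for a Java vertex proxy
alice = Result(wrapped).as_java()
print(alice is wrapped)
```

Callers then depend on a documented method rather than on the attribute name, so the internal field can be renamed without breaking user code.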

mergify · 2025-10-23T13:23:00Z

🧪 CI Insights

Here's what we observed from your CI run for a46cb72.

🟢 All jobs passed!

But CI Insights is watching 👀

tae898 · 2025-10-25T22:56:41Z

Previous discussion (outdated - click to expand)

@robfrank @lvca

The Python bindings are working well overall. The essential unit tests pass, and I'm now adding 9 realistic examples with larger datasets (millions of records) to test performance and scalability. I'm halfway through and plan to complete this by October 31st so it can be included in the next release.

While developing these examples, I've encountered some issues (or possibly mistakes on my part) which I've documented here:

CSV Importer: createdDocuments counter not incremented for document imports #2700
LSMTree compaction creates duplicate timestamped indexes that are not cleaned up #2701
NeedRetryException when creating indexes sequentially on large datasets #2702
FULL_TEXT index on LIKE queries shows no performance improvement or regression #2703

These may be Python-specific. If so, I'll fix them in the bindings.

Repository Structure

I plan to maintain my fork (https://github.com/humemai/arcadedb-embedded-python) as a Python-focused repository. It will stay synchronized with ArcadeDB's main branch but include:

Python-focused README
Dedicated Python documentation (https://humemai.github.io/arcadedb-embedded-python/)
Python-specific GitHub Actions workflows
Python bindings and related files

The Java codebase will remain unchanged and synchronized with upstream.

PyPI Releases

I'll publish Python wheels to PyPI from my fork:

These will follow ArcadeDB's release versions with an optional revision suffix for Python-specific updates (e.g., 25.9.1.3 for the third Python revision of 25.9.1).
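
As an illustration of that numbering scheme, the sketch below splits a version string into the upstream ArcadeDB version and the optional Python revision. It assumes upstream versions always have exactly three numeric components, which matches the example given but is not stated as a rule. The function name is hypothetical.

```python
def split_python_revision(version: str):
    """Split 'X.Y.Z[.N]' into the upstream version and the Python revision."""
    parts = version.split(".")
    if len(parts) not in (3, 4) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version format: {version!r}")
    base = ".".join(parts[:3])
    # A fourth component, when present, is the Python-specific revision.
    revision = int(parts[3]) if len(parts) == 4 else None
    return base, revision


print(split_python_revision("25.9.1.3"))
print(split_python_revision("25.9.1"))
```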

Contributing Back

Once the Python bindings are stable and well-tested, I'd like to contribute them back to the main ArcadeDB repository via PR, similar to how projects like DuckDB and Apache Arrow maintain their Python bindings in the main repo with dedicated documentation and PyPI releases.

The quality of the Python bindings can be verified through the tests run by the GitHub Actions workflows in your repository (https://github.com/ArcadeData/arcadedb/actions/workflows/test-python-bindings.yml and https://github.com/ArcadeData/arcadedb/actions/workflows/test-python-examples.yml).

Introduce comprehensive Python bindings that enable embedded ArcadeDB usage directly from Python applications, leveraging JPype for seamless JVM integration.

Core Features:
- Embedded database operations with full CRUD support
- Document, vertex, and edge models for graph databases
- Transaction management (read, write, batch operations)
- Server mode with HTTP API support
- Vector search capabilities for AI/ML applications
- Data import from CSV/JSONL with automatic type inference
- Export to GraphML, GraphSON, JSONL, and CSV formats
- Gremlin query language support
- Async execution and batch processing utilities

Development Infrastructure:
- Multi-platform build system (Linux, macOS, Windows on x64/ARM64)
- Native build scripts with JRE bundling
- Docker-based build environment
- Comprehensive test suite with 100+ tests covering:
  * Core database operations
  * Concurrency and transactions
  * Import/export functionality
  * Server patterns and API
  * Type conversions and result handling
- CI/CD workflows for automated testing across all platforms
- Testing for examples 01-03 (verified working)

Examples:
- Simple document store with CRUD operations
- Social network graph modeling and traversal
- Vector similarity search
- CSV import with MovieLens dataset (examples 04-05 included but not CI-tested yet)

Build System:
- Platform-specific wheel generation
- JAR exclusion filtering for minimal distributions
- Version extraction from parent pom.xml
- Setup utilities for streamlined installation

This implementation provides a Pythonic interface to ArcadeDB while maintaining compatibility with the Java API and supporting all major platforms.

tae898 · 2025-11-03T10:58:12Z

Python Bindings for ArcadeDB

Overview

This PR introduces comprehensive Python bindings for ArcadeDB, enabling embedded database usage directly from Python applications. The implementation leverages JPype for seamless JVM integration, providing a Pythonic interface while maintaining full compatibility with the Java API.

Related Issue

#2662

🎯 Key Highlights

Multi-Platform Support (6 Platforms)

✅ All platforms supported thanks to Java's platform-independent bytecode:

linux/amd64
linux/arm64
darwin/amd64 (Intel Mac)
darwin/arm64 (Apple Silicon)
windows/amd64
windows/arm64

Key Innovation: Instead of compiling native extensions for each platform, we ship platform-specific stripped JREs bundled with each wheel. This approach:

Eliminates the need for users to install Java
Ensures consistent behavior across all platforms
Simplifies the build process (no native compilation required)
Leverages Java's "write once, run anywhere" philosophy

Current Status

⚠️ Not Production Ready - Currently undergoing comprehensive testing across all platforms.

📦 PyPI Distribution Pending - Wheels are ready but not yet published to PyPI. Waiting for PyPI approval to push wheels larger than 100MB (current wheels include bundled JREs).

🚀 Features

Core Functionality

Embedded Database Operations: Full CRUD support for documents, vertices, and edges
Transaction Management: Read, write, and batch operations with ACID guarantees
Graph Database Support: Native graph modeling with traversal capabilities
Vector Search: AI/ML-ready vector similarity search
Multiple Query Languages: SQL, Gremlin, and programmatic API
Server Mode: HTTP API for remote access

Data Import/Export

Import: CSV and JSONL with automatic type inference
Export: GraphML, GraphSON, JSONL, and CSV formats
Batch Processing: Optimized bulk operations with BatchContext and AsyncExecutor
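
The inference rules used by the importer are not described in this thread. As a rough illustration of what "automatic type inference" for CSV cells can mean, the sketch below tries bool, then int, then float, and falls back to the original string. The rules and the `infer_value` name are assumptions, not the importer's actual behaviour.

```python
import csv
import io


def infer_value(text: str):
    """Convert a CSV cell to bool, int, or float when it parses cleanly."""
    stripped = text.strip()
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            continue
    return text  # anything else stays a string


# Illustrative CSV content.
rows = list(csv.DictReader(io.StringIO("id,score,active,name\n1,4.5,true,Alice\n")))
typed = [{key: infer_value(value) for key, value in row.items()} for row in rows]
print(typed)
```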

Development Infrastructure

Multi-Platform Build System: Native build scripts with JRE bundling
Docker Support: Docker-based build environment for Linux
Comprehensive Testing: 100+ tests covering core operations, concurrency, and edge cases
CI/CD: Automated testing across all 6 platforms via GitHub Actions
Examples: Working examples for common use cases (examples 01-03 verified in CI)

📦 Installation (When Available on PyPI)

pip install arcadedb-embedded

Platform-specific wheels will be automatically selected based on your system.

🔧 Build System

The build system generates platform-specific wheels with bundled JREs:

# Build for current platform
cd bindings/python
./build.sh

# Build for specific platform
./build.sh linux/amd64
./build.sh darwin/arm64
./build.sh windows/amd64
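
The platform strings accepted by build.sh follow an `os/arch` pattern. A minimal sketch of how the label for the current machine could be derived in Python is shown below. The mapping tables and the `platform_label` helper are assumptions for illustration, and build.sh may detect the platform differently.

```python
import platform

# Values returned by platform.system() mapped to the labels listed in this PR.
_OS = {"Linux": "linux", "Darwin": "darwin", "Windows": "windows"}

# Common platform.machine() spellings for the two supported architectures.
_ARCH = {
    "x86_64": "amd64",
    "AMD64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "ARM64": "arm64",
}


def platform_label(system=None, machine=None):
    """Return an 'os/arch' label such as 'linux/amd64'."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return f"{_OS[system]}/{_ARCH[machine]}"
    except KeyError:
        raise ValueError(f"unsupported platform: {system}/{machine}") from None


print(platform_label("Linux", "x86_64"))
print(platform_label("Darwin", "arm64"))
```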

JAR Exclusion

Non-essential JARs (e.g., gRPC) are excluded to minimize wheel size, configured via jar_exclusions.txt.
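
The format of jar_exclusions.txt is not shown here. Assuming one glob pattern per line with `#` comments (an assumption), filtering the JAR list could look roughly like this. The file names and pattern in the example are made up.

```python
from fnmatch import fnmatchcase


def parse_exclusions(text):
    """Return non-empty, non-comment lines as glob patterns."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def filter_jars(jar_names, patterns):
    """Keep JARs whose file name matches none of the exclusion patterns."""
    return [
        jar
        for jar in jar_names
        if not any(fnmatchcase(jar, pattern) for pattern in patterns)
    ]


# Hypothetical exclusion file content and JAR names.
exclusions = parse_exclusions("# hypothetical entries\n*grpc*\n")
jars = ["engine-example.jar", "grpc-example.jar", "server-grpc-example.jar"]
print(filter_jars(jars, exclusions))
```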

📊 Testing

Test Coverage

✅ Core database operations
✅ Concurrency and transaction handling
✅ Import/export functionality
✅ Server patterns and HTTP API
✅ Type conversions and result handling
✅ Async execution and batch processing

CI/CD Workflows

test-python-bindings.yml: Unit tests across all platforms
test-python-examples.yml: Examples 01-03 tested on all platforms

📝 Examples

1. Simple Document Store

Basic CRUD operations with comprehensive data type support.

2. Social Network Graph

Graph modeling with vertices, edges, and traversal queries.

3. Vector Search

Vector embeddings and semantic similarity search for AI/ML applications.

🛣️ Roadmap

Complete testing across all platforms
PyPI approval for 100MB+ wheels
Publish wheels to PyPI
Add examples 04-08 to CI testing
Performance benchmarking and optimization
Expand documentation with more advanced use cases
Add mkdocs documentation site

🤝 Technical Details

Architecture

JPype Integration: In-process Python-Java interop through JNI, with no network hop or serialization layer between Python and the engine
Bundled JRE: Platform-specific stripped Java Runtime Environments
Type Conversion: Automatic conversion between Python and Java types
Result Handling: Pythonic iteration over query results

gemini-code-assist bot reviewed Oct 23, 2025

robfrank linked an issue Oct 24, 2025 that may be closed by this pull request

Support for python embedded package #2662

Closed

robfrank added this to the 25.10.1 milestone Oct 24, 2025

robfrank added the enhancement New feature or request label Oct 24, 2025

tae898 force-pushed the python-embedded branch from 5990a8b to 849b4c7 Compare October 25, 2025 22:26

tae898 force-pushed the python-embedded branch 2 times, most recently from 2cd71e5 to 69f4c54 Compare October 25, 2025 23:08

lvca requested a review from robfrank October 29, 2025 05:51

lvca assigned tae898 Oct 29, 2025

tae898 force-pushed the python-embedded branch 2 times, most recently from f302557 to 53fbed6 Compare November 3, 2025 10:39

tae898 force-pushed the python-embedded branch from 53fbed6 to a46cb72 Compare November 3, 2025 10:44

robfrank merged commit a46cb72 into ArcadeData:main Nov 5, 2025
26 of 29 checks passed

tae898 deleted the python-embedded branch November 5, 2025 11:22
