| 1 | +# GraphBenchmark Design — LDBC-Inspired Benchmark for ArcadeDB |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +A JUnit 5 benchmark class in `engine/src/test/java/performance/GraphBenchmark.java` that generates a graph modeled on the LDBC Social Network Benchmark and measures ArcadeDB across creation, lookups, and traversals. Lookups and simple traversals run in both SQL and OpenCypher side by side; complex traversals are Cypher-only. The database is preserved between runs, so only the first execution pays the generation cost. |
| 6 | + |
| 7 | +## Decisions |
| 8 | + |
| 9 | +| Decision | Choice | |
| 10 | +|----------|--------| |
| 11 | +| Location | `engine/src/test/java/performance/` | |
| 12 | +| Query languages | Both SQL and Cypher, side by side (Phase 4 is Cypher-only) | |
| 13 | +| Schema fidelity | Full LDBC SNB (8 vertex types, 14 edge types) | |
| 14 | +| Execution model | JUnit 5 with `@Tag("benchmark")` | |
| 15 | +| Default scale | Medium (~30K Persons, ~150K Posts, ~600K Comments, ~3M edges) | |
| 16 | +| Metrics | Micrometer `SimpleMeterRegistry` (test-scoped dependency) | |
| 17 | + |
| 18 | +## File Structure |
| 19 | + |
| 20 | +Single file: `engine/src/test/java/performance/GraphBenchmark.java` |
| 21 | + |
| 22 | +New test-scoped dependency in `engine/pom.xml`: |
| 23 | +```xml |
| 24 | +<dependency> |
| 25 | + <groupId>io.micrometer</groupId> |
| 26 | + <artifactId>micrometer-core</artifactId> |
| 27 | + <version>${micrometer.version}</version> |
| 28 | + <scope>test</scope> |
| 29 | +</dependency> |
| 30 | +``` |
| 31 | + |
| 32 | +## Scale Constants |
| 33 | + |
| 34 | +```java |
| 35 | +private static final int NUM_PERSONS = 30_000; |
| 36 | +private static final int NUM_POSTS = 150_000; |
| 37 | +private static final int NUM_COMMENTS = 600_000; |
| 38 | +private static final int NUM_FORUMS = 5_000; |
| 39 | +private static final int NUM_TAGS = 2_000; |
| 40 | +private static final int NUM_TAG_CLASSES = 100; |
| 41 | +private static final int NUM_PLACES = 1_500; |
| 42 | +private static final int NUM_ORGANISATIONS = 3_000; |
| 43 | + |
| 44 | +private static final int AVG_KNOWS_PER_PERSON = 40; |
| 45 | +private static final int AVG_LIKES_PER_PERSON = 30; |
| 46 | +private static final int AVG_TAGS_PER_POST = 3; |
| 47 | +private static final int AVG_INTERESTS_PER_PERSON = 5; |
| 48 | + |
| 49 | +private static final int PARALLEL = 4; |
| 50 | +private static final int COMMIT_EVERY = 5_000; |
| 51 | +private static final String DB_PATH = "target/databases/graph-benchmark"; |
| 52 | +``` |
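The totals implied by these constants can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming each per-Person and per-Post average applies exactly; the class name `ScaleSketch` and its helper methods are illustrative and not part of the design.

```java
/** Sketch: derive expected totals from the scale constants listed above. */
final class ScaleSketch {
  static final int NUM_PERSONS = 30_000;
  static final int NUM_POSTS = 150_000;
  static final int NUM_COMMENTS = 600_000;
  static final int NUM_FORUMS = 5_000;
  static final int NUM_TAGS = 2_000;
  static final int NUM_TAG_CLASSES = 100;
  static final int NUM_PLACES = 1_500;
  static final int NUM_ORGANISATIONS = 3_000;

  static final int AVG_LIKES_PER_PERSON = 30;
  static final int AVG_TAGS_PER_POST = 3;
  static final int AVG_INTERESTS_PER_PERSON = 5;

  /** Sum of all eight vertex-type counts. */
  static long totalVertices() {
    return (long) NUM_PERSONS + NUM_POSTS + NUM_COMMENTS + NUM_FORUMS
        + NUM_TAGS + NUM_TAG_CLASSES + NUM_PLACES + NUM_ORGANISATIONS;
  }

  /** Assumption: every Person creates exactly the average number of LIKES edges. */
  static long expectedLikes() {
    return (long) NUM_PERSONS * AVG_LIKES_PER_PERSON;
  }

  /** Assumption: every Person has exactly the average number of HAS_INTEREST edges. */
  static long expectedInterests() {
    return (long) NUM_PERSONS * AVG_INTERESTS_PER_PERSON;
  }

  /** Assumption: every Post has exactly the average number of HAS_TAG edges (Comments and Forums excluded). */
  static long expectedPostTags() {
    return (long) NUM_POSTS * AVG_TAGS_PER_POST;
  }

  public static void main(String[] args) {
    System.out.println("vertices     = " + totalVertices());     // 791600
    System.out.println("LIKES        = " + expectedLikes());     // 900000
    System.out.println("HAS_INTEREST = " + expectedInterests()); // 150000
    System.out.println("Post HAS_TAG = " + expectedPostTags());  // 450000
  }
}
```

A check like this is mainly useful as a sanity assertion after generation, for example by comparing `totalVertices()` with the vertex count reported by the database.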
| 53 | + |
| 54 | +## Schema |
| 55 | + |
| 56 | +### Vertex Types (8) |
| 57 | + |
| 58 | +| Type | Properties | Index | |
| 59 | +|------|-----------|-------| |
| 60 | +| Person | id (long), firstName, lastName, gender, birthday, creationDate, locationIP, browserUsed | unique on `id` | |
| 61 | +| Post | id (long), imageFile, creationDate, locationIP, browserUsed, language, content, length | unique on `id` | |
| 62 | +| Comment | id (long), creationDate, locationIP, browserUsed, content, length | unique on `id` | |
| 63 | +| Forum | id (long), title, creationDate | unique on `id` | |
| 64 | +| Tag | id (long), name, url | unique on `id` | |
| 65 | +| TagClass | id (long), name, url | unique on `id` | |
| 66 | +| Place | id (long), name, url, type (City/Country/Continent) | unique on `id`, index on `type` | |
| 67 | +| Organisation | id (long), name, url, type (University/Company) | unique on `id`, index on `type` | |
| 68 | + |
| 69 | +Every vertex type is created with `PARALLEL` buckets so that records can be inserted in parallel. |
| 70 | + |
| 71 | +### Edge Types (14) |
| 72 | + |
| 73 | +| Edge Type | From | To | Properties | |
| 74 | +|-----------|------|-----|------------| |
| 75 | +| KNOWS | Person | Person | creationDate | |
| 76 | +| HAS_CREATOR | Post, Comment | Person | -- | |
| 77 | +| REPLY_OF | Comment | Post or Comment | -- | |
| 78 | +| HAS_TAG | Post, Comment, Forum | Tag | -- | |
| 79 | +| LIKES | Person | Post or Comment | creationDate | |
| 80 | +| CONTAINER_OF | Forum | Post | -- | |
| 81 | +| HAS_MEMBER | Forum | Person | joinDate | |
| 82 | +| HAS_MODERATOR | Forum | Person | -- | |
| 83 | +| WORKS_AT | Person | Organisation | workFrom (int) | |
| 84 | +| STUDY_AT | Person | Organisation | classYear (int) | |
| 85 | +| IS_LOCATED_IN | Person, Post, Comment, Organisation | Place | -- | |
| 86 | +| HAS_INTEREST | Person | Tag | -- | |
| 87 | +| IS_PART_OF | Place | Place | -- | |
| 88 | +| IS_SUBCLASS_OF | TagClass | TagClass | -- | |
| 89 | + |
| 90 | +KNOWS is bidirectional. All others are unidirectional. |
| 91 | + |
| 92 | +## Data Generation |
| 93 | + |
| 94 | +Generation order (respects dependencies): |
| 95 | + |
| 96 | +1. **TagClass** -- hierarchy with ~5 root classes, rest as children (IS_SUBCLASS_OF) |
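| 97 | +2. **Tag** -- each assigned to a TagClass (labelled IS_PART_OF here, although the edge table defines IS_PART_OF only as Place -> Place; LDBC SNB names this edge HAS_TYPE) |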
| 98 | +3. **Place** -- 6 continents, ~50 countries, rest cities. IS_PART_OF links cities->countries->continents |
| 99 | +4. **Organisation** -- each IS_LOCATED_IN a random Place (city for University, country for Company, as in LDBC SNB) |
| 100 | +5. **Person** -- each IS_LOCATED_IN a random city. Random WORKS_AT/STUDY_AT. Random HAS_INTEREST tags |
| 101 | +6. **KNOWS** -- power-law distribution via ThreadLocalRandom. Bidirectional. Avg ~40 per Person |
| 102 | +7. **Forum** -- each HAS_MODERATOR a random Person. Random HAS_MEMBER edges |
| 103 | +8. **Post** -- each in a Forum (CONTAINER_OF), HAS_CREATOR to a Forum member. Random HAS_TAG. IS_LOCATED_IN from creator's location |
| 104 | +9. **Comment** -- each REPLY_OF a Post or earlier Comment. HAS_CREATOR, HAS_TAG, IS_LOCATED_IN |
| 105 | +10. **LIKES** -- random Persons liking random Posts/Comments |
| 106 | + |
| 107 | +Generation runs through `database.async()` with a parallel level of `PARALLEL` and a commit every `COMMIT_EVERY` operations. The WAL is disabled during generation, and each worker thread draws from its own `ThreadLocalRandom.current()`. |
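The design does not say how the power-law KNOWS degrees in step 6 are drawn. One possible approach is inverse-transform sampling from a capped Pareto distribution, sketched below. The shape, minimum, and cap (`ALPHA`, `MIN_DEGREE`, `MAX_DEGREE`) and the class name are assumptions chosen so that the uncapped mean is 40; flooring and capping pull the observed mean slightly below that.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Sketch: draw per-Person KNOWS degrees from a capped Pareto (power-law) distribution. */
final class PowerLawDegreeSketch {

  // Hypothetical parameters, not from the design. A Pareto with shape ALPHA and
  // scale MIN_DEGREE has mean ALPHA * MIN_DEGREE / (ALPHA - 1) = 2.5 * 24 / 1.5 = 40.
  static final double ALPHA = 2.5;
  static final int MIN_DEGREE = 24;
  static final int MAX_DEGREE = 1_000; // cap on the heavy tail

  /** Inverse-transform sampling: for U uniform on (0, 1], MIN_DEGREE * U^(-1/ALPHA) is Pareto. */
  static int sampleDegree() {
    // nextDouble() is in [0, 1), so 1 - nextDouble() is in (0, 1] and never zero.
    final double u = 1.0 - ThreadLocalRandom.current().nextDouble();
    final double x = MIN_DEGREE * Math.pow(u, -1.0 / ALPHA);
    return (int) Math.min(MAX_DEGREE, Math.floor(x));
  }

  public static void main(String[] args) {
    final int samples = 200_000;
    long sum = 0;
    int max = 0;
    for (int i = 0; i < samples; i++) {
      final int degree = sampleDegree();
      sum += degree;
      max = Math.max(max, degree);
    }
    System.out.printf("mean degree ~ %.1f, max degree = %d%n", (double) sum / samples, max);
  }
}
```

Because `ThreadLocalRandom` cannot be seeded, two generation runs will not produce identical graphs; that is consistent with the design's choice to generate once and reuse the database.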
| 108 | + |
| 109 | +## Class Structure |
| 110 | + |
| 111 | +```java |
| 112 | +@TestInstance(TestInstance.Lifecycle.PER_CLASS) |
| 113 | +@TestMethodOrder(MethodOrderer.OrderAnnotation.class) |
| 114 | +@Tag("benchmark") |
| 115 | +class GraphBenchmark { |
| 116 | + |
| 117 | + private Database database; |
| 118 | + private MeterRegistry registry; |
| 119 | + private boolean freshlyCreated; |
| 120 | + |
| 121 | + @BeforeAll |
| 122 | + void setup() { |
| 123 | + registry = new SimpleMeterRegistry(); |
| 124 | + final DatabaseFactory factory = new DatabaseFactory(DB_PATH); |
| 125 | + if (factory.exists()) { |
| 126 | + database = factory.open(); |
| 127 | + freshlyCreated = false; |
| 128 | + } else { |
| 129 | + database = factory.create(); |
| 130 | + freshlyCreated = true; |
| 131 | + final Timer.Sample sample = Timer.start(registry); |
| 132 | + createSchema(); |
| 133 | + populateGraph(); |
| 134 | + sample.stop(registry.timer("benchmark.creation")); |
| 135 | + } |
| 136 | + } |
| 137 | + |
| 138 | + @AfterAll |
| 139 | + void teardownAndReport() { |
| 140 | + printReport(); |
| 141 | + if (database != null && database.isOpen()) |
| 142 | + database.close(); // close, NOT drop -- preserve for reuse |
| 143 | + } |
| 144 | + |
| 145 | + private void createSchema() { ... } |
| 146 | + private void populateGraph() { ... } |
| 147 | + private void printReport() { ... } |
| 148 | + private void benchmark(String phase, String name, int iterations, |
| 149 | + String sql, String cypher) { ... } |
| 150 | + |
| 151 | + @Test @Order(1) void phase2_lookups() { ... } |
| 152 | + @Test @Order(2) void phase3_simpleTraversals() { ... } |
| 153 | + @Test @Order(3) void phase4_complexTraversals() { ... } |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +Teardown calls `close()` rather than `drop()`, so the database files survive and later runs can reuse them. |
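The body of `benchmark(...)` is elided in the skeleton above. As a rough illustration of the timing loop such a helper might contain, here is a JDK-only sketch. The `IntSupplier` stands in for one query execution that returns its row count, and the warm-up phase is an assumption that the design does not mention. The real helper would presumably record into a Micrometer `Timer` rather than an array.

```java
import java.util.function.IntSupplier;

/** JDK-only sketch of a per-query timing loop; not the actual benchmark(...) implementation. */
final class TimingLoopSketch {

  /** Per-iteration latencies in nanoseconds plus the total number of rows returned. */
  static final class Result {
    final long[] nanos;
    final long totalRows;

    Result(long[] nanos, long totalRows) {
      this.nanos = nanos;
      this.totalRows = totalRows;
    }
  }

  /**
   * Runs the query `warmup` times without recording, then `iterations` times with timing.
   * The query is hypothetical: any code that executes a statement and returns a row count.
   */
  static Result measure(int warmup, int iterations, IntSupplier query) {
    for (int i = 0; i < warmup; i++)
      query.getAsInt(); // executed but not recorded

    final long[] nanos = new long[iterations];
    long rows = 0;
    for (int i = 0; i < iterations; i++) {
      final long start = System.nanoTime();
      rows += query.getAsInt();
      nanos[i] = System.nanoTime() - start;
    }
    return new Result(nanos, rows);
  }

  public static void main(String[] args) {
    // Stand-in query that always "returns" 3 rows.
    final Result result = measure(10, 100, () -> 3);
    System.out.println("iterations=" + result.nanos.length + " rows=" + result.totalRows);
  }
}
```

Summing row counts alongside the timings mirrors the `.results` counter described under Micrometer Metrics: if SQL and Cypher report different totals for the same query, the two statements are not equivalent.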
| 158 | + |
| 159 | +## Benchmark Phases |
| 160 | + |
| 161 | +### Phase 1: Creation (measured in @BeforeAll) |
| 162 | + |
| 163 | +The total generation time is recorded in the `benchmark.creation` timer. If an existing database is reused, generation is skipped and only the vertex and edge counts are printed. |
| 164 | + |
| 165 | +### Phase 2: Simple Lookups (@Order(1)) |
| 166 | + |
| 167 | +| ID | Query | Iterations | |
| 168 | +|----|-------|-----------| |
| 169 | +| 2a | Person by `id` (indexed) | 1000 | |
| 170 | +| 2b | Post by `id` (indexed) | 1000 | |
| 171 | +| 2c | Person by `firstName` (non-indexed scan) | 100 | |
| 172 | +| 2d | Count vertices per type | 10 | |
| 173 | + |
| 174 | +### Phase 3: Simple Traversals (@Order(2)) |
| 175 | + |
| 176 | +| ID | Query | Iterations | |
| 177 | +|----|-------|-----------| |
| 178 | +| 3a | Direct friends of a Person (1-hop KNOWS) | 500 | |
| 179 | +| 3b | Posts created by a Person (HAS_CREATOR reverse) | 500 | |
| 180 | +| 3c | Tags of a given Post (HAS_TAG) | 500 | |
| 181 | +| 3d | Members of a Forum (HAS_MEMBER) | 500 | |
| 182 | + |
| 183 | +### Phase 4: Complex Traversals (@Order(3)) |
| 184 | + |
| 185 | +| ID | Query | Iterations | |
| 186 | +|----|-------|-----------| |
| 187 | +| 4a | Friends-of-friends (2-hop KNOWS, exclude direct) | 200 | |
| 188 | +| 4b | Posts by friends in a city (KNOWS + HAS_CREATOR + IS_LOCATED_IN) | 200 | |
| 189 | +| 4c | Common tags between two Persons' posts | 200 | |
| 190 | +| 4d | Shortest path between two Persons via KNOWS (Cypher only) | 100 | |
| 191 | +| 4e | Forum recommendation (forums with most of Person's friends) | 200 | |
| 192 | + |
| 193 | +Queries 4a–4e are Cypher-only (complex pattern matching, aggregation, `shortestPath`). Phases 2–3 run both SQL and Cypher side by side. |
| 194 | + |
| 195 | +## Representative Queries |
| 196 | + |
| 197 | +### Phase 2 -- Lookups |
| 198 | + |
| 199 | +```sql |
| 200 | +-- 2a: Person by ID (indexed) |
| 201 | +SQL: SELECT FROM Person WHERE id = ? |
| 202 | +Cypher: MATCH (p:Person {id: $id}) RETURN p |
| 203 | + |
| 204 | +-- 2c: Person by firstName (non-indexed scan) |
| 205 | +SQL: SELECT FROM Person WHERE firstName = ? |
| 206 | +Cypher: MATCH (p:Person) WHERE p.firstName = $name RETURN p |
| 207 | +``` |
| 208 | + |
| 209 | +### Phase 3 -- Simple Traversals |
| 210 | + |
| 211 | +```sql |
| 212 | +-- 3a: Direct friends |
| 213 | +SQL: SELECT expand(both('KNOWS')) FROM Person WHERE id = ? |
| 214 | +Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend) RETURN friend |
| 215 | + |
| 216 | +-- 3b: Posts by a Person (in('HAS_CREATOR') also returns Comments; the Cypher pattern restricts to Post) |
| 217 | +SQL: SELECT expand(in('HAS_CREATOR')) FROM Person WHERE id = ? |
| 218 | +Cypher: MATCH (p:Person {id: $id})<-[:HAS_CREATOR]-(post:Post) RETURN post |
| 219 | +``` |
| 220 | + |
| 221 | +### Phase 4 -- Complex Traversals |
| 222 | + |
| 223 | +```sql |
| 224 | +-- 4a: Friends-of-friends (Cypher only -- SQL expand() cannot exclude direct friends) |
| 225 | +Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-()-[:KNOWS]-(fof) |
| 226 | + WHERE fof <> p AND NOT (p)-[:KNOWS]-(fof) |
| 227 | + RETURN DISTINCT fof |
| 228 | + |
| 229 | +-- 4b: Posts by friends in a city |
| 230 | +Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend)-[:IS_LOCATED_IN]->(c:Place {name: $city}), |
| 231 | + (friend)<-[:HAS_CREATOR]-(post:Post) |
| 232 | + RETURN post, friend.firstName |
| 233 | + |
| 234 | +-- 4c: Common tags between two Persons' posts |
| 235 | +Cypher: MATCH (a:Person {id: $id1})<-[:HAS_CREATOR]-(p1)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(p2)-[:HAS_CREATOR]->(b:Person {id: $id2}) |
| 236 | + RETURN t.name, count(*) AS freq ORDER BY freq DESC |
| 237 | + |
| 238 | +-- 4d: Shortest path via KNOWS |
| 239 | +Cypher: MATCH path = shortestPath((a:Person {id: $id1})-[:KNOWS*]-(b:Person {id: $id2})) |
| 240 | + RETURN path, length(path) |
| 241 | + |
| 242 | +-- 4e: Forum recommendation |
| 243 | +Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend), |
| 244 | + (forum:Forum)-[:HAS_MEMBER]->(friend) |
| 245 | + RETURN forum.title, count(friend) AS friendCount |
| 246 | + ORDER BY friendCount DESC LIMIT 10 |
| 247 | +``` |
| 248 | + |
| 249 | +## Micrometer Metrics |
| 250 | + |
| 251 | +Uses `SimpleMeterRegistry` (in-process, no external infrastructure). |
| 252 | + |
| 253 | +Metrics per query: |
| 254 | +- `Timer`: `benchmark.query.<phase>.<queryName>.<language>` -- publishes p50, p95, and p99 percentiles (computed client-side; `SimpleMeterRegistry` has no backend to aggregate histogram buckets) |
| 255 | +- `Counter`: `benchmark.query.<phase>.<queryName>.<language>.results` -- total result rows (sanity check) |
| 256 | +- `Timer`: `benchmark.creation` -- total graph generation time |
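To make the p50/p95/p99 columns concrete, the sketch below computes nearest-rank percentiles from raw latency samples using only the JDK. It is a stand-in for illustration, not Micrometer's algorithm: with Micrometer, the percentiles would typically come from a timer built with `Timer.builder(name).publishPercentiles(0.5, 0.95, 0.99)`. The class and method names here are hypothetical.

```java
import java.util.Arrays;

/** Sketch: nearest-rank percentiles over recorded latencies (JDK-only stand-in). */
final class PercentileSketch {

  /**
   * Returns the smallest sample such that at least p percent of all samples
   * are less than or equal to it (nearest-rank definition), for 0 < p <= 100.
   */
  static long percentile(long[] samples, double p) {
    if (samples.length == 0)
      throw new IllegalArgumentException("no samples");
    if (p <= 0 || p > 100)
      throw new IllegalArgumentException("p must be in (0, 100]");

    final long[] sorted = samples.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    final int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
    return sorted[rank - 1];
  }

  /** Converts nanoseconds to fractional milliseconds for reporting. */
  static double toMillis(long nanos) {
    return nanos / 1_000_000.0;
  }

  public static void main(String[] args) {
    final long[] nanos = { 500_000, 100_000, 400_000, 200_000, 300_000 };
    System.out.println("p50 ms = " + toMillis(percentile(nanos, 50))); // 0.3
    System.out.println("p99 ms = " + toMillis(percentile(nanos, 99))); // 0.5
  }
}
```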
| 257 | + |
| 258 | +## Dependencies |
| 259 | + |
| 260 | +Adding Micrometer requires updating `ATTRIBUTIONS.md` (Apache 2.0 license, compatible). |
| 261 | + |
| 262 | +## Output Format |
| 263 | + |
| 264 | +Printed to stdout in `@AfterAll`: |
| 265 | + |
| 266 | +``` |
| 267 | +=== ArcadeDB Graph Benchmark (LDBC-inspired) === |
| 268 | +Database: target/databases/graph-benchmark |
| 269 | +Vertices: 791,600 | Edges: 3,240,000 |
| 270 | + |
| 271 | +Phase | Query | Lang | Ops | Avg ms | p50 ms | p95 ms | p99 ms |
| 272 | +------|------------------------|---------|-----|---------|---------|---------|-------- |
| 273 | +2a | Person by ID | SQL | 1000| 0.12 | 0.10 | 0.25 | 0.41 |
| 274 | +2a | Person by ID | Cypher | 1000| 0.18 | 0.15 | 0.32 | 0.55 |
| 275 | +... |
| 276 | +4d | Shortest path (KNOWS) | Cypher | 100 | 12.40 | 10.20 | 28.50 | 45.00 |
| 277 | +``` |
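One way to produce fixed-width rows like those above is `String.format` with left-aligned text columns and two-decimal numeric columns. The sketch below is an assumption about how `printReport()` could format a row. Its column widths are illustrative and do not reproduce the sample output exactly.

```java
import java.util.Locale;

/** Sketch: format one report row with fixed-width columns. */
final class ReportRowSketch {

  // Hypothetical layout: id, query name, language, ops, then four latency columns in ms.
  private static final String ROW_FORMAT =
      "%-5s | %-22s | %-7s | %4d | %7.2f | %7.2f | %7.2f | %7.2f";

  static String row(String id, String query, String lang, int ops,
                    double avgMs, double p50Ms, double p95Ms, double p99Ms) {
    // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
    return String.format(Locale.ROOT, ROW_FORMAT, id, query, lang, ops, avgMs, p50Ms, p95Ms, p99Ms);
  }

  public static void main(String[] args) {
    System.out.println(row("2a", "Person by ID", "SQL", 1000, 0.12, 0.10, 0.25, 0.41));
    System.out.println(row("2a", "Person by ID", "Cypher", 1000, 0.18, 0.15, 0.32, 0.55));
  }
}
```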