Commit b2dcfda

#3411 Add ldbc like bechmark (#3410)
1 parent 5ee9ad4 commit b2dcfda

4 files changed, +2844 -0 lines changed

Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
# GraphBenchmark Design: LDBC-Inspired Benchmark for ArcadeDB

## Summary

`GraphBenchmark` is a JUnit 5 benchmark class at `engine/src/test/java/performance/GraphBenchmark.java` that generates an LDBC Social Network Benchmark-inspired graph and measures ArcadeDB across graph creation, lookups, and traversals. Lookup and simple-traversal queries run in both SQL and OpenCypher side by side; the complex traversals are Cypher-only. The database is preserved between runs, so only the first execution pays the generation cost.
## Decisions

| Decision | Choice |
|----------|--------|
| Location | `engine/src/test/java/performance/` |
| Query languages | Both SQL and Cypher, side by side |
| Schema fidelity | Full LDBC SNB (8 vertex types, 14 edge types) |
| Execution model | JUnit 5 with `@Tag("benchmark")` |
| Default scale | Medium (~30K Persons, ~150K Posts, ~600K Comments, ~3M edges) |
| Metrics | Micrometer `SimpleMeterRegistry` (test-scoped dependency) |

## File Structure

Single file: `engine/src/test/java/performance/GraphBenchmark.java`

New test-scoped dependency in `engine/pom.xml`:

```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>${micrometer.version}</version>
    <scope>test</scope>
</dependency>
```
## Scale Constants

```java
private static final int NUM_PERSONS = 30_000;
private static final int NUM_POSTS = 150_000;
private static final int NUM_COMMENTS = 600_000;
private static final int NUM_FORUMS = 5_000;
private static final int NUM_TAGS = 2_000;
private static final int NUM_TAG_CLASSES = 100;
private static final int NUM_PLACES = 1_500;
private static final int NUM_ORGANISATIONS = 3_000;

private static final int AVG_KNOWS_PER_PERSON = 40;
private static final int AVG_LIKES_PER_PERSON = 30;
private static final int AVG_TAGS_PER_POST = 3;
private static final int AVG_INTERESTS_PER_PERSON = 5;

private static final int PARALLEL = 4;
private static final int COMMIT_EVERY = 5_000;
private static final String DB_PATH = "target/databases/graph-benchmark";
```
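As a quick sanity check on these constants, the expected vertex total is simply the sum of the eight per-type counts. The sketch below copies the counts into a standalone class and derives the total plus two ratios implied by the scale. `ScaleCheck` and its method names are hypothetical and not part of the benchmark class.

```java
import java.util.Locale;

// Hypothetical standalone check; the counts are copied from the scale constants.
final class ScaleCheck {
  static final int NUM_PERSONS = 30_000;
  static final int NUM_POSTS = 150_000;
  static final int NUM_COMMENTS = 600_000;
  static final int NUM_FORUMS = 5_000;
  static final int NUM_TAGS = 2_000;
  static final int NUM_TAG_CLASSES = 100;
  static final int NUM_PLACES = 1_500;
  static final int NUM_ORGANISATIONS = 3_000;

  // Sum of all eight vertex-type counts.
  static int totalVertices() {
    return NUM_PERSONS + NUM_POSTS + NUM_COMMENTS + NUM_FORUMS
        + NUM_TAGS + NUM_TAG_CLASSES + NUM_PLACES + NUM_ORGANISATIONS;
  }

  // Average number of Posts per Person implied by the constants.
  static double postsPerPerson() {
    return (double) NUM_POSTS / NUM_PERSONS;
  }

  // Average number of Comments per Post implied by the constants.
  static double commentsPerPost() {
    return (double) NUM_COMMENTS / NUM_POSTS;
  }

  public static void main(String[] args) {
    System.out.println(String.format(Locale.US, "Vertices: %,d", totalVertices()));
    System.out.println(String.format(Locale.US, "Posts per Person: %.1f", postsPerPerson()));
    System.out.println(String.format(Locale.US, "Comments per Post: %.1f", commentsPerPost()));
  }
}
```

With the values above, the total comes to 791,600 vertices, with 5 Posts per Person and 4 Comments per Post on average.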
## Schema

### Vertex Types (8)

| Type | Properties | Index |
|------|-----------|-------|
| Person | id (long), firstName, lastName, gender, birthday, creationDate, locationIP, browserUsed | unique on `id` |
| Post | id (long), imageFile, creationDate, locationIP, browserUsed, language, content, length | unique on `id` |
| Comment | id (long), creationDate, locationIP, browserUsed, content, length | unique on `id` |
| Forum | id (long), title, creationDate | unique on `id` |
| Tag | id (long), name, url | unique on `id` |
| TagClass | id (long), name, url | unique on `id` |
| Place | id (long), name, url, type (City/Country/Continent) | unique on `id`, index on `type` |
| Organisation | id (long), name, url, type (University/Company) | unique on `id`, index on `type` |

All vertex types are created with `PARALLEL` buckets to allow parallel insertion.

### Edge Types (14)

| Edge Type | From | To | Properties |
|-----------|------|-----|------------|
| KNOWS | Person | Person | creationDate |
| HAS_CREATOR | Post, Comment | Person | -- |
| REPLY_OF | Comment | Post or Comment | -- |
| HAS_TAG | Post, Comment, Forum | Tag | -- |
| LIKES | Person | Post or Comment | creationDate |
| CONTAINER_OF | Forum | Post | -- |
| HAS_MEMBER | Forum | Person | joinDate |
| HAS_MODERATOR | Forum | Person | -- |
| WORKS_AT | Person | Organisation | workFrom (int) |
| STUDY_AT | Person | Organisation | classYear (int) |
| IS_LOCATED_IN | Person, Post, Comment, Organisation | Place | -- |
| HAS_INTEREST | Person | Tag | -- |
| IS_PART_OF | Place | Place | -- |
| IS_SUBCLASS_OF | TagClass | TagClass | -- |

KNOWS is bidirectional. All others are unidirectional.
## Data Generation

Generation order (respects dependencies):

1. **TagClass** -- hierarchy with ~5 root classes, rest as children (IS_SUBCLASS_OF)
2. **Tag** -- each assigned to a TagClass (IS_PART_OF)
3. **Place** -- 6 continents, ~50 countries, rest cities. IS_PART_OF links cities->countries->continents
4. **Organisation** -- each IS_LOCATED_IN a random Place (country for University, city for Company)
5. **Person** -- each IS_LOCATED_IN a random city. Random WORKS_AT/STUDY_AT. Random HAS_INTEREST tags
6. **KNOWS** -- power-law distribution via ThreadLocalRandom. Bidirectional. Avg ~40 per Person
7. **Forum** -- each HAS_MODERATOR a random Person. Random HAS_MEMBER edges
8. **Post** -- each in a Forum (CONTAINER_OF), HAS_CREATOR to a Forum member. Random HAS_TAG. IS_LOCATED_IN from creator's location
9. **Comment** -- each REPLY_OF a Post or earlier Comment. HAS_CREATOR, HAS_TAG, IS_LOCATED_IN
10. **LIKES** -- random Persons liking random Posts/Comments

Generation uses `database.async()` with a parallelism level of `PARALLEL` and a commit batch size of `COMMIT_EVERY`. The WAL is disabled during generation, and each thread draws its random values from `ThreadLocalRandom.current()`.
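The design does not say how the power-law KNOWS degrees are drawn, so the following is only one possible sketch: inverse-transform sampling of a bounded Pareto distribution, fed by a `java.util.Random` (which `ThreadLocalRandom` extends). The class name `PowerLawDegree`, the bounds, and the exponent are all assumptions for illustration, and they are not tuned to reach `AVG_KNOWS_PER_PERSON`.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; not taken from the benchmark source.
final class PowerLawDegree {

  /**
   * Draws an integer in [min, max] whose density falls off roughly as
   * x^(-exponent), by inverting the CDF of a bounded Pareto distribution.
   */
  static int sampleDegree(Random rnd, int min, int max, double exponent) {
    if (min < 1 || max <= min || exponent <= 1.0)
      throw new IllegalArgumentException("need 1 <= min < max and exponent > 1");
    final double k = 1.0 - exponent;        // negative for exponent > 1
    final double lo = Math.pow(min, k);
    final double hi = Math.pow(max, k);
    final double u = rnd.nextDouble();      // uniform in [0, 1)
    final double x = Math.pow(lo + u * (hi - lo), 1.0 / k);
    // Round down to an integer degree and clamp against floating-point drift.
    return (int) Math.max(min, Math.min(max, Math.floor(x)));
  }

  public static void main(String[] args) {
    final Random rnd = ThreadLocalRandom.current();
    for (int i = 0; i < 5; i++)
      System.out.println(sampleDegree(rnd, 1, 1_000, 2.0));
  }
}
```

Most draws come out small while a few are very large, which is the skew a social graph generator usually wants. Hitting a target average would require choosing the exponent and bounds accordingly.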
## Class Structure

```java
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@TestMethodOrder(MethodOrderer.OrderAnnotation.class)
@Tag("benchmark")
class GraphBenchmark {

  private Database database;
  private MeterRegistry registry;
  private boolean freshlyCreated;

  @BeforeAll
  void setup() {
    registry = new SimpleMeterRegistry();
    final DatabaseFactory factory = new DatabaseFactory(DB_PATH);
    if (factory.exists()) {
      database = factory.open();
      freshlyCreated = false;
    } else {
      database = factory.create();
      freshlyCreated = true;
      final Timer.Sample sample = Timer.start(registry);
      createSchema();
      populateGraph();
      sample.stop(registry.timer("benchmark.creation"));
    }
  }

  @AfterAll
  void teardownAndReport() {
    printReport();
    if (database != null && database.isOpen())
      database.close(); // close, NOT drop -- preserve for reuse
  }

  private void createSchema() { ... }
  private void populateGraph() { ... }
  private void printReport() { ... }
  private void benchmark(String phase, String name, int iterations,
                         String sql, String cypher) { ... }

  @Test @Order(1) void phase2_lookups() { ... }
  @Test @Order(2) void phase3_simpleTraversals() { ... }
  @Test @Order(3) void phase4_complexTraversals() { ... }
}
```

Key point: teardown calls `close()`, not `drop()`, so the database can be reused across runs.
## Benchmark Phases

### Phase 1: Creation (measured in @BeforeAll)

The total generation time is recorded. If an existing database is reused, only the vertex and edge counts are printed.

### Phase 2: Simple Lookups (@Order(1))

| ID | Query | Iterations |
|----|-------|-----------|
| 2a | Person by `id` (indexed) | 1000 |
| 2b | Post by `id` (indexed) | 1000 |
| 2c | Person by `firstName` (non-indexed scan) | 100 |
| 2d | Count vertices per type | 10 |

### Phase 3: Simple Traversals (@Order(2))

| ID | Query | Iterations |
|----|-------|-----------|
| 3a | Direct friends of a Person (1-hop KNOWS) | 500 |
| 3b | Posts created by a Person (HAS_CREATOR reverse) | 500 |
| 3c | Tags of a given Post (HAS_TAG) | 500 |
| 3d | Members of a Forum (HAS_MEMBER) | 500 |

### Phase 4: Complex Traversals (@Order(3))

| ID | Query | Iterations |
|----|-------|-----------|
| 4a | Friends-of-friends (2-hop KNOWS, exclude direct) | 200 |
| 4b | Posts by friends in a city (KNOWS + HAS_CREATOR + IS_LOCATED_IN) | 200 |
| 4c | Common tags between two Persons' posts | 200 |
| 4d | Shortest path between two Persons via KNOWS (Cypher only) | 100 |
| 4e | Forum recommendation (forums with most of Person's friends) | 200 |

Queries 4a–4e are Cypher-only (complex pattern matching, aggregation, `shortestPath`). Phases 2–3 run both SQL and Cypher side by side.
## Representative Queries

### Phase 2 -- Lookups

```sql
-- 2a: Person by ID (indexed)
SQL: SELECT FROM Person WHERE id = ?
Cypher: MATCH (p:Person {id: $id}) RETURN p

-- 2c: Person by firstName (non-indexed scan)
SQL: SELECT FROM Person WHERE firstName = ?
Cypher: MATCH (p:Person) WHERE p.firstName = $name RETURN p
```

### Phase 3 -- Simple Traversals

```sql
-- 3a: Direct friends
SQL: SELECT expand(both('KNOWS')) FROM Person WHERE id = ?
Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend) RETURN friend

-- 3b: Posts by a Person
SQL: SELECT expand(in('HAS_CREATOR')) FROM Person WHERE id = ?
Cypher: MATCH (p:Person {id: $id})<-[:HAS_CREATOR]-(post:Post) RETURN post
```

### Phase 4 -- Complex Traversals

```sql
-- 4a: Friends-of-friends (Cypher only -- SQL expand() cannot exclude direct friends)
Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-()-[:KNOWS]-(fof)
        WHERE fof <> p AND NOT (p)-[:KNOWS]-(fof)
        RETURN DISTINCT fof

-- 4b: Posts by friends in a city
Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend)-[:IS_LOCATED_IN]->(c:Place {name: $city}),
              (friend)<-[:HAS_CREATOR]-(post:Post)
        RETURN post, friend.firstName

-- 4c: Common tags between two Persons' posts
Cypher: MATCH (a:Person {id: $id1})<-[:HAS_CREATOR]-(p1)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(p2)-[:HAS_CREATOR]->(b:Person {id: $id2})
        RETURN t.name, count(*) AS freq ORDER BY freq DESC

-- 4d: Shortest path via KNOWS
Cypher: MATCH path = shortestPath((a:Person {id: $id1})-[:KNOWS*]-(b:Person {id: $id2}))
        RETURN path, length(path)

-- 4e: Forum recommendation
Cypher: MATCH (p:Person {id: $id})-[:KNOWS]-(friend),
              (forum:Forum)-[:HAS_MEMBER]->(friend)
        RETURN forum.title, count(friend) AS friendCount
        ORDER BY friendCount DESC LIMIT 10
```
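To make the intent of query 4a concrete, here is a small in-memory sketch of the same semantics: walk two KNOWS hops, then drop the starting person and anyone who is already a direct friend. It uses a plain adjacency map rather than ArcadeDB, and `FriendsOfFriends`, `addKnows`, and `friendsOfFriends` are hypothetical names chosen for illustration. A reference like this could serve as a sanity check for the result counts the benchmark records.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory reference for query 4a; not the benchmark's code.
final class FriendsOfFriends {

  // KNOWS is bidirectional, so store each relationship in both directions.
  static void addKnows(Map<Long, Set<Long>> graph, long a, long b) {
    graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
  }

  // Persons exactly two KNOWS hops away, excluding the start and direct friends.
  static Set<Long> friendsOfFriends(Map<Long, Set<Long>> graph, long personId) {
    final Set<Long> direct = graph.getOrDefault(personId, Set.of());
    final Set<Long> result = new HashSet<>();
    for (long friend : direct)
      for (long candidate : graph.getOrDefault(friend, Set.of()))
        if (candidate != personId && !direct.contains(candidate))
          result.add(candidate);
    return result;
  }

  public static void main(String[] args) {
    final Map<Long, Set<Long>> graph = new HashMap<>();
    addKnows(graph, 1, 2);
    addKnows(graph, 2, 3);
    addKnows(graph, 2, 4);
    addKnows(graph, 1, 4);
    addKnows(graph, 3, 5);
    System.out.println(friendsOfFriends(graph, 1)); // prints [3]
  }
}
```

In the example, person 4 is reachable in two hops through person 2 but is excluded because 1 and 4 already know each other; the `Set` takes the place of `DISTINCT`.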
## Micrometer Metrics

Uses `SimpleMeterRegistry` (in-process, no external infrastructure).

Metrics per query:
- `Timer`: `benchmark.query.<phase>.<queryName>.<language>` -- with p50, p95, p99 percentile histograms
- `Counter`: `benchmark.query.<phase>.<queryName>.<language>.results` -- total result rows (sanity check)
- `Timer`: `benchmark.creation` -- total graph generation time
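For readers unfamiliar with what the p50/p95/p99 columns mean, the sketch below is a JDK-only stand-in: it times a `Runnable` with `System.nanoTime()`, sorts the samples, and reads percentiles with the nearest-rank method. It deliberately does not use Micrometer, whose timers compute percentiles through their own histogram machinery, so results would not match exactly. `LatencyStats` and the example name segments (`personById`, `sql`) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical JDK-only illustration; the benchmark itself uses Micrometer timers.
final class LatencyStats {

  // Builds a name following the benchmark.query.<phase>.<queryName>.<language> scheme.
  static String metricName(String phase, String queryName, String language) {
    return "benchmark.query." + phase + "." + queryName + "." + language;
  }

  // Runs the query the given number of times and returns sorted durations in nanoseconds.
  static long[] time(Runnable query, int iterations) {
    final long[] nanos = new long[iterations];
    for (int i = 0; i < iterations; i++) {
      final long start = System.nanoTime();
      query.run();
      nanos[i] = System.nanoTime() - start;
    }
    Arrays.sort(nanos);
    return nanos;
  }

  // Nearest-rank percentile: the value at rank ceil(percent / 100 * n), 1-based.
  static long percentile(long[] sorted, int percent) {
    if (sorted.length == 0 || percent < 1 || percent > 100)
      throw new IllegalArgumentException("need samples and 1 <= percent <= 100");
    final long rank = ((long) percent * sorted.length + 99) / 100; // integer ceiling
    return sorted[(int) rank - 1];
  }

  public static void main(String[] args) {
    final long[] samples = time(() -> Math.sqrt(42.0), 1_000);
    System.out.println(metricName("2a", "personById", "sql"));
    System.out.println("p50 ns: " + percentile(samples, 50));
    System.out.println("p95 ns: " + percentile(samples, 95));
    System.out.println("p99 ns: " + percentile(samples, 99));
  }
}
```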
## Dependencies

Adding Micrometer requires updating `ATTRIBUTIONS.md` (Apache 2.0 license, compatible).
261+
262+
## Output Format
263+
264+
Printed to stdout in `@AfterAll`:
265+
266+
```
267+
=== ArcadeDB Graph Benchmark (LDBC-inspired) ===
268+
Database: target/databases/graph-benchmark
269+
Vertices: 789,500 | Edges: 3,240,000
270+
271+
Phase | Query | Lang | Ops | Avg ms | p50 ms | p95 ms | p99 ms
272+
------|------------------------|---------|-----|---------|---------|---------|--------
273+
2a | Person by ID | SQL | 1000| 0.12 | 0.10 | 0.25 | 0.41
274+
2a | Person by ID | Cypher | 1000| 0.18 | 0.15 | 0.32 | 0.55
275+
...
276+
4d | Shortest path (KNOWS) | Cypher | 100 | 12.40 | 10.20 | 28.50 | 45.00
277+
```
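One way to produce rows in this layout is a single fixed-width format string. The sketch below is an assumption about how `printReport()` might format a row, not its actual implementation; `ReportRow` and its parameter names are hypothetical, and the column widths are read off the sample table above.

```java
import java.util.Locale;

// Hypothetical row formatter matching the column widths of the sample report.
final class ReportRow {

  // Left-aligned text columns, right-aligned millisecond columns with two decimals.
  private static final String ROW_FORMAT =
      "%-5s | %-22s | %-7s | %-4d| %7.2f | %7.2f | %7.2f | %7.2f";

  static String format(String id, String query, String lang, int ops,
                       double avgMs, double p50Ms, double p95Ms, double p99Ms) {
    // Locale.US keeps the decimal separator a dot regardless of the JVM default locale.
    return String.format(Locale.US, ROW_FORMAT,
        id, query, lang, ops, avgMs, p50Ms, p95Ms, p99Ms);
  }

  public static void main(String[] args) {
    System.out.println(format("2a", "Person by ID", "SQL", 1000, 0.12, 0.10, 0.25, 0.41));
    System.out.println(format("4d", "Shortest path (KNOWS)", "Cypher", 100, 12.40, 10.20, 28.50, 45.00));
  }
}
```

Pinning the locale matters for benchmark output that may be diffed or parsed later, since a default locale with a comma decimal separator would otherwise change the numbers' appearance.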
