Commit 349bc79

committed
docs: exhaustive overview of statements & best practices
To help users avoid API misuse, the relevant knowledge is now presented in structured tables, and best practices are described.
1 parent 1224b76 commit 349bc79

File tree

2 files changed

+120
-33
lines changed

docs/source/queries/paged.md

Lines changed: 43 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,31 @@
22
Sometimes query results might be so big that one prefers not to fetch them all at once,
33
e.g. to reduce latency and/or memory footprint.
44
Paged queries allow to receive the whole result page by page, with a configurable page size.
5+
In fact, most SELECT queries should be paged, to avoid heavy load on the cluster and a large memory footprint.
56

6-
`Session::query_iter` and `Session::execute_iter` take a [simple query](simple.md)
7-
or a [prepared query](prepared.md) and return an `async` iterator over result `Rows`.
7+
> ***Warning***\
8+
> Issuing unpaged SELECTs (`Session::query_unpaged` or `Session::execute_unpaged`)
9+
> may have dramatic performance consequences! **BEWARE!**\
10+
> If the result set is big (or, e.g., there are a lot of tombstones), those atrocities can happen:
11+
> - cluster may experience high load,
12+
> - queries may time out,
13+
> - the driver may devour a lot of RAM,
14+
> - latency will likely spike.
15+
>
16+
> Stay safe. Page your SELECTs.
17+
18+
## `RowIterator`
19+
20+
The automated way to achieve that is `RowIterator`. It always fetches and enables access to one page,
21+
while prefetching the next one. This limits latency and is a convenient abstraction.
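
The prefetching idea can be illustrated without the driver. The following is only a minimal sketch using the standard library, not the driver's implementation: `Page`, `spawn_page_worker` and `iterate_rows` are hypothetical names, and a thread with a bounded channel stands in for the background task that fetches pages.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for one page of rows returned by the cluster.
type Page = Vec<u32>;

/// Spawns a worker that "fetches" pages and hands them over a bounded channel.
/// The channel buffers a single page, so only a small, constant number of
/// pages exists at any time, no matter how large the whole result is.
fn spawn_page_worker(total_rows: u32, page_size: u32) -> Receiver<Page> {
    let page_size = page_size.max(1); // guard against a zero page size
    let (tx, rx) = sync_channel::<Page>(1);
    thread::spawn(move || {
        let mut next = 0u32;
        while next < total_rows {
            let end = next.saturating_add(page_size).min(total_rows);
            let page: Page = (next..end).collect();
            // Blocks while the buffer is full; stops if the consumer went away.
            if tx.send(page).is_err() {
                return;
            }
            next = end;
        }
    });
    rx
}

/// Flattens the pages into a row-by-row iterator, hiding page boundaries.
fn iterate_rows(total_rows: u32, page_size: u32) -> impl Iterator<Item = u32> {
    spawn_page_worker(total_rows, page_size).into_iter().flatten()
}
```

Calling `iterate_rows(7, 3)` yields the seven rows in order while the worker prepares the next page in the background. The extra worker is also where the overhead mentioned in this section comes from.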
22+
23+
> ***Note***\
24+
> `RowIterator` is quite heavy machinery, introducing considerable overhead. Therefore,
25+
> don't use it for statements that do not benefit from paging. In particular, avoid using it
26+
> for non-SELECTs.
27+
28+
At the API level, `Session::query_iter` and `Session::execute_iter` take a [simple query](simple.md)
29+
or a [prepared query](prepared.md), respectively, and return an `async` iterator over result `Rows`.
830

931
> ***Warning***\
1032
> In case of unprepared variant (`Session::query_iter`) if the values are not empty
@@ -22,7 +44,6 @@ Use `query_iter` to perform a [simple query](simple.md) with paging:
2244
# use scylla::Session;
2345
# use std::error::Error;
2446
# async fn check_only_compiles(session: &Session) -> Result<(), Box<dyn Error>> {
25-
use scylla::IntoTypedRows;
2647
use futures::stream::StreamExt;
2748

2849
let mut rows_stream = session
@@ -45,7 +66,6 @@ Use `execute_iter` to perform a [prepared query](prepared.md) with paging:
4566
# use scylla::Session;
4667
# use std::error::Error;
4768
# async fn check_only_compiles(session: &Session) -> Result<(), Box<dyn Error>> {
48-
use scylla::IntoTypedRows;
4969
use scylla::prepared_statement::PreparedStatement;
5070
use futures::stream::StreamExt;
5171

@@ -106,10 +126,10 @@ let _ = session.execute_iter(prepared, &[]).await?; // ...
106126
# }
107127
```
108128

109-
### Passing the paging state manually
110-
It's possible to fetch a single page from the table, extract the paging state
111-
from the result and manually pass it to the next query. That way, the next
112-
query will start fetching the results from where the previous one left off.
129+
## Manual paging
130+
It's possible to fetch a single page from the table and manually pass the paging state
131+
to the next query. That way, the next query will start fetching the results
132+
from where the previous one left off.
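
The control flow of manual paging can be shown with an in-memory model. This is a sketch under stated assumptions, not driver code: `MockTable`, `PagingState` and `fetch_all_manually` are hypothetical, and the paging state is modelled as a plain offset, whereas in practice it is an opaque value returned by the database.

```rust
/// Hypothetical in-memory "table" that serves rows one page at a time.
struct MockTable {
    rows: Vec<String>,
    page_size: usize,
}

/// Continuation token; `None` means "start from the beginning".
/// Modelled as an offset here purely for illustration.
type PagingState = Option<usize>;

impl MockTable {
    /// Returns one page plus the paging state to pass to the next call.
    /// A returned state of `None` signals that no more pages remain.
    fn fetch_single_page(&self, state: PagingState) -> (Vec<String>, PagingState) {
        let start = state.unwrap_or(0).min(self.rows.len());
        let end = (start + self.page_size.max(1)).min(self.rows.len());
        let page = self.rows[start..end].to_vec();
        let next = if end < self.rows.len() { Some(end) } else { None };
        (page, next)
    }
}

/// Manual paging loop: feed each returned paging state into the next request.
/// Returns all rows together with the number of requests that were needed.
fn fetch_all_manually(table: &MockTable) -> (Vec<String>, usize) {
    let mut all = Vec::new();
    let mut requests = 0;
    let mut state: PagingState = None;
    loop {
        let (page, next) = table.fetch_single_page(state);
        requests += 1;
        all.extend(page);
        match next {
            Some(_) => state = next,
            None => break,
        }
    }
    (all, requests)
}
```

With five rows and a page size of two, the loop issues three requests. Only one page is held at a time, but each page boundary costs a full round trip because nothing is prefetched.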
113133

114134
On a `Query`:
115135
```rust
@@ -197,5 +217,18 @@ loop {
197217
```
198218

199219
### Performance
200-
Performance is the same as in non-paged variants.\
201-
For the best performance use [prepared queries](prepared.md).
220+
For the best performance use [prepared queries](prepared.md).
221+
See [query types overview](queries.md).
222+
223+
## Best practices
224+
225+
| Query result fetching | Unpaged | Paged manually | Paged automatically |
226+
|-------------------------|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
227+
| Exposed Session API | `{query,execute}_unpaged` | `{query,execute}_single_page` | `{query,execute}_iter` |
228+
| Working | get all results in a single CQL frame, into a single Rust struct | get one page of results in a single CQL frame, into a single Rust struct | upon high-level iteration, fetch consecutive CQL frames and transparently iterate over their rows |
229+
| Cluster load | potentially **HIGH** for large results, beware! | normal | normal |
230+
| Driver overhead | low - simple frame fetch | low - simple frame fetch | considerable - `RowIteratorWorker` is a separate tokio task |
231+
| Feature limitations | none | none | speculative execution not supported |
232+
| Driver memory footprint | potentially **BIG** - all results have to be stored at once! | small - only one page stored at a time | small - at most constant number of pages stored at a time |
233+
| Latency | potentially **BIG** - all results have to be generated at once! | considerable on page boundary - new page needs to be fetched | small - next page is always pre-fetched in background |
234+
| Suitable operations     | - in general: operations with empty result set (non-SELECTs)<br/> - as possible optimisation: SELECTs with LIMIT clause | - for advanced users who prefer more control over paging, with less overhead of `RowIteratorWorker`  | - in general: all SELECTs                                                                         |

docs/source/queries/queries.md

Lines changed: 77 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,80 @@
1-
# Making queries
2-
3-
This driver supports all query types available in Scylla:
4-
* [Simple queries](simple.md)
5-
* Easy to use
6-
* Poor performance
7-
* Primitive load balancing
8-
* [Prepared queries](prepared.md)
9-
* Need to be prepared before use
10-
* Fast
11-
* Properly load balanced
12-
* [Batch statements](batch.md)
13-
* Run multiple queries at once
14-
* Can be prepared for better performance and load balancing
15-
* [Paged queries](paged.md)
16-
* Allows to read result in multiple pages when it might be so big that one
17-
prefers not to fetch it all at once
18-
* Can be prepared for better performance and load balancing
19-
20-
Additionally there is special functionality to enable `USE KEYSPACE` queries:
21-
[USE keyspace](usekeyspace.md)
22-
23-
Queries are fully asynchronous - you can run as many of them in parallel as you wish.
1+
# Making queries - best practices
2+
3+
The driver supports all kinds of statements supported by ScyllaDB. The following tables aim to bridge DB concepts and the driver's API.
4+
They include recommendations on which API to use in what cases.
5+
6+
## Kinds of CQL statements (from the CQL protocol point of view):
7+
8+
| Kind of CQL statement | Single | Batch |
9+
|-----------------------|---------------------|------------------------------------------|
10+
| Prepared | `PreparedStatement` | `Batch` filled with `PreparedStatement`s |
11+
| Unprepared | `Query` | `Batch` filled with `Query`s |
12+
13+
This is **NOT** strictly related to content of the CQL query string.
14+
15+
> ***Interesting note***\
16+
> In fact, any kind of CQL statement could contain any CQL query string.
17+
> Yet, some of such combinations don't make sense and will be rejected by the DB.
18+
> For example, SELECTs in a Batch are nonsense.
19+
20+
### [Unprepared](simple.md) vs [Prepared](prepared.md)
21+
22+
> ***GOOD TO KNOW***\
23+
> Each time a statement is executed by sending a query string to the DB, it needs to be parsed. Driver does not parse CQL, therefore it sees query strings as opaque.\
24+
> There is an option to *prepare* a statement, i.e. parse it once by the DB and associate it with an ID. After preparation, it's enough that driver sends the ID
25+
> and the DB already knows what operation to perform - no more expensive parsing necessary! Moreover, upon preparation driver receives valuable data for load balancing,
26+
> enabling advanced load balancing (so better performance!) of all further executions of that prepared statement.\
27+
> ***Key takeaway:*** always prepare statements that you are going to execute multiple times.
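
The parsing cost described above can be made concrete with a toy model. This is only a sketch: `MockServer` and the two `run_*` helpers are hypothetical and merely count how often a query string would have to be parsed. They do not reflect how the database or the driver is implemented.

```rust
use std::collections::HashMap;

/// Hypothetical model of the DB side: counts how often a query string is parsed.
#[derive(Default)]
struct MockServer {
    prepared: HashMap<u64, String>,
    next_id: u64,
    parse_count: usize,
}

impl MockServer {
    /// Unprepared path: the full string is sent and parsed on every call.
    fn query(&mut self, _cql: &str) {
        self.parse_count += 1;
    }

    /// Preparation: parse once, remember the statement, hand back an ID.
    fn prepare(&mut self, cql: &str) -> u64 {
        self.parse_count += 1;
        let id = self.next_id;
        self.next_id += 1;
        self.prepared.insert(id, cql.to_string());
        id
    }

    /// Prepared path: only the ID is sent, so nothing is parsed again.
    fn execute(&self, id: u64) -> bool {
        self.prepared.contains_key(&id)
    }
}

/// Number of parses needed to run `cql` `times` times without preparing it.
fn run_unprepared(cql: &str, times: usize) -> usize {
    let mut server = MockServer::default();
    for _ in 0..times {
        server.query(cql);
    }
    server.parse_count
}

/// Number of parses needed when the statement is prepared first.
fn run_prepared(cql: &str, times: usize) -> usize {
    let mut server = MockServer::default();
    let id = server.prepare(cql);
    for _ in 0..times {
        assert!(server.execute(id));
    }
    server.parse_count
}
```

In this model, running a statement 100 times costs 100 parses unprepared and a single parse when prepared, which is why preparation pays off for repeated operations but not for one-shot ones.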
28+
29+
| Statement comparison | Unprepared | Prepared |
30+
|----------------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
31+
| Exposed Session API | `query_*` | `execute_*` |
32+
| Usability | execute CQL statement string directly | need to be separately prepared before use, in-background repreparations if statement falls off the server cache |
33+
| Performance | poor (statement parsed each time) | good (statement parsed only upon preparation) |
34+
| Load balancing | primitive (random choice of a node/shard) | advanced (proper node/shard, optimisations for LWT statements) |
35+
| Suitable operations | one-shot operations | repeated operations |
36+
37+
### Single vs [Batch](batch.md)
38+
39+
| Statement comparison | Single | Batch |
40+
|----------------------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
41+
| Exposed Session API | `query_*`, `execute_*` | `batch` |
42+
| Usability | simple setup | need to aggregate statements and binding values to each is more cumbersome |
43+
| Performance | good (DB is optimised for handling single statements) | good for small batches, may be worse for larger (also: higher risk of request timeout due to big portion of work) |
44+
| Load balancing | advanced if prepared, else primitive | advanced if prepared **and ALL** statements in the batch target the same partition, else primitive |
45+
| Suitable operations  | most operations                                       | - a list of operations that needs to be executed atomically (batch LightWeight Transaction)<br/> - a batch of operations targeting the same partition (as an advanced optimisation)  |
46+
47+
## CQL statements - operations (based on what the CQL string contains):
48+
49+
| CQL data manipulation statement | Recommended statement kind | Recommended Session operation |
50+
|------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
51+
| SELECT | `PreparedStatement` if repeated, `Query` if once | `{query,execute}_iter` (or `{query,execute}_single_page` in a manual loop for performance / more control) |
52+
| INSERT, UPDATE | `PreparedStatement` if repeated, `Query` if once, `Batch` if multiple statements are to be executed atomically (LightWeight Transaction) | `{query,execute}_unpaged` (paging is irrelevant, because the result set of such operation is empty) |
53+
| CREATE/DROP {KEYSPACE, TABLE, TYPE, INDEX,...} | `Query`, `Batch` if multiple statements are to be executed atomically (LightWeight Transaction) | `query_unpaged` (paging is irrelevant, because the result set of such operation is empty) |
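
The single-statement recommendations from the table above can be encoded as a small application-side helper. This is a hypothetical sketch: `recommended_operation` is not part of the driver (which treats query strings as opaque), it only inspects the first keyword, and it ignores the `Batch` and manual `*_single_page` options.

```rust
/// Maps a CQL string to the Session operation recommended in the table above.
/// `repeated` says whether the statement will be executed more than once
/// (and is therefore worth preparing). Returns `None` for statements the
/// table does not cover.
fn recommended_operation(cql: &str, repeated: bool) -> Option<&'static str> {
    // Look only at the first keyword, case-insensitively.
    let keyword = cql
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    match (keyword.as_str(), repeated) {
        // SELECTs: page the results; prepare if repeated.
        ("SELECT", true) => Some("execute_iter"),
        ("SELECT", false) => Some("query_iter"),
        // INSERT/UPDATE: empty result set, so paging is irrelevant.
        ("INSERT" | "UPDATE", true) => Some("execute_unpaged"),
        ("INSERT" | "UPDATE", false) => Some("query_unpaged"),
        // Schema changes: the table recommends a plain `Query`.
        ("CREATE" | "DROP", _) => Some("query_unpaged"),
        _ => None,
    }
}
```

For example, `recommended_operation("SELECT * FROM ks.t", true)` returns `Some("execute_iter")`, matching the SELECT row.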
54+
55+
### [Paged](paged.md) vs Unpaged query
56+
57+
> ***GOOD TO KNOW***\
58+
> SELECT statements return a [result set](result.md), possibly a large one. Therefore, paging is available to fetch it in chunks, relieving load on cluster and lowering latency.\
59+
> ***Key takeaways:***\
60+
> For SELECTs you had better **avoid unpaged queries**.\
61+
> For non-SELECTs, unpaged API is preferred.
62+
63+
| Query result fetching | Unpaged | Paged |
64+
|-----------------------|-------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
65+
| Exposed Session API | `{query,execute}_unpaged` | `{query,execute}_single_page`, `{query,execute}_iter` |
66+
| Usability | get all results in a single CQL frame, so into a [single Rust struct](result.md) | need to fetch multiple CQL frames and iterate over them - using driver's abstractions (`{query,execute}_iter`) or manually (`{query,execute}_single_page` in a loop) |
67+
| Performance           | - for large results, puts **high load on the cluster**<br/> - for small results, the same as paged                      | - for large results, relieves the cluster<br/> - for small results, the same as unpaged                                                                              |
68+
| Memory footprint | potentially big - all results have to be stored at once | small - at most constant number of pages are stored by the driver at the same time |
69+
| Latency | potentially big - all results have to be generated at once | small - at most one chunk of data must be generated at once, so latency of each chunk is small |
70+
| Suitable operations | - in general: operations with empty result set (non-SELECTs)</br> - as possible optimisation: SELECTs with LIMIT clause | - in general: all SELECTs |
71+
72+
For more detailed comparison and more best practices, see [doc page about paging](paged.md).
73+
74+
### Queries are fully asynchronous - you can run as many of them in parallel as you wish.
75+
76+
## `USE KEYSPACE`:
77+
There is a special functionality to enable [USE keyspace](usekeyspace.md).
2478

2579
```{eval-rst}
2680
.. toctree::
