The securedrop.journalist_app.api2 package implements the synchronization
strategy for the v2 Journalist API.
| File/module | Contents |
|---|---|
| `API2.md` (you are here) | Specification |
| `securedrop.journalist_app.api2` | Flask blueprint for `/api/v2/` |
| `securedrop.journalist_app.api2.events` | Event-handling framework |
| `securedrop.journalist_app.api2.shared` | Helper functions factored out of, and still shared with, the v1 Journalist API (`securedrop.journalist_app.api`) |
| `securedrop.journalist_app.api2.types` | Types |
| `securedrop.tests.test_journalist_api2` | Test suite for the server implementation |
A client-side implementation should be able to interact with the endpoints
implemented in securedrop.journalist_app.api2 according to this specification.
This API is intended for use by the SecureDrop journalist app, and this documentation is intended to support its development. We make no guarantees about support, compatibility, or documentation for other purposes.
Although the SecureDrop Server remains the source of truth for its clients, the v2 Journalist API borrows ideas from distributed systems and content-addressable storage in order to:
- Support the Journalist API's "occasionally connected" clients: actions should be possible while in offline mode, responsive even over flaky Tor connections, etc.
- Provide a single write-read loop in every synchronization round trip, at an interval of the client's choosing.
- Hash a canonical representation of each record (source, item, etc.) to version it deterministically.
- Hash a canonical representation of an endpoint's entire state (all sources, all items, etc.) to version it deterministically.
- The mechanisms specified here for synchronization, idempotence, etc. are for performance, reliability, and integrity. They assume that these endpoints are authenticated and restricted to SecureDrop journalists and administrators. They are not in themselves security mechanisms, e.g. for mitigating denial-of-service attacks.
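The deterministic-versioning idea above can be sketched as follows. This is a minimal illustration, not the server's actual canonicalization; the record fields and the 6-character version length are assumptions for the example:

```python
import hashlib
import json

def version_of(record: dict) -> str:
    """Hash a canonical (sorted-key, compact) JSON serialization of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:6]

def version_of_state(records: list[dict]) -> str:
    """Version an endpoint's entire state by hashing its sorted record versions."""
    versions = sorted(version_of(r) for r in records)
    return hashlib.sha256("".join(versions).encode("utf-8")).hexdigest()[:6]

# The same logical record always hashes to the same version,
# regardless of key order:
assert version_of({"uuid": "ab12", "replies": 3}) == version_of({"replies": 3, "uuid": "ab12"})
```

Because the serialization is canonical, any two parties holding the same logical state compute the same version string, which is what makes version comparison (and the `ETag` handshake below) meaningful.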
> **Note:** The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
The request/response schemas referred to in these sequence diagrams are defined
as mypy types in securedrop.journalist_app.api2.types.
A client can request a specific shape (version) of response from the server by including in its requests a header of the form

```
Prefer: securedrop=x
```

where `x` is one of the values documented in `securedrop.journalist_app.api2.API_MINOR_VERSION`.
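For example, a client negotiating the response shape might build its request headers like this. The helper and the minor-version value `1` are hypothetical; consult `API_MINOR_VERSION` for the real values:

```python
def prefer_header(minor_version: int) -> dict[str, str]:
    """Build the version-negotiation header described above."""
    return {"Prefer": f"securedrop={minor_version}"}

# Merged into whatever other headers the request carries:
headers = {"Accept": "application/json", **prefer_header(1)}
# headers["Prefer"] == "securedrop=1"
```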
Figure 1.
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client ->> Server: POST /api/v1/token
    Server ->> Client: token, hints
    Note over Client: Calculate shards based on hints.
    alt Global
        Note over Server: Global version abcdef
        Client ->> Server: GET /api/v2/index
        Server ->> Client: ETag: abcdef<br>Index
    else Sharded by sets of UUID prefixes
        loop for shard in shards:
            Note over Server: Shard <shard_spec> @ version uvwxyz
            Client ->> Server: GET /api/v2/index/<shard_spec>
            Server ->> Client: ETag: uvwxyz<br>Index
        end
    end
    Note over Client: We want metadata for all new sources and items.<br>Calculate batches based on total size.
    loop for batch in batches:
        Client ->> Server: POST /api/v2/data<br>BatchRequest
        Server ->> Client: BatchResponse
    end
```
On login, the server returns hints to help the client choose whether and how to shard metadata for sources and their items:

```json
{
  "version": "abcdef",
  "sources": 100,
  "items": 200
}
```

A shard is specified by a shard spec consisting of a comma-separated list of UUID prefixes, so that the client can choose both the breadth and the depth of the shard:
| Shard | Matches |
|---|---|
| `a` | All sources (and their items) with UUIDs beginning with `a` |
| `ab` | All sources (and their items) with UUIDs beginning with `ab` |
| `a,b` | All sources (and their items) with UUIDs beginning with `a` or `b` |
Putting it all together, the client MAY make sharding decisions such as:
| Client's version | `version` | `sources` | `items` | Client's next step | Records synced |
|---|---|---|---|---|---|
| abcdef | abcdef | 100 | 200 | None; already in sync | 0 |
| ghijkl | abcdef | 100 | 200 | Global sync; no sharding necessary | 300 |
| ghijkl | abcdef | 100 | 1000 | Sync over 4 shards: `["0,1,2,3", "4,5,6,7", "8,9,a,b", "c,d,e,f"]` | ~275 records/shard |
| ghijkl | abcdef | 100 | 2000 | Sync over 8 shards: `["0,1", "2,3", "4,5", "6,7", "8,9", "a,b", "c,d", "e,f"]` | ~262 records/shard |
The client SHOULD ensure that the set of shards it requests covers either (a) the entire UUID namespace or (b) the portion of the UUID namespace of interest.
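One way a client might derive covering shard specs from the login hints is sketched below. The target records-per-shard threshold (300) and the restriction to single-hex-digit prefixes are assumptions of this sketch; a real client may shard on longer prefixes for very large instances:

```python
import math

HEX = "0123456789abcdef"

def shard_specs(sources: int, items: int, max_per_shard: int = 300) -> list[str]:
    """Partition the 16 single-hex-digit UUID prefixes into comma-separated
    shard specs so that each shard covers roughly max_per_shard records
    and the union of all shards covers the entire UUID namespace."""
    total = sources + items
    n_shards = min(16, max(1, math.ceil(total / max_per_shard)))
    width = math.ceil(16 / n_shards)  # prefixes per shard
    return [",".join(HEX[i:i + width]) for i in range(0, 16, width)]

# With the hints {"sources": 100, "items": 1000}:
shard_specs(100, 1000)  # -> ["0,1,2,3", "4,5,6,7", "8,9,a,b", "c,d,e,f"]
```

Because the specs are produced by slicing the full prefix alphabet, the requirement that the shard set cover the whole namespace holds by construction.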
As described in "Incremental Synchronization",
the client MAY request arbitrary batches of data at any time. During initial
synchronization, the client MAY choose (for example) to send a separate
BatchRequest for each metadata shard, in order to fetch only the records whose
metadata was returned in that shard.
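The per-shard batching strategy just described might look like the following. The real `BatchRequest` schema is defined in `securedrop.journalist_app.api2.types`; the field names below are illustrative only:

```python
def batch_requests(shard_indexes: dict[str, list[str]]) -> list[dict]:
    """Build one (hypothetical) BatchRequest per metadata shard, requesting
    only the records that shard's index reported as new."""
    return [
        {"shard": spec, "sources": uuids, "events": []}
        for spec, uuids in shard_indexes.items()
        if uuids  # skip shards with nothing new
    ]

requests = batch_requests({"0,1,2,3": ["0a12", "1b34"], "4,5,6,7": []})
# -> one request, for shard "0,1,2,3" only
```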
Figure 2.
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Note over Client: Global version abcdef
    Note over Client: Shard <shard_spec> @ version uvwxyz
    alt Global
        Client ->> Server: GET /api/v2/index<br>If-None-Match: abcdef
    else Sharded by sets of UUID prefixes
        Client ->> Server: GET /api/v2/index/<shard_spec><br>If-None-Match: uvwxyz
    end
    alt Up to date
        Server ->> Client: HTTP 304
    else Out of date
        Server ->> Client: ETag: abcdef<br>Index
        Note over Client: We want metadata for all new/changed sources and items.
        Client ->> Server: POST /api/v2/data<br>BatchRequest
        Server ->> Client: BatchResponse
    end
```
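The client side of this conditional GET can be sketched as follows, with a hypothetical `get(url, headers)` transport returning `(status, headers, body)` standing in for the HTTP layer; the endpoint and header names come from the diagram above:

```python
def sync_index(get, cached_etag: str, cached_index: dict):
    """Fetch /api/v2/index only if it has changed since cached_etag.

    Returns (etag, index, changed).
    """
    status, headers, body = get(
        "/api/v2/index", headers={"If-None-Match": cached_etag}
    )
    if status == 304:
        # Up to date: keep the cached index, no BatchRequest needed.
        return cached_etag, cached_index, False
    # Out of date: adopt the server's index and its new version.
    return headers["ETag"], body, True
```

The same pattern applies per shard against `GET /api/v2/index/<shard_spec>`, with one cached `ETag` per shard spec.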
Figure 3.
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Note over Client: Global version abcdef
    Note over Server: Global version abcdef
    Client ->> Client: reply_sent {id: X, uuid: Y, source: Z, ...}
    Client ->> Server: POST /api/v2/data<br>BatchRequest
    alt Already processed:
        Server ->> Server: look up status of event {id: X}
        Note over Server: Return status of event {id: X},<br>in addition to anything else requested.
        Server ->> Client: BatchResponse
    else
        Server ->> Server: process "reply_sent" event for reply {uuid: Y}
        Note over Server: Return new item {uuid: Y} and updated source {uuid: Z},<br>in addition to anything else requested.
        Note over Server: Global version uvwxyz
        Server ->> Client: BatchResponse
        Note over Client: Global version uvwxyz
    end
```
Events in a given BatchRequest are handled in snowflake-ID
order. Each event is handled according to the following state machine:
Figure 4.
```mermaid
stateDiagram-v2
    direction TB
    [*] --> CacheLookup : process(event)
    CacheLookup: status = redis.get(event.id)
    CacheLookup --> IdempotentBranch : status in {102 Processing, 200 OK}
    CacheLookup --> StartBranch : status == None
    state "Enforce idempotency" as IdempotentBranch {
        AlreadyReported : 208 AlreadyReported
        AlreadyReported --> [*] : return AlreadyReported
    }
    state "Start processing" as StartBranch {
        [*] --> Processing : redis.set(event.id, Processing, ttl)
        Processing : 102 Processing
    }
    Processing --> Handler
    state "handle_<event.type>()" as Handler {
        [*] --> [*]
    }
    Handler --> OK
    state "Cache and report success" as SuccessBranch {
        OK : 200 OK
        OK --> UpdateCache
        UpdateCache : redis.set(event.id, OK, ttl)
        UpdateCache --> [*] : return (status, delta)
    }
    Handler --> BadRequest
    Handler --> NotFound
    Handler --> Conflict
    Handler --> Gone
    Handler --> NotImplemented
    state "Report error" as ErrorBranch {
        BadRequest : 400 BadRequest
        NotFound : 404 NotFound
        Conflict : 409 Conflict
        Gone : 410 Gone
        NotImplemented : 501 NotImplemented
        BadRequest --> ClearCache
        NotFound --> ClearCache
        Conflict --> ClearCache
        Gone --> ClearCache
        NotImplemented --> ClearCache
        ClearCache : redis.delete(event.id)
        ClearCache --> [*] : return error
    }
```
Notes:

- A client that submits a successful event $E$ will receive HTTP `200 OK` for $E$ and SHOULD apply the event locally as confirmed, based on the returned data (`sources`, `items`, etc.).
- A client that subsequently resubmits $E$ will receive only a cached HTTP `208 Already Reported` and SHOULD apply the event locally as confirmed. The server will not return data in this case, but the client SHOULD already know the results of the operation once confirmed.
- A client that submits a failed event $E'$ will receive an individual error code for $E'$. The client MAY resubmit $E'$ immediately, since idempotence is not enforced for error states.
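The state machine in Figure 4 can be sketched in a few lines, with a plain dict standing in for Redis and a handlers table standing in for `handle_<event.type>()`. The status codes mirror the diagram; everything else is hypothetical:

```python
PROCESSING, OK, ALREADY_REPORTED = 102, 200, 208

def process(event: dict, cache: dict, handlers: dict):
    """Process one event idempotently, per Figure 4."""
    status = cache.get(event["id"])                 # CacheLookup
    if status in (PROCESSING, OK):
        return ALREADY_REPORTED, None               # Enforce idempotency
    cache[event["id"]] = PROCESSING                 # Start processing
    try:
        delta = handlers[event["type"]](event)      # handle_<event.type>()
    except KeyError:
        del cache[event["id"]]                      # Errors are not cached,
        return 501, None                            # so the client may retry.
    cache[event["id"]] = OK                         # Cache and report success
    return OK, delta
```

A first submission of an event returns `200` with the handler's delta; resubmitting the same `event.id` returns `208` with no delta; and an error clears the cache entry so an immediate retry is possible, matching the notes above.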
Figure 3 above depicts single-round-trip consistency. That is:

- If the server $S$ currently has exactly one active client $C$; and
- $C$ submits a valid `BatchRequest` $BR$ with $n$ events $\{E_1, \dots, E_n\}$; and
- $S$ accepts $BR$ as valid and successfully processes all $E_i$; then
- $C$'s index SHOULD match $S$'s index without a subsequent synchronization.

This property does not hold:

- when the server has multiple active clients. Because this API reuses some utility functions from the v1 Journalist API, it inherits the inconsistent transaction isolation of the latter's use of a shared SQLAlchemy session, which may cause side effects of events received very close in time to be interleaved at the SQLite level.
- for resubmitted events, which return an HTTP `208 Already Reported` status without the full result of the event.
In these cases, a subsequent synchronization MAY be necessary for the client to "catch up" to the effects of accepted events.
The `Event.id` field is a "snowflake ID", which a client can generate using a library like `@sapphire/snowflake`. To avoid precision-loss problems:

- A client SHOULD store its IDs as opaque strings and sort them lexicographically.
- A client MUST encode its IDs on the wire as JSON strings.
- The server MAY convert IDs it receives to integers, but only for sorting and testing equality.
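The motivation for these rules is that snowflake IDs are 64-bit values, which exceed the 2^53 range that JavaScript (and JSON-number implementations generally) can represent exactly. A server-side sketch, with an illustrative ID value:

```python
import json

# On the wire, the ID travels as a JSON string, never a JSON number:
wire = json.dumps({"id": "1323351264269369344", "type": "reply_sent"})
decoded = json.loads(wire)
assert isinstance(decoded["id"], str)  # no precision lost in transit

# Server side: convert to int only transiently, for ordering and equality.
events = [{"id": "1323351264269369345"}, decoded]
events.sort(key=lambda e: int(e["id"]))  # snowflake-ID (i.e. creation) order
```

Sorting on `int(e["id"])` rather than on the raw strings keeps ordering correct even if IDs of different digit lengths ever appear in the same batch.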