73 changes: 50 additions & 23 deletions docs/src/dev-docs/design-deployment-orchestration.md
@@ -29,7 +29,8 @@ one-time initialization jobs and their functions.
"primaryBorderColor": "transparent",
"lineColor": "#007fff",
"secondaryColor": "#007fff",
"tertiaryColor": "#fff"
"tertiaryColor": "#fff",
"clusterBkg": "#d1f6ff"
}
}
}%%
@@ -41,7 +42,9 @@ graph LR
results_cache["results-cache (MongoDB)"]
compression_scheduler["compression-scheduler"]
query_scheduler["query-scheduler"]
spider_scheduler["spider-scheduler"]
compression_worker["compression-worker"]
spider_compression_worker["spider-compression-worker"]
query_worker["query-worker"]
reducer["reducer"]
api_server["api-server"]
@@ -63,6 +66,8 @@ graph LR
queue -->|healthy| query_scheduler
redis -->|healthy| query_scheduler
query_scheduler -->|healthy| reducer
db_table_creator -->|healthy| spider_scheduler
db_table_creator -->|healthy| spider_compression_worker
results_cache_indices_creator -->|completed_successfully| reducer
db_table_creator -->|completed_successfully| api_server
results_cache_indices_creator -->|completed_successfully| api_server
@@ -75,9 +80,11 @@ graph LR

subgraph Databases
database
results_cache
subgraph Celery[Celery<br/>Native Query Engine]
queue
redis
end
end

subgraph Initialization jobs
@@ -88,10 +95,17 @@ graph LR
subgraph Schedulers
compression_scheduler
query_scheduler
subgraph SpiderSchedulers[Spider]
spider_scheduler
end
end

subgraph Workers
compression_worker
subgraph SpiderWorkers[Spider Workers]
spider_compression_worker["spider-compression-worker"]
end
query_worker
reducer
end
@@ -106,6 +120,13 @@ graph LR
mcp_server
end

    %% Edge styles
linkStyle 3,4,6,7 stroke:#ffd700,color:#ffd700
linkStyle 9,10 stroke:#00ced1,color:#00ced1
    %% Subgraph styles
style Celery fill:#ffffe0,stroke:#ffd700
style SpiderSchedulers fill:#e0ffff,stroke:#00ced1
style SpiderWorkers fill:#e0ffff,stroke:#00ced1

**Figure 1**: Orchestration architecture of the services in the CLP package.
@@ -117,21 +138,23 @@ graph LR
:::{table}
:align: left

| Service | Description |
|---------------------------|-----------------------------------------------------------------|
| database | Database for archive metadata, compression jobs, and query jobs |
| queue | Task queue for schedulers |
| redis | Task result storage for workers |
| compression_scheduler | Scheduler for compression jobs |
| query_scheduler | Scheduler for search/aggregation jobs |
| spider_scheduler          | Scheduler for the Spider distributed task execution framework   |
| results_cache | Storage for the workers to return search results to the UI |
| compression_worker | Worker processes for compression jobs |
| spider_compression_worker | Worker processes for compression jobs using Spider |
| query_worker | Worker processes for search/aggregation jobs |
| reducer | Reducers for performing the final stages of aggregation jobs |
| api_server | API server for submitting queries |
| webui | Web server for the UI |
| mcp_server                | MCP server for AI agents to access CLP functionality            |
| garbage_collector | Process to manage data retention |

:::

@@ -210,12 +233,16 @@ instance ID.

### Deployment Types

CLP supports four deployment types, determined by the `package.compression_scheduler.type` and
`package.query_engine` configuration settings.

| Deployment Type | Compression Scheduler | Query Engine | Docker Compose File |
|-----------------|-----------------------|------------------------------|------------------------------------|
| Base | Celery | [Presto][presto-integration] | `docker-compose-base.yaml` |
| Full | Celery | Native | `docker-compose.yaml` |
| Spider Base | Spider | [Presto][presto-integration] | `docker-compose-spider-base.yaml` |
| Spider Full | Spider | Native | `docker-compose-spider.yaml` |
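
As an illustrative sketch (key paths follow the setting names above; the exact nesting and
accepted values may vary by CLP version), a Spider Full deployment would combine the two
settings like so:

```yaml
# etc/clp-config.yaml (fragment; values illustrative)
package:
  compression_scheduler:
    type: "spider"      # "celery" selects the Celery-based deployments
  query_engine: "native" # "presto" selects the Base variants
```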


### Implementation details

18 changes: 15 additions & 3 deletions docs/src/user-docs/guides-multi-host.md
@@ -162,13 +162,13 @@ docker compose \
up db-table-creator \
--no-deps

# Start queue (optional, only if using Celery)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up queue \
--no-deps --wait

# Start redis (optional, only if using Celery)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up redis \
@@ -195,6 +195,12 @@ docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up compression-scheduler \
--no-deps --wait

# Start Spider scheduler (optional, only if using Spider)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up spider-scheduler \
--no-deps --wait

# Start query scheduler
docker compose \
@@ -230,11 +236,17 @@ docker compose \
# Worker services (can be started on multiple hosts)
################################################################################

# Start compression worker (optional, only if using Celery)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up compression-worker \
--no-deps --wait

# Start Spider compression worker (optional, only if using Spider)
docker compose \
--project-name "clp-package-$(cat var/log/instance-id)" \
up spider-compression-worker \
--no-deps --wait
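
# Sketch (not part of the original guide): every command above derives the
# Compose project name from the instance-id file, so you can compute it once
# and reuse it. The fallback to "unknown" is only so this snippet also runs
# before the file exists.
PROJECT_NAME="clp-package-$(cat var/log/instance-id 2>/dev/null || echo unknown)"
echo "Using Compose project: ${PROJECT_NAME}"
# e.g. docker compose --project-name "$PROJECT_NAME" ps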

# Start query worker
docker compose \
63 changes: 63 additions & 0 deletions docs/src/user-docs/guides-using-spider.md
@@ -0,0 +1,63 @@
# Using Spider with CLP

[Spider] is a fast, scalable distributed task execution engine.
This guide describes how to set up and use Spider with CLP.

:::{note}
Spider is under active development, and its integration with CLP may change in the future.
Currently, Spider only supports executing CLP compression tasks; support for search tasks will be
added later.
:::

## Requirements
* [CLP][clp-releases] v0.7.0 or higher
* [Docker] v28 or higher
* [Docker Compose][docker-compose] v2.20.2 or higher
* Python
* python3-venv (for the version of Python installed)

## Setup
To use Spider for CLP compression tasks, you need to [set up CLP](#setting-up-clp-with-spider) with
Spider enabled in its configuration.

### Setting up CLP with Spider

1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
but don't start the package just yet.
2. Before starting the package, update the package's config file (`etc/clp-config.yaml`) as follows:

* Set the `compression_scheduler.type` key to `"spider"`.

```yaml
compression_scheduler:
type: "spider"
```

   * (Optional) Configure the `spider_db` settings.

```yaml
spider_db:
db_name: "spider-db"
```

   * (Optional) Configure the `spider_scheduler` settings.

```yaml
spider_scheduler:
host: "localhost"
port: 6000
```
3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`)
to add Spider database credentials as follows:

```yaml
spider_db:
username: "spider_user"
password: "spider_password"
```
4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.
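
Putting steps 2 and 3 together, a complete Spider-related configuration sketch (using the same
illustrative values as above) would be:

```yaml
# etc/clp-config.yaml (fragment; values illustrative)
compression_scheduler:
  type: "spider"
spider_db:
  db_name: "spider-db"
spider_scheduler:
  host: "localhost"
  port: 6000
```

with the matching `spider_db` credentials kept in `etc/credentials.yaml` as shown in step 3.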

[clp-releases]: https://github.com/y-scope/clp/releases
[docker-compose]: https://docs.docker.com/compose/install/
[Docker]: https://docs.docker.com/engine/install/
[Spider]: https://github.com/y-scope/spider
1 change: 1 addition & 0 deletions docs/src/user-docs/index.md
@@ -66,6 +66,7 @@ guides-external-database
guides-multi-host
guides-retention
guides-using-presto
guides-using-spider
:::

:::{toctree}