Getting Started
This section is a getting-started guide for configuring Recce to reconcile actual databases.
Before proceeding with the steps below, try out Recce with the provided example scenario to get a feel for how Recce works.
Recce is currently only published as a container image to a private GitHub Container Registry (GHCR) repo. Pulling officially built Docker images locally requires some additional setup to authenticate with the GHCR.
- Generate a personal access token in your account with the packages:read permission.
- Use Configure SSO to authorize the token for SSO access via the organisation.
- Log in with a command like the one below (see here for details):
  echo "ghp_REST_OF_TOKEN" | docker login https://ghcr.io -u my-github-username --password-stdin
- Pull Recce's Docker image from GHCR:
  docker pull ghcr.io/thoughtworks-sea/recce-server
- You should now be able to run Recce locally from the Docker image, using this repository only to set up a DB for Recce and an example scenario:
  # Run in one shell - starts a DB for Recce, and an example scenario
  ./batect run-deps
  # Run in another shell - runs Recce
  docker run -p 8080:8080 \
    -v $(pwd)/examples/scenario/petshop-mysql:/config \
    -e MICRONAUT_CONFIG_FILES=/config/application-petshop-mysql.yml \
    -e DATABASE_HOST=host.docker.internal \
    -e R2DBC_DATASOURCES_SOURCE_URL=r2dbc:pool:mysql://host.docker.internal:8000/db \
    -e R2DBC_DATASOURCES_TARGET_URL=r2dbc:pool:mysql://host.docker.internal:8001/db \
    ghcr.io/thoughtworks-sea/recce-server:latest
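The login step above places the token literally on the command line, where it may end up in shell history. A minimal sketch of one alternative follows, assuming the token has been exported in an environment variable. The variable name GHCR_TOKEN and the helper name ghcr_login are chosen here for illustration and are not part of Recce or Docker.

```shell
# Hypothetical helper: log in to GHCR using a token held in $GHCR_TOKEN
# (GHCR_TOKEN is an assumed variable name, not something Recce defines).
# usage: ghcr_login GITHUB_USERNAME
ghcr_login() {
  if [ -z "${GHCR_TOKEN:-}" ]; then
    echo "GHCR_TOKEN is not set" >&2
    return 1
  fi
  # Pipe the token on stdin so it never appears as a command-line argument
  printf '%s' "$GHCR_TOKEN" | docker login ghcr.io -u "$1" --password-stdin
}
```

With the token exported, the login and pull steps might then be run as `ghcr_login my-github-username && docker pull ghcr.io/thoughtworks-sea/recce-server`.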
Recce is configured by adding datasources and datasets that you wish to reconcile. As a Micronaut application, much of Recce's configuration is open for hacking and can be expressed in multiple ways.
This guide takes the recommended approach: create an additional configuration file and load it into Recce through MICRONAUT_CONFIG_FILES.
Create a new YAML file inside the project, e.g. my-dataset-configs/config1.yml
mkdir -p my-dataset-configs
touch my-dataset-configs/config1.yml
Inside the newly created YAML file, configure the username and password for authentication
auth:
  username: admin
  password: admin
This configures the credentials used in basic authentication to protect the API endpoints. In this case, the username and password are both set to admin.
Add all databases involved in reconciliation under the r2dbc.datasources block of your configuration file. Multiple data sources can be configured for connection. For more details, visit the section on configuring datasources.
r2dbc:
  datasources:
    my-source-db: # Name your datasource anything you want, other than "default"
      url: r2dbc:pool:mysql://source-db:3306/db # R2DBC URL for your database: r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME
      username: user
      password: password
    my-target-db:
      url: r2dbc:pool:mysql://target-db:3306/db
      username: user
      password: password
In this case, two MySQL databases named my-source-db and my-target-db are added.
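The URL pattern noted in the datasource configuration (r2dbc:pool:DB_TYPE://DB_HOST:DB_PORT/DB_NAME) can be sketched as a small shell helper. The function name r2dbc_url is hypothetical and exists only to illustrate how the four parts fit together; it does not validate its inputs.

```shell
# Hypothetical helper: assemble a pooled R2DBC URL from its parts.
# usage: r2dbc_url DB_TYPE DB_HOST DB_PORT DB_NAME
r2dbc_url() {
  printf 'r2dbc:pool:%s://%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

# Reproduces the source URL used in this guide
r2dbc_url mysql source-db 3306 db
```

The final line prints the same value as the url configured for my-source-db.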
Add the various datasets for reconciliation under the reconciliation.datasets block of your configuration file.
Each dataset has a source and a target for reconciliation; for each of them, Recce runs the SQL query against the datasource referenced in datasourceRef.
For more details, visit the section on configuring datasets.
reconciliation:
  datasets:
    pets: # Name your datasets however you would like
      source:
        # Reference to a datasource defined in `r2dbc.datasources`
        datasourceRef: my-source-db
        # Any SQL query to evaluate against the source DB
        query: >
          SELECT pet.id AS MigrationKey, category, name, status
          FROM pet
      target:
        # Reference to a datasource defined in `r2dbc.datasources`
        datasourceRef: my-target-db
        # Any SQL query to evaluate against the target DB
        query: >
          SELECT pet.id AS MigrationKey, category.name AS category, pet.name, status
          FROM pet INNER JOIN category ON pet.category_id = category.id
      # Optional scheduling of regular or one-off reconciliations
      schedule:
        # Must adhere to format https://docs.micronaut.io/latest/api/io/micronaut/scheduling/cron/CronExpression.html
        # or https://crontab.guru/ (without seconds)
        cronExpression: 0 0 * * *
In the configuration above, one dataset named pets is configured, reconciling the source database my-source-db against the target database my-target-db.
The query configuration under datasets has some constraints on how it should be written. For more details, visit the section on writing dataset queries.
- Recce needs to know which column represents a unique identifier for the row that should be consistent between source and target and implies these rows represent the same entity. To do this, designate a column by naming it as MigrationKey (case insensitive).
  SELECT natural_id AS MigrationKey, some, other, columns FROM my_table
- Currently Recce ignores names of columns other than the MigrationKey column. That means that the order of columns is critical and must match between your two queries. If the column in position 3 represents datum X in the source dataset, then the column in position 3 in the target dataset should also represent the same datum.
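As a quick sanity check before loading a configuration, the MigrationKey naming rule above can be approximated with a textual search. This is only a rough sketch: the helper name has_migration_key is hypothetical, it uses a case-insensitive pattern match rather than a SQL parser, and it does not check the column-order constraint at all.

```shell
# Hypothetical helper: succeed if the query text aliases some column
# "AS MigrationKey" (case insensitive). A textual approximation only.
# usage: has_migration_key "SQL QUERY TEXT"
has_migration_key() {
  printf '%s\n' "$1" |
    grep -qiE '(^|[[:space:]])AS[[:space:]]+MigrationKey([[:space:],]|$)'
}

# Example: warn about a query that lacks the alias
query='SELECT id, name FROM pet'
if ! has_migration_key "$query"; then
  echo "query has no MigrationKey column: $query" >&2
fi
```

Because the check is purely textual, it can be fooled by comments or string literals inside the query; the column order still needs to be compared by eye between the source and target queries.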
Pass the configuration file's location my-dataset-configs/config1.yml to Recce through the environment variable MICRONAUT_CONFIG_FILES
docker run -p 8080:8080 \
  -v $(pwd)/my-dataset-configs:/config \
  -e MICRONAUT_CONFIG_FILES=/config/config1.yml \
  ghcr.io/thoughtworks-sea/recce-server:latest
- Explore and trigger runs via Recce's APIs, accessible via an interactive UI at http://localhost:8080/rapidoc.
- Some non-exhaustive examples are included below, but fuller documentation is available via the UI.
  - Synchronously trigger a run, waiting for it to complete, via the UI or
    curl -X POST http://localhost:8080/runs -H 'Content-Type: application/json' -d '{ "datasetId": "pets" }' -u "username:password"
  - Retrieve details of an individual run by ID for a dataset via the UI, or
    curl 'http://localhost:8080/runs/35' -u "username:password"
  - Retrieve details of recent runs for a dataset via the UI, or
    curl 'http://localhost:8080/runs?datasetId=categories' -u "username:password"
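The curl examples above use "username:password" as a placeholder; in practice these would be the values configured under auth (admin and admin in this guide). A minimal sketch of wrapping the calls in shell functions follows. The variable names RECCE_URL, RECCE_USER and RECCE_PASSWORD and the function names are assumptions made for illustration, not something Recce defines.

```shell
# Hypothetical settings; the defaults match the auth block used in this guide.
RECCE_URL="${RECCE_URL:-http://localhost:8080}"
RECCE_USER="${RECCE_USER:-admin}"
RECCE_PASSWORD="${RECCE_PASSWORD:-admin}"

# Build the JSON body for triggering a run of one dataset.
# Note: no JSON escaping is done, so keep dataset IDs to plain names.
run_payload() {
  printf '{ "datasetId": "%s" }\n' "$1"
}

# Trigger a run for the given dataset ID and print the response body.
trigger_run() {
  curl -sS -X POST "$RECCE_URL/runs" \
    -H 'Content-Type: application/json' \
    -d "$(run_payload "$1")" \
    -u "$RECCE_USER:$RECCE_PASSWORD"
}

# Retrieve recent runs for the given dataset ID.
recent_runs() {
  curl -sS "$RECCE_URL/runs?datasetId=$1" -u "$RECCE_USER:$RECCE_PASSWORD"
}
```

With Recce running, `trigger_run pets` followed by `recent_runs pets` would exercise the same endpoints as the curl commands above.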