Wrapper scripts around the CONTACT Elements tools `cdbexp` / `cdbimp` that automate the database migration of an existing Elements instance to a different database backend.
What this tool does:
- Migrates table data entries between CONTACT Elements instances running with different database backends
- Supports any CONTACT Elements database type: Oracle, MSSQL, PostgreSQL, or SQLite
- Automates the export/import process with parallel processing capabilities
What this tool does NOT do:
- Does not migrate data from or to external systems
- Does not do any sort of data transformation during migration
- Does not create database schemas (must be done with CONTACT Elements functionality, e.g., `mkinstance`)
- Does not modify application code or configuration (must be done separately)
The source and target databases are interchangeable. While this guide uses Oracle → PostgreSQL as an example, you can migrate between any supported database types. The following documentation assumes that the reader is familiar with the basics of how CONTACT Elements operates.
Important: These scripts were originally created for CONTACT Elements migrations in cloud environments and have been adapted and generalized for on-premise usage. They are not standard CONTACT Software products, and CONTACT Software cannot be held responsible for any issues with customer data resulting from their use. Contributions and improvements are welcome in the form of GitHub issues and pull requests.
This walkthrough explains how to migrate the CONTACT Elements Database backend to PostgreSQL. The migration process follows these steps:
- Create a new temp CONTACT Elements instance with the PostgreSQL backend, based on the same configuration state as the source CONTACT Elements instance.
- Export the DB data from the source instance with `cdbexp`.
- Copy the exported data to the temp CONTACT Elements server with the PostgreSQL backend.
- Import the data with `cdbimp`.
- In the source instance, switch the `${CADDOK_BASE}/etc/dbtab.yml` link to the new PG database.
- Remove the temp CONTACT Elements instance.
The scripts were originally written for Linux. If you’re on Windows, use WSL to run the bash scripts, or rewrite them in PowerShell.
```mermaid
graph TB
    Start([Start Migration]) --> Prep[Preparation<br/>- Update packages<br/>- Fix datetime fields<br/>- Feature freeze]
    Prep --> CreateTemp[Create TEMP Instance<br/>PostgreSQL backend]
    CreateTemp --> Export[Export from SOURCE<br/>cdbexp]
    Export --> Copy[Copy export data<br/>SOURCE → TEMP]
    Copy --> Import[Import to TEMP<br/>db_import.py]
    Import --> Verify{Verify TEMP<br/>instance working?}
    Verify -->|No| Troubleshoot[Troubleshoot<br/>& Fix Issues]
    Troubleshoot --> Import
    Verify -->|Yes| Switch[Switch dbtab.yml<br/>SOURCE → PostgreSQL]
    Switch --> StartSource[Restart SOURCE instance<br/>with PostgreSQL backend]
    StartSource --> TestSource{SOURCE working<br/>correctly?}
    TestSource -->|No| Rollback[Switch dbtab.yml<br/>SOURCE → Oracle and do better next time]
    TestSource -->|Yes| Cleanup[Remove TEMP instance]
    Cleanup --> End([Migration Complete])
```
Before starting the migration, ensure that the CONTACT Elements Server version is >= 15.8.24 or >= 15.9.3 and that PostgreSQL-compatible cs-packages versions are installed.
1.1. Update the packages and the CONTACT Elements Server to a PostgreSQL-compatible version.
1.2. Migrate Oracle-specific queries in the customer module to PostgreSQL syntax before deployment. Use the script grep_oracle_specific_patterns.sh to identify Oracle statements:
```bash
./grep_oracle_specific_patterns.sh ${CADDOK_BASE}/customer.plm
```

1.3. Fix datetime fields in the source instance:

```bash
powerscript fix_datetime.py
```

See more information about the script below.
1.4. Remove or integrate all local changes in the production environment that are not yet persisted in the git repository. Use cdbpkg diff / cdbpkg patch commands to achieve this. See Docportal CompArch for more details.
1.5. Build the refactored customer module, commit it to git, and release to production. This state will be used as the migration baseline. From this point forward, implement a feature freeze until the end of the migration process and prohibit any configuration changes in production.
Only the tables referenced in the CONTACT Elements DataDictionary will be migrated! Orphaned tables will not be exported, and the database-specific tables (for example those called `cdb_..._oracle_...`) will not be imported.

Use `cdbpkg schema_coverage` to find orphaned tables.
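For example (run in the source instance environment; the exact output format depends on your Elements version):

```bash
# Reports how the existing database schema is covered by the DataDictionary;
# uncovered/orphaned tables are the ones that will NOT be migrated.
cdbpkg schema_coverage
```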
2.1. Create a new temp CONTACT Elements instance with the PostgreSQL backend. The new instance should be based on the same configuration state (git commit) as the source instance. The created database backend will be referenced later from the source instance, so ensure that the database name, username, and password of the temporary CONTACT Elements instance match those of the original instance. Get the database password from the wallet of the source instance: `cdbwallet resolve cs.platform/sqlapi/<database-name>`.
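One way to double-check that the credentials match, assuming the wallet entry exists in both instances and the instance directories are /home/contact/instance and /home/contact/tempinstance as in the examples below:

```bash
# Resolve the database password from both wallets and compare the output.
cdbwallet --instancedir=/home/contact/instance resolve cs.platform/sqlapi/<database-name>
cdbwallet --instancedir=/home/contact/tempinstance resolve cs.platform/sqlapi/<database-name>
```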
2.2. Copy the export script `db_mig` to the source instance:

```bash
cp db_mig /home/contact/instance
```

2.3. On the source instance server, run the export:

```bash
# Stop services to avoid new automatic or interactive data changes.
# Remove livenessProbe to avoid container restarts if CONTACT Elements runs in Kubernetes.
cdbsvcdctl stopall
# This will take around 30-60 minutes for 3.5GB of uncompressed data.
# Use nohup to disown the process and run in the background.
nohup instance/db_mig export >> instance/db_mig.log 2>&1 &
# To continue a previously unfinished export just start again (this only adds missing table dumps):
# nohup instance/db_mig export >> instance/db_mig.log 2>&1 &
# To restart completely from scratch:
# rm instance/db_mig.log
# rm -rf instance/c_cdbexp
# nohup instance/db_mig export >> instance/db_mig.log 2>&1 &
```

Use the `jobs` command to check scripts running in the background.
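Some simple ways to keep an eye on a running export (paths as used above; the `*.exp` file naming is an assumption based on the examples later in this guide):

```bash
# Follow the export log.
tail -f instance/db_mig.log

# Count the table dumps written so far.
ls instance/c_cdbexp/*.exp 2>/dev/null | wc -l

# Check that the export process is still alive, even after the shell session ended.
pgrep -af db_mig
```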
3.1. Copy the resulting file and import script to the new temp server instance:
```bash
cp /home/contact/instance/c_cdbexp.tar.gz /home/contact/tempinstance/c_cdbexp.tar.gz
cp db_import.py /home/contact/tempinstance/db_import.py
```

3.2. Run the import and verify that the instance is working correctly. See below for more about the import script.

```bash
# Stop services to avoid new data. Remove livenessProbe to avoid container restarts.
cdbsvcdctl stopall
# This will take around 5-10 hours for 3.5GB of uncompressed data.
# First run: imports all tables except those handled separately with cdbimp (see pitfalls below).
nohup powerscript --instancedir=tempinstance --run-level=1 tempinstance/db_import.py --workers 4 2>&1 &
# Second run: imports a special set of tables sequentially.
nohup powerscript --instancedir=tempinstance --run-level=1 tempinstance/db_import.py --workers 1 2>&1 &
```
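A minimal spot check after the import, assuming direct psql access to the temp PostgreSQL database (placeholders as in the dbtab.yml example in the next section):

```bash
# Confirm that a few large tables actually contain data in the new PostgreSQL backend.
export PGPASSWORD=<database-password>
for t in cdb_file cdbpcs_issue; do
  printf '%s: ' "$t"
  psql --username="<database-user>" --host="<postgres-host>" --dbname="<database-name>" \
       --tuples-only --no-align --command="SELECT count(*) FROM $t;"
done
```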
4.1. In the source instance, overwrite `dbtab.yml` with the one pointing to the correct PostgreSQL backend:

Important: Make backups of `dbtab.yml` from both the temp and source CONTACT Elements instances before proceeding.
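For example (plain file copies; adjust the backup suffixes to your liking):

```bash
# Keep copies of both files so the switch can be rolled back quickly.
cp /home/contact/instance/etc/dbtab.yml /home/contact/instance/etc/dbtab.yml.oracle.bak
cp /home/contact/tempinstance/etc/dbtab.yml /home/contact/tempinstance/etc/dbtab.yml.bak
```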
```bash
# Copy the correct dbtab.yml from the temp instance to the source instance.
cp /home/contact/tempinstance/etc/dbtab.yml /home/contact/instance/etc/dbtab.yml
```

Example diff:
```diff
 defaultEntry: <database-name>
 entries:
 - connect: <instance>:37700
   driver: remotedb
   mode: -/-
   name: rdb
   system: remotedb
-- connect: <oracle-host>:1521/<oracle-sid>
-  driver: oracle
-  mode: <database-name>
-  name: <database-name>
-  opt3: german_germany.AL32UTF8
-  system: oracle
+- connect: host=<postgres-host> port=5432 dbname=<database-name> user=<database-user>
+    password=$(CADDOK_DB_PASSWORD)
+  driver: python
+  mode: psycopg
+  name: <database-name>
+  system: postgres
```

4.2. Make a backup of the new PostgreSQL instance with pg_dump:
```bash
# This will create a binary backup file.
export PGPASSWORD=<pg_system_pwd>
pg_dump --username="<username>" --host="localhost" --dbname="<database>" --schema="<schema>" --format="custom" --file="/tmp/<backup>.dmp"
```
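Should the PostgreSQL data itself ever need to be rolled back to this backup, the custom-format dump can be restored with pg_restore, roughly like this (a sketch using the same placeholders; check the pg_restore documentation for your PostgreSQL version):

```bash
# Restore the custom-format dump, dropping and recreating the objects it contains.
export PGPASSWORD=<pg_system_pwd>
pg_restore --username="<username>" --host="localhost" --dbname="<database>" \
           --schema="<schema>" --clean --if-exists "/tmp/<backup>.dmp"
```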
If there are problems with passwords, verify that the wallet secrets are intact:

```bash
cdbwallet --instancedir=instance list --wallet instance/etc/wallet.json
# There should be the same key as the default entry in dbtab.yml
cdbwallet --instancedir=instance resolve cs.platform/sqlapi/<database-name>
# If the key is missing, copy it from the temp instance:
echo "<password>" | cdbwallet --instancedir=instance store --stdin cs.platform/sqlapi/<database-name>4.3. Shut down the temporary instance server and restart the source instance. It will start with the new PostgreSQL database backend.
You can restart `db_mig export` multiple times; each run only adds the missing table dumps to the export folder. To re-export a specific table, remove its dump from /home/contact/instance/c_cdbexp/ and start the script again.

Similarly, restart `powerscript db_import.py` to continue with the tables that were not imported during the last run. Use the `-h` parameter to list all script options.
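For example (the dump file name is an assumption; check the export folder for the exact naming on your instance):

```bash
# Re-export a single table: delete its dump, then restart the export in the background.
rm /home/contact/instance/c_cdbexp/CDB_FILE.exp
nohup instance/db_mig export >> instance/db_mig.log 2>&1 &
```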
Use the `fix_datetime.py` script to fix datetime columns across the entire CONTACT Elements database. Correct datetime formatting is a precondition for a clean migration from Oracle to PostgreSQL. Without this fix, you will encounter errors like:

```
[14:29:16] Importing file: CDBPCS_ISSUE.exp
cdbimp: info: Will import all tables in datafile.
cdbimp: info: Creating table cdbpcs_issue
cdbimp: info: Table cdbpcs_issue already exists
cdbimp: info: Loading 26835 data rows for table cdbpcs_issue
cdbimp: info: Ignored 0 errors during insert
cdbimp: info: Inserted 17880 rows in 17.176 seconds (62459.34 rows/minute)
cdbimp: error: Import ran with errors.
Traceback (most recent call last):
File "<path-to-ce>/bin/cdbimp", line 64, in <module>
sys.exit(cdb.scripts.cdbimp.main())
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 566, in main
return tool.run(_main)
^^^^^^^^^^^^^^^
File "<path-to-ce>/cdb/python/cdb/rte.py", line 1139, in run
return mainfunc(self)
^^^^^^^^^^^^^^
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 461, in _main
return _process(tool.options, log)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 501, in _process
exitcode = inserter.insert_from_iter(itertools.chain(*importers))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 376, in insert_from_iter
self.commit_on_edge(check_want_commit, iterable)
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 423, in commit_on_edge
self.insert_one_record(lno, rec)
File "<path-to-ce>/cdb/python/cdb/scripts/cdbimp.py", line 329, in insert_one_record
rec.insert()
File "<path-to-ce>/cdb/python/cdb/sqlapi.py", line 817, in insert
return i.insert()
^^^^^^^^^^
File "<path-to-ce>/lib/python3.11/cdbwrapc.py", line 2877, in insert
return _cdbwrapc.DBInserter_insert(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Invalid value '18.10.2019 11:25:09' found for cdbpcs_issue/completion_date:
ValueError: Invalid value '18.10.2019 11:25:09' found for cdbpcs_issue/completion_date:
Time portion is not allowed for columns defined as date without time
```

Many issues arise because cdbimp creates additional entries in tables other than those being directly imported. For example, when importing `cdb_file`, the `cdb_object` table is also populated with the corresponding new `cdb_object_id`s. Similarly, when importing documents, the `mq_system_posting` table is updated with events about newly created documents. Consequently, importing with multiple workers in parallel results in unique constraint violation errors:

```
[2024-10-22 16:47:06,839] [cdb_object] [CRITICAL] Encountered fatal error:
(23505) (23505) "duplicate key value violates unique constraint "cdb_object_pkey"
DETAIL: Key (id)=(0e7fe363-20b0-11ef-a797-c3dc792f9cbd) already exists." while executing: "INSERT INTO cdb_object (id, relation) VALUES ('0e7fe363-20b0-11ef-a797-c3dc792f9cbd', 'cdb_file')" SQL: "INSERT INTO cdb_object (id, relation) VALUES ('0e7fe363-20b0-11ef-a797-c3dc792f9cbd', 'cdb_file')"
[2024-10-22 16:47:06,850] [cdb_object] [INFO] Inserted 29116 rows in 28.449 seconds (61406.67 rows/minute)
[2024-10-22 16:47:06,854] [cdb_object] [ERROR] Import ran with errors.
```

This will break the import for the table where the unique constraint violation occurred.
For mq*-relevant tables, excessive entries will be created in the target database later when the queue is processed by the corresponding service. For example, you would see duplicate creation events in the audit trail of documents.
The `db_import.py` script handles these problems automatically. Simply run it twice: first with multiple workers (>1), then with a single worker. See the script implementation for more details.
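If a run did end with errors, one way to find the affected tables is to scan the import logs; this assumes db_import.py writes its per-table logs to $CADDOK_BASE/db_import/, the same folder used for the report below:

```bash
# List log files that report a failed table import.
grep -rl "Import ran with errors" ${CADDOK_BASE}/db_import/
```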
To find discrepancies between the expected and existing database states, use the `db_import_check.py` script after the migration and the switch of `dbtab.yml`. This script requires the full contents of the initial cdbexp export; put them into $CADDOK_BASE/c_cdbexp/. The script produces the diff in HTML format in the log folder $CADDOK_BASE/db_import/.

```bash
# Extract the tar again
tar -xzf /home/contact/instance/c_cdbexp.tar.gz -C /home/contact/instance
# Generate report in instance/db_import folder
nohup powerscript --instancedir=instance --run-level=3 instance/db_import_check.py 2>&1 &
```

In the report, green indicates new (unexpected) records, and red indicates missing records. You most likely need to re-import the tables with differences.
It can happen that `cdbexp` exports the same database entry twice if the table contents grow during the export process (to avoid this, turn off the services during export with `cdbsvcdctl stopall`). This will inevitably lead to a unique constraint violation during import. In this case, fix the `.exp` file as follows:

```bash
# Find the line numbers of the problematic contents.
grep -n <duplicate-id> instance/c_cdbexp/cdb_file.exp
1750038:D\_ <duplicate-id> ...
1750039:D\_ <duplicate-id> ...
# Remove the redundant line.
sed -i.bak '1750039d' instance/c_cdbexp/cdb_file.exp
```
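A quick sanity check after editing the file:

```bash
# The duplicate id should now appear only once in the dump.
grep -c "<duplicate-id>" instance/c_cdbexp/cdb_file.exp
```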
Alternatively, to save time when importing large tables, you can remove the already-imported block and continue the import like this:

```bash
# Find the line number range to remove with grep, then remove the block.
sed -i '39,1760037d' instance/c_cdbexp/cdb_file.exp
# Continue import by restarting the script and skip deleting table contents.
nohup powerscript --run-level=1 instance/db_import.py --workers 1 --skip-delete 2>&1 &
```