diff --git a/README.md b/README.md index 6d49b4a..0c8d4b1 100644 --- a/README.md +++ b/README.md @@ -1,30 +1,18 @@ # Amazon DocumentDB Tools -This repo contains the following tools. - -## Amazon DocumentDB Index Tool - -The `DocumentDB Index Tool` makes it easier to migrate only indexes (not data) between a source MongoDB deployment and an Amazon DocumentDB cluster. The Index Tool can also help you find potential compatibility issues between your source databases and Amazon DocumentDB. You can use the Index Tool to dump indexes and database metadata, or you can use the tool against an existing dump created with the mongodump tool. - -For more information about this tool, checkout the [Amazon DocumentDB Index Tool README](./index-tool/README.md) file. +This repository contains several tools to help users with Amazon DocumentDB, including migration, monitoring, and performance. A few of the most popular tools are listed below, but there are additional tools in the [migration](./migration), [monitoring](./monitoring), [operations](./operations), and [performance](./performance) folders. ## Amazon DocumentDB Compatibility Tool -The `DocumentDB Compatibility Tool` examines log files from MongoDB or source code from MongoDB applications to determine if there are any queries which use operators that are not supported in Amazon DocumentDB. This tool produces a simple report of unsupported operators and file names with line numbers for further investigation. +The [DocumentDB Compatibility Tool](./compat-tool) examines log files from MongoDB or source code from MongoDB applications to determine if there are any queries which use operators that are not supported in Amazon DocumentDB. -For more information about this tool, checkout the [Amazon DocumentDB Compatibility Tool README](./compat-tool/README.md) file. 
- -## Cosmos DB Migration Utility - -The `Cosmos DB Migration Utility` is an application created to help live migrate the Azure Cosmos DB for MongoDB API databases to Amazon DocumentDB with very little downtime. It keeps the target Amazon DocumentDB cluster in sync with the source Microsoft Azure Cosmos DB until the client applications are cut over to the DocumentDB cluster. - -For more information about the Cosmos DB Migrator tool, checkout the [Cosmos DB Migration Utility README](./cosmos-db-migration-utility/README.md) file. +## Amazon DocumentDB Index Tool -## Amazon DocumentDB Global Clusters Automation Tool +The [DocumentDB Index Tool](./index-tool) makes it easy to migrate only indexes (not data) between a source MongoDB deployment and an Amazon DocumentDB cluster. -The `global-clusters-automation` is a tool created to automate the global cluster failover process for Disaster Recovery (DR) and Business Continuity Planning (BCP) use cases. It uses AWS lambda functions to trigger failover process and convert a standalone regional cluster to a global cluster.Amazon Route53 private hosted zone is used to manage cluster endpoints changes for applications. +## Support -For more information about the Global Clusters Automation Tool, checkout the [Global Clusters Automation Tool](./global-clusters-automation/README.md) file. +The contents of this repository are maintained by Amazon DocumentDB Specialist SAs and are not officially supported by AWS. Please file a [GitHub Issue](https://github.com/awslabs/amazon-documentdb-tools/issues) if you experience any problems. 
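As an illustration of what the Compatibility Tool's scan amounts to: it matches `$`-prefixed operator tokens in each line while rejecting prefix false positives (for example, `$in` matched inside `$inc`). Below is a minimal sketch of that idea, using a hypothetical sample operator set and helper name rather than the tool's actual code:

```python
import re

# Hypothetical illustration of the kind of scan the compat tool performs:
# find $-prefixed operator tokens in a line, accepting a match only when
# it is not the prefix of a longer operator name (e.g. "$in" inside "$inc").
OPERATORS = {"$in", "$inc", "$facet", "$match"}  # tiny sample set, not the real matrix

def find_operators(line):
    found = set()
    for op in OPERATORS:
        for m in re.finditer(re.escape(op), line):
            end = m.end()
            # accept only if the next character is not a letter, or we hit end-of-line
            if end >= len(line) or not line[end].isalpha():
                found.add(op)
    return sorted(found)
```

For example, `find_operators('{"$match": {"x": {"$inc": 1}}}')` reports `$inc` and `$match` but not `$in`, mirroring the prefix check the real tool applies before counting an operator.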
## License diff --git a/compat-tool/README.md b/compat-tool/README.md index 673948f..3cea38f 100644 --- a/compat-tool/README.md +++ b/compat-tool/README.md @@ -1,8 +1,58 @@ # Amazon DocumentDB Compatibility Tool -The tool examines MongoDB log files or source code from MongoDB applications to determine if there are any queries which use operators that are not supported in Amazon DocumentDB. This tool produces a simple report of unsupported operators and file names with line numbers for further investigation. +The tool examines **MongoDB serverStatus() counters**, **MongoDB log files**, or **application source code** to determine if there are any operators in use that are not supported in Amazon DocumentDB. It produces a simple report of both supported and unsupported operator usage. + +## Recommended compatibility testing for MongoDB 5.0 and newer +* run the tool once with directConnection=true against the primary instance in your cluster + * ```python3 compat.py --uri "mongodb://:@:/admin?directConnection=true"``` +* if using secondary instances for read scale, run the tool again on one secondary (again with directConnection=true) + * ```python3 compat.py --uri "mongodb://:@:/admin?directConnection=true"``` + +## Prerequisites for MongoDB Log File Analysis + +### Enable query logging in MongoDB +#### For local, on-premise, or self-managed installations: +By default, MongoDB logs only slow queries (those over the 100ms threshold) to the configured log file. + +1. Check the current profiling status and note the `slowms` value: +``` +> db.getProfilingStatus() +{ + "was": 0, + "slowms": 100, + "sampleRate": 1 +} +``` + +2. Enable logging of all queries by setting `slowms` to `-1`: +``` +> db.setProfilingLevel(0, -1) +``` +3. 
**After completing the compatibility analysis**, reset to the original profiling level (using the `slowms` value from step 1): +``` +> db.setProfilingLevel(0, 100) +``` + +#### For MongoDB Atlas: +MongoDB Atlas dynamically adjusts the slow query threshold based on the execution time of operations across the cluster. The Atlas-managed slow operation threshold is enabled by default and must be disabled using the Atlas CLI, Atlas Administration API, or Atlas UI. + +1. Disable the Atlas-Managed Slow Operation Threshold by going to Project Settings for the cluster project and toggling Managed Slow Operations to Off. Please refer to the documentation here: https://www.mongodb.com/docs/atlas/performance-advisor/ +2. Connect to your cluster using MongoDB shell or Compass and enable full logging: + ``` + db.setProfilingLevel(0, -1) + ``` +3. [Download the logs](https://www.mongodb.com/docs/atlas/mongodb-logs/) from the Atlas console +4. **After completing the compatibility analysis**, re-enable the Atlas-Managed Slow Operation Threshold + +If the Atlas-managed slow operation threshold is not enabled, follow the same steps as local or on-premise installations above. + +#### NOTE: +Query profiling can cause additional overhead; it is recommended to use a dev/test environment to capture the queries. +See the MongoDB [documentation](https://www.mongodb.com/docs/manual/reference/method/db.setProfilingLevel/) for additional information. ## Requirements -Python 3.6 or later +- Python 3.6 or later +- pymongo (if testing compatibility using the --uri option) + ## Installation Clone the repository and go to the tool folder: @@ -12,17 +62,40 @@ cd amazon-documentdb-tools/compat-tool/ ``` ## Usage/Examples -This tool supports examining compatibility with either the 3.6, 4.0 or 5.0 versions of Amazon DocumentDB. The script has the following arguments: +This tool supports examining compatibility with the 3.6, 4.0, 5.0, or Elastic Clusters 5.0 versions of Amazon DocumentDB. 
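The extension filtering driven by the `--excluded-extensions` and `--included-extensions` arguments can be pictured with this sketch; `should_scan` is a hypothetical helper written for illustration, not the tool's actual code (the tool compares lowercased filename suffixes against the comma-separated lists):

```python
import pathlib

# Hypothetical sketch (not the tool's actual code) of how the
# --excluded-extensions / --included-extensions flags decide whether a
# file is scanned: the lowercased filename suffix is checked against the
# comma-separated exclusion list first, then the inclusion list.
def should_scan(filename, excluded="NONE", included="ALL"):
    ext = pathlib.Path(filename).suffix[1:].lower()
    if excluded != "NONE" and ext in excluded.lower().split(","):
        return False  # explicitly excluded
    return included == "ALL" or ext in included.lower().split(",")
```

For example, with `excluded="txt"`, `test/testlog.txt` would be skipped while `test/sample-python-1.py` would be scanned, matching the behavior shown in the examples below.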
The script has the following arguments: ``` ---version {3.6,4.0,5.0} -> Check for DocumentDB version compatibility (default is 5.0) ---directory SCANDIR -> Directory containing files to scan for compatibility ---file SCANFILE -> Specific file to scan for compatibility ---excluded-extensions EXCLUDEDEXTENSIONS -> Filename extensions to exclude from scanning, comma separated ---included-extensions INCLUDEDEXTENSIONS -> Filename extensions to include in scanning, comma separated ---show-supported -> Include supported operators in the report +--uri MONGO_URI -> Use db.serverStatus() output to scan for compatibility (requires MongoDB 5.0+) +--version {3.6,4.0,5.0,EC5.0} -> Check for DocumentDB version compatibility (default is 5.0) +--directory SCANDIR -> Directory containing files to scan for compatibility +--file SCANFILE -> Specific file to scan for compatibility +--excluded-extensions EXCLUDEDEXTENSIONS -> Filename extensions to exclude from scanning, comma separated +--excluded-directories EXCLUDEDDIRECTORIES -> Fully qualified path to directory to exclude, comma separated +--included-extensions INCLUDEDEXTENSIONS -> Filename extensions to include in scanning, comma separated ``` #### Example 1: +Check for compatibility using a MongoDB 5.0+ instance: +``` +python compat.py --uri "mongodb://:@:/admin?directConnection=true" + +connecting to the server at :27017 +database server major version is 5 +checking compatibility using db.serverStatus() + +The following 2 unsupported operators were found: + $bucket | executed 1 time(s) + $facet | executed 1 time(s) + +The following 6 supported operators were found: + $eq | executed 12 time(s) + $group | executed 3 time(s) + $gt | executed 6 time(s) + $match | executed 1 time(s) + $switch | executed 1 time(s) + $unwind | executed 1 time(s) +``` + +#### Example 2: Check for compatibility with Amazon DocumentDB version 5.0, files from the folder called test, excluding the ones with extension `.txt`: ``` python3 compat.py 
--version 5.0 --directory test --excluded-extensions txt @@ -61,11 +134,11 @@ List of skipped files - excluded extensions test/testlog2.txt ``` -#### Example 2: +#### Example 3: Check a specific file and show the supported operators found: ``` -python3 compat.py --file test/testlog.txt --show-supported +python3 compat.py --file test/testlog.txt processing file test/testlog.txt Processed 1 files, skipped 0 files @@ -95,39 +168,93 @@ The following 9 supported operators were found - $sum | found 1 time(s) ``` -#### NOTES: -* All files scanned by this utility are opened read-only and scanned in memory. For large files, make sure you have enough available RAM or split the files accordingly. -* With the exception of operators used, there is no logging of the file contents. -* Using the `--directory` argument will scan all the files, including subdirectories which will be scanned resursively. - -### Enable query logging in MongoDB -#### For local or on-premise installations: -By default, MongoDB logs the slow queries, over the 100ms threshold, to the configured log file. 
-To view the current profiling status, use the `getProfilingStatus()` in MongoDB shell: +#### Example 4: +Check for compatibility with Amazon DocumentDB, files from the folder called test, excluding the ones with extension `.txt` and excluding directories `exclude1` and `exclude2`: ``` -> db.getProfilingStatus() -{ - "was": 0, - "slowms": 100, - "sampleRate": 1 -} -``` +python3 compat.py --version 5.0 --directory test --excluded-extensions txt --excluded-directories /path/to/directory/exclude1,/path/to/directory/exclude2 -To enable logging of all queries, set the `slowms` parameter to `-1`: +processing file test/mongod.log.2020-11-10T19-33-14 +processing file test/mongodb.log +processing file test/sample-5-0-features.py +processing file test/sample-python-1.py +processing file test/sample-python-2.py - -``` -> db.setProfilingLevel(0, -1) -``` + +Processed 5 files, skipped 3 files -To set the slow logging threshold to the prvious level: -``` -> db.setProfilingLevel(0, 100) +The following 5 unsupported operators were found: + $facet | found 2 time(s) + $sortByCount | found 2 time(s) + $bucket | found 1 time(s) + $bucketAuto | found 1 time(s) + $expr | found 1 time(s) + +Unsupported operators by filename and line number: + $facet | lines = found 2 time(s) + test/mongodb.log | lines = [80, 82] + $sortByCount | lines = found 2 time(s) + test/mongod.log.2020-11-10T19-33-14 | lines = [83] + test/sample-python-2.py | lines = [29] + $bucket | lines = found 1 time(s) + test/mongodb.log | lines = [80] + $bucketAuto | lines = found 1 time(s) + test/mongodb.log | lines = [82] + $expr | lines = found 1 time(s) + test/mongod.log.2020-11-10T19-33-14 | lines = [107] + +The following 34 supported operators were found: + - $match | found 34 time(s) + - $gt | found 15 time(s) + - $project | found 15 time(s) + - $lte | found 14 time(s) + - $group | found 13 time(s) + - $gte | found 13 time(s) + - $sum | found 11 time(s) + - $in | found 10 time(s) + - $count | found 7 time(s) + - $ne | found 
7 time(s) + - $lookup | found 6 time(s) + - $unwind | found 4 time(s) + - $eq | found 3 time(s) + - $sort | found 3 time(s) + - $nin | found 2 time(s) + - $nor | found 2 time(s) + - $set | found 2 time(s) + - $skip | found 2 time(s) + - $addToSet | found 1 time(s) + - $and | found 1 time(s) + - $arrayElemAt | found 1 time(s) + - $avg | found 1 time(s) + - $dateAdd | found 1 time(s) + - $dateSubtract | found 1 time(s) + - $elemMatch | found 1 time(s) + - $first | found 1 time(s) + - $inc | found 1 time(s) + - $last | found 1 time(s) + - $limit | found 1 time(s) + - $lt | found 1 time(s) + - $max | found 1 time(s) + - $min | found 1 time(s) + - $not | found 1 time(s) + - $or | found 1 time(s) + +List of skipped files - excluded extensions + test/not_a_log_file.txt + test/testlog.txt + test/testlog2.txt + +List of skipped directories - excluded directories + test/exclude1 + test/exclude2 + ``` +#### NOTES: +* All files scanned by this utility are opened read-only and scanned in memory. For large files, make sure you have enough available RAM or split the files accordingly. +* With the exception of operators used, there is no logging of the file contents. +* Using the `--directory` argument will scan all files, including subdirectories, which are scanned recursively. -#### For MongoDB Atlas: -Check the MongoDB Atlas documentation for how to [enable profiling](https://www.mongodb.com/docs/atlas/tutorial/profile-database/#access-the-query-profiler) and [download the logs](https://www.mongodb.com/docs/atlas/mongodb-logs/). +## Contributing +Contributions are always welcome! See the [contributing page](https://github.com/awslabs/amazon-documentdb-tools/blob/master/CONTRIBUTING.md) for ways to get involved. -#### NOTE: -Query profiling can cause additional overhead, it is recommended to use a dev/test environment to capture the queries. 
-See the MongoDB [documentation](https://www.mongodb.com/docs/manual/reference/method/db.setProfilingLevel/) for additional information. +## License +Apache 2.0 \ No newline at end of file diff --git a/compat-tool/add-new-version.py b/compat-tool/add-new-version.py new file mode 100644 index 0000000..a9a3371 --- /dev/null +++ b/compat-tool/add-new-version.py @@ -0,0 +1,16 @@ +#!/usr/bin/python3 + +import compat + +def main(): + existingVersion = "5.0" + newVersion = "8.0" + + keywords = compat.load_keywords() + + for thisKeyword in keywords.keys(): + keywords[thisKeyword][newVersion] = keywords[thisKeyword][existingVersion] + print(" \"{}\":{},".format(thisKeyword,keywords[thisKeyword])) + +if __name__ == '__main__': + main() diff --git a/compat-tool/check-percentages.py b/compat-tool/check-percentages.py new file mode 100644 index 0000000..94650ff --- /dev/null +++ b/compat-tool/check-percentages.py @@ -0,0 +1,57 @@ +#!/usr/bin/python3 + +import compat + +def main(): + versions = ['3.6','4.0','5.0','8.0','EC5.0'] + keywords = compat.load_keywords() + + totOps = 0 + numOps = {} + numOpsSupported = {} + + for thisKeyword in keywords.keys(): + # get counts by mongodb version + totOps += 1 + thisMongodbVersion = keywords[thisKeyword]["mongodbversion"] + if thisMongodbVersion in numOps: + numOps[thisMongodbVersion] += 1 + else: + numOps[thisMongodbVersion] = 1 + + # get supported count by documentdb version + for docDbVersion in versions: + if keywords[thisKeyword][docDbVersion] == "Yes": + if docDbVersion in numOpsSupported: + numOpsSupported[docDbVersion] += 1 + else: + numOpsSupported[docDbVersion] = 1 + + print("") + print("MongoDB Operations By Version, total = {}".format(totOps)) + for thisVersion in sorted(numOps.keys()): + print(" {} in version {}".format(numOps[thisVersion],thisVersion)) + + print("") + print("DocumentDB Supported Operations By Version") + for thisVersion in sorted(numOpsSupported.keys()): + print(" {} supported by DocumentDB version {} 
({:.1f}%)".format(numOpsSupported[thisVersion],thisVersion,numOpsSupported[thisVersion]/totOps*100)) + print("") + + print("") + print("DocumentDB EC Compat Check") + for thisKeyword in sorted(keywords.keys()): + if keywords[thisKeyword]["5.0"] == "Yes" and keywords[thisKeyword]["EC5.0"] == "No": + print(" {}".format(thisKeyword)) + print("") + + #print("") + #print("DocumentDB 5.0 Check") + #for thisKeyword in sorted(keywords.keys()): + # if keywords[thisKeyword]["5.0"] == "No": + # print(" {} from MongoDB {}".format(thisKeyword,keywords[thisKeyword]["mongodbversion"])) + #print("") + + +if __name__ == '__main__': + main() diff --git a/compat-tool/compat.py b/compat-tool/compat.py index 2fd014f..92a38e6 100644 --- a/compat-tool/compat.py +++ b/compat-tool/compat.py @@ -5,9 +5,14 @@ import sys import re import argparse +import json +try: + import pymongo +except: + pass -versions = ['3.6','4.0','5.0','EC5.0'] +versions = ['3.6','4.0','5.0','8.0','EC5.0'] processingFeedbackLines = 10000 issuesDict = {} detailedIssuesDict = {} @@ -15,6 +20,33 @@ skippedFileList = [] exceptionFileList = [] numProcessedFiles = 0 +skippedDirectories = [] + + +def ensureDirect(uri): + # make sure we are directly connecting to the server requested, not via replicaSet + + connInfo = {} + parsedUri = pymongo.uri_parser.parse_uri(uri) + + for thisKey in sorted(parsedUri['options'].keys()): + if thisKey.lower() not in ['replicaset','readpreference']: + connInfo[thisKey] = parsedUri['options'][thisKey] + + # make sure we are using directConnection=true + connInfo['directconnection'] = True + + connInfo['username'] = parsedUri['username'] + connInfo['password'] = parsedUri['password'] + connInfo['host'] = parsedUri['nodelist'][0][0] + connInfo['port'] = parsedUri['nodelist'][0][1] + + print(" + connecting to the server at {}:{}".format(connInfo['host'],connInfo['port'])) + + if parsedUri.get('database') is not None: + connInfo['authSource'] = parsedUri['database'] + + return connInfo def 
double_check(checkOperator, checkLine, checkLineLength): @@ -28,8 +60,19 @@ def double_check(checkOperator, checkLine, checkLineLength): return foundOperator +def check_all_parents(fileName, excludedDirectories): + retVal = False + + for thisExcludedDirectory in excludedDirectories: + if fileName.startswith(thisExcludedDirectory+'/'): + retVal = True + break + + return retVal + + def scan_code(args, keywords): - global numProcessedFiles, issuesDict, detailedIssuesDict, supportedDict, skippedFileList, exceptionFileList + global numProcessedFiles, issuesDict, detailedIssuesDict, supportedDict, skippedFileList, exceptionFileList, skippedDirectories ver = args.version @@ -48,19 +91,30 @@ def scan_code(args, keywords): if args.includedExtensions != "NONE": excludedExtensions = args.excludedExtensions.lower().split(",") + excludedDirectories = [] + if args.excludedDirectories != "NONE": + excludedDirectories = args.excludedDirectories.lower().split(",") if args.scanFile is not None: fileArray.append(args.scanFile) numProcessedFiles += 1 else: for filename in glob.iglob("{}/**".format(args.scanDir), recursive=True): - if os.path.isfile(filename): - if ((pathlib.Path(filename).suffix[1:].lower() not in excludedExtensions) and - ((args.includedExtensions == "ALL") or - (pathlib.Path(filename).suffix[1:].lower() in includedExtensions))): - fileArray.append(filename) - numProcessedFiles += 1 - else: - skippedFileList.append(filename) + if os.path.isdir(filename) and filename in excludedDirectories: + # add to skipped directory list + skippedDirectories.append(filename) + elif check_all_parents(filename, excludedDirectories): + # move on + continue + else: + if os.path.isfile(filename): + if ((pathlib.Path(filename).suffix[1:].lower() not in excludedExtensions) and + ((args.includedExtensions == "ALL") or + (pathlib.Path(filename).suffix[1:].lower() in includedExtensions))): + fileArray.append(filename) + numProcessedFiles += 1 + else: + skippedFileList.append(filename) + for 
thisFile in fileArray: print("processing file {}".format(thisFile)) @@ -99,7 +153,7 @@ def scan_code(args, keywords): detailedIssuesDict[checkCompat] = {} detailedIssuesDict[checkCompat][thisFile] = [fileLineNum] - elif (keywords[checkCompat][ver] == 'Yes') and args.showSupported: + elif (keywords[checkCompat][ver] == 'Yes'): # check for supported operators if (thisLine.find(checkCompat) >= 0): # check for false positives - for each position found see if next character is not a..z|A..Z or if at EOL @@ -112,31 +166,130 @@ def scan_code(args, keywords): if (fileLineNum % processingFeedbackLines) == 0: print(" processing line {}".format(fileLineNum)) fileLineNum += 1 - + + +def getOperatorsFromServer(args): + fullListDict = {} + filteredOpsList = ['$alwaysFalse','$alwaysTrue','$backupCursor','$backupCursorExtend','$const','$listCachedAndActiveUsers','$listCatalog','$listClusterCatalog','$mergeCursors','$operationMetrics', + '$queue','$searchBeta','$setMetadata','$setVariableFromSubPipeline'] + + client = pymongo.MongoClient(**ensureDirect(args.uri)) + serverStatus = client.admin.command("serverStatus") + client.close() + + # uptime + upSeconds = serverStatus.get('uptime',-1) + print(" + database server has been up for {:.2f} days".format(upSeconds/86400)) + + # get/check version + majorVersion = int(serverStatus.get('version','0').split('.')[0]) + print(" + database server major version is {}".format(majorVersion)) + if majorVersion < 5: + print("This tool is only supported for version 5+") + sys.exit(1) + + for thisKey in serverStatus['metrics']['aggStageCounters']: + if type(serverStatus['metrics']['aggStageCounters'][thisKey]) is dict: + for thisKey2 in serverStatus['metrics']['aggStageCounters'][thisKey]: + if not thisKey2.startswith("$_") and thisKey2 not in filteredOpsList: + if thisKey2 in fullListDict: + fullListDict[thisKey2] += serverStatus['metrics']['aggStageCounters'][thisKey][thisKey2] + else: + fullListDict[thisKey2] = 
serverStatus['metrics']['aggStageCounters'][thisKey][thisKey2] + else: + if not thisKey.startswith("$_") and thisKey not in filteredOpsList: + if thisKey in fullListDict: + fullListDict[thisKey] += serverStatus['metrics']['aggStageCounters'][thisKey] + else: + fullListDict[thisKey] = serverStatus['metrics']['aggStageCounters'][thisKey] + + for thisKey in serverStatus['metrics']['operatorCounters']: + if type(serverStatus['metrics']['operatorCounters'][thisKey]) is dict: + for thisKey2 in serverStatus['metrics']['operatorCounters'][thisKey]: + if not thisKey2.startswith("$_") and thisKey2 not in filteredOpsList: + if thisKey2 in fullListDict: + fullListDict[thisKey2] += serverStatus['metrics']['operatorCounters'][thisKey][thisKey2] + else: + fullListDict[thisKey2] = serverStatus['metrics']['operatorCounters'][thisKey][thisKey2] + else: + if not thisKey.startswith("$_") and thisKey not in filteredOpsList: + if thisKey in fullListDict: + fullListDict[thisKey] += serverStatus['metrics']['operatorCounters'][thisKey] + else: + fullListDict[thisKey] = serverStatus['metrics']['operatorCounters'][thisKey] + + return fullListDict + def main(args): parser = argparse.ArgumentParser(description="Parse the command line.") - parser.add_argument("--version", dest="version", action="store", default="5.0", help="Check for DocumentDB version compatibility (default is 5.0)", choices=versions, required=False) - parser.add_argument("--directory", dest="scanDir", action="store", help="Directory containing files to scan for compatibility", required=False) - parser.add_argument("--file", dest="scanFile", action="store", help="Specific file to scan for compatibility", required=False) + + group = parser.add_argument_group('scan mode','technique to test compatibility') + exclusiveGroup = group.add_mutually_exclusive_group(required=True) + + exclusiveGroup.add_argument("--directory", dest="scanDir", action="store", help="Directory containing profiled log files or source code files to scan for 
compatibility", required=False) + exclusiveGroup.add_argument("--file", dest="scanFile", action="store", help="Specific log file or source code file to scan for compatibility", required=False) + exclusiveGroup.add_argument("--uri", dest="uri", action="store", help="URI of MongoDB server for compatibility check", required=False) + parser.add_argument("--excluded-extensions", dest="excludedExtensions", action="store", default="NONE", help="Filename extensions to exclude from scanning, comma separated", required=False) parser.add_argument("--included-extensions", dest="includedExtensions", action="store", default="ALL", help="Filename extensions to include in scanning, comma separated", required=False) - parser.add_argument("--show-supported", dest="showSupported", action="store_true", default=False, help="Include supported operators in the report", required=False) - args = parser.parse_args() - - if args.scanDir is None and args.scanFile is None: - parser.error("at least one of --directory and --file required") + parser.add_argument("--excluded-directories", dest="excludedDirectories", action="store", default="NONE", help="directories to exclude from scanning, comma separated", required=False) + parser.add_argument("--version", dest="version", action="store", default="5.0", help="Check for DocumentDB version compatibility (default is 5.0)", choices=versions, required=False) - elif args.scanDir is not None and args.scanFile is not None: - parser.error("must provide exactly one of --directory or --file required, not both") + args = parser.parse_args() - elif args.scanFile is not None and not os.path.isfile(args.scanFile): + if args.scanFile is not None and not os.path.isfile(args.scanFile): parser.error("unable to locate file {}".format(args.scanFile)) elif args.scanDir is not None and not os.path.isdir(args.scanDir): parser.error("unable to locate directory {}".format(args.scanDir)) keywords = load_keywords() + + + if args.uri is not None: + # check for compatibility 
using db.serverStatus() + print("Gathering usage data for analysis") + + ver = args.version + notCompatCounter = 0 + compatCounter = 0 + usageDict = getOperatorsFromServer(args) + print(" + checking compatibility using db.serverStatus()") + + # get count of compatible and incompatible operators found + for thisKey in sorted(usageDict.keys()): + if (usageDict[thisKey] > 0) and (keywords[thisKey][ver] == 'No'): + notCompatCounter += 1 + elif (usageDict[thisKey] > 0) and (keywords[thisKey][ver] == 'Yes'): + compatCounter += 1 + + print("") + # unsupported operators + if notCompatCounter > 0: + print("The following {} unsupported operators were found:".format(notCompatCounter)) + for thisKey in sorted(usageDict.keys()): + if (thisKey not in keywords): + print(" {} | executed {} time(s) - WARNING - operator is missing from compat tool, please file an issue".format(thisKey,usageDict[thisKey])) + elif (usageDict[thisKey] > 0) and (keywords[thisKey][ver] == 'No'): + print(" {} | executed {} time(s)".format(thisKey,usageDict[thisKey])) + else: + print("No unsupported operators found.") + + print("") + # supported operators + if compatCounter > 0: + print("The following {} supported operators were found:".format(compatCounter)) + for thisKey in sorted(usageDict.keys()): + if (usageDict[thisKey] > 0) and (keywords[thisKey][ver] == 'Yes'): + print(" {} | executed {} time(s)".format(thisKey,usageDict[thisKey])) + else: + print("WARNING - No supported operators found, check that the URI provided is correct") + + print("") + + sys.exit(0) + scan_code(args, keywords) print("") @@ -160,11 +313,14 @@ def main(args): print("") print("No unsupported operators found.") - if len(supportedDict) > 0 and args.showSupported: + if len(supportedDict) > 0: print("") print("The following {} supported operators were found:".format(len(supportedDict))) for thisKeyPair in sorted(supportedDict.items(), key=lambda x: (-x[1],x[0])): - print(" - {} | found {} 
time(s)".format(thisKeyPair[0],thisKeyPair[1])) + print(" {} | found {} time(s)".format(thisKeyPair[0],thisKeyPair[1])) + else: + print("") + print("WARNING - No supported operators found, check that profiling is enabled if scanning logs or using the correct path to scan source code") if len(skippedFileList) > 0: print("") @@ -177,6 +333,12 @@ def main(args): print("List of skipped files - unsupported file type/content") for exceptionFile in exceptionFileList: print(" {}".format(exceptionFile)) + + if len(skippedDirectories) > 0: + print("") + print("List of skipped directories - excluded directories") + for skippedDirectory in skippedDirectories: + print(" {}".format(skippedDirectory)) print("") @@ -188,240 +350,281 @@ def main(args): def load_keywords(): thisKeywords = { - "$$CURRENT":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"}, - "$$DESCEND":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"}, - "$$KEEP":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"}, - "$$PRUNE":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"}, - "$$REMOVE":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"}, - "$$ROOT":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"}, - "$abs":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}, - "$accumulator":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"}, - "$acos":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"}, - "$acosh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"}, - "$add":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}, - "$addFields":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}, - "$addToSet":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}, - "$all":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}, - 
- "$allElementsTrue":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$and":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$anyElementTrue":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$arrayElemAt":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$arrayToObject":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$asin":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$asinh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$atan":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$atan2":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$atanh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$avg":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$binarySize":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$bit":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$bitsAllClear":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$bitsAllSet":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$bitsAnyClear":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$bitsAnySet":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$bottom":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$bottomN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$box":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$bsonSize":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$bucket":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$bucketAuto":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$ceil":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$center":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$centerSphere":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$cmp":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$collStats":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$comment":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$concat":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$concatArrays":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$cond":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$convert":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$cos":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$cosh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$count":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$currentDate":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$currentOp":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$dateAdd":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"Yes","EC5.0":"Yes"},
- "$dateDiff":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$dateFromParts":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$dateFromString":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$dateSubtract":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"Yes","EC5.0":"Yes"},
- "$dateToParts":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$dateToString":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$dateTrunc":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$dayOfMonth":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$dayOfWeek":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$dayOfYear":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$degreesToRadians":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$densify":{"mongodbversion":"5.1","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$divide":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$documents":{"mongodbversion":"5.1","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$each":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$elemMatch":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$eq":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$exists":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$exp":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$expr":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$facet":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$fill":{"mongodbversion":"5.3","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$filter":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$first":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$firstN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$floor":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$function":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$geoIntersects":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$geometry":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$geoNear":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$geoWithin":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$getField":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$graphLookup":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$group":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$gt":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$gte":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$hour":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$ifNull":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$in":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$inc":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$indexOfArray":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$indexOfBytes":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$indexOfCP":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$indexStats":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$isArray":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$isNumber":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$isoDayOfWeek":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$isoWeek":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$isoWeekYear":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$jsonSchema":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$last":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$lastN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$let":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$limit":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$linearFill":{"mongodbversion":"5.3","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$listLocalSessions":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$listSessions":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$literal":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$ln":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$locf":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$log":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$log10":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$lookup":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$lt":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$lte":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$ltrim":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$map":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$match":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$max":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$maxDistance":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$maxN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$merge":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$mergeObjects":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$meta":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$millisecond":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$min":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$minDistance":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$minN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$minute":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$mod":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$month":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$mul":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$multiply":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$natural":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$ne":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$near":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$nearSphere":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$nin":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$nor":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$not":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$objectToArray":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$or":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$out":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$planCacheStats":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$polygon":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$pop":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$position":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$pow":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$project":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$pull":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$pullAll":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$push":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$radiansToDegrees":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$rand":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$range":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$redact":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$reduce":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$regex":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$regexFind":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$regexFindAll":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$regexMatch":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$rename":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$replaceAll":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$replaceOne":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$replaceRoot":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$replaceWith":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$reverseArray":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$round":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$rtrim":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$sample":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"No"},
- "$sampleRate":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$second":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$set":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setDifference":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$setEquals":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setField":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$setIntersection":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setIsSubset":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setOnInsert":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setUnion":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$setWindowFields":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$sin":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$sinh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$size":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$skip":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$slice":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$sort":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$sortArray":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$sortByCount":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$split":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$sqrt":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$stdDevPop":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$stdDevSamp":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$strcasecmp":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$strLenBytes":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$strLenCP":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$substr":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$substrBytes":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$substrCP":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$subtract":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$sum":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$switch":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$tan":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$tanh":{"mongodbversion":"4.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$text":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toBool":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toDate":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toDecimal":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toDouble":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toInt":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toLong":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toLower":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$toObjectId":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$top":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$topN":{"mongodbversion":"5.2","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toString":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$toUpper":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$trim":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$trunc":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$tsIncrement":{"mongodbversion":"5.1","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$tsSecond":{"mongodbversion":"5.1","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$type":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$unionWith":{"mongodbversion":"4.4","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$uniqueDocs":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$unset":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$unsetField":{"mongodbversion":"5.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$unwind":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$week":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$where":{"mongodbversion":"4.0","3.6":"No","4.0":"No","5.0":"No","EC5.0":"No"},
- "$year":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"},
- "$zip":{"mongodbversion":"4.0","3.6":"Yes","4.0":"Yes","5.0":"Yes","EC5.0":"Yes"}}
+ "$$CURRENT":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$$DESCEND":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$$KEEP":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$$PRUNE":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$$REMOVE":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$$ROOT":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$abs":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$accumulator":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$acos":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$acosh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$add":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$addFields":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$addToSet":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$all":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$allElementsTrue":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$and":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$anyElementTrue":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$arrayElemAt":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$arrayToObject":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$asin":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$asinh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$atan":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$atan2":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$atanh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$avg":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$binarySize":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bit":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$bitAnd":{'mongodbversion': '6.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bitNot":{'mongodbversion': '6.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bitOr":{'mongodbversion': '6.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bitXor":{'mongodbversion': '6.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bitsAllClear":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$bitsAllSet":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$bitsAnyClear":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$bitsAnySet":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$bottom":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bottomN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$box":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bsonSize":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$bucket":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$bucketAuto":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$ceil":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$center":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$centerSphere":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$changeStream":{'mongodbversion': '3.6', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$changeStreamSplitLargeEvent":{'mongodbversion': '7.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$cmp":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$collStats":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$comment":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$concat":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$concatArrays":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$cond":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$covariancePop":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$covarianceSamp":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$convert":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$cos":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$cosh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$count":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$currentDate":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$currentOp":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dateAdd":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dateDiff":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$dateFromParts":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$dateFromString":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dateSubtract":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dateToParts":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$dateToString":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dateTrunc":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$dayOfMonth":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dayOfWeek":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$dayOfYear":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$degreesToRadians":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$denseRank":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$densify":{'mongodbversion': '5.1', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$derivative":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$divide":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$documentNumber":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$documents":{'mongodbversion': '5.1', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$each":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$elemMatch":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$encStrContains":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$encStrEndsWith":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$encStrNormalizedEq":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$encStrStartsWith":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$eq":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$exists":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$exp":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$expMovingAvg":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$expr":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$facet":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$fill":{'mongodbversion': '5.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$filter":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$first":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$firstN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$floor":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$function":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$geoIntersects":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$geometry":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$geoNear":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$geoWithin":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$getField":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$graphLookup":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$group":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$gt":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$gte":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$hour":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$ifNull":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$in":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$inc":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$indexOfArray":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$indexOfBytes":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$indexOfCP":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$indexStats":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$integral":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$isArray":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$isNumber":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$isoDayOfWeek":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$isoWeek":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$isoWeekYear":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$jsonSchema":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$last":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$lastN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$let":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$limit":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$linearFill":{'mongodbversion': '5.3', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$listLocalSessions":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$listMqlEntities":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$listSampledQueries":{'mongodbversion': '7.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$listSearchIndexes":{'mongodbversion': '7.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$listSessions":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$literal":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$ln":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$locf":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$log":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$log10":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$lookup":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$lt":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$lte":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$ltrim":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$map":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$match":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$max":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$maxDistance":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$maxN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$median":{'mongodbversion': '7.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$merge":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$mergeObjects":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$meta":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$millisecond":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$min":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$minDistance":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$minMaxScalar":{'mongodbversion': '8.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$minN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$minute":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$mod":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$month":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$mul":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$multiply":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$natural":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$ne":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$near":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$nearSphere":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$nin":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$nor":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$not":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$objectToArray":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$or":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$out":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$percentile":{'mongodbversion': '7.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$planCacheStats":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$polygon":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$pop":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$position":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$pow":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$project":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$pull":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$pullAll":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$push":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$querySettings":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$queryStats":{'mongodbversion': '7.1', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$radiansToDegrees":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$rand":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$range":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$rank":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$rankFusion":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$redact":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$reduce":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$regex":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$regexFind":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$regexFindAll":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$regexMatch":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$rename":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$replaceAll":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$replaceOne":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$replaceRoot":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$replaceWith":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$reverseArray":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$round":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
+ "$rtrim":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'},
+ "$sample":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'},
+ "$sampleRate":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'},
"$score":{'mongodbversion': '8.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$scoreFusion":{'mongodbversion': '8.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$search":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'}, + "$searchMeta":{'mongodbversion': 'atlas', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$second":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$set":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setDifference":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setEquals":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setField":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$setIntersection":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setIsSubset":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setOnInsert":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setUnion":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$setWindowFields":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$shardedDataDistribution":{'mongodbversion': '6.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$shift":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$sigmoid":{'mongodbversion': '6.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$similarityCosine":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 
'EC5.0': 'No', '8.0': 'No'}, + "$similarityDotProduct":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$similarityEuclidean":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$sin":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$sinh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$size":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$skip":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$slice":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$sort":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$sortArray":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$sortByCount":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$split":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$sqrt":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$stdDevPop":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$stdDevSamp":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$strcasecmp":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$strLenBytes":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$strLenCP":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$substr":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 
'EC5.0': 'Yes', '8.0': 'Yes'}, + "$substrBytes":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$substrCP":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$subtract":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$sum":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$switch":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'}, + "$tan":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$tanh":{'mongodbversion': '4.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$text":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'}, + "$toBool":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toDate":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toDecimal":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toDouble":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toInt":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toLong":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toLower":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toObjectId":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$top":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$topN":{'mongodbversion': '5.2', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, 
+ "$toHashedIndexKey":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$toString":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toUpper":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$toUUID":{'mongodbversion': '8.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$trim":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'No', '8.0': 'Yes'}, + "$trunc":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$tsIncrement":{'mongodbversion': '5.1', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$tsSecond":{'mongodbversion': '5.1', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$type":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$unionWith":{'mongodbversion': '4.4', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$uniqueDocs":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$unset":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$unsetField":{'mongodbversion': '5.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$unwind":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$vectorSearch":{'mongodbversion': '6.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'Yes'}, + "$week":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$where":{'mongodbversion': '4.0', '3.6': 'No', '4.0': 'No', '5.0': 'No', 'EC5.0': 'No', '8.0': 'No'}, + "$year":{'mongodbversion': '4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + "$zip":{'mongodbversion': 
'4.0', '3.6': 'Yes', '4.0': 'Yes', '5.0': 'Yes', 'EC5.0': 'Yes', '8.0': 'Yes'}, + } return thisKeywords diff --git a/compat-tool/create-compat-csv.py b/compat-tool/create-compat-csv.py new file mode 100644 index 0000000..6315755 --- /dev/null +++ b/compat-tool/create-compat-csv.py @@ -0,0 +1,17 @@ +#!/usr/bin/python3 + +import compat + +def main(): + versions = ['3.6','4.0','5.0','EC5.0'] + keywords = compat.load_keywords() + + print("{},{},{},{},{},{}".format('operator','mdb-version','docdb-36','docdb-40','docdb-50','docdb-ec')) + + for thisKeyword in keywords.keys(): + thisEntry = keywords[thisKeyword] + print("{},{},{},{},{},{}".format(thisKeyword,thisEntry["mongodbversion"],thisEntry["3.6"],thisEntry["4.0"],thisEntry["5.0"],thisEntry["EC5.0"])) + + +if __name__ == '__main__': + main() diff --git a/global-clusters-automation/README.md b/global-clusters-automation/README.md index 7b0144a..b0e951c 100644 --- a/global-clusters-automation/README.md +++ b/global-clusters-automation/README.md @@ -1,3 +1,13 @@ +> [!NOTE] +> +> With the release of managed failover and switchover, this automation is no longer required. +> +> For Disaster recovery use global cluster managed failover. +> +> For Business Continuity Planning (BCP) use global cluster switchover. +> +> You can only perform a managed failover or switchover on an Amazon DocumentDB global cluster if the primary and secondary clusters have the same major, minor, and patch level engine versions. However, the patch levels can be different, depending on the minor engine version. If your engine versions are incompatible, you can perform the failover using the automation below or by following the steps in Performing a manual failover for an Amazon DocumentDB global cluster. + # Global Cluster Automation The use cases considered for this tool are Disaster Recovery (DR) and Business Continuity Planning (BCP). DR use case is applicable when DocumentDB is unavailable in a region. 
BCP use case is applicable when you want to switch from one functional region to another to help validate business continuity across various AWS regions. @@ -52,11 +62,13 @@ Note: During a DR scenario, the writes to the cluster will fail until the promot "storage_encryption": true, "deletion_protection": false } - ] + ], + "io_optimized_storage": true, + "enable_performance_insights": true } ``` #### Action -This function will convert the regional cluster provided via input as `primary_cluster_arn` to a global cluster. The cluster indicated by `primary_cluster_arn` will become the primary cluster. The array of secondary clusters provided as input will be created with appropriate number of instances indicated by `number_of_instances`.The instance class for this instance will be the same as the instance class of primary instance in the primary cluster. +This function will convert the regional cluster provided via input as `primary_cluster_arn` to a global cluster. The cluster indicated by `primary_cluster_arn` will become the primary cluster. The array of secondary clusters provided as input will be created with the appropriate number of instances indicated by `number_of_instances`. The instance class for these instances will be the same as the instance class of the primary instance in the primary cluster. Additionally, you can change the storage type of the secondary clusters by setting `io_optimized_storage` in the payload to true. If you want to enable Performance Insights on the newly created secondary cluster instances, set `enable_performance_insights` to true.
#### Output Global cluster with a primary cluster and secondary cluster(s) in the provided region(s) and subnet(s) @@ -71,11 +83,13 @@ Global cluster with a primary cluster and secondary cluster(s) in the provided r "global_cluster_id": "global-demo", "secondary_cluster_arn": "arn:aws:rds:us-west-2:123456789123:cluster:cluster-1", "primary_cluster_cname": "primary.sample.com", - "hosted_zone_id": "Z005XXXXYYYYZZZZDOHSB" + "hosted_zone_id": "Z005XXXXYYYYZZZZDOHSB", + "io_optimized_storage": true, + "enable_performance_insights": true } ``` #### Action -This function will trigger the lambda function *failoverToSecondary* to remove and promote the provided secondary cluster `secondary_cluster_arn`. The boolean to delete old global cluster is set to True by this function. After successful promotion of secondary cluster, this lambda function will trigger the *failoverAndConvertToGlobal* lambda function to recreate the global cluster with secondary clusters in regions that existed in prior to failover.The secondary clusters will use VPC ID and security group ID that were used prior to failover and the cluster ID will be defined based on the current time stamp and prior cluster ID. The instance size and number of instances will also be the same as before (failover). +This function will trigger the lambda function *failoverToSecondary* to remove and promote the provided secondary cluster `secondary_cluster_arn`. The boolean to delete the old global cluster is set to True by this function. After successful promotion of the secondary cluster, this lambda function will trigger the *failoverAndConvertToGlobal* lambda function to recreate the global cluster with secondary clusters in the regions that existed prior to failover. The secondary clusters will use the VPC ID and security group ID that were used prior to failover, and the cluster ID will be defined based on the current timestamp and the prior cluster ID.
The instance size and number of instances will also be the same as before the failover. Additionally, you can change the storage type of the secondary clusters by setting `io_optimized_storage` in the payload to true. If you want to enable Performance Insights on the newly created secondary cluster instances, set `enable_performance_insights` to true. Note: During failover to the secondary cluster, there will be a brief window of time where writes to the original primary cluster are not replicated to the newly promoted cluster. Hence, it is always recommended to perform BCP testing during non-peak hours when write traffic is minimal, if not zero. diff --git a/global-clusters-automation/add_secondarycluster.py b/global-clusters-automation/add_secondarycluster.py index 0d79aa0..0571637 100644 --- a/global-clusters-automation/add_secondarycluster.py +++ b/global-clusters-automation/add_secondarycluster.py @@ -6,7 +6,7 @@ session = boto3.Session() -def convert_regional_to_global(primary_cluster_arn, global_cluster_id, secondary_clusters): +def convert_regional_to_global(primary_cluster_arn, global_cluster_id, secondary_clusters,enable_performance_insights, io_optimized_storage): try: start_time = time.time() primary_cluster_id = primary_cluster_arn.split(":")[-1] @@ -24,9 +24,6 @@ def convert_regional_to_global(primary_cluster_arn, global_cluster_id, secondary print('Checking for cluster and instance status before converting to global cluster...') cluster_status = get_cluster_status(primary_cluster_arn) time.sleep(1) - - # Start the conversion process by converting the regional cluster indicated by the primary cluster ARN to a - # global cluster. print('Cluster and instances for primary cluster ', primary_cluster_arn, ' is in available status. 
' 'Start conversion process') print('Begin STEP 1 of 2 in convert to global cluster: Create global cluster ', global_cluster_id) @@ -39,12 +36,12 @@ def convert_regional_to_global(primary_cluster_arn, global_cluster_id, secondary print('Begin STEP 2 of 2 in convert to global cluster: Create Secondary Clusters') for each_item in secondary_clusters: client_local = session.client('docdb', region_name=each_item['region']) - create_secondary_cluster(each_item, global_cluster_id, client_local) + create_secondary_cluster(each_item, global_cluster_id, client_local, io_optimized_storage) print('Created secondary cluster with id ', each_item['secondary_cluster_id']) # For each secondary cluster in the global cluster, add instances as indicated in the input and use # instance class identified earlier from primary for instance_count in range(0, each_item['number_of_instances']): - add_instance_to_cluster(each_item, instance_class, instance_count, client_local) + add_instance_to_cluster(each_item, instance_class, instance_count, client_local, enable_performance_insights) print('Created instance ', each_item['secondary_cluster_id'] + str(instance_count), 'for secondary cluster ', each_item['secondary_cluster_id']) current_time = time.time() @@ -78,42 +75,67 @@ def identify_instance_class(primary_cluster_id, client_local): return instance_class -def add_instance_to_cluster(each_item, instance_class, instance_count, client_local): +def add_instance_to_cluster(each_item, instance_class, instance_count, client_local, enable_performance_insights): try: response = client_local.create_db_instance( DBClusterIdentifier=each_item['secondary_cluster_id'], DBInstanceIdentifier=each_item['secondary_cluster_id'] + str(instance_count), DBInstanceClass=instance_class, - Engine='docdb' + Engine='docdb', + EnablePerformanceInsights=enable_performance_insights ) except ClientError as e: print('ERROR OCCURRED WHILE PROCESSING: ', e) print('PROCESSING WILL STOP') raise ClientError - -def 
create_secondary_cluster(each_item, global_cluster_id, client_local): +# create secondary cluster using create_db_cluster API and pass an unpacked dictionary as parameters +# omit values that are None +def create_secondary_cluster(each_item, global_cluster_id, client_local, io_optimized_storage): try: + cluster_map = get_cluster_args(global_cluster_id, each_item) + if io_optimized_storage: + cluster_map["StorageType"] = "iopt1" response = client_local.create_db_cluster( - GlobalClusterIdentifier=global_cluster_id, - SourceRegion=each_item['region'], - DBClusterIdentifier=each_item['secondary_cluster_id'], - DBSubnetGroupName=each_item['subnet_group'], - VpcSecurityGroupIds=each_item['security_group_id'], - KmsKeyId=each_item['kms_key_id'], - Engine='docdb', - EngineVersion=each_item['engine_version'], - DBClusterParameterGroupName=each_item['cluster_parameter_group'], - BackupRetentionPeriod=each_item['backup_retention_period'], - PreferredBackupWindow=each_item['preferred_back_up_window'], - PreferredMaintenanceWindow=each_item['preferred_maintenance_window'], - StorageEncrypted=each_item['storage_encryption'], - DeletionProtection=each_item['deletion_protection']) + **{k: v for k, v in cluster_map.items() if v is not None}) except ClientError as e: print('ERROR OCCURRED WHILE PROCESSING: ', e) print('PROCESSING WILL STOP') raise ClientError +# create a dictionary of args to be used with the create_db_cluster API +def get_cluster_args(global_cluster_id, each_item): + cluster_map = {} + cluster_map['GlobalClusterIdentifier']=global_cluster_id + cluster_map['SourceRegion']=each_item['region'] + cluster_map['DBClusterIdentifier']=each_item['secondary_cluster_id'] + cluster_map['DBSubnetGroupName']=each_item['subnet_group'] + cluster_map['VpcSecurityGroupIds']=each_item['security_group_id'] + cluster_map['KmsKeyId']=fetch_kms_key(each_item) + cluster_map['Engine']='docdb' + cluster_map['EngineVersion']=each_item['engine_version'] + 
cluster_map['DBClusterParameterGroupName']=each_item['cluster_parameter_group'] + cluster_map['BackupRetentionPeriod']=each_item['backup_retention_period'] + cluster_map['PreferredBackupWindow']=each_item['preferred_back_up_window'] + cluster_map['PreferredMaintenanceWindow']=each_item['preferred_maintenance_window'] + cluster_map['StorageEncrypted']=each_item['storage_encryption'] + cluster_map['DeletionProtection']=each_item['deletion_protection'] + cluster_map['StorageType'] = fetch_storage_type(each_item) + return cluster_map + +# retrieve KMS key if exists else return None +def fetch_kms_key(each_item): + if 'kms_key_id' in each_item: + KmsKeyId=each_item['kms_key_id'] + return KmsKeyId + else: + return None + +def fetch_storage_type(each_item): + if 'StorageType' in each_item: + return each_item['StorageType'] + else: + return None def create_global_cluster(global_cluster_id, primary_cluster_arn): try: diff --git a/global-clusters-automation/convert_to_global_lambda_function.py b/global-clusters-automation/convert_to_global_lambda_function.py index 2da97ef..ac2377c 100644 --- a/global-clusters-automation/convert_to_global_lambda_function.py +++ b/global-clusters-automation/convert_to_global_lambda_function.py @@ -110,7 +110,9 @@ def lambda_handler(event, context): add_secondarycluster.convert_regional_to_global(primary_cluster_arn=event['primary_cluster_arn'], global_cluster_id=event['global_cluster_id'], - secondary_clusters=event['secondary_clusters']) + secondary_clusters=event['secondary_clusters'], + enable_performance_insights=event['enable_performance_insights'], + io_optimized_storage=event['io_optimized_storage']) end_time = time.time() diff --git a/global-clusters-automation/failover_and_convert_lambda_function.py b/global-clusters-automation/failover_and_convert_lambda_function.py index 0a0b6f3..99f2a6b 100644 --- a/global-clusters-automation/failover_and_convert_lambda_function.py +++ 
b/global-clusters-automation/failover_and_convert_lambda_function.py @@ -55,7 +55,9 @@ def lambda_handler(event, context): print('Begin process to create request to convert regional cluster to global cluster ') convert_to_global_request = prepare_to_convert(global_cluster_members, global_cluster_id=event['global_cluster_id'], - secondary_cluster_arn=event['secondary_cluster_arn']) + secondary_cluster_arn=event['secondary_cluster_arn'], + io_optimized_storage=event['io_optimized_storage'], + enable_performance_insights=event['enable_performance_insights']) print('Created request to convert back to global cluster.') print('Starting process to failover') failover_function = os.environ['FAILOVER_FUNCTION'] diff --git a/global-clusters-automation/failover_and_convert_to_global.py b/global-clusters-automation/failover_and_convert_to_global.py index 7aa5791..353c9be 100644 --- a/global-clusters-automation/failover_and_convert_to_global.py +++ b/global-clusters-automation/failover_and_convert_to_global.py @@ -22,7 +22,7 @@ def get_global_cluster_members(global_cluster_id): return global_cluster_members -def prepare_to_convert(global_cluster_members, global_cluster_id, secondary_cluster_arn): +def prepare_to_convert(global_cluster_members, global_cluster_id, secondary_cluster_arn, io_optimized_storage, enable_performance_insights): try: # populate the list of clusters in the global cluster and remove the secondary cluster to be promoted from # the list @@ -35,12 +35,17 @@ def prepare_to_convert(global_cluster_members, global_cluster_id, secondary_clus secondary_clusters = [] for each_cluster in regional_clusters: - secondary_clusters.append(get_cluster_details(each_cluster)) + cluster_details = get_cluster_details(each_cluster) + if io_optimized_storage: + cluster_details["StorageType"] = "iopt1" + secondary_clusters.append(cluster_details) convert_to_global_request = { "global_cluster_id": global_cluster_id, "primary_cluster_arn": new_primary_cluster_arn, - 
"secondary_clusters": secondary_clusters + "secondary_clusters": secondary_clusters, + "io_optimized_storage": io_optimized_storage, + "enable_performance_insights": enable_performance_insights } except ClientError as e: print('ERROR OCCURRED WHILE PROCESSING: ', e) @@ -84,24 +89,34 @@ def get_cluster_details(cluster): vpc_group_ids = [] for each_item in cluster_response['VpcSecurityGroups']: vpc_group_ids.append(each_item['VpcSecurityGroupId']) - - cluster_details = {"region": region, - "secondary_cluster_id": cluster_id + "-" + dt_string, - # When converting the cluster to global cluster and adding clusters from the prior global - # cluster, we append the timestamp to keep the cluster ID unique. This is needed so that the - # function does not wait for the older clusters to be deleted. Also helps to differentiate - # between clusters created by script. - "number_of_instances": len(cluster_response['DBClusterMembers']), - "subnet_group": cluster_response['DBSubnetGroup'], - "security_group_id": vpc_group_ids, - "kms_key_id": cluster_response['KmsKeyId'], - "backup_retention_period": cluster_response['BackupRetentionPeriod'], - "cluster_parameter_group": cluster_response['DBClusterParameterGroup'], - "preferred_back_up_window": cluster_response['PreferredBackupWindow'], - "preferred_maintenance_window": cluster_response['PreferredMaintenanceWindow'], - "storage_encryption": cluster_response['StorageEncrypted'], - "deletion_protection": cluster_response['DeletionProtection']} - + + if "-flipped" in cluster_id: + last_index = cluster_id.rfind("-") + cluster_id = cluster_id[:last_index] + else: + cluster_id = cluster_id + "-flipped" + + cluster_details = { + # When converting the cluster to global cluster and adding clusters from the prior global + # cluster, we append the timestamp to keep the cluster ID unique. This is needed so that the + # function does not wait for the older clusters to be deleted. 
Also helps to differentiate + # between clusters created by script. + "secondary_cluster_id": cluster_id + "-" + dt_string, + "region": region, + "number_of_instances": len(cluster_response['DBClusterMembers']), + "subnet_group": cluster_response['DBSubnetGroup'], + "security_group_id": vpc_group_ids, + "backup_retention_period": cluster_response['BackupRetentionPeriod'], + "cluster_parameter_group": cluster_response['DBClusterParameterGroup'], + "preferred_back_up_window": cluster_response['PreferredBackupWindow'], + "preferred_maintenance_window": cluster_response['PreferredMaintenanceWindow'], + "storage_encryption": cluster_response['StorageEncrypted'], + "deletion_protection": cluster_response['DeletionProtection'], + "engine_version": cluster_response['EngineVersion'] + } + # add KmsKeyId to cluster_details dictionary only if it exists in the deleted cluster + if 'KmsKeyId' in cluster_response: + cluster_details["kms_key_id"] = cluster_response['KmsKeyId'] return cluster_details except ClientError as e: diff --git a/global-clusters-automation/failover_and_delete_global_cluster.py b/global-clusters-automation/failover_and_delete_global_cluster.py index 8859ccd..bc51b0c 100644 --- a/global-clusters-automation/failover_and_delete_global_cluster.py +++ b/global-clusters-automation/failover_and_delete_global_cluster.py @@ -148,9 +148,10 @@ def delete_global_cluster(global_cluster_id, secondary_cluster_arn, secondary_cl if each_secondary_cluster != secondary_cluster_arn: print('Removing secondary cluster ', each_secondary_cluster, ' from global cluster ', global_cluster_id) remove_from_global_cluster(each_secondary_cluster, global_cluster_id) - # Wait until all standalone clusters are promoted - print('Waiting till all secondary clusters are removed from global cluster ', global_cluster_id) - wait_for_promotion_to_complete(global_cluster_id, secondary_cluster_arn) + # Wait until all clusters in the global cluster are promoted to standalone clusters + # and removed 
from global cluster + print('Waiting till all secondary clusters are removed from global cluster ', global_cluster_id) + wait_for_promotion_to_complete(global_cluster_id, each_secondary_cluster) # Delete secondary clusters print('All secondary clusters are promoted to standalone cluster. Begin deleting each cluster.') @@ -175,14 +176,26 @@ def delete_primary_cluster(global_cluster_id): try: print('Retrieving primary cluster to delete from global cluster ', global_cluster_id) primary_cluster = get_primary_cluster(global_cluster_id) + + # Primary cluster state will change to 'modifying' a few seconds after the secondary cluster is removed + # Wait until cluster status is 'modifying' and then wait for it to become 'available' again + primary_cluster_status = "" + wait_start = time.time() + while primary_cluster_status != 'modifying' and (time.time() - wait_start < 30): + print('primary cluster status is not modifying') + primary_cluster_status = get_cluster_status(primary_cluster) + time.sleep(1) + # Cluster should be in available status before removing from global cluster primary_cluster_status = "" while primary_cluster_status != 'available': print('Checking for primary cluster ', primary_cluster, ' and its instance status before deletion...') primary_cluster_status = get_cluster_status(primary_cluster) time.sleep(1) + print('Removing primary cluster... 
', primary_cluster, ' from global cluster ', global_cluster_id) remove_from_global_cluster(primary_cluster, global_cluster_id) + # Wait until all standalone clusters are promoted wait_for_promotion_to_complete(global_cluster_id, primary_cluster) diff --git a/global-clusters-automation/requirements.txt b/global-clusters-automation/requirements.txt index 2318ef0..84d8066 100644 --- a/global-clusters-automation/requirements.txt +++ b/global-clusters-automation/requirements.txt @@ -1,2 +1,2 @@ -boto3==1.20.26 -botocore==1.23.26 \ No newline at end of file +boto3==1.34.82 +botocore==1.34.82 \ No newline at end of file diff --git a/index-tool/README.md b/index-tool/README.md index 6084944..6c9c31e 100644 --- a/index-tool/README.md +++ b/index-tool/README.md @@ -47,6 +47,8 @@ The Index Tool accepts the following arguments: --skip-incompatible Skip incompatible indexes when restoring metadata --support-2dsphere Support 2dsphere indexes creation (collections must use GeoJSON Point type for indexing) --skip-python-version-check Permit execution using Python 3.6 and prior +--shorten-index-name Shorten long index name to compatible length +--skip-id-indexes Do not create _id indexes ``` ### Export indexes from a MongoDB instance: diff --git a/index-tool/migrationtools/documentdb_index_tool.py b/index-tool/migrationtools/documentdb_index_tool.py index 3d59371..3fdaa89 100644 --- a/index-tool/migrationtools/documentdb_index_tool.py +++ b/index-tool/migrationtools/documentdb_index_tool.py @@ -20,12 +20,16 @@ import logging import os import sys +import string +import random from bson.json_util import dumps from pymongo import MongoClient from pymongo.errors import (ConnectionFailure, OperationFailure, ServerSelectionTimeoutError) from collections import OrderedDict +alphabet = string.ascii_lowercase + string.digits + class AutovivifyDict(dict): """N depth defaultdict.""" @@ -45,13 +49,12 @@ class DocumentDbLimits(object): def __init__(self): pass - 
COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH = 63 - COLLECTION_NAME_MAX_LENGTH = 57 + COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH = 255 + COLLECTION_NAME_MAX_LENGTH = 255 COMPOUND_INDEX_MAX_KEYS = 32 DATABASE_NAME_MAX_LENGTH = 63 - FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH = 127 + FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH = 377 INDEX_KEY_MAX_LENGTH = 2048 - INDEX_NAME_MAX_LENGTH = 63 NAMESPACE_MAX_LENGTH = 120 @@ -63,11 +66,11 @@ class DocumentDbUnsupportedFeatures(object): def __init__(self): pass - UNSUPPORTED_INDEX_TYPES = ['text', '2d', '2dsphere', 'geoHaystack', 'hashed'] - UNSUPPORTED_INDEX_OPTIONS = ['partialFilterExpression', 'storageEngine', \ - 'collation', 'dropDuplicates'] + UNSUPPORTED_INDEX_TYPES = ['2d', '2dsphere', 'geoHaystack', 'hashed'] + UNSUPPORTED_INDEX_OPTIONS = ['storageEngine', 'collation', 'dropDuplicates'] UNSUPPORTED_COLLECTION_OPTIONS = ['capped'] - IGNORED_INDEX_OPTIONS = ['2dsphereIndexVersion'] + UNSUPPORTED_WILDCARD_INDEXES = ['$**', '$***', '$****'] + IGNORED_INDEX_OPTIONS = ['2dsphereIndexVersion','default_language','language_override','textIndexVersion'] class IndexToolConstants(object): @@ -79,7 +82,8 @@ def __init__(self): pass DATABASES_TO_SKIP = ['admin', 'config', 'local', 'system'] - METADATA_FILES_TO_SKIP = ['system.indexes.metadata.json', 'system.profile.metadata.json'] + SYSTEM_OBJECTS_TO_SKIP = ['system.buckets','system.namespaces','system.indexes','system.profile','system.js','system.views'] + METADATA_FILES_TO_SKIP = ['system.indexes.metadata.json', 'system.profile.metadata.json', 'system.users.metadata.json', 'system.views.metadata.json'] METADATA_FILE_SUFFIX_PATTERN = 'metadata.json' EXCEEDED_LIMITS = 'exceeded_limits' FILE_PATH = 'filepath' @@ -95,6 +99,7 @@ def __init__(self): UNSUPPORTED_INDEX_OPTIONS_KEY = 'unsupported_index_options' UNSUPPORTED_COLLECTION_OPTIONS_KEY = 'unsupported_collection_options' UNSUPPORTED_INDEX_TYPES_KEY = 'unsupported_index_types' + UNSUPPORTED_FIELD_NAMES_KEY = 'unsupported_field_names' class 
DocumentDbIndexTool(IndexToolConstants): @@ -133,7 +138,7 @@ def _get_db_connection(self, uri): """Connect to instance, returning a connection""" logging.info("Connecting to instance using provided URI") - mongodb_client = MongoClient(uri) + mongodb_client = MongoClient(host=uri,appname='indxtool') # force the client to actually connect mongodb_client.admin.command('ismaster') @@ -255,6 +260,11 @@ def _dump_indexes_from_server(self, connection, output_dir, dry_run=False): database_name][collection_name].options() if 'viewOn' in collection_metadata[self.OPTIONS]: # views cannot have indexes, skip to next collection + logging.debug(" skipping, view not collection") + continue + if collection_name in self.SYSTEM_OBJECTS_TO_SKIP: + # system objects, skip to next collection + logging.debug(" skipping, system object") continue collection_indexes = connection[database_name][ collection_name].list_indexes() @@ -379,7 +389,7 @@ def find_compatibility_issues(self, metadata): collection_name, index_name) if len( collection_qualified_index_name - ) > DocumentDbLimits.COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH: + ) > DocumentDbLimits.COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH and self.args.shorten_index_name is False: message = '$ greater than {} characters'.format( DocumentDbLimits. COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH) @@ -392,7 +402,7 @@ def find_compatibility_issues(self, metadata): collection_namespace, index_name) if len( fully_qualified_index_name - ) > DocumentDbLimits.FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH: + ) > DocumentDbLimits.FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH and self.args.shorten_index_name is False: message = '.$ greater than {} characters'.format( DocumentDbLimits. 
FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH) @@ -401,12 +411,10 @@ def find_compatibility_issues(self, metadata): message] = fully_qualified_index_name # Check for indexes with too many keys - if len(index) > DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS: - message = 'Index contains more than {} keys'.format( - DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS) - compatibility_issues[db_name][collection_name][ - index_name][self.EXCEEDED_LIMITS][message] = len( - index) + if len(index['key']) > DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS: + message = 'Index contains more than {} keys'.format(DocumentDbLimits.COMPOUND_INDEX_MAX_KEYS) + compatibility_issues[db_name][collection_name][index_name][self.EXCEEDED_LIMITS][message] = len(index['key']) + for key_name in index: # Check for index key names that are too long @@ -432,7 +440,7 @@ def find_compatibility_issues(self, metadata): self.UNSUPPORTED_INDEX_OPTIONS_KEY].append( key_name) - # Check for unsupported index types like text + # Check for unsupported index types if key_name == self.INDEX_KEY: for index_key_name in index[key_name]: key_value = index[key_name][index_key_name] @@ -442,6 +450,20 @@ def find_compatibility_issues(self, metadata): collection_name][index_name][ self. UNSUPPORTED_INDEX_TYPES_KEY] = key_value + + # Check for unsupported field names + if index_key_name in DocumentDbUnsupportedFeatures.UNSUPPORTED_WILDCARD_INDEXES or index_key_name.startswith('$**'): + if self.UNSUPPORTED_FIELD_NAMES_KEY not in compatibility_issues[ + db_name][collection_name][index_name]: + compatibility_issues[db_name][collection_name][ + index_name][ + self. 
+ UNSUPPORTED_FIELD_NAMES_KEY] = [] + + compatibility_issues[db_name][collection_name][ + index_name][ + self.UNSUPPORTED_FIELD_NAMES_KEY].append( + index_key_name) return compatibility_issues @@ -449,49 +471,71 @@ def _restore_indexes(self, connection, metadata): """Restore compatible indexes to a DocumentDB instance""" for db_name in metadata: for collection_name in metadata[db_name]: - for index_name in metadata[db_name][collection_name][ - self.INDEXES]: + for index_name in metadata[db_name][collection_name][self.INDEXES]: # convert the keys dict to a list of tuples as pymongo requires - index_keys = metadata[db_name][collection_name][ - self.INDEXES][index_name][self.INDEX_KEY] + index_keys = metadata[db_name][collection_name][self.INDEXES][index_name][self.INDEX_KEY] keys_to_create = [] index_options = OrderedDict() - - index_options[self.INDEX_NAME] = index_name - for key in index_keys: - index_direction = index_keys[key] - - if type(index_direction) is float: - index_direction = int(index_direction) - elif type(index_direction) is dict and '$numberInt' in index_direction: - index_direction = int(index_direction['$numberInt']) - elif type(index_direction) is dict and '$numberDouble' in index_direction: - index_direction = int(float(index_direction['$numberDouble'])) - - keys_to_create.append((key, index_direction)) - - for k in metadata[db_name][collection_name][ - self.INDEXES][index_name]: - if k != self.INDEX_KEY and k != self.INDEX_VERSION and k not in DocumentDbUnsupportedFeatures.IGNORED_INDEX_OPTIONS: - # this key is an additional index option - index_options[k] = metadata[db_name][ - collection_name][self.INDEXES][index_name][k] + + # $ + collection_qualified_index_name = '{}${}'.format(collection_name, index_name) + # ..$ + fully_qualified_index_name = '{}.{}.${}'.format(db_name, collection_name, index_name) + + if (len(collection_qualified_index_name) > DocumentDbLimits.COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH or + len(fully_qualified_index_name) 
> DocumentDbLimits.FULLY_QUALIFIED_INDEX_NAME_MAX_LENGTH): + short_index_name = index_name[:(DocumentDbLimits.COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH - + (len(collection_name)+6))] +''.join(random.choices(alphabet, k=5)) + index_options[self.INDEX_NAME] = short_index_name + else: + index_options[self.INDEX_NAME] = index_name + + if 'textIndexVersion' in metadata[db_name][collection_name][self.INDEXES][index_name]: + # special case text indexes + for key in metadata[db_name][collection_name][self.INDEXES][index_name]['weights']: + keys_to_create.append((key, 'text')) + + for k in metadata[db_name][collection_name][self.INDEXES][index_name]: + if k != self.INDEX_KEY and k != self.INDEX_VERSION and k not in DocumentDbUnsupportedFeatures.IGNORED_INDEX_OPTIONS: + # this key is an additional index option + index_options[k] = metadata[db_name][collection_name][self.INDEXES][index_name][k] + else: + for key in index_keys: + index_direction = index_keys[key] + + if type(index_direction) is float: + index_direction = int(index_direction) + elif type(index_direction) is dict and '$numberInt' in index_direction: + index_direction = int(index_direction['$numberInt']) + elif type(index_direction) is dict and '$numberDouble' in index_direction: + index_direction = int(float(index_direction['$numberDouble'])) + + keys_to_create.append((key, index_direction)) + + for k in metadata[db_name][collection_name][self.INDEXES][index_name]: + if k != self.INDEX_KEY and k != self.INDEX_VERSION and k not in DocumentDbUnsupportedFeatures.IGNORED_INDEX_OPTIONS: + # this key is an additional index option + index_options[k] = metadata[db_name][collection_name][self.INDEXES][index_name][k] + # fix for current lack of camelCase of dotProduct + if k == 'vectorOptions' and index_options[k].get('similarity','**missing**') == 'dotproduct': + index_options[k]['similarity'] = 'dotProduct' if self.args.dry_run is True: - logging.info( - "(dry run) %s.%s: would attempt to add index: %s", - db_name, 
collection_name, index_name) - logging.info(" (dry run) index options: %s", index_options) - logging.info(" (dry run) index keys: %s", keys_to_create) + if self.args.skip_id_indexes and index_options[self.INDEX_NAME] == '_id_': + logging.info("(dry run) skipping _id index creation on %s.%s",db_name,collection_name) + else: + logging.info("(dry run) %s.%s: would attempt to add index: %s",db_name, collection_name, index_options[self.INDEX_NAME] ) + logging.info(" (dry run) index options: %s", index_options) + logging.info(" (dry run) index keys: %s", keys_to_create) else: - logging.debug("Adding index %s -> %s", keys_to_create, - index_options) - database = connection[db_name] - collection = database[collection_name] - collection.create_index(keys_to_create, - **index_options) - logging.info("%s.%s: added index: %s", db_name, - collection_name, index_name) + if self.args.skip_id_indexes and index_options[self.INDEX_NAME] == '_id_': + logging.info("Skipping _id index creation on %s.%s",db_name,collection_name) + else: + logging.debug("Adding index %s -> %s", keys_to_create,index_options) + database = connection[db_name] + collection = database[collection_name] + collection.create_index(keys_to_create,**index_options) + logging.info("%s.%s: added index: %s", db_name, collection_name, index_options[self.INDEX_NAME] ) def run(self): """Entry point @@ -567,72 +611,21 @@ def run(self): def main(): - """ - parse command line arguments and - """ - parser = argparse.ArgumentParser( - description='Dump and restore indexes from MongoDB to DocumentDB.') - - parser.add_argument('--debug', - required=False, - action='store_true', - help='Output debugging information') - - parser.add_argument( - '--dry-run', - required=False, - action='store_true', - help='Perform processing, but do not actually export or restore indexes') - - parser.add_argument('--uri', - required=False, - type=str, - help='URI to connect to MongoDB or Amazon DocumentDB') - - parser.add_argument('--dir', - 
required=True, - type=str, - help='Specify the folder to export to or restore from (required)') - - parser.add_argument('--show-compatible', - required=False, - action='store_true', - dest='show_compatible', - help='Output all compatible indexes with Amazon DocumentDB (no change is applied)') - - parser.add_argument( - '--show-issues', - required=False, - action='store_true', - dest='show_issues', - help='Output a report of compatibility issues found') - - parser.add_argument('--dump-indexes', - required=False, - action='store_true', - help='Perform index export from the specified server') - - parser.add_argument( - '--restore-indexes', - required=False, - action='store_true', - help='Restore indexes found in metadata to the specified server') - - parser.add_argument( - '--skip-incompatible', - required=False, - action='store_true', - help='Skip incompatible indexes when restoring metadata') - - parser.add_argument('--support-2dsphere', - required=False, - action='store_true', - help='Support 2dsphere indexes creation (collections data must use GeoJSON Point type for indexing)') - - parser.add_argument('--skip-python-version-check', - required=False, - action='store_true', - help='Permit execution on Python 3.6 and prior') + parser = argparse.ArgumentParser(description='Dump and restore indexes from MongoDB to DocumentDB.') + + parser.add_argument('--debug',required=False,action='store_true',help='Output debugging information') + parser.add_argument('--dry-run',required=False,action='store_true',help='Perform processing, but do not actually export or restore indexes') + parser.add_argument('--uri',required=False,type=str,help='URI to connect to MongoDB or Amazon DocumentDB') + parser.add_argument('--dir',required=True,type=str,help='Specify the folder to export to or restore from (required)') + parser.add_argument('--show-compatible',required=False,action='store_true',dest='show_compatible',help='Output all compatible indexes with Amazon DocumentDB (no change is 
applied)') + parser.add_argument('--show-issues',required=False,action='store_true',dest='show_issues',help='Output a report of compatibility issues found') + parser.add_argument('--dump-indexes',required=False,action='store_true',help='Perform index export from the specified server') + parser.add_argument('--restore-indexes',required=False,action='store_true',help='Restore indexes found in metadata to the specified server') + parser.add_argument('--skip-incompatible',required=False,action='store_true',help='Skip incompatible indexes when restoring metadata') + parser.add_argument('--support-2dsphere',required=False,action='store_true',help='Support 2dsphere indexes creation (collections data must use GeoJSON Point type for indexing)') + parser.add_argument('--skip-python-version-check',required=False,action='store_true',help='Permit execution on Python 3.6 and prior') + parser.add_argument('--shorten-index-name',required=False,action='store_true',help='Shorten long index name to compatible length') + parser.add_argument('--skip-id-indexes',required=False,action='store_true',help='Do not create _id indexes') args = parser.parse_args() diff --git a/index-tool/test/test-too-many-keys/idxtest/tmc.metadata.json b/index-tool/test/test-too-many-keys/idxtest/tmc.metadata.json new file mode 100644 index 0000000..3383ae6 --- /dev/null +++ b/index-tool/test/test-too-many-keys/idxtest/tmc.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"idxtest.tmc"},{"v":2,"key":{"one":1,"two":1},"name":"one_two","ns":"idxtest.tmc"},{"v":2,"key":{"five":1,"three":-1,"four":1},"name":"five_three_four","ns":"idxtest.tmc"},{"v":2,"key":{"five":1,"four":1},"name":"five_four","ns":"idxtest.tmc"},{"v":2,"key":{"two":-1,"one":1,"four":1},"name":"two_one_four","ns":"idxtest.tmc"},{"v":2,"key":{"five":1,"three":-1,"four":1,"one":1,"two":1},"name":"five_three_four_one_two","ns":"idxtest.tmc"}]} \ No newline at end of file diff --git 
a/index-tool/test/test1.bash index 5bc7549..931ff74 100755 --- a/index-tool/test/test1.bash +++ b/index-tool/test/test1.bash @@ -1,3 +1,3 @@ #! /bin/bash -python3 ../migrationtools/documentdb_index_tool.py --restore-indexes --dry-run --dir test1 | sed -n '1d;p' | cut -c 26- | diff - test1.expects +python3 ../migrationtools/documentdb_index_tool.py --show-issues --dry-run --dir test1 | sed -n '1d;p' | cut -c 26- | diff - test1.expects diff --git a/migration/README.md b/migration/README.md new file mode 100644 index 0000000..bc30d91 --- /dev/null +++ b/migration/README.md @@ -0,0 +1,13 @@ +# Amazon DocumentDB Migration Tools + +* [cosmos-db-migration-utility](./cosmos-db-migration-utility) - migrate from Cosmos DB to Amazon DocumentDB. +* [couchbase-migration-utility](./couchbase-migration-utility) - migrate from Couchbase to Amazon DocumentDB. +* [data-differ](./data-differ) - compare documents between two databases or collections. +* [dms-segments](./dms-segments) - calculate segments for Amazon DMS full load segmentation. +* [export-users](./export-users) - export users from MongoDB or Amazon DocumentDB. +* [json-import](./json-import) - high speed concurrent JSON data loader. +* [migrator](./migrator) - high speed concurrent full load and change data capture for online migrations. +* [mongodb-changestream-review](./mongodb-changestream-review) - scan the changestream to determine the collection level insert/update/delete rates. +* [mongodb-oplog-review](./mongodb-oplog-review) - scan the oplog to determine the collection level insert/update/delete rates. +* [mongodb-ops](./mongodb-ops) - extract collection level query/insert/update/delete MongoDB counters to estimate workload for migrations. +* [mvu-tool](./mvu-tool) - live migration tool to assist in providing near zero-downtime major version upgrades.
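The `--shorten-index-name` handling added to `documentdb_index_tool.py` above truncates an over-long index name and appends a random 5-character suffix, so the collection-qualified name (`<collection>$<index>`) stays within the 255-character limit while remaining unlikely to collide with another shortened name. A standalone sketch of that calculation (the function name here is illustrative, not part of the tool):

```python
import random
import string

# Post-change limit from DocumentDbLimits in the hunk above
COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH = 255

# Same suffix alphabet the tool defines at module level
alphabet = string.ascii_lowercase + string.digits

def shorten_index_name(collection_name, index_name):
    """Return index_name, shortened if '<collection>$<index>' would exceed
    the collection-qualified limit. The reserve of len(collection_name) + 6
    accounts for the collection name, the '$' separator, and a 5-character
    random suffix, mirroring the arithmetic in the restore path above."""
    qualified = '{}${}'.format(collection_name, index_name)
    if len(qualified) <= COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH:
        return index_name
    keep = COLLECTION_QUALIFIED_INDEX_NAME_MAX_LENGTH - (len(collection_name) + 6)
    return index_name[:keep] + ''.join(random.choices(alphabet, k=5))
```

Because the suffix is random rather than derived from the original name, two runs of the tool can produce different shortened names for the same index; names that already fit are passed through unchanged.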
diff --git a/cosmos-db-migration-utility/.gitignore b/migration/cosmos-db-migration-utility/.gitignore similarity index 100% rename from cosmos-db-migration-utility/.gitignore rename to migration/cosmos-db-migration-utility/.gitignore diff --git a/cosmos-db-migration-utility/README.md b/migration/cosmos-db-migration-utility/README.md similarity index 100% rename from cosmos-db-migration-utility/README.md rename to migration/cosmos-db-migration-utility/README.md diff --git a/cosmos-db-migration-utility/docs/architecture/architecture-diagram.drawio b/migration/cosmos-db-migration-utility/docs/architecture/architecture-diagram.drawio similarity index 100% rename from cosmos-db-migration-utility/docs/architecture/architecture-diagram.drawio rename to migration/cosmos-db-migration-utility/docs/architecture/architecture-diagram.drawio diff --git a/cosmos-db-migration-utility/docs/architecture/architecture-diagram.png b/migration/cosmos-db-migration-utility/docs/architecture/architecture-diagram.png similarity index 100% rename from cosmos-db-migration-utility/docs/architecture/architecture-diagram.png rename to migration/cosmos-db-migration-utility/docs/architecture/architecture-diagram.png diff --git a/cosmos-db-migration-utility/docs/images/cloud-trail-log.png b/migration/cosmos-db-migration-utility/docs/images/cloud-trail-log.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/cloud-trail-log.png rename to migration/cosmos-db-migration-utility/docs/images/cloud-trail-log.png diff --git a/cosmos-db-migration-utility/docs/images/cloud-watch-log-group.png b/migration/cosmos-db-migration-utility/docs/images/cloud-watch-log-group.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/cloud-watch-log-group.png rename to migration/cosmos-db-migration-utility/docs/images/cloud-watch-log-group.png diff --git a/cosmos-db-migration-utility/docs/images/core-resources-create-stack.png 
b/migration/cosmos-db-migration-utility/docs/images/core-resources-create-stack.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/core-resources-create-stack.png rename to migration/cosmos-db-migration-utility/docs/images/core-resources-create-stack.png diff --git a/cosmos-db-migration-utility/docs/images/core-resources-review-stack.png b/migration/cosmos-db-migration-utility/docs/images/core-resources-review-stack.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/core-resources-review-stack.png rename to migration/cosmos-db-migration-utility/docs/images/core-resources-review-stack.png diff --git a/cosmos-db-migration-utility/docs/images/core-resources-stack-details.png b/migration/cosmos-db-migration-utility/docs/images/core-resources-stack-details.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/core-resources-stack-details.png rename to migration/cosmos-db-migration-utility/docs/images/core-resources-stack-details.png diff --git a/cosmos-db-migration-utility/docs/images/core-resources-stack-status.png b/migration/cosmos-db-migration-utility/docs/images/core-resources-stack-status.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/core-resources-stack-status.png rename to migration/cosmos-db-migration-utility/docs/images/core-resources-stack-status.png diff --git a/cosmos-db-migration-utility/docs/images/documentdb-connection-string.png b/migration/cosmos-db-migration-utility/docs/images/documentdb-connection-string.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/documentdb-connection-string.png rename to migration/cosmos-db-migration-utility/docs/images/documentdb-connection-string.png diff --git a/cosmos-db-migration-utility/docs/images/documentdb-resources-create-stack.png b/migration/cosmos-db-migration-utility/docs/images/documentdb-resources-create-stack.png similarity index 100% rename from 
cosmos-db-migration-utility/docs/images/documentdb-resources-create-stack.png rename to migration/cosmos-db-migration-utility/docs/images/documentdb-resources-create-stack.png diff --git a/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-details.png b/migration/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-details.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/documentdb-resources-stack-details.png rename to migration/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-details.png diff --git a/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-status.png b/migration/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-status.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/documentdb-resources-stack-status.png rename to migration/cosmos-db-migration-utility/docs/images/documentdb-resources-stack-status.png diff --git a/cosmos-db-migration-utility/docs/images/ec2-instance-ami.png b/migration/cosmos-db-migration-utility/docs/images/ec2-instance-ami.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/ec2-instance-ami.png rename to migration/cosmos-db-migration-utility/docs/images/ec2-instance-ami.png diff --git a/cosmos-db-migration-utility/docs/images/ec2-instance-review.png b/migration/cosmos-db-migration-utility/docs/images/ec2-instance-review.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/ec2-instance-review.png rename to migration/cosmos-db-migration-utility/docs/images/ec2-instance-review.png diff --git a/cosmos-db-migration-utility/docs/images/ec2-instance-review3.png b/migration/cosmos-db-migration-utility/docs/images/ec2-instance-review3.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/ec2-instance-review3.png rename to migration/cosmos-db-migration-utility/docs/images/ec2-instance-review3.png diff --git 
a/cosmos-db-migration-utility/docs/images/s3-bucket-with-lambda-functions.png b/migration/cosmos-db-migration-utility/docs/images/s3-bucket-with-lambda-functions.png similarity index 100% rename from cosmos-db-migration-utility/docs/images/s3-bucket-with-lambda-functions.png rename to migration/cosmos-db-migration-utility/docs/images/s3-bucket-with-lambda-functions.png diff --git a/cosmos-db-migration-utility/lib/lambda/lambda-pack-pymongo.zip b/migration/cosmos-db-migration-utility/lib/lambda/lambda-pack-pymongo.zip similarity index 100% rename from cosmos-db-migration-utility/lib/lambda/lambda-pack-pymongo.zip rename to migration/cosmos-db-migration-utility/lib/lambda/lambda-pack-pymongo.zip diff --git a/cosmos-db-migration-utility/scripts/build-package.sh b/migration/cosmos-db-migration-utility/scripts/build-package.sh similarity index 100% rename from cosmos-db-migration-utility/scripts/build-package.sh rename to migration/cosmos-db-migration-utility/scripts/build-package.sh diff --git a/cosmos-db-migration-utility/src/cloudformation/core-resources.yaml b/migration/cosmos-db-migration-utility/src/cloudformation/core-resources.yaml similarity index 100% rename from cosmos-db-migration-utility/src/cloudformation/core-resources.yaml rename to migration/cosmos-db-migration-utility/src/cloudformation/core-resources.yaml diff --git a/cosmos-db-migration-utility/src/cloudformation/documentdb.yaml b/migration/cosmos-db-migration-utility/src/cloudformation/documentdb.yaml similarity index 100% rename from cosmos-db-migration-utility/src/cloudformation/documentdb.yaml rename to migration/cosmos-db-migration-utility/src/cloudformation/documentdb.yaml diff --git a/cosmos-db-migration-utility/src/configure/application.py b/migration/cosmos-db-migration-utility/src/configure/application.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/application.py rename to migration/cosmos-db-migration-utility/src/configure/application.py diff --git 
a/cosmos-db-migration-utility/src/configure/commandline_parser.py b/migration/cosmos-db-migration-utility/src/configure/commandline_parser.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/commandline_parser.py rename to migration/cosmos-db-migration-utility/src/configure/commandline_parser.py diff --git a/cosmos-db-migration-utility/src/configure/common/application_exception.py b/migration/cosmos-db-migration-utility/src/configure/common/application_exception.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/common/application_exception.py rename to migration/cosmos-db-migration-utility/src/configure/common/application_exception.py diff --git a/cosmos-db-migration-utility/src/configure/common/logger.py b/migration/cosmos-db-migration-utility/src/configure/common/logger.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/common/logger.py rename to migration/cosmos-db-migration-utility/src/configure/common/logger.py diff --git a/cosmos-db-migration-utility/src/configure/json_encoder.py b/migration/cosmos-db-migration-utility/src/configure/json_encoder.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/json_encoder.py rename to migration/cosmos-db-migration-utility/src/configure/json_encoder.py diff --git a/cosmos-db-migration-utility/src/configure/main.py b/migration/cosmos-db-migration-utility/src/configure/main.py similarity index 100% rename from cosmos-db-migration-utility/src/configure/main.py rename to migration/cosmos-db-migration-utility/src/configure/main.py diff --git a/cosmos-db-migration-utility/src/configure/rds-combined-ca-bundle.pem b/migration/cosmos-db-migration-utility/src/configure/rds-combined-ca-bundle.pem similarity index 100% rename from cosmos-db-migration-utility/src/configure/rds-combined-ca-bundle.pem rename to migration/cosmos-db-migration-utility/src/configure/rds-combined-ca-bundle.pem diff --git 
a/cosmos-db-migration-utility/src/configure/requirements.txt b/migration/cosmos-db-migration-utility/src/configure/requirements.txt similarity index 100% rename from cosmos-db-migration-utility/src/configure/requirements.txt rename to migration/cosmos-db-migration-utility/src/configure/requirements.txt diff --git a/cosmos-db-migration-utility/src/lambda/app-request-reader/lambda_function.py b/migration/cosmos-db-migration-utility/src/lambda/app-request-reader/lambda_function.py similarity index 100% rename from cosmos-db-migration-utility/src/lambda/app-request-reader/lambda_function.py rename to migration/cosmos-db-migration-utility/src/lambda/app-request-reader/lambda_function.py diff --git a/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_start.json b/migration/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_start.json similarity index 100% rename from cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_start.json rename to migration/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_start.json diff --git a/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_stop.json b/migration/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_stop.json similarity index 100% rename from cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_stop.json rename to migration/cosmos-db-migration-utility/src/lambda/app-request-reader/sample_request_stop.json diff --git a/cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py b/migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py similarity index 99% rename from cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py rename to migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py index 1c160b1..36d72ee 100644 --- 
a/cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py +++ b/migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/lambda_function.py @@ -125,7 +125,7 @@ def get_cluster_connection_string(cluster_name): client = boto3.resource('dynamodb') logger.info("Getting the connection string of the cluster_name: %s.", cluster_name) connection_string = get_secret_value("migrator-app/{}".format(cluster_name)) - logger.info("Successfully fetched the connection string of the cluster_name: %s. Connection string: %s", cluster_name, connection_string) + logger.info("Successfully fetched the connection string of the cluster_name: %s.", cluster_name) return connection_string def bulk_write_data_to_document_db(cluster_name, namespace, data): diff --git a/cosmos-db-migration-utility/src/lambda/batch-request-reader/rds-combined-ca-bundle.pem b/migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/rds-combined-ca-bundle.pem similarity index 100% rename from cosmos-db-migration-utility/src/lambda/batch-request-reader/rds-combined-ca-bundle.pem rename to migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/rds-combined-ca-bundle.pem diff --git a/cosmos-db-migration-utility/src/lambda/batch-request-reader/sample_request.json b/migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/sample_request.json similarity index 100% rename from cosmos-db-migration-utility/src/lambda/batch-request-reader/sample_request.json rename to migration/cosmos-db-migration-utility/src/lambda/batch-request-reader/sample_request.json diff --git a/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/lambda_function.py b/migration/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/lambda_function.py similarity index 100% rename from cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/lambda_function.py rename to 
migration/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/lambda_function.py diff --git a/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/sample_request.json b/migration/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/sample_request.json similarity index 100% rename from cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/sample_request.json rename to migration/cosmos-db-migration-utility/src/lambda/gap-watch-request-reader/sample_request.json diff --git a/cosmos-db-migration-utility/src/migrator-app/commandline_parser.py b/migration/cosmos-db-migration-utility/src/migrator-app/commandline_parser.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/commandline_parser.py rename to migration/cosmos-db-migration-utility/src/migrator-app/commandline_parser.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/Singleton.py b/migration/cosmos-db-migration-utility/src/migrator-app/common/Singleton.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/Singleton.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/Singleton.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/__init__.py b/migration/cosmos-db-migration-utility/src/migrator-app/common/__init__.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/__init__.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/__init__.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/application_exception.py b/migration/cosmos-db-migration-utility/src/migrator-app/common/application_exception.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/application_exception.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/application_exception.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/json_encoder.py 
b/migration/cosmos-db-migration-utility/src/migrator-app/common/json_encoder.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/json_encoder.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/json_encoder.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/logger.py b/migration/cosmos-db-migration-utility/src/migrator-app/common/logger.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/logger.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/logger.py diff --git a/cosmos-db-migration-utility/src/migrator-app/common/timer.py b/migration/cosmos-db-migration-utility/src/migrator-app/common/timer.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/common/timer.py rename to migration/cosmos-db-migration-utility/src/migrator-app/common/timer.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/__init__.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/__init__.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/__init__.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/__init__.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/change_manager.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/change_manager.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/change_manager.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/change_manager.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/document_batcher.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/document_batcher.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/document_batcher.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/document_batcher.py diff 
--git a/cosmos-db-migration-utility/src/migrator-app/helpers/dynamodb_helper.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/dynamodb_helper.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/dynamodb_helper.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/dynamodb_helper.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/file_helper.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/file_helper.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/file_helper.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/file_helper.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/s3_helper.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/s3_helper.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/s3_helper.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/s3_helper.py diff --git a/cosmos-db-migration-utility/src/migrator-app/helpers/tokens_manager.py b/migration/cosmos-db-migration-utility/src/migrator-app/helpers/tokens_manager.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/helpers/tokens_manager.py rename to migration/cosmos-db-migration-utility/src/migrator-app/helpers/tokens_manager.py diff --git a/cosmos-db-migration-utility/src/migrator-app/main.py b/migration/cosmos-db-migration-utility/src/migrator-app/main.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/main.py rename to migration/cosmos-db-migration-utility/src/migrator-app/main.py diff --git a/cosmos-db-migration-utility/src/migrator-app/migrators/ClusterMigrator.py b/migration/cosmos-db-migration-utility/src/migrator-app/migrators/ClusterMigrator.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/migrators/ClusterMigrator.py rename 
to migration/cosmos-db-migration-utility/src/migrator-app/migrators/ClusterMigrator.py diff --git a/cosmos-db-migration-utility/src/migrator-app/migrators/CollectionMigrator.py b/migration/cosmos-db-migration-utility/src/migrator-app/migrators/CollectionMigrator.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/migrators/CollectionMigrator.py rename to migration/cosmos-db-migration-utility/src/migrator-app/migrators/CollectionMigrator.py diff --git a/cosmos-db-migration-utility/src/migrator-app/migrators/DatabaseMigrator.py b/migration/cosmos-db-migration-utility/src/migrator-app/migrators/DatabaseMigrator.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/migrators/DatabaseMigrator.py rename to migration/cosmos-db-migration-utility/src/migrator-app/migrators/DatabaseMigrator.py diff --git a/cosmos-db-migration-utility/src/migrator-app/migrators/TokenTracker.py b/migration/cosmos-db-migration-utility/src/migrator-app/migrators/TokenTracker.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/migrators/TokenTracker.py rename to migration/cosmos-db-migration-utility/src/migrator-app/migrators/TokenTracker.py diff --git a/cosmos-db-migration-utility/src/migrator-app/migrators/__init__.py b/migration/cosmos-db-migration-utility/src/migrator-app/migrators/__init__.py similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/migrators/__init__.py rename to migration/cosmos-db-migration-utility/src/migrator-app/migrators/__init__.py diff --git a/cosmos-db-migration-utility/src/migrator-app/requirements.txt b/migration/cosmos-db-migration-utility/src/migrator-app/requirements.txt similarity index 72% rename from cosmos-db-migration-utility/src/migrator-app/requirements.txt rename to migration/cosmos-db-migration-utility/src/migrator-app/requirements.txt index 824baae..96a4ef6 100644 --- a/cosmos-db-migration-utility/src/migrator-app/requirements.txt +++ 
b/migration/cosmos-db-migration-utility/src/migrator-app/requirements.txt @@ -1,4 +1,4 @@ -pymongo==3.10.1 +pymongo==4.6.3 PyYAML==5.4 boto3==1.12.17 argparse==1.4.0 \ No newline at end of file diff --git a/cosmos-db-migration-utility/src/migrator-app/tokens.yaml b/migration/cosmos-db-migration-utility/src/migrator-app/tokens.yaml similarity index 100% rename from cosmos-db-migration-utility/src/migrator-app/tokens.yaml rename to migration/cosmos-db-migration-utility/src/migrator-app/tokens.yaml diff --git a/migration/data-differ/README.md b/migration/data-differ/README.md index 83df52c..ffe3002 100644 --- a/migration/data-differ/README.md +++ b/migration/data-differ/README.md @@ -24,22 +24,72 @@ git clone https://github.com/awslabs/amazon-documentdb-tools.git cd amazon-documentdb-tools/migration/data-differ/ ``` -2. Update the `source.vars` file and export the variables with `source source.vars` - -3. Run the data-differ.py tool, which accepts the following (optional) arguments: +2. Run the data-differ.py tool, which accepts the following arguments: ``` python3 data-differ.py --help -usage: data-differ.py [-h] [--batch_size BATCH_SIZE] [--output_file OUTPUT_FILE] [--check_target CHECK_TARGET] +usage: data-differ.py [-h] [--batch-size BATCH_SIZE] [--output-file OUTPUT_FILE] [--check-target] --source-uri SOURCE_URI --target-uri TARGET_URI --source-db SOURCE_DB --target-db TARGET_DB --source-coll SOURCE_COLL --target-coll TARGET_COLL [--sample-size-percent SAMPLE_SIZE_PERCENT] [--sampling-timeout-ms SAMPLING_TIMEOUT_MS] Compare two collections and report differences.
-optional arguments: +options: -h, --help show this help message and exit - --batch_size BATCH_SIZE - Batch size for bulk reads (default: 100) - --output_file OUTPUT_FILE - Output file path (default: differences.txt) - --check_target CHECK_TARGET - Check if extra documents exist in target database + --batch-size BATCH_SIZE + Batch size for bulk reads (optional, default: 100) + --output-file OUTPUT_FILE + Output file path (optional, default: differences.txt) + --check-target + optional, Check if extra documents exist in target database + --source-uri SOURCE_URI + Source cluster URI (required) + --target-uri TARGET_URI + Target cluster URI (required) + --source-db SOURCE_DB + Source database name (required) + --target-db TARGET_DB + Target database name (required) + --source-coll SOURCE_COLL + Source collection name (required) + --target-coll TARGET_COLL + Target collection name (required) + --sample-size-percent SAMPLE_SIZE_PERCENT + optional, if set only samples a percentage of the documents + --sampling-timeout-ms SAMPLING_TIMEOUT_MS + optional, override the timeout for returning a sample of documents when using the --sample-size-percent argument +``` + +## Example usage: +Connect to a standalone MongoDB instance as source and to an Amazon DocumentDB cluster as target. + +From the source URI, compare the collection *mysourcecollection* from database *mysourcedb* against the collection *mytargetcollection* from database *mytargetdb* in the target URI.
+ +``` +python3 data-differ.py \ +--source-uri "mongodb://user:password@mongodb-instance-hostname:27017/admin?directConnection=true" \ +--target-uri "mongodb://user:password@target.cluster.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" \ +--source-db mysourcedb \ +--source-coll mysourcecollection \ +--target-db mytargetdb \ +--target-coll mytargetcollection ``` + +For more information on the connection string format, refer to the [documentation](https://www.mongodb.com/docs/manual/reference/connection-string/). + +## Sampling +For large databases it might be infeasible to compare every document as: +* It takes a long time to compare every document. +* Reading every document from a large busy database could have a performance impact. + +If you use the `--sample-size-percent` option you can pass in a percentage of +documents to sample and compare. + +E.g. `--sample-size-percent 1` would sample 1% of the documents in the source +database and compare them to the target database. + +Under the hood this uses the [MongoDB `$sample` operator](https://www.mongodb.com/docs/manual/reference/operator/aggregation/sample/). +You should read the documentation on how that behaves on your version of MongoDB +when the percentage to sample is >= 5% before picking a percentage to sample. + +The default timeout for retrieving a sample of documents is `500ms`; if this is +not long enough you can adjust it with the `--sampling-timeout-ms` argument. +For example `--sampling-timeout-ms 600` would increase the timeout to `600ms`.
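The sampling described above boils down to turning the requested percentage into an absolute `$sample` size before running the aggregation. A minimal sketch of that calculation (the function name is illustrative, not part of data-differ.py):

```python
def build_sample_pipeline(src_count, sample_size_percent):
    # Convert the requested percentage into an absolute document count,
    # then wrap it in a $sample aggregation stage as the tool does.
    docs_to_sample = int((sample_size_percent / 100) * src_count)
    return [{"$sample": {"size": docs_to_sample}}]
```

With 10,000 source documents and `--sample-size-percent 1`, this yields `[{"$sample": {"size": 100}}]`, which the tool then passes to `aggregate()` together with a `maxTimeMS` taken from `--sampling-timeout-ms`.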
diff --git a/migration/data-differ/data-differ.py b/migration/data-differ/data-differ.py index 211d8e9..8e2ddaa 100644 --- a/migration/data-differ/data-differ.py +++ b/migration/data-differ/data-differ.py @@ -1,15 +1,15 @@ -import os import argparse +import json from pymongo import MongoClient from deepdiff import DeepDiff from tqdm import tqdm from datetime import datetime -from multiprocessing import Pool +from multiprocessing import Pool, cpu_count def connect_to_db(uri, pool_size): try: - client = MongoClient(uri, maxPoolSize=pool_size) + client = MongoClient(host=uri, maxPoolSize=pool_size, appname='datadiff') return client except Exception as e: print(f"Error connecting to database: {e}") @@ -19,8 +19,12 @@ def connect_to_db(uri, pool_size): ## Find missing docs in source when doc count in target is higher def check_target_for_extra_documents(srcCollection, tgtCollection, output_file): print("Check if extra documents exist in target database. Scanning......") - missing_docs = tgtCollection.find({'_id': {'$nin': srcCollection.distinct('_id')}}) - if len(list(missing_docs.clone())) > 0 : + # Collect _id values with a projected find instead of distinct to handle dictionary _id values + src_ids = [doc['_id'] for doc in srcCollection.find({}, {'_id': 1})] + + # Find documents in target that don't exist in source + missing_docs = tgtCollection.find({'_id': {'$nin': src_ids}}) + if len(list(missing_docs.clone())) > 0: write_difference_to_file(output_file, "Document _IDs present in the target collection but not in the source collection:") for doc in missing_docs: print(doc['_id']) @@ -39,10 +43,23 @@ def compare_docs_deepdiff(doc1, doc2, output_file): print(f"An error occurred while comparing documents: {e}") +## Helper function to make _id hashable +def make_id_hashable(id_value): + if isinstance(id_value, dict): + return json.dumps(id_value, sort_keys=True) + return id_value + ## Main compare document function -def compare_document_data(srcCollection, tgtCollection, batch_size,
output_file, src_count): - source_cursor = srcCollection.find().sort('_id').batch_size(batch_size) - total_docs = src_count +def compare_document_data(srcCollection, tgtCollection, batch_size, output_file, src_count, sample_size_percent, sampling_timeout_ms): + if sample_size_percent: + percentage_in_decimal = sample_size_percent / 100 + docs_to_sample = int(percentage_in_decimal * src_count) + source_cursor = srcCollection.aggregate([ { "$sample": { "size": docs_to_sample } } ], batchSize=batch_size, maxTimeMS=sampling_timeout_ms) + total_docs = docs_to_sample + else: + source_cursor = srcCollection.find().sort('_id').batch_size(batch_size) + total_docs = src_count + progress_bar = tqdm(total=total_docs, desc='Comparing documents', unit='doc') tgt_missing_ids = [] processed_docs = 0 @@ -51,30 +68,38 @@ def compare_document_data(srcCollection, tgtCollection, batch_size, output_file, while source_cursor.alive: batch1 = [next(source_cursor, None) for _ in range(batch_size)] doc_pairs = [] - src_ids_set = set() + src_ids_list = [] for document in batch1: if document is not None: - source_id = document['_id'] - src_ids_set.add(source_id) doc_pairs.append((document, None)) # None is used as a placeholder for target document - - tgt_docs = tgtCollection.find({"_id": {"$in": [doc[0]['_id'] for doc in doc_pairs]}}) - tgt_docs_map = {doc['_id']: doc for doc in tgt_docs} - - doc_pairs = [(doc[0], tgt_docs_map.get(doc[0]['_id'])) for doc in doc_pairs] - - # Check if any documents from source collection are missing in the target collection - missing_docs = [doc[0] for doc in doc_pairs if doc[1] is None] - tgt_missing_ids.extend([doc['_id'] for doc in missing_docs]) + src_ids_list.append(document['_id']) + + # Use MongoDB's $in operator directly without converting to set + tgt_docs = tgtCollection.find({"_id": {"$in": src_ids_list}}) + + # Create a dictionary mapping hashable versions of _id to documents + tgt_docs_map = {} + for doc in tgt_docs: + hashable_id = 
make_id_hashable(doc['_id']) + tgt_docs_map[hashable_id] = doc + + # Match source documents with target documents + matched_doc_pairs = [] + for src_doc in [doc[0] for doc in doc_pairs if doc[0] is not None]: + hashable_id = make_id_hashable(src_doc['_id']) + tgt_doc = tgt_docs_map.get(hashable_id) + matched_doc_pairs.append((src_doc, tgt_doc)) + if tgt_doc is None: + tgt_missing_ids.append(src_doc['_id']) # Check difference between docs, multi process based on cpu_count() - pool_size = os.cpu_count() + pool_size = cpu_count() with Pool(pool_size) as pool: - pool.starmap(compare_docs_deepdiff, [(doc1, doc2, output_file) for doc1, doc2 in doc_pairs if doc2 is not None]) + pool.starmap(compare_docs_deepdiff, [(doc1, doc2, output_file) for doc1, doc2 in matched_doc_pairs if doc2 is not None]) - processed_docs += len(doc_pairs) - progress_bar.update(len(doc_pairs)) + processed_docs += len(matched_doc_pairs) + progress_bar.update(len(matched_doc_pairs)) except Exception as e: print(f"An error occurred while comparing documents: {e}") @@ -108,7 +133,7 @@ def write_difference_to_file(output_file, content): file.write(str(content) + '\n') -def compare_collections(srcCollection, tgtCollection, batch_size, output_file, check_target): +def compare_collections(srcCollection, tgtCollection, batch_size, output_file, check_target, sample_size_percent, sampling_timeout_ms): src_count = srcCollection.count_documents({}) trg_count = tgtCollection.count_documents({}) @@ -119,43 +144,45 @@ def compare_collections(srcCollection, tgtCollection, batch_size, output_file, c print("No documents found in the target collection, please re-check you selected the right target collection.") return if src_count < trg_count: - print(f"Warning: There are more documents in target collection than the source collection, {trg_count} vs. {src_count}. Use --check_target to identify the missing docs in the source collection. 
") + if not check_target: + print(f"Warning: There are more documents in target collection than the source collection, {trg_count} vs. {src_count}. Use --check-target to identify the missing docs in the source collection. ") write_difference_to_file(output_file, "Count of documents in source:" + str(src_count) ) write_difference_to_file(output_file, "Count of documents in target:" + str(trg_count) ) print(f"Starting data differ at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} , output is saved to {output_file}") - compare_document_data(srcCollection, tgtCollection, batch_size, output_file,src_count) + compare_document_data(srcCollection, tgtCollection, batch_size, output_file, src_count, sample_size_percent, sampling_timeout_ms) compare_indexes(srcCollection, tgtCollection, output_file) - if check_target : + if check_target: check_target_for_extra_documents(srcCollection, tgtCollection, output_file) def main(): parser = argparse.ArgumentParser(description='Compare two collections and report differences.') - parser.add_argument('--batch_size', type=int, default=100, help='Batch size for bulk reads (default: 100)') - parser.add_argument('--output_file', type=str, default='differences.txt', help='Output file path (default: differences.txt)') - parser.add_argument('--check_target', type=str, default=False, help='Check if extra documents exist in target database') + parser.add_argument('--batch-size', type=int, default=100, help='Batch size for bulk reads (optional, default: 100)') + parser.add_argument('--output-file', type=str, default='differences.txt', help='Output file path (optional, default: differences.txt)') + parser.add_argument('--check-target', action='store_true', default=False, help='optional, Check if extra documents exist in target database') + parser.add_argument('--source-uri', type=str, required=True, help='Source cluster URI (required)') + parser.add_argument('--target-uri', type=str, required=True, help='Target cluster URI (required)') + 
parser.add_argument('--source-db', type=str, required=True, help='Source database name (required)') + parser.add_argument('--target-db', type=str, required=True, help='Target database name (required)') + parser.add_argument('--source-coll', type=str, required=True, help='Source collection name (required)') + parser.add_argument('--target-coll', type=str, required=True, help='Target collection name (required)') + parser.add_argument('--sample-size-percent', type=int, required=False, help='optional, if set only samples a percentage of the documents') + parser.add_argument('--sampling-timeout-ms', type=int, default=500, required=False, help='optional, override the timeout for returning a sample of documents when using the --sample-size-percent argument') args = parser.parse_args() - cluster1_uri = os.environ.get('SOURCE_URI') - cluster2_uri = os.environ.get('TARGET_URI') - srcDatabase = os.environ.get('SOURCE_DB') - tgtDatabase = os.environ.get('TARGET_DB') - srcCollection = os.environ.get('SOURCE_COLL') - tgtCollection = os.environ.get('TARGET_COLL') - # Connect to the source database cluster - cluster1_client = connect_to_db(cluster1_uri, 50) - srcdb = cluster1_client[srcDatabase] - srcCollection = srcdb[srcCollection] + cluster1_client = connect_to_db(args.source_uri, 50) + srcdb = cluster1_client[args.source_db] + srcCollection = srcdb[args.source_coll] # Connect to the target database cluster - cluster2_client = connect_to_db(cluster2_uri, 50) - tgtdb = cluster2_client[tgtDatabase] - tgtCollection = tgtdb[tgtCollection] + cluster2_client = connect_to_db(args.target_uri, 50) + tgtdb = cluster2_client[args.target_db] + tgtCollection = tgtdb[args.target_coll] # Compare collections and report differences - compare_collections(srcCollection, tgtCollection, args.batch_size, args.output_file, args.check_target) + compare_collections(srcCollection, tgtCollection, args.batch_size, args.output_file, args.check_target, args.sample_size_percent, args.sampling_timeout_ms) if 
__name__ == '__main__': main() diff --git a/migration/data-differ/source.vars b/migration/data-differ/source.vars deleted file mode 100644 index b695629..0000000 --- a/migration/data-differ/source.vars +++ /dev/null @@ -1,6 +0,0 @@ -export SOURCE_URI="mongodb://:@:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" -export SOURCE_DB="mysourcedb" -export SOURCE_COLL="mysourcecollection" -export TARGET_URI="mongodb://:@:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" -export TARGET_DB="mytargetdb" -export TARGET_COLL="mytargetcollection" diff --git a/migration/data-differ/test-scripts/dict_id.bash b/migration/data-differ/test-scripts/dict_id.bash new file mode 100644 index 0000000..23c5c7d --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id.bash @@ -0,0 +1,9 @@ +#!/bin/bash + +mongoimport --uri="$SOURCE_URI" --db="$SOURCE_DB" --collection="$SOURCE_COLL" --file=dict_id_source.json +mongoimport --uri="$TARGET_URI" --db="$TARGET_DB" --collection="$TARGET_COLL" --file=dict_id_target.json + +python3 ../data-differ.py --source-uri "$SOURCE_URI" --target-uri "$TARGET_URI" --source-db "$SOURCE_DB" --source-coll "$SOURCE_COLL" --target-db "$TARGET_DB" --target-coll "$TARGET_COLL" --batch-size 100 --output-file dict_id_diff.txt + +mongosh "$SOURCE_URI" --eval "use $SOURCE_DB; db.$SOURCE_COLL.drop()" +mongosh "$TARGET_URI" --eval "use $TARGET_DB; db.$TARGET_COLL.drop()" \ No newline at end of file diff --git a/migration/data-differ/test-scripts/dict_id_diff.bash b/migration/data-differ/test-scripts/dict_id_diff.bash new file mode 100644 index 0000000..37b3d2b --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id_diff.bash @@ -0,0 +1,9 @@ +#!/bin/bash + +mongoimport --uri="$SOURCE_URI" --db="$SOURCE_DB" --collection="$SOURCE_COLL" --file=dict_id_diff_source.json +mongoimport --uri="$TARGET_URI" --db="$TARGET_DB" 
--collection="$TARGET_COLL" --file=dict_id_diff_target.json + +python3 ../data-differ.py --source-uri "$SOURCE_URI" --target-uri "$TARGET_URI" --source-db "$SOURCE_DB" --source-coll "$SOURCE_COLL" --target-db "$TARGET_DB" --target-coll "$TARGET_COLL" --batch-size 100 --output-file dict_id_diff_result.txt + +mongosh "$SOURCE_URI" --eval "use $SOURCE_DB; db.$SOURCE_COLL.drop()" +mongosh "$TARGET_URI" --eval "use $TARGET_DB; db.$TARGET_COLL.drop()" \ No newline at end of file diff --git a/migration/data-differ/test-scripts/dict_id_diff_source.json b/migration/data-differ/test-scripts/dict_id_diff_source.json new file mode 100644 index 0000000..9147d6b --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id_diff_source.json @@ -0,0 +1,3 @@ +{"_id": {"key1": "value1", "key2": "value2"}, "name": "Alice", "age": 25.0, "siblings": 1.0} +{"_id": {"key1": "value3", "key2": "value4"}, "name": "Bob", "age": 30.0, "siblings": 2.0} +{"_id": {"key1": "value5", "key2": "value6"}, "name": "Charlie", "age": 35.0, "siblings": 3.0} \ No newline at end of file diff --git a/migration/data-differ/test-scripts/dict_id_diff_target.json b/migration/data-differ/test-scripts/dict_id_diff_target.json new file mode 100644 index 0000000..3dc7aaa --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id_diff_target.json @@ -0,0 +1,3 @@ +{"_id": {"key1": "value1", "key2": "value2"}, "name": "Alice", "age": 25.0, "siblings": 1.0} +{"_id": {"key1": "value3", "key2": "value4"}, "name": "Bob", "age": 31.0, "siblings": 2.0} +{"_id": {"key1": "value5", "key2": "value6"}, "name": "Charlie", "age": 35.0, "siblings": 3.0} \ No newline at end of file diff --git a/migration/data-differ/test-scripts/dict_id_source.json b/migration/data-differ/test-scripts/dict_id_source.json new file mode 100644 index 0000000..9147d6b --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id_source.json @@ -0,0 +1,3 @@ +{"_id": {"key1": "value1", "key2": "value2"}, "name": "Alice", "age": 25.0, "siblings": 
1.0} +{"_id": {"key1": "value3", "key2": "value4"}, "name": "Bob", "age": 30.0, "siblings": 2.0} +{"_id": {"key1": "value5", "key2": "value6"}, "name": "Charlie", "age": 35.0, "siblings": 3.0} \ No newline at end of file diff --git a/migration/data-differ/test-scripts/dict_id_target.json b/migration/data-differ/test-scripts/dict_id_target.json new file mode 100644 index 0000000..9147d6b --- /dev/null +++ b/migration/data-differ/test-scripts/dict_id_target.json @@ -0,0 +1,3 @@ +{"_id": {"key1": "value1", "key2": "value2"}, "name": "Alice", "age": 25.0, "siblings": 1.0} +{"_id": {"key1": "value3", "key2": "value4"}, "name": "Bob", "age": 30.0, "siblings": 2.0} +{"_id": {"key1": "value5", "key2": "value6"}, "name": "Charlie", "age": 35.0, "siblings": 3.0} \ No newline at end of file diff --git a/migration/data-differ/test-scripts/everything_same.bash b/migration/data-differ/test-scripts/everything_same.bash index 55fadde..ed473bf 100755 --- a/migration/data-differ/test-scripts/everything_same.bash +++ b/migration/data-differ/test-scripts/everything_same.bash @@ -1,17 +1,9 @@ #!/bin/bash -mongoimport $SOURCE_URI -d $SOURCE_DB -c $SOURCE_COLL everything_same.json -mongoimport $TARGET_URI -d $TARGET_DB -c $TARGET_COLL everything_same.json +mongoimport --uri="$SOURCE_URI" --db="$SOURCE_DB" --collection="$SOURCE_COLL" --file=everything_same.json +mongoimport --uri="$TARGET_URI" --db="$TARGET_DB" --collection="$TARGET_COLL" --file=everything_same.json -python3 ../data-differ.py --source-uri $SOURCE_URI --target-uri $TARGET_URI --source-namespace "$SOURCE_DB.$SOURCE_COLL" --target-namespace "$TARGET_DB.$TARGET_COLL" --percent 100 - -mongosh $SOURCE_URI <.txt```. Primary index definitions are not included since all Amazon DocumentDB collections have a default primary index on ```_id```. If the bucket does not have any indexes defined, the ```indexes-``` file will be empty.
+ +## Prerequisites +The [cbstats tool](https://docs.couchbase.com/server/current/cli/cbstats-intro.html) must be deployed and be able to connect to your Couchbase cluster. + +## Requirements +Python 3.9 or later + +## Installation +Clone the repository and go to the Discovery Tool for Couchbase folder: +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/migration/discovery-tool-for-couchbase/ +``` + +## Usage/Examples +The script has the following arguments: +``` +--username -> Couchbase cluster username +--password -> Couchbase cluster password +--data_node -> Couchbase data node IP address or DNS name +--admin_port -> administration REST port, default: 8091 +--kv_zoom -> get K/V operation statistics for specified interval, default: month +--tools_path -> full path to cbtools, default: /opt/couchbase/bin +--index_metrics -> gather query & index information, default: false +--indexer_port -> indexer service HTTP REST port, default: 9102 +--n1ql_start -> number of milliseconds prior at which to start sampling: -7200000 +--n1ql_step -> sample interval over the sample period, in milliseconds, default: 100 +``` + +### Example: +``` +python3 discover.py --username xxx --password xxx --data_node "http://10.0.130.123" --admin_port 8091 --kv_zoom week --tools_path "/opt/couchbase/bin" --index_metrics true --indexer_port 9102 --n1ql_start -7200000 --n1ql_step 1000 +``` + +In this example, the ```beer-sample``` and ```travel-sample``` buckets have been loaded and there is a ```pillowfight``` bucket being used for [cbc-pillowfight](https://docs.couchbase.com/sdk-api/couchbase-c-client/md_doc_2cbc-pillowfight.html) and [n1qlback](https://docs.couchbase.com/sdk-api/couchbase-c-client/md_doc_2cbc-n1qlback.html) load testing.
+ +The tool generates the following output while executing: +``` +found data nodes ['10.0.129.165', '10.0.130.123', '10.0.133.73'] +found buckets ['beer-sample', 'pillowfight', 'travel-sample'] + +getting collection stats... +found collection beer-sample._default._default +found collection pillowfight._default._default +found collection travel-sample.inventory.airport +found collection travel-sample.inventory.airline +found collection travel-sample.inventory.route +found collection travel-sample.inventory.landmark +found collection travel-sample.inventory.hotel +found collection travel-sample.tenant_agent_00.users +found collection travel-sample.tenant_agent_00.bookings +found collection travel-sample.tenant_agent_01.users +found collection travel-sample.tenant_agent_01.bookings +found collection travel-sample.tenant_agent_02.bookings +found collection travel-sample.tenant_agent_02.users +found collection travel-sample.tenant_agent_03.users +found collection travel-sample.tenant_agent_03.bookings +found collection travel-sample.tenant_agent_04.bookings +found collection travel-sample.tenant_agent_04.users +found collection travel-sample._default._default + +getting K/V stats... + +getting KV stats for last week for bucket beer-sample... +cmd_get: 0 +cmd_set: 0 +delete_hits: 0 + +getting KV stats for last week for bucket pillowfight... +cmd_get: 397 +cmd_set: 549 +delete_hits: 217 + +getting KV stats for last week for bucket travel-sample... +cmd_get: 0 +cmd_set: 0 +delete_hits: 0 + +found index nodes ['10.0.132.125', '10.0.150.144'] + +getting index definitions... +found 0 indexes in bucket beer-sample +found 0 indexes in bucket pillowfight +found 17 indexes in bucket travel-sample + +getting index stats for bucket beer-sample +getting index stats for bucket pillowfight +getting index stats for bucket travel-sample + +getting N1QL stats every 1000 ms for -7200000 ms... 
+n1ql_selects: 0 +n1ql_deletes: 1 +n1ql_inserts: 1 +``` + +The output files contain the following information: +### collection-stats.csv +``` +bucket,bucket_type,scope_name,collection_name,total_size,total_items,document_size +beer-sample,membase,_default,_default,2796956,7303,383 +pillowfight,membase,_default,_default,1901907730,1000005,1902 +travel-sample,membase,inventory,airport,547914,1968,279 +travel-sample,membase,inventory,airline,117261,187,628 +travel-sample,membase,inventory,route,13402503,24024,558 +travel-sample,membase,inventory,landmark,3072746,4495,684 +travel-sample,membase,inventory,hotel,4086989,917,4457 +travel-sample,membase,tenant_agent_00,users,88173,2,44087 +travel-sample,membase,tenant_agent_00,bookings,87040,0,0 +travel-sample,membase,tenant_agent_01,users,93163,11,8470 +travel-sample,membase,tenant_agent_01,bookings,89088,0,0 +travel-sample,membase,tenant_agent_02,bookings,89088,0,0 +travel-sample,membase,tenant_agent_02,users,98134,20,4907 +travel-sample,membase,tenant_agent_03,users,105583,33,3200 +travel-sample,membase,tenant_agent_03,bookings,89088,0,0 +travel-sample,membase,tenant_agent_04,bookings,87040,0,0 +travel-sample,membase,tenant_agent_04,users,107629,40,2691 +travel-sample,membase,_default,_default,20780949,31591,658 +``` + +### kv-stats.csv +``` +bucket,gets,sets,deletes +beer-sample,0,0,0 +pillowfight,398,548,217 +travel-sample,0,0,0 +``` + +### n1ql-stats.csv +``` +selects,deletes,inserts +0,121,79 +``` + +### index-stats.csv +``` +bucket,scope,collection,index-name,index-size +beer-sample,_default,_default,beer_primary,479061 +travel-sample,_default,_default,def_airportname,389408 +travel-sample,_default,_default,def_city,1029476 +travel-sample,_default,_default,def_faa,367120 +travel-sample,_default,_default,def_icao,387678 +travel-sample,_default,_default,def_name_type,79948 +travel-sample,_default,_default,def_primary,1140554 +travel-sample,_default,_default,def_route_src_dst_day,16235078 
+travel-sample,_default,_default,def_schedule_utc,13864561 +travel-sample,_default,_default,def_sourceairport,2429464 +travel-sample,_default,_default,def_type,3628526 +travel-sample,inventory,airline,def_inventory_airline_primary,198473 +travel-sample,inventory,airport,def_inventory_airport_airportname,515968 +travel-sample,inventory,airport,def_inventory_airport_city,489507 +travel-sample,inventory,airport,def_inventory_airport_faa,529491 +travel-sample,inventory,airport,def_inventory_airport_primary,288326 +travel-sample,inventory,hotel,def_inventory_hotel_city,498513 +travel-sample,inventory,hotel,def_inventory_hotel_primary,227093 +travel-sample,inventory,landmark,def_inventory_landmark_city,957396 +travel-sample,inventory,landmark,def_inventory_landmark_primary,365002 +travel-sample,inventory,route,def_inventory_route_primary,832154 +travel-sample,inventory,route,def_inventory_route_route_src_dst_day,13978936 +travel-sample,inventory,route,def_inventory_route_schedule_utc,13461388 +travel-sample,inventory,route,def_inventory_route_sourceairport,2405883 +``` + +In this example, only the ```travel-sample``` bucket has indexes. 
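The per-index sizes in index-stats.csv can be rolled up per bucket with a few lines of Python when sizing a target cluster. This is a minimal sketch, not part of the tool: the inline sample is abbreviated from the rows above, and in practice you would pass `open('index-stats.csv')` instead of the embedded string.

```python
import csv
import io
from collections import defaultdict

# Abbreviated sample of the index-stats.csv produced by discover.py;
# swap io.StringIO(SAMPLE) for open('index-stats.csv') to process a real run.
SAMPLE = """bucket,scope,collection,index-name,index-size
beer-sample,_default,_default,beer_primary,479061
travel-sample,_default,_default,def_airportname,389408
travel-sample,inventory,route,def_inventory_route_sourceairport,2405883
"""

def index_size_per_bucket(csv_file):
    # sum the index-size column for every row, keyed by bucket name
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row['bucket']] += int(row['index-size'])
    return dict(totals)

totals = index_size_per_bucket(io.StringIO(SAMPLE))
for bucket, size in sorted(totals.items()):
    print(f"{bucket}: {size} bytes of index data")
```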
+### indexes-travel-sample.txt +``` +CREATE INDEX `def_airportname` ON `travel-sample`(`airportname`) +CREATE INDEX `def_city` ON `travel-sample`(`city`) +CREATE INDEX `def_faa` ON `travel-sample`(`faa`) +CREATE INDEX `def_icao` ON `travel-sample`(`icao`) +CREATE INDEX `def_inventory_airport_airportname` ON `travel-sample`.`inventory`.`airport`(`airportname`) +CREATE INDEX `def_inventory_airport_city` ON `travel-sample`.`inventory`.`airport`(`city`) +CREATE INDEX `def_inventory_airport_faa` ON `travel-sample`.`inventory`.`airport`(`faa`) +CREATE INDEX `def_inventory_hotel_city` ON `travel-sample`.`inventory`.`hotel`(`city`) +CREATE INDEX `def_inventory_landmark_city` ON `travel-sample`.`inventory`.`landmark`(`city`) +CREATE INDEX `def_inventory_route_route_src_dst_day` ON `travel-sample`.`inventory`.`route`(`sourceairport`,`destinationairport`,(distinct (array (`v`.`day`) for `v` in `schedule` end))) +CREATE INDEX `def_inventory_route_schedule_utc` ON `travel-sample`.`inventory`.`route`(array (`s`.`utc`) for `s` in `schedule` end) +CREATE INDEX `def_inventory_route_sourceairport` ON `travel-sample`.`inventory`.`route`(`sourceairport`) +CREATE INDEX `def_name_type` ON `travel-sample`(`name`) WHERE (`_type` = "User") +CREATE INDEX `def_route_src_dst_day` ON `travel-sample`(`sourceairport`,`destinationairport`,(distinct (array (`v`.`day`) for `v` in `schedule` end))) WHERE (`type` = "route") +CREATE INDEX `def_schedule_utc` ON `travel-sample`(array (`s`.`utc`) for `s` in `schedule` end) +CREATE INDEX `def_sourceairport` ON `travel-sample`(`sourceairport`) +CREATE INDEX `def_type` ON `travel-sample`(`type`) +``` + +## Contributing +Contributions are always welcome! See the [contributing page](https://github.com/awslabs/amazon-documentdb-tools/blob/master/CONTRIBUTING.md) for ways to get involved. 
+ +## License +Apache 2.0 \ No newline at end of file diff --git a/migration/discovery-tool-for-couchbase/discover.py b/migration/discovery-tool-for-couchbase/discover.py new file mode 100644 index 0000000..623605b --- /dev/null +++ b/migration/discovery-tool-for-couchbase/discover.py @@ -0,0 +1,454 @@ +import argparse +import csv +import json +import math +import requests +import statistics +import subprocess + + +# helper function to send an HTTP request to a REST endpoint +def send_request(node, + port, + rest_endpoint, + username, + password, + check_status_code=True, + params={}): + try: + url = f'{node}:{port}{rest_endpoint}' + response = requests.get(url, params=params, auth=(username, password)) + except Exception as e: + raise Exception(f'send_request: error {e} when calling {url}') + + if (check_status_code == True and response.status_code != 200): + raise Exception( + f'send_request: error {response.status_code} when calling {url}') + + return response + + +# get host name/IP address of all data nodes in the cluster +# See https://docs.couchbase.com/server/current/rest-api/rest-node-get-info.html for more information +def get_data_nodes(app_config): + response = send_request(app_config["data_node"], + app_config["admin_port"], + '/pools/nodes', + app_config["username"], + app_config["password"]) + return [ + node['hostname'].split(':')[0] for node in response.json()['nodes'] + if 'kv' in node['services'] + ] + + +# get host name/IP address of all index nodes in the cluster +# See https://docs.couchbase.com/server/current/rest-api/rest-node-get-info.html for more information +def get_index_nodes(app_config): + response = send_request(app_config["data_node"], + app_config["admin_port"], + '/pools/nodes', + app_config["username"], + app_config["password"]) + return [ + node['hostname'].split(':')[0] for node in response.json()['nodes'] + if 'index' in node['services'] + ] + + +# get the name of all buckets in the cluster +# See 
https://docs.couchbase.com/server/current/rest-api/rest-buckets-summary.html for more information +def get_buckets(app_config): + response = send_request(app_config["data_node"], + app_config["admin_port"], + '/pools/default/buckets/', + app_config["username"], + app_config["password"]) + return [bucket['name'] for bucket in response.json()] + + +# write the following details for all collections in the cluster to collection-stats.csv: +# bucket name +# bucket type +# scope name +# collection name +# total size +# total items +# average document size +# See the following for more information: +# https://docs.couchbase.com/server/current/rest-api/rest-bucket-intro.html +# https://docs.couchbase.com/server/current/cli/cbstats/cbstats-collections.html +def get_collection_stats(app_config, buckets, data_nodes): + print("\ngetting collection stats...") + + with open('collection-stats.csv', 'w', newline='') as f: + writer = csv.writer(f) + writer.writerow(['bucket','bucket_type', 'scope_name', 'collection_name', 'total_size', 'total_items', 'document_size']) + for bucket in buckets: + # Get bucket type + response = send_request(app_config["data_node"], + app_config["admin_port"], + f'/pools/default/buckets/{bucket}/', + app_config["username"], + app_config["password"]) + bucket_type = response.json()['bucketType'] + + # Get collections + response = send_request(app_config["data_node"], + app_config["admin_port"], + f'/pools/default/buckets/{bucket}/scopes/', + app_config["username"], app_config["password"]) + scopes_data = response.json()['scopes'] + + for scope in scopes_data: + scope_name = scope['name'] + + for collection in scope['collections']: + collection_name = collection['name'] + print(f"found collection {bucket}.{scope_name}.{collection_name}") + + collection_uid = collection['uid'] + total_items = 0 + total_size = 0 + + # Get stats from each data node + for node in data_nodes: + cmd = [ + f'{app_config["tools_path"]}/cbstats', + node, '-u', + 
app_config["username"], '-p', + app_config["password"], '-b', + bucket, + 'collections', + 'id', + f'0x{collection_uid}' + ] + + try: + output = subprocess.check_output(cmd, text=True) + + # Parse size and items from output + for line in output.splitlines(): + if 'data_size:' in line: + size = int(line.split()[1]) + total_size += size + elif 'items:' in line: + items = int(line.split()[1]) + total_items += items + + except subprocess.CalledProcessError as e: + print(f'get_collection_stats: caught exception {e}') + raise e + + # Calculate document size + document_size = math.ceil(total_size / total_items) if total_items > 0 else 0 + + # Write to CSV + writer.writerow([bucket, bucket_type, scope_name, collection_name, total_size, total_items, document_size]) + + +# helper function to get specified KV metric (cmd_get, cmd_set, delete_hits) for specified bucket +# See https://docs.couchbase.com/server/current/rest-api/rest-bucket-stats.html for more information. +def get_kv_metric(metric, bucket, app_config): + result = 0 + params = {'zoom': app_config["kv_zoom"]} + + response = send_request(app_config["data_node"], + app_config["admin_port"], + f'/pools/default/buckets/{bucket}/stats', + app_config["username"], + app_config["password"], + params=params) + data = response.json() + + # Extract values and convert to numbers + samples = data['op']['samples'][metric] + + # Calculate average and round up + if samples: + average = statistics.mean(samples) + result = math.ceil(average) + print(f"{metric}: {result}") + + return result + + +# helper function to get specified N1QL metric (n1ql_selects, n1ql_deletes, n1ql_inserts) +# See https://docs.couchbase.com/server/current/rest-api/rest-bucket-stats.html for more information. 
+def get_n1ql_metric(metric, app_config): + result = 0 + response = send_request(app_config["data_node"], + app_config["admin_port"], + f'/pools/default/stats/range/{metric}/irate?start={app_config["n1ql_start"]}&step={app_config["n1ql_step"]}', + app_config["username"], + app_config["password"]) + data = response.json() + + if data['data'] == []: + # Metrics could not be retrieved + raise Exception(f'get_n1ql_metric: {data["errors"][0]["error"]}') + + # Extract values and convert to numbers + values = [ + float(val[1]) for arr in data['data'] for val in arr['values'] + if val[1] is not None + ] + + # Calculate average and round up + if values: + average = statistics.mean(values) + result = math.ceil(average) + print(f"{metric}: {result}") + + return result + + +# write the following KV operation details for all buckets in the cluster to kv-stats.csv: +# bucket name +# gets/second +# sets/second +# deletes/second +def get_kv_metrics(app_config, buckets): + print("\ngetting K/V stats...") + + # Initialize CSV file with header + with open('kv-stats.csv', 'w', newline='') as f: + writer = csv.writer(f) + writer.writerow(['bucket', 'gets', 'sets', 'deletes']) + + for bucket in buckets: + print(f"\ngetting KV stats for last {app_config['kv_zoom']} for bucket {bucket}...") + gets = get_kv_metric('cmd_get', bucket, app_config) + sets = get_kv_metric('cmd_set', bucket, app_config) + deletes = get_kv_metric('delete_hits', bucket, app_config) + writer.writerow([bucket, gets, sets, deletes]) + + +# write the following N1QL query details to n1ql-stats.csv: +# selects/second +# deletes/second +# inserts/second +def get_n1ql_metrics(app_config): + print(f"\ngetting N1QL stats every {app_config['n1ql_step']} ms for {app_config['n1ql_start']} ms...") + + # Initialize CSV file with header + with open('n1ql-stats.csv', 'w', newline='') as f: + writer = csv.writer(f) + writer.writerow(['selects', 'deletes', 'inserts']) + + selects = get_n1ql_metric('n1ql_selects', app_config) + 
deletes = get_n1ql_metric('n1ql_deletes', app_config) + inserts = get_n1ql_metric('n1ql_inserts', app_config) + writer.writerow([selects, deletes, inserts]) + + +# write index definitions to indexes-<bucket>.txt +# primary index definitions are not included since all Amazon DocumentDB collections have a default primary index on _id +def get_index_definitions(app_config, buckets, index_node): + print("\ngetting index definitions...") + + for bucket in buckets: + # Write filtered index definitions to file + filename = f"indexes-{bucket}.txt" + try: + # Get final index definitions from specific node + response = send_request(f'http://{index_node}', + app_config["indexer_port"], + '/getIndexStatement', + app_config["username"], + app_config["password"]) + index_statements = response.json() + + # Filter and write to file + index_count = 0 + with open(filename, 'w') as f: + for stmt in index_statements: + # skip primary index definition + if f'`{bucket}`' in stmt and 'CREATE PRIMARY INDEX' not in stmt: + index_count += 1 + f.write(f"{stmt.split(' WITH')[0] if ' WITH' in stmt else stmt}\n") + + print(f"found {index_count} indexes in bucket {bucket}") + + except requests.RequestException as e: + print(f"Error retrieving index definitions for bucket {bucket}: {e}") + except IOError as e: + print(f"Error writing index definitions to {filename}: {e}") + + +# write index statistics to index-stats.csv: +# bucket name +# scope name +# collection name +# index name +# index size +# See https://docs.couchbase.com/server/current/index-rest-stats/index.html for more information +def get_index_stats(app_config, buckets, index_node): + print("\n") + + # Initialize CSV file with header + with open('index-stats.csv', 'w', newline='') as f: + writer = csv.writer(f) + writer.writerow( + ['bucket', 'scope', 'collection', 'index-name', 'index-size']) + + for bucket in buckets: + print(f"getting index stats for bucket {bucket}") + + # Get index stats + try: + response = 
send_request(f'http://{index_node}', + app_config["indexer_port"], + f'/api/v1/stats/{bucket}', + app_config["username"], + app_config["password"], + check_status_code=False) + + # 404 will be returned if there are no indexes on the specified bucket + if (response.status_code != 404): + # there are indexes on the specified bucket + stats = response.json() + with open('index-stats.csv', 'a', newline='') as f: + writer = csv.writer(f) + + for key, value in stats.items(): + parts = key.split(':') + + if len(parts) == 2: + # Format: bucket:index + writer.writerow([ + parts[0], # bucket + '_default', # scope + '_default', # collection + parts[1], # index-name + value['data_size'] # index-size + ]) + elif len(parts) == 3: + # Format: bucket:scope:index + writer.writerow([ + parts[0], # bucket + parts[1], # scope + '_default', # collection + parts[2], # index-name + value['data_size'] # index-size + ]) + else: + # Format: bucket:scope:collection:index + writer.writerow([ + parts[0], # bucket + parts[1], # scope + parts[2], # collection + parts[3], # index-name + value['data_size'] # index-size + ]) + + except requests.RequestException as e: + print(f"Error getting index stats for bucket {bucket}: {e}") + except IOError as e: + print(f"Error writing to CSV file: {e}") + + +def main(): + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument('--username', + required=True, + type=str, + help='Couchbase cluster username') + + parser.add_argument('--password', + required=True, + type=str, + help='Couchbase cluster password') + + parser.add_argument('--data_node', + required=True, + type=str, + help='Couchbase data node IP address or DNS name') + + parser.add_argument('--admin_port', + required=False, + type=str, + default="8091", + help='administration REST port') + + parser.add_argument('--kv_zoom', + required=False, + type=str, + default="month", + help='get bucket statistics for the specified interval') + + 
parser.add_argument('--tools_path', + required=False, + type=str, + default="/opt/couchbase/bin", + help='full path to Couchbase tools') + + parser.add_argument('--index_metrics', + required=False, + type=str, + default="false", + help='gather index definitions and N1QL metrics (true/false)') + + parser.add_argument('--indexer_port', + required=False, + type=str, + default="9102", + help='indexer service HTTP REST port') + + parser.add_argument('--n1ql_start', + required=False, + type=str, + default="-60000", + help='number of milliseconds prior to now at which to start sampling (negative value)' + ) + + parser.add_argument('--n1ql_step', + required=False, + type=str, + default="100", + help='sample interval over the sample period, in milliseconds') + + args = parser.parse_args() + app_config = {} + app_config['username'] = args.username + app_config['password'] = args.password + app_config['data_node'] = args.data_node + app_config['admin_port'] = args.admin_port + app_config['kv_zoom'] = args.kv_zoom + app_config['tools_path'] = args.tools_path + app_config['index_metrics'] = args.index_metrics.lower() == 'true' + app_config['indexer_port'] = args.indexer_port + app_config['n1ql_start'] = args.n1ql_start + app_config['n1ql_step'] = args.n1ql_step + + # get all information about the Couchbase cluster + try: + data_nodes = get_data_nodes(app_config) + print(f"found data nodes {data_nodes}") + + buckets = get_buckets(app_config) + print(f"found buckets {buckets}") + + get_collection_stats(app_config, buckets, data_nodes) + + get_kv_metrics(app_config, buckets) + + if app_config["index_metrics"]: + index_nodes = get_index_nodes(app_config) + print(f"found index nodes {index_nodes}") + + if len(index_nodes) > 0: + get_index_definitions(app_config, buckets, index_nodes[0]) + get_index_stats(app_config, buckets, index_nodes[0]) + get_n1ql_metrics(app_config) + else: + print("no index nodes exist in cluster, cannot gather index definitions and N1QL metrics.") + + except Exception as e: 
+ print(f'{e}') + + +if __name__ == "__main__": + main() diff --git a/migration/dms-segments/dms-segments.py b/migration/dms-segments/dms-segments.py index 48b42ed..7935ba6 100644 --- a/migration/dms-segments/dms-segments.py +++ b/migration/dms-segments/dms-segments.py @@ -5,16 +5,21 @@ import time import os import argparse +import warnings + + +supportedIdTypes=['int','string','objectId'] def via_skips(appConfig): # get boundaries by performing large server-side skips + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") boundaryList = [] numBoundaries = appConfig['numSegments'] - 1 - client = pymongo.MongoClient(appConfig['uri']) + client = pymongo.MongoClient(host=appConfig['uri'],appname='segmentr') db = client[appConfig['database']] col = db[appConfig['collection']] @@ -64,10 +69,12 @@ def via_skips(appConfig): def via_cursor(appConfig): # get by walking the _id index + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + numBoundaries = appConfig['numSegments'] - 1 boundaryList = [] - client = pymongo.MongoClient(appConfig['uri']) + client = pymongo.MongoClient(host=appConfig['uri'],appname='segmentr') db = client[appConfig['database']] col = db[appConfig['collection']] @@ -133,6 +140,40 @@ def via_cursor(appConfig): client.close() +def check_for_mixed_types(appConfig): + # grab the first document and last document as ordered by _id, check for unsupported or differing data types + returnValue = True + + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + global supportedIdTypes + + client = pymongo.MongoClient(host=appConfig['uri']) + db = client[appConfig['database']] + col = db[appConfig['collection']] + + idTypeFirst = col.aggregate([{"$sort":{"_id":pymongo.ASCENDING}},{"$project":{"_id":False,"idType":{"$type":"$_id"}}},{"$limit":1}]).next()['idType'] + idTypeLast = 
col.aggregate([{"$sort":{"_id":pymongo.DESCENDING}},{"$project":{"_id":False,"idType":{"$type":"$_id"}}},{"$limit":1}]).next()['idType'] + + if idTypeFirst not in supportedIdTypes: + # unsupported data type + print("Unsupported data type of '{}' for first _id value in {}.{} - only {} types are supported, stopping".format(idTypeFirst,appConfig['database'],appConfig['collection'],supportedIdTypes)) + returnValue = False + + if idTypeLast not in supportedIdTypes: + # unsupported data type + print("Unsupported data type of '{}' for last _id value in {}.{} - only {} types are supported, stopping".format(idTypeLast,appConfig['database'],appConfig['collection'],supportedIdTypes)) + returnValue = False + + if idTypeFirst != idTypeLast: + # mixed data types + print("Mixed data types of '{}' and '{}' for first and last _id values in {}.{}, stopping".format(idTypeFirst,idTypeLast,appConfig['database'],appConfig['collection'])) + returnValue = False + + client.close() + + return returnValue + def main(): parser = argparse.ArgumentParser(description='DMS Segment Analysis Tool.') @@ -170,11 +211,12 @@ def main(): appConfig['collection'] = args.collection appConfig['numSegments'] = int(args.num_segments) - if args.single_cursor: - via_cursor(appConfig) + if check_for_mixed_types(appConfig): + if args.single_cursor: + via_cursor(appConfig) - else: - via_skips(appConfig) + else: + via_skips(appConfig) if __name__ == "__main__": diff --git a/migration/dms_buddy/README.md b/migration/dms_buddy/README.md new file mode 100644 index 0000000..22c023b --- /dev/null +++ b/migration/dms_buddy/README.md @@ -0,0 +1,249 @@ +# DMS Buddy + +A tool for analyzing MongoDB collections and generating AWS Database Migration Service (DMS) configuration recommendations. + +## Overview + +DMS Buddy analyzes your MongoDB collections and provides optimized configuration recommendations for AWS DMS migrations to Amazon DocumentDB. 
It helps you determine: + +- Appropriate DMS instance type based on data transfer requirements +- Required storage size based on collection size and change rate +- Optimal number of partitions for parallel full load +- Number of threads needed for CDC phase +- Other critical DMS configuration parameters + +The tool also generates a parameter file that can be directly used with the included CloudFormation template to deploy the AWS DMS resources. + +## Requirements + +- Python 3.6+ +- pymongo +- humanize +- AWS CLI (for deploying the CloudFormation template) +- AWS VPC with at least two subnets in different Availability Zones for DMS deployment +- SSL certificate imported into AWS DMS for DocumentDB connections (required for target endpoint) + +## SSL Certificate Setup for DocumentDB (Pre-requisite) + +When migrating to Amazon DocumentDB, SSL/TLS encryption is required. You need to import the DocumentDB certificate into AWS DMS before running DMS Buddy: + +1. **Download the DocumentDB Certificate**: + ```bash + wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem + ``` + +2. **Import the Certificate into DMS**: + ```bash + aws dms import-certificate \ + --certificate-identifier docdb-ca-cert \ + --certificate-pem file://global-bundle.pem + ``` + +3. **Get the Certificate ARN**: + ```bash + aws dms describe-certificates --filters Name=certificate-id,Values=docdb-ca-cert --query "Certificates[0].CertificateArn" + ``` + +4. **Use the Certificate ARN** in your configuration file or command line parameter. + +## Installation + +1. Clone or download this repository +2. 
Install the required Python packages: + +```bash +pip install -r requirements.txt +``` + +## Usage + +### Basic Usage + +```bash +python dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --collection-name-for-parallel-load mycollection +``` + +**Note**: When you specify a single collection using `--collection-name-for-parallel-load`, the tool will analyze only that collection and the migration summary will show information specific to that collection. If no collection is specified, the tool will analyze all collections in the database with 10,000+ documents. + +### Command Line Options + +| Option | Description | +|--------|-------------| +| `--source-uri` | MongoDB connection URI (required) | +| `--source-database` | Source database name to analyze (required) | +| `--collection-name-for-parallel-load` | Collection name to analyze and use for parallel load (optional - if not provided, analyzes all collections with 10K+ documents) | +| `--migration-type` | Migration type: full-load, cdc, or full-load-and-cdc (default: full-load-and-cdc) | +| `--monitor-time` | Monitoring time in minutes (default: 10) | +| `--vpc-id` | VPC ID for DMS replication instance | +| `--subnet-ids` | Subnet IDs for DMS replication instance (comma-separated) | +| `--multi-az` | Whether to use Multi-AZ for DMS replication instance (true/false) | +| `--source-host` | Source database host | +| `--source-port` | Source database port (default: 27017) | +| `--source-username` | Source database username | +| `--source-password` | Source database password | +| `--target-host` | Target database host | +| `--target-port` | Target database port (default: 27017) | +| `--target-database` | Target database name | +| `--target-username` | Target database username | +| `--target-password` | Target database password | +| `--target-certificate-arn` | Target database SSL certificate ARN for DocumentDB connections | + +### Examples + +#### Single Collection Analysis + +```bash +python 
dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --collection-name-for-parallel-load mycollection --migration-type full-load +``` + +#### All Collections Analysis + +```bash +python dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --migration-type full-load +``` + +#### CDC Migration (Includes Monitoring Period) + +```bash +python dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --collection-name-for-parallel-load mycollection --migration-type cdc +``` + +#### Custom Monitoring Time + +```bash +python dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --collection-name-for-parallel-load mycollection --monitor-time 5 +``` + +#### With Additional Parameters + +```bash +python dms_buddy.py --source-uri mongodb://localhost:27017 --source-database mydb --collection-name-for-parallel-load mycollection --vpc-id vpc-12345 --subnet-ids subnet-a,subnet-b --multi-az true +``` + +## Configuration File + +Instead of specifying all parameters on the command line, you can create a configuration file named `dms_buddy.cfg` in the same directory: + +```ini +[DMS] +VpcId = vpc-02095d845d94b21b4 +SubnetIds = subnet-xxxxx,subnet-yyyyy +MultiAZ = false +SourceDBHost = your-mongodb-host +SourceDBPort = 27017 +SourceDatabase = your-database +SourceUsername = your-username +SourcePassword = your-password +TargetHost = your-docdb-cluster-endpoint +TargetPort = 27017 +TargetDatabase = your-database +TargetUsername = your-username +TargetPassword = your-password +TargetCertificateArn = arn:aws:dms:us-east-1:123456789012:cert:your-cert-id +MigrationType = full-load-and-cdc +CollectionNameForParallelLoad = collectionname +``` + +### Configuration Parameters: + +- **VpcId**: VPC ID for DMS replication instance +- **SubnetIds**: Subnet IDs for DMS replication instance (comma-separated) +- **MultiAZ**: Whether to use Multi-AZ for DMS replication instance (true/false) +- **SourceDBHost**: Source 
database host +- **SourceDBPort**: Source database port (default: 27017) +- **SourceDatabase**: Source database name to analyze +- **SourceUsername**: Source database username +- **SourcePassword**: Source database password +- **TargetHost**: Target database host +- **TargetPort**: Target database port (default: 27017) +- **TargetDatabase**: Target database name +- **TargetUsername**: Target database username +- **TargetPassword**: Target database password +- **TargetCertificateArn**: Target database SSL certificate ARN for DocumentDB connections +- **MigrationType**: Migration type: full-load, cdc, or full-load-and-cdc (default: full-load-and-cdc) +- **CollectionNameForParallelLoad**: Collection name to analyze and use for parallel load (leave empty string to analyze all collections with 10K+ documents) + +**Note**: To analyze all collections in the database, set `CollectionNameForParallelLoad` to an empty string. + +Command line arguments take precedence over configuration file values. + +## How It Works + +1. **Collection Analysis**: The tool connects to your MongoDB instance and retrieves statistics about the specified collection. + +2. **Operations Monitoring**: For CDC migrations, it monitors database operations for a specified period (default: 10 minutes) to determine the rate of change. + +3. **Recommendations Calculation**: + - **Instance Type**: Based on bandwidth requirements calculated from document size and partitions + - **Storage Size**: Based on collection size and daily change rate, rounded to nearest 100GB + - **Partitions**: Based on document count, optimized for parallel processing + - **Parallel Apply Threads**: Based on operations per second for CDC + +4. **Parameter Generation**: Creates a `parameter.json` file with all the calculated and provided parameters. 
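As an illustration of the storage-size rule above (collection size plus change rate, rounded to the nearest 100GB), the idea can be sketched as follows. This is a hypothetical example only; the numbers and the `recommended_storage_gb` function are illustrative, and the tool's actual formula lives in dms_buddy.py:

```python
import math

def recommended_storage_gb(collection_size_gb, daily_change_gb):
    """Illustrative only: provision for the data plus one day of changes,
    rounded up to the nearest 100 GB (with a 100 GB floor)."""
    raw = collection_size_gb + daily_change_gb
    return max(100, math.ceil(raw / 100) * 100)

# e.g. a 230 GB collection changing ~25 GB/day needs 300 GB of DMS storage
print(recommended_storage_gb(230, 25))
```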
+ +## Deploying with CloudFormation + +After running DMS Buddy to generate the `parameter.json` file, you can deploy the AWS DMS resources using the included CloudFormation template: + +```bash +# Deploy the CloudFormation stack +aws cloudformation create-stack \ + --stack-name mongodb-to-docdb-migration \ + --template-body file://dms_buddy.cfn \ + --parameters file://parameter.json \ + --capabilities CAPABILITY_IAM + +# Check the stack creation status +aws cloudformation describe-stacks --stack-name mongodb-to-docdb-migration + +# Update security groups to allow DMS access +# Get the DMS security group ID from the stack outputs +DMS_SG_ID=$(aws cloudformation describe-stacks --stack-name mongodb-to-docdb-migration --query "Stacks[0].Outputs[?OutputKey=='DMSSecurityGroupId'].OutputValue" --output text) + +# Update source MongoDB cluster security group to allow inbound TCP access from DMS +aws ec2 authorize-security-group-ingress \ + --group-id <source-mongodb-security-group-id> \ + --protocol tcp \ + --port 27017 \ + --source-group $DMS_SG_ID + +# Update target DocumentDB cluster security group to allow inbound TCP access from DMS +aws ec2 authorize-security-group-ingress \ + --group-id <target-docdb-security-group-id> \ + --protocol tcp \ + --port 27017 \ + --source-group $DMS_SG_ID + +# Once the stack is created and security groups are updated, you can start the DMS task +aws dms start-replication-task \ + --replication-task-arn $(aws cloudformation describe-stacks --stack-name mongodb-to-docdb-migration --query "Stacks[0].Outputs[?OutputKey=='ReplicationTaskARN'].OutputValue" --output text) \ + --start-replication-task-type start-replication +``` + + +## Troubleshooting + +### Connection Issues + +If you encounter connection issues: +- Verify that the MongoDB URI is correct +- Ensure that the MongoDB server is running and accessible +- Check that the specified database and collection exist +- Verify that the user has appropriate permissions + +### CloudFormation Deployment Issues + +If you encounter issues deploying the 
CloudFormation template: +- Check the stack events for detailed error messages: + ```bash + aws cloudformation describe-stack-events --stack-name mongodb-to-docdb-migration + ``` +- Verify that all required parameters are present in the parameter.json file +- Ensure that the VPC and subnet IDs are valid +- **Important**: Make sure you provide at least two subnet IDs in different Availability Zones +- Check that you have the necessary permissions to create the resources + + +## License + +This project is licensed under the MIT License - see the LICENSE file for details. diff --git a/migration/dms_buddy/dms_buddy.cfg b/migration/dms_buddy/dms_buddy.cfg new file mode 100644 index 0000000..d15d520 --- /dev/null +++ b/migration/dms_buddy/dms_buddy.cfg @@ -0,0 +1,17 @@ +[DMS] +VpcId = +SubnetIds = +MultiAZ = +SourceDBHost = +SourceDBPort = +SourceDatabase = +SourceUsername = +SourcePassword = +TargetHost = +TargetPort = +TargetDatabase = +TargetUsername = +TargetPassword = +MigrationType = +CollectionNameForParallelLoad = +TargetCertificateArn = \ No newline at end of file diff --git a/migration/dms_buddy/dms_buddy.cfn b/migration/dms_buddy/dms_buddy.cfn new file mode 100644 index 0000000..2ecac56 --- /dev/null +++ b/migration/dms_buddy/dms_buddy.cfn @@ -0,0 +1,481 @@ +{ + "AWSTemplateFormatVersion": "2010-09-09", + "Description": "AWS DMS resources for MongoDB to DocumentDB migration", + "Parameters": { + "VpcId": { + "Type": "AWS::EC2::VPC::Id", + "Description": "VPC where DMS resources will be deployed" + }, + "SubnetIds": { + "Type": "List<AWS::EC2::Subnet::Id>", + "Description": "List of subnet IDs for DMS replication instance" + }, + "ReplicationInstanceClass": { + "Type": "String", + "Default": "dms.t3.medium", + "AllowedValues": [ + "dms.t3.micro", + "dms.t3.small", + "dms.t3.medium", + "dms.t3.large", + "dms.r5.large", + "dms.r5.xlarge", + "dms.r5.2xlarge", + "dms.r5.4xlarge", + "dms.r5.8xlarge" + ], + "Description": "The compute and memory capacity of the replication instance" + }, 
+ "AllocatedStorage": { + "Type": "Number", + "Default": 50, + "MinValue": 5, + "MaxValue": 6144, + "Description": "The amount of storage (in gigabytes) to be allocated for the replication instance" + }, + "MultiAZ": { + "Type": "String", + "Default": "false", + "AllowedValues": [ + "true", + "false" + ], + "Description": "Specify if the replication instance is Multi-AZ" + }, + "SourceDBHost": { + "Type": "String", + "Description": "Source database host address" + }, + "SourceDBPort": { + "Type": "Number", + "Default": 27017, + "Description": "Source database port number" + }, + "SourceDatabase": { + "Type": "String", + "Description": "Source database name" + }, + "SourceUsername": { + "Type": "String", + "Description": "Source database username", + "NoEcho": true + }, + "SourcePassword": { + "Type": "String", + "Description": "Source database password", + "NoEcho": true + }, + "TargetHost": { + "Type": "String", + "Description": "Target database cluster endpoint" + }, + "TargetPort": { + "Type": "Number", + "Default": 27017, + "Description": "Target database port number" + }, + "TargetDatabase": { + "Type": "String", + "Description": "Target database name" + }, + "TargetUsername": { + "Type": "String", + "Description": "Target database username", + "NoEcho": true + }, + "TargetPassword": { + "Type": "String", + "Description": "Target database password", + "NoEcho": true + }, + "MigrationType": { + "Type": "String", + "Default": "full-load-and-cdc", + "AllowedValues": [ + "full-load", + "cdc", + "full-load-and-cdc" + ], + "Description": "The migration type" + }, + "MaxFullLoadSubTasks": { + "Type": "Number", + "Default": 8, + "Description": "Parallelization factor for the full load migration type (only provided when > 8)" + }, + "NumberOfPartitions": { + "Type": "Number", + "Description": "Number of partitions for parallel full load" + }, + "ParallelApplyThreads": { + "Type": "Number", + "Description": "Parallelization factor for CDC migration type" + }, + 
"CollectionNameForParallelLoad": { + "Type": "String", + "Description": "Collection name for parallel load" + }, + "EngineVersion": { + "Type": "String", + "Default": "3.5.4", + "Description": "AWS DMS engine version (latest available)" + }, + "TargetCertificateArn": { + "Type": "String", + "Description": "SSL certificate ARN for DocumentDB target endpoint connection" + }, + "TableSettings": { + "Type": "String", + "Description": "JSON string containing table settings configuration for parallel load" + } + }, + "Resources": { + "DMSSubnetGroup": { + "Type": "AWS::DMS::ReplicationSubnetGroup", + "Properties": { + "ReplicationSubnetGroupIdentifier": { + "Fn::Sub": "${AWS::StackName}-subnet-group" + }, + "ReplicationSubnetGroupDescription": "Subnet group for DMS replication instance", + "SubnetIds": { + "Ref": "SubnetIds" + } + } + }, + "DMSSecurityGroup": { + "Type": "AWS::EC2::SecurityGroup", + "Properties": { + "GroupDescription": "Security group for DMS replication instance", + "VpcId": { + "Ref": "VpcId" + }, + "SecurityGroupIngress": [ + { + "IpProtocol": "tcp", + "FromPort": 27017, + "ToPort": 27017, + "CidrIp": "0.0.0.0/0" + } + ], + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${AWS::StackName}-dms-sg" + } + } + ] + } + }, + "DMSReplicationInstance": { + "Type": "AWS::DMS::ReplicationInstance", + "Properties": { + "ReplicationInstanceIdentifier": { + "Fn::Sub": "${AWS::StackName}-replication-instance" + }, + "ReplicationInstanceClass": { + "Ref": "ReplicationInstanceClass" + }, + "AllocatedStorage": { + "Ref": "AllocatedStorage" + }, + "MultiAZ": { + "Ref": "MultiAZ" + }, + "EngineVersion": { + "Ref": "EngineVersion" + }, + "ReplicationSubnetGroupIdentifier": { + "Ref": "DMSSubnetGroup" + }, + "VpcSecurityGroupIds": [ + { + "Ref": "DMSSecurityGroup" + } + ], + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${AWS::StackName}-replication-instance" + } + } + ] + } + }, + "DMSSourceEndpoint": { + "Type": "AWS::DMS::Endpoint", + 
"Properties": { + "EndpointIdentifier": { + "Fn::Sub": "${AWS::StackName}-mongodb-source" + }, + "EndpointType": "source", + "EngineName": "mongodb", + "ServerName": { + "Ref": "SourceDBHost" + }, + "Port": { + "Ref": "SourceDBPort" + }, + "DatabaseName": { + "Ref": "SourceDatabase" + }, + "Username": { + "Ref": "SourceUsername" + }, + "Password": { + "Ref": "SourcePassword" + }, + "MongoDbSettings": { + "AuthType": "password", + "AuthMechanism": "scram_sha_1" + }, + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${AWS::StackName}-source-endpoint" + } + } + ] + } + }, + "DMSTargetEndpoint": { + "Type": "AWS::DMS::Endpoint", + "Properties": { + "EndpointIdentifier": { + "Fn::Sub": "${AWS::StackName}-docdb-target" + }, + "EndpointType": "target", + "EngineName": "docdb", + "ServerName": { + "Ref": "TargetHost" + }, + "Port": { + "Ref": "TargetPort" + }, + "DatabaseName": { + "Ref": "TargetDatabase" + }, + "Username": { + "Ref": "TargetUsername" + }, + "Password": { + "Ref": "TargetPassword" + }, + "CertificateArn": { + "Ref": "TargetCertificateArn" + }, + "SslMode": "verify-full", + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${AWS::StackName}-target-endpoint" + } + } + ] + } + }, + "DMSReplicationTask": { + "Type": "AWS::DMS::ReplicationTask", + "Properties": { + "ReplicationTaskIdentifier": { + "Fn::Sub": "${AWS::StackName}-replication-task" + }, + "SourceEndpointArn": { + "Ref": "DMSSourceEndpoint" + }, + "TargetEndpointArn": { + "Ref": "DMSTargetEndpoint" + }, + "ReplicationInstanceArn": { + "Ref": "DMSReplicationInstance" + }, + "MigrationType": { + "Ref": "MigrationType" + }, + "ReplicationTaskSettings": { + "Fn::Join": [ + "", + [ + "{", + "\"TargetMetadata\": {", + "\"TargetSchema\": \"\",", + "\"SupportLobs\": true,", + "\"FullLobMode\": false,", + "\"LobChunkSize\": 64,", + "\"LimitedSizeLobMode\": true,", + "\"LobMaxSize\": 32,", + "\"InlineLobMaxSize\": 0,", + "\"LoadMaxFileSize\": 0,", + "\"ParallelLoadThreads\": 0,", +
"\"ParallelLoadBufferSize\": 0,", + "\"BatchApplyEnabled\": false,", + "\"TaskRecoveryTableEnabled\": false,", + "\"ParallelLoadQueuesPerThread\": 0,", + "\"ParallelApplyThreads\": ", { "Ref": "ParallelApplyThreads" }, ",", + "\"ParallelApplyBufferSize\": 100,", + "\"ParallelApplyQueuesPerThread\": 1", + "},", + "\"FullLoadSettings\": {", + "\"TargetTablePrepMode\": \"DROP_AND_CREATE\",", + "\"CreatePkAfterFullLoad\": false,", + "\"StopTaskCachedChangesApplied\": false,", + "\"StopTaskCachedChangesNotApplied\": false,", + "\"MaxFullLoadSubTasks\": ", { "Ref": "MaxFullLoadSubTasks" }, ",", + "\"TransactionConsistencyTimeout\": 600,", + "\"CommitRate\": 10000", + "},", + "\"Logging\": {", + "\"EnableLogging\": true,", + "\"LogComponents\": [", + "{\"Id\": \"SOURCE_UNLOAD\", \"Severity\": \"LOGGER_SEVERITY_DEFAULT\"},", + "{\"Id\": \"SOURCE_CAPTURE\", \"Severity\": \"LOGGER_SEVERITY_DEFAULT\"},", + "{\"Id\": \"TARGET_LOAD\", \"Severity\": \"LOGGER_SEVERITY_DEFAULT\"},", + "{\"Id\": \"TARGET_APPLY\", \"Severity\": \"LOGGER_SEVERITY_DEFAULT\"},", + "{\"Id\": \"TASK_MANAGER\", \"Severity\": \"LOGGER_SEVERITY_DEFAULT\"}", + "]", + "},", + "\"ControlTablesSettings\": {", + "\"ControlSchema\": \"dms_control\",", + "\"HistoryTimeslotInMinutes\": 5,", + "\"HistoryTableEnabled\": true,", + "\"SuspendedTablesTableEnabled\": true,", + "\"StatusTableEnabled\": true", + "},", + "\"StreamBufferSettings\": {", + "\"StreamBufferCount\": 3,", + "\"StreamBufferSizeInMB\": 8,", + "\"CtrlStreamBufferSizeInMB\": 5", + "},", + "\"ChangeProcessingDdlHandlingPolicy\": {", + "\"HandleSourceTableDropped\": true,", + "\"HandleSourceTableTruncated\": true,", + "\"HandleSourceTableAltered\": true", + "},", + "\"ErrorBehavior\": {", + "\"DataErrorPolicy\": \"LOG_ERROR\",", + "\"DataTruncationErrorPolicy\": \"LOG_ERROR\",", + "\"DataErrorEscalationPolicy\": \"SUSPEND_TABLE\",", + "\"DataErrorEscalationCount\": 0,", + "\"TableErrorPolicy\": \"SUSPEND_TABLE\",", + "\"TableErrorEscalationPolicy\": 
\"STOP_TASK\",", + "\"TableErrorEscalationCount\": 0,", + "\"RecoverableErrorCount\": -1,", + "\"RecoverableErrorInterval\": 5,", + "\"RecoverableErrorThrottling\": true,", + "\"RecoverableErrorThrottlingMax\": 1800,", + "\"RecoverableErrorStopRetryAfterThrottlingMax\": false,", + "\"ApplyErrorDeletePolicy\": \"IGNORE_RECORD\",", + "\"ApplyErrorInsertPolicy\": \"LOG_ERROR\",", + "\"ApplyErrorUpdatePolicy\": \"LOG_ERROR\",", + "\"ApplyErrorEscalationPolicy\": \"LOG_ERROR\",", + "\"ApplyErrorEscalationCount\": 0,", + "\"ApplyErrorFailOnTruncationDdl\": false,", + "\"FullLoadIgnoreConflicts\": true,", + "\"FailOnTransactionConsistencyBreached\": false,", + "\"FailOnNoTablesCaptured\": true", + "},", + "\"ChangeProcessingTuning\": {", + "\"BatchApplyPreserveTransaction\": true,", + "\"BatchApplyTimeoutMin\": 1,", + "\"BatchApplyTimeoutMax\": 30,", + "\"BatchApplyMemoryLimit\": 500,", + "\"BatchSplitSize\": 0,", + "\"MinTransactionSize\": 1000,", + "\"CommitTimeout\": 1,", + "\"MemoryLimitTotal\": 1024,", + "\"MemoryKeepTime\": 60,", + "\"StatementCacheSize\": 50", + "},", + "\"ValidationSettings\": {", + "\"EnableValidation\": false,", + "\"ValidationMode\": \"ROW_LEVEL\",", + "\"ThreadCount\": 5,", + "\"FailureMaxCount\": 10000,", + "\"TableFailureMaxCount\": 1000,", + "\"HandleCollationDiff\": false,", + "\"ValidationOnly\": false,", + "\"RecordFailureDelayLimitInMinutes\": 0,", + "\"SkipLobColumns\": false,", + "\"ValidationPartialLobSize\": 0,", + "\"ValidationQueryCdcDelaySeconds\": 0,", + "\"PartitionSize\": 10000", + "},", + "\"PostProcessingRules\": null,", + "\"CharacterSetSettings\": null,", + "\"LoopbackPreventionSettings\": null,", + "\"BeforeImageSettings\": null,", + "\"FailTaskWhenCleanTaskResourceFailed\": false", + "}" + ] + ] + }, + "TableMappings": { + "Fn::Join": [ + "", + [ + "{", + "\"rules\": [", + "{", + "\"rule-type\": \"selection\",", + "\"rule-id\": \"1\",", + "\"rule-name\": \"1\",", + "\"object-locator\": {", + "\"schema-name\": \"", { 
"Ref": "SourceDatabase" }, "\",", + "\"table-name\": \"", { "Ref": "CollectionNameForParallelLoad" }, "\"", + "},", + "\"rule-action\": \"include\",", + "\"filters\": []", + "},", + { "Ref": "TableSettings" }, + "]", + "}" + ] + ] + }, + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${AWS::StackName}-replication-task" + } + } + ] + } + } + }, + "Outputs": { + "ReplicationInstanceARN": { + "Description": "ARN of the DMS Replication Instance", + "Value": { + "Ref": "DMSReplicationInstance" + } + }, + "SourceEndpointARN": { + "Description": "ARN of the Source Endpoint", + "Value": { + "Ref": "DMSSourceEndpoint" + } + }, + "TargetEndpointARN": { + "Description": "ARN of the Target Endpoint", + "Value": { + "Ref": "DMSTargetEndpoint" + } + }, + "ReplicationTaskARN": { + "Description": "ARN of the Replication Task", + "Value": { + "Ref": "DMSReplicationTask" + } + }, + "DMSSecurityGroupId": { + "Description": "ID of the DMS Security Group", + "Value": { + "Ref": "DMSSecurityGroup" + } + } + } +} diff --git a/migration/dms_buddy/dms_buddy.py b/migration/dms_buddy/dms_buddy.py new file mode 100644 index 0000000..3ce380f --- /dev/null +++ b/migration/dms_buddy/dms_buddy.py @@ -0,0 +1,448 @@ +import pymongo +import time +from math import ceil +import humanize +import argparse +import warnings +import json +import os +import configparser + +warnings.filterwarnings("ignore") + +def get_partition_count(doc_count): + """Determine optimal number of partitions for DMS full load based on document count.""" + if doc_count <= 100000: + return 2 + elif doc_count <= 1000000: + return 4 + elif doc_count <= 100000000: + return 8 + else: + return 16 + +def get_instance_type(bandwidth_required_mbps): + """Determine appropriate AWS DMS instance type based on bandwidth requirements.""" + if bandwidth_required_mbps <= 630: + return "dms.r5.large" + elif bandwidth_required_mbps <= 2500: + return "dms.r5.2xlarge" + elif bandwidth_required_mbps <= 5000: + return "dms.r5.4xlarge" + 
else: + return "dms.r5.8xlarge" + +def calculate_operations_per_second(uri, db_name, collection_name, monitor_minutes=10): + """Calculate operations per second by monitoring database for specified period.""" + client = pymongo.MongoClient(uri, serverSelectionTimeoutMS=5000) + db = client[db_name] + + try: + if collection_name and collection_name not in db.list_collection_names(): + print(f"Warning: Collection '{collection_name}' not found in database '{db_name}'") + return 0 + + initial_status = db.command("serverStatus") + initial_ops = initial_status['opcounters']['insert'] + initial_status['opcounters']['update'] + initial_status['opcounters']['delete'] + + print(f"\nMonitoring database operations for {monitor_minutes} minutes...") + print("Please wait while we collect data...") + + monitoring_time = monitor_minutes * 60 + + for i in range(monitoring_time): + if i % 60 == 0: + minutes_left = (monitoring_time - i) // 60 + print(f"{minutes_left} minutes remaining...") + time.sleep(1) + + final_status = db.command("serverStatus") + final_ops = final_status['opcounters']['insert'] + final_status['opcounters']['update'] + final_status['opcounters']['delete'] + + ops_per_second = (final_ops - initial_ops) / monitoring_time + return ops_per_second + + except pymongo.errors.ServerSelectionTimeoutError: + print(f"Error: Could not connect to MongoDB server at {uri}") + return 0 + except pymongo.errors.OperationFailure as e: + print(f"Error: Authentication failed or insufficient permissions: {str(e)}") + return 0 + except Exception as e: + print(f"Warning: Error calculating operations rate: {str(e)}") + return 0 + finally: + client.close() + +def calculate_parallel_apply_threads(ops_per_second): + """Calculate parallel apply threads based on operations per second.""" + threads = ceil(ops_per_second / 250) + return max(2, threads) + +def calculate_storage_size(collection_size_bytes, avg_doc_size, ops_per_second): + """Calculate required storage size for AWS DMS replication 
instance.""" + daily_ops = ops_per_second * 86400 + daily_change_bytes = daily_ops * avg_doc_size + + base_storage = collection_size_bytes * 0.5 + required_storage_bytes = base_storage + (daily_change_bytes * 1.2) + + required_storage_gb = ceil(required_storage_bytes / (1024 * 1024 * 1024)) + required_storage_gb = ceil(required_storage_gb / 100) * 100 + + if required_storage_gb < 100: + return 100, daily_change_bytes + elif required_storage_gb > 1000: + return 1000, daily_change_bytes + else: + return required_storage_gb, daily_change_bytes + +def format_change_rate(bytes_per_day): + """Format change rate in human readable format.""" + if bytes_per_day == 0: + return "0 B/day" + + sizes = ['B', 'KB', 'MB', 'GB', 'TB'] + scale = 1024 + + size = abs(bytes_per_day) + unit_index = 0 + while size >= scale and unit_index < len(sizes) - 1: + size /= scale + unit_index += 1 + + return f"{size:.2f} {sizes[unit_index]}/day" + +def get_eligible_collections(client, db_name, min_doc_count=10000): + """Get all collections in database that have document count >= min_doc_count.""" + db = client[db_name] + eligible_collections = [] + small_collections_count = 0 + + print(f"\nScanning collections in database '{db_name}' for collections with >= {humanize.intcomma(min_doc_count)} documents...") + + collection_names = db.list_collection_names() + for collection_name in collection_names: + try: + stats = db.command("collStats", collection_name) + doc_count = stats['count'] + + if doc_count >= min_doc_count: + collection_size = stats['size'] + avg_doc_size = stats['avgObjSize'] if doc_count > 0 else 0 + + eligible_collections.append({ + 'name': collection_name, + 'doc_count': doc_count, + 'size': collection_size, + 'avg_doc_size': avg_doc_size + }) + + print(f" ✓ {collection_name}: {humanize.intcomma(doc_count)} documents ({humanize.naturalsize(collection_size)})") + else: + small_collections_count += 1 + + except Exception as e: + print(f" ! 
Error analyzing {collection_name}: {str(e)}") + continue + + if small_collections_count > 0: + print(f" Found {small_collections_count} smaller collection(s) that will use default DMS settings") + + return eligible_collections + +def read_config_file(config_file="dms_buddy.cfg"): + """Read configuration from file if it exists.""" + config = {} + if os.path.exists(config_file): + print(f"Reading configuration from {config_file}") + parser = configparser.ConfigParser() + parser.read(config_file) + + if 'DMS' in parser: + dms_section = parser['DMS'] + config_params = [ + 'VpcId', 'SubnetIds', 'MultiAZ', 'SourceDBHost', 'SourceDBPort', + 'SourceDatabase', 'SourceUsername', 'SourcePassword', + 'TargetHost', 'TargetPort', 'TargetDatabase', 'TargetUsername', + 'TargetPassword', 'TargetCertificateArn', 'MigrationType', 'CollectionNameForParallelLoad' + ] + + for param in config_params: + if param in dms_section: + config[param] = dms_section[param] + + return config + +def main(): + parser = argparse.ArgumentParser( + description=""" +DMS Buddy - AWS DMS Configuration Recommender for MongoDB + +This tool analyzes your MongoDB collection and provides recommendations for AWS DMS configuration including: +1. Appropriate DMS instance type based on data transfer requirements +2. Required storage size based on current size and operation rate +3. Optimal number of partitions for parallel full load (for Full Load migrations) +4. Number of threads needed for CDC phase (for CDC migrations) + +The analysis monitors database operations to calculate the rate of change. +You can specify the monitoring time with --monitor-time (default: 10 minutes). +You can also specify the migration type with --migration-type to get targeted recommendations. + +Parameters can also be provided in a dms_buddy.cfg file in the current directory. 
+ """, + formatter_class=argparse.RawDescriptionHelpFormatter + ) + + parser.add_argument("--source-uri", required=True, help="MongoDB connection URI") + parser.add_argument("--monitor-time", type=int, default=10, help="Monitoring time in minutes (default: 10)") + parser.add_argument("--migration-type", choices=["full-load", "cdc", "full-load-and-cdc"], + default="full-load-and-cdc", help="Migration type (default: full-load-and-cdc)") + parser.add_argument("--vpc-id", help="VPC ID for DMS replication instance") + parser.add_argument("--subnet-ids", help="Subnet IDs for DMS replication instance (comma-separated)") + parser.add_argument("--multi-az", choices=["true", "false"], default="false", + help="Whether to use Multi-AZ for DMS replication instance") + parser.add_argument("--source-host", help="Source database host") + parser.add_argument("--source-port", default="27017", help="Source database port") + parser.add_argument("--source-database", help="Source database name to analyze") + parser.add_argument("--source-username", help="Source database username") + parser.add_argument("--source-password", help="Source database password") + parser.add_argument("--target-host", help="Target database host") + parser.add_argument("--target-port", default="27017", help="Target database port") + parser.add_argument("--target-database", help="Target database name") + parser.add_argument("--target-username", help="Target database username") + parser.add_argument("--target-password", help="Target database password") + parser.add_argument("--target-certificate-arn", help="Target database SSL certificate ARN for DocumentDB connections") + parser.add_argument("--collection-name-for-parallel-load", help="Collection name to analyze and use for parallel load") + parser.add_argument("--engine-version", default="3.5.4", help="AWS DMS engine version (default: 3.5.4)") + + args = parser.parse_args() + + config = read_config_file() + + migration_type = args.migration_type if 
args.migration_type != "full-load-and-cdc" else config.get('MigrationType', args.migration_type) + source_database = args.source_database if args.source_database else config.get('SourceDatabase') + collection_name = args.collection_name_for_parallel_load if args.collection_name_for_parallel_load else config.get('CollectionNameForParallelLoad') + + if not source_database: + print("Error: Source database name is required. Provide it via --source-database or in dms_buddy.cfg") + return + + analyze_all_collections = not collection_name + if analyze_all_collections: + print("No specific collection provided. Will analyze all collections with >= 10,000 documents.") + + param_mapping = { + 'VpcId': args.vpc_id if args.vpc_id else config.get('VpcId', ''), + 'SubnetIds': args.subnet_ids if args.subnet_ids else config.get('SubnetIds', ''), + 'MultiAZ': args.multi_az if args.multi_az != 'false' else config.get('MultiAZ', 'false'), + 'SourceDBHost': args.source_host if args.source_host else config.get('SourceDBHost', ''), + 'SourceDBPort': args.source_port if args.source_port != '27017' else config.get('SourceDBPort', '27017'), + 'SourceDatabase': source_database, + 'SourceUsername': args.source_username if args.source_username else config.get('SourceUsername', ''), + 'SourcePassword': args.source_password if args.source_password else config.get('SourcePassword', ''), + 'TargetHost': args.target_host if args.target_host else config.get('TargetHost', ''), + 'TargetPort': args.target_port if args.target_port != '27017' else config.get('TargetPort', '27017'), + 'TargetDatabase': args.target_database if args.target_database else config.get('TargetDatabase', ''), + 'TargetUsername': args.target_username if args.target_username else config.get('TargetUsername', ''), + 'TargetPassword': args.target_password if args.target_password else config.get('TargetPassword', ''), + 'TargetCertificateArn': args.target_certificate_arn if args.target_certificate_arn else 
config.get('TargetCertificateArn', ''), + 'MigrationType': migration_type, + 'CollectionNameForParallelLoad': collection_name, + 'EngineVersion': args.engine_version if args.engine_version != '3.5.4' else config.get('EngineVersion', '3.5.4') + } + + required_params = ['VpcId', 'SubnetIds', 'SourceDBHost', 'SourceUsername', 'SourcePassword', + 'TargetHost', 'TargetDatabase', 'TargetUsername', 'TargetPassword', 'TargetCertificateArn'] + + missing_params = [param for param in required_params if not param_mapping[param]] + + if missing_params: + print(f"\nError: The following required parameters are missing:") + for param in missing_params: + print(f" - {param}") + print("\nThese parameters must be provided via command line arguments or in dms_buddy.cfg") + print("Example:") + print(" Command line: --vpc-id vpc-12345 --subnet-ids subnet-a,subnet-b --target-certificate-arn arn:aws:dms:...") + print(" Config file: Add the missing parameters to dms_buddy.cfg") + return + + print("\nStarting DMS configuration analysis...") + print(f"Migration type: {migration_type.upper()}") + print(f"This will take approximately {args.monitor_time} minutes to complete.") + + try: + client = pymongo.MongoClient(args.source_uri, serverSelectionTimeoutMS=5000) + db = client[source_database] + + if source_database not in client.list_database_names(): + print(f"Error: Database '{source_database}' not found") + return + + if analyze_all_collections: + eligible_collections = get_eligible_collections(client, source_database) + + if not eligible_collections: + print(f"\nNo collections found with >= 10,000 documents in database '{source_database}'") + print("Will generate empty table settings and use default values for DMS recommendations.") + + collection_name = "no-eligible-collections" + doc_count = 0 + collection_size = 0 + avg_doc_size = 0 + eligible_collections = [] + else: + print(f"\nFound {len(eligible_collections)} large collection(s) that will get optimized parallel processing.") + + 
else: + if collection_name not in db.list_collection_names(): + print(f"Error: Collection '{collection_name}' not found in database '{source_database}'") + return + + stats = db.command("collStats", collection_name) + doc_count = stats['count'] + collection_size = stats['size'] + avg_doc_size = stats['avgObjSize'] if doc_count > 0 else 0 + eligible_collections = [{'name': collection_name, 'doc_count': doc_count, 'size': collection_size, 'avg_doc_size': avg_doc_size}] + + ops_per_second = 0 + if migration_type != "full-load": + print("\nMonitoring database-level operations for CDC configuration...") + ops_per_second = calculate_operations_per_second(args.source_uri, source_database, None, args.monitor_time) + else: + print("\nSkipping operations monitoring for FULL-LOAD migration type...") + + if eligible_collections: + print(f"\nCalculating DMS recommendations based on all {len(eligible_collections)} collection(s)...") + + max_storage_size = 100 + max_bandwidth_mbps = 0 + max_partitions = 2 + + for coll in eligible_collections: + coll_partitions = get_partition_count(coll['doc_count']) + coll_storage, _ = calculate_storage_size(coll['size'], coll['avg_doc_size'], ops_per_second) + coll_bandwidth = (coll['avg_doc_size'] * 10000 * coll_partitions * 8) / (1024 * 1024) + + max_storage_size = max(max_storage_size, coll_storage) + max_bandwidth_mbps = max(max_bandwidth_mbps, coll_bandwidth) + max_partitions = max(max_partitions, coll_partitions) + + print(f" {coll['name']}: {coll_partitions} partitions, {coll_storage} GB storage, {round(coll_bandwidth, 2)} Mbps") + + partitions = max_partitions + storage_size = max_storage_size + bandwidth_required_mbps = max_bandwidth_mbps + parallel_threads = calculate_parallel_apply_threads(ops_per_second) + + else: + partitions = get_partition_count(doc_count) + storage_size, _ = calculate_storage_size(collection_size, avg_doc_size, ops_per_second) + parallel_threads = calculate_parallel_apply_threads(ops_per_second) +
bandwidth_required_mbps = (avg_doc_size * 10000 * partitions * 8) / (1024 * 1024) + + instance_type = get_instance_type(bandwidth_required_mbps) + + print("\nDMS Configuration Recommendations:") + print("---------------------------------") + print(f"1. DMS Instance Type: {instance_type}") + print(f"2. DMS Storage Size: {storage_size} GB") + + # Only show parallel apply threads for CDC-related migration types + if migration_type == "cdc" or migration_type == "full-load-and-cdc": + print(f"3. Parallel Apply Threads: {parallel_threads} (for CDC)") + + table_settings_json_parts = [] + rule_id = 10 + + print(f"\nGenerating optimized collection settings for {len(eligible_collections)} large collection(s):") + for coll in eligible_collections: + coll_partitions = get_partition_count(coll['doc_count']) + table_setting = { + "rule-type": "table-settings", + "rule-id": str(rule_id), + "rule-name": str(rule_id), + "rule-action": "include", + "filters": [], + "object-locator": { + "schema-name": source_database, + "table-name": coll['name'] + }, + "parallel-load": { + "number-of-partitions": coll_partitions, + "type": "partitions-auto" + } + } + table_settings_json_parts.append(json.dumps(table_setting)) + print(f" ✓ {coll['name']}: {coll_partitions} partitions ({humanize.intcomma(coll['doc_count'])} docs)") + rule_id += 1 + + table_settings_string = ",".join(table_settings_json_parts) + + parameters = [] + parameters.append({"ParameterKey": "ReplicationInstanceClass", "ParameterValue": instance_type}) + parameters.append({"ParameterKey": "AllocatedStorage", "ParameterValue": str(storage_size)}) + parameters.append({"ParameterKey": "NumberOfPartitions", "ParameterValue": str(partitions)}) + parameters.append({"ParameterKey": "ParallelApplyThreads", "ParameterValue": str(parallel_threads)}) + + if partitions > 8: + parameters.append({"ParameterKey": "MaxFullLoadSubTasks", "ParameterValue": str(partitions)}) + + parameters.append({"ParameterKey": "TableSettings", 
"ParameterValue": table_settings_string}) + + # Set CollectionNameForParallelLoad based on analysis mode + if analyze_all_collections: + # For all collections analysis, use % wildcard + param_mapping['CollectionNameForParallelLoad'] = "%" + else: + # For single collection analysis, use the specific collection name + param_mapping['CollectionNameForParallelLoad'] = collection_name if collection_name else "" + + for key, value in param_mapping.items(): + parameters.append({"ParameterKey": key, "ParameterValue": value}) + + with open('parameter.json', 'w') as f: + json.dump(parameters, f, indent=2) + + print(f"\nParameters written to parameter.json") + print(f"Table settings generated for {len(eligible_collections)} collection(s)") + + print(f"\nMigration Summary:") + if analyze_all_collections: + # Summary for all collections analysis + total_collections = len(db.list_collection_names()) + print(f"- Total collections to be migrated: {total_collections}") + print(f"- Collections with optimized parallel settings: {len(eligible_collections)}") + print(f"- Collections using default settings: {total_collections - len(eligible_collections)}") + print(f"\nNote: ALL collections in the database will be migrated. 
Large collections (≥10K docs)") + print(f"get optimized parallel processing, while smaller collections use efficient default settings.") + else: + # Summary for single collection analysis + if eligible_collections: + coll = eligible_collections[0] + coll_partitions = get_partition_count(coll['doc_count']) + print(f"- Collection to be migrated: {collection_name}") + print(f"- Document count: {humanize.intcomma(coll['doc_count'])}") + print(f"- Collection size: {humanize.naturalsize(coll['size'])}") + print(f"- Parallel load partitions: {coll_partitions}") + print(f"\nNote: Only the specified collection '{collection_name}' will be migrated with optimized settings.") + else: + print(f"- Collection to be migrated: {collection_name}") + print(f"- Collection will use default DMS settings") + print(f"\nNote: Only the specified collection '{collection_name}' will be migrated.") + + except pymongo.errors.ServerSelectionTimeoutError: + print(f"Error: Could not connect to MongoDB server at {args.source_uri}") + print("Please check that the server is running and the URI is correct") + except pymongo.errors.OperationFailure as e: + print(f"Error: Authentication failed or insufficient permissions: {str(e)}") + except pymongo.errors.ConfigurationError as e: + print(f"Error: Invalid MongoDB URI format: {str(e)}") + except Exception as e: + print(f"\nError: {str(e)}") + finally: + if 'client' in locals(): + client.close() + +if __name__ == "__main__": + main() diff --git a/migration/dms_buddy/requirements.txt b/migration/dms_buddy/requirements.txt new file mode 100644 index 0000000..efd45b0 --- /dev/null +++ b/migration/dms_buddy/requirements.txt @@ -0,0 +1,2 @@ +pymongo>=4.0.0 +humanize>=4.0.0 diff --git a/migration/export-users/README.md b/migration/export-users/README.md index 9dca304..9914753 100644 --- a/migration/export-users/README.md +++ b/migration/export-users/README.md @@ -1,21 +1,23 @@ # Export Users tool - -This tool will export Amazon DocumentDB or MongoDB users to a 
file, which then can be used to import them to other instance. Note: Passwords are not exported. +This tool will export Amazon DocumentDB or MongoDB users and custom roles to files, which can then be used to create them in another cluster. Note: Passwords are not exported. # Requirements - Python 3.7+ - PyMongo ## Using the Export Users Tool -`python3 docdbExportUsers.py --users-file --uri ` +`python3 docdbExportUsers.py --users-file <users-file> --roles-file <roles-file> --uri <uri>` ## Example: -`python3 docdbExportUsers.py --users-file mydocdb-users.js --uri "mongodb://user:password@mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&retryWrites=false"` +`python3 docdbExportUsers.py --users-file mydocdb-users.js --roles-file mydocdb-roles.js --uri "mongodb://user:password@mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&retryWrites=false"` -## Restore users -Edit the file and update passwords for each user. Run the .js script: +## Restore custom roles +Run the custom roles .js script: +`mongo --ssl --host mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017 --sslCAFile rds-combined-ca-bundle.pem --username <username> --password <password> mydocdb-roles.js` -`mongo --ssl --host mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017 --sslCAFile rds-combined-ca-bundle.pem --username --password --password mydocdb-users.js` ## License -This tool is licensed under the Apache 2.0 License. +This tool is licensed under the Apache 2.0 License. 
\ No newline at end of file diff --git a/migration/export-users/docdbExportUsers.py b/migration/export-users/docdbExportUsers.py index 4bab90b..a0d6424 100644 --- a/migration/export-users/docdbExportUsers.py +++ b/migration/export-users/docdbExportUsers.py @@ -3,24 +3,96 @@ import pymongo +rolesToExport = {} + + def exportUsers(appConfig): - client = pymongo.MongoClient(appConfig['uri']) - listusers = client.admin.command('usersInfo', {'forAllDBs': True}) - with open(appConfig['usersFile'], "w+", encoding='utf-8') as f: - print("use admin", file=f) - for user in listusers['users']: + client = pymongo.MongoClient(host=appConfig['uri'], appname='userexp') + database_names = client.list_database_names() + database_names.append("$external") + + f = open(appConfig['usersFile'], "w+", encoding='utf-8') + for database_name in database_names: + print("") + if (database_name == 'local'): + print(f"Skipping database: {database_name}") + continue + + print(f"Checking database: {database_name}") + database = client[database_name] + users = database.command('usersInfo') + if len(users['users']) == 0: + print(f"No users in database: {database_name}") + continue + + use_db_printed = False + for user in users['users']: """ Exclude serviceadmin user """ if user['user'] == "serviceadmin": continue + + if (database_name == "$external") and (user['user'].startswith("arn:aws:iam::") == False): + print(f"Skipping user: {user['user']}, user must start with 'arn:aws:iam::'") + continue + print(f"Exporting user: {user['user']}") + + if (use_db_printed == False): + print(f"use {database_name}", file=f) + use_db_printed = True + print('db.createUser({user: "' + user['user'] + '", pwd: "REPLACE_THIS_PASS",' + ' roles: ' + str(user['roles']) + '});', file=f) + + print(f"Checking roles for user: {user['user']}") + for userRole in user['roles']: + checkRole(database, userRole, database_name) + + f.close() print(f"Done! 
Users exported to {appConfig['usersFile']}") +def checkRole(database, userRole, database_name): + print (f"Checking role {userRole}") + """ A role can be assigned to multiple users so we only want to export the role definition once """ + """ Build a dictionary to keep track of all user-defined roles assigned to users being exported """ + try: + roleInfo = database.command({'rolesInfo': {'role': userRole['role'], 'db': userRole['db']}, 'showPrivileges': True, 'showBuiltinRoles': False}) + + if len(roleInfo['roles']) == 1: + role = roleInfo['roles'][0] + if (role['isBuiltin'] == False): + """ Check role against list of roles supported by DocumentDB """ + if not role['role'] in rolesToExport: + """ If this is a user-defined role not already marked for export, mark it for export """ + rolesToExport[role['role']] = role + + except pymongo.errors.OperationFailure as e: + # DocumentDB does not allow custom roles in $external database + if (database_name == "$external"): + pass + else: + raise e + + +def exportRoles(appConfig): + with open(appConfig['rolesFile'], "w+", encoding='utf-8') as f: + print("use admin", file=f) + for role in rolesToExport: + print(f"Exporting role: {role}") + privileges = str(rolesToExport[role]['privileges']) + """ convert Python True/False to JSON true/false """ + privileges = privileges.replace(": True}", ": true}") + privileges = privileges.replace(": False}", ": false}") + print('db.createRole({role: "' + rolesToExport[role]['role'] + '", privileges: ' + privileges + ', roles: ' + str(rolesToExport[role]['roles']) + '});', file=f) + + f.close() + print(f"Done! Roles exported to {appConfig['rolesFile']}") + + def main(): """ v1: Initial script, export users to a file """ - parser = argparse.ArgumentParser(description='Export Amazon DocumentDB users to user_output.js file, can be used to import them to other instance. 
Note: Passwords are not exported.') + parser = argparse.ArgumentParser(description='Export Amazon DocumentDB users and user defined roles to user_output.js file, can be used to import them to other instance. Note: Passwords are not exported.') parser.add_argument('--skip-python-version-check', required=False, @@ -37,6 +109,11 @@ def main(): type=str, help='The users output file') + parser.add_argument('--roles-file', + required=True, + type=str, + help='The roles output file') + args = parser.parse_args() MIN_PYTHON = (3, 7) @@ -46,8 +123,10 @@ def main(): appConfig = {} appConfig['uri'] = args.uri appConfig['usersFile'] = args.users_file + appConfig['rolesFile'] = args.roles_file exportUsers(appConfig) + exportRoles(appConfig) if __name__ == "__main__": diff --git a/migration/json-import/README.md b/migration/json-import/README.md new file mode 100644 index 0000000..34d1544 --- /dev/null +++ b/migration/json-import/README.md @@ -0,0 +1,71 @@ +# Amazon DocumentDB JSON Import Tool + +The purpose of the JSON Import Tool is to load JSON formatted data from a single file into DocumentDB or MongoDB in parallel. Input file must contain one JSON document per line. + +## Prerequisites: + + - Python 3 + - Modules: pymongo +``` + pip3 install pymongo +``` +## How to use + +1. Clone the repository and go to the tool folder: +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/migration/json-import/ +``` + +2. Run the json-import.py tool, which accepts the following arguments: + +``` +python3 json-import.py --help +usage: json-import.py [-h] --uri URI --file-name FILE_NAME --operations-per-batch OPERATIONS_PER_BATCH --workers WORKERS --database DATABASE --collection COLLECTION --log-file-name LOG_FILE_NAME + [--skip-python-version-check] [--lines-per-chunk LINES_PER_CHUNK] [--debug-level DEBUG_LEVEL] --mode {insert,replace,update} [--drop-collection] + +Bulk/Concurrent JSON file import utility. 
+ +optional arguments: + -h, --help show this help message and exit + --uri URI URI + --file-name FILE_NAME + Name of JSON file to load + --operations-per-batch OPERATIONS_PER_BATCH + Number of operations per batch + --workers WORKERS Number of parallel workers + --database DATABASE Database name + --collection COLLECTION + Collection name + --log-file-name LOG_FILE_NAME + Log file name + --skip-python-version-check + Permit execution on Python 3.6 and prior + --lines-per-chunk LINES_PER_CHUNK + Number of lines each worker reserves before jumping ahead in the file to the next chunk + --debug-level DEBUG_LEVEL + Debug output level. + --mode {insert,replace,update} + Mode - insert, replace, or update + --drop-collection Drop the collection prior to loading data + +``` + +## Example usage: +Load data (as inserts) from the JSON formatted file load-me.json + +``` +python3 json-import.py \ + --uri "mongodb://user:password@target.cluster.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" \ + --file-name load-me.json \ + --operations-per-batch 100 \ + --workers 4 \ + --database jsonimport \ + --collection coll1 \ + --log-file-name json-import-log-file.log \ + --lines-per-chunk 1000 \ + --mode insert \ + --drop-collection +``` + +For more information on the connection string format, refer to the [documentation](https://www.mongodb.com/docs/manual/reference/connection-string/).
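The tool requires one JSON document per line in the input file. If your source data is instead a single JSON array (a common export format), a short helper along these lines can reshape it first; the `export.json` and `load-me.json` file names are illustrative:

```python
import json

def to_ndjson(in_path, out_path):
    """Convert a file containing one top-level JSON array into
    one-document-per-line format, as json-import.py expects."""
    with open(in_path) as f:
        docs = json.load(f)          # expects a top-level JSON array
    with open(out_path, "w") as f:
        for doc in docs:
            # default=str keeps non-JSON-native values (e.g. dates) loadable
            f.write(json.dumps(doc, default=str) + "\n")

# to_ndjson("export.json", "load-me.json")
```

Note that `mongoexport` already writes one document per line by default, so this step is only needed for array-formatted sources.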
diff --git a/migration/json-import/json-import.py b/migration/json-import/json-import.py new file mode 100644 index 0000000..cb83b33 --- /dev/null +++ b/migration/json-import/json-import.py @@ -0,0 +1,341 @@ +from datetime import datetime, timedelta +from queue import Queue, Full, Empty +import sys +import random +import json +import pymongo +from pymongo import InsertOne, DeleteOne, ReplaceOne, UpdateOne +import time +import threading +import os +import multiprocessing as mp +import argparse +from bson.json_util import loads + + +def initializeLogFile(appConfig): + with open(appConfig['logFileName'], "w") as logFile: + logFile.write("") + + +def logAndPrint(appConfig,string): + with open(appConfig['logFileName'], "a") as logFile: + logFile.write(string+"\n") + print(string) + + +def setup(appConfig): + if appConfig['dropCollection']: + logAndPrint(appConfig," dropping the collection") + client = pymongo.MongoClient(host=appConfig['uri'],appname='jsonimp') + col = client[appConfig['databaseName']][appConfig['collectionName']] + col.drop() + client.close() + + +def reportCollectionInfo(appConfig): + client = pymongo.MongoClient(host=appConfig['uri'],appname='jsonimp') + db = client[appConfig['databaseName']] + + collStats = db.command("collStats", appConfig['collectionName']) + + compressionRatio = collStats['size'] / collStats['storageSize'] + gbDivisor = 1024*1024*1024 + + logAndPrint(appConfig,"collection statistics | numDocs = {0:12,d}".format(collStats['count'])) + logAndPrint(appConfig,"collection statistics | avgObjSize = {0:12,d}".format(int(collStats['avgObjSize']))) + logAndPrint(appConfig,"collection statistics | size (GB) = {0:12,.4f}".format(collStats['size']/gbDivisor)) + logAndPrint(appConfig,"collection statistics | storageSize (GB) = {0:12,.4f} ".format(collStats['storageSize']/gbDivisor)) + logAndPrint(appConfig,"collection statistics | compressionRatio = {0:12,.4f}".format(compressionRatio)) + logAndPrint(appConfig,"collection statistics | 
totalIndexSize (GB) = {0:12,.4f}".format(collStats['totalIndexSize']/gbDivisor)) + + client.close() + + +def reporter(appConfig,perfQ): + numSecondsFeedback = 10 + numIntervalsTps = 5 + numWorkers = appConfig['numWorkers'] + + if appConfig['debugLevel'] >= 1: + logAndPrint(appConfig,'starting reporting thread') + + recentTps = [] + + startTime = time.time() + lastTime = time.time() + lastTotalOps = 0 + nextReportTime = startTime + numSecondsFeedback + intervalLatencyMs = 0 + + numWorkersCompleted = 0 + totalOps = 0 + + queueMessagesProcessed = 0 + + while (numWorkersCompleted < numWorkers): + time.sleep(numSecondsFeedback) + nowTime = time.time() + + numLatencyBatches = 0 + numLatencyMs = 0 + + queueMessagesProcessed = 0 + queueDrained = False + while not queueDrained: + try: + qMessage = perfQ.get_nowait() + except Empty: + queueDrained = True + queueMessagesProcessed += 1 + if qMessage['name'] == "batchCompleted": + totalOps += qMessage['operations'] + numLatencyBatches += 1 + numLatencyMs += qMessage['latency'] + elif qMessage['name'] == "processCompleted": + numWorkersCompleted += 1 + + # total total + elapsedSeconds = nowTime - startTime + opsPerSecond = totalOps / elapsedSeconds + + # elapsed hours, minutes, seconds + thisHours, rem = divmod(elapsedSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + thisHMS = "{:0>2}:{:0>2}:{:05.2f}".format(int(thisHours),int(thisMinutes),thisSeconds) + + # this interval + intervalElapsedSeconds = nowTime - lastTime + intervalOps = totalOps - lastTotalOps + if intervalElapsedSeconds > 0: + intervalOpsPerSecond = intervalOps / intervalElapsedSeconds + else: + intervalOpsPerSecond = 0 + if numLatencyBatches > 0: + intervalLatencyMs = numLatencyMs // numLatencyBatches + else: + intervalLatencyMs = 0 + + # recent intervals + if len(recentTps) == numIntervalsTps: + recentTps.pop(0) + recentTps.append(intervalOpsPerSecond) + totRecentTps = 0 + for thisTps in recentTps: + totRecentTps += thisTps + avgRecentTps = 
totRecentTps / len(recentTps) + + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + logAndPrint(appConfig,"[{}] elapsed {} | total ins/upd {:16,d} at {:12,.2f} p/s | last {} {:12,.2f} p/s | interval {:12,.2f} p/s | lat (ms) {:12}" + .format(logTimeStamp,thisHMS,totalOps,opsPerSecond,numIntervalsTps,avgRecentTps,intervalOpsPerSecond,intervalLatencyMs)) + nextReportTime = nowTime + numSecondsFeedback + + lastTime = nowTime + lastTotalOps = totalOps + + +def task_worker(workerNum,appConfig,perfQ): + numOpsPerBatch = appConfig['numOpsPerBatch'] + numWorkers = appConfig['numWorkers'] + linesPerChunk = appConfig['linesPerChunk'] + opMode = appConfig['mode'] + numChunks = 1 + + myLineStart = (workerNum*linesPerChunk)+1 + myLineEnd = (workerNum+1)*linesPerChunk + + if appConfig['debugLevel'] >= 1: + logAndPrint(appConfig,"worker {} - start {} end {} chunk {}".format(workerNum,myLineStart,myLineEnd,numChunks)) + + client = pymongo.MongoClient(host=appConfig['uri'],appname='jsonimp') + db = client[appConfig['databaseName']] + col = db[appConfig['collectionName']] + + if appConfig['debugLevel'] >= 1: + logAndPrint(appConfig,"starting worker process {} - using collection {}.{}".format(workerNum,appConfig['databaseName'],appConfig['collectionName'])) + + startTime = time.time() + lastTime = time.time() + + numBatchesCompleted = 0 + numBatchOps = 0 + fileLineNum = 0 + insList = [] + + with open(appConfig['fileName'], 'r') as f: + for thisLine in f: + fileLineNum += 1 + + if (fileLineNum >= myLineStart) and (fileLineNum <= myLineEnd): + # add to batch + thisDict = loads(thisLine) + numBatchOps += 1 + + if opMode == 'insert': + insList.append(InsertOne(thisDict.copy())) + elif opMode == 'replace': + insList.append(ReplaceOne({"_id":thisDict['_id']},thisDict.copy(),upsert=True)) + elif opMode == 'update': + insList.append(UpdateOne({"_id":thisDict['_id']},{"$set":thisDict.copy()},upsert=True)) + + if (numBatchOps >= numOpsPerBatch): + batchStartTime = time.time() + result = 
col.bulk_write(insList, ordered=False) + batchElapsedMs = int((time.time() - batchStartTime) * 1000) + numBatchesCompleted += 1 + perfQ.put({"name":"batchCompleted","operations":numBatchOps,"latency":batchElapsedMs,"timeAt":time.time()}) + insList = [] + numBatchOps = 0 + + if (fileLineNum == myLineEnd): + # increment boundaries + myLineStart += (numWorkers*linesPerChunk) + myLineEnd += (numWorkers*linesPerChunk) + numChunks += 1 + if appConfig['debugLevel'] >= 1: + logAndPrint(appConfig,"worker {} - start {} end {} chunk {}".format(workerNum,myLineStart,myLineEnd,numChunks)) + + if numBatchOps > 0: + batchStartTime = time.time() + result = col.bulk_write(insList, ordered=False) + batchElapsedMs = int((time.time() - batchStartTime) * 1000) + numBatchesCompleted += 1 + perfQ.put({"name":"batchCompleted","operations":numBatchOps,"latency":batchElapsedMs,"timeAt":time.time()}) + + client.close() + + perfQ.put({"name":"processCompleted","processNum":workerNum,"timeAt":time.time()}) + + +def main(): + parser = argparse.ArgumentParser(description='Bulk/Concurrent JSON file import utility.') + + parser.add_argument('--uri', + required=True, + type=str, + help='URI') + + parser.add_argument('--file-name', + required=True, + type=str, + help='Name of JSON file to load') + + parser.add_argument('--operations-per-batch', + required=True, + type=str, + help='Number of operations per batch') + + parser.add_argument('--workers', + required=True, + type=int, + help='Number of parallel workers') + + parser.add_argument('--database', + required=True, + type=str, + help='Database name') + + parser.add_argument('--collection', + required=True, + type=str, + help='Collection name') + + parser.add_argument('--log-file-name', + required=True, + type=str, + help='Log file name') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--lines-per-chunk', + required=False, + 
type=int, + default=1000, + help='Number of lines each worker reserves before jumping ahead in the file to the next chunk') + + parser.add_argument('--debug-level', + required=False, + type=int, + default=0, + help='Debug output level.') + + parser.add_argument('--mode', + required=True, + type=str, + choices=['insert', 'replace', 'update'], + help='Mode - insert, replace, or update') + + parser.add_argument('--drop-collection', + required=False, + action='store_true', + help='Drop the collection prior to loading data') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['numOpsPerBatch'] = int(float(args.operations_per_batch)) + appConfig['numWorkers'] = int(args.workers) + appConfig['databaseName'] = args.database + appConfig['collectionName'] = args.collection + appConfig['fileName'] = args.file_name + appConfig['logFileName'] = args.log_file_name + appConfig['linesPerChunk'] = args.lines_per_chunk + appConfig['debugLevel'] = args.debug_level + appConfig['mode'] = args.mode + appConfig['dropCollection'] = args.drop_collection + + initializeLogFile(appConfig) + + logAndPrint(appConfig,'---------------------------------------------------------------------------------------') + for thisKey in appConfig: + if (thisKey == 'uri'): + thisUri = appConfig[thisKey] + thisParsedUri = pymongo.uri_parser.parse_uri(thisUri) + thisUsername = thisParsedUri['username'] + thisPassword = thisParsedUri['password'] + thisUri = thisUri.replace(thisUsername,'') + thisUri = thisUri.replace(thisPassword,'') + logAndPrint(appConfig," config | {} | {}".format(thisKey,thisUri)) + else: + logAndPrint(appConfig," config | {} | {}".format(thisKey,appConfig[thisKey])) + logAndPrint(appConfig,'---------------------------------------------------------------------------------------') + + 
setup(appConfig) + + mp.set_start_method('spawn') + + random.seed() + + q = mp.Manager().Queue() + + t = threading.Thread(target=reporter,args=(appConfig,q)) + t.start() + + processList = [] + for loop in range(appConfig['numWorkers']): + p = mp.Process(target=task_worker,args=(loop,appConfig,q)) + processList.append(p) + + for process in processList: + process.start() + + for process in processList: + process.join() + + t.join() + + reportCollectionInfo(appConfig) + + +if __name__ == "__main__": + main() + + diff --git a/migration/migration-utility-for-couchbase/README.md b/migration/migration-utility-for-couchbase/README.md new file mode 100644 index 0000000..d4f85f6 --- /dev/null +++ b/migration/migration-utility-for-couchbase/README.md @@ -0,0 +1,454 @@ +# How to perform a live migration from Couchbase to Amazon DocumentDB (with MongoDB Compatibility) + +This document explains how you can perform a live migration from Couchbase to Amazon DocumentDB and walks you through deploying the solution and performing a live migration of the Couchbase `beer-sample` sample bucket to an Amazon DocumentDB cluster. + +## Solution overview + +The solution uses Amazon Managed Streaming for Apache Kafka (MSK) to perform a full load of existing data and replicate ongoing changes from Couchbase to Amazon DocumentDB. The solution keeps the target Amazon DocumentDB cluster in sync with the source Couchbase cluster until the client applications are cut over to the Amazon DocumentDB cluster. It makes use of the following connectors: + +- [Couchbase Kafka connector](https://docs.couchbase.com/kafka-connector/current/index.html) to stream documents from Couchbase Server and publish them to a Kafka topic in near-real time. +- [MongoDB Kafka connector](https://www.mongodb.com/docs/kafka-connector/current/) to read data from a Kafka topic and write it to Amazon DocumentDB.
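To make the data flow concrete, here is a minimal sketch of how the two connectors are wired together through a shared topic. The connector class names are the standard ones shipped with each connector; the angle-bracket placeholders, and the exact property set shown, are illustrative only — the CloudFormation templates deployed later in this guide supply the real configuration:

```properties
# Couchbase source connector: full load + ongoing changes from the bucket
connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
couchbase.seed.nodes=<couchbase-host>
couchbase.bucket=beer-sample
couchbase.topic=migration-utility

# MongoDB sink connector: reads the topic and writes to Amazon DocumentDB
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics=migration-utility
connection.uri=mongodb://<user>:<password>@<docdb-cluster-endpoint>:27017/?tls=true&retryWrites=false
database=beer-sample
collection=test
```

Because both connectors run in MSK Connect, scaling the migration is a matter of adjusting each connector's worker and MCU counts (covered in Step 4) rather than changing this wiring.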
+ +The solution also uses: +- [AWS CloudFormation](https://aws.amazon.com/cloudformation/) to deploy the solution. +- [AWS Identity and Access Management (IAM)](https://aws.amazon.com/iam/) to manage access to the AWS services and resources used in this solution. +- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) for logging and monitoring of the migration. +- [Amazon DocumentDB](https://aws.amazon.com/documentdb/) for a single-instance cluster as the migration target. You can scale out as necessary. +- [Amazon Elastic Compute Cloud (EC2)](https://aws.amazon.com/ec2/) for a bastion host (and associated resources) you can use to manage the Kafka topic. +- [Amazon MSK](https://aws.amazon.com/msk/) to stream data from Couchbase to Amazon DocumentDB. + +![solution overview diagram](./static/images/solution-overview.png) + +## Pre-requisites + +To deploy this solution you will need the following: +1. A Couchbase cluster. +2. An [Amazon Virtual Private Cloud (VPC)](https://aws.amazon.com/vpc/) with +* connectivity to the Couchbase cluster +* 3 private subnets +* 1 public subnet +3. An EC2 key pair to use to connect to the EC2 bastion host. +4. An [Amazon S3](https://aws.amazon.com/pm/serv-s3/) general purpose bucket to store MSK custom plugin and connector resources. + + +## Step 1 - Deploy IAM, EC2, & Amazon DocumentDB resources and MSK cluster +* Download [migration-utility.yaml](./migration-utility.yaml). +* This CloudFormation template creates all required IAM, EC2, and Amazon DocumentDB resources and the MSK cluster. The Amazon DocumentDB cluster is configured to use a custom parameter group with collection-level document compression enabled. The default compression threshold is 2 KB. A different value can be specified for new collections using the `createCollection` command, and changed for existing collections using the `collMod` command.
See [Managing collection-level document compression](https://docs.aws.amazon.com/documentdb/latest/developerguide/doc-compression.html) for more information. + +### [CloudFormation console](https://console.aws.amazon.com/cloudformation/home) +* Select **Stacks**. +* Select **Create stack → With new resources (standard)**. +* **Create stack** step: + * **Specify template** section: + * Select **Upload a template file**. + * Select **Choose file** and choose the `migration-utility.yaml` file downloaded above. + * Select **Next**. +* **Specify stack details** step: + * **Provide a stack name** section: + * **Stack name**: `migration-utility` + * **Parameters** section: + * **DocumentdbInstanceType**: select desired instance type of DocumentDB cluster primary instance + * **DocumentdbPassword**: specify password of DocumentDB user + * **DocumentdbUsername**: specify username of DocumentDB user + * **Ec2KeyPairName**: specify name of existing EC2 key pair that will be used for the new EC2 instance bastion host + * **LatestAmiId**: do not change this value + * **MskBrokerNodes**: specify the number of broker nodes for the MSK cluster that will be created (a multiple of the number of private subnets). Refer to [Best practices for Express brokers](https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices-express.html) to determine the number of broker instances. + * **MskClusterBrokerInstanceType**: select desired instance type of MSK cluster broker + * **PrivateSubnets**: select 3 private subnets in the VPC specified in `VpcId`. The DocumentDB instance and MSK brokers will be created in these subnets. + * **PublicSubnetId**: select a public subnet in the VPC specified in `VpcId`. An EC2 instance will be created in this subnet for interacting with the Kafka topic used for migration.
+ * **S3BucketName**: specify the name of an existing S3 general purpose bucket that will be used to store the custom connector .zip files and trust store .jks file for connecting to the Amazon DocumentDB cluster + * **SshIngressIpAddress**: specify the IP address (in CIDR notation) to allow SSH traffic to the EC2 instance + * **VpcId**: specify ID of existing VPC where EC2 instance, Amazon DocumentDB cluster, & MSK cluster will be created + * Select **Next**. +* **Configure stack options** step: + * **Stack failure options** section: + * Select **Preserve successfully provisioned resources**. + * **Capabilities** section: + * Select the **I acknowledge that AWS CloudFormation might create IAM resources with custom names.** checkbox. + * Select **Next**. +* **Review and Create** step: + * **Parameters** section: + * Ensure all parameters are correct. + * Select **Submit**. +* It will take ~40 minutes for the stack to fully deploy. +* Select the **Outputs** tab and note the **MigrationMSKRoleARN**, **S3BucketName**, and **SecurityGroupId** values. These will be used as parameters when deploying the next CloudFormation template. +![CloudFormation Couchbase to Amazon DocumentDB output](./static/images/cloudformation-migration-utility-output.png) + +Confirm the following files exist in the S3 bucket you specified: +* `couchbase-kafka-connect-couchbase-4.2.8.zip` +* `docdb-custom-plugin.zip` +* `docdb-truststore.jks` + +If they do not exist, SSH to the EC2 instance and check these log files: +* `createTruststore.log` +* `setup.log` + +## Step 2 - Modify Couchbase cluster security group to allow inbound traffic from `cfn-migration-security-group`. +If you are migrating from self-managed Couchbase on EC2, modify the security group of the EC2 instance(s) to allow inbound traffic from the MSK cluster. + +### [EC2 console](https://console.aws.amazon.com/ec2/home) +* Select **Network & Security → Security Groups**.
+* Select the security group used for your Couchbase cluster and then select **Inbound rules**. +![EC2 select Couchbase security group inbound rules](./static/images/ec2-select-couchbase-security-group-inbound-rules.png) + +* Select **Edit inbound rules**. +![EC2 edit Couchbase security group inbound rules](./static/images/ec2-edit-couchbase-security-group-inbound-rules.png) + +* Select **Add rule** and specify: + * **Inbound rules** section: + * **Type**: All traffic + * **Source**: Custom + * **Search box**: `cfn-migration-security-group` + * Select **Save rules**. + +## Step 3 - Validate Amazon DocumentDB connectivity and create target collection(s) & indexes. +### [EC2 console](https://console.aws.amazon.com/ec2/home) +* Select **Instances → Instances**. +* Select the **cfn-msk-ec2** instance checkbox and then select **Connect**. +* Copy the example SSH command. +![EC2 copy SSH command](./static/images/ec2-copy-ssh-command.png) +* Go to the server or system represented by the **SshIngressIpAddress** value you specified when deploying `migration-utility.yaml`. +* Paste and execute the copied command. +* Confirm that you can login to the EC2 instance. +``` + , #_ + ~\_ ####_ Amazon Linux 2023 + ~~ \_#####\ + ~~ \###| + ~~ \#/ ___ https://aws.amazon.com/linux/amazon-linux-2023 + ~~ V~' '-> + ~~~ / + ~~._. _/ + _/ _/ + _/m/' +Last login: Mon Jun 9 15:33:18 2025 from XXX.XXX.XXX.XXX +``` + +### [Amazon DocumentDB console](https://console.aws.amazon.com/docdb/home) +* Select **Clusters**. +* Click the link for the `cfn-documentdb-target` cluster. +* Select the **Connectivity & security** tab. +* **Connect** section: + * Select **Copy** next to **Connect to this cluster with the mongo shell**. + * Replace `` with the **DocumentdbPassword** value you specified when deploying `migration-utility.yaml`. 
+![Amazon DocumentDB connect with mongo shell](./static/images/amazon-documentdb-connect-with-mongo-shell.png) + +### EC2 bastion host +* Paste and execute the `mongosh` command copied above. +* Confirm that you see the following: +``` +rs0 [direct: primary] test> +``` + +It is a best practice to create indexes before migrating data so create the target collection(s) and indexes. +* Create the required target database(s) and collection(s). These instructions use the `test` collection in the `beer-sample` database with the default 2 KB threshold. See [Setting the compression thresholds](https://docs.aws.amazon.com/documentdb/latest/developerguide/doc-compression.html#manage-compression) to create a collection with a different compression threshold. +``` +use beer-sample +db.createCollection("test") +``` +* You should see an `ok` return code from the `createCollection()` command. +``` +{ ok: 1 } +``` +* Show the collections in the `beer-sample` database. +``` +show collections +``` +* Confirm that you see the `test` collection. +``` +test +``` +* Create the required indexes on the target collection(s). For example, create an index on the `type` field in the `test` collection: +``` +db.test.createIndex({"type": 1}) +``` +* You should see the name of the newly created index. +``` +type_1 +``` +* View the indexes that exist on the `test` collection. +``` +db.test.getIndexes() +``` +* All collections in Amazon DocumentDB have a default index on the `_id` field so in this example there will be two indexes. +``` +[ + { v: 4, key: { _id: 1 }, name: '_id_', ns: 'beer-sample.test' }, + { v: 4, key: { type: 1 }, name: 'type_1', ns: 'beer-sample.test' } +] +``` +* Exit mongo shell by typing `exit`. +``` +exit +``` +## Step 4 - Deploy MSK Connect resources. +*This guide assumes you are using the `test` collection in the `beer-sample` database. 
If you specify a different collection and database, modify the commands accordingly.* +* Download [migration-utility-connectors.yaml](./migration-utility-connectors.yaml). +* Edit `migration-utility-connectors.yaml`. + * Provide the values for the following in the `DocumentDbSinkConnector.Properties.ConnectorConfiguration` section: + * **database** (line 117): target Amazon DocumentDB database name (e.g. `beer-sample`) + * **collection** (line 118): target Amazon DocumentDB collection name (e.g. `test`) + * **connection.uri** (line 119): copy the connection string from the Amazon DocumentDB console + * [Amazon DocumentDB console](https://console.aws.amazon.com/docdb/home) + * Select **Clusters**. + * Click the link for the `cfn-documentdb-target` cluster. + * Select the **Connectivity & security** tab. + * **Connect** section: + * Select **Copy** next to **Connect to this cluster with an application**. + * Replace `` with your password and use this as the value for **connection.uri**. +![Amazon DocumentDB connect with an application](./static/images/amazon-documentdb-connect-with-an-application.png) + + * Provide the values for the following in the `CouchbaseSourceSinkConnector.Properties.ConnectorConfiguration` section: + * **couchbase.seed.nodes** (line 182): Couchbase source seed nodes + * **couchbase.bucket** (line 183): source Couchbase bucket + * **couchbase.username** (line 184): Couchbase user username + * **couchbase.password** (line 185): Couchbase user password + +### [CloudFormation console](https://console.aws.amazon.com/cloudformation/home) +* Select **Stacks**. +* Select **Create stack → With new resources (standard)**. +* **Create stack** step: + * **Specify template** section: + * Select **Upload a template file**. + * Select **Choose file** and choose the `migration-utility-connectors.yaml` file downloaded above. + * Select **Next**. 
+* **Specify stack details** step: + * **Provide a stack name** section: + * **Stack name**: `migration-utility-connectors` + * **Parameters** section + * **BootstrapServers**: copy the private endpoint value from the Amazon MSK console + * [Amazon MSK console](https://console.aws.amazon.com/msk/home) + * Select **MSK Clusters → Clusters**. + * Click the **`cfn-msk-cluster`** link. + * Select **View client information**. +![MSK cluster client information](./static/images/msk-cluster-client-information.png) + * Copy the **Private endpoint (single-VPC)** value and use as the value for **BootstrapServers**. +![MSK cluster bootstrap servers](./static/images/msk-cluster-bootstrap-servers.png) + * **CouchbaseSourceMcuCount**: specify the number of microcontroller units (MCUs) per worker (e.g. 2). Each MCU provides 1 vCPU of compute and 4 GiB of memory. Connectors are scaled by increasing the number of workers and/or the MCU count per worker to adjust for workload changes. + * **CouchbaseSourceMcuWorkers**: specify the number of workers (e.g. 1). A worker is a Java virtual machine (JVM) process that executes the logic of a Kafka Connect connector in Amazon MSK Connect. + * **DocumentDbSinkMcuCount**: specify the number of MCUs per worker (e.g. 2) + * **DocumentDbSinkMcuWorkers**: specify the number of workers (e.g. 2). For Amazon DocumentDB, more workers provide more parallelism and higher update rates. + * **MigrationMSKRoleARN**: use the value of the **MigrationMSKRoleARN** key from the `migration-utility.yaml` **Outputs** tab + * **PrivateSubnets**: select 3 private subnets in VPC specified in VpcId. These must be the same 3 private subnets used when deploying migration-utility.yaml. + * **S3BucketName**: use the value of this Key from the `migration-utility.yaml` **Outputs** tab. + * **SecurityGroupId**: use the value of this Key from the `migration-utility.yaml` **Outputs** tab. + * Select **Next**. 
+* **Configure stack options** step:
+  * **Stack failure options** section:
+    * Select **Preserve successfully provisioned resources**.
+  * Select **Next**.
+* **Review and Create** step:
+  * **Parameters** section:
+    * Ensure all parameters are correct.
+  * Select **Submit**.
+* It will take ~15 minutes for the stack to fully deploy.
+
+## Step 5 - Create Kafka topic to use for live migration.
+*Note that the migration will start immediately after creating the Kafka topic. When you deployed `migration-utility-connectors.yaml`, the Couchbase source and Amazon DocumentDB sink connectors were created and they will start writing to and reading from the **migration-utility** topic when it exists.*
+
+[Amazon MSK console](https://console.aws.amazon.com/msk/home)
+* Select **MSK Clusters → Clusters**.
+* Click the **`cfn-msk-cluster`** link.
+* Select **View client information**.
+![MSK cluster client information](./static/images/msk-cluster-client-information.png)
+* Copy the **Private endpoint (single-VPC)** value.
+![MSK cluster bootstrap servers](./static/images/msk-cluster-bootstrap-servers.png)
+
+### EC2 bastion host
+* Execute the following command to create a `BOOTSTRAP_SERVER` environment variable, replacing `<private endpoint>` with the **Private endpoint (single-VPC)** value you copied above.
+```
+echo 'export BOOTSTRAP_SERVER="<private endpoint>"' >> ~/.bashrc
+```
+* Source the `~/.bashrc` file to set the environment variable.
+```
+source ~/.bashrc
+```
+* Create the `migration-utility` topic that will be used for the migration. **At this point the migration will begin.**
+```
+kafka_2.13-4.0.0/bin/kafka-topics.sh \
+--create \
+--bootstrap-server $BOOTSTRAP_SERVER \
+--command-config kafka_2.13-4.0.0/config/client.properties \
+--replication-factor 3 \
+--partitions 15 \
+--topic migration-utility
+```
+* You will see the following message if successful.
+```
+Created topic migration-utility.
+```
+* List all topics in the cluster.
+``` +kafka_2.13-4.0.0/bin/kafka-topics.sh \ +--list \ +--bootstrap-server $BOOTSTRAP_SERVER \ +--command-config kafka_2.13-4.0.0/config/client.properties +``` +* You will see the `migration-utility` topic and additional topics created by Amazon MSK. +``` +__amazon_msk_canary +__amazon_msk_connect_configs_cfn-couchbase-source-* +__amazon_msk_connect_configs_cfn-documentdb-sink-* +__consumer_offsets +migration-utility +``` +* Finally, describe the `migration-utility` topic. +``` +kafka_2.13-4.0.0/bin/kafka-topics.sh \ +--describe \ +--bootstrap-server $BOOTSTRAP_SERVER \ +--command-config kafka_2.13-4.0.0/config/client.properties \ +--topic migration-utility +``` +* Confirm that it has the specified number of partitions (15). +``` +Topic: migration-utility TopicId: xb7TyMzvSkiiRb329G9ThA PartitionCount: 15 ReplicationFactor: 3 Configs: message.format.version=3.0-IV1,min.insync.replicas=2,unclean.leader.election.enable=false,message.timestamp.after.max.ms=86400000,message.timestamp.before.max.ms=86400000,message.timestamp.difference.max.ms=86400000 + Topic: migration-utility Partition: 0 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 1 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 2 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 3 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 4 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 5 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 6 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 7 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 8 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2 Elr: N/A LastKnownElr: N/A 
+ Topic: migration-utility Partition: 9 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 10 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 11 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 12 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 13 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3 Elr: N/A LastKnownElr: N/A + Topic: migration-utility Partition: 14 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2 Elr: N/A LastKnownElr: N/A +``` +* Confirm that the documents from the `beer-sample` Couchbase bucket exist in the `beer-sample.test` collection in the Amazon DocumentDB cluster. Use the `mongosh` command you used earlier when validating the connection to the cluster from the EC2 instance. +* Switch to the `beer-sample` database. +``` +use beer-sample +``` +* You will see that mongo shell has switched to the `beer-sample` database. +``` +switched to db beer-sample +``` +* Get a count of documents in the `test` collection. +``` +db.test.countDocuments() +``` +* It should match the number of the documents in the `beer-sample` source bucket. +``` +7303 +``` +* Find a specific document. +``` +db.test.find({"_id":"21st_amendment_brewery_cafe"}) +``` +* It should match the document in the `beer-sample` source bucket with the exception of the added `_id` field. +``` +[ + { + _id: '21st_amendment_brewery_cafe', + geo: { accuracy: 'ROOFTOP', lon: -122.393, lat: 37.7825 }, + country: 'United States', + website: 'http://www.21st-amendment.com/', + code: '94107', + address: [ '563 Second Street' ], + city: 'San Francisco', + phone: '1-415-369-0900', + name: '21st Amendment Brewery Cafe', + description: 'The 21st Amendment Brewery offers a variety of award winning house made brews and American grilled cuisine in a comfortable loft like setting. 
Join us before and after Giants baseball games in our outdoor beer garden. A great location for functions and parties in our semi-private Brewers Loft. See you soon at the 21A!', + state: 'California', + type: 'brewery', + updated: '2010-10-24 13:54:07' + } +] +``` +* Get distinct `type` values in the `test` collection. +``` +db.test.distinct("type") +``` +* It should match the document types in the `beer-sample` source bucket. +``` +[ 'beer', 'brewery' ] +``` +* Get count of `brewery` documents. +``` +db.test.find({"type":"brewery"}).count() +``` +* It should match the number of documents in the `beer-sample` source bucket with `"type": "brewery"`. +``` +1412 +``` +## Cleanup +[Amazon S3 console](https://console.aws.amazon.com/s3/home) +* Select **Amazon S3 → General purpose buckets**. +* Click the link for the S3 Bucket specified when deploying the CloudFormation templates. +* Select the **Objects** tab. +* **Objects** section: + * Select the following objects: + * `couchbase-kafka-connect-couchbase-4.2.8.zip` + * `docdb-custom-plugin.zip` + * `docdb-truststore.jks` +* Select **Delete**. +![S3 delete objects](./static/images/s3-delete-objects.png) +* **Specified objects** section: + * Confirm the following 3 objects are listed. +![S3 confirm specified objects](./static/images/s3-confirm-specified-objects.png) +* **Permanently delete objects?** section: + * Type **permanently delete**. + * Select **Delete objects**. + +### [EC2 console](https://console.aws.amazon.com/ec2/home) +* Select **Network & Security → Security Groups**. +* Find the **cfn-migration-security-group** security group and note the Security group ID. +![EC2 cfn-migration-security-group](./static/images/ec2-cfn-migration-security-group.png) +* Select the security group used for your Couchbase cluster and then select **Inbound rules**. +![EC2 select Couchbase security group inbound rules](./static/images/ec2-select-couchbase-security-group-inbound-rules.png) +* Select **Edit inbound rules**. 
+![EC2 edit Couchbase security group inbound rules](./static/images/ec2-edit-couchbase-security-group-inbound-rules.png) +* Find the **Inbound rule** that has a **Source matching** the **cfn-migration-security-group** id and select **Delete**. +![EC2 delete inbound rule](./static/images/ec2-delete-inbound-rule.png) +* Select **Save rules**. + +### [CloudFormation console](https://console.aws.amazon.com/cloudformation/home) +* Select **CloudFormation → Stacks**. +* Select the **migration-utility-connectors** stack. +* Select **Delete**. +* **Delete stack?** popup: + * Select **Delete**. +* It will take ~1 minute for the stack to fully delete +* Select **CloudFormation → Stacks**. +* Select the **migration-utility** stack. +* Select **Delete**. +* **Delete stack?** popup: + * Select **Delete**. +* It will take ~15 minutes for the stack to fully delete. + +## Other useful Kafka commands. +### Publish messages to a topic +``` +kafka_2.13-4.0.0/bin/kafka-console-producer.sh \ +--bootstrap-server $BOOTSTRAP_SERVER \ +--producer.config kafka_2.13-4.0.0/config/client.properties \ +--topic migration-utility \ +--property "parse.key=true" \ +--property "key.separator=:" +> msg1:{"msg": "this is a test"} +> msg2:{"src": "kafka-console-producer"} +``` +Press `ctrl-d` to exit. +### Consume messages from a topic +``` +kafka_2.13-4.0.0/bin/kafka-console-consumer.sh \ +--bootstrap-server $BOOTSTRAP_SERVER \ +--consumer.config kafka_2.13-4.0.0/config/client.properties \ +--topic migration-utility \ +--property print.key=true \ +--from-beginning +``` +Press `ctrl-c` to exit. +### Delete a topic. 
+``` +kafka_2.13-4.0.0/bin/kafka-topics.sh \ +--delete \ +--bootstrap-server $BOOTSTRAP_SERVER \ +--command-config kafka_2.13-4.0.0/config/client.properties \ +--topic migration-utility +``` diff --git a/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml b/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml new file mode 100644 index 0000000..3390195 --- /dev/null +++ b/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml @@ -0,0 +1,222 @@ +Parameters: + DocumentDbSinkMcuCount: + Description: Each MCU provides 1 vCPU of compute and 4 GiB of memory. + Type: Number + AllowedValues: + - 1 + - 2 + - 4 + - 8 + DocumentDbSinkMcuWorkers: + Description: A worker is a Java virtual machine (JVM) connect process. There will be one worker per instance. + Type: Number + AllowedValues: + - 1 + - 2 + - 3 + - 4 + - 5 + - 6 + - 7 + - 8 + - 9 + - 10 + CouchbaseSourceMcuCount: + Description: Each MCU provides 1 vCPU of compute and 4 GiB of memory. + Type: Number + AllowedValues: + - 1 + - 2 + - 4 + - 8 + CouchbaseSourceMcuWorkers: + Description: A worker is a Java virtual machine (JVM) connect process. There will be one worker per instance. + Type: Number + AllowedValues: + - 1 + - 2 + - 3 + - 4 + - 5 + - 6 + - 7 + - 8 + - 9 + - 10 + S3BucketName: + Description: Name of general purpose S3 bucket to store connector resources (output from migration-utility.yaml). + Type: String + BootstrapServers: + Description: Comma-separated list of Amazon MSK cluster bootstrap servers. + Type: String + SecurityGroupId: + Description: EC2 security group ID used for all resources (output from migration-utility.yaml). + Type: String + PrivateSubnets: + # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-supplied-parameter-types.html#cloudformation-supplied-parameter-types-overview + Description: Select private subnets within the specified VPC. 
+    Type: List<AWS::EC2::Subnet::Id>
+  MigrationMSKRoleARN:
+    Description: ARN of IAM role that allows resources to call Amazon MSK on your behalf (output from migration-utility.yaml).
+    Type: String
+  LogGroupName:
+    Description: The name of the CloudWatch log group for Amazon MSK logs.
+    Type: String
+    Default: '/migration-utility/migration'
+
+
+Resources:
+# CloudWatch resources
+  MigrationLogGroup:
+    Type: AWS::Logs::LogGroup
+    Properties:
+      LogGroupClass: "STANDARD"
+      RetentionInDays: 5
+      LogGroupName: !Ref LogGroupName
+
+# KafkaConnect resources
+  DocumentDBCustomPlugin:
+    Type: AWS::KafkaConnect::CustomPlugin
+    Properties:
+      ContentType: 'ZIP'
+      Description: 'Amazon DocumentDB plug-in.'
+      Location:
+        S3Location:
+          BucketArn: !Join
+            - ''
+            - - 'arn:aws:s3:::'
+              - !Ref S3BucketName
+          FileKey: "docdb-custom-plugin.zip"
+      Name: cfn-documentdb-plugin
+
+  CouchbaseCustomPlugin:
+    Type: AWS::KafkaConnect::CustomPlugin
+    Properties:
+      ContentType: 'ZIP'
+      Description: 'Couchbase plug-in.'
+      Location:
+        S3Location:
+          BucketArn: !Join
+            - ''
+            - - 'arn:aws:s3:::'
+              - !Ref S3BucketName
+          FileKey: "couchbase-kafka-connect-couchbase-4.2.8.zip"
+      Name: cfn-couchbase-plugin
+
+  DocumentDbSinkConnector:
+    DependsOn:
+      - DocumentDBCustomPlugin
+      - MigrationLogGroup
+    Type: AWS::KafkaConnect::Connector
+    Properties:
+      Capacity:
+        ProvisionedCapacity:
+          McuCount: !Ref DocumentDbSinkMcuCount
+          WorkerCount: !Ref DocumentDbSinkMcuWorkers
+      ConnectorConfiguration:
+        database: ''
+        collection: ''
+        connection.uri: ''
+        config.providers: 's3import,ssm,sm'
+        config.providers.s3import.class: 'com.amazonaws.kafka.config.providers.S3ImportConfigProvider'
+        config.providers.s3import.param.region: !Ref 'AWS::Region'
+        connection.ssl.truststore: !Join
+          - ''
+          - - "${s3import:"
+            - !Ref 'AWS::Region'
+            - ":"
+            - !Ref S3BucketName
+            - "/docdb-truststore.jks}"
+        connection.ssl.truststorePassword: 'password'
+        connector.class: 'com.mongodb.kafka.connect.MongoSinkConnector'
+        document.id.strategy:
'com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy' + document.id.strategy.overwrite.existing: 'true' + errors.tolerance: 'all' + key.converter: 'org.apache.kafka.connect.storage.StringConverter' + key.converter.schemas.enable: 'false' + max.batch.size: '100' + tasks.max: '15' + topics: 'migration-utility' + transforms: 'hk' + transforms.hk.field: '_id' + transforms.hk.type: 'org.apache.kafka.connect.transforms.HoistField$Key' + value.converter: 'org.apache.kafka.connect.json.JsonConverter' + value.converter.schemas.enable: 'false' + writemodel.strategy: 'com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneDefaultStrategy' + ConnectorName: 'cfn-documentdb-sink-connector' + KafkaCluster: + ApacheKafkaCluster: + BootstrapServers: !Ref BootstrapServers + Vpc: + SecurityGroups: + - !Ref SecurityGroupId + Subnets: !Split [',', !Join [',', !Ref PrivateSubnets]] + KafkaClusterClientAuthentication: + AuthenticationType: 'IAM' + KafkaClusterEncryptionInTransit: + EncryptionType: 'TLS' + KafkaConnectVersion: '2.7.1' + LogDelivery: + WorkerLogDelivery: + CloudWatchLogs: + Enabled: true + LogGroup: !Ref LogGroupName + Plugins: + - CustomPlugin: + CustomPluginArn: !GetAtt DocumentDBCustomPlugin.CustomPluginArn + Revision: !GetAtt DocumentDBCustomPlugin.Revision + ServiceExecutionRoleArn: !Ref MigrationMSKRoleARN + + CouchbaseSourceConnector: + DependsOn: + - CouchbaseCustomPlugin + - MigrationLogGroup + Type: AWS::KafkaConnect::Connector + Properties: + Capacity: + ProvisionedCapacity: + McuCount: !Ref CouchbaseSourceMcuCount + WorkerCount: !Ref CouchbaseSourceMcuWorkers + ConnectorConfiguration: + connector.class: 'com.couchbase.connect.kafka.CouchbaseSourceConnector' + couchbase.seed.nodes: '' + couchbase.bucket: '' + couchbase.username: '' + couchbase.password: '' + couchbase.bootstrap.timeout: '10s' + couchbase.enable.tls: 'false' + couchbase.env.timeout.kvTimeout: '10s' + couchbase.event.filter: 
'com.couchbase.connect.kafka.filter.AllPassFilter' + couchbase.flow.control.buffer: '16m' + couchbase.log.document.lifecycle: 'false' + couchbase.persistence.polling.interval: '100ms' + couchbase.replicate.to: 'NONE' + couchbase.source.handler: 'com.couchbase.connect.kafka.handler.source.RawJsonSourceHandler' + couchbase.stream.from: 'SAVED_OFFSET_OR_BEGINNING' + couchbase.topic: 'migration-utility' + key.converter: 'org.apache.kafka.connect.storage.StringConverter' + value.converter: 'org.apache.kafka.connect.converters.ByteArrayConverter' + tasks.max: '15' + ConnectorName: 'cfn-couchbase-source-connector' + KafkaCluster: + ApacheKafkaCluster: + BootstrapServers: !Ref BootstrapServers + Vpc: + SecurityGroups: + - !Ref SecurityGroupId + Subnets: !Split [',', !Join [',', !Ref PrivateSubnets]] + KafkaClusterClientAuthentication: + AuthenticationType: 'IAM' + KafkaClusterEncryptionInTransit: + EncryptionType: 'TLS' + KafkaConnectVersion: '2.7.1' + LogDelivery: + WorkerLogDelivery: + CloudWatchLogs: + Enabled: true + LogGroup: !Ref LogGroupName + Plugins: + - CustomPlugin: + CustomPluginArn: !GetAtt CouchbaseCustomPlugin.CustomPluginArn + Revision: !GetAtt CouchbaseCustomPlugin.Revision + ServiceExecutionRoleArn: !Ref MigrationMSKRoleARN diff --git a/migration/migration-utility-for-couchbase/migration-utility.yaml b/migration/migration-utility-for-couchbase/migration-utility.yaml new file mode 100644 index 0000000..86abd9b --- /dev/null +++ b/migration/migration-utility-for-couchbase/migration-utility.yaml @@ -0,0 +1,306 @@ +Parameters: + Ec2KeyPairName: + Description: Name of EC2 key pair that will be used for EC2 instance. + Type: String + SshIngressIpAddress: + Description: Allow incoming SSH traffic to EC2 instance from this IP address (CIDR notation). 
+    Type: String
+    AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
+  VpcId:
+    # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-supplied-parameter-types.html#cloudformation-supplied-parameter-types-overview
+    Description: ID of an existing Virtual Private Cloud (VPC) where migration resources will be deployed.
+    Type: 'AWS::EC2::VPC::Id'
+  LatestAmiId:
+    # https://docs.aws.amazon.com/linux/al2023/ug/ec2.html#launch-from-cloudformation
+    Description: Image ID for EC2 instance (do not edit).
+    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
+    Default: '/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-6.1-x86_64'
+  PublicSubnetId:
+    # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-supplied-parameter-types.html#cloudformation-supplied-parameter-types-overview
+    Description: ID of an existing public subnet within the specified VPC.
+    Type: 'AWS::EC2::Subnet::Id'
+  PrivateSubnets:
+    # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-supplied-parameter-types.html#cloudformation-supplied-parameter-types-overview
+    Description: Select at least 3 private subnets within the specified VPC.
+    Type: List<AWS::EC2::Subnet::Id>
+  DocumentdbUsername:
+    Description: Specify an alphanumeric string that defines the login ID for the user.
+    Type: String
+    MinLength: '1'
+    MaxLength: '63'
+    AllowedPattern: '[a-zA-Z][a-zA-Z0-9]*'
+    ConstraintDescription: Username must start with a letter and contain 1 to 63 characters.
+  DocumentdbPassword:
+    NoEcho: 'true'
+    Description: Specify an alphanumeric string that defines the password for the user.
+    Type: String
+    MinLength: '8'
+    MaxLength: '100'
+    AllowedPattern: '[a-zA-Z0-9]*'
+    ConstraintDescription: Password must contain 8 to 100 characters.
+ DocumentdbInstanceType: + Type: String + Default: "db.r6g.large" + AllowedValues: + - "db.r8g.large" + - "db.r8g.xlarge" + - "db.r8g.2xlarge" + - "db.r8g.4xlarge" + - "db.r8g.8xlarge" + - "db.r8g.12xlarge" + - "db.r8g.16xlarge" + - "db.r6g.large" + - "db.r6g.xlarge" + - "db.r6g.2xlarge" + - "db.r6g.4xlarge" + - "db.r6g.8xlarge" + - "db.r6g.12xlarge" + - "db.r6g.16xlarge" + - "db.r6gd.xlarge" + - "db.r6gd.2xlarge" + - "db.r6gd.4xlarge" + - "db.r6gd.8xlarge" + - "db.r6gd.12xlarge" + - "db.r6gd.16xlarge" + - "db.r5.large" + - "db.r5.xlarge" + - "db.r5.2xlarge" + - "db.r5.4xlarge" + - "db.r5.8xlarge" + - "db.r5.12xlarge" + - "db.r5.16xlarge" + - "db.r5.24xlarge" + - "db.r4.large" + - "db.r4.xlarge" + - "db.r4.2xlarge" + - "db.r4.4xlarge" + - "db.r4.8xlarge" + - "db.r4.12xlarge" + - "db.r4.16xlarge" + - "db.t4g.medium" + - "db.t3.medium" + S3BucketName: + Description: Name of existing general purpose S3 bucket to store connector resources. + Type: String + MskClusterBrokerInstanceType: + Description: MSK cluster broker size + Type: String + Default: "express.m7g.large" + AllowedValues: + - "express.m7g.large" + - "express.m7g.xlarge" + - "express.m7g.2xlarge" + - "express.m7g.4xlarge" + - "express.m7g.8xlarge" + - "express.m7g.12xlarge" + - "express.m7g.16xlarge" + MskBrokerNodes: + Description: The number of broker nodes in the cluster (multiple of the number of private subnets). 
+ Type: Number + Default: 3 + +Resources: + # IAM Resources + MigrationMskPolicy: +# https://docs.aws.amazon.com/msk/latest/developerguide/create-iam-access-control-policies.html + Type: 'AWS::IAM::ManagedPolicy' + Properties: + ManagedPolicyName: 'cfn-migration-msk-policy' + PolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Action: 'kafka-cluster:*' + Resource: !Join + - '' + - - 'arn:aws:kafka:' + - !Ref 'AWS::Region' + - ':' + - !Ref 'AWS::AccountId' + - ':*/cfn-msk-cluster/*' + MigrationKafkaConnectPolicy: +# https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonmanagedstreamingforkafkaconnect.html + Type: 'AWS::IAM::ManagedPolicy' + Properties: + ManagedPolicyName: 'cfn-migration-kafkaconnect-policy' + PolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Action: 'kafkaconnect:CreateCustomPlugin' + Resource: !Join + - '' + - - 'arn:aws:kafka:' + - !Ref 'AWS::Region' + - ':' + - !Ref 'AWS::AccountId' + - ':custom-plugin/*/*' + MigrationMskRole: + Type: 'AWS::IAM::Role' + Properties: + RoleName: 'cfn-migration-msk-role' + ManagedPolicyArns: + - "arn:aws:iam::aws:policy/AmazonS3FullAccess" + - Ref: "MigrationMskPolicy" + MaxSessionDuration: 3600 + AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: 'Allow' + Action: 'sts:AssumeRole' + Principal: + Service: + - kafkaconnect.amazonaws.com + MigrationEc2MskRole: + Type: 'AWS::IAM::Role' + Properties: + RoleName: 'cfn-ec2-msk-role' + Description: "Allows EC2 instances to call MSK & Kafka Connect on your behalf." 
+ ManagedPolicyArns: + - "arn:aws:iam::aws:policy/AmazonS3FullAccess" + - Ref: "MigrationMskPolicy" + - Ref: "MigrationKafkaConnectPolicy" + MaxSessionDuration: 3600 + AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: 'Allow' + Action: 'sts:AssumeRole' + Principal: + Service: + - ec2.amazonaws.com + MigrationEc2InstanceProfile: + DependsOn: MigrationEc2MskRole + Type: 'AWS::IAM::InstanceProfile' + Properties: + InstanceProfileName: 'cfn-ec2-instance-profile' + Roles: + - !Ref MigrationEc2MskRole + + # EC2 resources + MigrationEc2SecurityGroup: + Type: "AWS::EC2::SecurityGroup" + Properties: + GroupDescription: "security group for Couchbase to DocumentDB migration resources" + GroupName: "cfn-migration-security-group" + VpcId: !Ref VpcId + SecurityGroupIngress: + - IpProtocol: tcp + FromPort: 22 + ToPort: 22 + CidrIp: !Ref SshIngressIpAddress + MigrationEc2SecurityGroupIngress: + Type: 'AWS::EC2::SecurityGroupIngress' + DependsOn: MigrationEc2SecurityGroup + Properties: + GroupId: !Ref MigrationEc2SecurityGroup + IpProtocol: "-1" + FromPort: -1 + ToPort: -1 + SourceSecurityGroupId: !Ref MigrationEc2SecurityGroup + MigrationEc2Instance: + DependsOn: MigrationEc2InstanceProfile + Type: AWS::EC2::Instance + Properties: + IamInstanceProfile: !Ref MigrationEc2InstanceProfile + KeyName: !Ref Ec2KeyPairName + ImageId: !Ref LatestAmiId + InstanceType: t2.micro + Tags: + - Key: Name + Value: cfn-msk-ec2 + NetworkInterfaces: + - AssociatePublicIpAddress: "true" + DeviceIndex: "0" + GroupSet: + - Ref: MigrationEc2SecurityGroup + SubnetId: !Ref PublicSubnetId + UserData: + Fn::Base64: + !Sub | + #!/bin/bash + cd /home/ec2-user + + echo "downloading createTruststore.sh from https://raw.githubusercontent.com/awslabs/amazon-documentdb-tools/refs/heads/master/migration/migration-utility-for-couchbase/static/scripts/createTruststore.sh ..." 
>> setup.log + wget https://raw.githubusercontent.com/awslabs/amazon-documentdb-tools/refs/heads/master/migration/migration-utility-for-couchbase/static/scripts/createTruststore.sh -O createTruststore.sh 2> setup.log + + echo "making createTruststore.sh executable ..." >> setup.log + chmod 755 createTruststore.sh + + echo "downloading setup.sh from https://raw.githubusercontent.com/awslabs/amazon-documentdb-tools/refs/heads/master/migration/migration-utility-for-couchbase/static/scripts/setup.sh ..." >> setup.log + wget https://raw.githubusercontent.com/awslabs/amazon-documentdb-tools/refs/heads/master/migration/migration-utility-for-couchbase/static/scripts/setup.sh -O setup.sh 2> setup.log + + echo "making setup.sh executable ..." >> setup.log + chmod 755 setup.sh + + echo "running setup.sh ${S3BucketName} ${AWS::Region} ..." >> setup.log + ./setup.sh ${S3BucketName} ${AWS::Region} + +# Amazon DocumentDB resources + MigrationDocumentDBParameterGroup: + Type: AWS::DocDB::DBClusterParameterGroup + Properties: + Description: '5.0 custom parameter group with collection compression enabled' + Family: 'docdb5.0' + Name: 'cfn-migration-parameter-group' + Parameters: + default_collection_compression: "enabled" + MigrationDocumentDBSubnetGroup: + Type: AWS::DocDB::DBSubnetGroup + Properties: + DBSubnetGroupName: 'cfn-migration-subnet-group' + DBSubnetGroupDescription: 'Private subnet group for Amazon DocumentDB' + SubnetIds: !Split [',', !Join [',', !Ref PrivateSubnets]] + MigrationDocumentDBCluster: + DependsOn: + - MigrationDocumentDBParameterGroup + - MigrationDocumentDBSubnetGroup + - MigrationEc2SecurityGroup + Type: AWS::DocDB::DBCluster + Properties: + DBClusterIdentifier: 'cfn-documentdb-target' + DBClusterParameterGroupName: !Ref MigrationDocumentDBParameterGroup + DBSubnetGroupName: !Ref MigrationDocumentDBSubnetGroup + MasterUsername: !Ref DocumentdbUsername + MasterUserPassword: !Ref DocumentdbPassword + VpcSecurityGroupIds: + - !GetAtt 
MigrationEc2SecurityGroup.GroupId + MigrationDocumentDBPrimaryInstance: + DependsOn: MigrationDocumentDBCluster + Type: AWS::DocDB::DBInstance + Properties: + DBClusterIdentifier: !Ref MigrationDocumentDBCluster + DBInstanceClass: !Ref DocumentdbInstanceType + DBInstanceIdentifier: cfn-primary-instance + EnablePerformanceInsights: true + +# MSK resources + # not including custom plugins as there is a dependency on execution of EC2 user data script + MigrationMSKCluster: + Type: AWS::MSK::Cluster + Properties: + BrokerNodeGroupInfo: + SecurityGroups: + - !GetAtt MigrationEc2SecurityGroup.GroupId + ClientSubnets: !Split [',', !Join [',', !Ref PrivateSubnets]] + InstanceType: !Ref MskClusterBrokerInstanceType + ClientAuthentication: + Sasl: + Iam: + Enabled: true + ClusterName: cfn-msk-cluster + KafkaVersion: 3.6.0 + NumberOfBrokerNodes: !Ref MskBrokerNodes + +Outputs: + SecurityGroupId: + Description: Security group ID for migration resources. + Value: !Ref MigrationEc2SecurityGroup + + MigrationMSKRoleARN: + Description: ARN of migration MSK IAM role. + Value: !GetAtt MigrationMskRole.Arn + + S3BucketName: + Description: Name of general purpose S3 bucket to store connector resources. 
+ Value: !Ref S3BucketName diff --git a/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-an-application.png b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-an-application.png new file mode 100644 index 0000000..d98f927 Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-an-application.png differ diff --git a/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-mongo-shell.png b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-mongo-shell.png new file mode 100644 index 0000000..ff97c87 Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connect-with-mongo-shell.png differ diff --git a/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connectivity-and-security.png b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connectivity-and-security.png new file mode 100644 index 0000000..f718dc1 Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/amazon-documentdb-connectivity-and-security.png differ diff --git a/migration/migration-utility-for-couchbase/static/images/cloudformation-migration-utility-output.png b/migration/migration-utility-for-couchbase/static/images/cloudformation-migration-utility-output.png new file mode 100644 index 0000000..30b0fd7 Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/cloudformation-migration-utility-output.png differ diff --git a/migration/migration-utility-for-couchbase/static/images/ec2-cfn-migration-security-group.png b/migration/migration-utility-for-couchbase/static/images/ec2-cfn-migration-security-group.png new file mode 100644 index 0000000..5ca7b57 Binary files /dev/null and 
b/migration/migration-utility-for-couchbase/static/images/ec2-cfn-migration-security-group.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/ec2-copy-ssh-command.png b/migration/migration-utility-for-couchbase/static/images/ec2-copy-ssh-command.png
new file mode 100644
index 0000000..3689d40
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/ec2-copy-ssh-command.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/ec2-delete-inbound-rule.png b/migration/migration-utility-for-couchbase/static/images/ec2-delete-inbound-rule.png
new file mode 100644
index 0000000..9732380
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/ec2-delete-inbound-rule.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/ec2-edit-couchbase-security-group-inbound-rules.png b/migration/migration-utility-for-couchbase/static/images/ec2-edit-couchbase-security-group-inbound-rules.png
new file mode 100644
index 0000000..7d3bd40
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/ec2-edit-couchbase-security-group-inbound-rules.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/ec2-select-couchbase-security-group-inbound-rules.png b/migration/migration-utility-for-couchbase/static/images/ec2-select-couchbase-security-group-inbound-rules.png
new file mode 100644
index 0000000..a802e1a
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/ec2-select-couchbase-security-group-inbound-rules.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/msk-cluster-bootstrap-servers.png b/migration/migration-utility-for-couchbase/static/images/msk-cluster-bootstrap-servers.png
new file mode 100644
index 0000000..9ba90ba
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/msk-cluster-bootstrap-servers.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/msk-cluster-client-information.png b/migration/migration-utility-for-couchbase/static/images/msk-cluster-client-information.png
new file mode 100644
index 0000000..ee4896f
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/msk-cluster-client-information.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/s3-confirm-specified-objects.png b/migration/migration-utility-for-couchbase/static/images/s3-confirm-specified-objects.png
new file mode 100644
index 0000000..d7fb9ce
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/s3-confirm-specified-objects.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/s3-delete-objects.png b/migration/migration-utility-for-couchbase/static/images/s3-delete-objects.png
new file mode 100644
index 0000000..eae9249
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/s3-delete-objects.png differ
diff --git a/migration/migration-utility-for-couchbase/static/images/solution-overview.png b/migration/migration-utility-for-couchbase/static/images/solution-overview.png
new file mode 100644
index 0000000..ff5a4e3
Binary files /dev/null and b/migration/migration-utility-for-couchbase/static/images/solution-overview.png differ
diff --git a/migration/migration-utility-for-couchbase/static/scripts/createTruststore.sh b/migration/migration-utility-for-couchbase/static/scripts/createTruststore.sh
new file mode 100755
index 0000000..29f4309
--- /dev/null
+++ b/migration/migration-utility-for-couchbase/static/scripts/createTruststore.sh
@@ -0,0 +1,20 @@
+truststore=/home/ec2-user/docdb-truststore.jks
+storepassword=password
+
+curl -sS "https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem" > /home/ec2-user/global-bundle.pem
+awk 'split_after == 1 {n++;split_after=0} /-----END CERTIFICATE-----/ {split_after=1}{print > "rds-ca-" n ".pem"}' < /home/ec2-user/global-bundle.pem
+
+for CERT in rds-ca-*; do
+  alias=$(openssl x509 -noout -text -in $CERT | perl -ne 'next unless /Subject:/; s/.*(CN=|CN = )//; print')
+  echo "Importing $alias" >> setup.log
+  keytool -import -file ${CERT} -alias "${alias}" -storepass ${storepassword} -keystore ${truststore} -noprompt
+  rm $CERT
+done
+
+echo "Trust store content is: "
+
+keytool -list -v -keystore "$truststore" -storepass ${storepassword} | grep Alias | cut -d " " -f3- | while read alias
+do
+  expiry=`keytool -list -v -keystore "$truststore" -storepass ${storepassword} -alias "${alias}" | grep Valid | perl -ne 'if(/until: (.*?)\n/) { print "$1\n"; }'`
+  echo "  Certificate ${alias} expires in '$expiry'" >> setup.log
+done
diff --git a/migration/migration-utility-for-couchbase/static/scripts/setup.sh b/migration/migration-utility-for-couchbase/static/scripts/setup.sh
new file mode 100644
index 0000000..5c155bd
--- /dev/null
+++ b/migration/migration-utility-for-couchbase/static/scripts/setup.sh
@@ -0,0 +1,78 @@
+# install Java
+echo "installing java-21-amazon-corretto-devel ..." >> setup.log
+sudo yum install -y -v java-21-amazon-corretto-devel >> setup.log
+
+echo "copying cacerts to kafka_truststore.jks ..." >> setup.log
+cp /usr/lib/jvm/java-21-amazon-corretto.x86_64/lib/security/cacerts kafka_truststore.jks
+
+# Couchbase connector
+echo "downloading couchbase-kafka-connect-couchbase-4.2.8.zip ..." >> setup.log
+wget https://packages.couchbase.com/clients/kafka/4.2.8/couchbase-kafka-connect-couchbase-4.2.8.zip
+
+echo "copying couchbase-kafka-connect-couchbase-4.2.8.zip to s3://$1 ..." >> setup.log
+aws s3 cp couchbase-kafka-connect-couchbase-4.2.8.zip s3://$1
+
+# Amazon DocumentDB connector
+echo "create directories for Amazon DocumentDB custom plugin ..." >> setup.log
+cd /home/ec2-user
+mkdir -p docdb-custom-plugin
+mkdir -p docdb-custom-plugin/mongo-connector
+mkdir -p docdb-custom-plugin/msk-config-providers
+
+echo "downloading mongo-kafka-connect-1.15.0-all.jar ..." >> setup.log
+cd /home/ec2-user/docdb-custom-plugin/mongo-connector
+wget https://repo1.maven.org/maven2/org/mongodb/kafka/mongo-kafka-connect/1.15.0/mongo-kafka-connect-1.15.0-all.jar
+
+echo "downloading msk-config-providers-0.3.1-with-dependencies.zip ..." >> /home/ec2-user/setup.log
+cd /home/ec2-user/docdb-custom-plugin/msk-config-providers
+wget https://github.com/aws-samples/msk-config-providers/releases/download/r0.3.1/msk-config-providers-0.3.1-with-dependencies.zip
+
+echo "unzipping msk-config-providers-0.3.1-with-dependencies.zip ..." >> /home/ec2-user/setup.log
+unzip msk-config-providers-0.3.1-with-dependencies.zip
+
+echo "deleting msk-config-providers-0.3.1-with-dependencies.zip ..." >> /home/ec2-user/setup.log
+rm msk-config-providers-0.3.1-with-dependencies.zip
+
+echo "creating docdb-custom-plugin.zip ..." >> /home/ec2-user/setup.log
+cd /home/ec2-user
+zip -r docdb-custom-plugin.zip docdb-custom-plugin
+
+echo "copying docdb-custom-plugin.zip to s3://$1 ..." >> setup.log
+aws s3 cp docdb-custom-plugin.zip s3://$1
+
+# Kafka
+echo "downloading kafka_2.13-4.0.0.tgz ..." >> setup.log
+wget https://dlcdn.apache.org/kafka/4.0.0/kafka_2.13-4.0.0.tgz
+
+echo "extracting kafka_2.13-4.0.0.tgz ..." >> setup.log
+tar -xzf kafka_2.13-4.0.0.tgz
+
+# AWS MSK IAM auth
+echo "downloading aws-msk-iam-auth-2.3.2-all.jar ..." >> setup.log
+wget https://github.com/aws/aws-msk-iam-auth/releases/download/v2.3.2/aws-msk-iam-auth-2.3.2-all.jar
+
+echo "copying aws-msk-iam-auth-2.3.2-all.jar to kafka_2.13-4.0.0/libs/. ..." >> setup.log
+cp aws-msk-iam-auth-2.3.2-all.jar kafka_2.13-4.0.0/libs/.
+
+# Mongo shell
+echo "installing mongodb-mongosh-shared-openssl3 ..." >> setup.log
+echo -e "[mongodb-org-5.0]\nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2023/mongodb-org/5.0/x86_64/\ngpgcheck=1\nenabled=1\ngpgkey=https://pgp.mongodb.com/server-5.0.asc" | sudo tee /etc/yum.repos.d/mongodb-org-5.0.repo
+sudo yum install -y -v mongodb-mongosh-shared-openssl3 >> setup.log
+
+# create Amazon DocumentDB trust store
+echo "executing createTruststore.sh ..." >> setup.log
+./createTruststore.sh
+
+echo "copying docdb-truststore.jks to s3://$1 ..." >> setup.log
+aws s3 cp /home/ec2-user/docdb-truststore.jks s3://$1
+
+# create Kafka client properties file
+echo "creating /home/ec2-user/kafka_2.13-4.0.0/config/client.properties ..." >> setup.log
+echo "ssl.truststore.location=/home/ec2-user/kafka_truststore.jks" >> kafka_2.13-4.0.0/config/client.properties
+echo "security.protocol=SASL_SSL" >> kafka_2.13-4.0.0/config/client.properties
+echo "sasl.mechanism=AWS_MSK_IAM" >> kafka_2.13-4.0.0/config/client.properties
+echo "sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;" >> kafka_2.13-4.0.0/config/client.properties
+echo "sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler" >> kafka_2.13-4.0.0/config/client.properties
+
+# setup complete
+echo "setup complete ..." >> setup.log
diff --git a/migration/migrator/.gitignore b/migration/migrator/.gitignore
index a44efc8..98a529f 100644
--- a/migration/migrator/.gitignore
+++ b/migration/migrator/.gitignore
@@ -1,2 +1,4 @@
 rds-combined-ca-bundle.pem
 doit.bash
+doit-fl.bash
+doit-cdc.bash
diff --git a/migration/migrator/README.md b/migration/migrator/README.md
index ae674b1..426d344 100644
--- a/migration/migrator/README.md
+++ b/migration/migrator/README.md
@@ -1,16 +1,20 @@
-# Amazon DocumentDB Change Data Capture (CDC) Synchronization Tool
-This synchronization tool enables high-speed CDC from a MongoDB source database to an Amazon DocumentDB target database.
+# Amazon DocumentDB Full Load and Change Data Capture (CDC) Synchronization Tool
+This synchronization tool enables high-speed Full Load and CDC from a MongoDB/DocumentDB source database to an Amazon DocumentDB target database.
+
+The full load script requires "boundaries" for parallelism; you can run the [dms-segments tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/dms-segments) to calculate them.
 
 ## Installation
 Clone the repository.
 
 ## Requirements
 * Python 3.7+
-* PyMongo
+* PyMongo, boto3
+  * IAM permission "cloudwatch:PutMetricData" is required to create CloudWatch metrics
+
+
-## Using the tool
+## Using the change data capture tool
 ```
-python3 cdc-multiprocess.py --source-uri --target-uri --source-namespace --start-position [0 or YYYY-MM-DD+HH:MM:SS in UTC] --use-[oplog|change-stream]
+python3 cdc-multiprocess.py --source-uri --target-uri --source-namespace --start-position [0 or YYYY-MM-DD+HH:MM:SS in UTC] --use-[oplog|change-stream] [--create-cloudwatch-metrics] [--cluster-name ]
 ```
 
 * source-uri and target-uri follow the [MongoDB Connection String URI Format](https://www.mongodb.com/docs/manual/reference/connection-string/)
@@ -19,3 +23,19 @@ python3 cdc-multiprocess.py --source-uri --target-uri 
 * must pass either --use-oplog for oplog to be source (MongoDB only) or --use-change-stream to use change streams for source (MongoDB or DocumentDB)
 * optionally pass 2+ for the --threads option to process the oplog with concurrent processes
 * several other optional parameters as supported, execute the script with -h for a full listing
+* include --create-cloudwatch-metrics to create metrics for the number of CDC operations per second and the number of seconds behind current
+  * CloudWatch metrics are captured in namespace "CustomDocDB" as "MigratorCDCOperationsPerSecond" and "MigratorCDCNumSecondsBehind"
+* include --cluster-name if capturing CloudWatch metrics via --create-cloudwatch-metrics
+
+## Using the full load tool
+```
+python3 fl-multiprocess.py --source-uri --target-uri --source-namespace --boundaries [--create-cloudwatch-metrics] [--cluster-name ]
+```
+
+* source-uri and target-uri follow the [MongoDB Connection String URI Format](https://www.mongodb.com/docs/manual/reference/connection-string/)
+* source-namespace and target-namespace in database.collection format (e.g. "database1.collection2")
+* pass --boundary-datatype as string or int for _id boundaries that are not of objectid type
+* several other optional parameters are supported; execute the script with -h for a full listing
+* include --create-cloudwatch-metrics to create metrics for the number of inserts per second and the approximate number of seconds until completion
+  * CloudWatch metrics are captured in namespace "CustomDocDB" as "MigratorFLInsertsPerSecond" and "MigratorFLRemainingSeconds"
+* include --cluster-name if capturing CloudWatch metrics via --create-cloudwatch-metrics
diff --git a/migration/migrator/cdc-multiprocess-readahead.py b/migration/migrator/cdc-multiprocess-readahead.py
new file mode 100644
index 0000000..5e2fb3d
--- /dev/null
+++ b/migration/migrator/cdc-multiprocess-readahead.py
@@ -0,0 +1,860 @@
+from datetime import datetime, timedelta
+import os
+import sys
+import time
+import pymongo
+from bson.timestamp import Timestamp
+import threading
+import multiprocessing as mp
+import hashlib
+import argparse
+import boto3
+import warnings
+
+
+def logIt(threadnum, message):
+    logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z'
+    print("[{}] thread {:>3d} | {}".format(logTimeStamp,threadnum,message))
+
+
+def oplog_processor(threadnum, appConfig, perfQ):
+    warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.")
+
+    if appConfig['verboseLogging']:
+        logIt(threadnum,'thread started')
+
+    c = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrcdc')
+    oplog = c.local.oplog.rs
+
+    destConnection = 
pymongo.MongoClient(host=appConfig["targetUri"],appname='migrcdc') + destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] + destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] + + ''' + i = insert + u = update + d = delete + c = command + db = database + n = no-op + ''' + + startTime = time.time() + lastFeedback = time.time() + lastBatch = time.time() + + allDone = False + threadOplogEntries = 0 + + bulkOpList = [] + + # list with replace, not insert, in case document already exists (replaying old oplog) + bulkOpListReplace = [] + numCurrentBulkOps = 0 + + numTotalBatches = 0 + + printedFirstTs = False + myCollectionOps = 0 + + # starting timestamp + endTs = appConfig["startTs"] + + while not allDone: + if appConfig['verboseLogging']: + logIt(threadnum,"Creating oplog tailing cursor for timestamp {}".format(endTs.as_datetime())) + + cursor = oplog.find({'ts': {'$gte': endTs},'ns':appConfig["sourceNs"]},cursor_type=pymongo.CursorType.TAILABLE_AWAIT,oplog_replay=True) + + while cursor.alive and not allDone: + for doc in cursor: + # check if time to exit + if ((time.time() - startTime) > appConfig['durationSeconds']) and (appConfig['durationSeconds'] != 0): + allDone = True + break + + endTs = doc['ts'] + + # NOTE: Python's non-deterministic hash() cannot be used as it is seeded at startup, since this code is multiprocessing we need all hash calls to be the same between processes + # hash(str(doc['o']['_id'])) + if (((doc['op'] in ['i','d']) and (doc['ns'] == appConfig["sourceNs"]) and ((int(hashlib.sha512(str(doc['o']['_id']).encode('utf-8')).hexdigest(), 16) % appConfig["numProcessingThreads"]) == threadnum)) or + ((doc['op'] in ['u']) and (doc['ns'] == appConfig["sourceNs"]) and ((int(hashlib.sha512(str(doc['o2']['_id']).encode('utf-8')).hexdigest(), 16) % appConfig["numProcessingThreads"]) == threadnum))): + # this is for my thread + + threadOplogEntries += 1 + + if (not printedFirstTs) and (doc['op'] in ['i','u','d']) and 
(doc['ns'] == appConfig["sourceNs"]): + if appConfig['verboseLogging']: + logIt(threadnum,'first timestamp = {} aka {}'.format(doc['ts'],doc['ts'].as_datetime())) + printedFirstTs = True + + if (doc['op'] == 'i'): + # insert + if (doc['ns'] == appConfig["sourceNs"]): + myCollectionOps += 1 + bulkOpList.append(pymongo.InsertOne(doc['o'])) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + bulkOpListReplace.append(pymongo.ReplaceOne({'_id':doc['o']['_id']},doc['o'],upsert=True)) + numCurrentBulkOps += 1 + else: + pass + + elif (doc['op'] == 'u'): + # update + if (doc['ns'] == appConfig["sourceNs"]): + myCollectionOps += 1 + # field "$v" is not present in MongoDB 3.4 + doc['o'].pop('$v',None) + bulkOpList.append(pymongo.UpdateOne(doc['o2'],doc['o'],upsert=False)) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + bulkOpListReplace.append(pymongo.UpdateOne(doc['o2'],doc['o'],upsert=False)) + numCurrentBulkOps += 1 + else: + pass + + elif (doc['op'] == 'd'): + # delete + if (doc['ns'] == appConfig["sourceNs"]): + myCollectionOps += 1 + bulkOpList.append(pymongo.DeleteOne(doc['o'])) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + bulkOpListReplace.append(pymongo.DeleteOne(doc['o'])) + numCurrentBulkOps += 1 + else: + pass + + elif (doc['op'] == 'c'): + # command + pass + + elif (doc['op'] == 'n'): + # no-op + pass + + else: + print(doc) + sys.exit(1) + + if ((numCurrentBulkOps >= appConfig["maxOperationsPerBatch"]) or (time.time() >= (lastBatch + appConfig["maxSecondsBetweenBatches"]))) and (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + 
perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + + if ((numCurrentBulkOps >= appConfig["maxOperationsPerBatch"]) or (time.time() >= (lastBatch + appConfig["maxSecondsBetweenBatches"]))) and (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + + # nothing arrived in the oplog for 1 second, pause before trying again + time.sleep(1) + + if (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + + c.close() + destConnection.close() + + perfQ.put({"name":"processCompleted","processNum":threadnum}) + + +def change_stream_processor(threadnum, appConfig, perfQ): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + if appConfig['verboseLogging']: + logIt(threadnum,'thread started') + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrcdc') + sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] + sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] + + destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname='migrcdc') + 
destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] + destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] + + startTime = time.time() + lastFeedback = time.time() + lastBatch = time.time() + + allDone = False + threadOplogEntries = 0 + perfReportInterval = 1 + nextPerfReportTime = time.time() + perfReportInterval + + bulkOpList = [] + + # list with replace, not insert, in case document already exists (replaying old oplog) + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numReportBulkOps = 0 + + numTotalBatches = 0 + + printedFirstTs = False + myCollectionOps = 0 + + # starting timestamp + endTs = appConfig["startTs"] + + if (appConfig["startTs"] == "RESUME_TOKEN"): + stream = sourceColl.watch(resume_after={'_data': appConfig["startPosition"]}, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0}}]) + else: + stream = sourceColl.watch(start_at_operation_time=endTs, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0}}]) + + if appConfig['verboseLogging']: + if (appConfig["startTs"] == "RESUME_TOKEN"): + logIt(threadnum,"Creating change stream cursor for resume token {}".format(appConfig["startPosition"])) + else: + logIt(threadnum,"Creating change stream cursor for timestamp {}".format(endTs.as_datetime())) + + while not allDone: + for change in stream: + # check if time to exit + if ((time.time() - startTime) > appConfig['durationSeconds']) and (appConfig['durationSeconds'] != 0): + allDone = True + break + + endTs = change['clusterTime'] + resumeToken = change['_id']['_data'] + thisNs = change['ns']['db']+'.'+change['ns']['coll'] + thisOp = change['operationType'] + + # NOTE: Python's non-deterministic hash() cannot be used as it is seeded at startup, since this code is multiprocessing we need all hash calls to be the same 
between processes + # hash(str(doc['o']['_id'])) + #if ((thisOp in ['insert','update','replace','delete']) and + # (thisNs == appConfig["sourceNs"]) and + if ((int(hashlib.sha512(str(change['documentKey']).encode('utf-8')).hexdigest(), 16) % appConfig["numProcessingThreads"]) == threadnum): + # this is for my thread + + threadOplogEntries += 1 + + if (not printedFirstTs) and (thisOp in ['insert','update','replace','delete']) and (thisNs == appConfig["sourceNs"]): + if appConfig['verboseLogging']: + logIt(threadnum,'first timestamp = {} aka {}'.format(change['clusterTime'],change['clusterTime'].as_datetime())) + printedFirstTs = True + + if (thisOp == 'insert'): + # insert + if (thisNs == appConfig["sourceNs"]): + myCollectionOps += 1 + bulkOpList.append(pymongo.InsertOne(change['fullDocument'])) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + #bulkOpListReplace.append(pymongo.ReplaceOne({'_id':change['documentKey']},change['fullDocument'],upsert=True)) + bulkOpListReplace.append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp in ['update','replace']): + # update/replace + if (change['fullDocument'] is not None): + if (thisNs == appConfig["sourceNs"]): + myCollectionOps += 1 + #bulkOpList.append(pymongo.ReplaceOne({'_id':change['documentKey']},change['fullDocument'],upsert=True)) + bulkOpList.append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + #bulkOpListReplace.append(pymongo.ReplaceOne({'_id':change['documentKey']},change['fullDocument'],upsert=True)) + bulkOpListReplace.append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp == 'delete'): + # delete + if (thisNs == appConfig["sourceNs"]): 
+ myCollectionOps += 1 + bulkOpList.append(pymongo.DeleteOne({'_id':change['documentKey']['_id']})) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + bulkOpListReplace.append(pymongo.DeleteOne({'_id':change['documentKey']['_id']})) + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp in ['drop','rename','dropDatabase','invalidate']): + # operations we do not track + pass + + else: + print(change) + sys.exit(1) + + if time.time() > nextPerfReportTime: + nextPerfReportTime = time.time() + perfReportInterval + perfQ.put({"name":"batchCompleted","operations":numReportBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":resumeToken}) + numReportBulkOps = 0 + + if ((numCurrentBulkOps >= appConfig["maxOperationsPerBatch"]) or (time.time() >= (lastBatch + appConfig["maxSecondsBetweenBatches"]))) and (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + + bulkOpList = [] + bulkOpListReplace = [] + numReportBulkOps += numCurrentBulkOps + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + + # nothing arrived in the oplog for 1 second, pause before trying again + #time.sleep(1) + + if (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":resumeToken}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + + sourceConnection.close() + destConnection.close() + + perfQ.put({"name":"processCompleted","processNum":threadnum}) + + +def readahead_worker(threadnum, appConfig, 
perfQ): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + sourceNs = appConfig['sourceNs'] + tempFileName = "{}.tempfile".format(sourceNs) + readaheadMaximumAhead = appConfig['readaheadMaximumAhead'] + + if appConfig['verboseLogging']: + logIt(threadnum,'READAHEAD | process started') + + numReadaheadWorkers = appConfig['numReadaheadWorkers'] + readaheadChunkSeconds = appConfig['readaheadChunkSeconds'] + readaheadJumpSeconds = numReadaheadWorkers * readaheadChunkSeconds + readaheadTimeDelta = timedelta(seconds=readaheadJumpSeconds) + + usableThreadNum = threadnum - appConfig['numProcessingThreads'] + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrcdc') + sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] + sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] + + startTime = time.time() + lastFeedback = time.time() + lastBatch = time.time() + + allDone = False + threadOplogEntries = 0 + perfReportInterval = 1 + nextPerfReportTime = time.time() + perfReportInterval + + numCurrentBulkOps = 0 + numReportBulkOps = 0 + + numTotalBatches = 0 + + printedFirstTs = False + myCollectionOps = 0 + + # starting timestamp + endTs = appConfig["startTs"] + + while not allDone: + endTs = Timestamp(endTs.time + (usableThreadNum * readaheadChunkSeconds), 0) + chunkStopTs = Timestamp(endTs.time + readaheadChunkSeconds, 4294967295) + #logIt(threadnum,"READAHEAD | starting at {}".format(endTs)) + + if (appConfig["startTs"] == "RESUME_TOKEN"): + stream = sourceColl.watch(resume_after={'_data': appConfig["startPosition"]}, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0,'fullDocument':0}}]) + else: + stream = sourceColl.watch(start_at_operation_time=endTs, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': 
['insert','update','replace','delete']}}},{'$project':{'updateDescription':0,'fullDocument':0}}]) + + #if appConfig['verboseLogging']: + # if (appConfig["startTs"] == "RESUME_TOKEN"): + # logIt(threadnum,"READAHEAD | Creating change stream cursor for resume token {}".format(appConfig["startPosition"])) + # else: + # logIt(threadnum,"READAHEAD | Creating change stream cursor for timestamp {}".format(endTs.as_datetime())) + + try: + with open(tempFileName, 'r') as f: + content = f.read() + dtUtcNow = datetime.utcnow() + applierSecondsBehind = int(content) + secondsBehind = int((dtUtcNow - endTs.as_datetime().replace(tzinfo=None)).total_seconds()) + secondsAhead = applierSecondsBehind - secondsBehind + #logIt(threadnum,"READAHEAD | ahead of applier by {} seconds".format(secondsAhead)) + if (secondsAhead > readaheadMaximumAhead): + sleepSeconds = secondsAhead - readaheadMaximumAhead + logIt(threadnum,"READAHEAD | ahead of applier by {} seconds, sleeping for {} seconds".format(secondsAhead,sleepSeconds)) + time.sleep(sleepSeconds) + except FileNotFoundError: + #logIt(threadnum,"READAHEAD | temp file {} not found".format(tempFileName)) + pass + except IOError as e: + #logIt(threadnum,"READAHEAD | reading temp file {} exception".format(e)) + pass + + for change in stream: + # check if time to exit + if ((time.time() - startTime) > appConfig['durationSeconds']) and (appConfig['durationSeconds'] != 0): + allDone = True + break + + endTs = change['clusterTime'] + resumeToken = change['_id']['_data'] + thisNs = change['ns']['db']+'.'+change['ns']['coll'] + thisOp = change['operationType'] + + # check if done with chunk + if (endTs > chunkStopTs): + #logIt(threadnum,"READAHEAD | Done with chunk") + stream.close() + break + + threadOplogEntries += 1 + + if (not printedFirstTs) and (thisOp in ['insert','update','replace','delete']) and (thisNs == appConfig["sourceNs"]): + if appConfig['verboseLogging']: + #logIt(threadnum,'READAHEAD | first timestamp = {} aka 
{}'.format(change['clusterTime'],change['clusterTime'].as_datetime())) + pass + printedFirstTs = True + + if (thisOp == 'insert'): + # insert + if (thisNs == appConfig["sourceNs"]): + myCollectionOps += 1 + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp in ['update','replace']): + # update/replace + if (thisNs == appConfig["sourceNs"]): + myCollectionOps += 1 + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp == 'delete'): + # delete + if (thisNs == appConfig["sourceNs"]): + myCollectionOps += 1 + numCurrentBulkOps += 1 + else: + pass + + elif (thisOp in ['drop','rename','dropDatabase','invalidate']): + # operations we do not track + pass + + else: + print(change) + sys.exit(1) + + if time.time() > nextPerfReportTime: + nextPerfReportTime = time.time() + perfReportInterval + perfQ.put({"name":"readaheadBatchCompleted","operations":numReportBulkOps,"endts":endTs,"processNum":threadnum}) + numReportBulkOps = 0 + + numReportBulkOps += numCurrentBulkOps + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + + # nothing arrived in the oplog for 1 second, pause before trying again + #time.sleep(1) + + #perfQ.put({"name":"readaheadBatchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum}) + + sourceConnection.close() + + perfQ.put({"name":"readaheadProcessCompleted","processNum":threadnum}) + + +def get_resume_token(appConfig): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + logIt(-1,'getting current change stream resume token') + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrcdc') + sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] + sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] + + allDone = False + + stream = sourceColl.watch() + + while not allDone: + for change in stream: + resumeToken = change['_id']['_data'] + logIt(-1,'Change stream resume token is {}'.format(resumeToken)) + allDone = True + 
break + + +def reporter(appConfig, perfQ): + createCloudwatchMetrics = appConfig['createCloudwatchMetrics'] + clusterName = appConfig['clusterName'] + sourceNs = appConfig['sourceNs'] + tempFileName = "{}.tempfile".format(sourceNs) + + if appConfig['verboseLogging']: + logIt(-1,'reporting thread started') + + if createCloudwatchMetrics: + # only instantiate client if needed + cloudWatchClient = boto3.client('cloudwatch') + + startTime = time.time() + lastTime = time.time() + + # number of seconds between posting metrics to cloudwatch + cloudwatchPutSeconds = 30 + lastCloudwatchPutTime = time.time() + + lastProcessedOplogEntries = 0 + nextReportTime = startTime + appConfig["feedbackSeconds"] + + resumeToken = 'N/A' + + numWorkersCompleted = 0 + numProcessedOplogEntries = 0 + numReadaheadProcessedOplogEntries = 0 + + dtDict = {} + dtReadaheadDict = {} + + while (numWorkersCompleted < appConfig["numProcessingThreads"]): + time.sleep(appConfig["feedbackSeconds"]) + nowTime = time.time() + + numBatchEntries = 0 + numReadaheadBatchEntries = 0 + while not perfQ.empty(): + qMessage = perfQ.get_nowait() + if qMessage['name'] == "batchCompleted": + numBatchEntries += 1 + numProcessedOplogEntries += qMessage['operations'] + thisEndDt = qMessage['endts'].as_datetime().replace(tzinfo=None) + thisProcessNum = qMessage['processNum'] + if (thisProcessNum in dtDict) and (thisEndDt > dtDict[thisProcessNum]): + dtDict[thisProcessNum] = thisEndDt + else: + dtDict[thisProcessNum] = thisEndDt + #print("received endTs = {}".format(thisEndTs.as_datetime())) + if 'resumeToken' in qMessage: + resumeToken = qMessage['resumeToken'] + else: + resumeToken = 'N/A' + + elif qMessage['name'] == "readaheadBatchCompleted": + numReadaheadBatchEntries += 1 + numReadaheadProcessedOplogEntries += qMessage['operations'] + thisEndDt = qMessage['endts'].as_datetime().replace(tzinfo=None) + thisProcessNum = qMessage['processNum'] + if (thisProcessNum in dtReadaheadDict) and (thisEndDt > 
dtReadaheadDict[thisProcessNum]): + dtReadaheadDict[thisProcessNum] = thisEndDt + else: + dtReadaheadDict[thisProcessNum] = thisEndDt + #logIt(thisProcessNum,"received endTs = {}".format(thisEndDt)) + + elif qMessage['name'] == "processCompleted": + numWorkersCompleted += 1 + + # total total + elapsedSeconds = nowTime - startTime + totalOpsPerSecond = int(numProcessedOplogEntries / elapsedSeconds) + + # elapsed hours, minutes, seconds + thisHours, rem = divmod(elapsedSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + thisHMS = "{:0>2}:{:0>2}:{:05.2f}".format(int(thisHours),int(thisMinutes),thisSeconds) + + # this interval + intervalElapsedSeconds = nowTime - lastTime + intervalOpsPerSecond = int((numProcessedOplogEntries - lastProcessedOplogEntries) / intervalElapsedSeconds) + + # how far behind current time + if numBatchEntries == 0: + # no work this interval, we are fully caught up + avgSecondsBehind = 0 + else: + dtUtcNow = datetime.utcnow() + totSecondsBehind = 0 + numSecondsBehindEntries = 0 + for thisDt in dtDict: + totSecondsBehind += (dtUtcNow - dtDict[thisDt].replace(tzinfo=None)).total_seconds() + numSecondsBehindEntries += 1 + + avgSecondsBehind = int(totSecondsBehind / max(numSecondsBehindEntries,1)) + + # write seconds behind to file + with open(tempFileName, 'w') as f: + f.write("{}".format(avgSecondsBehind)) + + if appConfig['verboseLogging']: + # how far behind are the readahead workers + for thisDt in dtReadaheadDict: + secondsBehind = int((dtUtcNow - dtReadaheadDict[thisDt].replace(tzinfo=None)).total_seconds()) + #logIt(-1,"READAHEAD | worker {} is {:9,d} seconds behind current and {:9d} seconds ahead of appliers".format(thisDt,secondsBehind,avgSecondsBehind-secondsBehind)) + + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + print("[{0}] elapsed {1} | total o/s {2:9,d} | interval o/s {3:9,d} | tot {4:16,d} | {5:12,d} secs behind | resume token = 
{6}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,avgSecondsBehind,resumeToken)) + nextReportTime = nowTime + appConfig["feedbackSeconds"] + + lastTime = nowTime + lastProcessedOplogEntries = numProcessedOplogEntries + + # output CW metrics every cloudwatchPutSeconds seconds + if createCloudwatchMetrics and ((time.time() - lastCloudwatchPutTime) > cloudwatchPutSeconds): + # log to cloudwatch + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[{'MetricName':'MigratorCDCOperationsPerSecond','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':intervalOpsPerSecond,'StorageResolution':60}, + {'MetricName':'MigratorCDCNumSecondsBehind','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':avgSecondsBehind,'StorageResolution':60}]) + + lastCloudwatchPutTime = time.time() + + +def main(): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + parser = argparse.ArgumentParser(description='CDC replication tool.') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--source-uri', + required=True, + type=str, + help='Source URI') + + parser.add_argument('--target-uri', + required=True, + type=str, + help='Target URI') + + parser.add_argument('--source-namespace', + required=True, + type=str, + help='Source Namespace as .') + + parser.add_argument('--target-namespace', + required=False, + type=str, + help='Target Namespace as ., defaults to --source-namespace') + + parser.add_argument('--duration-seconds', + required=False, + type=int, + default=0, + help='Number of seconds to run before exiting, 0 = run forever') + + parser.add_argument('--feedback-seconds', + required=False, + type=int, + default=60, + help='Number of seconds between feedback output') + + parser.add_argument('--threads', + required=False, + type=int, + default=1, + 
help='Number of threads (parallel processing)') + + parser.add_argument('--max-seconds-between-batches', + required=False, + type=int, + default=5, + help='Maximum number of seconds to await full batch') + + parser.add_argument('--max-operations-per-batch', + required=False, + type=int, + default=100, + help='Maximum number of operations to include in a single batch') + + parser.add_argument('--dry-run', + required=False, + action='store_true', + help='Read source changes only, do not apply to target') + + parser.add_argument('--start-position', + required=True, + type=str, + help='Starting position - 0 for all available changes, YYYY-MM-DD+HH:MM:SS in UTC, or change stream resume token') + + parser.add_argument('--verbose', + required=False, + action='store_true', + help='Enable verbose logging') + + parser.add_argument('--use-oplog', + required=False, + action='store_true', + help='Use the oplog as change data capture source') + + parser.add_argument('--use-change-stream', + required=False, + action='store_true', + help='Use change streams as change data capture source') + + parser.add_argument('--get-resume-token', + required=False, + action='store_true', + help='Display the current change stream resume token') + + parser.add_argument('--create-cloudwatch-metrics',required=False,action='store_true',help='Create CloudWatch metrics during execution') + parser.add_argument('--cluster-name',required=False,type=str,help='Name of cluster for CloudWatch metrics') + parser.add_argument('--readahead-workers',required=False,type=int,default=0,help='Number of additional workers to warm the cache') + parser.add_argument('--readahead-chunk-seconds',required=False,type=int,default=5,help='Number of seconds each worker processes before leaping ahead') + parser.add_argument('--readahead-maximum-ahead',required=False,type=int,default=60,help='Maximum number of seconds readahead workers are allowed to run ahead') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if
(not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + if (not args.use_oplog) and (not args.use_change_stream): + message = "Must supply either --use-oplog or --use-change-stream" + parser.error(message) + + if (args.use_oplog) and (args.use_change_stream): + message = "Cannot supply both --use-oplog and --use-change-stream" + parser.error(message) + + if (args.use_change_stream) and (args.start_position == "0"): + message = "--start-position must be supplied as YYYY-MM-DD+HH:MM:SS in UTC or resume token when executing in --use-change-stream mode" + parser.error(message) + + if args.create_cloudwatch_metrics and (args.cluster_name is None): + sys.exit("\nMust supply --cluster-name when capturing CloudWatch metrics.\n") + + appConfig = {} + appConfig['sourceUri'] = args.source_uri + appConfig['targetUri'] = args.target_uri + appConfig['numProcessingThreads'] = args.threads + appConfig['maxSecondsBetweenBatches'] = args.max_seconds_between_batches + appConfig['maxOperationsPerBatch'] = args.max_operations_per_batch + appConfig['durationSeconds'] = args.duration_seconds + appConfig['feedbackSeconds'] = args.feedback_seconds + appConfig['dryRun'] = args.dry_run + appConfig['sourceNs'] = args.source_namespace + if not args.target_namespace: + appConfig['targetNs'] = args.source_namespace + else: + appConfig['targetNs'] = args.target_namespace + appConfig['startPosition'] = args.start_position + appConfig['verboseLogging'] = args.verbose + appConfig['createCloudwatchMetrics'] = args.create_cloudwatch_metrics + appConfig['clusterName'] = args.cluster_name + appConfig['numReadaheadWorkers'] = args.readahead_workers + appConfig['readaheadChunkSeconds'] = args.readahead_chunk_seconds + appConfig['readaheadMaximumAhead'] = args.readahead_maximum_ahead + + sourceNs = appConfig['sourceNs'] + tempFileName = "{}.tempfile".format(sourceNs) + try: + os.remove(tempFileName) + except
FileNotFoundError: + pass + except PermissionError: + pass + + if args.get_resume_token: + get_resume_token(appConfig) + sys.exit(0) + + if args.use_oplog: + appConfig['cdcSource'] = 'oplog' + else: + appConfig['cdcSource'] = 'changeStream' + + logIt(-1,"processing {} using {} threads".format(appConfig['cdcSource'],appConfig['numProcessingThreads'])) + + if len(appConfig["startPosition"]) == 36: + # resume token + appConfig["startTs"] = "RESUME_TOKEN" + + logIt(-1,"starting with resume token = {}".format(appConfig["startPosition"])) + + else: + if appConfig["startPosition"] == "0": + # start with first oplog entry + c = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrcdc') + oplog = c.local.oplog.rs + first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(1).next() + appConfig["startTs"] = first['ts'] + c.close() + elif appConfig["startPosition"].upper() == "NOW": + # start with current time + appConfig["startTs"] = Timestamp(datetime.utcnow(), 1) + else: + # start at an arbitrary position + appConfig["startTs"] = Timestamp(datetime.fromisoformat(args.start_position), 1) + + logIt(-1,"starting with timestamp = {}".format(appConfig["startTs"].as_datetime())) + + mp.set_start_method('spawn') + q = mp.Manager().Queue() + + t = threading.Thread(target=reporter,args=(appConfig,q)) + t.start() + + processList = [] + for loop in range(appConfig["numProcessingThreads"]): + if (appConfig['cdcSource'] == 'oplog'): + p = mp.Process(target=oplog_processor,args=(loop,appConfig,q)) + else: + p = mp.Process(target=change_stream_processor,args=(loop,appConfig,q)) + processList.append(p) + + # add readahead workers + if appConfig['cdcSource'] == 'changeStream' and appConfig['numReadaheadWorkers'] > 0: + for loop in range(appConfig["numReadaheadWorkers"]): + p = mp.Process(target=readahead_worker,args=(loop+appConfig['numProcessingThreads'],appConfig,q)) + processList.append(p) + + for process in processList: + process.start() + + for process in processList: + 
process.join() + + t.join() + + +if __name__ == "__main__": + main() diff --git a/migration/migrator/cdc-multiprocess.py b/migration/migrator/cdc-multiprocess.py index 74929fe..6235650 100644 --- a/migration/migrator/cdc-multiprocess.py +++ b/migration/migrator/cdc-multiprocess.py @@ -8,6 +8,8 @@ import multiprocessing as mp import hashlib import argparse +import boto3 +import warnings def logIt(threadnum, message): @@ -16,13 +18,19 @@ def logIt(threadnum, message): def oplog_processor(threadnum, appConfig, perfQ): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + if appConfig['verboseLogging']: logIt(threadnum,'thread started') - c = pymongo.MongoClient(appConfig["sourceUri"]) + myAppname = None + if (threadnum == 0): + myAppname = 'migrcdc' + + c = pymongo.MongoClient(host=appConfig["sourceUri"],appname=myAppname) oplog = c.local.oplog.rs - destConnection = pymongo.MongoClient(appConfig["targetUri"]) + destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname=myAppname) destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] @@ -176,19 +184,26 @@ def oplog_processor(threadnum, appConfig, perfQ): numTotalBatches += 1 c.close() + destConnection.close() perfQ.put({"name":"processCompleted","processNum":threadnum}) def change_stream_processor(threadnum, appConfig, perfQ): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + if appConfig['verboseLogging']: logIt(threadnum,'thread started') - sourceConnection = pymongo.MongoClient(appConfig["sourceUri"]) + myAppname = None + if (threadnum == 0): + myAppname = 'migrcdc' + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname=myAppname) sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] - destConnection = pymongo.MongoClient(appConfig["targetUri"]) 
+ destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname=myAppname) destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] @@ -198,12 +213,15 @@ def change_stream_processor(threadnum, appConfig, perfQ): allDone = False threadOplogEntries = 0 + perfReportInterval = 1 + nextPerfReportTime = time.time() + perfReportInterval bulkOpList = [] # list with replace, not insert, in case document already exists (replaying old oplog) bulkOpListReplace = [] numCurrentBulkOps = 0 + numReportBulkOps = 0 numTotalBatches = 0 @@ -295,6 +313,11 @@ def change_stream_processor(threadnum, appConfig, perfQ): print(change) sys.exit(1) + if time.time() > nextPerfReportTime: + nextPerfReportTime = time.time() + perfReportInterval + perfQ.put({"name":"batchCompleted","operations":numReportBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":resumeToken}) + numReportBulkOps = 0 + if ((numCurrentBulkOps >= appConfig["maxOperationsPerBatch"]) or (time.time() >= (lastBatch + appConfig["maxSecondsBetweenBatches"]))) and (numCurrentBulkOps > 0): if not appConfig['dryRun']: try: @@ -302,9 +325,10 @@ def change_stream_processor(threadnum, appConfig, perfQ): except: # replace inserts as replaces result = destCollection.bulk_write(bulkOpListReplace,ordered=True) - perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":resumeToken}) + bulkOpList = [] bulkOpListReplace = [] + numReportBulkOps += numCurrentBulkOps numCurrentBulkOps = 0 numTotalBatches += 1 lastBatch = time.time() @@ -325,15 +349,18 @@ def change_stream_processor(threadnum, appConfig, perfQ): numCurrentBulkOps = 0 numTotalBatches += 1 - c.close() + sourceConnection.close() + destConnection.close() perfQ.put({"name":"processCompleted","processNum":threadnum}) def get_resume_token(appConfig): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB 
cluster.") + logIt(-1,'getting current change stream resume token') - sourceConnection = pymongo.MongoClient(appConfig["sourceUri"]) + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"]) sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] @@ -350,12 +377,23 @@ def get_resume_token(appConfig): def reporter(appConfig, perfQ): + createCloudwatchMetrics = appConfig['createCloudwatchMetrics'] + clusterName = appConfig['clusterName'] + if appConfig['verboseLogging']: logIt(-1,'reporting thread started') - + + if createCloudwatchMetrics: + # only instantiate client if needed + cloudWatchClient = boto3.client('cloudwatch') + startTime = time.time() lastTime = time.time() + # number of seconds between posting metrics to cloudwatch + cloudwatchPutSeconds = 30 + lastCloudwatchPutTime = time.time() + lastProcessedOplogEntries = 0 nextReportTime = startTime + appConfig["feedbackSeconds"] @@ -370,9 +408,11 @@ def reporter(appConfig, perfQ): time.sleep(appConfig["feedbackSeconds"]) nowTime = time.time() + numBatchEntries = 0 while not perfQ.empty(): qMessage = perfQ.get_nowait() if qMessage['name'] == "batchCompleted": + numBatchEntries += 1 numProcessedOplogEntries += qMessage['operations'] thisEndDt = qMessage['endts'].as_datetime().replace(tzinfo=None) thisProcessNum = qMessage['processNum'] @@ -391,7 +431,7 @@ def reporter(appConfig, perfQ): # total total elapsedSeconds = nowTime - startTime - totalOpsPerSecond = numProcessedOplogEntries / elapsedSeconds + totalOpsPerSecond = int(numProcessedOplogEntries / elapsedSeconds) # elapsed hours, minutes, seconds thisHours, rem = divmod(elapsedSeconds, 3600) @@ -400,27 +440,43 @@ def reporter(appConfig, perfQ): # this interval intervalElapsedSeconds = nowTime - lastTime - intervalOpsPerSecond = (numProcessedOplogEntries - lastProcessedOplogEntries) / intervalElapsedSeconds + intervalOpsPerSecond = int((numProcessedOplogEntries - 
lastProcessedOplogEntries) / intervalElapsedSeconds) # how far behind current time - dtUtcNow = datetime.utcnow() - totSecondsBehind = 0 - numSecondsBehindEntries = 0 - for thisDt in dtDict: - totSecondsBehind = (dtUtcNow - dtDict[thisDt].replace(tzinfo=None)).total_seconds() - numSecondsBehindEntries += 1 + if numBatchEntries == 0: + # no work this interval, we are fully caught up + avgSecondsBehind = 0 + else: + dtUtcNow = datetime.utcnow() + totSecondsBehind = 0 + numSecondsBehindEntries = 0 + for thisDt in dtDict: + totSecondsBehind += (dtUtcNow - dtDict[thisDt].replace(tzinfo=None)).total_seconds() + numSecondsBehindEntries += 1 - avgSecondsBehind = int(totSecondsBehind / max(numSecondsBehindEntries,1)) + avgSecondsBehind = int(totSecondsBehind / max(numSecondsBehindEntries,1)) logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' - print("[{0}] elapsed {1} | total o/s {2:12,.2f} | interval o/s {3:12,.2f} | tot {4:16,d} | {5:12,d} secs behind | resume token = {6}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,avgSecondsBehind,resumeToken)) + print("[{0}] elapsed {1} | total o/s {2:9,d} | interval o/s {3:9,d} | tot {4:16,d} | {5:12,d} secs behind | resume token = {6}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,avgSecondsBehind,resumeToken)) nextReportTime = nowTime + appConfig["feedbackSeconds"] lastTime = nowTime lastProcessedOplogEntries = numProcessedOplogEntries + # output CW metrics every cloudwatchPutSeconds seconds + if createCloudwatchMetrics and ((time.time() - lastCloudwatchPutTime) > cloudwatchPutSeconds): + # log to cloudwatch + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[{'MetricName':'MigratorCDCOperationsPerSecond','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':intervalOpsPerSecond,'StorageResolution':60}, + 
{'MetricName':'MigratorCDCNumSecondsBehind','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':avgSecondsBehind,'StorageResolution':60}]) + + lastCloudwatchPutTime = time.time() + def main(): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + parser = argparse.ArgumentParser(description='CDC replication tool.') parser.add_argument('--skip-python-version-check', @@ -508,6 +564,9 @@ def main(): action='store_true', help='Display the current change stream resume token') + parser.add_argument('--create-cloudwatch-metrics',required=False,action='store_true',help='Create CloudWatch metrics during execution') + parser.add_argument('--cluster-name',required=False,type=str,help='Name of cluster for CloudWatch metrics') + args = parser.parse_args() MIN_PYTHON = (3, 7) @@ -526,6 +585,9 @@ def main(): message = "--start-position must be supplied as YYYY-MM-DD+HH:MM:SS in UTC or resume token when executing in --use-change-stream mode" parser.error(message) + if args.create_cloudwatch_metrics and (args.cluster_name is None): + sys.exit("\nMust supply --cluster-name when capturing CloudWatch metrics.\n") + appConfig = {} appConfig['sourceUri'] = args.source_uri appConfig['targetUri'] = args.target_uri @@ -542,6 +604,8 @@ def main(): appConfig['targetNs'] = args.target_namespace appConfig['startPosition'] = args.start_position appConfig['verboseLogging'] = args.verbose + appConfig['createCloudwatchMetrics'] = args.create_cloudwatch_metrics + appConfig['clusterName'] = args.cluster_name if args.get_resume_token: get_resume_token(appConfig) @@ -563,7 +627,7 @@ def main(): else: if appConfig["startPosition"] == "0": # start with first oplog entry - c = pymongo.MongoClient(appConfig["sourceUri"]) + c = pymongo.MongoClient(host=appConfig["sourceUri"]) oplog = c.local.oplog.rs first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(1).next() appConfig["startTs"] = first['ts'] diff --git
a/migration/migrator/fl-multiprocess-filtered.py b/migration/migrator/fl-multiprocess-filtered.py new file mode 100644 index 0000000..d6b66ea --- /dev/null +++ b/migration/migrator/fl-multiprocess-filtered.py @@ -0,0 +1,273 @@ +from datetime import datetime, timedelta +import os +import sys +import time +import pymongo +from bson.timestamp import Timestamp +from bson.objectid import ObjectId +import threading +import multiprocessing as mp +import hashlib +import argparse + + +def logIt(threadnum, message): + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + print("[{}] thread {:>3d} | {}".format(logTimeStamp,threadnum,message)) + + +def full_load_loader(threadnum, appConfig, perfQ): + if appConfig['verboseLogging']: + logIt(threadnum,'thread started') + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='migrfull') + sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] + sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] + + destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname='migrfull') + destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] + destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] + + startTime = time.time() + lastFeedback = time.time() + + bulkOpList = [] + + # list with replace, not insert, in case document already exists (replaying old oplog) + bulkOpListReplace = [] + numCurrentBulkOps = 0 + + numTotalBatches = 0 + + myCollectionOps = 0 + + if appConfig['verboseLogging']: + logIt(threadnum,"Creating cursor") + + ttlDateTime=datetime(2024,6,30,0,0,0,0) + + if (threadnum == 0): + # thread 0 = $lte only + #cursor = sourceColl.find({'_id': {'$lte': appConfig['boundaries'][threadnum]}}) + #cursor = sourceColl.find({'_id': {'$lte': appConfig['boundaries'][threadnum]},"ttl":{"$gt":ttlDateTime}},hint=[('_id',pymongo.ASCENDING)]) + cursor = sourceColl.find({'_id': {'$lte': appConfig['boundaries'][threadnum]}},hint=[('_id',pymongo.ASCENDING)]) + 
elif (threadnum == appConfig['numProcessingThreads'] - 1): + # last processor = $gt only + #cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1]}}) + #cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1]},"ttl":{"$gt":ttlDateTime}},hint=[('_id',pymongo.ASCENDING)]) + cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1]}},hint=[('_id',pymongo.ASCENDING)]) + else: + # last processor = $gt prior, $lte next + #cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1], '$lte': appConfig['boundaries'][threadnum]}}) + #cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1], '$lte': appConfig['boundaries'][threadnum]},"ttl":{"$gt":ttlDateTime}},hint=[('_id',pymongo.ASCENDING)]) + cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1], '$lte': appConfig['boundaries'][threadnum]}},hint=[('_id',pymongo.ASCENDING)]) + + perfQ.put({"name":"findCompleted","processNum":threadnum}) + + for doc in cursor: + if ('ttl' in doc) and (doc['ttl'] < ttlDateTime): + # skip old documents + continue + + myCollectionOps += 1 + bulkOpList.append(pymongo.InsertOne(doc)) + # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) + #bulkOpListReplace.append(pymongo.ReplaceOne(doc['_id'],doc,upsert=True)) + numCurrentBulkOps += 1 + + if (numCurrentBulkOps >= appConfig["maxInsertsPerBatch"]): + if not appConfig['dryRun']: + # try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + # except: + # # replace inserts as replaces + # result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + + if (numCurrentBulkOps > 0): + if not appConfig['dryRun']: + # try: + result = 
destCollection.bulk_write(bulkOpList,ordered=True) + # except: + # # replace inserts as replaces + # result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum}) + bulkOpList = [] + bulkOpListReplace = [] + numCurrentBulkOps = 0 + numTotalBatches += 1 + + perfQ.put({"name":"processCompleted","processNum":threadnum}) + + +def reporter(appConfig, perfQ): + if appConfig['verboseLogging']: + logIt(-1,'reporting thread started') + + startTime = time.time() + lastTime = time.time() + + lastProcessedOplogEntries = 0 + nextReportTime = startTime + appConfig["feedbackSeconds"] + + numWorkersCompleted = 0 + numWorkersLoading = 0 + numProcessedOplogEntries = 0 + + while (numWorkersCompleted < appConfig["numProcessingThreads"]): + time.sleep(appConfig["feedbackSeconds"]) + nowTime = time.time() + + while not perfQ.empty(): + qMessage = perfQ.get_nowait() + if qMessage['name'] == "batchCompleted": + numProcessedOplogEntries += qMessage['operations'] + elif qMessage['name'] == "processCompleted": + numWorkersCompleted += 1 + elif qMessage['name'] == "findCompleted": + numWorkersLoading += 1 + + # total total + elapsedSeconds = nowTime - startTime + totalOpsPerSecond = numProcessedOplogEntries / elapsedSeconds + + # elapsed hours, minutes, seconds + thisHours, rem = divmod(elapsedSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + thisHMS = "{:0>2}:{:0>2}:{:05.2f}".format(int(thisHours),int(thisMinutes),thisSeconds) + + # this interval + intervalElapsedSeconds = nowTime - lastTime + intervalOpsPerSecond = (numProcessedOplogEntries - lastProcessedOplogEntries) / intervalElapsedSeconds + + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + print("[{0}] elapsed {1} | total o/s {2:12,.2f} | interval o/s {3:12,.2f} | tot ops {4:16,d} | loading {5:5d}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,numWorkersLoading)) + 
nextReportTime = nowTime + appConfig["feedbackSeconds"] + + lastTime = nowTime + lastProcessedOplogEntries = numProcessedOplogEntries + + +def main(): + parser = argparse.ArgumentParser(description='Full Load migration tool.') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--source-uri', + required=True, + type=str, + help='Source URI') + + parser.add_argument('--target-uri', + required=True, + type=str, + help='Target URI') + + parser.add_argument('--source-namespace', + required=True, + type=str, + help='Source Namespace as <database>.<collection>') + + parser.add_argument('--target-namespace', + required=False, + type=str, + help='Target Namespace as <database>.<collection>, defaults to --source-namespace') + + parser.add_argument('--feedback-seconds', + required=False, + type=int, + default=60, + help='Number of seconds between feedback output') + + parser.add_argument('--max-inserts-per-batch', + required=False, + type=int, + default=100, + help='Maximum number of inserts to include in a single batch') + + parser.add_argument('--dry-run', + required=False, + action='store_true', + help='Read source changes only, do not apply to target') + + parser.add_argument('--verbose', + required=False, + action='store_true', + help='Enable verbose logging') + + parser.add_argument('--boundaries', + required=True, + type=str, + help='Boundaries for segmenting') + + parser.add_argument('--boundary-datatype', + required=False, + type=str, + default='objectid', + choices=['objectid','string','int'], + help='Datatype of the boundaries: objectid, string, or int') + + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['sourceUri'] = args.source_uri + appConfig['targetUri'] = args.target_uri + appConfig['maxInsertsPerBatch'] =
args.max_inserts_per_batch + appConfig['feedbackSeconds'] = args.feedback_seconds + appConfig['dryRun'] = args.dry_run + appConfig['sourceNs'] = args.source_namespace + if not args.target_namespace: + appConfig['targetNs'] = args.source_namespace + else: + appConfig['targetNs'] = args.target_namespace + appConfig['verboseLogging'] = args.verbose + appConfig['boundaryDatatype'] = args.boundary_datatype + + boundaryList = args.boundaries.split(',') + appConfig['boundaries'] = [] + for thisBoundary in boundaryList: + if appConfig['boundaryDatatype'] == 'objectid': + appConfig['boundaries'].append(ObjectId(thisBoundary)) + elif appConfig['boundaryDatatype'] == 'string': + appConfig['boundaries'].append(thisBoundary) + else: + appConfig['boundaries'].append(int(thisBoundary)) + + appConfig['numProcessingThreads'] = len(appConfig['boundaries'])+1 + + logIt(-1,"processing using {} threads".format(appConfig['numProcessingThreads'])) + + mp.set_start_method('spawn') + q = mp.Manager().Queue() + + t = threading.Thread(target=reporter,args=(appConfig,q)) + t.start() + + processList = [] + for loop in range(appConfig["numProcessingThreads"]): + p = mp.Process(target=full_load_loader,args=(loop,appConfig,q)) + processList.append(p) + + for process in processList: + process.start() + + for process in processList: + process.join() + + t.join() + + +if __name__ == "__main__": + main() diff --git a/migration/migrator/fl-multiprocess.py b/migration/migrator/fl-multiprocess.py index 4a6ba0c..25b13c2 100644 --- a/migration/migrator/fl-multiprocess.py +++ b/migration/migrator/fl-multiprocess.py @@ -9,22 +9,45 @@ import multiprocessing as mp import hashlib import argparse +import boto3 +import warnings +from bson import encode def logIt(threadnum, message): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' print("[{}] thread {:>3d} | {}".format(logTimeStamp,threadnum,message)) +def 
getCollectionCount(appConfig): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + sourceDb = appConfig["sourceNs"].split('.',1)[0] + sourceColl = appConfig["sourceNs"].split('.',1)[1] + client = pymongo.MongoClient(appConfig['sourceUri']) + db = client[sourceDb] + collStats = db.command("collStats", sourceColl) + client.close() + return max(collStats['count'],1) + + def full_load_loader(threadnum, appConfig, perfQ): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + if appConfig['verboseLogging']: logIt(threadnum,'thread started') - sourceConnection = pymongo.MongoClient(appConfig["sourceUri"]) + myAppname = None + if (threadnum == 0): + myAppname = 'migrfull' + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname=myAppname) sourceDb = sourceConnection[appConfig["sourceNs"].split('.',1)[0]] sourceColl = sourceDb[appConfig["sourceNs"].split('.',1)[1]] - destConnection = pymongo.MongoClient(appConfig["targetUri"]) + destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname=myAppname) destDatabase = destConnection[appConfig["targetNs"].split('.',1)[0]] destCollection = destDatabase[appConfig["targetNs"].split('.',1)[1]] @@ -36,6 +59,7 @@ def full_load_loader(threadnum, appConfig, perfQ): # list with replace, not insert, in case document already exists (replaying old oplog) bulkOpListReplace = [] numCurrentBulkOps = 0 + numCurrentBytes = 0 numTotalBatches = 0 @@ -46,16 +70,19 @@ def full_load_loader(threadnum, appConfig, perfQ): if (threadnum == 0): # thread 0 = $lte only - cursor = sourceColl.find({'_id': {'$lte': ObjectId(appConfig['boundaries'][threadnum])}}) + cursor = sourceColl.find({'_id': {'$lte': appConfig['boundaries'][threadnum]}}) elif (threadnum == appConfig['numProcessingThreads'] - 1): # last processor = $gt only - cursor = sourceColl.find({'_id': {'$gt': ObjectId(appConfig['boundaries'][threadnum-1])}}) + cursor = 
sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1]}}) else: # last processor = $gt prior, $lte next - cursor = sourceColl.find({'_id': {'$gt': ObjectId(appConfig['boundaries'][threadnum-1]), '$lte': ObjectId(appConfig['boundaries'][threadnum])}}) + cursor = sourceColl.find({'_id': {'$gt': appConfig['boundaries'][threadnum-1], '$lte': appConfig['boundaries'][threadnum]}}) + + perfQ.put({"name":"findCompleted","processNum":threadnum}) for doc in cursor: myCollectionOps += 1 + numCurrentBytes += len(encode(doc)) bulkOpList.append(pymongo.InsertOne(doc)) # if playing old oplog, need to change inserts to be replaces (the inserts will fail due to _id uniqueness) #bulkOpListReplace.append(pymongo.ReplaceOne(doc['_id'],doc,upsert=True)) @@ -68,10 +95,11 @@ def full_load_loader(threadnum, appConfig, perfQ): # except: # # replace inserts as replaces # result = destCollection.bulk_write(bulkOpListReplace,ordered=True) - perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum}) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum,"bytes":numCurrentBytes}) bulkOpList = [] bulkOpListReplace = [] numCurrentBulkOps = 0 + numCurrentBytes = 0 numTotalBatches += 1 if (numCurrentBulkOps > 0): @@ -81,7 +109,7 @@ def full_load_loader(threadnum, appConfig, perfQ): # except: # # replace inserts as replaces # result = destCollection.bulk_write(bulkOpListReplace,ordered=True) - perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum}) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"processNum":threadnum,"bytes":numCurrentBytes}) bulkOpList = [] bulkOpListReplace = [] numCurrentBulkOps = 0 @@ -91,33 +119,67 @@ def full_load_loader(threadnum, appConfig, perfQ): def reporter(appConfig, perfQ): + createCloudwatchMetrics = appConfig['createCloudwatchMetrics'] + numDocumentsToMigrate = appConfig['numDocumentsToMigrate'] + clusterName = 
appConfig['clusterName'] + if appConfig['verboseLogging']: logIt(-1,'reporting thread started') + + if createCloudwatchMetrics: + # only instantiate client if needed + cloudWatchClient = boto3.client('cloudwatch') startTime = time.time() lastTime = time.time() + + # number of seconds between posting metrics to cloudwatch + cloudwatchPutSeconds = 30 + lastCloudwatchPutTime = time.time() lastProcessedOplogEntries = 0 nextReportTime = startTime + appConfig["feedbackSeconds"] numWorkersCompleted = 0 + numWorkersLoading = 0 numProcessedOplogEntries = 0 while (numWorkersCompleted < appConfig["numProcessingThreads"]): time.sleep(appConfig["feedbackSeconds"]) nowTime = time.time() + numThisBytes = 0 while not perfQ.empty(): qMessage = perfQ.get_nowait() if qMessage['name'] == "batchCompleted": numProcessedOplogEntries += qMessage['operations'] + numThisBytes += qMessage['bytes'] elif qMessage['name'] == "processCompleted": numWorkersCompleted += 1 + numWorkersLoading -= 1 + elif qMessage['name'] == "findCompleted": + numWorkersLoading += 1 # total total elapsedSeconds = nowTime - startTime totalOpsPerSecond = numProcessedOplogEntries / elapsedSeconds + # estimated time to done + if numProcessedOplogEntries > 0: + pctDone = max(numProcessedOplogEntries / numDocumentsToMigrate,0.001) + remainingSeconds = max(int(elapsedSeconds / pctDone) - elapsedSeconds,0) + else: + remainingSeconds = 0 + + thisHours, rem = divmod(remainingSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + remainHMS = "{:0>2}:{:0>2}:{:0>2}".format(int(thisHours),int(thisMinutes),int(thisSeconds)) + + if (numDocumentsToMigrate == 0): + pctDone = 100.0 + else: + pctDone = (numProcessedOplogEntries / numDocumentsToMigrate) * 100.0 + # elapsed hours, minutes, seconds thisHours, rem = divmod(elapsedSeconds, 3600) thisMinutes, thisSeconds = divmod(rem, 60) @@ -127,13 +189,25 @@ def reporter(appConfig, perfQ): intervalElapsedSeconds = nowTime - lastTime intervalOpsPerSecond = (numProcessedOplogEntries - 
lastProcessedOplogEntries) / intervalElapsedSeconds + numThisGbPerHour = numThisBytes / intervalElapsedSeconds * 60 * 60 / 1024 / 1024 / 1024 + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' - print("[{0}] elapsed {1} | total o/s {2:12,.2f} | interval o/s {3:12,.2f} | tot ops {4:16,d}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries)) + print("[{0}] elapsed {1} | total o/s {2:12,.2f} | interval o/s {3:12,.2f} | tot ops {4:16,d} | loading {5:5d} | pct {6:6.2f}% | done in {7} | GB/hr {8:6.2f}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,numWorkersLoading,pctDone,remainHMS,numThisGbPerHour)) nextReportTime = nowTime + appConfig["feedbackSeconds"] lastTime = nowTime lastProcessedOplogEntries = numProcessedOplogEntries + # output CW metrics every cloudwatchPutSeconds seconds + if createCloudwatchMetrics and ((time.time() - lastCloudwatchPutTime) > cloudwatchPutSeconds): + # log to cloudwatch + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[{'MetricName':'MigratorFLInsertsPerSecond','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':intervalOpsPerSecond,'StorageResolution':60}, + {'MetricName':'MigratorFLRemainingSeconds','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':remainingSeconds,'StorageResolution':60}]) + + lastCloudwatchPutTime = time.time() + def main(): parser = argparse.ArgumentParser(description='Full Load migration tool.') @@ -190,6 +264,15 @@ def main(): type=str, help='Boundaries for segmenting') + parser.add_argument('--boundary-datatype', + required=False, + type=str, + default='objectid', + choices=['objectid','string','int'], + help='Datatype of the boundaries: objectid, string, or int') + + parser.add_argument('--create-cloudwatch-metrics',required=False,action='store_true',help='Create CloudWatch metrics during execution') + parser.add_argument('--cluster-name',required=False,type=str,help='Name of cluster
for CloudWatch metrics') args = parser.parse_args() @@ -197,6 +280,9 @@ def main(): if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + if args.create_cloudwatch_metrics and (args.cluster_name is None): + sys.exit("\nMust supply --cluster-name when capturing CloudWatch metrics.\n") + appConfig = {} appConfig['sourceUri'] = args.source_uri appConfig['targetUri'] = args.target_uri @@ -209,8 +295,22 @@ def main(): else: appConfig['targetNs'] = args.target_namespace appConfig['verboseLogging'] = args.verbose - appConfig['boundaries'] = args.boundaries.split(',') + appConfig['boundaryDatatype'] = args.boundary_datatype + appConfig['createCloudwatchMetrics'] = args.create_cloudwatch_metrics + appConfig['clusterName'] = args.cluster_name + + boundaryList = args.boundaries.split(',') + appConfig['boundaries'] = [] + for thisBoundary in boundaryList: + if appConfig['boundaryDatatype'] == 'objectid': + appConfig['boundaries'].append(ObjectId(thisBoundary)) + elif appConfig['boundaryDatatype'] == 'string': + appConfig['boundaries'].append(thisBoundary) + else: + appConfig['boundaries'].append(int(thisBoundary)) + appConfig['numProcessingThreads'] = len(appConfig['boundaries'])+1 + appConfig['numDocumentsToMigrate'] = getCollectionCount(appConfig) logIt(-1,"processing using {} threads".format(appConfig['numProcessingThreads'])) diff --git a/migration/mongodb-changestream-review/README.md b/migration/mongodb-changestream-review/README.md new file mode 100644 index 0000000..f708ff9 --- /dev/null +++ b/migration/mongodb-changestream-review/README.md @@ -0,0 +1,26 @@ +# MongoDB Changestream Review Tool + +The mongodb changestream review tool connects to any instance in a MongoDB replicaset (primary or secondary), reads the changestream, and produces a log file containing counters for insert/update/delete operations by collection. 
+ +## Requirements + - Python 3.7+ + - If using Snappy wire protocol compression with MongoDB, "apt install python-snappy" + - PyMongo + - MongoDB 2.6 - 3.4 | pymongo 3.10 - 3.12 + - MongoDB 3.6 - 5.0 | pymongo 3.12 - 4.0 + - MongoDB 5.1+ | pymongo 4.0+ + - DocumentDB | pymongo 3.10 - 4.0 + +## Using the MongoDB Changestream Review Tool +`python3 mongo-changestream-review.py --server-alias <server-alias> --uri <uri> --stop-when-changestream-current --start-position 2025-04-02+12:00:00` + +- Run on any instance in the replicaset (the larger the changestream the better) +- Use a different \<server-alias\> for each execution +- If sharded, run on one instance in each shard +- If possible, avoid running the tool on the database server itself, as it consumes disk space for the output files +- Each execution creates a file starting with \<server-alias\> and ending with .log +- The \<uri\> options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/ +- Consider adding "&compressor=snappy" to your \<uri\> if your MongoDB server supports it + +## License +This tool is licensed under the Apache 2.0 License.
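The per-collection counting the tool performs on the change stream can be sketched as follows. This is a minimal illustration of the tallying logic only; `tally_changes` and the sample change documents are illustrative names, not part of the tool:

```python
from collections import defaultdict

def tally_changes(changes):
    # map change stream operationType values onto the tool's three counters;
    # 'replace' is counted as an update
    op_map = {'insert': 'ins', 'update': 'upd', 'replace': 'upd', 'delete': 'del'}
    counts = defaultdict(lambda: {'ins': 0, 'upd': 0, 'del': 0})
    for change in changes:
        # namespace is db.collection, as reported in the change document
        ns = change['ns']['db'] + '.' + change['ns']['coll']
        key = op_map.get(change['operationType'])
        if key is not None:
            counts[ns][key] += 1
    return dict(counts)

# hypothetical change documents, shaped like pymongo change stream output
sample = [
    {'operationType': 'insert', 'ns': {'db': 'app', 'coll': 'orders'}},
    {'operationType': 'update', 'ns': {'db': 'app', 'coll': 'orders'}},
    {'operationType': 'replace', 'ns': {'db': 'app', 'coll': 'orders'}},
    {'operationType': 'delete', 'ns': {'db': 'app', 'coll': 'users'}},
]
print(tally_changes(sample))
```

In the tool itself, the change documents come from `client.watch(...)` rather than a list, and the counters are written to the log file at the end of the run.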
diff --git a/migration/mongodb-changestream-review/mongodb-changestream-review-bigdocs.py b/migration/mongodb-changestream-review/mongodb-changestream-review-bigdocs.py new file mode 100644 index 0000000..88a7b79 --- /dev/null +++ b/migration/mongodb-changestream-review/mongodb-changestream-review-bigdocs.py @@ -0,0 +1,261 @@ +import argparse +import os +import sys +import time +import pymongo +from bson.timestamp import Timestamp +from datetime import datetime, timedelta, timezone +import warnings + + +def printLog(thisMessage,thisFile): + print(thisMessage) + thisFile.write("{}\n".format(thisMessage)) + + +def parseChangestream(appConfig): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + startTs = appConfig['startTs'] + + logTimeStamp = datetime.utcnow().strftime('%Y%m%d%H%M%S') + logFileName = "{}-{}-mongo-changestream-review.log".format(appConfig['serverAlias'],logTimeStamp) + fp = open(logFileName, 'w') + + printLog('connecting to MongoDB aliased as {}'.format(appConfig['serverAlias']),fp) + client = pymongo.MongoClient(host=appConfig['uri'],appname='mdbcstrv',unicode_decode_error_handler='ignore') + + secondsBehind = 999999 + + printLog("starting with timestamp = {}".format(startTs.as_datetime()),fp) + + numTotalChangestreamEntries = 0 + opDict = {} + + startTime = time.time() + lastFeedback = time.time() + allDone = False + + noChangesPauseSeconds = 5.0 + + ''' + i = insert + u = update + d = delete + c = command + db = database + n = no-op + ''' + + #with client.watch(start_at_operation_time=startTs, full_document=None, pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0,'fullDocument':0}}]) as stream: + #with client.watch(start_at_operation_time=startTs, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0,'fullDocument':0}}]) as stream: + 
with client.watch(start_at_operation_time=startTs, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}}]) as stream: + change2Resume = None + while stream.alive and not allDone: + try: + change = stream.try_next() + if change is not None: + change2Resume = change["_id"] + except Exception as watchException: + # a large document can fail the updateLookup; report it and resume past it without fetching full documents + print("exception raised reading change stream: {}".format(watchException)) + if change2Resume is not None: + stream = client.watch(resume_after=change2Resume, full_document=None, pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}}]) + continue + + resumeToken = stream.resume_token + #print("{} | {} | {} | {}".format(change["_id"],change["clusterTime"],change["ns"],change["documentKey"])) + + # check if time to stop - elapsed time + elapsedSeconds = (time.time() - startTime) + if (elapsedSeconds >= appConfig['collectSeconds']): + print("reached requested elapsed {} seconds, stopping".format(appConfig['collectSeconds'])) + allDone = True + break + + if change is None: + # no changes available - might be time to stop + if appConfig['stopWhenChangestreamCurrent']: + print("change stream is current, stopping") + allDone = True + break + #print(" no changes, pausing for {} second(s)".format(noChangesPauseSeconds)) + time.sleep(noChangesPauseSeconds) + continue + else: + #print("change doc is | {}".format(change)) + pass + + # check if time to stop - current enough + if (appConfig['stopWhenChangestreamCurrent'] and (secondsBehind < 60)): + print("change stream is current, stopping") + allDone = True + break + + currentTs = change['clusterTime'] + resumeToken = change['_id']['_data'] + thisNs = change['ns']['db']+'.'+change['ns']['coll'] + thisOpType = change['operationType'] + + numTotalChangestreamEntries += 1 + if ((numTotalChangestreamEntries % appConfig['numOperationsFeedback']) == 0) or ((lastFeedback + appConfig['numSecondsFeedback']) < time.time()): + lastFeedback = time.time()
+ elapsedSeconds = time.time() - startTime + secondsBehind = int((datetime.now(timezone.utc) - currentTs.as_datetime().replace(tzinfo=timezone.utc)).total_seconds()) + if (elapsedSeconds != 0): + printLog(" tot changestream entries read {:16,d} @ {:12,.0f} per second | {:12,d} seconds behind".format(numTotalChangestreamEntries,numTotalChangestreamEntries//elapsedSeconds,secondsBehind),fp) + else: + printLog(" tot changestream entries read {:16,d} @ {:12,.0f} per second | {:12,d} seconds behind".format(0,0.0,secondsBehind),fp) + + if (thisOpType == 'insert'): + # insert + if thisNs in opDict: + opDict[thisNs]['ins'] += 1 + else: + opDict[thisNs] = {'ins':1,'upd':0,'del':0} + + elif (thisOpType in ['update','replace']): + # update + if thisNs in opDict: + opDict[thisNs]['upd'] += 1 + else: + opDict[thisNs] = {'ins':0,'upd':1,'del':0} + + elif (thisOpType == 'delete'): + # delete + if thisNs in opDict: + opDict[thisNs]['del'] += 1 + else: + opDict[thisNs] = {'ins':0,'upd':0,'del':1} + + else: + printLog(change,fp) + sys.exit(1) + + # print overall ops, ips/ups/dps + + oplogSeconds = (currentTs.as_datetime()-startTs.as_datetime()).total_seconds() + oplogMinutes = oplogSeconds/60 + oplogHours = oplogMinutes/60 + oplogDays = oplogHours/24 + + if appConfig['unitOfMeasure'] == 'sec': + calcDivisor = oplogSeconds + elif appConfig['unitOfMeasure'] == 'min': + calcDivisor = oplogMinutes + elif appConfig['unitOfMeasure'] == 'hr': + calcDivisor = oplogHours + else: + calcDivisor = oplogDays + + # determine width needed for namespace + nsWidth = 10 + for thisOpKey in opDict.keys(): + if len(thisOpKey) > nsWidth: + nsWidth = len(thisOpKey) + + printLog("",fp) + printLog("-----------------------------------------------------------------------------------------",fp) + printLog("",fp) + + printLog("changestream elapsed seconds = {}".format(oplogSeconds),fp) + + # print collection ops, ips/ups/dps + printLog("{:<{dbWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | 
{:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s}".format('Namespace', + 'Tot Inserts','Per '+appConfig['unitOfMeasure'], + 'Tot Updates','Per '+appConfig['unitOfMeasure'], + 'Tot Deletes','Per '+appConfig['unitOfMeasure'], + dbWidth=nsWidth, + intWidth=15, + floatWidth=10 + ),fp) + + for thisOpKey in sorted(opDict.keys()): + printLog("{:<{dbWidth}s} | {:<{intWidth},d} | {:<{floatWidth},.0f} | {:<{intWidth},d} | {:<{floatWidth},.0f} | {:<{intWidth},d} | {:<{floatWidth},.0f}".format(thisOpKey, + opDict[thisOpKey]['ins'],opDict[thisOpKey]['ins']//calcDivisor, + opDict[thisOpKey]['upd'],opDict[thisOpKey]['upd']//calcDivisor, + opDict[thisOpKey]['del'],opDict[thisOpKey]['del']//calcDivisor, + dbWidth=nsWidth, + intWidth=15, + floatWidth=10 + ),fp) + + printLog("",fp) + + client.close() + fp.close() + + +def main(): + parser = argparse.ArgumentParser(description='Calculate collection level activity using a changestream.') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--uri', + required=True, + type=str, + help='MongoDB Connection URI') + + parser.add_argument('--server-alias', + required=True, + type=str, + help='Alias for server, used to name output file') + + parser.add_argument('--unit-of-measure', + required=False, + default='day', + choices=['sec','min','hr','day'], + help='Unit of measure for reporting [sec | min | hr | day]') + + parser.add_argument('--collect-seconds', + required=False, + type=int, + default=10800, + help='Number of seconds to parse changestream before stopping.') + + parser.add_argument('--stop-when-changestream-current', + required=False, + action='store_true', + help='Stop processing and output results when fully caught up on the changestream') + + parser.add_argument('--start-position', + required=True, + type=str, + help='Starting position - YYYY-MM-DD+HH:MM:SS in UTC') + +
parser.add_argument('--num-operations-feedback', + required=False, + type=int, + default=200000, + help='Maximum number of operations per feedback') + + parser.add_argument('--num-seconds-feedback', + required=False, + type=int, + default=5, + help='Maximum number of seconds per feedback') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['serverAlias'] = args.server_alias + appConfig['collectSeconds'] = args.collect_seconds + appConfig['unitOfMeasure'] = args.unit_of_measure + appConfig['stopWhenChangestreamCurrent'] = args.stop_when_changestream_current + appConfig["startTs"] = Timestamp(datetime.fromisoformat(args.start_position), 1) + appConfig['numOperationsFeedback'] = int(args.num_operations_feedback) + appConfig['numSecondsFeedback'] = int(args.num_seconds_feedback) + + # consume all of the changestream rather than scoping to particular namespaces + appConfig['includeAllDatabases'] = True + + parseChangestream(appConfig) + + +if __name__ == "__main__": + main() diff --git a/migration/mongodb-changestream-review/mongodb-changestream-review.py b/migration/mongodb-changestream-review/mongodb-changestream-review.py new file mode 100644 index 0000000..07846a9 --- /dev/null +++ b/migration/mongodb-changestream-review/mongodb-changestream-review.py @@ -0,0 +1,250 @@ +import argparse +import os +import sys +import time +import pymongo +from bson.timestamp import Timestamp +from datetime import datetime, timedelta, timezone +import warnings + + +def printLog(thisMessage,thisFile): + print(thisMessage) + thisFile.write("{}\n".format(thisMessage)) + + +def parseChangestream(appConfig): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + startTs = appConfig['startTs'] + + logTimeStamp = 
datetime.utcnow().strftime('%Y%m%d%H%M%S') + logFileName = "{}-{}-mongo-changestream-review.log".format(appConfig['serverAlias'],logTimeStamp) + fp = open(logFileName, 'w') + + printLog('connecting to MongoDB aliased as {}'.format(appConfig['serverAlias']),fp) + client = pymongo.MongoClient(host=appConfig['uri'],appname='mdbcstrv',unicode_decode_error_handler='ignore') + + secondsBehind = 999999 + + printLog("starting with timestamp = {}".format(startTs.as_datetime()),fp) + + numTotalChangestreamEntries = 0 + opDict = {} + + startTime = time.time() + lastFeedback = time.time() + allDone = False + + noChangesPauseSeconds = 5.0 + + ''' + i = insert + u = update + d = delete + c = command + db = database + n = no-op + ''' + + with client.watch(start_at_operation_time=startTs, full_document=None, pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0,'fullDocument':0}}]) as stream: + while stream.alive and not allDone: + change = stream.try_next() + resumeToken = stream.resume_token + + # check if time to stop - elapsed time + elapsedSeconds = (time.time() - startTime) + if (elapsedSeconds >= appConfig['collectSeconds']): + print("reached requested elapsed {} seconds, stopping".format(appConfig['collectSeconds'])) + allDone = True + break + + if change is None: + # no changes available - might be time to stop + if appConfig['stopWhenChangestreamCurrent']: + print("change stream is current, stopping") + allDone = True + break + #print(" no changes, pausing for {} second(s)".format(noChangesPauseSeconds)) + time.sleep(noChangesPauseSeconds) + continue + else: + #print("change doc is | {}".format(change)) + pass + + # check if time to stop - current enough + if (appConfig['stopWhenChangestreamCurrent'] and (secondsBehind < 60)): + print("change stream is current, stopping") + allDone = True + break + + currentTs = change['clusterTime'] + resumeToken = change['_id']['_data'] + thisNs = 
change['ns']['db']+'.'+change['ns']['coll'] + thisOpType = change['operationType'] + + numTotalChangestreamEntries += 1 + if ((numTotalChangestreamEntries % appConfig['numOperationsFeedback']) == 0) or ((lastFeedback + appConfig['numSecondsFeedback']) < time.time()): + lastFeedback = time.time() + elapsedSeconds = time.time() - startTime + secondsBehind = int((datetime.now(timezone.utc) - currentTs.as_datetime().replace(tzinfo=timezone.utc)).total_seconds()) + if (elapsedSeconds != 0): + printLog(" tot changestream entries read {:16,d} @ {:12,.0f} per second | {:12,d} seconds behind".format(numTotalChangestreamEntries,numTotalChangestreamEntries//elapsedSeconds,secondsBehind),fp) + else: + printLog(" tot changestream entries read {:16,d} @ {:12,.0f} per second | {:12,d} seconds behind".format(0,0.0,secondsBehind),fp) + + if (thisOpType == 'insert'): + # insert + if thisNs in opDict: + opDict[thisNs]['ins'] += 1 + else: + opDict[thisNs] = {'ins':1,'upd':0,'del':0} + + elif (thisOpType in ['update','replace']): + # update + if thisNs in opDict: + opDict[thisNs]['upd'] += 1 + else: + opDict[thisNs] = {'ins':0,'upd':1,'del':0} + + elif (thisOpType == 'delete'): + # delete + if thisNs in opDict: + opDict[thisNs]['del'] += 1 + else: + opDict[thisNs] = {'ins':0,'upd':0,'del':1} + + else: + printLog(change,fp) + sys.exit(1) + + # print overall ops, ips/ups/dps + + oplogSeconds = (currentTs.as_datetime()-startTs.as_datetime()).total_seconds() + oplogMinutes = oplogSeconds/60 + oplogHours = oplogMinutes/60 + oplogDays = oplogHours/24 + + if appConfig['unitOfMeasure'] == 'sec': + calcDivisor = oplogSeconds + elif appConfig['unitOfMeasure'] == 'min': + calcDivisor = oplogMinutes + elif appConfig['unitOfMeasure'] == 'hr': + calcDivisor = oplogHours + else: + calcDivisor = oplogDays + + # determine width needed for namespace + nsWidth = 10 + for thisOpKey in opDict.keys(): + if len(thisOpKey) > nsWidth: + nsWidth = len(thisOpKey) + + printLog("",fp) + 
printLog("-----------------------------------------------------------------------------------------",fp) + printLog("",fp) + + printLog("changestream elapsed seconds = {}".format(oplogSeconds),fp) + + # print collection ops, ips/ups/dps + printLog("{:<{dbWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s}".format('Namespace', + 'Tot Inserts','Per '+appConfig['unitOfMeasure'], + 'Tot Updates','Per '+appConfig['unitOfMeasure'], + 'Tot Deletes','Per '+appConfig['unitOfMeasure'], + dbWidth=nsWidth, + intWidth=15, + floatWidth=10 + ),fp) + + for thisOpKey in sorted(opDict.keys()): + printLog("{:<{dbWidth}s} | {:<{intWidth},d} | {:<{floatWidth},.0f} | {:<{intWidth},d} | {:<{floatWidth},.0f} | {:<{intWidth},d} | {:<{floatWidth},.0f}".format(thisOpKey, + opDict[thisOpKey]['ins'],opDict[thisOpKey]['ins']//calcDivisor, + opDict[thisOpKey]['upd'],opDict[thisOpKey]['upd']//calcDivisor, + opDict[thisOpKey]['del'],opDict[thisOpKey]['del']//calcDivisor, + dbWidth=nsWidth, + intWidth=15, + floatWidth=10 + ),fp) + + printLog("",fp) + + client.close() + fp.close() + + +def main(): + parser = argparse.ArgumentParser(description='Calculate collection level activity using a changestream.') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--uri', + required=True, + type=str, + help='MongoDB Connection URI') + + parser.add_argument('--server-alias', + required=True, + type=str, + help='Alias for server, used to name output file') + + parser.add_argument('--unit-of-measure', + required=False, + default='day', + choices=['sec','min','hr','day'], + help='Unit of measure for reporting [sec | min | hr | day]') + + parser.add_argument('--collect-seconds', + required=False, + type=int, + default=10800, + help='Number of seconds to parse changestream before stopping.') + + 
parser.add_argument('--stop-when-changestream-current', + required=False, + action='store_true', + help='Stop processing and output results when fully caught up on the changestream') + + parser.add_argument('--start-position', + required=True, + type=str, + help='Starting position - YYYY-MM-DD+HH:MM:SS in UTC') + + parser.add_argument('--num-operations-feedback', + required=False, + type=int, + default=200000, + help='Maximum number of operations per feedback') + + parser.add_argument('--num-seconds-feedback', + required=False, + type=int, + default=5, + help='Maximum number of seconds per feedback') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['serverAlias'] = args.server_alias + appConfig['collectSeconds'] = args.collect_seconds + appConfig['unitOfMeasure'] = args.unit_of_measure + appConfig['stopWhenChangestreamCurrent'] = args.stop_when_changestream_current + appConfig["startTs"] = Timestamp(datetime.fromisoformat(args.start_position), 1) + appConfig['numOperationsFeedback'] = int(args.num_operations_feedback) + appConfig['numSecondsFeedback'] = int(args.num_seconds_feedback) + + # consume all of the changestream rather than scoping to particular namespaces + appConfig['includeAllDatabases'] = True + + parseChangestream(appConfig) + + +if __name__ == "__main__": + main() diff --git a/migration/mongodb-oplog-review/README.md b/migration/mongodb-oplog-review/README.md index 0293614..538888c 100644 --- a/migration/mongodb-oplog-review/README.md +++ b/migration/mongodb-oplog-review/README.md @@ -13,6 +13,7 @@ The mongodb oplog review tool connects to any instance in a MongoDB replicaset ( ## Using the MongoDB Oplog Review Tool `python3 mongo-oplog-review.py --server-alias --uri --stop-when-oplog-current` +Optionally add arguments `--output-to-csv 
--file-name ` if you want to save the output to a csv file - Run on any instance in the replicaset (the larger the oplog the better) - Use a different \ for each execution diff --git a/migration/mongodb-oplog-review/mongodb-oplog-review.py b/migration/mongodb-oplog-review/mongodb-oplog-review.py index 497c0d6..c5f97ba 100644 --- a/migration/mongodb-oplog-review/mongodb-oplog-review.py +++ b/migration/mongodb-oplog-review/mongodb-oplog-review.py @@ -2,6 +2,7 @@ import os import sys import time +import csv import pymongo from bson.timestamp import Timestamp from datetime import datetime, timedelta, timezone @@ -18,7 +19,7 @@ def parseOplog(appConfig): fp = open(logFileName, 'w') printLog('connecting to MongoDB aliased as {}'.format(appConfig['serverAlias']),fp) - client = pymongo.MongoClient(appConfig['uri']) + client = pymongo.MongoClient(host=appConfig['uri'],appname='mdboplrv') oplog = client.local.oplog.rs secondsBehind = 999999 @@ -165,6 +166,31 @@ def parseOplog(appConfig): printLog("opLog elapsed seconds = {}".format(oplogSeconds),fp) + # Write to CSV if requested + if appConfig['outputToCsv']: + try: + with open(appConfig['fileName'], 'w', newline='') as csvfile: + csvwriter = csv.writer(csvfile) + # Write header + csvwriter.writerow(['Namespace', + 'Tot Inserts', 'Per '+appConfig['unitOfMeasure'], + 'Tot Updates', 'Per '+appConfig['unitOfMeasure'], + 'Tot Deletes', 'Per '+appConfig['unitOfMeasure'], + 'Tot Commands', 'Per '+appConfig['unitOfMeasure'], + 'Tot No-Ops', 'Per '+appConfig['unitOfMeasure']]) + + # Write data rows + for thisOpKey in sorted(opDict.keys()): + csvwriter.writerow([thisOpKey, + opDict[thisOpKey]['ins'], opDict[thisOpKey]['ins']//calcDivisor, + opDict[thisOpKey]['upd'], opDict[thisOpKey]['upd']//calcDivisor, + opDict[thisOpKey]['del'], opDict[thisOpKey]['del']//calcDivisor, + opDict[thisOpKey]['com'], opDict[thisOpKey]['com']//calcDivisor, + opDict[thisOpKey]['nop'], opDict[thisOpKey]['nop']//calcDivisor]) + printLog(f"CSV output written 
to {appConfig['fileName']}", fp) + except Exception as e: + printLog(f"Error writing to CSV file: {str(e)}", fp) + # print collection ops, ips/ups/dps printLog("{:<{dbWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s} | {:<{intWidth}s} | {:<{floatWidth}s}".format('Namespace', 'Tot Inserts','Per '+appConfig['unitOfMeasure'], @@ -250,6 +276,16 @@ def main(): required=False, action='store_true', help='Stop processing and output results when fully caught up on the oplog') + + parser.add_argument('--output-to-csv', + required=False, + action='store_true', + help='Output results to a CSV file') + + parser.add_argument('--file-name', + required=False, + type=str, + help='Name of the CSV file to write (default: <server-alias>_oplog_stats.csv)') args = parser.parse_args() @@ -267,6 +303,8 @@ def main(): appConfig['collectSeconds'] = args.collect_seconds appConfig['unitOfMeasure'] = args.unit_of_measure appConfig['stopWhenOplogCurrent'] = args.stop_when_oplog_current + appConfig['outputToCsv'] = args.output_to_csv + appConfig['fileName'] = args.file_name if args.file_name else f"{args.server_alias}_oplog_stats.csv" # start from the beginning of the oplog rather than an aribtrary timestamp appConfig['startFromOplogStart'] = True diff --git a/migration/mongodb-ops/README.md b/migration/mongodb-ops/README.md new file mode 100644 index 0000000..f341803 --- /dev/null +++ b/migration/mongodb-ops/README.md @@ -0,0 +1,28 @@ +# MongoDB Ops Tool + +The MongoDB Ops tool gathers collection level query/insert/update/delete counters to assist with sizing.
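A collection-level counter snapshot becomes a rate estimate by differencing two snapshots and dividing by the elapsed time. A minimal sketch of that arithmetic (the function name and the sample numbers are assumptions for illustration, not part of the tool):

```python
def rate_per_second(start_counters, end_counters, elapsed_seconds):
    # difference each counter between the two snapshots, then divide by elapsed time
    return {op: (end_counters[op] - start_counters[op]) / elapsed_seconds
            for op in start_counters}

# hypothetical opcounter snapshots captured 60 seconds apart
start = {'query': 1000, 'insert': 200, 'update': 50, 'delete': 10}
end = {'query': 7000, 'insert': 800, 'update': 350, 'delete': 70}
print(rate_per_second(start, end, 60))
```

The tool applies the same idea per collection, using the counters saved in the two JSON output files from its collect runs.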
+ +## Requirements + - Python 3.7+ + - PyMongo + - MongoDB 2.6 - 3.4 | pymongo 3.10 - 3.12 + - MongoDB 3.6 - 5.0 | pymongo 3.12 - 4.0 + - MongoDB 5.1+ | pymongo 4.0+ + - DocumentDB | pymongo 3.10 - 4.0 + +## Using the MongoDB Ops Tool +`python3 mongodb-ops.py --uri <uri> --server-alias <server-alias> --collect` +- Produces an output file for comparison + +`python3 mongodb-ops.py --compare --file1 <file1> --file2 <file2>` +- Compares the results of two executions to estimate the number of queries, inserts, updates, and deletes per second at the collection level. + +## Notes +- Run on any instance in the replicaset (the larger the oplog the better) +- If sharded, run on one instance in each shard +- Each execution creates a file starting with \<server-alias\> and ending with .json +- The \<uri\> options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/ +- Add "&directConnection=true" to your \<uri\> + +## License +This tool is licensed under the Apache 2.0 License. diff --git a/migration/mongodb-ops/mongodb-ops.py b/migration/mongodb-ops/mongodb-ops.py index 70c5b3f..5c696d9 100644 --- a/migration/mongodb-ops/mongodb-ops.py +++ b/migration/mongodb-ops/mongodb-ops.py @@ -30,7 +30,7 @@ def mongoCollect(appConfig): startTime = time.time() print('connecting to MongoDB aliased as {}'.format(appConfig['serverAlias'])) - client = pymongo.MongoClient(appConfig['uri']) + client = pymongo.MongoClient(host=appConfig['uri'],appname='mdbops') startServerOpCounters = client.admin.command("serverStatus")['opcounters'] startServerMetricsDocument = client.admin.command("serverStatus")['metrics']['document'] @@ -114,13 +114,14 @@ def getCollectionStats(client): #print(" skipping view {}".format(thisColl['name'])) pass else: - collStats = client[thisDb['name']].command("collstats",thisColl['name']).copy() + collStats = client[thisDb['name']].command("collStats",thisColl['name']).copy() if thisDb['name'] not in returnDict: returnDict[thisDb['name']] = {} returnDict[thisDb['name']][thisColl['name']] = {} - 
returnDict[thisDb['name']][thisColl['name']]['wiredTiger'] = {} - returnDict[thisDb['name']][thisColl['name']]['wiredTiger']['cursor'] = collStats['wiredTiger']['cursor'] + if collStats.get('wiredTiger') is not None: + returnDict[thisDb['name']][thisColl['name']]['wiredTiger'] = {} + returnDict[thisDb['name']][thisColl['name']]['wiredTiger']['cursor'] = collStats['wiredTiger']['cursor'] returnDict[thisDb['name']][thisColl['name']]['ns'] = collStats['ns'] returnDict[thisDb['name']][thisColl['name']]['size'] = collStats['size'] returnDict[thisDb['name']][thisColl['name']]['count'] = collStats['count'] @@ -173,7 +174,7 @@ def mongoEvaluate(appConfig): for thisDb in dict1Start['collstats']: for thisColl in dict1Start['collstats'][thisDb]: thisCollDict = dict1Start['collstats'][thisDb][thisColl]['wiredTiger']['cursor'] - printEval(thisDb,thisColl,f1UptimeSeconds,thisCollDict['search calls'],thisCollDict['insert calls'],thisCollDict['update calls'],thisCollDict['remove calls'],appConfig,totalDict) + printEval(thisDb,thisColl,f1UptimeSeconds,thisCollDict['search calls'],thisCollDict['insert calls'],thisCollDict.get('modify',0),thisCollDict['remove calls'],appConfig,totalDict) if appConfig['numFiles'] == 2: with open(appConfig['file2'], 'r') as fp: @@ -228,7 +229,7 @@ def mongoEvaluate(appConfig): printEval(thisDb,thisColl,useTime,endCollDict['search calls'] - startCollDict['search calls'], endCollDict['insert calls'] - startCollDict['insert calls'], #endCollDict['update calls'] - startCollDict['update calls'], - endCollDict.get('modify calls',0) - startCollDict.get('modify calls',0), + endCollDict.get('modify',0) - startCollDict.get('modify',0), endCollDict['remove calls'] - startCollDict['remove calls'], dbColumnWidth,collColumnWidth,appConfig,totalDict) @@ -268,7 +269,7 @@ def printTotals(thisLabel1,thisLabel2,thisTime,thisQuery,thisInsert,thisUpdate,t def main(): - parser = argparse.ArgumentParser(description='Dump and restore indexes from MongoDB to DocumentDB.') + 
parser = argparse.ArgumentParser(description='Gather collection level statistics to estimate query/insert/update/delete rates on a MongoDB server.') parser.add_argument('--skip-python-version-check', required=False, diff --git a/migration/mvu-tool/README.md b/migration/mvu-tool/README.md new file mode 100644 index 0000000..b70a919 --- /dev/null +++ b/migration/mvu-tool/README.md @@ -0,0 +1,72 @@ +# Amazon DocumentDB MVU CDC Migrator Tool + +The MVU CDC Migrator Tool migrates cluster-wide changes from a source Amazon DocumentDB cluster to a target Amazon DocumentDB cluster. + +It enables a near zero downtime Major Version Upgrade (MVU) from Amazon DocumentDB 3.6 to Amazon DocumentDB 5.0. + +This tool is only recommended for performing an MVU from Amazon DocumentDB 3.6. If you are performing an MVU from Amazon DocumentDB 4.0 to 5.0, we recommend using the AWS Database Migration Service CDC approach. + +## Prerequisites + + - Python 3 + - Modules: pymongo +``` + pip3 install pymongo +``` +## How to use + +1. Clone the repository and go to the tool folder: +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/migration/mvu-tool/ +``` + +2. Run the mvu-cdc-migrator.py tool to capture the cluster wide change stream token and migrate the changes. It accepts the following arguments: +``` +python3 mvu-cdc-migrator.py --help +usage: mvu-cdc-migrator.py [-h] [--skip-python-version-check] --source-uri SOURCE_URI [--target-uri TARGET_URI] + [--source-database SOURCE_DATABASE] + [--duration-seconds DURATION_SECONDS] + [--feedback-seconds FEEDBACK_SECONDS] [--threads THREADS] + [--max-seconds-between-batches MAX_SECONDS_BETWEEN_BATCHES] + [--max-operations-per-batch MAX_OPERATIONS_PER_BATCH] + [--dry-run] --start-position START_POSITION + [--verbose] [--get-resume-token] + +MVU CDC Migrator Tool.
+ +options: + -h, --help show this help message and exit + --skip-python-version-check + Permit execution on Python 3.6 and prior + --source-uri SOURCE_URI + Source URI + --target-uri TARGET_URI + Target URI you can skip if you run with get-resume-token + --source-database SOURCE_DATABASE + Source database name if you skip it will replicate all the databases + --duration-seconds DURATION_SECONDS + Number of seconds to run before exiting, 0 = run forever + --feedback-seconds FEEDBACK_SECONDS + Number of seconds between feedback output + --threads THREADS Number of threads (parallel processing) + --max-seconds-between-batches MAX_SECONDS_BETWEEN_BATCHES + Maximum number of seconds to await full batch + --max-operations-per-batch MAX_OPERATIONS_PER_BATCH + Maximum number of operations to include in a single batch + --dry-run Read source changes only, do not apply to target + --start-position START_POSITION + Starting position - 0 to get change stream resume token, or change stream resume token + --verbose Enable verbose logging + --get-resume-token Display the current change stream resume token +``` +## Example usage + +* To get the cluster wide change stream resume token +``` +python3 mvu-cdc-migrator.py --source-uri <source-uri> --start-position 0 --verbose --get-resume-token +``` +* To migrate the CDC changes during the MVU +``` +python3 mvu-cdc-migrator.py --source-uri <source-uri> --target-uri <target-uri> --start-position <resume-token> --verbose +``` diff --git a/migration/mvu-tool/mvu-cdc-migrator.py b/migration/mvu-tool/mvu-cdc-migrator.py new file mode 100644 index 0000000..46900d2 --- /dev/null +++ b/migration/mvu-tool/mvu-cdc-migrator.py @@ -0,0 +1,415 @@ +from datetime import datetime, timedelta +import os +import sys +import time +import pymongo +from bson.timestamp import Timestamp +import threading +import multiprocessing as mp +import hashlib +import argparse +from collections import defaultdict + + +#Logger function +def logIt(threadnum, message): + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + 
print("[{}] thread {:>3d} | {}".format(logTimeStamp,threadnum,message)) + +#Function to process the change stream + +def change_stream_processor(threadnum, appConfig, perfQ): + if appConfig['verboseLogging']: + logIt(threadnum,'thread started') + + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='mvutool') + destConnection = pymongo.MongoClient(host=appConfig["targetUri"],appname='mvutool') + startTime = time.time() + lastFeedback = time.time() + lastBatch = time.time() + allDone = False + threadOplogEntries = 0 + waitcount=0 + nsBulkOpDict = defaultdict(list) + nsBulkOpDictReplace= defaultdict(list) + # list with replace, not insert, in case document already exists (replaying old oplog) + numCurrentBulkOps = 0 + numTotalBatches = 0 + printedFirstTs = False + myClusterOps = 0 + + # starting timestamp + endTs = appConfig["startTs"] + + if (appConfig["startTs"] == "RESUME_TOKEN") and not appConfig["sourceDb"] : + stream = sourceConnection.watch(resume_after={'_data': appConfig["startPosition"]}, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0}}]) + + else: + sourceDatabase=sourceConnection[appConfig["sourceDb"]] + stream = sourceDatabase.watch(resume_after={'_data': appConfig["startPosition"]}, full_document='updateLookup', pipeline=[{'$match': {'operationType': {'$in': ['insert','update','replace','delete']}}},{'$project':{'updateDescription':0}}]) + + + if appConfig['verboseLogging']: + if (appConfig["startTs"] == "RESUME_TOKEN"): + logIt(threadnum,"Creating change stream cursor for resume token {}".format(appConfig["startPosition"])) + + while not allDone: + while stream.alive: + change = stream.try_next() + if ((time.time() - startTime) > appConfig['durationSeconds']) and (appConfig['durationSeconds'] != 0): + allDone = True + break + + # FIXED: Separate the timeout logic from change processing + if change is None: + waitcount += 1 
+ if waitcount <= appConfig["maxSecondsBetweenBatches"]: + time.sleep(1) + continue + else: + # Timeout reached - process any pending batch and reset + waitcount = 0 + if numCurrentBulkOps > 0: + # Force batch processing due to timeout + if appConfig['verboseLogging']: + logIt(threadnum, f'Timeout reached, processing batch of {numCurrentBulkOps} operations') + + bulkOpList=[] + bulkOpListReplace=[] + if not appConfig['dryRun']: + for ns in nsBulkOpDict: + destDatabase=destConnection[(ns.split('.',1)[0])] + destCollection=destDatabase[(ns.split('.',1)[1])] + bulkOpList=nsBulkOpDict[ns] + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + bulkOpListReplace=nsBulkOpDictReplace[ns] + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":"N/A"}) + nsBulkOpDict = defaultdict(list) + nsBulkOpDictReplace= defaultdict(list) + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + continue + + # Reset wait count when we get a real change + waitcount = 0 + + # Process the actual change + endTs = change['clusterTime'] + resumeToken = change['_id']['_data'] + thisDb = change['ns']['db'] + thisCol=change['ns']['coll'] + thisNs=thisDb+'.'+thisCol + thisOp = change['operationType'] + + if ((int(hashlib.sha512(str(change['documentKey']).encode('utf-8')).hexdigest(), 16) % appConfig["numProcessingThreads"]) == threadnum): + threadOplogEntries += 1 + if (not printedFirstTs) and (thisOp in ['insert','update','replace','delete']): + if appConfig['verboseLogging']: + logIt(threadnum,'first timestamp = {} aka {}'.format(change['clusterTime'],change['clusterTime'].as_datetime())) + printedFirstTs = True + + if (thisOp == 'insert'): + myClusterOps += 1 + nsBulkOpDict[thisNs].append(pymongo.InsertOne(change['fullDocument'])) + 
nsBulkOpDictReplace[thisNs].append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + numCurrentBulkOps += 1 + elif (thisOp in ['update','replace']): + # update/replace + if (change['fullDocument'] is not None): + myClusterOps += 1 + nsBulkOpDict[thisNs].append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + nsBulkOpDictReplace[thisNs].append(pymongo.ReplaceOne(change['documentKey'],change['fullDocument'],upsert=True)) + numCurrentBulkOps += 1 + else: + pass + elif (thisOp == 'delete'): + myClusterOps += 1 + nsBulkOpDict[thisNs].append(pymongo.DeleteOne({'_id':change['documentKey']['_id']})) + nsBulkOpDictReplace[thisNs].append(pymongo.DeleteOne({'_id':change['documentKey']['_id']})) + numCurrentBulkOps += 1 + elif (thisOp in ['drop','rename','dropDatabase','invalidate']): + # operations we do not track + pass + else: + print(change) + sys.exit(1) + + # Check if we need to process batch (either by count or time) + if ((numCurrentBulkOps >= appConfig["maxOperationsPerBatch"]) or (time.time() >= (lastBatch + appConfig["maxSecondsBetweenBatches"])) ) and (numCurrentBulkOps > 0): + bulkOpList=[] + bulkOpListReplace=[] + if not appConfig['dryRun']: + for ns in nsBulkOpDict: + destDatabase=destConnection[(ns.split('.',1)[0])] + destCollection=destDatabase[(ns.split('.',1)[1])] + bulkOpList=nsBulkOpDict[ns] + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + bulkOpListReplace=nsBulkOpDictReplace[ns] + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + perfQ.put({"name":"batchCompleted","operations":numCurrentBulkOps,"endts":endTs,"processNum":threadnum,"resumeToken":resumeToken}) + nsBulkOpDict = defaultdict(list) + nsBulkOpDictReplace= defaultdict(list) + numCurrentBulkOps = 0 + numTotalBatches += 1 + lastBatch = time.time() + + if (numCurrentBulkOps > 0): + bulkOpList=[] + bulkOpListReplace=[] + logIt(threadnum,'processing final batch of {} operations'.format(numCurrentBulkOps)) + if not appConfig['dryRun']: + for ns in nsBulkOpDict: + destDatabase=destConnection[(ns.split('.',1)[0])] + destCollection=destDatabase[(ns.split('.',1)[1])] + bulkOpList=nsBulkOpDict[ns] + try: + result = destCollection.bulk_write(bulkOpList,ordered=True) + except: + # replace inserts as replaces + bulkOpListReplace=nsBulkOpDictReplace[ns] + result = destCollection.bulk_write(bulkOpListReplace,ordered=True) + nsBulkOpDict = defaultdict(list) + nsBulkOpDictReplace= defaultdict(list) + numCurrentBulkOps = 0 + numTotalBatches += 1 + + sourceConnection.close() + destConnection.close() + perfQ.put({"name":"processCompleted","processNum":threadnum}) + +#Function to get the Change stream token +def get_resume_token(appConfig): + sourceConnection = pymongo.MongoClient(host=appConfig["sourceUri"],appname='mvutool') + + allDone = False + if not appConfig["sourceDb"]: + stream = sourceConnection.watch() + logIt(-1,'getting current change stream resume token') + else: + sourceDatabase=sourceConnection[appConfig["sourceDb"]] + stream=sourceDatabase.watch() + logIt(-1,'getting current change stream resume token for ' + appConfig["sourceDb"] + " database") + + while not allDone: + for change in stream: + resumeToken = change['_id']['_data'] + logIt(-1,'Change stream resume token is {}'.format(resumeToken)) + filename="get-resume-token-"+time.strftime("%Y%m%d-%H%M%S")+".txt" + f = open(filename, "w") + f.write("Change stream resume token is "+ str(resumeToken)) + f.close() + allDone = True + break + + +def reporter(appConfig, perfQ): + if appConfig['verboseLogging']: + logIt(-1,'reporting thread started') + + startTime = time.time() + lastTime = time.time() + + lastProcessedOplogEntries = 0 + nextReportTime = startTime + appConfig["feedbackSeconds"] + + resumeToken = 'N/A' + + numWorkersCompleted = 0 + numProcessedOplogEntries = 0 + + dtDict = {} + + while (numWorkersCompleted < appConfig["numProcessingThreads"]): + 
time.sleep(appConfig["feedbackSeconds"]) + nowTime = time.time() + + while not perfQ.empty(): + qMessage = perfQ.get_nowait() + if qMessage['name'] == "batchCompleted": + numProcessedOplogEntries += qMessage['operations'] + thisEndDt = qMessage['endts'].as_datetime().replace(tzinfo=None) + thisProcessNum = qMessage['processNum'] + if (thisProcessNum not in dtDict) or (thisEndDt > dtDict[thisProcessNum]): + dtDict[thisProcessNum] = thisEndDt + #print("received endTs = {}".format(thisEndTs.as_datetime())) + if 'resumeToken' in qMessage: + resumeToken = qMessage['resumeToken'] + else: + resumeToken = 'N/A' + + elif qMessage['name'] == "processCompleted": + numWorkersCompleted += 1 + + # overall totals + elapsedSeconds = nowTime - startTime + totalOpsPerSecond = numProcessedOplogEntries / elapsedSeconds + + # elapsed hours, minutes, seconds + thisHours, rem = divmod(elapsedSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + thisHMS = "{:0>2}:{:0>2}:{:05.2f}".format(int(thisHours),int(thisMinutes),thisSeconds) + + # this interval + intervalElapsedSeconds = nowTime - lastTime + intervalOpsPerSecond = (numProcessedOplogEntries - lastProcessedOplogEntries) / intervalElapsedSeconds + + # how far behind current time + dtUtcNow = datetime.utcnow() + totSecondsBehind = 0 + numSecondsBehindEntries = 0 + for thisDt in dtDict: + # accumulate lag across all threads so the average is correct + totSecondsBehind += (dtUtcNow - dtDict[thisDt].replace(tzinfo=None)).total_seconds() + numSecondsBehindEntries += 1 + + avgSecondsBehind = int(totSecondsBehind / max(numSecondsBehindEntries,1)) + + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + print("[{0}] elapsed {1} | total o/s {2:12,.2f} | interval o/s {3:12,.2f} | tot {4:16,d} | {5:12,d} secs behind | resume token = {6}".format(logTimeStamp,thisHMS,totalOpsPerSecond,intervalOpsPerSecond,numProcessedOplogEntries,avgSecondsBehind,resumeToken)) + nextReportTime = nowTime + appConfig["feedbackSeconds"] + + lastTime = nowTime + 
lastProcessedOplogEntries = numProcessedOplogEntries + + +def main(): + parser = argparse.ArgumentParser(description='MVU CDC Migrator Tool.') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--source-uri', + required=True, + type=str, + help='Source URI') + + parser.add_argument('--target-uri', + required=False, + type=str, + default="no-target-uri", + help='Target URI (optional when using --get-resume-token)') + + parser.add_argument('--source-database', + required=False, + type=str, + help='Source database name (if omitted, all databases are replicated)') + + + parser.add_argument('--duration-seconds', + required=False, + type=int, + default=0, + help='Number of seconds to run before exiting, 0 = run forever') + + parser.add_argument('--feedback-seconds', + required=False, + type=int, + default=15, + help='Number of seconds between feedback output') + + parser.add_argument('--threads', + required=False, + type=int, + default=1, + help='Number of threads (parallel processing)') + + parser.add_argument('--max-seconds-between-batches', + required=False, + type=int, + default=5, + help='Maximum number of seconds to await full batch') + + parser.add_argument('--max-operations-per-batch', + required=False, + type=int, + default=100, + help='Maximum number of operations to include in a single batch') + + parser.add_argument('--dry-run', + required=False, + action='store_true', + help='Read source changes only, do not apply to target') + + parser.add_argument('--start-position', + required=True, + type=str, + help='Starting position - 0 to get change stream resume token, or change stream resume token') + + parser.add_argument('--verbose', + required=False, + action='store_true', + help='Enable verbose logging') + + parser.add_argument('--get-resume-token', + required=False, + action='store_true', + help='Display the current change stream 
resume token') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + + appConfig = {} + appConfig['sourceUri'] = args.source_uri + appConfig['targetUri'] = args.target_uri + appConfig['numProcessingThreads'] = args.threads + appConfig['maxSecondsBetweenBatches'] = args.max_seconds_between_batches + appConfig['maxOperationsPerBatch'] = args.max_operations_per_batch + appConfig['durationSeconds'] = args.duration_seconds + appConfig['feedbackSeconds'] = args.feedback_seconds + appConfig['dryRun'] = args.dry_run + appConfig['sourceDb'] = args.source_database + appConfig['startPosition'] = args.start_position + appConfig['verboseLogging'] = args.verbose + appConfig['cdcSource'] = 'changeStream' + + if args.get_resume_token: + get_resume_token(appConfig) + sys.exit(0) + elif args.target_uri == 'no-target-uri': + message = "--target-uri is required unless --get-resume-token is specified" + parser.error(message) + + logIt(-1,"processing {} using {} threads".format(appConfig['cdcSource'],appConfig['numProcessingThreads'])) + + if len(appConfig["startPosition"]) == 36: + # resume token + appConfig["startTs"] = "RESUME_TOKEN" + logIt(-1,"starting with resume token = {}".format(appConfig["startPosition"])) + + mp.set_start_method('spawn') + q = mp.Manager().Queue() + + t = threading.Thread(target=reporter,args=(appConfig,q)) + t.start() + + processList = [] + for loop in range(appConfig["numProcessingThreads"]): + p = mp.Process(target=change_stream_processor,args=(loop,appConfig,q)) + processList.append(p) + + for process in processList: + process.start() + + for process in processList: + process.join() + + t.join() + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/monitoring/README.md b/monitoring/README.md new file mode 100644 index 0000000..2f143d1 --- /dev/null +++ 
b/monitoring/README.md @@ -0,0 +1,6 @@ +# Amazon DocumentDB Monitoring Tools + +* [docdb-dashboarder](./docdb-dashboarder) - create a "starter" dashboard for a DocumentDB cluster. +* [docdb-stat](./docdb-stat) - display high level DocumentDB instance statistics. +* [documentdb-top](./documentdb-top) - display detailed DocumentDB collection level statistics. +* [gc-watchdog](./gc-watchdog) - track garbage collection activity to a file or CloudWatch metrics diff --git a/monitoring/custom-metrics/README.md b/monitoring/custom-metrics/README.md new file mode 100644 index 0000000..d261848 --- /dev/null +++ b/monitoring/custom-metrics/README.md @@ -0,0 +1,77 @@ +# Custom Metrics Tool +There are Amazon DocumentDB cluster limits that are not currently exposed as Amazon CloudWatch metrics. The **custom-metrics** tool connects to an Amazon DocumentDB cluster, collects the specified metrics, and publishes them as custom CloudWatch metrics. The following metrics can be collected by the **custom-metrics** tool: + +1. collection count (per cluster) +2. collection size (per collection) +3. database count (per cluster) +4. index count (per collection) +5. index size (per index) +6. user count (per cluster) + +CloudWatch metrics will be published to the following dimensions in the **CustomDocDB** namespace: + +1. **Cluster, Collection, Database, Index** - index size +2. **Cluster, Collection, Database** - collection size and index count +3. 
**Cluster** - collection count, database count, and user count + + + +------------------------------------------------------------------------------------------------------------------------ +## Requirements + +Python 3.x with modules: + +* boto3 - AWS SDK that allows management of AWS resources through Python +* pymongo - MongoDB driver for Python applications + +``` +pip install boto3 +pip install pymongo +``` + +Download the Amazon DocumentDB Certificate Authority (CA) certificate required to authenticate to your cluster: +``` +wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem +``` + +------------------------------------------------------------------------------------------------------------------------ +## Usage + +The tool accepts the following arguments: + +``` +python3 custom-metrics.py --help +usage: custom-metrics.py [-h] [--skip-python-version-check] --cluster_name + CLUSTER_NAME --uri URI --namespaces NAMESPACES + [--collection_count] [--database_count] + [--user_count] [--collection_size] [--index_count] + [--index_size] + +optional arguments: + -h, --help show this help message and exit + --skip-python-version-check + Permit execution on Python 3.6 and prior + --cluster_name CLUSTER_NAME + Name of cluster for Amazon CloudWatch custom metric + --uri URI Amazon DocumentDB Connection URI + --namespaces NAMESPACES + comma separated list of namespaces to monitor + --collection_count log cluster collection count + --database_count log cluster database count + --user_count log cluster user count + --collection_size log collection size + --index_count log collection index count + --index_size log collection index size +``` + +Examples of ```namespaces``` parameter: + +1. Specific namespace: ```"<database>.<collection>"``` +2. All collections in specific database: ```"<database>.*"``` +3. Specific collection in any database: ```"*.<collection>"``` +4. All namespaces: ```"*.*"``` +5. 
Multiple namespaces: ```"<database1>.*, *.<collection2>, <database3>.<collection3>"``` + + + + diff --git a/monitoring/custom-metrics/custom-metrics.py b/monitoring/custom-metrics/custom-metrics.py new file mode 100644 index 0000000..36adedf --- /dev/null +++ b/monitoring/custom-metrics/custom-metrics.py @@ -0,0 +1,342 @@ +"""Python script to publish custom Amazon DocumentDB CloudWatch metrics.""" +import sys +import re +import logging +import argparse +import boto3 +import pymongo + +boto3.set_stream_logger(name='botocore.credentials', level=logging.ERROR) +logger = logging.getLogger() +logger.setLevel(logging.INFO) + +cloudWatchClient = boto3.client('cloudwatch') +namespaceRegex = re.compile(r".+\..+") +DATABASE_CLIENT = None + +def connect_to_docdb(app_config): + """Connect to the Amazon DocumentDB cluster at the specified URI.""" + global DATABASE_CLIENT + if DATABASE_CLIENT is None: + try: + DATABASE_CLIENT = pymongo.MongoClient(host=app_config['uri'], appname='customMetrics') + print('Successfully created new DocumentDB client.') + except pymongo.errors.ConnectionFailure as connection_failure: + print('An error occurred while connecting to DocumentDB: %s' % connection_failure) + +def log_collection_size_metric(cluster_name, database_name, collection_name, collection_size): + """Create custom metric for collection size.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'CollectionSize', + 'Dimensions': [ + { + 'Name': 'Cluster', + 'Value': cluster_name + }, + { + 'Name': 'Database', + 'Value': database_name + }, + { + 'Name': 'Collection', + 'Value': collection_name + } + ], + 'Value': collection_size, + 'Unit': 'Bytes', + 'StorageResolution': 60 + } + ] + ) + +def log_index_count_metric(cluster_name, database_name, collection_name, index_count): + """Create custom metric for number of indexes in collection.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'IndexCount', + 'Dimensions': [ + { + 'Name': 'Cluster', 
+ 'Value': cluster_name + }, + { + 'Name': 'Database', + 'Value': database_name + }, + { + 'Name': 'Collection', + 'Value': collection_name + } + ], + 'Value': index_count, + 'StorageResolution': 60 + } + ] + ) + +def log_index_size_metric(cluster_name, database_name, collection_name, index_name, index_size): + """Create custom metric for index size.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'IndexSize', + 'Dimensions': [ + { + 'Name': 'Cluster', + 'Value': cluster_name + }, + { + 'Name': 'Database', + 'Value': database_name + }, + { + 'Name': 'Collection', + 'Value': collection_name + }, + { + 'Name': 'Index', + 'Value': index_name + } + ], + 'Value': index_size, + 'Unit': 'Bytes', + 'StorageResolution': 60 + } + ] +) + +def log_number_of_databases_metric(cluster_name, number_of_databases): + """Create custom metric for number of databases in cluster.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'DatabaseCount', + 'Dimensions': [ + { + 'Name': 'Cluster', + 'Value': cluster_name + } + ], + 'Value': number_of_databases, + 'StorageResolution': 60 + } + ] + ) + +def log_number_of_collections_metric(cluster_name, collection_count): + """Create custom metric for number of collections in cluster.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'CollectionCount', + 'Dimensions': [ + { + 'Name': 'Cluster', + 'Value': cluster_name + } + ], + 'Value': collection_count, + 'StorageResolution': 60 + } + ] + ) + +def log_number_of_users_metric(cluster_name, number_of_users): + """Create custom metric for number of users in cluster.""" + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[ + { + 'MetricName': 'UserCount', + 'Dimensions': [ + { + 'Name': 'Cluster', + 'Value': cluster_name + } + ], + 'Value': number_of_users, + 'StorageResolution': 60 + } + ] + ) + +def 
monitor_namespace(database, collection, namespaces): + """Add this namespace to the list of namespaces to monitor.""" + namespace = f"{database}.{collection}" + if (namespace in namespaces) is False: + namespaces.append(namespace) + + return namespaces + +def log_custom_metrics(parameters): + """Determine which custom metrics to log and then log them.""" + connect_to_docdb(parameters) + database_names = DATABASE_CLIENT.list_database_names() + + if parameters['log_cluster_database_count'] is True: + log_number_of_databases_metric(parameters["cluster_name"], len(database_names) if len(database_names) > 0 else 0) + + if parameters["log_cluster_user_count"] is True: + number_of_users = 0 + if len(database_names) > 0: + database = DATABASE_CLIENT[database_names[0]] + number_of_users = len(database.command("usersInfo")["users"]) + + log_number_of_users_metric(parameters["cluster_name"], number_of_users) + + if (parameters["log_cluster_collection_count"] is True or + parameters["log_collection_size"] is True or + parameters["log_collection_index_count"] is True or + parameters["log_collection_index_size"] is True): + collections_by_database = {} + for database_name in database_names: + collections_by_database[database_name] = DATABASE_CLIENT[database_name].list_collection_names() + + if parameters["log_cluster_collection_count"] is True: + collection_count = 0 + for database_name in database_names: + collection_count += len(collections_by_database[database_name]) + + log_number_of_collections_metric(parameters["cluster_name"], collection_count) + + if (parameters["log_collection_size"] is True or + parameters["log_collection_index_count"] is True or + parameters["log_collection_index_size"] is True): + # build list of namespaces to monitor + namespaces_to_monitor = [] + for namespace in parameters["namespaces"]: + namespace = namespace.strip() + if namespaceRegex.match(namespace) is None: + logger.error("Skipping invalid namespace %s", namespace) + else: + # split 
namespace into database and collection + tokens = namespace.split(".") + database = tokens[0] + collection = tokens[1] + + if database == "*": + # all databases + for database_to_monitor in database_names: + if collection == "*": + # all collections in all databases + # add all namespaces returned by list_database_names() and list_collection_names() + for collection_to_monitor in collections_by_database[database_to_monitor]: + namespaces_to_monitor = monitor_namespace(database_to_monitor, collection_to_monitor, namespaces_to_monitor) + else: + # specific collection in all databases + # add namespace if collection exists in the database + if collection in collections_by_database[database_to_monitor]: + namespaces_to_monitor = monitor_namespace(database_to_monitor, collection, namespaces_to_monitor) + else: + database_to_monitor = database + if database_to_monitor in collections_by_database: + if collection == "*": + # all collections in a specific database + for collection_to_monitor in collections_by_database[database_to_monitor]: + namespaces_to_monitor = monitor_namespace(database_to_monitor, collection_to_monitor, namespaces_to_monitor) + else: + # specific collection in a specific database + if collection in collections_by_database[database_to_monitor]: + namespaces_to_monitor = monitor_namespace(database_to_monitor, collection, namespaces_to_monitor) + + for namespace in namespaces_to_monitor: + tokens = namespace.split(".") + database_name = tokens[0] + collection_name = tokens[1] + database = DATABASE_CLIENT[database_name] + collection_statistics = database.command("collStats", collection_name) + if parameters["log_collection_size"] is True: + log_collection_size_metric(parameters["cluster_name"], database_name, collection_name, collection_statistics["storageSize"]) + + if parameters["log_collection_index_count"] is True: + log_index_count_metric(parameters["cluster_name"], database_name, collection_name, collection_statistics["nindexes"]) + + if 
parameters["log_collection_index_size"] is True: + for index_name in collection_statistics["indexSizes"]: + log_index_size_metric(parameters["cluster_name"], database_name, collection_name, index_name, collection_statistics["indexSizes"][index_name]) + +def main(): + """custom_metrics script entry point.""" + parser = argparse.ArgumentParser() + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--cluster_name', + required=True, + type=str, + help='Name of cluster for Amazon CloudWatch custom metric') + + parser.add_argument('--uri', + required=True, + type=str, + help='Amazon DocumentDB Connection URI') + + parser.add_argument('--namespaces', + required=True, + type=str, + help="comma separated list of namespaces to monitor") + + parser.add_argument('--collection_count', + action='store_true', + help="log cluster collection count") + + parser.add_argument('--database_count', + action='store_true', + help="log cluster database count") + + parser.add_argument('--user_count', + action='store_true', + help="log cluster user count") + + parser.add_argument('--collection_size', + action='store_true', + help="log collection size") + + parser.add_argument('--index_count', + action='store_true', + help="log collection index count") + + parser.add_argument('--index_size', + action='store_true', + help="log collection index size") + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + if (args.collection_count is False and + args.database_count is False and + args.user_count is False and + args.collection_size is False and + args.index_count is False and + args.index_size is False): + print('Specify at least 1 metric to monitor.') + return + + app_config = {} + app_config['cluster_name'] = args.cluster_name + 
app_config['uri'] = args.uri + app_config['namespaces'] = args.namespaces.split(",") + app_config['log_cluster_collection_count'] = args.collection_count + app_config['log_cluster_database_count'] = args.database_count + app_config['log_cluster_user_count'] = args.user_count + app_config['log_collection_size'] = args.collection_size + app_config['log_collection_index_count'] = args.index_count + app_config['log_collection_index_size'] = args.index_size + + log_custom_metrics(app_config) + +if __name__ == "__main__": + main() diff --git a/monitoring/docdb-dashboarder/.gitignore b/monitoring/docdb-dashboarder/.gitignore new file mode 100644 index 0000000..c18dd8d --- /dev/null +++ b/monitoring/docdb-dashboarder/.gitignore @@ -0,0 +1 @@ +__pycache__/ diff --git a/monitoring/docdb-dashboarder/README.md b/monitoring/docdb-dashboarder/README.md index 951a305..a85ead2 100644 --- a/monitoring/docdb-dashboarder/README.md +++ b/monitoring/docdb-dashboarder/README.md @@ -1,15 +1,11 @@ # DocumentDB Dashboarder Tool -DocumentDB Dashboarder creates a CloudWatch monitoring dashboard for a DocumentDB cluster. Monitor your workload -and easily identify problems when dealing with slow performance and high cost consumption. +DocumentDB Dashboarder creates a CloudWatch monitoring dashboard for a DocumentDB cluster. 
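The dashboarder places each widget on CloudWatch's 24-unit-wide dashboard grid, row by row. As a rough sketch of that layout arithmetic (a simplified, hypothetical `layout_rows` helper with illustrative sample rows, not the tool's actual code):

```python
# Sketch of row-based CloudWatch dashboard layout (assumption: panels in a
# row split the 24-unit grid width evenly; rows stack by their height).
GRID_WIDTH = 24

def layout_rows(rows):
    """rows: list of {'height': int, 'panels': [widget_dict, ...]}.
    Returns a flat widget list with x/y/width/height filled in."""
    placed = []
    y = 0
    for row in rows:
        width = GRID_WIDTH // len(row['panels'])  # equal share of the row
        x = 0
        for panel in row['panels']:
            placed.append({**panel, 'x': x, 'y': y,
                           'width': width, 'height': row['height']})
            x += width  # next panel starts where this one ends
        y += row['height']  # next row starts below this one
    return placed

widgets = layout_rows([
    {'height': 6, 'panels': [{'type': 'metric'}, {'type': 'metric'}]},
    {'height': 4, 'panels': [{'type': 'text'}]},
])
# Two 12-wide widgets side by side, then one full-width widget below them
```

The resulting list can then be serialized as the `widgets` array of a dashboard body for the CloudWatch `PutDashboard` API.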
------------------------------------------------------------------------------------------------------------------------ ## Package Requirements -**boto3** - AWS SDK that allows management of aws resources through python - -**awscli** - Command line tools that allow access public APIs to manage AWS services - -**argparse** - Python library that allows for the use of command line arguments +* boto3 - AWS SDK that allows management of AWS resources through Python +* awscli - Command line tools that allow access to public APIs to manage AWS services ------------------------------------------------------------------------------------------------------------------------ ## Installing Packages @@ -18,7 +14,6 @@ and easily identify problems when dealing with slow performance and high cost c ``` pip install boto3 pip install awscli -pip install argparse ``` ------------------------------------------------------------------------------------------------------------------------ ## IAM User Creation and Setup @@ -26,22 +21,12 @@ pip install argparse **Note: If you already have an existing IAM user for DocDB, associate the roles in step 4 and can move on to the next section "Configure your AWS Credentials"** - 1. Open IAM Service in your AWS Management Console - - 2. Select the "Users" tab using the toolbar on the left side of your screen - - 3. Create a new user and under "Select AWS Access Type" choose "Access Key - Programmatic Access" and click next. Be sure to save this access key for later on. - - 4. Associate the following permissions for your IAM User - CloudWatchFullAccess, AmazonDocDBReadOnlyAccess - - 5. 
Complete the user creation and save the csv file with your access key and secret access key in a safe place - _Congratulations you have successfully set up your IAM User to interact with CloudWatch and DocumentDB!_ ------------------------------------------------------------------------------------------------------------------------ @@ -88,4 +73,11 @@ cd ``` ``` python create-docdb-dashboard.py --name <dashboard-name> --region <region> --clusterID <clusterID> + +``` +### Include at most one of the monitoring options below. If migrating with the migrator script, include --monitor-migration. If migrating with DMS, include --monitor-dms followed by --dms-task-id +``` +Optionally include --monitor-migration to add full load / CDC metrics from the migrator tool to the dashboard + +Optionally include --monitor-dms --dms-task-id (DMS task identifier) to monitor a DMS task during migration ``` diff --git a/monitoring/docdb-dashboarder/create-docdb-dashboard.py b/monitoring/docdb-dashboarder/create-docdb-dashboard.py index e6fa707..74608e2 100644 --- a/monitoring/docdb-dashboarder/create-docdb-dashboard.py +++ b/monitoring/docdb-dashboarder/create-docdb-dashboard.py @@ -4,82 +4,243 @@ import widgets as w -# Checking to see if widget metric requires are cluster level or instance level. 
-# If the metric is instance level, associate all instances for instance level metrics -def add_metric(widJson, widgets, region, instance, cluster): - for widget in widgets: - widget["properties"]['region'] = region - if 'metrics' in widget["properties"]: - if 'DBInstanceIdentifier' in instance: - for i, DBInstanceIdentifier in enumerate(instance): - widget["properties"]["metrics"][i].append(DBInstanceIdentifier['DBInstanceIdentifier']) - else: - widget["properties"]["metrics"][0].append(cluster) +def create_dashboard(widgets, region, instanceList, clusterList, monitoring_type=None): + tempWidgets = [] + widthX = 24 - widJson["widgets"].append(widget) + dashboardY = 0 + for thisRow in widgets: + dashboardX = 0 + incrementX = widthX // len(thisRow['panels']) + + for widget in thisRow['panels']: + widget["properties"]['region'] = region + widget["height"] = thisRow["height"] + widget["width"] = incrementX + widget["x"] = dashboardX + widget["y"] = dashboardY + + if 'metrics' in widget["properties"]: + if monitoring_type == 'dms' and 'AWS/DMS' in widget["properties"]["metrics"][0]: + # DMS metrics already have their task IDs set + pass + elif 'DBInstanceIdentifier' in widget["properties"]["metrics"][0]: + for i, DBInstanceIdentifier in enumerate(instanceList): + if DBInstanceIdentifier['IsClusterWriter']: + instanceType = '|PRIMARY' + else: + instanceType = '|REPLICA' + + if (i == 0): + widget["properties"]["metrics"][i].append(DBInstanceIdentifier['DBInstanceIdentifier']) + widget["properties"]["metrics"][i].append({"label":DBInstanceIdentifier['DBInstanceIdentifier']+instanceType}) + else: + widget["properties"]["metrics"].append([".",".",".",DBInstanceIdentifier['DBInstanceIdentifier'],{"label":DBInstanceIdentifier['DBInstanceIdentifier']+instanceType}]) + + else: + # Check if this is a CustomDocDB metric with Cluster dimension + is_custom_metric = ('CustomDocDB' in widget["properties"]["metrics"][0][0]) + + for i, DBClusterIdentifier in enumerate(clusterList): + 
if (i == 0): + widget["properties"]["metrics"][i].append(DBClusterIdentifier) + widget["properties"]["metrics"][i].append({"label":DBClusterIdentifier}) + else: + if is_custom_metric: + widget["properties"]["metrics"].append([".",".",".",".",DBClusterIdentifier,{"label":DBClusterIdentifier}]) + else: + widget["properties"]["metrics"].append([".",".",".",DBClusterIdentifier,{"label":DBClusterIdentifier}]) + + tempWidgets.append(widget) + dashboardX += incrementX + + dashboardY += thisRow["height"] + + return tempWidgets # Main method def main(): # Command line arguments for user to pass parser = argparse.ArgumentParser() - parser.add_argument('--name', type=str, required=True) - parser.add_argument('--region', type=str, required=True) - parser.add_argument('--clusterID', type=str, required=True) + parser.add_argument('--name', type=str, required=True, help="Name of CloudWatch dashboard to create") + parser.add_argument('--region', type=str, required=True, help="Region of Amazon DocumentDB cluster(s)") + parser.add_argument('--clusterID', type=str, required=True, help="Single Amazon DocumentDB cluster ID or comma separated list of cluster IDs") + parser.add_argument('--monitor-migration',required=False,action='store_true',help='Include MongoDB to DocumentDB migration metrics') + parser.add_argument('--endpoint-url',type=str,required=False,help='Override default endpoint URL') + parser.add_argument('--monitor-dms', required=False, action='store_true', help='Include AWS DMS task metrics') + parser.add_argument('--dms-task-id', type=str, required=False, help='DMS Replication Task ID') args = parser.parse_args() + if args.monitor_migration and args.monitor_dms: + print("\u2717 Error: Only one monitoring option can be selected. 
Use either --monitor-migration OR --monitor-dms, not both.") + return + + # Validate DMS task ID is provided when monitoring DMS + if args.monitor_dms and not args.dms_task_id: + print("\u2717 Error: --dms-task-id is required when using --monitor-dms") + return # DocumentDB Configurations - docdbclient = boto3.client('docdb', region_name=args.region) + if args.endpoint_url is not None: + docdbclient = boto3.client('docdb', region_name=args.region, endpoint_url=args.endpoint_url) + else: + docdbclient = boto3.client('docdb', region_name=args.region) - response = docdbclient.describe_db_clusters(DBClusterIdentifier=args.clusterID, - Filters=[ - {'Name': 'engine', - 'Values': ['docdb'] - }, - ], - ) + burstableInstances = ["db.t3.medium", "db.t4g.medium"] + nvmeInstances = ['db.r6gd.xlarge','db.r6gd.2xlarge','db.r6gd.4xlarge','db.r6gd.8xlarge','db.r6gd.12xlarge','db.r6gd.16xlarge'] + serverlessInstances = ['db.serverless'] - instanceID = response["DBClusters"][0]["DBClusterMembers"] + clusterList = args.clusterID.split(',') + instanceList = [] + foundBurstable = False + foundNvme = False + foundServerless = False + for thisCluster in clusterList: + response = docdbclient.describe_db_clusters(DBClusterIdentifier=thisCluster,Filters=[{'Name': 'engine','Values': ['docdb']}]) + for thisInstance in response["DBClusters"][0]["DBClusterMembers"]: + instanceList.append(thisInstance) + dbInstanceResponse = docdbclient.describe_db_instances(DBInstanceIdentifier=thisInstance["DBInstanceIdentifier"]) + if dbInstanceResponse["DBInstances"][0]["DBInstanceClass"] in burstableInstances: + foundBurstable = True + if dbInstanceResponse["DBInstances"][0]["DBInstanceClass"] in nvmeInstances: + foundNvme = True + if dbInstanceResponse["DBInstances"][0]["DBInstanceClass"] in serverlessInstances: + foundServerless = True # CloudWatch client client = boto3.client('cloudwatch', region_name=args.region) # All widgets to be displayed on the dashboard widgets = [ - w.ClusterHeading, - 
w.DBClusterReplicaLagMaximum, - w.DatabaseCursorsTimedOut, - w.VolumeWriteIOPS, - w.VolumeReadIOPS, - w.Opscounter, - w.InstanceHeading, - w.CPUUtilization, - w.IndexBufferCacheHitRatio, - w.BufferCacheHitRatio, - w.DatabaseCursors, - w.DatabaseConnections, - w.FreeableMemory, - w.DocsInserted, - w.DocsDeleted, - w.DocsUpdated, - w.DocsReturned, - w.BackupStorageHeading, - w.BackupRetentionPeriodStorageUsed, - w.TotalBackupStorageBilled, - w.VolumeBytesUsed, - w.metricHelp, - w.bestPractices + {"height":1,"panels":[w.ClusterHeading]}, + {"height":7,"panels":[w.DBClusterReplicaLagMaximum,w.DatabaseCursorsTimedOut,w.VolumeWriteIOPS,w.VolumeReadIOPS]}, + {"height":7,"panels":[w.OpscountersInsert,w.OpscountersUpdate,w.OpscountersDelete,w.OpscountersQuery]}, + {"height":1,"panels":[w.InstanceHeading]}, + {"height":7,"panels":[w.CPUUtilization,w.DatabaseConnections,w.DatabaseCursors]}, + {"height":7,"panels":[w.BufferCacheHitRatio,w.IndexBufferCacheHitRatio,w.FreeableMemory,w.FreeLocalStorage]}, + {"height":7,"panels":[w.NetworkTransmitThroughput,w.NetworkReceiveThroughput,w.StorageNetworkTransmitThroughput,w.StorageNetworkReceiveThroughput]}, + {"height":7,"panels":[w.DocsInserted,w.DocsDeleted,w.DocsUpdated,w.DocsReturned]}, + {"height":7,"panels":[w.ReadLatency,w.WriteLatency,w.DiskQueueDepth,w.DBInstanceReplicaLag]}, + {"height":7,"panels":[w.WriteIops,w.WriteThroughput,w.ReadIops,w.ReadThroughput]}, ] - # Deploy metrics - add_metric(w.widget_json, widgets, args.region, instanceID, args.clusterID) - # Converting python to json - dashBody = json.dumps(w.widget_json) + # NVMe Metrics + if foundNvme: + print("{}".format("\u2713 Adding NVMe-backed instance metrics")) + widgets.append({"height":1,"panels":[w.NVMeHeading]}) + widgets.append({"height":7,"panels":[w.FreeNVMeStorage,w.NVMeStorageCacheHitRatio]}) + widgets.append({"height":7,"panels":[w.ReadIopsNVMeStorage,w.ReadLatencyNVMeStorage,w.ReadThroughputNVMeStorage]}) + 
widgets.append({"height":7,"panels":[w.WriteIopsNVMeStorage,w.WriteLatencyNVMeStorage,w.WriteThroughputNVMeStorage]}) + # Serverless Metrics + if foundServerless: + print("{}".format("\u2713 Adding serverless instance metrics")) + widgets.append({"height":1,"panels":[w.ServerlessHeading]}) + widgets.append({"height":7,"panels":[w.ServerlessDatabaseCapacity,w.DCUUtilization,w.TempStorageIops,w.TempStorageThroughput]}) + + # Burstable Metrics + if foundBurstable: + print("{}".format("\u2713 Adding burstable instance metrics")) + widgets.append({"height":1,"panels":[w.BurstableHeading]}) + widgets.append({"height":7,"panels":[w.CPUCreditUsage,w.CPUCreditBalance,w.CPUSurplusCreditsCharged,w.CPUSurplusCreditBalance]}) + + # Determine monitoring type + monitoring_type = None + if args.monitor_migration: + monitoring_type = 'migration' + print("{}".format("\u2713 Adding MongoDB to DocumentDB Migration Monitoring metrics")) + widgets.append({"height":1,"panels":[w.MigrationMonitoringHeading]}) + widgets.append({"height":1,"panels":[w.FullLoadMigrationHeading]}) + widgets.append({"height":7,"panels":[w.MigratorFLInsertsPerSecond,w.MigratorFLRemainingSeconds]}) + widgets.append({"height":1,"panels":[w.CDCReplicationHeading]}) + widgets.append({"height":7,"panels":[w.MigratorCDCNumSecondsBehind,w.MigratorCDCOperationsPerSecond]}) + + elif args.monitor_dms: + monitoring_type = 'dms' + print("{}".format("\u2713 Adding AWS DMS Task metrics")) + # Get the task ID + task_id = args.dms_task_id + # Retrieve DMS task information and update widgets with task and instance identifiers + update_dms_widgets(task_id, args.region, w) + # Add DMS widgets to dashboard + widgets.append({"height":1,"panels":[w.DMSHeading]}) + widgets.append({"height":7,"panels":[w.DMSFullLoadThroughputRowsTarget,w.DMSCDCLatencyTarget, w.DMSCDCThroughputRowsTarget]}) + + # Backups + widgets.append({"height":1,"panels":[w.BackupStorageHeading]}) + 
widgets.append({"height":7,"panels":[w.VolumeBytesUsed,w.BackupRetentionPeriodStorageUsed,w.TotalBackupStorageBilled]}) + + # Create the CW data + dashboardWidgets = create_dashboard(widgets, args.region, instanceList, clusterList, monitoring_type=monitoring_type) + # Converting to json + dashBody = json.dumps({"widgets":dashboardWidgets}) # Create dashboard client.put_dashboard(DashboardName=args.name, DashboardBody=dashBody) - print("Your dashboard has been deployed! Proceed to CloudWatch to view your dashboard") + print("\u2713 Dashboard {} deployed to CloudWatch".format(args.name)) + + +def update_dms_widgets(task_id, region, w): + """ + Retrieve DMS task information and update widgets with task and instance identifiers. + + Args: + task_id: The DMS task ID + region: AWS region + w: Widget definitions module + """ + try: + dms_client = boto3.client('dms', region_name=region) + response = dms_client.describe_replication_tasks( + Filters=[ + { + 'Name': 'replication-task-id', + 'Values': [task_id] + } + ] + ) + if response['ReplicationTasks']: + # Get the full ARN of the task + task_arn = response['ReplicationTasks'][0]['ReplicationTaskArn'] + # Extract the task ID from the ARN (last part) + task_id = task_arn.split(':')[-1] + + # Get the replication instance ARN + replication_instance_arn = response['ReplicationTasks'][0]['ReplicationInstanceArn'] + + # Try to get all replication instances and find the one with matching ARN + try: + all_instances = dms_client.describe_replication_instances() + instance_name = None + + for instance in all_instances.get('ReplicationInstances', []): + if instance.get('ReplicationInstanceArn') == replication_instance_arn: + instance_name = instance.get('ReplicationInstanceIdentifier') + break + + if instance_name: + # Update the widgets with task ID and instance name + for widget in [w.DMSFullLoadThroughputRowsTarget, w.DMSCDCLatencyTarget, w.DMSCDCThroughputRowsTarget]: + for i, metric in enumerate(widget["properties"]["metrics"]): 
+ # Update task ID + task_id_index = metric.index("ReplicationTaskIdentifier") + 1 if "ReplicationTaskIdentifier" in metric else -1 + if task_id_index != -1: + widget["properties"]["metrics"][i][task_id_index] = task_id + + # Update instance name + instance_id_index = metric.index("ReplicationInstanceIdentifier") + 1 if "ReplicationInstanceIdentifier" in metric else -1 + if instance_id_index != -1: + widget["properties"]["metrics"][i][instance_id_index] = instance_name + else: + print("\u2717 Warning: Could not find replication instance name. Using instance ID from ARN.") + + except Exception as e: + print(f"\u2717 Error getting replication instances: {str(e)}. Using instance ID from ARN.") + else: + print("\u2717 Warning: Could not find DMS task with ID '{}'.".format(task_id)) + + except Exception as e: + print("\u2717 Error retrieving DMS task: {}".format(str(e))) if __name__ == "__main__": diff --git a/monitoring/docdb-dashboarder/widgets.py b/monitoring/docdb-dashboarder/widgets.py index 2ff6379..69f2f8b 100644 --- a/monitoring/docdb-dashboarder/widgets.py +++ b/monitoring/docdb-dashboarder/widgets.py @@ -1,377 +1,751 @@ -# Widget Dictionary -widget_json = { - "widgets": [] -} - # CLUSTER LEVEL METRICS ClusterHeading = { - "height": 1, - "width": 24, - "y": 0, - "x": 0, "type": "text", - "properties": { - "markdown": "# Cluster Level Metrics" - } + "properties": {"markdown": "# Cluster Level Metrics"} } + +# ---------------------------------------------- DBClusterReplicaLagMaximum = { - "height": 7, - "width": 12, - "y": 3, - "x": 0, "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "DBClusterReplicaLagMaximum", "DBClusterIdentifier"] - ], - + "metrics": [["AWS/DocDB", "DBClusterReplicaLagMaximum", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "DBClusterReplicaLagMaximum" } } DatabaseCursorsTimedOut = { - "height": 7, - "width": 12, - "y": 3, - "x": 12, "type": "metric", "properties": 
{ "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "DatabaseCursorsTimedOut", "DBClusterIdentifier"] - ], - "period": 300 + "metrics": [["AWS/DocDB", "DatabaseCursorsTimedOut", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "DatabaseCursorsTimedOut" } } VolumeWriteIOPS = { - "height": 7, - "width": 6, - "y": 12, - "x": 0, "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "VolumeWriteIOPs", "DBClusterIdentifier"] - ], + "metrics": [["AWS/DocDB", "VolumeWriteIOPs", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "VolumeWriteIOPS" } } VolumeReadIOPS = { - "height": 7, - "width": 6, - "y": 12, - "x": 6, "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "VolumeReadIOPs", "DBClusterIdentifier"] - ], + "metrics": [["AWS/DocDB", "VolumeReadIOPs", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "VolumeReadIOPS" } } -Opscounter = { - "height": 7, - "width": 12, - "y": 12, - "x": 12, + +# ---------------------------------------------- +OpscountersInsert = { "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "OpcountersInsert", "DBClusterIdentifier"], - [".", "OpcountersDelete", ".", "."], - [".", "OpcountersUpdate", ".", "."], - [".", "OpcountersQuery", ".", "."] - ], + "metrics": [["AWS/DocDB", "OpcountersInsert", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "OpcountersInsert" + } +} +OpscountersUpdate = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "OpcountersUpdate", "DBClusterIdentifier"]], "period": 300, - "title": "Opcounters" + "yAxis": {"left": {"min": 0}}, + "title": "OpcountersUpdate" } } +OpscountersDelete = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + 
"metrics": [["AWS/DocDB", "OpcountersDelete", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "OpcountersDelete" + } +} +OpscountersQuery = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "OpcountersQuery", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "OpcountersQuery" + } +} + # INSTANCE LEVEL METRICS InstanceHeading = { - "height": 1, - "width": 24, - "y": 20, - "x": 0, "type": "text", - "properties": { - "markdown": "# Instance Level Metrics" - } + "properties": {"markdown": "# Instance Level Metrics"} } + CPUUtilization = { - "height": 7, - "width": 8, - "y": 21, - "x": 0, "type": "metric", "properties": { - "metrics": [ - ["AWS/DocDB", "CPUUtilization", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], + "metrics": [["AWS/DocDB", "CPUUtilization", "DBInstanceIdentifier"]], "view": "timeSeries", "stacked": False, "title": "CPU Utilization", "period": 300, "stat": "Average", - "yAxis": { - "left": { - "max": 100, - "min": 0 - } - } + "yAxis": {"left": {"max": 100,"min": 0}} } } -IndexBufferCacheHitRatio = { - "height": 7, - "width": 8, - "y": 30, - "x": 8, +DatabaseConnections = { "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "IndexBufferCacheHitRatio", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], + "yAxis": {"left": {"min": 0}}, + "metrics": [["AWS/DocDB", "DatabaseConnections", "DBInstanceIdentifier"]], + } +} +DatabaseCursors = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "yAxis": {"left": {"min": 0}}, + "metrics": [["AWS/DocDB", "DatabaseCursors", "DBInstanceIdentifier"]], + } +} + +# ---------------------------------------------- +BufferCacheHitRatio = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "BufferCacheHitRatio", "DBInstanceIdentifier"]], 
"period": 300, - "yAxis": { - "left": { - "max": 100, - "min": 0 - } - }, - "title": "Index Buffer Cache Hit Ratio" + "yAxis": {"left": {"max": 100,"min": 0}}, + "title": "Buffer Cache Hit Ratio" } } -BufferCacheHitRatio = { - "height": 7, - "width": 8, - "y": 30, - "x": 0, +IndexBufferCacheHitRatio = { "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "BufferCacheHitRatio", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], + "metrics": [["AWS/DocDB", "IndexBufferCacheHitRatio", "DBInstanceIdentifier"]], "period": 300, - "yAxis": { - "left": { - "max": 100, - "min": 0 - } - }, - "title": "Buffer Cache Hit Ratio" + "yAxis": {"left": {"max": 100,"min": 0}}, + "title": "Index Buffer Cache Hit Ratio" } } -DatabaseCursors = { - "height": 7, - "width": 8, - "y": 21, - "x": 16, +FreeableMemory = { "type": "metric", "properties": { + "sparkline": True, "view": "timeSeries", + "metrics": [["AWS/DocDB", "FreeableMemory", "DBInstanceIdentifier"]], + "title": "Freeable Memory", + "period": 300, "stacked": False, - "metrics": [ - ["AWS/DocDB", "DatabaseCursors", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - } -} -DatabaseConnections = { - "height": 7, - "width": 8, - "y": 21, - "x": 8, + "yAxis": {"left": {"min": 0}} + } + } +FreeLocalStorage = { "type": "metric", "properties": { + "sparkline": True, "view": "timeSeries", + "metrics": [["AWS/DocDB", "FreeLocalStorage", "DBInstanceIdentifier"]], + "title": "Free Local Storage", + "period": 300, "stacked": False, - "metrics": [ - ["AWS/DocDB", "DatabaseConnections", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - } -} -FreeableMemory = { - "height": 7, - "width": 8, - "y": 30, - "x": 16, + "yAxis": {"left": {"min": 0}} + } + } + +# ---------------------------------------------- +NetworkTransmitThroughput = { "type": "metric", "properties": { - "sparkline": True, "view": "timeSeries", + "stacked": False, "metrics": [ - ["AWS/DocDB", "FreeableMemory", 
"DBInstanceIdentifier"], - ["..."], - ["..."] + ["AWS/DocDB", "NetworkTransmitThroughput", "DBInstanceIdentifier"] ], - "title": "Freeable Memory", "period": 300, - "stacked": False, - "yAxis": { - "left": { - "min": 0 - } - } + "yAxis": {"left": {"min": 0}}, + "title": "Network Transmit Throughput" } } -DocsInserted = { - "height": 6, - "width": 6, - "y": 38, - "x": 0, - "type": "metric", - "properties": { - "view": "timeSeries", - "stacked": False, - "metrics": [ - ["AWS/DocDB", "DocumentsInserted", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - "title": "Documents Inserted" +NetworkReceiveThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "NetworkReceiveThroughput", "DBInstanceIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "Network Receive Throughput" } } -DocsDeleted = { - "height": 6, - "width": 6, - "y": 38, - "x": 6, - "type": "metric", - "properties": { - "view": "timeSeries", - "stacked": False, - "metrics": [ - ["AWS/DocDB", "DocumentsDeleted", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - "title": "Documents Deleted" +StorageNetworkTransmitThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "StorageNetworkTransmitThroughput", "DBInstanceIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, + "title": "Storage Network Transmit Throughput" } } -DocsUpdated = { - "height": 6, - "width": 6, - "y": 38, - "x": 12, - "type": "metric", - "properties": { - "view": "timeSeries", - "stacked": False, - "metrics": [ - ["AWS/DocDB", "DocumentsUpdated", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - "title": "Documents Updated" +StorageNetworkReceiveThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "StorageNetworkReceiveThroughput", "DBInstanceIdentifier"]], + "period": 300, + "yAxis": 
{"left": {"min": 0}}, + "title": "Storage Network Receive Throughput" } } + +# ---------------------------------------------- +DocsInserted = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DocumentsInserted", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Documents Inserted" + } +} +DocsDeleted = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DocumentsDeleted", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Documents Deleted" + } +} +DocsUpdated = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DocumentsUpdated", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Documents Updated" + } +} DocsReturned = { - "height": 6, - "width": 6, - "y": 38, - "x": 18, - "type": "metric", - "properties": { - "view": "timeSeries", - "stacked": False, - "metrics": [ - ["AWS/DocDB", "DocumentsReturned", "DBInstanceIdentifier"], - ["..."], - ["..."] - ], - "title": "Documents Returned" - } + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DocumentsReturned", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Documents Returned" + } +} + +# ---------------------------------------------- +ReadLatency = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadLatency", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read Latency" + } +} +WriteLatency = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteLatency", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write Latency" + } +} +DiskQueueDepth = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": 
False, + "metrics": [["AWS/DocDB", "DiskQueueDepth", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Disk Queue Depth" + } +} +DBInstanceReplicaLag = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DBInstanceReplicaLag", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Replica Lag" + } +} + +# ---------------------------------------------- +WriteIops = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteIOPS", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write IOPs" + } +} +WriteThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteThroughput", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write Throughput" + } +} +ReadIops = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadIOPS", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read IOPs" + } +} +ReadThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadThroughput", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read Throughput" + } } # BACKUP AND STORAGE METRICS BackupStorageHeading = { - "height": 1, - "width": 24, - "y": 44, - "x": 0, "type": "text", + "properties": {"markdown": "# Backup and Storage Metrics"} +} + +# ---------------------------------------------- +VolumeBytesUsed = { + "type": "metric", "properties": { - "markdown": "# Backup and Storage Metrics" + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "VolumeBytesUsed", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "VolumeBytesUsed" } } BackupRetentionPeriodStorageUsed = { - "height": 7, - "width": 8, - "y": 45, - "x": 
8, "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "BackupRetentionPeriodStorageUsed", "DBClusterIdentifier"] - ], + "metrics": [["AWS/DocDB", "BackupRetentionPeriodStorageUsed", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "BackupRetentionPeriodStorageUsed" } } TotalBackupStorageBilled = { - "height": 7, - "width": 8, - "y": 45, - "x": 16, "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "TotalBackupStorageBilled", "DBClusterIdentifier"] - ], + "metrics": [["AWS/DocDB", "TotalBackupStorageBilled", "DBClusterIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "TotalBackupStorageBilled" } } -VolumeBytesUsed = { - "height": 7, - "width": 8, - "y": 45, - "x": 0, + +# NVME 1 ---------------------------------------------- +NVMeHeading = { + "type": "text", + "properties": {"markdown": "# NVMe-Backed Instances"} +} +# NVME 2 ---------------------------------------------- +FreeNVMeStorage = { "type": "metric", "properties": { "view": "timeSeries", "stacked": False, - "metrics": [ - ["AWS/DocDB", "VolumeBytesUsed", "DBClusterIdentifier"] - ], + "metrics": [["AWS/DocDB", "FreeNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Free NVMe Storage" + } +} +NVMeStorageCacheHitRatio = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "NVMeStorageCacheHitRatio", "DBInstanceIdentifier"]], + "title": "NVMe Storage Cache Hit Ratio", + "yAxis": {"left": {"max": 100,"min": 0}} + } +} +# NVME 3 ---------------------------------------------- +ReadIopsNVMeStorage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadIOPSNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read IOPs NVMe Storage" + } +} +ReadLatencyNVMeStorage = { + "type": "metric", + 
"properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadLatencyNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read Latency NVMe Storage" + } +} +ReadThroughputNVMeStorage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ReadThroughputNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Read Throughput NVMe Storage" + } +} +# NVME 4 ---------------------------------------------- +WriteIopsNVMeStorage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteIOPSNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write IOPs NVMe Storage" + } +} +WriteLatencyNVMeStorage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteLatencyNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write Latency NVMe Storage" + } +} +WriteThroughputNVMeStorage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "WriteThroughputNVMeStorage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Write Throughput NVMe Storage" } } -# ADDITIONAL HELP METRICS -metricHelp = { - "height": 2, - "width": 12, - "y": 0, - "x": 0, - "type": "text", - "properties": { - "markdown": "### Metrics Overview\nLearn more about metric information by visiting the Amazon DocumentDB Metrics section [here](https://docs.aws.amazon.com/documentdb/latest/developerguide/cloud_watch.html#cloud_watch-metrics_list)\n" - } +# ---------------------------------------------- +# MongoDB to DocumentDB Migration Monitoring Widgets +# ---------------------------------------------- + +# Migration Monitoring Heading +MigrationMonitoringHeading = { + "type": "text", + "properties": 
{"markdown": "# MongoDB to DocumentDB Migration Monitoring"} +} + +# Full Load Migration Metrics +# ---------------------------------------------- +FullLoadMigrationHeading = { + "type": "text", + "properties": {"markdown": "## Full Load Migration Metrics"} +} + +MigratorFLInsertsPerSecond = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["CustomDocDB", "MigratorFLInsertsPerSecond", "Cluster", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, # No max to allow auto-scaling + "title": "Migration Operations Per Second", + "annotations": { + "horizontal": [ + { + "label": "High Throughput", + "value": 1000, + "color": "#2ca02c" + }, + { + "label": "Medium Throughput", + "value": 500, + "color": "#ffbb78" + } + ] } -bestPractices = { - "height": 2, - "width": 12, - "y": 0, - "x": 12, - "type": "text", - "properties": { - "markdown": "### DocumentDB Specialist Optimization Tips\nLearn how to optimize your workload by visiting the DocDB Specialist recommended guidelines [here](https://docs.aws.amazon.com/documentdb/latest/developerguide/best_practices.html)" - } + } +} + + +MigratorFLRemainingSeconds = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["CustomDocDB", "MigratorFLRemainingSeconds", "Cluster", "DBClusterIdentifier"]], + "period": 300, + "title": "Remaining Time (seconds)", + "stat": "Average", + "yAxis": {"left": {"min": 0}} + } +} + + + +# CDC Replication Metrics +# ---------------------------------------------- +CDCReplicationHeading = { + "type": "text", + "properties": {"markdown": "## CDC Replication Metrics"} +} + +MigratorCDCOperationsPerSecond = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["CustomDocDB", "MigratorCDCOperationsPerSecond", "Cluster", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, # Removed max to allow auto-scaling + "title": "CDC 
Operations Per Second", + "annotations": { + "horizontal": [ + { + "label": "High Throughput", + "value": 500, + "color": "#2ca02c" + }, + { + "label": "Medium Throughput", + "value": 100, + "color": "#ffbb78" + } + ] } + } +} + +MigratorCDCNumSecondsBehind = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["CustomDocDB", "MigratorCDCNumSecondsBehind", "Cluster", "DBClusterIdentifier"]], + "period": 300, + "yAxis": {"left": {"min": 0}}, # No max to allow auto-scaling + "title": "CDC Replication Lag (seconds)", + "annotations": { + "horizontal": [ + { + "label": "Critical Lag", + "value": 3600, + "color": "#ff9896" + }, + { + "label": "High Lag", + "value": 900, + "color": "#ffbb78" + }, + { + "label": "Moderate Lag", + "value": 300, + "color": "#98df8a" + }, + { + "label": "Low Lag", + "value": 60, + "color": "#2ca02c" + } + ] + } + } +} + + + + +# ---------------------------------------------- +# DMS Task Metrics +# ---------------------------------------------- + +DMSHeading = { + "type": "text", + "properties": {"markdown": "# AWS DMS Task Metrics"} +} + +DMSFullLoadThroughputRowsTarget = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DMS", "FullLoadThroughputRowsTarget", "ReplicationInstanceIdentifier", "instance_id", "ReplicationTaskIdentifier", "TASK_ID"]], + "period": 60, + "yAxis": {"left": {"min": 0}}, + "title": "Full Load Throughput Rows Target" + } +} + +DMSCDCLatencyTarget = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DMS", "CDCLatencyTarget","ReplicationInstanceIdentifier", "instance_id", "ReplicationTaskIdentifier", "TASK_ID"]], + "period": 60, + "yAxis": {"left": {"min": 0}}, + "title": "CDC Latency Target (seconds)" + } +} + +DMSCDCThroughputRowsTarget = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DMS", 
"CDCThroughputRowsTarget", "ReplicationInstanceIdentifier", "instance_id","ReplicationTaskIdentifier", "TASK_ID"]], + "period": 60, + "yAxis": {"left": {"min": 0}}, + "title": "CDC Throughput Rows Target" + } +} + + +# Serverless 1 ---------------------------------------------- +ServerlessHeading = { + "type": "text", + "properties": {"markdown": "# Serverless Metrics"} +} +# Serverless 2 ---------------------------------------------- +ServerlessDatabaseCapacity = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "ServerlessDatabaseCapacity", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Serverless Database Capacity" + } +} +DCUUtilization = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "DCUUtilization", "DBInstanceIdentifier"]], + "title": "DCU Utilization", + "yAxis": {"left": {"max": 100,"min": 0}} + } +} +# Serverless 3 ---------------------------------------------- +TempStorageIops = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "TempStorageIops", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "Temp Storage Iops" + } +} +TempStorageThroughput = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "TempStorageThroughput", "DBInstanceIdentifier"]], + "title": "Temp Storage Throughput", + "yAxis": {"left": {"min": 0}} + } +} + + +# Burstable 1 ---------------------------------------------- +BurstableHeading = { + "type": "text", + "properties": {"markdown": "# Burstable Instances"} +} +# Burstable 2 ---------------------------------------------- +CPUCreditUsage = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "CPUCreditUsage", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "CPU 
Credits Used" + } +} +CPUCreditBalance = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "CPUCreditBalance", "DBInstanceIdentifier"]], + "title": "CPU Credit Balance" + } +} +CPUSurplusCreditsCharged = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "CPUSurplusCreditsCharged", "DBInstanceIdentifier"]], + "yAxis": {"left": {"min": 0}}, + "title": "CPU Surplus Credits Charged" + } +} +CPUSurplusCreditBalance = { + "type": "metric", + "properties": { + "view": "timeSeries", + "stacked": False, + "metrics": [["AWS/DocDB", "CPUSurplusCreditBalance", "DBInstanceIdentifier"]], + "title": "CPU Surplus Credit Balance" + } +} diff --git a/monitoring/docdb-stat/README.md b/monitoring/docdb-stat/README.md new file mode 100644 index 0000000..05a1d76 --- /dev/null +++ b/monitoring/docdb-stat/README.md @@ -0,0 +1,52 @@ +# Real-time Amazon DocumentDB server stats monitoring tool. + +The **docdbstat** tool connects to a DocumentDB instance and continuously fetches real-time metrics by polling `db.serverStatus()` at a configurable interval (defaults to 1 sec). + + +## Requirements + +- Python 3.x with modules: + - Pymongo + - Pandas +``` +pip3 install pymongo pandas +``` + +- Download the Amazon DocumentDB Certificate Authority (CA) certificate required to authenticate to your instance +``` +wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem +``` + +## Usage +The tool accepts the following arguments: + +``` +# python3 docdbstat.py --help +usage: docdbstat.py [-h] --uri URI [-i INTERVAL] [-hi HEADER_INTERVAL] [-f FIELD] + +Real-time Amazon DocumentDB server stats monitoring tool. + +options: + -h, --help show this help message and exit + --uri URI DocumentDB connection URI. + -i INTERVAL, --interval INTERVAL + Polling interval in seconds (Default: 1s).
+ -hi HEADER_INTERVAL, --header-interval HEADER_INTERVAL + Interval to display the header in iterations (Default: 10). + -f FIELD, --field FIELD + Comma-separated fields to display in the output. +``` + +## Example + +Get stats every 5 seconds: + +``` +python3 docdbstat.py --uri "mongodb://<user>:<password>@<host>:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false" -i 5 +``` + +Get specific stats, for example to output just write operations: + +``` +python3 docdbstat.py --uri "mongodb://<user>:<password>@<host>:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false" -f inserts,updates,deletes +``` diff --git a/monitoring/docdb-stat/docdbstat.py b/monitoring/docdb-stat/docdbstat.py new file mode 100644 index 0000000..c045f5b --- /dev/null +++ b/monitoring/docdb-stat/docdbstat.py @@ -0,0 +1,147 @@ +# Amazon DocumentDB Stats Monitoring Tool (Version 1.0) + +import argparse +import time +from pymongo import MongoClient +import pandas as pd + + +def connect_to_docdb(uri): + """Connects to a DocumentDB instance. + Check for replicaSet in uri and replace with directConnection. + + Args: + uri: The DocumentDB connection URI. + + Returns: + The admin database handle (pymongo.database.Database). + """ + if "replicaSet=rs0" in uri: + uri = uri.replace("replicaSet=rs0", "directConnection=true") + + client = MongoClient(host=uri,appname='ddbstat') + db = client.admin + return db + + +def get_server_stats(db): + """Retrieve the serverStatus() for a DocumentDB instance. + Args: + db: A pymongo.MongoClient object. + + Returns: + A dictionary containing the server stats. + """ + return db.command('serverStatus') + + +def get_replica_status(db): + """Retrieve the replica status for a DocumentDB instance. + + Args: + db: A pymongo.MongoClient object. + + Returns: + The replica status, string.
+ """ + is_master_result = db.command('isMaster') + if 'setName' in is_master_result: + if is_master_result['ismaster']: + return 'Primary' + else: + return 'Secondary' + + +def display_server_stats(previous_stats, current_stats, header_interval_counter, db, fields, polling_interval): + """Displays db.serverStatus() for a DocumentDB instance. + + Args: + previous_stats: The previous stats, or None if there are no previous stats. + current_stats: The current stats. + header_interval_counter: A counter that is used to determine when to print the header of the table. + db: A pymongo.MongoClient object. + fields: A list of the fields to display in the table. + polling_interval: The polling interval in seconds. + + Returns: + None. The selected metrics are printed to stdout as a table row. + """ + if previous_stats is None: + return + + metrics = { + 'Host': current_stats['host'].split('.')[0], + 'Status': get_replica_status(db), + 'Connections': current_stats['connections']['current'], + 'Inserts': (current_stats['opcounters']['insert'] - previous_stats['opcounters']['insert']) / polling_interval, + 'Query': (current_stats['opcounters']['query'] - previous_stats['opcounters']['query']) / polling_interval, + 'Updates': (current_stats['opcounters']['update'] - previous_stats['opcounters']['update']) / polling_interval, + 'Deletes': (current_stats['opcounters']['delete'] - previous_stats['opcounters']['delete']) / polling_interval, + 'GetMore': (current_stats['opcounters']['getmore'] - previous_stats['opcounters']['getmore']) / polling_interval, + 'Command': (current_stats['opcounters']['command'] - previous_stats['opcounters']['command']) / polling_interval, + 'CursorsTotal': current_stats['metrics']['cursor']['open']['total'], + 'CursorsNoTimeout': current_stats['metrics']['cursor']['open']['noTimeout'], + 'Transactions': current_stats['transactions']['currentActive'], + 'Timestamp': current_stats['localTime']
+ } + + if 'nvme_writes' in fields: + metrics['nvme_writes'] = (current_stats['metrics']['nvme_cache']['nvme_writes'] - previous_stats['metrics']['nvme_cache']['nvme_writes']) / polling_interval + + if 'nvme_missed_writes' in fields: + metrics['nvme_missed_writes'] = (current_stats['metrics']['nvme_cache']['nvme_missed_writes'] - previous_stats['metrics']['nvme_cache']['nvme_missed_writes']) / polling_interval + + fields = [field.lower() for field in fields] + selected_metrics = {key: value for key, value in metrics.items() if key.lower() in fields} + + # Convert the selected metrics dictionary to a DataFrame for tabular display + df = pd.DataFrame(selected_metrics, index=[0]) + + # Convert the DataFrame to a string and remove the index from the output + table_str = df.to_string(header=True, index=False) + + # Print the DataFrame with proper alignment and show the header based on the header_interval_counter + if header_interval_counter == 0: + print(table_str) + else: + print(table_str[table_str.index('\n') + 1:]) + + +def main(uri, polling_interval, header_interval, fields): + db = connect_to_docdb(uri) + previous_server_stats = None + iteration_counter = 0 + header_interval_counter = 0 + + try: + while True: + server_stats = get_server_stats(db) + if previous_server_stats is not None: + display_server_stats(previous_server_stats, server_stats, header_interval_counter, db, fields, polling_interval) + header_interval_counter = (header_interval_counter + 1) % header_interval + + previous_server_stats = server_stats.copy() + time.sleep(polling_interval) + iteration_counter += 1 + except KeyboardInterrupt: + print("\nMonitoring stopped by the user.") + finally: + db.client.close() + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Real-time Amazon DocumentDB server stats monitoring tool.") + parser.add_argument("--uri", required=True, help="DocumentDB connection URI.") + parser.add_argument("-i", "--interval", type=int, default=1, 
help="Polling interval in seconds (Default: 1s).") + parser.add_argument("-hi", "--header-interval", type=int, default=10, help="Interval to display the header in iterations (Default: 10).") + parser.add_argument("-f", "--field", type=str, default='Host,Status,Connections,Inserts,Query,Updates,Deletes,GetMore,Command,CursorsTotal,CursorsNoTimeout,Transactions,Timestamp',help="Comma-separated fields to display in the output.") + parser.add_argument('--include-nvme',required=False,action='store_true',help='Include NVMe metrics') + args = parser.parse_args() + + fields = [field.strip() for field in args.field.split(',')] + + if args.include_nvme: + fields.append('nvme_writes') + fields.append('nvme_missed_writes') + + main(args.uri, args.interval, args.header_interval, fields) diff --git a/monitoring/documentdb-top/README.md b/monitoring/documentdb-top/README.md new file mode 100644 index 0000000..2baadb8 --- /dev/null +++ b/monitoring/documentdb-top/README.md @@ -0,0 +1,47 @@ +# Real-time Amazon DocumentDB collection-level monitoring tool. + +The **documentdb-top** tool connects to a DocumentDB instance and continuously fetches real-time collection-level metrics by polling `db.<collection>.stats()` at a configurable interval (defaults to 10 seconds).
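The per-interval numbers documentdb-top reports are plain differences between successive collStats samples, optionally divided by the polling interval when per-second output is requested. A rough sketch of that delta step (illustrative only — the helper name and sample values below are made up; the opCounter field names follow the DocumentDB collStats output):

```python
def op_deltas(prev, curr, interval_seconds=1, per_second=False):
    """Per-interval insert/update/delete deltas between two collStats samples."""
    divisor = interval_seconds if per_second else 1
    return {
        op: int((curr["opCounter"][op] - prev["opCounter"][op]) / divisor)
        for op in ("numDocsIns", "numDocsUpd", "numDocsDel")
    }

# Two successive (made-up) samples, taken 15 seconds apart:
prev = {"opCounter": {"numDocsIns": 100, "numDocsUpd": 40, "numDocsDel": 5}}
curr = {"opCounter": {"numDocsIns": 250, "numDocsUpd": 70, "numDocsDel": 5}}

print(op_deltas(prev, curr))  # raw deltas for the interval
print(op_deltas(prev, curr, interval_seconds=15, per_second=True))  # per-second rates
```

The tool applies the same arithmetic to the cache block-hit counters, and only prints a row for a collection when something changed (or, with --must-crud, only when an insert/update/delete occurred).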
+ + +## Requirements + +- Python 3.x with modules: + - Pymongo +``` +pip3 install pymongo +``` + +- Download the Amazon DocumentDB Certificate Authority (CA) certificate required to authenticate to your instance +``` +wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem +``` + +## Usage +The tool accepts the following arguments: + +``` +# python3 documentdb-top.py --help + +usage: documentdb-top.py [-h] --uri URI --database DATABASE [--update-frequency-seconds UPDATE_FREQUENCY_SECONDS] [--must-crud] --log-file-name LOG_FILE_NAME [--skip-python-version-check] [--show-per-second] + +DocumentDB Top + +optional arguments: + -h, --help show this help message and exit + --uri URI URI + --database DATABASE Database name + --update-frequency-seconds UPDATE_FREQUENCY_SECONDS Number of seconds before update + --must-crud Only display when insert/update/delete occurred + --log-file-name LOG_FILE_NAME Log file name + --skip-python-version-check Permit execution on Python 3.6 and prior + --show-per-second Show operations as "per second" +``` + +## Example + +Get collection stats every 15 seconds, only if insert/update/delete has occurred: + +``` +python3 documentdb-top.py --uri "mongodb://<user>:<password>@<host>:27017/?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false&directConnection=true" --database db1 --update-frequency-seconds 15 --log-file-name my-log-file.log --must-crud +``` + diff --git a/monitoring/documentdb-top/documentdb-top.py b/monitoring/documentdb-top/documentdb-top.py new file mode 100644 index 0000000..2a618ee --- /dev/null +++ b/monitoring/documentdb-top/documentdb-top.py @@ -0,0 +1,160 @@ +import datetime as dt +import sys +import json +import pymongo +import time +import os +import argparse + + +def initializeLogFile(appConfig): + with open(appConfig['logFileName'], "w") as logFile: + logFile.write("") + + +def logAndPrint(appConfig,string): + with open(appConfig['logFileName'], "a") as logFile: + logFile.write(string+"\n") + print(string) + + +def
reportCollectionInfo(appConfig): + collections = {} + + if appConfig['showPerSecond']: + opsDivisor = appConfig['updateFrequencySeconds'] + opsString = "/s" + else: + opsDivisor = 1 + opsString = "" + + mustCrud = appConfig['mustCrud'] + + logTimeStamp = dt.datetime.now(dt.timezone.utc).isoformat()[:-3] + 'Z' + logAndPrint(appConfig,"{} | {:>10s} {:>10s} {:>10s} {:>10s} {:>10s} {:>10s} {:>10s} {:>10s} {:>10s} {:>10s}".format(logTimeStamp,'collection','ins'+opsString,'upd'+opsString,'del'+opsString,'colBlkHit','colBlkRead','colRatio','idxBlkHit','idxBlkRead','idxRatio')) + + client = pymongo.MongoClient(host=appConfig['uri'],appname='ddbtop') + db = client[appConfig['databaseName']] + + while True: + collCursor = db.list_collections() + + for thisColl in collCursor: + thisCollName = thisColl['name'] + + collStats = db.command("collStats", thisCollName) + + if thisCollName not in collections: + # add it + collections[thisCollName] = {'opCounter': {'numDocsIns': 0, 'numDocsUpd': 0, 'numDocsDel': 0 }, + 'cacheStats': {'collBlksHit': 0, 'collBlksRead': 0, 'collHitRatio': 0.0, 'idxBlksHit': 0, 'idxBlksRead': 0, 'idxHitRatio': 0.0} + } + + # output the differences + diffOI = int((collStats['opCounter']['numDocsIns'] - collections[thisCollName]['opCounter']['numDocsIns'])/opsDivisor) + diffOU = int((collStats['opCounter']['numDocsUpd'] - collections[thisCollName]['opCounter']['numDocsUpd'])/opsDivisor) + diffOD = int((collStats['opCounter']['numDocsDel'] - collections[thisCollName]['opCounter']['numDocsDel'])/opsDivisor) + + diffCCH = int(collStats['cacheStats']['collBlksHit'] - collections[thisCollName]['cacheStats']['collBlksHit']) + diffCCR = int(collStats['cacheStats']['collBlksRead'] - collections[thisCollName]['cacheStats']['collBlksRead']) + diffCCHR = collStats['cacheStats']['collHitRatio'] - collections[thisCollName]['cacheStats']['collHitRatio'] + diffICH = int(collStats['cacheStats']['idxBlksHit'] - collections[thisCollName]['cacheStats']['idxBlksHit']) + 
diffICR = int(collStats['cacheStats']['idxBlksRead'] - collections[thisCollName]['cacheStats']['idxBlksRead']) + diffICHR = collStats['cacheStats']['idxHitRatio'] - collections[thisCollName]['cacheStats']['idxHitRatio'] + + displayLine = False + + if (mustCrud and (diffOI != 0 or diffOU != 0 or diffOD != 0)): + displayLine = True + + elif ((not mustCrud) and + (diffOI != 0 or diffOU != 0 or diffOD != 0 or + diffCCH != 0 or diffCCR != 0 or diffCCHR != 0 or + diffICH != 0 or diffICR != 0 or diffICHR != 0)): + displayLine = True + + if displayLine: + logTimeStamp = dt.datetime.now(dt.timezone.utc).isoformat()[:-3] + 'Z' + logAndPrint(appConfig,"{} | {:>10s} {:10,d} {:10,d} {:10,d} {:10,d} {:10,d} {:10.4f} {:10,d} {:10,d} {:10.4f}".format(logTimeStamp,thisCollName,diffOI,diffOU,diffOD,diffCCH,diffCCR,diffCCHR,diffICH,diffICR,diffICHR)) + + collections[thisCollName]['opCounter']['numDocsIns'] = collStats['opCounter']['numDocsIns'] + collections[thisCollName]['opCounter']['numDocsUpd'] = collStats['opCounter']['numDocsUpd'] + collections[thisCollName]['opCounter']['numDocsDel'] = collStats['opCounter']['numDocsDel'] + + collections[thisCollName]['cacheStats']['collBlksHit'] = collStats['cacheStats']['collBlksHit'] + collections[thisCollName]['cacheStats']['collBlksRead'] = collStats['cacheStats']['collBlksRead'] + collections[thisCollName]['cacheStats']['collHitRatio'] = collStats['cacheStats']['collHitRatio'] + collections[thisCollName]['cacheStats']['idxBlksHit'] = collStats['cacheStats']['idxBlksHit'] + collections[thisCollName]['cacheStats']['idxBlksRead'] = collStats['cacheStats']['idxBlksRead'] + collections[thisCollName]['cacheStats']['idxHitRatio'] = collStats['cacheStats']['idxHitRatio'] + + time.sleep(appConfig['updateFrequencySeconds']) + + client.close() + + +def main(): + parser = argparse.ArgumentParser(description='DocumentDB Top') + + parser.add_argument('--uri', + required=True, + type=str, + help='URI') + + parser.add_argument('--database', + 
required=True, + type=str, + help='Database name') + + parser.add_argument('--update-frequency-seconds', + required=False, + type=int, + default=10, + help='Number of seconds before update') + + parser.add_argument('--must-crud', + required=False, + action='store_true', + help='Only display when insert/update/delete occurred') + + parser.add_argument('--log-file-name', + required=True, + type=str, + help='Log file name') + + parser.add_argument('--skip-python-version-check', + required=False, + action='store_true', + help='Permit execution on Python 3.6 and prior') + + parser.add_argument('--show-per-second', + required=False, + action='store_true', + help='Show operations as "per second"') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['updateFrequencySeconds'] = int(args.update_frequency_seconds) + appConfig['databaseName'] = args.database + appConfig['logFileName'] = args.log_file_name + appConfig['showPerSecond'] = args.show_per_second + appConfig['mustCrud'] = args.must_crud + + logAndPrint(appConfig,'---------------------------------------------------------------------------------------') + for thisKey in appConfig: + logAndPrint(appConfig," config | {} | {}".format(thisKey,appConfig[thisKey])) + logAndPrint(appConfig,'---------------------------------------------------------------------------------------') + + reportCollectionInfo(appConfig) + + +if __name__ == "__main__": + main() + + diff --git a/monitoring/gc-watchdog/.gitignore b/monitoring/gc-watchdog/.gitignore new file mode 100644 index 0000000..67ec3ae --- /dev/null +++ b/monitoring/gc-watchdog/.gitignore @@ -0,0 +1,2 @@ +doit-gc.bash +clean.bash diff --git a/monitoring/gc-watchdog/README.md b/monitoring/gc-watchdog/README.md new file mode 100644 index 0000000..119edfe --- /dev/null +++ 
b/monitoring/gc-watchdog/README.md @@ -0,0 +1,22 @@ +# Amazon DocumentDB Garbage Collection Watchdog +This tool monitors a DocumentDB cluster for garbage collection activity. It writes the start and end of each garbage collection to a log file and can optionally publish three CloudWatch metrics for monitoring and alerting purposes. + +## Installation +Clone the repository + +## Requirements +* Python 3.7+ +* PyMongo, boto3 + * IAM permission "cloudwatch:PutMetricData" is required to create CloudWatch metrics + +## Using the garbage collection watchdog +``` +python3 gc-watchdog.py --uri <uri> --log-file-name <log-file-name> [--create-cloudwatch-metrics] [--cluster-name <cluster-name>] +``` + +* \<uri\> follows the [MongoDB Connection String URI Format](https://www.mongodb.com/docs/manual/reference/connection-string/) +* \<log-file-name\> is the name of the log file created by the tool +* include --create-cloudwatch-metrics to create metrics for the number of ongoing garbage collections, the maximum time of an ongoing garbage collection in seconds, and the total time of all ongoing garbage collections in seconds + * CloudWatch metrics are captured in namespace "CustomDocDB" as "GCCount", "GCTotalSeconds", and "GCMaxSeconds" +* include --cluster-name \<cluster-name\> if capturing CloudWatch metrics via --create-cloudwatch-metrics +* NOTE - The default frequency to check for garbage collection activity is every 5 seconds. Garbage collections requiring less than 5 seconds might not be recorded by this tool.
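The heart of the watchdog is poll-to-poll bookkeeping: a namespace seen garbage collecting for the first time is reported as started, and a tracked namespace that disappears between polls is reported as completed. A simplified sketch of that idea (hypothetical helper, not the tool's code — the real tool reads `$currentOp`, handles a GC finishing and restarting within one interval, and writes timestamped log lines):

```python
def track_gc(state, active_now, now):
    """state maps namespace -> GC start time; returns (started, completed_durations)."""
    started = [ns for ns in active_now if ns not in state]
    completed = {ns: now - t for ns, t in state.items() if ns not in active_now}
    for ns in started:
        state[ns] = now          # begin tracking a newly seen GC
    for ns in completed:
        del state[ns]            # stop tracking a finished GC
    return started, completed

state = {}
print(track_gc(state, {"db1.orders"}, now=100.0))  # db1.orders starts garbage collecting
print(track_gc(state, set(), now=112.5))           # it completes roughly 12.5 seconds later
```

Because the state only changes between polls, any GC shorter than one poll interval can be missed entirely, which is exactly the caveat noted in the README above.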
diff --git a/monitoring/gc-watchdog/gc-watchdog.py b/monitoring/gc-watchdog/gc-watchdog.py new file mode 100644 index 0000000..96b9393 --- /dev/null +++ b/monitoring/gc-watchdog/gc-watchdog.py @@ -0,0 +1,169 @@ +import datetime +import sys +import json +import pymongo +import time +import os +import argparse +import boto3 + + +def initializeLogFile(appConfig): + with open(appConfig['logFileName'], mode="w", buffering=1) as logFile: + logFile.write("") + + +def logAndPrint(appConfig,string): + with open(appConfig['logFileName'], mode="a", buffering=1) as logFile: + logFile.write(string+"\n") + print(string) + + +def watchGc(appConfig): + verboseOutput = appConfig['verbose'] + checkFrequencySeconds = appConfig['checkFrequencySeconds'] + createCloudwatchMetrics = appConfig['createCloudwatchMetrics'] + clusterName = appConfig['clusterName'] + client = pymongo.MongoClient(host=appConfig['uri'],appname='gcwatch') + watchStartTime = time.time() + + # number of seconds between posting metrics to cloudwatch + cloudwatchPutSeconds = 60 + lastCloudwatchPutTime = time.time() + activeGcCount = 0 + activeGcTotalSeconds = 0 + activeGcMaxSeconds = 0 + + if createCloudwatchMetrics: + # only instantiate client if needed + cloudWatchClient = boto3.client('cloudwatch') + + gcDict = {} + + while True: + logTimeStamp = datetime.datetime.now(datetime.timezone.utc).isoformat()[:-3] + 'Z' + + # mark all as not seen + for thisNs in gcDict.keys(): + gcDict[thisNs]['active'] = False + + foundGcActivity = False + elapsedSeconds = (time.time() - watchStartTime) + with client.admin.aggregate([{"$currentOp": {"allUsers": True, "idleConnections": True}},{"$match": {"desc": "GARBAGE_COLLECTION"}}]) as cursor: + #if verboseOutput: + # logAndPrint(appConfig,"{} | executionTime (seconds) | {:.2f}".format(logTimeStamp,elapsedSeconds)) + + thisActiveGcCount = 0 + thisActiveGcTotalSeconds = 0 + thisActiveGcMaxSeconds = 0 + + for operation in cursor: + if 'garbageCollection' in operation: + 
foundGcActivity = True + thisNs = "{}.{}".format(operation['garbageCollection'].get('databaseName','UNKNOWN-DATABASE'),operation['garbageCollection'].get('collectionName','UNKNOWN-COLLECTION')) + + # get current values + thisActiveGcCount += 1 + thisActiveGcTotalSeconds += int(operation.get('secs_running',0)) + if int(operation.get('secs_running',0)) > thisActiveGcMaxSeconds: + thisActiveGcMaxSeconds = int(operation.get('secs_running',0)) + + # store if larger than existing + activeGcCount = max(activeGcCount,thisActiveGcCount) + activeGcTotalSeconds = max(activeGcTotalSeconds,thisActiveGcTotalSeconds) + activeGcMaxSeconds = max(activeGcMaxSeconds,thisActiveGcMaxSeconds) + + if thisNs in gcDict: + # already tracking as garbage collecting - check if it finished and started again + if operation.get('secs_running',-1) < gcDict[thisNs]['secsRunning']: + # finished and started again, output result + logAndPrint(appConfig,"{} | GC COMPLETED | {} | {:.2f} | seconds".format(logTimeStamp, thisNs, time.time() - gcDict[thisNs]['startTime'])) + + # reset values + gcDict[thisNs]['active'] = True + gcDict[thisNs]['startTime'] = time.time() + gcDict[thisNs]['secsRunning'] = operation.get('secs_running',999999) + + else: + gcDict[thisNs]['active'] = True + + else: + # first time seen as garbage collecting, add to tracking dictionary and mark as active + gcDict[thisNs] = {} + gcDict[thisNs]['active'] = True + gcDict[thisNs]['startTime'] = time.time() + gcDict[thisNs]['secsRunning'] = operation.get('secs_running',999999) + logAndPrint(appConfig,"{} | GC STARTED | {}".format(logTimeStamp, thisNs)) + + if verboseOutput: + logAndPrint(appConfig,"{} | executionTime (seconds) | {:.2f} | {}".format(logTimeStamp,elapsedSeconds,operation)) + + # output CW metrics every cloudwatchPutSeconds seconds + if createCloudwatchMetrics and ((time.time() - lastCloudwatchPutTime) > cloudwatchPutSeconds): + #logAndPrint(appConfig,"{} | CloudWatch count / maxSecs / totSecs =
{} / {} / {}".format(logTimeStamp, activeGcCount, activeGcMaxSeconds, activeGcTotalSeconds)) + + # log to cloudwatch + cloudWatchClient.put_metric_data( + Namespace='CustomDocDB', + MetricData=[{'MetricName':'GCCount','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':activeGcCount,'StorageResolution':60}, + {'MetricName':'GCMaxSeconds','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':activeGcMaxSeconds,'StorageResolution':60}, + {'MetricName':'GCTotalSeconds','Dimensions':[{'Name':'Cluster','Value':clusterName}],'Value':activeGcTotalSeconds,'StorageResolution':60}]) + + lastCloudwatchPutTime = time.time() + activeGcCount = 0 + activeGcTotalSeconds = 0 + activeGcMaxSeconds = 0 + + if not foundGcActivity: + if verboseOutput: + logAndPrint(appConfig,"{} | executionTime (seconds) | {:.2f} | NO GC Activity".format(logTimeStamp,elapsedSeconds)) + + for thisNs in list(gcDict.keys()): + if gcDict[thisNs]['active'] == False: + # GC completed, output result and remove + logAndPrint(appConfig,"{} | GC COMPLETED | {} | {:.2f} | seconds".format(logTimeStamp, thisNs, time.time() - gcDict[thisNs]['startTime'])) + gcDict.pop(thisNs) + + time.sleep(checkFrequencySeconds) + + client.close() + + +def main(): + parser = argparse.ArgumentParser(description='DocumentDB GC Watchdog') + + parser.add_argument('--uri',required=True,type=str,help='URI') + parser.add_argument('--check-frequency-seconds',required=False,type=int,default=5,help='Number of seconds between checks') + parser.add_argument('--skip-python-version-check',required=False,action='store_true',help='Permit execution on Python 3.6 and prior') + parser.add_argument('--log-file-name',required=True,type=str,help='Name of log file') + parser.add_argument('--verbose',required=False,action='store_true',help='Verbose output') + parser.add_argument('--create-cloudwatch-metrics',required=False,action='store_true',help='Create CloudWatch metrics when garbage collection is active')
parser.add_argument('--cluster-name',required=False,type=str,help='Name of cluster for CloudWatch metrics') + + args = parser.parse_args() + + MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + if args.create_cloudwatch_metrics and (args.cluster_name is None): + sys.exit("\nMust supply --cluster-name when capturing CloudWatch metrics.\n") + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['checkFrequencySeconds'] = int(args.check_frequency_seconds) + appConfig['logFileName'] = args.log_file_name + appConfig['verbose'] = args.verbose + appConfig['createCloudwatchMetrics'] = args.create_cloudwatch_metrics + appConfig['clusterName'] = args.cluster_name + + initializeLogFile(appConfig) + + watchGc(appConfig) + + +if __name__ == "__main__": + main() + + diff --git a/operations/README.md b/operations/README.md new file mode 100644 index 0000000..831ffda --- /dev/null +++ b/operations/README.md @@ -0,0 +1,3 @@ +# Amazon DocumentDB Operational Tools + +* [index-creator](./index-creator) - create indexes and monitor the progress diff --git a/operations/document-compression-updater/.gitignore b/operations/document-compression-updater/.gitignore new file mode 100644 index 0000000..85d1acc --- /dev/null +++ b/operations/document-compression-updater/.gitignore @@ -0,0 +1 @@ +doit*.bash diff --git a/operations/document-compression-updater/README.md b/operations/document-compression-updater/README.md new file mode 100644 index 0000000..9601d5e --- /dev/null +++ b/operations/document-compression-updater/README.md @@ -0,0 +1,43 @@ +# Python Updater tool +This sample application compresses pre-existing documents in a collection after compression has been turned on for that collection.
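The mechanism relies on the fact that rewriting a document after compression is enabled stores it compressed: each round applies a trivial `$set` to the next batch of `_id` values, up to a boundary `_id` captured when the tool first started. A simplified sketch of the batch-building step (hypothetical helper — the real tool builds `pymongo.UpdateOne` requests and applies them with `bulk_write`; plain (filter, update) tuples are used here so the sketch needs no driver):

```python
def build_batch(ids, max_id, update_field="6nh63"):
    """Build (filter, update) pairs for ids up to max_id; returns (ops, done)."""
    ops = []
    for _id in ids:
        if _id > max_id:  # past the boundary captured at startup: nothing left to touch
            return ops, True
        ops.append(({"_id": _id}, {"$set": {update_field: 1}}))
    return ops, False

ops, done = build_batch([1, 2, 3], max_id=2)
print(len(ops), done)  # only ids 1 and 2 are rewritten; this round is the last one
```

Because the boundary `_id` and the last scanned `_id` are persisted to the tracker collection after every round, a restarted run can resume from the last successful batch instead of rewriting documents twice.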
+ +This single-threaded application issues **5000** updates (controlled by argument --batch-size) serially in a _round_, then sleeps for **60** seconds (controlled by argument --wait-period) before starting the next _round_. + +Status of the updates is maintained in database **tracker_db** - for each collection there is a tracker collection named **<< database >>_<< collection >>_tracker_col**. + +The application can be restarted if it crashes and will pick up from the last successful _round_ based on data in **<< database >>_<< collection >>_tracker_col**. + +The update statements use field **6nh63** (controlled by argument --update-field) for triggering compression on existing records. + +The application uses the **_id** field for tracking and updating existing documents. If you are using a custom _id value, the value should be sortable. + +## Requirements +Python 3.7 or later, pymongo + +## Installation +Clone the repository and go to the application folder: +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/operations/document-compression-updater +``` + +## Usage/Examples + +``` + python3 update_apply_compression.py --uri "<>" --database <> --collection <> --update-field << field_name >> --wait-period << int >> --batch-size << int >> +``` + +The application has the following arguments: + +``` +Required parameters + --uri URI URI (connection string) + --database DATABASE Database + --collection COLLECTION Collection + +Optional parameters + --file-name Starting name of the created log files + --update-field Field used for updating an existing document.
This should not conflict with any fieldname you are already using + --wait-period Number of seconds to wait between each batch + --batch-size Number of documents to update in a single batch +``` diff --git a/operations/document-compression-updater/requirements.txt b/operations/document-compression-updater/requirements.txt new file mode 100644 index 0000000..cfd4159 --- /dev/null +++ b/operations/document-compression-updater/requirements.txt @@ -0,0 +1,2 @@ +pymongo + diff --git a/operations/document-compression-updater/update_apply_compression.py b/operations/document-compression-updater/update_apply_compression.py new file mode 100644 index 0000000..cefd1c4 --- /dev/null +++ b/operations/document-compression-updater/update_apply_compression.py @@ -0,0 +1,231 @@ +import datetime +import sys +import random +import json +import pymongo +import time +import threading +import os +import multiprocessing as mp +import argparse +import string +import math + +def deleteLog(appConfig): + if os.path.exists(appConfig['logFileName']): + os.remove(appConfig['logFileName']) + +def printLog(thisMessage,appConfig): + print("{}".format(thisMessage)) + with open(appConfig['logFileName'], 'a') as fp: + fp.write("{}\n".format(thisMessage)) + +def setup(appConfig): + if sys.version_info < (3,7): + sys.exit('Sorry, Python < 3.7 is not supported') + + databaseName = appConfig['databaseName'] + collectionName = appConfig['collectionName'] + + client = pymongo.MongoClient(host=appConfig['uri'],appname='compupd') + + # database and collection for compression + + db = client[databaseName] + adminDb = client['admin'] + col = db[collectionName] + + # database and collection for tracking + + tracker_db=client['tracker_db'] + trackerCollectionName = databaseName+'_'+collectionName+'_tracker_col' + tracker_col=tracker_db[trackerCollectionName] + + list_of_collections = tracker_db.list_collection_names() # Return a list of collections in 'tracker_db' + print("list_of_collections 
{}".format(list_of_collections)) + + if trackerCollectionName in list_of_collections : + + # tracker db already has entry for collection + + result = tracker_col.find({}).sort({ "_id" : -1}).limit(1) + + for lastEntry in result : + numExistingDocuments = lastEntry["numExistingDocuments"] + maxObjectIdToTouch = lastEntry["maxObjectIdToTouch"] + lastScannedObjectId = lastEntry["lastScannedObjectId"] + numDocumentsUpdated = lastEntry["numDocumentsUpdated"] + print("Found existing record: {}".format(str(lastEntry))) + + else : + + # create first entry in tracker db for collection + result = col.find({},{ "_id" :1}).sort({ "_id" :-1}).limit(1) + + for id in result : + print("result {}".format(result)) + maxObjectIdToTouch = id["_id"] + + lastScannedObjectId = 0 + numDocumentsUpdated = 0 + numExistingDocuments = col.estimated_document_count() + + first_entry = { + "collection_name": appConfig['collectionName'], + "lastScannedObjectId" : lastScannedObjectId, + "ts": datetime.datetime.now(tz=datetime.timezone.utc), + "maxObjectIdToTouch" : maxObjectIdToTouch, + "numExistingDocuments" : numExistingDocuments, + "numDocumentsUpdated" : numDocumentsUpdated + # scan fields in future, for now we use _id + } + tracker_col.insert_one(first_entry) + + printLog("create first entry in tracker db for collection {}".format(first_entry),appConfig) + + client.close() + + returnData = {} + returnData["numExistingDocuments"] = numExistingDocuments + returnData["maxObjectIdToTouch"] = maxObjectIdToTouch + returnData["lastScannedObjectId"] = lastScannedObjectId + returnData["numDocumentsUpdated"] = numDocumentsUpdated + + return returnData + +def task_worker(threadNum,perfQ,appConfig): + maxObjectIdToTouch = appConfig['maxObjectIdToTouch'] + lastScannedObjectId = appConfig['lastScannedObjectId'] + numInsertProcesses = appConfig['numInsertProcesses'] + + numExistingDocuments = appConfig["numExistingDocuments"] + maxObjectIdToTouch = appConfig["maxObjectIdToTouch"] + lastScannedObjectId = 
appConfig["lastScannedObjectId"] + numDocumentsUpdated = appConfig["numDocumentsUpdated"] + + client = pymongo.MongoClient(appConfig['uri']) + + myDatabaseName = appConfig['databaseName'] + db = client[myDatabaseName] + myCollectionName = appConfig['collectionName'] + col = db[myCollectionName] + tracker_db=client['tracker_db'] + trackerCollectionName = myDatabaseName+'_'+myCollectionName+'_tracker_col' + tracker_col=tracker_db[trackerCollectionName] + + allDone = False + tempLastScannedObjectId = lastScannedObjectId + + while not allDone: + + #start and go through all the docs using _id + + if lastScannedObjectId != 0 : + batch = col.find({"_id" : { "$gt" : lastScannedObjectId }},{ "_id" :1}).sort({"_id" :1}).limit(appConfig['batchSize']) + else : + batch = col.find({},{ "_id" :1}).sort({ "_id" :1}).limit(appConfig['batchSize']) + + batch_count = 0 + updateList = [] + + for id in batch : + if id["_id"]<=maxObjectIdToTouch: + # print("found id {} lesser than maxObjectIdToTouch {}.".format(str(id["_id"]),str(maxObjectIdToTouch))) + updateList.append(pymongo.UpdateOne({ "_id" : id["_id"] } , { "$set": { appConfig['updateField']: 1 } } )) + tempLastScannedObjectId = id["_id"] + batch_count = batch_count + 1 + else: + allDone = True + print("found id {} higher than maxObjectIdToTouch {}. 
all done, stopping.".format(str(id["_id"]),str(maxObjectIdToTouch))) + break + + if batch_count > 0 : + result = col.bulk_write(updateList) + numDocumentsUpdated = numDocumentsUpdated + batch_count + + tracker_entry = { + "collection_name": appConfig['collectionName'], + "lastScannedObjectId" : tempLastScannedObjectId, + "date": datetime.datetime.now(tz=datetime.timezone.utc), + "maxObjectIdToTouch" : maxObjectIdToTouch, + "numExistingDocuments" : numExistingDocuments, + "numDocumentsUpdated" : numDocumentsUpdated + # scan fields in future, for now we use _id + } + tracker_col.insert_one(tracker_entry) + + printLog( " Last updates applied : {}".format(str(tracker_entry)),appConfig) + + lastScannedObjectId = tempLastScannedObjectId + + printLog("sleeping for {} seconds".format(appConfig['waitPeriod']),appConfig) + time.sleep(appConfig['waitPeriod']) + else : + print("No updates in batch") + allDone = True + break + + client.close() + +def main(): + parser = argparse.ArgumentParser(description='Update and Apply Compression') + parser.add_argument('--uri',required=True,type=str,help='URI (connection string)') + parser.add_argument('--database',required=True,type=str,help='Database') + parser.add_argument('--collection',required=True,type=str,help='Collection') + parser.add_argument('--file-name',required=False,type=str,default='compressor',help='Starting name of the created log files') + parser.add_argument('--update-field',required=False,type=str,default='6nh63',help='Field used for updating an existing document. 
This should not conflict with any fieldname you are already using ') + parser.add_argument('--wait-period',required=False,type=int,default=60,help='Number of seconds to wait between each batch') + parser.add_argument('--batch-size',required=False,type=int,default=5000,help='Number of documents to update in a single batch') + + args = parser.parse_args() + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['numInsertProcesses'] = 1 #int(args.processes) + appConfig['databaseName'] = args.database + appConfig['collectionName'] = args.collection + appConfig['updateField'] = args.update_field + appConfig['batchSize'] = int(args.batch_size) + appConfig['waitPeriod'] = int(args.wait_period) + appConfig['logFileName'] = "{}.log".format(args.file_name) + + setUpdata = setup(appConfig) + + appConfig['numExistingDocuments'] = setUpdata["numExistingDocuments"] + appConfig['maxObjectIdToTouch'] = setUpdata["maxObjectIdToTouch"] + appConfig['lastScannedObjectId'] = setUpdata["lastScannedObjectId"] + appConfig['numDocumentsUpdated'] = setUpdata["numDocumentsUpdated"] + + deleteLog(appConfig) + + printLog('---------------------------------------------------------------------------------------',appConfig) + for thisKey in sorted(appConfig): + if (thisKey == 'uri'): + thisUri = appConfig[thisKey] + thisParsedUri = pymongo.uri_parser.parse_uri(thisUri) + thisUsername = thisParsedUri['username'] + thisPassword = thisParsedUri['password'] + thisUri = thisUri.replace(thisUsername,'') + thisUri = thisUri.replace(thisPassword,'') + printLog(" config | {} | {}".format(thisKey,thisUri),appConfig) + else: + printLog(" config | {} | {}".format(thisKey,appConfig[thisKey]),appConfig) + printLog('---------------------------------------------------------------------------------------',appConfig) + + mp.set_start_method('spawn') + q = mp.Manager().Queue() + + processList = [] + for loop in range(appConfig['numInsertProcesses']): + p = mp.Process(target=task_worker,args=(loop,q,appConfig)) 
+ processList.append(p) + for process in processList: + process.start() + + for process in processList: + process.join() + + printLog("Created {} with results".format(appConfig['logFileName']),appConfig) + +if __name__ == "__main__": + main() diff --git a/operations/index-compare/.gitignore b/operations/index-compare/.gitignore new file mode 100644 index 0000000..cca4997 --- /dev/null +++ b/operations/index-compare/.gitignore @@ -0,0 +1,8 @@ +doit-compare.bash +doit-create.bash +doit-drop.bash +doit-index-tool.bash +megatest-both.js +megatest-drop.js +megatest-source.js +megatest-target.js diff --git a/operations/index-compare/README.md b/operations/index-compare/README.md new file mode 100644 index 0000000..f33cbe8 --- /dev/null +++ b/operations/index-compare/README.md @@ -0,0 +1,24 @@ +# Amazon DocumentDB Index Compare + +Index Compare detects extra, missing, and differing indexes between two DocumentDB or MongoDB clusters. + +## Requirements +Python 3.7 or greater, Pymongo. + +## Usage/Examples +Index Compare accepts the following arguments: + +``` +--source-uri URI URI to connect to source Amazon DocumentDB or MongoDB cluster (required) +--target-uri URI URI to connect to target Amazon DocumentDB or MongoDB cluster (required) +--verbose Verbose output +``` + +### Compare indexes between two clusters +``` +python3 index-compare.py --source-uri $SOURCE_CLUSTER_URI --target-uri $TARGET_CLUSTER_URI +``` + +## License +[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) + diff --git a/operations/index-compare/index-compare.py b/operations/index-compare/index-compare.py new file mode 100644 index 0000000..01b71cd --- /dev/null +++ b/operations/index-compare/index-compare.py @@ -0,0 +1,180 @@ +import argparse +import sys +#import json +import pymongo +import os +import warnings + + +def ensureDirect(uri,appname): + # make sure we are directly connecting to the server requested, not via replicaSet + + connInfo = {} + parsedUri = pymongo.uri_parser.parse_uri(uri) + 
+ for thisKey in sorted(parsedUri['options'].keys()): + if thisKey.lower() not in ['replicaset','readpreference']: + connInfo[thisKey] = parsedUri['options'][thisKey] + + # make sure we are using directConnection=true + connInfo['directconnection'] = True + + connInfo['username'] = parsedUri['username'] + connInfo['password'] = parsedUri['password'] + connInfo['host'] = parsedUri['nodelist'][0][0] + connInfo['port'] = parsedUri['nodelist'][0][1] + connInfo['appname'] = appname + + if parsedUri.get('database') is not None: + connInfo['authSource'] = parsedUri['database'] + + return connInfo + + +def getData(appConfig,connectUri,whichServer): + warnings.filterwarnings("ignore","You appear to be connected to a DocumentDB cluster.") + + print('connecting to {} server'.format(whichServer)) + client = pymongo.MongoClient(**ensureDirect(connectUri,'indxcomp')) + + return getCollectionStats(appConfig,client) + + +def getCollectionStats(appConfig,client): + verbose = appConfig['verbose'] + returnDict = {} + + # get databases - filter out admin, config, local, and system + dbDict = client.admin.command("listDatabases",nameOnly=True,filter={"name":{"$nin":['admin','config','local','system']}})['databases'] + for thisDb in dbDict: + collCursor = client[thisDb['name']].list_collections() + for thisColl in collCursor: + if thisColl.get('type','NOT-FOUND') == 'view': + # exclude views + pass + elif thisColl['name'] in ['system.profile']: + # exclude certain collections + pass + else: + if verbose: + print(" retrieving indexes for {}.{}".format(thisDb['name'],thisColl['name'])) + if thisDb['name'] not in returnDict: + returnDict[thisDb['name']] = {} + + indexInfo = client[thisDb['name']][thisColl['name']].index_information() + + returnDict[thisDb['name']][thisColl['name']] = indexInfo.copy() + + return returnDict + + +def compareSpecificIndex(appConfig,index1,index2,keyIndex,keyDatabase,keyCollection): + verbose = appConfig['verbose'] + excludedAttributesList = 
['ns','v','textIndexVersion','language_override'] + + if verbose: + print(" {} | {}".format(index1,index2)) + # check for keys in index1 but not index2 + for key in sorted(index1.keys()): + if key not in excludedAttributesList: + if verbose: + print(" checking source attribute {}".format(key)) + if key not in index2: + print("attribute {} on index {} on {}.{} does not exist in target".format(key,keyIndex,keyDatabase,keyCollection)) + else: + # check if the values differ + if index1[key] == index2[key]: + pass + else: + print("attribute {} on index {} on {}.{} has differing values of source == {} and target == {}".format(key,keyIndex,keyDatabase,keyCollection,index1[key],index2[key])) + else: + if verbose: + print("attribute {} on index {} on {}.{} skipped (excluded from check)".format(key,keyIndex,keyDatabase,keyCollection)) + + # check for keys in index2 but not index1 + for key in sorted(index2.keys()): + if key not in excludedAttributesList: + if verbose: + print(" checking target attribute {}".format(key)) + if key not in index1: + print("attribute {} on index {} on {}.{} does not exist in source".format(key,keyIndex,keyDatabase,keyCollection)) + + + +def compareIndexes(appConfig,sourceDict,targetDict): + verbose = appConfig['verbose'] + + # compare source to target + print("") + print("comparing - source to target") + for keyDatabase in sorted(sourceDict.keys()): + if verbose: + print(" checking source database {}".format(keyDatabase)) + for keyCollection in sorted(sourceDict[keyDatabase].keys()): + if verbose: + print(" checking collection {}".format(keyCollection)) + if keyDatabase in targetDict and keyCollection in targetDict[keyDatabase]: + for keyIndex in sorted(sourceDict[keyDatabase][keyCollection]): + if verbose: + print(" checking index {}".format(keyIndex)) + if keyIndex in targetDict[keyDatabase][keyCollection]: + # index exists - compare + 
compareSpecificIndex(appConfig,sourceDict[keyDatabase][keyCollection][keyIndex],targetDict[keyDatabase][keyCollection][keyIndex],keyIndex,keyDatabase,keyCollection) + else: + print("index {} on {}.{} does not exist in target".format(keyIndex,keyDatabase,keyCollection)) + else: + print("collection {}.{} does not exist in target".format(keyDatabase,keyCollection)) + + # compare target to source + print("") + print("comparing - target to source") + for keyDatabase in sorted(targetDict.keys()): + if verbose: + print(" checking target database {}".format(keyDatabase)) + for keyCollection in sorted(targetDict[keyDatabase].keys()): + if verbose: + print(" checking collection {}".format(keyCollection)) + if keyDatabase in sourceDict and keyCollection in sourceDict[keyDatabase]: + for keyIndex in sorted(targetDict[keyDatabase][keyCollection]): + if verbose: + print(" checking index {}".format(keyIndex)) + if keyIndex in sourceDict[keyDatabase][keyCollection]: + # index exists - skip + #compareSpecificIndex(appConfig,sourceDict[keyDatabase][keyCollection][keyIndex],targetDict[keyDatabase][keyCollection][keyIndex],keyIndex,keyDatabase,keyCollection) + pass + else: + print("index {} on {}.{} does not exist in source".format(keyIndex,keyDatabase,keyCollection)) + else: + print("collection {}.{} does not exist in source".format(keyDatabase,keyCollection)) + + print("") + + +def main(): + parser = argparse.ArgumentParser(description='Compare indexes between two DocumentDB or MongoDB servers') + + parser.add_argument('--skip-python-version-check',required=False,action='store_true',help='Permit execution on Python 3.6 and prior') + parser.add_argument('--source-uri',required=True,type=str,help='Connection URI for source') + parser.add_argument('--target-uri',required=True,type=str,help='Connection URI for target') + parser.add_argument('--verbose',required=False,action='store_true',help='Verbose output') + + args = parser.parse_args() + + # check for minimum Python version + 
MIN_PYTHON = (3, 7) + if (not args.skip_python_version_check) and (sys.version_info < MIN_PYTHON): + sys.exit("\nPython %s.%s or later is required.\n" % MIN_PYTHON) + + appConfig = {} + appConfig['sourceUri'] = args.source_uri + appConfig['targetUri'] = args.target_uri + appConfig['verbose'] = args.verbose + + sourceDict = getData(appConfig,appConfig['sourceUri'],'source') + targetDict = getData(appConfig,appConfig['targetUri'],'target') + + compareIndexes(appConfig,sourceDict,targetDict) + + +if __name__ == "__main__": + main() diff --git a/operations/index-creator/.gitignore b/operations/index-creator/.gitignore new file mode 100644 index 0000000..ddf168f --- /dev/null +++ b/operations/index-creator/.gitignore @@ -0,0 +1 @@ +*.bash diff --git a/operations/index-creator/README.md b/operations/index-creator/README.md new file mode 100644 index 0000000..662f00c --- /dev/null +++ b/operations/index-creator/README.md @@ -0,0 +1,37 @@ +# Amazon DocumentDB Index Creator + +Index Creator enables the creation of indexes while viewing the status and progress from the command line. + +## Features +- Create single key and compound indexes from the command line, including multi-key indexes +- **NOTE** - does not currently support creation of partial, geospatial, text, or vector indexes +- During index creation, the status of each stage and the estimated time to complete the current stage are displayed + +## Requirements +Python 3.7 or greater, Pymongo. 
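The `--index-keys` argument described below is a comma-separated list of fields, where an optional `:-1` suffix marks a descending key. A minimal sketch of how such a spec maps to the `(field, direction)` tuples that pymongo's `create_index()` expects — `parse_index_keys` is a hypothetical helper for illustration, not part of the tool, and the `1`/`-1` constants stand in for `pymongo.ASCENDING`/`pymongo.DESCENDING`:

```python
# Hypothetical helper mirroring the --index-keys parsing done in index-creator.py.
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

def parse_index_keys(spec):
    # Turn a spec like "f1,f2:-1" into [("f1", 1), ("f2", -1)].
    keys = []
    for part in spec.split(','):
        field, _, direction = part.partition(':')
        keys.append((field, DESCENDING if direction == '-1' else ASCENDING))
    return keys

print(parse_index_keys('f1,f2:-1'))  # [('f1', 1), ('f2', -1)]
```

The resulting list can be passed directly as the first argument to `create_index()`.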
+ +## Usage/Examples +Index Creator accepts the following arguments: + +``` +--uri URI URI to connect to Amazon DocumentDB (required) +--workers WORKERS Number of worker processes for heap scan stage of index creation (required) +--database DATABASE Database containing collection for index creation (required) +--collection COLLECTION Collection to create index (required) +--index-name INDEX_NAME Name of index to create (required) +--index-keys INDEX_KEYS Comma separated list of index key(s), append :-1 after key for descending (required) +--unique Create unique index +--foreground Create index in the foreground (must provide this or --background) +--background Create index in the background (must provide this or --foreground) +--drop-index Drop index (if exists) +--update-frequency-seconds SECONDS Number of seconds between progress updates (default 15) +--log-file-name LOG_FILE_NAME Name of file for output logging (default index-creator.log) +``` + +### Create a compound index with 4 workers on testdb.testcoll on fields f1 and f2 +``` +python3 index-creator.py --uri $DOCDB_URI --workers 4 --database testdb --collection testcoll --index-name test_idx --index-keys f1,f2 --background +``` + +## License +[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) diff --git a/operations/index-creator/index-creator.py b/operations/index-creator/index-creator.py new file mode 100644 index 0000000..091dcd7 --- /dev/null +++ b/operations/index-creator/index-creator.py @@ -0,0 +1,213 @@ +from datetime import datetime +import sys +import pymongo +import time +import threading +import os +import argparse + + +allDone = False + + +def deleteLog(appConfig): + if os.path.exists(appConfig['logFileName']): + os.remove(appConfig['logFileName']) + + +def printLog(thisMessage,appConfig): + with open(appConfig['logFileName'], 'a') as fp: + fp.write("{}\n".format(thisMessage)) + print("{}".format(thisMessage)) + + +def reportCollectionInfo(appConfig): + client = 
pymongo.MongoClient(appConfig['uri']) + db = client[appConfig['databaseName']] + + collStats = db.command("collStats", appConfig['collectionName']) + + compressionRatio = collStats['size'] / collStats['storageSize'] + gbDivisor = 1024*1024*1024 + + printLog("collection statistics | numDocs = {0:12,d}".format(collStats['count']),appConfig) + printLog("collection statistics | avgObjSize = {0:12,d}".format(int(collStats['avgObjSize'])),appConfig) + printLog("collection statistics | size (GB) = {0:12,.4f}".format(collStats['size']/gbDivisor),appConfig) + printLog("collection statistics | storageSize (GB) = {0:12,.4f} ".format(collStats['storageSize']/gbDivisor),appConfig) + printLog("collection statistics | compressionRatio = {0:12,.4f}".format(compressionRatio),appConfig) + printLog("collection statistics | totalIndexSize (GB) = {0:12,.4f}".format(collStats['totalIndexSize']/gbDivisor),appConfig) + + client.close() + + +def reporter(appConfig): + global allDone + + uri = appConfig['uri'] + numSecondsFeedback = appConfig['updateFrequencySeconds'] + ns = "{}.{}".format(appConfig['databaseName'],appConfig['collectionName']) + + startTime = time.time() + lastTime = time.time() + nextReportTime = startTime + numSecondsFeedback + + client = pymongo.MongoClient(uri) + + currentStage = 0 + progressDone = 0 + progressTotal = 0 + + + while not allDone: + time.sleep(numSecondsFeedback) + nowTime = time.time() + + # run the query to get the current status + indexCreateStatus = '' + with client.admin.aggregate([{"$currentOp": {}},{"$match":{"ns":ns}}]) as cursor: + for operation in cursor: + if 'createIndexes' in operation['command']: + #print(operation['command']) + indexCreateStatus = operation.get('msg','') + if 'progress' in operation: + progressDone = operation['progress'].get('done',0) + progressTotal = operation['progress'].get('total',0) + else: + progressDone = 0 + progressTotal = 0 + + if ('stage 3' in indexCreateStatus) and (currentStage < 3): + currentStage = 3 + 
stageStartTime = time.time() + elif ('stage 4' in indexCreateStatus) and (currentStage < 4): + currentStage = 4 + stageStartTime = time.time() + elif ('stage 5' in indexCreateStatus) and (currentStage < 5): + currentStage = 5 + stageStartTime = time.time() + elif ('stage 6' in indexCreateStatus) and (currentStage < 6): + currentStage = 6 + stageStartTime = time.time() + elif ('stage 8' in indexCreateStatus) and (currentStage < 8): + currentStage = 8 + stageStartTime = time.time() + + if progressDone != 0 and progressTotal != 0 and currentStage in [3,4,5,6,8]: + remainingSeconds = max(int(((nowTime - stageStartTime) * (1 / (progressDone / progressTotal))) - (nowTime - stageStartTime)),0) + else: + remainingSeconds = 0 + + elapsedSeconds = nowTime - startTime + thisHours, rem = divmod(elapsedSeconds, 3600) + thisMinutes, thisSeconds = divmod(rem, 60) + thisHMS = "{:0>2}:{:0>2}:{:05.2f}".format(int(thisHours),int(thisMinutes),thisSeconds) + + logTimeStamp = datetime.utcnow().isoformat()[:-3] + 'Z' + + if remainingSeconds == 0: + printLog("[{}] elapsed {} | {}".format(logTimeStamp,thisHMS,indexCreateStatus),appConfig) + else: + printLog("[{}] elapsed {} | {} | stage {} remaining seconds ~{}".format(logTimeStamp,thisHMS,indexCreateStatus,currentStage,remainingSeconds),appConfig) + + client.close() + + +def task_worker(appConfig): + global allDone + + uri = appConfig['uri'] + databaseName = appConfig['databaseName'] + collectionName = appConfig['collectionName'] + indexName = appConfig['indexName'] + background = appConfig['background'] + unique = appConfig['unique'] + indexKeys = appConfig['indexKeys'] + workers = appConfig['workers'] + dropIndex = appConfig['dropIndex'] + + indexList = [] + for thisKey in indexKeys.split(','): + if ':' in thisKey: + thisKeyField, thisKeyDirection = thisKey.split(':') + if thisKeyDirection == '-1': + thisKeyDirectionPymongo = pymongo.DESCENDING + else: + thisKeyDirectionPymongo = pymongo.ASCENDING + 
indexList.append((thisKeyField,thisKeyDirectionPymongo)) + else: + thisKeyField = thisKey + thisKeyDirectionPymongo = pymongo.ASCENDING + indexList.append((thisKeyField,thisKeyDirectionPymongo)) + + client = pymongo.MongoClient(host=uri,appname='indxcrtr') + db = client[databaseName] + col = db[collectionName] + + if dropIndex and (indexName in col.index_information()): + printLog("Dropping index {} on {}.{}".format(indexName,databaseName,collectionName),appConfig) + col.drop_index(indexName) + + # output what we are doing before we do it + printLog("Creating index {} on {}.{}, unique={}, background={}, workers={}, keys={}".format(indexName,databaseName,collectionName,unique,background,workers,indexList),appConfig) + + col.create_index(indexList, name=indexName, background=background, unique=unique, workers=workers) + + client.close() + + allDone = True + + +def main(): + parser = argparse.ArgumentParser(description='Index Creator') + + parser.add_argument('--uri',required=True,type=str,help='URI (connection string)') + parser.add_argument('--workers',required=True,type=int,help='Number of index creation workers') + parser.add_argument('--database',required=True,type=str,help='Database') + parser.add_argument('--collection',required=True,type=str,help='Collection') + parser.add_argument('--update-frequency-seconds',required=False,type=int,default=15,help='Number of seconds between updates') + parser.add_argument('--index-name',required=True,type=str,help='Index name') + parser.add_argument('--index-keys',required=True,type=str,help='Index key(s) - comma separated - optional :1 or :-1 for direction') + parser.add_argument('--log-file-name',required=False,type=str,default='index-creator.log',help='Name of log file') + parser.add_argument('--background',required=False,action='store_true',help='Create index in the background') + parser.add_argument('--foreground',required=False,action='store_true',help='Create index in the foreground') + 
parser.add_argument('--unique',required=False,action='store_true',help='Create unique index') + parser.add_argument('--drop-index',required=False,action='store_true',help='Drop the index (if it exists)') + args = parser.parse_args() + + # fail if not --background or --foreground + if not (args.background or args.foreground): + print("Must supply either --background or --foreground") + sys.exit(1) + + appConfig = {} + appConfig['uri'] = args.uri + appConfig['workers'] = int(args.workers) + appConfig['databaseName'] = args.database + appConfig['collectionName'] = args.collection + appConfig['updateFrequencySeconds'] = int(args.update_frequency_seconds) + appConfig['indexName'] = args.index_name + appConfig['indexKeys'] = args.index_keys + appConfig['logFileName'] = args.log_file_name + appConfig['unique'] = args.unique + appConfig['background'] = args.background + appConfig['dropIndex'] = args.drop_index + + deleteLog(appConfig) + + reportCollectionInfo(appConfig) + + tWorker = threading.Thread(target=task_worker,args=(appConfig,)) + tWorker.start() + + tReporter = threading.Thread(target=reporter,args=(appConfig,)) + tReporter.start() + + tReporter.join() + + reportCollectionInfo(appConfig) + + printLog("Created {} with results".format(appConfig['logFileName']),appConfig) + + +if __name__ == "__main__": + main() diff --git a/operations/large-doc-finder/README.md b/operations/large-doc-finder/README.md new file mode 100644 index 0000000..522e79e --- /dev/null +++ b/operations/large-doc-finder/README.md @@ -0,0 +1,88 @@ +# Large Document Finder for DocumentDB + +This tool scans an Amazon DocumentDB collection to identify documents that exceed a specified size threshold. It processes documents in parallel using multiple threads and outputs results exceeding the threshold to a CSV file. 
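The scanner below walks the collection in `_id` order, with each worker thread claiming one contiguous range of ids at a time. That claiming pattern can be sketched over a plain sorted list — illustrative only, since the real tool issues `find().sort("_id", 1).limit(batch_size)` queries against the server rather than slicing a list:

```python
def claim_batches(sorted_ids, batch_size):
    # Yield (start_id, end_id) ranges covering sorted_ids in batch_size chunks,
    # tracking the last claimed id the way large-docs.py tracks _last_processed_id.
    last = None
    while True:
        batch = [i for i in sorted_ids if last is None or i > last][:batch_size]
        if not batch:
            return
        last = batch[-1]
        yield (batch[0], batch[-1])

print(list(claim_batches([1, 2, 3, 5, 8], 2)))  # [(1, 2), (3, 5), (8, 8)]
```

Each yielded range is then re-queried with `{"_id": {"$gte": start, "$lte": end}}` so a worker can size every document in its slice independently.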
+ +## Requirements + - Python 3.9+ + - pymongo Python package (tested against Amazon DocumentDB with pymongo 4.10.1) + - If not installed, run "$ pip3 install pymongo" + +## Example usage: +Basic usage: + + python large-docs.py --uri "mongodb://..." \ + --processes 8 \ + --batch-size 1000 \ + --database mydb \ + --collection mycollection \ + --csv "mydb_mycollection_" \ + --large-doc-size 10485760 + +## Parameters: +`--uri` : str +- Required +- DocumentDB connection string +- Example: `mongodb://user:password@name.cluster.region.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false` + +`--processes` : int +- Required +- Number of parallel threads to use +- Example: 8 + +`--batch-size` : int +- Required +- Number of documents to process in each batch +- Example: 1000 + +`--database` : str +- Required +- Name of the database to scan +- Example: `mydb` + +`--collection` : str +- Required +- Name of the collection to scan +- Example: `mycollection` + +`--csv` : str +- Prefix for the CSV output filename +- Default: `large_doc_` +- Example: `large_docs_prod` + +`--large-doc-size` : int +- Size threshold in bytes +- Default: 8388608 (8MB) +- Example: 10485760 (10MB) + +## Example output: +The output CSV contains: +- Scan details (database, collection, threshold, etc.) +- Document details (ID, size in bytes, size in MB) + + ``` + Database,mydb + Collection,mycollection + Batch size,50000 + Number of threads,4 + Total documents,3156003 + Large document threshold (bytes),8388608 + Large document threshold (MB),8.00 + Scan Start Time,2025-03-02T22:17:04.761870 + Scan completion time,2025-03-02T22:17:36.291172 + Total scan time,00:00:31 + Large documents found,3 + + Document _id,Size (bytes),Size (MB) + 65e8f2a1b3e8d97531abcdef,9437247,9.00 + 65e8f2a2b3e8d97531abcd01,9437247,9.00 + 65e8f2a3b3e8d97531abcd02,9437247,9.00 + ``` + +## Performance Considerations: +1. 
Thread count: Start with 2x CPU cores, adjust based on monitoring +2. Batch size: Larger batches = more memory but fewer DB round trips +3. Run during off-peak hours and monitor cluster performance metrics +4. Use `secondaryPreferred` read preference + +## License +This tool is licensed under the Apache 2.0 License. \ No newline at end of file diff --git a/operations/large-doc-finder/large-docs.py b/operations/large-doc-finder/large-docs.py new file mode 100644 index 0000000..6de1233 --- /dev/null +++ b/operations/large-doc-finder/large-docs.py @@ -0,0 +1,409 @@ +import bson +import pymongo +import time +import concurrent.futures +import csv +import argparse +import sys +import logging +import signal +import sys +from threading import Lock +from datetime import datetime + +shutdown_flag = False + +logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(levelname)s - %(message)s', + force=True +) +logger = logging.getLogger(__name__) + +class BatchCounter: + def __init__(self, total_docs): + self.docs_processed = 0 + self.start_time = time.time() + self.last_log_time = self.start_time + self.last_processed = 0 + self.lock = Lock() + self.total_docs = total_docs + self.large_docs_count = 0 + + header = f"{'Time':19} | {'Total docs':>11} | {'Processed':>11} | {'Perc':>6} | {'Elapsed':>8} | {'Docs/sec':>10} | {'Large docs':>10}" + print(header) + + def increment(self, docs_in_batch, new_large_docs=0): + current_time = time.time() + with self.lock: + self.docs_processed += docs_in_batch + self.large_docs_count += new_large_docs + self.last_processed += docs_in_batch + + if (current_time - self.last_log_time >= 10) or (self.last_processed >= 100000): + elapsed = current_time - self.start_time + elapsed_str = time.strftime('%H:%M:%S', time.gmtime(elapsed)) + percentage = (self.docs_processed / self.total_docs) * 100 + docs_per_sec = int(self.docs_processed / elapsed) if elapsed > 0 else 0 + + current_time_str = datetime.now().strftime('%Y-%m-%d %H:%M:%S') + 
line = (f"{current_time_str} | {self.total_docs:>11,d} | {self.docs_processed:>11,d} | " + f"{percentage:>5.1f}% | {elapsed_str} | {docs_per_sec:>10,d} | {self.large_docs_count:>10d}") + print(line) + + self.last_log_time = current_time + self.last_processed = 0 + sys.stdout.flush() + +class DocumentProcessor: + _last_id_lock = Lock() + _last_processed_id = None + + def __init__(self, collection, size_threshold, batch_counter, batch_size): + self.collection = collection + self.size_threshold = size_threshold + self.batch_counter = batch_counter + self.batch_size = batch_size + + def get_next_batch(self): + with DocumentProcessor._last_id_lock: + query = {"_id": {"$gt": DocumentProcessor._last_processed_id}} if DocumentProcessor._last_processed_id else {} + + docs = list(self.collection.find( + query, + {"_id": 1}, + limit=self.batch_size, + sort=[("_id", 1)] + ).hint("_id_")) + + if not docs: + return None, None + + start_id = docs[0]["_id"] + end_id = docs[-1]["_id"] + DocumentProcessor._last_processed_id = end_id + + return start_id, end_id + + def process_batch(self, batch_cursor): + while True: + if shutdown_flag: + break + + start_id, end_id = self.get_next_batch() + if not start_id: + break + + batch_large_docs = [] + doc_count = 0 + + try: + query = { + "_id": { + "$gte": start_id, + "$lte": end_id + } + } + + batch_cursor = self.collection.find(query).hint("_id_") + + for doc in batch_cursor: + if shutdown_flag: + break + + doc_count += 1 + doc_id = doc["_id"] + + try: + size = get_bson_size(doc) + + if size > self.size_threshold: + batch_large_docs.append((doc_id, size)) + + except Exception as e: + if not shutdown_flag: + logger.error(f"Error processing document {doc_id}: {str(e)}") + + doc = None + + if doc_count > 0 or batch_large_docs: + self.batch_counter.increment(doc_count, len(batch_large_docs)) + yield True, batch_large_docs + + except Exception as e: + logger.error(f"Error processing batch: {str(e)}") + yield False, [] + + batch_cursor = None + + 
+def signal_handler(signum, frame): + global shutdown_flag + print('\nShutdown requested. Cleaning up...', flush=True) + shutdown_flag = True + +def validate_config(config): + # Validate provided parameters + if not config.get('uri'): + raise ValueError("URI cannot be empty") + + # Validate URI format + try: + uri = config['uri'] + if not uri.startswith(('mongodb://', 'mongodb+srv://')): + raise ValueError("URI must start with 'mongodb://' or 'mongodb+srv://'") + + if '@' in uri: + host_part = uri.split('@')[1].split('/')[0] + else: + host_part = uri.split('//')[1].split('/')[0] + + if not host_part: + raise ValueError("Invalid URI: missing host") + + except IndexError: + raise ValueError("Invalid URI format") + + if not isinstance(config['batchSize'], int) or config['batchSize'] <= 0: + raise ValueError("Batch size must be a positive integer") + + if not isinstance(config['numProcesses'], int) or config['numProcesses'] <= 0: + raise ValueError("Number of processes must be a positive integer") + + if not isinstance(config['largeDocThreshold'], int) or config['largeDocThreshold'] <= 0: + raise ValueError("Large document threshold must be a positive integer") + + # Check for unreasonable memory requirements + if config['batchSize'] * config['numProcesses'] > 1000000: + logger.warning("High memory usage configuration detected") + + # Validate string parameters + if not config.get('databaseName'): + raise ValueError("Database name cannot be empty") + + if not config.get('collectionName'): + raise ValueError("Collection name cannot be empty") + +def create_id_ranges(batch_size, collection, total_docs): + if total_docs == 0: + return [] + + # Just getting min_id + pipeline = [ + {"$project": {"_id": 1}}, + {"$sort": {"_id": 1}}, + {"$limit": 1} + ] + + result = list(collection.aggregate(pipeline, hint="_id_")) + if not result: + return [] + + # Each thread will use this as a starting point to create their own range + return [(result[0]["_id"], None)] + +def 
write_to_csv(filename, data, mode='a', batch_size=1000): + try: + with open(filename, mode, newline='', encoding='utf-8') as csvfile: + csv_writer = csv.writer(csvfile) + + if isinstance(data, list): + csv_writer.writerows(data) + else: + csv_writer.writerow(data) + + except IOError as e: + logger.error(f"Error writing to CSV file: {str(e)}") + raise + except Exception as e: + logger.error(f"Unexpected error writing to CSV: {str(e)}") + raise + +def get_bson_size(document): + try: + return len(bson.BSON.encode(document)) + except Exception as e: + logger.error(f"Error calculating BSON size: {str(e)}") + return None + +def get_collection_doc_count(collection): + try: + stats = collection.database.command('collStats', collection.name) + return stats['count'] + except Exception as e: + logger.error(f"Error getting document count from stats: {str(e)}") + raise + +def process_future_results(future, large_docs_data): + large_docs_count = 0 + try: + for success, batch_large_docs in future.result(): + if batch_large_docs: + large_docs_count += len(batch_large_docs) + for doc_id, size in batch_large_docs: + large_docs_data.append((str(doc_id), size, f"{size / (1024*1024):.2f}")) + except Exception as e: + logger.error(f"Error processing future results: {str(e)}") + return large_docs_count + +def main(): + signal.signal(signal.SIGINT, signal_handler) + signal.signal(signal.SIGTERM, signal_handler) + + start_time = time.time() + start_time_iso = datetime.fromtimestamp(start_time).isoformat() + + parser = argparse.ArgumentParser(description='Large Document Finder') + parser.add_argument('--uri', required=True, type=str, help='URI (connection string)') + parser.add_argument('--processes', required=True, type=int, help='Number of threads') + parser.add_argument('--batch-size', required=True, type=int, help='Number of documents per batch') + parser.add_argument('--database', required=True, type=str, help='Database name') + parser.add_argument('--collection', required=True, 
type=str, help='Collection name') + parser.add_argument('--csv', default='large_doc_', type=str, help='Prefix for the CSV output filename') + parser.add_argument('--large-doc-size', type=int, default=8388608, help='Large document size threshold in bytes (default 8388608 - 8MB)') + + args = parser.parse_args() + + appConfig = { + 'uri': args.uri, + 'numProcesses': int(args.processes), + 'batchSize': int(args.batch_size), + 'databaseName': args.database, + 'collectionName': args.collection, + 'csvName': args.csv, + 'largeDocThreshold': int(args.large_doc_size) + } + + try: + validate_config(appConfig) + except ValueError as e: + logger.error(f"Configuration error: {str(e)}") + sys.exit(1) + + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + csv_filename = f"{appConfig['csvName']}{timestamp}.csv" + large_docs_found = 0 + large_docs_data = [] + + # defaults so the summary in the finally block cannot hit a NameError if the scan fails early + total_docs = 0 + batch_size = appConfig['batchSize'] + + logger.info("Connecting to database...") + + try: + client = pymongo.MongoClient(appConfig['uri']) + client.admin.command('ping') + db = client[appConfig['databaseName']] + col = db[appConfig['collectionName']] + + total_docs = get_collection_doc_count(col) + logger.info(f"Total Documents: {total_docs:,}") + + batch_size = appConfig['batchSize'] + num_batches = (total_docs + batch_size - 1) // batch_size + + logger.info(f"Creating {num_batches:,} segments with {batch_size:,} docs per segment") + + logger.info('Starting document scan...') + + metadata = [ + ['Database', appConfig['databaseName']], + ['Collection', appConfig['collectionName']], + ['Batch size', batch_size], + ['Number of threads', appConfig['numProcesses']], + ['Total documents', total_docs], + ['Large document threshold (bytes)', appConfig['largeDocThreshold']], + ['Large document threshold (MB)', f"{appConfig['largeDocThreshold'] / (1024*1024):.2f}"], + ['Scan Start Time', datetime.now().isoformat()], + ] + + batch_counter = BatchCounter(total_docs) + processor = DocumentProcessor(col, appConfig['largeDocThreshold'], batch_counter=batch_counter, 
batch_size=batch_size) + + with concurrent.futures.ThreadPoolExecutor(max_workers=appConfig['numProcesses']) as executor: + futures = set() + large_docs_found = 0 + + for _ in range(appConfig['numProcesses']): + if shutdown_flag: + print('\nStopping batch submission...', flush=True) + break + + processor = DocumentProcessor(col, appConfig['largeDocThreshold'], + batch_counter=batch_counter, + batch_size=batch_size) + future = executor.submit(processor.process_batch, None) + futures.add(future) + + for future in concurrent.futures.as_completed(futures): + if shutdown_flag: + print('\nCancelling remaining tasks...', flush=True) + for f in futures: + if not f.done(): + f.cancel() + break + + large_docs_found += process_future_results(future, large_docs_data) + + end_time = time.time() + duration_str = time.strftime('%H:%M:%S', time.gmtime(end_time - start_time)) + + except pymongo.errors.PyMongoError as e: + logger.error(f"DocumentDB error: {str(e)}") + sys.exit(1) + except Exception as e: + logger.error(f"Main loop error: {str(e)}") + if 'client' in locals(): + client.close() + + finally: + if 'client' in locals(): + client.close() + print('\n\n', flush=True) + + end_time = time.time() + duration_str = time.strftime('%H:%M:%S', time.gmtime(end_time - start_time)) + + summary = { + 'large_docs_found': large_docs_found + } + + if not shutdown_flag: + threshold_mb = appConfig['largeDocThreshold'] / (1024 * 1024) + + try: + metadata = [ + ['Database', appConfig['databaseName']], + ['Collection', appConfig['collectionName']], + ['Batch size', batch_size], + ['Number of threads', appConfig['numProcesses']], + ['Total documents', total_docs], + ['Large document threshold (bytes)', appConfig['largeDocThreshold']], + ['Large document threshold (MB)', f"{appConfig['largeDocThreshold'] / (1024*1024):.2f}"], + ['Scan Start Time', start_time_iso], + ['Scan completion time', datetime.fromtimestamp(end_time).isoformat()], + ['Total scan time', duration_str], + ['Large documents 
found', large_docs_found], + [], + ['Document _id', 'Size (bytes)', 'Size (MB)'] + ] + + write_to_csv(csv_filename, metadata, mode='w') + + if large_docs_data: + write_to_csv(csv_filename, large_docs_data, mode='a') + + except Exception as e: + logger.error(f"Failed to write to CSV: {str(e)}") + + print("=" * 80) + print("Scan complete") + print("=" * 80) + print(f"Total documents processed: {total_docs:,}") + print(f"Documents larger than {threshold_mb:.0f}MB: {large_docs_found:,}") + print(f"Total scan time: {duration_str}") + if large_docs_found > 0: + print(f"Large documents have been written to: {csv_filename}") + print("=" * 80) + else: + print("\nScript terminated by user") + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/operations/large-doc-finder/requirements.txt b/operations/large-doc-finder/requirements.txt new file mode 100644 index 0000000..8c7d698 --- /dev/null +++ b/operations/large-doc-finder/requirements.txt @@ -0,0 +1 @@ +pymongo \ No newline at end of file diff --git a/operations/server-certificate-check/.gitignore b/operations/server-certificate-check/.gitignore new file mode 100644 index 0000000..397b4a7 --- /dev/null +++ b/operations/server-certificate-check/.gitignore @@ -0,0 +1 @@ +*.log diff --git a/operations/server-certificate-check/README.md b/operations/server-certificate-check/README.md new file mode 100644 index 0000000..a8f7169 --- /dev/null +++ b/operations/server-certificate-check/README.md @@ -0,0 +1,45 @@ +# Amazon DocumentDB Server Certificate Check +The server certificate check returns a list of all instances in a region, including the expiration of each server's certificate and its maintenance window. + +## Features +- Output may be filtered using case-insensitive matching on cluster name and/or instance name. 
+ +## Requirements +- Python 3.7 or greater, boto3, urllib3 +- IAM privileges in https://github.com/awslabs/amazon-documentdb-tools/blob/master/operations/server-certificate-check/iam-policy.json + +## Installation +Clone the repository and install the requirements: + +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/operations/server-certificate-check +python3 -m pip install -r requirements.txt +``` + +## Usage/Examples +The utility accepts the following arguments: + +``` +--region AWS region for scan +--log-file-name Name of log file to capture all output +--cluster-filter [optional] Case-insensitive string to use for filtering clusters to include in output +--instance-filter [optional] Case-insensitive string to use for filtering instances to include in output + +``` + +### Report all Amazon DocumentDB instances in us-east-1 +``` +python3 server-certificate-check.py --log-file-name certs.log --region us-east-1 +``` + +### Report all Amazon DocumentDB instances in us-east-1 containing "ddb5" in instance name +``` +python3 server-certificate-check.py --log-file-name certs.log --region us-east-1 --instance-filter ddb5 +``` + +## License +[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) + +## Contributing +Contributions are always welcome! See the [contributing](https://github.com/awslabs/amazon-documentdb-tools/blob/master/CONTRIBUTING.md) page for ways to get involved. 
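The `--cluster-filter` and `--instance-filter` flags use simple case-insensitive substring matching against the cluster and instance identifiers. A minimal, self-contained sketch of that logic (the names below are illustrative, not from a real deployment):

```python
def name_matches(name, substring_filter):
    # A filter of None means "include everything"; otherwise perform a
    # case-insensitive substring match, mirroring how the tool compares
    # its filter arguments against cluster/instance identifiers.
    return substring_filter is None or substring_filter.upper() in name.upper()

instances = ["ddb5-instance-1", "prod-cluster-2", "DDB5-replica"]
print([n for n in instances if name_matches(n, "ddb5")])
# ['ddb5-instance-1', 'DDB5-replica']
```

Because the match is a plain substring test, a filter like `ddb5` matches identifiers containing it anywhere, in any letter case.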
diff --git a/operations/server-certificate-check/iam-policy.json b/operations/server-certificate-check/iam-policy.json new file mode 100644 index 0000000..4aaed6f --- /dev/null +++ b/operations/server-certificate-check/iam-policy.json @@ -0,0 +1,15 @@ +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "rds:DescribeDBClusters", + "rds:DescribeDBInstances" + ], + "Resource": [ + "arn:aws:rds:{region}:{account}:*:*" + ] + } + ] +} diff --git a/operations/server-certificate-check/requirements.txt b/operations/server-certificate-check/requirements.txt new file mode 100644 index 0000000..b60f0db --- /dev/null +++ b/operations/server-certificate-check/requirements.txt @@ -0,0 +1,4 @@ +botocore>=1.33.6 +boto3>=1.33.6 +requests +urllib3<2 diff --git a/operations/server-certificate-check/server-certificate-check.py b/operations/server-certificate-check/server-certificate-check.py new file mode 100644 index 0000000..17177a9 --- /dev/null +++ b/operations/server-certificate-check/server-certificate-check.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python3 + +import boto3 +import datetime +import argparse +import requests +import json +import sys +import os + + +def deleteLog(appConfig): + if os.path.exists(appConfig['logFileName']): + os.remove(appConfig['logFileName']) + + +def printLog(thisMessage,appConfig): + print("{}".format(thisMessage)) + with open(appConfig['logFileName'], 'a') as fp: + fp.write("{}\n".format(thisMessage)) + + +def scan_clusters(appConfig): + client = boto3.client('docdb',region_name=appConfig['region']) + + response = client.describe_db_clusters(Filters=[{'Name': 'engine','Values': ['docdb']}]) + + printLog("{:<30} | {:<30} | {:<25} | {:<20}".format("cluster-name","instance-name","server-cert-expire","server-maint-window"),appConfig) + + for thisCluster in response['DBClusters']: + thisClusterName = thisCluster['DBClusterIdentifier'] + if appConfig['clusterFilter'] is None or appConfig['clusterFilter'].upper() in 
thisClusterName.upper(): + for thisInstance in thisCluster['DBClusterMembers']: + thisInstanceName = thisInstance['DBInstanceIdentifier'] + if appConfig['instanceFilter'] is None or appConfig['instanceFilter'].upper() in thisInstanceName.upper(): + responseInstance = client.describe_db_instances(DBInstanceIdentifier=thisInstanceName) + validTill = responseInstance['DBInstances'][0]['CertificateDetails']['ValidTill'] + preferredMaintenanceWindow = responseInstance['DBInstances'][0]['PreferredMaintenanceWindow'] + printLog("{:<30} | {:<30} | {} | {:<20}".format(thisClusterName,thisInstanceName,validTill,preferredMaintenanceWindow),appConfig) + + client.close() + + +def main(): + parser = argparse.ArgumentParser(description='DocumentDB Server Certificate Checker') + + parser.add_argument('--region',required=True,type=str,help='AWS Region') + parser.add_argument('--cluster-filter',required=False,type=str,help='Cluster name filter (substring match)') + parser.add_argument('--instance-filter',required=False,type=str,help='Instance name filter (substring match)') + parser.add_argument('--log-file-name',required=True,type=str,help='Log file name') + + args = parser.parse_args() + + appConfig = {} + appConfig['region'] = args.region + appConfig['logFileName'] = args.log_file_name + appConfig['clusterFilter'] = args.cluster_filter + appConfig['instanceFilter'] = args.instance_filter + + deleteLog(appConfig) + scan_clusters(appConfig) + + print("") + print("Created {} with results".format(appConfig['logFileName'])) + print("") + + +if __name__ == "__main__": + main() diff --git a/performance/README.md b/performance/README.md new file mode 100644 index 0000000..141b009 --- /dev/null +++ b/performance/README.md @@ -0,0 +1,9 @@ +# Amazon DocumentDB Performance Tools + +* [compression-review](./compression-review) - calculate compressibility of each collection by sampling random documents. 
+* [deployment-scanner](./deployment-scanner) - scan all clusters in one account/region and provide potential cost savings suggestions. +* [index-cardinality-detection](./index-cardinality-detection) - sample random documents to estimate index cardinality and selectivity. +* [index-review](./index-review) - provide analysis of all collections and indexes including sizing, index usage, and redundant indexes. +* [metric-collector](./metric-collector) - collect and summarize major cluster and instance level metrics in a single view. +* [metric-analyzer](./metric-analyzer) - process the output of metric-collector for size, cost, and performance recommendations. +* [documentdb-top-operations-report](./documentdb-top-operations-report) - analyze DocumentDB profiler logs to identify top operations and performance bottlenecks. \ No newline at end of file diff --git a/performance/compression-review/.gitignore b/performance/compression-review/.gitignore new file mode 100644 index 0000000..85d1acc --- /dev/null +++ b/performance/compression-review/.gitignore @@ -0,0 +1 @@ +doit*.bash diff --git a/performance/compression-review/README.md b/performance/compression-review/README.md index ec45e64..0f50216 100644 --- a/performance/compression-review/README.md +++ b/performance/compression-review/README.md @@ -12,13 +12,17 @@ The compression review tool samples 1000 documents in each collection to determi - If not installed - "$ pip3 install pymongo" - lz4 Python package - If not installed - "$ pip3 install lz4" + - zstandard Python package + - If not installed - "$ pip3 install zstandard" ## Using the Compression Review Tool `python3 compression-review.py --uri --server-alias ` +- Default compressions tested are lz4/fast/level1 and zstandard/level3/4K/Dictionary +- To test other compression techniques provide --compressor \ - Run on any instance in the replica set - Use a different \ for each server analyzed, output file is named using \ as the starting portion -- Creates a single 
CSV file per execution +- Creates a single CSV file per execution (so default creates two) - The \ options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/ - If your URI contains ampersand (&) characters they must be escaped with a backslash, or enclose your URI in double quotes - For DocumentDB use either the cluster endpoint or any of the instance endpoints diff --git a/performance/compression-review/compression-review.py b/performance/compression-review/compression-review.py index 6555fc9..5167803f 100644 --- a/performance/compression-review/compression-review.py +++ b/performance/compression-review/compression-review.py @@ -1,28 +1,64 @@ import argparse -from datetime import datetime, timedelta +import datetime as dt import sys import json import pymongo import time -import os -import lz4.frame +import lz4.block +import bz2 +import lzma +import zstandard as zstd +import zlib + + +def createDictionary(appConfig, databaseName, collectionName, client): + dictionarySampleSize = appConfig['dictionarySampleSize'] + dictionarySize = appConfig['dictionarySize'] + + col = client[databaseName][collectionName] + + print("creating dictionary for {}.{} of {:d} bytes using {:d} samples".format(databaseName,collectionName,dictionarySize,dictionarySampleSize)) + dictTrainingDocs = [] + dictSampleDocs = col.aggregate([{"$sample":{"size":dictionarySampleSize}}]) + for thisDoc in dictSampleDocs: + docAsString = json.dumps(thisDoc,default=str) + docAsBytes = str.encode(docAsString) + dictTrainingDocs.append(docAsBytes) + dict_data = zstd.train_dictionary(dictionarySize,dictTrainingDocs) + + return dict_data def getData(appConfig): print('connecting to server') - client = pymongo.MongoClient(appConfig['uri']) + client = pymongo.MongoClient(host=appConfig['uri'],appname='comprevw') + + compressor = appConfig['compressor'] sampleSize = appConfig['sampleSize'] # log output to file - logTimeStamp = datetime.utcnow().strftime('%Y%m%d%H%M%S') - 
logFileName = "{}-{}-compression-review.csv".format(appConfig['serverAlias'],logTimeStamp) + logTimeStamp = dt.datetime.now(dt.timezone.utc).strftime('%Y%m%d%H%M%S') + logFileName = "{}-{}-{}-compression-review.csv".format(appConfig['serverAlias'],compressor,logTimeStamp) logFileHandle = open(logFileName, "w") - logFileHandle.write("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}\n".format('dbName','collName','numDocs','avgDocSize','sizeGB','storageGB','compRatio','minSample','maxSample','avgSample','minLz4','maxLz4','avgLz4','lz4Ratio','exceptions')) + # output miscellaneous parameters to csv + logFileHandle.write("{},{},{},{}\n".format('compressor','docsSampled','dictDocsSampled','dictBytes')) + logFileHandle.write("{},{:d},{:d},{:d}\n".format(compressor,sampleSize,appConfig['dictionarySampleSize'],appConfig['dictionarySize'])) + logFileHandle.write("\n") + + # output header to csv + logFileHandle.write("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}\n".format('dbName','collName','numDocs','avgDocSize','sizeGB','storageGB','existingCompRatio','compEnabled','minSample','maxSample','avgSample','minComp','maxComp','avgComp','projectedCompRatio','exceptions','compTime(ms)')) # get databases - filter out admin, config, local, and system - dbDict = client.admin.command("listDatabases",nameOnly=True,filter={"name":{"$nin":['admin','config','local','system']}})['databases'] + dbSkipList = ['admin','config','local','system'] + try: + dbDict = client.admin.command("listDatabases",nameOnly=True,filter={"name":{"$nin":dbSkipList}})['databases'] + except pymongo.errors.OperationFailure: + dbDict = client.admin.command("listDatabases",nameOnly=True)['databases'] for thisDb in dbDict: + # workaround for EC + if thisDb['name'] in dbSkipList: + continue thisDbName = thisDb['name'] collCursor = client[thisDbName].list_collections() for thisColl in collCursor: @@ -38,12 +74,27 @@ def getData(appConfig): print("analyzing collection {}.{}".format(thisDbName,thisCollName)) collStats = 
client[thisDbName].command("collStats",thisCollName) + if collStats['count'] == 0: + # exclude collections with no documents + continue + collectionCompressionRatio = collStats['size'] / collStats['storageSize'] gbDivisor = 1024*1024*1024 collectionCount = collStats['count'] collectionAvgObjSize = int(collStats.get('avgObjSize',0)) collectionSizeGB = collStats['size']/gbDivisor collectionStorageSizeGB = collStats['storageSize']/gbDivisor + # check if compression is enabled + compressionInfo = collStats.get('compression',{'enable':False,'threshold':-1}) + compressionEnabled = compressionInfo.get('enable',False) + compressionThreshold = compressionInfo.get('threshold',0) + if compressionEnabled: + #compressionEnabledString = 'Y' + compCsvString = "{}/{}".format('Y',compressionThreshold) + else: + #compressionEnabledString = 'N' + compCsvString = "" + numExceptions = 0 minDocBytes = 999999999 @@ -53,12 +104,33 @@ def getData(appConfig): minLz4Bytes = 999999999 maxLz4Bytes = 0 totLz4Bytes = 0 + totTimeMs = 0 + totTimeNs = 0 + + # build the dictionary if needed (and there are enough documents) + if compressor in ['lz4-fast-dict','lz4-high-dict','zstd-1-dict','zstd-3-dict','zstd-5-dict']: + if collectionCount >= 100: + zstdDict = createDictionary(appConfig, thisDbName, thisCollName, client) + + # instantiate the compressor for zstandard (it doesn't support 1-shot compress) + if compressor == 'zstd-1' or (compressor == 'zstd-1-dict' and collectionCount < 100): + zstdCompressor = zstd.ZstdCompressor(level=1,dict_data=None) + elif compressor == 'zstd-3' or (compressor == 'zstd-3-dict' and collectionCount < 100): + zstdCompressor = zstd.ZstdCompressor(level=3,dict_data=None) + elif compressor == 'zstd-5' or (compressor == 'zstd-5-dict' and collectionCount < 100): + zstdCompressor = zstd.ZstdCompressor(level=5,dict_data=None) + elif compressor == 'zstd-1-dict': + zstdCompressor = zstd.ZstdCompressor(level=1,dict_data=zstdDict) + elif compressor == 'zstd-3-dict': + 
zstdCompressor = zstd.ZstdCompressor(level=3,dict_data=zstdDict) + elif compressor == 'zstd-5-dict': + zstdCompressor = zstd.ZstdCompressor(level=5,dict_data=zstdDict) try: sampleDocs = client[thisDbName][thisCollName].aggregate([{"$sample":{"size":sampleSize}}]) for thisDoc in sampleDocs: totDocs += 1 - docAsString = str(thisDoc) + docAsString = json.dumps(thisDoc,default=str) docBytes = len(docAsString) totDocBytes += docBytes if (docBytes < minDocBytes): @@ -66,16 +138,42 @@ def getData(appConfig): if (docBytes > maxDocBytes): maxDocBytes = docBytes + startTimeMs = time.time() + startTimeNs = time.time_ns() + # compress it - compressed = lz4.frame.compress(docAsString.encode()) + if compressor == 'lz4-fast' or (compressor == 'lz4-fast-dict' and collectionCount < 100): + compressed = lz4.block.compress(docAsString.encode(),mode='fast',acceleration=1) + elif compressor == 'lz4-high' or (compressor == 'lz4-high-dict' and collectionCount < 100): + compressed = lz4.block.compress(docAsString.encode(),mode='high_compression',compression=1) + elif compressor == 'lz4-fast-dict': + compressed = lz4.block.compress(docAsString.encode(),mode='fast',acceleration=1,dict=zstdDict.as_bytes()) + elif compressor == 'lz4-high-dict': + compressed = lz4.block.compress(docAsString.encode(),mode='high_compression',compression=1,dict=zstdDict.as_bytes()) + elif compressor in ['zstd-1','zstd-3','zstd-5','zstd-1-dict','zstd-3-dict','zstd-5-dict']: + compressed = zstdCompressor.compress(docAsString.encode()) + elif compressor == 'bz2-1': + compressed = bz2.compress(docAsString.encode(),compresslevel=1) + elif compressor == 'lzma-0': + compressed = lzma.compress(docAsString.encode(),format=lzma.FORMAT_XZ,preset=0) + elif compressor == 'zlib-1': + compressed = zlib.compress(docAsString.encode(),level=1) + else: + print('Unknown compressor | {}'.format(compressor)) + sys.exit(1) + + totTimeMs += time.time() - startTimeMs + totTimeNs += time.time_ns() - startTimeNs + lz4Bytes = 
len(compressed) totLz4Bytes += lz4Bytes if (lz4Bytes < minLz4Bytes): minLz4Bytes = lz4Bytes if (lz4Bytes > maxLz4Bytes): maxLz4Bytes = lz4Bytes + except: - numExceptions = 0 + numExceptions += 1 if (totDocs == 0): avgDocBytes = 0 @@ -90,8 +188,8 @@ def getData(appConfig): avgLz4Bytes = int(totLz4Bytes / totDocs) lz4Ratio = collectionAvgObjSize / avgLz4Bytes - logFileHandle.write("{},{},{:d},{:d},{:.4f},{:.4f},{:.4f},{:d},{:d},{:d},{:d},{:d},{:d},{:.4f},{:d}\n".format(thisDb['name'],thisColl['name'],collectionCount, - collectionAvgObjSize,collectionSizeGB,collectionStorageSizeGB,collectionCompressionRatio,minDocBytes,maxDocBytes,avgDocBytes,minLz4Bytes,maxLz4Bytes,avgLz4Bytes,lz4Ratio,numExceptions)) + logFileHandle.write("{},{},{:d},{:d},{:.4f},{:.4f},{:.4f},{},{:d},{:d},{:d},{:d},{:d},{:d},{:.4f},{:d},{:.4f}\n".format(thisDb['name'],thisColl['name'],collectionCount, + collectionAvgObjSize,collectionSizeGB,collectionStorageSizeGB,collectionCompressionRatio,compCsvString,minDocBytes,maxDocBytes,avgDocBytes,minLz4Bytes,maxLz4Bytes,avgLz4Bytes,lz4Ratio,numExceptions,totTimeNs/1000000)) logFileHandle.close() @@ -122,6 +220,24 @@ def main(): default=1000, help='Number of documents to sample in each collection, default 1000') + parser.add_argument('--compressor', + required=False, + choices=['lz4-fast','lz4-high','lz4-fast-dict','lz4-high-dict','zstd-1','zstd-3','zstd-5','zstd-1-dict','zstd-3-dict','zstd-5-dict','bz2-1','lzma-0','zlib-1'], + type=str, + help='Compressor') + + parser.add_argument('--dictionary-sample-size', + required=False, + type=int, + default=100, + help='Number of documents to sample for dictionary creation') + + parser.add_argument('--dictionary-size', + required=False, + type=int, + default=4096, + help='Size of dictionary (bytes)') + args = parser.parse_args() # check for minimum Python version @@ -133,8 +249,19 @@ def main(): appConfig['uri'] = args.uri appConfig['serverAlias'] = args.server_alias appConfig['sampleSize'] = int(args.sample_size) - 
- getData(appConfig) + appConfig['compressor'] = args.compressor + appConfig['dictionarySampleSize'] = int(args.dictionary_sample_size) + appConfig['dictionarySize'] = int(args.dictionary_size) + + supportedCompressors=['lz4-fast','zstd-3-dict'] + + if appConfig['compressor'] is None: + # execute for each supported compression algorithm + for thisCompressor in supportedCompressors: + appConfig['compressor'] = thisCompressor + getData(appConfig) + else: + getData(appConfig) if __name__ == "__main__": diff --git a/performance/compression-review/requirements.txt b/performance/compression-review/requirements.txt index 62bd22b..220379f 100644 --- a/performance/compression-review/requirements.txt +++ b/performance/compression-review/requirements.txt @@ -1,2 +1,3 @@ pymongo lz4 +zstandard diff --git a/performance/deployment-scanner/README.md b/performance/deployment-scanner/README.md new file mode 100644 index 0000000..467dc72 --- /dev/null +++ b/performance/deployment-scanner/README.md @@ -0,0 +1,42 @@ +# Amazon DocumentDB Deployment Scanner +The deployment scanner reviews DocumentDB clusters for potential cost optimizations based on historical resource utilization. 
+ +## Features +- Estimate the monthly cost for each cluster in a region in both standard storage and IO optimized storage configurations + +## Requirements +- Python 3.7 or greater, boto3, urllib3 +- IAM privileges in https://github.com/awslabs/amazon-documentdb-tools/blob/master/performance/deployment-scanner/iam-policy.json + +## Installation +Clone the repository and install the requirements: + +``` +git clone https://github.com/awslabs/amazon-documentdb-tools.git +cd amazon-documentdb-tools/performance/deployment-scanner +python3 -m pip install -r requirements.txt +``` + +## Usage/Examples
The deployment scanner accepts the following arguments: + +``` +--region AWS region for scan +--log-file-name Name of file to write CSV data to +--start-date [optional] Starting date in YYYYMMDD for historical review of cluster resource usage +--end-date [optional] Ending date in YYYYMMDD for historical review of cluster resource usage + +If --start-date and --end-date are not provided, the last 30 days are used for historical cluster resource usage. +``` + +### Review Amazon DocumentDB clusters in us-east-1 for November 2023: +``` +python3 deployment-scanner.py --log-file-name nov-23-us-east-1 --start-date 20231101 --end-date 20231130 --region us-east-1 +``` + + +## License +[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) + +## Contributing +Contributions are always welcome! See the [contributing](https://github.com/awslabs/amazon-documentdb-tools/blob/master/CONTRIBUTING.md) page for ways to get involved. 
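Under the hood, the scanner's per-cluster estimate reduces to compute hours plus IO, storage, and backup charges at the per-unit rates pulled from the AWS Price List. A simplified, self-contained sketch of that arithmetic (every rate below is hypothetical and for illustration only, not an actual DocumentDB price):

```python
def estimate_monthly_cost(hourly_instance_rate, num_instances,
                          monthly_ios, per_io_rate,
                          storage_gb, per_gb_storage_rate,
                          backup_gb, per_gb_backup_rate):
    # Mirrors the scanner's approach: 30 days x 24 hours of compute per
    # instance, plus per-IO, per-GB storage, and per-GB backup charges.
    compute = hourly_instance_rate * num_instances * 30 * 24
    io = monthly_ios * per_io_rate
    storage = storage_gb * per_gb_storage_rate
    backup = backup_gb * per_gb_backup_rate
    return compute + io + storage + backup

# Hypothetical rates comparing a standard profile (charged per IO) with an
# IO-optimized profile (higher compute/storage rates, no IO charge).
standard = estimate_monthly_cost(1.00, 3, 500_000_000, 0.0000002, 1000, 0.10, 1000, 0.02)
optimized = estimate_monthly_cost(1.25, 3, 500_000_000, 0.0, 1000, 0.30, 1000, 0.02)
print(f"standard=${standard:,.0f} io-optimized=${optimized:,.0f}")
```

With numbers like these, the recommendation logic simply picks whichever profile yields the lower total and reports the difference as potential savings.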
diff --git a/performance/deployment-scanner/deployment-scanner-debug.py b/performance/deployment-scanner/deployment-scanner-debug.py new file mode 100644 index 0000000..63ee312 --- /dev/null +++ b/performance/deployment-scanner/deployment-scanner-debug.py @@ -0,0 +1,43 @@ +#!/usr/bin/env python3 + +import boto3 +import datetime +import argparse +import requests +import json +import sys +import os + + +def get_docdb_instance_based_clusters(appConfig): + client = boto3.client('docdb',region_name=appConfig['region']) + + response = client.describe_db_clusters(Filters=[{'Name': 'engine','Values': ['docdb']}]) + + for thisCluster in response['DBClusters']: + if thisCluster['DBClusterIdentifier'] == appConfig['clusterName']: + if 'StorageType' in thisCluster: + print("StorageType is {}".format(thisCluster['StorageType'])) + else: + print("StorageType not present in describe_db_clusters() output") + + client.close() + + +def main(): + parser = argparse.ArgumentParser(description='DocumentDB Deployment Scanner') + + parser.add_argument('--region',required=True,type=str,help='AWS Region') + parser.add_argument('--cluster-name',required=True,type=str,help='name of the cluster') + + args = parser.parse_args() + + appConfig = {} + appConfig['region'] = args.region + appConfig['clusterName'] = args.cluster_name + + clusterList = get_docdb_instance_based_clusters(appConfig) + + +if __name__ == "__main__": + main() diff --git a/performance/deployment-scanner/deployment-scanner.py b/performance/deployment-scanner/deployment-scanner.py new file mode 100644 index 0000000..2fcd1c4 --- /dev/null +++ b/performance/deployment-scanner/deployment-scanner.py @@ -0,0 +1,383 @@ +#!/usr/bin/env python3 + +import boto3 +import datetime +import argparse +import requests +import json +import sys +import os + + +def deleteLog(appConfig): + if os.path.exists(appConfig['logFileName']): + os.remove(appConfig['logFileName']) + + +def printLog(thisMessage,appConfig): + with 
open(appConfig['logFileName'], 'a') as fp: + fp.write("{}\n".format(thisMessage)) + + +def get_cw_metric_daily_average(appConfig, cwClient, cwMetric, cwMath, cwCluster): + namespace = "AWS/DocDB" + metric = cwMetric + period = 86400 # Seconds in a day + dimensions = [{"Name":"DBClusterIdentifier","Value":cwCluster}] + + startTime = appConfig['startTime'] + endTime = appConfig['endTime'] + + response = cwClient.get_metric_statistics( + Namespace=namespace, + MetricName=metric, + StartTime=startTime, + EndTime=endTime, + Period=period, + Statistics=[cwMath], + Dimensions=dimensions, + ) + + metricValues = {} + + cwMetricTotal = 0 + cwMetricValues = 0 + cwMetricAverage = 0 + + for cw_metric in response['Datapoints']: + metricValues[cw_metric['Timestamp']] = cw_metric.get(cwMath) + cwMetricTotal += cw_metric.get(cwMath) + cwMetricValues += 1 + #if cwMetric == 'CPUSurplusCreditsCharged': + # print("{}".format(cw_metric)) + + if cwMetricValues == 0: + cwMetricAverage = int(0) + else: + cwMetricAverage = int(cwMetricTotal // cwMetricValues) + + return cwMetricAverage + + +def get_docdb_instance_based_clusters(appConfig, pricingDict): + gbBytes = 1000 * 1000 * 1000 + gibBytes = 1024 * 1024 * 1024 + + client = boto3.client('docdb',region_name=appConfig['region']) + cwClient = boto3.client('cloudwatch',region_name=appConfig['region']) + + printLog("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}".format('cluster','io-type','version','num-instances','standard-compute','standard-io','standard-storage','standard-backup','standard-total','io-optimized-compute','io-optimized-io','io-optimized-storage','io-optimized-backup','io-optimized-total','recommendation','estimated-potential-savings'),appConfig) + + #response = client.describe_db_clusters() + response = client.describe_db_clusters(Filters=[{'Name': 'engine','Values': ['docdb']}]) + + for thisCluster in response['DBClusters']: + monthlyStandard = 0.00 + monthlyIoOptimized = 0.00 + + ioType = 
thisCluster.get('StorageType','standard')
+        #print("{}".format(thisCluster.get('StorageType','missing')))
+        thisMonthlyStandardIoCompute = 0.00
+        thisMonthlyOptimizedIoCompute = 0.00
+        numInstances = 0
+        engineVersionFull = thisCluster['EngineVersion']
+        engineVersionMajor = int(engineVersionFull.split('.')[0])
+        clusterContainsServerless = False
+        for thisInstance in thisCluster['DBClusterMembers']:
+            # get instance type
+            responseInstance = client.describe_db_instances(DBInstanceIdentifier=thisInstance['DBInstanceIdentifier'])
+            numInstances += 1
+            dbInstanceClass = responseInstance['DBInstances'][0]['DBInstanceClass']
+            if dbInstanceClass == 'db.serverless':
+                clusterContainsServerless = True
+                continue
+
+            thisStandardIoCompute = round(float(pricingDict['compute|'+appConfig['region']+'|'+dbInstanceClass+'|standard']['price'])*30*24,0)
+            thisOptimizedIoCompute = round(float(pricingDict['compute|'+appConfig['region']+'|'+dbInstanceClass+'|iopt1']['price'])*30*24,0)
+            thisMonthlyStandardIoCompute += thisStandardIoCompute
+            thisMonthlyOptimizedIoCompute += thisOptimizedIoCompute
+
+        if clusterContainsServerless:
+            print("")
+            print("cluster = {} | contains one or more serverless instances, this utility does not support this instance type".format(thisCluster['DBClusterIdentifier']))
+            continue
+
+        print("")
+        print("cluster = {} | IO type = {} | version = {} | instances = {:d}".format(thisCluster['DBClusterIdentifier'],ioType,engineVersionFull,numInstances))
+        print(" ESTIMATED standard storage costs | ESTIMATED io optimized storage costs")
+
+        monthlyStandard += thisMonthlyStandardIoCompute
+        monthlyIoOptimized += thisMonthlyOptimizedIoCompute
+
+        # get historical cloudwatch information
+        avgReadIopsMonth = get_cw_metric_daily_average(appConfig, cwClient, 'VolumeReadIOPs', 'Sum', thisCluster['DBClusterIdentifier'])
+        avgWriteIopsMonth = get_cw_metric_daily_average(appConfig, cwClient, 'VolumeWriteIOPs', 'Sum', thisCluster['DBClusterIdentifier'])
+        totStorageBytes = get_cw_metric_daily_average(appConfig, cwClient, 'VolumeBytesUsed', 'Maximum', thisCluster['DBClusterIdentifier'])
+        totBackupStorageBilledBytes = get_cw_metric_daily_average(appConfig, cwClient, 'TotalBackupStorageBilled', 'Maximum', thisCluster['DBClusterIdentifier'])
+        totCPUCredits = get_cw_metric_daily_average(appConfig, cwClient, 'CPUSurplusCreditsCharged', 'Sum', thisCluster['DBClusterIdentifier'])
+
+        totIopsMonth = (avgReadIopsMonth * 30) + (avgWriteIopsMonth * 30)
+
+        # estimated CPU credits
+        thisCPUCreditCost = round(totCPUCredits * float(pricingDict['cpu-credits|'+appConfig['region']]['price']) / 60 * 30,0)
+        thisMonthlyStandardIoCompute += thisCPUCreditCost
+        thisMonthlyOptimizedIoCompute += thisCPUCreditCost
+
+        monthlyStandard += thisCPUCreditCost
+        monthlyIoOptimized += thisCPUCreditCost
+
+        # estimated io cost
+        thisStandardIopsCost = round(totIopsMonth * float(pricingDict['io|'+appConfig['region']+'|standard']['price']),0)
+        thisOptimizedIopsCost = round(totIopsMonth * float(pricingDict['io|'+appConfig['region']+'|iopt1']['price']),0)
+
+        monthlyStandard += thisStandardIopsCost
+        monthlyIoOptimized += thisOptimizedIopsCost
+
+        # estimated storage cost
+        thisStandardStorageCost = round(totStorageBytes * float(pricingDict['storage|'+appConfig['region']+'|standard']['price']) / gbBytes,0)
+        thisOptimizedStorageCost = round(totStorageBytes * float(pricingDict['storage|'+appConfig['region']+'|iopt1']['price']) / gbBytes,0)
+
+        monthlyStandard += thisStandardStorageCost
+        monthlyIoOptimized += thisOptimizedStorageCost
+
+        # estimated backup cost
+        thisBackupCost = round(totStorageBytes * float(pricingDict['storage-snapshot|'+appConfig['region']]['price']) / gbBytes,0)
+
+        monthlyStandard += thisBackupCost
+        monthlyIoOptimized += thisBackupCost
+
+        print(" compute = ${:10,.0f} | compute = ${:10,.0f}".format(thisMonthlyStandardIoCompute,thisMonthlyOptimizedIoCompute))
+        print(" io = ${:10,.0f} | io = ${:10,.0f}".format(thisStandardIopsCost,thisOptimizedIopsCost))
+        print(" storage = ${:10,.0f} | storage = ${:10,.0f}".format(thisStandardStorageCost,thisOptimizedStorageCost))
+        print(" backup storage = ${:10,.0f} | backup storage = ${:10,.0f}".format(thisBackupCost,thisBackupCost))
+        print(" ESTIMATED monthly total = ${:10,.0f} | Estimated monthly total = ${:10,.0f}".format(monthlyStandard,monthlyIoOptimized))
+
+        recommendationString = ""
+        estimatedMonthlySavings = 0.00
+        if (ioType == "standard") and (monthlyIoOptimized < monthlyStandard) and (engineVersionMajor < 5):
+            estimatedMonthlySavings = monthlyStandard-monthlyIoOptimized
+            recommendationString = " **** recommendation - consider switching to IO optimized to potentially save ${:.0f} per month but requires upgrading to DocumentDB v5+".format(estimatedMonthlySavings)
+            print("")
+            print(recommendationString)
+        elif (ioType == "standard") and (monthlyIoOptimized < monthlyStandard):
+            estimatedMonthlySavings = monthlyStandard-monthlyIoOptimized
+            recommendationString = " **** recommendation - consider switching to IO optimized to potentially save ${:.0f} per month".format(estimatedMonthlySavings)
+            print("")
+            print(recommendationString)
+        elif (ioType != "standard") and (monthlyStandard < monthlyIoOptimized):
+            estimatedMonthlySavings = monthlyIoOptimized-monthlyStandard
+            recommendationString = " **** recommendation - consider switching to standard IO to potentially save ${:.0f} per month".format(estimatedMonthlySavings)
+            print("")
+            print(recommendationString)
+        else:
+            estimatedMonthlySavings = 0.00
+            if (ioType == "standard"):
+                ioTypeText = "standard"
+            else:
+                ioTypeText = "io optimized"
+
+            recommendationString = " **** current {} storage configuration achieves the lowest possible price point".format(ioTypeText)
+            print("")
+            print(recommendationString)
+
+        printLog("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}".format(thisCluster['DBClusterIdentifier'],ioType,engineVersionFull,numInstances,
+            thisMonthlyStandardIoCompute,thisStandardIopsCost,thisStandardStorageCost,thisBackupCost,monthlyStandard,
+            thisMonthlyOptimizedIoCompute,thisOptimizedIopsCost,thisOptimizedStorageCost,thisBackupCost,monthlyIoOptimized,
+            recommendationString,estimatedMonthlySavings),appConfig)
+
+    client.close()
+    cwClient.close()
+
+
+def get_pricing(appConfig):
+    pd = {}
+
+    print("retrieving pricing...")
+    pricingUrl = 'https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonDocDB/current/index.json'
+    response = requests.get(pricingUrl)
+    pricingDict = json.loads(response.text)
+
+    # get the terms
+    terms = {}
+    for thisTermKey in pricingDict['terms']['OnDemand']:
+        for thisTerm in pricingDict['terms']['OnDemand'][thisTermKey].values():
+            # find the price
+            thisTermSku = thisTerm['sku']
+            thisTermPrice = list(thisTerm['priceDimensions'].values())[0]['pricePerUnit']['USD']
+            terms[thisTermSku] = thisTermPrice
+
+    # get the pricing
+    for thisProductKey in pricingDict['products']:
+        thisProduct = pricingDict['products'][thisProductKey]
+        if 'ExtendedSupport' in thisProduct['attributes'].get('usagetype','***MISSING***'):
+            # not yet
+            continue
+
+        elif 'productFamily' not in thisProduct and thisProduct['attributes']['group'] == 'Global Cluster I/O Operation':
+            # Global Cluster IO cost
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisIoType = 'standard'
+            thisPrice = terms[thisSku]
+            thisPricingDictKey = "{}|{}|{}".format('io',thisRegion,thisIoType)
+            pd[thisPricingDictKey] = {'type':'global-cluster-io','region':thisRegion,'ioType':thisIoType,'price':thisPrice}
+            # no charge for IO iopt1
+            thisIoType = 'iopt1'
+            thisPrice = 0.00
+            thisPricingDictKey = "{}|{}|{}".format('io',thisRegion,thisIoType)
+            pd[thisPricingDictKey] = {'type':'global-cluster-io','region':thisRegion,'ioType':thisIoType,'price':thisPrice}
+
+        elif 'productFamily' not in thisProduct:
+            print("*** missing productFamily *** | {}".format(thisProduct))
+            sys.exit(1)
+
+        elif (thisProduct["productFamily"] == "System Operation"):
+            # IO cost
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisIoType = 'standard'
+            thisPrice = terms[thisSku]
+            thisPricingDictKey = "{}|{}|{}".format('io',thisRegion,thisIoType)
+            pd[thisPricingDictKey] = {'type':'io','region':thisRegion,'ioType':thisIoType,'price':thisPrice}
+            # no charge for IO iopt1
+            thisIoType = 'iopt1'
+            thisPrice = 0.00
+            thisPricingDictKey = "{}|{}|{}".format('io',thisRegion,thisIoType)
+            pd[thisPricingDictKey] = {'type':'io','region':thisRegion,'ioType':thisIoType,'price':thisPrice}
+
+        elif (thisProduct["productFamily"] == "Database Instance"):
+            # Database Instance
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisInstanceType = thisProduct["attributes"]["instanceType"]
+            thisPrice = terms[thisSku]
+            if thisProduct["attributes"]["volumeType"] in ["IO-Optimized-DocDB","NVMe SSD IO-Optimized"]:
+                volumeType = 'iopt1'
+            elif thisProduct["attributes"]["volumeType"] in ["General Purpose","NVMe SSD"]:
+                volumeType = 'standard'
+            else:
+                print("*** Unknown volumeType {}, exiting".format(thisProduct["attributes"]["volumeType"]))
+                sys.exit(1)
+            thisPricingDictKey = "{}|{}|{}|{}".format('compute',thisRegion,thisInstanceType,volumeType)
+            pd[thisPricingDictKey] = {'type':'compute','region':thisRegion,'instanceType':thisInstanceType,'price':thisPrice,'volumeType':volumeType}
+
+        elif (thisProduct["productFamily"] == "Database Storage"):
+            # Database Storage
+            #   volumeType in ['General Purpose','IO-Optimized-DocDB']
+            #   skip elastic clusters storage
+            thisStorageUsage = thisProduct["attributes"].get('usagetype','UNKNOWN')
+            if (thisProduct["attributes"].get('volumeType','UNKNOWN') in ['General Purpose','IO-Optimized-DocDB','NVMe SSD','NVMe SSD IO-Optimized']) and ('StorageUsage' in thisStorageUsage) and ('Elastic' not in thisStorageUsage):
+                thisSku = thisProduct['sku']
+                thisRegion = thisProduct["attributes"]["regionCode"]
+                if thisProduct["attributes"]["volumeType"] in ["IO-Optimized-DocDB","NVMe SSD IO-Optimized"]:
+                    thisIoType = 'iopt1'
+                elif thisProduct["attributes"]["volumeType"] in ["General Purpose","NVMe SSD"]:
+                    thisIoType = 'standard'
+                thisPrice = terms[thisSku]
+                thisPricingDictKey = "{}|{}|{}".format('storage',thisRegion,thisIoType)
+                pd[thisPricingDictKey] = {'type':'storage','region':thisRegion,'ioType':thisIoType,'price':thisPrice}
+            elif thisProduct["attributes"].get('volumeType','UNKNOWN') not in ['General Purpose','IO-Optimized-DocDB','NVMe SSD','NVMe SSD IO-Optimized']:
+                print("*** Unknown volumeType {}, exiting".format(thisProduct["attributes"].get('volumeType','UNKNOWN')))
+                sys.exit(1)
+
+        elif (thisProduct["productFamily"] == "Storage Snapshot"):
+            # Storage Snapshot
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisPrice = terms[thisSku]
+            thisPricingDictKey = "{}|{}".format('storage-snapshot',thisRegion)
+            pd[thisPricingDictKey] = {'type':'storage-snapshot','region':thisRegion,'price':thisPrice}
+
+        elif (thisProduct["productFamily"] == "Database Utilization"):
+            # Database Utilization - EC vCPU pricing
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisPrice = terms[thisSku]
+            thisPricingDictKey = "{}|{}".format('ec-vcpu',thisRegion)
+            pd[thisPricingDictKey] = {'type':'ec-vcpu','region':thisRegion,'price':thisPrice}
+
+        elif (thisProduct["productFamily"] == "Serverless"):
+            # Serverless
+            thisSku = thisProduct['sku']
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            thisPrice = terms[thisSku]
+            if thisProduct["attributes"]["volume_optimization"] in ["IO-Optimized"]:
+                volumeType = 'iopt1'
+            elif thisProduct["attributes"]["volume_optimization"] in ["General Purpose"]:
+                volumeType = 'standard'
+            else:
+                print("*** Unknown volumeType {}, exiting".format(thisProduct["attributes"]["volume_optimization"]))
+                sys.exit(1)
+            thisPricingDictKey = "{}|{}|{}".format('dcu',thisRegion,volumeType)
+            pd[thisPricingDictKey] = {'type':'dcu','region':thisRegion,'price':thisPrice,'volumeType':volumeType}
+
+        elif (thisProduct["productFamily"] == "CPU Credits"):
+            # CPU Credits
+            #   using db.t4g.medium for all burstable [conserving cloudwatch calls - cluster only, minor price difference] but use t3g if that is all that is available
+            thisRegion = thisProduct["attributes"]["regionCode"]
+            if (thisProduct["attributes"]["instanceType"] == 'db.t3.medium' and 'cpu-credits|'+thisRegion not in pd):
+                thisSku = thisProduct['sku']
+                thisPrice = terms[thisSku]
+                thisInstanceType = thisProduct["attributes"]["instanceType"]
+                thisPricingDictKey = "{}|{}".format('cpu-credits',thisRegion)
+                pd[thisPricingDictKey] = {'type':'cpu-credits','region':thisRegion,'price':thisPrice,'instanceType':thisInstanceType}
+            elif (thisProduct["attributes"]["instanceType"] == 'db.t4g.medium'):
+                thisSku = thisProduct['sku']
+                thisPrice = terms[thisSku]
+                thisInstanceType = thisProduct["attributes"]["instanceType"]
+                thisPricingDictKey = "{}|{}".format('cpu-credits',thisRegion)
+                pd[thisPricingDictKey] = {'type':'cpu-credits','region':thisRegion,'price':thisPrice,'instanceType':thisInstanceType}
+
+        else:
+            print("UNKNOWN - {}".format(thisProduct))
+            sys.exit(1)
+
+    return pd
+
+
+def main():
+    parser = argparse.ArgumentParser(description='DocumentDB Deployment Scanner')
+
+    parser.add_argument('--region',required=True,type=str,help='AWS Region')
+    parser.add_argument('--start-date',required=False,type=str,help='Start date for historical usage calculations, format=YYYYMMDD')
+    parser.add_argument('--end-date',required=False,type=str,help='End date for historical usage calculations, format=YYYYMMDD')
+    parser.add_argument('--log-file-name',required=True,type=str,help='Log file for CSV output')
+
+    args = parser.parse_args()
+
+    if (args.start_date is not None and args.end_date is None):
+        print("Must provide --end-date when providing --start-date, exiting.")
+        sys.exit(1)
+
+    elif (args.start_date is None and args.end_date is not None):
+        print("Must provide --start-date when providing --end-date, exiting.")
+        sys.exit(1)
+
+    if (args.start_date is None) and (args.end_date is None):
+        # use last 30 days
+        startTime = (datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)).strftime("%Y-%m-%dT00:00:00")
+        endTime = (datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=0)).strftime("%Y-%m-%dT00:00:00")
+    else:
+        # use provided start/end dates
+        startTime = "{}-{}-{}T00:00:00".format(args.start_date[0:4],args.start_date[4:6],args.start_date[6:8])
+        endTime = "{}-{}-{}T00:00:00".format(args.end_date[0:4],args.end_date[4:6],args.end_date[6:8])
+
+    print("collecting CloudWatch data for {} to {}".format(startTime,endTime))
+
+    appConfig = {}
+    appConfig['region'] = args.region
+    appConfig['logFileName'] = args.log_file_name+'.csv'
+    appConfig['startTime'] = startTime
+    appConfig['endTime'] = endTime
+
+    deleteLog(appConfig)
+    pricingDict = get_pricing(appConfig)
+    clusterList = get_docdb_instance_based_clusters(appConfig,pricingDict)
+
+    print("")
+    print("Created {} with CSV data".format(appConfig['logFileName']))
+    print("")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/performance/deployment-scanner/iam-policy.json b/performance/deployment-scanner/iam-policy.json
new file mode 100644
index 0000000..c2e78fc
--- /dev/null
+++ b/performance/deployment-scanner/iam-policy.json
@@ -0,0 +1,22 @@
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "rds:DescribeDBClusters",
+                "rds:DescribeDBInstances"
+            ],
+            "Resource": [
+                "arn:aws:rds:{region}:{account}:*:*"
+            ]
+        },
+        {
+            "Effect": "Allow",
+            "Action": [
+                "cloudwatch:GetMetricStatistics"
+            ],
+            "Resource": "*"
+        }
+    ]
+}
\ No newline at end of file
diff --git a/performance/deployment-scanner/requirements.txt b/performance/deployment-scanner/requirements.txt
new file mode 100644
index 0000000..b60f0db
--- /dev/null
+++ b/performance/deployment-scanner/requirements.txt
@@ -0,0 +1,4 @@
+botocore>=1.33.6
+boto3>=1.33.6
+requests
+urllib3<2
diff --git a/performance/documentdb-top-operations-report/README.md b/performance/documentdb-top-operations-report/README.md
new file mode 100644
index 0000000..e447cc3
--- /dev/null
+++ b/performance/documentdb-top-operations-report/README.md
@@ -0,0 +1,147 @@
+# Amazon DocumentDB Top Operations Report
+A tool for analyzing DocumentDB profiler logs to identify top operations and performance bottlenecks. Features:
+- Analyze DocumentDB profiler logs for the slowest operations
+- Run as an AWS Lambda function for automated reports, or from the command line for ad-hoc analysis
+- Email reports via Amazon SES, or write CSV file output
+- Support for multiple DocumentDB clusters
+- Configurable time windows for analysis
+
+## Requirements
+- Python 3.8+
+- AWS credentials configured for CloudWatch Logs access
+- IAM privileges for DocumentDB profiler log access
+
+
+## Installation
+Clone the repository and install the requirements:
+
+```
+git clone https://github.com/awslabs/amazon-documentdb-tools.git
+cd amazon-documentdb-tools/performance/documentdb-top-operations-report
+python3 -m pip install -r requirements.txt
+```
+
+## Usage/Examples
+The profiler analyzer accepts the following arguments:
+
+```
+--log-groups      Comma-separated list of DocumentDB profiler log groups
+--top-ops-count   [optional] Number of top operations to return (default: all)
+--output-dir      [optional] Output directory for CSV files (default: current directory)
+--start-time      [optional] Report start time in format "YYYY-MM-DD HH:MM:SS"
+--end-time        [optional] Report end time in format "YYYY-MM-DD HH:MM:SS"
+
+If --start-time and --end-time are not provided, the last 24 hours are used.
+```
+
+
+### Analyze DocumentDB clusters for the last 24 hours:
+```
+python3 src/docdb_profiler_analyzer.py --log-groups "/aws/docdb/cluster1/profiler,/aws/docdb/cluster2/profiler"
+```
+
+### Custom time range with specific output directory:
+```
+python3 src/docdb_profiler_analyzer.py --log-groups "/aws/docdb/cluster1/profiler" --start-time "2024-01-01 00:00:00" --end-time "2024-01-02 00:00:00" --top-ops-count 20 --output-dir "./reports"
+```
+**_NOTE:_**
+This tool uses the CloudWatch Logs large query functionality to handle result sets larger than 10,000 entries. The implementation is based on the AWS SDK examples for [CloudWatch Logs Large Query](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/scenarios/features/cloudwatch_logs_large_query), which uses recursive querying with threading to efficiently retrieve large volumes of log data from CloudWatch Logs.
+
+## AWS Lambda Deployment
+For automated reporting, deploy as an AWS Lambda function with scheduled execution.
+
+### Prerequisites
+- Amazon SES configured with verified sender and recipient email addresses
+- SAM CLI installed
+- Docker installed
+
+### Deploy with SAM CLI
+```
+sam build --use-container
+sam deploy --guided
+```
+
+The first command builds the source of your application. The second command packages and deploys your application to AWS, with a series of prompts:
+
+- **Stack Name**: The name of the stack to deploy to CloudFormation. This should be unique to your account and region.
+- **AWS Region**: The AWS region you want to deploy your app to.
+- **Parameter LambdaTriggerSchedule**: Cron schedule to trigger the Lambda function, in UTC. Example: cron(0 13 * * ? *)
+- **Parameter DocDBProfilerLogGrpsName**: Comma-separated list of CloudWatch profiler log group names for DocumentDB
+- **Parameter TopOpsCount**: Number of top operations to be reported. If left empty, all operations are reported.
+- **Parameter ReportStartTime**: Start time for an ad-hoc report, in UTC. Example: 2024-02-11 00:00:00. If left empty, the report covers the last one day.
+- **Parameter ReportEndTime**: End time for an ad-hoc report, in UTC. Example: 2024-02-11 00:00:00. If left empty, the report covers the last one day.
+- **Parameter SenderEmail**: Sender email address for the report
+- **Parameter RecipientEmailList**: Comma-separated list of recipient email addresses for the report
+- **Confirm changes before deploy**: If set to yes, any change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
+- **Allow SAM CLI IAM role creation**: Many AWS SAM templates, including this example, create AWS IAM roles required for the AWS Lambda function(s) included to access AWS services. By default, these are scoped down to minimum required permissions. To deploy an AWS CloudFormation stack which creates or modifies IAM roles, the `CAPABILITY_IAM` value for `capabilities` must be provided. If permission isn't provided through this prompt, to deploy this example you must explicitly pass `--capabilities CAPABILITY_IAM` to the `sam deploy` command.
+- **Save arguments to samconfig.toml**: If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can just re-run `sam deploy` without parameters to deploy changes to your application.
+
+### Use the SAM CLI to build and test locally
+
+Build your application with the `sam build --use-container` command.
+
+```
+sam build --use-container
+```
+
+The SAM CLI installs dependencies defined in `src/requirements.txt`, creates a deployment package, and saves it in the `.aws-sam/build` folder.
+
+Test a single function by invoking it directly with a test event. An event is a JSON document that represents the input that the function receives from the event source. Test events are included in the `events` folder in this project.
+
+Set environment variables for testing:
+```
+export DOCDB_LOG_GROUP_NAME="/aws/docdb/your-cluster/profiler"
+export TOP_OPS_COUNT="10"
+export SENDER_EMAIL="test@example.com"
+export RECIPIENT_EMAIL_LIST="recipient@example.com"
+export REPORT_START_TIME="2024-01-01 00:00:00"
+export REPORT_END_TIME="2024-01-02 00:00:00"
+```
+
+Run functions locally and invoke them with the `sam local invoke` command.
+
+```
+sam local invoke -e events/manual_invoke.json | jq '.'
+```
+
+### Deploying with CloudFormation Template
+If the SAM CLI is not available, you can deploy with the CloudFormation template `src/cfn_template.yaml`:
+
+1. Create a zip file for the AWS Lambda function:
+```
+(cd src && zip -r ./function.zip . -x "__pycache__/*" "__init__.py")
+```
+
+2. Create an Amazon S3 bucket named 'lambda-code-bucket-SourceAWSAccountID' in the source AWS account.
+
+3. Upload the 'src/function.zip' file to the Amazon S3 bucket named 'lambda-code-bucket-SourceAWSAccountID' created in step 2.
+
+4. Deploy the CloudFormation template `src/cfn_template.yaml` in the AWS account to create the required resources, such as the IAM roles, the Amazon EventBridge Scheduler schedule, and the AWS Lambda function.
+
+## Cleanup
+
+### For SAM Deployment
+To delete the sample application that you created, use the AWS CLI. Assuming you used your project name for the stack name, you can run the following:
+
+```
+sam delete --stack-name "docdb-profiler-top-ops"
+```
+
+### For CloudFormation Deployment
+If you deployed using the CloudFormation template directly, you can delete the stack using:
+
+```
+# Using AWS CLI
+aws cloudformation delete-stack --stack-name "Stack Name"
+
+# Check deletion status
+aws cloudformation describe-stacks --stack-name "Stack Name"
+```
+
+Alternatively, you can delete the stack through the AWS Management Console:
+1. Go to the CloudFormation service in the AWS Console
+2. Select your stack (e.g., "docdb-profiler-top-ops")
+3. Click "Delete" and confirm the deletion
+
+**Note**: Make sure to also clean up any S3 bucket you created for storing the Lambda code if you used the CloudFormation deployment method.
+
+## License
+[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
+
+## Contributing
+Contributions are always welcome! See the [contributing](CONTRIBUTING.md) page for ways to get involved.
diff --git a/performance/documentdb-top-operations-report/cfn_template.yaml b/performance/documentdb-top-operations-report/cfn_template.yaml
new file mode 100644
index 0000000..e2915e9
--- /dev/null
+++ b/performance/documentdb-top-operations-report/cfn_template.yaml
@@ -0,0 +1,151 @@
+AWSTemplateFormatVersion: '2010-09-09'
+Description: >
+  docdb-profiler-top-ops
+
+Parameters:
+  LambdaTriggerSchedule:
+    Type: String
+    Default: "cron(0 13 * * ? *)"
+    Description: "Provide cron schedule to trigger Lambda in UTC. Example: cron(0 13 * * ? *)"
+  DocDBProfilerLogGrpsName:
+    Type: String
+    Description: "Comma separated list of CloudWatch profiler log group names for DocumentDB"
+  TopOpsCount:
+    Type: String
+    Default: ""
+    Description: "Number of top operations to be reported. If left empty then it will report all operations"
+  ReportStartTime:
+    Type: String
+    Default: ""
+    Description: "Start time of the report in UTC. Example: 2024-02-11 00:00:00. If left empty then it will run the report for the last one day"
+  ReportEndTime:
+    Type: String
+    Default: ""
+    Description: "End time of the report in UTC. Example: 2024-02-11 00:00:00. If left empty then it will run the report for the last one day"
+  SenderEmail:
+    Type: String
+    Description: "Sender email address for the report"
+  RecipientEmailList:
+    Type: String
+    Description: "Comma separated list of recipient email addresses for the report"
+  CodeS3Bucket:
+    Type: String
+    Description: "S3 bucket name that contains the lambda code zip file"
+  CodeS3Key:
+    Type: String
+    Default: "function.zip"
+    Description: "S3 key for the lambda code zip file"
+
+Conditions:
+  HasReportStartTime:
+    !Not [!Equals [!Ref ReportStartTime, ""]]
+  HasReportEndTime:
+    !Not [!Equals [!Ref ReportEndTime, ""]]
+  HasTopOpsCount:
+    !Not [!Equals [!Ref TopOpsCount, ""]]
+
+Resources:
+  DocDBProfilerScheduleRule:
+    Type: AWS::Scheduler::Schedule
+    Properties:
+      Name: docdb-profiler-daily-schedule
+      Description: "Triggers DocDB Profiler function every day at 8:00 AM"
+      ScheduleExpression: !Ref LambdaTriggerSchedule
+      FlexibleTimeWindow:
+        Mode: "OFF"
+      Target:
+        Arn: !GetAtt DocDBProfilerFunction.Arn
+        RoleArn: !GetAtt DocDBProfilerSchedulerRole.Arn
+
+  DocDBProfilerSchedulerRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal:
+              Service: scheduler.amazonaws.com
+            Action: sts:AssumeRole
+      Policies:
+        - PolicyName: InvokeLambdaPolicy
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action: lambda:InvokeFunction
+                Resource: !GetAtt DocDBProfilerFunction.Arn
+
+  DocDBProfilerFunction:
+    Type: AWS::Lambda::Function
+    Properties:
+      Handler: lambda_handler.lambda_handler
+      Code:
+        S3Bucket: !Ref CodeS3Bucket
+        S3Key: !Ref CodeS3Key
+      Description: DocDB profiler top operations
+      Runtime: python3.13
+      MemorySize: 128
+      Timeout: 900
+      TracingConfig:
+        Mode: Active
+      Architectures:
+        - x86_64
+      Role: !GetAtt DocDBProfilerLambdaExecutionRole.Arn
+      Layers:
+        - !Sub arn:aws:lambda:${AWS::Region}:336392948345:layer:AWSSDKPandas-Python313:1
+        - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-x86_64:6
+      Environment:
+        Variables:
+          POWERTOOLS_SERVICE_NAME: DocDBProfilerService
+          POWERTOOLS_METRICS_NAMESPACE: DocDBProfilerMetrics
+          DOCDB_LOG_GROUP_NAME: !Ref DocDBProfilerLogGrpsName
+          TOP_OPS_COUNT: !If
+            - HasTopOpsCount
+            - !Ref TopOpsCount
+            - !Ref AWS::NoValue
+          REPORT_START_TIME: !If
+            - HasReportStartTime
+            - !Ref ReportStartTime
+            - !Ref AWS::NoValue
+          REPORT_END_TIME: !If
+            - HasReportEndTime
+            - !Ref ReportEndTime
+            - !Ref AWS::NoValue
+          SENDER_EMAIL: !Ref SenderEmail
+          RECIPIENT_EMAIL_LIST: !Ref RecipientEmailList
+
+  DocDBProfilerLambdaExecutionRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal:
+              Service: lambda.amazonaws.com
+            Action: sts:AssumeRole
+      Policies:
+        - PolicyName: LambdaExecutionPolicy
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - logs:StartQuery
+                  - logs:GetQueryResults
+                  - logs:StopQuery
+                  - logs:DescribeLogGroups
+                  - logs:DescribeLogStreams
+                Resource:
+                  - !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/docdb/*/profiler:*'
+              - Effect: Allow
+                Action:
+                  - ses:SendEmail
+                Resource:
+                  - !Sub 'arn:aws:ses:${AWS::Region}:${AWS::AccountId}:identity/*'
+
+Outputs:
+  DocDBProfilerFunction:
+    Description: DocDBProfilerFunction Lambda Function ARN
+    Value: !GetAtt DocDBProfilerFunction.Arn
diff --git a/performance/documentdb-top-operations-report/events/manual_invoke.json b/performance/documentdb-top-operations-report/events/manual_invoke.json
new file mode 100644
index 0000000..4c12594
--- /dev/null
+++ b/performance/documentdb-top-operations-report/events/manual_invoke.json
@@ -0,0 +1,4 @@
+{
+    "test": true,
+    "description": "Manual test invocation for DocumentDB profiler"
+}
\ No newline at end of file
diff --git a/performance/documentdb-top-operations-report/requirements.txt b/performance/documentdb-top-operations-report/requirements.txt
new file mode 100644
index 0000000..992bc4c
--- /dev/null
+++ b/performance/documentdb-top-operations-report/requirements.txt
@@ -0,0 +1,4 @@
+boto3
+aws-lambda-powertools[tracer]
+pandas
+pydantic
\ No newline at end of file
diff --git a/performance/documentdb-top-operations-report/src/__init__.py b/performance/documentdb-top-operations-report/src/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/performance/documentdb-top-operations-report/src/cloudwatch_query.py b/performance/documentdb-top-operations-report/src/cloudwatch_query.py
new file mode 100644
index 0000000..a59a982
--- /dev/null
+++ b/performance/documentdb-top-operations-report/src/cloudwatch_query.py
@@ -0,0 +1,226 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+import logging
+import time
+from datetime import datetime
+import threading
+import boto3
+
+from date_utilities import DateUtilities
+
+DEFAULT_QUERY = "fields @timestamp, @message | sort @timestamp asc"
+DEFAULT_LOG_GROUP = "/workflows/cloudwatch-logs/large-query"
+
+class DateOutOfBoundsError(Exception):
+    """Exception raised when the date range for a query is out of bounds."""
+
+    pass
+
+
+class CloudWatchQuery:
+    """
+    A class to query AWS CloudWatch logs within a specified date range.
+
+    :vartype date_range: tuple
+    :ivar limit: Maximum number of log entries to return.
+    :vartype limit: int
+    :ivar log_group: Name of the log group to query
+    :ivar query_string: The query string to run
+    """
+
+    def __init__(self, log_group: str = DEFAULT_LOG_GROUP, query_string: str = DEFAULT_QUERY) -> None:
+        self.lock = threading.Lock()
+        self.log_group = log_group
+        self.query_string = query_string
+        self.query_results = []
+        self.query_duration = None
+        self.datetime_format = "%Y-%m-%d %H:%M:%S.%f"
+        self.date_utilities = DateUtilities()
+        self.limit = 10000
+
+    def query_logs(self, date_range):
+        """
+        Executes a CloudWatch logs query for a specified date range and calculates the execution time of the query.
+
+        :return: A batch of logs retrieved from the CloudWatch logs query.
+        :rtype: list
+        """
+        start_time = datetime.now()
+
+        start_date, end_date = self.date_utilities.normalize_date_range_format(
+            date_range, from_format="unix_timestamp", to_format="datetime"
+        )
+
+        logging.info(
+            f"Original query:"
+            f"\n START: {start_date}"
+            f"\n END: {end_date}"
+            f"\n LOG GROUP: {self.log_group}"
+        )
+        self.recursive_query((start_date, end_date))
+        end_time = datetime.now()
+        self.query_duration = (end_time - start_time).total_seconds()
+
+    def recursive_query(self, date_range):
+        """
+        Processes logs within a given date range, fetching batches of logs recursively if necessary.
+
+        :param date_range: The date range to fetch logs for, specified as a tuple (start_timestamp, end_timestamp).
+        :type date_range: tuple
+        :return: None if the recursive fetching is continued or stops when the final batch of logs is processed.
+                 Although it doesn't explicitly return the query results, this method accumulates all fetched logs
+                 in the `self.query_results` attribute.
+        :rtype: None
+        """
+        batch_of_logs = self.perform_query(date_range)
+        # Add the batch to the accumulated logs
+        with self.lock:
+            self.query_results.extend(batch_of_logs)
+        if len(batch_of_logs) == self.limit:
+            logging.info(f"Fetched {self.limit}, checking for more...")
+            most_recent_log = self.find_most_recent_log(batch_of_logs)
+            most_recent_log_timestamp = next(
+                item["value"]
+                for item in most_recent_log
+                if item["field"] == "@timestamp"
+            )
+            new_range = (most_recent_log_timestamp, date_range[1])
+            midpoint = self.date_utilities.find_middle_time(new_range)
+
+            first_half_thread = threading.Thread(
+                target=self.recursive_query,
+                args=((most_recent_log_timestamp, midpoint),),
+            )
+            second_half_thread = threading.Thread(
+                target=self.recursive_query, args=((midpoint, date_range[1]),)
+            )
+
+            first_half_thread.start()
+            second_half_thread.start()
+
+            first_half_thread.join()
+            second_half_thread.join()
+
+    def find_most_recent_log(self, logs):
+        """
+        Search a list of log items and return the most recent log entry.
+
+        :param logs: A list of logs to analyze.
+        :return: The most recent log entry, as a list of log item details.
+        :rtype: list
+        """
+        most_recent_log = None
+        most_recent_date = "1970-01-01 00:00:00.000"
+
+        for log in logs:
+            for item in log:
+                if item["field"] == "@timestamp":
+                    logging.debug(f"Compared: {item['value']} to {most_recent_date}")
+                    if (
+                        self.date_utilities.compare_dates(
+                            item["value"], most_recent_date
+                        )
+                        == item["value"]
+                    ):
+                        logging.debug(f"New most recent: {item['value']}")
+                        most_recent_date = item["value"]
+                        most_recent_log = log
+        logging.info(f"Most recent log date of batch: {most_recent_date}")
+        return most_recent_log
+
+    # snippet-start:[python.example_code.cloudwatch_logs.start_query]
+    def perform_query(self, date_range):
+        """
+        Performs the actual CloudWatch log query.
+
+        :param date_range: A tuple representing the start and end datetime for the query.
+        :type date_range: tuple
+        :return: A list containing the query results.
+        :rtype: list
+        """
+        client = boto3.client("logs")
+        try:
+            try:
+                start_time = round(
+                    self.date_utilities.convert_iso8601_to_unix_timestamp(date_range[0])
+                )
+                end_time = round(
+                    self.date_utilities.convert_iso8601_to_unix_timestamp(date_range[1])
+                )
+                response = client.start_query(
+                    logGroupName=self.log_group,
+                    startTime=start_time,
+                    endTime=end_time,
+                    queryString=self.query_string,
+                    limit=self.limit,
+                )
+                query_id = response["queryId"]
+            except client.exceptions.ResourceNotFoundException as e:
+                raise DateOutOfBoundsError(f"Resource not found: {e}")
+            while True:
+                time.sleep(1)
+                results = client.get_query_results(queryId=query_id)
+                if results["status"] in [
+                    "Complete",
+                    "Failed",
+                    "Cancelled",
+                    "Timeout",
+                    "Unknown",
+                ]:
+                    return results.get("results", [])
+        except DateOutOfBoundsError:
+            return []
+
+    def _initiate_query(self, client, date_range, max_logs):
+        """
+        Initiates the CloudWatch logs query.
+
+        :param date_range: A tuple representing the start and end datetime for the query.
+        :type date_range: tuple
+        :param max_logs: The maximum number of logs to retrieve.
+        :type max_logs: int
+        :return: The query ID as a string.
+        :rtype: str
+        """
+        try:
+            start_time = round(
+                self.date_utilities.convert_iso8601_to_unix_timestamp(date_range[0])
+            )
+            end_time = round(
+                self.date_utilities.convert_iso8601_to_unix_timestamp(date_range[1])
+            )
+            response = client.start_query(
+                logGroupName=self.log_group,
+                startTime=start_time,
+                endTime=end_time,
+                queryString=self.query_string,
+                limit=max_logs,
+            )
+            return response["queryId"]
+        except client.exceptions.ResourceNotFoundException as e:
+            raise DateOutOfBoundsError(f"Resource not found: {e}")
+
+    # snippet-end:[python.example_code.cloudwatch_logs.start_query]
+
+    # snippet-start:[python.example_code.cloudwatch_logs.get_query_results]
+    def _wait_for_query_results(self, client, query_id):
+        """
+        Waits for the query to complete and retrieves the results.
+ + :param query_id: The ID of the initiated query. + :type query_id: str + :return: A list containing the results of the query. + :rtype: list + """ + while True: + time.sleep(1) + results = client.get_query_results(queryId=query_id) + if results["status"] in [ + "Complete", + "Failed", + "Cancelled", + "Timeout", + "Unknown", + ]: + return results.get("results", []) + + # snippet-end:[python.example_code.cloudwatch_logs.get_query_results] diff --git a/performance/documentdb-top-operations-report/src/date_utilities.py b/performance/documentdb-top-operations-report/src/date_utilities.py new file mode 100644 index 0000000..ceb9f4c --- /dev/null +++ b/performance/documentdb-top-operations-report/src/date_utilities.py @@ -0,0 +1,222 @@ +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. +# SPDX-License-Identifier: Apache-2.0 +from datetime import datetime, timezone + + +class DateUtilities: + """A utility class for parsing, converting, and comparing dates.""" + + def __init__(self): + """Initialize the DateUtilities class with default datetime format.""" + self.datetime_format = "%Y-%m-%d %H:%M:%S" + + @staticmethod + def is_datetime(date_string, format_string): + """ + Checks if the given date string matches the specified format. + + :param date_string: The date string to be checked. + :type date_string: str + :param format_string: The format string to check against. + :type format_string: str + :return: True if the date_string matches the format_string, False otherwise. + :rtype: bool + """ + try: + datetime.strptime(date_string, format_string) + return True + except ValueError: + return False + + def find_middle_time(self, date_range) -> str: + """ + Find the middle time between two timestamps in ISO8601 format. + + :param date_range: A tuple of start and end ISO8601 timestamp strings. + :type date_range: tuple + :return: The middle time in ISO8601 format. + :rtype: str
+ """ + # Parse the ISO8601 formatted strings into datetime objects + dt1 = datetime.fromisoformat(date_range[0]) + dt2 = datetime.fromisoformat(date_range[1]) + + # Ensure dt1 is the earlier datetime + if dt1 > dt2: + dt1, dt2 = dt2, dt1 + + # Calculate the difference between the two datetime objects + difference = dt2 - dt1 + + # Find the halfway duration + halfway = difference / 2 + + # Calculate the middle time + middle_time = dt1 + halfway + + return middle_time.isoformat() + + @staticmethod + def format_iso8601(date_str): + """Normalize an ISO8601 string to second precision plus trimmed microseconds.""" + # Parse the ISO8601 date string + dt = datetime.fromisoformat(date_str) + + # Format date without microseconds + date_without_microseconds = dt.strftime("%Y-%m-%dT%H:%M:%S") + + # Format microseconds to remove trailing zeros, ensuring at least 3 digits + microseconds = f".{dt.microsecond:06}".rstrip("0")[:4] + + # Construct the final date string + formatted_date = date_without_microseconds + microseconds + + return formatted_date + + @staticmethod + def divide_date_range(date_range): + """ + Splits a date range into two equal halves. + + :param date_range: Start and end datetime in a tuple. + :type date_range: tuple + :return: List of tuples with two split date ranges. + :rtype: list of tuples + """ + midpoint = (date_range[0] + date_range[1]) / 2 + return [(date_range[0], round(midpoint)), (round(midpoint), date_range[1])] + + @staticmethod + def convert_unix_timestamp_to_iso8601( + unix_timestamp, iso8601_format="%Y-%m-%d %H:%M:%S" + ): + """ + Converts a UNIX timestamp in milliseconds to a date string in the specified format. + + :param unix_timestamp: UNIX timestamp in milliseconds. + :type unix_timestamp: int + :param iso8601_format: The format string for the output date string, defaults to "%Y-%m-%d %H:%M:%S". + :type iso8601_format: str + :return: The formatted date string. 
+ :rtype: str + """ + in_seconds = unix_timestamp / 1000.0 + date_time = datetime.utcfromtimestamp(in_seconds) + iso8601 = date_time.strftime(iso8601_format) + return iso8601 + + @staticmethod + def convert_iso8601_to_datetime(iso8601, iso8601_format="%Y-%m-%d %H:%M:%S"): + """ + Converts a date string in ISO 8601 format to a Python datetime object. + + :param iso8601: The ISO 8601 formatted date string. + :type iso8601: str + :param iso8601_format: The format string of the input date, defaults to ISO 8601 format. + :type iso8601_format: str + :return: The corresponding Python datetime object. + :rtype: datetime + """ + # date = datetime.strptime(iso8601, iso8601_format) + date = datetime.fromisoformat(iso8601) + return date + + @staticmethod + def convert_datetime_to_unix_timestamp(dt): + """ + Converts a Python datetime object to a UNIX timestamp in milliseconds. + + :param dt: The datetime object to be converted. + :type dt: datetime + :return: UNIX timestamp in milliseconds. + :rtype: int + """ + unix_timestamp = dt.replace(tzinfo=timezone.utc).timestamp() + return unix_timestamp * 1000 + + def convert_unix_timestamp_to_datetime(self, unix_timestamp): + """ + Converts a UNIX timestamp in milliseconds to a Python datetime object. + + :param unix_timestamp: UNIX timestamp in milliseconds. + :type unix_timestamp: int + :return: The corresponding Python datetime object. + :rtype: datetime + """ + ts = self.convert_unix_timestamp_to_iso8601(unix_timestamp) + dt = self.convert_iso8601_to_datetime(ts) + return dt + + def convert_iso8601_to_unix_timestamp(self, iso8601): + """ + Converts a date string in ISO 8601 format to a UNIX timestamp in milliseconds. + + :param iso8601: The ISO 8601 formatted date string. + :type iso8601: str + :return: UNIX timestamp in milliseconds. 
+ :rtype: int + """ + dt = self.convert_iso8601_to_datetime(iso8601) + unix_timestamp = dt.replace(tzinfo=timezone.utc).timestamp() + return unix_timestamp * 1000 + + def convert_datetime_to_iso8601(self, datetime_obj): + """ + Converts a Python datetime object to ISO 8601 format. + + :param datetime_obj: The datetime object to be converted. + :type datetime_obj: datetime + :return: The ISO 8601 formatted date string. + :rtype: str + """ + unix_timestamp = datetime_obj.replace(tzinfo=timezone.utc).timestamp() + iso8601 = self.convert_unix_timestamp_to_iso8601(round(unix_timestamp * 1000)) + return iso8601 + + def compare_dates(self, date_str1, date_str2): + """ + Compares two dates in ISO 8601 format and returns the later one. + + :param date_str1: The first date string in ISO 8601 format. + :type date_str1: str + :param date_str2: The second date string in ISO 8601 format. + :type date_str2: str + :return: The later of the two dates. + :rtype: str + """ + date1 = datetime.fromisoformat(date_str1) + date2 = datetime.fromisoformat(date_str2) + + if date1 > date2: + return date_str1 + else: + return date_str2 + + def normalize_date_range_format(self, date_range, from_format=None, to_format=None): + """ + Normalizes date ranges received in variable formats to a specified format. + + :param date_range: The date range to be normalized. + :type date_range: tuple + :param from_format: The current format of the date range. + :type from_format: str, optional + :param to_format: The target format for the date range. + :type to_format: str, optional + :return: The normalized date range. + :rtype: tuple + :raises Exception: If required parameters are missing. 
+ """ + if not (to_format and from_format):  # both formats are required; a bare tuple here would always be truthy + raise Exception( + "This function requires a date range, a starting format, and a target format" + ) + if "unix_timestamp" in to_format and "datetime" in from_format: + if not self.is_datetime(date_range[0], self.datetime_format): + start_date = self.convert_unix_timestamp_to_datetime(date_range[0]) + else: + start_date = date_range[0] + + if not self.is_datetime(date_range[1], self.datetime_format): + end_date = self.convert_unix_timestamp_to_datetime(date_range[1]) + else: + end_date = date_range[1] + else: + return date_range + return start_date, end_date diff --git a/performance/documentdb-top-operations-report/src/docdb_profiler_analyzer.py b/performance/documentdb-top-operations-report/src/docdb_profiler_analyzer.py new file mode 100644 index 0000000..df15f44 --- /dev/null +++ b/performance/documentdb-top-operations-report/src/docdb_profiler_analyzer.py @@ -0,0 +1,571 @@ +from aws_lambda_powertools import Logger, Tracer, Metrics +from aws_lambda_powertools.metrics import MetricUnit +from aws_lambda_powertools.logging.formatter import LambdaPowertoolsFormatter +import pandas as pd +import boto3 +from datetime import datetime, timedelta +import json +from typing import List, Dict +import time +import os +import argparse + +from cloudwatch_query import CloudWatchQuery +from date_utilities import DateUtilities + +# Initialize Powertools +formatter = LambdaPowertoolsFormatter(utc=True, log_record_order=["message"]) +logger = Logger(service="DocDBProfilerService", logger_formatter=formatter) +tracer = Tracer(service="DocDBProfilerService") +metrics = Metrics(namespace="DocDBProfilerMetrics") + +class DocDBProfilerAnalyzer: + @tracer.capture_method + def __init__(self, log_group: Dict, top_ops_count: int = None, sender_email: str = None, recipient_emails: str = None): + """Initialize the DocDB Profiler Analyzer.""" + self.cluster_name = log_group.get("cluster_name") + self.log_group_name = log_group.get("profiler_log") + 
self.top_ops_count = top_ops_count + session = boto3.Session() + self.region = session.region_name + self.sender_email_address = sender_email + self.recipient_email_list = [email.strip() for email in recipient_emails.split(',')] if recipient_emails else [] + self.logs_client = boto3.client('logs', region_name=self.region) + logger.info(f"Initialized DocDBProfilerAnalyzer for log group: {self.log_group_name}") + + @tracer.capture_method + def fetch_profiler_logs(self, start_time: datetime, end_time: datetime) -> pd.DataFrame: + """Fetch profiler logs using CloudWatchQuery for handling large result sets (>10,000 results).""" + try: + query = """ + fields @timestamp, @message + | sort @timestamp asc + """ + + # Initialize date utilities for conversion + date_utilities = DateUtilities() + + # Convert datetime objects to ISO8601 format for CloudWatchQuery + start_time_iso8601 = date_utilities.convert_datetime_to_iso8601(start_time) + end_time_iso8601 = date_utilities.convert_datetime_to_iso8601(end_time) + + logger.info(f"Starting CloudWatch query for log group: {self.log_group_name}") + logger.info(f"Time range: {start_time_iso8601} to {end_time_iso8601}") + + with tracer.provider.in_subsegment("start_logs_query") as subsegment: + subsegment.put_annotation("log_group", self.log_group_name) + subsegment.put_annotation("start_time", start_time_iso8601) + subsegment.put_annotation("end_time", end_time_iso8601) + + # Use CloudWatchQuery for recursive querying to handle large result sets + cloudwatch_query = CloudWatchQuery( + log_group=self.log_group_name, + query_string=query + ) + + # Execute the query with date range tuple + cloudwatch_query.query_logs((start_time_iso8601, end_time_iso8601)) + + # Get all results from the recursive query + log_entries = cloudwatch_query.query_results + + logger.info(f"CloudWatch query completed in {cloudwatch_query.query_duration} seconds") + logger.info(f"Total log entries retrieved: {len(log_entries)}") + + # Parse results into 
DataFrame + df = self._parse_logs_to_dataframe(log_entries) + + # Record metrics + metrics.add_metric(name="LogsProcessed", unit=MetricUnit.Count, value=len(df)) + metrics.add_metric(name="QueryDuration", unit=MetricUnit.Seconds, value=cloudwatch_query.query_duration) + + logger.info(f"Successfully processed {len(df)} log entries into DataFrame") + return df + + except Exception as error: + logger.exception("Error fetching profiler logs with CloudWatchQuery") + metrics.add_metric(name="LogFetchErrors", unit=MetricUnit.Count, value=1) + raise + + @tracer.capture_method + def _parse_logs_to_dataframe(self, log_entries: List[Dict]) -> pd.DataFrame: + """Parse log entries into DataFrame with error handling.""" + parsed_entries = [] + with tracer.provider.in_subsegment("parse_logs_to_dataframe"): + for entry in log_entries: + try: + message = json.loads(entry[1]["value"]) + message['timestamp'] = entry[0]["value"] + parsed_entries.append(message) + except json.JSONDecodeError: + logger.warning(f"Failed to parse log entry: {entry}") + metrics.add_metric(name="LogParseErrors", unit=MetricUnit.Count, value=1) + + return pd.DataFrame(parsed_entries) + + @tracer.capture_method + def _process_filter(self,df: pd.DataFrame) -> pd.DataFrame: + """Process filter field in DataFrame by removing actual filter value""" + try: + def get_filter_field(row): + if row["op"] == "query": + return "filter" + elif row["op"] == "command": + return "pipeline" + else: + return "q" + try: + logger.info(f"Processing filters for {len(df)} operations") + logger.info(f"Operations by type: {df['op'].value_counts().to_dict()}") + + def safe_get_filter(row): + try: + if 'command' not in row or pd.isna(row['command']): + return {} + return row["command"].get(get_filter_field(row), {}) + except Exception as e: + logger.warning(f"Error processing filter for row: {e}") + return {} + + filters = df.apply(safe_get_filter, axis=1) + logger.info(f"Successfully processed filters for {len(filters)} operations") + 
except KeyError as ke: + logger.exception(f"Missing required column: {ke}") + metrics.add_metric(name="ProcessFilterErrors", unit=MetricUnit.Count, value=1) + raise + except AttributeError as ae: + logger.exception(f"Missing required column: {ae}") + metrics.add_metric(name="ProcessFilterErrors", unit=MetricUnit.Count, value=1) + raise + + def replace_values(d): + try: + def process_value(value, value_map, counter): + """Recursively process values, preserving operator structure""" + if isinstance(value, dict): + # For dictionaries, process each key-value pair + new_dict = {} + for k, v in value.items(): + if k.startswith('$'): + # preserve structure, process the value + new_dict[k], counter = process_value(v, value_map, counter) + else: + # Regular field - process the value + new_dict[k], counter = process_value(v, value_map, counter) + return new_dict, counter + elif isinstance(value, list): + # For lists, process each item + new_list = [] + for item in value: + processed_item, counter = process_value(item, value_map, counter) + new_list.append(processed_item) + return new_list, counter + else: + # Leaf value - replace with placeholder + value_key = str(value) + if value_key not in value_map: + value_map[value_key] = f"V{counter[0]}" + counter[0] += 1 + return value_map[value_key], counter + + def process_dict(d, op): + value_map = {} + counter = [1] # Use list to make it mutable + new_d = {} + + for key, value in d.items(): + if (op == 'pipeline' and key == "$match"): + # Process $match stage - preserve operators + processed_match, _ = process_value(value, value_map, counter) + new_d = {"$match": processed_match} + elif op == 'pipeline': + new_d[key] = value + else: + # For query operations, process the value + processed_value, _ = process_value(value, value_map, counter) + new_d[key] = processed_value + return new_d + if isinstance(d, list): + return [process_dict(item,'pipeline') for item in d if isinstance(item, dict)] + elif isinstance(d, dict): + return 
process_dict(d, 'query') + return d + except (AttributeError, TypeError) as e: + logger.exception(f"Invalid filter format: {str(e)}") + metrics.add_metric(name="ProcessFilterErrors", unit=MetricUnit.Count, value=1) + raise + df = df.copy() + df["modified_filter"] = filters.apply(replace_values) + return df + except Exception as e: + logger.exception(f"Invalid filter format: {str(e)}") + metrics.add_metric(name="ProcessFilterErrors", unit=MetricUnit.Count, value=1) + raise + + @tracer.capture_method + def _process_exec_stats(self,df: pd.DataFrame) -> pd.DataFrame: + """Process executionStats field in DataFrame by removing nReturned and executionTimeMillisEstimate value""" + try: + logger.info(f"Processing execStats for {len(df)} operations") + + def replace_value(d): + try: + if isinstance(d, dict): + for key in ["nReturned", "executionTimeMillisEstimate"]: + if key in d: + d[key] = "?" # Replace value with '?' + + for k, v in d.items(): # Recursively process nested structures + d[k] = replace_value(v) + + elif isinstance(d, list): + return [replace_value(item) for item in d] + return d + except Exception as e: + logger.exception(f"Invalid execStats format: {str(e)}") + metrics.add_metric(name="ProcessExecStatsErrors", unit=MetricUnit.Count, value=1) + raise + + df = df.copy() + + # Handle missing execStats field safely + if 'execStats' not in df.columns: + logger.info("execStats column doesn't exist, creating with empty dict") + df['execStats'] = [{}] * len(df) + else: + logger.info(f"execStats column exists, null count: {df['execStats'].isnull().sum()}") + df['execStats'] = df['execStats'].fillna({}) + + df["modified_execStats"] = df["execStats"].apply(replace_value) + logger.info(f"Successfully processed execStats for {len(df)} operations") + return df + except Exception as e: + logger.exception(f"Error processing execStats: {str(e)}") + metrics.add_metric(name="ProcessExecStatsErrors", unit=MetricUnit.Count, value=1) + raise + + @tracer.capture_method + def 
_get_agg_dict(self, df: pd.DataFrame) -> Dict: + """ + Create aggregation dictionary based on available columns in dataframe + """ + # Base aggregation dictionary with required fields + agg_dict = { + 'millis': ['mean', 'max', 'count'] + } + + # Optional fields - only add if they exist in the dataframe + optional_fields = ['nreturned','nInserted', 'nModified', 'nRemoved'] + + for field in optional_fields: + if field in df.columns: + agg_dict[field] = 'mean' + + return agg_dict + + @tracer.capture_method + def get_top_operations(self, df: pd.DataFrame, sort_by: str = 'millis') -> pd.DataFrame: + """Extract top N slowest operations with metrics tracking.""" + df = self._process_filter(df) + df = self._process_exec_stats(df) + df["modified_filter_str"] = df["modified_filter"].apply(lambda x: str(x)) # Convert dict to string + df["modified_execStats_str"] = df["modified_execStats"].apply(lambda x: str(x)) # Convert dict to string + + # Always use appName - fill with "N/A" if it doesn't exist or has missing values + if 'appName' not in df.columns: + logger.info("appName column doesn't exist, creating with N/A") + df['appName'] = 'N/A' + else: + logger.info(f"appName column exists, null count: {df['appName'].isnull().sum()}") + df['appName'] = df['appName'].fillna('N/A') + + + # Get dynamic aggregation dictionary based on available columns + agg_dict = self._get_agg_dict(df) + + top_ops = df.groupby(['op', 'ns', 'user', 'appName', 'modified_filter_str', 'modified_execStats_str']).agg( + agg_dict + ).reset_index() + + + # Create list of column names based on available aggregations + columns = ['operation', 'namespace', 'user', 'appName', 'filter_criterion', 'execution_stats', + 'avg_duration_ms', 'max_duration_ms', 'count'] + + # Add optional columns if they exist + if 'nreturned' in agg_dict: + columns.append('avg_docs_returned') + if 'nInserted' in agg_dict: + columns.append('avg_docs_inserted') + if 'nModified' in agg_dict: + columns.append('avg_docs_modified') + if 
'nRemoved' in agg_dict: + columns.append('avg_docs_removed') + + top_ops.columns = columns + # Convert duration columns to int + columns_to_convert = [x for x in columns if x not in ['operation','namespace','user', 'appName','filter_criterion','execution_stats']] + for col in columns_to_convert: + top_ops[col] = top_ops[col].fillna(0).astype(int) + + sort_column = 'avg_duration_ms' if sort_by == 'millis' else sort_by + + if self.top_ops_count: + result = top_ops.sort_values(sort_column, ascending=False).head(self.top_ops_count) + else: + result = top_ops.sort_values(sort_column, ascending=False) + + # Record metrics for monitoring + metrics.add_metric( + name="MaxOperationDuration", + unit=MetricUnit.Milliseconds, + value=float(result['max_duration_ms'].max()) + ) + + return result + + @tracer.capture_method + def _format_response_to_html(self, data: Dict): + """Format the result dict into an HTML report table.""" + time_range = f"{data['time_range']['start']} to {data['time_range']['end']}" + + table_rows = "".join( + f""" + <tr> + <td>{op['operation']}</td> + <td>{op['namespace']}</td> + <td>{op['user']}</td> + <td>{op['appName']}</td> + <td>{op['filter_criterion']}</td> + <td>{op['execution_stats']}</td> + <td>{op['avg_duration_ms']}</td> + <td>{op['max_duration_ms']}</td> + <td>{op['count']}</td> + <td>{op.get('avg_docs_returned', 'N/A')}</td> + <td>{op.get('avg_docs_inserted', 'N/A')}</td> + <td>{op.get('avg_docs_modified', 'N/A')}</td> + <td>{op.get('avg_docs_removed', 'N/A')}</td> + </tr> + """ + for op in data['top_operations'] + ) + + html_message = f""" + <html> + <body>
+ <p>Here are the top{' ' + str(self.top_ops_count) if self.top_ops_count is not None else ''} operations from the Document DB cluster {self.cluster_name} for the time range: {time_range}</p>
+ <table border="1"> + <tr> + <th>Operation</th> + <th>Namespace</th> + <th>User</th> + <th>Application Name</th> + <th>Operation Pattern</th> + <th>Execution Stats</th> + <th>Avg Duration (ms)</th> + <th>Max Duration (ms)</th> + <th>Count</th> + <th>Avg Docs Returned</th> + <th>Avg Docs Inserted</th> + <th>Avg Docs Modified</th> + <th>Avg Docs Removed</th> + </tr> + {table_rows} + </table>
+ </body> + </html> + """ + + return html_message + + @tracer.capture_method + def send_response_via_email(self, response: Dict) -> Dict: + """Send the report via Amazon SES email.""" + if not self.sender_email_address or not self.recipient_email_list: + raise ValueError("Email configuration missing. Set SENDER_EMAIL and RECIPIENT_EMAIL_LIST environment variables.") + + message = self._format_response_to_html(response) + ses_client = boto3.client("ses") + email_response = ses_client.send_email( + Source=self.sender_email_address, + Destination={"ToAddresses": self.recipient_email_list}, + Message={ + "Subject": {"Data": f"DocumentDB Top Operations Report for cluster {self.cluster_name}"}, + "Body": {"Html": {"Data": message}} + } + ) + logger.info(f"Email sent to {self.recipient_email_list}") + return email_response + + @tracer.capture_method + @staticmethod + def parse_log_groups(log_groups_str: str) -> list: + """ + Parse a comma-separated log group string into a list of dictionaries + containing the cluster name and profiler log path. + """ + try: + # Split the string by comma and strip whitespace + log_groups = [lg.strip() for lg in log_groups_str.split(',')] + + result = [] + for log_group in log_groups: + # Split the path and extract cluster name + parts = log_group.split('/') + if len(parts) >= 4: + cluster_name = parts[3] # Get the cluster name part + result.append({ + "cluster_name": cluster_name, + "profiler_log": log_group + }) + + return result + except Exception as e: + logger.exception(f"Error parsing log groups: {str(e)}") + raise + + + +def analyze_profiler_logs(log_groups_str: str, top_ops_count: int = None, + start_time: datetime = None, end_time: datetime = None, + sender_email: str = None, recipient_emails: str = None) -> List[Dict]: + """ + Common function to analyze profiler logs that can be used by both CLI and Lambda. 
+ + Args: + log_groups_str: Comma-separated list of DocumentDB profiler log groups + top_ops_count: Number of top operations to return + start_time: Report start time (defaults to 24 hours ago) + end_time: Report end time (defaults to now) + sender_email: Sender email for SES (optional) + recipient_emails: Comma-separated recipient emails (optional) + + Returns: + List of dictionaries containing analysis results for each log group + """ + try: + # Parse log groups + log_groups = DocDBProfilerAnalyzer.parse_log_groups(log_groups_str) + + # Set default time range if not provided + if start_time is None: + start_time = datetime.now() - timedelta(days=1) + if end_time is None: + end_time = datetime.now() + + results = [] + + for log_group in log_groups: + logger.info(f"Processing log group: {log_group['cluster_name']}") + + # Initialize analyzer + analyzer = DocDBProfilerAnalyzer(log_group, top_ops_count, sender_email, recipient_emails) + + # Fetch and analyze logs + df = analyzer.fetch_profiler_logs(start_time, end_time) + + # Skip if no logs found + if df.empty: + logger.info(f"No profiler logs found for log group: {log_group['cluster_name']}") + continue + + # Get top operations + top_ops = analyzer.get_top_operations(df) + + # Prepare result + result = { + "cluster_name": log_group['cluster_name'], + "log_group": log_group['profiler_log'], + "top_operations": top_ops, + "time_range": { + "start": start_time, + "end": end_time + }, + "analyzer": analyzer # Include analyzer for email functionality + } + + results.append(result) + logger.info(f"Found {len(top_ops)} top operations for {log_group['cluster_name']}") + + return results + + except Exception as error: + logger.exception("Error in analyze_profiler_logs") + raise + + +def main(): + """Main function for command line execution.""" + parser = argparse.ArgumentParser(description='DocumentDB Profiler Analyzer') + + # Required arguments + parser.add_argument('--log-groups', + required=True, + help='Comma-separated 
list of DocumentDB profiler log groups') + + # Optional arguments + parser.add_argument('--top-ops-count', + type=int, + help='Number of top operations to return') + + parser.add_argument('--output-dir', + default='.', + help='Output directory for CSV files (default: current directory)') + + parser.add_argument('--start-time', + help='Report start time in format "YYYY-MM-DD HH:MM:SS"') + + parser.add_argument('--end-time', + help='Report end time in format "YYYY-MM-DD HH:MM:SS"') + + args = parser.parse_args() + + try: + # Parse time arguments + start_time = datetime.strptime(args.start_time, "%Y-%m-%d %H:%M:%S") if args.start_time else None + end_time = datetime.strptime(args.end_time, "%Y-%m-%d %H:%M:%S") if args.end_time else None + + # Create output directory if it doesn't exist + os.makedirs(args.output_dir, exist_ok=True) + + # Call analysis function + results = analyze_profiler_logs( + log_groups_str=args.log_groups, + top_ops_count=args.top_ops_count, + start_time=start_time, + end_time=end_time + ) + + # Process results and create CSV files + for result in results: + cluster_name = result['cluster_name'] + top_ops = result['top_operations'] + time_range = result['time_range'] + + print(f"Processing log group: {cluster_name}") + print(f"Analyzing logs from {time_range['start']} to {time_range['end']}") + print(f"Found {len(top_ops)} top operations") + + # Create CSV filename with cluster name and report time range + start_str = time_range['start'].strftime("%Y%m%d_%H%M%S") + end_str = time_range['end'].strftime("%Y%m%d_%H%M%S") + csv_filename = f"docdb_top_operations_{cluster_name}_{start_str}_to_{end_str}.csv" + csv_path = os.path.join(args.output_dir, csv_filename) + + # Save to CSV file + top_ops.to_csv(csv_path, index=False) + print(f"Report saved to: {csv_path}") + + if not results: + print("No profiler logs found for any log groups") + else: + print("Analysis completed successfully!") + + except Exception as error: + print(f"Error processing profiler 
logs: {str(error)}") + raise + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/performance/documentdb-top-operations-report/src/lambda_handler.py b/performance/documentdb-top-operations-report/src/lambda_handler.py new file mode 100644 index 0000000..594ac50 --- /dev/null +++ b/performance/documentdb-top-operations-report/src/lambda_handler.py @@ -0,0 +1,102 @@ +from aws_lambda_powertools import Logger, Tracer, Metrics +from aws_lambda_powertools.metrics import MetricUnit +from aws_lambda_powertools.utilities.typing import LambdaContext +from aws_lambda_powertools.logging.formatter import LambdaPowertoolsFormatter +from datetime import datetime, timedelta +import json +import os +from typing import Dict + +from docdb_profiler_analyzer import DocDBProfilerAnalyzer, analyze_profiler_logs + +# Initialize Powertools +formatter = LambdaPowertoolsFormatter(utc=True, log_record_order=["message"]) +logger = Logger(service="DocDBProfilerService", logger_formatter=formatter) +tracer = Tracer(service="DocDBProfilerService") +metrics = Metrics(namespace="DocDBProfilerMetrics") + + +@logger.inject_lambda_context +@tracer.capture_lambda_handler +@metrics.log_metrics +def lambda_handler(event: Dict, context: LambdaContext) -> Dict: + """Main Lambda handler with full Powertools instrumentation.""" + try: + # Validate required environment variables + if os.getenv("DOCDB_LOG_GROUP_NAME") is None: + raise ValueError("DOCDB_LOG_GROUP_NAME environment variable is not set") + + if os.getenv("SENDER_EMAIL") is None: + raise ValueError("SENDER_EMAIL environment variable is not set") + + if os.getenv("RECIPIENT_EMAIL_LIST") is None: + raise ValueError("RECIPIENT_EMAIL_LIST environment variable is not set") + + # Read environment variables + log_groups_str = os.getenv("DOCDB_LOG_GROUP_NAME") + + # Handle TOP_OPS_COUNT with error handling for invalid values + top_ops_count = None + if (value := os.getenv("TOP_OPS_COUNT")) and value.strip(): + try: + 
top_ops_count = int(value) + except ValueError: + logger.warning(f"Invalid TOP_OPS_COUNT value '{value}', using None (all operations)") + top_ops_count = None + + sender_email = os.getenv("SENDER_EMAIL") + recipient_emails = os.getenv("RECIPIENT_EMAIL_LIST") + + # Parse time range from environment variables + start_time = datetime.strptime(start_time_str, "%Y-%m-%d %H:%M:%S") if (start_time_str := os.getenv("REPORT_START_TIME")) else None + end_time = datetime.strptime(end_time_str, "%Y-%m-%d %H:%M:%S") if (end_time_str := os.getenv("REPORT_END_TIME")) else None + + # Call analysis function + results = analyze_profiler_logs( + log_groups_str=log_groups_str, + top_ops_count=top_ops_count, + start_time=start_time, + end_time=end_time, + sender_email=sender_email, + recipient_emails=recipient_emails + ) + + # Process results and send emails + if not results: + logger.info("No profiler logs found") + return { + "statusCode": 200, + "body": "No profiler logs found" + } + + for result in results: + # Prepare response for email + email_response = { + "top_operations": result['top_operations'].to_dict(orient='records'), + "time_range": { + "start": result['time_range']['start'].isoformat(), + "end": result['time_range']['end'].isoformat() + } + } + + logger.info("Successfully analyzed profiler logs", extra={ + "cluster_name": result['cluster_name'], + "total_ops": len(result['top_operations']), + }) + + # Send output via email using the analyzer instance + result['analyzer'].send_response_via_email(email_response) + + return { + "statusCode": 200, + "body": "Top operations report published." 
+ } + + except Exception as error: + logger.exception("Error processing profiler logs") + metrics.add_metric(name="ProcessingErrors", unit=MetricUnit.Count, value=1) + + return { + "statusCode": 500, + "body": json.dumps({"error": str(error)}) + } \ No newline at end of file diff --git a/performance/documentdb-top-operations-report/template.yaml b/performance/documentdb-top-operations-report/template.yaml new file mode 100644 index 0000000..bc71951 --- /dev/null +++ b/performance/documentdb-top-operations-report/template.yaml @@ -0,0 +1,135 @@ +AWSTemplateFormatVersion: '2010-09-09' +Transform: AWS::Serverless-2016-10-31 +Description: > + docdb-profiler-top-ops + +Globals: + Function: + Timeout: 900 + MemorySize: 128 + Runtime: python3.13 + Tracing: Active + LoggingConfig: + LogFormat: JSON + +Parameters: + LambdaTriggerSchedule: + Type: String + Default: "cron(0 13 * * ? *)" + Description: "Provide cron schedule to trigger Lambda in UTC. Example: cron(0 13 * * ? *)" + DocDBProfilerLogGrpsName: + Type: String + Description: "Comma separated list of CloudWatch Profile log group names for DocumentDB" + TopOpsCount: + Type: String + Default: "" + Description: "Number of top operations to be reported. If left empty then it will report all operations" + ReportStartTime: + Type: String + Default: "" + Description: "Start time of the report in UTC. Example: 2024-02-11 00:00:00. If left empty then it will run the report for the last one day" + ReportEndTime: + Type: String + Default: "" + Description: "End time of the report in UTC. Example: 2024-02-11 00:00:00. 
If left empty then it will run the report for the last one day" + SenderEmail: + Type: String + Description: "Sender email address for the report" + RecipientEmailList: + Type: String + Description: "Comma separated list of recipient email addresses for the report" + +Conditions: + HasReportStartTime: + !Not [!Equals [!Ref ReportStartTime, ""]] + HasReportEndTime: + !Not [!Equals [!Ref ReportEndTime, ""]] + HasTopOpsCount: + !Not [!Equals [!Ref TopOpsCount, ""]] + +Resources: + DocDBProfilerScheduleRule: + Type: AWS::Scheduler::Schedule + Properties: + Name: docdb-profiler-daily-schedule + Description: "Triggers DocDB Profiler function every day at 8:00 AM" + ScheduleExpression: !Ref LambdaTriggerSchedule + FlexibleTimeWindow: + Mode: "OFF" + Target: + Arn: !GetAtt DocDBProfilerFunction.Arn + RoleArn: !GetAtt DocDBProfilerSchedulerRole.Arn + + DocDBProfilerSchedulerRole: + Type: AWS::IAM::Role + Properties: + AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Principal: + Service: scheduler.amazonaws.com + Action: sts:AssumeRole + Policies: + - PolicyName: InvokeLambdaPolicy + PolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Action: lambda:InvokeFunction + Resource: !GetAtt DocDBProfilerFunction.Arn + + DocDBProfilerFunction: + Type: AWS::Serverless::Function + Properties: + Handler: lambda_handler.lambda_handler + CodeUri: src + Description: DocDB profiler top operations + Architectures: + - x86_64 + Tracing: Active + Layers: + - !Sub arn:aws:lambda:${AWS::Region}:336392948345:layer:AWSSDKPandas-Python313:1 + - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-x86_64:6 + Environment: + Variables: + POWERTOOLS_SERVICE_NAME: DocDBProfilerService + POWERTOOLS_METRICS_NAMESPACE: DocDBProfilerMetrics + DOCDB_LOG_GROUP_NAME: !Ref DocDBProfilerLogGrpsName + TOP_OPS_COUNT: !If + - HasTopOpsCount + - !Ref TopOpsCount + - !Ref AWS::NoValue + REPORT_START_TIME: !If + - 
HasReportStartTime + - !Ref ReportStartTime + - !Ref AWS::NoValue + REPORT_END_TIME: !If + - HasReportEndTime + - !Ref ReportEndTime + - !Ref AWS::NoValue + SENDER_EMAIL: !Ref SenderEmail + RECIPIENT_EMAIL_LIST: !Ref RecipientEmailList + Policies: + - Statement: + - Effect: Allow + Action: + - logs:StartQuery + - logs:GetQueryResults + - logs:StopQuery + - logs:DescribeLogGroups + - logs:DescribeLogStreams + Resource: + - !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/docdb/*/profiler:*' + - Effect: Allow + Action: + - ses:SendEmail + Resource: + - !Sub 'arn:aws:ses:${AWS::Region}:${AWS::AccountId}:identity/*' + Tags: + LambdaPowertools: python + +Outputs: + DocDBProfilerFunction: + Description: DocDBProfilerFunction Lambda Function ARN + Value: !GetAtt DocDBProfilerFunction.Arn diff --git a/performance/documentdb-top-operations-report/tests/__init__.py b/performance/documentdb-top-operations-report/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/performance/documentdb-top-operations-report/tests/conftest.py b/performance/documentdb-top-operations-report/tests/conftest.py new file mode 100644 index 0000000..0ec18d8 --- /dev/null +++ b/performance/documentdb-top-operations-report/tests/conftest.py @@ -0,0 +1,58 @@ +""" +Pytest configuration and shared fixtures for DocumentDB Profiler tests +""" +import os +import pytest +from moto import mock_aws +import boto3 + + +@pytest.fixture(scope="function") +def aws_credentials(): + """Mocked AWS Credentials for moto.""" + os.environ["AWS_ACCESS_KEY_ID"] = "testing" + os.environ["AWS_SECRET_ACCESS_KEY"] = "testing" + os.environ["AWS_SECURITY_TOKEN"] = "testing" + os.environ["AWS_SESSION_TOKEN"] = "testing" + os.environ["AWS_DEFAULT_REGION"] = "us-east-1" + + +@pytest.fixture(scope="function") +def logs_client(aws_credentials): + """Create a mocked CloudWatch Logs client.""" + with mock_aws(): + yield boto3.client("logs", region_name="us-east-1") + + 
+@pytest.fixture(scope="function") +def ses_client(aws_credentials): + """Create a mocked SES client.""" + with mock_aws(): + yield boto3.client("ses", region_name="us-east-1") + + +@pytest.fixture +def clean_env(): + """Clean environment variables before and after test.""" + # Store original env vars + original_env = dict(os.environ) + + # Clear relevant env vars + env_vars_to_clear = [ + 'DOCDB_LOG_GROUP_NAME', + 'SENDER_EMAIL', + 'RECIPIENT_EMAIL_LIST', + 'TOP_OPS_COUNT', + 'REPORT_START_TIME', + 'REPORT_END_TIME' + ] + + for var in env_vars_to_clear: + if var in os.environ: + del os.environ[var] + + yield + + # Restore original env vars + os.environ.clear() + os.environ.update(original_env) \ No newline at end of file diff --git a/performance/documentdb-top-operations-report/tests/requirements.txt b/performance/documentdb-top-operations-report/tests/requirements.txt new file mode 100644 index 0000000..ba59dd0 --- /dev/null +++ b/performance/documentdb-top-operations-report/tests/requirements.txt @@ -0,0 +1,6 @@ +pytest +boto3 +requests +moto[logs,ses] +pandas +aws-lambda-powertools[tracer] diff --git a/performance/documentdb-top-operations-report/tests/unit/__init__.py b/performance/documentdb-top-operations-report/tests/unit/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/performance/documentdb-top-operations-report/tests/unit/test_handler.py b/performance/documentdb-top-operations-report/tests/unit/test_handler.py new file mode 100644 index 0000000..664ba99 --- /dev/null +++ b/performance/documentdb-top-operations-report/tests/unit/test_handler.py @@ -0,0 +1,891 @@ +import json +import os +import pytest +from unittest.mock import Mock, patch, MagicMock +import pandas as pd +from datetime import datetime + +# Import the actual modules +import sys +import os +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../src')) +from lambda_handler import lambda_handler +from docdb_profiler_analyzer import DocDBProfilerAnalyzer, 
analyze_profiler_logs + + +class MockLambdaContext: + """Mock Lambda context for testing""" + def __init__(self): + self.function_name = "DocDBProfilerFunction" + self.memory_limit_in_mb = 128 + self.invoked_function_arn = "arn:aws:lambda:us-east-1:123456789012:function:DocDBProfilerFunction" + self.aws_request_id = "52fdfc07-2182-154f-163f-5f0f9a621d72" + + def get_remaining_time_in_millis(self) -> int: + return 30000 + + +@pytest.fixture +def lambda_context(): + """Fixture for Lambda context""" + return MockLambdaContext() + + +@pytest.fixture +def scheduler_event(): + """EventBridge Scheduler event fixture""" + return { + "version": "0", + "id": "53dc4d37-cffa-4f76-80c9-8b7d4a4d2eaa", + "detail-type": "Scheduled Event", + "source": "aws.scheduler", + "account": "123456789012", + "time": "2024-02-11T13:00:00Z", + "region": "us-east-1", + "resources": [ + "arn:aws:scheduler:us-east-1:123456789012:schedule/default/docdb-profiler-daily-schedule" + ], + "detail": {} + } + + +@pytest.fixture +def manual_event(): + """Manual invocation event fixture""" + return { + "test": True, + "description": "Manual test invocation for DocumentDB profiler" + } + + +@pytest.fixture +def mock_env_vars(): + """Mock environment variables""" + return { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com', + 'TOP_OPS_COUNT': '10', + 'POWERTOOLS_SERVICE_NAME': 'DocDBProfilerService', + 'POWERTOOLS_METRICS_NAMESPACE': 'DocDBProfilerMetrics' + } + + +@pytest.fixture +def sample_log_data(): + """Sample DocumentDB profiler log data""" + return [ + { + 'timestamp': '2024-02-11T13:00:00.000Z', + 'op': 'query', + 'ns': 'testdb.users', + 'user': 'testuser', + 'client': '10.0.0.1:12345', + 'command': {'find': 'users', 'filter': {'status': 'active'}}, + 'execStats': {'nReturned': 100, 'executionTimeMillisEstimate': 50}, + 'millis': 45, + 'nreturned': 100 + }, + { + 'timestamp': 
'2024-02-11T13:01:00.000Z', + 'op': 'command', + 'ns': 'testdb.orders', + 'user': 'testuser', + 'client': '10.0.0.2:12346', + 'command': {'aggregate': 'orders', 'pipeline': [{'$match': {'date': '2024-02-11'}}]}, + 'execStats': {'nReturned': 50, 'executionTimeMillisEstimate': 120}, + 'millis': 115, + 'nreturned': 50 + } + ] + + +def test_parse_log_groups(): + """Test log group parsing function""" + log_groups_str = "/aws/docdb/cluster1/profiler,/aws/docdb/cluster2/profiler" + result = DocDBProfilerAnalyzer.parse_log_groups(log_groups_str) + + assert len(result) == 2 + assert result[0]['cluster_name'] == 'cluster1' + assert result[0]['profiler_log'] == '/aws/docdb/cluster1/profiler' + assert result[1]['cluster_name'] == 'cluster2' + assert result[1]['profiler_log'] == '/aws/docdb/cluster2/profiler' + + +@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_docdb_profiler_analyzer_init(mock_session, mock_client): + """Test DocDBProfilerAnalyzer initialization""" + mock_session.return_value.region_name = 'us-east-1' + + log_group = { + 'cluster_name': 'test-cluster', + 'profiler_log': '/aws/docdb/test-cluster/profiler' + } + + analyzer = DocDBProfilerAnalyzer( + log_group=log_group, + top_ops_count=10, + sender_email='sender@example.com', + recipient_emails='recipient@example.com' + ) + + assert analyzer.cluster_name == 'test-cluster' + assert analyzer.log_group_name == '/aws/docdb/test-cluster/profiler' + assert analyzer.top_ops_count == 10 + assert analyzer.sender_email_address == 'sender@example.com' + assert analyzer.recipient_email_list == ['recipient@example.com'] + + +def test_lambda_handler_missing_docdb_log_group(scheduler_event, lambda_context): + """Test lambda handler with missing DOCDB_LOG_GROUP_NAME environment variable""" + with patch.dict(os.environ, {}, clear=True): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 500 + assert 'DOCDB_LOG_GROUP_NAME 
environment variable is not set' in result['body'] + + +def test_lambda_handler_missing_sender_email(scheduler_event, lambda_context): + """Test lambda handler with missing SENDER_EMAIL environment variable""" + with patch.dict(os.environ, {'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler'}, clear=True): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 500 + assert 'SENDER_EMAIL environment variable is not set' in result['body'] + + +def test_lambda_handler_missing_recipient_email(scheduler_event, lambda_context): + """Test lambda handler with missing RECIPIENT_EMAIL_LIST environment variable""" + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com' + }, clear=True): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 500 + assert 'RECIPIENT_EMAIL_LIST environment variable is not set' in result['body'] + + +@patch('lambda_handler.analyze_profiler_logs') +def test_lambda_handler_no_logs_found(mock_analyze, scheduler_event, lambda_context): + """Test lambda handler when no logs are found""" + # Mock analyze_profiler_logs to return empty results + mock_analyze.return_value = [] + + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com' + }): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 200 + assert result['body'] == "No profiler logs found" + + +@patch('lambda_handler.analyze_profiler_logs') +def test_lambda_handler_with_manual_event(mock_analyze, manual_event, lambda_context): + """Test lambda handler with manual invocation event""" + # Mock analyze_profiler_logs to return empty results + mock_analyze.return_value = [] + + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 
'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com' + }): + result = lambda_handler(manual_event, lambda_context) + + assert result['statusCode'] == 200 + + +@patch('docdb_profiler_analyzer.CloudWatchQuery') +@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_analyze_profiler_logs_success(mock_session, mock_client, mock_cloudwatch_query, sample_log_data): + """Test analyze_profiler_logs function with successful log processing using CloudWatchQuery""" + mock_session.return_value.region_name = 'us-east-1' + + # Mock CloudWatchQuery instance + mock_cw_query_instance = Mock() + + # Create mock log entries that match the expected format from CloudWatchQuery + mock_log_entries = [] + for log_entry in sample_log_data: + mock_log_entries.append([ + {"field": "@timestamp", "value": log_entry['timestamp']}, + {"field": "@message", "value": json.dumps(log_entry)} + ]) + + mock_cw_query_instance.query_results = mock_log_entries + mock_cw_query_instance.query_duration = 2.5 + mock_cloudwatch_query.return_value = mock_cw_query_instance + + # Test the function + results = analyze_profiler_logs( + log_groups_str="/aws/docdb/test-cluster/profiler", + top_ops_count=10, + sender_email="sender@example.com", + recipient_emails="recipient@example.com" + ) + + assert len(results) == 1 + assert results[0]['cluster_name'] == 'test-cluster' + assert results[0]['log_group'] == '/aws/docdb/test-cluster/profiler' + assert 'top_operations' in results[0] + assert 'time_range' in results[0] + assert 'analyzer' in results[0] + + # Verify CloudWatchQuery was instantiated correctly + mock_cloudwatch_query.assert_called_once_with( + log_group='/aws/docdb/test-cluster/profiler', + query_string='\n fields @timestamp, @message\n | sort @timestamp asc\n ' + ) + + # Verify query_logs was called + mock_cw_query_instance.query_logs.assert_called_once() + + +@patch('docdb_profiler_analyzer.CloudWatchQuery') 
+@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_analyze_profiler_logs_multiple_clusters(mock_session, mock_client, mock_cloudwatch_query): + """Test analyze_profiler_logs function with multiple clusters using CloudWatchQuery""" + mock_session.return_value.region_name = 'us-east-1' + + # Mock CloudWatchQuery instance with empty results + mock_cw_query_instance = Mock() + mock_cw_query_instance.query_results = [] # Empty results + mock_cw_query_instance.query_duration = 1.0 + mock_cloudwatch_query.return_value = mock_cw_query_instance + + # Test with multiple log groups + results = analyze_profiler_logs( + log_groups_str="/aws/docdb/cluster1/profiler,/aws/docdb/cluster2/profiler", + top_ops_count=5 + ) + + # Should return empty list since no logs found + assert len(results) == 0 + + # Verify CloudWatchQuery was called twice (once for each cluster) + assert mock_cloudwatch_query.call_count == 2 + + +def test_analyze_profiler_logs_empty_log_groups(): + """Test analyze_profiler_logs function with empty log groups""" + # Empty string should return empty results, not raise exception + results = analyze_profiler_logs(log_groups_str="") + assert results == [] + + +@patch('lambda_handler.analyze_profiler_logs') +def test_lambda_handler_successful_email_send(mock_analyze, scheduler_event, lambda_context): + """Test lambda handler with successful log analysis and email sending""" + # Mock successful analysis results + mock_analyzer = Mock() + mock_analyzer.send_response_via_email.return_value = {'MessageId': 'test-message-id'} + + mock_top_ops = pd.DataFrame([ + { + 'operation': 'query', + 'namespace': 'testdb.users', + 'user': 'testuser', + 'appName': 'testapp', + 'filter_criterion': "{'status': 'active'}", + 'execution_stats': "{'nReturned': 100}", + 'avg_duration_ms': 45, + 'max_duration_ms': 50, + 'count': 1 + } + ]) + + mock_results = [{ + 'cluster_name': 'test-cluster', + 'log_group': 
'/aws/docdb/test-cluster/profiler', + 'top_operations': mock_top_ops, + 'time_range': { + 'start': datetime.now(), + 'end': datetime.now() + }, + 'analyzer': mock_analyzer + }] + + mock_analyze.return_value = mock_results + + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com' + }): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 200 + assert result['body'] == "Top operations report published." + mock_analyzer.send_response_via_email.assert_called_once() + + +@patch('lambda_handler.analyze_profiler_logs') +def test_lambda_handler_with_time_range_env_vars(mock_analyze, scheduler_event, lambda_context): + """Test lambda handler with custom time range from environment variables""" + mock_analyze.return_value = [] + + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com', + 'REPORT_START_TIME': '2024-01-01 00:00:00', + 'REPORT_END_TIME': '2024-01-02 00:00:00', + 'TOP_OPS_COUNT': '20' + }): + result = lambda_handler(scheduler_event, lambda_context) + + # Verify analyze_profiler_logs was called with correct parameters + mock_analyze.assert_called_once() + call_args = mock_analyze.call_args + + assert call_args.kwargs['log_groups_str'] == '/aws/docdb/test-cluster/profiler' + assert call_args.kwargs['top_ops_count'] == 20 + assert call_args.kwargs['sender_email'] == 'sender@example.com' + assert call_args.kwargs['recipient_emails'] == 'recipient@example.com' + assert call_args.kwargs['start_time'] is not None + assert call_args.kwargs['end_time'] is not None + + +def test_docdb_profiler_analyzer_parse_log_groups_edge_cases(): + """Test edge cases for log group parsing""" + # Test with whitespace + result = DocDBProfilerAnalyzer.parse_log_groups(" /aws/docdb/cluster1/profiler 
, /aws/docdb/cluster2/profiler ") + assert len(result) == 2 + assert result[0]['cluster_name'] == 'cluster1' + + # Test with single log group + result = DocDBProfilerAnalyzer.parse_log_groups("/aws/docdb/single-cluster/profiler") + assert len(result) == 1 + assert result[0]['cluster_name'] == 'single-cluster' + + # Test with invalid format (should return empty list since it doesn't have enough parts) + result = DocDBProfilerAnalyzer.parse_log_groups("/invalid/format") + assert len(result) == 0 # Should return empty list for invalid format + + +@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_docdb_profiler_analyzer_with_none_email_params(mock_session, mock_client): + """Test DocDBProfilerAnalyzer initialization with None email parameters (CLI mode)""" + mock_session.return_value.region_name = 'us-east-1' + + log_group = { + 'cluster_name': 'test-cluster', + 'profiler_log': '/aws/docdb/test-cluster/profiler' + } + + analyzer = DocDBProfilerAnalyzer( + log_group=log_group, + top_ops_count=None, + sender_email=None, + recipient_emails=None + ) + + assert analyzer.cluster_name == 'test-cluster' + assert analyzer.log_group_name == '/aws/docdb/test-cluster/profiler' + assert analyzer.top_ops_count is None + assert analyzer.sender_email_address is None + assert analyzer.recipient_email_list == [] + + +def test_lambda_handler_exception_handling(scheduler_event, lambda_context): + """Test lambda handler exception handling""" + with patch('lambda_handler.analyze_profiler_logs') as mock_analyze: + mock_analyze.side_effect = Exception("Test error") + + with patch.dict(os.environ, { + 'DOCDB_LOG_GROUP_NAME': '/aws/docdb/test-cluster/profiler', + 'SENDER_EMAIL': 'sender@example.com', + 'RECIPIENT_EMAIL_LIST': 'recipient@example.com' + }): + result = lambda_handler(scheduler_event, lambda_context) + + assert result['statusCode'] == 500 + assert 'Test error' in result['body'] + + +# Additional test cases for comprehensive 
coverage + +@patch('docdb_profiler_analyzer.CloudWatchQuery') +@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_docdb_profiler_analyzer_fetch_profiler_logs_with_cloudwatch_query(mock_session, mock_client, mock_cloudwatch_query): + """Test fetch_profiler_logs using CloudWatchQuery integration""" + mock_session.return_value.region_name = 'us-east-1' + + log_group = { + 'cluster_name': 'test-cluster', + 'profiler_log': '/aws/docdb/test-cluster/profiler' + } + + # Mock CloudWatchQuery instance + mock_cw_query_instance = Mock() + mock_cw_query_instance.query_results = [ + [ + {"field": "@timestamp", "value": "2024-02-11T13:00:00.000Z"}, + {"field": "@message", "value": '{"op": "query", "ns": "test.collection", "millis": 100}'} + ] + ] + mock_cw_query_instance.query_duration = 2.5 + mock_cloudwatch_query.return_value = mock_cw_query_instance + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + df = analyzer.fetch_profiler_logs( + start_time=datetime.now(), + end_time=datetime.now() + ) + + # Verify CloudWatchQuery was used correctly + mock_cloudwatch_query.assert_called_once_with( + log_group='/aws/docdb/test-cluster/profiler', + query_string='\n fields @timestamp, @message\n | sort @timestamp asc\n ' + ) + mock_cw_query_instance.query_logs.assert_called_once() + + # Verify DataFrame was created + assert len(df) == 1 + assert df.iloc[0]['op'] == 'query' + + +@patch('docdb_profiler_analyzer.CloudWatchQuery') +@patch('docdb_profiler_analyzer.boto3.client') +@patch('docdb_profiler_analyzer.boto3.Session') +def test_docdb_profiler_analyzer_malformed_json_logs(mock_session, mock_client, mock_cloudwatch_query): + """Test handling of malformed JSON in log entries with CloudWatchQuery""" + mock_session.return_value.region_name = 'us-east-1' + + log_group = { + 'cluster_name': 'test-cluster', + 'profiler_log': '/aws/docdb/test-cluster/profiler' + } + + # Mock CloudWatchQuery with malformed JSON results + 
mock_cw_query_instance = Mock() + mock_cw_query_instance.query_results = [ + [ + {"field": "@timestamp", "value": "2024-02-11T13:00:00.000Z"}, + {"field": "@message", "value": "invalid json {"} # Malformed JSON + ], + [ + {"field": "@timestamp", "value": "2024-02-11T13:01:00.000Z"}, + {"field": "@message", "value": '{"op": "query", "ns": "test.collection", "millis": 100}'} # Valid JSON + ] + ] + mock_cw_query_instance.query_duration = 1.5 + mock_cloudwatch_query.return_value = mock_cw_query_instance + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + df = analyzer.fetch_profiler_logs( + start_time=datetime.now(), + end_time=datetime.now() + ) + + # Should only have 1 valid entry + assert len(df) == 1 + assert df.iloc[0]['op'] == 'query' + + +def test_docdb_profiler_analyzer_process_filter(): + """Test _process_filter method with various operation types""" + log_group = {'cluster_name': 'test', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + # Test data with different operation types + test_data = pd.DataFrame([ + { + 'op': 'query', + 'command': {'find': 'users', 'filter': {'status': 'active', 'age': 25}} + }, + { + 'op': 'command', + 'command': {'aggregate': 'orders', 'pipeline': [{'$match': {'date': '2024-01-01'}}]} + } + ]) + + result = analyzer._process_filter(test_data) + + assert 'modified_filter' in result.columns + assert len(result) == 2 + + # Check that values are replaced with placeholders + query_filter = result.iloc[0]['modified_filter'] + assert 'V1' in str(query_filter) or 'V2' in str(query_filter) + + +def test_docdb_profiler_analyzer_process_exec_stats(): + """Test _process_exec_stats method""" + log_group = {'cluster_name': 'test', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + 
mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + # Test data with execStats + test_data = pd.DataFrame([ + { + 'op': 'query', + 'execStats': {'nReturned': 100, 'executionTimeMillisEstimate': 50, 'stage': 'COLLSCAN'} + }, + { + 'op': 'command', + 'execStats': {'nReturned': 25, 'executionTimeMillisEstimate': 120} + } + ]) + + result = analyzer._process_exec_stats(test_data) + + assert 'modified_execStats' in result.columns + assert len(result) == 2 + + # Check that sensitive values are replaced with '?' + exec_stats = result.iloc[0]['modified_execStats'] + assert exec_stats['nReturned'] == '?' + assert exec_stats['executionTimeMillisEstimate'] == '?' + assert exec_stats['stage'] == 'COLLSCAN' # Other fields should remain + + +def test_docdb_profiler_analyzer_get_top_operations(): + """Test get_top_operations method with complete data processing""" + log_group = {'cluster_name': 'test', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, top_ops_count=5, sender_email=None, recipient_emails=None) + + # Test data with all required fields + test_data = pd.DataFrame([ + { + 'op': 'query', + 'ns': 'testdb.users', + 'user': 'testuser', + 'appName': 'testapp', + 'command': {'find': 'users', 'filter': {'status': 'active'}}, + 'execStats': {'nReturned': 100, 'executionTimeMillisEstimate': 50}, + 'millis': 45, + 'nreturned': 100 + }, + { + 'op': 'query', + 'ns': 'testdb.users', + 'user': 'testuser', + 'appName': 'testapp', + 'command': {'find': 'users', 'filter': {'status': 'active'}}, + 'execStats': {'nReturned': 150, 'executionTimeMillisEstimate': 75}, + 'millis': 70, + 'nreturned': 150 + } + ]) + + result = analyzer.get_top_operations(test_data) + + assert len(result) == 1 # Should be grouped into one operation + assert 'operation' in result.columns + assert 
'namespace' in result.columns + assert 'avg_duration_ms' in result.columns + assert 'max_duration_ms' in result.columns + assert 'count' in result.columns + + # Check aggregated values + assert result.iloc[0]['count'] == 2 + assert result.iloc[0]['avg_duration_ms'] == 57 # (45 + 70) / 2 + assert result.iloc[0]['max_duration_ms'] == 70 + + +def test_docdb_profiler_analyzer_get_top_operations_missing_appname(): + """Test get_top_operations with missing appName column""" + log_group = {'cluster_name': 'test', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + # Test data without appName + test_data = pd.DataFrame([ + { + 'op': 'query', + 'ns': 'testdb.users', + 'user': 'testuser', + 'command': {'find': 'users'}, + 'execStats': {}, + 'millis': 45 + } + ]) + + result = analyzer.get_top_operations(test_data) + + assert len(result) == 1 + assert result.iloc[0]['appName'] == 'N/A' + + +@patch('docdb_profiler_analyzer.boto3.client') +def test_docdb_profiler_analyzer_send_response_via_email(mock_client): + """Test send_response_via_email method""" + log_group = {'cluster_name': 'test-cluster', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + # Mock SES client + mock_ses_client = Mock() + mock_ses_client.send_email.return_value = {'MessageId': 'test-message-id'} + mock_client.return_value = mock_ses_client + + analyzer = DocDBProfilerAnalyzer( + log_group, + top_ops_count=10, + sender_email='sender@example.com', + recipient_emails='recipient1@example.com,recipient2@example.com' + ) + + response_data = { + 'top_operations': [ + { + 'operation': 'query', + 'namespace': 'testdb.users', + 'user': 'testuser', + 'appName': 'testapp', + 'filter_criterion': "{'status': 'active'}", + 'execution_stats': 
"{'nReturned': '?'}", + 'avg_duration_ms': 45, + 'max_duration_ms': 50, + 'count': 1 + } + ], + 'time_range': { + 'start': '2024-02-11T00:00:00', + 'end': '2024-02-11T23:59:59' + } + } + + result = analyzer.send_response_via_email(response_data) + + assert result['MessageId'] == 'test-message-id' + mock_ses_client.send_email.assert_called_once() + + # Verify email parameters + call_args = mock_ses_client.send_email.call_args + assert call_args[1]['Source'] == 'sender@example.com' + assert call_args[1]['Destination']['ToAddresses'] == ['recipient1@example.com', 'recipient2@example.com'] + + +def test_docdb_profiler_analyzer_send_email_missing_config(): + """Test send_response_via_email with missing email configuration""" + log_group = {'cluster_name': 'test', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, None, None, None) + + with pytest.raises(ValueError, match="Email configuration missing"): + analyzer.send_response_via_email({}) + + +def test_docdb_profiler_analyzer_format_response_to_html(): + """Test _format_response_to_html method""" + log_group = {'cluster_name': 'test-cluster', 'profiler_log': '/test'} + + with patch('docdb_profiler_analyzer.boto3.Session') as mock_session: + mock_session.return_value.region_name = 'us-east-1' + + analyzer = DocDBProfilerAnalyzer(log_group, top_ops_count=5, sender_email=None, recipient_emails=None) + + test_data = { + 'top_operations': [ + { + 'operation': 'query', + 'namespace': 'testdb.users', + 'user': 'testuser', + 'appName': 'testapp', + 'filter_criterion': "{'status': 'active'}", + 'execution_stats': "{'nReturned': '?'}", + 'avg_duration_ms': 45, + 'max_duration_ms': 50, + 'count': 1 + } + ], + 'time_range': { + 'start': '2024-02-11T00:00:00', + 'end': '2024-02-11T23:59:59' + } + } + + html_result = analyzer._format_response_to_html(test_data) + + assert '' in 
html_result + assert ' 0: + total = df['count'].sum() + return { "total": total, "distinct": distinct, "cardinality": ( distinct / total ) * 100 } + else: + return {"total": 0} + + +def _print_collection_max_msg(coll_count, db_name): + print(" ### This script will scan a maximum of {} of the {} collections in database: {} \n Consider increasing --max-collections to include more collections.".format(args.max_collections, coll_count, db_name)) + +def start_cardinality_check(): + """ + This function does the following: + 1. Gets the list of databases + 2. For each database, gets the list of collections + 3. For each collection, gets the list of indexes + 4. For each index, runs the cardinality check for the sample_count set + + The user can optionally pass database or collection names to reduce the scope of the cardinality check. + + :return: Returns a pandas dataframe containing a row for each index and its cardinality calculation. + """ + global args + global client + results = [] + connection_string = args.uri + max_collections = int(args.max_collections) + threshold = float(args.threshold) + + try: + + databases = client.list_database_names() + if args.databases != "All": + databases = args.databases.split(",") + + db_counter = 0 + coll_counter = 0 + index_counter = 0 + for db_name in databases: + db_counter = db_counter + 1 + database = client[db_name] + coll_names = database.list_collection_names() + + coll_count = len(coll_names) + + if coll_count > max_collections: + _print_collection_max_msg(coll_count, db_name) + + if args.collections != "All": + coll_names = args.collections.split(",") + for coll_name in coll_names[:max_collections]: + if coll_name == "oplog.rs": + continue + + print("### Starting cardinality check for collection - {} .... ".format(coll_name)) + coll_counter = coll_counter + 1 + collection = database[coll_name] + indexes = collection.list_indexes() + for index in indexes: + result_row = {} + if index['name'] != '_id_': + index_name = index['name'] + + print("### checking index - {} .... 
".format(index_name)) + + cardinality = 0 + isLowCardinality = 'N' + + index_counter = index_counter + 1 + rs = get_index_cardinality(db_name, coll_name, index) + if rs['total'] > 0: + result_row['index_name'] = index_name + result_row['index_keys'] = index['key'] + result_row['collection_name'] = coll_name + result_row['cardinality'] = round(rs['cardinality'],4) + if rs['cardinality'] < threshold: + isLowCardinality = 'Y' + result_row['isLowCardinality'] = isLowCardinality + result_row['totalDocsWithIndexValue'] = rs['total'] + result_row['totalDistinctValues'] = rs['distinct'] + results.append(result_row) + + print("### Finished cardinality check for collection - {}\n".format(coll_name)) + args.db_counter = db_counter + args.coll_counter = coll_counter + args.index_counter = index_counter + + return pd.DataFrame(results) + + + + except Exception as e: + traceback.print_exception(*sys.exc_info()) + print(e) + +def main(): + """ + main function kicks off parameter collection, initialization of connection and calling cardinality detection. + :return: prints output of cardinality check and saves results to csv file + """ + try: + output = {} + get_param() + init_conn() + print("\nStarting Cardinality Check. Script may take few mins to finish.") + print("Finding indexes where Cardinality/Distinct Values are less than ( {}% )...\n".format(args.threshold)) + results = start_cardinality_check() + + + if results[results["isLowCardinality"]=="Y"].empty: + print("All indexes are in good health. Cardinality detection script did not find any low cardinality indexes. 
") + else: + print_output(results) + save_file(results) + except Exception as e: + traceback.print_exception(*sys.exc_info()) + print(e) + +""" +Cardinality check script starts here +""" +if __name__ == "__main__": + main() diff --git a/performance/index-cardinality-detection/requirements.txt b/performance/index-cardinality-detection/requirements.txt new file mode 100644 index 0000000..ca2ecdf --- /dev/null +++ b/performance/index-cardinality-detection/requirements.txt @@ -0,0 +1,3 @@ +termtables==0.2.4 +pymongo==4.6.3 +pandas==2.1.0 \ No newline at end of file diff --git a/performance/index-review/README.md b/performance/index-review/README.md index 5f65ceb..3d5c001 100644 --- a/performance/index-review/README.md +++ b/performance/index-review/README.md @@ -1,26 +1,28 @@ # Amazon DocumentDB Index Review Tool -The index review tool catalogs all collections and their indexes (structure and usage). It outputs a JSON file containing all collected information, a listing of unused and/or redundant indexes, and a pair of CSV files containing collection and index details. *NOTE:* indexes should never be dropped without discussing with all interested parties and performing performance testing. +The index review tool catalogs all collections and their indexes (structure and usage). It outputs a JSON file containing all collected information, a listing of unused and/or redundant indexes, and a pair of CSV files containing collection and index details. + +*NOTE: indexes should never be dropped without discussing with all interested parties and testing performance*. 
# Requirements - Python 3.7+ - - If using Snappy wire protocol compression and MongoDB, "apt install python-snappy" - PyMongo - - MongoDB 2.6 - 3.4 | pymongo 3.10 - 3.12 - - MongoDB 3.6 - 5.0 | pymongo 3.12 - 4.0 - - MongoDB 5.1+ | pymongo 4.0+ - - DocumentDB | pymongo 3.10 - 4.0 + +# Access Control +The account executing this script requires the following permissions: + - collStats + - indexStats + - listCollections + - listDatabases + - serverStatus ## Using the Index Review Tool `python3 index-review.py --server-alias <server-alias> --uri <uri>` -- Run on all instances (primary and all secondaries) -- Connect directly to servers, not as replicaSet. If driver version supports &directConnection=true then provide it as part of the --uri -- Use a different \ for each server, output files are named using \ as the starting portion -- Avoid running the tool from the server itself if possible, it consume disk space for the output files -- The \ options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/ -- Consider adding "&compressor=snappy" to your \ if your MongoDB server supports it -- For DocumentDB use the instance endpoints, not the cluster endpoint +- Execute on all instances (primary and all secondaries/read-replicas); this is critical for a complete review of index usage. +- Use a different `<server-alias>` for each server; output files are named using `<server-alias>` as the starting portion of the filename +- All `<uri>` options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/ +- For DocumentDB use the individual instance endpoints, not the cluster endpoint ## License This tool is licensed under the Apache 2.0 License.
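The index-review README above mentions that the tool reports redundant indexes. As a rough illustration of that idea (this is a simplified sketch, not the tool's actual `checkIfRedundant` implementation, and the index names and key lists below are invented), an index is typically considered redundant when its key sequence is a strict leading prefix of another index's keys, since the longer compound index can serve the same queries:

```python
def is_prefix(shorter, longer):
    """Return True if key list 'shorter' is a strict leading prefix of 'longer'."""
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter

def find_redundant(indexes):
    """indexes: dict of index name -> list of (field, direction) tuples.
    Returns a dict mapping each redundant index to an index that covers it."""
    redundant = {}
    for name_a, keys_a in indexes.items():
        for name_b, keys_b in indexes.items():
            if name_a != name_b and is_prefix(keys_a, keys_b):
                redundant[name_a] = name_b
                break
    return redundant

# Hypothetical index catalog for one collection
indexes = {
    "cust_id_1": [("customerId", 1)],
    "cust_id_1_order_date_-1": [("customerId", 1), ("orderDate", -1)],
    "status_1": [("status", 1)],
}
print(find_redundant(indexes))  # {'cust_id_1': 'cust_id_1_order_date_-1'}
```

As the README cautions, a "redundant" flag is only a starting point; indexes should never be dropped without discussion and performance testing.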
diff --git a/performance/index-review/index-review.py b/performance/index-review/index-review.py index 99768fb..a0510b7 100644 --- a/performance/index-review/index-review.py +++ b/performance/index-review/index-review.py @@ -1,5 +1,5 @@ import argparse -from datetime import datetime, timedelta +import datetime as dt import sys import json import pymongo @@ -8,6 +8,35 @@ from collections import OrderedDict +def ensureDirect(uri,appname): + # make sure we are directly connecting to the server requested, not via replicaSet + + connInfo = {} + parsedUri = pymongo.uri_parser.parse_uri(uri) + + #print("parsedUri | {}".format(parsedUri)) + + for thisKey in sorted(parsedUri['options'].keys()): + if thisKey.lower() not in ['replicaset','readpreference']: + connInfo[thisKey] = parsedUri['options'][thisKey] + + # make sure we are using directConnection=true + connInfo['directconnection'] = True + + connInfo['username'] = parsedUri['username'] + connInfo['password'] = parsedUri['password'] + connInfo['host'] = parsedUri['nodelist'][0][0] + connInfo['port'] = parsedUri['nodelist'][0][1] + connInfo['appname'] = appname + + if parsedUri.get('database') is not None: + connInfo['authSource'] = parsedUri['database'] + + #print("connInfo | {}".format(connInfo)) + + return connInfo + + def getData(appConfig): serverOpCounters = {} serverMetricsDocument = {} @@ -17,7 +46,7 @@ def getData(appConfig): serverLocalTime = '' print('connecting to server') - client = pymongo.MongoClient(appConfig['connectionString']) + client = pymongo.MongoClient(**ensureDirect(appConfig['connectionString'],'indxrev')) serverOpCounters = client.admin.command("serverStatus")['opcounters'] serverMetricsDocument = client.admin.command("serverStatus")['metrics']['document'] @@ -40,7 +69,7 @@ def getData(appConfig): finalDict['start']['collstats'] = collectionStats # log output to file - logTimeStamp = datetime.utcnow().strftime('%Y%m%d%H%M%S') + logTimeStamp = 
dt.datetime.now(dt.timezone.utc).strftime('%Y%m%d%H%M%S') logFileName = "{}-{}-index-review.json".format(appConfig['serverAlias'],logTimeStamp) with open(logFileName, 'w') as fp: json.dump(finalDict, fp, indent=4, default=str) @@ -58,7 +87,7 @@ def getCollectionStats(client): collCursor = client[thisDb['name']].list_collections() for thisColl in collCursor: #print(thisColl) - if thisColl['type'] == 'view': + if thisColl.get('type','NOT-FOUND') == 'view': # exclude views pass elif thisColl['name'] in ['system.profile']: @@ -105,10 +134,44 @@ def evalIndexes(appConfig): addlIdxCount += 1 outFile1 = open(appConfig['serverAlias']+'-collections.csv','wt') - outFile1.write("{},{},{},{},{},{},{},{},{},{},{}\n".format('database','collection','doc-count','average-doc-size','size-GB','storageSize-GB','num-indexes','indexSize-GB','ins/day','upd/day','del/day')) outFile2 = open(appConfig['serverAlias']+'-indexes.csv','wt') - outFile2.write("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}\n".format('database','collection','doc-count','average-doc-size','size-GB','storageSize-GB','num-indexes','indexSize-GB','index-name','index-accesses-total','index-accesses-secondary','redundant','covered-by','ins/day','upd/day','del/day')) + outFile2.write("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}\n".format('database','collection','doc-count','average-doc-size','size-GB','storageSize-GB','coll-unused-pct','num-indexes','indexSize-GB','index-unused-pct','index-name','index-accesses-total','index-accesses-secondary','redundant','covered-by','ins/day','upd/day','del/day','ins/sec','upd/sec','del/sec','index-keys')) + + # output high level server information + numAlltimeSecondsUptime = idxDict["start"]["uptime"] + numAlltimeDaysUptime = numAlltimeSecondsUptime / 86400 + alltimeInserts = idxDict['start']['opcounters']['insert'] + alltimeUpdates = idxDict['start']['opcounters']['update'] + alltimeDeletes = idxDict['start']['opcounters']['delete'] + alltimeQueries = 
idxDict['start']['opcounters']['query'] + + if appConfig['priorIndexReviewFile'] is not None: + numPeriodSecondsUptime = idxDict["start"]["uptime"] - appConfig['priorDict']['start']['uptime'] + numPeriodDaysUptime = numPeriodSecondsUptime / 86400 + periodInserts = idxDict['start']['opcounters']['insert'] - appConfig['priorDict']['start']['opcounters']['insert'] + periodUpdates = idxDict['start']['opcounters']['update'] - appConfig['priorDict']['start']['opcounters']['update'] + periodDeletes = idxDict['start']['opcounters']['delete'] - appConfig['priorDict']['start']['opcounters']['delete'] + periodQueries = idxDict['start']['opcounters']['query'] - appConfig['priorDict']['start']['opcounters']['query'] + + outFile1.write("hostname,{}\n".format(idxDict['start']['host'])) + outFile1.write("\n") + if appConfig['priorIndexReviewFile'] is not None: + outFile1.write("period-seconds,{}\n".format(numPeriodSecondsUptime)) + outFile1.write(",,period/sec,period/day,alltime/sec,alltime/day\n") + outFile1.write(",inserts,{:.2f},{:.2f},{:.2f},{:.2f}\n".format(periodInserts/numPeriodSecondsUptime,periodInserts/numPeriodDaysUptime,alltimeInserts/numAlltimeSecondsUptime,alltimeInserts/numAlltimeDaysUptime)) + outFile1.write(",updates,{:.2f},{:.2f},{:.2f},{:.2f}\n".format(periodUpdates/numPeriodSecondsUptime,periodUpdates/numPeriodDaysUptime,alltimeUpdates/numAlltimeSecondsUptime,alltimeUpdates/numAlltimeDaysUptime)) + outFile1.write(",deletes,{:.2f},{:.2f},{:.2f},{:.2f}\n".format(periodDeletes/numPeriodSecondsUptime,periodDeletes/numPeriodDaysUptime,alltimeDeletes/numAlltimeSecondsUptime,alltimeDeletes/numAlltimeDaysUptime)) + outFile1.write(",queries,{:.2f},{:.2f},{:.2f},{:.2f}\n".format(periodQueries/numPeriodSecondsUptime,periodQueries/numPeriodDaysUptime,alltimeQueries/numAlltimeSecondsUptime,alltimeQueries/numAlltimeDaysUptime)) + else: + outFile1.write(",,alltime/sec,alltime/day\n") + 
outFile1.write(",inserts,{:.2f},{:.2f}\n".format(alltimeInserts/numAlltimeSecondsUptime,alltimeInserts/numAlltimeDaysUptime)) + outFile1.write(",updates,{:.2f},{:.2f}\n".format(alltimeUpdates/numAlltimeSecondsUptime,alltimeUpdates/numAlltimeDaysUptime)) + outFile1.write(",deletes,{:.2f},{:.2f}\n".format(alltimeDeletes/numAlltimeSecondsUptime,alltimeDeletes/numAlltimeDaysUptime)) + outFile1.write(",queries,{:.2f},{:.2f}\n".format(alltimeQueries/numAlltimeSecondsUptime,alltimeQueries/numAlltimeDaysUptime)) + outFile1.write("\n") + + outFile1.write("{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}\n".format('database','collection','doc-count','average-doc-size','size-GB','storageSize-GB','coll-unused-pct','num-indexes','indexSize-GB','ins/day','upd/day','del/day','ins/sec','upd/sec','del/sec')) # for each database for thisDb in idxDict["start"]["collstats"]: @@ -119,20 +182,66 @@ def evalIndexes(appConfig): printedCollection = False thisCollInfo = idxDict["start"]["collstats"][thisDb][thisColl] bToGb = 1024*1024*1024 + collectionUnusedPct = thisCollInfo.get('unusedStorageSize', {}).get('unusedPercent', -1.0) + + if appConfig['opsFile'] is not None: + # calculate churn from oplog/changestream data + thisNs = "{}.{}".format(thisDb,thisColl) + thisInsUpdDel = appConfig['opsDict'].get(thisNs,{"ins":0,"upd":0,"del":0}) + insPerDay = thisInsUpdDel['ins'] + updPerDay = thisInsUpdDel['upd'] + delPerDay = thisInsUpdDel['del'] + insPerSec = int(thisInsUpdDel['ins'] / 86400) + updPerSec = int(thisInsUpdDel['upd'] / 86400) + delPerSec = int(thisInsUpdDel['del'] / 86400) + + elif appConfig['priorIndexReviewFile'] is not None: + # calculate churn from prior primary instance index-review file + try: + numSecondsUptime = idxDict["start"]["uptime"] - appConfig['priorDict']['start']['uptime'] + numDaysUptime = numSecondsUptime / 86400 + insPerDay = int((thisCollInfo['opCounter']['numDocsIns'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsIns']) / 
numDaysUptime) + updPerDay = int((thisCollInfo['opCounter']['numDocsUpd'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsUpd']) / numDaysUptime) + delPerDay = int((thisCollInfo['opCounter']['numDocsDel'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsDel']) / numDaysUptime) + insPerSec = int((thisCollInfo['opCounter']['numDocsIns'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsIns']) / numSecondsUptime) + updPerSec = int((thisCollInfo['opCounter']['numDocsUpd'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsUpd']) / numSecondsUptime) + delPerSec = int((thisCollInfo['opCounter']['numDocsDel'] - appConfig['priorDict']['start']['collstats'][thisDb][thisColl]['opCounter']['numDocsDel']) / numSecondsUptime) + except: + insPerDay = 0 + updPerDay = 0 + delPerDay = 0 + insPerSec = 0 + updPerSec = 0 + delPerSec = 0 - thisNs = "{}.{}".format(thisDb,thisColl) - thisInsUpdDel = appConfig['opsDict'].get(thisNs,{"ins":"","upd":"","del":""}) - insPerDay = thisInsUpdDel['ins'] - updPerDay = thisInsUpdDel['upd'] - delPerDay = thisInsUpdDel['del'] - - outFile1.write("{},{},{},{},{:8.2f},{:8.2f},{},{:8.2f},{},{},{}\n".format(thisDb,thisColl,thisCollInfo['count'],thisCollInfo.get('avgObjSize',0),thisCollInfo['size']/bToGb,thisCollInfo['storageSize']/bToGb,thisCollInfo['nindexes'],thisCollInfo['totalIndexSize']/bToGb,insPerDay,updPerDay,delPerDay)) + else: + # calculate churn as estimate using operations since instance startup + try: + numSecondsUptime = idxDict["start"]["uptime"] + numDaysUptime = numSecondsUptime / 86400 + insPerDay = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsIns'] / numDaysUptime) + updPerDay = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsUpd'] / numDaysUptime) + delPerDay = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsDel'] / 
numDaysUptime) + insPerSec = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsIns'] / numSecondsUptime) + updPerSec = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsUpd'] / numSecondsUptime) + delPerSec = int(idxDict["start"]["collstats"][thisDb][thisColl]['opCounter']['numDocsDel'] / numSecondsUptime) + except: + insPerDay = 0 + updPerDay = 0 + delPerDay = 0 + insPerSec = 0 + updPerSec = 0 + delPerSec = 0 + + outFile1.write("{},{},{},{},{:.2f},{:.2f},{:.2f},{},{:.2f},{},{},{},{},{},{}\n".format(thisDb,thisColl,thisCollInfo['count'],thisCollInfo.get('avgObjSize',0),thisCollInfo['size']/bToGb,thisCollInfo['storageSize']/bToGb,collectionUnusedPct,thisCollInfo['nindexes'],thisCollInfo['totalIndexSize']/bToGb,insPerDay,updPerDay,delPerDay,insPerSec,updPerSec,delPerSec)) # for each index for thisIdx in idxDict["start"]["collstats"][thisDb][thisColl]["indexInfo"]: if thisIdx["name"] in ["_id","_id_"]: continue + indexUnusedPct = thisIdx.get('unusedStorageSize', {}).get('unusedSizePercent', -1.0) + # check extra servers for non-usage numXtraOps = 0 if addlIdxCount > 0: @@ -162,9 +271,9 @@ def evalIndexes(appConfig): #with open('output.log', 'a') as fpDet: # fpDet.write("{:40s} {:40s} {:40s} {:12d} {:12d}\n".format(thisDb,thisColl,thisIdx["name"],thisIdx["accesses"]["ops"],numXtraOps)) - outFile2.write("{},{},{},{},{:8.2f},{:8.2f},{},{:8.2f},{},{},{},{},{},{},{},{}\n".format(thisDb,thisColl,thisCollInfo['count'],thisCollInfo.get('avgObjSize'), - thisCollInfo['size']/bToGb,thisCollInfo['storageSize']/bToGb,thisCollInfo['nindexes'],thisCollInfo['indexSizes'][thisIdx["name"]]/bToGb,thisIdx["name"], - thisIdx["accesses"]["ops"]+numXtraOps,numXtraOps,isRedundant,redundantList,insPerDay,updPerDay,delPerDay)) + outFile2.write("{},{},{},{},{:.2f},{:.2f},{:.2f},{},{:.2f},{:.2f},{},{},{},{},{},{},{},{},{},{},{},{}\n".format(thisDb,thisColl,thisCollInfo['count'],thisCollInfo.get('avgObjSize'), + 
thisCollInfo['size']/bToGb,thisCollInfo['storageSize']/bToGb,collectionUnusedPct,thisCollInfo['nindexes'],thisCollInfo['indexSizes'][thisIdx["name"]]/bToGb,indexUnusedPct,thisIdx["name"], + thisIdx["accesses"]["ops"]+numXtraOps,numXtraOps,isRedundant,redundantList,insPerDay,updPerDay,delPerDay,insPerSec,updPerSec,delPerSec,thisIdx["keyAsString"])) outFile1.close() outFile2.close() @@ -184,10 +293,14 @@ def checkIfRedundant(idxName,idxKeyAsString,indexList): def checkReplicaSet(appConfig): print('connecting to server') - client = pymongo.MongoClient(appConfig['connectionString']) + client = pymongo.MongoClient(host=appConfig['connectionString'],appname='indxrev') rsStatus = client.admin.command("replSetGetStatus") print(" rs.status() = {}".format(rsStatus)) + print(" replica set members") + for thisMember in rsStatus['members']: + print(" {}".format(pymongo.uri_parser.parse_host(thisMember['name']))) + print(" {}".format(thisMember)) client.close() @@ -224,38 +337,18 @@ def readOpsFile(appConfig): return oD +def readPriorIndexReviewFile(appConfig): + oD = {} + + print("loading prior index review file from {}".format(appConfig['priorIndexReviewFile'])) + + with open(appConfig['priorIndexReviewFile'], 'r') as file: + oD = json.load(file) + + return oD + + def main(): - # v0 - # * single server - # * add python 3.7 check - # * save full set of data collected to filesystem - # * find unused and redundant indexes - # * add proper argument system - - # v1 - # * allow override of minimum Python version - # * create CSV files - one for collections, one for indexes - # filter by database and/or collection by name - # filter by database and/or collection by regular expression - # report server uptime with suggestions - # clean up JSON - remove "start" - # check for same index twice - # ensure compatibility with MongoDB 3.2+ - - # v2 - # multi-server (via command line arg) - # compare host in JSON, look for duplicates - # unit testing - - # v3 - # replicaSet discovery - - # v4 
- # sharding support? - - # v5 - # diff across multiple runs, find unused - parser = argparse.ArgumentParser(description='Check for redundant and unused indexes.') parser.add_argument('--skip-python-version-check', @@ -283,6 +376,11 @@ def main(): type=str, help='File created by mongodb-oplog-review tool containing collection level operations per day.') + parser.add_argument('--prior-index-review-file', + required=False, + type=str, + help='File from prior run of index-review tool on primary instance, used to calculate collection level operations per day.') + args = parser.parse_args() # check for minimum Python version @@ -300,6 +398,7 @@ def main(): appConfig['connectionString'] = args.uri appConfig['serverAlias'] = args.server_alias appConfig['opsFile'] = args.ops_file + appConfig['priorIndexReviewFile'] = args.prior_index_review_file #checkReplicaSet(appConfig) @@ -317,6 +416,11 @@ def main(): else: appConfig['opsDict'] = {} + if appConfig['priorIndexReviewFile'] is not None: + appConfig['priorDict'] = readPriorIndexReviewFile(appConfig) + else: + appConfig['priorDict'] = {} + evalIndexes(appConfig) diff --git a/performance/metric-analyzer/README.md b/performance/metric-analyzer/README.md new file mode 100644 index 0000000..50e4bba --- /dev/null +++ b/performance/metric-analyzer/README.md @@ -0,0 +1,71 @@ +# Amazon DocumentDB Metric Analyzer + +This tool analyzes the output of the [Amazon DocumentDB Metric Collector Tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/performance/metric-collector) to provide recommendations for optimizing performance, cost, and availability. 
+ +## Features + +- Analyzes CPU utilization, cache hit ratios, connection limits, and more +- Provides specific recommendations based on best practices +- Includes detailed context for each recommendation type +- Generates CSV output for easy review +- Creates interactive HTML reports with recommendation context details + +## Usage + +```bash +python metric-analyzer.py --metrics-file \ + --region \ + --output \ + --log-level \ + [--no-html] +``` + +### Parameters + +- `--metrics-file`: Path to the metrics CSV file to analyze (required) +- `--region`: AWS Region (default: us-east-1) +- `--output`: Base name for output files (default: metric-analyzer) +- `--log-level`: Log level for logging (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, default: WARNING) +- `--no-html`: Disable HTML output generation (HTML output is enabled by default) + +## Recommendation Context + +Each recommendation includes a link to a context file in the `context/` directory that provides additional information about: + +- Considerations before implementing the recommendation +- Potential impacts (positive and negative) +- Alternative approaches +- Implementation guidance + +These context files supplement the AWS documentation references and provide more nuanced guidance for decision-making. 
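Since the analyzer emits plain CSV, its findings can be post-processed with standard tooling. The sketch below uses only the column names documented in the Output Format section; the sample rows and instance names are invented for illustration:

```python
import csv
import io

# Hypothetical sample of the analyzer's CSV output; column names come from
# the Output Format section, the rows themselves are made up.
sample = """ClusterName,InstanceName,InstanceType,InstanceRole,Category,ModifyInstance,Finding,Recommendation,Reference
prod-cluster,prod-1,db.r6g.4xlarge,PRIMARY,Instance,DECREASE,CPU avg 12%,Consider a smaller instance,https://docs.aws.amazon.com/documentdb/
prod-cluster,prod-2,db.r5.large,SECONDARY,Instance,UPGRADE,Previous-generation instance,Move to Graviton2 (r6g),https://docs.aws.amazon.com/documentdb/
"""

# Filter the report to a single action type, e.g. downsizing candidates.
rows = list(csv.DictReader(io.StringIO(sample)))
downsize = [r["InstanceName"] for r in rows if r["ModifyInstance"] == "DECREASE"]
print(downsize)  # ['prod-1']
```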
+ +## Output Format + +### CSV Output + +The tool generates a CSV file with the following columns: + +- ClusterName: Name of the DocumentDB cluster +- InstanceName: Name of the instance (if applicable) +- InstanceType: Instance type (e.g., db.r6g.large) +- InstanceRole: PRIMARY or SECONDARY +- Category: Instance or Cluster level recommendation +- ModifyInstance: Action to take (INCREASE, DECREASE, UPGRADE) +- Finding: Specific finding with metrics +- Recommendation: Recommended action +- Reference: Link to AWS documentation + +### HTML Output + +The tool also generates an interactive HTML report that includes: + +- All information from the CSV output +- Interactive "View Context" buttons that display detailed guidance for each recommendation +- Responsive design for better readability + +## Requirements + +- Python 3.6+ +- boto3>=1.26.0 +- pandas>=1.3.0 +- markdown>=3.3.0 diff --git a/performance/metric-analyzer/context/buffer_cache_low.html b/performance/metric-analyzer/context/buffer_cache_low.html new file mode 100644 index 0000000..dde7229 --- /dev/null +++ b/performance/metric-analyzer/context/buffer_cache_low.html @@ -0,0 +1,32 @@ + + + + + + +

Low Buffer Cache Hit Ratio Considerations

+

When BufferCacheHitRatio is low (below 90%) for an extended period of time, your instances may be under-provisioned for your working dataset, which can lead to performance issues and higher I/O costs.

+ +

Considerations before taking action:

+
  1. Data access patterns: Analyze if your application is accessing data in a way that could be optimized to improve cache utilization.
  2. Working dataset size: Determine if your working dataset has grown beyond the memory capacity of your current instance.
  3. Instance sizing impact: Increasing instance size will increase costs but may improve performance significantly if memory is the bottleneck.
+ +

Additional considerations:

+
  1. Enable the profiler and review the CloudWatch logs to identify and optimize slow queries
  2. Consider using Performance Insights to identify specific resource bottlenecks
  3. Implement BufferCacheHitRatio isolation by directing operational queries to the primary instance and analytic queries only to the replica instances
  4. Alternatively, achieve partial isolation by directing analytic queries to a specific replica instance with the understanding that some percentage of regular queries will also run on that replica and could potentially be affected
+ + diff --git a/performance/metric-analyzer/context/connection_limit.html b/performance/metric-analyzer/context/connection_limit.html new file mode 100644 index 0000000..573c877 --- /dev/null +++ b/performance/metric-analyzer/context/connection_limit.html @@ -0,0 +1,30 @@ + + + + + + +

Connection Limits Approaching Maximum

+

Each instance has a connection limit that scales with instance size. Additionally, each open connection consumes memory and CPU resources on the Amazon DocumentDB instance. After the connection limit has been reached, Amazon DocumentDB rejects any further connection attempts and the application will encounter connection exceptions.

+ +

Considerations before taking action:

+
  1. Connection pooling: Verify if your application is using connection pooling effectively. Improper connection management can lead to unnecessary connection proliferation.
  2. Connection distribution: Verify that you are connecting to your cluster as a replica set, distributing reads to replica instances using the built-in read preference capabilities of your driver.
  3. Instance sizing impact: Increasing instance size will increase costs but provides higher connection limits and additional resources.
+ +

Additional considerations:

+
  1. Review application code for connection leaks or improper connection handling
  2. Consider implementing a connection proxy layer to manage connection distribution
+ + \ No newline at end of file diff --git a/performance/metric-analyzer/context/cpu_overutilized.html b/performance/metric-analyzer/context/cpu_overutilized.html new file mode 100644 index 0000000..547e9a1 --- /dev/null +++ b/performance/metric-analyzer/context/cpu_overutilized.html @@ -0,0 +1,34 @@ + + + + + + +

High CPU Usage Considerations

+

When CPU utilization is consistently high (above 90%), your instances may be under-provisioned, which can lead to performance issues, increased latency, and potential outages.

+ +

Considerations before taking action:

+
  1. Query optimization: High CPU might indicate inefficient queries. Review and optimize your queries before scaling up.
  2. Workload patterns: Determine if high CPU is consistent or occurs during specific peak periods.
  3. Instance sizing impact: Increasing instance size will increase costs but may be necessary for consistent performance.
+ +

Additional considerations:

+
  1. Review the slow query profiler to identify and optimize problematic queries
  2. Check for common causes of high CPU utilization
  3. Consider using Performance Insights to identify specific resource bottlenecks
  4. Distribute read operations to replica instances if using a read-heavy workload
  5. Implement or optimize connection pooling to reduce connection overhead
  6. Consider implementing a caching layer for frequently accessed data
+ + diff --git a/performance/metric-analyzer/context/cpu_underutilized.html b/performance/metric-analyzer/context/cpu_underutilized.html new file mode 100644 index 0000000..e15ce18 --- /dev/null +++ b/performance/metric-analyzer/context/cpu_underutilized.html @@ -0,0 +1,32 @@ + + + + + + +

Low CPU Usage Considerations

+

When CPU utilization is consistently low (below 30%), your instances may be over-provisioned, which can lead to unnecessary costs without providing performance benefits.

+ +

Considerations before downsizing:

+
  1. Workload patterns: Verify if the low CPU usage is consistent or if there are periodic spikes that require the current capacity.
  2. Future growth: Consider if you're anticipating workload growth that would justify the current instance size.
  3. Memory requirements: Even with low CPU, your workload may require the current memory capacity for optimal cache performance. For example, moving from an r6g.xlarge to r6g.large decreases your buffer cache space by 50%, potentially increasing your I/O costs.
  4. Instance type selection: Consider if a different instance family might better match your workload characteristics, such as NVMe-backed instances.
  5. Network bandwidth impact: In addition to vCPU and RAM, decreasing the instance size may also affect network bandwidth. This can affect your I/O performance.
+ +

Additional considerations:

+
  1. If running multiple clusters that are over-provisioned, evaluate if their workloads can be consolidated
  2. Monitor other metrics like memory usage and cache hit ratios before making sizing decisions
+ + \ No newline at end of file diff --git a/performance/metric-analyzer/context/graviton_upgrade.html b/performance/metric-analyzer/context/graviton_upgrade.html new file mode 100644 index 0000000..039dfbc --- /dev/null +++ b/performance/metric-analyzer/context/graviton_upgrade.html @@ -0,0 +1,31 @@ + + + + + + +

Upgrading Instances to Graviton2

+

AWS Graviton2 processors are custom built by AWS using 64-bit Arm Neoverse cores and provide significant performance and cost benefits over previous generation instances. R6g instances offer up to 30% better price/performance compared to R5/R4 instances, with r6g instances 5% less expensive than their r5 counterparts.

+ +

Considerations before upgrading to Graviton2:

+
  1. Performance benefits: Graviton2 instances provide better CPU performance and memory encryption, which can improve overall database performance.
  2. Application compatibility: No application changes are required when migrating from Intel to Graviton2 instances on Amazon DocumentDB.
  3. Instance size mapping: Ensure you select the appropriate Graviton2 instance size that matches or exceeds your current workload requirements.
+ +

Additional considerations:

+
  1. Identify the appropriate Graviton2 instance type (r6g) that corresponds to your current instance
  2. To minimize downtime, consider modifying your replicas first, then promoting the replicas before modifying the primary instance
  3. Monitor performance after the upgrade to ensure expected improvements are realized
+ + \ No newline at end of file diff --git a/performance/metric-analyzer/context/index_cache_low.html b/performance/metric-analyzer/context/index_cache_low.html new file mode 100644 index 0000000..583c39a --- /dev/null +++ b/performance/metric-analyzer/context/index_cache_low.html @@ -0,0 +1,32 @@ + + + + + + +

Low Index Buffer Cache Hit Ratio Considerations

+

When IndexBufferCacheHitRatio is low (below 90%) for an extended period of time, your instance may have too many indexes for its memory capacity, which can lead to performance degradation.

+ +

Considerations before taking action:

+
  1. Index usage analysis: Use the Index Review Tool to identify which indexes are rarely or never used before removing them.
  2. Query impact: Removing indexes may negatively impact some queries that depend on them. Test your workload before removing indexes from your production clusters.
  3. Instance sizing trade-offs: Increasing instance size will increase costs but may be necessary if all indexes are required.
  4. Review your queries for optimization: Utilize the explain command to view your query execution plan. You can use the $hint operator to enforce selection of a preferred index.
+ +

Additional considerations:

+
  1. Consider consolidating multiple indexes where possible
  2. Evaluate if your application can be modified to require fewer indexes
  3. Utilize the index low cardinality tool to identify indexes with a high number of duplicate values. It is recommended to limit the creation of indexes to fields where the number of duplicate values is less than 1% of the total number of documents in the collection.
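The low-cardinality rule above can be expressed as a simple ratio check. This sketch mirrors the flagging logic of the cardinality script earlier in this diff (distinct values divided by total documents compared against a threshold), but it is an illustration, not the index-cardinality-detection tool itself:

```python
def is_low_cardinality(total_docs, distinct_values, threshold_pct=1.0):
    """Flag a field as a poor index candidate when its distinct values
    fall below threshold_pct of the total document count."""
    if total_docs == 0:
        return False
    return (distinct_values / total_docs) * 100 < threshold_pct

print(is_low_cardinality(1_000_000, 3))        # True  - e.g. a status flag
print(is_low_cardinality(1_000_000, 950_000))  # False - e.g. a customer id
```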
+ + \ No newline at end of file diff --git a/performance/metric-analyzer/context/read_preference.html b/performance/metric-analyzer/context/read_preference.html new file mode 100644 index 0000000..4dfe447 --- /dev/null +++ b/performance/metric-analyzer/context/read_preference.html @@ -0,0 +1,43 @@ + + + + + + +

Read Preference Driver Settings

+

When the primary instance is handling the majority of read operations, your cluster is not efficiently utilizing read replicas. This can lead to unnecessary load on the primary instance and underutilization of replica resources.

+ +

Considerations before changing read preference:

+
  1. Consistency requirements: Secondary reads provide eventual consistency. Ensure your application can tolerate this for read operations.
  2. Driver configuration: Different drivers have different methods for setting read preferences. Review the documentation for your driver.
  3. Replica availability: With secondaryPreferred, reads will fall back to the primary if no replicas are available.
  4. Monitoring impact: Track metrics after changing read preference to ensure proper load distribution.
+ +

Additional considerations:

+
  1. Node.js driver:
     const client = new MongoClient(uri, {
       readPreference: 'secondaryPreferred'
     });
  2. Python driver:
     client = pymongo.MongoClient(uri,
       read_preference=pymongo.ReadPreference.SECONDARY_PREFERRED)
  3. Java driver:
     MongoClientSettings settings = MongoClientSettings.builder()
       .readPreference(ReadPreference.secondaryPreferred())
       .build();
+ + diff --git a/performance/metric-analyzer/context/remove_instances.html b/performance/metric-analyzer/context/remove_instances.html new file mode 100644 index 0000000..f2a8b90 --- /dev/null +++ b/performance/metric-analyzer/context/remove_instances.html @@ -0,0 +1,33 @@ + + + + + + +

Excessive Instance Deployment Considerations

+

Amazon DocumentDB is designed for 99.99% availability when deployed across two or more AZs. Replica instances work well for read scaling because they are fully dedicated to read operations on your cluster volume. However, running an Amazon DocumentDB cluster with more than three instances (one primary and two replicas) does not increase availability beyond 99.99% and can increase costs.

+ +

Considerations before removing instances:

+
  1. Read scaling needs: Determine if your current read workload actually requires the additional replica instances.
  2. Regional distribution: Verify that the current instances are distributed across multiple Availability Zones for maximum resilience.
  3. Maintenance strategy: Consider how instance removal might affect your maintenance procedures.
  4. Future growth plans: Assess if anticipated workload growth justifies keeping additional instances.
+ +

Additional considerations:

+
  1. For production workloads, it is recommended to run a cluster with three instances (one primary, two replicas), but no fewer than two (one primary, one replica)
  2. Ensure instances are distributed across different Availability Zones
  3. Configure read preference to secondaryPreferred to maximize read scaling
  4. Consider increasing instance size instead of adding more instances if additional capacity is needed
\ No newline at end of file
diff --git a/performance/metric-analyzer/context/single_az.html b/performance/metric-analyzer/context/single_az.html
new file mode 100644
index 0000000..f517700
--- /dev/null
+++ b/performance/metric-analyzer/context/single_az.html
@@ -0,0 +1,33 @@

Single Instance Deployment Considerations


Running an Amazon DocumentDB cluster with only a single instance provides no high availability and no read scaling capability. For production workloads, it is recommended to deploy a cluster with at least one replica instance.


Considerations before adding replica instances:

  1. High availability needs: Single-instance clusters have no automatic failover capability, resulting in longer downtime during instance failures.
  2. Read scaling requirements: Without replicas, all read operations must be processed by the primary instance.
  3. Maintenance impact: With replicas, maintenance operations can be performed with minimal downtime.

Additional considerations:

  1. For production workloads, deploy at least one replica instance (two total instances).
  2. For critical workloads, consider deploying two replica instances (three total instances).
  3. Amazon DocumentDB clusters can be stopped and started, helping to manage costs for development and test environments.
  4. Configure the read preference to secondaryPreferred to utilize replica instances for read operations.
  5. Monitor replica lag to ensure acceptable data consistency.
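For the replica-lag point, CloudWatch exposes a per-instance DBInstanceReplicaLag metric (reported in milliseconds). A hedged sketch of evaluating datapoints shaped like a GetMetricStatistics response; the 100 ms threshold and sample values are assumptions, not guidance from this repository:

```python
# Evaluate DBInstanceReplicaLag datapoints (milliseconds) against a target.
# The 100 ms default is an example threshold; choose one based on your
# application's consistency requirements.
def lag_acceptable(datapoints, max_lag_ms=100.0):
    """Return (within_target, worst_observed_lag_ms)."""
    worst = max(dp['Maximum'] for dp in datapoints)
    return worst <= max_lag_ms, worst

# Illustrative datapoints, shaped like CloudWatch GetMetricStatistics output.
samples = [{'Maximum': 12.0}, {'Maximum': 35.5}, {'Maximum': 88.0}]
ok, worst = lag_acceptable(samples)
```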
\ No newline at end of file
diff --git a/performance/metric-analyzer/metric-analyzer.py b/performance/metric-analyzer/metric-analyzer.py
new file mode 100644
index 0000000..133d19b
--- /dev/null
+++ b/performance/metric-analyzer/metric-analyzer.py
@@ -0,0 +1,715 @@
+"""
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License").
+You may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+Description:
+    This script analyzes the output of the Amazon DocumentDB Metric Collector
+    Tool and provides recommendations for optimizing performance, cost, and
+    availability.
+ +Usage: + python metric-analyzer.py --metrics-file \\ + --region \\ + --output \\ + --log-level \\ + [--no-html] + +Script Parameters +----------------- +--metrics-file: str (required) + Path to the metrics CSV file to analyze +--region: str + AWS Region (default: us-east-1) +--output: str + Base name for output CSV file (default: metric-analyzer) + The actual filename will include the current date (YYYY-MM-DD) +--log-level: str + Log level for logging (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, default: WARNING) +--no-html: bool + Disable HTML output generation (HTML output is enabled by default) +""" +import json +import argparse +import logging +import sys +from datetime import datetime +import boto3 +import pandas as pd + +# Recommendations +RECOMMENDATIONS = { + 'cpu_underutilized': { + 'category': 'Instance', + 'finding': 'Low CPU usage of %s', + 'recommendation': 'Consider decreasing the instance size', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/best_practices.html#best_practices-performance', + 'context': 'context/cpu_underutilized.html' + }, + 'cpu_overutilized': { + 'category': 'Instance', + 'finding': 'High CPU usage of %s', + 'recommendation': 'Consider increasing the instance size', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/best_practices.html#best_practices-performance', + 'context': 'context/cpu_overutilized.html' + }, + 'buffer_cache_low': { + 'category': 'Instance', + 'finding': 'Low BufferCacheHitRatio - %s', + 'recommendation': 'Analyze your cache performance and consider archiving unused data. 
Consider increasing instance size or utilizing NVMe-backed Instances', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/best_practices.html#best_practices-instance_sizing', + 'context': 'context/buffer_cache_low.html' + }, + 'index_cache_low': { + 'category': 'Instance', + 'finding': 'Low IndexBufferCacheHitRatio - %s', + 'recommendation': 'Review and remove unused/redundant indexes or increase the instance size.', + 'reference': 'https://github.com/awslabs/amazon-documentdb-tools/tree/master/performance/index-review', + 'context': 'context/index_cache_low.html' + }, + 'read_preference': { + 'category': 'Cluster', + 'finding': 'Primary handling majority of OpcounterQuery (primary: %s; replica(s): %s)', + 'recommendation': 'Use secondaryPreferred driver read preference to maximize read scaling', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/connect-to-replica-set.html', + 'context': 'context/read_preference.html' + }, + 'connection_limit': { + 'category': 'Instance', + 'finding': 'Connections (p99) approaching instance limit - %s of %s available', + 'recommendation': 'Consider increasing the instance size', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/limits.html#limits.instance', + 'context': 'context/connection_limit.html' + }, + 'single_az': { + 'category': 'Cluster', + 'finding': 'Cluster deployed with a single instance - %s', + 'recommendation': 'Add replica instances to achieve higher availability and read scaling if this is a production workload.', + 'reference': 'https://docs.aws.amazon.com/documentdb/latest/developerguide/replication.html#replication.high-availability', + 'context': 'context/single_az.html' + }, + 'remove_instances': { + 'category': 'Cluster', + 'finding': 'Cluster deployed with more than 3 instances (%s)', + 'recommendation': 'Consider decreasing instance count - more than 3 instances does not improve availability', + 'reference': 
'https://docs.aws.amazon.com/documentdb/latest/developerguide/replication.html#replication.high-availability', + 'context': 'context/remove_instances.html' + }, + 'graviton_upgrade': { + 'category': 'Instance', + 'finding': 'Using previous generation instance %s', + 'recommendation': 'Move to AWS Graviton2 instances which can provide 30% price/performance improvement and are 5% less expensive than their previous-generation counterparts', + 'reference': 'https://aws.amazon.com/blogs/database/achieve-better-performance-on-amazon-documentdb-with-aws-graviton2-instances/', + 'context': 'context/graviton_upgrade.html' + } +} + +# Metric thresholds +THRESHOLDS = { + 'cpu_low': 30.0, + 'cpu_high': 90.0, + 'cache_ratio_low': 90.0, + 'connection_limit_pct': 0.95 +} + +# Setup logger +def setup_logger(log_level=logging.INFO): + logger = logging.getLogger('metric-analyzer') + logger.setLevel(log_level) + console_handler = logging.StreamHandler() + console_handler.setLevel(log_level) + formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') + console_handler.setFormatter(formatter) + logger.addHandler(console_handler) + return logger + +# Get latest instance values +def get_docdb_instance_specs(region_name): + # Connection limits by instance type - unavailable via boto3, pulled from: + # https://docs.aws.amazon.com/documentdb/latest/developerguide/limits.html#limits.instance + + connection_limits = { + 'db.r4.large': 1700, + 'db.r4.xlarge': 3400, + 'db.r4.2xlarge': 6800, + 'db.r4.4xlarge': 13600, + 'db.r4.8xlarge': 27200, + 'db.r4.16xlarge': 30000, + 'db.r6g.large': 3400, + 'db.r6g.xlarge': 7000, + 'db.r6g.2xlarge': 14200, + 'db.r6g.4xlarge': 28400, + 'db.r6g.8xlarge': 60000, + 'db.r6g.12xlarge': 60000, + 'db.r6g.16xlarge': 60000, + 'db.r5.large': 3400, + 'db.r5.xlarge': 7000, + 'db.r5.2xlarge': 14200, + 'db.r5.4xlarge': 28400, + 'db.r5.8xlarge': 60000, + 'db.r5.12xlarge': 60000, + 'db.r5.16xlarge': 60000, + 'db.r5.24xlarge': 60000, + 
'db.r6gd.large': 3400, + 'db.r6gd.xlarge': 7000, + 'db.r6gd.2xlarge': 14200, + 'db.r6gd.4xlarge': 28400, + 'db.r6gd.8xlarge': 60000, + 'db.r6gd.12xlarge': 60000, + 'db.r6gd.16xlarge': 60000, + 'db.t4g.medium': 1000, + 'db.t3.medium': 1000 + } + + pricing_client = boto3.client('pricing') + filters = [ + {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Database Instance'}, + {'Type': 'TERM_MATCH', 'Field': 'servicecode', 'Value': 'AmazonDocDB'}, + {'Type': 'TERM_MATCH', 'Field': 'regionCode', 'Value': region_name} + ] + + response = pricing_client.get_products( + ServiceCode='AmazonDocDB', + Filters=filters + ) + + if not response.get('PriceList'): + raise ValueError(f"No pricing data found for Amazon DocumentDB in region: {region_name}") + + instance_data = [] + for price_item in response['PriceList']: + price_dict = json.loads(price_item) + required_fields = ['instanceType', 'volume_optimization', 'networkPerformance', 'memory', 'vcpu'] + if 'product' not in price_dict or 'attributes' not in price_dict['product']: + continue + + attributes = price_dict['product']['attributes'] + if any(field not in attributes for field in required_fields): + continue + + instance_type = attributes['instanceType'] + + # Price + price_value = None + if 'terms' in price_dict and 'OnDemand' in price_dict['terms']: + for term_key in price_dict['terms']['OnDemand']: + if 'priceDimensions' in price_dict['terms']['OnDemand'][term_key]: + for dim_key in price_dict['terms']['OnDemand'][term_key]['priceDimensions']: + dim = price_dict['terms']['OnDemand'][term_key]['priceDimensions'][dim_key] + if 'pricePerUnit' in dim and 'USD' in dim['pricePerUnit']: + currency = 'USD' + price_value = float(dim['pricePerUnit']['USD']) + elif 'pricePerUnit' in dim and 'CNY' in dim['pricePerUnit']: + price_value = float(dim['pricePerUnit']['CNY']) + currency = 'CNY' + if price_value: + break + + if not price_value: + raise ValueError(f"Missing pricePerUnit for instance type {instance_type}") + + # 
RAM + try: + memory_value = float(attributes['memory'].split(' ')[0]) if 'GiB' in attributes['memory'] else None + if memory_value is None: + raise ValueError(f"Invalid memory format: {attributes['memory']}") + except Exception as e: + raise ValueError(f"Failed to parse memory for {instance_type}: {str(e)}") + + # vCPU + try: + vcpu_value = int(attributes['vcpu']) + except Exception as e: + raise ValueError(f"Failed to parse vcpu for {instance_type}: {str(e)}") + + # Connection limit + if instance_type not in connection_limits: + raise ValueError(f"Missing connection limit for instance type {instance_type}") + + # Build instance data - capturing additional fields for future use + instance_data.append({ + 'instance_type': instance_type, + 'region': region_name, + 'currency': currency, + 'standard_price_per_hour': price_value, + 'connections': connection_limits[instance_type], + 'Type': attributes['volume_optimization'], + 'networkPerformance': attributes['networkPerformance'], + 'memory_GiB': memory_value, + 'vcpu': vcpu_value + }) + + if not instance_data: + raise ValueError(f"Failed to extract any instance pricing data for Amazon DocumentDB in region: {region_name}") + + return pd.DataFrame(instance_data) + +# Load metric-collector output file +def load_data(metrics_file): + metrics_df = pd.read_csv(metrics_file) + numeric_columns = ['P99', 'Mean', 'Std', 'Min', 'Max'] + for col in numeric_columns: + if col in metrics_df.columns: + if metrics_df[col].dtype == 'object': + metrics_df[col] = metrics_df[col].str.replace(',', '').astype(float) + else: + metrics_df[col] = pd.to_numeric(metrics_df[col], errors='coerce') + return metrics_df + +# Get the metric data +def get_metric_data(instance_data, metric_name): + metric_data = instance_data[instance_data['MetricName'] == metric_name] + if metric_data.empty: + return None + + return { + 'p99': float(metric_data['P99'].iloc[0]), + 'mean': float(metric_data['Mean'].iloc[0]), + 'std': float(metric_data['Std'].iloc[0]) + 
#'min': float(metric_data['Min'].iloc[0]) + #'max': float(metric_data['Max'].iloc[0]) + } + +# Analyze CPU usage +def analyze_cpu_utilization(instance_data): + logger = logging.getLogger('metric-analyzer') + cpu_data = get_metric_data(instance_data, 'CPUUtilization') + if not cpu_data: + logger.info("No CPU data available. Skipping.") + return "OK", "" + + p99 = cpu_data['p99'] + mean = cpu_data['mean'] + std = cpu_data['std'] + cpu_message = f"{p99:.1f}% (p99)" + + if p99 > THRESHOLDS['cpu_high'] or (mean + std) > 100: + return "INCREASE", cpu_message + elif p99 < THRESHOLDS['cpu_low']: + return "DECREASE", cpu_message + return "OK", "" + +# Analyze cache hit ratios +def analyze_cache_ratio(instance_data): + logger = logging.getLogger('metric-analyzer') + results = {} + cache_ratio_metrics = ['BufferCacheHitRatio', 'IndexBufferCacheHitRatio'] + + for cache_metric in cache_ratio_metrics: + cache_data = get_metric_data(instance_data, cache_metric) + if not cache_data: + logger.info(f"No {cache_metric} data available. 
Skipping.") + continue + + p99 = cache_data['p99'] + mean = cache_data['mean'] + std = cache_data['std'] + mean_std = min(mean + std,100.00) + + if p99 < THRESHOLDS['cache_ratio_low'] or mean_std < THRESHOLDS['cache_ratio_low']: + if p99 < THRESHOLDS['cache_ratio_low'] and mean_std < THRESHOLDS['cache_ratio_low']: + if p99 <= mean_std: + cache_message = f"{p99:.1f}% (p99)" + else: + cache_message = f"{mean_std:.1f}% (Mean+Std)" + elif p99 < THRESHOLDS['cache_ratio_low']: + cache_message = f"{p99:.1f}% (p99)" + else: + cache_message = f"{mean_std:.1f}% (Mean+Std)" + + results[cache_metric] = ("INCREASE", cache_message) + else: + results[cache_metric] = ("OK", "") + + return results + +# Analyze connection limits +def analyze_connections(instance_data, instance_specs_df): + logger = logging.getLogger('metric-analyzer') + instance_type = instance_data['InstanceType'].iloc[0] + connection_data = get_metric_data(instance_data, 'DatabaseConnections') + if not connection_data: + logger.info("No connection data available. Skipping.") + return "OK", "" + + p99 = connection_data['p99'] + instance_specs = instance_specs_df[instance_specs_df['instance_type'] == instance_data['InstanceType'].iloc[0]] + if instance_specs.empty: + logger.info(f"No specifications available for instance type {instance_type}. 
Skipping.") + return "OK", "" + + max_connections = float(instance_specs['connections'].iloc[0]) + + if p99 > (THRESHOLDS['connection_limit_pct'] * max_connections): + return "INCREASE", (f"{p99:.1f}", f"{max_connections:.0f}") + + return "OK", "" + +# Analyze current instance class for r4, r5, t3 +def analyze_instance_type(instance_data): + instance_type = instance_data['InstanceType'].iloc[0] + if (instance_type.startswith('db.r4.') or + (instance_type.startswith('db.r5.') and not instance_type == 'db.r5.24xlarge') or + instance_type.startswith('db.t3.')): + return "UPGRADE", instance_type + + return "OK", "" + +# Analyze MultiAZ +def analyze_multi_az(cluster_data): + logger = logging.getLogger('metric-analyzer') + multi_az_data = cluster_data[cluster_data['MetricName'] == 'MultiAZ'] + if multi_az_data.empty: + logger.info("No MultiAZ data available. Skipping.") + return "OK", "" + + if 'TRUE' in multi_az_data.values: + valid_instances = cluster_data[cluster_data['InstanceName'] != '---']['InstanceName'].unique() + instance_count = len(valid_instances) + + if instance_count > 3: + return "DECREASE", str(instance_count) + + return "OK", "" + + else: + instance_name = cluster_data[cluster_data['InstanceName'] != '---']['InstanceName'].iloc[0] + return "INCREASE", instance_name + +# Analyze read preference +def analyze_read_preference(cluster_data): + logger = logging.getLogger('metric-analyzer') + primary_data = cluster_data[cluster_data['Primary'] == True] + secondary_data = cluster_data[cluster_data['Primary'] == False] + if primary_data.empty or secondary_data.empty: + logger.info("Missing primary or secondary data for read preference analysis. 
Skipping.") + return "OK", "" + + metric = 'OpcountersQuery' + primary_metric = primary_data[primary_data['MetricName'] == metric] + secondary_metric = secondary_data[secondary_data['MetricName'] == metric] + if not primary_metric.empty and not secondary_metric.empty: + primary_val = round(float(primary_metric['Mean'].iloc[0])) + secondary_total = round(float(secondary_metric['Mean'].sum())) + if primary_val > secondary_total: + primary_int = str(int(primary_val)) + secondary_int = str(int(secondary_total)) + return "INCREASE", (primary_int, secondary_int) + + return "OK", "" + +# Skip DECREASE recommendation for smallest instance types +def skip_recommendation(status, instance_type, rec_key=None): + logger = logging.getLogger('metric-analyzer') + if rec_key == 'graviton_upgrade': + return False + + if status == "DECREASE" and instance_type.endswith('.medium'): + logger.info(f"Skipping decrease recommendation for {instance_type}: already at smallest instance type") + return True + + return False + +# Create recommendation format +def add_recommendation(results, cluster_name, instance_name, instance_role, rec_key, details, instance_type='---', status=None): + logger = logging.getLogger('metric-analyzer') + try: + if status and skip_recommendation(status, instance_type, rec_key): + return + + rec = RECOMMENDATIONS[rec_key] + finding = rec['finding'] + if '%s' in finding: + finding = finding % details + results.append({ + 'ClusterName': cluster_name, + 'InstanceName': instance_name, + 'InstanceType': instance_type, + 'InstanceRole': instance_role, + 'Category': rec['category'], + 'ModifyInstance': status, + 'Finding': finding, + 'Recommendation': rec['recommendation'], + 'Reference': rec['reference'], + 'Context': rec.get('context', '') + }) + logger.debug(f"Added recommendation: {rec_key} for {cluster_name}/{instance_name}") + except KeyError: + logger.error(f"Invalid recommendation key: {rec_key}") + raise + +# Generate interactive report with access to context 
files +def generate_html_report(results, output_file): + """Generate an HTML report from the results.""" + html_output = f"{output_file}.html" + context_contents = {} + + for result in results: + if result.get('Context') and result['Context'] not in context_contents: + context_path = result['Context'] + try: + with open(context_path, 'r') as f: + context_contents[context_path] = f.read() + except: + context_contents[context_path] = "

Context file not found

" + + html = """ + + + DocumentDB Metric Analyzer Report + + + + +

DocumentDB Metric Analyzer Report

+

Generated on: """ + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + """

+ + + + + + + + + + + + + + + """ + + # Add rows for each result + for result in results: + context_link = "" + if result.get('Context'): + context_path = result['Context'] + context_link = f'View Context' + + html += f""" + + + + + + + + + + + + + + """ + + html += """ +
#ClusterNameInstanceNameInstanceTypeInstanceRoleCategoryModifyInstanceFindingRecommendationReferenceContext
{result['ClusterName']}{result['InstanceName']}{result['InstanceType']}{result['InstanceRole']}{result['Category']}{result['ModifyInstance']}{result['Finding']}{result['Recommendation']}AWS Docs{context_link}
+ +
+
+ × +
+
+ + + """ + + with open(html_output, 'w') as f: + f.write(html) + return html_output + +def main(): + parser = argparse.ArgumentParser(description='Analyze the output of metric-collector for Amazon DocumentDB clusters and provide recommendations.') + + parser.add_argument('--metrics-file', + type=str, + required=True, + help='Path to the metrics CSV file to analyze') + + parser.add_argument('--region', + type=str, + default='us-east-1', + help='AWS region name (default: us-east-1)') + + parser.add_argument('--output', + type=str, + default='metric-analyzer', + help='Path to output CSV file') + + parser.add_argument('--log-level', + type=str, + choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'], + default='WARNING', + help='Set the logging level') + + parser.add_argument('--no-html', + action='store_false', + dest='html_output', + help='Disable HTML output generation (enabled by default)') + + args = parser.parse_args() + + log_level = getattr(logging, args.log_level) + logger = setup_logger(log_level) + + metrics_file = args.metrics_file + region = args.region + output_file = args.output + + logger.info(f"Starting analysis of metrics file: {metrics_file}") + logger.info(f"Using AWS region: {region}") + + try: + metrics_df = load_data(metrics_file) + logger.info(f"Successfully loaded metrics data with {len(metrics_df)} rows") + + instance_specs_df = get_docdb_instance_specs(region) + logger.info(f"Successfully loaded instance specifications for {len(instance_specs_df)} instance types") + + results = [] + + clusters = metrics_df['ClusterName'].unique() + logger.info(f"Found {len(clusters)} clusters to analyze") + + for cluster_name in clusters: + logger.debug(f"Analyzing cluster: {cluster_name}") + cluster_data = metrics_df[metrics_df['ClusterName'] == cluster_name] + + # Analyze read preference + logger.debug(f"Analyzing read preference for cluster: {cluster_name}") + read_pref_status, read_pref_values = analyze_read_preference(cluster_data) + + if 
read_pref_status != "OK": + logger.info(f"Read preference finding for {cluster_name}: {read_pref_values}") + add_recommendation(results, cluster_name, '---', '---', 'read_preference', read_pref_values) + + # Analyze MultiAZ + logger.debug(f"Analyzing MultiAZ for cluster: {cluster_name}") + multi_az_status, multi_az_message = analyze_multi_az(cluster_data) + + if multi_az_status != "OK": + logger.info(f"MultiAZ finding for cluster {cluster_name}: {multi_az_message}") + rec_key = 'single_az' if multi_az_status == 'INCREASE' else 'remove_instances' + add_recommendation(results, cluster_name, '---', '---', rec_key, multi_az_message) + + # Analyze instance metrics + instances = [i for i in cluster_data['InstanceName'].unique() if i != '---'] + logger.debug(f"Found {len(instances)} instances to analyze in cluster {cluster_name}") + + for instance_name in instances: + logger.debug(f"Analyzing instance: {instance_name} in cluster {cluster_name}") + instance_data = cluster_data[cluster_data['InstanceName'] == instance_name] + is_primary = instance_data['Primary'].iloc[0] if not instance_data.empty else False + instance_role = "PRIMARY" if is_primary else "SECONDARY" + logger.debug(f"Instance {instance_name} role: {instance_role}") + instance_type = instance_data['InstanceType'].iloc[0] + + # CPU utilization + cpu_status, cpu_message = analyze_cpu_utilization(instance_data) + if cpu_status != "OK": + logger.info(f"CPU utilization finding for {instance_name} in {cluster_name}: {cpu_message}") + rec_key = 'cpu_overutilized' if cpu_status == 'INCREASE' else 'cpu_underutilized' + add_recommendation(results, cluster_name, instance_name, instance_role, rec_key, cpu_message, instance_type, cpu_status) + + # Cache ratios + cache_results = analyze_cache_ratio(instance_data) + + # Buffer cache + buffer_status, buffer_message = cache_results['BufferCacheHitRatio'] + if buffer_status != "OK": + logger.info(f"Buffer cache finding for {instance_name} in {cluster_name}: {buffer_message}") 
+ add_recommendation(results, cluster_name, instance_name, instance_role, 'buffer_cache_low', buffer_message, instance_type, buffer_status) + + # Index cache + index_status, index_message = cache_results['IndexBufferCacheHitRatio'] + if index_status != "OK": + logger.info(f"Index cache finding for {instance_name} in {cluster_name}: {index_message}") + add_recommendation(results, cluster_name, instance_name, instance_role, 'index_cache_low', index_message, instance_type, index_status) + + # Connection limits + conn_status, conn_message = analyze_connections(instance_data, instance_specs_df) + if conn_status != "OK": + logger.info(f"Connection limit finding for {instance_name} in {cluster_name}: {conn_message}") + add_recommendation(results, cluster_name, instance_name, instance_role, 'connection_limit', conn_message, instance_type, conn_status) + + # Instance class + instance_type_status, instance_type_message = analyze_instance_type(instance_data) + if instance_type_status != "OK": + logger.info(f"Instance type finding for {instance_name} in {cluster_name}: {instance_type_message}") + add_recommendation(results, cluster_name, instance_name, instance_role, 'graviton_upgrade', instance_type_message, instance_type, instance_type_status) + + today = datetime.now().strftime('%Y-%m-%d') + if results: + csv_results = [{k: v for k, v in result.items() if k != 'Context'} for result in results] + results_df = pd.DataFrame(csv_results) + csv_output = f"{output_file}-{today}.csv" + results_df.to_csv(csv_output, index=False) + + if args.html_output: + html_output = generate_html_report(results, f"{output_file}-{today}") + logger.info(f"HTML report saved to {html_output}") + + logger.info(f"Analysis complete. Found {len(results)} recommendations.") + logger.info(f"Results saved to {csv_output}") + + else: + logger.info("Analysis complete. 
No recommendations found.") + + except Exception as e: + logger.error(f"Error during analysis: {str(e)}", exc_info=True) + raise + +if __name__ == "__main__": + try: + main() + except Exception as e: + logging.getLogger('metric-analyzer').critical(f"Unhandled exception: {str(e)}", exc_info=True) + sys.exit(1) \ No newline at end of file diff --git a/performance/metric-analyzer/requirements.txt b/performance/metric-analyzer/requirements.txt new file mode 100644 index 0000000..8cee222 --- /dev/null +++ b/performance/metric-analyzer/requirements.txt @@ -0,0 +1,3 @@ +boto3>=1.26.0 +pandas>=1.3.0 +markdown>=3.3.0 \ No newline at end of file diff --git a/performance/metric-collector/IAM-policy.json b/performance/metric-collector/IAM-policy.json new file mode 100644 index 0000000..d8957c4 --- /dev/null +++ b/performance/metric-collector/IAM-policy.json @@ -0,0 +1,26 @@ +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": "rds:DescribeDBInstances", + "Resource": "*", + "Condition": { + "StringEquals": { + "rds:DatabaseEngine": "docdb" + } + } + }, + { + "Effect": "Allow", + "Action": [ + "rds:DescribeDBClusterParameters", + "cloudwatch:GetMetricData", + "cloudwatch:GetMetricStatistics", + "rds:DescribeDBClusters", + "rds:DescribeDBClusterParameterGroups" + ], + "Resource": "*" + } + ] +} \ No newline at end of file diff --git a/performance/metric-collector/README.md b/performance/metric-collector/README.md new file mode 100644 index 0000000..002539d --- /dev/null +++ b/performance/metric-collector/README.md @@ -0,0 +1,40 @@ +# Amazon DocumentDB Metric Collector Tool + +The metric collector tool provides a csv output consolidating metrics for all DocumentDB clusters within a defined region. In addition to metadata such as cluster name, engine version, multi-AZ configuration, TLS status, and instance types, the script captures the Min, Max, Mean, p99, and Std values for a chosen time period. 
These can be compared against [Best Practices for Amazon DocumentDB](https://docs.aws.amazon.com/documentdb/latest/developerguide/best_practices.html) to ensure your cluster and instances are correctly sized for performance, resiliency, and cost. + +## Requirements + - Python 3.9+ + - boto3 1.24.49+ + - pandas 2.2.1+ + +``` +pip3 install boto3 pandas +``` + +- This script reads DocumentDB instance and cluster metrics from [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/), as well as DocumentDB cluster details including parameter group information. The required IAM permissions can be found in `IAM-policy.json`. + +## Usage parameters +Usage: + +``` +python3 metric-collector.py --region \\ + --log-file-name \\ + --start-date \\ + --end-date +``` + +Script Parameters: + + - region: str + AWS Region + - start-date: str + Start date for CloudWatch logs, format=YYYYMMDD + - end-date: str + End date for CloudWatch logs, format=YYYYMMDD + - log-file-name: str + Log file for CSV output + - log-level: str + Log level for logging, default=INFO + +## License +This tool is licensed under the Apache 2.0 License. diff --git a/performance/metric-collector/metric-collector.py b/performance/metric-collector/metric-collector.py new file mode 100644 index 0000000..9cc788c --- /dev/null +++ b/performance/metric-collector/metric-collector.py @@ -0,0 +1,325 @@ +""" +Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"). +You may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.
+ +Usage: + docdb_metric_collector.py --region \\ + --log-file-name \\ + --start-date \\ + --end-date + +Script Parameters +----------------- +--region: str + AWS Region +--start-date: str + Start date for CloudWatch logs, format=YYYYMMDD +--end-date: str + End date for CloudWatch logs, format=YYYYMMDD +--log-file-name: str + Log file for CSV output +--log-level: str + Log level for logging, default=INFO +""" + +import sys +import os +from datetime import timedelta, datetime +import logging +import pandas +import argparse +import boto3 + +logger = logging.getLogger(__name__) + +# Check for minimum Python version for script execution +MIN_PYTHON = (3, 9) +if sys.version_info < MIN_PYTHON: + sys.exit("Python %s.%s or later is required.\n" % MIN_PYTHON) + + +# List of Amazon DocumentDB instances metrics that are collected. +INSTANCE_METRICS = ['CPUUtilization', + 'DatabaseConnections', + 'BufferCacheHitRatio', + 'IndexBufferCacheHitRatio', + 'ReadThroughput', + 'VolumeReadIOPs', + 'OpcountersQuery', + 'OpcountersCommand', + 'OpcountersGetmore', + 'DocumentsReturned'] + +# List of Amazon DocumentDB instance metrics that are collected only from the primary instance. +PRIMARY_INSTANCE_METRICS = ['WriteIOPS', + 'WriteThroughput', + 'OpcountersDelete', + 'OpcountersInsert', + 'OpcountersUpdate', + 'DocumentsDeleted', + 'DocumentsInserted', + 'DocumentsUpdated', + 'TTLDeletedDocuments'] + +#List of Amazon DocumentDB cluster metrics that are collected. +CLUSTER_METRICS = ['IndexBufferCacheHitRatio', + 'DatabaseConnectionsMax', + 'VolumeBytesUsed'] + +def deleteLog(appConfig): + if os.path.exists(appConfig['logFileName']): + os.remove(appConfig['logFileName']) + +def printLog(thisMessage,appConfig): + with open(appConfig['logFileName'], 'a') as fp: + fp.write("{}\n".format(thisMessage)) + +# This function retrieves the metadata details of Amazon DocumentDB clusters and instances in the specified AWS region. 
+# The function does not connect to the actual DocumentDB clusters or access any data stored in them.
+# It collects metadata such as cluster name, engine version, multi-AZ configuration,
+# TLS status, profiler status, audit log status, instance names, primary instance, availability zones, and instance types.
+#
+# Parameters:
+#   appConfig (dict): A dictionary containing the AWS region and other configuration settings.
+#
+# Returns:
+#   Pandas DataFrame: cluster metadata merged with instance metadata
+def getClusterDetails(appConfig):
+    docdb_client = boto3.client('docdb',region_name=appConfig['region'])
+    parameterList = ["enabled","ddl","dml_read","dml_write","all"]
+    clusterDL = []
+    instanceDL = []
+
+    response = docdb_client.describe_db_clusters(Filters=[{'Name': 'engine','Values': ['docdb']}])
+    # Validate clusters exist in region
+    if not response['DBClusters']:
+        print("No DocumentDB clusters found in this region. Please try a different region.")
+        print("Exiting")
+        sys.exit(1)
+    else:
+        for thisCluster in response['DBClusters']:
+            clusterName = thisCluster['DBClusterIdentifier']
+            engineVersion = thisCluster['EngineVersion']
+            isMultiAZ = thisCluster['MultiAZ']
+            # Reset the parameter statuses for each cluster so values do not carry over between clusters
+            statusTLS = 'disabled'
+            statusProfiler = 'disabled'
+            statusAudit = 'disabled'
+            responsePG = docdb_client.describe_db_cluster_parameters(DBClusterParameterGroupName=thisCluster['DBClusterParameterGroup'])
+            for thisParameter in responsePG['Parameters']:
+                # Compare names with == (not substring membership) so e.g. "t" does not match "tls"
+                if thisParameter['ParameterName'] == 'tls':
+                    statusTLS = thisParameter['ParameterValue']
+                if thisParameter['ParameterName'] == 'profiler':
+                    statusProfiler = thisParameter['ParameterValue']
+                if thisParameter['ParameterName'] == 'audit_logs':
+                    if thisParameter['ParameterValue'] in parameterList:
+                        statusAudit = 'enabled'
+            clusterData = {'ClusterName': clusterName, 'EngineVersion': engineVersion, 'MultiAZ': isMultiAZ, 'TLS': statusTLS, 'Profiler': statusProfiler, 'AuditLogs': statusAudit}
+            clusterDL.append(clusterData)
+
+            for thisInstance in thisCluster['DBClusterMembers']:
+                instanceIdentifier = thisInstance['DBInstanceIdentifier']
+                isPrimary = thisInstance['IsClusterWriter']
+                responseInstance = docdb_client.describe_db_instances(DBInstanceIdentifier=thisInstance['DBInstanceIdentifier'])
+                for instanceName in responseInstance['DBInstances']:
+                    availabilityZone = instanceName['AvailabilityZone']
+                    instanceType = instanceName['DBInstanceClass']
+                    clusterIdentifier = instanceName['DBClusterIdentifier']
+                    instanceData = {'ClusterName': clusterIdentifier,'InstanceName': instanceIdentifier, 'Primary': isPrimary, 'AvailabilityZone': availabilityZone, 'InstanceType': instanceType}
+                    instanceDL.append(instanceData)
+
+    appConfig['clusterDF'] = pandas.DataFrame(clusterDL)
+    appConfig['instanceDF'] = pandas.DataFrame(instanceDL)
+    return pandas.merge(appConfig['clusterDF'],appConfig['instanceDF'],on='ClusterName')
+
+# This function retrieves the cluster-level metrics for Amazon DocumentDB clusters in the specified AWS region.
+#
+# Parameters:
+#   appConfig (dict): A dictionary containing the AWS region and other configuration settings.
+#
+# Returns:
+#   Pandas DataFrame: A DataFrame containing the metrics for each cluster
+def getClusterMetrics(appConfig):
+    metricList = []
+
+    for cluster in appConfig['clusterDF'].dropna().itertuples():
+        print("Getting metrics for cluster: {} EngineVersion: {}".format(cluster.ClusterName, cluster.EngineVersion))
+        for metric in CLUSTER_METRICS:
+            getMetricData(appConfig, metricList, cluster.ClusterName, '---', metric, 'DBClusterIdentifier')
+
+    return pandas.DataFrame(metricList)
+
+# This function retrieves the instance-level metrics for Amazon DocumentDB instances in the specified AWS region.
+#
+# Parameters:
+#   appConfig (dict): A dictionary containing the AWS region and other configuration settings.
+#
+# Returns:
+#   Pandas DataFrame: A DataFrame containing the metrics for each instance
+def getInstanceMetrics(appConfig):
+    metricList = []
+    for instance in appConfig['instanceDF'].dropna().itertuples():
+        print("Getting metrics for cluster: {} instance: {}".format(instance.ClusterName, instance.InstanceName))
+        if instance.Primary:
+            for metric in PRIMARY_INSTANCE_METRICS:
+                getMetricData(appConfig, metricList, instance.ClusterName, instance.InstanceName, metric)
+        for metric in INSTANCE_METRICS:
+            getMetricData(appConfig, metricList, instance.ClusterName, instance.InstanceName, metric)
+
+    return pandas.DataFrame(metricList)
+
+# This function retrieves a single CloudWatch metric for a DocumentDB cluster or instance and
+# appends its summary statistics (min, max, mean, p99, std) to metricsList.
+#
+# Parameters:
+#   appConfig (dict): A dictionary containing the AWS region and other configuration settings.
+#   metricsList (list): A list to store the retrieved metrics
+#   clusterName (str): The name of the cluster
+#   instanceName (str): The name of the instance
+#   metricName (str): The name of the metric
+#   metricType (str): The type of the metric (DBClusterIdentifier or DBInstanceIdentifier)
+#
+# Returns:
+#   None
+def getMetricData(appConfig, metricsList, clusterName, instanceName, metricName, metricType='DBInstanceIdentifier'):
+    cloudwatch_client = boto3.client('cloudwatch',region_name=appConfig['region'])
+    metricSeries = []
+    nextToken = ''
+
+    while True:
+        dimensions = []
+        if metricType == 'DBInstanceIdentifier':
+            dimensions.append({'Name': metricType, 'Value': instanceName})
+        else:
+            dimensions.append({'Name': metricType, 'Value': clusterName})
+        kwargs = {}
+        kwargs['StartTime'] = appConfig['startTime']
+        kwargs['EndTime'] = appConfig['endTime']
+
+        kwargs['MetricDataQueries'] = [{
+            "Id": "m1",
+            "MetricStat": {
+                "Metric": {
+                    "Namespace": "AWS/DocDB",
+                    "MetricName": metricName,
+                    "Dimensions": dimensions
+                },
+                "Period": appConfig['period'],
+                "Stat": "Average"
+            }
+        }]
+
+        if nextToken:
+            kwargs['NextToken'] = nextToken
+
+        response = cloudwatch_client.get_metric_data(**kwargs)
+
+        if response['MetricDataResults'][0]['StatusCode'] == 'Complete':
+            metricSeries.append(pandas.Series(response['MetricDataResults'][0]['Values']))
+            break
+        # A 'PartialData' status requires calling get_metric_data again with the NextToken
+        # value from the response so CloudWatch returns the next page of data.
+        elif response['MetricDataResults'][0]['StatusCode'] == 'PartialData':
+            metricSeries.append(pandas.Series(response['MetricDataResults'][0]['Values']))
+            nextToken = response['NextToken']
+            continue
+        elif response['MetricDataResults'][0]['StatusCode'] == 'InternalError':
+            print("InternalError, exiting")
+            sys.exit(1)
+        elif response['MetricDataResults'][0]['StatusCode'] == 'Forbidden':
+            print("Forbidden, exiting")
+            sys.exit(1)
+        else:
+            print("Unknown StatusCode, exiting")
+            sys.exit(1)
+
+    metricValues = pandas.concat(metricSeries)
+
+    metricDataRec = {'ClusterName': clusterName,
+                     'InstanceName': instanceName,
+                     'MetricName': metricName,
+                     'Min': metricValues.min(),
+                     'Max': metricValues.max(),
+                     'Mean': metricValues.mean(),
+                     'P99': metricValues.quantile(0.99),
+                     'Std': metricValues.std()}
+    metricsList.append(metricDataRec)
+
+# This is the main function that is called when the script is run. It parses command-line arguments,
+# sets up the log file, and calls the other functions for collecting the various metrics.
+# +# Parameters: +# None +# +# Returns: +# None +def main(): + parser = argparse.ArgumentParser(description='DocumentDB Metrics Collector') + + parser.add_argument('--region',required=True,type=str,help='AWS Region') + parser.add_argument('--start-date',required=False,type=str,help='Start date for CloudWatch logs, format=YYYYMMDD') + parser.add_argument('--end-date',required=False,type=str,help='End date for CloudWatch logs, format=YYYYMMDD') + parser.add_argument('--log-file-name',required=True,type=str,help='Log file for CSV output') + parser.add_argument('--log-level', required=False, type=str, default='INFO', help='Log level DEBUG or INFO default=INFO') + + args = parser.parse_args() + + if (args.start_date is not None and args.end_date is None): + print("Must provide --end-date when providing --start-date, exiting.") + sys.exit(1) + + elif (args.start_date is None and args.end_date is not None): + print("Must provide --start-date when providing --end-date, exiting.") + sys.exit(1) + + if (args.start_date is None) and (args.end_date is None): + # use last 14 days + startWhen = datetime.now() - timedelta(days=14) + endWhen = datetime.now() + else: + # use provided start/end dates + startWhen = datetime.strptime(args.start_date, '%Y%m%d') + endWhen = datetime.strptime(args.end_date, '%Y%m%d') + + if startWhen < datetime.now() - timedelta(days=62): + period = 3600 + elif startWhen < datetime.now() - timedelta(days=14): + period = 300 + else: + period = 60 + + appConfig = {} + appConfig['region'] = args.region + appConfig['logFileName'] = args.log_file_name+'.csv' + appConfig['startTime'] = startWhen + appConfig['endTime'] = endWhen + appConfig['period'] = period + + clusterDetailsDF = getClusterDetails(appConfig) + clusterMetricsDF = getClusterMetrics(appConfig) + instanceMetricsDF = getInstanceMetrics(appConfig) + + allMetricsDF = pandas.merge(clusterDetailsDF,instanceMetricsDF,on=['ClusterName','InstanceName']) + allMetricsDF = pandas.concat([allMetricsDF, 
clusterMetricsDF], axis=0, ignore_index=False) + floatCol = ['Min','Max','Mean','P99','Std'] + allMetricsDF[floatCol] = allMetricsDF[floatCol].astype(float).map('{:,.2f}'.format) + allMetricsDF.sort_values(['ClusterName','InstanceName','MetricName']).to_csv(appConfig['logFileName'], encoding='utf-8', index=False, mode='a') + + print("") + print("Created {} with CSV data".format(appConfig['logFileName'])) + print("Region: {}".format(appConfig['region'])) + print("Log start time: ", startWhen) + print("Log end time: ", endWhen) + print("") + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/performance/metric-collector/requirements.txt b/performance/metric-collector/requirements.txt new file mode 100644 index 0000000..6a8ca9f --- /dev/null +++ b/performance/metric-collector/requirements.txt @@ -0,0 +1,2 @@ +boto3>=1.34.49 +pandas>=2.2.1 \ No newline at end of file diff --git a/sizing-tool/README.md b/sizing-tool/README.md new file mode 100644 index 0000000..5d23b3d --- /dev/null +++ b/sizing-tool/README.md @@ -0,0 +1,129 @@ +# Amazon DocumentDB Sizing Tool + +The sizing tool analyzes your MongoDB database and generates a CSV file for use with the [DocumentDB Cost Estimator](https://aws.improving.com/documentdb/cost-estimator/). The tool automatically measures compression ratios using zstd-3-dict (matching Amazon DocumentDB 8.0), collects database statistics, and produces a properly formatted CSV file ready for upload to the cost estimator. 
+
+**Note:** The tool automatically excludes:
+- System databases: `admin`, `config`, `local`, and `system`
+- Views (only collections are analyzed)
+- The `system.profile` collection
+- Collections with no documents
+
+# Requirements
+- Python 3.7+
+- pymongo Python package
+  - MongoDB 2.6 - 3.4 | pymongo 3.10 - 3.12
+  - MongoDB 3.6 - 5.0 | pymongo 3.12 - 4.0
+  - MongoDB 5.1+ | pymongo 4.0+
+  - DocumentDB | pymongo 3.10+
+  - If not installed - "$ pip3 install pymongo"
+- lz4 Python package
+  - If not installed - "$ pip3 install lz4"
+- zstandard Python package
+  - If not installed - "$ pip3 install zstandard"
+- compression-review.py script (/performance/compression-review/compression-review.py)
+
+**Quick Install**: `pip3 install -r requirements.txt`
+
+## Using the Sizing Tool
+`python3 sizing.py --uri <uri>`
+
+- Automatically uses zstd-3-dict compression (matching DocumentDB 8.0)
+- Samples 1000 documents per collection by default
+- Run on any instance in the replica set
+- Creates a single CSV file per execution: `sizing-<timestamp>.csv`
+- The `<uri>` connection string options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/
+  - If your URI contains ampersand (&) characters, they must be escaped with a backslash or the URI must be enclosed in double quotes
+- For DocumentDB use either the cluster endpoint or any of the instance endpoints
+
+### Optional Parameters
+
+| Parameter | Default | Description |
+| ----------- | ----------- | ----------- |
+| --sample-size | 1000 | Number of documents to sample per collection |
+| --dictionary-sample-size | 100 | Number of documents for dictionary creation |
+
+### Example Usage
+
+Localhost (no authentication):
+```
+python3 sizing.py --uri "mongodb://localhost:27017"
+```
+
+Remote server with authentication:
+```
+python3 sizing.py --uri "mongodb://username:password@hostname:27017"
+```
+
+With custom sample size:
+```
+python3 sizing.py --uri "mongodb://username:password@hostname:27017" --sample-size 2000
+```
+
+## Output
+
+The tool generates a CSV file named `sizing-<timestamp>.csv` in your current working directory (where you run the command).
+
+Example: `sizing-20260204123045.csv`
+
+### CSV Columns
+- **SLNo** - Serial number
+- **Database_Name** - Name of the database
+- **Collection_Name** - Name of the collection
+- **Document_Count** - Number of documents
+- **Average_Document_Size** - Average document size (bytes)
+- **Total_Indexes** - Number of indexes
+- **Index_Size** - Total index size (GB)
+- **Index_Working_Set** - Percentage of indexes in memory (%)
+- **Data_Working_Set** - Percentage of data in memory (%)
+- **Inserts_Per_Day** - Daily insert operations (count)
+- **Updates_Per_Day** - Daily update operations (count)
+- **Deletes_Per_Day** - Daily delete operations (count)
+- **Reads_Per_Day** - Daily read operations (count)
+- **Compression_Ratio** - Compression ratio
+
+### Important Note: Manual Updates Required
+
+The generated CSV includes default placeholder values for workload metrics that **MUST be manually updated** in a text editor:
+
+| Field | Default Value | Description |
+|-------|---------------|-------------|
+| **Index_Working_Set** | 100 | Percentage of indexes that need to be in memory |
+| **Data_Working_Set** | 10 | Percentage of data that needs to be in memory |
+| **Inserts_Per_Day** | 0 | Number of insert operations per day |
+| **Updates_Per_Day** | 0 | Number of update operations per day |
+| **Deletes_Per_Day** | 0 | Number of delete operations per day |
+| **Reads_Per_Day** | 0 | Number of read operations per day |
+
+**Why manual updates are required:**
+- These statistics cannot be calculated automatically from database metadata
+- They require knowledge of your application's workload patterns
+- Accurate values are critical for proper instance sizing and cost estimation
+
+**How to update:**
+1. Locate the generated CSV file in your current working directory (where you ran the command)
+2. Open the CSV file in a text editor (not Excel, which may corrupt the format)
+3. Locate the columns for the fields above
+4. Update each row with values based on your workload knowledge
+5. Save the file
+6. Upload to the [DocumentDB Cost Estimator](https://aws.improving.com/documentdb/cost-estimator/)
+
+**Tips for determining values:**
+- **Working Sets**: Use MongoDB monitoring tools or `db.serverStatus()` to understand memory usage patterns
+- **Daily Operations**: Check application logs, the MongoDB profiler, or monitoring dashboards for operation counts
+- **Conservative estimates**: If unsure, use higher working set percentages and operation counts for safer sizing
+
+## How It Works
+1. Runs compression-review.py to analyze compression ratios using zstd-3-dict
+2. Connects to MongoDB to gather collection statistics (document counts, sizes, indexes)
+3. Combines compression data with collection metadata
+4. Generates a CSV file formatted for the [DocumentDB Cost Estimator](https://aws.improving.com/documentdb/cost-estimator/)
+5. Cleans up temporary files
+
+## Next Steps
+1. Run the sizing tool to generate your CSV file
+2. Open the CSV and update workload metrics (working sets and daily operations) with your actual values
+3. Upload the CSV to the [DocumentDB Cost Estimator](https://aws.improving.com/documentdb/cost-estimator/)
+4. Review the sizing recommendations
+
+## License
+This tool is licensed under the Apache 2.0 License.
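To make the CSV column layout concrete, here is a minimal, self-contained sketch of how one data row of the generated file is assembled. The collection statistics below are hypothetical, stand-ins for the `collStats` output and compression analysis the tool actually uses:

```python
import csv
import io

# Hypothetical per-collection numbers (illustrative only); in the tool
# these come from the collStats command and the compression analysis.
stats = {"nindexes": 3, "totalIndexSize": 536870912}  # sizes in bytes
index_size_gb = stats["totalIndexSize"] / (1024 ** 3)  # 0.5 GB

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([
    "SLNo", "Database_Name", "Collection_Name", "Document_Count",
    "Average_Document_Size", "Total_Indexes", "Index_Size",
    "Index_Working_Set", "Data_Working_Set", "Inserts_Per_Day",
    "Updates_Per_Day", "Deletes_Per_Day", "Reads_Per_Day",
    "Compression_Ratio",
])
# The workload columns carry the documented defaults (100, 10, 0, 0, 0, 0)
# and must be edited by hand before uploading to the cost estimator.
writer.writerow([1, "testdb", "users", 10000, 512, stats["nindexes"],
                 f"{index_size_gb:.4f}", 100, 10, 0, 0, 0, 0, "2.0000"])
print(buf.getvalue())
```

Note that `Index_Size` is converted from bytes to GB and written with four decimal places, matching the format the cost estimator expects.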
diff --git a/sizing-tool/requirements.txt b/sizing-tool/requirements.txt new file mode 100644 index 0000000..220379f --- /dev/null +++ b/sizing-tool/requirements.txt @@ -0,0 +1,3 @@ +pymongo +lz4 +zstandard diff --git a/sizing-tool/sizing.py b/sizing-tool/sizing.py new file mode 100644 index 0000000..9775850 --- /dev/null +++ b/sizing-tool/sizing.py @@ -0,0 +1,395 @@ +import argparse +import sys +import csv +import glob +import os +import datetime as dt +import pymongo +import importlib.util + +# Compressor to use for compression analysis +# zstd-3-dict matches Amazon DocumentDB 8.0 dictionary-based compression +COMPRESSOR = 'zstd-3-dict' + +# Fixed dictionary size in Amazon DocumentDB 8.0 dictionary-based compression +DICTIONARY_SIZE_BYTES = 4096 + +# Server alias base for output file naming +SERVER_ALIAS_BASE = 'temp' + + +def load_compression_module(): + """ + Load the compression-review.py module dynamically. + + Returns: + module: The loaded compression_review module + + Raises: + RuntimeError: If the compression-review.py file does not exist or cannot be loaded + """ + script_dir = os.path.dirname(os.path.abspath(__file__)) + compression_script = os.path.join( + script_dir, '..', 'performance', 'compression-review', 'compression-review.py' + ) + + # Check if the file exists + if not os.path.exists(compression_script): + raise RuntimeError( + f"Compression module not found at: {compression_script}\n" + f"Expected location: ../performance/compression-review/compression-review.py\n" + f"Please ensure the compression-review tool is available in the correct directory." + ) + + # Check if it's a file (not a directory) + if not os.path.isfile(compression_script): + raise RuntimeError( + f"Path exists but is not a file: {compression_script}\n" + f"Expected a Python script at this location." 
+ ) + + try: + spec = importlib.util.spec_from_file_location("compression_review", compression_script) + if spec is None or spec.loader is None: + raise RuntimeError( + f"Failed to create module spec for: {compression_script}\n" + f"The file may not be a valid Python module." + ) + + compression_module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(compression_module) + + # Verify the module has the required getData function + if not hasattr(compression_module, 'getData'): + raise RuntimeError( + f"Compression module loaded but missing required 'getData' function.\n" + f"The compression-review.py file may be corrupted or outdated." + ) + + return compression_module + + except Exception as e: + if isinstance(e, RuntimeError): + raise + raise RuntimeError( + f"Error loading compression module from {compression_script}: {e}" + ) + + +def cleanup_csv_files(csv_files): + """ + Remove CSV files and log any errors. + + Args: + csv_files: Iterable of CSV file paths to remove + """ + for csv_file in csv_files: + try: + os.remove(csv_file) + print(f"Cleaned up partial file: {csv_file}", file=sys.stderr) + except Exception as e: + print(f"Warning: Could not remove file {csv_file}: {e}", file=sys.stderr) + + +def run_compression_and_get_output(uri, sample_size, dictionary_sample_size): + """ + Run compression analysis and return the path to the generated CSV file. 
+ + Args: + uri: MongoDB connection URI + sample_size: Number of documents to sample per collection + dictionary_sample_size: Number of documents for dictionary creation + + Returns: + str: Path to the generated compression CSV file + + Raises: + RuntimeError: If compression analysis fails or no CSV file is created + """ + print("Running compression analysis...") + + # Load the compression module + compression_module = load_compression_module() + + # Create server alias with PID for concurrency safety + server_alias = f"{SERVER_ALIAS_BASE}-{os.getpid()}" + + # Get list of existing CSV files before running compression analysis + csv_pattern = f"{server_alias}-*-compression-review.csv" + existing_csv_files = set(glob.glob(csv_pattern)) + + # Configure and run compression analysis + app_config = { + 'uri': uri, + 'serverAlias': server_alias, + 'sampleSize': sample_size, + 'compressor': COMPRESSOR, + 'dictionarySampleSize': dictionary_sample_size, + 'dictionarySize': DICTIONARY_SIZE_BYTES + } + + try: + compression_module.getData(app_config) + except Exception as e: + # Clean up any partial CSV files that may have been created + current_csv_files = set(glob.glob(csv_pattern)) + new_csv_files = current_csv_files - existing_csv_files + if new_csv_files: + cleanup_csv_files(new_csv_files) + raise RuntimeError(f"Error running compression analysis: {e}") + + # Find the newly created CSV file by comparing before and after + current_csv_files = set(glob.glob(csv_pattern)) + new_csv_files = current_csv_files - existing_csv_files + + if not new_csv_files: + raise RuntimeError(f"No new CSV file created. 
Expected pattern: {csv_pattern}") + + if len(new_csv_files) > 1: + print(f"Warning: Multiple new CSV files found: {new_csv_files}", file=sys.stderr) + # Use the most recent one + latest_csv = max(new_csv_files, key=os.path.getmtime) + else: + latest_csv = new_csv_files.pop() + + print(f"Parsing results from: {latest_csv}") + return latest_csv + + +def parse_compression_csv(csv_filepath): + """ + Parse compression review CSV and extract collection data. + + Args: + csv_filepath: Path to the compression review CSV file + + Returns: + dict: Dictionary mapping 'db.collection' to compression data + + Raises: + RuntimeError: If CSV header cannot be found or file is invalid + """ + comp_data = {} + + with open(csv_filepath, 'r') as f: + # Read all lines to find where the actual data starts + lines = f.readlines() + + # Find the header line (starts with dbName) + header_idx = None + for i, line in enumerate(lines): + if line.startswith('dbName'): + header_idx = i + break + + if header_idx is None: + raise RuntimeError("Could not find data header in CSV") + + # Use DictReader for named column access + reader = csv.DictReader(lines[header_idx:]) + + for row in reader: + try: + # Access columns by name instead of index + db_name = row['dbName'] + coll_name = row['collName'] + num_docs = int(row['numDocs']) + avg_doc_size = int(row['avgDocSize']) + comp_ratio = float(row['projectedCompRatio']) + + key = f"{db_name}.{coll_name}" + comp_data[key] = { + 'db_name': db_name, + 'coll_name': coll_name, + 'num_docs': num_docs, + 'avg_doc_size': avg_doc_size, + 'comp_ratio': comp_ratio + } + except (KeyError, ValueError) as e: + # Skip rows with missing columns or invalid data + print(f"Warning: Skipping row due to error: {e}", file=sys.stderr) + continue + + return comp_data + + +def generate_sizing_csv(comp_data, uri): + """ + Generate cost estimator CSV by combining compression data with MongoDB stats. 
+ + Args: + comp_data: Dictionary of compression data from parse_compression_csv() + uri: MongoDB connection URI + + Returns: + str: Path to the generated sizing CSV file + """ + print("Connecting to MongoDB to gather additional stats...") + + # Create output CSV file + log_timestamp = dt.datetime.now(dt.timezone.utc).strftime('%Y%m%d%H%M%S') + output_filename = f"sizing-{log_timestamp}.csv" + + with pymongo.MongoClient(host=uri, appname='workload-calc', serverSelectionTimeoutMS=5000) as client: + with open(output_filename, 'w', newline='') as csvfile: + csvwriter = csv.writer(csvfile) + + # Write header + csvwriter.writerow([ + 'SLNo', 'Database_Name', 'Collection_Name', 'Document_Count', + 'Average_Document_Size', 'Total_Indexes', 'Index_Size', + 'Index_Working_Set', 'Data_Working_Set', 'Inserts_Per_Day', + 'Updates_Per_Day', 'Deletes_Per_Day', 'Reads_Per_Day', + 'Compression_Ratio' + ]) + + sl_no = 1 + + # Iterate through collections from compression data + for key, data in comp_data.items(): + db_name = data['db_name'] + coll_name = data['coll_name'] + + try: + # Get collection stats from MongoDB + stats = client[db_name].command("collStats", coll_name) + + doc_count = data['num_docs'] + avg_doc_size = data['avg_doc_size'] + total_indexes = stats.get('nindexes', 0) + index_size_bytes = stats.get('totalIndexSize', 0) + index_size_gb = index_size_bytes / (1024 * 1024 * 1024) + comp_ratio = data['comp_ratio'] + + # Default estimates for workload metrics + index_working_set = 100 + data_working_set = 10 + inserts_per_day = 0 + updates_per_day = 0 + deletes_per_day = 0 + reads_per_day = 0 + + # Write row + csvwriter.writerow([ + sl_no, + db_name, + coll_name, + doc_count, + avg_doc_size, + total_indexes, + f"{index_size_gb:.4f}", + index_working_set, + data_working_set, + inserts_per_day, + updates_per_day, + deletes_per_day, + reads_per_day, + f"{comp_ratio:.4f}" + ]) + + sl_no += 1 + + except Exception as e: + print(f"Error processing {db_name}.{coll_name}: {e}", 
file=sys.stderr) + continue + + return output_filename + + +def validate_args(args): + """ + Validate command-line arguments. + + Args: + args: Parsed arguments from argparse + + Raises: + ValueError: If any argument is invalid + """ + # Validate URI format + if not args.uri: + raise ValueError("MongoDB URI cannot be empty") + + if not (args.uri.startswith('mongodb://') or args.uri.startswith('mongodb+srv://')): + raise ValueError("MongoDB URI must start with 'mongodb://' or 'mongodb+srv://'") + + # Validate sample size (only check lower bound) + if args.sample_size <= 0: + raise ValueError(f"Sample size must be positive, got: {args.sample_size}") + + # Validate dictionary sample size (only check lower bound) + if args.dictionary_sample_size <= 0: + raise ValueError(f"Dictionary sample size must be positive, got: {args.dictionary_sample_size}") + + +def main(): + parser = argparse.ArgumentParser(description='Run compression review and analyze results') + + parser.add_argument('--uri', + required=True, + type=str, + help='MongoDB Connection URI') + + parser.add_argument('--sample-size', + required=False, + type=int, + default=1000, + help='Number of documents to sample in each collection, default 1000') + + parser.add_argument('--dictionary-sample-size', + required=False, + type=int, + default=100, + help='Number of documents to sample for dictionary creation') + + args = parser.parse_args() + + # Validate arguments + try: + validate_args(args) + except ValueError as e: + parser.error(str(e)) + + compression_csv = None # Initialize to handle cleanup in finally + + try: + # Run compression analysis and get the output CSV file + compression_csv = run_compression_and_get_output( + uri=args.uri, + sample_size=args.sample_size, + dictionary_sample_size=args.dictionary_sample_size + ) + + # Parse compression CSV to extract collection data + comp_data = parse_compression_csv(compression_csv) + + # Generate sizing CSV by combining compression data with MongoDB stats + 
output_filename = generate_sizing_csv(comp_data, args.uri) + + print(f"\nSizing CSV generated: {output_filename}") + print("\n" + "="*80) + print("IMPORTANT: Manual Updates Required") + print("="*80) + print("\nThe following fields have been set to default values and MUST be updated") + print("manually in a text editor based on your workload knowledge:\n") + print(" • Index_Working_Set (default: 100) - Percentage of indexes in memory") + print(" • Data_Working_Set (default: 10) - Percentage of data in memory") + print(" • Inserts_Per_Day (default: 0) - Daily insert operations") + print(" • Updates_Per_Day (default: 0) - Daily update operations") + print(" • Deletes_Per_Day (default: 0) - Daily delete operations") + print(" • Reads_Per_Day (default: 0) - Daily read operations") + print("\nThese statistics cannot be calculated automatically and require knowledge") + print("of your existing workload patterns. Open the CSV file in a text editor") + print("and update these values for accurate sizing recommendations.") + print("="*80 + "\n") + + except RuntimeError as e: + print(str(e), file=sys.stderr) + sys.exit(1) + finally: + # Clean up the compression-review CSV file if it was created + if compression_csv is not None: + cleanup_csv_files([compression_csv]) + +if __name__ == "__main__": + main() diff --git a/sizing-tool/test/README.md b/sizing-tool/test/README.md new file mode 100644 index 0000000..ea992d3 --- /dev/null +++ b/sizing-tool/test/README.md @@ -0,0 +1,77 @@ +# Sizing Tool Tests + +This directory contains unit tests for the sizing tool. 
+ +## Prerequisites + +- Python 3.7+ +- No external dependencies required (tests use `unittest.mock` for all external calls) +- Tests do not require MongoDB connection or the compression-review.py script + +## Running Tests + +### Run all tests +```bash +# From the test directory +python -m unittest test_sizing + +# With verbose output +python -m unittest test_sizing -v +``` + +### Run specific test class +```bash +python -m unittest test_sizing.TestValidateArgs +``` + +### Run specific test +```bash +python -m unittest test_sizing.TestValidateArgs.test_valid_args +``` + +## Test Coverage + +The test suite includes unit tests for: + +- **Argument validation** - URI format, sample sizes, parameter bounds +- **CSV parsing** - Valid data, missing headers, invalid rows, empty files +- **Compression module loading** - File existence, module validation, error handling +- **Compression execution** - Successful runs, file creation, error scenarios, cleanup +- **Sizing CSV generation** - MongoDB stats collection, multiple collections, error handling + +## Test Structure + +All tests use mocks to avoid external dependencies: +- MongoDB connections are mocked using `unittest.mock` +- File system operations use temporary files +- The compression-review.py module is mocked for isolation + +This ensures tests run quickly and don't require any external services or configuration. + +## Adding New Tests + +When adding new functionality to sizing.py: + +1. Create a new test class or add to an existing one +2. Use descriptive test names that explain what is being tested +3. Mock all external dependencies (MongoDB, file system, external modules) +4. Test both success and failure scenarios +5. 
Include edge cases and boundary conditions + +Example test structure: +```python +class TestNewFeature(unittest.TestCase): + """Tests for new_feature function""" + + @patch('sizing.external_dependency') + def test_success_case(self, mock_dependency): + """Test successful execution""" + # Setup mocks + mock_dependency.return_value = expected_value + + # Execute + result = new_feature() + + # Assert + self.assertEqual(result, expected_value) +``` diff --git a/sizing-tool/test/test_sizing.py b/sizing-tool/test/test_sizing.py new file mode 100644 index 0000000..1cbe26d --- /dev/null +++ b/sizing-tool/test/test_sizing.py @@ -0,0 +1,534 @@ +import unittest +import os +import csv +import tempfile +from unittest.mock import Mock, patch, MagicMock +from argparse import Namespace +import sys + +# Import functions from sizing.py (parent directory) +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from sizing import ( + validate_args, + parse_compression_csv, + run_compression_and_get_output, + generate_sizing_csv, + load_compression_module +) + + +class TestValidateArgs(unittest.TestCase): + """Tests for validate_args function""" + + def test_valid_args(self): + """Test that valid arguments pass validation""" + args = Namespace( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + # Should not raise any exception + validate_args(args) + + def test_valid_args_with_srv(self): + """Test that mongodb+srv:// URI is valid""" + args = Namespace( + uri='mongodb+srv://cluster.mongodb.net', + sample_size=1000, + dictionary_sample_size=100 + ) + validate_args(args) + + def test_empty_uri(self): + """Test that empty URI raises ValueError""" + args = Namespace( + uri='', + sample_size=1000, + dictionary_sample_size=100 + ) + with self.assertRaisesRegex(ValueError, "MongoDB URI cannot be empty"): + validate_args(args) + + def test_invalid_uri_format(self): + """Test that invalid URI format raises ValueError""" + args = Namespace( + 
uri='http://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + with self.assertRaisesRegex(ValueError, "must start with 'mongodb://' or 'mongodb\\+srv://'"): + validate_args(args) + + def test_negative_sample_size(self): + """Test that negative sample size raises ValueError""" + args = Namespace( + uri='mongodb://localhost:27017', + sample_size=-100, + dictionary_sample_size=100 + ) + with self.assertRaisesRegex(ValueError, "Sample size must be positive"): + validate_args(args) + + def test_zero_sample_size(self): + """Test that zero sample size raises ValueError""" + args = Namespace( + uri='mongodb://localhost:27017', + sample_size=0, + dictionary_sample_size=100 + ) + with self.assertRaisesRegex(ValueError, "Sample size must be positive"): + validate_args(args) + + def test_negative_dictionary_sample_size(self): + """Test that negative dictionary sample size raises ValueError""" + args = Namespace( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=-10 + ) + with self.assertRaisesRegex(ValueError, "Dictionary sample size must be positive"): + validate_args(args) + + def test_large_values_accepted(self): + """Test that large values are accepted (no upper limits)""" + args = Namespace( + uri='mongodb://localhost:27017', + sample_size=10000000, # 10 million + dictionary_sample_size=5000000 # 5 million + ) + # Should not raise any exception + validate_args(args) + + +class TestParseCompressionCsv(unittest.TestCase): + """Tests for parse_compression_csv function""" + + def test_parse_valid_csv(self): + """Test parsing a valid compression CSV""" + csv_content = """compressor,docsSampled,dictDocsSampled,dictBytes +zstd-3-dict,1000,100,4096 + +dbName,collName,numDocs,avgDocSize,sizeGB,storageGB,existingCompRatio,compEnabled,minSample,maxSample,avgSample,minComp,maxComp,avgComp,projectedCompRatio,exceptions,compTime(ms) +testdb,users,10000,512,5.0,2.5,2.0,Y/1024,256,1024,512,128,512,256,2.0,0,123.45 
+testdb,orders,5000,1024,5.0,2.0,2.5,Y/1024,512,2048,1024,256,1024,512,2.0,0,234.56 +""" + with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.csv') as f: + f.write(csv_content) + temp_file = f.name + + try: + result = parse_compression_csv(temp_file) + + self.assertEqual(len(result), 2) + self.assertIn('testdb.users', result) + self.assertIn('testdb.orders', result) + + users_data = result['testdb.users'] + self.assertEqual(users_data['db_name'], 'testdb') + self.assertEqual(users_data['coll_name'], 'users') + self.assertEqual(users_data['num_docs'], 10000) + self.assertEqual(users_data['avg_doc_size'], 512) + self.assertEqual(users_data['comp_ratio'], 2.0) + + orders_data = result['testdb.orders'] + self.assertEqual(orders_data['db_name'], 'testdb') + self.assertEqual(orders_data['coll_name'], 'orders') + self.assertEqual(orders_data['num_docs'], 5000) + finally: + os.unlink(temp_file) + + def test_parse_csv_missing_header(self): + """Test that missing header raises RuntimeError""" + csv_content = """compressor,docsSampled,dictDocsSampled,dictBytes +zstd-3-dict,1000,100,4096 + +testdb,users,10000,512,5.0,2.5,2.0,Y/1024,256,1024,512,128,512,256,2.0,0,123.45 +""" + with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.csv') as f: + f.write(csv_content) + temp_file = f.name + + try: + with self.assertRaisesRegex(RuntimeError, "Could not find data header in CSV"): + parse_compression_csv(temp_file) + finally: + os.unlink(temp_file) + + def test_parse_csv_with_invalid_row(self): + """Test that invalid rows are skipped with warning""" + csv_content = """compressor,docsSampled,dictDocsSampled,dictBytes +zstd-3-dict,1000,100,4096 + +dbName,collName,numDocs,avgDocSize,sizeGB,storageGB,existingCompRatio,compEnabled,minSample,maxSample,avgSample,minComp,maxComp,avgComp,projectedCompRatio,exceptions,compTime(ms) +testdb,users,10000,512,5.0,2.5,2.0,Y/1024,256,1024,512,128,512,256,2.0,0,123.45 
+testdb,invalid,not_a_number,512,5.0,2.5,2.0,Y/1024,256,1024,512,128,512,256,2.0,0,123.45 +testdb,orders,5000,1024,5.0,2.0,2.5,Y/1024,512,2048,1024,256,1024,512,2.0,0,234.56 +""" + with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.csv') as f: + f.write(csv_content) + temp_file = f.name + + try: + result = parse_compression_csv(temp_file) + + # Should have 2 valid rows (invalid row skipped) + self.assertEqual(len(result), 2) + self.assertIn('testdb.users', result) + self.assertIn('testdb.orders', result) + self.assertNotIn('testdb.invalid', result) + finally: + os.unlink(temp_file) + + def test_parse_empty_csv(self): + """Test parsing an empty CSV""" + csv_content = """compressor,docsSampled,dictDocsSampled,dictBytes +zstd-3-dict,1000,100,4096 + +dbName,collName,numDocs,avgDocSize,sizeGB,storageGB,existingCompRatio,compEnabled,minSample,maxSample,avgSample,minComp,maxComp,avgComp,projectedCompRatio,exceptions,compTime(ms) +""" + with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.csv') as f: + f.write(csv_content) + temp_file = f.name + + try: + result = parse_compression_csv(temp_file) + self.assertEqual(len(result), 0) + finally: + os.unlink(temp_file) + + +class TestLoadCompressionModule(unittest.TestCase): + """Tests for load_compression_module function""" + + def test_load_module_file_not_found(self): + """Test that missing compression module raises RuntimeError""" + with patch('sizing.os.path.exists', return_value=False): + with self.assertRaisesRegex(RuntimeError, "Compression module not found"): + load_compression_module() + + def test_load_module_path_is_directory(self): + """Test that directory path raises RuntimeError""" + with patch('sizing.os.path.exists', return_value=True): + with patch('sizing.os.path.isfile', return_value=False): + with self.assertRaisesRegex(RuntimeError, "Path exists but is not a file"): + load_compression_module() + + def test_load_module_invalid_spec(self): + """Test that invalid module spec raises 
RuntimeError""" + with patch('sizing.os.path.exists', return_value=True): + with patch('sizing.os.path.isfile', return_value=True): + with patch('sizing.importlib.util.spec_from_file_location', return_value=None): + with self.assertRaisesRegex(RuntimeError, "Failed to create module spec"): + load_compression_module() + + def test_load_module_missing_getdata_function(self): + """Test that module without getData function raises RuntimeError""" + mock_module = MagicMock() + del mock_module.getData # Remove the getData attribute + + with patch('sizing.os.path.exists', return_value=True): + with patch('sizing.os.path.isfile', return_value=True): + with patch('sizing.importlib.util.spec_from_file_location') as mock_spec_from_file: + mock_spec = MagicMock() + mock_spec_from_file.return_value = mock_spec + with patch('sizing.importlib.util.module_from_spec', return_value=mock_module): + with self.assertRaisesRegex(RuntimeError, "missing required 'getData' function"): + load_compression_module() + + def test_load_module_success(self): + """Test successful module loading""" + mock_module = MagicMock() + mock_module.getData = MagicMock() + + with patch('sizing.os.path.exists', return_value=True): + with patch('sizing.os.path.isfile', return_value=True): + with patch('sizing.importlib.util.spec_from_file_location') as mock_spec_from_file: + mock_spec = MagicMock() + mock_spec_from_file.return_value = mock_spec + with patch('sizing.importlib.util.module_from_spec', return_value=mock_module): + result = load_compression_module() + self.assertEqual(result, mock_module) + self.assertTrue(hasattr(result, 'getData')) + + +class TestRunCompressionAndGetOutput(unittest.TestCase): + """Tests for run_compression_and_get_output function""" + + @patch('sizing.load_compression_module') + @patch('sizing.glob.glob') + def test_successful_compression_run(self, mock_glob, mock_load_compression): + """Test successful compression analysis run""" + # Setup mocks + mock_compression_module = 
MagicMock() + mock_load_compression.return_value = mock_compression_module + + mock_glob.side_effect = [ + [], # No existing files + ['temp-20260209120000-compression-review.csv'] # New file created + ] + + result = run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + self.assertEqual(result, 'temp-20260209120000-compression-review.csv') + mock_compression_module.getData.assert_called_once() + mock_load_compression.assert_called_once() + + @patch('sizing.load_compression_module') + @patch('sizing.glob.glob') + def test_compression_run_with_existing_files(self, mock_glob, mock_load_compression): + """Test compression run when old files exist""" + # Setup mocks + mock_compression_module = MagicMock() + mock_load_compression.return_value = mock_compression_module + + mock_glob.side_effect = [ + ['temp-20260209110000-compression-review.csv'], # Existing file + [ + 'temp-20260209110000-compression-review.csv', + 'temp-20260209120000-compression-review.csv' + ] # Old + new file + ] + + result = run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + self.assertEqual(result, 'temp-20260209120000-compression-review.csv') + + @patch('sizing.load_compression_module') + @patch('sizing.glob.glob') + def test_compression_run_no_file_created(self, mock_glob, mock_load_compression): + """Test error when no CSV file is created""" + # Setup mocks + mock_compression_module = MagicMock() + mock_load_compression.return_value = mock_compression_module + + mock_glob.side_effect = [[], []] + + with self.assertRaisesRegex(RuntimeError, "No new CSV file created"): + run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + @patch('sizing.load_compression_module') + @patch('sizing.glob.glob') + def test_compression_run_failure(self, mock_glob, mock_load_compression): + """Test error 
handling when compression analysis fails""" + mock_compression_module = MagicMock() + mock_compression_module.getData.side_effect = Exception("Connection failed") + mock_load_compression.return_value = mock_compression_module + + mock_glob.return_value = [] + + with self.assertRaisesRegex(RuntimeError, "Error running compression analysis"): + run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + @patch('sizing.load_compression_module') + @patch('sizing.glob.glob') + @patch('sizing.os.path.getmtime') + def test_multiple_new_files_created(self, mock_getmtime, mock_glob, mock_load_compression): + """Test handling when multiple new files are created""" + # Setup mocks + mock_compression_module = MagicMock() + mock_load_compression.return_value = mock_compression_module + + mock_glob.side_effect = [ + [], # No existing files + [ + 'temp-20260209120000-compression-review.csv', + 'temp-20260209120001-compression-review.csv' + ] # Two new files + ] + + # Mock getmtime to return different times based on filename + def getmtime_side_effect(filename): + if '120001' in filename: + return 2000 # Newer file + else: + return 1000 # Older file + + mock_getmtime.side_effect = getmtime_side_effect + + result = run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + # Should return the most recent file + self.assertEqual(result, 'temp-20260209120001-compression-review.csv') + + @patch('sizing.load_compression_module') + def test_compression_module_load_failure(self, mock_load_compression): + """Test error handling when compression module fails to load""" + mock_load_compression.side_effect = RuntimeError("Compression module not found") + + with self.assertRaisesRegex(RuntimeError, "Compression module not found"): + run_compression_and_get_output( + uri='mongodb://localhost:27017', + sample_size=1000, + dictionary_sample_size=100 + ) + + +class 
TestGenerateSizingCsv(unittest.TestCase): + """Tests for generate_sizing_csv function""" + + @patch('sizing.pymongo.MongoClient') + @patch('sizing.dt.datetime') + def test_generate_sizing_csv_success(self, mock_datetime, mock_mongo_client): + """Test successful sizing CSV generation""" + # Setup mocks + mock_datetime.now.return_value.strftime.return_value = '20260209120000' + + mock_client = MagicMock() + mock_mongo_client.return_value.__enter__.return_value = mock_client + + # Mock MongoDB collStats response + mock_client.__getitem__.return_value.command.return_value = { + 'nindexes': 3, + 'totalIndexSize': 1073741824 # 1GB + } + + comp_data = { + 'testdb.users': { + 'db_name': 'testdb', + 'coll_name': 'users', + 'num_docs': 10000, + 'avg_doc_size': 512, + 'comp_ratio': 2.0 + } + } + + with tempfile.TemporaryDirectory() as tmpdir: + os.chdir(tmpdir) + + result = generate_sizing_csv( + comp_data=comp_data, + uri='mongodb://localhost:27017' + ) + + self.assertEqual(result, 'sizing-20260209120000.csv') + self.assertTrue(os.path.exists(result)) + + # Verify CSV content + with open(result, 'r') as f: + reader = csv.reader(f) + rows = list(reader) + + # Check header + self.assertEqual(rows[0][0], 'SLNo') + self.assertEqual(rows[0][1], 'Database_Name') + + # Check data row + self.assertEqual(rows[1][0], '1') + self.assertEqual(rows[1][1], 'testdb') + self.assertEqual(rows[1][2], 'users') + self.assertEqual(rows[1][3], '10000') + + @patch('sizing.pymongo.MongoClient') + @patch('sizing.dt.datetime') + def test_generate_sizing_csv_with_error(self, mock_datetime, mock_mongo_client): + """Test sizing CSV generation with collection error""" + # Setup mocks + mock_datetime.now.return_value.strftime.return_value = '20260209120000' + + mock_client = MagicMock() + mock_mongo_client.return_value.__enter__.return_value = mock_client + + # Mock MongoDB collStats to raise exception + mock_client.__getitem__.return_value.command.side_effect = Exception("Collection not found") + + 
comp_data = { + 'testdb.users': { + 'db_name': 'testdb', + 'coll_name': 'users', + 'num_docs': 10000, + 'avg_doc_size': 512, + 'comp_ratio': 2.0 + } + } + + with tempfile.TemporaryDirectory() as tmpdir: + os.chdir(tmpdir) + + result = generate_sizing_csv( + comp_data=comp_data, + uri='mongodb://localhost:27017' + ) + + # Should still create file, but with no data rows + self.assertTrue(os.path.exists(result)) + + with open(result, 'r') as f: + reader = csv.reader(f) + rows = list(reader) + + # Only header, no data rows + self.assertEqual(len(rows), 1) + + @patch('sizing.pymongo.MongoClient') + @patch('sizing.dt.datetime') + def test_generate_sizing_csv_multiple_collections(self, mock_datetime, mock_mongo_client): + """Test sizing CSV generation with multiple collections""" + # Setup mocks + mock_datetime.now.return_value.strftime.return_value = '20260209120000' + + mock_client = MagicMock() + mock_mongo_client.return_value.__enter__.return_value = mock_client + + # Mock MongoDB collStats response + mock_client.__getitem__.return_value.command.return_value = { + 'nindexes': 2, + 'totalIndexSize': 536870912 # 512MB + } + + comp_data = { + 'testdb.users': { + 'db_name': 'testdb', + 'coll_name': 'users', + 'num_docs': 10000, + 'avg_doc_size': 512, + 'comp_ratio': 2.0 + }, + 'testdb.orders': { + 'db_name': 'testdb', + 'coll_name': 'orders', + 'num_docs': 5000, + 'avg_doc_size': 1024, + 'comp_ratio': 2.5 + } + } + + with tempfile.TemporaryDirectory() as tmpdir: + os.chdir(tmpdir) + + result = generate_sizing_csv( + comp_data=comp_data, + uri='mongodb://localhost:27017' + ) + + with open(result, 'r') as f: + reader = csv.reader(f) + rows = list(reader) + + # Header + 2 data rows + self.assertEqual(len(rows), 3) + self.assertEqual(rows[1][2], 'users') + self.assertEqual(rows[2][2], 'orders') + + +if __name__ == '__main__': + unittest.main()
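The glob-based tests above lean on a `unittest.mock` idiom worth calling out: when `side_effect` is a sequence, each call to the mock consumes the next element, so the "glob before work" and "glob after work" calls can be made to see different file lists. A minimal standalone sketch of that pattern follows; `new_files` is a hypothetical helper written here purely for illustration (it is not part of `sizing.py`), mirroring the before/after-glob structure that `run_compression_and_get_output`'s tests mock:

```python
import glob
from unittest.mock import patch

def new_files(pattern):
    """Hypothetical helper: report files matching `pattern` that
    appear between two glob calls (the same before/after-glob shape
    the tests above exercise)."""
    before = set(glob.glob(pattern))   # first glob call
    # ... work that creates output files would happen here ...
    after = set(glob.glob(pattern))    # second glob call
    return sorted(after - before)

# With side_effect as a list, the first glob call returns [] and the
# second returns ['out-1.csv'], simulating a newly created file.
with patch('glob.glob', side_effect=[[], ['out-1.csv']]):
    created = new_files('*.csv')

print(created)  # ['out-1.csv']
```

The same sequencing trick is what lets the tests assert on "no new CSV file created" (two empty lists) versus "old plus new files" (a superset second list) without touching the filesystem.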