feat:add migrate for 4.0#3180
Conversation
|
Caution Review failedFailed to post review comments WalkthroughThis PR introduces a comprehensive pika-migrate tool with CMake build support, enabling data migration from Pika to Redis/Pika systems. It adds numerous command implementations, caching subsystems, replication coordination, ACL management, and associated utilities, representing a substantial architectural expansion of the tools directory. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Rationale: This diff is exceptionally large with 100+ file changes spanning multiple subsystems (build system, command framework, database model, replication/consensus, migration tool, caching layer). The changes are highly heterogeneous—mix of architectural refactoring (pink→net migration), new subsystems (ACL, consensus, migration, cache), API reshaping (uint16_t→uint32_t flags, partition→DB model), and complete removal of legacy abstractions. Logic density is moderate-to-high with many interdependencies across layers requiring careful cross-file review. Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 111
♻️ Duplicate comments (1)
tools/pika_migrate_v4.0/src/net/examples/thread_pool_test.cc (1)
64-66: Same ownership concern as lines 44-46.The raw pointer allocation pattern here has the same potential leak issue discussed in the earlier comment.
Also applies to: 83-85
| @@ -0,0 +1 @@ | |||
| ### Contributing to pika | |||
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add actual contribution guidelines.
This file currently contains only a placeholder heading without any guidance for contributors. Consider adding sections on:
- How to set up the development environment
- Coding standards and style guides
- How to submit pull requests
- Testing requirements
- Communication channels
Would you like me to generate a comprehensive contribution guide template?
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/CONTRIBUTING.md around lines 1 to 1, the file only
has a placeholder heading and lacks contribution guidance; replace the
placeholder with a structured CONTRIBUTING guide that includes sections for:
development environment setup (prereqs, install, run), coding standards/style
and linting rules, testing requirements and how to run tests, PR process and
commit message format, and communication channels/issue templates; keep sections
concise, include example commands where helpful, and ensure the document links
to relevant config files (eslint, prettier, CI) and a template PR checklist.
| @@ -0,0 +1,14 @@ | |||
| #!/bin/bash | |||
|
|
|||
| find include src tools -regex '.*\.\(cpp\|hpp\|c\|h\|cc\)' | xargs clang-format -i | |||
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Use null-terminated filenames for robustness.
The current find | xargs pattern can break on filenames containing spaces or special characters.
Apply this diff:
-find include src tools -regex '.*\.\(cpp\|hpp\|c\|h\|cc\)' | xargs clang-format -i
+find include src tools -regex '.*\.\(cpp\|hpp\|c\|h\|cc\)' -print0 | xargs -0 clang-format -iBased on static analysis (SC2038).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| find include src tools -regex '.*\.\(cpp\|hpp\|c\|h\|cc\)' | xargs clang-format -i | |
| find include src tools -regex '.*\.\(cpp\|hpp\|c\|h\|cc\)' -print0 | xargs -0 clang-format -i |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 3-3: Use 'find .. -print0 | xargs -0 ..' or 'find .. -exec .. +' to allow non-alphanumeric filenames.
(SC2038)
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/format_code.sh around line 3, the current pipeline
uses find | xargs which breaks on filenames with spaces/special characters;
modify the command to use null-terminated filenames by using find ... -print0
and piping to xargs -0 (or alternatively replace with find ... -exec
clang-format -i {} +) so filenames are handled robustly.
| # for FILE in $(git diff --cached --name-only | grep -E '.*\.(cpp|hpp|c|h|cc)') | ||
| # do | ||
| # if [[ "$FILE" =~ .*.(cpp|hpp|c|h)$ ]];then | ||
| # clang-format -i $FILE | ||
| # fi | ||
| # done |
There was a problem hiding this comment.
Fix regex pattern in commented pre-commit hook.
Line 11 has two issues:
- The dot before the extension needs to be escaped:
\.(cpp|hpp|c|h)$ - The pattern excludes
.ccfiles mentioned in line 9
Apply this diff to fix:
-# if [[ "$FILE" =~ .*.(cpp|hpp|c|h)$ ]];then
+# if [[ "$FILE" =~ \.(cpp|hpp|c|h|cc)$ ]];then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # for FILE in $(git diff --cached --name-only | grep -E '.*\.(cpp|hpp|c|h|cc)') | |
| # do | |
| # if [[ "$FILE" =~ .*.(cpp|hpp|c|h)$ ]];then | |
| # clang-format -i $FILE | |
| # fi | |
| # done | |
| # for FILE in $(git diff --cached --name-only | grep -E '.*\.(cpp|hpp|c|h|cc)') | |
| # do | |
| # if [[ "$FILE" =~ \.(cpp|hpp|c|h|cc)$ ]];then | |
| # clang-format -i $FILE | |
| # fi | |
| # done |
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/format_code.sh around lines 9-14, the commented
pre-commit hook uses an incorrect regex: the dot before the extension must be
escaped and the `.cc` extension is missing in the conditional; update both
occurrences so the pattern reads `\.(cpp|hpp|c|h|cc)$` (use this escaped,
anchored pattern in the grep -E and in the [[ ... =~ ... ]] test) to correctly
match .cpp/.hpp/.c/.h/.cc files.
|
|
||
| #include "include/redis_sender.h" | ||
|
|
||
| class MigratorThread : public net::Thread { |
There was a problem hiding this comment.
Missing include for base class net::Thread (likely compile error).
This header derives from net::Thread but doesn’t include its declaration. Include net_thread.h explicitly to avoid ODR/implicit include dependency.
Apply this diff:
#include <iostream>
#include <mutex>
+#include <atomic>
#include "storage/storage.h"
-#include "net/include/redis_cli.h"
+#include "net/include/net_thread.h"
+#include "net/include/redis_cli.h"Also applies to: 7-11
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/include/migrator_thread.h around lines 12 (and also
applies to lines 7-11), the header declares class MigratorThread : public
net::Thread but does not include the net::Thread declaration; add an explicit
include of the header that defines net::Thread (e.g., include "net/thread.h" or
the project's net_thread.h path) at the top of migrator_thread.h so the base
class is declared before use and remove any implicit/ODR-dependent reliance on
transitive includes.
| void Stop() { | ||
| should_exit_ = true; | ||
| } |
There was a problem hiding this comment.
Data race on should_exit_ (set in Stop, read in worker loops).
should_exit_ is mutated from another thread without synchronization. Use std::atomic.
Apply this diff:
- void Stop() {
- should_exit_ = true;
- }
+ void Stop() {
+ should_exit_.store(true, std::memory_order_relaxed);
+ }And:
- bool should_exit_;
+ std::atomic<bool> should_exit_{false};Also applies to: 55-56
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/include/migrator_thread.h around lines 31-33 and also
at 55-56, the member should_exit_ is written from Stop() on one thread and read
in worker loops on another thread without synchronization; change its type to
std::atomic<bool> (include <atomic>) and update Stop() to use
should_exit_.store(true) and worker-loop checks to use should_exit_.load()
(optionally with explicit memory_order semantics) so accesses are thread-safe.
| if (argc < 2) { | ||
| printf("Usage: ./simple_http_server port"); | ||
| } else { | ||
| port = atoi(argv[1]); | ||
| } |
There was a problem hiding this comment.
Missing return statement causes undefined behavior.
When argc < 2, the program prints usage but continues execution with an uninitialized port variable (Line 70). This will cause undefined behavior when port is passed to NewDispatchThread on Line 80.
Apply this diff:
if (argc < 2) {
- printf("Usage: ./simple_http_server port");
+ printf("Usage: ./simple_http_server port\n");
+ return 1;
} else {
port = atoi(argv[1]);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (argc < 2) { | |
| printf("Usage: ./simple_http_server port"); | |
| } else { | |
| port = atoi(argv[1]); | |
| } | |
| if (argc < 2) { | |
| printf("Usage: ./simple_http_server port\n"); | |
| return 1; | |
| } else { | |
| port = atoi(argv[1]); | |
| } |
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/src/net/examples/simple_http_server.cc around lines
71 to 75, the code prints usage when argc < 2 but then continues with an
uninitialized port variable; modify the branch so the program exits immediately
after printing the usage (for example return 1; or call exit(1)) to avoid
undefined behavior and ensure port is only used when initialized.
| if (argc < 2) { | ||
| printf("Usage: ./simple_http_server port"); | ||
| } else { | ||
| port = atoi(argv[1]); |
There was a problem hiding this comment.
Validate port number input.
atoi returns 0 for invalid input and doesn't detect errors. Invalid inputs like "abc" or out-of-range values will cause runtime failures.
Apply this diff to add proper validation:
- port = atoi(argv[1]);
+ char* endptr;
+ long port_long = strtol(argv[1], &endptr, 10);
+ if (*endptr != '\0' || port_long < 1 || port_long > 65535) {
+ printf("Error: Invalid port number. Must be between 1 and 65535.\n");
+ return 1;
+ }
+ port = static_cast<int>(port_long);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| port = atoi(argv[1]); | |
| char* endptr; | |
| long port_long = strtol(argv[1], &endptr, 10); | |
| if (*endptr != '\0' || port_long < 1 || port_long > 65535) { | |
| printf("Error: Invalid port number. Must be between 1 and 65535.\n"); | |
| return 1; | |
| } | |
| port = static_cast<int>(port_long); |
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/src/net/examples/simple_http_server.cc around line
74, replace the unsafe atoi call with robust validation: parse argv[1] using
strtol (or strtoul), check errno for ERANGE, ensure endptr points to end of
string (no non-numeric chars), and verify the resulting value is within the
valid port range (1–65535); on any validation failure print a clear usage/error
message and exit with non-zero status so invalid or out-of-range inputs (e.g.,
"abc" or huge numbers) are rejected safely.
| BackendThread(ConnFactory* conn_factory, int cron_interval, int keepalive_timeout, BackendHandle* handle, | ||
| void* private_data); |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
❓ Verification inconclusive
Clarify ownership semantics for raw pointers.
The constructor accepts raw pointers for conn_factory, handle, and private_data without clear ownership semantics. While own_handle_ (line 143) suggests conditional ownership for the handle, ownership for conn_factory_ and private_data_ is unclear. This could lead to memory leaks or double-free issues.
Run the following script to verify ownership patterns and lifecycle management:
Also applies to: 151-151
🏁 Script executed:
#!/bin/bash
# Find BackendThread destructor implementation and constructor usage
ast-grep --pattern 'BackendThread::~BackendThread() {
$$$
}'
# Search for BackendThread instantiation patterns to understand ownership
rg -n -C5 --type=cpp 'new BackendThread\('
# Look for delete or cleanup of conn_factory and private_data
rg -n -C3 --type=cpp '\bdelete\s+(conn_factory|private_data)'Length of output: 0
Clarify ownership semantics for BackendThread constructor parameters
The raw pointers conn_factory, handle, and private_data passed into BackendThread lack documented ownership: own_handle_ hints at conditional handle ownership, but it’s unclear who owns and deletes the others. Either document and implement cleanup in the destructor or switch to std::unique_ptr/std::shared_ptr to make ownership explicit.
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/src/net/include/backend_thread.h around lines
104-105, the constructor parameters conn_factory, handle, and private_data lack
clear ownership semantics; update the API to make ownership explicit by either
(A) changing conn_factory and handle to std::shared_ptr or std::unique_ptr
(choose shared_ptr if ownership may be shared, unique_ptr if BackendThread
exclusively owns them) and replace void* private_data with a
std::shared_ptr<void> or a typed smart pointer, or (B) document in the header
that BackendThread does not take ownership of conn_factory and private_data
while own_handle_ controls only handle ownership, and implement proper cleanup
in the destructor (delete handle when own_handle_ is true, do not delete
non-owned pointers) and update comments accordingly.
| pstd::Mutex mu_; | ||
| std::map<int, std::vector<std::string>> to_send_; // ip+":"+port, to_send_msg | ||
|
|
||
| std::map<int, std::shared_ptr<NetConn>> conns_; | ||
| std::set<int> connecting_fds_; |
There was a problem hiding this comment.
🧩 Analysis chain
Verify thread-safe access to connection data structures.
The mu_ mutex explicitly protects to_send_ (line 154), but it's unclear whether conns_ (line 156) and connecting_fds_ (line 157) are also protected by this mutex. Methods like GetConn(int fd), Write(int fd, ...), and Close(int fd) likely access these structures from multiple threads and require proper synchronization.
Run the following script to verify mutex usage patterns:
🏁 Script executed:
#!/bin/bash
# Find implementations of thread-sensitive methods
ast-grep --pattern 'Status BackendThread::Write($_, $_) {
$$$
}'
ast-grep --pattern 'std::shared_ptr<NetConn> BackendThread::GetConn($_) {
$$$
}'
# Search for mutex lock patterns around conns_ and connecting_fds_ access
rg -n -C5 --type=cpp 'std::lock_guard.*mu_|mu_\.lock\(\)' tools/pika_migrate_v4.0/src/net/Length of output: 33378
I need to investigate connecting_fds_ usage patterns more thoroughly to determine if there's a real issue.
[/skip_cloning]
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for all connecting_fds_ usage patterns
rg -n 'connecting_fds_' tools/pika_migrate_v4.0/src/net/ --type=cpp --type=h
Length of output: 2111
🏁 Script executed:
#!/bin/bash
# Also check how connecting_fds_ interacts with the lock in the full context
rg -n -B5 -A5 'connecting_fds_' tools/pika_migrate_v4.0/src/net/src/backend_thread.ccLength of output: 2470
Protect connecting_fds_ with mu_
All insert/erase/iteration on connecting_fds_ must be performed under mu_ to avoid data races (e.g., wrap the call in SetWaitConnectOnEpoll at backend_thread.cc:125 and the erases in the event-loop handlers around lines 457–462 with std::lock_guardstd::mutex l(mu_)).
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/src/net/include/backend_thread.h around lines
153–157, connecting_fds_ is accessed without synchronization; protect all
insert/erase/iteration on connecting_fds_ with the mutex mu_. Specifically, add
a lock (e.g., std::lock_guard<std::mutex> l(mu_)) around the call site in
SetWaitConnectOnEpoll at backend_thread.cc:125 and around the erase/iteration
logic in the event-loop handlers near lines 457–462 so every modification and
traversal of connecting_fds_ happens while mu_ is held.
| }; | ||
|
|
||
| } // namespace net | ||
| #endif // NET_INCLUDE_CLIENT_THREAD_H_ |
There was a problem hiding this comment.
Critical: Include guard mismatch.
The closing include guard uses NET_INCLUDE_CLIENT_THREAD_H_ but the opening guard at line 6 uses NET_INCLUDE_BACKEND_THREAD_H_. This breaks include guard protection and will cause compilation errors if a client_thread.h header also exists in the project.
Apply this diff to fix the guard:
-#endif // NET_INCLUDE_CLIENT_THREAD_H_
+#endif // NET_INCLUDE_BACKEND_THREAD_H_📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #endif // NET_INCLUDE_CLIENT_THREAD_H_ | |
| #endif // NET_INCLUDE_BACKEND_THREAD_H_ |
🤖 Prompt for AI Agents
In tools/pika_migrate_v4.0/src/net/include/backend_thread.h around line 161, the
closing include guard macro is incorrect (NET_INCLUDE_CLIENT_THREAD_H_) while
the opening guard is NET_INCLUDE_BACKEND_THREAD_H_; update the final #endif
comment to use NET_INCLUDE_BACKEND_THREAD_H_ so the opening and closing guards
match exactly, ensuring the include guard protection is restored.
Summary by CodeRabbit
New Features
Improvements
Version