【Feature】When the cluster capacity is almost full, make the cluster read only #2868
dc1cee6 to
1041ceb
Compare
| message DiskState { | ||
| required uint32 errType = 1; | ||
| required ErrorType errType = 1; |
Does using ErrorType instead of uint32 satisfy compatibility?
Yes, I have checked them all.
src/chunkserver/copyset_node.cpp
Outdated
| } | ||
| } | ||
| // Wait for write operations to complete; otherwise, once on_apply has finished, an asynchronous write error can no longer call set_error_and_rollback() | ||
| concurrentapply_->Flush(); |
This will cause performance degradation, which is not acceptable
I don't have any better ideas. When on_apply() returns, last_applied_index is updated and the Iterator is destructed, but concurrent tasks may not have completed yet. If a write error occurs after that point, calling iterator->set_error_and_rollback() may fail.
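The ordering concern can be illustrated in isolation. The sketch below is not the Curve or braft code: `ApplyBatch` and `WriteTask` are hypothetical names, and `std::async` stands in for the concurrent apply module. It only shows the shape of the argument: the batch blocks until every asynchronous write has finished, so a failure can still be reported before the caller's context goes away.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for one asynchronous chunk write; returns 0 on success.
using WriteTask = std::function<int()>;

// Dispatches every task asynchronously, then blocks until all have finished
// (the analogue of Flush()). Returns the first non-zero error code in
// submission order, or 0 if every task succeeded.
int ApplyBatch(const std::vector<WriteTask>& tasks) {
    std::vector<std::future<int>> pending;
    pending.reserve(tasks.size());
    for (const auto& task : tasks) {
        pending.push_back(std::async(std::launch::async, task));
    }
    int firstError = 0;
    for (auto& f : pending) {
        int ret = f.get();  // waits for this task to complete
        if (ret != 0 && firstError == 0) {
            firstError = ret;
        }
    }
    return firstError;
}
```

The cost the reviewer points out is visible here too: the caller cannot return until the slowest task in the batch has completed.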
| case CHUNK_OP_STATUS::CHUNK_OP_STATUS_READONLY: | ||
| OnReadOnly(); | ||
| break; |
When the space is full, the client needs to retry
ok.
| ChunkServerState state; | ||
| if (request.diskstate().errtype() != 0) { | ||
| switch (request.diskstate().errtype()) |
Note that the code style should be consistent with the code repository
ok.
| topology_->SetCopySetAvalFlag(key, false); | ||
| } | ||
| // Set disk error so that copysets will not be migrated to this chunkserver | ||
| state.SetDiskState(curve::mds::topology::DISKERROR); |
add a new disk state, maybe DISKFULL?
I have added DISKFULL status
1041ceb to
d39e3fb
Compare
|
cicheck |
src/fs/ext4_filesystem_impl.cpp
Outdated
| if (errno == EINTR && retryTimes < MAX_RETYR_TIME) { | ||
| ++retryTimes; | ||
| continue; | ||
| } else if (errno == ENOSPC) { |
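Changing it here may not be appropriate. An error needs to be returned so that the client stops endlessly retrying I/O, which would use up even more space.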
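A minimal sketch of the behavior this comment asks for, under stated assumptions: `WriteAll`, `WriteFn`, and `kMaxRetryTimes` are hypothetical names, and the write primitive is injected so the example does not touch a real file. EINTR is retried a bounded number of times, while ENOSPC is returned to the caller immediately instead of being retried inside the loop.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>

// Hypothetical write primitive with pwrite-like semantics: returns the number
// of bytes written, or -1 with errno set.
using WriteFn = std::function<long(const char* buf, size_t len, long offset)>;

constexpr int kMaxRetryTimes = 3;  // assumed value, for illustration only

// Writes the whole buffer. EINTR is retried a bounded number of times; any
// other error, including ENOSPC, is returned to the caller as -errno.
long WriteAll(const WriteFn& doWrite, const char* buf, size_t len, long offset) {
    size_t done = 0;
    int retryTimes = 0;
    while (done < len) {
        long ret = doWrite(buf + done, len - done,
                           offset + static_cast<long>(done));
        if (ret < 0) {
            if (errno == EINTR && retryTimes < kMaxRetryTimes) {
                ++retryTimes;
                continue;
            }
            return -errno;  // ENOSPC ends here: no retry, the caller decides
        }
        if (ret == 0) {
            break;  // no progress: report the short count
        }
        done += static_cast<size_t>(ret);
    }
    return static_cast<long>(done);
}
```

Returning `-ENOSPC` leaves the policy decision (mark the copyset read-only, tell the client to back off) to the layers above the filesystem wrapper.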
|
cicheck |
|
cicheck |
|
cicheck |
3134351 to
b9219d6
Compare
src/chunkserver/op_request.cpp
Outdated
| << ", request: " << request.ShortDebugString(); | ||
| } | ||
| break; | ||
| }; |
| }; | |
| } |
src/chunkserver/op_request.cpp
Outdated
| LOG(WARNING) << "write failed: " | ||
| << " data store return: " << ret | ||
| << ", request: " << request_->ShortDebugString(); | ||
| sleep(WAIT_FOR_DISK_FREED); |
This function may be executed in a bthread, so it's better to use bthread_usleep.
proto/chunk.proto
Outdated
| CHUNK_OP_STATUS_CHUNK_EXIST = 11; // chunk已存在 | ||
| CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12; // request epoch too old | ||
| CHUNK_OP_STATUS_READONLY = 13; // copyset其他节点故障,设为只读 | ||
| CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误 |
| CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误 | |
| CHUNK_OP_STATUS_NO_SPACE = 14; // 空间不足错误 |
proto/chunk.proto
Outdated
| CHUNK_OP_STATUS_CHUNK_EXIST = 11; // chunk已存在 | ||
| CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12; // request epoch too old | ||
| CHUNK_OP_STATUS_READONLY = 13; // copyset其他节点故障,设为只读 | ||
| CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误 |
Please use English comments
| required uint32 writeIOPS = 4; | ||
| } | ||
| enum ErrorType { |
reuse DiskState in topology.proto?
src/chunkserver/heartbeat.cpp
Outdated
| for (CopysetNodePtr copyset : copysets) { | ||
| // If disk space is insufficient, set the copyset to readonly | ||
| if (diskState->errtype() == curve::mds::heartbeat::DISKFULL) { |
it's better to call SetReadOnly only if disk state changed
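One way to read this suggestion is as edge-triggered handling: remember the last reported disk condition and act only on a transition. The sketch below uses hypothetical names (`DiskCondition`, `ReadOnlySwitch`) and a plain flag in place of the real per-copyset SetReadOnly call.

```cpp
#include <cassert>

// Hypothetical mirror of the disk states discussed in this review.
enum class DiskCondition { Normal, Full };

// Tracks the last reported condition and counts how often the read-only flag
// is actually toggled. In real code the toggle would be a SetReadOnly-style
// call on each copyset.
class ReadOnlySwitch {
 public:
    // Returns true if the condition changed and the flag was toggled.
    bool Update(DiskCondition current) {
        if (current == last_) {
            return false;  // unchanged: skip the per-copyset call
        }
        last_ = current;
        readOnly_ = (current == DiskCondition::Full);
        ++toggles_;
        return true;
    }
    bool readOnly() const { return readOnly_; }
    int toggles() const { return toggles_; }

 private:
    DiskCondition last_ = DiskCondition::Normal;
    bool readOnly_ = false;
    int toggles_ = 0;
};
```

With this shape, a heartbeat that reports the same state repeatedly costs one comparison instead of a pass over every copyset.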
src/chunkserver/op_request.cpp
Outdated
| } else if (CSErrorCode::NoSpaceError == ret) { | ||
| LOG(ERROR) << "paste chunk failed: " | ||
| << ", request: " << request_->ShortDebugString(); | ||
| sleep(WAIT_FOR_DISK_FREED); |
ditto, use bthread_usleep and it's better to add WAIT_FOR_DISK_FREED into configuration file like chunkfilepool.diskUsagePercentLimit
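A sketch of what the two suggestions could look like together, with assumptions called out: `NoSpaceOptions` and its field name and default are hypothetical stand-ins for a value loaded from the chunkserver configuration file, and the sleep primitive is injected so the example has no brpc dependency. In a bthread context the injected function would wrap `bthread_usleep`, which takes microseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical option holder; the field name and default are assumptions,
// standing in for a value read from the chunkserver configuration file.
struct NoSpaceOptions {
    uint32_t waitForDiskFreedSec = 60;
};

// Sleep primitive taking microseconds. Inside a bthread this would wrap
// bthread_usleep; it is injected here to keep the sketch self-contained.
using SleepUsFn = std::function<void(uint64_t)>;

// Converts the configured seconds to microseconds and sleeps once.
void WaitForDiskFreed(const NoSpaceOptions& opt, const SleepUsFn& sleepUs) {
    const uint64_t us =
        static_cast<uint64_t>(opt.waitForDiskFreedSec) * 1000000ULL;
    sleepUs(us);
}
```

Injecting the sleep function also makes the wait testable without actually blocking.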
src/chunkserver/op_request.cpp
Outdated
| << ", request: " << request.ShortDebugString(); | ||
| } | ||
| break; | ||
| }; |
| }; | |
| } |
| curve::mds::heartbeat::ErrorType errType = request.diskstate().errtype(); | ||
| if (errType == curve::mds::heartbeat::DISKFULL) { | ||
| // 当chunkserver磁盘接近满,需要将copyset availflag设为false,避免新空间从这些copyset分配 |
Please use English comments
@caoxianfei1 PTAL~
e4f77ce to
6f09bcd
Compare
|
cicheck |
|
cicheck |
6f09bcd to
46d06b4
Compare
|
cicheck |
46d06b4 to
fb3b7a4
Compare
|
cicheck |
c4e6aca to
9771cbf
Compare
…ead only Signed-off-by: liuminjian <[email protected]>
|
cicheck |
What problem does this PR solve?
Issue Number: #2561
Problem Summary: When a single CurveBS chunkserver runs out of disk space, the chunkserver goes down immediately.
What is changed and how it works?
What's Changed:
How it Works:
Side effects(Breaking backward compatibility? Performance regression?):
Older versions of chunkserver need to add the disk usage percentage limit to their configuration.
Check List