Refactor global_server_args_dict #6866
Changes from 5 commits:

- 5cc3937
- a6489ab
- 0e86679
- fd3c474
- 597414b
- 8bf9367
- b4e6bfc
- 038645a
- bc657be
```diff
@@ -64,7 +64,10 @@
     get_global_expert_location_metadata,
     set_global_expert_location_metadata,
 )
-from sglang.srt.managers.schedule_batch import global_server_args_dict
+from sglang.srt.managers.schedule_batch import (
+    GLOBAL_SERVER_ARGS_KEYS,
+    global_server_args_dict,
+)
 from sglang.srt.mem_cache.memory_pool import (
     DoubleSparseTokenToKVPool,
     MHATokenToKVPool,
```
```diff
@@ -186,33 +189,9 @@ def __init__(
         # Global vars
         global_server_args_dict.update(
-            {
-                "attention_backend": server_args.attention_backend,
-                "debug_tensor_dump_inject": server_args.debug_tensor_dump_inject,
-                "debug_tensor_dump_output_folder": server_args.debug_tensor_dump_output_folder,
-                "deepep_mode": server_args.deepep_mode,
-                "device": server_args.device,
-                "disable_chunked_prefix_cache": server_args.disable_chunked_prefix_cache,
-                "disable_radix_cache": server_args.disable_radix_cache,
-                "enable_nan_detection": server_args.enable_nan_detection,
-                "enable_dp_attention": server_args.enable_dp_attention,
-                "enable_two_batch_overlap": server_args.enable_two_batch_overlap,
-                "enable_dp_lm_head": server_args.enable_dp_lm_head,
-                "enable_ep_moe": server_args.enable_ep_moe,
-                "enable_deepep_moe": server_args.enable_deepep_moe,
-                "deepep_config": server_args.deepep_config,
-                "flashinfer_mla_disable_ragged": server_args.flashinfer_mla_disable_ragged,
-                "moe_dense_tp_size": server_args.moe_dense_tp_size,
-                "ep_dispatch_algorithm": server_args.ep_dispatch_algorithm,
-                "num_fused_shared_experts": server_args.num_fused_shared_experts,
-                "triton_attention_reduce_in_fp32": server_args.triton_attention_reduce_in_fp32,
-                "torchao_config": server_args.torchao_config,
-                "sampling_backend": server_args.sampling_backend,
-                "speculative_accept_threshold_single": server_args.speculative_accept_threshold_single,
-                "speculative_accept_threshold_acc": server_args.speculative_accept_threshold_acc,
+            {k: getattr(server_args, k) for k in GLOBAL_SERVER_ARGS_KEYS}
+            | {
                 "use_mla_backend": self.use_mla_backend,
                 "mm_attention_backend": server_args.mm_attention_backend,
                 "ep_num_redundant_experts": server_args.ep_num_redundant_experts,
             }
         )
```
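The refactor in this hunk can be sketched as a small self-contained example: instead of hand-copying each `server_args` field into the global dict (where the field list and the dict literal can silently drift apart), one shared key list drives a dict comprehension, and the handful of values that do not come straight from `server_args` are merged in with the dict-union operator (`|`, Python 3.9+). The names `ServerArgs`, `update_global_args`, and the three example keys below are illustrative stand-ins, not the real sglang definitions.

```python
from dataclasses import dataclass

# Single source of truth for which fields get mirrored into the global dict.
GLOBAL_SERVER_ARGS_KEYS = ["attention_backend", "device", "disable_radix_cache"]


@dataclass
class ServerArgs:  # simplified stand-in for sglang's ServerArgs
    attention_backend: str = "flashinfer"
    device: str = "cuda"
    disable_radix_cache: bool = False


global_server_args_dict = {}  # module-level dict, filled in once at startup


def update_global_args(server_args: ServerArgs, use_mla_backend: bool) -> None:
    global_server_args_dict.update(
        # Pull every listed field off server_args by attribute name...
        {k: getattr(server_args, k) for k in GLOBAL_SERVER_ARGS_KEYS}
        # ...then layer on values that are not plain ServerArgs fields.
        | {"use_mla_backend": use_mla_backend}
    )


update_global_args(ServerArgs(), use_mla_backend=True)
print(global_server_args_dict["attention_backend"])  # flashinfer
print(global_server_args_dict["use_mla_backend"])    # True
```

Adding a new mirrored field now only requires appending its name to `GLOBAL_SERVER_ARGS_KEYS`; `getattr` fetches it, so the update call never needs editing.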
Comment on lines 191 to 198 (Contributor):

This code updates the global
This line initializes a module-level global dictionary, global_server_args_dict. The dictionary is later updated in model_runner.py. Mutable global state that is initialized at module load time and mutated elsewhere can cause initialization-order problems and makes the system harder to reason about: if other parts of the code read this dictionary before the update in model_runner.py runs, they may see default or incomplete configuration values. Consider managing configuration via explicit object passing or a properly initialized singleton.
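The reviewer's suggestion can be sketched as follows: a singleton that is initialized exactly once at startup and fails loudly if read before initialization, so a too-early access becomes an immediate error rather than a silently stale default. The `GlobalConfig` class and its methods are hypothetical, not part of sglang.

```python
from typing import Optional


class GlobalConfig:
    """Process-wide config: write-once at startup, loud failure on early reads."""

    _instance: Optional[dict] = None

    @classmethod
    def initialize(cls, values: dict) -> None:
        if cls._instance is not None:
            raise RuntimeError("GlobalConfig already initialized")
        cls._instance = dict(values)  # copy so later caller mutations don't leak in

    @classmethod
    def get(cls, key: str):
        if cls._instance is None:
            # Reading config before startup is a bug, not a silent default.
            raise RuntimeError("GlobalConfig read before initialization")
        return cls._instance[key]


GlobalConfig.initialize({"attention_backend": "flashinfer"})
print(GlobalConfig.get("attention_backend"))  # flashinfer
```

Explicitly passing a config object through constructors avoids global state entirely and is usually preferable; the singleton above is the smaller change when many call sites already assume module-level access.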