Skip to content

metal : fix out-of-bounds access + style changes#2416

Merged
ggerganov merged 2 commits intomasterfrom
fix-concurrency
Aug 7, 2023
Merged

metal : fix out-of-bounds access + style changes#2416
ggerganov merged 2 commits intomasterfrom
fix-concurrency

Conversation

@ggerganov
Copy link
Copy Markdown
Member

@ggerganov ggerganov commented Jul 27, 2023

ref #2413

This line could perform out-of-bounds access when the concur_list element is -1:

if (gf->nodes[ctx->concur_list[j]] == src_cur) {is_found = 1; break;}

However, there is still bug in the logic. See #2413 for description.
For now, I've disabled the concurrency optimization on master. We should try to fix and re-enable it

@lshzh-ww
Copy link
Copy Markdown
Contributor

I just checked that for a 33B model, which has n_nodes 2224, concur_list_len is 3668. For 65B model with n_nodes 2964 we certainly hit the maximum length of concur_list. We should set the length of concur_list to 2*GGML_MAX_NODES. In this case, even if there is no concurrency between the nodes and we have to insert a barrier between every two operations, we still won't hit the limit.

Not sure if there are more bugs. I am still waiting to download a 65B model to test if this change will work.

@lshzh-ww
Copy link
Copy Markdown
Contributor

lshzh-ww commented Jul 27, 2023

Change int concur_list[GGML_MAX_NODES]; to int concur_list[2 * GGML_MAX_NODES]; in ggml_metal_context. Also change following lines in ggml_metal_graph_find_concurrency accordingly.

for (int i = 0; i < 2 * GGML_MAX_NODES; i++) { ctx->concur_list[i] = 0; }
......
if (ctx->concur_list_len > 2 * GGML_MAX_NODES) {
    fprintf(stderr, "%s: too many elements for metal ctx->concur_list!\n", __func__);
}

I can complete the execution of the ggml_metal_graph_find_concurrency with suggested changes. My laptop only has 32GB memory so I can't really run the inference. Still we need someone with 64GB memory to test if this change works.

@ProjectAtlantis-dev
Copy link
Copy Markdown

Confirmed that above changes result in ggml_metal_graph_find_concurrency() running to completion with 65B

@ggerganov
Copy link
Copy Markdown
Member Author

Need to check if concurrency feature still works after #2411

@slaren
Copy link
Copy Markdown
Member

slaren commented Jul 31, 2023

It should work as before, since the graph allocator is disabled with Metal. If it works with the allocator it will only be by chance, there is no guarantee that the operations executed concurrently won't write to the same memory address.

@lshzh-ww
Copy link
Copy Markdown
Contributor

The concurrency feature does work. I just checked it.

@ggerganov
Copy link
Copy Markdown
Member Author

If the allocator works with Metal and disabled concurrency optimization, it would be better to enable it and try to implement a smarter concurrency logic that would be compatible with the allocator.

@slaren
Copy link
Copy Markdown
Member

slaren commented Jul 31, 2023

The allocator should work with Metal without concurrency if buf_alloc is registered with ggml_metal_add_buffer. However, this is not a problem that the concurrency algorithm can solve, this needs to be fixed in the allocator so that concurrent operations are assigned different addresses.

@ggerganov
Copy link
Copy Markdown
Member Author

I think the allocator gains are bigger compared to the concurrency optimization, so we should enable it for Metal builds, and try to improve it to do concurrency detection as you suggest.

@lshzh-ww
Copy link
Copy Markdown
Contributor

lshzh-ww commented Aug 1, 2023

Actually it’s quite easy to make concurrency optimization work with the new allocator. We only need to pass metal_ctx->concur_list to the allocator, and then let the allocator parse all the nodes in order of metal_ctx->concur_list.

We will use a little bit more memory because we have to reserve memory for nodes that can be issued concurrently:

Memory difference n_batch=512 n_batch=128
33B n_ctx=512 +21MB +5.3MB
33B n_ctx=2048 +27MB +6.8MB

I think we can merge this PR for now. I may open a separate PR later this week to make concurrency optimization work with the new allocator.

@ggerganov
Copy link
Copy Markdown
Member Author

OK, lets apply the array size fix and merge

@ggerganov ggerganov merged commit f6f9896 into master Aug 7, 2023
@ggerganov ggerganov deleted the fix-concurrency branch August 7, 2023 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants