Document choosing cluster options #799

TomAugspurger wants to merge 1 commit into pangeo-data:master

Conversation
That's exactly what I have been wondering about: what is the right combination to squeeze the most out of it? I will try it out and see. I have a combination of a lot of tasks and a need for memory, so I will have to play a bit, but knowing about the ~26 GB of memory is helpful!
@TomAugspurger I tried to scale down worker_memory to 11 and request more workers, but I don't get them all either. I then tried to scale up the memory (to both 25 and 22) and the workers don't start at all. @rabernat looked at the dashboard and it seems like the node pool is not scaling up. Can you help? These are the latest clusters I tried to request that didn't start at all.
When I request 11 GB instead, I get something out of it. So it appears it's better to stay at the 11 GB max? I am not trying to be difficult :) I am just trying to understand!
I think the memory per core should (probably) always be a ratio of about 6, since that's what the physical machine has (6.25-6.5 GB/core). That ensures that we hit the CPU and memory limits at about the same time.
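The ratio argument can be checked with a quick back-of-the-envelope sketch. The 6.5 GB/core node figure and the 2-core worker shape are illustrative assumptions here, not values confirmed in the thread:

```python
# Sketch: compare a requested worker's memory:CPU ratio to the node's
# physical ratio (~6.5 GB/core, an assumed figure for illustration).

NODE_GB_PER_CORE = 6.5  # assumed physical ratio of the machine type

def gb_per_core(worker_memory_gb, worker_cores):
    """Memory:CPU ratio implied by a worker request."""
    return worker_memory_gb / worker_cores

# A hypothetical 2-core worker at 11 GB stays under the physical ratio,
# so CPU and memory limits are reached at roughly the same time.
print(gb_per_core(11, 2))  # 5.5

# At 15 GB the same worker shape is memory-heavy relative to the node,
# so memory runs out while cores sit idle.
print(gb_per_core(15, 2))  # 7.5
```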
Tom, thanks for helping to debug this.
The problem with this suggestion is that it fixes the memory / CPU ratio. Some workloads, like @chiaral's, just require more memory per core due to the nature of the data and the computation being performed. How do you recommend handling this?
Mmm, so there are a couple of layers here.
But the short answer is yes: the application's workload is the first factor to consider. Past that, we're just optimizing resources. I noticed that our limits didn't have the right ratio. The max memory:CPU ratio should reflect the type of machines we're using: pangeo-data/pangeo-cloud-federation#807
cc @chiaral. I think this is why you weren't getting all your workers. 15 GB / worker happens to not schedule well onto our physical machines (with ~26 GB of memory). Something like 11 GB will work better.
You might also try 4 cores / worker with ~25 GB / worker; I think that would also get better utilization. I plan to verify this once things quiet down on the cluster a bit :)
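The packing argument above can be sketched numerically. The node shape (~26 GB of memory, 4 cores) is an assumption based on the rough figures mentioned in the thread, and the 2-core worker shape is illustrative:

```python
# Sketch: how many workers of a given shape fit on one node, and the
# resulting memory utilization. Node size (~26 GB, 4 cores) is an
# assumption taken from the rough figures discussed above.

def workers_per_node(node_mem_gb, node_cores, worker_mem_gb, worker_cores):
    """Workers that fit on one node, limited by memory or cores."""
    return min(int(node_mem_gb // worker_mem_gb),
               int(node_cores // worker_cores))

def memory_utilization(node_mem_gb, node_cores, worker_mem_gb, worker_cores):
    """Fraction of node memory actually used by the workers that fit."""
    n = workers_per_node(node_mem_gb, node_cores, worker_mem_gb, worker_cores)
    return n * worker_mem_gb / node_mem_gb

# 15 GB workers: only one fits per ~26 GB node, stranding ~11 GB.
print(workers_per_node(26, 4, 15, 2))    # 1
print(memory_utilization(26, 4, 15, 2))  # ~0.58

# 11 GB workers: two fit, leaving only ~4 GB idle.
print(workers_per_node(26, 4, 11, 2))    # 2
print(memory_utilization(26, 4, 11, 2))  # ~0.85

# One big 4-core / 25 GB worker fills nearly the whole node.
print(workers_per_node(26, 4, 25, 4))    # 1
print(memory_utilization(26, 4, 25, 4))  # ~0.96
```

This is why 11 GB workers schedule well and 15 GB workers leave workers pending: the autoscaler can't fit a second 15 GB worker on a node, so half the requested pods wait for capacity that never packs cleanly.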