
[Seeking Feedback] Flux Resource Model Strawman #247

@dongahn


I implemented a strawman to help me further design the upcoming Flux resource comms module: a new service that selects the best-matching resources for each job. At this point, it would be extremely helpful to get feedback from advanced scheduling researchers so that I can make the design future-proof against new scheduling approaches. The strawman code you can play with is at https://github.com/dongahn/resource_module_strawman

Using your feedback, I would like to determine:

  • Resource representations: which representations are well suited to the advanced scheduling you are working on, and can we use our primitives to model, traverse, and match on them?

  • Programmability: can scheduler writers easily express their algorithms using our graph-match visitor patterns?
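To make the programmability question concrete, here is a minimal sketch of the kind of visitor a scheduler writer might supply. It assumes a hypothetical interface (`MatchVisitor` with `discover`/`finish` callbacks, driven by a generic depth-first walk `dfs_match`) over a toy containment tree; these names are illustrative and are not the strawman's actual API. The traversal stays generic, and all policy lives in the callbacks.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Vertex:
    """A resource vertex in a toy (hypothetical) containment tree."""
    name: str
    kind: str
    children: List["Vertex"] = field(default_factory=list)


class MatchVisitor:
    """Hypothetical visitor interface a scheduler writer would subclass."""

    def discover(self, v: Vertex) -> bool:
        """Called before descending into v; return False to prune its subtree."""
        return True

    def finish(self, v: Vertex, child_scores: List[int]) -> int:
        """Called after all children are visited; returns v's score."""
        return sum(child_scores)


def dfs_match(v: Vertex, visitor: MatchVisitor) -> int:
    """Generic depth-first walk that delegates every policy decision."""
    if not visitor.discover(v):
        return 0
    child_scores = [dfs_match(c, visitor) for c in v.children]
    return visitor.finish(v, child_scores)


class FreeCoreCounter(MatchVisitor):
    """Example policy: count cores, skipping subtrees marked unavailable."""

    def __init__(self, unavailable: Set[str]):
        self.unavailable = unavailable

    def discover(self, v: Vertex) -> bool:
        return v.name not in self.unavailable

    def finish(self, v: Vertex, child_scores: List[int]) -> int:
        return (1 if v.kind == "core" else 0) + sum(child_scores)


def make_cluster(nodes: int, sockets: int, cores: int) -> Vertex:
    """Build cluster -> node -> socket -> core, like a containment subsystem."""
    return Vertex("cluster0", "cluster", [
        Vertex(f"node{n}", "node", [
            Vertex(f"node{n}.socket{s}", "socket", [
                Vertex(f"node{n}.socket{s}.core{c}", "core")
                for c in range(cores)
            ])
            for s in range(sockets)
        ])
        for n in range(nodes)
    ])


cluster = make_cluster(nodes=2, sockets=2, cores=2)
print(dfs_match(cluster, FreeCoreCounter(unavailable=set())))      # 8
print(dfs_match(cluster, FreeCoreCounter(unavailable={"node1"})))  # 4
```

Keeping traversal and policy separate like this is what would let a new algorithm be written as a new visitor rather than a new traversal.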

This is a strawman, so it doesn't have a whole lot in it. As you evaluate it, please assume some of our other core technologies as well. In particular, assume you can use the planner at any level of each subsystem and keep aggregate information about each vertex's subtree in a scalable fashion. This is described here and here.
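The subtree aggregates mentioned above can be sketched as follows, assuming each vertex caches the total number of free cores in its subtree (a stand-in for the planner-based aggregates; `agg_free`, `find_node`, and `allocate` are hypothetical names). A match can rule out an entire subtree with one comparison, and an allocation only updates the vertices on the path from the allocated leaf to the root. The aggregate only rules subtrees out: a subtree that passes must still be searched, because its free cores may be spread across several nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vertex:
    name: str
    free_cores: int = 0          # nonzero only on leaf (node) vertices here
    children: List["Vertex"] = field(default_factory=list)
    parent: Optional["Vertex"] = field(default=None, repr=False, compare=False)
    agg_free: int = 0            # aggregate free cores in this vertex's subtree


def build_aggregates(v: Vertex) -> int:
    """Post-order pass that fills in agg_free and parent links once."""
    v.agg_free = v.free_cores
    for c in v.children:
        c.parent = v
        v.agg_free += build_aggregates(c)
    return v.agg_free


def find_node(v: Vertex, need: int, visited: List[str]) -> Optional[Vertex]:
    """Return the first leaf with at least `need` free cores.
    Subtrees whose aggregate is below `need` are pruned without descending."""
    visited.append(v.name)
    if v.agg_free < need:
        return None
    if not v.children:
        return v if v.free_cores >= need else None
    for c in v.children:
        hit = find_node(c, need, visited)
        if hit is not None:
            return hit
    return None


def allocate(leaf: Vertex, n: int) -> None:
    """Consume n cores on a leaf; only the ancestors' aggregates change."""
    leaf.free_cores -= n
    v: Optional[Vertex] = leaf
    while v is not None:
        v.agg_free -= n
        v = v.parent


cluster = Vertex("cluster", children=[
    Vertex("rack0", children=[Vertex("node0", 2), Vertex("node1", 2)]),
    Vertex("rack1", children=[Vertex("node2", 16)]),
])
build_aggregates(cluster)
visited: List[str] = []
hit = find_node(cluster, 8, visited)
print(hit.name, visited)   # node2 ['cluster', 'rack0', 'rack1', 'node2']
allocate(hit, 8)
print(cluster.agg_free)    # 12
```

Note that node0 and node1 are never visited: rack0's aggregate (4) already shows it cannot host an 8-core request.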

@SteVwonder: Please look at this from the angle of our IO-aware scheduling algorithms.

  • Use the PFS1 IO bandwidth subsystem as the dominant subsystem, and see if and how you would replicate your algorithm from our HPDC '16 paper. I configured a simple model as a starting point under the --matcher=PFS1BA option (see below for the graph rendered for a mini-scale system).

  • Use the containment subsystem as your dominant subsystem and the PFS1 IO bandwidth subsystem as an auxiliary subsystem, and see if you can implement a similar IO-aware algorithm. I configured a simple model as a starting point under the --matcher=C+PFS1BA option (see below for the graph rendered for a mini-scale system).

  • In both cases, I suspect you might need somewhat different graph representations. Please let me know if you want or need the graph model refined for your algorithms.
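For the second configuration (containment dominant, PFS1 bandwidth auxiliary), one simple way to combine the two subsystems is sketched below. This is a simplified greedy check, not the algorithm from the HPDC '16 paper. It assumes each compute node has a hypothetical auxiliary path of bandwidth vertices (an IO link and the PFS1 root) and that every node of the job consumes a fixed bandwidth on each vertex along that path. Nodes are visited in dominant-subsystem order; bandwidth is reserved tentatively and committed only if the whole request fits.

```python
from typing import Dict, List, Optional


def select_nodes(
    nodes: List[str],                  # free nodes, in dominant (containment) order
    node_path: Dict[str, List[str]],   # auxiliary edges: node -> bandwidth vertices to PFS1
    bw: Dict[str, int],                # remaining bandwidth (GB/s) per auxiliary vertex
    n_nodes: int,
    bw_per_node: int,
) -> Optional[List[str]]:
    """Admit a node only if every auxiliary vertex on its path to PFS1 still
    has bw_per_node to spare. Reservations are committed to `bw` on success
    and discarded on failure."""
    tentative = dict(bw)
    chosen: List[str] = []
    for node in nodes:
        path = node_path[node]
        if all(tentative[a] >= bw_per_node for a in path):
            for a in path:
                tentative[a] -= bw_per_node
            chosen.append(node)
            if len(chosen) == n_nodes:
                bw.update(tentative)
                return chosen
    return None


nodes = ["n0", "n1", "n2", "n3"]
node_path = {
    "n0": ["linkA", "pfs1"], "n1": ["linkA", "pfs1"],
    "n2": ["linkB", "pfs1"], "n3": ["linkB", "pfs1"],
}
bw = {"linkA": 10, "linkB": 10, "pfs1": 15}

first = select_nodes(nodes, node_path, bw, n_nodes=3, bw_per_node=5)
print(first, bw)   # ['n0', 'n1', 'n2'] {'linkA': 0, 'linkB': 5, 'pfs1': 0}
second = select_nodes(nodes, node_path, bw, n_nodes=1, bw_per_node=1)
print(second)      # None
```

The second request fails even though n3 is still free in the containment subsystem, because the PFS1 root is saturated; that is exactly the constraint the auxiliary subsystem is meant to capture.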

@tpatki: Please look at this from the angle of power-aware scheduling algorithms.

  • Use the power distribution subsystem as the dominant subsystem, and see if and how you would implement power-aware scheduling algorithms. I configured a simple model as a starting point under the --matcher=PA option (see below for the graph rendered for a mini-scale system).

  • Use the containment subsystem as the dominant subsystem and the power distribution subsystem as an auxiliary subsystem, and see if and how you can implement effective algorithms. I configured a simple model as a starting point under the --matcher=C+PA option (see below for the graph rendered for a mini-scale system).

  • In both cases, I suspect you might need somewhat different graph models. Please let me know which models are most suitable for your algorithms. I happened to talk with an IBM power-scheduling researcher at a CORAL face-to-face meeting, and he believes that future compute nodes may need to be grouped by their power-efficiency levels. I believe this can be done easily by refining the power hierarchy model. If you can suggest some models based on your expertise, that would be very helpful at this point.
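As one illustration of refining the power hierarchy, the sketch below inserts an efficiency-level grouping between a power distribution parent and its nodes, then matches a job within a single level under a power cap. The per-node efficiency scores, wattages, thresholds, and function names are all hypothetical; a real model would take these from measurements or vendor data.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_by_efficiency(efficiency: Dict[str, float],
                        thresholds: List[float]) -> Dict[int, List[str]]:
    """Refine the power hierarchy: bucket nodes into efficiency levels.
    Level 0 is the most efficient; thresholds must be sorted descending."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for node, eff in sorted(efficiency.items()):
        level = next((i for i, t in enumerate(thresholds) if eff >= t),
                     len(thresholds))
        groups[level].append(node)
    return dict(groups)


def match_in_one_level(groups: Dict[int, List[str]],
                       node_watts: Dict[str, int],
                       n_nodes: int,
                       power_cap: int) -> Optional[Tuple[int, List[str]]]:
    """Pick n_nodes from a single efficiency level, trying the most efficient
    level first, such that their summed draw stays within power_cap."""
    for level in sorted(groups):
        candidates = sorted(groups[level], key=lambda n: node_watts[n])
        picked = candidates[:n_nodes]
        if len(picked) == n_nodes and sum(node_watts[n] for n in picked) <= power_cap:
            return level, picked
    return None


efficiency = {"n0": 0.95, "n1": 0.91, "n2": 0.80, "n3": 0.78, "n4": 0.60}
watts = {"n0": 300, "n1": 320, "n2": 250, "n3": 260, "n4": 200}
groups = group_by_efficiency(efficiency, thresholds=[0.9, 0.75])
print(groups)                                                        # {0: ['n0', 'n1'], 1: ['n2', 'n3'], 2: ['n4']}
print(match_in_one_level(groups, watts, n_nodes=2, power_cap=600))  # (1, ['n2', 'n3'])
print(match_in_one_level(groups, watts, n_nodes=2, power_cap=700))  # (0, ['n0', 'n1'])
```

Keeping a job inside one level avoids mixing nodes of very different efficiency in one allocation; whether that is the right policy is exactly the kind of modeling question raised above.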

Nikil (I don't have his GitHub ID): Please look at this from the standpoint of your IB network-connection-aware scheduling algorithm.

  • Use the IB connection subsystem as the dominant subsystem, and see if and how you would implement pod-based scheduling algorithms. I configured a simple model as a starting point under the --matcher=IBA option (see below for the graph rendered for a mini-scale system).

  • Use the containment subsystem as the dominant subsystem and the IB connection subsystem as an auxiliary subsystem, and see if you can implement these algorithms as well. I configured a simple model as a starting point under the --matcher=C+IBA option (see below for the graph rendered for a mini-scale system).

  • In your case, I strongly suspect we will want to refine the network subsystem graph model. This is a good test for me, because I want to see how easily our new Flux resource model strawman lets a scheduler plugin writer like you make such refinements.
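As a starting point for the pod discussion, the sketch below shows one common pod-aware heuristic over a refined network model in which each pod (for example, the nodes under one group of IB leaf switches) is a grouping vertex. It is not necessarily Nikil's algorithm, and the pod names, the `free` map, and `pod_aware_select` are hypothetical. It prefers a single pod using best fit and otherwise spans as few pods as possible.

```python
from typing import Dict, List, Optional


def pod_aware_select(free: Dict[str, List[str]], n_nodes: int) -> Optional[List[str]]:
    """`free` maps each pod to its free nodes.
    1. If some pod can hold the whole job, use the fullest such pod (best fit),
       which keeps larger pods available for larger jobs.
    2. Otherwise take pods largest-first, which spans the fewest pods."""
    fitting = [p for p, nodes in free.items() if len(nodes) >= n_nodes]
    if fitting:
        best = min(fitting, key=lambda p: (len(free[p]), p))
        return free[best][:n_nodes]
    if sum(len(v) for v in free.values()) < n_nodes:
        return None                      # not enough free nodes anywhere
    chosen: List[str] = []
    for pod in sorted(free, key=lambda p: (-len(free[p]), p)):
        chosen.extend(free[pod][: n_nodes - len(chosen)])
        if len(chosen) == n_nodes:
            break
    return chosen


free = {
    "podA": ["a0", "a1", "a2", "a3"],
    "podB": ["b0", "b1"],
    "podC": ["c0", "c1", "c2"],
}
print(pod_aware_select(free, 2))   # ['b0', 'b1']
print(pod_aware_select(free, 3))   # ['c0', 'c1', 'c2']
print(pod_aware_select(free, 6))   # ['a0', 'a1', 'a2', 'a3', 'c0', 'c1']
print(pod_aware_select(free, 10))  # None
```

In the strawman's terms, the pods would be intermediate vertices in the IB subsystem, and this policy would run either as the dominant-subsystem walk (IBA) or as a check on auxiliary edges from the containment walk (C+IBA).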

@lipari and @morrone: any early feedback from you two would also be highly useful, of course!
