Adding SkyPilot example for FlexGen #1
Changes from 4 commits
b49ffd9
5679f85
421e5ae
8ea3cde
bef90a9
f9c227f
ed4ec13
7237585
22340cf
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -66,13 +66,20 @@ python3 -m flexgen.apps.helm_run --description mmlu:model=text,subject=abstract_ | |||||
| ``` | ||||||
| Note that only a subset of HELM scenarios is tested. See more tested scenarios [here](flexgen/apps/helm_passed_30b.sh). | ||||||
|
|
||||||
| ### Run FlexGen on any cloud with SkyPilot | ||||||
| FlexGen benchmark can be launched with [SkyPilot](http://skypilot.co), a tool for launching ML jobs on any cloud. | ||||||
| You can use a single command below to automatically launch the benchmark on any cloud with SkyPilot, after you setup your cloud account locally (check how to setup SkyPilot [here](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)). | ||||||
| ### Run FlexGen on Any Cloud with SkyPilot | ||||||
| FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a tool for launching ML jobs on any cloud. | ||||||
| First, install SkyPilot and check you have some cloud credentials ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)): | ||||||
| ```bash | ||||||
| pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds | ||||||
| sky check | ||||||
| ``` | ||||||
| sky launch -c flexgen --detach-setup flexgen/apps/task.yaml | ||||||
| You can now use a single command to automatically launch the benchmark on any cloud: | ||||||
|
||||||
| You can now use a single command to automatically launch the benchmark on any cloud: | |
| You can now use a single command to launch the benchmark on any cloud, which automatically finds a region (in the cheapest-price order) with availability for the requested GPUs: |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,7 +27,16 @@ python3 helm_run.py --description mmlu:model=text,subject=abstract_algebra,data_ | |
| ``` | ||
|
|
||
| ### Run on any cloud with SkyPilot | ||
| Run FlexGen benchmark on any cloud with [SkyPilot](http://skypilot.co). | ||
| FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a tool for launching ML jobs on any cloud. | ||
| First, install SkyPilot and check you have some cloud credentials ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)): | ||
| ```bash | ||
| pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds | ||
| sky check | ||
| ``` | ||
Michaelvll marked this conversation as resolved.
|
||
| sky launch -c flexgen --detach-setup task.yaml | ||
| You can now use a single command to automatically launch the benchmark on any cloud: | ||
|
ditto
||
| ```bash | ||
| sky launch -c flexgen --detach-setup skypilot.yaml | ||
| ``` | ||
| You can then log into the cluster running the job with `ssh flexgen` for monitoring. Once the job has finished, you can terminate the cluster with `sky down flexgen` or pass in `--down` flag to the command above to have the cluster terminate itself automatically. | ||
|
|
||
| To run any other FlexGen command, you can edit [`skypilot.yaml`](skypilot.yaml) and replace the `run` section. | ||
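A minimal sketch of what replacing the `run` section might look like. The contents of `skypilot.yaml` are not shown here, so the command below is only an illustrative assumption (it uses `python3 -m flexgen.flex_opt` with the `facebook/opt-1.3b` model as the example), and it assumes the existing `setup` section has already installed FlexGen on the cluster:

```yaml
# Hypothetical replacement for the `run` section of skypilot.yaml.
# Everything outside `run` (resources, setup, ...) stays unchanged.
run: |
  # Assumption: FlexGen was installed by the `setup` section.
  python3 -m flexgen.flex_opt --model facebook/opt-1.3b
```

Launching with the same `sky launch` command shown above would then run this command in place of the benchmark.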
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,11 @@ | ||
| # benchmark.yaml | ||
| # A SkyPilot job definition for benchmarking FlexGen. | ||
Michaelvll marked this conversation as resolved.
|
||
| # References: | ||
| # https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html | ||
| # https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html | ||
|
|
||
| # Specify the resources required for this job. | ||
| resources: | ||
| accelerators: T4:1 | ||
| accelerators: T4:1 # Can replace with other GPU type and count, see `sky show-gpus`. | ||
| instance_type: n1-highmem-32 # On GCP with 1 T4 GPU and more than 200GB of RAM. | ||
|
Q: if a user doesn't use GCP (likely), then this YAML may not work? Is it possible to leave out the `instance_type` (blocked by the memory filter)?
Yes, leaving out the
Maybe ok to ship first and see what reports we get. We should expect a non-GCP user to fail at the
||
| # instance_type: g4dn.16xlarge # On AWS with 1 T4 GPU and more than 200GB of RAM. | ||
| # Azure does not support T4 GPUs with more than 200GB of RAM. | ||
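The review question about non-GCP users could be addressed, as a sketch only, by dropping `instance_type` and stating the memory requirement directly. This assumes a SkyPilot release that supports a `memory` field under `resources`. The review thread suggests that filter was not yet available when this PR was written, so check the YAML spec for your installed version before relying on it:

```yaml
# Sketch, not part of this PR: cloud-agnostic resources.
# Assumes a SkyPilot version with the `memory` field in `resources`.
resources:
  accelerators: T4:1   # see `sky show-gpus` for other GPU types and counts
  memory: 200+         # at least 200 GB of host RAM, on whichever cloud offers it
```

With no `instance_type` pinned, the choice of cloud and instance is left to SkyPilot instead of being tied to GCP.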
|
|
||