[WAN] Use different sharding strategy for self and cross attention. #250
hyeygit wants to merge 1 commit into AI-Hypercomputer:main
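For context, here is a minimal sketch of the general idea, not taken from this PR's diff: self-attention in WAN runs over the long video token sequence, while cross-attention attends to the short text-encoder output, so the two can reasonably get different partition specs. The mesh axis names match those discussed below (data, fsdp, tensor); every shape, size, and variable name in the snippet is an illustrative assumption.

```python
# Illustrative sketch only -- not the PR's actual implementation.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# (data=2, fsdp=4, tensor=1) mesh over the 8 chips of a v6e-8 (the baseline
# layout mentioned in the conversation).
devices = mesh_utils.create_device_mesh((2, 4, 1))
mesh = Mesh(devices, ("data", "fsdp", "tensor"))

# Self-attention sees the long video token sequence: shard the batch over
# 'data' and the sequence over 'fsdp' so each chip holds a sequence slice.
self_attn_sharding = NamedSharding(mesh, P("data", "fsdp", None))

# Cross-attention K/V come from the short text-encoder output: sharding the
# tiny text sequence buys little, so only the batch dimension is split.
cross_attn_kv_sharding = NamedSharding(mesh, P("data", None, None))

# Illustrative shapes (batch, seq_len, hidden); the real model's sizes differ.
hidden = jax.device_put(jnp.zeros((2, 4096, 1536)), self_attn_sharding)
text_kv = jax.device_put(jnp.zeros((2, 512, 1536)), cross_attn_kv_sharding)

print(hidden.sharding)   # sequence split 4 ways across 'fsdp'
print(text_kv.sharding)  # replicated across 'fsdp', split only on 'data'
```

Exactly how the cross-attention K/V are laid out (replicated, head-sharded, etc.) is a tuning choice; the sketch only illustrates that the two attention types need not share one spec.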
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
I noticed that the running command doesn't set …
Good call out. I re-ran the benchmark:
- 720p: 197s (prev 200s, baseline 215s)
- 480p: 62s (prev 66s, baseline 78s)
I tried running with …, which yielded a generation time of 240s, worse than the 215s baseline (where data=2, fsdp=4, tensor=1).
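For readers less familiar with the mesh notation, the shapes being compared are just different factorizations of the v6e-8's 8 chips; a small sketch, with the non-baseline shape chosen purely for illustration rather than taken from this thread:

```python
# Sketch: instantiating the baseline mesh layout mentioned above versus an
# alternative factorization of the same 8 chips.
import jax
from jax.experimental import mesh_utils
from jax.sharding import Mesh

for shape in [(2, 4, 1),   # baseline: data=2, fsdp=4, tensor=1
              (1, 1, 8)]:  # fully tensor-parallel alternative (illustrative)
    devices = mesh_utils.create_device_mesh(shape)
    mesh = Mesh(devices, ("data", "fsdp", "tensor"))
    print(dict(mesh.shape))  # e.g. {'data': 2, 'fsdp': 4, 'tensor': 1}
```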
Benchmark results on v6e-8:
480p video generation (81 frames)
720p video generation (81 frames)
Benchmark command used:
480p:
720p: