ggml-webgpu: improve flastAttention performance by software pipelining (#19151)
* webgpu : pipeline flash_attn Q/K loads in WGSL * ggml-webgpu: unroll Q*K accumlation inner loop * ggml-webgpu: vectorization * ggml-webgpu: unrolling * ggml-webgpu: remove redundant unrolling * ggml-webgpu: restore the config * ggml-webgpu: remove redundant comments * ggml-webgpu: formatting * ggml-webgpu: formatting and remove vectorization * ggml-webgpu: remove unnecessary constants * ggml-webgpu: change QKV buffer to read_write to pass validation * ggml-webgpu: add explanation for the additional bracket around Q K accumulate * Indentation and for -> if for tail * Kick off CI on wgsl only commits --------- Co-authored-by: Reese Levine <reeselevine1@gmail.com>
This commit is contained in:
parent
ce38a4db47
commit
bd90fc74c3
2 changed files with 109 additions and 62 deletions
6
.github/workflows/build.yml
vendored
6
.github/workflows/build.yml
vendored
|
|
@ -21,7 +21,8 @@ on:
|
|||
'**/*.m',
|
||||
'**/*.metal',
|
||||
'**/*.comp',
|
||||
'**/*.glsl'
|
||||
'**/*.glsl',
|
||||
'**/*.wgsl'
|
||||
]
|
||||
|
||||
pull_request:
|
||||
|
|
@ -42,7 +43,8 @@ on:
|
|||
'**/*.m',
|
||||
'**/*.metal',
|
||||
'**/*.comp',
|
||||
'**/*.glsl'
|
||||
'**/*.glsl',
|
||||
'**/*.wgsl'
|
||||
]
|
||||
|
||||
concurrency:
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue