llama-cpp-turboquant/ggml/src
Jeff Bolz e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
* vulkan: sort graph to allow more parallel execution

Add a backend proc that lets the backend modify the graph. The
Vulkan implementation looks at which nodes depend on each other
and greedily reorders them to group together nodes that don't
depend on each other. It only reorders the nodes; it doesn't change
the contents of any of them.
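
The reordering idea described above can be illustrated with a small standalone sketch. This is not the code from the Vulkan backend: the `DepList` type and the `greedy_group_order` function are hypothetical names introduced here, the graph is reduced to plain dependency indices, and the grouping rule (place every node whose inputs are already placed, then repeat) is a simplified assumption about what "greedy" means in this context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compute graph: deps[i] lists the indices of the
// nodes that node i reads from. The input order is assumed to be a valid
// execution order, so every dependency index is smaller than i.
using DepList = std::vector<std::vector<std::size_t>>;

// Returns a permutation of node indices in which mutually independent nodes
// are adjacent. Only the order changes; the nodes themselves are untouched.
std::vector<std::size_t> greedy_group_order(const DepList & deps) {
    const std::size_t n = deps.size();
    std::vector<bool> placed(n, false);
    std::vector<std::size_t> order;
    order.reserve(n);

    while (order.size() < n) {
        // Collect every unplaced node whose inputs were all placed in an
        // earlier group. Nodes collected in the same pass cannot depend on
        // each other, because `placed` is only updated after the scan.
        std::vector<std::size_t> group;
        for (std::size_t i = 0; i < n; ++i) {
            if (placed[i]) continue;
            bool ready = true;
            for (std::size_t d : deps[i]) {
                assert(d < i && "input must already be in execution order");
                if (!placed[d]) { ready = false; break; }
            }
            if (ready) group.push_back(i);
        }
        // The first unplaced node is always ready, so each pass makes progress.
        assert(!group.empty());
        for (std::size_t i : group) {
            placed[i] = true;
            order.push_back(i);
        }
    }
    return order;
}
```

For a five-node graph with dependencies `{ {}, {0}, {}, {2}, {1, 3} }`, the sketch returns the order 0, 2, 1, 3, 4: nodes 0 and 2 form one independent group, nodes 1 and 3 the next, and node 4 runs last. Adjacent independent nodes are the ones a backend could run without a synchronization between them.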

With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split
2025-09-09 02:10:07 +08:00
..
ggml-blas vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-cann vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-cpu vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-cuda vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-hip HIP: bump requirement to rocm 6.1 (#15296) 2025-08-13 20:44:30 +02:00
ggml-metal vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-musa CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433) 2025-08-20 16:58:49 +02:00
ggml-opencl vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-rpc vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-sycl vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-vulkan vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-webgpu vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-zdnn vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
CMakeLists.txt ggml: initial IBM zDNN backend (#14975) 2025-08-15 21:11:22 +08:00
ggml-alloc.c llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-backend-impl.h vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-backend-reg.cpp ggml: initial IBM zDNN backend (#14975) 2025-08-15 21:11:22 +08:00
ggml-backend.cpp vulkan: sort graph to allow more parallel execution (#15850) 2025-09-09 02:10:07 +08:00
ggml-common.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-impl.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-opt.cpp finetune: SGD optimizer, more CLI args (#13873) 2025-08-14 12:03:57 +02:00
ggml-quants.c ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379) 2025-08-18 09:23:56 +02:00
ggml-quants.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868) 2025-09-08 13:56:51 +03:00
ggml.cpp ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
gguf.cpp gguf: gguf_writer refactor (#15691) 2025-09-05 11:34:28 +02:00