llama-cpp-turboquant

Jeff Bolz e68aa10d8f vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00

* vulkan: sort graph to allow more parallel execution

  Add a backend proc to allow the backend to modify the graph. The vulkan implementation looks at which nodes depend on each other and greedily reorders them to group together nodes that don't depend on each other. It only reorders the nodes, doesn't change the contents of any of them. With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split

ggml-blas	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-cann	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-cpu	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-cuda	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-hip	HIP: bump requirement to rocm 6.1 (#15296)	2025-08-13 20:44:30 +02:00
ggml-metal	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-musa	CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433)	2025-08-20 16:58:49 +02:00
ggml-opencl	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-rpc	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-sycl	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-vulkan	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-webgpu	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-zdnn	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
CMakeLists.txt	ggml: initial IBM zDNN backend (#14975)	2025-08-15 21:11:22 +08:00
ggml-alloc.c	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-backend-impl.h	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-backend-reg.cpp	ggml: initial IBM zDNN backend (#14975)	2025-08-15 21:11:22 +08:00
ggml-backend.cpp	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
ggml-common.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-impl.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-opt.cpp	finetune: SGD optimizer, more CLI args (#13873)	2025-08-14 12:03:57 +02:00
ggml-quants.c	ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379)	2025-08-18 09:23:56 +02:00
ggml-quants.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-threading.cpp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-threading.h	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)	2024-12-12 19:02:49 +01:00
ggml.c	cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)	2025-09-08 13:56:51 +03:00
ggml.cpp	ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)	2025-06-01 13:43:57 +03:00
gguf.cpp	gguf: gguf_writer refactor (#15691)	2025-09-05 11:34:28 +02:00
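
The commit message at the top of this listing describes greedily reordering graph nodes so that nodes which don't depend on each other sit next to each other. The sketch below illustrates that general idea only; it is not the Vulkan backend's code. The function name `reorder_for_parallelism` and the input format (a list of dependency sets, one per node, given in a valid topological order) are assumptions made for this example.

```python
def reorder_for_parallelism(deps):
    """Group nodes so that each group contains only mutually independent nodes.

    deps[i] is the set of node indices that node i reads from (hypothetical
    input format). Nodes are assumed to be listed in a valid topological
    order, i.e. every index in deps[i] is smaller than i.
    Returns a list of groups; concatenating the groups gives the new order.
    Only the order changes, the nodes themselves are untouched.
    """
    done = set()                       # nodes already placed in an earlier group
    remaining = list(range(len(deps)))
    groups = []
    while remaining:
        # A node may join the current group only if everything it depends on
        # was placed in an earlier group, so group members never depend on
        # each other.
        group = [i for i in remaining if deps[i] <= done]
        if not group:
            raise ValueError("input is not a valid topological order")
        groups.append(group)
        done.update(group)
        remaining = [i for i in remaining if i not in done]
    return groups


# Two independent chains (0 -> 1 and 2 -> 3) that join at node 4.
example = [set(), {0}, set(), {2}, {1, 3}]
print(reorder_for_parallelism(example))
```

For this input the sketch produces three groups, so a backend that synchronizes only between groups would need two synchronization points for the five nodes. How the real implementation picks candidates and which nodes it is allowed to move are not stated in the commit message, so those details are not modeled here.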