llama-cpp-turboquant/ggml/src
hipudding f9bc66c3eb
CANN: Update several operators to support FP16 data format (#16251)
Many Ascend operators compute internally in FP16 precision. If the input
data is in FP32, it must first be cast to FP16 before the computation
and then cast back to FP32 afterwards, which introduces unnecessary
cast operations. Moreover, FP16 computation involves significantly less
work than FP32, leading to noticeable efficiency improvements.

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended
to support multiple data types. Validation on the Qwen2 0.5B model shows
correct accuracy and about a 10% performance gain in concurrent scenarios.

Co-authored-by: noemotiovon <757486878@qq.com>
2025-10-13 08:52:22 +08:00
Name | Last commit message | Last commit date
ggml-blas | sync : whisper.cpp (ggml/1359) | 2025-09-29 17:43:58 +03:00
ggml-cann | CANN: Update several operators to support FP16 data format (#16251) | 2025-10-13 08:52:22 +08:00
ggml-cpu | ggml : Fix FP16 ELU positive branch (#16519) | 2025-10-12 08:25:37 +03:00
ggml-cuda | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00
ggml-hip | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00
ggml-metal | metal : add opt_step_adamw and op_sum (#16529) | 2025-10-12 21:43:14 +03:00
ggml-musa | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00
ggml-opencl | opencl: support pad_ext (#15888) | 2025-09-30 10:45:45 -07:00
ggml-rpc | rpc : check src buffer when copying tensor (#16421) | 2025-10-04 16:22:45 +03:00
ggml-sycl | [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521) | 2025-10-12 21:53:35 +08:00
ggml-vulkan | vulkan: use a more appropriate amount of threads when generating shaders (#16418) | 2025-10-04 22:04:27 +02:00
ggml-webgpu | ggml webgpu: profiling, CI updates, reworking of command submission (#16452) | 2025-10-07 13:48:56 -07:00
ggml-zdnn | zdnn: refactor codebase + add docs (#16178) | 2025-09-23 14:53:05 +08:00
CMakeLists.txt | cmake : Dont define XOPENSOURCE on AIX (#16481) | 2025-10-10 11:15:46 +03:00
ggml-alloc.c | ggml : fix graph reallocation with multiple chunks (#16396) | 2025-10-03 13:49:08 +02:00
ggml-backend-impl.h | rpc : add support for multiple devices (#16276) | 2025-10-04 12:49:16 +03:00
ggml-backend-reg.cpp | ggml-backend : add root cause in error message if loading backend library fails (#16172) | 2025-09-29 13:17:09 +02:00
ggml-backend.cpp | llama: print memory breakdown on exit (#15860) | 2025-09-24 16:53:48 +02:00
ggml-common.h | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00
ggml-impl.h | model : Apertus model implementation (#15852) | 2025-10-02 20:43:22 +03:00
ggml-opt.cpp | finetune: SGD optimizer, more CLI args (#13873) | 2025-08-14 12:03:57 +02:00
ggml-quants.c | ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928) | 2025-09-23 10:25:20 +02:00
ggml-quants.h | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00
ggml-threading.cpp | ggml : build backends as libraries (#10256) | 2024-11-14 18:04:35 +01:00
ggml-threading.h | remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) | 2024-12-12 19:02:49 +01:00
ggml.c | ggml webgpu: add support for soft_max, optimize rms_norm (#16357) | 2025-10-02 11:00:31 -07:00
ggml.cpp | ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) | 2025-06-01 13:43:57 +03:00
gguf.cpp | gguf: gguf_writer refactor (#15691) | 2025-09-05 11:34:28 +02:00