llama-cpp-turboquant

Diego Devesa a5e47592b6 cuda : optimize argmax (#10441)	2024-11-21 18:18:50 +01:00
* cuda : optimize argmax
* remove unused parameter
  ggml-ci
* fixup : use full warps
  ggml-ci
* Apply suggestions from code review
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* fix ub
* ggml : check ne00 <= INT32_MAX in argmax and argsort
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
.gitignore
CMakeLists.txt	tests : remove test-grad0	2024-11-17 08:30:29 +02:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-arg-parser.cpp
test-autorelease.cpp
test-backend-ops.cpp	cuda : optimize argmax (#10441)	2024-11-21 18:18:50 +01:00
test-barrier.cpp
test-c.c
test-chat-template.cpp
test-double-float.cpp
test-grammar-integration.cpp
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-opt.cpp	ggml : inttypes.h -> cinttypes (#0)	2024-11-17 08:30:29 +02:00
test-quantize-fns.cpp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
test-quantize-perf.cpp	ggml : inttypes.h -> cinttypes (#0)	2024-11-17 08:30:29 +02:00
test-rope.cpp
test-sampling.cpp
test-tokenizer-0.cpp
test-tokenizer-0.py	py : logging and flake8 suppression refactoring (#7081)	2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py