llama-cpp-turboquant/examples
Xuan-Son Nguyen 6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writer

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Name | Last commit message | Last commit date
batched | examples : add -kvu to batched usage example [no ci] (#17469) | 2025-11-24 15:38:45 +02:00
batched.swift | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
convert-llama2c-to-ggml | gguf: gguf_writer refactor (#15691) | 2025-09-05 11:34:28 +02:00
deprecation-warning
diffusion | models : Added support for RND1 Diffusion Language Model (#17433) | 2025-11-24 14:16:56 +08:00
embedding | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 2025-11-28 17:33:23 +02:00
eval-callback | common : more accurate sampling timing (#17382) | 2025-11-20 13:40:10 +02:00
gen-docs | cli: new CLI experience (#17824) | 2025-12-10 15:28:59 +01:00
gguf | examples(gguf): GGUF example outputs (#17025) | 2025-11-05 19:58:16 +02:00
gguf-hash
idle | metal : add residency sets keep-alive heartbeat (#17766) | 2025-12-05 19:38:54 +02:00
llama.android
llama.swiftui
lookahead | lookahead : add sample command to readme (#15447) | 2025-08-20 13:30:46 +03:00
lookup
model-conversion | model-conversion : add token ids to prompt token output [no ci] (#17863) | 2025-12-08 17:13:08 +01:00
parallel
passkey | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
retrieval | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
save-load-state | metal : fix build (#17799) | 2025-12-06 09:33:59 +02:00
simple | examples : support encoder-decoder models in the simple example (#16002) | 2025-09-17 10:29:00 +03:00
simple-chat
simple-cmake-pkg | examples : add missing code block end marker [no ci] (#17756) | 2025-12-04 14:17:30 +01:00
speculative | sampling : optimize samplers by reusing bucket sort (#15665) | 2025-08-31 20:41:02 +03:00
speculative-simple
sycl | sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) | 2025-11-29 14:59:44 +02:00
training
CMakeLists.txt | metal : add residency sets keep-alive heartbeat (#17766) | 2025-12-05 19:38:54 +02:00
convert_legacy_llama.py
json_schema_pydantic_example.py
json_schema_to_grammar.py | common : fix json schema with '\' in literals (#17307) | 2025-11-29 17:06:32 +01:00
llama.vim | llama : remove KV cache defragmentation logic (#15473) | 2025-08-22 12:22:13 +03:00
pydantic_models_to_grammar.py
pydantic_models_to_grammar_examples.py
reason-act.sh
regex_to_grammar.py
server-llama2-13B.sh
server_embd.py
ts-type-to-grammar.sh