llama-cpp-turboquant/examples
Xuan-Son Nguyen 6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writer

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Name | Last commit message | Last commit date
batched | examples : add -kvu to batched usage example [no ci] (#17469) | 2025-11-24 15:38:45 +02:00
batched.swift | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
convert-llama2c-to-ggml | gguf: gguf_writer refactor (#15691) | 2025-09-05 11:34:28 +02:00
deprecation-warning
diffusion | models : Added support for RND1 Diffusion Language Model (#17433) | 2025-11-24 14:16:56 +08:00
embedding | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 2025-11-28 17:33:23 +02:00
eval-callback | common : more accurate sampling timing (#17382) | 2025-11-20 13:40:10 +02:00
gen-docs | cli: new CLI experience (#17824) | 2025-12-10 15:28:59 +01:00
gguf | examples(gguf): GGUF example outputs (#17025) | 2025-11-05 19:58:16 +02:00
gguf-hash
idle | metal : add residency sets keep-alive heartbeat (#17766) | 2025-12-05 19:38:54 +02:00
llama.android
llama.swiftui
lookahead | lookahead : add sample command to readme (#15447) | 2025-08-20 13:30:46 +03:00
lookup
model-conversion | model-conversion : add token ids to prompt token output [no ci] (#17863) | 2025-12-08 17:13:08 +01:00
parallel
passkey | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
retrieval | examples : remove references to make in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00
save-load-state | metal : fix build (#17799) | 2025-12-06 09:33:59 +02:00
simple | examples : support encoder-decoder models in the simple example (#16002) | 2025-09-17 10:29:00 +03:00
simple-chat
simple-cmake-pkg | examples : add missing code block end marker [no ci] (#17756) | 2025-12-04 14:17:30 +01:00
speculative | sampling : optimize samplers by reusing bucket sort (#15665) | 2025-08-31 20:41:02 +03:00
speculative-simple
sycl | sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) | 2025-11-29 14:59:44 +02:00
training
CMakeLists.txt | metal : add residency sets keep-alive heartbeat (#17766) | 2025-12-05 19:38:54 +02:00
convert_legacy_llama.py
json_schema_pydantic_example.py
json_schema_to_grammar.py | common : fix json schema with '\' in literals (#17307) | 2025-11-29 17:06:32 +01:00
llama.vim | llama : remove KV cache defragmentation logic (#15473) | 2025-08-22 12:22:13 +03:00
pydantic_models_to_grammar.py
pydantic_models_to_grammar_examples.py
reason-act.sh
regex_to_grammar.py
server-llama2-13B.sh
server_embd.py
ts-type-to-grammar.sh