llama-cpp-turboquant/examples
Georgi Gerganov a10b36c91a
llama : refactor kv cache guard (#12695)
* llama : refactor kv cache guard

ggml-ci

* cont : fix comment [no ci]

* llama : fix kv_cache restore logic

ggml-ci

* context : simplify kv cache updates

ggml-ci

* cont : better name [no ci]

* llama : fix llama_decode return code when could not find KV slot

ggml-ci

* context : change log err -> warn [no ci]

* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00
..
batched common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
batched-bench common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
batched.swift llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
convert-llama2c-to-ggml llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
cvector-generator llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
deprecation-warning
embedding llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
eval-callback llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
export-lora common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
gbnf-validator Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) 2025-01-30 19:13:58 +00:00
gen-docs
gguf GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
gguf-hash GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
gguf-split ci : use -no-cnv in gguf-split tests (#11254) 2025-01-15 18:28:35 +02:00
gritlm common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
imatrix llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
infill llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
jeopardy
llama-bench llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama.android llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama.swiftui llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llava common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
lookahead llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
lookup llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
main docs : bring llama-cli conversation/template docs up-to-date (#12426) 2025-03-17 21:14:32 +01:00
parallel llama : refactor kv cache guard (#12695) 2025-04-02 14:32:59 +03:00
passkey common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
perplexity llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
quantize ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
quantize-stats llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
retrieval llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
rpc rpc : update README for cache usage (#12620) 2025-03-28 09:44:13 +02:00
run run: de-duplicate fmt and format functions and optimize (#11596) 2025-03-25 18:46:11 +01:00
save-load-state llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
server common : remove json.hpp from common.cpp (#12697) 2025-04-02 09:58:34 +02:00
simple llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
simple-chat llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
simple-cmake-pkg repo : update links to new url (#11886) 2025-02-15 16:40:57 +02:00
speculative common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
speculative-simple common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
sycl [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) 2025-02-24 22:33:23 +08:00
tokenize llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
tts common : refactor downloading system, handle mmproj with -hf option (#12694) 2025-04-01 23:44:05 +02:00
chat-13B.bat
chat-13B.sh
chat-persistent.sh
chat-vicuna.sh
chat.sh
CMakeLists.txt tts : add OuteTTS support (#10784) 2024-12-18 19:27:21 +02:00
convert_legacy_llama.py
json_schema_pydantic_example.py
json_schema_to_grammar.py tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) 2025-03-05 13:05:13 +00:00
llama.vim repo : update links to new url (#11886) 2025-02-15 16:40:57 +02:00
llm.vim
Miku.sh
pydantic_models_to_grammar.py
pydantic_models_to_grammar_examples.py repo : update links to new url (#11886) 2025-02-15 16:40:57 +02:00
reason-act.sh
regex_to_grammar.py
server-llama2-13B.sh
server_embd.py
ts-type-to-grammar.sh