common/parser: add proper reasoning tag prefill reading (#20424)

* Implement proper prefill extraction
* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp
* Update tools/server/server-task.cpp
* refactor: move grammars to variant, remove grammar_external, handle exception internally
* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
commit 5e54d51b19
parent c1258830b2
33 changed files with 651 additions and 454 deletions
--- a/tools/cli/cli.cpp
+++ b/tools/cli/cli.cpp
@@ -105,7 +105,7 @@ struct cli_context {
                    llama_get_model(ctx_server.get_llama_context()));

                task.params.sampling.reasoning_budget_tokens = reasoning_budget;
-                task.params.sampling.reasoning_budget_activate_immediately = chat_params.thinking_forced_open;
+                task.params.sampling.generation_prompt = chat_params.generation_prompt;

                if (!chat_params.thinking_start_tag.empty()) {
                    task.params.sampling.reasoning_budget_start =