630 commits

Author SHA1 Message Date
Vitaly Chikunov
aadfc7a67f
ALT: Generate tools/server/public/index.html.gz
State before update since 8192-alt1:
  tools/server/webui: 3306dbaef 2026-03-21 misc : prefer ggml-org models in docs and examples (#20827) (ddh0)

+ mkdir /usr/src/.npm-global
+ npm config set prefix /usr/src/.npm-global
+ npm install -g @aikidosec/safe-chain
npm warn deprecated glob@10.5.0: Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me

added 138 packages in 6s

24 packages are looking for funding
  run `npm fund` for details
+ PATH=/usr/src/.npm-global/bin:/usr/bin:/bin:/usr/local/bin
+ rm -rf llama.cpp/tools/server/public/index.html.gz
+ cd llama.cpp/tools/server/webui
+ workdir=tools/server/webui
+ target=tools/server/public/index.html.gz
+ aikido-npm ci --ignore-scripts

added 661 packages, and audited 662 packages in 40s

260 packages are looking for funding
  run `npm fund` for details

15 vulnerabilities (2 low, 4 moderate, 9 high)

To address all issues, run:
  npm audit fix

Run `npm audit` for details.
ℹ Safe-chain: Some package versions were suppressed due to minimum age requirement.
  To disable this check, use: --safe-chain-skip-minimum-package-age
+ aikido-npm audit --audit-level=critical fix

added 1 package, removed 11 packages, changed 25 packages, and audited 651 packages in 17s

253 packages are looking for funding
  run `npm fund` for details

# npm audit report

cookie  <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix --force`
Will install @sveltejs/kit@0.0.30, which is a breaking change
node_modules/cookie
  @sveltejs/kit  >=1.0.0-next.0
  Depends on vulnerable versions of cookie
  node_modules/@sveltejs/kit
    @sveltejs/adapter-static  >=1.0.0-next.0
    Depends on vulnerable versions of @sveltejs/kit
    node_modules/@sveltejs/adapter-static
    runed  >=0.32.0
    Depends on vulnerable versions of @sveltejs/kit
    node_modules/bits-ui/node_modules/runed
      bits-ui  >=2.11.8
      Depends on vulnerable versions of runed
      Depends on vulnerable versions of svelte-toolbelt
      node_modules/bits-ui
      svelte-toolbelt  >=0.10.6
      Depends on vulnerable versions of runed
      node_modules/bits-ui/node_modules/svelte-toolbelt

6 low severity vulnerabilities

To address issues that do not require attention, run:
  npm audit fix

To address all issues (including breaking changes), run:
  npm audit fix --force
ℹ Safe-chain: Some package versions were suppressed due to minimum age requirement.
  To disable this check, use: --safe-chain-skip-minimum-package-age
+ npm run build

> webui@1.0.0 build
> vite build && ./scripts/post-build.sh

▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]

    tsconfig.json:2:12:
      2 │   "extends": "./.svelte-kit/tsconfig.json",
        ╵              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vite v7.2.2 building ssr environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

   ╷
17 │ @import 'katex/src/styles/katex.scss';
   │         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ╵
    src/styles/katex-custom.scss 17:9  root stylesheet

DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
2 │ @import "./fonts.scss";
  │         ^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/katex.scss 2:9  @import
    src/styles/katex-custom.scss 17:9             root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
9 │         $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
  │               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/fonts.scss 9:15   generate-src()
    node_modules/katex/src/styles/fonts.scss 42:11  font-face()
    node_modules/katex/src/styles/fonts.scss 52:1   @import
    node_modules/katex/src/styles/katex.scss 2:9    @import
    src/styles/katex-custom.scss 17:9               root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
344 │         @for $from from 1 through length($sizes) {
    │                                   ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 344:35  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
345 │             @for $to from 1 through length($sizes) {
    │                                     ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 345:37  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                      ^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:38  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                                         ^^^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:57  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

✓ 4749 modules transformed.
Export "getJsonHeaders" of module "src/lib/utils/api-headers.ts" was reexported through module "src/lib/utils/index.ts" while both modules are dependencies of each other and will end up in different chunks by current Rollup settings. This scenario is not well supported at the moment as it will produce a circular dependency between chunks and will likely lead to broken execution order.
Either change the import in "src/lib/services/chat.service.ts" to point directly to the exporting module or reconfigure "output.manualChunks" to ensure these modules end up in the same chunk.
rendering chunks...
vite v7.2.2 building client environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

   ╷
17 │ @import 'katex/src/styles/katex.scss';
   │         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ╵
    src/styles/katex-custom.scss 17:9  root stylesheet

DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
2 │ @import "./fonts.scss";
  │         ^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/katex.scss 2:9  @import
    src/styles/katex-custom.scss 17:9             root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
9 │         $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
  │               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/fonts.scss 9:15   generate-src()
    node_modules/katex/src/styles/fonts.scss 42:11  font-face()
    node_modules/katex/src/styles/fonts.scss 52:1   @import
    node_modules/katex/src/styles/katex.scss 2:9    @import
    src/styles/katex-custom.scss 17:9               root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
344 │         @for $from from 1 through length($sizes) {
    │                                   ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 344:35  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
345 │             @for $to from 1 through length($sizes) {
    │                                     ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 345:37  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                      ^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:38  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                                         ^^^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:57  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

✓ 5881 modules transformed.
rendering chunks...
computing gzip size...
.svelte-kit/output/client/_app/version.json                             0.03 kB │ gzip:     0.05 kB
.svelte-kit/output/client/.vite/manifest.json                           0.33 kB │ gzip:     0.19 kB
.svelte-kit/output/client/_app/immutable/assets/style.SW4DF8iR.css    499.50 kB │ gzip:   288.93 kB

(!) Some chunks are larger than 3072 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
.svelte-kit/output/client/_app/immutable/bundle.CBB5SKcU.js         4,401.02 kB │ gzip: 1,297.75 kB
✓ built in 13.63s
.svelte-kit/output/server/.vite/manifest.json                                              5.80 kB
.svelte-kit/output/server/_app/immutable/assets/style.LUCY6AWH.css                       499.22 kB
.svelte-kit/output/server/chunks/false.js                                                  0.03 kB
.svelte-kit/output/server/chunks/environment.js                                            0.07 kB
.svelte-kit/output/server/chunks/api-key-validation.js                                     0.17 kB
.svelte-kit/output/server/chunks/server.js                                                 0.20 kB
.svelte-kit/output/server/entries/pages/_page.ts.js                                        0.25 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.ts.js                              0.28 kB
.svelte-kit/output/server/internal.js                                                      0.37 kB
.svelte-kit/output/server/chunks/utils.js                                                  0.62 kB
.svelte-kit/output/server/entries/pages/_page.svelte.js                                    1.11 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.svelte.js                          1.16 kB
.svelte-kit/output/server/chunks/exports.js                                                1.46 kB
.svelte-kit/output/server/chunks/url.js                                                    1.60 kB
.svelte-kit/output/server/chunks/label.js                                                  2.28 kB
.svelte-kit/output/server/chunks/internal.js                                               2.58 kB
.svelte-kit/output/server/entries/pages/_error.svelte.js                                   8.39 kB
.svelte-kit/output/server/remote-entry.js                                                  8.56 kB
.svelte-kit/output/server/chunks/shared.js                                                11.83 kB
.svelte-kit/output/server/chunks/precision.js                                             22.45 kB
.svelte-kit/output/server/entries/pages/_layout.svelte.js                                 34.39 kB
.svelte-kit/output/server/chunks/root.js                                                  38.85 kB
.svelte-kit/output/server/index.js                                                        55.03 kB
.svelte-kit/output/server/chunks/SyntaxHighlightedCode.svelte_svelte_type_style_lang.js   76.87 kB
.svelte-kit/output/server/chunks/context.svelte.js                                       180.22 kB
.svelte-kit/output/server/chunks/ServerLoadingSplash.js                                  339.43 kB
✓ built in 24.59s

Run npm run preview to preview your production build locally.

> Using @sveltejs/adapter-static
Overwriting ../public/index.html with fallback page. Consider using a different name for the fallback.
  Wrote site to "../public"
  ✔ done
✓ Inlined favicon.svg as base64 data URL
✓ Created index.html.gz
2026-03-22 18:53:17 +03:00
Vitaly Chikunov
4925d4706a
Merge signed commit 'b8470' into sisyphus
Extra-Attributes: tools/server/public/index.html.gz merge=ours
Diff-After-Merge: 2 files changed, 6 insertions(+)

# gpg: Signature made Sun Mar 22 13:05:51 2026 MSK
# gpg:                using RSA key B5690EEEBB952194
# gpg: Good signature from "GitHub <noreply@github.com>" [unknown]
2026-03-22 15:46:35 +00:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples (#20827)
* misc : prefer ggml-org models in docs and examples

Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.

* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Sigbjørn Skjæret
29b28a9824
ci : switch from pyright to ty (#20826)
* type fixes

* switch to ty

* tweak rules

* tweak more rules

* more tweaks

* final tweak

* use common import-not-found rule
2026-03-21 08:54:34 +01:00
Piotr Wilkin (ilintar)
b1c70e2e54
common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)
2026-03-21 00:19:04 +01:00
Xuan-Son Nguyen
fb78ad29bb
server: (doc) clarify in-scope and out-scope features (#20794)
* server: (doc) clarify in-scope and out-scope features

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-20 14:03:50 +01:00
Georgi Gerganov
ab9d4c3678
server : improve mtmd ctx checkpoints (#20726)
* server : improve mtmd ctx checkpoints

* server : fix off-by-one in pos_min_thold
2026-03-20 11:13:12 +02:00
Ben Racicot
c1b911654a
server: fix router mode deadlock on child crash and TOCTOU race in models_max (#20763)
Two bugs in `server_models::load()` that affect router mode reliability:

**Bug 1: Deadlock when child process crashes**

When a child process is killed (e.g., SIGKILL from OS code signature
validation), the monitoring thread deadlocks on `stopping_thread.join()`
because the stopping_thread's wait predicate (`is_stopping`) is never
satisfied — the model name was never inserted into `stopping_models`.
`update_status()` is never reached and the model stays stuck in LOADING
state permanently.

Fix: extend the stopping_thread's wait predicate to also wake when the
child process is no longer alive (`!subprocess_alive()`). When woken by
a dead child, the thread skips the shutdown sequence and returns
immediately. The original `stopping_models.erase()` logic is preserved
for normal unloads.
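
The extended wait predicate can be illustrated with a condition variable. This is a minimal Python sketch of the idea, not the C++ code in `server_models::load()`; `ChildMonitor`, `wait_for_stop`, and `mark_child_dead` are hypothetical names chosen for illustration:

```python
import threading
from typing import Optional


class ChildMonitor:
    """Hypothetical stand-in for the per-model monitoring state."""

    def __init__(self) -> None:
        self.cond = threading.Condition()
        self.is_stopping = False  # set on a normal unload request
        self.child_alive = True   # cleared when the child process exits

    def wait_for_stop(self, timeout: Optional[float] = None) -> str:
        with self.cond:
            # Wake on a stop request OR on a dead child; waiting on
            # is_stopping alone is what left the thread blocked forever.
            woke = self.cond.wait_for(
                lambda: self.is_stopping or not self.child_alive, timeout
            )
            if not woke:
                return "timeout"
            if not self.child_alive and not self.is_stopping:
                return "child-dead"  # skip the shutdown sequence
            return "shutdown"

    def mark_child_dead(self) -> None:
        with self.cond:
            self.child_alive = False
            self.cond.notify_all()


monitor = ChildMonitor()
monitor.mark_child_dead()
print(monitor.wait_for_stop(timeout=1.0))  # child-dead
```

The waiter returns as soon as either condition holds, so a crashed child can no longer leave the joining thread blocked.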

**Bug 2: TOCTOU race bypasses `--models-max` (ref #20137)**

`unload_lru()` is called outside the mutex, then `load()` acquires the
lock afterward. Under concurrent requests, multiple threads observe
capacity and all proceed to load, exceeding the limit.

Fix: re-check capacity under the lock after `unload_lru()` returns.
If another thread filled the slot in the window between `unload_lru()`
and the lock acquisition, reject with an error instead of silently
exceeding the limit.
2026-03-19 22:16:05 +01:00
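
The second fix in c1b911654a follows a common pattern: re-check the capacity condition after acquiring the lock. A minimal Python sketch of that pattern under stated assumptions; `ModelRegistry`, `max_models`, and `try_load` are hypothetical names, and the real logic lives in the C++ `server_models::load()`:

```python
import threading


class ModelRegistry:
    """Hypothetical stand-in for a router's table of loaded models."""

    def __init__(self, max_models: int) -> None:
        self.max_models = max_models
        self.loaded: set[str] = set()
        self._lock = threading.Lock()

    def try_load(self, name: str) -> bool:
        # Any eviction (the unload_lru() step in the commit) would run
        # before this point, outside the lock, so its view can be stale.
        with self._lock:
            if name in self.loaded:
                return True
            # Re-check capacity under the lock: another thread may have
            # taken the free slot since the unlocked check.
            if len(self.loaded) >= self.max_models:
                return False  # reject instead of exceeding the limit
            self.loaded.add(name)
            return True


registry = ModelRegistry(max_models=1)
threads = [
    threading.Thread(target=registry.try_load, args=(f"model-{i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(registry.loaded))  # never more than max_models
```

Because the check and the insertion happen in the same critical section, concurrent callers cannot all observe free capacity and proceed.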
Tomeamis
b739738dad
docs: Update server README to reflect PR #20297 (#20560)
2026-03-19 21:28:44 +01:00
Ryan Goulden
26c9ce1288
server: Add cached_tokens info to oaicompat responses (#19361)
* tests : fix fetch_server_test_models.py

* server: to_json_oaicompat cached_tokens

Adds OpenAI and Anthropic compatible information about the
number of cached prompt tokens used in a response.
2026-03-19 19:09:33 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading (#20424)
* Implement proper prefill extraction

* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp

* Update tools/server/server-task.cpp

* refactor: move grammars to variant, remove grammar_external, handle exception internally

* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
Pascal
4065c1a3a6
Server becomes the source of truth for sampling parameter defaults (#20558)
* webui: make server the source of truth for sampling defaults

* webui: fix Custom badge for sampling parameters

* webui: log user overrides after server sync

* chore: update webui build output

* fix: Default values for sampling settings config object

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-19 13:20:39 +01:00
Xuan-Son Nguyen
1e64534570
mtmd: add clip_graph::build_mm() (#20751)
* clip: add build_mm()

* apply to all models

* add TODO for bias overload
2026-03-19 13:11:39 +01:00
Pascal
cd708db0cc
WebUI: Persist the on/off state of the MCP servers for new conversations (#20750)
* webui: add persistent storage for MCP server on/off state in new chats

* webui: simplify MCP enabled checks, remove dead server.enabled fallback

* chore: update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-19 12:54:06 +01:00
Aleksander Grygier
512bba6ee0
webui: Improve model parsing logic + add unit tests (#20749)
* add tests for model id parser

* add test case having activated params

* add structured tests for model id parser

* add ToDo

* feat: Improve model parsing logic + tests

* chore: update webui build output

---------

Co-authored-by: bluemoehre <bluemoehre@gmx.de>
2026-03-19 12:25:50 +01:00
crsawyer
5744d7ec43
Rebuild index.html.gz (#20724)
2026-03-18 18:49:57 +01:00
Julien Chaumond
48e61238e1
webui: improve tooltip wording for attachment requirements (#20688)
* webui: improve tooltip wording for attachment requirements

Co-Authored-By: Claude <Agents+claude@huggingface.co>

* chore: update webui build output

* chore: update webui build output

---------

Co-authored-by: Claude <Agents+claude@huggingface.co>
2026-03-18 14:01:02 +01:00
Aleksander Grygier
7ab321d40d
webui: Fix duplicated messages on q param (#20715)
* fix: Remove duplicate message sending on `?q` param

* chore: update webui build output
2026-03-18 10:32:43 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. (#20289)
* Add `--force-pure-content` to force a pure content parser.

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Change parameter name [no ci]

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Georgi Gerganov
8cc2d81264
server : fix ctx checkpoint invalidation (#20671)
2026-03-17 15:21:14 +02:00
Piotr Wilkin (ilintar)
2e4a6edd4a
tools/server: support refusal content for Responses API (#20285)
* Support refusal content for Responses API

* Update tools/server/server-common.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tools/server/server-common.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 01:42:04 +01:00
Pascal
dddca026bf
webui: add model information dialog to router mode (#20600)
* webui: add model information dialog to router mode

* webui: add "Available models" section header in model list

* webui: remove nested scrollbar from chat template in model info dialog

* chore: update webui build output

* feat: UI improvements

* refactor: Cleaner rendering + UI docs

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-16 15:38:11 +01:00
Aleksander Grygier
67a2209fab
webui: Add MCP CORS Proxy detection logic & UI (#20167)
* refactor: MCP store cleanup

* feat: Add MCP proxy availability detection

* fix: Sidebar icon

* chore: update webui build output

* chore: Formatting

* chore: update webui build output

* chore: Update package lock

* chore: update webui build output

* chore: update webui build output

* chore: update webui build output
2026-03-16 13:05:36 +01:00
Pascal
d65c4f2dc9
Fix model selector locked to first loaded model with multiple models (#20580)
* webui: fix model selector being locked to first loaded model

When multiple models are loaded, the auto-select effect would re-fire
on every loadedModelIds change, overriding the user's manual model
selection. Guard with selectedModelId so auto-select only kicks in
when no model is chosen yet.

* chore: update webui build output
2026-03-16 12:04:06 +01:00
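
The guard described in d65c4f2dc9 can be sketched as a pure function. The webui itself is written in Svelte/TypeScript; this Python version is only an illustration, and `auto_select` is a hypothetical name:

```python
from typing import Optional


def auto_select(
    selected_model_id: Optional[str], loaded_model_ids: list[str]
) -> Optional[str]:
    """Pick a model only when the user has not chosen one yet."""
    if selected_model_id is not None:
        # Respect the manual choice even if the loaded list changes.
        return selected_model_id
    return loaded_model_ids[0] if loaded_model_ids else None


print(auto_select(None, ["model-a", "model-b"]))       # model-a
print(auto_select("model-b", ["model-a", "model-b"]))  # model-b
```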
Woof Dog
d8c331c0af
webui: use date in more human readable exported filename (#19939)
* webui: use date in exported filename

Move conversation naming and export to utils

update index.html.gz

* webui: move literals to message export constants file

* webui: move export naming and download back to the conversation store

* chore: update webui build output

* webui: add comments to some constants

* chore: update webui build output
2026-03-16 11:18:13 +01:00
Piotr Wilkin (ilintar)
9e2e2198b0
tools/cli: fix disable reasoning (#20606)
2026-03-15 22:40:53 +01:00
Georgi Gerganov
88915cb55c
server : fix wait in test_cancel_requests() test (#20601)
* server : fix wait in test_cancel_requests() test

* codeowners : add team for server tests
2026-03-15 20:54:37 +02:00
Xuan-Son Nguyen
94d0262277
mtmd: add llama-mtmd-debug binary (#20508)
* mtmd: add llama-mtmd-debug binary

* adapt

* fixes

* fix compile error

* fix windows compile error

* rm legacy clip_debug_encode()

* add MTMD_API to fix build
2026-03-14 15:52:29 +01:00
Chedrian07
710878a7dd
webui: restore code preview iframe origin isolation (#20477)
2026-03-14 11:28:28 +01:00
Adrien Gallouët
463b6a963c
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954)
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande

    winogrande_score : tokenizing selected tasks
    winogrande_score : calculating winogrande score over selected tasks.
    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 46
    failed to decode the batch, n_batch = 2048, ret = 1
    winogrande_score: llama_decode() failed

same for hellaswag:

    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 99
    failed to decode the batch, n_batch = 2048, ret = 1
    hellaswag_score: llama_decode() failed

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 21:25:57 +01:00
ZeroV0LT
f17b3be63f
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
* llama : fix pooling assertion crash in chunked GDN detection path

The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.

Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.

---------

Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
2026-03-13 20:53:42 +02:00
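
The dimension mismatch described in f17b3be63f can be illustrated with the leading dimensions alone. This is a simplified sketch, assuming only that the mean-pooling input is sized by `n_tokens` and the embedding tensor by `n_outputs`, as the commit message states; `leading_dims_match` is a hypothetical helper, not a ggml function:

```python
def leading_dims_match(n_tokens: int, n_outputs: int) -> bool:
    """Simplified check: the two operands must agree on their leading dim."""
    inp_mean_lead = n_tokens  # build_inp_mean() is sized by n_tokens
    t_embd_lead = n_outputs   # t_embd is reduced to n_outputs via out_ids
    return inp_mean_lead == t_embd_lead


n_seqs = 4
n_tokens = 16 * n_seqs

# Before the fix: n_outputs = n_seqs, so the shapes disagree.
print(leading_dims_match(n_tokens, n_outputs=n_seqs))    # False
# After the fix: n_tokens is passed as n_outputs, so they agree.
print(leading_dims_match(n_tokens, n_outputs=n_tokens))  # True
```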
SoftwareRenderer
d7ba99c485
server: reset counter related to kill-switch on client error (#20513)
* server: reset kill-switch on client error

This avoids triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-13 19:58:09 +02:00
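
The counter behaviour described in d7ba99c485 can be sketched as follows. The threshold of 3 comes from the commit message; the `KillSwitch` class, the `on_iteration` method, and the exact place where the reset happens are assumptions made for illustration, not the server's actual code:

```python
MAX_EMPTY_CONSECUTIVE = 3  # threshold taken from the commit message


class KillSwitch:
    """Hypothetical model of the n_empty_consecutive counter."""

    def __init__(self) -> None:
        self.n_empty_consecutive = 0

    def on_iteration(self, n_tokens_generated: int, client_error: bool = False) -> bool:
        """Return True when the server should terminate."""
        if n_tokens_generated > 0 or client_error:
            # A rejected request (e.g. HTTP 400) is not a stuck server,
            # so it resets the counter instead of incrementing it.
            self.n_empty_consecutive = 0
            return False
        self.n_empty_consecutive += 1
        return self.n_empty_consecutive >= MAX_EMPTY_CONSECUTIVE


switch = KillSwitch()
print([switch.on_iteration(0, client_error=True) for _ in range(5)])
# [False, False, False, False, False]
```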
Daniel Bevenius
8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105)
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.

The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.

This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
2026-03-13 12:30:02 +01:00
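
The distinction drawn in 8f974d2392 is easy to see with numbers. A small sketch of the formula from the commit message; `pcm_bitrate` is a hypothetical helper and the 16 kHz, 16-bit, mono values are illustrative assumptions, not values taken from mtmd:

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


sample_rate = 16_000  # what a sample-rate getter would return
print(pcm_bitrate(sample_rate, bit_depth=16, channels=1))  # 256000
```

The sample rate (16000) and the bitrate (256000) differ by a factor of bit depth times channel count, which is why the old name was misleading.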
Piotr Wilkin (ilintar)
0e810413bb
tests : use reasoning instead of reasoning_budget in server tests (#20432)
2026-03-12 13:41:01 +01:00
Pascal
de190154c8
New conversations now auto-select the first loaded model (#20403)
* webui: auto-select first loaded model for new conversations in router mode

* chore: update webui build output
2026-03-12 09:07:05 +01:00
DAN™
fdb17643d3
model : add support for Phi4ForCausalLMV (#20168)
* Add support for Phi4ForCausalLMV.

* Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd.

* Rename constants + fix tokenizer label

* Clean-ups.

* Fix GGUF export.

* Set tokenizer.ggml.pre explicitly.

* Default vocab name rather than forcing it.

* Clean-ups.

* Fix indent.

* Fix subscriptable error.

* remove overcomplicated code path

* Clean-ups.

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-12 00:25:54 +01:00
Piotr Wilkin (ilintar)
acb7c79069
common/parser: handle reasoning budget (#20297)
* v1

* Finished!

* Handle cli

* Reasoning sampler

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Less explosive terminology :)

* Add utf-8 case and tests

* common : migrate reasoning budget sampler to common

* cont : clean up

* cont : expose state and allow passing as initial state

* cont : remove unused imports

* cont : update state machine doc string

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Pascal
00de615345
Fix agentic mcp image single model (#20339)
* webui: fix MCP image attachments dropped during the agentic loop in single-model mode

* chore: update webui build output
2026-03-11 05:31:33 +01:00
Georgi Gerganov
a7b3dee7a5
server : make 2 checkpoints near the end of the prompt (#20288)
* server : make 2 checkpoints near the end of the prompt

* cont : adjust checkpoints
2026-03-10 14:28:23 +02:00
ddh0
1dab5f5a44
llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770)
* quantize : imatrix-fail early + code cleanup

* fix manual override printing

it's in the preliminary loop now, so needs to be on its own line

* revert header changes per ggerganov

* remove old #includes

* clarify naming

rename `tensor_quantization` to `tensor_typo_option` to describe its
functionality

* fix per barto
2026-03-10 08:16:05 +02:00
Evan Huus
23fbfcb1ad
server: Parse port numbers from MCP server URLs in CORS proxy (#20208)
* Parse port numbers from MCP server URLs

* Pass scheme to http proxy for determining whether to use SSL

* Fix download on non-standard port and re-add port to logging

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-09 17:47:54 +01:00
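
The URL handling described in 23fbfcb1ad (explicit port, scheme-based SSL decision) can be sketched with Python's standard library. This is an illustration only, not the server's C++ proxy code; `proxy_target` and `DEFAULT_PORTS` are hypothetical names, and the example URLs are made up:

```python
from urllib.parse import urlsplit

# Conventional defaults; used only when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def proxy_target(url: str) -> tuple[str, int, bool]:
    """Return (host, port, use_ssl) for an http(s) URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"unsupported URL: {url!r}")
    # Keep a non-standard port when present instead of dropping it.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return parts.hostname, port, scheme == "https"


print(proxy_target("https://example.com:8443/mcp"))  # ('example.com', 8443, True)
print(proxy_target("http://localhost/mcp"))          # ('localhost', 80, False)
```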
Georgi Gerganov
96cfc4992c
server : fix checkpoints n_tokens calculation (#20287)
2026-03-09 16:47:06 +02:00
Georgi Gerganov
344ee2a38a
server : warn swa-full is not supported for non-SWA models (#20291)
2026-03-09 16:44:25 +02:00
Georgi Gerganov
d6e1556499
server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279)
* server : fix off-by-1 in server_tokens::size_up_to_pos()

* cont : fix typo [no ci]
2026-03-09 16:43:38 +02:00
Georgi Gerganov
107d599952
server : add kill switch when server is stuck (#20277)
2026-03-09 10:33:12 +02:00
Aaron Teo
ae87863dc1
llama-bench: introduce -hf and -hff flags & use --mmap 1 by default (#20211)
2026-03-09 09:05:44 +08:00
Georgi Gerganov
d417bc43dd
server : do not create checkpoints right after mtmd chunks (#20232)
2026-03-08 22:16:46 +02:00
Johannes Gäßler
a976ff081b
llama: end-to-end tests (#19802)
* tests: add end-to-end tests per model architecture

* fixup for rebase

* fix use-after-free in llama-model-loader.cpp

* fix CI

* fix WebGPU

* fix CI

* disable CI for macOS-latest-cmake-arm64

* use expert_weights_scale only if != 0.0f

* comments
2026-03-08 12:30:21 +01:00
decahedron1
ff52ee964d
server : correct index on finish in OAI completion streams (#20226)
2026-03-08 10:08:57 +01:00
Piotr Wilkin (ilintar)
566059a26b
Autoparser - complete refactoring of parser architecture (#18675)
* Autoparser - full single commit squish

* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
2026-03-06 21:01:00 +01:00