State before update since 8192-alt1:
tools/server/webui: 3306dbaef 2026-03-21 misc : prefer ggml-org models in docs and examples (#20827) (ddh0)
+ mkdir /usr/src/.npm-global
+ npm config set prefix /usr/src/.npm-global
+ npm install -g @aikidosec/safe-chain
npm warn deprecated glob@10.5.0: Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me
added 138 packages in 6s
24 packages are looking for funding
run `npm fund` for details
+ PATH=/usr/src/.npm-global/bin:/usr/bin:/bin:/usr/local/bin
+ rm -rf llama.cpp/tools/server/public/index.html.gz
+ cd llama.cpp/tools/server/webui
+ workdir=tools/server/webui
+ target=tools/server/public/index.html.gz
+ aikido-npm ci --ignore-scripts
added 661 packages, and audited 662 packages in 40s
260 packages are looking for funding
run `npm fund` for details
15 vulnerabilities (2 low, 4 moderate, 9 high)
To address all issues, run:
npm audit fix
Run `npm audit` for details.
ℹ Safe-chain: Some package versions were suppressed due to minimum age requirement.
To disable this check, use: --safe-chain-skip-minimum-package-age
+ aikido-npm audit --audit-level=critical fix
added 1 package, removed 11 packages, changed 25 packages, and audited 651 packages in 17s
253 packages are looking for funding
run `npm fund` for details
# npm audit report
cookie <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix --force`
Will install @sveltejs/kit@0.0.30, which is a breaking change
node_modules/cookie
@sveltejs/kit >=1.0.0-next.0
Depends on vulnerable versions of cookie
node_modules/@sveltejs/kit
@sveltejs/adapter-static >=1.0.0-next.0
Depends on vulnerable versions of @sveltejs/kit
node_modules/@sveltejs/adapter-static
runed >=0.32.0
Depends on vulnerable versions of @sveltejs/kit
node_modules/bits-ui/node_modules/runed
bits-ui >=2.11.8
Depends on vulnerable versions of runed
Depends on vulnerable versions of svelte-toolbelt
node_modules/bits-ui
svelte-toolbelt >=0.10.6
Depends on vulnerable versions of runed
node_modules/bits-ui/node_modules/svelte-toolbelt
6 low severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
ℹ Safe-chain: Some package versions were suppressed due to minimum age requirement.
To disable this check, use: --safe-chain-skip-minimum-package-age
+ npm run build
> webui@1.0.0 build
> vite build && ./scripts/post-build.sh
▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
tsconfig.json:2:12:
2 │ "extends": "./.svelte-kit/tsconfig.json",
╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vite v7.2.2 building ssr environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
17 │ @import 'katex/src/styles/katex.scss';
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
2 │ @import "./fonts.scss";
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
9 │ $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/fonts.scss 9:15 generate-src()
node_modules/katex/src/styles/fonts.scss 42:11 font-face()
node_modules/katex/src/styles/fonts.scss 52:1 @import
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
344 │ @for $from from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 344:35 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
345 │ @for $to from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 345:37 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:38 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:57 @import
src/styles/katex-custom.scss 17:9 root stylesheet
✓ 4749 modules transformed.
Export "getJsonHeaders" of module "src/lib/utils/api-headers.ts" was reexported through module "src/lib/utils/index.ts" while both modules are dependencies of each other and will end up in different chunks by current Rollup settings. This scenario is not well supported at the moment as it will produce a circular dependency between chunks and will likely lead to broken execution order.
Either change the import in "src/lib/services/chat.service.ts" to point directly to the exporting module or reconfigure "output.manualChunks" to ensure these modules end up in the same chunk.
rendering chunks...
vite v7.2.2 building client environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
17 │ @import 'katex/src/styles/katex.scss';
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
2 │ @import "./fonts.scss";
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
9 │ $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/fonts.scss 9:15 generate-src()
node_modules/katex/src/styles/fonts.scss 42:11 font-face()
node_modules/katex/src/styles/fonts.scss 52:1 @import
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
344 │ @for $from from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 344:35 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
345 │ @for $to from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 345:37 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:38 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:57 @import
src/styles/katex-custom.scss 17:9 root stylesheet
✓ 5881 modules transformed.
rendering chunks...
computing gzip size...
.svelte-kit/output/client/_app/version.json 0.03 kB │ gzip: 0.05 kB
.svelte-kit/output/client/.vite/manifest.json 0.33 kB │ gzip: 0.19 kB
.svelte-kit/output/client/_app/immutable/assets/style.SW4DF8iR.css 499.50 kB │ gzip: 288.93 kB
(!) Some chunks are larger than 3072 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
.svelte-kit/output/client/_app/immutable/bundle.CBB5SKcU.js 4,401.02 kB │ gzip: 1,297.75 kB
✓ built in 13.63s
.svelte-kit/output/server/.vite/manifest.json 5.80 kB
.svelte-kit/output/server/_app/immutable/assets/style.LUCY6AWH.css 499.22 kB
.svelte-kit/output/server/chunks/false.js 0.03 kB
.svelte-kit/output/server/chunks/environment.js 0.07 kB
.svelte-kit/output/server/chunks/api-key-validation.js 0.17 kB
.svelte-kit/output/server/chunks/server.js 0.20 kB
.svelte-kit/output/server/entries/pages/_page.ts.js 0.25 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.ts.js 0.28 kB
.svelte-kit/output/server/internal.js 0.37 kB
.svelte-kit/output/server/chunks/utils.js 0.62 kB
.svelte-kit/output/server/entries/pages/_page.svelte.js 1.11 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.svelte.js 1.16 kB
.svelte-kit/output/server/chunks/exports.js 1.46 kB
.svelte-kit/output/server/chunks/url.js 1.60 kB
.svelte-kit/output/server/chunks/label.js 2.28 kB
.svelte-kit/output/server/chunks/internal.js 2.58 kB
.svelte-kit/output/server/entries/pages/_error.svelte.js 8.39 kB
.svelte-kit/output/server/remote-entry.js 8.56 kB
.svelte-kit/output/server/chunks/shared.js 11.83 kB
.svelte-kit/output/server/chunks/precision.js 22.45 kB
.svelte-kit/output/server/entries/pages/_layout.svelte.js 34.39 kB
.svelte-kit/output/server/chunks/root.js 38.85 kB
.svelte-kit/output/server/index.js 55.03 kB
.svelte-kit/output/server/chunks/SyntaxHighlightedCode.svelte_svelte_type_style_lang.js 76.87 kB
.svelte-kit/output/server/chunks/context.svelte.js 180.22 kB
.svelte-kit/output/server/chunks/ServerLoadingSplash.js 339.43 kB
✓ built in 24.59s
Run npm run preview to preview your production build locally.
> Using @sveltejs/adapter-static
Overwriting ../public/index.html with fallback page. Consider using a different name for the fallback.
Wrote site to "../public"
✔ done
✓ Inlined favicon.svg as base64 data URL
✓ Created index.html.gz
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
* server: (doc) clarify in-scope and out-scope features
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Two bugs in `server_models::load()` that affect router mode reliability:
**Bug 1: Deadlock when child process crashes**
When a child process is killed (e.g., SIGKILL from OS code signature
validation), the monitoring thread deadlocks on `stopping_thread.join()`
because the stopping_thread's wait predicate (`is_stopping`) is never
satisfied — the model name was never inserted into `stopping_models`.
`update_status()` is never reached and the model stays stuck in LOADING
state permanently.
Fix: extend the stopping_thread's wait predicate to also wake when the
child process is no longer alive (`!subprocess_alive()`). When woken by
a dead child, the thread skips the shutdown sequence and returns
immediately. The original `stopping_models.erase()` logic is preserved
for normal unloads.
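
The extended wait predicate can be illustrated with a minimal Python sketch. It is not the `server_models::load()` code: the class `ModelMonitor`, its fields, and the outcome strings are hypothetical names chosen for illustration, and a `threading.Condition` stands in for whatever condition variable the server uses.

```python
import threading

class ModelMonitor:
    """Hypothetical stand-in for the per-model state described above."""

    def __init__(self):
        self.cv = threading.Condition()
        self.stopping = False     # set when a normal unload is requested
        self.child_alive = True   # cleared when the child process exits
        self.outcome = None

    def stopping_thread(self):
        with self.cv:
            # Wake on a requested stop OR on a dead child. Waiting only on
            # `stopping` is what would block forever after a crash.
            self.cv.wait_for(lambda: self.stopping or not self.child_alive)
            if not self.child_alive:
                # Child is already gone: skip the shutdown sequence.
                self.outcome = "child-died"
                return
            self.outcome = "graceful-stop"

    def mark_child_dead(self):
        with self.cv:
            self.child_alive = False
            self.cv.notify_all()

    def request_stop(self):
        with self.cv:
            self.stopping = True
            self.cv.notify_all()

def run_scenario(crash):
    """Run the stopping thread and trigger either a crash or a normal stop."""
    monitor = ModelMonitor()
    thread = threading.Thread(target=monitor.stopping_thread)
    thread.start()
    if crash:
        monitor.mark_child_dead()
    else:
        monitor.request_stop()
    thread.join(timeout=5)
    return monitor.outcome, thread.is_alive()
```

In this sketch the `join()` returns in both scenarios because the predicate covers both wake-up reasons.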
**Bug 2: TOCTOU race bypasses `--models-max` (ref #20137)**
`unload_lru()` is called outside the mutex, then `load()` acquires the
lock afterward. Under concurrent requests, multiple threads can each
observe free capacity and all proceed to load, exceeding the limit.
Fix: re-check capacity under the lock after `unload_lru()` returns.
If another thread filled the slot in the window between `unload_lru()`
and the lock acquisition, reject with an error instead of silently
exceeding the limit.
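
The re-check pattern can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the server code: `ModelRegistry`, `load_locked`, and the error message are hypothetical, and a plain list stands in for the set of loaded models.

```python
import threading

class ModelRegistry:
    """Hypothetical registry with a hard cap on loaded models."""

    def __init__(self, models_max):
        self.models_max = models_max
        self.lock = threading.Lock()
        self.loaded = []  # oldest entry first

    def unload_lru(self):
        # Runs before load() takes its own lock, so another thread can
        # still fill the freed slot afterwards.
        with self.lock:
            if len(self.loaded) >= self.models_max:
                self.loaded.pop(0)

    def load_locked(self, name):
        with self.lock:
            # Re-check capacity under the lock: reject instead of
            # silently exceeding the limit.
            if len(self.loaded) >= self.models_max:
                raise RuntimeError("model limit reached, try again")
            self.loaded.append(name)

    def load(self, name):
        self.unload_lru()
        self.load_locked(name)
```

A caller that receives the error can retry. The point of the sketch is only that the capacity decision is made while holding the same lock that guards the insertion.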
* tests : fix fetch_server_test_models.py
* server: to_json_oaicompat cached_tokens
Adds OpenAI and Anthropic compatible information about the
number of cached prompt tokens used in a response.
* webui: make server the source of truth for sampling defaults
* webui: fix Custom badge for sampling parameters
* webui: log user overrides after server sync
* chore: update webui build output
* fix: Default values for sampling settings config object
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* add tests for model id parser
* add test case having activated params
* add structured tests for model id parser
* add ToDo
* feat: Improve model parsing logic + tests
* chore: update webui build output
---------
Co-authored-by: bluemoehre <bluemoehre@gmx.de>
* webui: fix model selector being locked to first loaded model
When multiple models are loaded, the auto-select effect would re-fire
on every loadedModelIds change, overriding the user's manual model
selection. Guard with selectedModelId so auto-select only kicks in
when no model is chosen yet.
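
The guard can be reduced to a small pure function. The webui itself is not written in Python. This is only a sketch of the logic, and the function name `auto_select` and its parameters are hypothetical.

```python
def auto_select(selected_model_id, loaded_model_ids):
    """Return the model id that should be selected after a change to the
    list of loaded models."""
    if selected_model_id is not None:
        # The user (or an earlier auto-select) already chose a model:
        # keep it, even when more models finish loading.
        return selected_model_id
    # Nothing chosen yet: fall back to the first loaded model, if any.
    return loaded_model_ids[0] if loaded_model_ids else None
```

Without the first check, every change to the loaded list would reset the selection to the first entry, which is the locking behaviour the commit describes.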
* chore: update webui build output
* webui: use date in exported filename
Move conversation naming and export to utils
update index.html.gz
* webui: move literals to message export constants file
* webui: move export naming and download back to the conversation store
* chore: update webui build output
* webui: add comments to some constants
* chore: update webui build output
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande
winogrande_score : tokenizing selected tasks
winogrande_score : calculating winogrande score over selected tasks.
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 46
failed to decode the batch, n_batch = 2048, ret = 1
winogrande_score: llama_decode() failed
same for hellaswag:
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 99
failed to decode the batch, n_batch = 2048, ret = 1
hellaswag_score: llama_decode() failed
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* llama : fix pooling assertion crash in chunked GDN detection path
The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).
Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.
Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.
* server : add mean pooling tests to embedding test suite
Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.
These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.
---------
Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
* server: reset kill-switch on client error
This avoids triggering a server kill switch.
If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.
However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.
* moved counter reset as per recommendation
* cont : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
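
The counter behaviour described in the kill-switch commit above can be sketched in Python. The class `SlotLoop` and the method `on_iteration` are hypothetical. Only the name `n_empty_consecutive` and the limit of 3 come from the commit message, and where exactly the real reset happens is not shown here.

```python
class SlotLoop:
    """Hypothetical model of the empty-iteration kill switch."""

    MAX_EMPTY = 3  # "3 such messages in a row", per the commit message

    def __init__(self):
        self.n_empty_consecutive = 0
        self.terminated = False

    def on_iteration(self, tokens_generated, client_error=False):
        if tokens_generated > 0 or client_error:
            # Progress was made, or the request was rejected as a client
            # error (for example, a prompt exceeding the context size):
            # neither should count toward the kill switch.
            self.n_empty_consecutive = 0
            return
        self.n_empty_consecutive += 1
        if self.n_empty_consecutive >= self.MAX_EMPTY:
            self.terminated = True
```

With the reset in place, a client that repeatedly sends oversized requests receives HTTP 400 responses but can no longer terminate the server this way.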
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.
The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.
This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
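
The difference between the two quantities is easy to see numerically. The helper below is a hypothetical illustration of the formula quoted above for uncompressed PCM audio. The 16 kHz, 16-bit, mono values are example inputs, not values taken from mtmd.

```python
def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio:
    sample_rate x bit_depth x channels."""
    return sample_rate_hz * bit_depth * channels

# Example inputs (assumed): 16 kHz, 16-bit, mono.
sample_rate = 16000
bitrate = pcm_bitrate(sample_rate, bit_depth=16, channels=1)
print(sample_rate, bitrate)  # the two numbers differ by a factor of 16
```

A function that returns the first number is therefore better named after the sample rate, which is what the rename does.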
* quantize : imatrix-fail early + code cleanup
* fix manual override printing
it's in the preliminary loop now, so needs to be on its own line
* revert header changes per ggerganov
* remove old #includes
* clarify naming
rename `tensor_quantization` to `tensor_typo_option` to describe its
functionality
* fix per barto
* Parse port numbers from MCP server URLs
* Pass scheme to http proxy for determining whether to use SSL
* Fix download on non-standard port and re-add port to logging
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
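
The general idea behind the port and scheme changes above (take the port from the URL when present, otherwise fall back to the scheme default, and let the scheme decide whether SSL is used) can be sketched with Python's standard `urllib.parse`. The function name `split_server_url`, the returned tuple, and the example URLs are hypothetical, and the actual change is not implemented in Python.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def split_server_url(url):
    """Return (scheme, host, port, use_ssl) for an http(s) URL."""
    parts = urlsplit(url)
    scheme = parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported scheme: %r" % scheme)
    # Use the explicit port if the URL has one, else the scheme default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    # The scheme, not the port number, decides whether SSL is used.
    return scheme, parts.hostname, port, scheme == "https"
```

Keeping the scheme alongside the port matters because a non-standard port such as 8443 says nothing by itself about whether the connection should be encrypted.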
* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments