State before update since 8470-alt1:
tools/server/webui: 0fcb3760b 2026-03-31 fix: Use lower-case proxy headers naming (#21235) (Aleksander Grygier)
tools/server/public: 0fcb3760b 2026-03-31 fix: Use lower-case proxy headers naming (#21235) (Aleksander Grygier)
+ npm config set prefix /usr/src/.npm-global
+ npm install -g @aikidosec/safe-chain
npm warn deprecated glob@10.5.0: Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me
added 138 packages in 8s
24 packages are looking for funding
run `npm fund` for details
+ PATH=/usr/src/.npm-global/bin:/usr/bin:/bin:/usr/local/bin
+ rm -rf llama.cpp/tools/server/public
+ cd llama.cpp/tools/server/webui
+ workdir=tools/server/webui
+ target=tools/server/public
+ aikido-npm ci --ignore-scripts
added 660 packages, and audited 661 packages in 20s
260 packages are looking for funding
run `npm fund` for details
20 vulnerabilities (2 low, 6 moderate, 12 high)
To address all issues, run:
npm audit fix
Run `npm audit` for details.
+ aikido-npm audit --audit-level=critical fix
added 3 packages, removed 11 packages, changed 34 packages, and audited 652 packages in 13s
254 packages are looking for funding
run `npm fund` for details
# npm audit report
cookie <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix --force`
Will install @sveltejs/kit@0.0.30, which is a breaking change
node_modules/cookie
@sveltejs/kit >=1.0.0-next.0
Depends on vulnerable versions of cookie
node_modules/@sveltejs/kit
@sveltejs/adapter-static >=1.0.0-next.0
Depends on vulnerable versions of @sveltejs/kit
node_modules/@sveltejs/adapter-static
runed >=0.32.0
Depends on vulnerable versions of @sveltejs/kit
node_modules/bits-ui/node_modules/runed
bits-ui >=2.11.8
Depends on vulnerable versions of runed
Depends on vulnerable versions of svelte-toolbelt
node_modules/bits-ui
svelte-toolbelt >=0.10.6
Depends on vulnerable versions of runed
node_modules/bits-ui/node_modules/svelte-toolbelt
6 low severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
+ npm run build
> webui@1.0.0 build
> vite build && ./scripts/post-build.sh
▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
tsconfig.json:2:12:
2 │ "extends": "./.svelte-kit/tsconfig.json",
╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vite v7.2.2 building ssr environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
17 │ @import 'katex/src/styles/katex.scss';
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
2 │ @import "./fonts.scss";
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
9 │ $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/fonts.scss 9:15 generate-src()
node_modules/katex/src/styles/fonts.scss 42:11 font-face()
node_modules/katex/src/styles/fonts.scss 52:1 @import
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
344 │ @for $from from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 344:35 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
345 │ @for $to from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 345:37 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:38 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:57 @import
src/styles/katex-custom.scss 17:9 root stylesheet
✓ 4753 modules transformed.
Export "getAuthHeaders" of module "src/lib/utils/api-headers.ts" was reexported through module "src/lib/utils/index.ts" while both modules are dependencies of each other and will end up in different chunks by current Rollup settings. This scenario is not well supported at the moment as it will produce a circular dependency between chunks and will likely lead to broken execution order.
Either change the import in "src/lib/services/mcp.service.ts" to point directly to the exporting module or reconfigure "output.manualChunks" to ensure these modules end up in the same chunk.
rendering chunks...
vite v7.2.2 building client environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
17 │ @import 'katex/src/styles/katex.scss';
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.
More info and automated migrator: https://sass-lang.com/d/import
╷
2 │ @import "./fonts.scss";
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
9 │ $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/fonts.scss 9:15 generate-src()
node_modules/katex/src/styles/fonts.scss 42:11 font-face()
node_modules/katex/src/styles/fonts.scss 52:1 @import
node_modules/katex/src/styles/katex.scss 2:9 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
344 │ @for $from from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 344:35 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
345 │ @for $to from 1 through length($sizes) {
│ ^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 345:37 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:38 @import
src/styles/katex-custom.scss 17:9 root stylesheet
DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.
More info and automated migrator: https://sass-lang.com/d/import
╷
348 │ font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
│ ^^^^^^^^^^^^^^^^^^
╵
node_modules/katex/src/styles/katex.scss 348:57 @import
src/styles/katex-custom.scss 17:9 root stylesheet
✓ 5885 modules transformed.
rendering chunks...
computing gzip size...
.svelte-kit/output/client/_app/version.json 0.03 kB │ gzip: 0.05 kB
.svelte-kit/output/client/.vite/manifest.json 0.30 kB │ gzip: 0.19 kB
.svelte-kit/output/client/_app/immutable/assets/bundle.sRqjEHG4.css 500.20 kB │ gzip: 288.96 kB
.svelte-kit/output/client/_app/immutable/bundle.CVqNSTXQ.js 4,415.15 kB │ gzip: 1,301.21 kB
(!) Some chunks are larger than 3072 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
✓ built in 12.67s
.svelte-kit/output/server/.vite/manifest.json 6.07 kB
.svelte-kit/output/server/_app/immutable/assets/McpLogo.bHHIbcsu.css 0.35 kB
.svelte-kit/output/server/_app/immutable/assets/_layout.BB5q6Ssh.css 119.30 kB
.svelte-kit/output/server/_app/immutable/assets/SyntaxHighlightedCode.CPlW7hdh.css 380.27 kB
.svelte-kit/output/server/chunks/false.js 0.03 kB
.svelte-kit/output/server/chunks/environment.js 0.07 kB
.svelte-kit/output/server/chunks/api-key-validation.js 0.16 kB
.svelte-kit/output/server/chunks/server.js 0.20 kB
.svelte-kit/output/server/entries/pages/_page.ts.js 0.25 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.ts.js 0.27 kB
.svelte-kit/output/server/internal.js 0.37 kB
.svelte-kit/output/server/chunks/refresh-cw.js 0.44 kB
.svelte-kit/output/server/chunks/utils.js 0.62 kB
.svelte-kit/output/server/entries/pages/_page.svelte.js 1.10 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.svelte.js 1.15 kB
.svelte-kit/output/server/chunks/exports.js 1.46 kB
.svelte-kit/output/server/chunks/url.js 1.60 kB
.svelte-kit/output/server/chunks/internal.js 2.58 kB
.svelte-kit/output/server/entries/pages/_error.svelte.js 8.39 kB
.svelte-kit/output/server/remote-entry.js 8.56 kB
.svelte-kit/output/server/chunks/shared.js 11.83 kB
.svelte-kit/output/server/chunks/uuid.js 30.40 kB
.svelte-kit/output/server/chunks/root.js 39.19 kB
.svelte-kit/output/server/index.js 55.03 kB
.svelte-kit/output/server/chunks/SyntaxHighlightedCode.svelte_svelte_type_style_lang.js 73.87 kB
.svelte-kit/output/server/entries/pages/_layout.svelte.js 105.11 kB
.svelte-kit/output/server/chunks/McpLogo.js 205.53 kB
.svelte-kit/output/server/chunks/ServerLoadingSplash.js 249.23 kB
✓ built in 22.56s
Run npm run preview to preview your production build locally.
> Using @sveltejs/adapter-static
Overwriting ../public/index.html with fallback page. Consider using a different name for the fallback.
Wrote site to "../public"
✔ done
✓ Inlined favicon.svg as base64 data URL
✓ Updated index.html
✓ Copied bundle.CVqNSTXQ.js -> bundle.js
✓ Copied bundle.sRqjEHG4.css -> bundle.css
Check the return value of sink.write() in the chunked content provider
and return false when the write fails, matching cpp-httplib's own
streaming contract. This prevents logging chunks as sent when the sink
rejected them and properly aborts the stream on connection failure.
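A minimal sketch of the pattern described above, not the actual llama-server code. `Sink` is a hypothetical stand-in that mirrors only the `write`/`done` callbacks of a cpp-httplib-style sink, and `provide_chunks` and `sent_log` are illustrative names. It shows the provider returning false on the first rejected write, so that a chunk is recorded as sent only after the sink accepted it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a cpp-httplib-style data sink (assumption:
// only the two callbacks used below are modelled).
struct Sink {
    std::function<bool(const char *, size_t)> write; // false => peer rejected the data
    std::function<void()> done;                      // signals end of stream
};

// Pushes chunks into the sink. Returns false as soon as a write fails,
// which tells the caller to abort the stream. `sent_log` only ever
// contains chunks that the sink accepted.
bool provide_chunks(const std::vector<std::string> & chunks,
                    Sink & sink,
                    std::vector<std::string> & sent_log) {
    assert(sink.write && sink.done); // both callbacks must be set
    for (const auto & chunk : chunks) {
        if (!sink.write(chunk.data(), chunk.size())) {
            return false; // write failed: stop here, do not log the chunk
        }
        sent_log.push_back(chunk); // log only after a successful write
    }
    sink.done();
    return true;
}
```

With a sink whose second write fails, the function returns false, `sent_log` holds only the first chunk, and `done` is never called.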
This PR changes the logging that occurs at startup of llama-server.
Currently, it is redundant (including CPU information twice) and it is
missing the build + commit info.
* fix: Bypass API Key validation for static bundle assets
* refactor: All bypassed routes in `public_endpoints`
* test: Update static assets API Key test
The build info is now only for debug, so we avoid the duplicate
with `--version`.
The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* fix: include API key in CORS proxy requests for MCP connections
When llama-server is started with --api-key-file and --webui-mcp-proxy,
the /cors-proxy endpoint requires authentication. The WebUI was not
including the Authorization header in proxy requests, causing MCP
connections to fail with 401.
Inject getAuthHeaders() into requestInit when useProxy is true so the
proxy request carries the Bearer token alongside the forwarded target
headers.
Fixes #21167
* fix: simplify headers assignment based on reviewer suggestion
Apply buildProxiedHeaders only when useProxy is true, pass headers
directly to the transport otherwise.
* introduce LLAMA_SERVER_NO_WEBUI
* LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI
* LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE
* Missed this
* Add useWebUi to package.nix
* server: respect the verbose_prompt parameter
* Revert "server: respect the verbose_prompt parameter"
This reverts commit 8ed885cf375b2c8ba641c661f3667df70b9797f4.
* Remove --verbose-prompt parameter from llama-server
* Using set_examples instead of set_excludes
* webui: send reasoning_content back to model in context
Preserve assistant reasoning across turns by extracting it from
internal tags and sending it as a separate reasoning_content field
in the API payload. The server and Jinja templates handle native
formatting (e.g. <think> tags for Qwen, GLM, DeepSeek...).
Adds "Exclude reasoning from context" toggle in Settings > Developer
(off by default, so reasoning is preserved). Includes unit tests.
* webui: add syncable parameter for excludeReasoningFromContext
* chore: update webui build output
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
* server: (doc) clarify in-scope and out-of-scope features
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Two bugs in `server_models::load()` that affect router mode reliability:
**Bug 1: Deadlock when child process crashes**
When a child process is killed (e.g., SIGKILL from OS code signature
validation), the monitoring thread deadlocks on `stopping_thread.join()`
because the stopping_thread's wait predicate (`is_stopping`) is never
satisfied — the model name was never inserted into `stopping_models`.
`update_status()` is never reached and the model stays stuck in LOADING
state permanently.
Fix: extend the stopping_thread's wait predicate to also wake when the
child process is no longer alive (`!subprocess_alive()`). When woken by
a dead child, the thread skips the shutdown sequence and returns
immediately. The original `stopping_models.erase()` logic is preserved
for normal unloads.
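The extended wait predicate can be sketched as follows. This is an illustration under assumptions, not the code in `server_models::load()`: `ModelMonitor`, `wait_for_stop`, `on_child_exit` and `request_unload` are hypothetical names, and child liveness is modelled as a plain flag set under the mutex rather than a real `subprocess_alive()` call.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

// Hypothetical model of the stopping thread's wait logic.
struct ModelMonitor {
    std::mutex mtx;
    std::condition_variable cv;
    std::set<std::string> stopping_models; // models with a pending unload request
    bool child_alive = true;               // assumption: stands in for subprocess_alive()

    // Blocks until an unload is requested for `name` or the child has died.
    // Returns true for a normal unload, false when woken by a dead child.
    bool wait_for_stop(const std::string & name) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] {
            // The second condition is the fix: without it, a crashed child
            // leaves this wait blocked forever.
            return stopping_models.count(name) > 0 || !child_alive;
        });
        const bool requested = stopping_models.count(name) > 0;
        assert(requested || !child_alive);
        if (!requested) {
            return false; // child is gone: skip the shutdown sequence
        }
        stopping_models.erase(name); // normal unload path is unchanged
        return true;
    }

    // Called when the child process is observed to have exited.
    void on_child_exit() {
        {
            std::lock_guard<std::mutex> lk(mtx);
            child_alive = false;
        }
        cv.notify_all();
    }

    // Called when a normal unload of `name` is requested.
    void request_unload(const std::string & name) {
        {
            std::lock_guard<std::mutex> lk(mtx);
            stopping_models.insert(name);
        }
        cv.notify_all();
    }
};
```

In this sketch a thread blocked in `wait_for_stop()` can always be joined, because either `request_unload()` or `on_child_exit()` satisfies the predicate.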
**Bug 2: TOCTOU race bypasses `--models-max` (ref #20137)**
`unload_lru()` is called outside the mutex, then `load()` acquires the
lock afterward. Under concurrent requests, multiple threads observe
capacity and all proceed to load, exceeding the limit.
Fix: re-check capacity under the lock after `unload_lru()` returns.
If another thread filled the slot in the window between `unload_lru()`
and the lock acquisition, reject with an error instead of silently
exceeding the limit.
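A sketch of the re-check under the lock, assuming a positive limit. `ModelRegistry`, `try_insert` and `loaded_count` are hypothetical names, and the eviction here removes an arbitrary entry rather than tracking real LRU order. The point it illustrates is that the capacity test sits inside the same critical section as the insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical registry of loaded models with a hard upper limit.
class ModelRegistry {
public:
    explicit ModelRegistry(size_t max_models) : max_models_(max_models) {}

    // Stand-in for unload_lru(): frees one slot if the registry is full.
    // The lock is released on return, so another thread may take the
    // freed slot before the caller reaches try_insert().
    void unload_lru() {
        std::lock_guard<std::mutex> lk(mtx_);
        if (!loaded_.empty() && loaded_.size() >= max_models_) {
            loaded_.erase(loaded_.begin()); // arbitrary victim, not real LRU
        }
    }

    // Re-checks capacity and inserts under one lock. Returns false instead
    // of exceeding the limit when the slot was taken in the meantime.
    bool try_insert(const std::string & name) {
        std::lock_guard<std::mutex> lk(mtx_);
        if (loaded_.count(name) > 0) {
            return true; // already loaded
        }
        if (loaded_.size() >= max_models_) {
            return false; // another thread filled the slot: reject
        }
        loaded_.insert(name);
        assert(loaded_.size() <= max_models_);
        return true;
    }

    // Mirrors the flow described above: evict first, then re-check under the lock.
    bool load(const std::string & name) {
        unload_lru();
        return try_insert(name);
    }

    size_t loaded_count() {
        std::lock_guard<std::mutex> lk(mtx_);
        return loaded_.size();
    }

private:
    std::mutex mtx_;
    std::set<std::string> loaded_;
    size_t max_models_;
};
```

Calling `unload_lru()` and then `try_insert()` for two different names in sequence reproduces the interleaving: the first insert succeeds and the second is rejected, so the count never exceeds the limit.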
* tests : fix fetch_server_test_models.py
* server: to_json_oaicompat cached_tokens
Adds OpenAI and Anthropic compatible information about the
number of cached prompt tokens used in a response.
* webui: make server the source of truth for sampling defaults
* webui: fix Custom badge for sampling parameters
* webui: log user overrides after server sync
* chore: update webui build output
* fix: Default values for sampling settings config object
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>