Commit graph

9990 commits

Author SHA1 Message Date
d03abe4285 3 2026-04-09 02:00:36 +03:00
1af95aba36 2 2026-04-09 01:59:40 +03:00
85753dae2c 1 2026-04-09 01:58:51 +03:00
2ad996bfb8 whatever 2026-04-08 23:03:52 +03:00
0591e57dfd whatever 2026-04-08 23:02:56 +03:00
Vitaly Chikunov
01f8650dd9 1:8681-alt1
- Update to b8681 (2026-04-06).
2026-04-06 21:23:51 +00:00
Vitaly Chikunov
c9974af462 ALT: Generate tools/server/public
State before update since 8470-alt1:
  tools/server/webui: 0fcb3760b 2026-03-31 fix: Use lower-case proxy headers naming (#21235) (Aleksander Grygier)
  tools/server/public: 0fcb3760b 2026-03-31 fix: Use lower-case proxy headers naming (#21235) (Aleksander Grygier)

+ npm config set prefix /usr/src/.npm-global
+ npm install -g @aikidosec/safe-chain
npm warn deprecated glob@10.5.0: Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me

added 138 packages in 8s

24 packages are looking for funding
  run `npm fund` for details
+ PATH=/usr/src/.npm-global/bin:/usr/bin:/bin:/usr/local/bin
+ rm -rf llama.cpp/tools/server/public
+ cd llama.cpp/tools/server/webui
+ workdir=tools/server/webui
+ target=tools/server/public
+ aikido-npm ci --ignore-scripts

added 660 packages, and audited 661 packages in 20s

260 packages are looking for funding
  run `npm fund` for details

20 vulnerabilities (2 low, 6 moderate, 12 high)

To address all issues, run:
  npm audit fix

Run `npm audit` for details.
+ aikido-npm audit --audit-level=critical fix

added 3 packages, removed 11 packages, changed 34 packages, and audited 652 packages in 13s

254 packages are looking for funding
  run `npm fund` for details

# npm audit report

cookie  <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix --force`
Will install @sveltejs/kit@0.0.30, which is a breaking change
node_modules/cookie
  @sveltejs/kit  >=1.0.0-next.0
  Depends on vulnerable versions of cookie
  node_modules/@sveltejs/kit
    @sveltejs/adapter-static  >=1.0.0-next.0
    Depends on vulnerable versions of @sveltejs/kit
    node_modules/@sveltejs/adapter-static
    runed  >=0.32.0
    Depends on vulnerable versions of @sveltejs/kit
    node_modules/bits-ui/node_modules/runed
      bits-ui  >=2.11.8
      Depends on vulnerable versions of runed
      Depends on vulnerable versions of svelte-toolbelt
      node_modules/bits-ui
      svelte-toolbelt  >=0.10.6
      Depends on vulnerable versions of runed
      node_modules/bits-ui/node_modules/svelte-toolbelt

6 low severity vulnerabilities

To address issues that do not require attention, run:
  npm audit fix

To address all issues (including breaking changes), run:
  npm audit fix --force
+ npm run build

> webui@1.0.0 build
> vite build && ./scripts/post-build.sh

▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]

    tsconfig.json:2:12:
      2 │   "extends": "./.svelte-kit/tsconfig.json",
        ╵              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vite v7.2.2 building ssr environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

   ╷
17 │ @import 'katex/src/styles/katex.scss';
   │         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ╵
    src/styles/katex-custom.scss 17:9  root stylesheet

DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
2 │ @import "./fonts.scss";
  │         ^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/katex.scss 2:9  @import
    src/styles/katex-custom.scss 17:9             root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
9 │         $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
  │               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/fonts.scss 9:15   generate-src()
    node_modules/katex/src/styles/fonts.scss 42:11  font-face()
    node_modules/katex/src/styles/fonts.scss 52:1   @import
    node_modules/katex/src/styles/katex.scss 2:9    @import
    src/styles/katex-custom.scss 17:9               root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
344 │         @for $from from 1 through length($sizes) {
    │                                   ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 344:35  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
345 │             @for $to from 1 through length($sizes) {
    │                                     ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 345:37  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                      ^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:38  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                                         ^^^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:57  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

✓ 4753 modules transformed.
Export "getAuthHeaders" of module "src/lib/utils/api-headers.ts" was reexported through module "src/lib/utils/index.ts" while both modules are dependencies of each other and will end up in different chunks by current Rollup settings. This scenario is not well supported at the moment as it will produce a circular dependency between chunks and will likely lead to broken execution order.
Either change the import in "src/lib/services/mcp.service.ts" to point directly to the exporting module or reconfigure "output.manualChunks" to ensure these modules end up in the same chunk.
rendering chunks...
vite v7.2.2 building client environment for production...
transforming...
DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

   ╷
17 │ @import 'katex/src/styles/katex.scss';
   │         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ╵
    src/styles/katex-custom.scss 17:9  root stylesheet

DEPRECATION WARNING [import]: Sass @import rules are deprecated and will be removed in Dart Sass 3.0.0.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
2 │ @import "./fonts.scss";
  │         ^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/katex.scss 2:9  @import
    src/styles/katex-custom.scss 17:9             root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.append instead.

More info and automated migrator: https://sass-lang.com/d/import

  ╷
9 │         $src: append($src, url('#{$font-folder}/KaTeX_#{$family}-#{$family-suffix}.woff2') format('woff2'), comma);
  │               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ╵
    node_modules/katex/src/styles/fonts.scss 9:15   generate-src()
    node_modules/katex/src/styles/fonts.scss 42:11  font-face()
    node_modules/katex/src/styles/fonts.scss 52:1   @import
    node_modules/katex/src/styles/katex.scss 2:9    @import
    src/styles/katex-custom.scss 17:9               root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
344 │         @for $from from 1 through length($sizes) {
    │                                   ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 344:35  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.length instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
345 │             @for $to from 1 through length($sizes) {
    │                                     ^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 345:37  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                      ^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:38  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

DEPRECATION WARNING [global-builtin]: Global built-in functions are deprecated and will be removed in Dart Sass 3.0.0.
Use list.nth instead.

More info and automated migrator: https://sass-lang.com/d/import

    ╷
348 │                     font-size: calc((nth($sizes, $to) / nth($sizes, $from)) * 1em);
    │                                                         ^^^^^^^^^^^^^^^^^^
    ╵
    node_modules/katex/src/styles/katex.scss 348:57  @import
    src/styles/katex-custom.scss 17:9                root stylesheet

✓ 5885 modules transformed.
rendering chunks...
computing gzip size...
.svelte-kit/output/client/_app/version.json                              0.03 kB │ gzip:     0.05 kB
.svelte-kit/output/client/.vite/manifest.json                            0.30 kB │ gzip:     0.19 kB
.svelte-kit/output/client/_app/immutable/assets/bundle.sRqjEHG4.css    500.20 kB │ gzip:   288.96 kB
.svelte-kit/output/client/_app/immutable/bundle.CVqNSTXQ.js          4,415.15 kB │ gzip: 1,301.21 kB

(!) Some chunks are larger than 3072 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
✓ built in 12.67s
.svelte-kit/output/server/.vite/manifest.json                                              6.07 kB
.svelte-kit/output/server/_app/immutable/assets/McpLogo.bHHIbcsu.css                       0.35 kB
.svelte-kit/output/server/_app/immutable/assets/_layout.BB5q6Ssh.css                     119.30 kB
.svelte-kit/output/server/_app/immutable/assets/SyntaxHighlightedCode.CPlW7hdh.css       380.27 kB
.svelte-kit/output/server/chunks/false.js                                                  0.03 kB
.svelte-kit/output/server/chunks/environment.js                                            0.07 kB
.svelte-kit/output/server/chunks/api-key-validation.js                                     0.16 kB
.svelte-kit/output/server/chunks/server.js                                                 0.20 kB
.svelte-kit/output/server/entries/pages/_page.ts.js                                        0.25 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.ts.js                              0.27 kB
.svelte-kit/output/server/internal.js                                                      0.37 kB
.svelte-kit/output/server/chunks/refresh-cw.js                                             0.44 kB
.svelte-kit/output/server/chunks/utils.js                                                  0.62 kB
.svelte-kit/output/server/entries/pages/_page.svelte.js                                    1.10 kB
.svelte-kit/output/server/entries/pages/chat/_id_/_page.svelte.js                          1.15 kB
.svelte-kit/output/server/chunks/exports.js                                                1.46 kB
.svelte-kit/output/server/chunks/url.js                                                    1.60 kB
.svelte-kit/output/server/chunks/internal.js                                               2.58 kB
.svelte-kit/output/server/entries/pages/_error.svelte.js                                   8.39 kB
.svelte-kit/output/server/remote-entry.js                                                  8.56 kB
.svelte-kit/output/server/chunks/shared.js                                                11.83 kB
.svelte-kit/output/server/chunks/uuid.js                                                  30.40 kB
.svelte-kit/output/server/chunks/root.js                                                  39.19 kB
.svelte-kit/output/server/index.js                                                        55.03 kB
.svelte-kit/output/server/chunks/SyntaxHighlightedCode.svelte_svelte_type_style_lang.js   73.87 kB
.svelte-kit/output/server/entries/pages/_layout.svelte.js                                105.11 kB
.svelte-kit/output/server/chunks/McpLogo.js                                              205.53 kB
.svelte-kit/output/server/chunks/ServerLoadingSplash.js                                  249.23 kB
✓ built in 22.56s

Run npm run preview to preview your production build locally.

> Using @sveltejs/adapter-static
Overwriting ../public/index.html with fallback page. Consider using a different name for the fallback.
  Wrote site to "../public"
  ✔ done
✓ Inlined favicon.svg as base64 data URL
✓ Updated index.html
✓ Copied bundle.CVqNSTXQ.js -> bundle.js
✓ Copied bundle.sRqjEHG4.css -> bundle.css
2026-04-06 21:23:50 +00:00
Vitaly Chikunov
7c28f3abf0 gear: Change WebUI npm target
Since 4a00bbfed ("server: (webui) no more gzip compression (#21073)").

Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
2026-04-06 21:23:50 +00:00
Vitaly Chikunov
0122d1e6aa Merge signed commit 'b8681' into sisyphus
Diff-After-Merge: 1 file changed, 6 insertions(+)

# gpg: Signature made Mon Apr  6 21:54:06 2026 MSK
# gpg:                using RSA key B5690EEEBB952194
# gpg: Good signature from "GitHub <noreply@github.com>" [unknown]

# Conflicts:
#	tools/server/public/index.html.gz
2026-04-06 21:23:50 +00:00
Bipin Yadav
506200cf8b
cli: fix stripping of \n in multiline input (#21485)
* llama-cli: fix stripping of \n in multiline input

* Change & string to string_view

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Fix EditorConfig linter error

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-06 20:54:06 +02:00
Gaurav Garg
15f786e658
[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel (#21159)
* Write an optimized flash_attn_stream_k_fixup kernel

Write a specialized, more optimized kernel for cases where nblocks_stream_k is a multiple of ntiles_dst.
Make nblocks_stream_k a multiple of ntiles_dst if nblocks_stream_k > 2 * ntiles_dst

* Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to make sure we have enough concurrency on GPUs

* Address review comments

* Address review comments

* Revert variable names to original
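
The block-count rule described in this commit message can be sketched as follows. This is only an illustration, not the CUDA code from the PR: the function name and the `min_ratio` parameter are hypothetical, and rounding down (rather than up) to a multiple of ntiles_dst is an assumption, since the message does not state the direction.

```python
from typing import Tuple


def stream_k_blocks(nblocks_raw: int, ntiles_dst: int, min_ratio: int = 4) -> Tuple[int, bool]:
    """Return (nblocks_stream_k, use_specialized_fixup).

    Hypothetical helper: mirrors the rule in the commit message, where the
    specialized fixup kernel is used only if nblocks_raw > 4 * ntiles_dst.
    """
    if ntiles_dst <= 0:
        raise ValueError("ntiles_dst must be positive")
    if nblocks_raw > min_ratio * ntiles_dst:
        # Assumption: round DOWN so the block count is an exact multiple
        # of the number of destination tiles.
        return (nblocks_raw // ntiles_dst) * ntiles_dst, True
    # Too few blocks per tile: keep the raw count and the generic fixup path.
    return nblocks_raw, False


print(stream_k_blocks(132, 8))
print(stream_k_blocks(20, 8))
```

With 132 raw blocks and 8 destination tiles the sketch selects the specialized path with 128 blocks; with 20 raw blocks it leaves the count unchanged.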
2026-04-06 20:34:29 +02:00
Aman Gupta
94ca829b60
llama-bench: add -fitc and -fitt to arguments (#21304)
* llama-bench: add `-fitc` and `-fitt` to arguments

* update README.md

* address review comments

* update compare-llama-bench.py
2026-04-06 22:26:02 +08:00
Aldehir Rojas
4aa962e2b0
vocab : add byte token handling to BPE detokenizer for Gemma4 (#21488) 2026-04-06 09:08:37 -05:00
Sigbjørn Skjæret
941146b3f1
convert : fix block_ff_dim retrieval for lfm2 (#21508) 2026-04-06 14:05:18 +02:00
lainon1
482d862bcb
server : handle unsuccessful sink.write in chunked stream provider (#21478)
Check the return value of sink.write() in the chunked content provider
and return false when the write fails, matching cpp-httplib's own
streaming contract. This prevents logging chunks as sent when the sink
rejected them and properly aborts the stream on connection failure.
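
A minimal sketch of the contract described above, written in Python rather than the server's C++. `FakeSink` and `stream_chunks` are hypothetical names used only for illustration; they are not cpp-httplib or llama-server APIs. The point is that a chunk is logged as sent only after the write succeeds, and the provider returns False as soon as a write fails.

```python
from typing import Iterable, List, Optional


class FakeSink:
    """Hypothetical stand-in for a sink whose write() reports success."""

    def __init__(self, fail_after: Optional[int] = None) -> None:
        self.fail_after = fail_after  # number of writes accepted before failing
        self.written: List[bytes] = []

    def write(self, data: bytes) -> bool:
        if self.fail_after is not None and len(self.written) >= self.fail_after:
            return False  # simulate a dropped connection
        self.written.append(data)
        return True


def stream_chunks(chunks: Iterable[bytes], sink: FakeSink, log: List[str]) -> bool:
    """Write chunks in order; stop and return False on the first failed write."""
    for chunk in chunks:
        if not sink.write(chunk):
            log.append("write failed, aborting stream")
            return False
        # Log only after the sink has accepted the chunk.
        log.append(f"sent {len(chunk)} bytes")
    return True


log: List[str] = []
ok = stream_chunks([b"ab", b"cd"], FakeSink(fail_after=1), log)
print(ok, log)
```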
2026-04-06 14:03:02 +02:00
Xuan-Son Nguyen
3979f2bb08
docs: add hunyuan-ocr gguf, also add test [no ci] (#21490) 2026-04-06 14:02:37 +02:00
Georgi Gerganov
400ac8e194
convert : set "add bos" == True for Gemma 4 (#21500)
* convert : set "add bos" == True for Gemma 4

* cont : handle old GGUFs
2026-04-06 13:52:07 +03:00
Neo Zhang
f51fd36d79
sycl : handle other FA case (#21377) 2026-04-06 13:28:00 +03:00
Yarden Tal
25eec6f327
hexagon: slight optimization for argsort output init (#21463) 2026-04-05 18:30:25 -07:00
anchortense
58190cc84d
llama : correct platform-independent loading of BOOL metadata (#21428)
* model-loader : fix GGUF bool array conversion

* model-loader : fix remaining GGUF bool pointer uses
2026-04-06 01:40:38 +02:00
Richard Davison
af76639f72
model : add HunyuanOCR support (#21395)
* HunyuanOCR: add support for text and vision models

- Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge
- Add separate HUNYUAN_OCR chat template (content-before-role format)
- Handle HunyuanOCR's invalid pad_token_id=-1 in converter
- Fix EOS/EOT token IDs from generation_config.json
- Support xdrope RoPE scaling type
- Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.)
- Register HunYuanVLForConditionalGeneration for both text and mmproj conversion

* fix proper mapping

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* address comments

* update

* Fix typecheck

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-05 23:32:14 +02:00
Ludovic Henry
761797ffdf
ci : use default RISE RISC-V Runners (#21263) 2026-04-05 20:29:48 +02:00
ddh0
5d3a4a7da5
server : fix logging of build + system info (#21460)
This PR changes the logging that occurs at startup of llama-server.
Currently, it is redundant (including CPU information twice) and it is
missing the build + commit info.
2026-04-05 16:14:02 +02:00
M1DNYT3
c08d28d088
ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438)
Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>
2026-04-05 09:04:00 +08:00
Nicholas Sparks
661e9acb36
ci: fix vulkan workflow referencing non-existent action (#21442) 2026-04-05 08:59:51 +08:00
Aldehir Rojas
b8635075ff
common : add gemma 4 specialized parser (#21418)
* common : add gemma4 dedicated parser

* cont : add '<|tool_response>' as eog

* cont : emit JSON from Gemma4 tool call AST

* cont : more fixes

* cont : refactor convert function

* cont : refine rules and mapping

* cont : add more tests

* cont : clean up

* cont : remove autoparser gemma4 implementation

* cont : more cleanup

* cont : rename gemma4.jinja to match the others

* cont : add custom template to support interleaved thinking

* cont : preserve reasoning in model turns

* cont : fix initializer error

* cont : fix unused vars

* cont : fix accidental static

* cont : fix specialized_template signature

* fix extra semicolon

* remove debug line and extra space [no ci]
2026-04-04 20:39:00 +02:00
Dan Hoffman
9c699074c9
server: Fix undefined timing measurement errors in server context (#21201)
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
2026-04-04 22:11:19 +08:00
Adrien Gallouët
d01f6274c0
common : respect specified tag, only fallback when tag is empty (#21413)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-04-04 15:08:03 +02:00
SamareshSingh
650bf14eb9
llama-model: read final_logit_softcapping for Gemma 4 (#21390) 2026-04-04 13:05:10 +02:00
Aman Gupta
b7ad48ebda
llama: add custom newline split for Gemma 4 (#21406) 2026-04-04 15:06:34 +08:00
Reese Levine
d006858316
ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278)
* Work towards removing bitcast

* Move rest of existing types over

* Add timeout back to wait and remove synchronous set_tensor/memset_tensor

* move to unpackf16 for wider compatibility

* cleanup

* Remove deadlock condition in free_bufs

* Start work on removing parameter buffer pools

* Simplify and optimize further

* simplify profile futures

* Fix stride

* Try using a single command buffer per batch

* formatting
2026-04-03 11:40:14 -07:00
Masato Nakasaka
e439700992
ci: Add Windows Vulkan backend testing on Intel (#21292)
* experimenting CI

* Experimenting CI fix for MinGW

* experimenting CI on Windows

* modified script for integration with VisualStudio

* added proxy handling

* adding python version for Windows execution

* fix iterator::end() dereference

* fixed proxy handling

* Fix errors occurring on Windows

* fixed ci script

* Reverted to master

* Stripping test items to simplify Windows test

* adjusting script for windows testing

* Changed shell

* Fixed shell

* Fixed shell

* Fix CI setting

* Fix CI setting

* Fix CI setting

* Experimenting ci fix

* Experimenting ci fix

* Experimenting ci fix

* Experimenting ci fix

* experimenting fix for unit test error

* Changed to use BUILD_LOW_PERF to skip python tests

* Fix CI

* Added option to specify Ninja generator

* Reverted proxy related changes
2026-04-03 20:16:44 +03:00
Yes You Can Have Your Own
50e0ad08fb
server: save and clear idle slots on new task (--clear-idle) (#20993)
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

* server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

* server: add --kv-clear-idle flag, enable by default

* server: skip clearing last idle slot, clear on launch

* server: test --no-kv-clear-idle flag

* server: simplify on-release clearing loop

* server: remove on-release KV clearing, keep launch-only

* cont : clean-up

* tests: update log strings after --clear-idle rename

* tests: use debug tags instead of log message matching

* test: fix Windows CI by dropping temp log file unlink

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-03 19:02:27 +02:00
Piotr Wilkin (ilintar)
f1f793ad06
common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230)
* Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers

* Rename

* Update common/chat-auto-parser-generator.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 17:51:52 +02:00
Samanvya Tripathi
af5c13841f
common : fix tool call type detection for nullable and enum schemas (#21327)
* common : fix tool call type detection for nullable and enum schemas

* common, tests : fix grammar delegation for nullable/enum schemas and add tests

Fix enum type inference to scan all enum values (not just index 0) so
schemas like {"enum": [0, "celsius"]} correctly detect string type.

Fix schema_delegates in peg-parser to handle nullable type arrays
(["string", "null"]) and typeless enum schemas in raw mode, allowing
the tagged parser to use raw text instead of JSON-formatted strings.

Add test cases for Qwen3-Coder (TAG_WITH_TAGGED format):
- nullable string ["string", "null"]
- nullable string with null first ["null", "string"]
- nullable integer ["integer", "null"]
- enum without explicit type key
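
The type-detection rules in this commit message can be illustrated with a short Python sketch. It is not the parser code from the PR: `is_string_schema` is a hypothetical helper, and treating a type array as a string only when its single non-null member is "string" is an assumption made for the example.

```python
from typing import Any, Dict


def is_string_schema(schema: Dict[str, Any]) -> bool:
    """Decide whether a JSON Schema fragment describes a string-valued argument."""
    declared = schema.get("type")
    if isinstance(declared, list):
        # Nullable form: ignore "null" wherever it appears in the array.
        non_null = [t for t in declared if t != "null"]
        return non_null == ["string"]
    if declared is not None:
        return declared == "string"
    # Typeless enum: scan every value, not just the first one.
    enum_values = schema.get("enum")
    if enum_values:
        return any(isinstance(v, str) for v in enum_values)
    return False


print(is_string_schema({"enum": [0, "celsius"]}))
print(is_string_schema({"type": ["null", "string"]}))
print(is_string_schema({"type": ["integer", "null"]}))
```

Checking only index 0 of the enum would classify {"enum": [0, "celsius"]} as non-string, which is the case the commit message calls out.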
2026-04-03 17:51:23 +02:00
M1DNYT3
277ff5fff7
docker : bump cuda12 to 12.9.1 (#20920)
Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>
Co-authored-by: CISC <CISC@users.noreply.github.com>
2026-04-03 15:06:45 +02:00
jeromew
384c0076bc
docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)
The `HSA_OVERRIDE_GFX_VERSION` variable can be used in ROCm to override an unsupported target architecture with a similar but supported target architecture.

This does not and has never worked on Windows. I think the clarification could avoid driving Windows people towards this solution that does not work.
2026-04-03 21:05:14 +08:00
Sigbjørn Skjæret
1f34806c44
jinja: coerce input for string-specific filters (#21370) 2026-04-03 15:03:33 +02:00
Aaron Teo
887535c33f
ci: add more binary checks (#21349) 2026-04-03 20:50:00 +08:00
Piotr Wilkin (ilintar)
d3416a4aa9
fix: remove stale assert (#21369) 2026-04-03 13:40:41 +02:00
uvos
43a4ee4a2c
HIP: build each CI build test for a different architecture (#21337)
This helps improve our chances of finding build failures before the release workflow
builds for all architectures.
2026-04-03 11:38:22 +02:00
Tillerino
f851fa5ab0
fix: add openssl to nix dependencies (#21353) (#21355) 2026-04-03 12:21:07 +03:00
Vishal Singh
f1ac84119c
ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
* ggml-zendnn : add MUL_MAT_ID op support for MoE models
- Add MUL_MAT_ID op acceleration for Mixture-of-Experts models
- MUL_MAT_ID op falls back to the CPU backend if total experts > 32
- Point ZenDNN lib to latest bits ZenDNN-2026-WW13

* ggml-zendnn : add braces to sgemm failure condition for consistency

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-04-03 12:19:08 +03:00
Piotr Wilkin (ilintar)
b069b10ab4
vocab: fix Gemma4 tokenizer (#21343)
* seems to work

* fix case with new line

Co-authored-by: sayap <sokann@gmail.com>

* gemma 4: fix pre tok regex

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: sayap <sokann@gmail.com>
2026-04-03 10:33:03 +02:00
Radoslav Gerganov
0c58ba3365
rpc : reuse compute graph buffers (#21299)
Reuse the buffer for the ggml context which is used for creating the
compute graph on the server side. This partially addresses a memory leak
created by the CUDA backend due to using buffer addresses as cache
keys.

ref: #21265
ref: #20315
2026-04-03 10:28:09 +03:00
Georgi Gerganov
57ace0d612
chat : avoid including json in chat.h (#21306) 2026-04-03 09:07:59 +03:00
Georgi Gerganov
39b27f0da0
(revert) kv-cache : do not quantize SWA KV cache (#21332)
This reverts commit 17193cce34.
2026-04-03 09:07:01 +03:00
Vishal Singh
f49e917876
ci : add AMD ZenDNN label to PR labeler (#21345)
* ci : add AMD CPU label to PR labeler
Add automatic labeling for PRs that modify AMD CPU (ZenDNN) backend files

* ci : rename label AMD CPU to AMD ZenDNN in labeler config

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-04-03 10:35:15 +08:00
Slobodan Josic
7c7d6ce5c7
[HIP] Bump ROCm version to 7.2.1 (#21066)
Bump ROCm version on Linux from 7.2 to 7.2.1
Add gfx1102 target
Delete LLVM workaround since ROCm 7.2.1 has fix for ROCm 7.2 perf regression https://github.com/ROCm/rocm-systems/issues/2865

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 00:59:20 +02:00
Piotr Wilkin (ilintar)
5208e2d5ba
fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00