server: support load model on startup, support preset-only options (#18206)

* server: support autoload model, support preset-only options

* add docs

* load-on-startup

* fix

* Update common/arg.cpp

Co-authored-by: Pascal <admin@serveurperso.com>

Xuan-Son Nguyen 2025-12-20 09:25:27 +01:00 committed by GitHub
parent 74e05131e9
commit 9e39a1e6a9
GPG key ID: B5690EEEBB952194
7 changed files with 80 additions and 10 deletions


@@ -1480,6 +1480,9 @@ The precedence rule for preset options is as follows:
2. **Model-specific options** defined in the preset file (e.g. `[ggml-org/MY-MODEL...]`)
3. **Global options** defined in the preset file (`[*]`)
We also offer additional options that are exclusive to presets (these aren't treated as command-line arguments):
- `load-on-startup` (boolean): Controls whether the model loads automatically when the server starts
### Routing requests
Requests are routed according to the requested model name.
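
To make the hunk above concrete, here is a minimal Python sketch of how a preset file with a global `[*]` section, a model-specific section, and the preset-only `load-on-startup` key might be resolved. It is not llama.cpp's implementation: the section name `my-hypothetical-model`, the `ctx-size` values, the `true`/`false` spelling of the boolean, and the helper names `resolve` and `split_options` are all assumptions made for illustration. Only the two precedence levels visible in the hunk (model-specific over global) are modeled.

```python
import configparser

# Hypothetical preset contents; section and option values are illustrative only.
PRESET_TEXT = """
[*]
ctx-size = 8192

[my-hypothetical-model]
ctx-size = 4096
load-on-startup = true
"""

# Keys that exist only in presets and are never turned into command-line arguments.
PRESET_ONLY = {"load-on-startup"}


def resolve(preset_text: str, model: str) -> dict:
    """Merge global [*] options with model-specific ones (model-specific wins)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(preset_text)
    merged = {}
    if parser.has_section("*"):
        merged.update(parser["*"])    # lowest precedence: global options
    if parser.has_section(model):
        merged.update(parser[model])  # higher precedence: model-specific options
    return merged


def split_options(merged: dict) -> tuple:
    """Separate argument-like options from preset-only ones."""
    args = {k: v for k, v in merged.items() if k not in PRESET_ONLY}
    # Assumption: the boolean is written as "true" or "false".
    preset_only = {
        k: v.strip().lower() == "true"
        for k, v in merged.items()
        if k in PRESET_ONLY
    }
    return args, preset_only


if __name__ == "__main__":
    args, preset_only = split_options(resolve(PRESET_TEXT, "my-hypothetical-model"))
    print(args, preset_only)
```

In this sketch a model without its own section simply inherits the `[*]` values, and `load-on-startup` is kept apart from the argument-like options so it would never be forwarded as a flag.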