server : make cache_reuse configurable per request (#17858)
parent 5814b4dce1
commit 2bc96931d2
4 changed files with 31 additions and 15 deletions
@@ -495,6 +495,8 @@ By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to re
`n_cmpl`: Number of completions to generate from the current prompt. If input has multiple prompts, the output will have N prompts times `n_cmpl` entries.
`n_cache_reuse`: Min chunk size to attempt reusing from the cache via KV shifting. For more info, see `--cache-reuse` arg. Default: `0`, which is disabled.
`stream`: Allows receiving each predicted token in real-time instead of waiting for the completion to finish (uses a different response format). To enable this, set to `true`.
`stop`: Specify a JSON array of stopping strings.
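
As a rough illustration of what this change enables, the sketch below builds a request body that sets `n_cache_reuse` for a single request instead of relying only on the server-wide `--cache-reuse` argument. It is a minimal sketch under stated assumptions, not the project's client code: the `prompt` field, the `/completion` path, and the `http://localhost:8080` base URL do not appear in this hunk and are assumed here, and the helper names `build_completion_payload` and `post_completion` are hypothetical. Rejecting negative values is a choice made for this example, not documented server behavior.

```python
import json
import urllib.request


def build_completion_payload(prompt, n_cache_reuse=0, n_cmpl=1, stream=False, stop=None):
    """Build a request body from the fields documented in this hunk.

    `prompt` is an assumed field name; the other keys come from the README text.
    """
    # Validation below is this example's own choice, not documented behavior.
    if n_cache_reuse < 0:
        raise ValueError("n_cache_reuse must be >= 0 (0 disables reuse)")
    if n_cmpl < 1:
        raise ValueError("n_cmpl must be >= 1")
    return {
        "prompt": prompt,
        "n_cmpl": n_cmpl,                # completions per prompt
        "n_cache_reuse": n_cache_reuse,  # min chunk size for reuse; 0 = disabled
        "stream": stream,
        "stop": list(stop or []),        # JSON array of stopping strings
    }


def post_completion(payload, base_url="http://localhost:8080"):
    """Send the payload to a running server.

    Assumption: the server listens at `base_url` and accepts POST /completion.
    """
    req = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the body; call post_completion(payload) to send it.
    payload = build_completion_payload("Hello", n_cache_reuse=256, stop=["\n\n"])
    print(json.dumps(payload, indent=2))
```

Leaving `n_cache_reuse` at `0` in the body matches the documented default, in which reuse via KV shifting is disabled.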