llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
* llama: automatically fit args to free memory llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
parent 4aced7a631
commit b1f3a6e5db
26 changed files with 1075 additions and 63 deletions
9
.github/ISSUE_TEMPLATE/011-bug-results.yml
vendored
@@ -11,7 +11,7 @@ body:
 (i.e. the generated text) are incorrect or llama.cpp crashes during model evaluation.
 If you encountered the issue while using an external UI (e.g. ollama),
 please reproduce your issue using one of the examples/binaries in this repository.
-The `llama-cli` binary can be used for simple and reproducible model inference.
+The `llama-completion` binary can be used for simple and reproducible model inference.
 - type: textarea
 id: version
 attributes:
@@ -74,9 +74,12 @@ body:
 Please give us a summary of the problem and tell us how to reproduce it.
 If you can narrow down the bug to specific hardware, compile flags, or command line arguments,
 that information would be very much appreciated by us.
+
+If possible, please try to reproduce the issue using `llama-completion` with `-fit off`.
+If you can only reproduce the issue with `-fit on`, please provide logs both with and without `--verbose`.
 placeholder: >
-e.g. when I run llama-cli with -ngl 99 I get garbled outputs.
-When I use -ngl 0 it works correctly.
+e.g. when I run llama-completion with `-fa on` I get garbled outputs for very long prompts.
+With short prompts or `-fa off` it works correctly.
 Here are the exact commands that I used: ...
 validations:
 required: true