llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layer hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
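
Note on "fix logger not being reset on model load failure": the diff adds `llama_log_get` next to the existing `llama_log_set`, which makes it possible to save the current log callback, install a temporary one, and put the old one back afterwards. The sketch below illustrates that save/restore pattern with an RAII guard. It is an assumption about how such a getter could be used, not the code from this commit. `demo_log_get`, `demo_log_set`, `log_callback_t`, and `scoped_logger` are hypothetical stand-ins so the example compiles without llama.cpp; only the get/set parameter shape mirrors the signatures shown in the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the ggml/llama logging API (not the real types).
using log_callback_t = void (*)(int level, const char * text, void * user_data);

static log_callback_t g_callback  = nullptr;
static void *         g_user_data = nullptr;

// Mirrors the getter shape from the diff: outputs are written through pointers.
void demo_log_get(log_callback_t * cb, void ** user_data) {
    assert(cb != nullptr && user_data != nullptr);
    *cb        = g_callback;
    *user_data = g_user_data;
}

void demo_log_set(log_callback_t cb, void * user_data) {
    g_callback  = cb;
    g_user_data = user_data;
}

void demo_log(int level, const char * text) {
    if (g_callback) {
        g_callback(level, text, g_user_data);
    }
}

// RAII guard: installs a temporary logger and restores the previous one on
// scope exit, including early returns on failure.
struct scoped_logger {
    log_callback_t prev_cb        = nullptr;
    void *         prev_user_data = nullptr;

    scoped_logger(log_callback_t cb, void * user_data) {
        demo_log_get(&prev_cb, &prev_user_data);
        demo_log_set(cb, user_data);
    }
    ~scoped_logger() {
        demo_log_set(prev_cb, prev_user_data);
    }
    scoped_logger(const scoped_logger &) = delete;
    scoped_logger & operator=(const scoped_logger &) = delete;
};

// Appends every message to the std::string passed as user_data.
void capture_to_string(int /*level*/, const char * text, void * user_data) {
    static_cast<std::string *>(user_data)->append(text);
}

// Simulates a load that fails while a temporary logger is active.
bool try_load_quietly(std::string & captured) {
    scoped_logger guard(capture_to_string, &captured);
    demo_log(0, "probing memory\n");
    return false; // simulated failure; the guard still restores the old logger
}
```

Because the restore happens in the destructor, every exit path from `try_load_quietly` leaves the caller's logger in place, which is the property the bullet above describes.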
Johannes Gäßler 2025-12-15 09:24:59 +01:00 committed by GitHub
parent 4aced7a631
commit b1f3a6e5db
26 changed files with 1075 additions and 63 deletions

@@ -25,6 +25,10 @@ time_meas::~time_meas() {
    }
}

void llama_log_get(ggml_log_callback * log_callback, void ** user_data) {
    ggml_log_get(log_callback, user_data);
}

void llama_log_set(ggml_log_callback log_callback, void * user_data) {
    ggml_log_set(log_callback, user_data);
    g_logger_state.log_callback = log_callback ? log_callback : llama_log_callback_default;