llama: end-to-end tests (#19802)

* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments
2026-03-08 12:30:21 +01:00
commit a976ff081b
parent a95047979a
33 changed files with 1607 additions and 633 deletions
--- a/src/models/bailingmoe2.cpp
+++ b/src/models/bailingmoe2.cpp
@@ -90,7 +90,7 @@ llm_build_bailingmoe2::llm_build_bailingmoe2(const llama_model & model, const ll
                model.layers[il].ffn_exp_probs_b,
                n_expert, n_expert_used,
                LLM_FFN_SILU, hparams.expert_weights_norm,
-                true, hparams.expert_weights_scale,
+                hparams.expert_weights_scale, hparams.expert_weights_scale,
                (llama_expert_gating_func_type) hparams.expert_gating_func,
                il);
            cb(moe_out, "ffn_moe_out", il);
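
Note on the hunk above: the replaced literal `true` suggests that this argument is a boolean "apply scale" flag and that the next argument is the scale factor itself. Under that assumption, passing the float `hparams.expert_weights_scale` in the flag position relies on C++'s implicit float-to-bool conversion, which yields `false` only for `0.0f`. That matches the commit message ("use expert_weights_scale only if != 0.0f"). The sketch below illustrates the idea only; `apply_expert_scale` and its parameter names are hypothetical stand-ins, not the llama.cpp API.

```cpp
#include <cassert>

// Hypothetical stand-in for the weight-scaling step of an MoE FFN builder.
// Assumption: scale_w is a flag and w_scale is the factor, mirroring the
// (bool, float) pair implied by the old "true, hparams.expert_weights_scale".
constexpr float apply_expert_scale(float weight, bool scale_w, float w_scale) {
    return scale_w ? weight * w_scale : weight;
}

// Passing a float where a bool is expected converts it as (value != 0.0f).
inline void check_expert_scale() {
    const float unset_scale = 0.0f;  // e.g. a model that does not define the value
    const float set_scale   = 2.5f;

    // Old behaviour: the flag is always true, so an unset scale zeroes the weight.
    assert(apply_expert_scale(1.0f, true, unset_scale) == 0.0f);

    // New behaviour: the flag is derived from the scale, so 0.0f means "skip".
    assert(apply_expert_scale(1.0f, unset_scale, unset_scale) == 1.0f);
    assert(apply_expert_scale(1.0f, set_scale, set_scale) == 2.5f);
}
```

Calling `check_expert_scale()` from any test harness exercises the three cases. An explicit `hparams.expert_weights_scale != 0.0f` at the call site would express the same condition without depending on the implicit conversion.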