* llama-server: recursive GGUF loading

  Replace flat directory scan with recursive traversal using
  std::filesystem::recursive_directory_iterator. Support for nested
  vendor/model layouts (e.g. vendor/model/*.gguf). Model name now reflects
  the relative path within --models-dir instead of just the filename.
  Aggregate files by parent directory via std::map before constructing
  local_model

* server : router config POC (INI-based per-model settings)

* server: address review feedback from @aldehir and @ngxson

  PEG parser usage improvements:
  - Simplify parser instantiation (remove arena indirection)
  - Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
  - Fix last line without newline bug (+ operator instead of <<)
  - Remove redundant end position check

  Feature scope:
  - Remove auto-reload feature (will be separate PR per @ngxson)
  - Keep config.ini auto-creation and template generation
  - Preserve per-model customization logic

  Co-authored-by: aldehir <aldehir@users.noreply.github.com>
  Co-authored-by: ngxson <ngxson@users.noreply.github.com>

* server: adopt aldehir's line-oriented PEG parser

  Complete rewrite of INI parser grammar and visitor:
  - Use p.chars(), p.negate(), p.any() instead of p.until()
  - Support end-of-line comments (key=value # comment)
  - Handle EOF without trailing newline correctly
  - Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
  - Simplified visitor (no pending state, no trim needed)
  - Grammar handles whitespace natively via eol rule

  Business validation preserved:
  - Reject section names starting with LLAMA_ARG_*
  - Accept only keys starting with LLAMA_ARG_*
  - Require explicit section before key-value pairs

  Co-authored-by: aldehir <aldehir@users.noreply.github.com>

* server: fix CLI/env duplication in child processes

  Children now receive minimal CLI args (executable, model, port, alias)
  instead of inheriting all router args. Global settings pass through
  LLAMA_ARG_* environment variables only, eliminating duplicate config
  warnings.

  Fixes: Router args like -ngl, -fa were passed both via CLI and env,
  causing 'will be overwritten' warnings on every child spawn

* add common/preset.cpp
* fix compile
* cont
* allow custom-path models
* add falsey check
* server: fix router model discovery and child process spawning
  - Sanitize model names: replace / and \ with _ for display
  - Recursive directory scan with relative path storage
  - Convert relative paths to absolute when spawning children
  - Filter router control args from child processes
  - Refresh args after port assignment for correct port value
  - Fallback preset lookup for compatibility
  - Fix missing argv[0]: store server binary path before base_args parsing
* Revert "server: fix router model discovery and child process spawning"
  This reverts commit e3832b42eeea7fcb108995966c7584479f745857.
* clarify about "no-" prefix
* correct render_args() to include binary path
* also remove arg LLAMA_ARG_MODELS_PRESET for child
* add co-author for ini parser code
  Co-authored-by: aldehir <hello@alde.dev>
* also set LLAMA_ARG_HOST
* add CHILD_ADDR
* Remove dead code

---------

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: aldehir <hello@alde.dev>
include_directories(${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR})

# server-context containing the core server logic, used by llama-server and CLI
set(TARGET server-context)
add_library(${TARGET} STATIC
    server-task.cpp
    server-task.h
    server-queue.cpp
    server-queue.h
    server-common.cpp
    server-common.h
    server-context.cpp
    server-context.h
)

if (BUILD_SHARED_LIBS)
    set_target_properties(${TARGET} PROPERTIES POSITION_INDEPENDENT_CODE ON)
endif()

target_include_directories(${TARGET} PRIVATE ../mtmd)
target_include_directories(${TARGET} PRIVATE ${CMAKE_SOURCE_DIR})
target_link_libraries(${TARGET} PUBLIC common mtmd ${CMAKE_THREAD_LIBS_INIT})

# llama-server executable
set(TARGET llama-server)

if (NOT LLAMA_HTTPLIB)
    message(FATAL_ERROR "LLAMA_HTTPLIB is OFF, cannot build llama-server. Hint: to skip building server, set -DLLAMA_BUILD_SERVER=OFF")
endif()

set(TARGET_SRCS
    server.cpp
    server-http.cpp
    server-http.h
    server-models.cpp
    server-models.h
    server-task.cpp
    server-task.h
    server-queue.cpp
    server-queue.h
    server-common.cpp
    server-common.h
    server-context.cpp
    server-context.h
)
set(PUBLIC_ASSETS
    index.html.gz
    loading.html
)

foreach(asset ${PUBLIC_ASSETS})
    set(input "${CMAKE_CURRENT_SOURCE_DIR}/public/${asset}")
    set(output "${CMAKE_CURRENT_BINARY_DIR}/${asset}.hpp")
    list(APPEND TARGET_SRCS ${output})
    add_custom_command(
        DEPENDS "${input}"
        OUTPUT "${output}"
        COMMAND "${CMAKE_COMMAND}" "-DINPUT=${input}" "-DOUTPUT=${output}" -P "${PROJECT_SOURCE_DIR}/scripts/xxd.cmake"
    )
    set_source_files_properties(${output} PROPERTIES GENERATED TRUE)
endforeach()

add_executable(${TARGET} ${TARGET_SRCS})
install(TARGETS ${TARGET} RUNTIME)

target_include_directories(${TARGET} PRIVATE ../mtmd)
target_include_directories(${TARGET} PRIVATE ${CMAKE_SOURCE_DIR})
target_link_libraries(${TARGET} PRIVATE server-context PUBLIC common cpp-httplib ${CMAKE_THREAD_LIBS_INIT})

if (WIN32)
    target_link_libraries(${TARGET} PRIVATE ws2_32)
endif()

target_compile_features(${TARGET} PRIVATE cxx_std_17)