Merge branch 'master' into 131_pi4_mesa_vulkan_example

Henry Miskin 2021-03-14 21:31:58 +00:00
commit b8963a4604
85 changed files with 6060 additions and 6407 deletions

1
.ccls
View file

@ -19,6 +19,7 @@
-I./external/googletest/googletest/include/
-I./external/glslang/
-I./external/spdlog/include/
-I./external/fmt/include/
-I./src/include/
-I./single_include/
-I./vk_ndk_wrapper_include/

View file

@ -1,6 +1,76 @@
# Changelog
## [v0.6.0](https://github.com/EthicalML/vulkan-kompute/tree/v0.6.0)
## [v0.7.0](https://github.com/EthicalML/vulkan-kompute/tree/v0.7.0)
[Full Changelog](https://github.com/EthicalML/vulkan-kompute/compare/v0.6.0...v0.7.0)
**Implemented enhancements:**
- Extend non-spdlog print functions to use std::format [\#158](https://github.com/EthicalML/vulkan-kompute/issues/158)
- Add code coverage reports with codecov [\#145](https://github.com/EthicalML/vulkan-kompute/issues/145)
- Explore removing `std::vector mData;` completely from Tensor in favour of always storing data in hostVisible buffer memory \(TBC\) [\#144](https://github.com/EthicalML/vulkan-kompute/issues/144)
- Update all examples to match breaking changes in 0.7.0 [\#141](https://github.com/EthicalML/vulkan-kompute/issues/141)
- Avoid copy when returning python numpy / array [\#139](https://github.com/EthicalML/vulkan-kompute/issues/139)
- Cover all Python & C++ tests in CI [\#121](https://github.com/EthicalML/vulkan-kompute/issues/121)
- Add C++ Test for Simple Work Groups Example [\#117](https://github.com/EthicalML/vulkan-kompute/issues/117)
- Expose push constants in OpAlgo [\#54](https://github.com/EthicalML/vulkan-kompute/issues/54)
- Expose ability to create barriers in OpTensor operations [\#45](https://github.com/EthicalML/vulkan-kompute/issues/45)
- Create delete function in manager to free / destroy sequence [\#36](https://github.com/EthicalML/vulkan-kompute/issues/36)
- Make specialisation data extensible [\#12](https://github.com/EthicalML/vulkan-kompute/issues/12)
- Support multiple types for Kompute Tensors [\#2](https://github.com/EthicalML/vulkan-kompute/issues/2)
- Added re-record sequence functionality and updated docs [\#171](https://github.com/EthicalML/vulkan-kompute/pull/171) ([axsaucedo](https://github.com/axsaucedo))
- Extend non-spdlog print functions to use fmt::format / fmt::print [\#159](https://github.com/EthicalML/vulkan-kompute/pull/159) ([axsaucedo](https://github.com/axsaucedo))
- Added support for custom SpecializedConstants and removed KomputeWorkgroup class [\#151](https://github.com/EthicalML/vulkan-kompute/pull/151) ([axsaucedo](https://github.com/axsaucedo))
- Added destroy functions for tensors and sequences \(named and object\) [\#146](https://github.com/EthicalML/vulkan-kompute/pull/146) ([axsaucedo](https://github.com/axsaucedo))
**Fixed bugs:**
- push\_constant not working in my case? [\#168](https://github.com/EthicalML/vulkan-kompute/issues/168)
- DescriptorPool set is not being freed [\#155](https://github.com/EthicalML/vulkan-kompute/issues/155)
- Updated memory barriers to include staging buffers [\#182](https://github.com/EthicalML/vulkan-kompute/pull/182) ([axsaucedo](https://github.com/axsaucedo))
- Adds push const ranges in pipelinelayout to fix \#168 [\#174](https://github.com/EthicalML/vulkan-kompute/pull/174) ([axsaucedo](https://github.com/axsaucedo))
- Added destructor for staging tensors [\#134](https://github.com/EthicalML/vulkan-kompute/pull/134) ([axsaucedo](https://github.com/axsaucedo))
**Closed issues:**
- Update memory barriers to align with tensor staging/primary memory revamp [\#181](https://github.com/EthicalML/vulkan-kompute/issues/181)
- Move shader defaultResource inside kp::Shader class [\#175](https://github.com/EthicalML/vulkan-kompute/issues/175)
- Reach at least 90% code coverage on tests [\#170](https://github.com/EthicalML/vulkan-kompute/issues/170)
- Add functionality to re-record sequence as now it's possible to update the underlying algorithm [\#169](https://github.com/EthicalML/vulkan-kompute/issues/169)
- Use numpy arrays as default return value [\#166](https://github.com/EthicalML/vulkan-kompute/issues/166)
- Update all shared\_ptr value passes to be by ref or const ref [\#161](https://github.com/EthicalML/vulkan-kompute/issues/161)
- Amend memory hierarchy for kp::Operations so they can be created separately [\#160](https://github.com/EthicalML/vulkan-kompute/issues/160)
- Customise theme of documentation [\#156](https://github.com/EthicalML/vulkan-kompute/issues/156)
- Remove KomputeWorkgroup class in favour of std::array\<uint32\_t, 3\> [\#152](https://github.com/EthicalML/vulkan-kompute/issues/152)
- Passing raw GLSL string to Shader Module depricated so remove this method from supported approach [\#150](https://github.com/EthicalML/vulkan-kompute/issues/150)
- Add python backwards compatibility for eval\_tensor\_create\_def [\#147](https://github.com/EthicalML/vulkan-kompute/issues/147)
- Document breaking changes for 0.7.0 [\#140](https://github.com/EthicalML/vulkan-kompute/issues/140)
- Tensor memory management and memory hierarchy redesign [\#136](https://github.com/EthicalML/vulkan-kompute/issues/136)
- Staging tensor GPU memory is not freed as part of OpCreateTensor removal [\#133](https://github.com/EthicalML/vulkan-kompute/issues/133)
- eStorage Tensors are currently unusable as OpTensorCreate calls mapDataIntoHostMemory [\#132](https://github.com/EthicalML/vulkan-kompute/issues/132)
- 0.6.0 Release [\#126](https://github.com/EthicalML/vulkan-kompute/issues/126)
- java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found [\#125](https://github.com/EthicalML/vulkan-kompute/issues/125)
- Initial exploration: Include explicit GLSL to SPIRV compilation [\#107](https://github.com/EthicalML/vulkan-kompute/issues/107)
- Add support for push constants [\#106](https://github.com/EthicalML/vulkan-kompute/issues/106)
**Merged pull requests:**
- Resolve moving all functions from tensor HPP to CPP [\#186](https://github.com/EthicalML/vulkan-kompute/pull/186) ([axsaucedo](https://github.com/axsaucedo))
- Device Properties [\#184](https://github.com/EthicalML/vulkan-kompute/pull/184) ([alexander-g](https://github.com/alexander-g))
- Too many warnings [\#183](https://github.com/EthicalML/vulkan-kompute/pull/183) ([alexander-g](https://github.com/alexander-g))
- Add support for bool, double, int32, uint32 and float32 on Tensors via TensorT [\#177](https://github.com/EthicalML/vulkan-kompute/pull/177) ([axsaucedo](https://github.com/axsaucedo))
- Support for Timestamping [\#176](https://github.com/EthicalML/vulkan-kompute/pull/176) ([alexander-g](https://github.com/alexander-g))
- Test for ShaderResources [\#165](https://github.com/EthicalML/vulkan-kompute/pull/165) ([aliPMPAINT](https://github.com/aliPMPAINT))
- Amend memory hierarchy to enable for push constants and functional interface for more flexible operations [\#164](https://github.com/EthicalML/vulkan-kompute/pull/164) ([axsaucedo](https://github.com/axsaucedo))
- made changes for include paths for complete installation [\#163](https://github.com/EthicalML/vulkan-kompute/pull/163) ([aliPMPAINT](https://github.com/aliPMPAINT))
- Added dark mode on docs [\#157](https://github.com/EthicalML/vulkan-kompute/pull/157) ([axsaucedo](https://github.com/axsaucedo))
- Glslang implementation for online shader compilation [\#154](https://github.com/EthicalML/vulkan-kompute/pull/154) ([axsaucedo](https://github.com/axsaucedo))
- Adding test code coverage using gcov and lcov [\#149](https://github.com/EthicalML/vulkan-kompute/pull/149) ([axsaucedo](https://github.com/axsaucedo))
- Added temporary backwards compatibility for eval\_tensor\_create\_def function [\#148](https://github.com/EthicalML/vulkan-kompute/pull/148) ([axsaucedo](https://github.com/axsaucedo))
- Amend memory ownership hierarchy to have Tensor owned by Manager instead of OpCreateTensor / OpBase [\#138](https://github.com/EthicalML/vulkan-kompute/pull/138) ([axsaucedo](https://github.com/axsaucedo))
- Removed Staging Tensors in favour of having two buffer & memory in a Tensor to minimise data transfer [\#137](https://github.com/EthicalML/vulkan-kompute/pull/137) ([axsaucedo](https://github.com/axsaucedo))
## [v0.6.0](https://github.com/EthicalML/vulkan-kompute/tree/v0.6.0) (2021-01-31)
[Full Changelog](https://github.com/EthicalML/vulkan-kompute/compare/v0.5.1...v0.6.0)
@ -49,7 +119,6 @@
- Remove the template params from OpAlgoBase for dispatch layout [\#57](https://github.com/EthicalML/vulkan-kompute/issues/57)
- Enable layout to be configured dynamically within shaders [\#26](https://github.com/EthicalML/vulkan-kompute/issues/26)
- replaced "static unsigned const" to "static const unsigned" to avoid SWIG parsing error. [\#95](https://github.com/EthicalML/vulkan-kompute/pull/95) ([0x0f0f0f](https://github.com/0x0f0f0f))
- Added python bindings with kp as python module [\#88](https://github.com/EthicalML/vulkan-kompute/pull/88) ([axsaucedo](https://github.com/axsaucedo))
**Closed issues:**
@ -69,6 +138,7 @@
- Adding Python package for Kompute [\#87](https://github.com/EthicalML/vulkan-kompute/issues/87)
- Python shader extension [\#91](https://github.com/EthicalML/vulkan-kompute/pull/91) ([axsaucedo](https://github.com/axsaucedo))
- Enhanced python build [\#89](https://github.com/EthicalML/vulkan-kompute/pull/89) ([axsaucedo](https://github.com/axsaucedo))
- Added python bindings with kp as python module [\#88](https://github.com/EthicalML/vulkan-kompute/pull/88) ([axsaucedo](https://github.com/axsaucedo))
**Closed issues:**

View file

@ -1,5 +1,5 @@
cmake_minimum_required(VERSION 3.4.1)
project(kompute VERSION 0.6.0)
project(kompute VERSION 0.7.0)
set(CMAKE_CXX_STANDARD 14)
@ -20,6 +20,8 @@ option(KOMPUTE_OPT_REPO_SUBMODULE_BUILD, "Use the submodule repos instead of ext
option(KOMPUTE_OPT_ANDOID_BUILD "Enable android compilation flags required" 0)
option(KOMPUTE_OPT_DISABLE_VK_DEBUG_LAYERS "Explicitly disable debug layers even on debug" 0)
option(KOMPUTE_OPT_DISABLE_SHADER_UTILS "Remove shader util code and dependencies including glslang" 0)
option(KOMPUTE_OPT_DEPENDENCIES_SHARED_LIBS "Whether to use shared libraries for dependencies for install" 0)
option(KOMPUTE_OPT_BUILD_AS_SHARED_LIB "Whether to build kompute as shared library" 0)
# Build flags
set(KOMPUTE_EXTRA_CXX_FLAGS "" CACHE STRING "Extra compile flags for Kompute, see docs for full list")
@ -29,6 +31,10 @@ if(KOMPUTE_OPT_ENABLE_SPDLOG)
if(KOMPUTE_OPT_INSTALL)
# Enable install parameters for spdlog (overrides parameters passed)
set(SPDLOG_INSTALL ON CACHE BOOL "Enables install of spdlog" FORCE)
if(KOMPUTE_OPT_DEPENDENCIES_SHARED_LIBS)
set(SPDLOG_BUILD_SHARED ON CACHE BOOL "Enables build of shared libraries" FORCE)
endif()
endif()
endif()
@ -54,7 +60,11 @@ if(NOT KOMPUTE_OPT_DISABLE_SHADER_UTILS)
# Enable install parameters for glslang (overrides parameters passed)
# When install is enabled the glslang libraries become shared
set(ENABLE_GLSLANG_INSTALL ON CACHE BOOL "Enables install of glslang" FORCE)
set(BUILD_SHARED_LIBS ON CACHE BOOL "Enables build of shared libraries" FORCE)
# By default we enable shared library based installation
if(KOMPUTE_OPT_DEPENDENCIES_SHARED_LIBS)
set(BUILD_SHARED_LIBS ON CACHE BOOL "Enables build of shared libraries" FORCE)
endif()
endif()
else()
set(KOMPUTE_EXTRA_CXX_FLAGS "${KOMPUTE_EXTRA_CXX_FLAGS} -DKOMPUTE_DISABLE_SHADER_UTILS=1")

View file

@ -13,7 +13,7 @@ VCPKG_WIN_PATH ?= "C:\\Users\\axsau\\Programming\\lib\\vcpkg\\scripts\\buildsyst
VCPKG_UNIX_PATH ?= "/c/Users/axsau/Programming/lib/vcpkg/scripts/buildsystems/vcpkg.cmake"
# Regex to pass to catch2 to filter tests
FILTER_TESTS ?= "-TestAsyncOperations.TestManagerParallelExecution"
FILTER_TESTS ?= "-TestAsyncOperations.TestManagerParallelExecution:TestSequence.SequenceTimestamps"
ifeq ($(OS),Windows_NT) # is Windows_NT on XP, 2000, 7, Vista, 10...
CMAKE_BIN ?= "C:\Program Files\CMake\bin\cmake.exe"
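
The updated `FILTER_TESTS` value above excludes two tests at once. The Makefile comment mentions catch2, but the pattern has the shape of a GoogleTest `--gtest_filter` expression, so the sketch below assumes GoogleTest semantics: colon-separated positive patterns, then an optional `-` followed by colon-separated negative patterns. The helper `is_selected` and the test name `TestSequence.Other` are hypothetical and only illustrate how such a filter selects tests. `fnmatchcase` is used as a stand-in for GoogleTest's `*` and `?` wildcards.

```python
from fnmatch import fnmatchcase


def is_selected(test_name, gtest_filter):
    """Return True if test_name would run under a GoogleTest-style filter."""
    # Everything before the first '-' is positive, everything after is negative.
    positive, _, negative = gtest_filter.partition("-")
    # An empty positive section means "match every test".
    positive_patterns = [p for p in positive.split(":") if p] or ["*"]
    negative_patterns = [p for p in negative.split(":") if p]
    matches_positive = any(fnmatchcase(test_name, p) for p in positive_patterns)
    matches_negative = any(fnmatchcase(test_name, p) for p in negative_patterns)
    return matches_positive and not matches_negative


FILTER_TESTS = (
    "-TestAsyncOperations.TestManagerParallelExecution"
    ":TestSequence.SequenceTimestamps"
)

for name in (
    "TestAsyncOperations.TestManagerParallelExecution",
    "TestSequence.SequenceTimestamps",
    "TestSequence.Other",  # hypothetical test name
):
    print(name, is_selected(name, FILTER_TESTS))
```

Under these assumptions both named tests are skipped and every other test still runs.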
@ -57,7 +57,6 @@ MK_KOMPUTE_EXTRA_CXX_FLAGS ?= ""
mk_cmake:
cmake \
-Bbuild \
$(MK_CMAKE_EXTRA_FLAGS) \
-DKOMPUTE_EXTRA_CXX_FLAGS=$(MK_KOMPUTE_EXTRA_CXX_FLAGS) \
-DCMAKE_BUILD_TYPE=$(MK_BUILD_TYPE) \
-DCMAKE_INSTALL_PREFIX=$(MK_INSTALL_PATH) \
@ -69,6 +68,7 @@ mk_cmake:
-DKOMPUTE_OPT_BUILD_SINGLE_HEADER=1 \
-DKOMPUTE_OPT_ENABLE_SPDLOG=1 \
-DKOMPUTE_OPT_CODE_COVERAGE=1 \
$(MK_CMAKE_EXTRA_FLAGS) \
-G "Unix Makefiles"
mk_build_all:
@ -163,6 +163,9 @@ generate_python_docstrings:
python -m pybind11_mkdoc \
-o python/src/docstrings.hpp \
single_include/kompute/Kompute.hpp \
-Iexternal/fmt/include/ \
-Iexternal/spdlog/include/ \
-Iexternal/glslang/ \
-I/usr/include/c++/7.5.0/
install_python_reqs:
@ -196,4 +199,4 @@ format:
build_changelog:
docker run --rm -it -v "$(PWD)":/usr/local/src/your-app -e CHANGELOG_GITHUB_TOKEN=${CHANGELOG_GITHUB_TOKEN} ferrarimarco/github-changelog-generator:1.15.2 -u EthicalML -p vulkan-kompute
chmod 664 CHANGELOG.md # (Read+Write, Read+Write, Read)
sed -i -e 's/\(HEAD\|Unreleased\)/v0.6.0/g' CHANGELOG.md # Replacing unreleased version with latest tag
sed -i -e 's/\(HEAD\|Unreleased\)/v${VERSION}/g' CHANGELOG.md # Replacing unreleased version with latest tag
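
The new `sed` line replaces the hard-coded `v0.6.0` with a `VERSION` variable. As a rough illustration of what that substitution does to the generated changelog, here is an equivalent sketch in Python. The function name `tag_unreleased` and the sample heading are assumptions made for illustration, not part of the Makefile.

```python
import re


def tag_unreleased(changelog_text, version):
    """Replace every 'HEAD' or 'Unreleased' token with 'v<version>'."""
    # A callable replacement avoids any escape handling in the version string.
    return re.sub(r"(HEAD|Unreleased)", lambda _match: "v" + version, changelog_text)


# Hypothetical heading of the kind a changelog generator emits before tagging.
sample = "## [Unreleased](https://github.com/EthicalML/vulkan-kompute/tree/HEAD)"
print(tag_unreleased(sample, "0.7.0"))
# ## [v0.7.0](https://github.com/EthicalML/vulkan-kompute/tree/v0.7.0)
```

Note that the pattern is not anchored, so any other occurrence of `HEAD` or `Unreleased` in the changelog body is rewritten as well.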

192
README.md
View file

@ -34,8 +34,8 @@
* [Mobile enabled](#mobile-enabled) with examples via Android NDK across several architectures
* BYOV: [Bring-your-own-Vulkan design](#motivations) to play nice with existing Vulkan applications
* Explicit relationships for GPU and host [memory ownership and memory management](https://kompute.cc/overview/memory-management.html)
* [Hands on examples](#simple-examples) showing the core features
* Longer tutorials for [machine learning 🤖](https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a), [mobile development 📱](https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617) and [game development 🎮](https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0).
* Robust codebase with [90% unit test code coverage](https://kompute.cc/codecov/)
* Advanced use-cases on [machine learning 🤖](https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a), [mobile development 📱](https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617) and [game development 🎮](https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0).
![](https://raw.githubusercontent.com/ethicalml/vulkan-kompute/master/docs/images/komputer-logos.gif)
@ -48,43 +48,90 @@ Below you can find a GPU multiplication example using the C++ and Python Kompute
The C++ interface provides low level access to the native components of Kompute and Vulkan, enabling for [advanced optimizations](https://kompute.cc/overview/async-parallel.html) as well as [extension of components](https://kompute.cc/overview/reference.html).
```c++
int main() {
// 1. Create Kompute Manager with default settings (device 0 and first compute compatible queue)
void kompute(const std::string& shader) {
// 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
kp::Manager mgr;
// 2. Create and initialise Kompute Tensors through manager
// Default tensor constructor simplifies creation of float values
auto tensorInA = mgr.tensor({ 2., 2., 2. });
auto tensorInB = mgr.tensor({ 1., 2., 3. });
auto tensorOut = mgr.tensor({ 0., 0., 0. });
// Explicit type constructor supports uint32, int32, double, float and bool
auto tensorOutA = mgr.tensorT<uint32_t>({ 0, 0, 0 });
auto tensorOutB = mgr.tensorT<uint32_t>({ 0, 0, 0 });
// 3. Specify "multiply shader" code (can also be raw string, spir-v bytes or file path)
std::string shaderString = (R"(
std::vector<std::shared_ptr<kp::Tensor>> params = {tensorInA, tensorInB, tensorOutA, tensorOutB};
// 3. Create algorithm based on shader (supports buffers & push/spec constants)
kp::Workgroup workgroup({3, 1, 1});
kp::Constants specConsts({ 2 });
kp::Constants pushConstsA({ 2.0 });
kp::Constants pushConstsB({ 3.0 });
auto algorithm = mgr.algorithm(params,
kp::Shader::compile_source(shader),
workgroup,
specConsts,
pushConstsA);
// 4. Run operation synchronously using sequence
mgr.sequence()
->record<kp::OpTensorSyncDevice>(params)
->record<kp::OpAlgoDispatch>(algorithm) // Binds default push consts
->eval() // Evaluates the two recorded operations
->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts
->eval(); // Evaluates only last recorded operation
// 5. Sync results from the GPU asynchronously
auto sq = mgr.sequence();
sq->evalAsync<kp::OpTensorSyncLocal>(params);
// ... Do other work asynchronously whilst GPU finishes
sq->evalAwait();
// Prints the first output which is: { 4, 8, 12 }
for (const float& elem : tensorOutA->data()) std::cout << elem << " ";
// Prints the second output which is: { 10, 10, 10 }
for (const float& elem : tensorOutB->data()) std::cout << elem << " ";
} // Manages / releases all CPU and GPU memory resources
int main() {
// Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
// files). This shader shows some of the main components including constants, buffers, etc
std::string shader = (R"(
#version 450
layout (local_size_x = 1) in;
// The input tensors bind index is relative to index in parameter passed
layout(set = 0, binding = 0) buffer bina { float tina[]; };
layout(set = 0, binding = 1) buffer binb { float tinb[]; };
layout(set = 0, binding = 2) buffer bout { float tout[]; };
layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };
// Kompute supports push constants updated on dispatch
layout(push_constant) uniform PushConstants {
float val;
} push_const;
// Kompute also supports spec constants on initialization
layout(constant_id = 0) const float const_one = 0;
void main() {
uint index = gl_GlobalInvocationID.x;
tout[index] = tina[index] * tinb[index];
out_a[index] += uint( in_a[index] * in_b[index] );
out_b[index] += uint( const_one * push_const.val );
}
)");
// 3. Run operation with string shader synchronously
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorInA, tensorInB, tensorOut },
kp::Shader::compile_source(shaderString));
// 4. Map results back from GPU memory to print the results
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorInA, tensorInB, tensorOut });
// Prints the output which is Output: { 2, 4, 6 }
for (const float& elem : tensorOut->data()) std::cout << elem << " ";
// Run the function declared above with our raw string shader
kompute(shader);
}
```
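
The output values quoted in the comments of the example can be checked by hand. The following is a minimal CPU-only sketch in plain Python (no Kompute or GPU involved) that mirrors the shader arithmetic, assuming the algorithm is dispatched twice as recorded in step 4: once with the default push constant 2.0 and once with the override 3.0, with a spec constant of 2. The function name `simulate_dispatch` is hypothetical.

```python
# CPU-only sketch of the shader arithmetic; this is not the Kompute API.
def simulate_dispatch(in_a, in_b, out_a, out_b, spec_const, push_const):
    """Apply one dispatch of the example shader to Python lists, in place."""
    for index in range(len(in_a)):
        out_a[index] += int(in_a[index] * in_b[index])
        out_b[index] += int(spec_const * push_const)


in_a = [2.0, 2.0, 2.0]
in_b = [1.0, 2.0, 3.0]
out_a = [0, 0, 0]
out_b = [0, 0, 0]

# The first dispatch uses the default push constant, the second overrides it.
for push_const in (2.0, 3.0):
    simulate_dispatch(in_a, in_b, out_a, out_b, spec_const=2.0, push_const=push_const)

print(out_a)  # [4, 8, 12]
print(out_b)  # [10, 10, 10]
```

Because both outputs accumulate with `+=`, the second dispatch adds to the result of the first, which is why the first output is twice the element-wise product.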
@ -94,34 +141,85 @@ int main() {
The [Python package](https://kompute.cc/overview/python-package.html) provides a [high level interactive interface](https://kompute.cc/overview/python-reference.html) that enables for experimentation whilst ensuring high performance and fast development workflows.
```python
# 1. Create Kompute Manager with default settings (device 0 and first compute compatible queue)
mgr = Manager()
# 2. Create and initialise Kompute Tensors (can be initialized with List[] or np.Array)
tensor_in_a = Tensor([2, 2, 2])
tensor_in_b = Tensor([1, 2, 3])
tensor_out = Tensor([0, 0, 0])
def kompute(shader):
# 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
mgr = kp.Manager()
mgr.eval_tensor_create_def([tensor_in_a, tensor_in_b, tensor_out])
# 2. Create and initialise Kompute Tensors through manager
# 3. Specify "multiply shader" code (can also be raw string, spir-v bytes or file path)
@python2shader
def compute_shader_multiply(index=("input", "GlobalInvocationId", ivec3),
data1=("buffer", 0, Array(f32)),
data2=("buffer", 1, Array(f32)),
data3=("buffer", 2, Array(f32))):
i = index.x
data3[i] = data1[i] * data2[i]
# Default tensor constructor simplifies creation of float values
tensor_in_a = mgr.tensor([2, 2, 2])
tensor_in_b = mgr.tensor([1, 2, 3])
# Explicit type constructor supports uint32, int32, double, float and bool
tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
# 4. Run multiplication operation synchronously
mgr.eval_algo_data_def(
[tensor_in_a, tensor_in_b, tensor_out], compute_shader_multiply.to_spirv())
params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]
# 5. Map results back from GPU memory to print the results
mgr.eval_tensor_sync_local_def([tensor_out])
# 3. Create algorithm based on shader (supports buffers & push/spec constants)
workgroup = (3, 1, 1)
spec_consts = [2]
push_consts_a = [2]
push_consts_b = [3]
spirv = kp.Shader.compile_source(shader)
algo = mgr.algorithm(params, spirv, workgroup, spec_consts, push_consts_a)
# 4. Run operation synchronously using sequence
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(algo)) # Binds default push consts provided
.eval() # evaluates the two recorded ops
.record(kp.OpAlgoDispatch(algo, push_consts_b)) # Overrides push consts
.eval()) # evaluates only the last recorded op
# 5. Sync results from the GPU asynchronously
sq = mgr.sequence()
sq.eval_async(kp.OpTensorSyncLocal(params))
# ... Do other work asynchronously whilst GPU finishes
sq.eval_await()
# Prints the first output which is: { 4, 8, 12 }
print(tensor_out_a)
# Prints the second output which is: { 10, 10, 10 }
print(tensor_out_b)
if __name__ == "__main__":
# Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
# files). This shader shows some of the main components including constants, buffers, etc
shader = """
#version 450
layout (local_size_x = 1) in;
// The input tensors bind index is relative to index in parameter passed
layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };
// Kompute supports push constants updated on dispatch
layout(push_constant) uniform PushConstants {
float val;
} push_const;
// Kompute also supports spec constants on initialization
layout(constant_id = 0) const float const_one = 0;
void main() {
uint index = gl_GlobalInvocationID.x;
out_a[index] += uint( in_a[index] * in_b[index] );
out_b[index] += uint( const_one * push_const.val );
}
"""
kompute(shader)
# Prints [2.0, 4.0, 6.0]
print(tensor_out.data())
```
### Interactive Notebooks & Hands on Videos
@ -199,7 +297,7 @@ The core architecture of Kompute includes the following:
* [Kompute Sequence](https://kompute.cc/overview/reference.html#sequence) - Container of operations that can be sent to GPU as batch
* [Kompute Operation (Base)](https://kompute.cc/overview/reference.html#algorithm) - Base class from which all operations inherit
* [Kompute Tensor](https://kompute.cc/overview/reference.html#tensor) - Tensor structured data used in GPU operations
* [Kompute Algorithm](https://kompute.cc/overview/reference.html#algorithm) - Abstraction for (shader) code executed in the GPU
* [Kompute Algorithm](https://kompute.cc/overview/reference.html#algorithm) - Abstraction for (shader) logic executed in the GPU
To see a full breakdown you can read further in the [C++ Class Reference](https://kompute.cc/overview/reference.html).
@ -342,6 +440,12 @@ We appreciate PRs and Issues. If you want to contribute try checking the "Good f
* Uses doxygen and sphinx for documentation and autodocs
* Uses vcpkg for finding the dependencies, it's the recommended set up to retrieve the libraries
If you want to run with debug layers you can add them with the `KOMPUTE_ENV_DEBUG_LAYERS` parameter as:
```
export KOMPUTE_ENV_DEBUG_LAYERS="VK_LAYER_LUNARG_api_dump"
```
##### Updating documentation
To update the documentation you will need to:

View file

@ -1 +1 @@
0.6.0
0.7.0

View file

@ -46,6 +46,9 @@ a:hover {
.md-nav__item a:hover {
color: #0091ea;
}
.md-nav__item a[data-md-state="blur"] {
color: #1a7c80;
}
.md-source {
color: #fff;

View file

@ -27,7 +27,7 @@ html_title = "Vulkan Kompute Documentation (Python & C++)"
author = 'Alejandro Saucedo'
# The full version, including alpha/beta/rc tags
release = '0.6.0'
release = '0.7.0'
# -- General configuration ---------------------------------------------------

Binary file not shown (image; before: 262 KiB, after: 214 KiB).

View file

@ -10,13 +10,9 @@ The power of Kompute comes in when the interface is used for complex computation
Simple examples
^^^^^^^^^^^^^^^
* `Pass shader as raw string <#simple-shader-example>`_
* `Record batch commands with a Kompute Sequence <#record-batch-commands>`_
* `Create your custom Kompute Operations <#your-custom-kompute-operation>`_
* `Run Asynchronous Operations <#asynchronous-operations>`_
* `Run Parallel Operations Across Multiple GPU Queues <#parallel-operations>`_
* `Create your custom Kompute Operations <#your-custom-kompute-operation>`_
* `Implementing logistic regression from scratch <#logistic-regression-example>`_
End-to-end examples
^^^^^^^^^^^^^^^^^^^
@ -27,270 +23,63 @@ End-to-end examples
* `Android NDK Mobile Kompute ML Application <https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617>`_
* `Game Development Kompute ML in Godot Engine <https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0>`_
Add Vulkan Extensions
^^^^^^^^^^^^^^^^^^^^^
Simple Shader Example
~~~~~~~~~~~~~~~~~~~~~
Kompute provides a simple way to add Vulkan extensions through kp::Manager initialisation. When debug is enabled, the logs show which extensions were requested and which of them were actually added, based on the extensions available in the current driver.
Pass compute shader data in glsl/hlsl text or compiled SPIR-V format (or as path to the file). Back to `examples list <#simple-examples>`_.
.. code-block:: cpp
:linenos:
int main() {
// You can allow Kompute to create the Vulkan components, or pass your existing ones
kp::Manager mgr; // Selects device 0 unless explicitly requested
// Creates tensor an initializes GPU memory (below we show more granularity)
auto tensorA = std::make_shared<kp::Tensor>(kp::Tensor({ 3., 4., 5. }));
auto tensorB = std::make_shared<kp::Tensor>(kp::Tensor({ 0., 0., 0. }));
// Create tensors data explicitly in GPU with an operation
mgr.rebuild({ tensorA, tensorB });
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
layout(set = 0, binding = 1) buffer b { float pb[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pb[index] = pa[index];
pa[index] = index;
}
)");
// Run Kompute operation on the parameters provided with dispatch layout
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorA, tensorB },
kp::Shader::compile_source(shader));
// Sync the GPU memory back to the local tensor
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
// Prints the output which is A: { 0, 1, 2 } B: { 3, 4, 5 }
std::cout << fmt::format("A: {}, B: {}",
tensorA.data(), tensorB.data()) << std::endl;
}
Record batch commands
~~~~~~~~~~~~~~~~~~~~~
Record commands in a single submit by using a Sequence to send in batch to GPU. Back to `examples list <#simple-examples>`_
The example below shows how to enable the "VK_EXT_shader_atomic_float" extension so that atomicAdd can be used on floats in the shader.
.. code-block:: cpp
:linenos:
int main() {
std::string shader(R"(
#version 450
kp::Manager mgr;
#extension GL_EXT_shader_atomic_float: enable
std::shared_ptr<kp::Tensor> tensorLHS{ new kp::Tensor({ 1., 1., 1. }) };
std::shared_ptr<kp::Tensor> tensorRHS{ new kp::Tensor({ 2., 2., 2. }) };
std::shared_ptr<kp::Tensor> tensorOutput{ new kp::Tensor({ 0., 0., 0. }) };
layout(push_constant) uniform PushConstants {
float x;
float y;
float z;
} pcs;
// Create all the tensors in memory
mgr.evalOpDefault<kp::OpCreateTensor>({tensorLHS, tensorRHS, tensorOutput});
layout (local_size_x = 1) in;
// Create a new sequence
std::weak_ptr<kp::Sequence> sqWeakPtr = mgr.sequence();
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
atomicAdd(pa[0], pcs.x);
atomicAdd(pa[1], pcs.y);
atomicAdd(pa[2], pcs.z);
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sq = nullptr;
if (std::shared_ptr<kp::Sequence> sq = sqWeakPtr.lock())
{
// Begin recording commands
sq->begin();
kp::Manager mgr(0, {}, { "VK_EXT_shader_atomic_float" });
// Record batch commands to send to GPU
sq->record<kp::OpMult>({ tensorLHS, tensorRHS, tensorOutput });
sq->record<kp::OpTensorCopy>({tensorOutput, tensorLHS, tensorRHS});
std::shared_ptr<kp::Tensor> tensor = mgr.tensor({ 0, 0, 0 });
// Stop recording
sq->end();
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm({ tensor }, spirv, kp::Workgroup({ 1 }), {}, { 0.0, 0.0, 0.0 });
// Submit multiple batch operations to GPU
size_t ITERATIONS = 5;
for (size_t i = 0; i < ITERATIONS; i++) {
sq->eval();
}
sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>({ tensor })
->record<kp::OpAlgoDispatch>(algo,
kp::Constants{ 0.1, 0.2, 0.3 })
->record<kp::OpAlgoDispatch>(algo,
kp::Constants{ 0.3, 0.2, 0.1 })
->record<kp::OpTensorSyncLocal>({ tensor })
->eval();
// Sync GPU memory back to local tensor
sq->begin();
sq->record<kp::OpTensorSyncLocal>({tensorOutput});
sq->end();
sq->eval();
EXPECT_EQ(tensor->data(), kp::Constants({ 0.4, 0.4, 0.4 }));
}
// Print the output which iterates through OpMult 5 times
// in this case the output is {32, 32 , 32}
std::cout << fmt::format("Output: {}", tensorOutput.data()) << std::endl;
}
Asynchronous Operations
~~~~~~~~~~~~~~~~~~~~~~~
You can submit operations asynchronously with the async/await commands in the kp::Manager and kp::Sequence, which provides granularity on waiting on the vk::Fence. Back to `examples list <#simple-examples>`_
.. code-block:: cpp
:linenos:
int main() {
// You can allow Kompute to create the Vulkan components, or pass your existing ones
kp::Manager mgr; // Selects device 0 unless explicitly requested
// Creates tensor an initializes GPU memory (below we show more granularity)
auto tensor = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
// Create tensors data explicitly in GPU with an operation
mgr.rebuild(tensor)
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
std::vector<uint32_t> spirv = kp::Shader::compile_source(shader);
// We can now await for the previous submitted command
// The first parameter can be the amount of time to wait
// The time provided is in nanoseconds
mgr.evalOpAwaitDefault(10000);
// Run Async Kompute operation on the parameters provided
mgr.evalOpAsyncDefault<kp::OpAlgoBase>(
{ tensor },
spirv);
// Here we can do other work
// When we're ready we can wait
// The default wait time is UINT64_MAX
mgr.evalOpAwaitDefault()
// Sync the GPU memory back to the local tensor
// We can still run synchronous jobs in our created sequence
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensor });
// Prints the output: B: { 100000000, ... }
std::cout << fmt::format("B: {}",
tensor->data()) << std::endl;
}
Parallel Operations
~~~~~~~~~~~~~~~~~~~
Besides submitting operations asynchronously, you can also leverage the underlying GPU compute queues to process operations in parallel.
This depends on your underlying graphics card. For example, on NVIDIA graphics cards, operations submitted across queues within one family do not run in parallel, but operations submitted across different queue families can.
Below we show how you can parallelize operations on an `NVIDIA 1650 <http://vulkan.gpuinfo.org/displayreport.php?id=9700#queuefamilies>`_\ , which has a ``GRAPHICS+COMPUTE`` family on ``index 0``\ , and a ``COMPUTE`` family on ``index 2``.
Back to `examples list <#simple-examples>`_.
.. code-block:: cpp
:linenos:
int main() {
// In this case we select device 0, and for queues, one queue from familyIndex 0
// and one queue from familyIndex 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndices = {0, 2};
// We create a manager with device index, and queues by queue family index
kp::Manager mgr(deviceIndex, familyIndices);
// We need to create explicit sequences with their respective queues
// The second parameter is the index in the familyIndex array which is relative
// to the vector we created the manager with.
mgr.sequence("queueOne", 0);
mgr.sequence("queueTwo", 1);
// Creates tensors and initializes GPU memory (below we show more granularity)
auto tensorA = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
auto tensorB = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
// We run the first step synchronously on the default sequence
mgr.rebuild({ tensorA, tensorB });
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
std::vector<uint32_t> spirv = kp::Shader::compile_source(shader);
// Run the first parallel operation in the `queueOne` sequence
mgr.evalOpAsync<kp::OpAlgoBase>(
{ tensorA },
"queueOne",
spirv);
// Run the second parallel operation in the `queueTwo` sequence
mgr.evalOpAsync<kp::OpAlgoBase>(
{ tensorB },
"queueTwo",
spirv);
// Here we can do other work
// We can now wait for the two parallel tasks to finish
mgr.evalOpAwait("queueOne");
mgr.evalOpAwait("queueTwo");
// Sync the GPU memory back to the local tensor
mgr.evalOp<kp::OpTensorSyncLocal>({ tensorA, tensorB });
// Prints the output: A: 100000000 B: 100000000
std::cout << fmt::format("A: {}, B: {}",
tensorA->data()[0], tensorB->data()[0]) << std::endl;
}
Your Custom Kompute Operation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -302,17 +91,47 @@ We also provide tools that allow you to `convert shaders into C++ headers <https
.. code-block:: cpp
:linenos:
class OpMyCustom : public OpAlgoBase
class OpMyCustom : public OpAlgoDispatch
{
public:
OpMyCustom(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors)
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, "")
OpMyCustom(std::vector<std::shared_ptr<Tensor>> tensors,
std::shared_ptr<kp::Algorithm> algorithm)
: OpAlgoDispatch(algorithm)
{
// Perform your custom steps such as reading from a shader file
this->mShaderFilePath = "shaders/glsl/opmult.comp.spv";
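if (tensors.size() != 3) {
// std::to_string is required: adding an integer to a string literal is pointer arithmetic
throw std::runtime_error("Kompute OpMyCustom expected 3 tensors but got " + std::to_string(tensors.size()));
}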
std::vector<uint32_t> spirv = kp::Shader::compileSource(R"(
#version 450
layout(set = 0, binding = 0) buffer tensorLhs {
float valuesLhs[ ];
};
layout(set = 0, binding = 1) buffer tensorRhs {
float valuesRhs[ ];
};
layout(set = 0, binding = 2) buffer tensorOutput {
float valuesOutput[ ];
};
layout (constant_id = 0) const uint LEN_LHS = 0;
layout (constant_id = 1) const uint LEN_RHS = 0;
layout (constant_id = 2) const uint LEN_OUT = 0;
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
)");
algorithm->rebuild(tensors, spirv);
}
};
@ -322,19 +141,260 @@ We also provide tools that allow you to `convert shaders into C++ headers <https
kp::Manager mgr; // Automatically selects Device 0
// Create 3 tensors of default type float
auto tensorLhs = std::make_shared<kp::Tensor>(kp::Tensor({ 0., 1., 2. }));
auto tensorRhs = std::make_shared<kp::Tensor>(kp::Tensor({ 2., 4., 6. }));
auto tensorOut = std::make_shared<kp::Tensor>(kp::Tensor({ 0., 0., 0. }));
auto tensorLhs = mgr.tensor({ 0., 1., 2. });
auto tensorRhs = mgr.tensor({ 2., 4., 6. });
auto tensorOut = mgr.tensor({ 0., 0., 0. });
// Create tensors data explicitly in GPU with an operation
mgr.rebuild({ tensorLhs, tensorRhs, tensorOut });
// Run Kompute operation on the parameters provided with dispatch layout
mgr.evalOpDefault<kp::OpMyCustom<3, 1, 1>>(
{ tensorLhs, tensorRhs, tensorOut });
mgr.sequence()
->record<kp::OpTensorSyncDevice>({tensorLhs, tensorRhs, tensorOut})
->record<kp::OpMyCustom>({tensorLhs, tensorRhs, tensorOut}, mgr.algorithm())
->record<kp::OpTensorSyncLocal>({tensorLhs, tensorRhs, tensorOut})
->eval();
// Prints the output which is { 0, 4, 12 }
std::cout << fmt::format("Output: {}", tensorOut->data()) << std::endl;
}
Async/Await Example
^^^^^^^^^^^^^^^^^^^^^
A simple example of asynchronous submission can be found below.
First we are able to create the manager as we normally would.
.. code-block:: cpp
:linenos:
// You can allow Kompute to create the Vulkan components, or pass your existing ones
kp::Manager mgr; // Selects device 0 unless explicitly requested
// Creates tensor and initializes GPU memory (below we show more granularity)
auto tensor = mgr.tensor(10, 0.0);
We can now run our first asynchronous command, which in this case uses the default sequence.
Sequences can be executed synchronously or asynchronously without having to change anything.
.. code-block:: cpp
:linenos:
// Create tensors data explicitly in GPU with an operation
mgr.sequence()->eval<kp::OpTensorSyncDevice>({tensor});
While this is running we can do other work, such as creating the shader we'll be using.
In this case we create a shader that loops 100,000,000 times to simulate a long-running process.
.. code-block:: cpp
:linenos:
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
auto algo = mgr.algorithm({tensor}, kp::Shader::compileSource(shader));
Now we are able to run the await function on the default sequence.
If we are using the manager, we need to make sure that we await the same sequence that was triggered asynchronously.
If the sequence is not running or has already finished, the await returns immediately.
The parameter provided is the maximum amount of time to wait, in nanoseconds. If the timeout expires, the await returns ``false``, but this does not stop the processing on the GPU, which continues as normal.
.. code-block:: cpp
:linenos:
auto sq = mgr.sequence();
// Run Async Kompute operation on the parameters provided
sq->evalAsync<kp::OpAlgoDispatch>(algo);
// Here we can do other work
// When we're ready we can wait
// The default wait time is UINT64_MAX
sq->evalAwait();
Finally, below you can see that we can also run synchronous commands without having to change anything.
.. code-block:: cpp
:linenos:
// Sync the GPU memory back to the local tensor
// We can still run synchronous jobs in our created sequence
sq->eval<kp::OpTensorSyncLocal>({ tensor });
// Prints the output: B: { 100000000, ... }
std::cout << fmt::format("B: {}",
tensor->data()) << std::endl;
Parallel Operation Submission
-----------------------------
In order to work with parallel execution of tasks, it is important to understand some of the core GPU processing limitations. These can be quite broad and hardware dependent, which means they will vary across NVIDIA, AMD and other vendors' video cards.
Conceptual Overview
^^^^^^^^^^^^^^^^^^^^^
If you are familiar with Vulkan, you will know that one of the first things you do is fetch the physical queues from the device. Queues tend to have three main capabilities: GRAPHICS, TRANSFER and COMPUTE (among a few others we'll skip for simplicity).
A queue can have several capabilities at once, for example GRAPHICS+TRANSFER+COMPUTE. Now here comes the key point: the underlying hardware may (or may not) support parallelized processing at multiple levels.
Let's take a tangible example. The `NVIDIA 1650 <http://vulkan.gpuinfo.org/displayreport.php?id=9700#queuefamilies>`_ has 16 ``GRAPHICS+TRANSFER+COMPUTE`` queues in ``familyIndex 0``, 2 ``TRANSFER`` queues in ``familyIndex 1`` and 8 ``COMPUTE+TRANSFER`` queues in ``familyIndex 2``.
With this in mind, the NVIDIA 1650 as of today does not support intra-family parallelization, which means that if you were to submit commands to multiple queues of the same family, these would still be executed sequentially.
However, the NVIDIA 1650 does support inter-family parallelization, which means that if we were to submit commands across multiple queues from different families, these would execute in parallel.
This means that we can execute parallel workloads as long as we run them across multiple queue families. This is one of the reasons why Vulkan Kompute enables users to explicitly select the underlying queues and queue families to run particular workloads on.
It is important to understand the capabilities and limitations of your hardware, as parallelization capabilities can vary. You will want to account for potential discrepancies in processing structures, mainly to avoid unexpected race conditions.
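To make the queue family layout above concrete, the following is a minimal sketch that picks out the families able to run compute work. It assumes a hypothetical ``QueueFamily`` struct and local capability constants rather than the real Vulkan headers, and the ``nvidia1650Families`` helper simply encodes the numbers quoted above.

```cpp
#include <cstdint>
#include <vector>

// Capability bits local to this sketch (stand-ins for Vulkan queue flags).
constexpr uint32_t kGraphics = 1u << 0;
constexpr uint32_t kCompute = 1u << 1;
constexpr uint32_t kTransfer = 1u << 2;

// Hypothetical description of one queue family.
struct QueueFamily
{
    uint32_t flags;      // Bitwise OR of the capability bits above
    uint32_t queueCount; // Number of queues exposed by the family
};

// Returns the indices of every family that supports compute work.
std::vector<uint32_t> computeFamilyIndices(const std::vector<QueueFamily>& families)
{
    std::vector<uint32_t> result;
    for (uint32_t i = 0; i < families.size(); i++) {
        if (families[i].flags & kCompute) {
            result.push_back(i);
        }
    }
    return result;
}

// The NVIDIA 1650 layout as described in the text.
std::vector<QueueFamily> nvidia1650Families()
{
    return {
        { kGraphics | kTransfer | kCompute, 16 }, // familyIndex 0
        { kTransfer, 2 },                         // familyIndex 1
        { kCompute | kTransfer, 8 },              // familyIndex 2
    };
}
```

For this layout the compute-capable families are 0 and 2, which are the two families used in the parallel example.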
Parallel Execution Example
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this example we demonstrate how you can set up parallel processing across two queue families, which can approach a 2x speedup for workloads like the one below.
To start, we create the manager with extra parameters: the GPU device index we want to use, together with the array of queue family indices for the queues we want to enable.
In this case we use only two queues. As per the section above, these come from familyIndex 0, which is of type ``GRAPHICS+COMPUTE+TRANSFER``, and familyIndex 2, which is of type ``COMPUTE+TRANSFER``.
Based on the specifications of the NVIDIA 1650, we could define up to 16 graphics queues (familyIndex 0), 2 transfer queues (familyIndex 1), and 8 compute queues (familyIndex 2) in no particular order. This means that we could have something like ``{ 0, 1, 1, 2, 2, 2, 0, ... }`` as our initialization value.
You will want to keep track of the order of the indices you initialize your manager with, as you will refer back to this ordering when creating sequences with particular queues.
.. code-block:: cpp
:linenos:
// In this case we select device 0, and for queues, one queue from familyIndex 0
// and one queue from familyIndex 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndices = {0, 2};
// We create a manager with device index, and queues by queue family index
kp::Manager mgr(deviceIndex, familyIndices);
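The per-family queue limits mentioned above can be checked before creating the manager. The following is a minimal sketch of such a check; ``familyIndicesFit`` is a hypothetical helper, not part of Kompute, and the queue counts are assumed to be supplied by the caller.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical check (not part of Kompute): returns true when no family is
// requested more times than it has queues. `queuesPerFamily[i]` is the
// number of queues available in familyIndex i.
bool familyIndicesFit(const std::vector<uint32_t>& familyIndices,
                      const std::vector<uint32_t>& queuesPerFamily)
{
    std::map<uint32_t, uint32_t> requested;
    for (uint32_t family : familyIndices) {
        if (family >= queuesPerFamily.size()) {
            return false; // Unknown family index
        }
        if (++requested[family] > queuesPerFamily[family]) {
            return false; // More queues requested than the family exposes
        }
    }
    return true;
}
```

With the counts ``{ 16, 2, 8 }`` from the NVIDIA 1650, both ``{ 0, 2 }`` and ``{ 0, 1, 1, 2, 2, 2, 0 }`` pass, while requesting three queues from familyIndex 1 does not.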
We are now able to create sequences with a particular queue.
By default the Kompute Manager is created with device 0, and with a single queue of the first compatible familyIndex. Similarly, by default sequences are created with the first available queue.
In this case we are able to specify which queue we want to use. Below we initialize the ``sqOne`` sequence with the graphics family queue, and ``sqTwo`` with the compute family queue.
It's worth mentioning that you can have multiple sequences referencing the same queue.
.. code-block:: cpp
:linenos:
// We need to create explicit sequences with their respective queues
// The parameter is the index into the familyIndices vector
// that we created the manager with.
auto sqOne = mgr.sequence(0);
auto sqTwo = mgr.sequence(1);
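The index passed when creating a sequence is a position in the vector the manager was created with, not a queue family index. A minimal sketch of that lookup, using a hypothetical ``familyForQueueIndex`` helper that is not part of Kompute:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical lookup (not part of Kompute): maps the position used when
// creating a sequence to the queue family it refers to.
// Throws std::out_of_range if the position is not in the vector.
uint32_t familyForQueueIndex(const std::vector<uint32_t>& familyIndices,
                             uint32_t queueIndex)
{
    return familyIndices.at(queueIndex);
}
```

With ``familyIndices = {0, 2}``, position 0 refers to familyIndex 0 and position 1 refers to familyIndex 2.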
We create the tensors without modifications.
.. code-block:: cpp
:linenos:
// Creates tensors and initializes GPU memory (below we show more granularity)
auto tensorA = mgr.tensor(std::vector<float>(10, 0.0));
auto tensorB = mgr.tensor(std::vector<float>(10, 0.0));
// Copies the data into GPU memory
mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensorA, tensorB });
Similar to the asynchronous use case above, we can still run synchronous commands without modifications.
.. code-block:: cpp
:linenos:
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm({tensorA, tensorB}, spirv);
Now we can actually trigger the parallel processing, running two ``OpAlgoDispatch`` operations, each in a different sequence / queue.
.. code-block:: cpp
:linenos:
// Run the first parallel operation in the `queueOne` sequence
sqOne->evalAsync<kp::OpAlgoDispatch>(algo);
// Run the second parallel operation in the `queueTwo` sequence
sqTwo->evalAsync<kp::OpAlgoDispatch>(algo);
Similar to the asynchronous example above, we are able to do other work whilst the tasks are executing.
We can wait for the tasks to complete by calling ``evalAwait`` on the respective sequence.
.. code-block:: cpp
:linenos:
// Here we can do other work
// We can now wait for the two parallel tasks to finish
sqOne->evalAwait();
sqTwo->evalAwait();
// Sync the GPU memory back to the local tensor
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorA, tensorB });
// Prints the output: A: 100000000 B: 100000000
std::cout << fmt::format("A: {}, B: {}",
tensorA->data()[0], tensorB->data()[0]) << std::endl;

View file

@ -40,257 +40,8 @@ One important thing to bare in mind when using asynchronous submissions, is that
The reason why this is important is that the Await function not only waits for the fence, but also runs the `postEval` functions across all operations, which is required for several operations.
Async/Await Example
^^^^^^^^^^^^^^^^^^^^^
Async and Parallel Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A simple example of asynchronous submission can be found below.
First we are able to create the manager as we normally would.
.. code-block:: cpp
:linenos:
// You can allow Kompute to create the Vulkan components, or pass your existing ones
kp::Manager mgr; // Selects device 0 unless explicitly requested
// Creates tensor and initializes GPU memory (below we show more granularity)
auto tensor = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
We can now run our first asynchronous command, which in this case uses the default sequence.
Sequences can be executed synchronously or asynchronously without having to change anything.
.. code-block:: cpp
:linenos:
// Create tensors data explicitly in GPU with an operation
mgr.rebuild({ tensor });
While this is running we can do other work, such as creating the shader we'll be using.
In this case we create a shader that loops 100,000,000 times to simulate a long-running process.
.. code-block:: cpp
:linenos:
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
Now we are able to run the await function on the default sequence.
If we are using the manager, we need to make sure that we await the same named sequence that was triggered asynchronously.
If the sequence is not running or has already finished, the await returns immediately.
The parameter provided is the maximum amount of time to wait, in nanoseconds. If the timeout expires, the await returns ``false``, but this does not stop the processing on the GPU, which continues as normal.
.. code-block:: cpp
:linenos:
// We can now wait for the previously submitted command
// The first parameter can be the amount of time to wait
// The time provided is in nanoseconds
mgr.evalOpAwaitDefault(10000);
Similar to above we can run other commands such as the `OpAlgoBase` asynchronously.
.. code-block:: cpp
:linenos:
// Run Async Kompute operation on the parameters provided
mgr.evalOpAsyncDefault<kp::OpAlgoBase<>>(
{ tensor },
kp::Shader::compile_source(shader));
// Here we can do other work
// When we're ready we can wait
// The default wait time is UINT64_MAX
mgr.evalOpAwaitDefault();
Finally, below you can see that we can also run synchronous commands without having to change anything.
.. code-block:: cpp
:linenos:
// Sync the GPU memory back to the local tensor
// We can still run synchronous jobs in our created sequence
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensor });
// Prints the output: B: { 100000000, ... }
std::cout << fmt::format("B: {}",
tensor->data()) << std::endl;
Parallel Operation Submission
-----------------------------
In order to work with parallel execution of tasks, it is important to understand some of the core GPU processing limitations. These can be quite broad and hardware dependent, which means they will vary across NVIDIA, AMD and other vendors' video cards.
Conceptual Overview
^^^^^^^^^^^^^^^^^^^^^
If you are familiar with Vulkan, you will know that one of the first things you do is fetch the physical queues from the device. Queues tend to have three main capabilities: GRAPHICS, TRANSFER and COMPUTE (among a few others we'll skip for simplicity).
A queue can have several capabilities at once, for example GRAPHICS+TRANSFER+COMPUTE. Now here comes the key point: the underlying hardware may (or may not) support parallelized processing at multiple levels.
Let's take a tangible example. The `NVIDIA 1650 <http://vulkan.gpuinfo.org/displayreport.php?id=9700#queuefamilies>`_ has 16 ``GRAPHICS+TRANSFER+COMPUTE`` queues in ``familyIndex 0``, 2 ``TRANSFER`` queues in ``familyIndex 1`` and 8 ``COMPUTE+TRANSFER`` queues in ``familyIndex 2``.
With this in mind, the NVIDIA 1650 as of today does not support intra-family parallelization, which means that if you were to submit commands to multiple queues of the same family, these would still be executed sequentially.
However, the NVIDIA 1650 does support inter-family parallelization, which means that if we were to submit commands across multiple queues from different families, these would execute in parallel.
This means that we can execute parallel workloads as long as we run them across multiple queue families. This is one of the reasons why Vulkan Kompute enables users to explicitly select the underlying queues and queue families to run particular workloads on.
It is important to understand the capabilities and limitations of your hardware, as parallelization capabilities can vary. You will want to account for potential discrepancies in processing structures, mainly to avoid unexpected race conditions.
Parallel Execution Example
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this example we demonstrate how you can set up parallel processing across two queue families, which can approach a 2x speedup for workloads like the one below.
To start, we create the manager with extra parameters: the GPU device index we want to use, together with the array of queue family indices for the queues we want to enable.
In this case we use only two queues. As per the section above, these come from familyIndex 0, which is of type ``GRAPHICS+COMPUTE+TRANSFER``, and familyIndex 2, which is of type ``COMPUTE+TRANSFER``.
Based on the specifications of the NVIDIA 1650, we could define up to 16 graphics queues (familyIndex 0), 2 transfer queues (familyIndex 1), and 8 compute queues (familyIndex 2) in no particular order. This means that we could have something like ``{ 0, 1, 1, 2, 2, 2, 0, ... }`` as our initialization value.
You will want to keep track of the order of the indices you initialize your manager with, as you will refer back to this ordering when creating sequences with particular queues.
.. code-block:: cpp
:linenos:
// In this case we select device 0, and for queues, one queue from familyIndex 0
// and one queue from familyIndex 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndices = {0, 2};
// We create a manager with device index, and queues by queue family index
kp::Manager mgr(deviceIndex, familyIndices);
We are now able to create sequences with a particular queue.
By default the Kompute Manager is created with device 0, and with a single queue of the first compatible familyIndex. Similarly, by default sequences are created with the first available queue.
In this case we are able to specify which queue we want to use. Below we initialize "queueOne" named sequence with the graphics family queue, and "queueTwo" with the compute family queue.
It's worth mentioning you can have multiple sequences referencing the same queue.
.. code-block:: cpp
:linenos:
// We need to create explicit sequences with their respective queues
// The second parameter is the index in the familyIndex array which is relative
// to the vector we created the manager with.
mgr.sequence("queueOne", 0);
mgr.sequence("queueTwo", 1);
We create the tensors without modifications.
.. code-block:: cpp
:linenos:
// Creates tensors and initializes GPU memory (below we show more granularity)
auto tensorA = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
auto tensorB = std::make_shared<kp::Tensor>(kp::Tensor(std::vector<float>(10, 0.0)));
Similar to the asynchronous use case above, we can still run synchronous commands without modifications.
.. code-block:: cpp
:linenos:
// We run the first step synchronously on the default sequence
mgr.rebuild({ tensorA, tensorB });
// Define your shader as a string (using string literals for simplicity)
// (You can also pass the raw compiled bytes, or even path to file)
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer b { float pb[]; };
shared uint sharedTotal[1];
void main() {
uint index = gl_GlobalInvocationID.x;
sharedTotal[0] = 0;
// Iterating to simulate longer process
for (int i = 0; i < 100000000; i++)
{
atomicAdd(sharedTotal[0], 1);
}
pb[index] = sharedTotal[0];
}
)");
Now we can actually trigger the parallel processing, running two OpAlgoBase Operations - each in a different sequence / queue.
.. code-block:: cpp
:linenos:
std::vector<uint32_t> spirv = kp::Shader::compile_source(shader);
// Run the first parallel operation in the `queueOne` sequence
mgr.evalOpAsync<kp::OpAlgoBase<>>(
{ tensorA },
"queueOne",
spirv);
// Run the second parallel operation in the `queueTwo` sequence
mgr.evalOpAsync<kp::OpAlgoBase<>>(
{ tensorB },
"queueTwo",
spirv);
Similar to the asynchronous example above, we are able to do other work whilst the tasks are executing.
We are able to wait for the tasks to complete by triggering the `evalOpAwait` on the respective sequence.
.. code-block:: cpp
:linenos:
// Here we can do other work
// We can now wait for the two parallel tasks to finish
mgr.evalOpAwait("queueOne");
mgr.evalOpAwait("queueTwo");
// Sync the GPU memory back to the local tensor
mgr.evalOp<kp::OpTensorSyncLocal>({ tensorA, tensorB });
// Prints the output: A: 100000000 B: 100000000
std::cout << fmt::format("A: {}, B: {}",
tensorA->data()[0], tensorB->data()[0]) << std::endl;
We have added a set of asynchronous and parallel processing examples in the `Advanced Examples documentation page <advanced-examples.rst>`_.

View file

@ -33,6 +33,10 @@ This by default configures without any of the extra build tasks (such as buildin
- Disables the install step in the cmake file (useful for android build)
* - -DKOMPUTE_OPT_ANDROID_BUILD=1
- Enables android build which includes and excludes relevant libraries
* - -DKOMPUTE_OPT_DEPENDENCIES_SHARED_LIBS=1
- Ensures dependencies are referenced as shared libraries for kompute install
* - -DKOMPUTE_OPT_BUILD_AS_SHARED_LIB=1
- Whether to build Kompute as shared lib instead of static
Compile Flags

View file

@ -81,6 +81,7 @@ Performing Release
In order to perform the release the following steps need to be carried out:
* Build changelog
* Create branch called `v<VERSION>-release`
* Generate latest changelog `make build_changelog`
* Update latest tag in new CHANGELOG.md to be the version to release
* Python Release
@ -98,7 +99,3 @@ In order to perform the release the following steps need to be carried out:
* Ensure all tests pass in GPU and CPU: `python -m pytest`
```
```

View file

@ -39,74 +39,19 @@ Below you
Simple Operation Extending OpAlgoBase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Below we show a very simple example that enables you to create an operation with a pre-specified shader. In this case it is the multiplication shader.
You can find an example in the `Advanced Examples documentation section <advanced-examples.rst>`_ that shows how to create your own custom function.
.. code-block:: cpp
:linenos:
class OpMyCustom : public OpAlgoBase
{
public:
OpMyCustom(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors)
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, "")
{
// Perform your custom steps such as reading from a shader file
this->mShaderFilePath = "shaders/glsl/opmult.comp";
}
}
You can also see an implementation in the codebase through the `OpMult` class:
int main() {
kp::Manager mgr; // Automatically selects Device 0
// Create 3 tensors of default type float
auto tensorLhs = std::make_shared<kp::Tensor>(kp::Tensor({ 0., 1., 2. }));
auto tensorRhs = std::make_shared<kp::Tensor>(kp::Tensor({ 2., 4., 6. }));
auto tensorOut = std::make_shared<kp::Tensor>(kp::Tensor({ 0., 0., 0. }));
// Create tensors data explicitly in GPU with an operation
mgr.evalOpDefault<kp::OpTensorCreate>({ tensorLhs, tensorRhs, tensorOut });
// Run Kompute operation on the parameters provided with dispatch layout
mgr.evalOpDefault<kp::OpMyCustom>(
{ tensorLhs, tensorRhs, tensorOut });
// Prints the output which is { 0, 4, 12 }
std::cout << fmt::format("Output: {}", tensorOutput.data()) << std::endl;
}
More Complex Operation Extending OpAlgoBase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Below we show a more complex operation that performs the following:
* Expects three tensors for an operation, two inputs and one output
* Expects the tensors to be initialised
* Checks that the tensors are of the same size
* Expects output tensor to be of type TensorTypes::eDevice (and creates staging tensor)
* Has functionality to read shader from file or directly from spirv bytes
* Records relevant bufferMemoryBarriers
* Records dispatch command
* Records copy command from device tensor to staging output tensor
* In postEval it maps data from staging tensor to output tensor's data
For starters, the header file contains the functions that will be overridden:
.. literalinclude:: ../../src/include/kompute/operations/OpAlgoLhsRhsOut.hpp
.. literalinclude:: ../../src/include/kompute/operations/OpMult.hpp
:language: cpp
Then the implementation outlines all the implementations that perform the actions above:
~~~~~~~~~~~~~~~~~~~
.. literalinclude:: ../../src/OpAlgoLhsRhsOut.cpp
.. literalinclude:: ../../src/OpMult.cpp
:language: cpp

View file

@ -4,18 +4,22 @@ Memory Management Principles
The principle in Vulkan Kompute on memory management is summarised as follows:
* Explicit is better than implicit for specifying memory management
* Interfaces for memory management are constant until freed
* Memory management responsibilities are acyclic from static object references
* Memory management by Kompute is optional and only in place if resource is created by Kompute
* Memory management ownership architecture is acyclic, with a single top-level manager
* Operations do not manage any GPU memory or resources
* Top level manager is main owner of GPU resources and removes all resources when destroyed
* Manager holds weak pointers to ensure that if an object created outside is destroyed, it is released
* Once a resource is destroyed it cannot be recreated
* Resources can only be rebuilt if they haven't been destroyed
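The weak pointer principle in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Kompute's implementation: ``Resource`` and ``Registry`` are hypothetical stand-ins for a GPU-backed object and its top-level manager.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU-backed resource.
struct Resource
{
    bool destroyed = false;
    void destroy() { destroyed = true; }
};

// Hypothetical manager that tracks resources without keeping them alive.
class Registry
{
  public:
    // Creates a resource and keeps only a weak reference to it.
    std::shared_ptr<Resource> create()
    {
        auto resource = std::make_shared<Resource>();
        mTracked.push_back(resource);
        return resource;
    }

    // Number of tracked resources that are still alive.
    std::size_t liveCount() const
    {
        return static_cast<std::size_t>(std::count_if(
          mTracked.begin(), mTracked.end(),
          [](const std::weak_ptr<Resource>& weak) { return !weak.expired(); }));
    }

    // Destroys whatever is still alive when the manager goes away.
    ~Registry()
    {
        for (auto& weak : mTracked) {
            if (auto resource = weak.lock()) {
                resource->destroy();
            }
        }
    }

  private:
    std::vector<std::weak_ptr<Resource>> mTracked;
};
```

Because the registry only holds ``std::weak_ptr``, dropping the last external ``std::shared_ptr`` releases the object immediately, while anything still alive is marked as destroyed when the registry itself is destroyed.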
Vulkan Kompute is responsible for managing both the CPU and GPU memory allocations and resources, and it is important that users are able to explicitly define when these objects are released or destroyed. Similarly, it's important that the memory resources created by the application are released safely.
Vulkan Kompute is responsible for managing both the CPU and GPU memory allocations and resources that it creates, and it is important that users are able to explicitly define when these objects are released or destroyed. Similarly, it's important that the memory resources created by the application are released safely.
Vulkan Kompute is built with the BYOV principle in mind (Bring your own Vulkan). This means that even though the top level resources are managing the memory to its owned resources, they themselves may not have full ownership of the GPU / Vulkan components themselves.
Vulkan Kompute is built with the BYOV principle in mind (Bring your own Vulkan). This means that even though the top level resources manage the memory of the resources they own, they may not have full ownership of the GPU / Vulkan components. This is the case when you want to use Kompute within an existing Vulkan-enabled application and initialise Kompute components with existing Vulkan resources.
The memory ownership is hierarchically outlined in the component architecture - in this diagram, the arrows provide an intuition on the memory management ownership relationships (in this case you can ignore the arrow from the Algorithm, as this is the only one that as of today doesn't manage the memory of the Tensors).
The memory ownership is hierarchically outlined in the component architecture - in this diagram, the arrows provide an intuition on the memory management ownership relationships. It's worth mentioning that the memory relationship may be different to the way components interact with each other - for this, you can see the high level component overview. More specifically:
* The purple arrows denote GPU memory management
.. image:: ../images/kompute-architecture.jpg
.. image:: ../images/kompute-vulkan-architecture.jpg
:width: 100%
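The ownership rule described above can be illustrated with a minimal sketch. The `Resource` and `Owner` classes below are hypothetical stand-ins written for this illustration; they are not part of the Kompute API. The sketch only assumes what the text states: a top level component releases the resources it created, and leaves externally provided (BYOV) resources alone.

```python
class Resource:
    """Hypothetical stand-in for a GPU-backed object such as a tensor or sequence."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def destroy(self):
        # Calling destroy twice is harmless, so explicit and automatic cleanup can coexist.
        self.alive = False


class Owner:
    """Hypothetical stand-in for a top level component that owns what it creates."""

    def __init__(self):
        self._owned = []

    def create(self, name):
        resource = Resource(name)
        self._owned.append(resource)
        return resource

    def adopt_external(self, resource):
        # BYOV-style case: the caller created this resource, so the owner uses it but never frees it.
        return resource

    def destroy(self):
        # Release only what this owner created, in reverse creation order.
        for resource in reversed(self._owned):
            resource.destroy()
        self._owned.clear()


owner = Owner()
tensor = owner.create("tensor")
external = owner.adopt_external(Resource("external_device"))
owner.destroy()
print(tensor.alive, external.alive)
```

After `owner.destroy()`, the resource created through the owner is released while the externally provided one is untouched, which mirrors the distinction between owned and borrowed Vulkan components.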
Optional Memory Management

@ -14,17 +14,19 @@ Then you can interact with it from your interpreter. Below is the same sample as
.. code-block:: python
:linenos:
from kp import Manager, Tensor
from kp import Manager, Tensor, OpTensorSyncDevice, OpTensorSyncLocal, OpAlgoDispatch
from pyshader import python2shader, ivec3, f32, Array
mgr = Manager()
# Can be initialized with List[] or np.Array
tensor_in_a = Tensor([2, 2, 2])
tensor_in_b = Tensor([1, 2, 3])
tensor_out = Tensor([0, 0, 0])
tensor_in_a = mgr.tensor([2, 2, 2])
tensor_in_b = mgr.tensor([1, 2, 3])
tensor_out = mgr.tensor([0, 0, 0])
mgr.eval_tensor_create_def([tensor_in_a, tensor_in_b, tensor_out])
sq = mgr.sequence()
sq.eval(OpTensorSyncDevice([tensor_in_a, tensor_in_b, tensor_out]))
# Define the function via PyShader or directly as glsl string or spirv bytes
@python2shader
@ -35,15 +37,13 @@ Then you can interact with it from your interpreter. Below is the same sample as
i = index.x
data3[i] = data1[i] * data2[i]
algo = mgr.algorithm([tensor_in_a, tensor_in_b, tensor_out], compute_shader_multiply.to_spirv())
# Run shader operation synchronously
mgr.eval_algo_data_def(
[tensor_in_a, tensor_in_b, tensor_out], compute_shader_multiply.to_spirv())
sq.eval(OpAlgoDispatch(algo))
sq.eval(OpTensorSyncLocal([tensor_out]))
mgr.eval_await_def()
mgr.eval_tensor_sync_local_def([tensor_out])
assert tensor_out.data() == [2.0, 4.0, 6.0]
assert tensor_out.data().tolist() == [2.0, 4.0, 6.0]
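As a CPU-side cross-check of the assertion above, the following sketch computes the same per-index product in plain Python. The helper name `multiply_elementwise` is an assumption made for this illustration and does not exist in `kp` or `pyshader`.

```python
def multiply_elementwise(a, b):
    """Pure-Python reference for the per-index product the shader writes to its output."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [float(x) * float(y) for x, y in zip(a, b)]


# Same input values as tensor_in_a and tensor_in_b in the example above.
expected = multiply_elementwise([2, 2, 2], [1, 2, 3])
print(expected)  # prints [2.0, 4.0, 6.0]
```

A reference like this can be handy in tests, since it gives an expected value that does not depend on a GPU being available.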
Python Example (Extended)
@ -55,6 +55,7 @@ Similarly you can find the same extended example as above:
:linenos:
from kp import Manager, Tensor
import kp
from pyshader import python2shader, ivec3, f32, Array
mgr = Manager(0, [2])
@ -77,20 +78,19 @@ Similarly you can find the same extended example as above:
i = index.x
data3[i] = data1[i] * data2[i]
# Run shader operation asynchronously and then await
mgr.eval_async_algo_data_def(
[tensor_in_a, tensor_in_b, tensor_out], compute_shader_multiply.to_spirv())
mgr.eval_await_def()
algo = mgr.algorithm([tensor_in_a, tensor_in_b, tensor_out], compute_shader_multiply.to_spirv())
seq.begin()
seq.record_tensor_sync_local([tensor_in_a])
seq.record_tensor_sync_local([tensor_in_b])
seq.record_tensor_sync_local([tensor_out])
seq.end()
# Run shader operation asynchronously and then await
mgr.eval_async(kp.OpAlgoDispatch(algo))
mgr.eval_await()
seq.record(kp.OpTensorSyncLocal([tensor_in_a]))
seq.record(kp.OpTensorSyncLocal([tensor_in_b]))
seq.record(kp.OpTensorSyncLocal([tensor_out]))
seq.eval()
assert tensor_out.data() == [2.0, 4.0, 6.0]
assert tensor_out.data().tolist() == [2.0, 4.0, 6.0]
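The record-then-eval pattern used above can be sketched without a GPU. `MockSequence` below is a hypothetical class written for this illustration, not the `kp` sequence; it only assumes that operations are recorded once, in order, and replayed on every `eval()` call.

```python
class MockSequence:
    """Hypothetical stand-in for a sequence; not the kp API."""

    def __init__(self):
        self._ops = []

    def record(self, op):
        self._ops.append(op)
        return self  # returning self allows chained record calls

    def eval(self):
        # Replay every recorded operation in the order it was recorded.
        for op in self._ops:
            op()
        return self


log = []
state = {"value": 1}


def sync_to_device():
    log.append("sync_device")


def dispatch():
    state["value"] *= 2
    log.append("dispatch")


def sync_to_local():
    log.append("sync_local")


sq = MockSequence().record(sync_to_device).record(dispatch).record(sync_to_local)
for _ in range(3):
    sq.eval()
print(state["value"], len(log))  # prints 8 9
```

Recording once and evaluating many times avoids rebuilding the list of operations on every iteration, which is the reason the iterative examples record outside the loop.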
Kompute Operation Capabilities
^^^^^
@ -101,33 +101,29 @@ Handling multiple capabilities of processing can be done by compute shaders being
:linenos:
from kp import Manager
import kp
# We'll assume we have the shader data available
from my_spv_shader_data import mult_shader, sum_shader
mgr = Manager()
t1 = mgr.build_tensor([2,2,2])
t2 = mgr.build_tensor([1,2,3])
t3 = mgr.build_tensor([1,2,3])
t1 = mgr.tensor([2,2,2])
t2 = mgr.tensor([1,2,3])
t3 = mgr.tensor([1,2,3])
mgr.sequence().eval(kp.OpTensorSyncLocal([t1, t3]))
# Create multiple separate sequences
sq_mult = mgr.create_sequence("SQ_MULT")
sq_sum = mgr.create_sequence("SQ_SUM")
sq_sync = mgr.create_sequence("SQ_SYNC")
sq_mult = mgr.sequence()
sq_sum = mgr.sequence()
sq_sync = mgr.sequence()
# Initialize sq_mult
sq_mult.begin()
sq_mult.record_algo_data([t1, t2, t3], add_shader)
sq_mult.end()
sq_mult.record(kp.OpAlgoDispatch(mgr.algorithm([t1, t2, t3], mult_shader)))
sq_sum.begin()
sq_sum.record_algo_data([t3, t2, t1], sum_shader)
sq_sum.end()
sq_sum.record(kp.OpAlgoDispatch(mgr.algorithm([t3, t2, t1], sum_shader)))
sq_sync.begin()
sq_sync.record_tensor_sync_local([t1, t3])
sq_sync.end()
sq_sync.record(kp.OpTensorSyncLocal([t1, t3]))
# Run multiple iterations
for i in range(10):
@ -147,6 +143,7 @@ Similar to the logistic regression implementation in the C++ examples section, b
:linenos:
from kp import Manager, Tensor
import kp
from pyshader import python2shader, ivec3, f32, Array
@python2shader
@ -189,38 +186,37 @@ Similar to the logistic regression implementation in the C++ examples section, b
l_out[i] = loss
mgr = Manager()
# First we create input and output tensors for shader
tensor_x_i = Tensor([0.0, 1.0, 1.0, 1.0, 1.0])
tensor_x_j = Tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_x_i = mgr.tensor([0.0, 1.0, 1.0, 1.0, 1.0])
tensor_x_j = mgr.tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_y = Tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_y = mgr.tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_w_in = Tensor([0.001, 0.001])
tensor_w_out_i = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_out_j = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_in = mgr.tensor([0.001, 0.001])
tensor_w_out_i = mgr.tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_out_j = mgr.tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_b_in = Tensor([0.0])
tensor_b_out = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_b_in = mgr.tensor([0.0])
tensor_b_out = mgr.tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_l_out = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_l_out = mgr.tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_m = Tensor([ 5.0 ])
tensor_m = mgr.tensor([ 5.0 ])
# We store them in an array for easier interaction
params = [tensor_x_i, tensor_x_j, tensor_y, tensor_w_in, tensor_w_out_i,
tensor_w_out_j, tensor_b_in, tensor_b_out, tensor_l_out, tensor_m]
mgr = Manager()
mgr.eval_tensor_create_def(params)
mgr.sequence().eval(kp.OpTensorSyncDevice(params))
# Record commands for efficient evaluation
sq = mgr.create_sequence()
sq.begin()
sq.record_tensor_sync_device([tensor_w_in, tensor_b_in])
sq.record_algo_data(params, compute_shader.to_spirv())
sq.record_tensor_sync_local([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out])
sq.end()
sq = mgr.sequence()
sq.record(kp.OpTensorSyncDevice([tensor_w_in, tensor_b_in]))
sq.record(kp.OpAlgoDispatch(mgr.algorithm(params, compute_shader.to_spirv())))
sq.record(kp.OpTensorSyncLocal([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out]))
ITERATIONS = 100
learning_rate = 0.1

@ -64,15 +64,15 @@ The :class:`kp::OpBase` provides a top level class for an operation in Kompute,
.. doxygenclass:: kp::OpBase
:members:
OpAlgoBase
OpAlgoDispatch
-------
The kp::OpAlgoBase extends the kp::OpBase class and provides the base for shader-based operations. Besides consisting of one or more kp::Tensor as per the kp::OpBase, it also contains a unique kp::Algorithm.
The `kp::OpAlgoDispatch` extends the `kp::OpBase` class and provides the base for shader-based operations. Besides consisting of one or more `kp::Tensor` as per the `kp::OpBase`, it also contains a unique `kp::Algorithm`.
.. image:: ../images/kompute-vulkan-architecture-opmult.jpg
:width: 100%
.. doxygenclass:: kp::OpAlgoBase
.. doxygenclass:: kp::OpAlgoDispatch
:members:
OpMult
@ -111,6 +111,13 @@ The :class:`kp::OpTensorSyncDevice` is a tensor only operation that maps the dat
.. doxygenclass:: kp::OpTensorSyncDevice
:members:
OpMemoryBarrier
-------
The :class:`kp::OpMemoryBarrier` is a tensor-only operation that adds memory barriers to the provided tensors, using the provided access and stage masks.
.. doxygenclass:: kp::OpMemoryBarrier
:members:
Shader
--------

@ -20,61 +20,62 @@ void KomputeModelML::train(std::vector<float> yData, std::vector<float> xIData,
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor(xIData) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor(xJData) };
std::shared_ptr<kp::Tensor> y{ new kp::Tensor(yData) };
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor({ 0.001, 0.001 }) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor({ 0 }) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor(zerosData) };
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
kp::Manager mgr;
{
mgr.rebuild(params);
std::shared_ptr<kp::Tensor> xI = mgr.tensor(xIData);
std::shared_ptr<kp::Tensor> xJ = mgr.tensor(xJData);
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::shared_ptr<kp::Tensor> y = mgr.tensor(yData);
// Record op algo base
sq->begin();
std::shared_ptr<kp::Tensor> wIn = mgr.tensor({ 0.001, 0.001 });
std::shared_ptr<kp::Tensor> wOutI = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> wOutJ = mgr.tensor(zerosData);
sq->record<kp::OpTensorSyncDevice>({ wIn, bIn });
std::shared_ptr<kp::Tensor> bIn = mgr.tensor({ 0 });
std::shared_ptr<kp::Tensor> bOut = mgr.tensor(zerosData);
// Newer versions of Android are able to use shaderc to read raw string
sq->record<kp::OpAlgoBase>(
params, kp::Shader::compile_source(LR_SHADER));
std::shared_ptr<kp::Tensor> lOut = mgr.tensor(zerosData);
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
sq->end();
std::vector<uint32_t> spirv(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len));
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm(params, spirv, kp::Workgroup({ 5 }), kp::Constants({ 5.0 }));
sq->eval();
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
for (size_t j = 0; j < bOut->size(); j++) {
wIn->data()[0] -= learningRate * wOutI->data()[j];
wIn->data()[1] -= learningRate * wOutJ->data()[j];
bIn->data()[0] -= learningRate * bOut->data()[j];
}
std::shared_ptr<kp::Sequence> sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
sq->eval();
for (size_t j = 0; j < bOut->size(); j++) {
wIn->data()[0] -= learningRate * wOutI->data()[j];
wIn->data()[1] -= learningRate * wOutJ->data()[j];
bIn->data()[0] -= learningRate * bOut->data()[j];
}
}
}
this->mWeights = kp::Tensor(wIn->data());
this->mBias = kp::Tensor(bIn->data());
KP_LOG_INFO("RESULT: <<<<<<<<<<<<<<<<<<<");
KP_LOG_INFO("{}", wIn->data()[0]);
KP_LOG_INFO("{}", wIn->data()[1]);
KP_LOG_INFO("{}", bIn->data()[0]);
this->mWeights = wIn;
this->mBias = bIn;
}
}
std::vector<float> KomputeModelML::predict(std::vector<float> xI, std::vector<float> xJ) {
@ -88,9 +89,9 @@ std::vector<float> KomputeModelML::predict(std::vector<float> xI, std::vector<fl
for (size_t i = 0; i < xI.size(); i++) {
float xIVal = xI[i];
float xJVal = xJ[i];
float result = (xIVal * this->mWeights.data()[0]
+ xJVal * this->mWeights.data()[1]
+ this->mBias.data()[0]);
float result = (xIVal * this->mWeights->data()[0]
+ xJVal * this->mWeights->data()[1]
+ this->mBias->data()[0]);
// Instead of using sigmoid we'll just return full numbers
float var = result > 0 ? 1 : 0;
@ -103,13 +104,13 @@ std::vector<float> KomputeModelML::predict(std::vector<float> xI, std::vector<fl
std::vector<float> KomputeModelML::get_params() {
std::vector<float> retVector;
if(this->mWeights.size() + this->mBias.size() == 0) {
if(this->mWeights->size() + this->mBias->size() == 0) {
return retVector;
}
retVector.push_back(this->mWeights.data()[0]);
retVector.push_back(this->mWeights.data()[1]);
retVector.push_back(this->mBias.data()[0]);
retVector.push_back(this->mWeights->data()[0]);
retVector.push_back(this->mWeights->data()[1]);
retVector.push_back(this->mBias->data()[0]);
retVector.push_back(99.0);
return retVector;

@ -4,6 +4,7 @@
#include <vector>
#include <string>
#include <memory>
#include "kompute/Kompute.hpp"
@ -20,8 +21,8 @@ public:
std::vector<float> get_params();
private:
kp::Tensor mWeights;
kp::Tensor mBias;
std::shared_ptr<kp::Tensor> mWeights;
std::shared_ptr<kp::Tensor> mBias;
};

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.17.0)
cmake_minimum_required(VERSION 3.4.1)
project(kompute_array_mult VERSION 0.1.0)
set(CMAKE_CXX_STANDARD 14)
@ -23,10 +23,6 @@ endif()
find_package(Vulkan REQUIRED)
if(KOMPUTE_OPT_ENABLE_SPDLOG)
find_package(spdlog REQUIRED)
endif()
add_executable(kompute_array_mult
src/Main.cpp)

@ -15,8 +15,11 @@ This project has the option to either import the Kompute dependency relative to
To build you just need to run the cmake command in this folder as follows:
```
cmake \
-Bbuild
cmake -Bbuild/ \
-DCMAKE_BUILD_TYPE=Debug \
-DKOMPUTE_OPT_INSTALL=0 \
-DKOMPUTE_OPT_REPO_SUBMODULE_BUILD=1 \
-DKOMPUTE_OPT_ENABLE_SPDLOG=1
```
You can pass the following optional parameters based on your desired configuration:

@ -37,15 +37,19 @@ int main()
}
)");
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorInA, tensorInB, tensorOut },
kp::Shader::compile_source(shader));
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorInA, tensorInB, tensorOut };
mgr.evalOpDefault<kp::OpTensorSyncLocal>({tensorOut});
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(params, kp::Shader::compileSource(shader));
mgr.sequence()
->record<kp::OpTensorSyncDevice>(params)
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>(params)
->eval();
// prints "Output: { 0 4 12 }"
std::cout<< "Output: { ";
for (const float& elem : tensorOut->data()) {
for (const float& elem : tensorOut->vector()) {
std::cout << elem << " ";
}
std::cout << "}" << std::endl;

@ -31,7 +31,7 @@ void KomputeSummatorNode::_init() {
std::cout << "CALLING INIT" << std::endl;
this->mPrimaryTensor = this->mManager.tensor({ 0.0 });
this->mSecondaryTensor = this->mManager.tensor({ 0.0 });
this->mSequence = this->mManager.sequence("AdditionSeq");
this->mSequence = this->mManager.sequence();
// We now record the steps in the sequence
if (std::shared_ptr<kp::Sequence> sq = this->mSequence)
@ -51,7 +51,11 @@ void KomputeSummatorNode::_init() {
}
)");
sq->begin();
std::shared_ptr<kp::Algorithm> algo =
this->mManager.algorithm(
{ this->mPrimaryTensor, this->mSecondaryTensor },
kp::Shader::compileSource(shader));
// First we ensure secondary tensor loads to GPU
// No need to sync the primary tensor as it should not be changed
@ -59,15 +63,12 @@ void KomputeSummatorNode::_init() {
{ this->mSecondaryTensor });
// Then we run the operation with both tensors
sq->record<kp::OpAlgoBase>(
{ this->mPrimaryTensor, this->mSecondaryTensor },
kp::Shader::compile_source(shader));
sq->record<kp::OpAlgoDispatch>(algo);
// We map the result back to local
sq->record<kp::OpTensorSyncLocal>(
{ this->mPrimaryTensor });
sq->end();
}
else {
throw std::runtime_error("Sequence pointer no longer available");

@ -56,9 +56,9 @@ void KomputeSummator::_init() {
{ this->mSecondaryTensor });
// Then we run the operation with both tensors
this->mSequence->record<kp::OpAlgoBase>(
this->mSequence->record<kp::OpAlgoCreate>(
{ this->mPrimaryTensor, this->mSecondaryTensor },
kp::Shader::compile_source(shader));
kp::Shader::compileSource(shader));
// We map the result back to local
this->mSequence->record<kp::OpTensorSyncLocal>(

@ -29,54 +29,41 @@ void KomputeModelMLNode::train(Array yArr, Array xIArr, Array xJArr) {
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor(xIData) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor(xJData) };
std::shared_ptr<kp::Tensor> y{ new kp::Tensor(yData) };
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor({ 0.001, 0.001 }) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor({ 0 }) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor(zerosData) };
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
kp::Manager mgr;
mgr.rebuild(params);
std::shared_ptr<kp::Tensor> xI = mgr.tensor(xIData);
std::shared_ptr<kp::Tensor> xJ = mgr.tensor(xJData);
std::shared_ptr<kp::Tensor> y = mgr.tensor(yData);
std::shared_ptr<kp::Tensor> wIn = mgr.tensor({ 0.001, 0.001 });
std::shared_ptr<kp::Tensor> wOutI = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> wOutJ = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> bIn = mgr.tensor({ 0 });
std::shared_ptr<kp::Tensor> bOut = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> lOut = mgr.tensor(zerosData);
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::vector<uint32_t> spirv(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len));
// Record op algo base
sq->begin();
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(params, spirv);
sq->record<kp::OpTensorSyncDevice>({ wIn, bIn });
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
#ifdef KOMPUTE_ANDROID_SHADER_FROM_STRING
// Newer versions of Android are able to use shaderc to read raw string
sq->record<kp::OpAlgoBase>(
params, std::vector<char>(LR_SHADER.begin(), LR_SHADER.end()));
#else
// Older versions of Android require the SPIRV binary directly
sq->record<kp::OpAlgoBase>(
params, std::vector<char>(
kp::shader_data::shaders_glsl_logisticregression_comp_spv,
kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len
));
#endif
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
sq->end();
std::shared_ptr<kp::Sequence> sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
@ -90,15 +77,15 @@ void KomputeModelMLNode::train(Array yArr, Array xIArr, Array xJArr) {
}
}
}
KP_LOG_INFO("RESULT: <<<<<<<<<<<<<<<<<<<");
KP_LOG_INFO(wIn->data()[0]);
KP_LOG_INFO(wIn->data()[1]);
KP_LOG_INFO(bIn->data()[0]);
this->mWeights = kp::Tensor(wIn->data());
this->mBias = kp::Tensor(bIn->data());
}
KP_LOG_INFO("RESULT: <<<<<<<<<<<<<<<<<<<");
KP_LOG_INFO(wIn->data()[0]);
KP_LOG_INFO(wIn->data()[1]);
KP_LOG_INFO(bIn->data()[0]);
this->mWeights = kp::Tensor(wIn->data());
this->mBias = kp::Tensor(bIn->data());
}
Array KomputeModelMLNode::predict(Array xI, Array xJ) {

@ -33,54 +33,41 @@ void KomputeModelML::train(Array yArr, Array xIArr, Array xJArr) {
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor(xIData) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor(xJData) };
std::shared_ptr<kp::Tensor> y{ new kp::Tensor(yData) };
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor({ 0.001, 0.001 }) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor({ 0 }) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor(zerosData) };
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor(zerosData) };
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> xI = mgr.tensor(xIData);
std::shared_ptr<kp::Tensor> xJ = mgr.tensor(xJData);
std::shared_ptr<kp::Tensor> y = mgr.tensor(yData);
std::shared_ptr<kp::Tensor> wIn = mgr.tensor({ 0.001, 0.001 });
std::shared_ptr<kp::Tensor> wOutI = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> wOutJ = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> bIn = mgr.tensor({ 0 });
std::shared_ptr<kp::Tensor> bOut = mgr.tensor(zerosData);
std::shared_ptr<kp::Tensor> lOut = mgr.tensor(zerosData);
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
mgr.rebuild(params);
std::vector<uint32_t> spirv(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len));
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(params, spirv);
// Record op algo base
sq->begin();
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
sq->record<kp::OpTensorSyncDevice>({ wIn, bIn });
#ifdef KOMPUTE_ANDROID_SHADER_FROM_STRING
// Newer versions of Android are able to use shaderc to read raw string
sq->record<kp::OpAlgoBase>(
params, std::vector<char>(LR_SHADER.begin(), LR_SHADER.end()));
#else
// Older versions of Android require the SPIRV binary directly
sq->record<kp::OpAlgoBase>(
params, std::vector<char>(
kp::shader_data::shaders_glsl_logisticregression_comp_spv,
kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len
));
#endif
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
sq->end();
std::shared_ptr<kp::Sequence> sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
@ -94,15 +81,15 @@ void KomputeModelML::train(Array yArr, Array xIArr, Array xJArr) {
}
}
}
KP_LOG_INFO("RESULT: <<<<<<<<<<<<<<<<<<<");
KP_LOG_INFO(wIn->data()[0]);
KP_LOG_INFO(wIn->data()[1]);
KP_LOG_INFO(bIn->data()[0]);
this->mWeights = wIn;
this->mBias = bIn;
}
KP_LOG_INFO("RESULT: <<<<<<<<<<<<<<<<<<<");
KP_LOG_INFO(wIn->data()[0]);
KP_LOG_INFO(wIn->data()[1]);
KP_LOG_INFO(bIn->data()[0]);
this->mWeights = kp::Tensor(wIn->data());
this->mBias = kp::Tensor(bIn->data());
}
Array KomputeModelML::predict(Array xI, Array xJ) {
@ -116,9 +103,9 @@ Array KomputeModelML::predict(Array xI, Array xJ) {
for (size_t i = 0; i < xI.size(); i++) {
float xIVal = xI[i];
float xJVal = xJ[i];
float result = (xIVal * this->mWeights.data()[0]
+ xJVal * this->mWeights.data()[1]
+ this->mBias.data()[0]);
float result = (xIVal * this->mWeights->data()[0]
+ xJVal * this->mWeights->data()[1]
+ this->mBias->data()[0]);
// Instead of using sigmoid we'll just return full numbers
Variant var = result > 0 ? 1 : 0;
@ -131,15 +118,15 @@ Array KomputeModelML::predict(Array xI, Array xJ) {
Array KomputeModelML::get_params() {
Array retArray;
KP_LOG_INFO(this->mWeights.size() + this->mBias.size());
KP_LOG_INFO(this->mWeights->size() + this->mBias->size());
if(this->mWeights.size() + this->mBias.size() == 0) {
if(this->mWeights->size() + this->mBias->size() == 0) {
return retArray;
}
retArray.push_back(this->mWeights.data()[0]);
retArray.push_back(this->mWeights.data()[1]);
retArray.push_back(this->mBias.data()[0]);
retArray.push_back(this->mWeights->data()[0]);
retArray.push_back(this->mWeights->data()[1]);
retArray.push_back(this->mBias->data()[0]);
retArray.push_back(99.0);
return retArray;

@ -28,8 +28,8 @@ public:
static void _register_methods();
private:
kp::Tensor mWeights;
kp::Tensor mBias;
std::shared_ptr<kp::Tensor> mWeights;
std::shared_ptr<kp::Tensor> mBias;
};
static std::string LR_SHADER = R"(

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.17.0)
cmake_minimum_required(VERSION 3.4.1)
project(kompute_linear_reg VERSION 0.1.0)
set(CMAKE_CXX_STANDARD 14)
@ -23,10 +23,6 @@ endif()
find_package(Vulkan REQUIRED)
if(KOMPUTE_OPT_ENABLE_SPDLOG)
find_package(spdlog REQUIRED)
endif()
add_executable(kompute_linear_reg
src/Main.cpp)
@ -39,7 +35,7 @@ include_directories(
../../single_include/)
if(KOMPUTE_OPT_ENABLE_SPDLOG)
target_link_libraries(kompute_array_mult
target_link_libraries(kompute_linear_reg
spdlog::spdlog)
endif()

@ -15,8 +15,11 @@ This project has the option to either import the Kompute dependency relative to
To build you just need to run the cmake command in this folder as follows:
```
cmake \
-Bbuild
cmake -Bbuild/ \
-DCMAKE_BUILD_TYPE=Debug \
-DKOMPUTE_OPT_INSTALL=0 \
-DKOMPUTE_OPT_REPO_SUBMODULE_BUILD=1 \
-DKOMPUTE_OPT_ENABLE_SPDLOG=1
```
You can pass the following optional parameters based on your desired configuration:

@ -15,44 +15,40 @@ int main()
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor({ 0, 1, 1, 1, 1 }) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
kp::Manager mgr;
std::shared_ptr<kp::Tensor> y{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
auto xI = mgr.tensor({ 0, 1, 1, 1, 1 });
auto xJ = mgr.tensor({ 0, 0, 0, 1, 1 });
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor({ 0.001, 0.001 }) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
auto y = mgr.tensor({ 0, 0, 0, 1, 1 });
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor({ 0 }) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
auto wIn = mgr.tensor({ 0.001, 0.001 });
auto wOutI = mgr.tensor({ 0, 0, 0, 0, 0 });
auto wOutJ = mgr.tensor({ 0, 0, 0, 0, 0 });
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
auto bIn = mgr.tensor({ 0 });
auto bOut = mgr.tensor({ 0, 0, 0, 0, 0 });
auto lOut = mgr.tensor({ 0, 0, 0, 0, 0 });
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
kp::Manager mgr;
mgr.rebuild(params);
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
// Record op algo base
sq->begin();
sq->record<kp::OpTensorSyncDevice>({ wIn, bIn });
sq->record<kp::OpAlgoBase>(
params, std::vector<uint32_t>(
std::vector<uint32_t> spirv(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len)));
+ kp::shader_data::shaders_glsl_logisticregression_comp_spv_len));
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(
params, spirv, kp::Workgroup({ 5 }), kp::Constants({ 5.0 }));
sq->end();
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
std::shared_ptr<kp::Sequence> sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {

File diff suppressed because it is too large

@ -4,9 +4,13 @@
#include <kompute/Kompute.hpp>
#include "fmt/ranges.h"
#include "docstrings.hpp"
namespace py = pybind11;
using namespace pybind11::literals; // for the `_a` literal
//used in Core.hpp
py::object kp_debug, kp_info, kp_warning, kp_error;
@ -23,11 +27,10 @@ PYBIND11_MODULE(kp, m) {
py::module_ np = py::module_::import("numpy");
py::enum_<kp::Tensor::TensorTypes>(m, "TensorTypes", DOC(kp, Tensor, TensorTypes))
.value("device", kp::Tensor::TensorTypes::eDevice, "Tensor holding data in GPU memory.")
.value("host", kp::Tensor::TensorTypes::eHost, "Tensor used for CPU visible GPU data.")
.value("storage", kp::Tensor::TensorTypes::eStorage, "Tensor with host visible gpu memory.")
py::enum_<kp::Tensor::TensorTypes>(m, "TensorTypes")
.value("device", kp::Tensor::TensorTypes::eDevice, DOC(kp, Tensor, TensorTypes, eDevice))
.value("host", kp::Tensor::TensorTypes::eHost, DOC(kp, Tensor, TensorTypes, eHost))
.value("storage", kp::Tensor::TensorTypes::eStorage, DOC(kp, Tensor, TensorTypes, eStorage))
.export_values();
#if !defined(KOMPUTE_DISABLE_SHADER_UTILS) || !KOMPUTE_DISABLE_SHADER_UTILS
@ -36,290 +39,204 @@ PYBIND11_MODULE(kp, m) {
const std::string& source,
const std::string& entryPoint,
const std::vector<std::pair<std::string,std::string>>& definitions) {
std::vector<uint32_t> spirv = kp::Shader::compile_source(source, entryPoint, definitions);
std::vector<uint32_t> spirv = kp::Shader::compileSource(source, entryPoint, definitions);
return py::bytes((const char*)spirv.data(), spirv.size() * sizeof(uint32_t));
},
"Compiles string source provided and returns the value in bytes",
py::arg("source"), py::arg("entryPoint") = "main", py::arg("definitions") = std::vector<std::pair<std::string,std::string>>() )
DOC(kp, Shader, compileSource),
py::arg("source"),
py::arg("entryPoint") = "main",
py::arg("definitions") = std::vector<std::pair<std::string,std::string>>() )
.def_static("compile_sources", [](
const std::vector<std::string>& source,
const std::vector<std::string>& files,
const std::string& entryPoint,
const std::vector<std::pair<std::string,std::string>>& definitions) {
std::vector<uint32_t> spirv = kp::Shader::compile_sources(source, files, entryPoint, definitions);
std::vector<uint32_t> spirv = kp::Shader::compileSources(source, files, entryPoint, definitions);
return py::bytes((const char*)spirv.data(), spirv.size() * sizeof(uint32_t));
},
"Compiles sources provided with file names and returns the value in bytes",
py::arg("sources"), py::arg("files") = std::vector<std::string>(), py::arg("entryPoint") = "main", py::arg("definitions") = std::vector<std::pair<std::string,std::string>>() );
DOC(kp, Shader, compileSources),
py::arg("sources"),
py::arg("files") = std::vector<std::string>(),
py::arg("entryPoint") = "main",
py::arg("definitions") = std::vector<std::pair<std::string,std::string>>() );
#endif // KOMPUTE_DISABLE_SHADER_UTILS
py::class_<kp::Tensor, std::shared_ptr<kp::Tensor>>(m, "Tensor", DOC(kp, Tensor))
.def(py::init(
[np](const py::array_t<float> data, kp::Tensor::TensorTypes tensor_type) {
const py::array_t<float> flatdata = np.attr("ravel")(data);
const py::buffer_info info = flatdata.request();
const float* ptr = (float*) info.ptr;
return std::unique_ptr<kp::Tensor>(
new kp::Tensor(std::vector<float>(ptr, ptr+flatdata.size()), tensor_type)
);
}),
"Construct Tensor with an array as initial data and an optional kp.TensorType (default:device).",
py::arg("data"),
py::arg("tensor_type") = kp::Tensor::TensorTypes::eDevice
)
.def("data", &kp::Tensor::data, DOC(kp, Tensor, data))
.def("numpy", [](kp::Tensor& self) {
return py::array(self.data().size(), self.data().data());
}, "Returns stored data as a new numpy array.")
.def("__getitem__", [](kp::Tensor &self, size_t index) -> float { return self.data()[index]; },
"When only an index is necessary")
.def("__setitem__", [](kp::Tensor &self, size_t index, float value) {
self.data()[index] = value; })
.def("set_data", [np](kp::Tensor &self, const py::array_t<float> data){
const py::array_t<float> flatdata = np.attr("ravel")(data);
const py::buffer_info info = flatdata.request();
const float* ptr = (float*) info.ptr;
self.setData(std::vector<float>(ptr, ptr+flatdata.size()));
}, "Overrides the data in the local Tensor memory.")
.def("__iter__", [](kp::Tensor &self) {
return py::make_iterator(self.data().begin(), self.data().end());
}, py::keep_alive<0, 1>(), // Required to keep alive iterator while exists
"Iterator to enable looping within data structure as required.")
.def("__contains__", [](kp::Tensor &self, float v) {
for (size_t i = 0; i < self.data().size(); ++i) {
if (v == self.data()[i]) {
return true;
}
}
return false;
})
.def("__reversed__", [](kp::Tensor &self) {
size_t size = self.data().size();
std::vector<float> reversed(size);
for (size_t i = 0; i < size; i++) {
reversed[size - i - 1] = self.data()[i];
}
return reversed;
})
.def("size", &kp::Tensor::size, "Retrieves the size of the Tensor data as per the local Tensor memory.")
.def("__len__", &kp::Tensor::size, "Retrieves the size of the Tensor data as per the local Tensor memory.")
.def("tensor_type", &kp::Tensor::tensorType, "Retrieves the memory type of the tensor.")
.def("is_init", &kp::Tensor::isInit, "Checks whether the tensor GPU memory has been initialised.")
.def("map_data_from_host", &kp::Tensor::mapDataFromHostMemory, "Maps data into GPU memory from tensor local data.")
.def("map_data_into_host", &kp::Tensor::mapDataIntoHostMemory, "Maps data from GPU memory into tensor local data.");
py::class_<kp::OpBase, std::shared_ptr<kp::OpBase>>(m, "OpBase", DOC(kp, OpBase));
py::class_<kp::OpTensorSyncDevice, std::shared_ptr<kp::OpTensorSyncDevice>>(
m, "OpTensorSyncDevice", py::base<kp::OpBase>(), DOC(kp, OpTensorSyncDevice))
.def(py::init<const std::vector<std::shared_ptr<kp::Tensor>>&>(), DOC(kp, OpTensorSyncDevice, OpTensorSyncDevice));
py::class_<kp::OpTensorSyncLocal, std::shared_ptr<kp::OpTensorSyncLocal>>(
m, "OpTensorSyncLocal", py::base<kp::OpBase>(), DOC(kp, OpTensorSyncLocal))
.def(py::init<const std::vector<std::shared_ptr<kp::Tensor>>&>(), DOC(kp, OpTensorSyncLocal, OpTensorSyncLocal));
py::class_<kp::OpTensorCopy, std::shared_ptr<kp::OpTensorCopy>>(
m, "OpTensorCopy", py::base<kp::OpBase>(), DOC(kp, OpTensorCopy))
.def(py::init<const std::vector<std::shared_ptr<kp::Tensor>>&>(), DOC(kp, OpTensorCopy, OpTensorCopy));
py::class_<kp::OpAlgoDispatch, std::shared_ptr<kp::OpAlgoDispatch>>(
m, "OpAlgoDispatch", py::base<kp::OpBase>(), DOC(kp, OpAlgoDispatch))
.def(py::init<const std::shared_ptr<kp::Algorithm>&,const kp::Constants&>(),
DOC(kp, OpAlgoDispatch, OpAlgoDispatch),
py::arg("algorithm"), py::arg("push_consts") = kp::Constants());
py::class_<kp::OpMult, std::shared_ptr<kp::OpMult>>(
m, "OpMult", py::base<kp::OpBase>(), DOC(kp, OpMult))
.def(py::init<const std::vector<std::shared_ptr<kp::Tensor>>&,const std::shared_ptr<kp::Algorithm>&>(),
DOC(kp, OpMult, OpMult));
py::class_<kp::Algorithm, std::shared_ptr<kp::Algorithm>>(m, "Algorithm", DOC(kp, Algorithm, Algorithm))
.def("get_tensors", &kp::Algorithm::getTensors, DOC(kp, Algorithm, getTensors))
.def("destroy", &kp::Algorithm::destroy, DOC(kp, Algorithm, destroy))
.def("get_spec_consts", &kp::Algorithm::getSpecializationConstants, DOC(kp, Algorithm, getSpecializationConstants))
.def("is_init", &kp::Algorithm::isInit, DOC(kp, Algorithm, isInit));
py::class_<kp::Tensor, std::shared_ptr<kp::Tensor>>(m, "Tensor", DOC(kp, Tensor))
.def("data", [](kp::Tensor& self) {
// Non-owning container exposing the underlying pointer
py::str dummyDataOwner; // Explicitly request data to not be owned by np
switch (self.dataType()) {
case kp::Tensor::TensorDataTypes::eFloat:
return py::array(self.size(), self.data<float>(), dummyDataOwner);
case kp::Tensor::TensorDataTypes::eUnsignedInt:
return py::array(self.size(), self.data<uint32_t>(), dummyDataOwner);
case kp::Tensor::TensorDataTypes::eInt:
return py::array(self.size(), self.data<int32_t>(), dummyDataOwner);
case kp::Tensor::TensorDataTypes::eDouble:
return py::array(self.size(), self.data<double>(), dummyDataOwner);
case kp::Tensor::TensorDataTypes::eBool:
return py::array(self.size(), self.data<bool>(), dummyDataOwner);
default:
throw std::runtime_error("Kompute Python data type not supported");
}
}, DOC(kp, Tensor, data))
.def("size", &kp::Tensor::size, DOC(kp, Tensor, size))
.def("__len__", &kp::Tensor::size, DOC(kp, Tensor, size))
.def("tensor_type", &kp::Tensor::tensorType, DOC(kp, Tensor, tensorType))
.def("data_type", &kp::Tensor::dataType, DOC(kp, Tensor, dataType))
.def("is_init", &kp::Tensor::isInit, DOC(kp, Tensor, isInit))
.def("destroy", &kp::Tensor::destroy, DOC(kp, Tensor, destroy));
py::class_<kp::Sequence, std::shared_ptr<kp::Sequence>>(m, "Sequence")
.def("init", &kp::Sequence::init, DOC(kp, Sequence, init))
// record
.def("begin", &kp::Sequence::begin, DOC(kp, Sequence, begin))
.def("end", &kp::Sequence::end, DOC(kp, Sequence, end))
// eval
.def("eval", &kp::Sequence::eval, DOC(kp, Sequence, eval))
.def("eval_async", &kp::Sequence::evalAsync, DOC(kp, Sequence, evalAsync))
.def("eval_await", &kp::Sequence::evalAwait, DOC(kp, Sequence, evalAwait))
// status
.def("is_running", &kp::Sequence::isRunning, DOC(kp, Sequence, isRunning))
.def("is_rec", &kp::Sequence::isRecording, DOC(kp, Sequence, isRecording))
.def("is_init", &kp::Sequence::isInit, DOC(kp, Sequence, isInit))
// record
.def("record_tensor_copy", &kp::Sequence::record<kp::OpTensorCopy>, DOC(kp, Sequence, record))
.def("record_tensor_sync_device", &kp::Sequence::record<kp::OpTensorSyncDevice>,
"Records operation to sync tensor from local memory to GPU memory")
.def("record_tensor_sync_local", &kp::Sequence::record<kp::OpTensorSyncLocal>,
"Records operation to sync tensor(s) from GPU memory to local memory")
.def("record_algo_file", &kp::Sequence::record<
kp::OpAlgoBase,
const std::string&,
kp::Workgroup,
kp::Constants>,
"Records an operation using a custom shader provided from a shader path",
py::arg("tensors"), py::arg("data"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
.def("record_algo_data", [](kp::Sequence &self,
std::vector<std::shared_ptr<kp::Tensor>> tensors,
py::bytes &bytes,
kp::Workgroup workgroup,
kp::Constants constants) -> bool {
// Bytes have to be converted into std::vector
py::buffer_info info(py::buffer(bytes).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
return self.record<kp::OpAlgoBase>(
tensors, std::vector<uint32_t>((uint32_t*)data, (uint32_t*)(data + length)), workgroup, constants);
.def("record", [](kp::Sequence& self, std::shared_ptr<kp::OpBase> op) { return self.record(op); },
DOC(kp, Sequence, record))
.def("eval", [](kp::Sequence& self) { return self.eval(); },
DOC(kp, Sequence, eval))
.def("eval", [](kp::Sequence& self, std::shared_ptr<kp::OpBase> op) { return self.eval(op); },
DOC(kp, Sequence, eval_2))
.def("eval_async", [](kp::Sequence& self) { return self.evalAsync(); },
DOC(kp, Sequence, evalAsync))
.def("eval_async", [](kp::Sequence& self, std::shared_ptr<kp::OpBase> op) { return self.evalAsync(op); },
DOC(kp, Sequence, evalAsync))
.def("eval_await", [](kp::Sequence& self) { return self.evalAwait(); },
DOC(kp, Sequence, evalAwait))
.def("eval_await", [](kp::Sequence& self, uint32_t wait) { return self.evalAwait(wait); },
DOC(kp, Sequence, evalAwait))
.def("is_recording", &kp::Sequence::isRecording,
DOC(kp, Sequence, isRecording))
.def("is_running", &kp::Sequence::isRunning,
DOC(kp, Sequence, isRunning))
.def("is_init", &kp::Sequence::isInit,
DOC(kp, Sequence, isInit))
.def("clear", &kp::Sequence::clear,
DOC(kp, Sequence, clear))
.def("rerecord", &kp::Sequence::rerecord,
DOC(kp, Sequence, rerecord))
.def("get_timestamps", &kp::Sequence::getTimestamps,
DOC(kp, Sequence, getTimestamps))
.def("destroy", &kp::Sequence::destroy,
DOC(kp, Sequence, destroy));
py::class_<kp::Manager, std::shared_ptr<kp::Manager>>(m, "Manager", DOC(kp, Manager))
.def(py::init(), DOC(kp, Manager, Manager))
.def(py::init<uint32_t>(), DOC(kp, Manager, Manager_2))
.def(py::init<uint32_t,const std::vector<uint32_t>&,const std::vector<std::string>&>(),
DOC(kp, Manager, Manager_2),
py::arg("device") = 0,
py::arg("family_queue_indices") = std::vector<uint32_t>(),
py::arg("desired_extensions") = std::vector<std::string>())
.def("sequence", &kp::Manager::sequence, DOC(kp, Manager, sequence),
py::arg("queue_index") = 0, py::arg("total_timestamps") = 0)
.def("tensor", [np](kp::Manager& self,
const py::array_t<float>& data,
kp::Tensor::TensorTypes tensor_type) {
const py::array_t<float>& flatdata = np.attr("ravel")(data);
const py::buffer_info info = flatdata.request();
KP_LOG_DEBUG("Kompute Python Manager tensor() creating tensor float with data size {}", flatdata.size());
return self.tensor(
info.ptr,
flatdata.size(),
sizeof(float),
kp::Tensor::TensorDataTypes::eFloat,
tensor_type);
},
"Records an operation using a custom shader provided as spirv bytes",
py::arg("tensors"), py::arg("bytes"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() );
py::class_<kp::Manager>(m, "Manager")
.def(py::init(), "Default initializer uses device 0 and first compute compatible GPU queueFamily")
.def(py::init(
[](uint32_t physicalDeviceIndex) {
return std::unique_ptr<kp::Manager>(new kp::Manager(physicalDeviceIndex));
}), "Manager initialiser can provide specified device index but will use first compute compatible GPU queueFamily")
.def(py::init(
[](uint32_t physicalDeviceIndex, const std::vector<uint32_t>& familyQueueIndices) {
return std::unique_ptr<kp::Manager>(new kp::Manager(physicalDeviceIndex, familyQueueIndices));
}), "Manager initialiser can provide specified device and array of GPU queueFamilies to load.")
.def("sequence", &kp::Manager::sequence,
py::arg("name") = "", py::arg("queueIndex") = 0, "Get or create a sequence with specific name and specified index of available queues")
.def("tensor", &kp::Manager::tensor,
py::arg("data"), py::arg("tensorType") = kp::Tensor::TensorTypes::eDevice, py::arg("syncDataToGPU") = true,
"Build and initialise tensor")
.def("rebuild", py::overload_cast<std::vector<std::shared_ptr<kp::Tensor>>, bool>(&kp::Manager::rebuild),
py::arg("tensors"), py::arg("syncDataToGPU") = true,
"Build and initialise list of tensors")
.def("rebuild", py::overload_cast<std::shared_ptr<kp::Tensor>, bool>(&kp::Manager::rebuild),
py::arg("tensor"), py::arg("syncDataToGPU") = true,
"Build and initialise tensor")
.def("destroy", py::overload_cast<std::shared_ptr<kp::Tensor>>(&kp::Manager::destroy),
py::arg("tensor"), DOC(kp, Manager, destroy))
.def("destroy", py::overload_cast<std::vector<std::shared_ptr<kp::Tensor>>>(&kp::Manager::destroy),
py::arg("tensors"), DOC(kp, Manager, destroy, 2))
.def("destroy", py::overload_cast<std::vector<std::shared_ptr<kp::Sequence>>>(&kp::Manager::destroy),
py::arg("sequences"), DOC(kp, Manager, destroy, 3))
.def("destroy", py::overload_cast<std::shared_ptr<kp::Sequence>>(&kp::Manager::destroy),
py::arg("sequence"), DOC(kp, Manager, destroy, 4))
.def("destroy", py::overload_cast<const std::string &>(&kp::Manager::destroy),
py::arg("sequenceName"), DOC(kp, Manager, destroy, 5))
.def("destroy", py::overload_cast<const std::vector<std::string>&>(&kp::Manager::destroy),
py::arg("sequenceNames"), DOC(kp, Manager, destroy, 6))
// temporary backwards compatibility
.def("eval_tensor_create_def",[](kp::Manager& self, std::vector<std::shared_ptr<kp::Tensor>> tensors, bool syncDataToGPU) -> void {
kp_error("IMPORTANT: eval_tensor_create_def is deprecated! Please use Manager.rebuild instead, as this function will be removed soon.");
self.rebuild(tensors, syncDataToGPU);
DOC(kp, Manager, tensor),
py::arg("data"), py::arg("tensor_type") = kp::Tensor::TensorTypes::eDevice)
.def("tensor_t", [np](kp::Manager& self,
const py::array& data,
kp::Tensor::TensorTypes tensor_type) {
// TODO: Support strides in numpy format
const py::array& flatdata = np.attr("ravel")(data);
const py::buffer_info info = flatdata.request();
KP_LOG_DEBUG("Kompute Python Manager creating tensor_T with data size {} dtype {}",
flatdata.size(), std::string(py::str(flatdata.dtype())));
if (flatdata.dtype() == py::dtype::of<std::float_t>()) {
return self.tensor(
info.ptr, flatdata.size(), sizeof(float), kp::Tensor::TensorDataTypes::eFloat, tensor_type);
} else if (flatdata.dtype() == py::dtype::of<std::uint32_t>()) {
return self.tensor(
info.ptr, flatdata.size(), sizeof(uint32_t), kp::Tensor::TensorDataTypes::eUnsignedInt, tensor_type);
} else if (flatdata.dtype() == py::dtype::of<std::int32_t>()) {
return self.tensor(
info.ptr, flatdata.size(), sizeof(int32_t), kp::Tensor::TensorDataTypes::eInt, tensor_type);
} else if (flatdata.dtype() == py::dtype::of<std::double_t>()) {
return self.tensor(
info.ptr, flatdata.size(), sizeof(double), kp::Tensor::TensorDataTypes::eDouble, tensor_type);
} else if (flatdata.dtype() == py::dtype::of<bool>()) {
return self.tensor(
info.ptr, flatdata.size(), sizeof(bool), kp::Tensor::TensorDataTypes::eBool, tensor_type);
} else {
throw std::runtime_error("Kompute Python no valid dtype supported");
}
},
DOC(kp, Manager, tensorT),
py::arg("data"), py::arg("tensor_type") = kp::Tensor::TensorTypes::eDevice)
.def("algorithm", [](kp::Manager& self,
const std::vector<std::shared_ptr<kp::Tensor>>& tensors,
const py::bytes& spirv,
const kp::Workgroup& workgroup,
const kp::Constants& spec_consts,
const kp::Constants& push_consts) {
py::buffer_info info(py::buffer(spirv).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
std::vector<uint32_t> spirvVec((uint32_t*)data, (uint32_t*)(data + length));
return self.algorithm(tensors, spirvVec, workgroup, spec_consts, push_consts);
},
py::arg("tensors"), py::arg("syncDataToGPU") = true,
"Temporary backwards compatibility for tensor creation function which will be removed in the next version.")
DOC(kp, Manager, algorithm),
py::arg("tensors"),
py::arg("spirv"),
py::arg("workgroup") = kp::Workgroup(),
py::arg("spec_consts") = kp::Constants(),
py::arg("push_consts") = kp::Constants())
.def("get_device_properties", [](kp::Manager& self){
const auto properties = self.getDeviceProperties();
py::dict py_props(
"device_name"_a = std::string(properties.deviceName.data()),
"max_work_group_count"_a = py::make_tuple(properties.limits.maxComputeWorkGroupCount[0],
properties.limits.maxComputeWorkGroupCount[1],
properties.limits.maxComputeWorkGroupCount[2]),
"max_work_group_invocations"_a = properties.limits.maxComputeWorkGroupInvocations,
"max_work_group_size"_a = py::make_tuple(properties.limits.maxComputeWorkGroupSize[0],
properties.limits.maxComputeWorkGroupSize[1],
properties.limits.maxComputeWorkGroupSize[2]),
"timestamps_supported"_a = (bool)properties.limits.timestampComputeAndGraphics
);
return py_props;
}, "Return a dict containing information about the device");
// Await functions
.def("eval_await", &kp::Manager::evalOpAwait,
py::arg("sequenceName"), py::arg("waitFor") = UINT64_MAX,
"Awaits for asynchronous operation on a named Sequence")
.def("eval_await_def", &kp::Manager::evalOpAwaitDefault,
py::arg("waitFor") = UINT64_MAX, "Awaits for asynchronous operation on the last anonymous Sequence created")
// eval default
.def("eval_tensor_copy_def", &kp::Manager::evalOpDefault<kp::OpTensorCopy>,
"Evaluates operation to copy one tensor to one or many tensors with new anonymous Sequence")
.def("eval_tensor_sync_device_def", &kp::Manager::evalOpDefault<kp::OpTensorSyncDevice>,
"Evaluates operation to sync tensor from local memory to GPU memory with new anonymous Sequence")
.def("eval_tensor_sync_local_def", &kp::Manager::evalOpDefault<kp::OpTensorSyncLocal>,
"Evaluates operation to sync tensor(s) from GPU memory to local memory with new anonymous Sequence")
.def("eval_algo_file_def", &kp::Manager::evalOpDefault<
kp::OpAlgoBase,
const std::string&,
kp::Workgroup,
kp::Constants>,
"Evaluates an operation using a custom shader provided from a shader path with new anonymous Sequence",
py::arg("tensors"), py::arg("data"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
.def("eval_algo_data_def", [](kp::Manager &self,
std::vector<std::shared_ptr<kp::Tensor>> tensors,
py::bytes &bytes,
kp::Workgroup workgroup,
kp::Constants constants) {
// Bytes have to be converted into std::vector
py::buffer_info info(py::buffer(bytes).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
self.evalOpDefault<kp::OpAlgoBase>(
tensors, std::vector<uint32_t>((uint32_t*)data, (uint32_t*)(data + length)), workgroup, constants);
},
"Evaluates an operation using a custom shader provided as spirv bytes with new anonymous Sequence",
py::arg("tensors"), py::arg("bytes"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
// eval
.def("eval_tensor_copy", &kp::Manager::evalOp<kp::OpTensorCopy>,
"Evaluates operation to copy one tensor to one or many tensors with explicitly named Sequence")
.def("eval_tensor_sync_device", &kp::Manager::evalOp<kp::OpTensorSyncDevice>,
"Evaluates operation to sync tensor from local memory to GPU memory with explicitly named Sequence")
.def("eval_tensor_sync_local", &kp::Manager::evalOp<kp::OpTensorSyncLocal>,
"Evaluates operation to sync tensor(s) from GPU memory to local memory with explicitly named Sequence")
.def("eval_algo_file", &kp::Manager::evalOp<
kp::OpAlgoBase,
const std::string&,
kp::Workgroup,
kp::Constants>,
"Evaluates an operation using a custom shader provided from a shader path with explicitly named Sequence",
py::arg("tensors"), py::arg("sequence_name"), py::arg("data"),py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
.def("eval_algo_data", [](kp::Manager &self,
std::vector<std::shared_ptr<kp::Tensor>> tensors,
std::string sequenceName,
py::bytes &bytes,
kp::Workgroup workgroup,
kp::Constants constants) {
// Bytes have to be converted into std::vector
py::buffer_info info(py::buffer(bytes).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
self.evalOp<kp::OpAlgoBase>(
tensors, sequenceName, std::vector<uint32_t>((uint32_t*)data, (uint32_t*)(data + length)), workgroup, constants);
},
"Evaluates an operation using a custom shader provided as spirv bytes with explicitly named Sequence",
py::arg("tensors"), py::arg("sequence_name"), py::arg("bytes"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
// eval async default
.def("eval_async_tensor_copy_def", &kp::Manager::evalOpAsyncDefault<kp::OpTensorCopy>,
"Evaluates asynchronously operation to copy one tensor to one or many tensors with anonymous Sequence")
.def("eval_async_tensor_sync_device_def", &kp::Manager::evalOpAsyncDefault<kp::OpTensorSyncDevice>,
"Evaluates asynchronously operation to sync tensor from local memory to GPU memory with anonymous Sequence")
.def("eval_async_tensor_sync_local_def", &kp::Manager::evalOpAsyncDefault<kp::OpTensorSyncLocal>,
"Evaluates asynchronously operation to sync tensor(s) from GPU memory to local memory with anonymous Sequence")
.def("eval_async_algo_file_def", &kp::Manager::evalOpAsyncDefault<
kp::OpAlgoBase,
const std::string&,
kp::Workgroup,
kp::Constants>,
"Evaluates asynchronously an operation using a custom shader provided from a shader path with anonymous Sequence",
py::arg("tensors"), py::arg("data"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
.def("eval_async_algo_data_def", [](kp::Manager &self,
std::vector<std::shared_ptr<kp::Tensor>> tensors,
py::bytes &bytes,
kp::Workgroup workgroup,
kp::Constants constants) {
// Bytes have to be converted into std::vector
py::buffer_info info(py::buffer(bytes).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
self.evalOpAsyncDefault<kp::OpAlgoBase>(
tensors, std::vector<uint32_t>((uint32_t*)data, (uint32_t*)(data + length)), workgroup, constants);
},
"Evaluates asynchronously an operation using a custom shader provided as raw string or spirv bytes with anonymous Sequence",
py::arg("tensors"), py::arg("bytes"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
// eval async
.def("eval_async_tensor_copy", &kp::Manager::evalOpAsync<kp::OpTensorCopy>,
"Evaluates asynchronously operation to copy one tensor to one or many tensors with explicitly named Sequence")
.def("eval_async_tensor_sync_device", &kp::Manager::evalOpAsync<kp::OpTensorSyncDevice>,
"Evaluates asynchronously operation to sync tensor from local memory to GPU memory with explicitly named Sequence")
.def("eval_async_tensor_sync_local", &kp::Manager::evalOpAsync<kp::OpTensorSyncLocal>,
"Evaluates asynchronously operation to sync tensor(s) from GPU memory to local memory with explicitly named Sequence")
.def("eval_async_algo_file", &kp::Manager::evalOpAsync<
kp::OpAlgoBase,
const std::string&,
kp::Workgroup,
kp::Constants>,
"Evaluates asynchronously an operation using a custom shader provided from a shader path with explicitly named Sequence",
py::arg("tensors"), py::arg("sequence_name"), py::arg("data"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() )
.def("eval_async_algo_data", [](kp::Manager &self,
std::vector<std::shared_ptr<kp::Tensor>> tensors,
std::string sequenceName,
py::bytes &bytes,
kp::Workgroup workgroup,
kp::Constants constants) {
// Bytes have to be converted into std::vector
py::buffer_info info(py::buffer(bytes).request());
const char *data = reinterpret_cast<const char *>(info.ptr);
size_t length = static_cast<size_t>(info.size);
self.evalOpAsync<kp::OpAlgoBase>(
tensors, sequenceName, std::vector<uint32_t>((uint32_t*)data, (uint32_t*)(data + length)), workgroup, constants);
},
"Evaluates asynchronously an operation using a custom shader provided as raw string or spirv bytes with explicitly named Sequence",
py::arg("tensors"), py::arg("sequence_name"), py::arg("bytes"), py::arg("workgroup") = kp::Workgroup(), py::arg("constants") = kp::Constants() );
#ifdef VERSION_INFO
m.attr("__version__") = VERSION_INFO;
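
The `algorithm` binding above takes the SPIR-V module as Python `bytes` and reinterprets the buffer as 32-bit words before handing it to `kp::Manager::algorithm`. A minimal Python sketch of that conversion follows, assuming native byte order as in the C++ pointer cast. The helper name `spirv_bytes_to_words` and the fake header are hypothetical and only for illustration. Unlike the binding, this sketch rejects inputs whose length is not a multiple of 4.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first 32-bit word of a valid SPIR-V module


def spirv_bytes_to_words(spirv: bytes) -> list:
    """Reinterpret a bytes object as native-endian 32-bit unsigned words."""
    if len(spirv) % 4 != 0:
        raise ValueError("SPIR-V byte length must be a multiple of 4")
    count = len(spirv) // 4
    # "=" selects native byte order with standard sizes; "I" is a 4-byte unsigned int
    return list(struct.unpack(f"={count}I", spirv))


# Hypothetical 5-word header packed in native byte order, not a real shader.
fake_header = struct.pack("=5I", SPIRV_MAGIC, 0x00010000, 0, 1, 0)
words = spirv_bytes_to_words(fake_header)
```

Checking `words[0]` against the magic number is a cheap way to catch a shader that was passed as GLSL source text instead of compiled SPIR-V.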


@ -9,29 +9,26 @@ def test_array_multiplication():
mgr = kp.Manager()
# 2. Create Kompute Tensors to hold data
tensor_in_a = kp.Tensor([2, 2, 2])
tensor_in_b = kp.Tensor([1, 2, 3])
tensor_out = kp.Tensor([0, 0, 0])
tensor_in_a = mgr.tensor(np.array([2, 2, 2]))
tensor_in_b = mgr.tensor(np.array([1, 2, 3]))
tensor_out = mgr.tensor(np.array([0, 0, 0]))
# 3. Initialise the Kompute Tensors in the GPU
mgr.rebuild([tensor_in_a, tensor_in_b, tensor_out])
params = [tensor_in_a, tensor_in_b, tensor_out]
# 4. Define the multiplication shader code to run on the GPU
@ps.python2shader
def compute_shader_multiply(index=("input", "GlobalInvocationId", ps.ivec3),
def compute_mult(index=("input", "GlobalInvocationId", ps.ivec3),
data1=("buffer", 0, ps.Array(ps.f32)),
data2=("buffer", 1, ps.Array(ps.f32)),
data3=("buffer", 2, ps.Array(ps.f32))):
i = index.x
data3[i] = data1[i] * data2[i]
# 5. Run shader code against our previously defined tensors
mgr.eval_algo_data_def(
[tensor_in_a, tensor_in_b, tensor_out],
compute_shader_multiply.to_spirv())
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, compute_mult.to_spirv())))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
# 6. Sync tensor data from GPU back to local
mgr.eval_tensor_sync_local_def([tensor_out])
assert tensor_out.data() == [2.0, 4.0, 6.0]
assert np.all(tensor_out.numpy() == [2.0, 4.0, 6.0])
assert tensor_out.data().tolist() == [2.0, 4.0, 6.0]
assert np.all(tensor_out.data() == [2.0, 4.0, 6.0])


@ -7,25 +7,66 @@ import pyshader as ps
DIRNAME = os.path.dirname(os.path.abspath(__file__))
def test_opalgobase_file():
"""
Test basic OpMult operation
"""
kp_log = logging.getLogger("kp")
tensor_in_a = kp.Tensor([2, 2, 2])
tensor_in_b = kp.Tensor([1, 2, 3])
tensor_out = kp.Tensor([0, 0, 0])
def test_end_to_end():
mgr = kp.Manager()
mgr.rebuild([tensor_in_a, tensor_in_b, tensor_out])
shader_path = os.path.join(DIRNAME, "../../shaders/glsl/opmult.comp.spv")
tensor_in_a = mgr.tensor([2, 2, 2])
tensor_in_b = mgr.tensor([1, 2, 3])
# Explicit type constructor supports float, double, int32, uint32 and bool
tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
mgr.eval_algo_file_def([tensor_in_a, tensor_in_b, tensor_out], shader_path)
params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]
mgr.eval_tensor_sync_local_def([tensor_out])
shader = """
#version 450
assert tensor_out.data() == [2.0, 4.0, 6.0]
layout (local_size_x = 1) in;
// The input tensors bind index is relative to index in parameter passed
layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };
// Kompute supports push constants updated on dispatch
layout(push_constant) uniform PushConstants {
float val;
} push_const;
// Kompute also supports spec constants on initialization
layout(constant_id = 0) const float const_one = 0;
void main() {
uint index = gl_GlobalInvocationID.x;
out_a[index] += uint( in_a[index] * in_b[index] );
out_b[index] += uint( const_one * push_const.val );
}
"""
workgroup = (3, 1, 1)
spec_consts = [2]
push_consts_a = [2]
push_consts_b = [3]
algo = mgr.algorithm(params, kp.Shader.compile_source(shader), workgroup, spec_consts, push_consts_a)
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(algo))
.record(kp.OpAlgoDispatch(algo, push_consts_b))
.eval())
sq = mgr.sequence()
sq.eval_async(kp.OpTensorSyncLocal(params))
sq.eval_await()
assert tensor_out_a.data().tolist() == [4, 8, 12]
assert tensor_out_b.data().tolist() == [10, 10, 10]
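
The asserted values in `test_end_to_end` follow from the same algorithm being dispatched twice, with push constant 2 and then 3, while the spec constant stays at 2. A CPU-only sketch of that arithmetic is below. The function name `simulate_end_to_end` is hypothetical and the loop only mirrors the shader body shown above; it does not touch Kompute.

```python
def simulate_end_to_end(in_a, in_b, spec_const, push_consts_per_dispatch):
    """Replay the shader body once per dispatch, accumulating into the outputs."""
    out_a = [0] * len(in_a)
    out_b = [0] * len(in_a)
    for push_val in push_consts_per_dispatch:
        for i in range(len(in_a)):
            # out_a[index] += uint(in_a[index] * in_b[index]);
            out_a[i] += int(in_a[i] * in_b[i])
            # out_b[index] += uint(const_one * push_const.val);
            out_b[i] += int(spec_const * push_val)
    return out_a, out_b


out_a, out_b = simulate_end_to_end(
    [2.0, 2.0, 2.0], [1.0, 2.0, 3.0], spec_const=2.0,
    push_consts_per_dispatch=[2.0, 3.0])
```

Each dispatch adds `[2, 4, 6]` to `out_a` and `2 * push_val` to every element of `out_b`, which gives `[4, 8, 12]` and `[10, 10, 10]`.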
def test_shader_str():
@ -47,67 +88,120 @@ void main()
}
"""
tensor_in_a = kp.Tensor([2, 2, 2])
tensor_in_b = kp.Tensor([1, 2, 3])
tensor_out = kp.Tensor([0, 0, 0])
mgr = kp.Manager()
mgr.rebuild([tensor_in_a, tensor_in_b, tensor_out])
spirv = kp.Shader.compile_source(shader)
mgr.eval_algo_data_def([tensor_in_a, tensor_in_b, tensor_out], spirv)
mgr = kp.Manager()
mgr.eval_tensor_sync_local_def([tensor_out])
tensor_in_a = mgr.tensor([2, 2, 2])
tensor_in_b = mgr.tensor([1, 2, 3])
tensor_out = mgr.tensor([0, 0, 0])
assert tensor_out.data() == [2.0, 4.0, 6.0]
params = [tensor_in_a, tensor_in_b, tensor_out]
algo = mgr.algorithm(params, spirv)
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(algo))
.record(kp.OpTensorSyncLocal(params))
.eval())
assert tensor_out.data().tolist() == [2.0, 4.0, 6.0]
def test_sequence():
"""
Test basic OpAlgoBase operation
"""
mgr = kp.Manager(0, [2])
tensor_in_a = kp.Tensor([2, 2, 2])
tensor_in_b = kp.Tensor([1, 2, 3])
tensor_out = kp.Tensor([0, 0, 0])
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs {float valuesLhs[];};
layout(set = 0, binding = 1) buffer tensorRhs {float valuesRhs[];};
layout(set = 0, binding = 2) buffer tensorOutput { float valuesOutput[];};
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
mgr.rebuild([tensor_in_a, tensor_in_b, tensor_out])
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
shader_path = os.path.abspath(os.path.join(DIRNAME, "../../shaders/glsl/opmult.comp.spv"))
mgr.eval_async_algo_file_def([tensor_in_a, tensor_in_b, tensor_out], shader_path)
spirv = kp.Shader.compile_source(shader)
mgr.eval_await_def()
mgr = kp.Manager(0)
seq = mgr.sequence("op")
seq.begin()
seq.record_tensor_sync_local([tensor_in_a])
seq.record_tensor_sync_local([tensor_in_b])
seq.record_tensor_sync_local([tensor_out])
seq.end()
seq.eval()
tensor_in_a = mgr.tensor([2, 2, 2])
tensor_in_b = mgr.tensor([1, 2, 3])
tensor_out = mgr.tensor([0, 0, 0])
mgr.destroy("op")
params = [tensor_in_a, tensor_in_b, tensor_out]
assert seq.is_init() == False
algo = mgr.algorithm(params, spirv)
assert tensor_out.data() == [2.0, 4.0, 6.0]
assert np.all(tensor_out.numpy() == [2.0, 4.0, 6.0])
sq = mgr.sequence()
mgr.destroy(tensor_in_a)
mgr.destroy([tensor_in_b, tensor_out])
sq.record(kp.OpTensorSyncDevice(params))
sq.record(kp.OpAlgoDispatch(algo))
sq.record(kp.OpTensorSyncLocal(params))
sq.eval()
assert sq.is_init() == True
sq.destroy()
assert sq.is_init() == False
assert tensor_out.data().tolist() == [2.0, 4.0, 6.0]
assert np.all(tensor_out.data() == [2.0, 4.0, 6.0])
tensor_in_a.destroy()
tensor_in_b.destroy()
tensor_out.destroy()
assert tensor_in_a.is_init() == False
assert tensor_in_b.is_init() == False
assert tensor_out.is_init() == False
def test_pushconsts():
spirv = kp.Shader.compile_source("""
#version 450
layout(push_constant) uniform PushConstants {
float x;
float y;
float z;
} pcs;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
pa[0] += pcs.x;
pa[1] += pcs.y;
pa[2] += pcs.z;
}
""")
mgr = kp.Manager()
tensor = mgr.tensor([0, 0, 0])
algo = mgr.algorithm([tensor], spirv, (1, 1, 1), [], [0.1, 0.2, 0.3])
(mgr.sequence()
.record(kp.OpTensorSyncDevice([tensor]))
.record(kp.OpAlgoDispatch(algo))
.record(kp.OpAlgoDispatch(algo, [0.3, 0.2, 0.1]))
.record(kp.OpTensorSyncLocal([tensor]))
.eval())
assert np.all(tensor.data() == np.array([0.4, 0.4, 0.4], dtype=np.float32))
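
The exact equality in `test_pushconsts` only holds because both sides are rounded to 32-bit floats. The sketch below emulates that rounding with the standard library, assuming the GPU adds with round-to-nearest as IEEE 754 single precision does. The helpers `f32` and `simulate_pushconsts` are hypothetical names.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simulate_pushconsts(initial, dispatches):
    """Add each dispatch's push constants element-wise in float32 precision."""
    values = [f32(v) for v in initial]
    for push in dispatches:
        values = [f32(v + f32(p)) for v, p in zip(values, push)]
    return values


result = simulate_pushconsts([0, 0, 0], [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]])
expected = [f32(0.4)] * 3
```

Comparing against plain 64-bit `0.4` would fail, which is presumably why the test builds its expected array with `dtype=np.float32`.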
def test_workgroup():
mgr = kp.Manager(0)
tensor_a = kp.Tensor(np.zeros([16,8]))
tensor_b = kp.Tensor(np.zeros([16,8]))
mgr.rebuild([tensor_a, tensor_b])
tensor_a = mgr.tensor(np.zeros([16,8]))
tensor_b = mgr.tensor(np.zeros([16,8]))
@ps.python2shader
def compute_shader_wg(gl_idx=("input", "GlobalInvocationId", ps.ivec3),
@ -119,50 +213,17 @@ def test_workgroup():
data1[i] = f32(gl_idx.x)
data2[i] = f32(gl_idx.y)
seq = mgr.sequence("new")
seq.begin()
seq.record_algo_data([tensor_a, tensor_b], compute_shader_wg.to_spirv(), workgroup=(16,8,1))
seq.end()
seq.eval()
algo = mgr.algorithm([tensor_a, tensor_b], compute_shader_wg.to_spirv(), (16,8,1))
mgr.destroy(seq)
(mgr.sequence()
.record(kp.OpTensorSyncDevice([tensor_a, tensor_b]))
.record(kp.OpAlgoDispatch(algo))
.record(kp.OpTensorSyncLocal([tensor_a, tensor_b]))
.eval())
assert seq.is_init() == False
mgr.eval_tensor_sync_local_def([tensor_a, tensor_b])
print(tensor_a.numpy())
print(tensor_b.numpy())
assert np.all(tensor_a.numpy() == np.stack([np.arange(16)]*8, axis=1).ravel())
assert np.all(tensor_b.numpy() == np.stack([np.arange(8)]*16, axis=0).ravel())
mgr.destroy([tensor_a, tensor_b])
assert tensor_a.is_init() == False
assert tensor_b.is_init() == False
def test_tensor_rebuild_backwards_compat():
"""
Test basic OpMult operation
"""
tensor_in_a = kp.Tensor([2, 2, 2])
tensor_in_b = kp.Tensor([1, 2, 3])
tensor_out = kp.Tensor([0, 0, 0])
mgr = kp.Manager()
mgr.eval_tensor_create_def([tensor_in_a, tensor_in_b, tensor_out])
shader_path = os.path.abspath(os.path.join(DIRNAME, "../../shaders/glsl/opmult.comp.spv"))
mgr.eval_async_algo_file_def([tensor_in_a, tensor_in_b, tensor_out], shader_path)
mgr.eval_await_def()
mgr.eval_tensor_sync_local_def([tensor_out])
assert tensor_out.data() == [2.0, 4.0, 6.0]
assert np.all(tensor_out.numpy() == [2.0, 4.0, 6.0])
print(tensor_a.data())
print(tensor_b.data())
assert np.all(tensor_a.data() == np.stack([np.arange(16)]*8, axis=1).ravel())
assert np.all(tensor_b.data() == np.stack([np.arange(8)]*16, axis=0).ravel())
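
The two assertions above imply that element `i` of `tensor_a` holds `i // 8` and element `i` of `tensor_b` holds `i % 8`, so the shader's linear index must be `x * 8 + y` for a `(16, 8, 1)` workgroup. The index expression itself is not visible in this hunk, so that formula is an inference from the asserts, and `simulate_workgroup` is a hypothetical name. A CPU-only sketch:

```python
def simulate_workgroup(wg_x: int, wg_y: int):
    """Write each invocation's x and y ids into two flat buffers."""
    data1 = [0.0] * (wg_x * wg_y)
    data2 = [0.0] * (wg_x * wg_y)
    for x in range(wg_x):
        for y in range(wg_y):
            i = x * wg_y + y  # assumed linear index, inferred from the asserts
            data1[i] = float(x)
            data2[i] = float(y)
    return data1, data2


data1, data2 = simulate_workgroup(16, 8)
```

Here `data1` is each x id repeated 8 times and `data2` is the run `0..7` repeated 16 times, matching the `np.stack(...).ravel()` layouts used in the test.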


@ -1,4 +1,5 @@
import pyshader as ps
import numpy as np
import kp
def test_logistic_regression():
@ -46,45 +47,39 @@ def test_logistic_regression():
mgr = kp.Manager(0)
# First we create input and output tensors for shader
tensor_x_i = kp.Tensor([0.0, 1.0, 1.0, 1.0, 1.0])
tensor_x_j = kp.Tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_x_i = mgr.tensor(np.array([0.0, 1.0, 1.0, 1.0, 1.0]))
tensor_x_j = mgr.tensor(np.array([0.0, 0.0, 0.0, 1.0, 1.0]))
tensor_y = kp.Tensor([0.0, 0.0, 0.0, 1.0, 1.0])
tensor_y = mgr.tensor(np.array([0.0, 0.0, 0.0, 1.0, 1.0]))
tensor_w_in = kp.Tensor([0.001, 0.001])
tensor_w_out_i = kp.Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_out_j = kp.Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_in = mgr.tensor(np.array([0.001, 0.001]))
tensor_w_out_i = mgr.tensor(np.array([0.0, 0.0, 0.0, 0.0, 0.0]))
tensor_w_out_j = mgr.tensor(np.array([0.0, 0.0, 0.0, 0.0, 0.0]))
tensor_b_in = kp.Tensor([0.0])
tensor_b_out = kp.Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_b_in = mgr.tensor(np.array([0.0]))
tensor_b_out = mgr.tensor(np.array([0.0, 0.0, 0.0, 0.0, 0.0]))
tensor_l_out = kp.Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_l_out = mgr.tensor(np.array([0.0, 0.0, 0.0, 0.0, 0.0]))
tensor_m = kp.Tensor([ tensor_y.size() ])
tensor_m = mgr.tensor(np.array([ tensor_y.size() ]))
# We store them in an array for easier interaction
params = [tensor_x_i, tensor_x_j, tensor_y, tensor_w_in, tensor_w_out_i,
tensor_w_out_j, tensor_b_in, tensor_b_out, tensor_l_out, tensor_m]
mgr.rebuild(params)
mgr.sequence().eval(kp.OpTensorSyncDevice(params))
# Create a managed sequence
sq = mgr.sequence()
# Clear previous operations and begin recording for new operations
sq.begin()
# Record operation to sync memory from local to GPU memory
sq.record_tensor_sync_device([tensor_w_in, tensor_b_in])
sq.record(kp.OpTensorSyncDevice([tensor_w_in, tensor_b_in]))
# Record operation to execute GPU shader against all our parameters
sq.record_algo_data(params, compute_shader.to_spirv())
sq.record(kp.OpAlgoDispatch(mgr.algorithm(params, compute_shader.to_spirv())))
# Record operation to sync memory from GPU to local memory
sq.record_tensor_sync_local([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out])
# Stop recording operations
sq.end()
sq.record(kp.OpTensorSyncLocal([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out]))
ITERATIONS = 100
learning_rate = 0.1
@ -97,9 +92,9 @@ def test_logistic_regression():
# Calculate the parameters based on the respective derivatives calculated
for j_iter in range(tensor_b_out.size()):
tensor_w_in[0] -= learning_rate * tensor_w_out_i.data()[j_iter]
tensor_w_in[1] -= learning_rate * tensor_w_out_j.data()[j_iter]
tensor_b_in[0] -= learning_rate * tensor_b_out.data()[j_iter]
tensor_w_in.data()[0] -= learning_rate * tensor_w_out_i.data()[j_iter]
tensor_w_in.data()[1] -= learning_rate * tensor_w_out_j.data()[j_iter]
tensor_b_in.data()[0] -= learning_rate * tensor_b_out.data()[j_iter]
assert tensor_w_in.data()[0] < 0.01
assert tensor_w_in.data()[0] > 0.0
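
Note on the change from `tensor_w_in[0] -= ...` to `tensor_w_in.data()[0] -= ...`: the update only persists if `data()` hands back the tensor's own memory rather than a copy. The changelog entry about avoiding copies suggests this, but the hunk does not confirm it. A minimal sketch of the difference, where `HostBuffer` is a hypothetical stand-in and not part of the `kp` API:

```python
import numpy as np


class HostBuffer:
    """Hypothetical stand-in for a tensor; not part of the kp API."""

    def __init__(self, values):
        self._mem = np.array(values, dtype=np.float32)

    def data(self):
        # Return the underlying array itself (no copy), so writes persist.
        return self._mem

    def data_copy(self):
        # Return a fresh copy, so writes are lost once the copy is dropped.
        return self._mem.copy()


w_in = HostBuffer([0.001, 0.001])
grads = np.array([0.1, 0.2, 0.3], dtype=np.float32)  # made-up gradients
learning_rate = 0.1

for g in grads:
    # In-place update through the shared array, as in the new test code.
    w_in.data()[0] -= learning_rate * g

# The same kind of update through a copy leaves the buffer untouched.
w_in.data_copy()[1] -= 1.0
```

After the loop the first weight has moved by `learning_rate * sum(grads)`, while the second weight keeps its initial value because the write went to a temporary copy.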


@ -0,0 +1,206 @@
import pyshader as ps
import os
import pytest
import kp
import numpy as np
def test_type_float():
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs {float valuesLhs[];};
layout(set = 0, binding = 1) buffer tensorRhs {float valuesRhs[];};
layout(set = 0, binding = 2) buffer tensorOutput { float valuesOutput[];};
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
spirv = kp.Shader.compile_source(shader)
arr_in_a = np.array([123., 153., 231.], dtype=np.float32)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.float32)
arr_out = np.array([0, 0, 0], dtype=np.float32)
mgr = kp.Manager()
tensor_in_a = mgr.tensor(arr_in_a)
tensor_in_b = mgr.tensor(arr_in_b)
tensor_out = mgr.tensor(arr_out)
params = [tensor_in_a, tensor_in_b, tensor_out]
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
assert np.all(tensor_out.data() == arr_in_a * arr_in_b)
def test_type_float_double_incorrect():
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs {float valuesLhs[];};
layout(set = 0, binding = 1) buffer tensorRhs {float valuesRhs[];};
layout(set = 0, binding = 2) buffer tensorOutput { float valuesOutput[];};
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
spirv = kp.Shader.compile_source(shader)
arr_in_a = np.array([123., 153., 231.], dtype=np.float32)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.uint32)
arr_out = np.array([0, 0, 0], dtype=np.float32)
mgr = kp.Manager()
tensor_in_a = mgr.tensor_t(arr_in_a)
tensor_in_b = mgr.tensor_t(arr_in_b)
tensor_out = mgr.tensor_t(arr_out)
params = [tensor_in_a, tensor_in_b, tensor_out]
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
assert np.all(tensor_out.data() != arr_in_a * arr_in_b)
@pytest.mark.skipif("swiftshader" in os.environ.get("VK_ICD_FILENAMES", ""),
reason="Swiftshader doesn't support double")
def test_type_double():
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs { double valuesLhs[]; };
layout(set = 0, binding = 1) buffer tensorRhs { double valuesRhs[]; };
layout(set = 0, binding = 2) buffer tensorOutput { double valuesOutput[]; };
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
spirv = kp.Shader.compile_source(shader)
arr_in_a = np.array([123., 153., 231.], dtype=np.float64)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.float64)
arr_out = np.array([0, 0, 0], dtype=np.float64)
mgr = kp.Manager()
tensor_in_a = mgr.tensor_t(arr_in_a)
tensor_in_b = mgr.tensor_t(arr_in_b)
tensor_out = mgr.tensor_t(arr_out)
params = [tensor_in_a, tensor_in_b, tensor_out]
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
print(f"Dtype value {tensor_out.data().dtype}")
assert np.all(tensor_out.data() == arr_in_a * arr_in_b)
def test_type_int():
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs { int valuesLhs[]; };
layout(set = 0, binding = 1) buffer tensorRhs { int valuesRhs[]; };
layout(set = 0, binding = 2) buffer tensorOutput { int valuesOutput[]; };
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
spirv = kp.Shader.compile_source(shader)
arr_in_a = np.array([123, 153, 231], dtype=np.int32)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.int32)
arr_out = np.array([0, 0, 0], dtype=np.int32)
mgr = kp.Manager()
tensor_in_a = mgr.tensor_t(arr_in_a)
tensor_in_b = mgr.tensor_t(arr_in_b)
tensor_out = mgr.tensor_t(arr_out)
params = [tensor_in_a, tensor_in_b, tensor_out]
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
print(f"Dtype value {tensor_out.data().dtype}")
assert np.all(tensor_out.data() == arr_in_a * arr_in_b)
def test_type_unsigned_int():
shader = """
#version 450
layout(set = 0, binding = 0) buffer tensorLhs { uint valuesLhs[]; };
layout(set = 0, binding = 1) buffer tensorRhs { uint valuesRhs[]; };
layout(set = 0, binding = 2) buffer tensorOutput { uint valuesOutput[]; };
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint index = gl_GlobalInvocationID.x;
valuesOutput[index] = valuesLhs[index] * valuesRhs[index];
}
"""
spirv = kp.Shader.compile_source(shader)
arr_in_a = np.array([123, 153, 231], dtype=np.uint32)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.uint32)
arr_out = np.array([0, 0, 0], dtype=np.uint32)
mgr = kp.Manager()
tensor_in_a = mgr.tensor_t(arr_in_a)
tensor_in_b = mgr.tensor_t(arr_in_b)
tensor_out = mgr.tensor_t(arr_out)
params = [tensor_in_a, tensor_in_b, tensor_out]
(mgr.sequence()
.record(kp.OpTensorSyncDevice(params))
.record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
.record(kp.OpTensorSyncLocal([tensor_out]))
.eval())
print(f"Dtype value {tensor_out.data().dtype}")
assert np.all(tensor_out.data() == arr_in_a * arr_in_b)
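
Note on `test_type_float_double_incorrect` above: despite its name, it pairs a `float32` input with a `uint32` input while the shader declares every buffer as `float`. A plausible reading, which the diff does not state explicitly, is that the shader reinterprets the raw `uint32` bytes as floats, so the products cannot match the numeric ones. The sketch below reproduces that effect on the CPU with `ndarray.view`; the names `b_as_float`, `gpu_like` and `numeric` are illustrative.

```python
import numpy as np

arr_in_a = np.array([123., 153., 231.], dtype=np.float32)
arr_in_b = np.array([9482, 1208, 1238], dtype=np.uint32)

# Reinterpret the uint32 bytes as float32 without converting the values.
# Small integers map to tiny subnormal floats.
b_as_float = arr_in_b.view(np.float32)

# What a float shader would compute if it read those bytes as floats.
gpu_like = arr_in_a * b_as_float

# What NumPy computes when it converts the integer values numerically.
numeric = arr_in_a * arr_in_b

assert np.all(gpu_like != numeric)
```

This is why the test asserts `!=` rather than `==`: the dtype has to match the buffer type declared in the shader for the values to round-trip.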


@ -57,7 +57,7 @@ class CMakeBuild(build_ext):
else:
cmake_args += ['-DKOMPUTE_EXTRA_CXX_FLAGS="-fPIC"']
cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg]
build_args += ['--', '-j2']
build_args += ['--', '-j']
env = os.environ.copy()
env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(env.get('CXXFLAGS', ''),
@ -70,7 +70,7 @@ class CMakeBuild(build_ext):
setup(
name='kp',
version='0.6.0',
version='0.7.0',
author='Alejandro Saucedo',
description='Vulkan Kompute: Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU processing usecases.',
long_description=long_description,


@ -1,16 +1,16 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/Shader.hpp"
#include "kompute/shaders/shaderopmult.hpp"
#include "kompute/shaders/shaderlogisticregression.hpp"
#include "kompute/Manager.hpp"
#include "kompute/Sequence.hpp"
#include "kompute/Core.hpp"
#include "kompute/Shader.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/operations/OpBase.hpp"
#include "kompute/operations/OpAlgoBase.hpp"
#include "kompute/operations/OpAlgoLhsRhsOut.hpp"
#include "kompute/operations/OpMult.hpp"
#include "kompute/operations/OpMemoryBarrier.hpp"
#include "kompute/operations/OpTensorCopy.hpp"
#include "kompute/operations/OpTensorSyncDevice.hpp"
#include "kompute/operations/OpTensorSyncLocal.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpAlgoDispatch.hpp"
#include "kompute/operations/OpMult.hpp"
#include "kompute/Sequence.hpp"
#include "kompute/Manager.hpp"

File diff suppressed because it is too large


@ -4,138 +4,178 @@
namespace kp {
Algorithm::Algorithm()
{
KP_LOG_DEBUG("Kompute Algorithm base constructor");
}
Algorithm::Algorithm(std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
const Constants& specializationConstants)
const std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& spirv,
const Workgroup& workgroup,
const Constants& specializationConstants,
const Constants& pushConstants)
{
KP_LOG_DEBUG("Kompute Algorithm Constructor with device");
this->mDevice = device;
this->mCommandBuffer = commandBuffer;
this->mSpecializationConstants = specializationConstants;
if (tensors.size() && spirv.size()) {
KP_LOG_INFO("Kompute Algorithm initialising with tensor size: {} and "
"spirv size: {}",
tensors.size(),
spirv.size());
this->rebuild(
tensors, spirv, workgroup, specializationConstants, pushConstants);
} else {
KP_LOG_INFO("Kompute Algorithm constructor with empty tensors and or "
"spirv so not rebuilding vulkan components");
}
}
Algorithm::~Algorithm()
{
KP_LOG_DEBUG("Kompute Algorithm Destructor started");
this->destroy();
}
void
Algorithm::rebuild(const std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& spirv,
const Workgroup& workgroup,
const Constants& specializationConstants,
const Constants& pushConstants)
{
KP_LOG_DEBUG("Kompute Algorithm rebuild started");
this->mTensors = tensors;
this->mSpirv = spirv;
this->mSpecializationConstants = specializationConstants;
this->mPushConstants = pushConstants;
this->setWorkgroup(workgroup,
this->mTensors.size() ? this->mTensors[0]->size() : 1);
// Descriptor pool is created first so if available then destroy all before
// rebuild
if (this->isInit()) {
this->destroy();
}
this->createParameters();
this->createShaderModule();
this->createPipeline();
}
bool
Algorithm::isInit()
{
return this->mPipeline && this->mPipelineCache && this->mPipelineLayout &&
this->mDescriptorPool && this->mDescriptorSet &&
this->mDescriptorSetLayout && this->mShaderModule;
}
void
Algorithm::destroy()
{
if (!this->mDevice) {
KP_LOG_ERROR(
"Kompute Algorithm destructor reached with null Device pointer");
KP_LOG_WARN("Kompute Algorithm destroy function reached with null "
"Device pointer");
return;
}
if (this->mFreePipeline) {
if (this->mFreePipeline && this->mPipeline) {
KP_LOG_DEBUG("Kompute Algorithm Destroying pipeline");
if (!this->mPipeline) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy "
"pipeline but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy "
"pipeline but it is null");
}
this->mDevice->destroy(
*this->mPipeline,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mPipeline = nullptr;
}
if (this->mFreePipelineCache) {
if (this->mFreePipelineCache && this->mPipelineCache) {
KP_LOG_DEBUG("Kompute Algorithm Destroying pipeline cache");
if (!this->mPipelineCache) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy "
"pipeline cache but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy "
"pipeline cache but it is null");
}
this->mDevice->destroy(
*this->mPipelineCache,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mPipelineCache = nullptr;
}
if (this->mFreePipelineLayout) {
if (this->mFreePipelineLayout && this->mPipelineLayout) {
KP_LOG_DEBUG("Kompute Algorithm Destroying pipeline layout");
if (!this->mPipelineLayout) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy "
"pipeline layout but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy "
"pipeline layout but it is null");
}
this->mDevice->destroy(
*this->mPipelineLayout,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mPipelineLayout = nullptr;
}
if (this->mFreeShaderModule) {
if (this->mFreeShaderModule && this->mShaderModule) {
KP_LOG_DEBUG("Kompute Algorithm Destroying shader module");
if (!this->mShaderModule) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy shader "
"module but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy shader "
"module but it is null");
}
this->mDevice->destroy(
*this->mShaderModule,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mShaderModule = nullptr;
}
if (this->mFreeDescriptorSet) {
KP_LOG_DEBUG("Kompute Algorithm Freeing Descriptor Set");
if (!this->mDescriptorSet) {
KP_LOG_ERROR(
"Kompute Algorithm Error requested to free descriptor set");
}
this->mDevice->freeDescriptorSets(
*this->mDescriptorPool, 1, this->mDescriptorSet.get());
}
// We don't call freeDescriptorSets because the descriptor pool is not
// created with VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT; see
// https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-vkFreeDescriptorSets-descriptorPool-00312
// if (this->mFreeDescriptorSet && this->mDescriptorSet) {
// KP_LOG_DEBUG("Kompute Algorithm Freeing Descriptor Set");
// if (!this->mDescriptorSet) {
// KP_LOG_WARN(
// "Kompute Algorithm Error requested to free descriptor set");
// }
// this->mDevice->freeDescriptorSets(
// *this->mDescriptorPool, 1, this->mDescriptorSet.get());
// this->mDescriptorSet = nullptr;
//}
if (this->mFreeDescriptorSetLayout) {
if (this->mFreeDescriptorSetLayout && this->mDescriptorSetLayout) {
KP_LOG_DEBUG("Kompute Algorithm Destroying Descriptor Set Layout");
if (!this->mDescriptorSetLayout) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy "
"descriptor set layout but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy "
"descriptor set layout but it is null");
}
this->mDevice->destroy(
*this->mDescriptorSetLayout,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mDescriptorSetLayout = nullptr;
}
if (this->mFreeDescriptorPool) {
if (this->mFreeDescriptorPool && this->mDescriptorPool) {
KP_LOG_DEBUG("Kompute Algorithm Destroying Descriptor Pool");
if (!this->mDescriptorPool) {
KP_LOG_ERROR("Kompute Algorithm Error requested to destroy "
"descriptor pool but it is null");
KP_LOG_WARN("Kompute Algorithm Error requested to destroy "
"descriptor pool but it is null");
}
this->mDevice->destroy(
*this->mDescriptorPool,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mDescriptorPool = nullptr;
}
}
void
Algorithm::init(const std::vector<uint32_t>& shaderFileData,
std::vector<std::shared_ptr<Tensor>> tensorParams)
{
KP_LOG_DEBUG("Kompute Algorithm init started");
this->createParameters(tensorParams);
this->createShaderModule(shaderFileData);
for (std::shared_ptr<Tensor> tensor : tensorParams) {
this->mSpecializationConstants.push_back(tensor->size());
}
this->createPipeline();
}
void
Algorithm::createDescriptorPool()
{}
void
Algorithm::createParameters(std::vector<std::shared_ptr<Tensor>>& tensorParams)
Algorithm::createParameters()
{
KP_LOG_DEBUG("Kompute Algorithm createParameters started");
std::vector<vk::DescriptorPoolSize> descriptorPoolSizes = {
vk::DescriptorPoolSize(
vk::DescriptorType::eStorageBuffer,
static_cast<uint32_t>(tensorParams.size()) // Descriptor count
static_cast<uint32_t>(this->mTensors.size()) // Descriptor count
)
};
@ -152,7 +192,7 @@ Algorithm::createParameters(std::vector<std::shared_ptr<Tensor>>& tensorParams)
this->mFreeDescriptorPool = true;
std::vector<vk::DescriptorSetLayoutBinding> descriptorSetBindings;
for (size_t i = 0; i < tensorParams.size(); i++) {
for (size_t i = 0; i < this->mTensors.size(); i++) {
descriptorSetBindings.push_back(
vk::DescriptorSetLayoutBinding(i, // Binding index
vk::DescriptorType::eStorageBuffer,
@ -184,11 +224,11 @@ Algorithm::createParameters(std::vector<std::shared_ptr<Tensor>>& tensorParams)
this->mFreeDescriptorSet = true;
KP_LOG_DEBUG("Kompute Algorithm updating descriptor sets");
for (size_t i = 0; i < tensorParams.size(); i++) {
for (size_t i = 0; i < this->mTensors.size(); i++) {
std::vector<vk::WriteDescriptorSet> computeWriteDescriptorSets;
vk::DescriptorBufferInfo descriptorBufferInfo =
tensorParams[i]->constructDescriptorBufferInfo();
this->mTensors[i]->constructDescriptorBufferInfo();
computeWriteDescriptorSets.push_back(
vk::WriteDescriptorSet(*this->mDescriptorSet,
@ -207,17 +247,17 @@ Algorithm::createParameters(std::vector<std::shared_ptr<Tensor>>& tensorParams)
}
void
Algorithm::createShaderModule(const std::vector<uint32_t>& shaderFileData)
Algorithm::createShaderModule()
{
KP_LOG_DEBUG("Kompute Algorithm createShaderModule started");
vk::ShaderModuleCreateInfo shaderModuleInfo(
vk::ShaderModuleCreateFlags(),
sizeof(uint32_t) * shaderFileData.size(),
shaderFileData.data());
vk::ShaderModuleCreateInfo shaderModuleInfo(vk::ShaderModuleCreateFlags(),
sizeof(uint32_t) *
this->mSpirv.size(),
this->mSpirv.data());
KP_LOG_DEBUG("Kompute Algorithm Creating shader module. ShaderFileSize: {}",
shaderFileData.size());
this->mSpirv.size());
this->mFreeShaderModule = true;
this->mShaderModule = std::make_shared<vk::ShaderModule>();
this->mDevice->createShaderModule(
@ -237,6 +277,16 @@ Algorithm::createPipeline()
1, // Set layout count
this->mDescriptorSetLayout.get());
vk::PushConstantRange pushConstantRange;
if (this->mPushConstants.size()) {
pushConstantRange.setStageFlags(vk::ShaderStageFlagBits::eCompute);
pushConstantRange.setOffset(0);
pushConstantRange.setSize(sizeof(float) * this->mPushConstants.size());
pipelineLayoutInfo.setPushConstantRangeCount(1);
pipelineLayoutInfo.setPPushConstantRanges(&pushConstantRange);
}
this->mPipelineLayout = std::make_shared<vk::PipelineLayout>();
this->mDevice->createPipelineLayout(
&pipelineLayoutInfo, nullptr, this->mPipelineLayout.get());
@ -246,14 +296,14 @@ Algorithm::createPipeline()
for (uint32_t i = 0; i < this->mSpecializationConstants.size(); i++) {
vk::SpecializationMapEntry specializationEntry(
static_cast<uint32_t>(i),
static_cast<uint32_t>(sizeof(float) * i),
sizeof(float));
static_cast<uint32_t>(i),
static_cast<uint32_t>(sizeof(float) * i),
sizeof(float));
specializationEntries.push_back(specializationEntry);
}
// This passes ownership of the memory so we remove ownership from
// This passes ownership of the memory so we remove ownership from
// specialization container by using "transferDataOwnership"
vk::SpecializationInfo specializationInfo(
static_cast<uint32_t>(specializationEntries.size()),
@ -289,32 +339,129 @@ Algorithm::createPipeline()
throw std::runtime_error("Failed to create pipeline result: " +
vk::to_string(pipelineResult.result));
}
vk::Pipeline& pipeline = pipelineResult.value;
this->mPipeline = std::make_shared<vk::Pipeline>(pipeline);
this->mFreePipeline = true;
#else
vk::Pipeline pipelineResult =
vk::Pipeline pipeline =
this->mDevice->createComputePipeline(*this->mPipelineCache, pipelineInfo);
this->mPipeline = std::make_shared<vk::Pipeline>(pipeline);
this->mFreePipeline = true;
#endif
this->mFreePipeline = true;
this->mPipeline = std::make_shared<vk::Pipeline>(pipelineResult);
// TODO: Update to consistent
// this->mPipeline = std::make_shared<vk::Pipeline>();
// this->mDevice->createComputePipelines(
// *this->mPipelineCache, 1, &pipelineInfo, nullptr,
// this->mPipeline.get());
KP_LOG_DEBUG("Kompute Algorithm Create Pipeline Success");
}
void
Algorithm::recordDispatch(uint32_t x, uint32_t y, uint32_t z)
Algorithm::recordBindCore(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute Algorithm calling record dispatch");
KP_LOG_DEBUG("Kompute Algorithm binding pipeline");
this->mCommandBuffer->bindPipeline(vk::PipelineBindPoint::eCompute,
*this->mPipeline);
commandBuffer.bindPipeline(vk::PipelineBindPoint::eCompute,
*this->mPipeline);
this->mCommandBuffer->bindDescriptorSets(vk::PipelineBindPoint::eCompute,
*this->mPipelineLayout,
0, // First set
*this->mDescriptorSet,
nullptr // Dispatcher
KP_LOG_DEBUG("Kompute Algorithm binding descriptor sets");
commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eCompute,
*this->mPipelineLayout,
0, // First set
*this->mDescriptorSet,
nullptr // Dispatcher
);
}
this->mCommandBuffer->dispatch(x, y, z);
void
Algorithm::recordBindPush(const vk::CommandBuffer& commandBuffer)
{
if (this->mPushConstants.size()) {
KP_LOG_DEBUG("Kompute Algorithm binding push constants size: {}",
this->mPushConstants.size());
commandBuffer.pushConstants(*this->mPipelineLayout,
vk::ShaderStageFlagBits::eCompute,
0,
this->mPushConstants.size() * sizeof(float),
this->mPushConstants.data());
}
}
void
Algorithm::recordDispatch(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute Algorithm recording dispatch");
commandBuffer.dispatch(
this->mWorkgroup[0], this->mWorkgroup[1], this->mWorkgroup[2]);
}
void
Algorithm::setWorkgroup(const Workgroup& workgroup, uint32_t minSize)
{
KP_LOG_INFO("Kompute OpAlgoCreate setting dispatch size");
// The dispatch size is set up based on either explicitly provided template
// parameters or by default it would take the shape and size of the tensors
if (workgroup[0] > 0) {
// If at least the x value is provided we use mainly the parameters
// provided
this->mWorkgroup = { workgroup[0],
workgroup[1] > 0 ? workgroup[1] : 1,
workgroup[2] > 0 ? workgroup[2] : 1 };
} else {
this->mWorkgroup = { minSize, 1, 1 };
}
KP_LOG_INFO("Kompute OpAlgoCreate set dispatch size X: {}, Y: {}, Z: {}",
this->mWorkgroup[0],
this->mWorkgroup[1],
this->mWorkgroup[2]);
}
void
Algorithm::setPush(const Constants& pushConstants)
{
if (pushConstants.size() != this->mPushConstants.size()) {
throw std::runtime_error(
fmt::format("Kompute Algorithm push "
"constant provided is size {} but expected size {}",
pushConstants.size(),
this->mPushConstants.size()));
}
this->mPushConstants = pushConstants;
}
const Workgroup&
Algorithm::getWorkgroup()
{
return this->mWorkgroup;
}
const Constants&
Algorithm::getSpecializationConstants()
{
return this->mSpecializationConstants;
}
const Constants&
Algorithm::getPush()
{
return this->mPushConstants;
}
const std::vector<std::shared_ptr<Tensor>>&
Algorithm::getTensors()
{
return this->mTensors;
}
}
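
Note on `Algorithm::setWorkgroup` above: the fallback rule can be restated in a few lines of Python. This is only a sketch of the logic as it reads in the diff; `resolve_workgroup` is a hypothetical helper, not a function exposed by the library.

```python
def resolve_workgroup(workgroup, min_size):
    """Mirror the dispatch-size rule in Algorithm::setWorkgroup.

    workgroup: (x, y, z) as requested by the caller, 0 meaning "unset".
    min_size:  fallback x size; rebuild() passes the first tensor's size,
               or 1 when there are no tensors.
    """
    x, y, z = workgroup
    if x > 0:
        # An explicit x wins; unset y and z default to 1.
        return (x, y if y > 0 else 1, z if z > 0 else 1)
    # Without an explicit x, y and z are ignored entirely.
    return (min_size, 1, 1)


print(resolve_workgroup((0, 0, 0), 3))     # falls back to the tensor size
print(resolve_workgroup((16, 8, 0), 128))  # explicit x and y, z defaults
```

One consequence worth noting is that a workgroup such as `(0, 5, 5)` resolves to `(min_size, 1, 1)`, because only the x component decides whether the explicit values are used.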


@ -39,19 +39,25 @@ if(KOMPUTE_OPT_ANDOID_BUILD)
${PROJECT_SOURCE_DIR}/vk_ndk_wrapper_include/kompute_vk_ndk_wrapper.cpp)
endif()
add_library(
kompute STATIC
${kompute_CPP})
if(NOT KOMPUTE_OPT_BUILD_AS_SHARED_LIB)
add_library(
kompute STATIC
${kompute_CPP})
else()
add_library(
kompute SHARED
${kompute_CPP})
endif()
target_include_directories(
kompute PUBLIC
$<INSTALL_INTERFACE:include>
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
${CMAKE_CURRENT_SOURCE_DIR}/include
${PROJECT_SOURCE_DIR}/single_include
)
if(NOT KOMPUTE_OPT_ANDOID_BUILD)
target_link_libraries(
kompute
kompute
Vulkan::Vulkan
)
else()
@ -151,8 +157,7 @@ if(NOT KOMPUTE_OPT_DISABLE_SHADER_UTILS)
# HLSL
# glslang includes OGLCompiler, OSDependent, MachineIndependent
glslang
SPIRV
glslang-default-resource-limits)
SPIRV)
else()
find_package(glslang CONFIG REQUIRED)
@ -164,9 +169,8 @@ if(NOT KOMPUTE_OPT_DISABLE_SHADER_UTILS)
# Not including hlsl support
# glslang::HLSL
# Adding explicit dependencies to match above
glslang
SPIRV
glslang-default-resource-limits)
glslang::glslang
glslang::SPIRV)
endif()
endif()


@ -1,9 +1,13 @@
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include "kompute/Manager.hpp"
#include "fmt/ranges.h"
namespace kp {
#if DEBUG
@ -29,28 +33,38 @@ Manager::Manager()
{}
Manager::Manager(uint32_t physicalDeviceIndex,
const std::vector<uint32_t>& familyQueueIndices)
const std::vector<uint32_t>& familyQueueIndices,
const std::vector<std::string>& desiredExtensions)
{
this->mPhysicalDeviceIndex = physicalDeviceIndex;
this->mManageResources = true;
this->createInstance();
this->createDevice(familyQueueIndices);
this->createDevice(
familyQueueIndices, physicalDeviceIndex, desiredExtensions);
}
Manager::Manager(std::shared_ptr<vk::Instance> instance,
std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
uint32_t physicalDeviceIndex)
std::shared_ptr<vk::Device> device)
{
this->mManageResources = false;
this->mInstance = instance;
this->mPhysicalDevice = physicalDevice;
this->mDevice = device;
this->mPhysicalDeviceIndex = physicalDeviceIndex;
}
Manager::~Manager()
{
KP_LOG_DEBUG("Kompute Manager Destructor started");
this->destroy();
}
void
Manager::destroy()
{
KP_LOG_DEBUG("Kompute Manager destroy() started");
if (this->mDevice == nullptr) {
KP_LOG_ERROR(
@ -58,24 +72,34 @@ Manager::~Manager()
return;
}
if (this->mManagedSequences.size()) {
if (this->mManageResources && this->mManagedSequences.size()) {
KP_LOG_DEBUG("Kompute Manager explicitly running destructor for "
"managed sequences");
for (const std::pair<std::string, std::shared_ptr<Sequence>>& sqPair :
this->mManagedSequences) {
sqPair.second->freeMemoryDestroyGPUResources();
for (const std::weak_ptr<Sequence>& weakSq : this->mManagedSequences) {
if (std::shared_ptr<Sequence> sq = weakSq.lock()) {
sq->destroy();
}
}
this->mManagedSequences.clear();
}
if (this->mManagedTensors.size()) {
KP_LOG_DEBUG("Kompute Manager explicitly freeing tensors");
for (const std::shared_ptr<Tensor>& tensor : this->mManagedTensors) {
if (!tensor->isInit()) {
KP_LOG_ERROR("Kompute Manager attempted to free managed tensor "
"but not tensor is not initialised");
if (this->mManageResources && this->mManagedAlgorithms.size()) {
KP_LOG_DEBUG("Kompute Manager explicitly freeing algorithms");
for (const std::weak_ptr<Algorithm>& weakAlgorithm :
this->mManagedAlgorithms) {
if (std::shared_ptr<Algorithm> algorithm = weakAlgorithm.lock()) {
algorithm->destroy();
}
}
this->mManagedAlgorithms.clear();
}
if (this->mManageResources && this->mManagedTensors.size()) {
KP_LOG_DEBUG("Kompute Manager explicitly freeing tensors");
for (const std::weak_ptr<Tensor>& weakTensor : this->mManagedTensors) {
if (std::shared_ptr<Tensor> tensor = weakTensor.lock()) {
tensor->destroy();
}
tensor->freeMemoryDestroyGPUResources();
}
this->mManagedTensors.clear();
}
@ -84,6 +108,7 @@ Manager::~Manager()
KP_LOG_INFO("Destroying device");
this->mDevice->destroy(
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mDevice = nullptr;
KP_LOG_DEBUG("Kompute Manager Destroyed Device");
}
@ -106,39 +131,11 @@ Manager::~Manager()
if (this->mFreeInstance) {
this->mInstance->destroy(
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mInstance = nullptr;
KP_LOG_DEBUG("Kompute Manager Destroyed Instance");
}
}
std::shared_ptr<Sequence>
Manager::sequence(std::string sequenceName, uint32_t queueIndex)
{
KP_LOG_DEBUG("Kompute Manager sequence() with sequenceName: {} "
"and queueIndex: {}",
sequenceName,
queueIndex);
std::shared_ptr<Sequence> sq = nullptr;
std::unordered_map<std::string, std::shared_ptr<Sequence>>::iterator found =
this->mManagedSequences.find(sequenceName);
if (found == this->mManagedSequences.end()) {
std::shared_ptr<Sequence> sq =
std::make_shared<Sequence>(this->mPhysicalDevice,
this->mDevice,
this->mComputeQueues[queueIndex],
this->mComputeQueueFamilyIndices[queueIndex]);
sq->init();
this->mManagedSequences.insert({ sequenceName, sq });
return sq;
} else {
return found->second;
}
}
void
Manager::createInstance()
{
@ -155,7 +152,10 @@ Manager::createInstance()
applicationInfo.applicationVersion = KOMPUTE_VK_API_VERSION;
std::vector<const char*> applicationExtensions;
#if DEBUG
applicationExtensions.push_back(VK_EXT_DEBUG_REPORT_EXTENSION_NAME);
#endif
vk::InstanceCreateInfo computeInstanceCreateInfo;
computeInstanceCreateInfo.pApplicationInfo = &applicationInfo;
@ -172,8 +172,24 @@ Manager::createInstance()
// We'll identify the layers that are supported
std::vector<const char*> validLayerNames;
std::vector<const char*> desiredLayerNames = {
"VK_LAYER_LUNARG_assistant_layer", "VK_LAYER_LUNARG_standard_validation"
"VK_LAYER_LUNARG_assistant_layer",
"VK_LAYER_LUNARG_standard_validation",
"VK_LAYER_KHRONOS_validation",
};
std::vector<std::string> envLayerNames;
const char* envLayerNamesVal = std::getenv("KOMPUTE_ENV_DEBUG_LAYERS");
KP_LOG_DEBUG("Kompute Manager adding environment layers: {}",
envLayerNamesVal);
if (envLayerNamesVal != NULL && *envLayerNamesVal != '\0') {
std::istringstream iss(envLayerNamesVal);
std::istream_iterator<std::string> beg(iss), end;
envLayerNames = std::vector<std::string>(beg, end);
for (const std::string& layerName : envLayerNames) {
desiredLayerNames.push_back(layerName.c_str());
}
KP_LOG_DEBUG("Desired layers: {}", desiredLayerNames);
}
// Identify the valid layer names based on the desiredLayerNames
{
std::set<std::string> uniqueLayerNames;
@ -183,6 +199,7 @@ Manager::createInstance()
std::string layerName(layerProperties.layerName.data());
uniqueLayerNames.insert(layerName);
}
KP_LOG_DEBUG("Available layers: {}", uniqueLayerNames);
for (const char* desiredLayerName : desiredLayerNames) {
if (uniqueLayerNames.count(desiredLayerName) != 0) {
validLayerNames.push_back(desiredLayerName);
@ -191,9 +208,15 @@ Manager::createInstance()
}
if (validLayerNames.size() > 0) {
KP_LOG_DEBUG(
"Kompute Manager Initializing instance with valid layers: {}",
validLayerNames);
computeInstanceCreateInfo.enabledLayerCount =
(uint32_t)validLayerNames.size();
computeInstanceCreateInfo.ppEnabledLayerNames = validLayerNames.data();
} else {
KP_LOG_WARN("Kompute Manager no valid layer names found from desired "
"layer names");
}
#endif
#endif
@ -225,7 +248,32 @@ Manager::createInstance()
}
void
Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
Manager::clear()
{
if (this->mManageResources) {
this->mManagedTensors.erase(
std::remove_if(begin(this->mManagedTensors),
end(this->mManagedTensors),
[](std::weak_ptr<Tensor> t) { return t.expired(); }),
end(this->mManagedTensors));
this->mManagedAlgorithms.erase(
std::remove_if(
begin(this->mManagedAlgorithms),
end(this->mManagedAlgorithms),
[](std::weak_ptr<Algorithm> t) { return t.expired(); }),
end(this->mManagedAlgorithms));
this->mManagedSequences.erase(
std::remove_if(begin(this->mManagedSequences),
end(this->mManagedSequences),
[](std::weak_ptr<Sequence> t) { return t.expired(); }),
end(this->mManagedSequences));
}
}
void
Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices,
uint32_t physicalDeviceIndex,
const std::vector<std::string>& desiredExtensions)
{
KP_LOG_DEBUG("Kompute Manager creating Device");
@ -233,7 +281,7 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
if (this->mInstance == nullptr) {
throw std::runtime_error("Kompute Manager instance is null");
}
if (this->mPhysicalDeviceIndex < 0) {
if (physicalDeviceIndex < 0) {
throw std::runtime_error(
"Kompute Manager physical device index not provided");
}
@ -243,8 +291,7 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
std::vector<vk::PhysicalDevice> physicalDevices =
this->mInstance->enumeratePhysicalDevices();
vk::PhysicalDevice physicalDevice =
physicalDevices[this->mPhysicalDeviceIndex];
vk::PhysicalDevice physicalDevice = physicalDevices[physicalDeviceIndex];
this->mPhysicalDevice =
std::make_shared<vk::PhysicalDevice>(physicalDevice);
@ -253,8 +300,8 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
physicalDevice.getProperties();
KP_LOG_INFO("Using physical device index {} found {}",
this->mPhysicalDeviceIndex,
physicalDeviceProperties.deviceName);
physicalDeviceIndex,
physicalDeviceProperties.deviceName.data());
if (!familyQueueIndices.size()) {
// Find compute queue
@ -304,9 +351,37 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
deviceQueueCreateInfos.push_back(deviceQueueCreateInfo);
}
KP_LOG_DEBUG("Kompute Manager desired extension layers {}",
desiredExtensions);
std::vector<vk::ExtensionProperties> deviceExtensions =
this->mPhysicalDevice->enumerateDeviceExtensionProperties();
std::set<std::string> uniqueExtensionNames;
for (const vk::ExtensionProperties& ext : deviceExtensions) {
std::string extName(ext.extensionName.data());
uniqueExtensionNames.insert(extName);
}
KP_LOG_DEBUG("Kompute Manager available extensions {}",
uniqueExtensionNames);
std::vector<const char*> validExtensions;
for (const std::string& ext : desiredExtensions) {
if (uniqueExtensionNames.count(ext) != 0) {
validExtensions.push_back(ext.c_str());
}
}
if (desiredExtensions.size() != validExtensions.size()) {
KP_LOG_ERROR("Kompute Manager not all extensions were added: {}",
validExtensions);
}
vk::DeviceCreateInfo deviceCreateInfo(vk::DeviceCreateFlags(),
deviceQueueCreateInfos.size(),
deviceQueueCreateInfos.data());
deviceQueueCreateInfos.data(),
{},
{},
validExtensions.size(),
validExtensions.data());
this->mDevice = std::make_shared<vk::Device>();
physicalDevice.createDevice(
@ -328,151 +403,54 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
KP_LOG_DEBUG("Kompute Manager compute queue obtained");
}
std::shared_ptr<Tensor>
Manager::tensor(
const std::vector<float>& data,
Tensor::TensorTypes tensorType,
bool syncDataToGPU)
std::shared_ptr<Algorithm>
Manager::algorithm(const std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& spirv,
const Workgroup& workgroup,
const Constants& specializationConstants,
const Constants& pushConstants)
{
KP_LOG_DEBUG("Kompute Manager tensor triggered");
KP_LOG_DEBUG("Kompute Manager creating new tensor shared ptr");
std::shared_ptr<Tensor> tensor =
std::make_shared<Tensor>(kp::Tensor(data, tensorType));
KP_LOG_DEBUG("Kompute Manager algorithm creation triggered");
tensor->init(this->mPhysicalDevice, this->mDevice);
std::shared_ptr<Algorithm> algorithm{ new kp::Algorithm(
this->mDevice,
tensors,
spirv,
workgroup,
specializationConstants,
pushConstants) };
if (syncDataToGPU) {
this->evalOpDefault<OpTensorSyncDevice>({ tensor });
if (this->mManageResources) {
this->mManagedAlgorithms.push_back(algorithm);
}
this->mManagedTensors.insert(tensor);
return tensor;
return algorithm;
}
void
Manager::rebuild(std::vector<std::shared_ptr<kp::Tensor>> tensors,
bool syncDataToGPU)
std::shared_ptr<Sequence>
Manager::sequence(uint32_t queueIndex, uint32_t totalTimestamps)
{
KP_LOG_DEBUG("Kompute Manager rebuild triggered");
for (std::shared_ptr<Tensor> tensor : tensors) {
KP_LOG_DEBUG("Kompute Manager sequence() with queueIndex: {}", queueIndex);
// Pass syncData as false so all tensors are synced at once instead of one by one
this->rebuild(tensor, false);
std::shared_ptr<Sequence> sq{ new kp::Sequence(
this->mPhysicalDevice,
this->mDevice,
this->mComputeQueues[queueIndex],
this->mComputeQueueFamilyIndices[queueIndex],
totalTimestamps) };
if (this->mManageResources) {
this->mManagedSequences.push_back(sq);
}
if (syncDataToGPU) {
this->evalOpDefault<OpTensorSyncDevice>(tensors);
}
return sq;
}
void
Manager::rebuild(std::shared_ptr<kp::Tensor> tensor,
bool syncDataToGPU)
vk::PhysicalDeviceProperties
Manager::getDeviceProperties() const
{
KP_LOG_DEBUG("Kompute Manager rebuild Tensor triggered");
if (tensor->isInit()) {
tensor->freeMemoryDestroyGPUResources();
}
tensor->init(this->mPhysicalDevice, this->mDevice);
std::set<std::shared_ptr<Tensor>>::iterator it =
this->mManagedTensors.find(tensor);
if (it == this->mManagedTensors.end()) {
this->mManagedTensors.insert(tensor);
}
if (syncDataToGPU) {
this->evalOpDefault<OpTensorSyncDevice>({ tensor });
}
return this->mPhysicalDevice->getProperties();
}
void
Manager::destroy(std::shared_ptr<kp::Tensor> tensor)
{
KP_LOG_DEBUG("Kompute Manager rebuild Tensor triggered");
if (tensor->isInit()) {
tensor->freeMemoryDestroyGPUResources();
}
// TODO: Confirm not limiting destroying tensors owned by this manager allowed
std::set<std::shared_ptr<Tensor>>::iterator it =
this->mManagedTensors.find(tensor);
if (it != this->mManagedTensors.end()) {
this->mManagedTensors.erase(tensor);
}
}
void
Manager::destroy(std::vector<std::shared_ptr<kp::Tensor>> tensors)
{
KP_LOG_DEBUG("Kompute Manager rebuild Tensor triggered");
for (std::shared_ptr<Tensor> tensor : tensors) {
this->destroy(tensor);
}
}
void
Manager::destroy(std::vector<std::shared_ptr<kp::Sequence>> sequences)
{
KP_LOG_DEBUG("Kompute Manager rebuild Sequence triggered");
for (std::shared_ptr<kp::Sequence> sequence : sequences) {
this->destroy(sequence);
}
}
void
Manager::destroy(std::shared_ptr<kp::Sequence> sequence)
{
KP_LOG_DEBUG("Kompute Manager rebuild Sequence triggered");
// Inefficient but required to delete by value
// Depending on the amount of named sequences created may be worth creating
// a set to ensure efficient delete.
for (std::unordered_map<std::string, std::shared_ptr<Sequence>>::iterator it = this->mManagedSequences.begin(); it != this->mManagedSequences.end(); it++) {
if (it->second == sequence) {
this->mManagedSequences.erase(it);
break;
}
}
if (sequence->isInit()) {
sequence->freeMemoryDestroyGPUResources();
}
}
void
Manager::destroy(const std::string& sequenceName)
{
KP_LOG_DEBUG("Kompute Manager rebuild Sequence triggered");
std::unordered_map<std::string, std::shared_ptr<Sequence>>::iterator
found = this->mManagedSequences.find(sequenceName);
if (found != this->mManagedSequences.end()) {
// We don't call destroy(sequence) as erasing sequence by name more efficient
if (found->second->isInit()) {
found->second->freeMemoryDestroyGPUResources();
}
this->mManagedSequences.erase(sequenceName);
}
}
void
Manager::destroy(const std::vector<std::string>& sequenceNames)
{
KP_LOG_DEBUG("Kompute Manager rebuild Sequence triggered");
for (const std::string& sequenceName : sequenceNames) {
this->destroy(sequenceName);
}
}
}
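
The extension handling in `createDevice` reduces to intersecting the requested names with the names the device reports, then handing the result to Vulkan as an array of `const char*`. The sketch below isolates that step without any Vulkan dependency. `selectSupportedExtensions` and the extension names in the usage are hypothetical. The sketch iterates by const reference so that each stored pointer refers to a string owned by the caller's vector, which is an assumption about how the pointers are meant to stay valid.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns pointers to the names in `desired` that also appear in `available`.
// The pointers refer to strings owned by `desired`, so `desired` must outlive
// the returned vector and must not be modified while the result is in use.
std::vector<const char*>
selectSupportedExtensions(const std::vector<std::string>& desired,
                          const std::set<std::string>& available)
{
    std::vector<const char*> valid;
    // Iterating by value here would store pointers into per-iteration copies
    // that are destroyed before the result is used.
    for (const std::string& name : desired) {
        if (available.count(name) != 0) {
            valid.push_back(name.c_str());
        }
    }
    return valid;
}
```

A caller can compare `valid.size()` with `desired.size()` to detect that some requested extensions were unavailable, which is the condition the log message in `createDevice` reports.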


@ -1,176 +0,0 @@
#pragma once
#include "kompute/operations/OpAlgoBase.hpp"
namespace kp {
OpAlgoBase::OpAlgoBase()
{
KP_LOG_DEBUG("Kompute OpAlgoBase constructor base");
}
OpAlgoBase::OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
const Workgroup& komputeWorkgroup,
const Constants& specializationConstants)
: OpBase(physicalDevice, device, commandBuffer, tensors)
{
KP_LOG_DEBUG("Kompute OpAlgoBase constructor with params numTensors: {}",
tensors.size());
// The dispatch size is set up based on either explicitly provided template
// parameters or by default it would take the shape and size of the tensors
if (komputeWorkgroup[0] > 0) {
// If at least the x value is provided we use mainly the parameters
// provided
this->mKomputeWorkgroup = {
komputeWorkgroup[0],
komputeWorkgroup[1] > 0 ? komputeWorkgroup[1] : 1,
komputeWorkgroup[2] > 0 ? komputeWorkgroup[2] : 1
};
} else {
this->mKomputeWorkgroup = { tensors[0]->size(), 1, 1 };
}
KP_LOG_INFO("Kompute OpAlgoBase dispatch size X: {}, Y: {}, Z: {}",
this->mKomputeWorkgroup[0],
this->mKomputeWorkgroup[1],
this->mKomputeWorkgroup[2]);
this->mAlgorithm = std::make_shared<Algorithm>(device, commandBuffer, specializationConstants);
}
OpAlgoBase::OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
std::string shaderFilePath,
const Workgroup& komputeWorkgroup,
const Constants& specializationConstants)
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, komputeWorkgroup, specializationConstants)
{
KP_LOG_DEBUG(
"Kompute OpAlgoBase shaderFilePath constructor with shaderfile path: {}",
shaderFilePath);
this->mShaderFilePath = shaderFilePath;
}
OpAlgoBase::OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& shaderDataRaw,
const Workgroup& komputeWorkgroup,
const Constants& specializationConstants)
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, komputeWorkgroup, specializationConstants)
{
KP_LOG_DEBUG("Kompute OpAlgoBase shaderFilePath constructor with shader raw "
"data length: {}",
shaderDataRaw.size());
this->mShaderDataRaw = shaderDataRaw;
}
OpAlgoBase::~OpAlgoBase()
{
KP_LOG_DEBUG("Kompute OpAlgoBase destructor started");
}
void
OpAlgoBase::init()
{
KP_LOG_DEBUG("Kompute OpAlgoBase init called");
if (this->mTensors.size() < 1) {
throw std::runtime_error(
"Kompute OpAlgoBase called with less than 1 tensor");
}
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
if (!tensor->isInit()) {
throw std::runtime_error(
"Kompute OpAlgoBase validation failed; all tensor parameters "
"must be initialised.");
}
}
KP_LOG_DEBUG("Kompute OpAlgoBase fetching spirv data");
std::vector<uint32_t> shaderFileData = this->fetchSpirvBinaryData();
KP_LOG_DEBUG("Kompute OpAlgoBase Initialising algorithm component");
this->mAlgorithm->init(shaderFileData, this->mTensors);
}
void
OpAlgoBase::record()
{
KP_LOG_DEBUG("Kompute OpAlgoBase record called");
// Barrier to ensure the data is finished writing to buffer memory
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
tensor->recordBufferMemoryBarrier(
this->mCommandBuffer,
vk::AccessFlagBits::eHostWrite,
vk::AccessFlagBits::eShaderRead,
vk::PipelineStageFlagBits::eHost,
vk::PipelineStageFlagBits::eComputeShader);
}
this->mAlgorithm->recordDispatch(this->mKomputeWorkgroup[0],
this->mKomputeWorkgroup[1],
this->mKomputeWorkgroup[2]);
}
void
OpAlgoBase::preEval()
{
KP_LOG_DEBUG("Kompute OpAlgoBase preEval called");
}
void
OpAlgoBase::postEval()
{
KP_LOG_DEBUG("Kompute OpAlgoBase postSubmit called");
}
std::vector<uint32_t>
OpAlgoBase::fetchSpirvBinaryData()
{
KP_LOG_DEBUG("Kompute OpAlgoBase Running fetchSpirvBinaryData");
if (this->mShaderFilePath.size()) {
KP_LOG_DEBUG("Kompute OpAlgoBase Reading data from file path");
std::ifstream fileStream(this->mShaderFilePath,
std::ios::binary | std::ios::in |
std::ios::ate);
if (!fileStream.good()) {
throw std::runtime_error("Error reading file: " +
this->mShaderFilePath);
}
size_t shaderFileSize = fileStream.tellg();
fileStream.seekg(0, std::ios::beg);
char* shaderDataRaw = new char[shaderFileSize];
fileStream.read(shaderDataRaw, shaderFileSize);
fileStream.close();
KP_LOG_WARN("Kompute OpAlgoBase fetched {} bytes", shaderFileSize);
return std::vector<uint32_t>((uint32_t*)shaderDataRaw, (uint32_t*)(shaderDataRaw + shaderFileSize));
} else if (this->mShaderDataRaw.size()) {
KP_LOG_DEBUG("Kompute OpAlgoBase Reading data from data provided");
return this->mShaderDataRaw;
} else {
throw std::runtime_error(
"Kompute OpAlgoBase Error reached fetchSpirvBinaryData but neither "
"filepath nor data provided");
}
}
}

58
src/OpAlgoDispatch.cpp Normal file

@ -0,0 +1,58 @@
#pragma once
#include "kompute/operations/OpAlgoDispatch.hpp"
namespace kp {
OpAlgoDispatch::OpAlgoDispatch(const std::shared_ptr<kp::Algorithm>& algorithm,
const kp::Constants& pushConstants)
{
KP_LOG_DEBUG("Kompute OpAlgoDispatch constructor");
this->mAlgorithm = algorithm;
this->mPushConstants = pushConstants;
}
OpAlgoDispatch::~OpAlgoDispatch()
{
KP_LOG_DEBUG("Kompute OpAlgoDispatch destructor started");
}
void
OpAlgoDispatch::record(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpAlgoDispatch record called");
// Barrier to ensure the data is finished writing to buffer memory
for (const std::shared_ptr<Tensor>& tensor :
this->mAlgorithm->getTensors()) {
tensor->recordPrimaryBufferMemoryBarrier(
commandBuffer,
vk::AccessFlagBits::eTransferWrite,
vk::AccessFlagBits::eShaderRead,
vk::PipelineStageFlagBits::eTransfer,
vk::PipelineStageFlagBits::eComputeShader);
}
if (this->mPushConstants.size()) {
this->mAlgorithm->setPush(this->mPushConstants);
}
this->mAlgorithm->recordBindCore(commandBuffer);
this->mAlgorithm->recordBindPush(commandBuffer);
this->mAlgorithm->recordDispatch(commandBuffer);
}
void
OpAlgoDispatch::preEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpAlgoDispatch preEval called");
}
void
OpAlgoDispatch::postEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpAlgoDispatch postSubmit called");
}
}
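
Operations in this commit receive the command buffer as an argument to `record`, `preEval` and `postEval` instead of storing it, and the sequence methods return `shared_from_this()` so calls can be chained. The sketch below mocks that lifecycle with no Vulkan dependency. `FakeCommandBuffer`, `Op`, `NamedOp` and `MiniSequence` are hypothetical stand-ins, and the sketch assumes that `record` invokes the operation's `record` hook immediately, which the visible part of the diff does not show.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for vk::CommandBuffer; it only logs what happens.
struct FakeCommandBuffer
{
    std::vector<std::string> log;
};

// Hypothetical operation interface with the three lifecycle hooks.
struct Op
{
    virtual ~Op() = default;
    virtual void record(FakeCommandBuffer& cmd) = 0;
    virtual void preEval(FakeCommandBuffer& cmd) = 0;
    virtual void postEval(FakeCommandBuffer& cmd) = 0;
};

// Operation that logs each hook together with its name.
struct NamedOp : Op
{
    explicit NamedOp(std::string name)
      : mName(std::move(name))
    {}
    void record(FakeCommandBuffer& cmd) override
    {
        cmd.log.push_back("record " + mName);
    }
    void preEval(FakeCommandBuffer& cmd) override
    {
        cmd.log.push_back("preEval " + mName);
    }
    void postEval(FakeCommandBuffer& cmd) override
    {
        cmd.log.push_back("postEval " + mName);
    }
    std::string mName;
};

// Minimal sequence: records ops, then runs preEval, "submit", postEval.
class MiniSequence : public std::enable_shared_from_this<MiniSequence>
{
  public:
    std::shared_ptr<MiniSequence> record(std::shared_ptr<Op> op)
    {
        op->record(mCmd);
        mOps.push_back(std::move(op));
        return shared_from_this(); // enables chained calls
    }

    std::shared_ptr<MiniSequence> eval()
    {
        for (const std::shared_ptr<Op>& op : mOps) {
            op->preEval(mCmd);
        }
        mCmd.log.push_back("submit"); // placeholder for queue submit + fence wait
        for (const std::shared_ptr<Op>& op : mOps) {
            op->postEval(mCmd);
        }
        return shared_from_this();
    }

    const std::vector<std::string>& log() const { return mCmd.log; }

  private:
    FakeCommandBuffer mCmd;
    std::vector<std::shared_ptr<Op>> mOps;
};
```

Note that `shared_from_this()` is only valid when the object is already owned by a `std::shared_ptr`, so a sequence of this shape has to be created through `std::make_shared` or an equivalent factory rather than on the stack.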


@ -1,122 +0,0 @@
#pragma once
#include "kompute/operations/OpAlgoLhsRhsOut.hpp"
namespace kp {
OpAlgoLhsRhsOut::OpAlgoLhsRhsOut()
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut constructor base");
}
OpAlgoLhsRhsOut::OpAlgoLhsRhsOut(
std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors,
const Workgroup& komputeWorkgroup)
// The inheritance is initialised with the copyOutputData to false given that
// this dependent class handles the transfer of data via staging buffers in
// a granular way.
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, komputeWorkgroup)
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut constructor with params");
}
OpAlgoLhsRhsOut::~OpAlgoLhsRhsOut()
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut destructor started");
}
void
OpAlgoLhsRhsOut::init()
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut init called");
if (this->mTensors.size() < 3) {
throw std::runtime_error(
"Kompute OpAlgoLhsRhsOut called with less than 3 tensors");
} else if (this->mTensors.size() > 3) {
KP_LOG_WARN(
"Kompute OpAlgoLhsRhsOut called with more than 3 this->mTensors");
}
this->mTensorLHS = this->mTensors[0];
this->mTensorRHS = this->mTensors[1];
this->mTensorOutput = this->mTensors[2];
if (!(this->mTensorLHS->isInit() && this->mTensorRHS->isInit() &&
this->mTensorOutput->isInit())) {
throw std::runtime_error(
"Kompute OpAlgoLhsRhsOut all tensor parameters must be initialised. "
"LHS: " +
std::to_string(this->mTensorLHS->isInit()) +
" RHS: " + std::to_string(this->mTensorRHS->isInit()) +
" Output: " + std::to_string(this->mTensorOutput->isInit()));
}
if (!(this->mTensorLHS->size() == this->mTensorRHS->size() &&
this->mTensorRHS->size() == this->mTensorOutput->size())) {
throw std::runtime_error(
"Kompute OpAlgoLhsRhsOut all tensor parameters must be the same size "
"LHS: " +
std::to_string(this->mTensorLHS->size()) +
" RHS: " + std::to_string(this->mTensorRHS->size()) +
" Output: " + std::to_string(this->mTensorOutput->size()));
}
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut fetching spirv data");
std::vector<uint32_t> shaderFileData = this->fetchSpirvBinaryData();
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut Initialising algorithm component");
this->mAlgorithm->init(shaderFileData, this->mTensors);
}
void
OpAlgoLhsRhsOut::record()
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut record called");
// Barrier to ensure the data is finished writing to buffer memory
this->mTensorLHS->recordBufferMemoryBarrier(
this->mCommandBuffer,
vk::AccessFlagBits::eHostWrite,
vk::AccessFlagBits::eShaderRead,
vk::PipelineStageFlagBits::eHost,
vk::PipelineStageFlagBits::eComputeShader);
this->mTensorRHS->recordBufferMemoryBarrier(
this->mCommandBuffer,
vk::AccessFlagBits::eHostWrite,
vk::AccessFlagBits::eShaderRead,
vk::PipelineStageFlagBits::eHost,
vk::PipelineStageFlagBits::eComputeShader);
this->mAlgorithm->recordDispatch(this->mKomputeWorkgroup[0],
this->mKomputeWorkgroup[1],
this->mKomputeWorkgroup[2]);
// Barrier to ensure the shader code is executed before buffer read
this->mTensorOutput->recordBufferMemoryBarrier(
this->mCommandBuffer,
vk::AccessFlagBits::eShaderWrite,
vk::AccessFlagBits::eTransferRead,
vk::PipelineStageFlagBits::eComputeShader,
vk::PipelineStageFlagBits::eTransfer);
if (this->mTensorOutput->tensorType() == Tensor::TensorTypes::eDevice) {
this->mTensorOutput->recordCopyFromDeviceToStaging(this->mCommandBuffer,
true);
}
}
void
OpAlgoLhsRhsOut::postEval()
{
KP_LOG_DEBUG("Kompute OpAlgoLhsRhsOut postSubmit called");
this->mTensorOutput->mapDataFromHostMemory();
}
}

66
src/OpMemoryBarrier.cpp Normal file

@ -0,0 +1,66 @@
#pragma once
#include "kompute/operations/OpMemoryBarrier.hpp"
namespace kp {
OpMemoryBarrier::OpMemoryBarrier(
const std::vector<std::shared_ptr<Tensor>>& tensors,
const vk::AccessFlagBits& srcAccessMask,
const vk::AccessFlagBits& dstAccessMask,
const vk::PipelineStageFlagBits& srcStageMask,
const vk::PipelineStageFlagBits& dstStageMask,
bool barrierOnPrimary)
: mTensors(tensors)
, mSrcAccessMask(srcAccessMask)
, mDstAccessMask(dstAccessMask)
, mSrcStageMask(srcStageMask)
, mDstStageMask(dstStageMask)
, mBarrierOnPrimary(barrierOnPrimary)
{
KP_LOG_DEBUG("Kompute OpMemoryBarrier constructor");
}
OpMemoryBarrier::~OpMemoryBarrier()
{
KP_LOG_DEBUG("Kompute OpMemoryBarrier destructor started");
}
void
OpMemoryBarrier::record(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpMemoryBarrier record called");
// Barrier to ensure the data is finished writing to buffer memory
if (this->mBarrierOnPrimary) {
for (const std::shared_ptr<Tensor>& tensor : this->mTensors) {
tensor->recordPrimaryBufferMemoryBarrier(commandBuffer,
this->mSrcAccessMask,
this->mDstAccessMask,
this->mSrcStageMask,
this->mDstStageMask);
}
} else {
for (const std::shared_ptr<Tensor>& tensor : this->mTensors) {
tensor->recordStagingBufferMemoryBarrier(commandBuffer,
this->mSrcAccessMask,
this->mDstAccessMask,
this->mSrcStageMask,
this->mDstStageMask);
}
}
}
void
OpMemoryBarrier::preEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpMemoryBarrier preEval called");
}
void
OpMemoryBarrier::postEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpMemoryBarrier postSubmit called");
}
}


@ -3,18 +3,33 @@
namespace kp {
OpTensorCopy::OpTensorCopy()
{
KP_LOG_DEBUG("Kompute OpTensorCopy constructor base");
}
OpTensorCopy::OpTensorCopy(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors)
: OpBase(physicalDevice, device, commandBuffer, tensors)
OpTensorCopy::OpTensorCopy(const std::vector<std::shared_ptr<Tensor>>& tensors)
{
KP_LOG_DEBUG("Kompute OpTensorCopy constructor with params");
this->mTensors = tensors;
if (this->mTensors.size() < 2) {
throw std::runtime_error(
"Kompute OpTensorCopy called with less than 2 tensors");
}
kp::Tensor::TensorDataTypes dataType = this->mTensors[0]->dataType();
uint32_t size = this->mTensors[0]->size();
for (const std::shared_ptr<Tensor>& tensor : tensors) {
if (tensor->dataType() != dataType) {
throw std::runtime_error(fmt::format(
"Attempting to copy tensors of different types from {} to {}",
dataType,
tensor->dataType()));
}
if (tensor->size() != size) {
throw std::runtime_error(fmt::format(
"Attempting to copy tensors of different sizes from {} to {}",
size,
tensor->size()));
}
}
}
OpTensorCopy::~OpTensorCopy()
@ -23,54 +38,32 @@ OpTensorCopy::~OpTensorCopy()
}
void
OpTensorCopy::init()
{
KP_LOG_DEBUG("Kompute OpTensorCopy init called");
if (this->mTensors.size() < 2) {
throw std::runtime_error(
"Kompute OpTensorCopy called with less than 2 tensor");
}
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
if (!tensor->isInit()) {
throw std::runtime_error(
"Kompute OpTensorCopy tensor parameter has not been initialized");
}
if (tensor->tensorType() == Tensor::TensorTypes::eStorage) {
throw std::runtime_error("Kompute OpTensorCopy tensor parameter is "
"of TensorTypes::eStorage and hence "
"cannot be used to receive or pass data.");
}
}
}
void
OpTensorCopy::record()
OpTensorCopy::record(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorCopy record called");
// We iterate from the second tensor onwards and record a copy to all
for (size_t i = 1; i < this->mTensors.size(); i++) {
this->mTensors[i]->recordCopyFrom(
this->mCommandBuffer, this->mTensors[0], false);
this->mTensors[i]->recordCopyFrom(commandBuffer, this->mTensors[0]);
}
}
void
OpTensorCopy::preEval()
OpTensorCopy::preEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorCopy preEval called");
}
void
OpTensorCopy::postEval()
OpTensorCopy::postEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorCopy postEval called");
void* data = this->mTensors[0]->rawData();
// Copy the data from the first tensor into all the tensors
for (size_t i = 1; i < this->mTensors.size(); i++) {
this->mTensors[i]->setData(this->mTensors[0]->data());
this->mTensors[i]->setRawData(data);
}
}
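
The new `OpTensorCopy` constructor moves validation out of `init()` and rejects tensor lists whose element types or sizes differ from the first tensor. A standalone sketch of that check is shown below. `DataType`, `MiniTensor` and `validateCopyable` are hypothetical names, and the error messages are simplified with `std::to_string` instead of `fmt::format`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical element type tag, standing in for the tensor data type enum.
enum class DataType
{
    eFloat,
    eInt
};

// Hypothetical tensor carrying only the metadata the check needs.
struct MiniTensor
{
    DataType type;
    std::uint32_t size;
};

// Throws std::runtime_error unless there are at least two tensors and all of
// them share the data type and element count of the first one.
void
validateCopyable(const std::vector<std::shared_ptr<MiniTensor>>& tensors)
{
    if (tensors.size() < 2) {
        throw std::runtime_error("copy requires at least 2 tensors");
    }
    const DataType type = tensors[0]->type;
    const std::uint32_t size = tensors[0]->size;
    for (const std::shared_ptr<MiniTensor>& tensor : tensors) {
        if (tensor->type != type) {
            throw std::runtime_error("tensors have different data types");
        }
        if (tensor->size != size) {
            throw std::runtime_error("tensors have different sizes: " +
                                     std::to_string(size) + " vs " +
                                     std::to_string(tensor->size));
        }
    }
}
```

Validating at construction time means a mismatched copy fails before anything is recorded into a command buffer, rather than at evaluation.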


@ -1,82 +1,48 @@
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpTensorSyncDevice.hpp"
namespace kp {
OpTensorSyncDevice::OpTensorSyncDevice()
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice constructor base");
}
OpTensorSyncDevice::OpTensorSyncDevice(
std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors)
: OpBase(physicalDevice, device, commandBuffer, tensors)
const std::vector<std::shared_ptr<Tensor>>& tensors)
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice constructor with params");
if (tensors.size() < 1) {
throw std::runtime_error(
"Kompute OpTensorSyncDevice called with less than 1 tensor");
}
this->mTensors = tensors;
}
OpTensorSyncDevice::~OpTensorSyncDevice()
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice destructor started");
this->mTensors.clear();
}
void
OpTensorSyncDevice::init()
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice init called");
if (this->mTensors.size() < 1) {
throw std::runtime_error(
"Kompute OpTensorSyncDevice called with less than 1 tensor");
}
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
if (!tensor->isInit()) {
throw std::runtime_error("Kompute OpTensorSyncDevice: Tensor param "
"has not been initialized");
}
if (tensor->tensorType() == Tensor::TensorTypes::eStorage) {
KP_LOG_WARN(
"Kompute OpTensorSyncLocal tensor parameter is of type "
"TensorTypes::eStorage and hence cannot be used to receive or "
"pass data.");
}
}
}
void
OpTensorSyncDevice::record()
OpTensorSyncDevice::record(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice record called");
for (size_t i = 0; i < this->mTensors.size(); i++) {
if (this->mTensors[i]->tensorType() == Tensor::TensorTypes::eDevice) {
this->mTensors[i]->recordCopyFromStagingToDevice(
this->mCommandBuffer, false);
this->mTensors[i]->recordCopyFromStagingToDevice(commandBuffer);
}
}
}
void
OpTensorSyncDevice::preEval()
OpTensorSyncDevice::preEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice preEval called");
// Performing sync of data as eval can be called multiple times with same op
for (size_t i = 0; i < this->mTensors.size(); i++) {
if (this->mTensors[i]->tensorType() != Tensor::TensorTypes::eStorage) {
this->mTensors[i]->mapDataIntoHostMemory();
}
}
}
void
OpTensorSyncDevice::postEval()
OpTensorSyncDevice::postEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncDevice postEval called");
}


@ -5,19 +5,17 @@
namespace kp {
OpTensorSyncLocal::OpTensorSyncLocal()
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal constructor base");
}
OpTensorSyncLocal::OpTensorSyncLocal(
std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors)
: OpBase(physicalDevice, device, commandBuffer, tensors)
const std::vector<std::shared_ptr<Tensor>>& tensors)
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal constructor with params");
if (tensors.size() < 1) {
throw std::runtime_error(
"Kompute OpTensorSyncLocal called with less than 1 tensor");
}
this->mTensors = tensors;
}
OpTensorSyncLocal::~OpTensorSyncLocal()
@ -26,59 +24,44 @@ OpTensorSyncLocal::~OpTensorSyncLocal()
}
void
OpTensorSyncLocal::init()
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal init called");
if (this->mTensors.size() < 1) {
throw std::runtime_error(
"Kompute OpTensorSyncLocal called with less than 1 tensor");
}
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
if (!tensor->isInit()) {
throw std::runtime_error(
"Kompute OpTensorSyncLocal: Tensor has not been initialized");
}
if (tensor->tensorType() == Tensor::TensorTypes::eStorage) {
KP_LOG_WARN(
"Kompute OpTensorSyncLocal tensor parameter is of type "
"TensorTypes::eStorage and hence cannot be used to receive or "
"pass data.");
}
}
}
void
OpTensorSyncLocal::record()
OpTensorSyncLocal::record(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal record called");
for (size_t i = 0; i < this->mTensors.size(); i++) {
if (this->mTensors[i]->tensorType() == Tensor::TensorTypes::eDevice) {
this->mTensors[i]->recordCopyFromDeviceToStaging(
this->mCommandBuffer, true);
this->mTensors[i]->recordPrimaryBufferMemoryBarrier(
commandBuffer,
vk::AccessFlagBits::eShaderWrite,
vk::AccessFlagBits::eTransferRead,
vk::PipelineStageFlagBits::eComputeShader,
vk::PipelineStageFlagBits::eTransfer);
this->mTensors[i]->recordCopyFromDeviceToStaging(commandBuffer);
this->mTensors[i]->recordPrimaryBufferMemoryBarrier(
commandBuffer,
vk::AccessFlagBits::eTransferWrite,
vk::AccessFlagBits::eHostRead,
vk::PipelineStageFlagBits::eTransfer,
vk::PipelineStageFlagBits::eHost);
}
}
}
void
OpTensorSyncLocal::preEval()
OpTensorSyncLocal::preEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal preEval called");
}
void
OpTensorSyncLocal::postEval()
OpTensorSyncLocal::postEval(const vk::CommandBuffer& commandBuffer)
{
KP_LOG_DEBUG("Kompute OpTensorSyncLocal postEval called");
KP_LOG_DEBUG("Kompute OpTensorSyncLocal mapping data into tensor local");
for (size_t i = 0; i < this->mTensors.size(); i++) {
if (this->mTensors[i]->tensorType() != Tensor::TensorTypes::eStorage) {
this->mTensors[i]->mapDataFromHostMemory();
}
}
}
}


@ -3,16 +3,11 @@
namespace kp {
Sequence::Sequence()
{
KP_LOG_DEBUG("Kompute Sequence base constructor");
this->mIsInit = false;
}
Sequence::Sequence(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::Queue> computeQueue,
uint32_t queueIndex)
uint32_t queueIndex,
uint32_t totalTimestamps)
{
KP_LOG_DEBUG("Kompute Sequence Constructor with existing device & queue");
@ -20,126 +15,111 @@ Sequence::Sequence(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
this->mDevice = device;
this->mComputeQueue = computeQueue;
this->mQueueIndex = queueIndex;
this->mIsInit = false;
this->createCommandPool();
this->createCommandBuffer();
if (totalTimestamps > 0)
this->createTimestampQueryPool(totalTimestamps +
1); // +1 for the initial timestamp written in begin()
}
Sequence::~Sequence()
{
KP_LOG_DEBUG("Kompute Sequence Destructor started");
if (!this->mIsInit) {
KP_LOG_INFO("Kompute Sequence destructor called but sequence is not "
"initialized so no need to removing GPU resources.");
return;
} else {
this->freeMemoryDestroyGPUResources();
if (this->mDevice) {
this->destroy();
}
}
void
Sequence::init()
{
this->createCommandPool();
this->createCommandBuffer();
this->mIsInit = true;
}
bool
Sequence::begin()
{
KP_LOG_DEBUG("Kompute sequence called BEGIN");
if (this->isRecording()) {
KP_LOG_WARN("Kompute Sequence begin called when already recording");
return false;
KP_LOG_DEBUG("Kompute Sequence begin called when already recording");
return;
}
if (this->isRunning()) {
KP_LOG_WARN(
throw std::runtime_error(
"Kompute Sequence begin called when sequence still running");
return false;
}
if (!this->mCommandPool) {
throw std::runtime_error("Kompute Sequence command pool is null");
}
KP_LOG_INFO("Kompute Sequence command now started recording");
this->mCommandBuffer->begin(vk::CommandBufferBeginInfo());
this->mRecording = true;
if (this->mOperations.size()) {
KP_LOG_INFO("Kompute Sequence clearing previous operations");
this->mOperations.clear();
}
if (!this->mRecording) {
KP_LOG_INFO("Kompute Sequence command recording BEGIN");
this->mCommandBuffer->begin(vk::CommandBufferBeginInfo());
this->mRecording = true;
} else {
KP_LOG_WARN("Kompute Sequence attempted to start command recording "
"but recording already started");
}
return true;
// latch the first timestamp before any commands are submitted
if (this->timestampQueryPool)
this->mCommandBuffer->writeTimestamp(
vk::PipelineStageFlagBits::eAllCommands,
*this->timestampQueryPool,
0);
}
bool
void
Sequence::end()
{
KP_LOG_DEBUG("Kompute Sequence calling END");
if (this->isRunning()) {
throw std::runtime_error(
"Kompute Sequence end called when sequence still running");
}
if (!this->isRecording()) {
KP_LOG_WARN("Kompute Sequence end called when not recording");
return false;
}
if (!this->mCommandPool) {
throw std::runtime_error("Kompute Sequence command pool is null");
}
if (this->mRecording) {
return;
} else {
KP_LOG_INFO("Kompute Sequence command recording END");
this->mCommandBuffer->end();
this->mRecording = false;
} else {
KP_LOG_WARN("Kompute Sequence attempted to end command recording but "
"recording not started");
}
return true;
}
bool
void
Sequence::clear()
{
KP_LOG_DEBUG("Kompute Sequence calling clear");
if (this->isRecording()) {
this->end();
}
}
std::shared_ptr<Sequence>
Sequence::eval()
{
KP_LOG_DEBUG("Kompute sequence EVAL BEGIN");
bool evalResult = this->evalAsync();
if (!evalResult) {
KP_LOG_DEBUG("Kompute sequence EVAL FAILURE");
return false;
}
evalResult = this->evalAwait();
KP_LOG_DEBUG("Kompute sequence EVAL SUCCESS");
return evalResult;
return this->evalAsync()->evalAwait();
}
bool
std::shared_ptr<Sequence>
Sequence::eval(std::shared_ptr<OpBase> op)
{
this->clear();
return this->record(op)->eval();
}
std::shared_ptr<Sequence>
Sequence::evalAsync()
{
if (this->isRecording()) {
KP_LOG_WARN("Kompute Sequence evalAsync called when still recording");
return false;
this->end();
}
if (this->mIsRunning) {
KP_LOG_WARN("Kompute Sequence evalAsync called when an eval async was "
"called without successful wait");
return false;
throw std::runtime_error(
"Kompute Sequence evalAsync called when an eval async was "
"called without successful wait");
}
this->mIsRunning = true;
for (size_t i = 0; i < this->mOperations.size(); i++) {
this->mOperations[i]->preEval();
this->mOperations[i]->preEval(*this->mCommandBuffer);
}
vk::SubmitInfo submitInfo(
@ -152,15 +132,24 @@ Sequence::evalAsync()
this->mComputeQueue->submit(1, &submitInfo, this->mFence);
return true;
return shared_from_this();
}
bool
std::shared_ptr<Sequence>
Sequence::evalAsync(std::shared_ptr<OpBase> op)
{
this->clear();
this->record(op);
this->evalAsync();
return shared_from_this();
}
std::shared_ptr<Sequence>
Sequence::evalAwait(uint64_t waitFor)
{
if (!this->mIsRunning) {
KP_LOG_WARN("Kompute Sequence evalAwait called without existing eval");
return false;
return shared_from_this();
}
vk::Result result =
@ -171,15 +160,16 @@ Sequence::evalAwait(uint64_t waitFor)
this->mIsRunning = false;
if (result == vk::Result::eTimeout) {
KP_LOG_WARN("Kompute Sequence evalAwait timed out");
return false;
KP_LOG_WARN("Kompute Sequence evalAwait reached timeout of {}",
waitFor);
return shared_from_this();
}
for (size_t i = 0; i < this->mOperations.size(); i++) {
this->mOperations[i]->postEval();
this->mOperations[i]->postEval(*this->mCommandBuffer);
}
return true;
return shared_from_this();
}
bool
@ -197,54 +187,62 @@ Sequence::isRecording()
bool
Sequence::isInit()
{
return this->mIsInit;
return this->mDevice && this->mCommandPool && this->mCommandBuffer &&
this->mComputeQueue;
}
void
Sequence::freeMemoryDestroyGPUResources()
Sequence::rerecord()
{
KP_LOG_DEBUG("Kompute Sequence freeMemoryDestroyGPUResources called");
if (!this->mIsInit) {
KP_LOG_ERROR("Kompute Sequence freeMemoryDestroyGPUResources called "
"but Sequence is not initialized so there's no relevant "
"GPU resources.");
return;
this->end();
std::vector<std::shared_ptr<OpBase>> ops = this->mOperations;
this->mOperations.clear();
for (const std::shared_ptr<kp::OpBase>& op : ops) {
this->record(op);
}
}
void
Sequence::destroy()
{
KP_LOG_DEBUG("Kompute Sequence destroy called");
if (!this->mDevice) {
KP_LOG_ERROR("Kompute Sequence freeMemoryDestroyGPUResources called "
"with null Device pointer");
this->mIsInit = false;
KP_LOG_WARN("Kompute Sequence destroy called "
"with null Device pointer");
return;
}
if (this->mFreeCommandBuffer) {
KP_LOG_INFO("Freeing CommandBuffer");
if (!this->mCommandBuffer) {
KP_LOG_ERROR(
"Kompute Sequence freeMemoryDestroyGPUResources called with null "
"CommandPool pointer");
this->mIsInit = false;
KP_LOG_WARN("Kompute Sequence destroy called with null "
"CommandBuffer pointer");
return;
}
this->mDevice->freeCommandBuffers(
*this->mCommandPool, 1, this->mCommandBuffer.get());
this->mCommandBuffer = nullptr;
this->mFreeCommandBuffer = false;
KP_LOG_DEBUG("Kompute Sequence Freed CommandBuffer");
}
if (this->mFreeCommandPool) {
KP_LOG_INFO("Destroying CommandPool");
if (this->mCommandPool == nullptr) {
KP_LOG_ERROR(
"Kompute Sequence freeMemoryDestroyGPUResources called with null "
"CommandPool pointer");
this->mIsInit = false;
KP_LOG_WARN("Kompute Sequence destroy called with null "
"CommandPool pointer");
return;
}
this->mDevice->destroy(
*this->mCommandPool,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mCommandPool = nullptr;
this->mFreeCommandPool = false;
KP_LOG_DEBUG("Kompute Sequence Destroyed CommandPool");
}
@ -253,7 +251,48 @@ Sequence::freeMemoryDestroyGPUResources()
this->mOperations.clear();
}
this->mIsInit = false;
if (this->timestampQueryPool) {
KP_LOG_INFO("Destroying QueryPool");
this->mDevice->destroy(
*this->timestampQueryPool,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->timestampQueryPool = nullptr;
KP_LOG_DEBUG("Kompute Sequence Destroyed QueryPool");
}
if (this->mDevice) {
this->mDevice = nullptr;
}
if (this->mPhysicalDevice) {
this->mPhysicalDevice = nullptr;
}
if (this->mComputeQueue) {
this->mComputeQueue = nullptr;
}
}
std::shared_ptr<Sequence>
Sequence::record(std::shared_ptr<OpBase> op)
{
KP_LOG_DEBUG("Kompute Sequence record function started");
this->begin();
KP_LOG_DEBUG(
"Kompute Sequence running record on OpBase derived class instance");
op->record(*this->mCommandBuffer);
this->mOperations.push_back(op);
if (this->timestampQueryPool)
this->mCommandBuffer->writeTimestamp(
vk::PipelineStageFlagBits::eAllCommands,
*this->timestampQueryPool,
this->mOperations.size());
return shared_from_this();
}
void
@ -300,4 +339,52 @@ Sequence::createCommandBuffer()
KP_LOG_DEBUG("Kompute Sequence Command Buffer Created");
}
void
Sequence::createTimestampQueryPool(uint32_t totalTimestamps)
{
KP_LOG_DEBUG("Kompute Sequence creating query pool");
if (!this->isInit()) {
throw std::runtime_error(
"createTimestampQueryPool() called on uninitialized Sequence");
}
if (!this->mPhysicalDevice) {
throw std::runtime_error("Kompute Sequence physical device is null");
}
vk::PhysicalDeviceProperties physicalDeviceProperties =
this->mPhysicalDevice->getProperties();
if (physicalDeviceProperties.limits.timestampComputeAndGraphics) {
vk::QueryPoolCreateInfo queryPoolInfo;
queryPoolInfo.setQueryCount(totalTimestamps);
queryPoolInfo.setQueryType(vk::QueryType::eTimestamp);
this->timestampQueryPool = std::make_shared<vk::QueryPool>(
this->mDevice->createQueryPool(queryPoolInfo));
KP_LOG_DEBUG("Query pool for timestamps created");
} else {
throw std::runtime_error("Device does not support timestamps");
}
}
std::vector<std::uint64_t>
Sequence::getTimestamps()
{
if (!this->timestampQueryPool)
throw std::runtime_error("Timestamp latching not enabled");
const auto n = this->mOperations.size() + 1;
std::vector<std::uint64_t> timestamps(n, 0);
this->mDevice->getQueryPoolResults(
*this->timestampQueryPool,
0,
n,
timestamps.size() * sizeof(std::uint64_t),
timestamps.data(),
sizeof(uint64_t),
vk::QueryResultFlagBits::e64 | vk::QueryResultFlagBits::eWait);
return timestamps;
}
}
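getTimestamps() above reads one more value than there are recorded operations: record() writes a timestamp at index mOperations.size() after each operation, and index 0 is presumably written when recording begins. A sketch of turning those raw ticks into per-operation durations follows. The helper name toDurationsNs is hypothetical, and the caller is assumed to supply the device's timestampPeriod (nanoseconds per tick, from the physical device limits).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts N+1 raw timestamps into N durations in
// nanoseconds. timestampPeriodNs is assumed to be the value of
// vk::PhysicalDeviceLimits::timestampPeriod for the device in use.
std::vector<double>
toDurationsNs(const std::vector<std::uint64_t>& timestamps,
              double timestampPeriodNs)
{
    std::vector<double> durations;
    if (timestamps.size() < 2) {
        return durations; // Not enough samples to form an interval
    }
    durations.reserve(timestamps.size() - 1);
    for (std::size_t i = 1; i < timestamps.size(); i++) {
        // Difference between consecutive samples, in device ticks
        const std::uint64_t ticks = timestamps[i] - timestamps[i - 1];
        durations.push_back(static_cast<double>(ticks) * timestampPeriodNs);
    }
    return durations;
}
```

The result has one entry per recorded operation, in recording order.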

View file

@ -5,11 +5,13 @@
namespace kp {
std::vector<uint32_t>
Shader::compile_sources(const std::vector<std::string>& sources,
const std::vector<std::string>& files,
const std::string& entryPoint,
std::vector<std::pair<std::string,std::string>> definitions,
const TBuiltInResource& resources) {
Shader::compileSources(
const std::vector<std::string>& sources,
const std::vector<std::string>& files,
const std::string& entryPoint,
std::vector<std::pair<std::string, std::string>> definitions,
const TBuiltInResource& resources)
{
// Initialize glslang library.
glslang::InitializeProcess();
@ -18,27 +20,32 @@ Shader::compile_sources(const std::vector<std::string>& sources,
const EShLanguage language = EShLangCompute;
glslang::TShader shader(language);
std::vector<const char*> filesCStr(files.size()), sourcesCStr(sources.size());
for (size_t i = 0; i < sources.size(); i++) sourcesCStr[i] = sources[i].c_str();
std::vector<const char*> filesCStr(files.size()),
sourcesCStr(sources.size());
for (size_t i = 0; i < sources.size(); i++)
sourcesCStr[i] = sources[i].c_str();
if (files.size() > 1) {
assert(files.size() == sources.size());
for (size_t i = 0; i < files.size(); i++) filesCStr[i] = files[i].c_str();
shader.setStringsWithLengthsAndNames(sourcesCStr.data(), nullptr, filesCStr.data(), filesCStr.size());
}
else {
filesCStr = {""};
shader.setStringsWithLengthsAndNames(sourcesCStr.data(), nullptr, filesCStr.data(), sourcesCStr.size());
for (size_t i = 0; i < files.size(); i++)
filesCStr[i] = files[i].c_str();
shader.setStringsWithLengthsAndNames(
sourcesCStr.data(), nullptr, filesCStr.data(), filesCStr.size());
} else {
filesCStr = { "" };
shader.setStringsWithLengthsAndNames(
sourcesCStr.data(), nullptr, filesCStr.data(), sourcesCStr.size());
}
shader.setEntryPoint(entryPoint.c_str());
shader.setSourceEntryPoint(entryPoint.c_str());
std::string info_log = "";
const EShMessages messages = static_cast<EShMessages>(EShMsgDefault | EShMsgVulkanRules | EShMsgSpvRules);
if (!shader.parse(&resources, 100, false, messages))
{
info_log = std::string(shader.getInfoLog()) + "\n" + std::string(shader.getInfoDebugLog());
const EShMessages messages = static_cast<EShMessages>(
EShMsgDefault | EShMsgVulkanRules | EShMsgSpvRules);
if (!shader.parse(&resources, 100, false, messages)) {
info_log = std::string(shader.getInfoLog()) + "\n" +
std::string(shader.getInfoDebugLog());
KP_LOG_ERROR("Kompute Shader Error: {}", info_log);
throw std::runtime_error(info_log);
}
@ -47,24 +54,23 @@ Shader::compile_sources(const std::vector<std::string>& sources,
glslang::TProgram program;
program.addShader(&shader);
// Link program.
if (!program.link(messages))
{
info_log = std::string(program.getInfoLog()) + "\n" + std::string(program.getInfoDebugLog());
if (!program.link(messages)) {
info_log = std::string(program.getInfoLog()) + "\n" +
std::string(program.getInfoDebugLog());
KP_LOG_ERROR("Kompute Shader Error: {}", info_log);
throw std::runtime_error(info_log);
}
// Save any info log that was generated.
if (shader.getInfoLog())
{
info_log += std::string(shader.getInfoLog()) + "\n" + std::string(shader.getInfoDebugLog()) + "\n";
if (shader.getInfoLog()) {
info_log += std::string(shader.getInfoLog()) + "\n" +
std::string(shader.getInfoDebugLog()) + "\n";
KP_LOG_INFO("Kompute Shader Information: {}", info_log);
}
glslang::TIntermediate *intermediate = program.getIntermediate(language);
glslang::TIntermediate* intermediate = program.getIntermediate(language);
// Translate to SPIRV.
if (!intermediate)
{
if (!intermediate) {
info_log += "Failed to get shared intermediate code.\n";
KP_LOG_ERROR("Kompute Shader Error: {}", info_log);
throw std::runtime_error(info_log);
@ -74,8 +80,7 @@ Shader::compile_sources(const std::vector<std::string>& sources,
std::vector<std::uint32_t> spirv;
glslang::GlslangToSpv(*intermediate, spirv, &logger);
if (shader.getInfoLog())
{
if (shader.getInfoLog()) {
info_log += logger.getAllMessages() + "\n";
KP_LOG_DEBUG("Kompute Shader all result messages: {}", info_log);
}
@ -87,12 +92,127 @@ Shader::compile_sources(const std::vector<std::string>& sources,
}
std::vector<uint32_t>
Shader::compile_source(const std::string& source,
const std::string& entryPoint,
std::vector<std::pair<std::string,std::string>> definitions,
const TBuiltInResource& resource) {
return compile_sources({source}, std::vector<std::string>({}), entryPoint, definitions, resource);
Shader::compileSource(
const std::string& source,
const std::string& entryPoint,
std::vector<std::pair<std::string, std::string>> definitions,
const TBuiltInResource& resource)
{
return compileSources({ source },
std::vector<std::string>({}),
entryPoint,
definitions,
resource);
}
const TBuiltInResource Shader::defaultResource = {
/* .MaxLights = */ 0,
/* .MaxClipPlanes = */ 0,
/* .MaxTextureUnits = */ 0,
/* .MaxTextureCoords = */ 0,
/* .MaxVertexAttribs = */ 64,
/* .MaxVertexUniformComponents = */ 4096,
/* .MaxVaryingFloats = */ 64,
/* .MaxVertexTextureImageUnits = */ 0,
/* .MaxCombinedTextureImageUnits = */ 0,
/* .MaxTextureImageUnits = */ 0,
/* .MaxFragmentUniformComponents = */ 0,
/* .MaxDrawBuffers = */ 0,
/* .MaxVertexUniformVectors = */ 128,
/* .MaxVaryingVectors = */ 8,
/* .MaxFragmentUniformVectors = */ 0,
/* .MaxVertexOutputVectors = */ 16,
/* .MaxFragmentInputVectors = */ 0,
/* .MinProgramTexelOffset = */ -8,
/* .MaxProgramTexelOffset = */ 7,
/* .MaxClipDistances = */ 8,
/* .MaxComputeWorkGroupCountX = */ 65535,
/* .MaxComputeWorkGroupCountY = */ 65535,
/* .MaxComputeWorkGroupCountZ = */ 65535,
/* .MaxComputeWorkGroupSizeX = */ 1024,
/* .MaxComputeWorkGroupSizeY = */ 1024,
/* .MaxComputeWorkGroupSizeZ = */ 64,
/* .MaxComputeUniformComponents = */ 1024,
/* .MaxComputeTextureImageUnits = */ 16,
/* .MaxComputeImageUniforms = */ 8,
/* .MaxComputeAtomicCounters = */ 8,
/* .MaxComputeAtomicCounterBuffers = */ 1,
/* .MaxVaryingComponents = */ 60,
/* .MaxVertexOutputComponents = */ 64,
/* .MaxGeometryInputComponents = */ 64,
/* .MaxGeometryOutputComponents = */ 128,
/* .MaxFragmentInputComponents = */ 0,
/* .MaxImageUnits = */ 0,
/* .MaxCombinedImageUnitsAndFragmentOutputs = */ 0,
/* .MaxCombinedShaderOutputResources = */ 8,
/* .MaxImageSamples = */ 0,
/* .MaxVertexImageUniforms = */ 0,
/* .MaxTessControlImageUniforms = */ 0,
/* .MaxTessEvaluationImageUniforms = */ 0,
/* .MaxGeometryImageUniforms = */ 0,
/* .MaxFragmentImageUniforms = */ 0,
/* .MaxCombinedImageUniforms = */ 0,
/* .MaxGeometryTextureImageUnits = */ 0,
/* .MaxGeometryOutputVertices = */ 256,
/* .MaxGeometryTotalOutputComponents = */ 1024,
/* .MaxGeometryUniformComponents = */ 1024,
/* .MaxGeometryVaryingComponents = */ 64,
/* .MaxTessControlInputComponents = */ 128,
/* .MaxTessControlOutputComponents = */ 128,
/* .MaxTessControlTextureImageUnits = */ 0,
/* .MaxTessControlUniformComponents = */ 1024,
/* .MaxTessControlTotalOutputComponents = */ 4096,
/* .MaxTessEvaluationInputComponents = */ 128,
/* .MaxTessEvaluationOutputComponents = */ 128,
/* .MaxTessEvaluationTextureImageUnits = */ 16,
/* .MaxTessEvaluationUniformComponents = */ 1024,
/* .MaxTessPatchComponents = */ 120,
/* .MaxPatchVertices = */ 32,
/* .MaxTessGenLevel = */ 64,
/* .MaxViewports = */ 16,
/* .MaxVertexAtomicCounters = */ 0,
/* .MaxTessControlAtomicCounters = */ 0,
/* .MaxTessEvaluationAtomicCounters = */ 0,
/* .MaxGeometryAtomicCounters = */ 0,
/* .MaxFragmentAtomicCounters = */ 0,
/* .MaxCombinedAtomicCounters = */ 8,
/* .MaxAtomicCounterBindings = */ 1,
/* .MaxVertexAtomicCounterBuffers = */ 0,
/* .MaxTessControlAtomicCounterBuffers = */ 0,
/* .MaxTessEvaluationAtomicCounterBuffers = */ 0,
/* .MaxGeometryAtomicCounterBuffers = */ 0,
/* .MaxFragmentAtomicCounterBuffers = */ 0,
/* .MaxCombinedAtomicCounterBuffers = */ 1,
/* .MaxAtomicCounterBufferSize = */ 16384,
/* .MaxTransformFeedbackBuffers = */ 4,
/* .MaxTransformFeedbackInterleavedComponents = */ 64,
/* .MaxCullDistances = */ 8,
/* .MaxCombinedClipAndCullDistances = */ 8,
/* .MaxSamples = */ 4,
/* .maxMeshOutputVerticesNV = */ 256,
/* .maxMeshOutputPrimitivesNV = */ 512,
/* .maxMeshWorkGroupSizeX_NV = */ 32,
/* .maxMeshWorkGroupSizeY_NV = */ 1,
/* .maxMeshWorkGroupSizeZ_NV = */ 1,
/* .maxTaskWorkGroupSizeX_NV = */ 32,
/* .maxTaskWorkGroupSizeY_NV = */ 1,
/* .maxTaskWorkGroupSizeZ_NV = */ 1,
/* .maxMeshViewCountNV = */ 4,
/* .maxDualSourceDrawBuffersEXT = */ 1,
/* .limits = */
{
/* .nonInductiveForLoops = */ 1,
/* .whileLoops = */ 1,
/* .doWhileLoops = */ 1,
/* .generalUniformIndexing = */ 1,
/* .generalAttributeMatrixVectorIndexing = */ 1,
/* .generalVaryingIndexing = */ 1,
/* .generalSamplerIndexing = */ 1,
/* .generalVariableIndexing = */ 1,
/* .generalConstantMatrixVectorIndexing = */ 1,
}
};
}
#endif // DKOMPUTE_DISABLE_SHADER_UTILS

View file

@ -3,23 +3,24 @@
namespace kp {
Tensor::Tensor()
Tensor::Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
void* data,
uint32_t elementTotalCount,
uint32_t elementMemorySize,
const TensorDataTypes& dataType,
const TensorTypes& tensorType)
{
KP_LOG_DEBUG("Kompute Tensor base constructor");
this->mTensorType = TensorTypes::eDevice;
}
Tensor::Tensor(const std::vector<float>& data, TensorTypes tensorType)
{
#if DEBUG
KP_LOG_DEBUG("Kompute Tensor constructor data length: {}, and type: {}",
data.size(),
elementTotalCount,
tensorType);
#endif
this->mData = data;
this->mShape = { static_cast<uint32_t>(data.size()) };
this->mPhysicalDevice = physicalDevice;
this->mDevice = device;
this->mDataType = dataType;
this->mTensorType = tensorType;
this->rebuild(data, elementTotalCount, elementMemorySize);
}
Tensor::~Tensor()
@ -27,57 +28,33 @@ Tensor::~Tensor()
KP_LOG_DEBUG("Kompute Tensor destructor started. Type: {}",
this->tensorType());
if (this->isInit()) {
this->freeMemoryDestroyGPUResources();
if (this->mDevice) {
this->destroy();
}
KP_LOG_DEBUG("Kompute Tensor destructor success");
}
void
Tensor::init(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device)
Tensor::rebuild(void* data,
uint32_t elementTotalCount,
uint32_t elementMemorySize)
{
KP_LOG_DEBUG("Kompute Tensor running init with Vulkan params and num data "
"elementS: {}",
this->mData.size());
KP_LOG_DEBUG("Kompute Tensor rebuilding with size {}", elementTotalCount);
this->mPhysicalDevice = physicalDevice;
this->mDevice = device;
this->mSize = elementTotalCount;
this->mDataTypeMemorySize = elementMemorySize;
this->mIsInit = true;
if (this->mPrimaryBuffer || this->mPrimaryMemory) {
KP_LOG_DEBUG(
"Kompute Tensor destroying existing resources before rebuild");
this->destroy();
}
this->allocateMemoryCreateGPUResources();
}
this->mapRawData();
std::vector<float>&
Tensor::data()
{
return this->mData;
}
float&
Tensor::operator[](int index)
{
return this->mData[index];
}
uint64_t
Tensor::memorySize()
{
return this->size() * sizeof(float);
}
uint32_t
Tensor::size()
{
return this->mShape[0];
}
std::array<uint32_t, KP_MAX_DIM_SIZE>
Tensor::shape()
{
return this->mShape;
memcpy(this->mRawData, data, this->memorySize());
}
Tensor::TensorTypes
@ -89,140 +66,50 @@ Tensor::tensorType()
bool
Tensor::isInit()
{
return this->mIsInit && this->mPrimaryBuffer && this->mPrimaryMemory;
return this->mDevice && this->mPrimaryBuffer && this->mPrimaryMemory &&
this->mRawData;
}
uint32_t
Tensor::size()
{
return this->mSize;
}
uint32_t
Tensor::dataTypeMemorySize()
{
return this->mDataTypeMemorySize;
}
uint32_t
Tensor::memorySize()
{
return this->mSize * this->mDataTypeMemorySize;
}
kp::Tensor::TensorDataTypes
Tensor::dataType()
{
return this->mDataType;
}
void*
Tensor::rawData()
{
return this->mRawData;
}
void
Tensor::setData(const std::vector<float>& data)
Tensor::setRawData(const void* data)
{
if (data.size() != this->mData.size()) {
throw std::runtime_error(
"Kompute Tensor Cannot set data of different sizes");
}
this->mData = data;
memcpy(this->mRawData, data, this->memorySize());
}
void
Tensor::recordCopyFrom(std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::shared_ptr<Tensor> copyFromTensor,
bool createBarrier)
Tensor::mapRawData()
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor recordCopyFrom data size {}.", bufferSize);
this->copyBuffer(commandBuffer,
copyFromTensor->mPrimaryBuffer,
this->mPrimaryBuffer,
bufferSize,
copyRegion,
createBarrier);
}
void
Tensor::recordCopyFromStagingToDevice(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
bool createBarrier)
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor copying data size {}.", bufferSize);
this->copyBuffer(commandBuffer,
this->mStagingBuffer,
this->mPrimaryBuffer,
bufferSize,
copyRegion,
createBarrier);
}
void
Tensor::recordCopyFromDeviceToStaging(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
bool createBarrier)
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor copying data size {}.", bufferSize);
this->copyBuffer(commandBuffer,
this->mPrimaryBuffer,
this->mStagingBuffer,
bufferSize,
copyRegion,
createBarrier);
}
void
Tensor::copyBuffer(std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::shared_ptr<vk::Buffer> bufferFrom,
std::shared_ptr<vk::Buffer> bufferTo,
vk::DeviceSize bufferSize,
vk::BufferCopy copyRegion,
bool createBarrier)
{
if (!this->mIsInit) {
throw std::runtime_error(
"Kompute Tensor attempted to run copyBuffer without init");
}
commandBuffer->copyBuffer(*bufferFrom, *bufferTo, copyRegion);
if (createBarrier) {
// Buffer to ensure wait until data is copied to staging buffer
this->recordBufferMemoryBarrier(commandBuffer,
vk::AccessFlagBits::eTransferWrite,
vk::AccessFlagBits::eHostRead,
vk::PipelineStageFlagBits::eTransfer,
vk::PipelineStageFlagBits::eHost);
}
}
void
Tensor::recordBufferMemoryBarrier(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask)
{
KP_LOG_DEBUG("Kompute Tensor recording buffer memory barrier");
vk::DeviceSize bufferSize = this->memorySize();
vk::BufferMemoryBarrier bufferMemoryBarrier;
bufferMemoryBarrier.buffer = *this->mPrimaryBuffer;
bufferMemoryBarrier.size = bufferSize;
bufferMemoryBarrier.srcAccessMask = srcAccessMask;
bufferMemoryBarrier.dstAccessMask = dstAccessMask;
bufferMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
bufferMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
commandBuffer->pipelineBarrier(srcStageMask,
dstStageMask,
vk::DependencyFlags(),
nullptr,
bufferMemoryBarrier,
nullptr);
}
vk::DescriptorBufferInfo
Tensor::constructDescriptorBufferInfo()
{
vk::DeviceSize bufferSize = this->memorySize();
return vk::DescriptorBufferInfo(*this->mPrimaryBuffer,
0, // offset
bufferSize);
}
void
Tensor::mapDataFromHostMemory()
{
KP_LOG_DEBUG("Kompute Tensor mapping data from host buffer");
std::shared_ptr<vk::DeviceMemory> hostVisibleMemory = nullptr;
@ -238,19 +125,20 @@ Tensor::mapDataFromHostMemory()
}
vk::DeviceSize bufferSize = this->memorySize();
void* mapped = this->mDevice->mapMemory(
// Given we request coherent host memory we don't need to invalidate /
// flush
this->mRawData = this->mDevice->mapMemory(
*hostVisibleMemory, 0, bufferSize, vk::MemoryMapFlags());
vk::MappedMemoryRange mappedMemoryRange(*hostVisibleMemory, 0, bufferSize);
this->mDevice->invalidateMappedMemoryRanges(mappedMemoryRange);
memcpy(this->mData.data(), mapped, bufferSize);
this->mDevice->unmapMemory(*hostVisibleMemory);
}
void
Tensor::mapDataIntoHostMemory()
Tensor::unmapRawData()
{
KP_LOG_DEBUG("Kompute Tensor local mapping tensor data to host buffer");
KP_LOG_DEBUG("Kompute Tensor unmapping data from host buffer");
std::shared_ptr<vk::DeviceMemory> hostVisibleMemory = nullptr;
@ -265,15 +153,142 @@ Tensor::mapDataIntoHostMemory()
}
vk::DeviceSize bufferSize = this->memorySize();
void* mapped = this->mDevice->mapMemory(
*hostVisibleMemory, 0, bufferSize, vk::MemoryMapFlags());
memcpy(mapped, this->mData.data(), bufferSize);
vk::MappedMemoryRange mappedRange(*hostVisibleMemory, 0, bufferSize);
this->mDevice->flushMappedMemoryRanges(1, &mappedRange);
this->mDevice->unmapMemory(*hostVisibleMemory);
}
void
Tensor::recordCopyFrom(const vk::CommandBuffer& commandBuffer,
std::shared_ptr<Tensor> copyFromTensor)
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor recordCopyFrom data size {}.", bufferSize);
this->recordCopyBuffer(commandBuffer,
copyFromTensor->mPrimaryBuffer,
this->mPrimaryBuffer,
bufferSize,
copyRegion);
}
void
Tensor::recordCopyFromStagingToDevice(const vk::CommandBuffer& commandBuffer)
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor copying data size {}.", bufferSize);
this->recordCopyBuffer(commandBuffer,
this->mStagingBuffer,
this->mPrimaryBuffer,
bufferSize,
copyRegion);
}
void
Tensor::recordCopyFromDeviceToStaging(const vk::CommandBuffer& commandBuffer)
{
vk::DeviceSize bufferSize(this->memorySize());
vk::BufferCopy copyRegion(0, 0, bufferSize);
KP_LOG_DEBUG("Kompute Tensor copying data size {}.", bufferSize);
this->recordCopyBuffer(commandBuffer,
this->mPrimaryBuffer,
this->mStagingBuffer,
bufferSize,
copyRegion);
}
void
Tensor::recordCopyBuffer(const vk::CommandBuffer& commandBuffer,
std::shared_ptr<vk::Buffer> bufferFrom,
std::shared_ptr<vk::Buffer> bufferTo,
vk::DeviceSize bufferSize,
vk::BufferCopy copyRegion)
{
commandBuffer.copyBuffer(*bufferFrom, *bufferTo, copyRegion);
}
void
Tensor::recordPrimaryBufferMemoryBarrier(const vk::CommandBuffer& commandBuffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask)
{
KP_LOG_DEBUG("Kompute Tensor recording PRIMARY buffer memory barrier");
this->recordBufferMemoryBarrier(commandBuffer,
*this->mPrimaryBuffer,
srcAccessMask,
dstAccessMask,
srcStageMask,
dstStageMask);
}
void
Tensor::recordStagingBufferMemoryBarrier(const vk::CommandBuffer& commandBuffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask)
{
KP_LOG_DEBUG("Kompute Tensor recording STAGING buffer memory barrier");
this->recordBufferMemoryBarrier(commandBuffer,
*this->mStagingBuffer,
srcAccessMask,
dstAccessMask,
srcStageMask,
dstStageMask);
}
void
Tensor::recordBufferMemoryBarrier(const vk::CommandBuffer& commandBuffer,
const vk::Buffer& buffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask)
{
KP_LOG_DEBUG("Kompute Tensor recording buffer memory barrier");
vk::DeviceSize bufferSize = this->memorySize();
vk::BufferMemoryBarrier bufferMemoryBarrier;
bufferMemoryBarrier.buffer = buffer;
bufferMemoryBarrier.size = bufferSize;
bufferMemoryBarrier.srcAccessMask = srcAccessMask;
bufferMemoryBarrier.dstAccessMask = dstAccessMask;
bufferMemoryBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
bufferMemoryBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
commandBuffer.pipelineBarrier(srcStageMask,
dstStageMask,
vk::DependencyFlags(),
nullptr,
bufferMemoryBarrier,
nullptr);
}
vk::DescriptorBufferInfo
Tensor::constructDescriptorBufferInfo()
{
KP_LOG_DEBUG("Kompute Tensor construct descriptor buffer info size {}",
this->memorySize());
vk::DeviceSize bufferSize = this->memorySize();
return vk::DescriptorBufferInfo(*this->mPrimaryBuffer,
0, // offset
bufferSize);
}
vk::BufferUsageFlags
Tensor::getPrimaryBufferUsageFlags()
{
@ -304,7 +319,8 @@ Tensor::getPrimaryMemoryPropertyFlags()
return vk::MemoryPropertyFlagBits::eDeviceLocal;
break;
case TensorTypes::eHost:
return vk::MemoryPropertyFlagBits::eHostVisible;
return vk::MemoryPropertyFlagBits::eHostVisible |
vk::MemoryPropertyFlagBits::eHostCoherent;
break;
case TensorTypes::eStorage:
return vk::MemoryPropertyFlagBits::eDeviceLocal;
@ -332,7 +348,8 @@ Tensor::getStagingMemoryPropertyFlags()
{
switch (this->mTensorType) {
case TensorTypes::eDevice:
return vk::MemoryPropertyFlagBits::eHostVisible;
return vk::MemoryPropertyFlagBits::eHostVisible |
vk::MemoryPropertyFlagBits::eHostCoherent;
break;
default:
throw std::runtime_error("Kompute Tensor invalid tensor type");
@ -344,11 +361,6 @@ Tensor::allocateMemoryCreateGPUResources()
{
KP_LOG_DEBUG("Kompute Tensor creating buffer");
if (!this->mIsInit) {
throw std::runtime_error(
"Kompute Tensor attempted to run createBuffer without init");
}
if (!this->mPhysicalDevice) {
throw std::runtime_error("Kompute Tensor physical device is null");
}
@ -455,71 +467,121 @@ Tensor::allocateBindMemory(std::shared_ptr<vk::Buffer> buffer,
}
void
Tensor::freeMemoryDestroyGPUResources()
Tensor::destroy()
{
KP_LOG_DEBUG("Kompute Tensor started freeMemoryDestroyGPUResources");
KP_LOG_DEBUG("Kompute Tensor started destroy()");
this->mIsInit = false;
// Set raw data to null regardless of whether the device is available,
// so that the Tensor is invalidated
this->mRawData = nullptr;
this->mSize = 0;
this->mDataTypeMemorySize = 0;
if (!this->mDevice) {
KP_LOG_ERROR(
KP_LOG_WARN(
"Kompute Tensor destructor reached with null Device pointer");
return;
}
// Unmap the current memory data
this->unmapRawData();
if (this->mFreePrimaryBuffer) {
if (!this->mPrimaryBuffer) {
KP_LOG_ERROR("Kompute Tensor expected to destroy primary buffer "
"but got null buffer");
KP_LOG_WARN("Kompute Tensor expected to destroy primary buffer "
"but got null buffer");
} else {
KP_LOG_DEBUG("Kompute Tensor destroying primary buffer");
this->mDevice->destroy(
*this->mPrimaryBuffer,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mPrimaryBuffer = nullptr;
this->mFreePrimaryBuffer = false;
}
}
if (this->mFreeStagingBuffer) {
if (!this->mStagingBuffer) {
KP_LOG_ERROR("Kompute Tensor expected to destroy staging buffer "
"but got null buffer");
KP_LOG_WARN("Kompute Tensor expected to destroy staging buffer "
"but got null buffer");
} else {
KP_LOG_DEBUG("Kompute Tensor destroying staging buffer");
this->mDevice->destroy(
*this->mStagingBuffer,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mStagingBuffer = nullptr;
this->mFreeStagingBuffer = false;
}
}
if (this->mFreePrimaryMemory) {
if (!this->mPrimaryMemory) {
KP_LOG_ERROR("Kompute Tensor expected to free primary memory but "
"got null memory");
KP_LOG_WARN("Kompute Tensor expected to free primary memory but "
"got null memory");
} else {
KP_LOG_DEBUG("Kompute Tensor freeing primary memory");
this->mDevice->freeMemory(
*this->mPrimaryMemory,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mPrimaryMemory = nullptr;
this->mFreePrimaryMemory = false;
}
}
if (this->mFreeStagingMemory) {
if (!this->mStagingMemory) {
KP_LOG_ERROR("Kompute Tensor expected to free staging memory but "
"got null memory");
KP_LOG_WARN("Kompute Tensor expected to free staging memory but "
"got null memory");
} else {
KP_LOG_DEBUG("Kompute Tensor freeing staging memory");
this->mDevice->freeMemory(
*this->mStagingMemory,
(vk::Optional<const vk::AllocationCallbacks>)nullptr);
this->mStagingMemory = nullptr;
this->mFreeStagingMemory = false;
}
}
KP_LOG_DEBUG("Kompute Tensor successful freeMemoryDestroyGPUResources");
if (this->mDevice) {
this->mDevice = nullptr;
}
KP_LOG_DEBUG("Kompute Tensor successful destroy()");
}
template<>
Tensor::TensorDataTypes
TensorT<bool>::dataType()
{
return Tensor::TensorDataTypes::eBool;
}
template<>
Tensor::TensorDataTypes
TensorT<int32_t>::dataType()
{
return Tensor::TensorDataTypes::eInt;
}
template<>
Tensor::TensorDataTypes
TensorT<uint32_t>::dataType()
{
return Tensor::TensorDataTypes::eUnsignedInt;
}
template<>
Tensor::TensorDataTypes
TensorT<float>::dataType()
{
return Tensor::TensorDataTypes::eFloat;
}
template<>
Tensor::TensorDataTypes
TensorT<double>::dataType()
{
return Tensor::TensorDataTypes::eDouble;
}
}
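After this change the tensor keeps only an untyped pointer, an element count and a per-element size, and dataType() is specialised per element type. A standalone sketch of that layout follows, using a plain host byte buffer in place of mapped Vulkan memory. MiniTensor, DataType, dataTypeOf and toVector are hypothetical names chosen for illustration, not the Kompute API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class DataType { eInt, eUnsignedInt, eFloat, eDouble };

// Only the specialisations are defined, so an unsupported element type
// fails to link instead of silently picking a default.
template<typename T> DataType dataTypeOf();
template<> inline DataType dataTypeOf<std::int32_t>() { return DataType::eInt; }
template<> inline DataType dataTypeOf<std::uint32_t>() { return DataType::eUnsignedInt; }
template<> inline DataType dataTypeOf<float>() { return DataType::eFloat; }
template<> inline DataType dataTypeOf<double>() { return DataType::eDouble; }

// Hypothetical type-erased tensor: stores bytes plus count and element size.
class MiniTensor
{
  public:
    MiniTensor(const void* data, std::uint32_t count, std::uint32_t elemSize)
      : mSize(count)
      , mElemSize(elemSize)
      , mStorage(static_cast<std::size_t>(count) * elemSize)
    {
        setRawData(data);
    }

    std::uint32_t size() const { return mSize; }
    std::uint32_t memorySize() const { return mSize * mElemSize; }

    // Copies memorySize() bytes from data; the caller must guarantee that
    // many bytes are readable, as no size is passed in.
    void setRawData(const void* data)
    {
        if (memorySize() > 0) {
            std::memcpy(mStorage.data(), data, memorySize());
        }
    }

    // Copies the bytes back out as typed elements.
    template<typename T> std::vector<T> toVector() const
    {
        std::vector<T> out(mSize);
        if (memorySize() > 0) {
            std::memcpy(out.data(), mStorage.data(), memorySize());
        }
        return out;
    }

  private:
    std::uint32_t mSize;
    std::uint32_t mElemSize;
    std::vector<unsigned char> mStorage;
};
```

As with setRawData in the diff, the size check that the old setData performed is gone, so passing a buffer shorter than memorySize() is the caller's responsibility to avoid.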

View file

@ -12,35 +12,51 @@ namespace kp {
*/
class Algorithm
{
public:
public:
/**
Base constructor for Algorithm. Should not be used unless explicit
intended.
*/
Algorithm();
/**
* Default constructor for Algorithm
* Main constructor for algorithm with configuration parameters to create
* the underlying resources.
*
* @param device The Vulkan device to use for creating resources
* @param commandBuffer The vulkan command buffer to bind the pipeline and
* shaders
* @param tensors (optional) The tensors to use to create the descriptor
* resources
* @param spirv (optional) The spirv code to use to create the algorithm
* @param workgroup (optional) The kp::Workgroup to use for the dispatch
* which defaults to kp::Workgroup(tensor[0].size(), 1, 1) if not set.
* @param specializationConstants (optional) The kp::Constants to use to
* initialize the specialization constants which cannot be changed once set.
* @param pushConstants (optional) The kp::Constants to use when
* initializing the pipeline, which set the size of the push constants -
* these can be modified but all new values must have the same vector size
* as this initial value.
*/
Algorithm(std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
const Constants& specializationConstants = {});
const std::vector<std::shared_ptr<Tensor>>& tensors = {},
const std::vector<uint32_t>& spirv = {},
const Workgroup& workgroup = {},
const Constants& specializationConstants = {},
const Constants& pushConstants = {});
/**
* Initialiser for the shader data provided to the algorithm as well as
* tensor parameters that will be used in shader.
* Rebuild function to reconstruct algorithm with configuration parameters
* to create the underlying resources.
*
* @param shaderFileData The bytes in spir-v format of the shader
* @tensorParams The Tensors to be used in the Algorithm / shader for
* @specalizationInstalces The specialization parameters to pass to the function
* processing
* @param tensors The tensors to use to create the descriptor resources
* @param spirv The spirv code to use to create the algorithm
* @param workgroup (optional) The kp::Workgroup to use for the dispatch
* which defaults to kp::Workgroup(tensor[0].size(), 1, 1) if not set.
* @param specializationConstants (optional) The kp::Constants to use to
* initialize the specialization constants which cannot be changed once set.
* @param pushConstants (optional) The kp::Constants to use when
* initializing the pipeline, which set the size of the push constants -
* these can be modified but all new values must have the same vector size
* as this initial value.
*/
void init(const std::vector<uint32_t>& shaderFileData,
std::vector<std::shared_ptr<Tensor>> tensorParams);
void rebuild(const std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& spirv,
const Workgroup& workgroup = {},
const Constants& specializationConstants = {},
const Constants& pushConstants = {});
/**
* Destructor for Algorithm which is responsible for freeing and destroying
@ -52,16 +68,88 @@ public:
* Records the dispatch function with the provided template parameters or
* alternatively using the size of the tensor by default.
*
* @param x Layout X dispatch value
* @param y Layout Y dispatch value
* @param z Layout Z dispatch value
* @param commandBuffer Command buffer to record the algorithm resources to
*/
void recordDispatch(uint32_t x = 1, uint32_t y = 1, uint32_t z = 1);
void recordDispatch(const vk::CommandBuffer& commandBuffer);
private:
/**
* Records command that binds the "core" algorithm components which consist
* of binding the pipeline and binding the descriptorsets.
*
* @param commandBuffer Command buffer to record the algorithm resources to
*/
void recordBindCore(const vk::CommandBuffer& commandBuffer);
/**
* Records command that binds the push constants to the command buffer
* provided
* - it is required that the pushConstants provided are of the same size as
* the ones provided during initialization.
*
* @param commandBuffer Command buffer to record the algorithm resources to
*/
void recordBindPush(const vk::CommandBuffer& commandBuffer);
/**
* Function that checks all the GPU resource components to verify if these
* have been created and returns true if all are valid.
*
* @returns true if the algorithm is currently initialized.
*/
bool isInit();
/**
* Sets the work group to use in the recordDispatch
*
* @param workgroup The kp::Workgroup value to use to update the algorithm.
* It must have a value greater than 1 on the x value (index 1) otherwise it
* will be initialized on the size of the first tensor (ie.
* this->mTensor[0]->size())
*/
void setWorkgroup(const Workgroup& workgroup, uint32_t minSize = 1);
/**
* Sets the push constants to the new value provided to use in the next
* bindPush()
*
* @param pushConstants The kp::Constants to use for the push constants in
* the next bindPush(...) calls. The constants provided must be of the same
* size as the ones created during initialization.
*/
void setPush(const Constants& pushConstants);
/**
* Gets the current workgroup from the algorithm.
*
* @returns The kp::Workgroup currently set for the algorithm.
*/
const Workgroup& getWorkgroup();
/**
* Gets the specialization constants of the current algorithm.
*
* @returns The kp::Constants currently set for specialization constants
*/
const Constants& getSpecializationConstants();
/**
* Gets the push constants of the current algorithm.
*
* @returns The kp::Constants currently set for push constants
*/
const Constants& getPush();
/**
* Gets the current tensors that are used in the algorithm.
*
* @returns The list of tensors used in the algorithm.
*/
const std::vector<std::shared_ptr<Tensor>>& getTensors();
void destroy();
private:
// -------------- NEVER OWNED RESOURCES
std::shared_ptr<vk::Device> mDevice;
std::shared_ptr<vk::CommandBuffer> mCommandBuffer;
std::vector<std::shared_ptr<Tensor>> mTensors;
// -------------- OPTIONALLY OWNED RESOURCES
std::shared_ptr<vk::DescriptorSetLayout> mDescriptorSetLayout;
@ -80,15 +168,17 @@ private:
bool mFreePipeline = false;
// -------------- ALWAYS OWNED RESOURCES
std::vector<uint32_t> mSpirv;
Constants mSpecializationConstants;
Constants mPushConstants;
Workgroup mWorkgroup;
// Create util functions
void createShaderModule(const std::vector<uint32_t>& shaderFileData);
void createShaderModule();
void createPipeline();
// Parameters
void createParameters(std::vector<std::shared_ptr<Tensor>>& tensorParams);
void createDescriptorPool();
void createParameters();
};
} // End namespace kp
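
The comments above state that push constants can be replaced after initialization, but only with a value of the same size as the initial one. A minimal sketch of that rule follows. It assumes `kp::Constants` behaves like a `std::vector<float>`, and `PushConstantHolder` is a hypothetical stand-in, not the Kompute `Algorithm` class; throwing on a size mismatch is also an assumption about how the check could be reported.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumption: kp::Constants is modelled here as a vector of floats.
using Constants = std::vector<float>;

// Hypothetical stand-in that only models the size rule for push constants.
class PushConstantHolder
{
  public:
    // The initial value fixes the size of the push-constant block.
    explicit PushConstantHolder(const Constants& initial)
      : mPushConstants(initial)
    {}

    // New values are accepted only if they match the initial size.
    void setPush(const Constants& pushConstants)
    {
        if (pushConstants.size() != mPushConstants.size()) {
            throw std::invalid_argument(
              "push constants must keep the size set at initialization");
        }
        mPushConstants = pushConstants;
    }

    const Constants& getPush() const { return mPushConstants; }

  private:
    Constants mPushConstants;
};
```

A caller would construct the holder once with the initial values and then call `setPush` before each dispatch. A value with a different element count is rejected and the stored constants stay unchanged.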


@ -60,12 +60,19 @@ extern py::object kp_debug, kp_info, kp_warning, kp_error;
#define KP_LOG_DEBUG(...)
#else
#if defined(VK_USE_PLATFORM_ANDROID_KHR)
#define KP_LOG_DEBUG(...) \
((void)__android_log_print(ANDROID_LOG_DEBUG, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__)))
#define KP_LOG_DEBUG(...) \
((void)__android_log_write( \
ANDROID_LOG_DEBUG, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__).c_str()))
#elif defined(KOMPUTE_BUILD_PYTHON)
#define KP_LOG_DEBUG(...) kp_debug(fmt::format(__VA_ARGS__))
#else
#define KP_LOG_DEBUG(...) fmt::print("[{} {}] [debug] [{}:{}] {}\n", __DATE__, __TIME__, __FILE__, __LINE__, fmt::format(__VA_ARGS__))
#define KP_LOG_DEBUG(...) \
fmt::print("[{} {}] [debug] [{}:{}] {}\n", \
__DATE__, \
__TIME__, \
__FILE__, \
__LINE__, \
fmt::format(__VA_ARGS__))
#endif // VK_USE_PLATFORM_ANDROID_KHR
#endif // SPDLOG_ACTIVE_LEVEL > 1
@ -73,12 +80,19 @@ extern py::object kp_debug, kp_info, kp_warning, kp_error;
#define KP_LOG_INFO(...)
#else
#if defined(VK_USE_PLATFORM_ANDROID_KHR)
#define KP_LOG_INFO(...) \
((void)__android_log_print(ANDROID_LOG_INFO, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__)))
#define KP_LOG_INFO(...) \
((void)__android_log_write( \
ANDROID_LOG_INFO, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__).c_str()))
#elif defined(KOMPUTE_BUILD_PYTHON)
#define KP_LOG_INFO(...) kp_info(fmt::format(__VA_ARGS__))
#else
#define KP_LOG_INFO(...) fmt::print("[{} {}] [debug] [{}:{}] {}\n", __DATE__, __TIME__, __FILE__, __LINE__, fmt::format(__VA_ARGS__))
#define KP_LOG_INFO(...) \
fmt::print("[{} {}] [info] [{}:{}] {}\n", \
__DATE__, \
__TIME__, \
__FILE__, \
__LINE__, \
fmt::format(__VA_ARGS__))
#endif // VK_USE_PLATFORM_ANDROID_KHR
#endif // SPDLOG_ACTIVE_LEVEL > 2
@ -86,12 +100,19 @@ extern py::object kp_debug, kp_info, kp_warning, kp_error;
#define KP_LOG_WARN(...)
#else
#if defined(VK_USE_PLATFORM_ANDROID_KHR)
#define KP_LOG_WARN(...) \
((void)__android_log_print(ANDROID_LOG_WARN, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__)))
#define KP_LOG_WARN(...) \
((void)__android_log_write( \
ANDROID_LOG_WARN, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__).c_str()))
#elif defined(KOMPUTE_BUILD_PYTHON)
#define KP_LOG_WARN(...) kp_warning(fmt::format(__VA_ARGS__))
#else
#define KP_LOG_WARN(...) fmt::print("[{} {}] [debug] [{}:{}] {}\n", __DATE__, __TIME__, __FILE__, __LINE__, fmt::format(__VA_ARGS__))
#define KP_LOG_WARN(...) \
fmt::print("[{} {}] [warn] [{}:{}] {}\n", \
__DATE__, \
__TIME__, \
__FILE__, \
__LINE__, \
fmt::format(__VA_ARGS__))
#endif // VK_USE_PLATFORM_ANDROID_KHR
#endif // SPDLOG_ACTIVE_LEVEL > 3
@ -99,12 +120,19 @@ extern py::object kp_debug, kp_info, kp_warning, kp_error;
#define KP_LOG_ERROR(...)
#else
#if defined(VK_USE_PLATFORM_ANDROID_KHR)
#define KP_LOG_ERROR(...) \
((void)__android_log_print(ANDROID_LOG_ERROR, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__)))
#define KP_LOG_ERROR(...) \
((void)__android_log_write( \
ANDROID_LOG_ERROR, KOMPUTE_LOG_TAG, fmt::format(__VA_ARGS__).c_str()))
#elif defined(KOMPUTE_BUILD_PYTHON)
#define KP_LOG_ERROR(...) kp_error(fmt::format(__VA_ARGS__))
#else
#define KP_LOG_ERROR(...) fmt::print("[{} {}] [debug] [{}:{}] {}\n", __DATE__, __TIME__, __FILE__, __LINE__, fmt::format(__VA_ARGS__))
#define KP_LOG_ERROR(...) \
fmt::print("[{} {}] [error] [{}:{}] {}\n", \
__DATE__, \
__TIME__, \
__FILE__, \
__LINE__, \
fmt::format(__VA_ARGS__))
#endif // VK_USE_PLATFORM_ANDROID_KHR
#endif // SPDLOG_ACTIVE_LEVEL > 4
#endif // KOMPUTE_SPDLOG_ENABLED
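
The Android branches above switch from passing a formatted `std::string` straight into a printf-style call to passing `.c_str()` into a function that takes a plain C string. The sketch below illustrates why that shape works: the temporary string returned by the formatter lives until the end of the full expression, so the pointer is valid for the duration of the call. `writeLog`, `joinMessage`, `lastMessage` and `SKETCH_LOG_DEBUG` are hypothetical names; `joinMessage` simply streams its arguments and is not a replacement for `fmt::format`.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical storage so the last logged text can be inspected.
inline std::string& lastMessage()
{
    static std::string message;
    return message;
}

// Hypothetical C-style sink that expects a NUL-terminated const char*.
inline void writeLog(const char* text)
{
    lastMessage() = text;
    std::puts(text);
}

// Base case for the recursive helper below.
inline void appendAll(std::ostringstream&) {}

// Streams each argument into the output in order.
template<typename First, typename... Rest>
void appendAll(std::ostringstream& out, const First& first, const Rest&... rest)
{
    out << first;
    appendAll(out, rest...);
}

// Hypothetical formatter that returns a std::string by value.
template<typename... Args>
std::string joinMessage(const Args&... args)
{
    std::ostringstream out;
    appendAll(out, args...);
    return out.str();
}

// The temporary std::string outlives the call to writeLog, so c_str() is safe.
#define SKETCH_LOG_DEBUG(...) writeLog(joinMessage(__VA_ARGS__).c_str())
```

Storing the result of `c_str()` in a variable and using it after the statement ends would not be safe, which is why the conversion stays inside the macro's single expression.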


@ -7,8 +7,6 @@
#include "kompute/Sequence.hpp"
#include "kompute/operations/OpTensorSyncDevice.hpp"
#define KP_DEFAULT_SESSION "DEFAULT"
namespace kp {
@ -26,16 +24,18 @@ class Manager
Manager();
/**
* Similar to base constructor but allows the user to provide the device
* they would like to create the resources on.
* Similar to base constructor but allows for further configuration to use
* when creating the Vulkan resources.
*
* @param physicalDeviceIndex The index of the physical device to use
* @param familyQueueIndices (Optional) List of queue indices to add for
* explicit allocation
* @param totalQueues The total number of compute queues to create.
* @param desiredExtensions The desired extensions to load from
* physicalDevice
*/
Manager(uint32_t physicalDeviceIndex,
const std::vector<uint32_t>& familyQueueIndices = {});
const std::vector<uint32_t>& familyQueueIndices = {},
const std::vector<std::string>& desiredExtensions = {});
/**
* Manager constructor which allows your own vulkan application to integrate
@ -48,8 +48,7 @@ class Manager
*/
Manager(std::shared_ptr<vk::Instance> instance,
std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
uint32_t physicalDeviceIndex);
std::shared_ptr<vk::Device> device);
/**
* Manager destructor which would ensure all owned resources are destroyed
@ -58,269 +57,124 @@ class Manager
~Manager();
/**
* Get or create a managed Sequence that will be contained by this manager.
* If the named sequence does not currently exist, it would be created and
* initialised.
* Create a managed sequence that will be destroyed by this manager
* if it hasn't been destroyed by its reference count going to zero.
*
* @param sequenceName The name for the named sequence to be retrieved or
* created
* @param queueIndex The queue to use from the available queues
* @return Shared pointer to the manager owned sequence resource
* @param totalTimestamps The maximum number of timestamps to allocate.
* If zero (default), disables latching of timestamps.
* @returns Shared pointer with initialised sequence
*/
std::shared_ptr<Sequence> sequence(
std::string sequenceName = KP_DEFAULT_SESSION,
uint32_t queueIndex = 0);
std::shared_ptr<Sequence> sequence(uint32_t queueIndex = 0,
uint32_t totalTimestamps = 0);
/**
* Function that evaluates operation against named sequence.
*
* @param tensors The tensors to be used in the operation recorded
* @param sequenceName The name of the sequence to be retrieved or created
* @param TArgs Template parameters that will be used to initialise
* Operation to allow for extensible configurations on initialisation
*/
template<typename T, typename... TArgs>
void evalOp(std::vector<std::shared_ptr<Tensor>> tensors,
std::string sequenceName,
TArgs&&... params)
{
KP_LOG_DEBUG("Kompute Manager evalOp triggered");
std::shared_ptr<kp::Sequence> sq =
this->sequence(sequenceName);
KP_LOG_DEBUG("Kompute Manager evalOp running sequence BEGIN");
sq->begin();
KP_LOG_DEBUG("Kompute Manager evalOp running sequence RECORD");
sq->record<T>(tensors, std::forward<TArgs>(params)...);
KP_LOG_DEBUG("Kompute Manager evalOp running sequence END");
sq->end();
KP_LOG_DEBUG("Kompute Manager evalOp running sequence EVAL");
sq->eval();
KP_LOG_DEBUG("Kompute Manager evalOp running sequence SUCCESS");
}
/**
* Function that evaluates operation against a newly created sequence.
*
* @param tensors The tensors to be used in the operation recorded
* @param TArgs Template parameters that will be used to initialise
* Operation to allow for extensible configurations on initialisation
*/
template<typename T, typename... TArgs>
void evalOpDefault(std::vector<std::shared_ptr<Tensor>> tensors,
TArgs&&... params)
{
KP_LOG_DEBUG("Kompute Manager evalOp Default triggered");
this->mCurrentSequenceIndex++;
this->evalOp<T>(
tensors, KP_DEFAULT_SESSION, std::forward<TArgs>(params)...);
}
/**
* Function that evaluates operation against named sequence asynchronously.
*
* @param tensors The tensors to be used in the operation recorded
* @param sequenceName The name of the sequence to be retrieved or created
* @param params Template parameters that will be used to initialise
* Operation to allow for extensible configurations on initialisation
*/
template<typename T, typename... TArgs>
void evalOpAsync(std::vector<std::shared_ptr<Tensor>> tensors,
std::string sequenceName,
TArgs&&... params)
{
KP_LOG_DEBUG("Kompute Manager evalOpAsync triggered");
std::shared_ptr<kp::Sequence> sq =
this->sequence(sequenceName);
KP_LOG_DEBUG("Kompute Manager evalOpAsync running sequence BEGIN");
sq->begin();
KP_LOG_DEBUG("Kompute Manager evalOpAsync running sequence RECORD");
sq->record<T>(tensors, std::forward<TArgs>(params)...);
KP_LOG_DEBUG("Kompute Manager evalOpAsync running sequence END");
sq->end();
KP_LOG_DEBUG("Kompute Manager evalOpAsync running sequence EVAL");
sq->evalAsync();
KP_LOG_DEBUG("Kompute Manager evalOpAsync running sequence SUCCESS");
}
/**
* Operation that evaluates operation against default sequence
* asynchronously.
*
* @param tensors The tensors to be used in the operation recorded
* @param params Template parameters that will be used to initialise
* Operation to allow for extensible configurations on initialisation
*/
template<typename T, typename... TArgs>
void evalOpAsyncDefault(std::vector<std::shared_ptr<Tensor>> tensors,
TArgs&&... params)
{
KP_LOG_DEBUG("Kompute Manager evalOpAsyncDefault triggered");
this->mCurrentSequenceIndex++;
this->evalOpAsync<T>(
tensors, KP_DEFAULT_SESSION, std::forward<TArgs>(params)...);
}
/**
* Operation that awaits for named sequence to finish.
*
* @param sequenceName The name of the sequence to wait for termination
* @param waitFor The amount of time to wait before timing out
*/
void evalOpAwait(std::string sequenceName, uint64_t waitFor = UINT64_MAX)
{
KP_LOG_DEBUG("Kompute Manager evalOpAwait triggered with sequence {}",
sequenceName);
std::unordered_map<std::string, std::shared_ptr<Sequence>>::iterator
found = this->mManagedSequences.find(sequenceName);
if (found != this->mManagedSequences.end()) {
if (std::shared_ptr<kp::Sequence> sq = found->second) {
KP_LOG_DEBUG("Kompute Manager evalOpAwait running sequence "
"Sequence EVAL AWAIT");
if (sq->isRunning()) {
sq->evalAwait(waitFor);
}
}
KP_LOG_DEBUG(
"Kompute Manager evalOpAwait running sequence SUCCESS");
} else {
KP_LOG_ERROR("Kompute Manager evalOpAwait Sequence not found");
}
}
/**
* Operation that awaits for default sequence to finish.
*
* @param tensors The tensors to be used in the operation recorded
* @param params Template parameters that will be used to initialise
* Operation to allow for extensible configurations on initialisation
*/
void evalOpAwaitDefault(uint64_t waitFor = UINT64_MAX)
{
KP_LOG_DEBUG("Kompute Manager evalOpAwaitDefault triggered");
this->evalOpAwait(KP_DEFAULT_SESSION, waitFor);
}
/**
* Function that simplifies the common workflow of tensor creation and
* initialization. It will take the constructor parameters for a Tensor
* and will use them to create a new Tensor and then initialize it. The
* tensor memory will then be managed and owned by the manager.
* Create a managed tensor that will be destroyed by this manager
* if it hasn't been destroyed by its reference count going to zero.
*
* @param data The data to initialize the tensor with
* @param tensorType The type of tensor to initialize
* @param syncDataToGPU Whether to sync the data to GPU memory
* @returns Initialized Tensor with memory Syncd to GPU device
* @returns Shared pointer with initialised tensor
*/
std::shared_ptr<Tensor> tensor(
template<typename T>
std::shared_ptr<TensorT<T>> tensorT(
const std::vector<T>& data,
Tensor::TensorTypes tensorType = Tensor::TensorTypes::eDevice)
{
KP_LOG_DEBUG("Kompute Manager tensor creation triggered");
std::shared_ptr<TensorT<T>> tensor{ new kp::TensorT<T>(
this->mPhysicalDevice, this->mDevice, data, tensorType) };
if (this->mManageResources) {
this->mManagedTensors.push_back(tensor);
}
return tensor;
}
std::shared_ptr<TensorT<float>> tensor(
const std::vector<float>& data,
Tensor::TensorTypes tensorType = Tensor::TensorTypes::eDevice,
bool syncDataToGPU = true);
Tensor::TensorTypes tensorType = Tensor::TensorTypes::eDevice)
{
return this->tensorT<float>(data, tensorType);
}
std::shared_ptr<Tensor> tensor(
void* data,
uint32_t elementTotalCount,
uint32_t elementMemorySize,
const Tensor::TensorDataTypes& dataType,
Tensor::TensorTypes tensorType = Tensor::TensorTypes::eDevice)
{
std::shared_ptr<Tensor> tensor{ new kp::Tensor(this->mPhysicalDevice,
this->mDevice,
data,
elementTotalCount,
elementMemorySize,
dataType,
tensorType) };
if (this->mManageResources) {
this->mManagedTensors.push_back(tensor);
}
return tensor;
}
/**
* Function that simplifies the common workflow of tensor initialisation. It
* will take the constructor parameters for a Tensor and will use them to
* create a new Tensor. The tensor memory will then be managed and owned by
* the manager.
* Create a managed algorithm that will be destroyed by this manager
* if it hasn't been destroyed by its reference count going to zero.
*
* @param tensors Array of tensors to rebuild
* @param syncDataToGPU Whether to sync the data to GPU memory
* @param tensors (optional) The tensors to initialise the algorithm with
* @param spirv (optional) The SPIRV bytes for the algorithm to dispatch
* @param workgroup (optional) kp::Workgroup for algorithm to use, and
* defaults to (tensor[0].size(), 1, 1)
* @param specializationConstants (optional) kp::Constant to use for
* specialization constants, and defaults to an empty constant
* @param pushConstants (optional) kp::Constant to use for push constants,
* and defaults to an empty constant
* @returns Shared pointer with initialised algorithm
*/
void rebuild(std::vector<std::shared_ptr<kp::Tensor>> tensors,
bool syncDataToGPU = true);
std::shared_ptr<Algorithm> algorithm(
const std::vector<std::shared_ptr<Tensor>>& tensors = {},
const std::vector<uint32_t>& spirv = {},
const Workgroup& workgroup = {},
const Constants& specializationConstants = {},
const Constants& pushConstants = {});
/**
* Function that simplifies the common workflow of tensor initialisation. It
* will take the constructor parameters for a Tensor and will use them to
* create a new Tensor. The tensor memory will then be managed and owned by
* the manager.
*
* @param tensors Single tensor to rebuild
* @param syncDataToGPU Whether to sync the data to GPU memory
*/
void rebuild(std::shared_ptr<kp::Tensor> tensor,
bool syncDataToGPU = true);
* Destroy the GPU resources and all managed resources by manager.
**/
void destroy();
/**
* Run a pseudo-garbage collection to release all the managed resources
* that have been already freed due to these reaching to zero ref count.
**/
void clear();
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* single tensor.
*
* @param tensors Single tensor to rebuild
*/
void destroy(std::shared_ptr<kp::Tensor> tensor);
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* vector of tensors.
*
* @param tensors Single tensor to rebuild
*/
void destroy(std::vector<std::shared_ptr<kp::Tensor>> tensors);
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* vector of sequences. Destroying by sequence name is more efficient
* and hence recommended instead of by object.
*
* @param sequences Vector for shared ptrs with sequences to destroy
*/
void destroy(std::vector<std::shared_ptr<kp::Sequence>> sequences);
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* single sequence. Destroying by sequence name is more efficient
* and hence recommended instead of by object.
*
* @param sequences Single sequence to rebuild
*/
void destroy(std::shared_ptr<kp::Sequence> sequence);
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* sequence by name.
*
* @param sequenceName Single name of named sequence to destroy
*/
void destroy(const std::string& sequenceName);
/**
* Destroy owned Vulkan GPU resources and free GPU memory for
* sequences using vector of named sequence names.
*
* @param sequenceName Vector of sequence names to destroy
*/
void destroy(const std::vector<std::string>& sequenceNames);
* Return a struct containing information about the device.
**/
vk::PhysicalDeviceProperties getDeviceProperties() const;
private:
// -------------- OPTIONALLY OWNED RESOURCES
std::shared_ptr<vk::Instance> mInstance = nullptr;
bool mFreeInstance = false;
std::shared_ptr<vk::PhysicalDevice> mPhysicalDevice = nullptr;
uint32_t mPhysicalDeviceIndex = -1;
std::shared_ptr<vk::Device> mDevice = nullptr;
bool mFreeDevice = false;
// -------------- ALWAYS OWNED RESOURCES
std::set<std::shared_ptr<Tensor>> mManagedTensors;
std::unordered_map<std::string, std::shared_ptr<Sequence>>
mManagedSequences;
std::vector<std::weak_ptr<Tensor>> mManagedTensors;
std::vector<std::weak_ptr<Sequence>> mManagedSequences;
std::vector<std::weak_ptr<Algorithm>> mManagedAlgorithms;
std::vector<uint32_t> mComputeQueueFamilyIndices;
std::vector<std::shared_ptr<vk::Queue>> mComputeQueues;
uint32_t mCurrentSequenceIndex = -1;
bool mManageResources = false;
#if DEBUG
#ifndef KOMPUTE_DISABLE_VK_DEBUG_LAYERS
@ -331,7 +185,9 @@ class Manager
// Create functions
void createInstance();
void createDevice(const std::vector<uint32_t>& familyQueueIndices = {});
void createDevice(const std::vector<uint32_t>& familyQueueIndices = {},
uint32_t physicalDeviceIndex = 0,
const std::vector<std::string>& desiredExtensions = {});
};
} // End namespace kp
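
The new `Manager` members hold `std::weak_ptr` entries, and the comments describe `clear()` as releasing entries whose reference count already reached zero and `destroy()` as destroying everything still managed. A minimal sketch of that pattern, using only the standard library, could look like the following. `Resource` and `Registry` are hypothetical names and do not reproduce the Kompute implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical resource standing in for a tensor, sequence or algorithm.
struct Resource
{
    bool alive = true;
    void destroy() { alive = false; }
};

// Hypothetical registry that tracks resources without owning them.
class Registry
{
  public:
    // The caller receives the only owning pointer; the registry keeps a
    // weak reference so it never extends the resource's lifetime.
    std::shared_ptr<Resource> create()
    {
        std::shared_ptr<Resource> resource = std::make_shared<Resource>();
        mManaged.push_back(resource);
        return resource;
    }

    // Drops entries whose resource was already released by its owners.
    void clear()
    {
        mManaged.erase(
          std::remove_if(mManaged.begin(),
                         mManaged.end(),
                         [](const std::weak_ptr<Resource>& entry) {
                             return entry.expired();
                         }),
          mManaged.end());
    }

    // Destroys every resource that is still alive, then forgets all entries.
    void destroy()
    {
        for (const std::weak_ptr<Resource>& entry : mManaged) {
            if (std::shared_ptr<Resource> resource = entry.lock()) {
                resource->destroy();
            }
        }
        mManaged.clear();
    }

    std::size_t trackedCount() const { return mManaged.size(); }

  private:
    std::vector<std::weak_ptr<Resource>> mManaged;
};
```

With weak references the registry can still release GPU-side state on shutdown, while resources that user code has already dropped are freed immediately instead of waiting for the registry.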


@ -1,47 +0,0 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/Tensor.hpp"
namespace kp {
class Algorithm
{
public:
Algorithm();
Algorithm(std::shared_ptr<vk::Device> device);
void init(std::string shaderFilePath,
std::vector<std::shared_ptr<Tensor>> tensorParams);
~Algorithm();
private:
// -------------- NEVER OWNED RESOURCES
std::shared_ptr<vk::Device> mDevice;
// -------------- OPTIONALLY OWNED RESOURCES
std::shared_ptr<vk::DescriptorSetLayout> mDescriptorSetLayout;
bool mFreeDescriptorSetLayout = false;
std::shared_ptr<vk::DescriptorPool> mDescriptorPool;
bool mFreeDescriptorPool = false;
std::shared_ptr<vk::DescriptorSet> mDescriptorSet;
bool mFreeDescriptorSet = false;
std::shared_ptr<vk::ShaderModule> mShaderModule;
bool mFreeShaderModule = false;
std::shared_ptr<vk::PipelineLayout> mPipelineLayout;
bool mFreePipelineLayout = false;
std::shared_ptr<vk::PipelineCache> mPipelineCache;
bool mFreePipelineCache = false;
std::shared_ptr<vk::Pipeline> mPipeline;
bool mFreePipeline = false;
// Create util functions
void createParameters();
void createShaderModule(std::string shaderFilePath);
void createPipeline();
};
} // End namespace kp


@ -2,6 +2,7 @@
#include "kompute/Core.hpp"
#include "kompute/operations/OpAlgoDispatch.hpp"
#include "kompute/operations/OpBase.hpp"
namespace kp {
@ -9,14 +10,9 @@ namespace kp {
/**
* Container of operations that can be sent to GPU as batch
*/
class Sequence
class Sequence : public std::enable_shared_from_this<Sequence>
{
public:
/**
* Base constructor for Sequence. Should not be used unless explicit
* intended.
*/
Sequence();
/**
* Main constructor for sequence which requires core vulkan components to
* generate all dependent resources.
@ -25,11 +21,13 @@ class Sequence
* @param device Vulkan logical device
* @param computeQueue Vulkan compute queue
* @param queueIndex Vulkan compute queue index in device
* @param totalTimestamps Maximum number of timestamps to allocate
*/
Sequence(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::Queue> computeQueue,
uint32_t queueIndex);
uint32_t queueIndex,
uint32_t totalTimestamps = 0);
/**
* Destructor for sequence which is responsible for cleaning all subsequent
* owned operations.
@ -37,80 +35,16 @@ class Sequence
~Sequence();
/**
* Initialises sequence including the creation of the command pool and the
* command buffer.
*/
void init();
/**
* Begins recording commands for commands to be submitted into the command
* buffer.
* Record function for operation to be added to the GPU queue in batch. This
* template requires classes to be derived from the OpBase class. This
* function also requires the Sequence to be recording, otherwise it will
* not be able to add the operation.
*
* @return Boolean stating whether execution was successful.
* @param op Object derived from kp::OpBase that will be recorded by the
* sequence which will be used when the operation is evaluated.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
bool begin();
/**
* Ends the recording and stops recording commands when the record command
* is sent.
*
* @return Boolean stating whether execution was successful.
*/
bool end();
/**
* Eval sends all the recorded and stored operations in the vector of
* operations into the gpu as a submit job with a barrier.
*
* @return Boolean stating whether execution was successful.
*/
bool eval();
/**
* Eval Async sends all the recorded and stored operations in the vector of
* operations into the gpu as a submit job with a barrier. EvalAwait() must
* be called after to ensure the sequence is terminated correctly.
*
* @return Boolean stating whether execution was successful.
*/
bool evalAsync();
/**
* Eval Await waits for the fence to finish processing and then once it
* finishes, it runs the postEval of all operations.
*
* @param waitFor Number of milliseconds to wait before timing out.
* @return Boolean stating whether execution was successful.
*/
bool evalAwait(uint64_t waitFor = UINT64_MAX);
/**
* Returns true if the sequence is currently in recording activated.
*
* @return Boolean stating if recording ongoing.
*/
bool isRecording();
/**
* Returns true if the sequence is currently running - mostly used for async
* workloads.
*
* @return Boolean stating if currently running.
*/
bool isRunning();
/**
* Returns true if the sequence has been successfully initialised.
*
* @return Boolean stating if sequence has been initialised.
*/
bool isInit();
/**
* Destroys and frees the GPU resources which include the buffer and memory
* and sets the sequence as init=False.
*/
void freeMemoryDestroyGPUResources();
std::shared_ptr<Sequence> record(std::shared_ptr<OpBase> op);
/**
* Record function for operation to be added to the GPU queue in batch. This
@ -121,45 +55,215 @@ class Sequence
* @param tensors Vector of tensors to use for the operation
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
bool record(std::vector<std::shared_ptr<Tensor>> tensors, TArgs&&... params)
std::shared_ptr<Sequence> record(
std::vector<std::shared_ptr<Tensor>> tensors,
TArgs&&... params)
{
static_assert(std::is_base_of<OpBase, T>::value,
"Kompute Sequence record(...) template only valid with "
"OpBase derived classes");
KP_LOG_DEBUG("Kompute Sequence record function started");
if (!this->isRecording()) {
KP_LOG_ERROR(
"Kompute sequence record attempted when not record BEGIN");
return false;
}
KP_LOG_DEBUG("Kompute Sequence creating OpBase derived class instance");
T* op = new T(this->mPhysicalDevice,
this->mDevice,
this->mCommandBuffer,
tensors,
std::forward<TArgs>(params)...);
OpBase* baseOp = dynamic_cast<OpBase*>(op);
std::unique_ptr<OpBase> baseOpPtr{ baseOp };
KP_LOG_DEBUG(
"Kompute Sequence running init on OpBase derived class instance");
baseOpPtr->init();
KP_LOG_DEBUG(
"Kompute Sequence running record on OpBase derived class instance");
baseOpPtr->record();
mOperations.push_back(std::move(baseOpPtr));
return true;
std::shared_ptr<T> op{ new T(tensors, std::forward<TArgs>(params)...) };
return this->record(op);
}
/**
* Record function for operation to be added to the GPU queue in batch. This
* template requires classes to be derived from the OpBase class. This
* function also requires the Sequence to be recording, otherwise it will
* not be able to add the operation.
*
* @param algorithm Algorithm to use for the record often used for OpAlgo
* operations
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
std::shared_ptr<Sequence> record(std::shared_ptr<Algorithm> algorithm,
TArgs&&... params)
{
std::shared_ptr<T> op{ new T(algorithm,
std::forward<TArgs>(params)...) };
return this->record(op);
}
/**
* Eval sends all the recorded and stored operations in the vector of
* operations into the gpu as a submit job synchronously (with a barrier).
*
* @return shared_ptr<Sequence> of the Sequence class itself
*/
std::shared_ptr<Sequence> eval();
/**
* Resets all the recorded and stored operations, records the operation
* provided and submits into the gpu as a submit job synchronously (with a
* barrier).
*
* @return shared_ptr<Sequence> of the Sequence class itself
*/
std::shared_ptr<Sequence> eval(std::shared_ptr<OpBase> op);
/**
* Resets all the recorded and stored operations, records an operation of
* type T built from the provided tensors and submits it into the gpu as a
* submit job synchronously (with a barrier).
*
* @param tensors Vector of tensors to use for the operation
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
std::shared_ptr<Sequence> eval(std::vector<std::shared_ptr<Tensor>> tensors,
TArgs&&... params)
{
std::shared_ptr<T> op{ new T(tensors, std::forward<TArgs>(params)...) };
return this->eval(op);
}
/**
* Resets all the recorded and stored operations, records an operation of
* type T built from the provided algorithm and submits it into the gpu as a
* submit job synchronously (with a barrier).
*
* @param algorithm Algorithm to use for the record often used for OpAlgo
* operations
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
std::shared_ptr<Sequence> eval(std::shared_ptr<Algorithm> algorithm,
TArgs&&... params)
{
std::shared_ptr<T> op{ new T(algorithm,
std::forward<TArgs>(params)...) };
return this->eval(op);
}
/**
* Eval Async sends all the recorded and stored operations in the vector of
* operations into the gpu as a submit job without a barrier. EvalAwait()
* must ALWAYS be called after to ensure the sequence is terminated
* correctly.
*
* @return shared_ptr<Sequence> of the Sequence class itself
*/
std::shared_ptr<Sequence> evalAsync();
/**
* Clears the current operations, records the provided one and submits it
* into the gpu as a submit job without a barrier. EvalAwait() must ALWAYS
* be called after to ensure the sequence is terminated correctly.
*
* @return shared_ptr<Sequence> of the Sequence class itself
*/
std::shared_ptr<Sequence> evalAsync(std::shared_ptr<OpBase> op);
/**
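* Clears the current operations, records an operation of type T built from
* the provided tensors and submits it into the gpu as a submit job without
* a barrier. EvalAwait() must ALWAYS be called after to ensure the sequence
* is terminated correctly.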
*
* @param tensors Vector of tensors to use for the operation
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
std::shared_ptr<Sequence> evalAsync(
std::vector<std::shared_ptr<Tensor>> tensors,
TArgs&&... params)
{
std::shared_ptr<T> op{ new T(tensors, std::forward<TArgs>(params)...) };
return this->evalAsync(op);
}
/**
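* Clears the current operations, records an operation of type T built from
* the provided algorithm and submits it into the gpu as a submit job without
* a barrier. EvalAwait() must ALWAYS be called after to ensure the sequence
* is terminated correctly.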
*
* @param algorithm Algorithm to use for the record often used for OpAlgo
* operations
* @param TArgs Template parameters that are used to initialise operation
* which allows for extensible configurations on initialisation.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
template<typename T, typename... TArgs>
std::shared_ptr<Sequence> evalAsync(std::shared_ptr<Algorithm> algorithm,
TArgs&&... params)
{
std::shared_ptr<T> op{ new T(algorithm,
std::forward<TArgs>(params)...) };
return this->evalAsync(op);
}
/**
* Eval Await waits for the fence to finish processing and then once it
* finishes, it runs the postEval of all operations.
*
* @param waitFor Number of milliseconds to wait before timing out.
* @return shared_ptr<Sequence> of the Sequence class itself
*/
std::shared_ptr<Sequence> evalAwait(uint64_t waitFor = UINT64_MAX);
/**
* Clear function clears all operations currently recorded and starts
* recording again.
*/
void clear();
/**
* Return the timestamps that were latched at the beginning and
* after each operation during the last eval() call.
*/
std::vector<std::uint64_t> getTimestamps();
/**
* Begins recording commands to be submitted into the command
* buffer.
*/
void begin();
/**
* Ends the recording and stops recording commands when the record command
* is sent.
*/
void end();
/**
* Returns true if the sequence is currently in recording activated.
*
* @return Boolean stating if recording ongoing.
*/
bool isRecording();
/**
* Returns true if the sequence has been initialised, and it's based on the
* GPU resources being referenced.
*
* @return Boolean stating if is initialized
*/
bool isInit();
/**
* Clears command buffer and triggers re-record of all the current
* operations saved, which is useful if the underlying kp::Tensors or
* kp::Algorithms are modified and need to be re-recorded.
*/
void rerecord();
/**
* Returns true if the sequence is currently running - mostly used for async
* workloads.
*
* @return Boolean stating if currently running.
*/
bool isRunning();
/**
* Destroys and frees the GPU resources which include the buffer and memory
* and sets the sequence as init=False.
*/
void destroy();
private:
// -------------- NEVER OWNED RESOURCES
@ -176,16 +280,17 @@ class Sequence
// -------------- ALWAYS OWNED RESOURCES
vk::Fence mFence;
std::vector<std::unique_ptr<OpBase>> mOperations;
std::vector<std::shared_ptr<OpBase>> mOperations;
std::shared_ptr<vk::QueryPool> timestampQueryPool = nullptr;
// State
bool mIsInit = false;
bool mRecording = false;
bool mIsRunning = false;
// Create functions
void createCommandPool();
void createCommandBuffer();
void createTimestampQueryPool(uint32_t totalTimestamps);
};
} // End namespace kp
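
The `getTimestamps()` comment above says timestamps are latched at the beginning and after each operation, so a sequence with N operations would presumably yield N + 1 values. Below is a minimal sketch, assuming that layout, of turning the raw values into per-operation durations. `operationDurations` and its `ticksToNs` parameter are hypothetical helpers and not part of Kompute; in Vulkan the tick-to-nanosecond factor would normally come from the device's `timestampPeriod` limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Kompute): converts N + 1 raw timestamps
// into N per-operation durations in nanoseconds. ticksToNs is an assumed
// conversion factor supplied by the caller.
std::vector<double> operationDurations(
  const std::vector<std::uint64_t>& timestamps,
  double ticksToNs)
{
    std::vector<double> durations;
    if (timestamps.size() < 2) {
        return durations; // Nothing to measure without two samples
    }
    durations.reserve(timestamps.size() - 1);
    for (std::size_t i = 1; i < timestamps.size(); ++i) {
        // Duration of operation i-1 is the gap between consecutive samples
        const std::uint64_t ticks = timestamps[i] - timestamps[i - 1];
        durations.push_back(static_cast<double>(ticks) * ticksToNs);
    }
    return durations;
}
```

The helper takes the conversion factor as a parameter instead of querying the device, so it can be exercised without a GPU.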

@ -4,173 +4,67 @@
#include <iostream>
#include <vector>
#include <SPIRV/GlslangToSpv.h>
#include <glslang/Include/ResourceLimits.h>
#include <glslang/Public/ShaderLang.h>
#include <SPIRV/GlslangToSpv.h>
#include "kompute/Core.hpp"
namespace kp {
// The default resource limits for the GLSL compiler; can be overridden.
// Adopted from:
// https://github.com/KhronosGroup/glslang/blob/master/StandAlone/ResourceLimits.cpp
const TBuiltInResource defaultResource = {
/* .MaxLights = */ 0,
/* .MaxClipPlanes = */ 0,
/* .MaxTextureUnits = */ 0,
/* .MaxTextureCoords = */ 0,
/* .MaxVertexAttribs = */ 64,
/* .MaxVertexUniformComponents = */ 4096,
/* .MaxVaryingFloats = */ 64,
/* .MaxVertexTextureImageUnits = */ 0,
/* .MaxCombinedTextureImageUnits = */ 0,
/* .MaxTextureImageUnits = */ 0,
/* .MaxFragmentUniformComponents = */ 0,
/* .MaxDrawBuffers = */ 0,
/* .MaxVertexUniformVectors = */ 128,
/* .MaxVaryingVectors = */ 8,
/* .MaxFragmentUniformVectors = */ 0,
/* .MaxVertexOutputVectors = */ 16,
/* .MaxFragmentInputVectors = */ 0,
/* .MinProgramTexelOffset = */ -8,
/* .MaxProgramTexelOffset = */ 7,
/* .MaxClipDistances = */ 8,
/* .MaxComputeWorkGroupCountX = */ 65535,
/* .MaxComputeWorkGroupCountY = */ 65535,
/* .MaxComputeWorkGroupCountZ = */ 65535,
/* .MaxComputeWorkGroupSizeX = */ 1024,
/* .MaxComputeWorkGroupSizeY = */ 1024,
/* .MaxComputeWorkGroupSizeZ = */ 64,
/* .MaxComputeUniformComponents = */ 1024,
/* .MaxComputeTextureImageUnits = */ 16,
/* .MaxComputeImageUniforms = */ 8,
/* .MaxComputeAtomicCounters = */ 8,
/* .MaxComputeAtomicCounterBuffers = */ 1,
/* .MaxVaryingComponents = */ 60,
/* .MaxVertexOutputComponents = */ 64,
/* .MaxGeometryInputComponents = */ 64,
/* .MaxGeometryOutputComponents = */ 128,
/* .MaxFragmentInputComponents = */ 0,
/* .MaxImageUnits = */ 0,
/* .MaxCombinedImageUnitsAndFragmentOutputs = */ 0,
/* .MaxCombinedShaderOutputResources = */ 8,
/* .MaxImageSamples = */ 0,
/* .MaxVertexImageUniforms = */ 0,
/* .MaxTessControlImageUniforms = */ 0,
/* .MaxTessEvaluationImageUniforms = */ 0,
/* .MaxGeometryImageUniforms = */ 0,
/* .MaxFragmentImageUniforms = */ 0,
/* .MaxCombinedImageUniforms = */ 0,
/* .MaxGeometryTextureImageUnits = */ 0,
/* .MaxGeometryOutputVertices = */ 256,
/* .MaxGeometryTotalOutputComponents = */ 1024,
/* .MaxGeometryUniformComponents = */ 1024,
/* .MaxGeometryVaryingComponents = */ 64,
/* .MaxTessControlInputComponents = */ 128,
/* .MaxTessControlOutputComponents = */ 128,
/* .MaxTessControlTextureImageUnits = */ 0,
/* .MaxTessControlUniformComponents = */ 1024,
/* .MaxTessControlTotalOutputComponents = */ 4096,
/* .MaxTessEvaluationInputComponents = */ 128,
/* .MaxTessEvaluationOutputComponents = */ 128,
/* .MaxTessEvaluationTextureImageUnits = */ 16,
/* .MaxTessEvaluationUniformComponents = */ 1024,
/* .MaxTessPatchComponents = */ 120,
/* .MaxPatchVertices = */ 32,
/* .MaxTessGenLevel = */ 64,
/* .MaxViewports = */ 16,
/* .MaxVertexAtomicCounters = */ 0,
/* .MaxTessControlAtomicCounters = */ 0,
/* .MaxTessEvaluationAtomicCounters = */ 0,
/* .MaxGeometryAtomicCounters = */ 0,
/* .MaxFragmentAtomicCounters = */ 0,
/* .MaxCombinedAtomicCounters = */ 8,
/* .MaxAtomicCounterBindings = */ 1,
/* .MaxVertexAtomicCounterBuffers = */ 0,
/* .MaxTessControlAtomicCounterBuffers = */ 0,
/* .MaxTessEvaluationAtomicCounterBuffers = */ 0,
/* .MaxGeometryAtomicCounterBuffers = */ 0,
/* .MaxFragmentAtomicCounterBuffers = */ 0,
/* .MaxCombinedAtomicCounterBuffers = */ 1,
/* .MaxAtomicCounterBufferSize = */ 16384,
/* .MaxTransformFeedbackBuffers = */ 4,
/* .MaxTransformFeedbackInterleavedComponents = */ 64,
/* .MaxCullDistances = */ 8,
/* .MaxCombinedClipAndCullDistances = */ 8,
/* .MaxSamples = */ 4,
/* .maxMeshOutputVerticesNV = */ 256,
/* .maxMeshOutputPrimitivesNV = */ 512,
/* .maxMeshWorkGroupSizeX_NV = */ 32,
/* .maxMeshWorkGroupSizeY_NV = */ 1,
/* .maxMeshWorkGroupSizeZ_NV = */ 1,
/* .maxTaskWorkGroupSizeX_NV = */ 32,
/* .maxTaskWorkGroupSizeY_NV = */ 1,
/* .maxTaskWorkGroupSizeZ_NV = */ 1,
/* .maxMeshViewCountNV = */ 4,
/* .maxDualSourceDrawBuffersEXT = */ 1,
/* .limits = */ {
/* .nonInductiveForLoops = */ 1,
/* .whileLoops = */ 1,
/* .doWhileLoops = */ 1,
/* .generalUniformIndexing = */ 1,
/* .generalAttributeMatrixVectorIndexing = */ 1,
/* .generalVaryingIndexing = */ 1,
/* .generalSamplerIndexing = */ 1,
/* .generalVariableIndexing = */ 1,
/* .generalConstantMatrixVectorIndexing = */ 1,
}};
/**
Shader utility class with functions to compile and process GLSL files.
*/
class Shader {
public:
class Shader
{
public:
// The default resource limits for the GLSL compiler; can be overridden.
// Adopted from:
// https://github.com/KhronosGroup/glslang/blob/master/StandAlone/ResourceLimits.cpp
const static TBuiltInResource defaultResource;
/**
* Compile multiple sources with optional filenames. Currently this function
* uses the glslang C++ interface, which is not thread safe, so this function
* should not be called from multiple threads concurrently. If you have an
* online shader processing multithreading use-case that can't use offline
* compilation please open an issue.
*
* @param sources A list of raw glsl shaders in string format
* @param files A list of file names respective to each of the sources
* @param entryPoint The function name to use as entry point
* @param definitions List of pairs containing key value definitions
* @param resources The resource limits to use for the GLSL compiler
* @param resources The resource limits to use for the
* GLSL compiler
* @return The compiled SPIR-V binary in unsigned int32 format
*/
static std::vector<uint32_t> compile_sources(
const std::vector<std::string>& sources,
const std::vector<std::string>& files = {},
const std::string& entryPoint = "main",
std::vector<std::pair<std::string,std::string>> definitions = {},
const TBuiltInResource& resources = defaultResource);
static std::vector<uint32_t> compileSources(
const std::vector<std::string>& sources,
const std::vector<std::string>& files = {},
const std::string& entryPoint = "main",
std::vector<std::pair<std::string, std::string>> definitions = {},
const TBuiltInResource& resources = Shader::defaultResource);
/**
* Compile a single glslang source from string value. Currently this function
* uses the glslang C++ interface, which is not thread safe, so this function
* should not be called from multiple threads concurrently. If you have an
* online shader processing multithreading use-case that can't use offline
* compilation please open an issue.
* Compile a single glslang source from string value. Currently this
* function uses the glslang C++ interface, which is not thread safe, so this
* function should not be called from multiple threads concurrently. If you
* have an online shader processing multithreading use-case that can't use
* offline compilation please open an issue.
*
* @param source An individual raw glsl shader in string format
* @param entryPoint The function name to use as entry point
* @param definitions List of pairs containing key value definitions
* @param resources The resource limits to use for the GLSL compiler
* @param resources The resource limits to use for the
* GLSL compiler
* @return The compiled SPIR-V binary in unsigned int32 format
*/
static std::vector<uint32_t> compile_source(
const std::string& source,
const std::string& entryPoint = "main",
std::vector<std::pair<std::string,std::string>> definitions = {},
const TBuiltInResource& resources = defaultResource);
static std::vector<uint32_t> compileSource(
const std::string& source,
const std::string& entryPoint = "main",
std::vector<std::pair<std::string, std::string>> definitions = {},
const TBuiltInResource& resources = Shader::defaultResource);
};
}
#endif // DKOMPUTE_DISABLE_SHADER_UTILS
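
The comments on `compileSources` and `compileSource` warn that the glslang C++ interface is not thread safe. One way a caller could cope with that is to funnel every compile call through a single lock. The sketch below shows this under stated assumptions: `compileMutex` and `compileSerialized` are hypothetical names, nothing here comes from Kompute, and the compile function is passed in as a callable so the example does not depend on glslang.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical process-wide mutex shared by every serialised compile call.
inline std::mutex& compileMutex()
{
    static std::mutex mutex;
    return mutex;
}

// Hypothetical wrapper: holds the lock for the duration of one compile call,
// so a compile function that is not thread safe never runs concurrently.
template<typename CompileFn>
std::vector<std::uint32_t> compileSerialized(CompileFn&& compileFn,
                                             const std::string& source)
{
    std::lock_guard<std::mutex> lock(compileMutex());
    return std::forward<CompileFn>(compileFn)(source);
}
```

Serialising the calls trades compile throughput for safety; offline compilation, as the comments suggest, avoids the problem entirely.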

@ -2,8 +2,6 @@
#include "kompute/Core.hpp"
#define KP_MAX_DIM_SIZE 1
namespace kp {
/**
@ -29,94 +27,68 @@ class Tensor
eHost = 1, ///< Type is host memory, source and destination
eStorage = 2, ///< Type is Device memory (only)
};
enum class TensorDataTypes
{
eBool = 0,
eInt = 1,
eUnsignedInt = 2,
eFloat = 3,
eDouble = 4,
};
/**
* Base constructor, should not be used unless explicitly intended.
*/
Tensor();
/**
* Default constructor with data provided which would be used to create the
* Constructor with data provided which would be used to create the
* respective vulkan buffer and memory.
*
* @param physicalDevice The physical device to use to fetch properties
* @param device The device to use to create the buffer and memory from
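* @param data Pointer to the non-empty data that will be used by the
* tensor
* @param elementTotalCount Total number of elements pointed to by data
* @param elementMemorySize Size in bytes of a single element
* @param dataType Data type of the elements, of type TensorDataTypes
* @param tensorType Type for the tensor which is of type TensorTypes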
*/
Tensor(const std::vector<float>& data,
TensorTypes tensorType = TensorTypes::eDevice);
Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
void* data,
uint32_t elementTotalCount,
uint32_t elementMemorySize,
const TensorDataTypes& dataType,
const TensorTypes& tensorType = TensorTypes::eDevice);
/**
* Destructor which is in charge of freeing vulkan resources unless they
* have been provided externally.
*/
~Tensor();
virtual ~Tensor();
/**
* Initialiser which calls the initialisation for all the respective tensors
* as well as creates the respective staging tensors. The staging tensors
* would only be created for the tensors of type TensorType::eDevice as
* otherwise there is no need to copy from host memory.
* Function to trigger reinitialisation of the tensor buffer and memory with
* new data.
*
* @param data Pointer to the data to use to reinitialise the tensor from
* @param elementTotalCount Total number of elements pointed to by data
* @param elementMemorySize Size in bytes of a single element
*/
void init(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device);
void rebuild(void* data,
uint32_t elementTotalCount,
uint32_t elementMemorySize);
/**
* Destroys and frees the GPU resources which include the buffer and memory.
*/
void freeMemoryDestroyGPUResources();
void destroy();
/**
* Returns the vector of data currently contained by the Tensor. It is
* important to ensure that there is no out-of-sync data with the GPU
* memory.
* Check whether tensor is initialized based on the created gpu resources.
*
* @return Reference to vector of elements representing the data in the
* tensor.
* @returns Boolean stating whether tensor is initialized
*/
std::vector<float>& data();
/**
* Overrides the subscript operator to expose the underlying data's
* subscript operator which in this case would be its underlying
* vector's.
*
* @param i The index where the element will be returned from.
* @return Returns the element in the position requested.
*/
float& operator[](int index);
/**
* Returns the size/magnitude of the Tensor, which will be the total number
* of elements across all dimensions
*
* @return Unsigned integer representing the total number of elements
*/
uint32_t size();
/**
* Returns the shape of the tensor, which includes the number of dimensions
* and the size per dimension.
*
* @return Array containing the sizes for each dimension. Zero means
* respective dimension is not active.
*/
std::array<uint32_t, KP_MAX_DIM_SIZE> shape();
bool isInit();
/**
* Retrieve the tensor type of the Tensor
*
* @return Tensor type of tensor
*/
TensorTypes tensorType();
/**
* Returns true if the tensor initialisation function has been carried out
* successful, which would mean that the buffer and memory will have been
* provisioned.
*/
bool isInit();
/**
* Sets / resets the vector data of the tensor. This function does not
* perform any copies into GPU memory and is only performed on the host.
*/
void setData(const std::vector<float>& data);
/**
* Records a copy from the memory of the tensor provided to the current
@ -125,12 +97,9 @@ class Tensor
*
* @param commandBuffer Vulkan Command Buffer to record the commands into
* @param copyFromTensor Tensor to copy the data from
* @param createBarrier Whether to create a barrier that ensures the data is
* copied before further operations. Default is true.
*/
void recordCopyFrom(std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::shared_ptr<Tensor> copyFromTensor,
bool createBarrier);
void recordCopyFrom(const vk::CommandBuffer& commandBuffer,
std::shared_ptr<Tensor> copyFromTensor);
/**
* Records a copy from the internal staging memory to the device memory
@ -138,12 +107,8 @@ class Tensor
* only be relevant for kp::Tensors of type eDevice.
*
* @param commandBuffer Vulkan Command Buffer to record the commands into
* @param createBarrier Whether to create a barrier that ensures the data is
* copied before further operations. Default is true.
*/
void recordCopyFromStagingToDevice(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
bool createBarrier);
void recordCopyFromStagingToDevice(const vk::CommandBuffer& commandBuffer);
/**
* Records a copy from the internal device memory to the staging memory
@ -151,16 +116,13 @@ class Tensor
* only be relevant for kp::Tensors of type eDevice.
*
* @param commandBuffer Vulkan Command Buffer to record the commands into
* @param createBarrier Whether to create a barrier that ensures the data is
* copied before further operations. Default is true.
*/
void recordCopyFromDeviceToStaging(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
bool createBarrier);
void recordCopyFromDeviceToStaging(const vk::CommandBuffer& commandBuffer);
/**
* Records the buffer memory barrier into the command buffer which
* ensures that relevant data transfers are carried out correctly.
* Records the buffer memory barrier for the primary buffer into the command
* buffer, which ensures that relevant data transfers are carried out
* correctly.
*
* @param commandBuffer Vulkan Command Buffer to record the commands into
* @param srcAccessMask Access flags for source access mask
@ -168,8 +130,25 @@ class Tensor
* @param srcStageMask Pipeline stage flags for source stage mask
* @param dstStageMask Pipeline stage flags for destination stage mask
*/
void recordBufferMemoryBarrier(
std::shared_ptr<vk::CommandBuffer> commandBuffer,
void recordPrimaryBufferMemoryBarrier(
const vk::CommandBuffer& commandBuffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask);
/**
* Records the buffer memory barrier for the staging buffer into the command
* buffer, which ensures that relevant data transfers are carried out
* correctly.
*
* @param commandBuffer Vulkan Command Buffer to record the commands into
* @param srcAccessMask Access flags for source access mask
* @param dstAccessMask Access flags for destination access mask
* @param srcStageMask Pipeline stage flags for source stage mask
* @param dstStageMask Pipeline stage flags for destination stage mask
*/
void recordStagingBufferMemoryBarrier(
const vk::CommandBuffer& commandBuffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
@ -183,16 +162,88 @@ class Tensor
* @return Descriptor buffer info with own buffer
*/
vk::DescriptorBufferInfo constructDescriptorBufferInfo();
/**
* Maps data from the Host Visible GPU memory into the data vector. It
* requires the Tensor to be of staging type for it to work.
* Returns the size/magnitude of the Tensor, which will be the total number
* of elements across all dimensions
*
* @return Unsigned integer representing the total number of elements
*/
void mapDataFromHostMemory();
uint32_t size();
/**
* Maps data from the data vector into the Host Visible GPU memory. It
* requires the tensor to be of staging type for it to work.
* Returns the total size of a single element of the respective data type
* that this tensor holds.
*
* @return Unsigned integer representing the memory of a single element of
* the respective data type.
*/
void mapDataIntoHostMemory();
uint32_t dataTypeMemorySize();
/**
* Returns the total memory size of the data contained by the Tensor object
* which would equate to (this->size() * this->dataTypeMemorySize())
*
* @return Unsigned integer representing the total memory size of the data
* contained by the Tensor.
*/
uint32_t memorySize();
/**
* Retrieve the data type of the tensor (bool, int, unsigned int, float,
* double)
*
* @return Data type of tensor of type kp::Tensor::TensorDataTypes
*/
TensorDataTypes dataType();
/**
* Retrieve the raw data via the pointer to the memory that contains the raw
* memory of this current tensor. This pointer gets changed to a nullptr when
* the Tensor is removed.
*
* @return Pointer to raw memory containing raw bytes data of Tensor.
*/
void* rawData();
/**
* Sets / resets the data of the tensor, which is done directly on the GPU
* host visible memory available to the tensor.
*/
void setRawData(const void* data);
/**
* Template to return the data pointer cast to a specific type, which
* would be any of the supported types including float, double, int32,
* uint32 and bool.
*
* @return Pointer to the raw memory of the Tensor, cast to the type provided.
*/
template<typename T>
T* data()
{
return (T*)this->mRawData;
}
/**
* Template to get the data of the current tensor as a vector of specific
* type, which would be any of the supported types including float, double,
* int32, uint32 and bool.
*
* @return Vector of type provided by template.
*/
template<typename T>
std::vector<T> vector()
{
return { (T*)this->mRawData, ((T*)this->mRawData) + this->size() };
}
protected:
// -------------- ALWAYS OWNED RESOURCES
TensorTypes mTensorType;
TensorDataTypes mDataType;
uint32_t mSize;
uint32_t mDataTypeMemorySize;
void* mRawData;
private:
// -------------- NEVER OWNED RESOURCES
@ -209,33 +260,81 @@ class Tensor
std::shared_ptr<vk::DeviceMemory> mStagingMemory;
bool mFreeStagingMemory = false;
// -------------- ALWAYS OWNED RESOURCES
std::vector<float> mData;
TensorTypes mTensorType = TensorTypes::eDevice;
std::array<uint32_t, KP_MAX_DIM_SIZE> mShape;
bool mIsInit = false;
void allocateMemoryCreateGPUResources(); // Creates the vulkan buffer
void createBuffer(std::shared_ptr<vk::Buffer> buffer,
vk::BufferUsageFlags bufferUsageFlags);
void allocateBindMemory(std::shared_ptr<vk::Buffer> buffer,
std::shared_ptr<vk::DeviceMemory> memory,
vk::MemoryPropertyFlags memoryPropertyFlags);
void copyBuffer(std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::shared_ptr<vk::Buffer> bufferFrom,
std::shared_ptr<vk::Buffer> bufferTo,
vk::DeviceSize bufferSize,
vk::BufferCopy copyRegion,
bool createBarrier);
void recordCopyBuffer(const vk::CommandBuffer& commandBuffer,
std::shared_ptr<vk::Buffer> bufferFrom,
std::shared_ptr<vk::Buffer> bufferTo,
vk::DeviceSize bufferSize,
vk::BufferCopy copyRegion);
void recordBufferMemoryBarrier(const vk::CommandBuffer& commandBuffer,
const vk::Buffer& buffer,
vk::AccessFlagBits srcAccessMask,
vk::AccessFlagBits dstAccessMask,
vk::PipelineStageFlagBits srcStageMask,
vk::PipelineStageFlagBits dstStageMask);
// Private util functions
vk::BufferUsageFlags getPrimaryBufferUsageFlags();
vk::MemoryPropertyFlags getPrimaryMemoryPropertyFlags();
vk::BufferUsageFlags getStagingBufferUsageFlags();
vk::MemoryPropertyFlags getStagingMemoryPropertyFlags();
uint64_t memorySize();
void mapRawData();
void unmapRawData();
};
template<typename T>
class TensorT : public Tensor
{
public:
TensorT(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
const std::vector<T>& data,
const TensorTypes& tensorType = TensorTypes::eDevice)
: Tensor(physicalDevice,
device,
(void*)data.data(),
data.size(),
sizeof(T),
this->dataType(),
tensorType)
{
KP_LOG_DEBUG("Kompute TensorT constructor with data size {}",
data.size());
}
~TensorT() { KP_LOG_DEBUG("Kompute TensorT destructor"); }
T* data() { return (T*)this->mRawData; }
std::vector<T> vector()
{
return { (T*)this->mRawData, ((T*)this->mRawData) + this->size() };
}
T& operator[](int index) { return *(((T*)this->mRawData) + index); }
void setData(const std::vector<T>& data)
{
KP_LOG_DEBUG("Kompute TensorT setting data with data size {}",
data.size());
if (data.size() != this->mSize) {
throw std::runtime_error(
"Kompute TensorT Cannot set data of different sizes");
}
Tensor::setRawData(data.data());
}
TensorDataTypes dataType();
};
} // End namespace kp
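
The new `data<T>()` and `vector<T>()` accessors differ in one important way: the first returns a typed pointer into the tensor's raw memory, while the second builds a new `std::vector<T>` from it. The sketch below illustrates that difference. `RawView` is a hypothetical stand-in that only mimics the two accessors over caller-provided memory; it is not `kp::Tensor` and involves no Vulkan resources.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the accessor part of kp::Tensor: it only holds
// an untyped pointer to memory owned elsewhere plus an element count.
class RawView
{
  public:
    RawView(void* rawData, std::uint32_t size)
      : mRawData(rawData)
      , mSize(size)
    {}

    std::uint32_t size() const { return mSize; }

    // Typed pointer into the same memory: writes are visible to the owner.
    template<typename T>
    T* data()
    {
        return static_cast<T*>(mRawData);
    }

    // Independent copy: later writes to the raw memory do not affect it.
    template<typename T>
    std::vector<T> vector()
    {
        T* begin = this->data<T>();
        return std::vector<T>(begin, begin + mSize);
    }

  private:
    void* mRawData;
    std::uint32_t mSize;
};
```

In this sketch, a caller who wants a snapshot would use `vector<T>()`, and one who wants to read or modify the underlying memory in place would use `data<T>()`.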

@ -1,144 +0,0 @@
#pragma once
#include <fstream>
#include "kompute/Core.hpp"
#include "kompute/shaders/shaderopmult.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpBase.hpp"
namespace kp {
/**
* Operation that provides a general abstraction that simplifies the use of
* algorithm and parameter components which can be used with shaders.
* By default it enables the user to provide a dynamic number of tensors
* which are then passed as inputs.
*/
class OpAlgoBase : public OpBase
{
public:
/**
* Base constructor, should not be used unless explicitly intended.
*/
OpAlgoBase();
/**
* Default constructor with parameters that provides the bare minimum
* requirements for the operations to be able to create and manage their
* sub-components.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
* @param shaderFilePath Optional parameter to specify the shader to load (either in spirv or raw format)
* @param komputeWorkgroup Optional parameter to specify the layout for processing
*/
OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
const Workgroup& komputeWorkgroup = {},
const Constants& specializationConstants = {});
/**
* Constructor that enables a file to be passed to the operation with
* the contents of the shader. This can be either in raw format or in
* compiled SPIR-V binary format.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
* @param shaderFilePath Parameter to specify the shader to load (either in spirv or raw format)
* @param komputeWorkgroup Optional parameter to specify the layout for processing
*/
OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
std::string shaderFilePath,
const Workgroup& komputeWorkgroup = {},
const Constants& specializationConstants = {});
/**
* Constructor that enables raw shader data to be passed to the main operation
* which can be either in raw shader glsl code or in compiled SPIR-V binary.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
* @param shaderDataRaw Optional parameter to specify the shader data either in binary or raw form
* @param komputeWorkgroup Optional parameter to specify the layout for processing
*/
OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors,
const std::vector<uint32_t>& shaderDataRaw,
const Workgroup& komputeWorkgroup = {},
const Constants& specializationConstants = {});
/**
* Default destructor, which is in charge of destroying the algorithm
* components but does not destroy the underlying tensors
*/
virtual ~OpAlgoBase() override;
/**
* The init function is responsible for the initialisation of the algorithm
* component based on the parameters specified, and allows for extensibility
* on the options provided. Further dependent classes can perform more
* specific checks such as ensuring tensors provided are initialised, etc.
*/
virtual void init() override;
/**
* This records the commands that are to be sent to the GPU. This includes
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
/**
* Does not perform any preEval commands.
*/
virtual void preEval() override;
/**
* Executes after the recorded commands are submitted, and performs a copy
* of the GPU Device memory into the staging buffer so the output data can
* be retrieved.
*/
virtual void postEval() override;
protected:
// -------------- NEVER OWNED RESOURCES
// -------------- OPTIONALLY OWNED RESOURCES
std::shared_ptr<Algorithm> mAlgorithm;
bool mFreeAlgorithm = false;
// -------------- ALWAYS OWNED RESOURCES
Workgroup mKomputeWorkgroup;
std::string mShaderFilePath; ///< Optional member variable which can be provided for the OpAlgoBase to find the data automatically and load for processing
std::vector<uint32_t> mShaderDataRaw; ///< Optional member variable which can be provided to contain either the raw shader content or the spirv binary content
virtual std::vector<uint32_t> fetchSpirvBinaryData();
};
} // End namespace kp

@ -0,0 +1,69 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpBase.hpp"
namespace kp {
/**
* Operation that provides a general abstraction that simplifies the use of
* algorithm and parameter components which can be used with shaders.
* By default it enables the user to provide a dynamic number of tensors
* which are then passed as inputs.
*/
class OpAlgoDispatch : public OpBase
{
public:
/**
* Constructor that stores the algorithm to use as well as the relevant
* push constants to override when recording.
*
* @param algorithm The algorithm object to use for dispatch
* @param pushConstants The push constants to use for override
*/
OpAlgoDispatch(const std::shared_ptr<kp::Algorithm>& algorithm,
const kp::Constants& pushConstants = {});
/**
* Default destructor, which is in charge of destroying the algorithm
* components but does not destroy the underlying tensors
*/
virtual ~OpAlgoDispatch() override;
/**
* This records the commands that are to be sent to the GPU. This includes
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging buffer so it can be read by the
* host.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void record(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any preEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any postEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval(const vk::CommandBuffer& commandBuffer) override;
private:
// -------------- ALWAYS OWNED RESOURCES
std::shared_ptr<Algorithm> mAlgorithm;
Constants mPushConstants;
};
} // End namespace kp
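
After this change, operations no longer hold a command buffer; one is passed into `record`, `preEval` and `postEval` instead. The sketch below illustrates how a caller might drive such operations. It assumes that all operations are recorded first, that `preEval` runs before submission, and that `postEval` runs after it. `FakeCommandBuffer`, `FakeOpBase`, `LoggingOp` and `recordAndEval` are hypothetical stand-ins so the example compiles without Vulkan; they are not Kompute types.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for vk::CommandBuffer.
struct FakeCommandBuffer
{};

// Mirrors the shape of the OpBase interface shown in the diff.
class FakeOpBase
{
  public:
    virtual ~FakeOpBase() = default;
    virtual void record(const FakeCommandBuffer& commandBuffer) = 0;
    virtual void preEval(const FakeCommandBuffer& commandBuffer) = 0;
    virtual void postEval(const FakeCommandBuffer& commandBuffer) = 0;
};

// Hypothetical operation that only logs which hook was called.
class LoggingOp : public FakeOpBase
{
  public:
    LoggingOp(std::string name, std::vector<std::string>& log)
      : mName(std::move(name))
      , mLog(log)
    {}
    void record(const FakeCommandBuffer&) override { mLog.push_back(mName + ":record"); }
    void preEval(const FakeCommandBuffer&) override { mLog.push_back(mName + ":preEval"); }
    void postEval(const FakeCommandBuffer&) override { mLog.push_back(mName + ":postEval"); }

  private:
    std::string mName;
    std::vector<std::string>& mLog;
};

// Hypothetical driver showing the assumed order of the three hooks.
void recordAndEval(const std::vector<std::shared_ptr<FakeOpBase>>& ops,
                   const FakeCommandBuffer& commandBuffer,
                   std::vector<std::string>& log)
{
    for (const auto& op : ops) op->record(commandBuffer);
    for (const auto& op : ops) op->preEval(commandBuffer);
    log.push_back("submit"); // Placeholder for queue submission and fence wait
    for (const auto& op : ops) op->postEval(commandBuffer);
}
```

Passing the command buffer by const reference, as the new signatures do, means an operation does not need to own or share the buffer and can be recorded into whichever one the caller supplies.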

@ -1,84 +0,0 @@
#pragma once
#include <fstream>
#include "kompute/Core.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpAlgoBase.hpp"
namespace kp {
/**
* Operation base class to simplify the creation of operations that require
* right hand and left hand side datapoints together with a single output.
* The expected data passed is two input tensors and one output tensor.
*/
class OpAlgoLhsRhsOut : public OpAlgoBase
{
public:
/**
* Base constructor, should not be used unless explicitly intended.
*/
OpAlgoLhsRhsOut();
/**
* Default constructor with parameters that provides the bare minimum
* requirements for the operations to be able to create and manage their
* sub-components.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
* @param freeTensors Whether operation manages the memory of the Tensors
* @param komputeWorkgroup Optional parameter to specify the layout for processing
*/
OpAlgoLhsRhsOut(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors,
const Workgroup& komputeWorkgroup = {});
/**
* Default destructor, which is in charge of destroying the algorithm
* components but does not destroy the underlying tensors
*/
virtual ~OpAlgoLhsRhsOut() override;
/**
* The init function is responsible for ensuring that all of the tensors
* provided are aligned with requirements such as LHS, RHS and Output
* tensors, and creates the algorithm component which processes the
* computation.
*/
virtual void init() override;
/**
* This records the commands that are to be sent to the GPU. This includes
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
/**
* Executes after the recorded commands are submitted, and performs a copy
* of the GPU Device memory into the staging buffer so the output data can
* be retrieved.
*/
virtual void postEval() override;
protected:
// -------------- NEVER OWNED RESOURCES
std::shared_ptr<Tensor> mTensorLHS; ///< Reference to the parameter used in the left hand side equation of the shader
std::shared_ptr<Tensor> mTensorRHS; ///< Reference to the parameter used in the right hand side equation of the shader
std::shared_ptr<Tensor> mTensorOutput; ///< Reference to the parameter used in the output of the shader and will be copied with a staging vector
};
} // End namespace kp

@ -1,8 +1,8 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/Algorithm.hpp"
namespace kp {
@ -17,33 +17,6 @@ namespace kp {
class OpBase
{
public:
/**
* Base constructor, should not be used unless explicitly intended.
*/
OpBase() { KP_LOG_DEBUG("Compute OpBase base constructor"); }
/**
* Default constructor with parameters that provides the bare minimum
* requirements for the operations to be able to create and manage their
* sub-components.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
*/
OpBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>>& tensors)
{
KP_LOG_DEBUG("Compute OpBase constructor with params");
this->mPhysicalDevice = physicalDevice;
this->mDevice = device;
this->mCommandBuffer = commandBuffer;
this->mTensors = tensors;
}
/**
* Default destructor for OpBase class. This OpBase destructor class should
@ -53,37 +26,16 @@ class OpBase
virtual ~OpBase()
{
KP_LOG_DEBUG("Kompute OpBase destructor started");
if (!this->mDevice) {
KP_LOG_WARN("Kompute OpBase destructor called with empty device");
return;
}
if (this->mFreeTensors) {
KP_LOG_DEBUG("Kompute OpBase freeing tensors");
for (std::shared_ptr<Tensor> tensor : this->mTensors) {
if (tensor && tensor->isInit()) {
tensor->freeMemoryDestroyGPUResources();
} else {
KP_LOG_WARN("Kompute OpBase expected to free "
"tensor but has already been freed.");
}
}
}
}
/**
* The init function is responsible for setting up all the resources and
* should be called after the Operation has been created.
*/
virtual void init() = 0;
/**
* The record function is intended to only send a record command or run
* commands that are expected to record operations that are to be submitted
* as a batch into the GPU.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void record() = 0;
virtual void record(const vk::CommandBuffer& commandBuffer) = 0;
/**
* Pre eval is called before the Sequence has called eval and submitted the commands to
@ -92,8 +44,10 @@ class OpBase
* there are situations where eval can be called multiple times, so the
* resources that are created should be idempotent in case it's called multiple
* times in a row.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval() = 0;
virtual void preEval(const vk::CommandBuffer& commandBuffer) = 0;
/**
* Post eval is called after the Sequence has called eval and submitted the commands to
@ -102,23 +56,10 @@ class OpBase
* there are situations where eval can be called multiple times, so the
* resources that are destroyed should not require a re-init unless explicitly
* provided by the user.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval() = 0;
protected:
// -------------- NEVER OWNED RESOURCES
std::shared_ptr<vk::PhysicalDevice>
mPhysicalDevice; ///< Vulkan Physical Device
std::shared_ptr<vk::Device> mDevice; ///< Vulkan Logical Device
std::shared_ptr<vk::CommandBuffer>
mCommandBuffer; ///< Vulkan Command Buffer
// -------------- OPTIONALLY OWNED RESOURCES
std::vector<std::shared_ptr<Tensor>>
mTensors; ///< Tensors referenced by operation that can be managed
///< optionally by operation
bool mFreeTensors = false; ///< Explicit boolean that specifies whether the
///< tensors are freed (if they are managed)
virtual void postEval(const vk::CommandBuffer& commandBuffer) = 0;
};
} // End namespace kp
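
The reworked OpBase above drops init() and the stored device handles, and passes the command buffer into record, preEval and postEval instead. A minimal sketch of what an operation written against that shape could look like follows. It assumes stand-in types so that it compiles without Vulkan: FakeCommandBuffer, OpBaseSketch, OpNamed and runSketch are hypothetical names and are not part of Kompute. The call order in runSketch follows the doc comments above (preEval before submission, postEval after) and is not taken from the Sequence implementation.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for vk::CommandBuffer (hypothetical): it only logs what was recorded.
struct FakeCommandBuffer
{
    mutable std::vector<std::string> log;
};

// Mirrors the shape of the reworked kp::OpBase: no init(), no stored device
// handles, and the command buffer is handed to every hook.
class OpBaseSketch
{
  public:
    virtual ~OpBaseSketch() = default;
    virtual void record(const FakeCommandBuffer& commandBuffer) = 0;
    virtual void preEval(const FakeCommandBuffer& commandBuffer) = 0;
    virtual void postEval(const FakeCommandBuffer& commandBuffer) = 0;
};

// Hypothetical operation that only stores what it needs (here, a name).
class OpNamed : public OpBaseSketch
{
  public:
    explicit OpNamed(std::string name)
      : mName(std::move(name))
    {}

    void record(const FakeCommandBuffer& cb) override
    {
        cb.log.push_back("record:" + mName);
    }
    void preEval(const FakeCommandBuffer& cb) override
    {
        cb.log.push_back("preEval:" + mName);
    }
    void postEval(const FakeCommandBuffer& cb) override
    {
        cb.log.push_back("postEval:" + mName);
    }

  private:
    std::string mName;
};

// Hypothetical driver: record every op, run preEval hooks, "submit", then
// run postEval hooks. Returns the log so the order can be inspected.
inline std::vector<std::string>
runSketch(const std::vector<std::shared_ptr<OpBaseSketch>>& ops)
{
    FakeCommandBuffer cb;
    for (const auto& op : ops) {
        op->record(cb);
    }
    for (const auto& op : ops) {
        op->preEval(cb);
    }
    cb.log.push_back("submit");
    for (const auto& op : ops) {
        op->postEval(cb);
    }
    return cb.log;
}
```

Because the operation no longer owns a command buffer, the same operation object could in principle be recorded into whichever buffer the caller supplies, which appears to be the motivation for the signature change.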

View file

@ -0,0 +1,78 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpBase.hpp"
namespace kp {
/**
* Operation that provides a general abstraction that simplifies the use of
* algorithm and parameter components which can be used with shaders.
* It exposes the pipeline barrier functionality, specifically for memory
* barriers that can be configured through the respective source and destination
* masks.
*/
class OpMemoryBarrier : public OpBase
{
public:
/**
* Constructor that stores tensors as well as memory barrier parameters to be
* used to create a pipeline barrier on the respective primary or staging tensor.
*
* @param tensors The tensors to apply the memory barriers on
* @param srcAccessMask The vk::AccessFlagBits for the source access mask
* @param dstAccessMask The vk::AccessFlagBits for the destination access mask
* @param srcStageMask The vk::PipelineStageFlagBits for the source stage mask
* @param dstStageMask The vk::PipelineStageFlagBits for the destination stage mask
* @param barrierOnPrimary Boolean to select primary or secondary buffers on tensors
*/
OpMemoryBarrier(
const std::vector<std::shared_ptr<Tensor>>& tensors,
const vk::AccessFlagBits& srcAccessMask,
const vk::AccessFlagBits& dstAccessMask,
const vk::PipelineStageFlagBits& srcStageMask,
const vk::PipelineStageFlagBits& dstStageMask,
bool barrierOnPrimary = true);
/**
* Default destructor, which is in charge of destroying the reference to the tensors
* and all the relevant access / stage masks created
*/
virtual ~OpMemoryBarrier() override;
/**
* This records the memory barrier with the access and stage masks provided
* across all relevant tensors.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void record(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any preEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any postEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval(const vk::CommandBuffer& commandBuffer) override;
private:
const vk::AccessFlagBits mSrcAccessMask;
const vk::AccessFlagBits mDstAccessMask;
const vk::PipelineStageFlagBits mSrcStageMask;
const vk::PipelineStageFlagBits mDstStageMask;
const bool mBarrierOnPrimary;
const std::vector<std::shared_ptr<Tensor>> mTensors;
};
} // End namespace kp

View file

@ -4,14 +4,12 @@
#include "kompute/Core.hpp"
#if RELEASE
#include "kompute/shaders/shaderopmult.hpp"
#endif
#include "kompute/Algorithm.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpAlgoBase.hpp"
#include "kompute/operations/OpAlgoDispatch.hpp"
namespace kp {
@ -19,67 +17,43 @@ namespace kp {
* Operation that performs multiplication on two tensors and outputs to a third
* tensor.
*/
class OpMult : public OpAlgoBase
class OpMult : public OpAlgoDispatch
{
public:
/**
* Base constructor, should not be used unless explicitly intended.
*/
OpMult() {
}
/**
* Default constructor with parameters that provides the bare minimum
* requirements for the operations to be able to create and manage their
* sub-components.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that are to be used in this operation
* @param komputeWorkgroup Optional parameter to specify the layout for processing
* @param algorithm An algorithm that will be overridden with the OpMult
* shader data and the tensors provided which are expected to be 3
*/
OpMult(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors,
const Workgroup& komputeWorkgroup = {})
: OpAlgoBase(physicalDevice, device, commandBuffer, tensors, "", komputeWorkgroup)
OpMult(std::vector<std::shared_ptr<Tensor>> tensors, std::shared_ptr<Algorithm> algorithm)
: OpAlgoDispatch(algorithm)
{
KP_LOG_DEBUG("Kompute OpMult constructor with params");
#ifndef RELEASE
this->mShaderFilePath = "shaders/glsl/opmult.comp.spv";
#endif
}
if (tensors.size() != 3) {
throw std::runtime_error("Kompute OpMult expected 3 tensors but got " + std::to_string(tensors.size()));
}
#if RELEASE
/**
* If RELEASE=1 it will be using the static version of the shader which is
* loaded using this file directly. Otherwise it should not override the function.
*/
std::vector<uint32_t> fetchSpirvBinaryData() override
{
KP_LOG_WARN(
"Kompute OpMult Running shaders directly from header");
return std::vector<uint32_t>(
std::vector<uint32_t> spirv(
(uint32_t*)shader_data::shaders_glsl_opmult_comp_spv,
(uint32_t*)(shader_data::shaders_glsl_opmult_comp_spv +
kp::shader_data::shaders_glsl_opmult_comp_spv_len));
algorithm->rebuild(tensors, spirv);
}
#endif
/**
* Default destructor, which is in charge of destroying the algorithm
* components but does not destroy the underlying tensors
*/
~OpMult() override {
virtual ~OpMult() override {
KP_LOG_DEBUG("Kompute OpMult destructor started");
}
};
} // End namespace kp
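
The OpMult constructor above builds its SPIR-V word vector by casting the embedded byte array to uint32_t*. A hedged alternative is sketched below: it copies the bytes with std::memcpy, which avoids relying on the alignment of the byte array, and rejects lengths that are not a multiple of four. The helper name spirvWordsFromBytes is hypothetical and not part of Kompute, and the sketch assumes the generated header exposes an unsigned char array together with a byte length.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Kompute: copies an embedded SPIR-V byte
// array into 32-bit words without casting the byte pointer.
inline std::vector<uint32_t>
spirvWordsFromBytes(const unsigned char* bytes, std::size_t lengthInBytes)
{
    // SPIR-V is a stream of 32-bit words, so a ragged length is an error.
    if (lengthInBytes % sizeof(uint32_t) != 0) {
        throw std::runtime_error("SPIR-V byte length is not a multiple of 4");
    }

    std::vector<uint32_t> words(lengthInBytes / sizeof(uint32_t));
    if (!words.empty()) {
        // Byte-wise copy keeps the host byte order of the embedded data.
        std::memcpy(words.data(), bytes, lengthInBytes);
    }
    return words;
}
```

Under the same assumption about the generated header, the call site would pass the array and its length, for example spirvWordsFromBytes(shaders_glsl_opmult_comp_spv, shaders_glsl_opmult_comp_spv_len).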

View file

@ -9,52 +9,53 @@
namespace kp {
/**
Operation that copies the data from the first tensor to the rest of the tensors provided, using a record command for all the vectors. This operation does not own/manage the memory of the tensors passed to it. The operation must only receive tensors of type
* Operation that copies the data from the first tensor to the rest of the tensors
* provided, using a record command for all the vectors. This operation does not
* own/manage the memory of the tensors passed to it. The operation must only
* receive tensors of type
*/
class OpTensorCopy : public OpBase
{
public:
OpTensorCopy();
/**
* Default constructor with parameters that provides the core vulkan resources and the tensors that will be used in the operation.
* Default constructor with parameters that provides the core vulkan resources
* and the tensors that will be used in the operation.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that will be used in the operation.
*/
OpTensorCopy(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors);
OpTensorCopy(const std::vector<std::shared_ptr<Tensor>>& tensors);
/**
* Default destructor. This class does not manage memory so it won't be expecting the parent to perform a release.
* Default destructor. This class does not manage memory so it won't be
* expecting the parent to perform a release.
*/
~OpTensorCopy() override;
/**
* Performs basic checks such as ensuring there are at least two tensors provided, that they are initialised and that they are not of type TensorTypes::eStorage.
* Records the copy commands from the first tensor into all the other
* tensors provided. Also optionally records a barrier.
*
* @param commandBuffer The command buffer to record the command into.
*/
void init() override;
/**
* Records the copy commands from the first tensor into all the other tensors provided. Also optionally records a barrier.
*/
void record() override;
void record(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any preEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval() override;
virtual void preEval(const vk::CommandBuffer& commandBuffer) override;
/**
* Copies the local vectors for all the tensors to sync the data with the gpu.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval() override;
virtual void postEval(const vk::CommandBuffer& commandBuffer) override;
private:
// -------------- ALWAYS OWNED RESOURCES
std::vector<std::shared_ptr<Tensor>> mTensors;
};
} // End namespace kp
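
The OpTensorCopy comments above describe copying the first tensor into every other tensor provided, and the removed init() comment mentions a check for at least two tensors. The host-side sketch below illustrates that contract on plain std::vector buffers. It is only an analogy: copyFirstIntoRest is a hypothetical name, no Vulkan commands are recorded, and the equal-size check is an assumption rather than something the header states.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side analogue of the copy contract: buffers[0] is the
// source and every other buffer is overwritten with its contents.
inline void
copyFirstIntoRest(std::vector<std::vector<float>>& buffers)
{
    if (buffers.size() < 2) {
        throw std::runtime_error("copy requires at least two buffers");
    }
    for (std::size_t i = 1; i < buffers.size(); ++i) {
        // Assumption: source and destination must hold the same element count.
        if (buffers[i].size() != buffers[0].size()) {
            throw std::runtime_error("copy requires buffers of equal size");
        }
        buffers[i] = buffers[0];
    }
}
```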

View file

@ -1,33 +1,30 @@
#pragma once
#include "kompute/Core.hpp"
#include "kompute/operations/OpBase.hpp"
#include "kompute/Tensor.hpp"
#include "kompute/operations/OpBase.hpp"
namespace kp {
/**
Operation that syncs tensor's device by mapping local data into the device memory. For TensorTypes::eDevice it will use a record operation for the memory to be syncd into GPU memory which means that the operation will be done in sync with GPU commands. For TensorTypes::eStaging it will only map the data into host memory which will happen during preEval before the recorded commands are dispatched. This operation won't have any effect on TensorTypes::eStaging.
* Operation that syncs the tensor's device memory by mapping local data into it.
* For TensorTypes::eDevice it will use a record operation for the memory to be synced
* into GPU memory, which means that the operation will be done in sync with GPU commands.
* For TensorTypes::eHost it will only map the data into host memory, which will
* happen during preEval before the recorded commands are dispatched.
*/
class OpTensorSyncDevice : public OpBase
{
public:
OpTensorSyncDevice();
/**
* Default constructor with parameters that provides the core vulkan resources and the tensors that will be used in the operation. The tensos provided cannot be of type TensorTypes::eStorage.
* Default constructor with parameters that provides the core vulkan resources
* and the tensors that will be used in the operation. The tensors provided cannot
* be of type TensorTypes::eStorage.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that will be used in the operation.
*/
OpTensorSyncDevice(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors);
OpTensorSyncDevice(const std::vector<std::shared_ptr<Tensor>>& tensors);
/**
* Default destructor. This class does not manage memory so it won't be expecting the parent to perform a release.
@ -35,26 +32,30 @@ class OpTensorSyncDevice : public OpBase
~OpTensorSyncDevice() override;
/**
* Performs basic checks such as ensuring that there is at least one tensor provided with min memory of 1 element.
* For device tensors, it records the copy command for the tensor to copy the
* data from its staging to device memory.
*
* @param commandBuffer The command buffer to record the command into.
*/
void init() override;
/**
* For device tensors, it records the copy command for the tensor to copy the data from its staging to device memory.
*/
void record() override;
void record(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any preEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval() override;
virtual void preEval(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any postEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval() override;
virtual void postEval(const vk::CommandBuffer& commandBuffer) override;
private:
// -------------- ALWAYS OWNED RESOURCES
std::vector<std::shared_ptr<Tensor>> mTensors;
};
} // End namespace kp

View file

@ -9,53 +9,57 @@
namespace kp {
/**
Operation that syncs tensor's local memory by mapping device data into the local CPU memory. For TensorTypes::eDevice it will use a record operation for the memory to be syncd into GPU memory which means that the operation will be done in sync with GPU commands. For TensorTypes::eStaging it will only map the data into host memory which will happen during preEval before the recorded commands are dispatched. This operation won't have any effect on TensorTypes::eStaging.
* Operation that syncs the tensor's local memory by mapping device data into the
* local CPU memory. For TensorTypes::eDevice it will use a record operation
* for the memory to be synced from GPU memory, which means that the operation
* will be done in sync with GPU commands. For TensorTypes::eHost it will
* only map the data from host memory into local memory, which will happen
* during postEval after the recorded commands have been submitted.
*/
class OpTensorSyncLocal : public OpBase
{
public:
OpTensorSyncLocal();
/**
* Default constructor with parameters that provides the core vulkan resources and the tensors that will be used in the operation. The tensors provided cannot be of type TensorTypes::eStorage.
* Default constructor with parameters that provides the core vulkan resources
* and the tensors that will be used in the operation. The tensors provided
* cannot be of type TensorTypes::eStorage.
*
* @param physicalDevice Vulkan physical device used to find device queues
* @param device Vulkan logical device for passing to Algorithm
* @param commandBuffer Vulkan Command Buffer to record commands into
* @param tensors Tensors that will be used in the operation.
*/
OpTensorSyncLocal(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
std::shared_ptr<vk::Device> device,
std::shared_ptr<vk::CommandBuffer> commandBuffer,
std::vector<std::shared_ptr<Tensor>> tensors);
OpTensorSyncLocal(const std::vector<std::shared_ptr<Tensor>>& tensors);
/**
* Default destructor. This class does not manage memory so it won't be expecting the parent to perform a release.
* Default destructor. This class does not manage memory so it won't be expecting
* the parent to perform a release.
*/
~OpTensorSyncLocal() override;
/**
* Performs basic checks such as ensuring that there is at least one tensor provided with min memory of 1 element.
* For device tensors, it records the copy command for the tensor to copy the
* data from its device to staging memory.
*
* @param commandBuffer The command buffer to record the command into.
*/
void init() override;
/**
* For device tensors, it records the copy command for the tensor to copy the data from its device to staging memory.
*/
void record() override;
void record(const vk::CommandBuffer& commandBuffer) override;
/**
* Does not perform any preEval commands.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void preEval() override;
virtual void preEval(const vk::CommandBuffer& commandBuffer) override;
/**
* For host tensors it performs the map command from the host memory into local memory.
*
* @param commandBuffer The command buffer to record the command into.
*/
virtual void postEval() override;
virtual void postEval(const vk::CommandBuffer& commandBuffer) override;
private:
// -------------- ALWAYS OWNED RESOURCES
std::vector<std::shared_ptr<Tensor>> mTensors;
};
} // End namespace kp

View file

@ -37,25 +37,32 @@ TEST(TestAsyncOperations, TestManagerParallelExecution)
}
)");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::vector<float> data(size, 0.0);
std::vector<float> resultSync(size, 100000000);
std::vector<float> resultAsync(size, 100000000);
kp::Manager mgr;
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::vector<std::shared_ptr<kp::Tensor>> inputsSyncB;
std::vector<std::shared_ptr<kp::Algorithm>> algorithms;
for (uint32_t i = 0; i < numParallel; i++) {
inputsSyncB.push_back(std::make_shared<kp::Tensor>(kp::Tensor(data)));
inputsSyncB.push_back(mgr.tensor(data));
algorithms.push_back(mgr.algorithm({ inputsSyncB[i] }, spirv));
}
mgr.rebuild(inputsSyncB);
sq->eval<kp::OpTensorSyncDevice>(inputsSyncB);
mgr.sequence()->eval<kp::OpTensorSyncDevice>(inputsSyncB);
auto startSync = std::chrono::high_resolution_clock::now();
for (uint32_t i = 0; i < numParallel; i++) {
mgr.evalOpDefault<kp::OpAlgoBase>(
{ inputsSyncB[i] }, kp::Shader::compile_source(shader));
sq->eval<kp::OpAlgoDispatch>(algorithms[i]);
}
auto endSync = std::chrono::high_resolution_clock::now();
@ -63,37 +70,37 @@ TEST(TestAsyncOperations, TestManagerParallelExecution)
std::chrono::duration_cast<std::chrono::microseconds>(endSync - startSync)
.count();
mgr.evalOpDefault<kp::OpTensorSyncLocal>(inputsSyncB);
sq->eval<kp::OpTensorSyncLocal>(inputsSyncB);
for (uint32_t i = 0; i < numParallel; i++) {
EXPECT_EQ(inputsSyncB[i]->data(), resultSync);
EXPECT_EQ(inputsSyncB[i]->vector<float>(), resultSync);
}
kp::Manager mgrAsync(0, { 0, 2 });
std::vector<std::shared_ptr<kp::Tensor>> inputsAsyncB;
std::vector<std::shared_ptr<kp::Algorithm>> algosAsync;
for (uint32_t i = 0; i < numParallel; i++) {
inputsAsyncB.push_back(std::make_shared<kp::Tensor>(kp::Tensor(data)));
inputsAsyncB.push_back(mgr.tensor(data));
algosAsync.push_back(mgr.algorithm({ inputsAsyncB[i] }, spirv));
}
mgrAsync.rebuild(inputsAsyncB);
std::vector<std::shared_ptr<kp::Sequence>> sqs;
for (uint32_t i = 0; i < numParallel; i++) {
mgrAsync.sequence("async" + std::to_string(i), i);
sqs.push_back(mgrAsync.sequence(i));
}
auto startAsync = std::chrono::high_resolution_clock::now();
for (uint32_t i = 0; i < numParallel; i++) {
mgrAsync.evalOpAsync<kp::OpAlgoBase>(
{ inputsAsyncB[i] },
"async" + std::to_string(i),
kp::Shader::compile_source(shader));
sqs[i]->evalAsync<kp::OpAlgoDispatch>(algosAsync[i]);
}
for (uint32_t i = 0; i < numParallel; i++) {
mgrAsync.evalOpAwait("async" + std::to_string(i));
sqs[i]->evalAwait();
}
auto endAsync = std::chrono::high_resolution_clock::now();
@ -101,10 +108,10 @@ TEST(TestAsyncOperations, TestManagerParallelExecution)
endAsync - startAsync)
.count();
mgrAsync.evalOpDefault<kp::OpTensorSyncLocal>({ inputsAsyncB });
sq->eval<kp::OpTensorSyncLocal>({ inputsAsyncB });
for (uint32_t i = 0; i < numParallel; i++) {
EXPECT_EQ(inputsAsyncB[i]->data(), resultAsync);
EXPECT_EQ((inputsAsyncB[i]->vector<float>()), resultAsync);
}
// The speedup should be at least 40%
@ -138,33 +145,33 @@ TEST(TestAsyncOperations, TestManagerAsyncExecution)
}
)");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::vector<float> data(size, 0.0);
std::vector<float> resultAsync(size, 100000000);
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(data) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(data) };
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(data);
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(data);
mgr.sequence("asyncOne");
mgr.sequence("asyncTwo");
std::shared_ptr<kp::Sequence> sq1 = mgr.sequence();
std::shared_ptr<kp::Sequence> sq2 = mgr.sequence();
mgr.rebuild({ tensorA, tensorB });
sq1->eval<kp::OpTensorSyncLocal>({ tensorA, tensorB });
std::vector<uint32_t> result = kp::Shader::compile_source(shader);
std::shared_ptr<kp::Algorithm> algo1 = mgr.algorithm({ tensorA }, spirv);
std::shared_ptr<kp::Algorithm> algo2 = mgr.algorithm({ tensorB }, spirv);
mgr.evalOpAsync<kp::OpAlgoBase>(
{ tensorA }, "asyncOne", kp::Shader::compile_source(shader));
sq1->evalAsync<kp::OpAlgoDispatch>(algo1);
sq2->evalAsync<kp::OpAlgoDispatch>(algo2);
mgr.evalOpAsync<kp::OpAlgoBase>(
{ tensorB }, "asyncTwo", kp::Shader::compile_source(shader));
sq1->evalAwait();
sq2->evalAwait();
mgr.evalOpAwait("asyncOne");
mgr.evalOpAwait("asyncTwo");
sq1->evalAsync<kp::OpTensorSyncLocal>({ tensorA, tensorB });
sq1->evalAwait();
mgr.evalOpAsyncDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
mgr.evalOpAwaitDefault();
EXPECT_EQ(tensorA->data(), resultAsync);
EXPECT_EQ(tensorB->data(), resultAsync);
EXPECT_EQ(tensorA->vector(), resultAsync);
EXPECT_EQ(tensorB->vector(), resultAsync);
}

View file

@ -5,7 +5,7 @@
TEST(TestDestroy, TestDestroyTensorSingle)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = nullptr;
std::string shader(R"(
#version 450
@ -16,37 +16,37 @@ TEST(TestDestroy, TestDestroyTensorSingle)
pa[index] = pa[index] + 1;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
tensorA = mgr.tensor({ 0, 0, 0 });
sq = mgr.sequence();
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm({ tensorA }, spirv);
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->end();
mgr.sequence()
->record<kp::OpAlgoDispatch>(algo)
->eval()
->eval<kp::OpTensorSyncLocal>(algo->getTensors());
sq->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy(tensorA);
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 1, 1, 1 }));
tensorA->destroy();
EXPECT_FALSE(tensorA->isInit());
}
EXPECT_FALSE(tensorA->isInit());
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}
TEST(TestDestroy, TestDestroyTensorVector)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 1, 1, 1 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 1, 1, 1 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = nullptr;
std::shared_ptr<kp::TensorT<float>> tensorB = nullptr;
std::string shader(R"(
#version 450
@ -58,6 +58,7 @@ TEST(TestDestroy, TestDestroyTensorVector)
pa[index] = pa[index] + 1;
pb[index] = pb[index] + 2;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
std::shared_ptr<kp::Sequence> sq = nullptr;
@ -65,55 +66,33 @@ TEST(TestDestroy, TestDestroyTensorVector)
{
kp::Manager mgr;
mgr.rebuild({ tensorA, tensorB });
tensorA = mgr.tensor({ 1, 1, 1 });
tensorB = mgr.tensor({ 1, 1, 1 });
sq = mgr.sequence();
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm({ tensorA, tensorB }, spirv);
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA, tensorB }, kp::Shader::compile_source(shader));
sq->end();
mgr.sequence()
->record<kp::OpTensorSyncDevice>(algo->getTensors())
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>(algo->getTensors())
->eval();
sq->eval();
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 2, 2, 2 }));
EXPECT_EQ(tensorB->vector(), std::vector<float>({ 3, 3, 3 }));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
mgr.destroy({ tensorA, tensorB });
tensorA->destroy();
tensorB->destroy();
EXPECT_FALSE(tensorA->isInit());
EXPECT_FALSE(tensorB->isInit());
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 2, 2, 2 }));
EXPECT_EQ(tensorB->data(), std::vector<float>({ 3, 3, 3 }));
}
TEST(TestDestroy, TestDestroyTensorVectorUninitialised)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 1, 1, 1 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 1, 1, 1 }) };
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA, tensorB });
mgr.destroy({ tensorA, tensorB });
EXPECT_FALSE(tensorA->isInit());
EXPECT_FALSE(tensorB->isInit());
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}
TEST(TestDestroy, TestDestroySequenceSingle)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = nullptr;
std::string shader(R"(
#version 450
@ -124,247 +103,28 @@ TEST(TestDestroy, TestDestroySequenceSingle)
pa[index] = pa[index] + 1;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
tensorA = mgr.tensor({ 0, 0, 0 });
sq = mgr.sequence();
sq =
mgr.sequence()
->record<kp::OpTensorSyncDevice>({ tensorA })
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpTensorSyncLocal>({ tensorA })
->eval();
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->end();
sq->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy(sq);
sq->destroy();
EXPECT_FALSE(sq->isInit());
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 1, 1, 1 }));
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}
TEST(TestDestroy, TestDestroySequenceVector)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
std::shared_ptr<kp::Sequence> sq1 = nullptr;
std::shared_ptr<kp::Sequence> sq2 = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
sq1 = mgr.sequence("One");
sq1->begin();
sq1->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq1->end();
sq1->eval();
sq2 = mgr.sequence("Two");
sq2->begin();
sq2->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq2->end();
sq2->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy({ sq1, sq2 });
EXPECT_FALSE(sq1->isInit());
EXPECT_FALSE(sq2->isInit());
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 2, 2, 2 }));
}
TEST(TestDestroy, TestDestroySequenceNameSingleInsideManager)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
kp::Manager mgr;
{
mgr.rebuild({ tensorA });
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "one",
kp::Shader::compile_source(shader));
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "two",
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy("one");
mgr.destroy("two");
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 2, 2, 2 }));
}
TEST(TestDestroy, TestDestroySequenceNameSingleOutsideManager)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
std::shared_ptr<kp::Sequence> sq1 = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
sq1 = mgr.sequence("One");
sq1->begin();
sq1->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq1->end();
sq1->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy("One");
EXPECT_FALSE(sq1->isInit());
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}
TEST(TestDestroy, TestDestroySequenceNameVectorInsideManager)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
kp::Manager mgr;
{
mgr.rebuild({ tensorA });
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "one",
kp::Shader::compile_source(shader));
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "two",
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy(std::vector<std::string>({"one", "two"}));
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 2, 2, 2 }));
}
TEST(TestDestroy, TestDestroySequenceNameVectorOutsideManager)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
kp::Manager mgr;
{
mgr.rebuild({ tensorA });
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "one",
kp::Shader::compile_source(shader));
mgr.evalOp<kp::OpAlgoBase>(
{ tensorA }, "two",
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy(std::vector<std::string>({"one", "two"}));
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 2, 2, 2 }));
}
TEST(TestDestroy, TestDestroySequenceNameDefaultOutsideManager)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
kp::Manager mgr;
{
mgr.rebuild({ tensorA });
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorA },
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.destroy(KP_DEFAULT_SESSION);
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}
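
The reworked TestDestroy cases above declare the tensor handle outside the scope of the manager, create it through the manager, and then check isInit() after destroy(). The sketch below illustrates that ownership pattern with stand-in types so it can be compiled without Vulkan. ResourceSketch, ManagerSketch and outliveManager are hypothetical names, and the manager-side cleanup through weak references is an assumption about the design, not a statement of how kp::Manager is implemented.

```cpp
#include <memory>
#include <vector>

// Stand-in for a GPU-backed resource such as a tensor (hypothetical type).
class ResourceSketch
{
  public:
    bool isInit() const { return mInit; }

    // Idempotent: calling destroy() twice leaves the resource uninitialised.
    void destroy() { mInit = false; }

  private:
    bool mInit = true;
};

// Stand-in for a manager. Assumption: it tracks what it creates through weak
// references and releases any still-live resource when it goes out of scope.
class ManagerSketch
{
  public:
    ~ManagerSketch()
    {
        for (const auto& weak : mManaged) {
            if (auto res = weak.lock()) {
                res->destroy();
            }
        }
    }

    std::shared_ptr<ResourceSketch> resource()
    {
        auto res = std::make_shared<ResourceSketch>();
        mManaged.push_back(res);
        return res;
    }

  private:
    std::vector<std::weak_ptr<ResourceSketch>> mManaged;
};

// Returns a handle that has outlived the manager that created it.
inline std::shared_ptr<ResourceSketch>
outliveManager()
{
    std::shared_ptr<ResourceSketch> res;
    {
        ManagerSketch mgr;
        res = mgr.resource();
    }
    return res;
}
```

In this sketch the handle stays valid as a C++ object after the manager is gone, but it reports isInit() as false, which is the kind of state the EXPECT_FALSE checks above are probing.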

View file

@ -11,47 +11,49 @@ TEST(TestLogisticRegression, TestMainLogisticRegression)
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor({ 0, 1, 1, 1, 1 }) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
std::shared_ptr<kp::Tensor> y{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor({ 0.001, 0.001 }) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor({ 0 }) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
kp::Manager mgr;
mgr.rebuild(params);
std::shared_ptr<kp::TensorT<float>> xI = mgr.tensor({ 0, 1, 1, 1, 1 });
std::shared_ptr<kp::TensorT<float>> xJ = mgr.tensor({ 0, 0, 0, 1, 1 });
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::shared_ptr<kp::TensorT<float>> y = mgr.tensor({ 0, 0, 0, 1, 1 });
// Record op algo base
sq->begin();
std::shared_ptr<kp::TensorT<float>> wIn = mgr.tensor({ 0.001, 0.001 });
std::shared_ptr<kp::TensorT<float>> wOutI =
mgr.tensor({ 0, 0, 0, 0, 0 });
std::shared_ptr<kp::TensorT<float>> wOutJ =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->record<kp::OpTensorSyncDevice>({ wIn, bIn });
std::shared_ptr<kp::TensorT<float>> bIn = mgr.tensor({ 0 });
std::shared_ptr<kp::TensorT<float>> bOut =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->record<kp::OpAlgoBase>(
params,
std::vector<uint32_t>(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv +
kp::shader_data::shaders_glsl_logisticregression_comp_spv_len)),
kp::Workgroup(), kp::Constants({5.0}));
std::shared_ptr<kp::TensorT<float>> lOut =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
sq->end();
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
std::vector<uint32_t> spirv = std::vector<uint32_t>(
(uint32_t*)kp::shader_data::
test_shaders_glsl_test_logistic_regression_comp_spv,
(uint32_t*)(kp::shader_data::
test_shaders_glsl_test_logistic_regression_comp_spv +
kp::shader_data::
test_shaders_glsl_test_logistic_regression_comp_spv_len));
std::shared_ptr<kp::Algorithm> algorithm = mgr.algorithm(
params, spirv, kp::Workgroup({ 5 }), kp::Constants({ 5.0 }));
std::shared_ptr<kp::Sequence> sq =
mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algorithm)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
@ -64,21 +66,21 @@ TEST(TestLogisticRegression, TestMainLogisticRegression)
bIn->data()[0] -= learningRate * bOut->data()[j];
}
}
// Given these inputs, the trained parameters should satisfy:
// * wi < 0.01
// * wj > 1.0
// * b < 0
// TODO: Add EXPECT_DOUBLE_EQ instead
EXPECT_LT(wIn->data()[0], 0.01);
EXPECT_GT(wIn->data()[1], 1.0);
EXPECT_LT(bIn->data()[0], 0.0);
KP_LOG_WARN("Result wIn i: {}, wIn j: {}, bIn: {}",
wIn->data()[0],
wIn->data()[1],
bIn->data()[0]);
}
// Given these inputs, the trained parameters should satisfy:
// * wi < 0.01
// * wj > 1.0
// * b < 0
// TODO: Add EXPECT_DOUBLE_EQ instead
EXPECT_LT(wIn->data()[0], 0.01);
EXPECT_GT(wIn->data()[1], 1.0);
EXPECT_LT(bIn->data()[0], 0.0);
KP_LOG_WARN("Result wIn i: {}, wIn j: {}, bIn: {}",
wIn->data()[0],
wIn->data()[1],
bIn->data()[0]);
}
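Reviewer note: the shader behind this test is not shown in the diff, so the loop is hard to follow from the test alone. The following is a minimal CPU sketch of what one training run might look like. It assumes the shader computes the standard per-sample logistic-regression gradient scaled by 1/m, with m = 5 taken from `kp::Constants({ 5.0 })`. The names `perSampleGradients`, `trainOnCpu`, `Gradients` and `Params` are hypothetical and not part of Kompute.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-sample gradients, standing in for the wOutI / wOutJ / bOut tensors.
struct Gradients
{
    std::vector<float> wI, wJ, b;
};

// Learned parameters, standing in for wIn[0], wIn[1] and bIn[0].
struct Params
{
    float wI, wJ, b;
};

// ASSUMPTION: standard logistic-regression gradient, one entry per sample,
// each scaled by 1/m. The real shader is not visible in this diff.
Gradients perSampleGradients(const std::vector<float>& xI,
                             const std::vector<float>& xJ,
                             const std::vector<float>& y,
                             const Params& p)
{
    assert(xI.size() == xJ.size() && xI.size() == y.size());
    const float m = static_cast<float>(y.size());
    Gradients g;
    for (std::size_t i = 0; i < y.size(); i++) {
        const float z = p.wI * xI[i] + p.wJ * xJ[i] + p.b;
        const float yHat = 1.0f / (1.0f + std::exp(-z)); // sigmoid
        const float dZ = yHat - y[i];
        g.wI.push_back(xI[i] * dZ / m);
        g.wJ.push_back(xJ[i] * dZ / m);
        g.b.push_back(dZ / m);
    }
    return g;
}

// Mirrors the host-side loop of the test: evaluate the gradients, then
// subtract every per-sample entry from the parameters.
Params trainOnCpu(unsigned iterations, float learningRate)
{
    const std::vector<float> xI{ 0, 1, 1, 1, 1 };
    const std::vector<float> xJ{ 0, 0, 0, 1, 1 };
    const std::vector<float> y{ 0, 0, 0, 1, 1 };
    Params p{ 0.001f, 0.001f, 0.0f };
    for (unsigned it = 0; it < iterations; it++) {
        const Gradients g = perSampleGradients(xI, xJ, y, p);
        for (std::size_t j = 0; j < g.b.size(); j++) {
            p.wI -= learningRate * g.wI[j];
            p.wJ -= learningRate * g.wJ[j];
            p.b -= learningRate * g.b[j];
        }
    }
    return p;
}
```

Calling `trainOnCpu(100, 0.1f)` reproduces the shape of the test loop on the host. Whether it lands inside the same thresholds as the GPU run depends on the assumed gradient matching the real shader.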
TEST(TestLogisticRegression, TestMainLogisticRegressionManualCopy)
@ -87,50 +89,50 @@ TEST(TestLogisticRegression, TestMainLogisticRegressionManualCopy)
uint32_t ITERATIONS = 100;
float learningRate = 0.1;
kp::Constants wInVec = { 0.001, 0.001 };
std::vector<float> bInVec = { 0 };
std::shared_ptr<kp::Tensor> xI{ new kp::Tensor({ 0, 1, 1, 1, 1 }) };
std::shared_ptr<kp::Tensor> xJ{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
std::shared_ptr<kp::Tensor> y{ new kp::Tensor({ 0, 0, 0, 1, 1 }) };
std::shared_ptr<kp::Tensor> wIn{ new kp::Tensor(
wInVec, kp::Tensor::TensorTypes::eHost) };
std::shared_ptr<kp::Tensor> wOutI{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> wOutJ{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> bIn{ new kp::Tensor(
bInVec, kp::Tensor::TensorTypes::eHost) };
std::shared_ptr<kp::Tensor> bOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> lOut{ new kp::Tensor({ 0, 0, 0, 0, 0 }) };
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
{
kp::Manager mgr;
mgr.rebuild(params);
std::shared_ptr<kp::TensorT<float>> xI = mgr.tensor({ 0, 1, 1, 1, 1 });
std::shared_ptr<kp::TensorT<float>> xJ = mgr.tensor({ 0, 0, 0, 1, 1 });
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::shared_ptr<kp::TensorT<float>> y = mgr.tensor({ 0, 0, 0, 1, 1 });
// Record op algo base
sq->begin();
std::shared_ptr<kp::TensorT<float>> wIn =
mgr.tensor({ 0.001, 0.001 }, kp::Tensor::TensorTypes::eHost);
std::shared_ptr<kp::TensorT<float>> wOutI =
mgr.tensor({ 0, 0, 0, 0, 0 });
std::shared_ptr<kp::TensorT<float>> wOutJ =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->record<kp::OpAlgoBase>(
params,
std::vector<uint32_t>(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::shaders_glsl_logisticregression_comp_spv +
kp::shader_data::shaders_glsl_logisticregression_comp_spv_len)),
kp::Workgroup(), kp::Constants({5.0}));
std::shared_ptr<kp::TensorT<float>> bIn =
mgr.tensor({ 0 }, kp::Tensor::TensorTypes::eHost);
std::shared_ptr<kp::TensorT<float>> bOut =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
std::shared_ptr<kp::TensorT<float>> lOut =
mgr.tensor({ 0, 0, 0, 0, 0 });
sq->end();
std::vector<std::shared_ptr<kp::Tensor>> params = { xI, xJ, y,
wIn, wOutI, wOutJ,
bIn, bOut, lOut };
mgr.sequence()->record<kp::OpTensorSyncDevice>(params)->eval();
std::vector<uint32_t> spirv = std::vector<uint32_t>(
(uint32_t*)kp::shader_data::shaders_glsl_logisticregression_comp_spv,
(uint32_t*)(kp::shader_data::
shaders_glsl_logisticregression_comp_spv +
kp::shader_data::
shaders_glsl_logisticregression_comp_spv_len));
std::shared_ptr<kp::Algorithm> algorithm =
mgr.algorithm(params, spirv, kp::Workgroup(), kp::Constants({ 5.0 }));
std::shared_ptr<kp::Sequence> sq =
mgr.sequence()
->record<kp::OpTensorSyncDevice>({ wIn, bIn })
->record<kp::OpAlgoDispatch>(algorithm)
->record<kp::OpTensorSyncLocal>({ wOutI, wOutJ, bOut, lOut });
// Iterate across all expected iterations
for (size_t i = 0; i < ITERATIONS; i++) {
@ -142,22 +144,20 @@ TEST(TestLogisticRegression, TestMainLogisticRegressionManualCopy)
wIn->data()[1] -= learningRate * wOutJ->data()[j];
bIn->data()[0] -= learningRate * bOut->data()[j];
}
wIn->mapDataIntoHostMemory();
bIn->mapDataIntoHostMemory();
}
// Given these inputs, the trained parameters should satisfy:
// * wi < 0.01
// * wj > 1.0
// * b < 0
// TODO: Add EXPECT_DOUBLE_EQ instead
EXPECT_LT(wIn->data()[0], 0.01);
EXPECT_GT(wIn->data()[1], 1.0);
EXPECT_LT(bIn->data()[0], 0.0);
KP_LOG_WARN("Result wIn i: {}, wIn j: {}, bIn: {}",
wIn->data()[0],
wIn->data()[1],
bIn->data()[0]);
}
// Given these inputs, the trained parameters should satisfy:
// * wi < 0.01
// * wj > 1.0
// * b < 0
// TODO: Add EXPECT_DOUBLE_EQ instead
EXPECT_LT(wIn->data()[0], 0.01);
EXPECT_GT(wIn->data()[1], 1.0);
EXPECT_LT(bIn->data()[0], 0.0);
KP_LOG_WARN("Result wIn i: {}, wIn j: {}, bIn: {}",
wIn->data()[0],
wIn->data()[1],
bIn->data()[0]);
}

View file

@ -3,130 +3,69 @@
#include "kompute/Kompute.hpp"
TEST(TestManager, EndToEndOpMultFlow)
TEST(TestManager, EndToEndOpMultEvalFlow)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorLHS{ new kp::Tensor({ 0, 1, 2 }) };
mgr.rebuild({ tensorLHS });
std::shared_ptr<kp::TensorT<float>> tensorLHS = mgr.tensor({ 0, 1, 2 });
std::shared_ptr<kp::TensorT<float>> tensorRHS = mgr.tensor({ 2, 4, 6 });
std::shared_ptr<kp::TensorT<float>> tensorOutput = mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Tensor> tensorRHS{ new kp::Tensor({ 2, 4, 6 }) };
mgr.rebuild({ tensorRHS });
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorLHS,
tensorRHS,
tensorOutput };
std::shared_ptr<kp::Tensor> tensorOutput{ new kp::Tensor({ 0, 0, 0 }) };
mgr.sequence()
->eval<kp::OpTensorSyncDevice>(params)
->eval<kp::OpMult>(params, mgr.algorithm())
->eval<kp::OpTensorSyncLocal>(params);
mgr.rebuild({ tensorOutput });
mgr.evalOpDefault<kp::OpMult>({ tensorLHS, tensorRHS, tensorOutput });
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorOutput });
EXPECT_EQ(tensorOutput->data(), std::vector<float>({ 0, 4, 12 }));
EXPECT_EQ(tensorOutput->vector(), std::vector<float>({ 0, 4, 12 }));
}
TEST(TestManager, OpMultSequenceFlow)
TEST(TestManager, EndToEndOpMultSeqFlow)
{
std::shared_ptr<kp::Tensor> tensorLHS{ new kp::Tensor({ 0, 1, 2 }) };
std::shared_ptr<kp::Tensor> tensorRHS{ new kp::Tensor({ 2, 4, 6 }) };
std::shared_ptr<kp::Tensor> tensorOutput{ new kp::Tensor({ 0, 0, 0 }) };
kp::Manager mgr;
{
mgr.rebuild({ tensorLHS, tensorRHS, tensorOutput });
std::shared_ptr<kp::TensorT<float>> tensorLHS = mgr.tensor({ 0, 1, 2 });
std::shared_ptr<kp::TensorT<float>> tensorRHS = mgr.tensor({ 2, 4, 6 });
std::shared_ptr<kp::TensorT<float>> tensorOutput = mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorLHS,
tensorRHS,
tensorOutput };
sq->begin();
mgr.sequence()
->record<kp::OpTensorSyncDevice>(params)
->record<kp::OpMult>(params, mgr.algorithm())
->record<kp::OpTensorSyncLocal>(params)
->eval();
sq->record<kp::OpMult>({ tensorLHS, tensorRHS, tensorOutput });
sq->record<kp::OpTensorSyncLocal>({ tensorOutput });
sq->end();
sq->eval();
}
EXPECT_EQ(tensorOutput->data(), std::vector<float>({ 0, 4, 12 }));
EXPECT_EQ(tensorOutput->vector(), std::vector<float>({ 0, 4, 12 }));
}
TEST(TestManager, TestMultipleSequences)
{
kp::Manager mgr;
std::shared_ptr<kp::Sequence> sqOne =
mgr.sequence("sqOne");
std::shared_ptr<kp::TensorT<float>> tensorLHS = mgr.tensor({ 0, 1, 2 });
std::shared_ptr<kp::TensorT<float>> tensorRHS = mgr.tensor({ 2, 4, 6 });
std::shared_ptr<kp::TensorT<float>> tensorOutput = mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Sequence> sqTwo =
mgr.sequence("sqTwo");
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorLHS,
tensorRHS,
tensorOutput };
std::shared_ptr<kp::Sequence> sqOneRef =
mgr.sequence("sqOne");
mgr.sequence()->eval<kp::OpTensorSyncDevice>(params);
mgr.sequence()->eval<kp::OpMult>(params, mgr.algorithm());
mgr.sequence()->eval<kp::OpTensorSyncLocal>(params);
std::shared_ptr<kp::Sequence> sqTwoRef =
mgr.sequence("sqTwo");
EXPECT_EQ(sqOne, sqOneRef);
EXPECT_NE(sqTwo, sqOneRef);
EXPECT_EQ(sqTwo, sqTwoRef);
EXPECT_NE(sqOneRef, sqTwoRef);
EXPECT_EQ(tensorOutput->vector(), std::vector<float>({ 0, 4, 12 }));
}
TEST(TestManager, TestMultipleTensorsAtOnce)
{
std::shared_ptr<kp::Tensor> tensorLHS{ new kp::Tensor({ 0, 1, 2 }) };
std::shared_ptr<kp::Tensor> tensorRHS{ new kp::Tensor({ 2, 4, 6 }) };
std::shared_ptr<kp::Tensor> tensorOutput{ new kp::Tensor({ 0, 0, 0 }) };
kp::Manager mgr;
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
{
mgr.rebuild({ tensorLHS, tensorRHS, tensorOutput });
EXPECT_TRUE(tensorLHS->isInit());
EXPECT_TRUE(tensorRHS->isInit());
EXPECT_TRUE(tensorOutput->isInit());
sq->begin();
sq->record<kp::OpMult>({ tensorLHS, tensorRHS, tensorOutput });
sq->record<kp::OpTensorSyncLocal>({ tensorOutput });
sq->end();
sq->eval();
}
EXPECT_EQ(tensorOutput->data(), std::vector<float>({ 0, 4, 12 }));
}
TEST(TestManager, TestCreateInitTensor)
TEST(TestManager, TestDeviceProperties)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA = mgr.tensor({ 0, 1, 2 });
std::shared_ptr<kp::Tensor> tensorB = mgr.tensor({ 0, 0, 0 });
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB });
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorB->data(), std::vector<float>({ 0, 1, 2 }));
std::shared_ptr<kp::Tensor> tensorC =
mgr.tensor({ 0, 0, 0 }, kp::Tensor::TensorTypes::eHost);
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorC });
EXPECT_EQ(tensorC->data(), std::vector<float>({ 0, 1, 2 }));
const auto properties = mgr.getDeviceProperties();
EXPECT_GT(properties.deviceName.size(), 0);
}

View file

@ -3,12 +3,82 @@
#include "kompute/Kompute.hpp"
TEST(TestMultipleAlgoExecutions, TestEndToEndFunctionality)
{
kp::Manager mgr;
// Default tensor constructor simplifies creation of float values
auto tensorInA = mgr.tensor({ 2., 2., 2. });
auto tensorInB = mgr.tensor({ 1., 2., 3. });
// Explicit type constructor supports uint32, int32, double, float and bool
auto tensorOutA = mgr.tensorT<uint32_t>({ 0, 0, 0 });
auto tensorOutB = mgr.tensorT<uint32_t>({ 0, 0, 0 });
std::string shader = (R"(
#version 450
layout (local_size_x = 1) in;
// Each tensor's binding index matches its position in the params vector
layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };
// Kompute supports push constants updated on dispatch
layout(push_constant) uniform PushConstants {
float val;
} push_const;
// Kompute also supports spec constants on initialization
layout(constant_id = 0) const float const_one = 0;
void main() {
uint index = gl_GlobalInvocationID.x;
out_a[index] += uint( in_a[index] * in_b[index] );
out_b[index] += uint( const_one * push_const.val );
}
)");
std::vector<std::shared_ptr<kp::Tensor>> params = {
tensorInA, tensorInB, tensorOutA, tensorOutB
};
kp::Workgroup workgroup({ 3, 1, 1 });
kp::Constants specConsts({ 2 });
kp::Constants pushConstsA({ 2.0 });
kp::Constants pushConstsB({ 3.0 });
auto algorithm = mgr.algorithm(params,
kp::Shader::compileSource(shader),
workgroup,
specConsts,
pushConstsA);
// Run the algorithm synchronously: first with the default push constants,
// then again with pushConstsB
mgr.sequence()
->record<kp::OpTensorSyncDevice>(params)
->record<kp::OpAlgoDispatch>(algorithm)
->eval()
->record<kp::OpAlgoDispatch>(algorithm, pushConstsB)
->eval();
auto sq = mgr.sequence();
sq->evalAsync<kp::OpTensorSyncLocal>(params);
sq->evalAwait();
EXPECT_EQ(tensorOutA->vector(), std::vector<uint32_t>({ 4, 8, 12 }));
EXPECT_EQ(tensorOutB->vector(), std::vector<uint32_t>({ 10, 10, 10 }));
}
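Reviewer note: the expected values `{ 4, 8, 12 }` and `{ 10, 10, 10 }` follow from the shader accumulating with `+=` over two dispatches. The sketch below replays that arithmetic on the CPU. It assumes exactly two dispatches reach the device (one with push constant 2.0, one with 3.0) and that the outputs are not re-synced in between, which is what the expected values imply. `dispatchOnCpu`, `runReference` and `CpuResult` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for one dispatch of the shader in the test above.
// specConst plays the role of const_one, pushConst that of push_const.val.
void dispatchOnCpu(const std::vector<float>& inA,
                   const std::vector<float>& inB,
                   std::vector<uint32_t>& outA,
                   std::vector<uint32_t>& outB,
                   float specConst,
                   float pushConst)
{
    assert(inA.size() == inB.size());
    assert(outA.size() == inA.size() && outB.size() == inA.size());
    for (std::size_t index = 0; index < inA.size(); index++) {
        outA[index] += static_cast<uint32_t>(inA[index] * inB[index]);
        outB[index] += static_cast<uint32_t>(specConst * pushConst);
    }
}

struct CpuResult
{
    std::vector<uint32_t> outA;
    std::vector<uint32_t> outB;
};

// Replays the two dispatches with the same inputs and constants as the test.
CpuResult runReference()
{
    const std::vector<float> inA{ 2.0f, 2.0f, 2.0f };
    const std::vector<float> inB{ 1.0f, 2.0f, 3.0f };
    CpuResult result{ { 0, 0, 0 }, { 0, 0, 0 } };
    dispatchOnCpu(inA, inB, result.outA, result.outB, 2.0f, 2.0f); // pushConstsA
    dispatchOnCpu(inA, inB, result.outA, result.outB, 2.0f, 3.0f); // pushConstsB
    return result;
}
```

Each dispatch adds `in_a * in_b` (2, 4, 6) to `out_a`, and `const_one * push_const.val` (4, then 6) to `out_b`, which gives the totals asserted in the test.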
TEST(TestMultipleAlgoExecutions, SingleSequenceRecord)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
@ -19,35 +89,26 @@ TEST(TestMultipleAlgoExecutions, SingleSequenceRecord)
pa[index] = pa[index] + 1;
})");
mgr.rebuild({ tensorA });
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->record<kp::OpTensorSyncLocal>({ tensorA });
sq->end();
sq->eval();
mgr.sequence()
->record<kp::OpTensorSyncDevice>({ tensorA })
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpTensorSyncLocal>({ tensorA })
->eval();
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 3, 3, 3 }));
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 3, 3, 3 }));
}
TEST(TestMultipleAlgoExecutions, MultipleCmdBufRecords)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
@ -58,43 +119,24 @@ TEST(TestMultipleAlgoExecutions, MultipleCmdBufRecords)
pa[index] = pa[index] + 1;
})");
mgr.rebuild({ tensorA }, false);
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sqTensor = mgr.sequence();
std::shared_ptr<kp::Algorithm> algorithm =
mgr.algorithm({ tensorA }, spirv);
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
// First create the tensor in a separate sequence
sqTensor->begin();
sqTensor->record<kp::OpTensorSyncDevice>({ tensorA });
sqTensor->end();
sqTensor->eval();
mgr.sequence()->record<kp::OpTensorSyncDevice>({ tensorA })->eval();
// Then perform the computations
sq->begin();
sq->record<kp::OpAlgoBase>({ tensorA },
kp::Shader::compile_source(shader));
sq->end();
sq->eval();
mgr.sequence()->record<kp::OpAlgoDispatch>(algorithm)->eval();
sq->begin();
sq->record<kp::OpAlgoBase>({ tensorA },
kp::Shader::compile_source(shader));
sq->end();
sq->eval();
mgr.sequence()->record<kp::OpAlgoDispatch>(algorithm)->eval();
sq->begin();
sq->record<kp::OpAlgoBase>({ tensorA },
kp::Shader::compile_source(shader));
sq->end();
sq->eval();
mgr.sequence()->record<kp::OpAlgoDispatch>(algorithm)->eval();
sq->begin();
sq->record<kp::OpTensorSyncLocal>({ tensorA });
sq->end();
sq->eval();
mgr.sequence()->record<kp::OpTensorSyncLocal>({ tensorA })->eval();
EXPECT_EQ(tensorA->data(), std::vector<float>({ 3, 3, 3 }));
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 3, 3, 3 }));
}
TEST(TestMultipleAlgoExecutions, MultipleSequences)
@ -102,7 +144,7 @@ TEST(TestMultipleAlgoExecutions, MultipleSequences)
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
@ -113,68 +155,31 @@ TEST(TestMultipleAlgoExecutions, MultipleSequences)
pa[index] = pa[index] + 1;
})");
mgr.rebuild({ tensorA });
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
std::shared_ptr<kp::Algorithm> algorithm =
mgr.algorithm({ tensorA }, spirv);
sq->begin();
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->record<kp::OpTensorSyncDevice>({ tensorA })->eval();
sq->end();
sq->eval();
}
sq->record<kp::OpAlgoDispatch>(algorithm)->eval();
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence2");
sq->record<kp::OpAlgoDispatch>(algorithm)->eval();
sq->begin();
sq->record<kp::OpAlgoDispatch>(algorithm)->eval();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->record<kp::OpTensorSyncLocal>({ tensorA })->eval();
sq->end();
sq->eval();
}
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence3");
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->end();
sq->eval();
}
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence5");
sq->begin();
sq->record<kp::OpTensorSyncLocal>({ tensorA });
sq->end();
sq->eval();
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 3, 3, 3 }));
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 3, 3, 3 }));
}
TEST(TestMultipleAlgoExecutions, SingleRecordMultipleEval)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
@ -185,169 +190,18 @@ TEST(TestMultipleAlgoExecutions, SingleRecordMultipleEval)
pa[index] = pa[index] + 1;
})");
mgr.rebuild({ tensorA }, false);
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
std::shared_ptr<kp::Algorithm> algorithm =
mgr.algorithm({ tensorA }, spirv);
sq->begin();
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
sq->record<kp::OpTensorSyncDevice>({ tensorA });
sq->record<kp::OpTensorSyncDevice>({ tensorA })->eval();
sq->end();
sq->eval();
}
sq->record<kp::OpAlgoDispatch>(algorithm)->eval()->eval()->eval();
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence2");
sq->record<kp::OpTensorSyncLocal>({ tensorA })->eval();
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->end();
sq->eval();
sq->eval();
sq->eval();
}
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence3");
sq->begin();
sq->record<kp::OpTensorSyncLocal>({ tensorA });
sq->end();
sq->eval();
sq->eval();
sq->eval();
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 3, 3, 3 }));
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 3, 3, 3 }));
}
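Reviewer note: the chained `record(...)->eval()->eval()->eval()` form only yields `{ 3, 3, 3 }` if a `record` issued after an `eval` starts a fresh command list instead of appending to the old one. That is an inference from the expected values, not something this diff confirms. The toy model below illustrates the idea with plain callables. `ToySequence` and `runToyExample` are hypothetical and do not reflect how `kp::Sequence` is implemented.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy record-once / eval-many sequence with a chainable interface.
class ToySequence : public std::enable_shared_from_this<ToySequence>
{
  public:
    // Appends an operation. ASSUMPTION: the first record after an eval
    // discards the previously recorded operations.
    std::shared_ptr<ToySequence> record(std::function<void()> op)
    {
        assert(op && "operation must be callable");
        if (!mRecording) {
            mOps.clear();
            mRecording = true;
        }
        mOps.push_back(std::move(op));
        return shared_from_this();
    }

    // Ends recording (if active) and runs every recorded operation in order.
    std::shared_ptr<ToySequence> eval()
    {
        mRecording = false;
        for (auto& op : mOps) {
            op();
        }
        return shared_from_this();
    }

  private:
    std::vector<std::function<void()>> mOps;
    bool mRecording = false;
};

// Mirrors the shape of SingleRecordMultipleEval with host-side lambdas.
std::vector<float> runToyExample()
{
    auto data = std::make_shared<std::vector<float>>(3, 5.0f);
    auto sq = std::make_shared<ToySequence>();

    // Stand-in for the initial sync: reset the "device" data to zero.
    sq->record([data] { data->assign(3, 0.0f); })->eval();

    // Stand-in for the dispatch: recorded once, evaluated three times.
    sq->record([data] {
          for (float& v : *data) {
              v += 1.0f;
          }
      })
      ->eval()
      ->eval()
      ->eval();

    return *data;
}
```

If `record` appended instead of clearing, every `eval` would replay the reset first and the result would be `{ 1, 1, 1 }`. The toy class must be owned by a `std::shared_ptr` for `shared_from_this()` to be valid.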
TEST(TestMultipleAlgoExecutions, ManagerEvalMultSourceStrOpCreate)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorInA{ new kp::Tensor({ 2.0, 4.0, 6.0 }) };
std::shared_ptr<kp::Tensor> tensorInB{ new kp::Tensor({ 0.0, 1.0, 2.0 }) };
std::shared_ptr<kp::Tensor> tensorOut{ new kp::Tensor({ 0.0, 0.0, 0.0 }) };
mgr.rebuild({ tensorInA, tensorInB, tensorOut });
std::string shader(R"(
// The version to use
#version 450
// The execution structure
layout (local_size_x = 1) in;
// The buffers are provided via the tensors
layout(binding = 0) buffer bufA { float a[]; };
layout(binding = 1) buffer bufB { float b[]; };
layout(binding = 2) buffer bufOut { float o[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
o[index] = a[index] * b[index];
}
)");
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorInA, tensorInB, tensorOut },
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorOut });
EXPECT_EQ(tensorOut->data(), std::vector<float>({ 0.0, 4.0, 12.0 }));
}
TEST(TestMultipleAlgoExecutions, ManagerEvalMultSourceStrMgrCreate)
{
kp::Manager mgr;
auto tensorInA = mgr.tensor(
{ 2.0, 4.0, 6.0 }, kp::Tensor::TensorTypes::eDevice, false);
auto tensorInB = mgr.tensor(
{ 0.0, 1.0, 2.0 }, kp::Tensor::TensorTypes::eDevice, false);
auto tensorOut = mgr.tensor(
{ 0.0, 0.0, 0.0 }, kp::Tensor::TensorTypes::eDevice, false);
std::string shader(R"(
// The version to use
#version 450
// The execution structure
layout (local_size_x = 1) in;
// The buffers are provided via the tensors
layout(binding = 0) buffer bufA { float a[]; };
layout(binding = 1) buffer bufB { float b[]; };
layout(binding = 2) buffer bufOut { float o[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
o[index] = a[index] * b[index];
}
)");
mgr.evalOpDefault<kp::OpTensorSyncDevice>(
{ tensorInA, tensorInB, tensorOut });
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorInA, tensorInB, tensorOut },
kp::Shader::compile_source(shader));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorOut });
EXPECT_EQ(tensorOut->data(), std::vector<float>({ 0.0, 4.0, 12.0 }));
}
TEST(TestMultipleAlgoExecutions, SequenceAlgoDestroyOutsideManagerScope)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
sq = mgr.sequence();
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA }, kp::Shader::compile_source(shader));
sq->end();
sq->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 1, 1, 1 }));
}

View file

@ -1,80 +0,0 @@
#include "gtest/gtest.h"
#include "kompute/Kompute.hpp"
TEST(TestProcessingIterations, IterateThroughMultipleSumAndCopies)
{
kp::Manager mgr;
float TOTAL_ITER = 10;
std::vector<float> testExpectedOutVec = { TOTAL_ITER,
TOTAL_ITER,
TOTAL_ITER };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
layout(set = 0, binding = 1) buffer b { float pb[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pb[index] = pa[index] + 1;
}
)");
mgr.rebuild({ tensorA, tensorB }, false);
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("default");
sq->begin();
sq->record<kp::OpTensorSyncDevice>({ tensorA, tensorB });
sq->end();
sq->eval();
}
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("run");
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA, tensorB },
kp::Shader::compile_source(shader));
sq->record<kp::OpTensorCopy>({ tensorB, tensorA });
sq->end();
for (size_t i = 0; i < TOTAL_ITER; i++) {
sq->eval();
}
}
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("export");
sq->begin();
sq->record<kp::OpTensorSyncLocal>({ tensorA, tensorB });
sq->end();
sq->eval();
}
EXPECT_EQ(tensorA->data(), testExpectedOutVec);
}

View file

@ -5,13 +5,12 @@
#include "kompute_test/shaders/shadertest_op_custom_shader.hpp"
TEST(TestOpAlgoBase, ShaderRawDataFromConstructor)
TEST(TestOpAlgoCreate, ShaderRawDataFromConstructor)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 3, 4, 5 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
mgr.rebuild({ tensorA, tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 3, 4, 5 });
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
@ -28,50 +27,60 @@ TEST(TestOpAlgoBase, ShaderRawDataFromConstructor)
}
)");
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorA, tensorB }, kp::Shader::compile_source(shader));
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorA, tensorB };
EXPECT_EQ(tensorA->data(), std::vector<float>({ 0, 1, 2 }));
EXPECT_EQ(tensorB->data(), std::vector<float>({ 3, 4, 5 }));
mgr.sequence()
->eval<kp::OpTensorSyncDevice>(params)
->eval<kp::OpAlgoDispatch>(mgr.algorithm(params, spirv))
->eval<kp::OpTensorSyncLocal>(params);
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 0, 1, 2 }));
EXPECT_EQ(tensorB->vector(), std::vector<float>({ 3, 4, 5 }));
}
TEST(TestOpAlgoBase, ShaderCompiledDataFromConstructor)
TEST(TestOpAlgoCreate, ShaderCompiledDataFromConstructor)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 3, 4, 5 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
mgr.rebuild({ tensorA, tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 3, 4, 5 });
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor({ 0, 0, 0 });
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorA, tensorB },
std::vector<uint32_t>(
(uint32_t*)kp::shader_data::test_shaders_glsl_test_op_custom_shader_comp_spv,
(uint32_t*)(kp::shader_data::test_shaders_glsl_test_op_custom_shader_comp_spv +
kp::shader_data::
test_shaders_glsl_test_op_custom_shader_comp_spv_len)));
std::vector<uint32_t> spirv = std::vector<uint32_t>(
(uint32_t*)
kp::shader_data::test_shaders_glsl_test_op_custom_shader_comp_spv,
(uint32_t*)(kp::shader_data::
test_shaders_glsl_test_op_custom_shader_comp_spv +
kp::shader_data::
test_shaders_glsl_test_op_custom_shader_comp_spv_len));
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorA, tensorB };
EXPECT_EQ(tensorA->data(), std::vector<float>({ 0, 1, 2 }));
EXPECT_EQ(tensorB->data(), std::vector<float>({ 3, 4, 5 }));
mgr.sequence()
->eval<kp::OpTensorSyncDevice>(params)
->eval<kp::OpAlgoDispatch>(mgr.algorithm(params, spirv))
->eval<kp::OpTensorSyncLocal>(params);
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 0, 1, 2 }));
EXPECT_EQ(tensorB->vector(), std::vector<float>({ 3, 4, 5 }));
}
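Reviewer note: both tests build the SPIR-V vector by casting the generated array to `uint32_t*`, which relies on the array being suitably aligned. A copy-based helper avoids that dependency. The sketch below assumes the generated symbol is a byte array and that its `_len` counts bytes. `spirvBytesToWords` is a hypothetical name, not a Kompute function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a byte buffer holding SPIR-V into 32-bit words without pointer casts.
// ASSUMPTION: lengthInBytes is a byte count and a multiple of four.
std::vector<uint32_t> spirvBytesToWords(const unsigned char* bytes,
                                        std::size_t lengthInBytes)
{
    assert(lengthInBytes % sizeof(uint32_t) == 0);
    std::vector<uint32_t> words(lengthInBytes / sizeof(uint32_t));
    if (!words.empty()) {
        // memcpy keeps the host byte order and has no alignment requirement.
        std::memcpy(words.data(), bytes, lengthInBytes);
    }
    return words;
}
```

With such a helper the tests could pass `spirvBytesToWords(array, array_len)` wherever they currently construct the vector from two cast pointers.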
TEST(TestOpAlgoBase, ShaderCompiledDataFromFile)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 3, 4, 5 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
mgr.rebuild({ tensorA, tensorB });
mgr.evalOpDefault<kp::OpAlgoBase>(
{ tensorA, tensorB }, "test/shaders/glsl/test_op_custom_shader.comp.spv");
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
EXPECT_EQ(tensorA->data(), std::vector<float>({ 0, 1, 2 }));
EXPECT_EQ(tensorB->data(), std::vector<float>({ 3, 4, 5 }));
}
// TODO: Add support for reading shaders from a file
// TEST(TestOpAlgoCreate, ShaderCompiledDataFromFile)
//{
// kp::Manager mgr;
//
// std::shared_ptr<kp::TensorT<float>> tensorA{ new kp::Tensor({ 3, 4, 5 }) };
// std::shared_ptr<kp::TensorT<float>> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
// mgr.rebuild({ tensorA, tensorB });
//
// mgr.evalOpDefault<kp::OpAlgoCreate>(
// { tensorA, tensorB },
// "test/shaders/glsl/test_op_custom_shader.comp.spv");
//
// mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
//
// EXPECT_EQ(tensorA->vector(), std::vector<float>({ 0, 1, 2 }));
// EXPECT_EQ(tensorB->vector(), std::vector<float>({ 3, 4, 5 }));
//}

View file

@ -11,21 +11,19 @@ TEST(TestOpTensorCopy, CopyDeviceToDeviceTensor)
std::vector<float> testVecA{ 1, 2, 3 };
std::vector<float> testVecB{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
mgr.rebuild({ tensorA, tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecA);
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(testVecB);
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB });
mgr.sequence()
->eval<kp::OpTensorSyncDevice>({ tensorA, tensorB })
->eval<kp::OpTensorCopy>({ tensorA, tensorB })
->eval<kp::OpTensorSyncLocal>({ tensorA, tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
// Making sure the GPU holds the same data
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
// Making sure the GPU holds the same vector
EXPECT_EQ(tensorA->vector(), tensorB->vector());
}
TEST(TestOpTensorCopy, CopyDeviceToDeviceTensorMulti)
@ -37,25 +35,26 @@ TEST(TestOpTensorCopy, CopyDeviceToDeviceTensorMulti)
std::vector<float> testVecB{ 0, 0, 0 };
std::vector<float> testVecC{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
std::shared_ptr<kp::Tensor> tensorC{ new kp::Tensor(testVecC) };
mgr.rebuild({ tensorA, tensorB, tensorC });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecA);
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(testVecB);
std::shared_ptr<kp::TensorT<float>> tensorC = mgr.tensor(testVecC);
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_TRUE(tensorC->isInit());
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB, tensorC });
mgr.sequence()
->eval<kp::OpTensorSyncDevice>({ tensorA, tensorB, tensorC })
->eval<kp::OpTensorCopy>({ tensorA, tensorB, tensorC });
EXPECT_EQ(tensorA->data(), tensorB->data());
EXPECT_EQ(tensorA->data(), tensorC->data());
EXPECT_EQ(tensorA->vector(), tensorB->vector());
EXPECT_EQ(tensorA->vector(), tensorC->vector());
// Making sure the GPU holds the same data
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB, tensorC });
EXPECT_EQ(tensorA->data(), tensorB->data());
EXPECT_EQ(tensorA->data(), tensorC->data());
// Making sure the GPU holds the same vector
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorB, tensorC });
EXPECT_EQ(tensorA->vector(), tensorB->vector());
EXPECT_EQ(tensorA->vector(), tensorC->vector());
}
TEST(TestOpTensorCopy, CopyDeviceToHostTensor)
@ -66,25 +65,23 @@ TEST(TestOpTensorCopy, CopyDeviceToHostTensor)
std::vector<float> testVecA{ 3, 4, 5 };
std::vector<float> testVecB{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(
testVecB, kp::Tensor::TensorTypes::eHost) };
mgr.rebuild({ tensorA, tensorB }, false);
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecA);
std::shared_ptr<kp::TensorT<float>> tensorB =
mgr.tensor(testVecB, kp::Tensor::TensorTypes::eHost);
// Only calling sync on device type tensor
mgr.evalOpDefault<kp::OpTensorSyncDevice>({ tensorA });
mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensorA });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB });
mgr.sequence()->eval<kp::OpTensorCopy>({ tensorA, tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
EXPECT_EQ(tensorA->vector(), tensorB->vector());
// Making sure the GPU holds the same data
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
// Making sure the GPU holds the same vector
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->vector(), tensorB->vector());
}
TEST(TestOpTensorCopy, CopyHostToDeviceTensor)
@ -95,28 +92,23 @@ TEST(TestOpTensorCopy, CopyHostToDeviceTensor)
std::vector<float> testVecA{ 4, 5, 6 };
std::vector<float> testVecB{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(
testVecA, kp::Tensor::TensorTypes::eHost) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
mgr.rebuild({ tensorA, tensorB }, false);
// Manually copy data into host memory of Tensor
tensorA->mapDataIntoHostMemory();
std::shared_ptr<kp::TensorT<float>> tensorA =
mgr.tensor(testVecA, kp::Tensor::TensorTypes::eHost);
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(testVecB);
// Only calling sync on device type tensor
mgr.evalOpDefault<kp::OpTensorSyncDevice>({ tensorB });
mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensorA, tensorB });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB });
mgr.sequence()->eval<kp::OpTensorCopy>({ tensorA, tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
EXPECT_EQ(tensorA->vector(), tensorB->vector());
// Making sure the GPU holds the same data
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
// Making sure the GPU holds the same vector
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->vector(), tensorB->vector());
}
TEST(TestOpTensorCopy, CopyHostToHostTensor)
@ -127,23 +119,23 @@ TEST(TestOpTensorCopy, CopyHostToHostTensor)
std::vector<float> testVecA{ 5, 6, 7 };
std::vector<float> testVecB{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(
testVecA, kp::Tensor::TensorTypes::eHost) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(
testVecB, kp::Tensor::TensorTypes::eHost) };
mgr.rebuild({ tensorA, tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA =
mgr.tensor(testVecA, kp::Tensor::TensorTypes::eHost);
std::shared_ptr<kp::TensorT<float>> tensorB =
mgr.tensor(testVecB, kp::Tensor::TensorTypes::eHost);
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB });
mgr.sequence()
->eval<kp::OpTensorSyncDevice>({ tensorA })
->eval<kp::OpTensorCopy>({ tensorA, tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
EXPECT_EQ(tensorA->vector(), tensorB->vector());
// Making sure the GPU holds the same data
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->data(), tensorB->data());
// Making sure the GPU holds the same vector
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorB });
EXPECT_EQ(tensorA->vector(), tensorB->vector());
}
TEST(TestOpTensorCopy, SingleTensorShouldFail)
@ -153,13 +145,11 @@ TEST(TestOpTensorCopy, SingleTensorShouldFail)
std::vector<float> testVecA{ 6, 7, 8 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(
testVecA, kp::Tensor::TensorTypes::eHost) };
mgr.rebuild({ tensorA }, false);
std::shared_ptr<kp::TensorT<float>> tensorA =
mgr.tensor(testVecA, kp::Tensor::TensorTypes::eHost);
EXPECT_TRUE(tensorA->isInit());
EXPECT_THROW(mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA }),
EXPECT_THROW(mgr.sequence()->eval<kp::OpTensorCopy>({ tensorA }),
std::runtime_error);
}

@ -6,135 +6,38 @@
TEST(TestOpTensorCreate, CreateSingleTensorSingleOp)
{
std::vector<float> testVecA{ 9, 8, 7 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::TensorT<float>> tensorA = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
tensorA = mgr.tensor(testVecA);
EXPECT_TRUE(tensorA->isInit());
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorA->vector(), testVecA);
}
EXPECT_FALSE(tensorA->isInit());
}
TEST(TestOpTensorCreate, CreateMultipleTensorSingleOp)
{
kp::Manager mgr;
std::vector<float> testVecA{ 9, 8, 7 };
std::vector<float> testVecB{ 6, 5, 4 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
mgr.rebuild({ tensorA, tensorB });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorB->data(), testVecB);
}
TEST(TestOpTensorCreate, CreateMultipleTensorMultipleOp)
{
kp::Manager mgr;
std::vector<float> testVecA{ 9, 8, 7 };
std::vector<float> testVecB{ 6, 5, 4 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
mgr.rebuild({ tensorA });
mgr.rebuild({ tensorB });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorB->data(), testVecB);
}
TEST(TestOpTensorCreate, TestTensorMemoryManagedByManagerDestroyed)
{
std::vector<float> testVecA{ 9, 8, 7 };
std::vector<float> testVecB{ 6, 5, 4 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
{
kp::Manager mgr;
mgr.rebuild({ tensorA });
mgr.rebuild({ tensorB });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorB->data(), testVecB);
}
EXPECT_FALSE(tensorA->isInit());
EXPECT_FALSE(tensorB->isInit());
}
TEST(TestOpTensorCreate, TestTensorMemoryManagedByManagerNOTDestroyed)
{
std::vector<float> testVecA{ 9, 8, 7 };
std::vector<float> testVecB{ 6, 5, 4 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
kp::Manager mgr;
{
mgr.rebuild({ tensorA });
mgr.rebuild({ tensorB });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorB->data(), testVecB);
}
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
}
TEST(TestOpTensorCreate, NoErrorIfTensorFreedBefore)
{
std::vector<float> testVecA{ 9, 8, 7 };
std::vector<float> testVecB{ 6, 5, 4 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(testVecB) };
kp::Manager mgr;
mgr.rebuild({ tensorA });
mgr.rebuild({ tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecA);
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(testVecB);
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
EXPECT_EQ(tensorA->vector(), testVecA);
EXPECT_EQ(tensorB->vector(), testVecB);
EXPECT_EQ(tensorA->data(), testVecA);
EXPECT_EQ(tensorB->data(), testVecB);
tensorA->destroy();
tensorB->destroy();
tensorA->freeMemoryDestroyGPUResources();
tensorB->freeMemoryDestroyGPUResources();
EXPECT_FALSE(tensorA->isInit());
EXPECT_FALSE(tensorB->isInit());
}
@ -143,12 +46,10 @@ TEST(TestOpTensorCreate, ExceptionOnZeroSizeTensor)
{
std::vector<float> testVecA;
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecA) };
kp::Manager mgr;
try {
mgr.rebuild({ tensorA });
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecA);
} catch (const std::runtime_error& err) {
// check exception
ASSERT_TRUE(std::string(err.what()).find("zero-sized") !=

@ -11,19 +11,17 @@ TEST(TestOpTensorSync, SyncToDeviceMemorySingleTensor)
std::vector<float> testVecPreA{ 0, 0, 0 };
std::vector<float> testVecPostA{ 9, 8, 7 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(testVecPreA) };
mgr.rebuild({ tensorA }, false);
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(testVecPreA);
EXPECT_TRUE(tensorA->isInit());
tensorA->setData(testVecPostA);
mgr.evalOpDefault<kp::OpTensorSyncDevice>({ tensorA });
mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensorA });
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA });
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorA });
EXPECT_EQ(tensorA->data(), testVecPostA);
EXPECT_EQ(tensorA->vector(), testVecPostA);
}
TEST(TestOpTensorSync, SyncToDeviceMemoryMultiTensor)
@ -33,11 +31,9 @@ TEST(TestOpTensorSync, SyncToDeviceMemoryMultiTensor)
std::vector<float> testVec{ 9, 8, 7 };
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> tensorC{ new kp::Tensor({ 0, 0, 0 }) };
mgr.rebuild({ tensorA, tensorB, tensorC }, false);
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::TensorT<float>> tensorC = mgr.tensor({ 0, 0, 0 });
EXPECT_TRUE(tensorA->isInit());
EXPECT_TRUE(tensorB->isInit());
@ -45,13 +41,13 @@ TEST(TestOpTensorSync, SyncToDeviceMemoryMultiTensor)
tensorA->setData(testVec);
mgr.evalOpDefault<kp::OpTensorSyncDevice>({ tensorA });
mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensorA });
mgr.evalOpDefault<kp::OpTensorCopy>({ tensorA, tensorB, tensorC });
mgr.sequence()->eval<kp::OpTensorCopy>({ tensorA, tensorB, tensorC });
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB, tensorC });
mgr.sequence()->eval<kp::OpTensorSyncLocal>({ tensorA, tensorB, tensorC });
EXPECT_EQ(tensorA->data(), testVec);
EXPECT_EQ(tensorB->data(), testVec);
EXPECT_EQ(tensorC->data(), testVec);
EXPECT_EQ(tensorA->vector(), testVec);
EXPECT_EQ(tensorB->vector(), testVec);
EXPECT_EQ(tensorC->vector(), testVec);
}

test/TestPushConstant.cpp Normal file

@ -0,0 +1,135 @@
#include "gtest/gtest.h"
#include "kompute/Kompute.hpp"
#include "fmt/ranges.h"
TEST(TestPushConstants, TestConstantsAlgoDispatchOverride)
{
{
std::string shader(R"(
#version 450
layout(push_constant) uniform PushConstants {
float x;
float y;
float z;
} pcs;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
pa[0] += pcs.x;
pa[1] += pcs.y;
pa[2] += pcs.z;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
std::shared_ptr<kp::TensorT<float>> tensor =
mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(
{ tensor }, spirv, kp::Workgroup({ 1 }), {}, { 0.0, 0.0, 0.0 });
sq = mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensor });
// Run the dispatches one after another to avoid a race condition.
// atomicAdd is not an option here because SwiftShader does not
// support it for floats.
sq->eval<kp::OpAlgoDispatch>(algo, kp::Constants{ 0.1, 0.2, 0.3 });
sq->eval<kp::OpAlgoDispatch>(algo, kp::Constants{ 0.3, 0.2, 0.1 });
sq->eval<kp::OpTensorSyncLocal>({ tensor });
EXPECT_EQ(tensor->vector(), kp::Constants({ 0.4, 0.4, 0.4 }));
}
}
}
TEST(TestPushConstants, TestConstantsAlgoDispatchNoOverride)
{
{
std::string shader(R"(
#version 450
layout(push_constant) uniform PushConstants {
float x;
float y;
float z;
} pcs;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
pa[0] += pcs.x;
pa[1] += pcs.y;
pa[2] += pcs.z;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
std::shared_ptr<kp::TensorT<float>> tensor =
mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(
{ tensor }, spirv, kp::Workgroup({ 1 }), {}, { 0.1, 0.2, 0.3 });
sq = mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensor });
// Run the dispatches one after another to avoid a race condition.
// atomicAdd is not an option here because SwiftShader does not
// support it for floats.
sq->eval<kp::OpAlgoDispatch>(algo);
sq->eval<kp::OpAlgoDispatch>(algo, kp::Constants{ 0.3, 0.2, 0.1 });
sq->eval<kp::OpTensorSyncLocal>({ tensor });
EXPECT_EQ(tensor->vector(), kp::Constants({ 0.4, 0.4, 0.4 }));
}
}
}
TEST(TestPushConstants, TestConstantsWrongSize)
{
{
std::string shader(R"(
#version 450
layout(push_constant) uniform PushConstants {
float x;
float y;
float z;
} pcs;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
pa[0] += pcs.x;
pa[1] += pcs.y;
pa[2] += pcs.z;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
std::shared_ptr<kp::TensorT<float>> tensor =
mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(
{ tensor }, spirv, kp::Workgroup({ 1 }), {}, { 0.0 });
sq = mgr.sequence()->record<kp::OpTensorSyncDevice>({ tensor });
EXPECT_THROW(sq->record<kp::OpAlgoDispatch>(
algo, kp::Constants{ 0.1, 0.2, 0.3 }),
std::runtime_error);
}
}
}
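
The two dispatch tests above rely on the shader adding the push constants to the buffer once per dispatch, so two dispatches run in sequence should leave roughly 0.4 in every slot. The third test expects a `std::runtime_error` when the number of constants changes. Below is a minimal CPU-side sketch of that arithmetic, with no Vulkan or Kompute involved. `applyDispatch`, `simulateTwoDispatches` and `nearlyEqual` are hypothetical helpers for illustration, and the size check is only an assumption about what the library verifies internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one run of the test shader:
// pa[i] += pcs[i] for each push-constant float.
inline void applyDispatch(std::vector<float>& pa, const std::vector<float>& pcs)
{
    // Assumption: a size mismatch is rejected, mirroring the
    // std::runtime_error expected by TestConstantsWrongSize.
    if (pa.size() != pcs.size()) {
        throw std::runtime_error("push constant count does not match");
    }
    for (std::size_t i = 0; i < pa.size(); ++i) {
        pa[i] += pcs[i];
    }
}

// Runs the two dispatches from the tests back to back on the CPU.
inline std::vector<float> simulateTwoDispatches()
{
    std::vector<float> pa{ 0.0f, 0.0f, 0.0f };
    applyDispatch(pa, { 0.1f, 0.2f, 0.3f });
    applyDispatch(pa, { 0.3f, 0.2f, 0.1f });
    return pa;
}

// Tolerance-based comparison; float sums such as 0.1f + 0.3f are not
// guaranteed to be bit-identical to the literal 0.4f.
inline bool nearlyEqual(float a, float b, float eps = 1e-6f)
{
    return std::fabs(a - b) <= eps;
}

// Usage example: every slot should end up close to 0.4.
inline void checkTwoDispatches()
{
    const std::vector<float> out = simulateTwoDispatches();
    for (float v : out) {
        assert(nearlyEqual(v, 0.4f));
    }
}
```

The sketch compares with a tolerance rather than exact equality, which is a design choice of this example and not something the tests in the diff do.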

@ -3,28 +3,6 @@
#include "kompute/Kompute.hpp"
TEST(TestSequence, CmdBufSequenceBeginEnd)
{
kp::Manager mgr;
{
std::shared_ptr<kp::Sequence> sq =
mgr.sequence("newSequence");
EXPECT_TRUE(sq->eval());
EXPECT_TRUE(!sq->isRecording());
EXPECT_TRUE(sq->begin());
EXPECT_TRUE(sq->isRecording());
EXPECT_TRUE(!sq->begin());
EXPECT_TRUE(sq->isRecording());
EXPECT_TRUE(sq->end());
EXPECT_TRUE(!sq->isRecording());
EXPECT_TRUE(!sq->end());
EXPECT_TRUE(!sq->isRecording());
EXPECT_TRUE(sq->eval());
}
}
TEST(TestSequence, SequenceDestructorViaManager)
{
std::shared_ptr<kp::Sequence> sq = nullptr;
@ -32,7 +10,7 @@ TEST(TestSequence, SequenceDestructorViaManager)
{
kp::Manager mgr;
sq = mgr.sequence("newSequence");
sq = mgr.sequence();
EXPECT_TRUE(sq->isInit());
}
@ -40,3 +18,115 @@ TEST(TestSequence, SequenceDestructorViaManager)
EXPECT_FALSE(sq->isInit());
}
TEST(TestSequence, SequenceDestructorOutsideManagerExplicit)
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
sq = mgr.sequence();
EXPECT_TRUE(sq->isInit());
sq->destroy();
EXPECT_FALSE(sq->isInit());
}
EXPECT_FALSE(sq->isInit());
}
TEST(TestSequence, SequenceDestructorOutsideManagerImplicit)
{
kp::Manager mgr;
std::weak_ptr<kp::Sequence> sqWeak;
{
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
sqWeak = sq;
EXPECT_TRUE(sq->isInit());
}
EXPECT_FALSE(sqWeak.lock());
}
TEST(TestSequence, RerecordSequence)
{
kp::Manager mgr;
std::shared_ptr<kp::Sequence> sq = mgr.sequence();
std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor({ 1, 2, 3 });
std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor({ 2, 2, 2 });
std::shared_ptr<kp::TensorT<float>> tensorOut = mgr.tensor({ 0, 0, 0 });
sq->eval<kp::OpTensorSyncDevice>({ tensorA, tensorB, tensorOut });
std::vector<uint32_t> spirv = kp::Shader::compileSource(R"(
#version 450
layout (local_size_x = 1) in;
// Each tensor's binding index matches its position in the tensor list passed to the algorithm
layout(set = 0, binding = 0) buffer bina { float tina[]; };
layout(set = 0, binding = 1) buffer binb { float tinb[]; };
layout(set = 0, binding = 2) buffer bout { float tout[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
tout[index] = tina[index] * tinb[index];
}
)");
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm({ tensorA, tensorB, tensorOut }, spirv);
sq->record<kp::OpAlgoDispatch>(algo)->record<kp::OpTensorSyncLocal>(
{ tensorA, tensorB, tensorOut });
sq->eval();
EXPECT_EQ(tensorOut->vector(), std::vector<float>({ 2, 4, 6 }));
algo->rebuild({ tensorOut, tensorA, tensorB }, spirv);
// Refresh and trigger a rerecord
sq->rerecord();
sq->eval();
EXPECT_EQ(tensorB->vector(), std::vector<float>({ 2, 8, 18 }));
}
TEST(TestSequence, SequenceTimestamps)
{
kp::Manager mgr;
std::shared_ptr<kp::Tensor> tensorA = mgr.tensor({ 0, 0, 0 });
std::string shader(R"(
#version 450
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = pa[index] + 1;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
auto seq = mgr.sequence(0, 100); // 100 timestamps
seq->record<kp::OpTensorSyncDevice>({ tensorA })
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpTensorSyncLocal>({ tensorA })
->eval();
const std::vector<uint64_t> timestamps = seq->getTimestamps();
EXPECT_EQ(timestamps.size(),
6); // 1 timestamp at start + 1 after each of the 5 recorded operations
}
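
The destructor tests above expect two things: a sequence is freed as soon as the last caller-held `shared_ptr` goes away, even while the manager is alive, and a sequence that outlives the manager reports `isInit()` as false afterwards. One way to get both behaviours, sketched here with plain standard-library types, is for the owner to track its resources through `std::weak_ptr`. `Registry` and `Resource` are hypothetical names for illustration; this is not presented as Kompute's actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical resource with the same isInit()/destroy() shape as in the tests.
struct Resource
{
    bool initialised = true;
    void destroy() { initialised = false; }
    bool isInit() const { return initialised; }
};

// Hypothetical owner that tracks resources without keeping them alive.
class Registry
{
  public:
    std::shared_ptr<Resource> create()
    {
        auto res = std::make_shared<Resource>();
        tracked_.push_back(res); // stored as weak_ptr: no ownership taken
        return res;
    }

    ~Registry()
    {
        // Tear down whatever callers still hold; expired entries are skipped.
        for (auto& weak : tracked_) {
            if (auto res = weak.lock()) {
                res->destroy();
            }
        }
    }

  private:
    std::vector<std::weak_ptr<Resource>> tracked_;
};

// Mirrors SequenceDestructorOutsideManagerImplicit.
inline bool releasedWhenLastOwnerDrops()
{
    Registry registry;
    std::weak_ptr<Resource> observer;
    {
        std::shared_ptr<Resource> res = registry.create();
        observer = res;
        assert(res->isInit());
    }
    return observer.expired();
}

// Mirrors SequenceDestructorViaManager.
inline bool destroyedWithRegistry()
{
    std::shared_ptr<Resource> res;
    {
        Registry registry;
        res = registry.create();
        assert(res->isInit());
    }
    return !res->isInit();
}
```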

@ -24,34 +24,43 @@ static const std::string shaderString = (R"(
}
)");
void compileShaderWithGivenResources(const std::string shaderString, const TBuiltInResource resources) {
kp::Shader::compile_source(shaderString, std::string("main"), std::vector<std::pair<std::string,std::string>>({}), resources);
void
compileShaderWithGivenResources(const std::string shaderString,
const TBuiltInResource resources)
{
kp::Shader::compileSource(
shaderString,
std::string("main"),
std::vector<std::pair<std::string, std::string>>({}),
resources);
}
TEST(TestShaderResources, TestNoMaxLight)
{
TBuiltInResource noMaxLightResources = kp::defaultResource;
noMaxLightResources.maxLights=0;
EXPECT_NO_THROW(compileShaderWithGivenResources(shaderString, noMaxLightResources));
}
TBuiltInResource noMaxLightResources = kp::Shader::defaultResource;
noMaxLightResources.maxLights = 0;
EXPECT_NO_THROW(
compileShaderWithGivenResources(shaderString, noMaxLightResources));
}
TEST(TestShaderResources, TestSmallComputeWorkGroupSizeX)
{
TBuiltInResource smallComputeWorkGroupSizeXResources = kp::defaultResource;
smallComputeWorkGroupSizeXResources.maxComputeWorkGroupSizeX=0;
ASSERT_THROW(compileShaderWithGivenResources(shaderString, smallComputeWorkGroupSizeXResources), std::runtime_error);
}
TBuiltInResource smallComputeWorkGroupSizeXResources =
kp::Shader::defaultResource;
smallComputeWorkGroupSizeXResources.maxComputeWorkGroupSizeX = 0;
ASSERT_THROW(compileShaderWithGivenResources(
shaderString, smallComputeWorkGroupSizeXResources),
std::runtime_error);
}
TEST(TestShaderResources, TestNoWhileLoopLimit)
{
TBuiltInResource noWhileLoopLimitResources = kp::defaultResource;
noWhileLoopLimitResources.limits.whileLoops=0;
ASSERT_THROW(compileShaderWithGivenResources(shaderString, noWhileLoopLimitResources), std::runtime_error);
}
TBuiltInResource noWhileLoopLimitResources = kp::Shader::defaultResource;
noWhileLoopLimitResources.limits.whileLoops = 0;
ASSERT_THROW(
compileShaderWithGivenResources(shaderString, noWhileLoopLimitResources),
std::runtime_error);
}

@ -4,46 +4,48 @@
TEST(TestSpecializationConstants, TestTwoConstants)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor({ 0, 0, 0 }) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor({ 0, 0, 0 }) };
std::string shader(R"(
#version 450
layout (constant_id = 0) const float cOne = 1;
layout (constant_id = 1) const float cTwo = 1;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
layout(set = 0, binding = 1) buffer b { float pb[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = cOne;
pb[index] = cTwo;
})");
{
std::string shader(R"(
#version 450
layout (constant_id = 0) const float cOne = 1;
layout (constant_id = 1) const float cTwo = 1;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
layout(set = 0, binding = 1) buffer b { float pb[]; };
void main() {
uint index = gl_GlobalInvocationID.x;
pa[index] = cOne;
pb[index] = cTwo;
})");
std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA, tensorB });
std::shared_ptr<kp::TensorT<float>> tensorA =
mgr.tensor({ 0, 0, 0 });
std::shared_ptr<kp::TensorT<float>> tensorB =
mgr.tensor({ 0, 0, 0 });
sq = mgr.sequence();
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorA,
tensorB };
auto spec = kp::Constants({5.0, 0.3});
kp::Constants spec = kp::Constants({ 5.0, 0.3 });
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA, tensorB },
kp::Shader::compile_source(shader),
kp::Workgroup(), spec);
sq->end();
std::shared_ptr<kp::Algorithm> algo =
mgr.algorithm(params, spirv, {}, spec);
sq->eval();
sq = mgr.sequence()
->record<kp::OpTensorSyncDevice>(params)
->record<kp::OpAlgoDispatch>(algo)
->record<kp::OpTensorSyncLocal>(params)
->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
EXPECT_EQ(tensorA->vector(), std::vector<float>({ 5, 5, 5 }));
EXPECT_EQ(tensorB->vector(), std::vector<float>({ 0.3, 0.3, 0.3 }));
}
}
EXPECT_EQ(tensorA->data(), std::vector<float>({ 5, 5, 5 }));
EXPECT_EQ(tensorB->data(), std::vector<float>({ 0.3, 0.3, 0.3 }));
}

@ -5,36 +5,9 @@
TEST(TestTensor, ConstructorData)
{
std::vector<float> vec{ 0, 1, 2 };
kp::Tensor tensor(vec);
EXPECT_EQ(tensor.size(), vec.size());
EXPECT_EQ(tensor.data(), vec);
}
TEST(TestTensor, CopyFromHostData)
{
std::vector<float> vecA{ 0, 1, 2 };
std::vector<float> vecB{ 0, 0, 0 };
std::shared_ptr<kp::Tensor> tensorA =
std::make_shared<kp::Tensor>(vecA, kp::Tensor::TensorTypes::eHost);
std::shared_ptr<kp::Tensor> tensorB =
std::make_shared<kp::Tensor>(vecB, kp::Tensor::TensorTypes::eHost);
kp::Manager mgr;
mgr.rebuild({ tensorA, tensorB });
if (std::shared_ptr<kp::Sequence> sq =
mgr.sequence("new")) {
sq->begin();
sq->record<kp::OpTensorCopy>({ tensorA, tensorB });
sq->end();
sq->eval();
}
EXPECT_EQ(tensorA->data(), tensorB->data());
std::vector<float> vec{ 0, 1, 2 };
std::shared_ptr<kp::TensorT<float>> tensor = mgr.tensor(vec);
EXPECT_EQ(tensor->size(), vec.size());
EXPECT_EQ(tensor->vector(), vec);
}

@ -5,44 +5,64 @@
#include "kompute_test/shaders/shadertest_workgroup.hpp"
TEST(TestWorkgroup, TestSimpleWorkgroup)
{
std::shared_ptr<kp::Tensor> tensorA{ new kp::Tensor(std::vector<float>(16 * 8)) };
std::shared_ptr<kp::Tensor> tensorB{ new kp::Tensor(std::vector<float>(16 * 8)) };
std::shared_ptr<kp::TensorT<float>> tensorA = nullptr;
std::shared_ptr<kp::TensorT<float>> tensorB = nullptr;
{
std::shared_ptr<kp::Sequence> sq = nullptr;
{
kp::Manager mgr;
mgr.rebuild({ tensorA, tensorB });
tensorA = mgr.tensor(std::vector<float>(16 * 8));
tensorB = mgr.tensor(std::vector<float>(16 * 8));
kp::Workgroup workgroup = {16, 8, 1};
std::vector<std::shared_ptr<kp::Tensor>> params = { tensorA,
tensorB };
std::vector<uint32_t> spirv(
(uint32_t*)
kp::shader_data::test_shaders_glsl_test_workgroup_comp_spv,
(uint32_t*)(kp::shader_data::
test_shaders_glsl_test_workgroup_comp_spv +
kp::shader_data::
test_shaders_glsl_test_workgroup_comp_spv_len));
kp::Workgroup workgroup = { 16, 8, 1 };
std::shared_ptr<kp::Algorithm> algorithm =
mgr.algorithm(params, spirv, workgroup);
sq = mgr.sequence();
sq->begin();
sq->record<kp::OpAlgoBase>(
{ tensorA, tensorB },
std::vector<uint32_t>(
(uint32_t*)kp::shader_data::test_shaders_glsl_test_workgroup_comp_spv,
(uint32_t*)(kp::shader_data::test_shaders_glsl_test_workgroup_comp_spv +
kp::shader_data::test_shaders_glsl_test_workgroup_comp_spv_len)),
workgroup);
sq->end();
sq->record<kp::OpTensorSyncDevice>(params);
sq->record<kp::OpAlgoDispatch>(algorithm);
sq->record<kp::OpTensorSyncLocal>(params);
sq->eval();
mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensorA, tensorB });
std::vector<float> expectedA = {
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15
};
std::vector<float> expectedB = {
0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2,
3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5,
6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0,
1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3,
4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6,
7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1,
2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7
};
EXPECT_EQ(tensorA->vector(), expectedA);
EXPECT_EQ(tensorB->vector(), expectedB);
}
}
std::vector<float> expectedA = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15};
std::vector<float> expectedB = { 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7 };
EXPECT_EQ(tensorA->data(), expectedA);
EXPECT_EQ(tensorB->data(), expectedB);
}
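
The expected vectors in TestSimpleWorkgroup follow a regular pattern: with a 16 x 8 x 1 workgroup, element `x * 8 + y` of `expectedA` holds `x` and the same element of `expectedB` holds `y`. The sketch below regenerates both vectors on the CPU. It assumes the shader (which is not shown in this diff) writes the x and y components of the invocation ID at that index; `expectedWorkgroupOutput` is a hypothetical helper, not part of the test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: builds the (expectedA, expectedB) pair for a
// sizeX x sizeY dispatch, assuming index = x * sizeY + y.
inline std::pair<std::vector<float>, std::vector<float>>
expectedWorkgroupOutput(std::size_t sizeX, std::size_t sizeY)
{
    assert(sizeX > 0 && sizeY > 0);
    std::vector<float> a(sizeX * sizeY);
    std::vector<float> b(sizeX * sizeY);
    for (std::size_t x = 0; x < sizeX; ++x) {
        for (std::size_t y = 0; y < sizeY; ++y) {
            const std::size_t index = x * sizeY + y;
            a[index] = static_cast<float>(x); // 0 x8, 1 x8, ..., 15 x8
            b[index] = static_cast<float>(y); // 0..7 repeated 16 times
        }
    }
    return std::make_pair(std::move(a), std::move(b));
}
```

Calling `expectedWorkgroupOutput(16, 8)` yields two 128-element vectors that could replace the hand-written literals, at the cost of hiding the values from a reader of the test.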

@ -1,6 +1,6 @@
{
"name": "example",
"version-string": "0.6.0",
"version-string": "0.7.0",
"dependencies": [
"fmt",
"spdlog",