Wrap A CUDA Application In Golang
Wrap CUDA Application In Golang
In today’s standards, applications very rarely stay as standalone, and instead the usual step further is to give the user an access point through the internet, through an API.
Once you got a C/C++ app that uses a kernel with CUDA, how can we make it accessible through an API? We could code a server in C++, but is that the most maintainable option?
Personally, I prefer to use a language that is more suited for internet access endpoints. Due to some other requirements I will not mention, I chose Go. But can we call an executable in Go? The answer is yes. It can be achieved using CGo.
Create A Shared Library
In this text I assume you got a CMake build pipeline to create an executable file for your project. For the wrapping, we need to modify the CMake files in order to create the executable (for running the standalone) and create a shared library as well (for go). In Linux, a shared library has the .so extension, while in Windows it stands as .dll. Most servers are cheaper if you run a Linux machine, so we will go on with the Linux way.
CGo calls C code, but since CUDA is compiled with C++ syntax, we need to basically create a C interface that can call C++ code. CGo calls this interface, and this interface calls C++/CUDA code.
Adjust The Code
The project that we will wrap has the following structure:
cuda/
├─ myfolder1/
| ├─ MyFile1.cu
| ├─ CMakeLists.txt
├─ myfolder2/
| ├─ MyFile2.cu
| ├─ CMakeLists.txt
├─ mylib1/
| ├─ Lib1.h
| ├─ Lib1.cpp
| ├─ CMakeLists.txt
├─ main.cu
├─ CMakeLists.txt
main.cu
// external dependencies
#include <cuda_runtime.h>
#include <ZXing/BarcodeFormat.h>
#include <opencv2/opencv.hpp>
// internal dependencies
#include "myfolder1/MyFile1.cu"
#include "myfolder2/MyFile2.cu"
#include "mylib1/Lib1.h"
using namespace cv;
using namespace ZXing;
__global__ myKernel(){
// defined in MyFile1.cu
MyClass1 obj1();
obj1.someFunc();
// defined in MyFile2.cu
MyClass2 obj2();
obj2.someFunc();
}
void generate() {
// defined in Lib1.h
MyClass::myFunc();
myKernel<<<1, 10>>>();
}
int main() {
generate();
}
It’s important to note that main.cu depends on MyFile1.cu and MyFile2.cu to work, but also on OpenCV and ZXing (C++ libraries). The function we want to be able to call from Go is generate().
- The first thing to do is to create a header file for
main.cu, for examplemain_interface.hpp. It’s important to use the.hppextension so that it’s compiled with the C++ compiler, since it’s a header file for CUDA code, which uses C++ syntax.
cuda/
├─ myfolder1/
| ├─ MyFile1.cu
| ├─ CMakeLists.txt
├─ myfolder2/
| ├─ MyFile2.cu
| ├─ CMakeLists.txt
├─ mylib1/
| ├─ Lib1.h
| ├─ Lib1.cpp
| ├─ CMakeLists.txt
├─ main.cu
├─ main_interface.hpp <-----
├─ CMakeLists.txt
- Include the
extern "C"directive in the function declaration between preprocessor directives to ensure it’s recognized only by C++ compiler. Theexternlinkage especifier is not recognized by C, so that’s why we enclose it with preprocessor directives:
main_interface.hpp
#ifndef MAIN_INTERFACE_H
#define MAIN_INTERFACE_H
#ifdef __cplusplus
extern "C" {
#endif
void generate();
#ifdef __cplusplus
}
#endif
#endif
- Since
main.cunow is the implementation ofmain_interface.hppheader file, we need to include the header inmain.cuand enclosegenerate()withextern "C"linkage specifier. This time, we don’t need preprocessor directives like#ifdefused above because we know thatmain.cuwill be compiled by nvcc which recognizes C++ syntax.
main.cu
// external dependencies
#include <cuda_runtime.h>
#include <ZXing/BarcodeFormat.h>
#include <opencv2/opencv.hpp>
// internal dependencies
#include "myfolder1/MyFile1.cu"
#include "myfolder2/MyFile2.cu"
#include "main_interface.hpp" // <----- add this
using namespace cv;
using namespace ZXing;
__global__ myKernel(){
// defined in MyFile1.cu
MyClass1 obj1();
obj1.someFunc();
// defined in MyFile2.cu
MyClass2 obj2();
obj2.someFunc();
}
extern "C" {
void generate() {
myKernel<<<1, 10>>>();
}
}
int main() {
generate();
}
The extern "C" makes the compiler not mangle its function name, which is often done in C++ compilation, and by doing this we make generate() visible to C code.
- Create a wrapper folder with some C files in the folder where
main.cuis.
mkdir wrapper
touch wrapper/Wrapper.c
touch wrapper/Wrapper.h
touch wrapper/CMakeLists.txt
and thus, the project will look like this:
cuda/
├─ myfolder1/
| ├─ MyFile1.cu
| ├─ CMakeLists.txt
├─ myfolder2/
| ├─ MyFile2.cu
| ├─ CMakeLists.txt
├─ mylib1/
| ├─ Lib1.h
| ├─ Lib1.cpp
| ├─ CMakeLists.txt
├─ wrapper/
| ├─ Wrapper.c
| ├─ Wrapper.h
| ├─ CMakeLists.txt
├─ main.cu
├─ main_interface.hpp
├─ CMakeLists.txt
This folder will contain the C interface that we will include in our Go code.
- Provide the C code necessary for it to be a library that calls our CUDA function
generate().
wrapper/Wrapper.h
#ifndef WRAPPER_H
#define WRAPPER_H
void WrapGenerate();
#endif
wrapper/Wrapper.c
#include "../main_interface.hpp"
#include "Wrapper.h"
void WrapGenerate() {
generate();
}
In this way, WrapGenerate(), which is visible to CGo, calls the function generate(), declared in ../main_interface.hpp but implemented in main.cu.
Adjust CMake
As mentioned before, we need to stop generating an executable file (.o) and generate a shared library (.so) instead. For this, we need to modify our CMake files.
- Add
add_librarywith the key wordSHAREDinstead ofadd_executablein thewrapper/CMakeLists.txt:
wrapper/CMakeLists.txt
find_package(CUDA REQUIRED)
add_library(wrapper SHARED
${CMAKE_SOURCE_DIR}/main_interface.hpp
${CMAKE_SOURCE_DIR}/main.cu
${CMAKE_SOURCE_DIR}/myfolder1/MyFile1.cu
${CMAKE_SOURCE_DIR}/myfolder2/MyFile2.cu
${CMAKE_SOURCE_DIR}/wrapper/Wrapper.c
${CMAKE_SOURCE_DIR}/wrapper/Wrapper.h
)
set_target_properties(wrapper PROPERTIES ENABLE_EXPORTS 1)
target_include_directories(wrapper PUBLIC ${CMAKE_SOURCE_DIR})
target_link_libraries(wrapper PRIVATE ${OpenCV_LIBS} ZXing::ZXing mylib1)
Here, it’s crucial to include in the add_library command all files that main.cu depends on, or else the shared library will not include generate() because its implementation’s dependencies are unknown.
./CMakeLists.txt
cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
project(main LANGUAGES C CXX CUDA)
set(CMAKE_CUDA_ARCHITECTURES 70)
find_package(CUDA REQUIRED)
find_package(ZXing REQUIRED)
find_package(OpenCV REQUIRED)
add_subdirectory(myfolder1)
add_subdirectory(myfolder2)
add_subdirectory(mylib1)
add_subdirectory(wrapper)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${CMAKE_SOURCE_DIR}/wrapper)
# all files that main.cu uses
add_executable(main
main_interface.hpp
main.cu
myfolder1/MyFile1.cu
myfolder2/MyFile2.cu
)
# libraries that main.cu uses
target_link_libraries(main ${OpenCV_LIBS} ZXing::ZXing mylib1)
set_property(TARGET mylib1 PROPERTY POSITION_INDEPENDENT_CODE ON)
SET (CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch=sm_70 -std=c++17")
And, just to leave an example of a CMake file to generate mylib1 as a separated internal library (for aws calls, for example):
mylib1/CMakeLists.txt
add_library(mylib1 Lib1.cpp Lib1.h)
# external libs
target_link_libraries(mylib1 PRIVATE ${OpenCV_LIBS})
- Now we can generate the shared libraries. Go to the root of the project and type:
cmake .
make
This will generate one executable and one shared library:
cuda/
├─ myfolder1/
| ├─ MyFile1.cu
| ├─ CMakeLists.txt
├─ myfolder2/
| ├─ MyFile2.cu
| ├─ CMakeLists.txt
├─ mylib1/
| ├─ Lib1.h
| ├─ Lib1.cpp
| ├─ CMakeLists.txt
├─ wrapper/
| ├─ Wrapper.c
| ├─ Wrapper.h
| ├─ libwrapper.so <----- shared lib
| ├─ CMakeLists.txt
├─ main.cu
├─ main_interface.hpp
├─ main <----- exe
├─ CMakeLists.txt
The one we need to use in Go is wrapper/libwrapper.so. CGo can compile C projects, but since ours has CUDA/C++, compiling it with Go would be a pain. Therefore, we compile the shared libs first using CMake, and then we link the necessary .so file to our Go code.
CGo will look for generate() in the symbol table inside our .so library that we tell it to use. In order to make sure that the symbol for generate() is present, we can run the command:
nm -D wrapper/libwrapper.so | grep generate
and the output, if found, must be similar to this:
0000000000017584 T generate
where the T tells us that this symbol is indeed extracted from the text in the .so file. Something else means that the symbol is not found, and therefore CGo won’t find it. Tweak your CMakeLists.txt file until you find the symbol.
Since the symbol is found, we can proceed to our Go code.
Create A Go Project
- Create a folder in the same level as the
cuda/folder:
├─ cuda/
| ├─ myfolder1/
| ├─ myfolder2/
| ├─ mylib1/
| ├─ wrapper/
├─ go/
| ├─ main.go
- Add the necessary Go code so that our
libwrapper.sois linked, not compiled, by CGo.
package main
// #cgo CFLAGS: -I../cuda/wrapper
// #cgo LDFLAGS: -L../cuda/wrapper -lwrapper
// #include "Wrapper.h"
import "C"
func main() {
C.WrapGenerate()
}
The comments here are mandatory, since they are CGo directives. The -I flag determines the include directory where the Wrapper.h can be found. The -L determines the directory where the libwrapper.so can be found. The -lwrapper flag must have the structure -l plus the name of your .so file without the extension and the word lib.
- Compile your go app like this:
LD_LIBRARY_PATH=/home/Documents/my-project/cuda/wrapper go build
- Run the executable like this:
LD_LIBRARY_PATH=/home/Documents/my-project/cuda/wrapper ./wrapper
We specify LD_LIBRARY_PATH variable with the path where the libwrapper.so is located because the shared lib is not installed in any standard install location, such as /usr/local or /usr/lib, and so we need to tell the go-tool where the library is located.
When you run the go executable, you will now see your CUDA app output, if you got any, alongside your go output.
Also, the executable we created too as ./main is there so that you can run the C++ standalone app by itself, maybe useful for quicker debugging.