Vollo Runtime
The Vollo runtime provides a low-latency asynchronous inference API for timing-critical inference requests on the Vollo accelerator.
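To illustrate what an asynchronous inference API generally looks like, here is a minimal sketch of the submit-then-poll pattern in Python. It does not use the Vollo runtime: `FakeAccelerator`, `submit`, `poll` and `run_all` are hypothetical names invented for this illustration, and the "accelerator" is simply a Python function. The actual function names and semantics are defined by the Vollo runtime API itself.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class FakeAccelerator:
    """Hypothetical stand-in for an accelerator; NOT the Vollo runtime API."""

    def __init__(self, model: Callable[[List[float]], List[float]]) -> None:
        self._model = model
        self._pending: Deque[Tuple[int, List[float]]] = deque()
        self._next_id = 0

    def submit(self, inputs: List[float]) -> int:
        """Queue a job and return its id immediately, without computing anything."""
        job_id = self._next_id
        self._next_id += 1
        self._pending.append((job_id, list(inputs)))
        return job_id

    def poll(self) -> Dict[int, List[float]]:
        """Finish at most one queued job per call and return its result by id."""
        if not self._pending:
            return {}
        job_id, inputs = self._pending.popleft()
        return {job_id: self._model(inputs)}


def run_all(acc: FakeAccelerator, batches: List[List[float]]) -> List[List[float]]:
    """Submit every job up front, then poll until all results have arrived."""
    ids = [acc.submit(batch) for batch in batches]
    results: Dict[int, List[float]] = {}
    while len(results) < len(ids):
        results.update(acc.poll())
    # Return the results in submission order.
    return [results[job_id] for job_id in ids]
```

The point of the pattern is that `submit` returns before the result exists, so the caller can queue further requests or do other work and collect results later by polling.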
A couple of example C programs that use the Vollo runtime API have been included in the installation in the example/ directory.
To use the Vollo runtime, you need an accelerator set up as follows:
- A programmed FPGA
- A loaded kernel driver and an installed license
- Environment set up with source setup.sh
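A quick way to catch a forgotten source setup.sh is a pre-flight check before starting a program. The sketch below is only an assumption-based illustration: it supposes that setup.sh exports an environment variable named `VOLLO_SDK`, which this section does not state, so adjust `REQUIRED_VARS` to whatever your installation actually sets. The helper `missing_env_vars` is a hypothetical name and is not part of the SDK. The check covers the environment only; it does not verify the FPGA, the kernel driver or the license.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumption: setup.sh exports VOLLO_SDK. This name is not confirmed by the
# text above; replace it with the variables your installation really sets.
REQUIRED_VARS = ("VOLLO_SDK",)


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; a non-empty result suggests that setup.sh has not been sourced in this shell.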
Python API
The Vollo SDK includes Python bindings for the Vollo runtime. These can be more convenient than the C API for tasks such as testing Vollo against PyTorch models.
The API for the Python bindings can be found here.
A small example of using the Python bindings is provided here.
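When testing accelerator output against a reference model such as a PyTorch model, the two results will usually not be bit-identical, so the comparison needs a tolerance. The following is a minimal sketch of such a check in plain Python. It does not call the Vollo bindings or PyTorch: `actual` and `expected` are assumed to be flat sequences of floats that you have already obtained from each side, the helper names are hypothetical, and the default tolerances are placeholders rather than recommended values.

```python
import math
from typing import Sequence


def max_abs_error(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Largest element-wise absolute difference; 0.0 for empty inputs."""
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def outputs_match(
    actual: Sequence[float],
    expected: Sequence[float],
    rel_tol: float = 1e-2,  # placeholder tolerance, tune for your model
    abs_tol: float = 1e-2,  # placeholder tolerance, tune for your model
) -> bool:
    """True if both outputs have equal length and every pair is close."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )
```

Reporting `max_abs_error` alongside the pass/fail result makes it easier to judge whether a failure is a small precision difference or a genuine mismatch.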