Vollo Runtime

The Vollo runtime provides a low-latency asynchronous inference API for timing-critical inference requests on the Vollo accelerator.
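To illustrate what an asynchronous inference API means in practice, here is a minimal Python sketch of the general submit-then-poll pattern. It does not use the Vollo runtime: `MockAccelerator`, `add_job`, `poll`, and `run_batch` are hypothetical names invented for this illustration, and the "inference" simply doubles each input. Refer to the example programs and the API reference for the real calls.

```python
from collections import deque


class MockAccelerator:
    """Hypothetical stand-in for an asynchronous device (NOT the Vollo API)."""

    def __init__(self, latency_polls=2):
        self._latency_polls = latency_polls  # polls needed before a job completes
        self._in_flight = deque()            # entries: [job_id, remaining_polls, result]

    def add_job(self, job_id, inputs):
        # Returns immediately; the result only becomes visible via poll().
        result = [2.0 * x for x in inputs]   # placeholder "inference"
        self._in_flight.append([job_id, self._latency_polls, result])

    def poll(self):
        # Advance every in-flight job by one step and return the finished ones.
        for job in self._in_flight:
            job[1] -= 1
        completed = []
        while self._in_flight and self._in_flight[0][1] <= 0:
            job_id, _, result = self._in_flight.popleft()
            completed.append((job_id, result))
        return completed


def run_batch(device, requests):
    """Submit every request up front, then poll until all results arrive."""
    for job_id, inputs in requests.items():
        device.add_job(job_id, inputs)
    results = {}
    while len(results) < len(requests):
        for job_id, output in device.poll():
            results[job_id] = output
    return results


if __name__ == "__main__":
    print(run_batch(MockAccelerator(), {1: [1.0, 2.0], 2: [3.0]}))
```

The point of the pattern is that submission does not block: the caller can queue several requests, or do other work, and collect results as they complete.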

Example C programs that use the Vollo runtime API are included in the installation, in the example/ directory.

To use the Vollo runtime, you need an accelerator set up:

Programmed Intel Agilex or AMD V80 accelerator
A loaded kernel driver and an installed license
Environment set up with source setup.sh

Python API

The Vollo SDK includes Python bindings for the Vollo runtime. These can be more convenient than the C API for tasks such as testing Vollo against PyTorch models.

The API for the Python bindings can be found here.

A small example of using the Python bindings is provided here.
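As a rough illustration of what testing accelerator output against a reference model can involve, the sketch below compares two output vectors within a tolerance. It is independent of the Vollo bindings and of PyTorch: the helper names, the sample values, and the tolerance defaults are all assumptions made for this example, and the right tolerance depends on the model and on the numeric precision in use.

```python
import math


def max_abs_error(actual, expected):
    """Largest element-wise absolute difference between two output vectors."""
    if len(actual) != len(expected):
        raise ValueError("output lengths differ")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def outputs_match(actual, expected, rel_tol=1e-2, abs_tol=1e-2):
    """True if every element agrees within the given (assumed) tolerances."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    reference = [0.5, -1.25, 3.0]         # e.g. from a reference model (made-up values)
    accelerated = [0.501, -1.248, 2.996]  # e.g. from the accelerator (made-up values)
    print("max abs error:", max_abs_error(accelerated, reference))
    print("match:", outputs_match(accelerated, reference))
```

In a real test, the two lists would come from running the same inputs through the reference model and through the accelerator.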
