Key Features

Vollo accelerates machine learning inference for low-latency streaming models of the kind typically found in financial trading or fraud detection systems, for applications such as:

Market predictions
Risk analysis
Anomaly detection
Portfolio optimisation

Vollo can process a range of models, including models that maintain state while streaming, such as recurrent and convolutional models.
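To make "maintaining state while streaming" concrete, here is a minimal pure-Python sketch of a causal 1D convolution that consumes one sample per call and keeps its recent inputs between calls. The names `StreamingConv1d` and `batch_conv1d` are hypothetical and chosen for illustration only; this is not Vollo's API or implementation, just the general idea that a streaming model carries state from one inference to the next.

```python
from collections import deque


class StreamingConv1d:
    """Illustrative causal 1D convolution that consumes one sample per call."""

    def __init__(self, weights):
        # weights[0] multiplies the oldest sample in the window,
        # weights[-1] multiplies the newest one.
        self.weights = list(weights)
        # State carried between calls: the most recent inputs, zero-padded.
        self.window = deque([0.0] * len(self.weights), maxlen=len(self.weights))

    def step(self, x):
        # Appending to a full deque drops the oldest sample automatically.
        self.window.append(float(x))
        return sum(w * v for w, v in zip(self.weights, self.window))


def batch_conv1d(weights, xs):
    """Reference: the same convolution computed over the whole sequence at once."""
    k = len(weights)
    padded = [0.0] * (k - 1) + [float(x) for x in xs]
    return [
        sum(w * v for w, v in zip(weights, padded[i:i + k]))
        for i in range(len(xs))
    ]


if __name__ == "__main__":
    conv = StreamingConv1d([0.5, 0.25, 0.25])
    samples = [1, 2, 3, 4]
    print([conv.step(x) for x in samples])
    print(batch_conv1d([0.5, 0.25, 0.25], samples))
```

Feeding samples one at a time produces the same outputs as processing the whole sequence in one pass, but each call only touches the small window held as state. A recurrent model follows the same pattern, with a hidden state in place of the input window.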

Key characteristics of Vollo are:

Low-latency inference of machine learning models, typically between 2 and 10 μs.
High-accuracy inference through use of the Brain Floating Point 16 (bfloat16) numerical format.
High-density processing in a 1U server form factor suitable for co-located server deployment.
Compiles a range of PyTorch models for use on the accelerator.
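
For readers unfamiliar with the bfloat16 format mentioned above, the following sketch shows what it stores: the sign bit, all 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32 value. The helper `to_bfloat16` is a hypothetical name, and the round-to-nearest-even conversion shown is a common software approach, assumed here for illustration; it is not a description of how Vollo converts values. The sketch handles only finite inputs within float32 range.

```python
import struct


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value, returned as a float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that will be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    # Reinterpret the truncated bit pattern as a float32 again.
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 0.1):
        print(value, "->", to_bfloat16(value))
```

Because bfloat16 keeps the full float32 exponent, it covers the same dynamic range as float32 while carrying fewer significant bits; for example, 3.14159265 is stored as 3.140625.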

Vollo SDK User Guide