Key Features
Vollo accelerates machine learning inference for low-latency streaming models, such as those typically found in financial trading or fraud detection systems:
- Market predictions
- Risk analysis
- Anomaly detection
- Portfolio optimisation
Vollo can process a range of models, including models that maintain state while streaming, such as convolutional models.
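To illustrate what "maintaining state while streaming" means for a convolutional model, here is a minimal sketch of a causal 1D convolution that consumes one input sample at a time. Its state is a window holding the most recent inputs, so each new output needs only the newest sample rather than a recomputation over the whole history. The class `StreamingConv1d` and the helper `batch_causal_conv` are hypothetical names for illustration only; they are not part of any Vollo API and say nothing about how Vollo implements stateful models internally.

```python
from collections import deque
from typing import List, Sequence


class StreamingConv1d:
    """Causal 1D convolution evaluated one sample at a time.

    Hypothetical illustration, not a Vollo API. The state carried between
    calls is the window of the len(weights) most recent input samples.
    """

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias
        # Zero-initialised window acts as implicit causal (left) padding.
        self.window = deque([0.0] * len(self.weights), maxlen=len(self.weights))

    def step(self, x: float) -> float:
        # Append the newest sample; the oldest one is evicted automatically.
        self.window.append(x)
        # weights[0] multiplies the newest sample, weights[-1] the oldest.
        return self.bias + sum(
            w * v for w, v in zip(self.weights, reversed(self.window))
        )


def batch_causal_conv(
    xs: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> List[float]:
    """Reference: the same causal convolution computed over a full sequence."""
    out = []
    for t in range(len(xs)):
        acc = bias
        for k, w in enumerate(weights):
            if t - k >= 0:
                acc += w * xs[t - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    conv = StreamingConv1d([0.5, 0.25, 0.25])
    stream = [1.0, 2.0, 3.0, 4.0]
    print([conv.step(x) for x in stream])  # [0.5, 1.25, 2.25, 3.25]
    print(batch_causal_conv(stream, [0.5, 0.25, 0.25]))  # same values
```

The streaming and batch versions produce the same outputs, but the streaming version does a fixed amount of work per sample, which is the property that matters for low-latency inference on a live data feed.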
Key characteristics of Vollo are:
- Low-latency inference of machine learning models, typically between 5 and 10 μs.
- High-accuracy inference using the Brain Floating Point 16 (bfloat16) numerical format.
- High-density processing in a 1U server form factor, suitable for co-located server deployment.
- Compiles a range of PyTorch models for use on the accelerator.
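As background on the number format: bfloat16 keeps the sign bit and all 8 exponent bits of an IEEE 754 float32 but only the top 7 of its 23 mantissa bits. It therefore has the same dynamic range as float32 with about 2 to 3 significant decimal digits of precision. The sketch below rounds a value (first narrowed to float32) to bfloat16 using round-to-nearest-even. It illustrates the format only; the function names are hypothetical, and it makes no claim about how Vollo converts values internally.

```python
import math
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Round x (after narrowing to float32) to bfloat16, returned as 16 raw bits.

    Uses round-to-nearest-even on the 16 discarded low bits. NaN handling is
    simplified: every NaN maps to a canonical quiet NaN.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return 0x7FC0  # canonical quiet NaN
    # Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen 16 bfloat16 bits to a float by zero-filling the low 16 mantissa bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return value


def to_bfloat16(x: float) -> float:
    """Round-trip a value through bfloat16."""
    return bfloat16_bits_to_float(float_to_bfloat16_bits(x))


if __name__ == "__main__":
    print(to_bfloat16(math.pi))  # 3.140625
    print(to_bfloat16(0.1))      # 0.10009765625
    print(to_bfloat16(1e38))     # stays finite: same exponent range as float32
```

Because the exponent field is unchanged from float32, values such as `1e38` remain representable, unlike in IEEE float16 (whose largest finite value is 65504). The cost is fewer mantissa bits, as the `math.pi` and `0.1` examples show.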