Release Notes
23.0.0
This is an alpha release adding support for the Alveo V80; it is not intended for production use, and performance on the V80 is not yet optimized.
- Add support for Alveo V80
- Add support for vfio-pci; use `load-kernel-driver.sh vfio` to load it (required for V80)
- Add lock to Vollo RT to prevent concurrent usage of the accelerator
- Add support for broadcasting non-constant tensors except along the data dimension
- Add grouped convolution support to `vollo_torch.nn.PaddedConv1d`
- Add support for reshape operations
- Improve VM cycle count estimates for Agilex devices
- Changes to the API of `vollo_torch.nn.Scan`: the `step` function now returns an output tensor and a separate state tensor instead of a single tensor; the `forward` method now takes both an `input_axis` and an `output_axis` instead of a single `axis` argument
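The new `step` contract can be pictured with a plain-Python scan. This is only an illustration of the output/state split; `scan` and `running_sum_step` here are hypothetical helpers, not part of `vollo_torch.nn.Scan`:

```python
def scan(step, xs, state):
    # Thread state through the sequence, collecting one output per element.
    outputs = []
    for x in xs:
        out, state = step(x, state)  # step returns (output, state)
        outputs.append(out)
    return outputs, state

# Illustrative step: the emitted output differs from the carried state,
# which a single-tensor return could not express.
def running_sum_step(x, acc):
    acc = acc + x
    return acc * 2, acc

outs, final = scan(running_sum_step, [1, 2, 3], 0)
# outs == [2, 6, 12], final == 6
```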
22.1.0
- Update compatibility with newer IA420F boards (IA420F-0015)
22.0.1
- Allow weights to be shared in multi-model programs
22.0.0
- Add support for compiling models with multiple input tensors and multiple output tensors
- Improve accuracy of LSTM unit
- Change behavior of `VOLLO_FP32_ROUND` in Vollo RT so that it's enabled by default; set to 0 to truncate f32 inputs
- Change `vollo-tool reset --pci` to `vollo-tool reset-pci`
- Expand supported PyTorch stacking and concatenating operations: `concatenate`, `stack`, `vstack`, `hstack`, `row_stack`, `column_stack`, `dstack`
- Expand supported PyTorch transposition operations: `permute`, `swapdims`, `swapaxes`, `t`, `T`, `mT`
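The round-versus-truncate distinction behind `VOLLO_FP32_ROUND` can be sketched in NumPy. This shows f32-to-bf16 conversion in general (truncation drops the low 16 bits; rounding is round-to-nearest-even), not Vollo's actual implementation:

```python
import numpy as np

def f32_to_bf16_truncate(x):
    # Truncate: drop the low 16 bits of each f32 value.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def f32_to_bf16_round(x):
    # Round to nearest even before dropping the low 16 bits.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    bias = np.uint32(0x7FFF) + ((bits >> np.uint32(16)) & np.uint32(1))
    return ((bits + bias) & np.uint32(0xFFFF0000)).view(np.float32)

# Rounding keeps the result at least as close to the input as truncation.
t = f32_to_bf16_truncate(np.float32(0.1))
r = f32_to_bf16_round(np.float32(0.1))
```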
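The newly supported stacking and concatenating operations follow the same shape conventions as their NumPy namesakes, which serve as a quick reference (NumPy is used here purely for illustration):

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

print(np.stack([a, b]).shape)         # (2, 2, 3) - new leading axis
print(np.concatenate([a, b]).shape)   # (4, 3)    - join along axis 0
print(np.vstack([a, b]).shape)        # (4, 3)    - row_stack is an alias
print(np.hstack([a, b]).shape)        # (2, 6)
print(np.column_stack([a, b]).shape)  # (2, 6)
print(np.dstack([a, b]).shape)        # (2, 3, 2) - stack along depth
```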
21.1.1
- Initial support for `torch.nn.LSTM`
- Performance improvements in VM simulation, especially for LSTMs
- Improve error messages from `vollo_torch.fx.nnir.to_nnir` for unsupported field accesses (`getattr`) in PyTorch models
- Add `f32_round` argument to `vollo_compiler.VM.run` methods to choose whether to round or truncate f32 inputs (previously always rounded)
- Fix handling of non-contiguous input arrays/tensors in Vollo RT Python bindings
- Fix bug in `streaming_transform` for tensor sum reductions
21.1.0
- Support tanh
21.0.0
- Runtime/bitstream optimisation for small inputs (using MMIO instead of DMA)
- Scheduling and architecture optimisations
- Add `reset` subcommand to `vollo-tool`
- Support ReLU via `torch.relu`
20.0.3
- Separate bitstreams from Vollo SDK
- Add c2b64d hw config to support models up to 8M parameters (bitstream and compiler)
- Improve compiler error messages
- Fix example build
20.0.2
- Fix for incorrect `vollo_rt_accelerator_num_cores` introduced in 20.0.1
20.0.1
- Add `vollo_rt_add_vm` to test the `vollo-rt` API without an accelerator
- Add `vollo_rt_load_program_from_buffer` and `vollo_compiler.Program.{save,load}_bytes`
- Add `vollo_torch.nn.RecurrentStateLSTM` for modelling streaming LSTM models across forward passes
- Codegen fix for `vollo_torch.nn.Scan`
- Fix incorrect input validation for `torch.sum` layers
- Change vollo-rt example to compile with older C compilers
20.0.0
- Add support for LayerNorm
- Add support for RMSNorm
- Add support for sqrt operations (`torch.sqrt` and `torch.rsqrt`)
- Add support for summing over the data dimension
- Add `cycle_count_per_inference` and `compute_duration_per_inference_us` Program methods
- Add support for a wider range of torch arithmetic operation aliases
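The two new Program methods are related by the accelerator clock; a minimal sketch of that relationship, assuming a purely illustrative 320 MHz clock (the real frequency depends on the bitstream):

```python
# Illustrative only: convert a cycle count into a compute duration in
# microseconds, assuming a hypothetical 320 MHz accelerator clock.
CLOCK_MHZ = 320.0

def compute_duration_us(cycle_count):
    # cycles divided by cycles-per-microsecond
    return cycle_count / CLOCK_MHZ

print(compute_duration_us(3200))  # 10.0 us at 320 MHz
```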
19.2.3
- Downgrade glibc dependency to support systems with glibc >=2.17
19.2.2
- Add support for
torch.div,torch.Tensor.div - Fix compiler code generation bug for division
19.2.1
- Add support for scalars on the left of division
- Add support for `Reciprocal` node in ONNX frontend
19.2.0
- Add support for division by non-constant tensors
- Fix slicing in ONNX frontend
19.1.1
- Fix compiler bug in constant folding
19.1.0
- Add support for partial updates of input data on the accelerator
- VM simulates Vollo accelerator bit-accurately: `bf16_precision` argument renamed to `bit_accurate` and enabled by default
- `vollo-tool` includes license self-service
- Performance improvements due to DMA optimization
18.0.2
- Add `optimize_transforms` option to the compiler to improve program schedule in some cases
18.0.1
- Add fallback to Vollo RT and vollo-tool for when AVX is not available
18.0.0
- Vollo RT support for using raw DMA buffers to skip IO copy
- Remove redundant/noisy Vollo RT warnings on error: it is the user's responsibility to check returned errors
- Compiler optimization for Where nodes
- Compiler scheduling optimizations
- Vollo IP Core public documentation
0.17.1
- Fix vollo-tool compatibility with older bitstreams
0.17.0
- New DMA engine that reduces IO latencies by ~1.3us
- Initial support for non-streaming LSTM
0.16.0
- Vollo IP Core now available on request
- Add C library for configuring IP Core: `vollo-cfg`
- Support for slicing/concatenation in the middle of models
- Support for BatchNorm nodes
- Support for Scan/LSTMCell nodes
- Add `--io-only` option to `vollo-onnx`
- Add `program-metadata` command to `vollo-tool`
- Fix compiler bug with transposing streaming dimension
- Fix accelerator bug in initial state of streaming models
0.15.0
- Accelerator bug fix
0.14.0
- Support for filtering dropout layers
- Instruction packing improvements
- LSTM performance improvement
- Improvements to weight sharing
0.13.0
- Support for multi-model programs
- Provide Python bindings to Vollo RT: `vollo_rt`
- Improved support and error messages for tensor indexing in compiler
- The unweave transform is now automatic
0.12.2
- Support for LSTM nodes in ONNX frontend
- Support for squeezing, unsqueezing, reduce sum, using the `unweave` transformation
- Improved error reporting in `vollo_torch` lowering to NNIR
0.12.1
- `vollo-torch`: fix type hints being incompatible with Python 3.7/3.8
- `vollo-rt.h`: fix namespacing issue (`error_t` -> `vollo_rt_error_t`)
- Runtime optimisations
- Added IO only benchmarks
0.12.0
- Initial support for ONNX models in compiler
- Support for LSTM nodes
- Improved error reporting in compiler
- Compiler API changes
- New runtime API with access to model metadata
- HW optimisations (pointwise operations)
- IA840F support
0.10.1
- Support for scalar (`int`, `float`) literals in pointwise operations in `vollo-torch`
0.10.0
- Architectural changes in bitstream to support compiler
- Reduced latency from reduced core to core communication in the bitstream
- Add general model compiler and VM simulation with Python bindings in `vollo-python`
- Add PyTorch frontend to model compiler in `vollo-torch`