- Add support for compiling models with multiple input tensors and multiple
output tensors
- Improve accuracy of LSTM unit
- Change behavior of VOLLO_FP32_ROUND in Vollo RT so that it's enabled by
default; set to 0 to truncate f32 inputs
- Change `vollo-tool reset --pci` to `vollo-tool reset-pci`
- Expand supported PyTorch stacking and concatenating operations:
`concatenate`, `stack`, `vstack`, `hstack`, `row_stack`, `column_stack`,
`dstack`
- Expand supported PyTorch transposition operations: `permute`, `swapdims`,
`swapaxes`, `t`, `T`, `mT`
- Initial support for `torch.nn.LSTM`
- Performance improvements in VM simulation, especially for LSTMs
- Improve error messages from `vollo_torch.fx.nnir.to_nnir` for unsupported
field accesses (`getattr`) in PyTorch model
- Add `f32_round` argument to `vollo_compiler.VM.run` methods to choose whether
to round or truncate f32 inputs (previously always rounded)
- Fix handling of non-contiguous input arrays/tensors in Vollo RT Python
bindings
- Fix bug in `streaming_transform` for tensor sum reductions
- Runtime/bitstream optimisation for small inputs (using MMIO instead of DMA)
- Scheduling and architecture optimisations
- Add `reset` subcommand to `vollo-tool`
- Support ReLU via `torch.relu`
- Separate bitstreams from Vollo SDK
- Add c2b64d hw config to support models up to 8M parameters (bitstream and compiler)
- Improve compiler error messages
- Fix example build
- Fix for incorrect `vollo_rt_accelerator_num_cores` introduced in 20.0.1
- `vollo_rt_add_vm` to test the `vollo-rt` API without an accelerator
- `vollo_rt_load_program_from_buffer` and
`vollo_compiler.Program.{save,load}_bytes`
- Add `vollo_torch.nn.RecurrentStateLSTM` for modelling streaming LSTM models
across forward passes
- Codegen fix for `vollo_torch.nn.Scan`
- Fix incorrect input validation for `torch.sum` layers
- Change vollo-rt example to compile with older C compilers
- Add support for LayerNorm
- Add support for RMSNorm
- Add support for sqrt operations (`torch.sqrt` and `torch.rsqrt`)
- Add support for summing over the data dimension
- Add `cycle_count_per_inference` and `compute_duration_per_inference_us`
Program methods
- Add support for a wider range of torch arithmetic operation aliases
- Downgrade glibc dependency to support systems with glibc >=2.17
- Add support for `torch.div`, `torch.Tensor.div`
- Fix compiler code generation bug for division
- Add support for scalars on the left of division
- Add support for `Reciprocal` node in ONNX frontend
- Add support for division by non-constant tensors
- Fix slicing in ONNX frontend
- Fix compiler bug in constant folding
- Add support for partial updates of input data on the accelerator
- VM simulates Vollo accelerator bit-accurately: `bf16_precision` argument
renamed to `bit_accurate` and enabled by default
- `vollo-tool` includes license self-service
- Performance improvements due to DMA optimization
- Add `optimize_transforms` option to the compiler to improve program schedule
in some cases
- Add fallback to Vollo RT and vollo-tool for when AVX is not available
- Vollo RT support for using raw DMA buffers to skip IO copy
- Vollo RT remove redundant/noisy warnings on error: it is the user's responsibility to check returned errors
- Compiler optimization for Where nodes
- Compiler scheduling optimizations
- Vollo IP Core public documentation
- Fix vollo-tool compatibility with older bitstreams
- New DMA engine that reduces IO latencies by ~1.3us
- Initial support for non-streaming LSTM
- Vollo IP Core now available on request
- Add C library for configuring IP Core: `vollo-cfg`
- Support for slicing/concatenation in the middle of models
- Support for BatchNorm nodes
- Support for Scan/LSTMCell nodes
- Add `--io-only` option to `vollo-onnx`
- Add `program-metadata` command to `vollo-tool`
- Fix compiler bug with transposing streaming dimension
- Fix accelerator bug in initial state of streaming models
- Support for filtering dropout layers
- Instruction packing improvements
- LSTM performance improvement
- Improvements to weight sharing
- Support for multi-model programs
- Provide Python bindings to Vollo RT: `vollo_rt`
- Improved support and error messages for tensor indexing in compiler
- The unweave transform is now automatic
- Support for LSTM nodes in ONNX frontend
- Support for squeezing, unsqueezing, reduce sum, using `unweave`
transformation
- Improved error reporting in `vollo_torch` lowering to NNIR
- `vollo-torch` fix type hints being incompatible with Python 3.7/3.8
- `vollo-rt.h` fix namespacing issue (`error_t` -> `vollo_rt_error_t`)
- Runtime optimisations
- Added IO only benchmarks
- Initial support for ONNX models in compiler
- Support for LSTM nodes
- Improved error reporting in compiler
- Compiler API changes
- New runtime API with access to model metadata
- HW optimisations (pointwise operations)
- IA840F support
- Support for scalar (`int`, `float`) literals in pointwise operations in
`vollo-torch`.
- Architectural changes in bitstream to support compiler
- Reduced latency from reduced core to core communication in the bitstream
- Add general model compiler and VM simulation with Python bindings in
`vollo-python`
- Add PyTorch frontend to model compiler in `vollo-torch`
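
The round-or-truncate choice for f32 inputs mentioned in the VOLLO_FP32_ROUND and `f32_round` entries can be illustrated with a small standalone sketch. It assumes the target format is bfloat16 (suggested by the old `bf16_precision` argument name, but not stated in these notes) and that "round" means round-to-nearest, ties-to-even. Both are assumptions, and the helper names below are hypothetical: they are not part of any Vollo API.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret the low 32 bits of an int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def f32_to_bf16_truncate(x: float) -> float:
    """Hypothetical helper: keep the top 16 bits, dropping the low 16 bits."""
    return bits_to_f32(f32_bits(x) & 0xFFFF0000)


def f32_to_bf16_round(x: float) -> float:
    """Hypothetical helper: round to nearest, ties to even (assumed mode).

    NaN and infinity handling is omitted to keep the sketch short.
    """
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1           # lowest bit that survives the conversion
    rounded = bits + 0x7FFF + lsb    # a tie rounds up only when lsb is 1
    return bits_to_f32(rounded & 0xFFFF0000)


if __name__ == "__main__":
    x = 1.005859375  # 1 + 2**-8 + 2**-9, exactly representable in f32
    print(f32_to_bf16_truncate(x))  # truncation discards the fraction below 2**-7
    print(f32_to_bf16_round(x))     # rounding moves to the nearer bf16 value
```

Under these assumptions the two modes differ by at most one bfloat16 step, which is why the choice only matters for inputs that are not already exactly representable in the narrower format.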