Add experimental support for specifying which cores PyTorch operations are
allocated to, using vollo_torch.CorePartition
Add support for specifying which cores each model in a multi-model program is
allocated to, by passing core_indices to vollo_compiler.ProgramBuilder.add_nnir
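A sketch of multi-model core placement, assuming the vollo_compiler package is available (the hardware config preset, the nnir_a/nnir_b values, and the specific core numbers are illustrative, not prescribed by this entry):

```python
import vollo_compiler

# Build a two-model program, pinning each model to its own cores.
config = vollo_compiler.Config.ia_420f_c6b32()  # example hardware config preset
builder = vollo_compiler.ProgramBuilder(config)
builder.add_nnir(nnir_a, core_indices=[0, 1])  # model A on cores 0 and 1
builder.add_nnir(nnir_b, core_indices=[2])     # model B on core 2
program = builder.to_program()
```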
Optimize sigmoid and SiLU activation functions
Improve spaced latency for some stateful models that use dynamic weights
Reduce tensor RAM usage of state in stateful models
vollo_torch.Fp8Weights now raises an error when used on operations that require
bf16 weights, such as dynamic weights
Add Alveo V80LL bitstream and vollo_compiler.Config.v80ll_c6b32 hardware
config
Add support for Linear layers where the contracted dimension is not the
data dimension via the allow_dynamic_weights flag for
vollo_compiler.NNIR.to_program
Add support for multiple inputs to vollo_torch.nn.Scan
Add support for negative dimension indices in: torch.stack,
torch.sum, torch.permute, torch.squeeze, torch.unsqueeze
Add support for torch.nn.functional.linear
Add optional bias argument to vollo_torch.nn.PaddedConv1d
Add inputs_precisions and output_precisions arguments to
vollo_torch.fx.nnir.to_nnir
vollo-compiler:
Add model_input_number_format and model_output_number_format methods
to vollo_compiler.Program
Add vollo_compiler.NumberFormat enum
vollo-rt C/C++ API:
Add vollo_rt_add_job, vollo_rt_add_job_partial_update,
vollo_rt_model_input_format, vollo_rt_model_output_format,
vollo_rt_get_raw_buffer_bytes functions and number_format enum
vollo-rt Python bindings:
Add add_job, add_job_f32, model_output_format methods to
vollo_rt.VolloRTContext
Improve memory usage and compilation time in the compiler
Add quick_compile flag to vollo_compiler.NNIR.to_program for faster
compilation
Add max_sparse_entries option to vollo_compiler.NNIR.to_program to
configure the number of nonzero entries allowed in weights for non-standard
memory format MatMuls
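Taken together, the new vollo_compiler.NNIR.to_program options can be sketched as follows (a hedged illustration assuming the vollo_compiler package and an existing nnir value; the flag names come from the entries above, while the config preset and the value 1024 are illustrative):

```python
import vollo_compiler

config = vollo_compiler.Config.v80ll_c6b32()  # hardware config added in this release
program = nnir.to_program(
    config,
    allow_dynamic_weights=True,  # Linear layers whose contracted dim is not the data dim
    quick_compile=True,          # faster compilation
    max_sparse_entries=1024,     # illustrative cap on nonzero weight entries for
                                 # non-standard memory format MatMuls
)
```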
Add token-info subcommand to vollo-tool license to show information about
a purchase token
Add an info message to vollo-tool license redeem-device if the device being
redeemed has previously been redeemed on an expired or nearly-expired token
Add initial support for Alveo V80; further performance optimisations are still outstanding
Add support for Napatech NT400D11
Add support for vfio-pci, which is required for the V80; use
load-kernel-driver.sh vfio to load it
Add lock to Vollo RT to prevent concurrent usage of the accelerator
Improve VM cycle count estimates for Agilex devices
Additional model support:
Add support for broadcasting non-constant tensors except along the data dimension
Add grouped convolution support to vollo_torch.nn.PaddedConv1d
Add support for reshape operations
Changes to the API of vollo_torch.nn.Scan: the step function now returns
an output tensor and a separate state tensor instead of a single tensor, and
the forward method now takes both an input_axis and an output_axis instead
of a single axis argument
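The new step contract can be illustrated with a plain-Python analogue (a toy sketch, not vollo_torch.nn.Scan itself): the step function returns an (output, state) pair rather than one tensor serving both roles.

```python
# Toy scan illustrating the new step contract: step(state, x) -> (output, new_state).
# Plain-Python sketch only; vollo_torch.nn.Scan operates on tensors.
def scan(step, init_state, xs):
    state = init_state
    outputs = []
    for x in xs:
        out, state = step(state, x)  # output and state are now separate values
        outputs.append(out)
    return outputs, state

# Running sum: here the emitted output happens to equal the new state,
# but the two are returned separately.
outs, final = scan(lambda s, x: (s + x, s + x), 0, [1, 2, 3])
# outs == [1, 3, 6], final == 6
```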