vollo_compiler
- class vollo_compiler.Config
A Vollo accelerator configuration.
Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. number of cores, size of each core, etc.
Programs need to be compiled for the accelerator configuration that they will be run on. For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().
- static ia_420f_c6b32()
IA-420f configuration with 6 cores and block size 32
- static ia_840f_c3b64()
IA-840f configuration with 3 cores and block size 64
- save(json_path: str) → None
Save a hardware configuration to a JSON file
- static load(json_path: str, check_version_matches: bool = True) → Config
Load a hardware configuration from a JSON file
- Parameters:
json_path – Path to the JSON file
check_version_matches – if the provided JSON file is versioned, check that the version matches the current version
- block_size
Size of the tensor blocks
- num_cores
Number of cores
- tensor_ram_depth
Amount of tensor RAM per-core (in blocks)
- tensor_descriptor_count
Maximum number of tensors per core
- weight_store_depth
Amount of weight store per-core (in blocks)
- accum_store_depth
Amount of accumulator store per-core (in blocks)
- cell_state_depth
Amount of LSTM cell state per-core (in blocks)
- clamp_store_depth
Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)
- max_read_size
Maximum size of data that instructions can perform operations on (in blocks)
- io_size
Minimum size of IO packet (in values)
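A minimal usage sketch of these configuration entry points, assuming the vollo_compiler package is installed (the file name is a placeholder):

```python
# Hypothetical usage sketch; assumes the Vollo SDK's Python
# package is available.
import vollo_compiler

# Pick the preset matching your bitstream, e.g. the IA-420f one.
config = vollo_compiler.Config.ia_420f_c6b32()

# Round-trip the configuration through a JSON file.
config.save("config.json")
config2 = vollo_compiler.Config.load("config.json", check_version_matches=True)

# Inspect a few of the documented attributes.
print(config.num_cores, config.block_size)
```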
- class vollo_compiler.NNIR
Neural Network Intermediate Representation
The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using vollo_torch.fx.nnir.to_nnir(), or from an ONNX model.
- static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) → NNIR
Load an ONNX model from a file and convert it to an NNIR graph
- streaming_transform(streaming_axis: int) → NNIR
Performs the streaming transform, converting the NNIR to a streaming model
- to_program(config: Config, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32) → Program
Compile an NNIR graph to a Program. Note that the NNIR model given must be a streaming model.
- Parameters:
config – The hardware configuration to compile the program for
optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not
output_buffer_capacity – The size of the output buffer in the VM (only used when optimize_transforms is true)
write_queue_capacity – The size of the write queue in the VM (only used when optimize_transforms is true)
name – The name of the program
- __new__(**kwargs)
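An end-to-end sketch of the NNIR pipeline described above, assuming an ONNX file is available (the file name, streaming axis, and program name are placeholders):

```python
# Hypothetical pipeline sketch; assumes the Vollo SDK's Python
# package is available and "model.onnx" exists.
import vollo_compiler

# Load an ONNX model into NNIR.
nnir = vollo_compiler.NNIR.from_onnx("model.onnx", overwrite_input_shape=None)

# to_program() requires a streaming model, so apply the streaming
# transform first (streaming over axis 0 is an assumption about
# this particular model).
streaming_nnir = nnir.streaming_transform(0)

config = vollo_compiler.Config.ia_420f_c6b32()
program = streaming_nnir.to_program(config, name="my-model")
```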
- class vollo_compiler.Program
A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.
- compute_duration_per_inference_us(clock_mhz: int = 320, write_queue_capacity: int = 32, output_buffer_capacity: int = 64, model_index: int = 0) → float
Translate the program’s cycle count per inference to a figure in microseconds by dividing it by the clock speed.
- Parameters:
clock_mhz – Clock speed of the Vollo accelerator in MHz.
- cycle_count_per_inference(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, model_index: int = 0) → int
The number of cycles the program performs in one inference.
- static io_only_test(config: Config, input_values: int, output_values: int) → Program
Make a new program that does no compute and arranges IO such that output only starts when all the input is available on the accelerator
- model_input_shape(model_index: int = 0, input_index: int = 0) → Tuple[int]
Get the shape of the input at input_index in the model at model_index
- model_input_streaming_dim(model_index: int = 0, input_index: int = 0) → Optional[int]
Get the streaming dimension of the input at input_index in the model at model_index, if there is one
- model_num_inputs(model_index: int = 0) → int
Get the number of inputs the model at model_index uses.
- model_num_outputs(model_index: int = 0) → int
Get the number of outputs the model at model_index uses.
- model_output_shape(model_index: int = 0, output_index: int = 0) → Tuple[int]
Get the shape of the output at output_index in the model at model_index
- model_output_streaming_dim(model_index: int = 0, output_index: int = 0) → Optional[int]
Get the streaming dimension of the output at output_index in the model at model_index, if there is one
- num_models() → int
The number of models in the program.
- save(output_path: str)
Save the program to a file
- save_bytes() → bytes
Serialize the program to bytes
- to_vm(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, bit_accurate: bool = True) → VM
Construct a stateful Virtual Machine for simulating a Vollo Program
- Parameters:
bit_accurate (bool) – Use a compute model that replicates the Vollo accelerator with bit accuracy. Disable to use single-precision compute. Defaults to True.
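A sketch of simulating a compiled program with the VM; `program` is assumed to be a Program obtained from NNIR.to_program(), and the zero-filled input and float32 dtype are illustrative assumptions:

```python
# Hypothetical simulation sketch; `program` is assumed to be a
# vollo_compiler.Program compiled earlier.
import numpy as np

vm = program.to_vm(bit_accurate=True)

# Build a dummy input matching the model's declared input shape
# (float32 here is an assumption about the expected dtype).
x = np.zeros(program.model_input_shape(model_index=0, input_index=0),
             dtype=np.float32)
y = vm.run(x, model_index=0)

# Estimate latency from the simulated cycle count.
print(vm.compute_duration_us(clock_mhz=320))
```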
- class vollo_compiler.ProgramBuilder
Tracks an internal list of NNIRs which can be compiled into a single multi-model program
- add_nnir(nnir: NNIR, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32)
Adds a model program compiled from an NNIR to the ProgramBuilder
- Parameters:
nnir – The NNIR to add
optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not
output_buffer_capacity – The size of the output buffer in the VM (only used if optimize_transforms is true)
write_queue_capacity – The size of the write queue in the VM (only used if optimize_transforms is true)
name – The name of the model
- to_program(config: Config) → Program
Builds a program from the internal NNIRs
- Parameters:
config – The config describing resources available for the final program
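The builder might be used like this to produce a multi-model program; `nnir_a` and `nnir_b` are placeholders for streaming NNIR models built as described above:

```python
# Hypothetical multi-model sketch; `nnir_a` and `nnir_b` are
# assumed to be streaming NNIR models.
import vollo_compiler

builder = vollo_compiler.ProgramBuilder()
builder.add_nnir(nnir_a, name="model-a")
builder.add_nnir(nnir_b, name="model-b", optimize_transforms=True)

config = vollo_compiler.Config.ia_840f_c3b64()
program = builder.to_program(config)
assert program.num_models() == 2
```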
- class vollo_compiler.Metrics
Static metrics of a program.
- clamp_store_depth
Total amount of clamp store available on each core
- clamp_store_used
Amount of clamp store used by the program on each core
- input_bytes
Number of bytes input per-inference for each model
- model_names
The name of each model if specified
- num_instrs
Number of instructions on each core
- output_bytes
Number of bytes output per-inference for each model
- tensor_ram_depth
Total amount of tensor RAM available on each core
- tensor_ram_used
Tensor RAM used by the program on each core
- weight_store_depth
Total amount of weight store available on each core
- weight_store_used
Amount of weight store used by the program on each core
- class vollo_compiler.VM
A wrapper around a
Program
and the state of a VM- compute_duration_us(clock_mhz: int = 320) float
Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.
- Parameters:
clock_mhz – Clock speed of the Vollo accelerator in MHz.
- cycle_count() → int
The number of cycles that have been performed so far on the VM across all inferences.
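Since a clock of clock_mhz MHz completes clock_mhz cycles per microsecond, the cycles-to-microseconds conversion that compute_duration_us performs reduces to a single division. A pure-Python sketch of that arithmetic (not the SDK implementation):

```python
def cycles_to_us(cycle_count: int, clock_mhz: int = 320) -> float:
    # clock_mhz MHz == clock_mhz cycles per microsecond,
    # so microseconds = cycles / clock_mhz.
    return cycle_count / clock_mhz

# e.g. 3200 cycles at 320 MHz take 10 microseconds
print(cycles_to_us(3200, 320))  # -> 10.0
```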
- run(input: numpy.ndarray, model_index: int = 0) → numpy.ndarray
Run the VM on a shaped input.
- run_flat(input: numpy.ndarray, model_index: int = 0) → numpy.ndarray
Run the VM on a 1D input.
- run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) → numpy.ndarray
Run the VM on multiple timesteps of input.
- Parameters:
input_timestep_dim (int) – The dimension over which to split the input into timesteps.
output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.
- run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) → numpy.ndarray
Run the VM on multiple timesteps with a shaped input.
- exception vollo_compiler.AllocationError
Failed to allocate memory during compilation.
This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.
- exception vollo_compiler.SaveError
Failed to save program.
- exception vollo_compiler.LoadError
Failed to load program.
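A sketch of handling AllocationError when a model does not fit the target configuration; `streaming_nnir` is a placeholder for a streaming NNIR model, and falling back to the larger preset is an illustrative strategy, not a recommendation from the SDK:

```python
# Hypothetical error-handling sketch; `streaming_nnir` is assumed
# to be a streaming NNIR model.
import vollo_compiler

config = vollo_compiler.Config.ia_420f_c6b32()
try:
    program = streaming_nnir.to_program(config)
except vollo_compiler.AllocationError:
    # The model needs more weight/tensor storage than this
    # configuration provides; retry with a larger configuration.
    program = streaming_nnir.to_program(vollo_compiler.Config.ia_840f_c3b64())
```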