vollo_compiler
- class vollo_compiler.Config
A Vollo accelerator configuration.
Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. number of cores, size of each core, etc.
Programs need to be compiled for the accelerator configuration that they will be run on. For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().
- static ia_420f_c6b32()
IA-420f configuration with 6 cores and block size 32
- static ia_840f_c3b64()
IA-840f configuration with 3 cores and block size 64
- save(json_path: str) None
Save a hardware configuration to a JSON file
- static load(json_path: str, check_version_matches: bool = True) Config
Load a hardware configuration from a JSON file
- Parameters:
json_path – Path to the JSON file
check_version_matches – if the provided JSON file is versioned, check that the version matches the current version
- block_size
Size of the tensor blocks
- num_cores
Number of cores
- tensor_ram_depth
Amount of tensor RAM per-core (in blocks)
- tensor_descriptor_count
Maximum number of tensors per core
- weight_store_depth
Amount of weight store per-core (in blocks)
- accum_store_depth
Amount of accumulator store per-core (in blocks)
- cell_state_depth
Amount of LSTM cell state per-core (in blocks)
- clamp_store_depth
Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)
- max_read_size
Maximum size of data that instructions can perform operations on (in blocks)
- io_size
Minimum size of IO packet (in values)
- class vollo_compiler.NNIR
Neural Network Intermediate Representation
The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using
vollo_torch.fx.nnir.to_nnir(), or from an ONNX model.
- static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) NNIR
Load an ONNX model from a file and convert it to an NNIR graph
- streaming_transform(streaming_axis: int) NNIR
Performs the streaming transform, converting the NNIR to a streaming model
- to_program(config: Config, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32) Program
Compile an NNIR graph to a Program.
Note that the NNIR model given must be a streaming model.
- Parameters:
config – The hardware configuration to compile the program for
optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not
output_buffer_capacity – The size of the output buffer in the VM (only used when optimize_transforms is true)
write_queue_capacity – The size of the write queue in the VM (only used when optimize_transforms is true)
name – The name of the program
- __new__(**kwargs)
- class vollo_compiler.Program
A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.
- static io_only_test(config: Config, input_values: int, output_values: int) Program
Make a new program that does no compute and arranges IO such that output only starts when all the input is available on the accelerator
- model_input_shape(model_index: int = 0, input_index: int = 0) Tuple[int]
Get the shape of the input at input_index in model at model_index
- model_input_streaming_dim(model_index: int = 0, input_index: int = 0) Optional[int]
Get the streaming dimension of the input at input_index in model at model_index, if any
- model_num_inputs(model_index: int = 0) int
Get the number of inputs the model at model_index uses.
- model_num_outputs(model_index: int = 0) int
Get the number of outputs the model at model_index uses.
- model_output_shape(model_index: int = 0, output_index: int = 0) Tuple[int]
Get the shape of the output at output_index in model at model_index
- model_output_streaming_dim(model_index: int = 0, output_index: int = 0) Optional[int]
Get the streaming dimension of the output at output_index in model at model_index, if any
- num_models() int
The number of models in the program.
- save(output_path: str)
Save the program to a file
- to_vm(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, bf16_precision: bool = False) VM
Construct a stateful Virtual Machine for simulating a Vollo Program
- Parameters:
bf16_precision (bool) – Use bf16 precision instead of fp32 to simulate the VOLLO accelerator more accurately. Defaults to False.
- class vollo_compiler.Metrics
Static metrics of a program.
- clamp_store_depth
Total amount of clamp store available on each core
- clamp_store_used
Amount of clamp store used by the program on each core
- input_bytes
Number of bytes input per-inference for each model
- model_names
The name of each model if specified
- num_instrs
Number of instructions on each core
- output_bytes
Number of bytes output per-inference for each model
- tensor_ram_depth
Total amount of tensor RAM available on each core
- tensor_ram_used
Amount of tensor RAM used by the program on each core
- weight_store_depth
Total amount of weight store available on each core
- weight_store_used
Amount of weight store used by the program on each core
- class vollo_compiler.VM
A wrapper around a Program and the state of a VM.
- compute_duration_us(clock_mhz: int = 320) float
Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.
- Parameters:
clock_mhz – Clock frequency of the Vollo accelerator in MHz.
- cycle_count() int
The number of cycles performed on the VM.
- run(input: numpy.ndarray, model_index: int = 0) numpy.ndarray
Run the VM on a shaped input.
- run_flat(input: numpy.ndarray, model_index: int = 0) numpy.ndarray
Run the VM on a 1D input.
- run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) numpy.ndarray
Run the VM on multiple timesteps of input.
- Parameters:
input_timestep_dim (int) – The dimension over which to split the input into timesteps.
output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.
- run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) numpy.ndarray
Run the VM on multiple timesteps with a shaped input.
- exception vollo_compiler.AllocationError
Failed to allocate memory during compilation.
This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.
- exception vollo_compiler.SaveError
Failed to save program.
- exception vollo_compiler.LoadError
Failed to load program.