vollo_compiler
- class vollo_compiler.Config
A Vollo accelerator configuration.
Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. number of cores, size of each core, etc.
Programs need to be compiled for the accelerator configuration that they will be run on. For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().
- static ia_420f_c6b32()
IA-420f configuration with 6 cores and block size 32
- static ia_840f_c3b64()
IA-840f configuration with 3 cores and block size 64
- save(json_path: str) → None
Save a hardware configuration to a JSON file
- block_size
Size of the tensor blocks
- num_cores
Number of cores
- tensor_ram_depth
Amount of tensor RAM per-core (in blocks)
- tensor_descriptor_count
Maximum number of tensors per core
- weight_store_depth
Amount of weight store per-core (in blocks)
- accum_store_depth
Amount of accumulator store per-core (in blocks)
- cell_state_depth
Amount of LSTM cell state per-core (in blocks)
- clamp_store_depth
Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)
- max_read_size
Maximum size of data that instructions can perform operations on (in blocks)
- io_size
Minimum size of IO packet (in values)
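The preset constructors and attributes above can be exercised with a short sketch. The import is guarded because vollo_compiler is only available when the Vollo SDK is installed; the JSON filename is a placeholder.

```python
# Sketch: build a preset Config, read its attributes, and save it.
# Guarded import: vollo_compiler ships with the Vollo SDK only.
try:
    import vollo_compiler
except ImportError:
    vollo_compiler = None  # SDK not installed; skip the sketch

if vollo_compiler is not None:
    # Preset matching the IA-420f bitstream: 6 cores, block size 32.
    config = vollo_compiler.Config.ia_420f_c6b32()
    print(config.num_cores, config.block_size)
    # Persist the hardware configuration to JSON for later inspection.
    config.save("ia_420f.json")
```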
- class vollo_compiler.NNIR
Neural Network Intermediate Representation
The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using vollo_torch.fx.nnir.to_nnir(), or from an ONNX model.
- static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) → NNIR
Load an ONNX model from a file and convert it to an NNIR graph
- streaming_transform(streaming_axis: int) → NNIR
Performs the streaming transform, converting the NNIR to a streaming model
- to_program(config: Config, name: Optional[str]) → Program
Compile an NNIR graph to a Program. Performs the same operation as Program.compile().
Note that the NNIR model given must be a streaming model.
- __new__(**kwargs)
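Putting the NNIR methods above together, the ONNX path looks like the following sketch. "model.onnx" is a placeholder path and axis 0 is an arbitrary choice of streaming axis; the whole sketch is guarded so it is a no-op without the SDK or the model file.

```python
# Sketch: ONNX file -> NNIR -> streaming transform -> Program.
import os

try:
    import vollo_compiler
except ImportError:
    vollo_compiler = None  # SDK not installed; skip the sketch

if vollo_compiler is not None and os.path.exists("model.onnx"):
    config = vollo_compiler.Config.ia_420f_c6b32()
    # Load the ONNX model; pass a shape list to override the input shape,
    # or None to keep the shape recorded in the ONNX file.
    nnir = vollo_compiler.NNIR.from_onnx("model.onnx", None)
    # to_program() requires a streaming model, so apply the streaming
    # transform first (streaming over axis 0 here, as an example).
    nnir = nnir.streaming_transform(0)
    program = nnir.to_program(config, None)
```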
- class vollo_compiler.Program
A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.
- static io_only_test(config: Config, input_values: int, output_values: int) → Program
Make a new program that does no compute and arranges IO such that output only starts when all the input is available on the accelerator
- model_input_shape(model_index: int = 0, input_index: int = 0) → Tuple[int]
Get the shape of the input at input_index in model at model_index
- model_input_streaming_dim(model_index: int = 0, input_index: int = 0) → Optional[int]
Get the streaming dimension of the input at input_index in model at model_index
- model_num_inputs(model_index: int = 0) → int
Get the number of inputs the model at model_index uses.
- model_num_outputs(model_index: int = 0) → int
Get the number of outputs the model at model_index uses.
- model_output_shape(model_index: int = 0, output_index: int = 0) → Tuple[int]
Get the shape of the output at output_index in model at model_index
- model_output_streaming_dim(model_index: int = 0, output_index: int = 0) → Optional[int]
Get the streaming dimension of the output at output_index in model at model_index
- num_models() → int
The number of models in the program.
- save(output_path: str)
Save the program to a file
- to_vm(write_queue_capacity: int = 1, output_buffer_capacity: int = 64, bf16_precision: bool = False) → VM
Construct a stateful Virtual Machine for simulating a Vollo Program
- Parameters:
bf16_precision (bool) – Use bf16 precision instead of fp32 to simulate the Vollo accelerator more accurately. Defaults to False.
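An io_only_test() program is a convenient way to exercise the accessors above without a trained model, since it needs only a Config. A guarded sketch:

```python
# Sketch: build an IO-only test program and query its interface.
try:
    import vollo_compiler
except ImportError:
    vollo_compiler = None  # SDK not installed; skip the sketch

if vollo_compiler is not None:
    config = vollo_compiler.Config.ia_420f_c6b32()
    # A program that streams 32 input values straight through to 32 outputs.
    program = vollo_compiler.Program.io_only_test(config, 32, 32)
    print(program.num_models())          # number of models in the program
    print(program.model_input_shape())   # shape of model 0, input 0
    print(program.model_output_shape())  # shape of model 0, output 0
    # Simulate with bf16 to better match the accelerator's arithmetic.
    vm = program.to_vm(bf16_precision=True)
```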
- class vollo_compiler.Metrics
Static metrics of a program.
- clamp_store_depth
Total amount of clamp store available on each core
- clamp_store_used
Amount of clamp store used by the program on each core
- input_bytes
Number of bytes input per-inference for each model
- model_names
The name of each model if specified
- num_instrs
Number of instructions on each core
- output_bytes
Number of bytes output per-inference for each model
- tensor_ram_depth
Total amount of tensor RAM available on each core
- tensor_ram_used
Tensor RAM used by the program on each core
- weight_store_depth
Total amount of weight store available on each core
- weight_store_used
Amount of weight store used by the program on each core
- class vollo_compiler.VM
A wrapper around a Program and the state of a VM
- compute_duration_us(clock_mhz: int = 320) → float
Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.
- Parameters:
clock_mhz – Clock frequency of the Vollo accelerator in MHz.
- cycle_count() → int
The number of cycles performed on the VM.
- run(input: numpy.ndarray, model_index: int = 0) → numpy.ndarray
Run the VM on a shaped input.
- run_flat(input: numpy.ndarray, model_index: int = 0) → numpy.ndarray
Run the VM on a 1D input.
- run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) → numpy.ndarray
Run the VM on multiple timesteps of input.
- Parameters:
input_timestep_dim (int) – The dimension over which to split the input into timesteps.
output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.
- run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) → numpy.ndarray
Run the VM on multiple timesteps with a shaped input.
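A guarded sketch of simulating a program on the VM and converting the resulting cycle count to microseconds; the float32 input dtype is an assumption for illustration.

```python
# Sketch: run an IO-only program on the VM and read back timing.
import numpy as np

try:
    import vollo_compiler
except ImportError:
    vollo_compiler = None  # SDK not installed; skip the sketch

if vollo_compiler is not None:
    config = vollo_compiler.Config.ia_420f_c6b32()
    program = vollo_compiler.Program.io_only_test(config, 32, 32)
    vm = program.to_vm()
    # run_flat() takes a 1D input; 32 values to match io_only_test above.
    out = vm.run_flat(np.zeros(32, dtype=np.float32))
    # Cycles accumulated by the run, and their duration at the default 320 MHz.
    print(vm.cycle_count(), vm.compute_duration_us())
```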
- exception vollo_compiler.AllocationError
Failed to allocate memory during compilation.
This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.
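One way to handle this is to retry compilation against a larger preset. The helper below is hypothetical (not part of the SDK) and only sketches the pattern:

```python
# Sketch: fall back to a larger accelerator configuration when the model
# does not fit. compile_with_fallback is a hypothetical helper.
try:
    import vollo_compiler
except ImportError:
    vollo_compiler = None  # SDK not installed; skip the sketch

if vollo_compiler is not None:
    def compile_with_fallback(nnir):
        """Try the 6-core IA-420f preset first; on AllocationError, retry
        with the IA-840f preset, whose cores have block size 64."""
        try:
            return nnir.to_program(vollo_compiler.Config.ia_420f_c6b32(), None)
        except vollo_compiler.AllocationError:
            return nnir.to_program(vollo_compiler.Config.ia_840f_c3b64(), None)
```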
- exception vollo_compiler.SaveError
Failed to save program.
- exception vollo_compiler.LoadError
Failed to load program.