vollo_compiler
- class vollo_compiler.Config
A Vollo accelerator configuration.
Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. number of cores, size of each core, etc.
Programs need to be compiled for the accelerator configuration that they will be run on. For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().
- static ia_420f_c6b32()
IA-420f configuration with 6 cores and block size 32
- static ia_840f_c3b64()
IA-840f configuration with 3 cores and block size 64
- save(json_path: str) None
Save a hardware configuration to a JSON file
- static load(json_path: str, check_version_matches: bool = True) Config
Load a hardware configuration from a JSON file
- Parameters:
json_path – Path to the JSON file
check_version_matches – if the provided JSON file is versioned, check that the version matches the current version
- block_size
Size of the tensor blocks
- num_cores
Number of cores
- tensor_ram_depth
Amount of tensor RAM per-core (in blocks)
- tensor_descriptor_count
Maximum number of tensors per core
- weight_store_depth
Amount of weight store per-core (in blocks)
- accum_store_depth
Amount of accumulator store per-core (in blocks)
- cell_state_depth
Amount of LSTM cell state per-core (in blocks)
- clamp_store_depth
Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)
- max_read_size
Maximum size of data that instructions can perform operations on (in blocks)
- io_size
Minimum size of IO packet (in values)
- class vollo_compiler.NNIR
Neural Network Intermediate Representation
The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using
vollo_torch.fx.nnir.to_nnir(), or from an ONNX model.
- static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) NNIR
Load an ONNX model from a file and convert it to an NNIR graph
- streaming_transform(streaming_axis: int) NNIR
Performs the streaming transform, converting the NNIR to a streaming model
- to_program(config: Config, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32) Program
Compile an NNIR graph to a Program.
Note that the NNIR model given must be a streaming model.
- Parameters:
config – The hardware configuration to compile the program for
optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not
output_buffer_capacity – The size of the output buffer in the VM (only used when optimize_transforms is true)
write_queue_capacity – The size of the write queue in the VM (only used when optimize_transforms is true)
name – The name of the program
- __new__(**kwargs)
- class vollo_compiler.Program
A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.
- static io_only_test(config: Config, input_values: int, output_values: int) Program
Make a new program that does no compute and arranges IO such that output only starts when all the input is available on the accelerator
- model_input_shape(model_index: int = 0, input_index: int = 0) Tuple[int]
Get the shape of the input at input_index in model at model_index
- model_input_streaming_dim(model_index: int = 0, input_index: int = 0) Optional[int]
Get the streaming dimension of the input at input_index in model at model_index, if any
- model_num_inputs(model_index: int = 0) int
Get the number of inputs the model at model_index uses.
- model_num_outputs(model_index: int = 0) int
Get the number of outputs the model at model_index uses.
- model_output_shape(model_index: int = 0, output_index: int = 0) Tuple[int]
Get the shape of the output at output_index in model at model_index
- model_output_streaming_dim(model_index: int = 0, output_index: int = 0) Optional[int]
Get the streaming dimension of the output at output_index in model at model_index, if any
- num_models() int
The number of models in the program.
- save(output_path: str)
Save the program to a file
- to_vm(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, bf16_precision: bool = False) VM
Construct a stateful Virtual Machine for simulating a Vollo Program
- Parameters:
bf16_precision (bool) – Use bf16 precision instead of fp32 to simulate the VOLLO accelerator more accurately. Defaults to False.
- class vollo_compiler.Metrics
Static metrics of a program.
- clamp_store_depth
Total amount of clamp store available on each core
- clamp_store_used
Amount of clamp store used by the program on each core
- input_bytes
Number of bytes input per-inference for each model
- model_names
The name of each model if specified
- num_instrs
Number of instructions on each core
- output_bytes
Number of bytes output per-inference for each model
- tensor_ram_depth
Total amount of tensor RAM available on each core
- tensor_ram_used
Amount of tensor RAM used by the program on each core
- weight_store_depth
Total amount of weight store available on each core
- weight_store_used
Amount of weight store used by the program on each core
- class vollo_compiler.VM
A wrapper around a Program and the state of a VM.
- compute_duration_us(clock_mhz: int = 320) float
Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.
- Parameters:
clock_mhz – Clock frequency of the Vollo accelerator in MHz.
- cycle_count() int
The number of cycles performed on the VM.
- run(input: numpy.ndarray, model_index: int = 0) numpy.ndarray
Run the VM on a shaped input.
- run_flat(input: numpy.ndarray, model_index: int = 0) numpy.ndarray
Run the VM on a 1D input.
- run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) numpy.ndarray
Run the VM on multiple timesteps of input.
- Parameters:
input_timestep_dim (int) – The dimension over which to split the input into timesteps.
output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.
- run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) numpy.ndarray
Run the VM on multiple timesteps with a shaped input.
- exception vollo_compiler.AllocationError
Failed to allocate memory during compilation.
This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.
- exception vollo_compiler.SaveError
Failed to save program.
- exception vollo_compiler.LoadError
Failed to load program.