vollo_compiler

class vollo_compiler.Config

A Vollo accelerator configuration.

Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. number of cores, size of each core, etc. Programs need to be compiled for the accelerator configuration that they will be run on.

For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().

static ia_420f_c6b32()

IA-420f configuration with 6 cores and block size 32

static ia_840f_c3b64()

IA-840f configuration with 3 cores and block size 64
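
For example, a preset configuration can be constructed and inspected before compiling against it (the values follow the preset descriptions above):

    import vollo_compiler

    config = vollo_compiler.Config.ia_420f_c6b32()
    print(config.num_cores)   # 6
    print(config.block_size)  # 32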

save(json_path: str) None

Save a hardware configuration to a JSON file

static load(json_path: str, check_version_matches: bool = True) Config

Load a hardware configuration from a JSON file

Parameters:
  • json_path – Path to the JSON file

  • check_version_matches – if the provided JSON file is versioned, check that the version matches the current version
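
For example, a configuration can be round-tripped through JSON (the file name below is only illustrative):

    config = vollo_compiler.Config.ia_420f_c6b32()
    config.save("vollo-config.json")
    # check_version_matches is True by default
    loaded = vollo_compiler.Config.load("vollo-config.json")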

block_size

Size of the tensor blocks

num_cores

Number of cores

tensor_ram_depth

Amount of tensor RAM per-core (in blocks)

tensor_descriptor_count

Maximum number of tensors per core

weight_store_depth

Amount of weight store per-core (in blocks)

accum_store_depth

Amount of accumulator store per-core (in blocks)

cell_state_depth

Amount of LSTM cell state per-core (in blocks)

clamp_store_depth

Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)

max_read_size

Maximum size of data that instructions can perform operations on (in blocks)

io_size

Minimum size of IO packet (in values)

class vollo_compiler.NNIR

Neural Network Intermediate Representation

The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using vollo_torch.fx.nnir.to_nnir(), or from an ONNX model.
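
A minimal sketch of the PyTorch route, assuming the vollo_torch frontend's prepare_shape helper is used to trace the model with an example input before conversion:

    import torch
    import vollo_torch

    model = torch.nn.Linear(32, 32)
    # Trace the model so tensor shapes are known, then convert to NNIR
    model, expected_output = vollo_torch.fx.prepare_shape(model, torch.randn(32))
    nnir = vollo_torch.fx.nnir.to_nnir(model)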

static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) NNIR

Load an ONNX model from a file and convert it to an NNIR graph
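
For example (the path below is a placeholder; pass None to keep the input shape recorded in the ONNX file, or a shape such as [1, 32] to override it):

    nnir = vollo_compiler.NNIR.from_onnx("model.onnx", None)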

streaming_transform(streaming_axis: int) NNIR

Performs the streaming transform, converting the NNIR to a streaming model

to_program(config: Config, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32) Program

Compile a NNIR graph to a Program.

Note that the NNIR model given must be a streaming model.

Parameters:
  • config – The hardware configuration to compile the program for

  • optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not

  • output_buffer_capacity – The size of the output buffer in the VM (only used when optimize_transforms is true)

  • write_queue_capacity – The size of the write queue in the VM (only used when optimize_transforms is true)

  • name – The name of the program
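
A sketch of the compilation flow, assuming dimension 0 of the model's input is the streaming (timestep) axis:

    config = vollo_compiler.Config.ia_420f_c6b32()
    streaming_nnir = nnir.streaming_transform(0)
    program = streaming_nnir.to_program(config, name="example-model")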

__new__(**kwargs)

class vollo_compiler.Program

A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.

compute_duration_per_inference_us(clock_mhz: int = 320, write_queue_capacity: int = 32, output_buffer_capacity: int = 64, model_index: int = 0) float

Translate the program’s cycle count per inference to a figure in microseconds by dividing it by the clock speed.

Parameters:

clock_mhz – Clock frequency of the Vollo accelerator in MHz.

cycle_count_per_inference(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, model_index: int = 0) int

The number of cycles the program performs in one inference.
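
The two figures are related by the clock speed; a sketch using the default capacities:

    cycles = program.cycle_count_per_inference()
    # Equivalent to dividing the cycle count by the clock speed in MHz
    duration_us = program.compute_duration_per_inference_us(clock_mhz=320)
    print(cycles, duration_us)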

hw_config() Config
static io_only_test(config: Config, input_values: int, output_values: int) Program

Make a new program that does no compute and arranges IO such that the output only starts once all of the input is available on the accelerator

static load(input_path: str) Program
static load_bytes(data: bytes) Program
metrics() Metrics

Static Metrics

model_input_shape(model_index: int = 0, input_index: int = 0) Tuple[int]

Get the shape of the input at input_index in model at model_index

model_input_streaming_dim(model_index: int = 0, input_index: int = 0) Optional[int]

Get the streaming dimension of the input at input_index in model at model_index, if it has one

model_num_inputs(model_index: int = 0) int

Get the number of inputs the model at model_index uses.

model_num_outputs(model_index: int = 0) int

Get the number of outputs the model at model_index uses.

model_output_shape(model_index: int = 0, output_index: int = 0) Tuple[int]

Get the shape of the output at output_index in model at model_index

model_output_streaming_dim(model_index: int = 0, output_index: int = 0) Optional[int]

Get the streaming dimension of the output at output_index in model at model_index, if it has one

num_models() int

The number of models in the program.
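
These accessors can be combined to inspect every input and output of a program, for example:

    for m in range(program.num_models()):
        for i in range(program.model_num_inputs(m)):
            print(m, i, program.model_input_shape(m, i),
                  program.model_input_streaming_dim(m, i))
        for o in range(program.model_num_outputs(m)):
            print(m, o, program.model_output_shape(m, o),
                  program.model_output_streaming_dim(m, o))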

save(output_path: str)
save_bytes() bytes
to_vm(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, bit_accurate: bool = True) VM

Construct a stateful Virtual Machine for simulating a Vollo Program

Parameters:

bit_accurate (bool) – Use a compute model that replicates the Vollo accelerator with bit accuracy. Disable to use single-precision compute. Defaults to True.
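
A sketch of constructing a VM from a compiled program:

    # Defaults: write_queue_capacity=32, output_buffer_capacity=64, bit_accurate=True
    vm = program.to_vm()
    metrics = vm.metrics()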

transform_to_io_only_test() Program

Make a new program that is IO-compatible with this program but does no compute (an IO-only test)

class vollo_compiler.ProgramBuilder

Tracks an internal list of NNIRs which can be compiled into a single multi-model program

add_nnir(nnir: NNIR, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32)

Adds a model compiled from an NNIR to the ProgramBuilder

Parameters:
  • nnir – The NNIR to add

  • optimize_transforms – Whether to run the VM to decide whether to apply certain transformations or not

  • output_buffer_capacity – The size of the output buffer in the VM (only used if optimize_transforms is true)

  • write_queue_capacity – The size of the write queue in the VM (only used if optimize_transforms is true)

  • name – The name of the model

to_program(config: Config) Program

Builds a program from the internal NNIRs

Parameters:

config – The config describing resources available for the final program
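
A sketch of building a two-model program; the no-argument ProgramBuilder() constructor and the two streaming NNIRs (streaming_nnir_a, streaming_nnir_b) are assumptions for illustration:

    builder = vollo_compiler.ProgramBuilder()
    builder.add_nnir(streaming_nnir_a, name="model-a")
    builder.add_nnir(streaming_nnir_b, name="model-b")
    program = builder.to_program(config)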

class vollo_compiler.Metrics

Static metrics of a program.

clamp_store_depth

Total amount of clamp store available on each core

clamp_store_used

Amount of clamp store used by the program on each core

input_bytes

Number of bytes input per-inference for each model

model_names

The name of each model if specified

num_instrs

Number of instructions on each core

output_bytes

Number of bytes output per-inference for each model

tensor_ram_depth

Total amount of tensor RAM available on each core

tensor_ram_used

Tensor RAM used by the program on each core

weight_store_depth

Total amount of weight store available on each core

weight_store_used

Amount of weight store used by the program on each core
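
For example, per-core usage can be compared against the available capacity:

    metrics = program.metrics()
    print(metrics.weight_store_used, "/", metrics.weight_store_depth)
    print(metrics.tensor_ram_used, "/", metrics.tensor_ram_depth)
    print(metrics.clamp_store_used, "/", metrics.clamp_store_depth)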

class vollo_compiler.VM

A wrapper around a Program and the state of a VM

compute_duration_us(clock_mhz: int = 320) float

Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.

Parameters:

clock_mhz – Clock frequency of the Vollo accelerator in MHz.

cycle_count() int

The number of cycles that have been performed so far on the VM across all inferences.

metrics() Metrics

Get the static metrics of the program held by the VM

run(input: numpy.ndarray, model_index: int = 0) numpy.ndarray

Run the VM on a shaped input.

run_flat(input: numpy.ndarray, model_index: int = 0) numpy.ndarray

Run the VM on a 1D input.

run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) numpy.ndarray

Run the VM on multiple timesteps of input.

Parameters:
  • input_timestep_dim (int) – The dimension over which to split the input into timesteps.

  • output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.

run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) numpy.ndarray

Run the VM on multiple timesteps with a shaped input.
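
A sketch of a single inference on the VM; float32 input values and model index 0 are assumptions here:

    import numpy as np

    vm = program.to_vm()
    input = np.zeros(program.model_input_shape(0, 0), dtype=np.float32)
    output = vm.run(input)
    print(output.shape, vm.cycle_count(), vm.compute_duration_us())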

exception vollo_compiler.AllocationError

Failed to allocate memory during compilation.

This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.
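
For example, compiling a model against a configuration that is too small for it can be caught:

    try:
        program = streaming_nnir.to_program(config)
    except vollo_compiler.AllocationError as e:
        # The model does not fit in this configuration's stores;
        # try a larger preset or reduce the model size
        print(e)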

exception vollo_compiler.SaveError

Failed to save program.

exception vollo_compiler.LoadError

Failed to load program.