vollo_rt

class vollo_rt.VolloRTContext

A context for performing computation on Vollo.

This wraps the C bindings for the Vollo Runtime. For contexts to be properly garbage collected, VolloRTContext is a context manager and should be used in a with block. This ensures that the context is correctly destroyed after use. VolloRTContext should not be nested: only a single context should be open at any given time.

The general order of operations is as follows:

High level API example:

import numpy as np
import vollo_rt

with vollo_rt.VolloRTContext() as ctx:
    ctx.add_accelerator(0)
    ctx.load_program("program.vollo")

    input_arr = np.random.rand(100, 50).astype(np.float32)
    output = ctx.run(input_arr)

Low level API example:

import numpy as np
import vollo_rt

with vollo_rt.VolloRTContext() as ctx:
    ctx.add_accelerator(0)
    ctx.load_program("program.vollo")

    # add a job (it only starts on the next call to poll())
    input_arr = np.random.rand(100, 50).astype(np.float32)
    user_ctx_f32 = ctx.add_job_f32(input_arr)

    # poll until the job is complete
    completed_jobs = []
    while user_ctx_f32 not in completed_jobs:
        completed_jobs += ctx.poll()

    # retrieve the result of the computation
    output_f32 = ctx.get_result(user_ctx_f32)

accelerator_block_size(accelerator_index: int) → int

Get the block size of a Vollo accelerator.

accelerator_num_cores(accelerator_index: int) → int

Get the number of cores of a Vollo accelerator.

add_accelerator(accelerator_index: int)

Add an accelerator, specified by its index. The index refers to an accelerator in the sorted list of PCI addresses.
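The accelerator query methods accelerator_num_cores() and accelerator_block_size() can be used once accelerators have been added. The helper below is a minimal, hypothetical sketch (accelerator_summary is not part of vollo_rt); it assumes ctx is an open VolloRTContext and that each queried index has already been passed to add_accelerator(), since the reference does not state whether that is required.

```python
def accelerator_summary(ctx, accelerator_indices):
    """Hypothetical helper: report core count and block size per accelerator.

    `ctx` is assumed to be an open VolloRTContext to which each index in
    `accelerator_indices` has already been added with add_accelerator().
    """
    return {
        index: {
            "num_cores": ctx.accelerator_num_cores(index),
            "block_size": ctx.accelerator_block_size(index),
        }
        for index in accelerator_indices
    }
```

For example, after ctx.add_accelerator(0) inside the with block, accelerator_summary(ctx, [0]) returns a dictionary keyed by accelerator index.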

add_job(input, model_index: int = 0)

Sets up a computation on Vollo where the inputs and outputs have type numpy.float32, torch.float32, or torch.bfloat16. Returns a user context, user_ctx. The poll() function will return a list containing user_ctx once the job has been completed.

Note

  • The computation is performed in bf16; the driver performs the conversion if needed.

  • The computation only starts on the next call to poll(). This makes it possible to set up several computations that are kicked off at the same time.
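Because jobs only start on the next call to poll(), several jobs can be queued and then driven by a single polling loop. The helper below is a hypothetical sketch of that pattern (run_batch is not part of vollo_rt), assuming ctx is an open VolloRTContext with a program loaded. Raising TimeoutError after max_poll_iterations is this sketch's own choice, loosely mirroring the parameter of the same name on run().

```python
def run_batch(ctx, inputs, model_index=0, max_poll_iterations=10000):
    """Hypothetical helper: queue one job per input, then poll until all finish.

    `ctx` is assumed to be an open VolloRTContext with a program loaded.
    Results are returned in the same order as `inputs`.
    """
    # Queue every job first; none of them start until the next poll().
    user_ctxs = [ctx.add_job(x, model_index=model_index) for x in inputs]
    pending = set(user_ctxs)

    for _ in range(max_poll_iterations):
        if not pending:
            break
        # poll() starts queued transfers and returns the completed user contexts.
        pending.difference_update(ctx.poll())

    if pending:
        raise TimeoutError(
            f"{len(pending)} job(s) still pending after {max_poll_iterations} polls"
        )

    return [ctx.get_result(u) for u in user_ctxs]
```

For example, inside the with block, run_batch(ctx, [np.random.rand(100, 50).astype(np.float32) for _ in range(4)]) queues four jobs before the first poll, so they are kicked off together.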

add_job_bf16(input, model_index: int = 0)

Sets up a computation on Vollo where the inputs and outputs have type torch.bfloat16. Returns a user context, user_ctx. The poll() function will return a list containing user_ctx once the job has been completed.

Note: The computation only starts on the next call to poll(). This makes it possible to set up several computations that are kicked off at the same time.

add_job_f32(input, model_index: int = 0) → int

Sets up a computation on Vollo where the inputs and outputs have type numpy.float32 or torch.float32. Returns a user context, user_ctx. The poll() function will return a list containing user_ctx once the job has been completed.

Note

  • The computation is still performed in bf16; the driver converts the float32 inputs and outputs.

  • The computation only starts on the next call to poll(). This makes it possible to set up several computations that are kicked off at the same time.

get_result(user_ctx: int)

Retrieve the result of the computation corresponding to user_ctx.

load_program(program: Union[str, PathLike, Program])

Load a pre-compiled program onto the accelerator.

Arg:

program: One of:

  • A path to a Vollo program (typically a .vollo file)

  • A Program object produced directly by the Vollo Compiler

model_input_shape(model_index: int = 0, input_index: int = 0) → Tuple[int]

Get the shape of a model input.

model_input_streaming_dim(model_index: int = 0, input_index: int = 0) → Optional[int]

Get the streaming dimension of a model input, or None if it has none.

model_name(model_index: int = 0) → Optional[str]

Get the name of a model, or None if no name is set.

model_num_inputs(model_index: int = 0) → int

Get the number of inputs the model uses.

model_num_outputs(model_index: int = 0) → int

Get the number of outputs the model uses.

model_output_shape(model_index: int = 0, output_index: int = 0) → Tuple[int]

Get the shape of a model output.

model_output_streaming_dim(model_index: int = 0, output_index: int = 0) → Optional[int]

Get the streaming dimension of a model output, or None if it has none.

num_models() → int

Get the number of models in the loaded program.
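The model query methods above can be combined to inspect a loaded program. The helper below is a hypothetical sketch (describe_models is not part of vollo_rt), assuming ctx is an open VolloRTContext on which load_program() has already succeeded. It records each input and output as a (shape, streaming_dim) pair, where streaming_dim is None when the reference says there is none.

```python
def describe_models(ctx):
    """Hypothetical helper: summarise every model in the loaded program.

    `ctx` is assumed to be an open VolloRTContext after load_program().
    """
    summary = []
    for m in range(ctx.num_models()):
        summary.append({
            "name": ctx.model_name(m),  # None if the model has no name
            "inputs": [
                (tuple(ctx.model_input_shape(m, i)), ctx.model_input_streaming_dim(m, i))
                for i in range(ctx.model_num_inputs(m))
            ],
            "outputs": [
                (tuple(ctx.model_output_shape(m, o)), ctx.model_output_streaming_dim(m, o))
                for o in range(ctx.model_num_outputs(m))
            ],
        })
    return summary
```

A summary like this can be checked before calling add_job(), for instance to confirm that an input array's shape matches what the model expects.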

poll() → List[int]

Poll Vollo for completion. Returns a list of user contexts corresponding to completed jobs.

Note: Polling also initiates transfers for newly added jobs, so no progress can be made on those jobs until poll() is called.

run(input, model_index: int = 0, max_poll_iterations: int = 10000)

Run a single inference of a model on Vollo.

This is a simpler API for running single inferences without calling add_job(), poll() and get_result() manually.