Devices

Interface

ComputableDAGs.AbstractDevice - Type
AbstractDevice

Abstract base type for every device, such as GPUs, CPUs, or any other compute device. Every implementation needs to implement various interface functions.

source
ComputableDAGs.Machine - Type
Machine

A representation of a machine to execute on. Contains information about its architecture (CPUs, GPUs, maybe more). This representation can be used to make a more accurate cost prediction of a DAG state.

See also: Scheduler

source
ComputableDAGs.devices - Function
devices(t::Type{T}; verbose::Bool) where {T <: AbstractDevice}

Interface function that must be implemented for every subtype of AbstractDevice. Returns a Vector of the devices of the given AbstractDevice type that are available on the current machine.

source
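
As an illustration, a minimal sketch of what an implementation for a custom device type could look like. MyAccelerator, its fields, and the detection logic are made up for this example; the sketch also assumes that returning device instances (as the NumaNode method further below does) is the expected behavior.

using ComputableDAGs

# hypothetical device type, not part of ComputableDAGs
mutable struct MyAccelerator <: ComputableDAGs.AbstractDevice
    id::Int
    FLOPS::Float64   # assumed field to hold the measured compute speed
end

function ComputableDAGs.devices(::Type{MyAccelerator}; verbose::Bool = false)
    found = [MyAccelerator(0, 0.0)]   # pretend exactly one accelerator was detected
    verbose && @info "detected $(length(found)) MyAccelerator device(s)"
    return found
end
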
ComputableDAGs.kernel - Function
kernel(dag::DAG, instance, context_module::Module)
Warning

Before calling this function, make sure to have called init_kernel in your session!

For a DAG and a problem instance, return a KernelAbstractions kernel with the signature kernel(input::AbstractVector, output::AbstractVector; ndrange::Int64), which computes the result of the DAG for each index of the input vector and writes it to the corresponding index of the output vector (like a broadcast). This function is only available as an extension when KernelAbstractions is loaded.

A simple example call for a kernel generated from this might look like the following:

cuda_kernel(get_backend(inputs), 256)(inputs, outputs; ndrange=length(inputs))

The internal index used is @index(Global) as provided by KernelAbstractions. For more details, please refer to the documentation of KernelAbstractions.jl.
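
A slightly fuller, hedged sketch of typical usage; dag, instance, and the device vector inputs are assumed to already exist, and the output element type is assumed to equal the input element type.

using KernelAbstractions

# dag, instance: a previously built DAG and its problem instance (not shown here)
# inputs: a device vector of input values, e.g. a CuArray when targeting CUDA
# init_kernel must already have been called in this session (see the warning above)
gpu_kernel = ComputableDAGs.kernel(dag, instance, @__MODULE__)

outputs = similar(inputs)                 # assumes output eltype == input eltype
backend = get_backend(inputs)             # KernelAbstractions backend of the input vector
gpu_kernel(backend, 256)(inputs, outputs; ndrange = length(inputs))
KernelAbstractions.synchronize(backend)   # wait for the kernel to finish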

Note

Since RuntimeGeneratedFunctions.jl does not support kernels due to its dynamic nature, this is implemented in a similar but more basic way. One limitation is that the body of the generated function may not contain (non-opaque) closures.

Note

The return value of this function is a wrapper around a KernelAbstractions kernel, not the kernel itself. For the user, however, it should behave exactly the same way, passing any arguments and kwargs through to the actual KA kernel.

Size limitation

The generated kernel does not use any internal parallelization, i.e., the DAG is compiled into a serial function that processes each input element in a single GPU thread. This means the work can be heavily parallelized across inputs and use the GPU at 100% for sufficiently large input vectors (assuming the function does not become IO-limited, etc.). However, it also means that there is a limit to how large the compiled function can be. If it gets too large, compilation might fail or take too long to complete, the kernel might fail during execution if it requires too much stack memory, or similar problems may occur. If this happens, your problem is likely too large to be compiled to a GPU kernel like this.

Compute Requirements

A GPU function has more restrictions on what can be computed than a general function running on the CPU. In Julia, there are two main restrictions to consider:

  1. Used data types must be stack-allocatable, i.e., isbits(x) must be true for arguments and local variables used in ComputeTasks (see the sketch after this list).
  2. Function calls must not be dynamic. This means that type stability is required and the compiler must know in advance which method of a generic function to call. What this specifically entails may change over time and also differs between the target GPU libraries. From experience, inlining as much as possible can help with this.
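
To make the first requirement concrete, here is a small, self-contained sketch; the type names are made up for the example.

# isbits-compatible: immutable, with concretely typed, non-reference fields
struct FourMomentum
    E::Float64
    px::Float64
    py::Float64
    pz::Float64
end
isbits(FourMomentum(1.0, 0.0, 0.0, 1.0))   # true -- fine for GPU compute tasks

# not isbits-compatible: a Vector field is a heap-allocated, GC-tracked reference
struct MomentumList
    momenta::Vector{FourMomentum}
end
isbits(MomentumList(FourMomentum[]))   # false -- not stack-allocatable
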
Warning

This feature is currently experimental. There are still some unresolved issues with the generated kernels.

source
ComputableDAGs.measure_device! - Function
measure_device!(device::AbstractDevice; verbose::Bool)

Interface function that must be implemented for every subtype of AbstractDevice. Measures the compute speed of the given device and stores the result in it.

source
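
Continuing the hypothetical MyAccelerator example from the devices interface above, an implementation sketch could look like the following. The benchmark and the FLOPS field are made up; a real implementation would time a representative workload on the actual device.

function ComputableDAGs.measure_device!(dev::MyAccelerator; verbose::Bool = false)
    elapsed = @elapsed sum(abs2, rand(Float64, 10^7))   # stand-in benchmark (runs on the CPU)
    dev.FLOPS = 2 * 10^7 / elapsed                      # roughly two floating point ops per element
    verbose && @info "measured $(round(dev.FLOPS; sigdigits = 3)) FLOPS for MyAccelerator $(dev.id)"
    return nothing
end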

Detect

Measure

ComputableDAGs.measure_devices! - Method
measure_devices!(machine::Machine; verbose::Bool)

Measure FLOPS, RAM, cache sizes, and whatever other properties can be extracted for the devices in the given machine.

source

Implementations

General

ComputableDAGs.entry_device - Method
entry_device(machine::Machine)

Return the "entry" device, i.e., the device that starts CPU threads and GPU kernels, and takes input values and returns the output value.

source
ComputableDAGs.gen_access_expr - Method
gen_access_expr(fc::FunctionCall)

Return an expression that the return symbols of the given function call can be assigned to. For a function call with only one return symbol, this might be just the variable name as an expression. For multiple return symbols, this is a structured binding, i.e., a tuple destructuring pattern.

source
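
For illustration, these are the kinds of expressions this could produce. The symbol names are made up, and the snippet builds the Expr objects by hand rather than calling gen_access_expr.

# single return symbol: the access expression is just that variable name
single = :data_5

# multiple return symbols: a tuple pattern that destructures the returned values
multi = :((data_5, data_6))

# either form can serve as the left-hand side of an assignment in generated code
assignment = Expr(:(=), multi, :(some_compute_call(x)))   # :((data_5, data_6) = some_compute_call(x))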

NUMA

ComputableDAGs.devices - Method
devices(deviceType::Type{T}; verbose::Bool) where {T <: NumaNode}

Return a Vector of NumaNodes available on the current machine. If verbose is true, print some additional information.

source
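
A hedged usage example; it assumes that devices and NumaNode are reachable under the ComputableDAGs namespace as written here.

using ComputableDAGs

numa_nodes = ComputableDAGs.devices(ComputableDAGs.NumaNode; verbose = true)
@info "found $(length(numa_nodes)) NUMA node(s)"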

GPUs