Lift
Library of parallel computing primitives for GPUs and multi-core CPUs
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros
Static Public Member Functions | List of all members
lift::parallel< system > Struct Template Reference

Dispatch structure for parallel primitives. More...

#include <parallel.h>

Static Public Member Functions

template<typename InputIterator , typename UnaryFunction >
static void for_each (InputIterator begin, InputIterator end, UnaryFunction f, int2 launch_parameters={0, 0})
 Parallel for-each implementation. More...
 
template<typename T , typename UnaryFunction >
static void for_each (pointer< system, T > &vector, UnaryFunction f, int2 launch_parameters={0, 0})
 Pointer version of for-each. More...
 
template<typename UnaryFunction >
static void for_each (uint2 range, UnaryFunction f, int2 launch_parameters={0, 0})
 Range-based version of for-each. More...
 
template<typename UnaryFunction >
static void for_each (uint32 end, UnaryFunction f, int2 launch_parameters={0, 0})
 0-based range version of for-each. More...
 
template<typename InputIterator , typename OutputIterator , typename Predicate >
static void inclusive_scan (InputIterator first, size_t len, OutputIterator result, Predicate op)
 Performs a parallel inclusive scan on [first, first + len[, using op as the scan operator. More...
 
template<typename InputIterator , typename OutputIterator , typename Predicate >
static size_t copy_if (InputIterator first, size_t len, OutputIterator result, Predicate op, allocation< system, uint8 > &temp_storage)
 Performs stream compaction based on a predicate. More...
 
template<typename InputIterator , typename FlagIterator , typename OutputIterator >
static size_t copy_flagged (InputIterator first, size_t len, OutputIterator result, FlagIterator flags, allocation< system, uint8 > &temp_storage)
 Performs stream compaction based on a buffer with boolean flags. More...
 
template<typename InputIterator >
static auto sum (InputIterator first, size_t len, allocation< system, uint8 > &temp_storage) -> typename std::iterator_traits< InputIterator >::value_type
 Computes the arithmetic sum of a buffer. More...
 
template<typename Key , typename Value >
static void sort_by_key (pointer< system, Key > &keys, pointer< system, Value > &values, allocation< system, Key > &temp_keys, allocation< system, Value > &temp_values, allocation< system, uint8 > &temp_storage, int num_key_bits=sizeof(Key)*8)
 Perform a sort-by-key on a key + value buffer pair. More...
 
template<typename Key >
static void sort (allocation< system, Key > &keys, allocation< system, Key > &temp_keys, allocation< system, uint8 > &temp_storage)
 Sort a buffer of keys. More...
 
template<typename KeyIterator , typename ValueIterator , typename ReductionOp >
static size_t reduce_by_key (KeyIterator keys_begin, KeyIterator keys_end, ValueIterator values_begin, KeyIterator output_keys, ValueIterator output_values, allocation< system, uint8 > &temp_storage, ReductionOp reduction_op)
 Perform a reduction by key on a key/value buffer pair. More...
 
template<typename Key , typename Value , typename ReductionOp >
static size_t reduce_by_key (pointer< system, Key > &keys, pointer< system, Value > &values, allocation< system, Key > &output_keys, allocation< system, Value > &output_values, allocation< system, uint8 > &temp_storage, ReductionOp reduction_op)
 Perform a reduction by key on a key/value buffer pair. More...
 
template<typename InputIterator , typename UniqueOutputIterator , typename LengthOutputIterator >
static size_t run_length_encode (InputIterator keys_input, size_t num_keys, UniqueOutputIterator unique_keys_output, LengthOutputIterator run_lengths_output, allocation< system, uint8 > &temp_storage)
 Compute a run-length encoding of the input key buffer. More...
 
static void synchronize ()
 Synchronizes the compute device. More...
 
static void check_errors (void)
 Check for compute device errors. More...
 

Detailed Description

template<target_system system>
struct lift::parallel< system >

Dispatch structure for parallel primitives.

Implementations are exposed in specializations of this structure for each target system.

Definition at line 46 of file parallel.h.

Member Function Documentation

template<target_system system>
static void lift::parallel< system >::check_errors ( void  )
inlinestatic

Check for compute device errors.

This function will check for any pending errors on the compute device. It is a no-op on the CPU. On the GPU, it will poll the runtime for a pending error condition; if found, it will be printed to the console and the calling process will be terminated.

template<target_system system>
template<typename InputIterator , typename FlagIterator , typename OutputIterator >
static size_t lift::parallel< system >::copy_flagged ( InputIterator  first,
size_t  len,
OutputIterator  result,
FlagIterator  flags,
allocation< system, uint8 > &  temp_storage 
)
inlinestatic

Performs stream compaction based on a buffer with boolean flags.

copy_flagged copies every element in the input buffer for which the corresponding index in the flags buffer is true.

Template Parameters
InputIteratorType of the input data iterator.
FlagIteratorType of the input flags iterator.
OutputIteratorType of the output data iterator.
Parameters
firstIterator to the start of the input buffer.
lenNumber of elements in the input buffer.
resultIterator to the start of the output buffer.
flagsDetermines whether each element is copied. For a given index, the element data at that index in the input buffer is copied if and only if the flag at the same index evaluates to true.
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
template<target_system system>
template<typename InputIterator , typename OutputIterator , typename Predicate >
static size_t lift::parallel< system >::copy_if ( InputIterator  first,
size_t  len,
OutputIterator  result,
Predicate  op,
allocation< system, uint8 > &  temp_storage 
)
inlinestatic

Performs stream compaction based on a predicate.

copy_if copies every element in the input buffer for which Predicate evaluates to true into the output buffer.

Template Parameters
InputIteratorType of the input data iterator.
OutputIteratorType of the output data iterator.
Parameters
firstIterator to the start of the input buffer.
lenNumber of elements in the input buffer.
resultIterator to the start of the output buffer.
opDetermines whether each element is copied. Called with a single argument (a copy of or reference to the input element), returns a boolean value that determines whether each element is copied.
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
template<target_system system>
template<typename InputIterator , typename UnaryFunction >
static void lift::parallel< system >::for_each ( InputIterator  begin,
InputIterator  end,
UnaryFunction  f,
int2  launch_parameters = {0, 0} 
)
inlinestatic

Parallel for-each implementation.

Applies UnaryFunction f to each element in the range [begin, end[.

Template Parameters
InputIteratorIterator type for input data.
UnaryFunctionAny callable type taking a single argument. The argument can either be a value of the same type obtained by dereferencing InputIterator, or else a (const or modifiable) reference to that type. The return value is ignored.
Parameters
beginIterator pointing at the first element to be processed.
endIterator pointing at the end of the range to be processed.
fThe function object to be applied to each item in the input.
launch_parametersGrid launch parameters for GPU backend.
template<target_system system>
template<typename T , typename UnaryFunction >
static void lift::parallel< system >::for_each ( pointer< system, T > &  vector,
UnaryFunction  f,
int2  launch_parameters = {0, 0} 
)
inlinestatic

Pointer version of for-each.

Applies UnaryFunction f to each element behind vector.

Template Parameters
TThe underlying data type for vector.
UnaryFunctionSame as in for_each
Parameters
vectorThe memory region to apply f to
fUnaryFunction to apply to each element in vector
launch_parametersSame as in for_each
template<target_system system>
template<typename UnaryFunction >
static void lift::parallel< system >::for_each ( uint2  range,
UnaryFunction  f,
int2  launch_parameters = {0, 0} 
)
inlinestatic

Range-based version of for-each.

Applies UnaryFunction f to each integer in the range [range.x, range.y[ in parallel.

Template Parameters
UnaryFunctionSame as in for_each
Parameters
rangeInteger range to iterate over.
fSame as in for_each
launch_parametersSame as in for_each
template<target_system system>
template<typename UnaryFunction >
static void lift::parallel< system >::for_each ( uint32  end,
UnaryFunction  f,
int2  launch_parameters = {0, 0} 
)
inlinestatic

0-based range version of for-each.

Applies UnaryFunction f to each integer in the range [0, end[ in parallel.

Template Parameters
UnaryFunctionSame as in for_each
Parameters
endEnd point of the interval to iterate over.
fSame as in for_each
launch_parametersSame as in for_each
template<target_system system>
template<typename InputIterator , typename OutputIterator , typename Predicate >
static void lift::parallel< system >::inclusive_scan ( InputIterator  first,
size_t  len,
OutputIterator  result,
Predicate  op 
)
inlinestatic

Performs a parallel inclusive scan on [first, first + len[, using op as the scan operator.

inclusive_scan performs an inclusive prefix-scan on the input buffer. The output is a buffer of the same size, with each element replaced by the result of applying op to every element from the start of the array up to and inclusing the element itself.

Template Parameters
InputIteratorType of the input data iterator.
OutputIteratorType of the output data iterator.
Predicate
Parameters
firstStart of the range to perform the scan over.
lenNumber of data elements to scan over.
resultIterator to the start of the output buffer. Must not be the same as the input buffer.
opThe binary operator to be used as the scan operator. Takes two parameters of Input type, returns the scan value corresponding to the input pair.
template<target_system system>
template<typename KeyIterator , typename ValueIterator , typename ReductionOp >
static size_t lift::parallel< system >::reduce_by_key ( KeyIterator  keys_begin,
KeyIterator  keys_end,
ValueIterator  values_begin,
KeyIterator  output_keys,
ValueIterator  output_values,
allocation< system, uint8 > &  temp_storage,
ReductionOp  reduction_op 
)
inlinestatic

Perform a reduction by key on a key/value buffer pair.

reduce_by_key performs per-key reduction on a key/value buffer. For each set of consecutve identical keys in [keys_begin, keys_end [, the corresponding values are reduced into a single output value by applying a user-specified reduction operator.

Template Parameters
KeyIteratorType of the key iterator
ValueIteratorType of the value iterator
ReductionOpType of the reduction operator — any callable object that can be called with two elements from the value buffer and returns the result of reducing those two elements.
Parameters
keys_beginIterator pointing at the start of the input key buffer
keys_endIterator pointing at the end of the input key buffer
values_beginIterator pointing at the start of the input value buffer
output_keysIterator pointing at the start of the output key buffer
output_valuesIterator pointing at the start of the output value buffer
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
reduction_opThe reduction operator.
Returns
The number of keys/values in the output buffers
template<target_system system>
template<typename Key , typename Value , typename ReductionOp >
static size_t lift::parallel< system >::reduce_by_key ( pointer< system, Key > &  keys,
pointer< system, Value > &  values,
allocation< system, Key > &  output_keys,
allocation< system, Value > &  output_values,
allocation< system, uint8 > &  temp_storage,
ReductionOp  reduction_op 
)
inlinestatic

Perform a reduction by key on a key/value buffer pair.

reduce_by_key performs per-key reduction on a key/value buffer. For each set of consecutve identical keys in [keys_begin, keys_end [, the corresponding values are reduced into a single output value by applying a user-specified reduction operator.

Note that the output buffers are resized to match the input size, but the number of elements actually output (the return value from this function) is likely going to be smaller than that. This aims to prevent reallocations that may shrink buffers which are intended to be reused many times on the same amount of input data.

Template Parameters
KeyData type for the keys
ValueData type for the values
ReductionOpType of the reduction operator — any callable object that can be called with two elements from the value buffer and returns the result of reducing those two elements.
Parameters
keysPointer to the buffer containing input keys
valuesPointer to the buffer containing input values
output_keysBuffer for output keys. Will be resized to match the size of the input buffer.
output_valuesIterator pointing at the start of the output value buffer.
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
reduction_opThe reduction operator.
Returns
The number of keys/values in the output buffers
template<target_system system>
template<typename InputIterator , typename UniqueOutputIterator , typename LengthOutputIterator >
static size_t lift::parallel< system >::run_length_encode ( InputIterator  keys_input,
size_t  num_keys,
UniqueOutputIterator  unique_keys_output,
LengthOutputIterator  run_lengths_output,
allocation< system, uint8 > &  temp_storage 
)
inlinestatic

Compute a run-length encoding of the input key buffer.

run_length_encode computes the lenght of runs of identical keys in keys_input. It outputs a key/value pair for each run of consecutive, identical keys in keys_input, where the value contains the run length.

It is conceptually identical to calling reduce_by_key where all values are equal to 1.

Template Parameters
InputIteratorType of the input key iterator
UniqueOutputIteratorType of the output key iterator
LengthOutputIteratorType of the output iterator for run lengths. The underlying type for this iterator must be an integer type.
Parameters
keys_inputIterator to the start of the input buffer
num_keysNumber of keys in the input buffer
unique_keys_outputIterator to the start of the output buffer for keys. The buffer must be the same size as the input.
run_lengths_outputIterator to the start of the output buffer for run lengths. The buffer must be the same size (in elements) as the input.
Returns
The number of key/length pairs generated in the output buffers.
template<target_system system>
template<typename Key >
static void lift::parallel< system >::sort ( allocation< system, Key > &  keys,
allocation< system, Key > &  temp_keys,
allocation< system, uint8 > &  temp_storage 
)
inlinestatic

Sort a buffer of keys.

sort sorts the contents of keys into ascending order.

Template Parameters
KeyData type for the keys.
Parameters
keysPointer to the buffer containing the keys to be sorted.
temp_keysPointer to a temporary buffer to hold keys during sorting. Will be resized to match the size of keys .
template<target_system system>
template<typename Key , typename Value >
static void lift::parallel< system >::sort_by_key ( pointer< system, Key > &  keys,
pointer< system, Value > &  values,
allocation< system, Key > &  temp_keys,
allocation< system, Value > &  temp_values,
allocation< system, uint8 > &  temp_storage,
int  num_key_bits = sizeof(Key)*8 
)
inlinestatic

Perform a sort-by-key on a key + value buffer pair.

sort_by_key sorts keys and values by ascending order of keys. In other words, it permutes the contents of keys such that they are in ascending order and applies the same permutation to the contents of values.

Template Parameters
KeyData type for the keys
ValueData type for the values
Parameters
keysPointer to the buffer containing the keys to be sorted.
valuesPointer to the buffer containing the values to be sorted.
temp_keysPointer to a temporary buffer to hold keys during sorting. Will be resized to match the size of keys .
temp_valuesPointer to a temporary buffer to hold values during sorting. Will be resized to match the size of values .
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
template<target_system system>
template<typename InputIterator >
static auto lift::parallel< system >::sum ( InputIterator  first,
size_t  len,
allocation< system, uint8 > &  temp_storage 
) -> typename std::iterator_traits< InputIterator >::value_type
inlinestatic

Computes the arithmetic sum of a buffer.

sum performs a reduction using the '+' operator on the input data and '0' as the initial value for the reduction. This works on any integral data type.

Template Parameters
InputIteratorType of the input data iterator.
Parameters
firstIterator to the start of the input buffer.
lenLength of the input buffer.
temp_storageTemporary storage for use by the implementation. Will be resized if too small.
template<target_system system>
static void lift::parallel< system >::synchronize ( )
inlinestatic

Synchronizes the compute device.

This function will wait for the compute device to execute all queued work and go idle. It is a no-op on the CPU and is equivalent to calling cudaDeviceSynchronize() on the GPU.


The documentation for this struct was generated from the following file: