|
Lift
Library of parallel computing primitives for GPUs and multi-core CPUs
|
Dispatch structure for parallel primitives. More...
#include <parallel.h>
Static Public Member Functions | |
| template<typename InputIterator , typename UnaryFunction > | |
| static void | for_each (InputIterator begin, InputIterator end, UnaryFunction f, int2 launch_parameters={0, 0}) |
| Parallel for-each implementation. More... | |
| template<typename T , typename UnaryFunction > | |
| static void | for_each (pointer< system, T > &vector, UnaryFunction f, int2 launch_parameters={0, 0}) |
| Pointer version of for-each. More... | |
| template<typename UnaryFunction > | |
| static void | for_each (uint2 range, UnaryFunction f, int2 launch_parameters={0, 0}) |
| Range-based version of for-each. More... | |
| template<typename UnaryFunction > | |
| static void | for_each (uint32 end, UnaryFunction f, int2 launch_parameters={0, 0}) |
| 0-based range version of for-each. More... | |
| template<typename InputIterator , typename OutputIterator , typename Predicate > | |
| static void | inclusive_scan (InputIterator first, size_t len, OutputIterator result, Predicate op) |
Performs a parallel inclusive scan on [first, first + len[, using op as the scan operator. More... | |
| template<typename InputIterator , typename OutputIterator , typename Predicate > | |
| static size_t | copy_if (InputIterator first, size_t len, OutputIterator result, Predicate op, allocation< system, uint8 > &temp_storage) |
| Performs stream compaction based on a predicate. More... | |
| template<typename InputIterator , typename FlagIterator , typename OutputIterator > | |
| static size_t | copy_flagged (InputIterator first, size_t len, OutputIterator result, FlagIterator flags, allocation< system, uint8 > &temp_storage) |
| Performs stream compaction based on a buffer with boolean flags. More... | |
| template<typename InputIterator > | |
| static auto | sum (InputIterator first, size_t len, allocation< system, uint8 > &temp_storage) -> typename std::iterator_traits< InputIterator >::value_type |
| Computes the arithmetic sum of a buffer. More... | |
| template<typename Key , typename Value > | |
| static void | sort_by_key (pointer< system, Key > &keys, pointer< system, Value > &values, allocation< system, Key > &temp_keys, allocation< system, Value > &temp_values, allocation< system, uint8 > &temp_storage, int num_key_bits=sizeof(Key)*8) |
| Perform a sort-by-key on a key + value buffer pair. More... | |
| template<typename Key > | |
| static void | sort (allocation< system, Key > &keys, allocation< system, Key > &temp_keys, allocation< system, uint8 > &temp_storage) |
| Sort a buffer of keys. More... | |
| template<typename KeyIterator , typename ValueIterator , typename ReductionOp > | |
| static size_t | reduce_by_key (KeyIterator keys_begin, KeyIterator keys_end, ValueIterator values_begin, KeyIterator output_keys, ValueIterator output_values, allocation< system, uint8 > &temp_storage, ReductionOp reduction_op) |
| Perform a reduction by key on a key/value buffer pair. More... | |
| template<typename Key , typename Value , typename ReductionOp > | |
| static size_t | reduce_by_key (pointer< system, Key > &keys, pointer< system, Value > &values, allocation< system, Key > &output_keys, allocation< system, Value > &output_values, allocation< system, uint8 > &temp_storage, ReductionOp reduction_op) |
| Perform a reduction by key on a key/value buffer pair. More... | |
| template<typename InputIterator , typename UniqueOutputIterator , typename LengthOutputIterator > | |
| static size_t | run_length_encode (InputIterator keys_input, size_t num_keys, UniqueOutputIterator unique_keys_output, LengthOutputIterator run_lengths_output, allocation< system, uint8 > &temp_storage) |
| Compute a run-length encoding of the input key buffer. More... | |
| static void | synchronize () |
| Synchronizes the compute device. More... | |
| static void | check_errors (void) |
| Check for compute device errors. More... | |
Dispatch structure for parallel primitives.
Implementations are exposed in specializations of this structure for each target system.
Definition at line 46 of file parallel.h.
|
inlinestatic |
Check for compute device errors.
This function will check for any pending errors on the compute device. It is a no-op on the CPU. On the GPU, it will poll the runtime for a pending error condition; if found, it will be printed to the console and the calling process will be terminated.
|
inlinestatic |
Performs stream compaction based on a buffer with boolean flags.
copy_flagged copies every element in the input buffer for which the corresponding index in the flags buffer is true.
| InputIterator | Type of the input data iterator. |
| FlagIterator | Type of the input flags iterator. |
| OutputIterator | Type of the output data iterator. |
| first | Iterator to the start of the input buffer. |
| len | Number of elements in the input buffer. |
| result | Iterator to the start of the output buffer. |
| flags | Determines whether each element is copied. For a given index, the element data at that index in the input buffer is copied if and only if the flag at the same index evaluates to true. |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
|
inlinestatic |
Performs stream compaction based on a predicate.
copy_if copies every element in the input buffer for which Predicate evaluates to true into the output buffer.
| InputIterator | Type of the input data iterator. |
| OutputIterator | Type of the output data iterator. |
| first | Iterator to the start of the input buffer. |
| len | Number of elements in the input buffer. |
| result | Iterator to the start of the output buffer. |
| op | Determines whether each element is copied. Called with a single argument (a copy of or reference to the input element), returns a boolean value that determines whether each element is copied. |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
|
inlinestatic |
Parallel for-each implementation.
Applies UnaryFunction f to each element in the range [begin, end[.
| InputIterator | Iterator type for input data. |
| UnaryFunction | Any callable type taking a single argument. The argument can either be a value of the same type obtained by dereferencing InputIterator, or else a (const or modifiable) reference to that type. The return value is ignored. |
| begin | Iterator pointing at the first element to be processed. |
| end | Iterator pointing at the end of the range to be processed. |
| f | The function object to be applied to each item in the input. |
| launch_parameters | Grid launch parameters for GPU backend. |
|
inlinestatic |
Pointer version of for-each.
Applies UnaryFunction f to each element behind vector.
| T | The underlying data type for vector. |
| UnaryFunction | Same as in for_each |
| vector | The memory region to apply f to |
| f | UnaryFunction to apply to each element in vector |
| launch_parameters | Same as in for_each |
|
inlinestatic |
|
inlinestatic |
|
inlinestatic |
Performs a parallel inclusive scan on [first, first + len[, using op as the scan operator.
inclusive_scan performs an inclusive prefix-scan on the input buffer. The output is a buffer of the same size, with each element replaced by the result of applying op to every element from the start of the array up to and inclusing the element itself.
| InputIterator | Type of the input data iterator. |
| OutputIterator | Type of the output data iterator. |
| Predicate |
| first | Start of the range to perform the scan over. |
| len | Number of data elements to scan over. |
| result | Iterator to the start of the output buffer. Must not be the same as the input buffer. |
| op | The binary operator to be used as the scan operator. Takes two parameters of Input type, returns the scan value corresponding to the input pair. |
|
inlinestatic |
Perform a reduction by key on a key/value buffer pair.
reduce_by_key performs per-key reduction on a key/value buffer. For each set of consecutve identical keys in [keys_begin, keys_end [, the corresponding values are reduced into a single output value by applying a user-specified reduction operator.
| KeyIterator | Type of the key iterator |
| ValueIterator | Type of the value iterator |
| ReductionOp | Type of the reduction operator — any callable object that can be called with two elements from the value buffer and returns the result of reducing those two elements. |
| keys_begin | Iterator pointing at the start of the input key buffer |
| keys_end | Iterator pointing at the end of the input key buffer |
| values_begin | Iterator pointing at the start of the input value buffer |
| output_keys | Iterator pointing at the start of the output key buffer |
| output_values | Iterator pointing at the start of the output value buffer |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
| reduction_op | The reduction operator. |
|
inlinestatic |
Perform a reduction by key on a key/value buffer pair.
reduce_by_key performs per-key reduction on a key/value buffer. For each set of consecutve identical keys in [keys_begin, keys_end [, the corresponding values are reduced into a single output value by applying a user-specified reduction operator.
Note that the output buffers are resized to match the input size, but the number of elements actually output (the return value from this function) is likely going to be smaller than that. This aims to prevent reallocations that may shrink buffers which are intended to be reused many times on the same amount of input data.
| Key | Data type for the keys |
| Value | Data type for the values |
| ReductionOp | Type of the reduction operator — any callable object that can be called with two elements from the value buffer and returns the result of reducing those two elements. |
| keys | Pointer to the buffer containing input keys |
| values | Pointer to the buffer containing input values |
| output_keys | Buffer for output keys. Will be resized to match the size of the input buffer. |
| output_values | Iterator pointing at the start of the output value buffer. |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
| reduction_op | The reduction operator. |
|
inlinestatic |
Compute a run-length encoding of the input key buffer.
run_length_encode computes the lenght of runs of identical keys in keys_input. It outputs a key/value pair for each run of consecutive, identical keys in keys_input, where the value contains the run length.
It is conceptually identical to calling reduce_by_key where all values are equal to 1.
| InputIterator | Type of the input key iterator |
| UniqueOutputIterator | Type of the output key iterator |
| LengthOutputIterator | Type of the output iterator for run lengths. The underlying type for this iterator must be an integer type. |
| keys_input | Iterator to the start of the input buffer |
| num_keys | Number of keys in the input buffer |
| unique_keys_output | Iterator to the start of the output buffer for keys. The buffer must be the same size as the input. |
| run_lengths_output | Iterator to the start of the output buffer for run lengths. The buffer must be the same size (in elements) as the input. |
|
inlinestatic |
Sort a buffer of keys.
sort sorts the contents of keys into ascending order.
| Key | Data type for the keys. |
| keys | Pointer to the buffer containing the keys to be sorted. |
| temp_keys | Pointer to a temporary buffer to hold keys during sorting. Will be resized to match the size of keys . |
|
inlinestatic |
Perform a sort-by-key on a key + value buffer pair.
sort_by_key sorts keys and values by ascending order of keys. In other words, it permutes the contents of keys such that they are in ascending order and applies the same permutation to the contents of values.
| Key | Data type for the keys |
| Value | Data type for the values |
| keys | Pointer to the buffer containing the keys to be sorted. |
| values | Pointer to the buffer containing the values to be sorted. |
| temp_keys | Pointer to a temporary buffer to hold keys during sorting. Will be resized to match the size of keys . |
| temp_values | Pointer to a temporary buffer to hold values during sorting. Will be resized to match the size of values . |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
|
inlinestatic |
Computes the arithmetic sum of a buffer.
sum performs a reduction using the '+' operator on the input data and '0' as the initial value for the reduction. This works on any integral data type.
| InputIterator | Type of the input data iterator. |
| first | Iterator to the start of the input buffer. |
| len | Length of the input buffer. |
| temp_storage | Temporary storage for use by the implementation. Will be resized if too small. |
|
inlinestatic |
Synchronizes the compute device.
This function will wait for the compute device to execute all queued work and go idle. It is a no-op on the CPU and is equivalent to calling cudaDeviceSynchronize() on the GPU.
1.8.6