Array distributed across many processes supporting remote-memory-access, access to process local buffer, and some linear algebra operations.
This class implements one-sided remote-memory-access (RMA) operations for getting or putting a copy of any section of the array, provides access to the array data local to the current process, and implements simple linear algebra operations. It also exposes synchronization of processes, and fencing to ensure RMA operations complete.
This class is designed for the following uses:
- performing simple linear algebra on whole arrays
- getting sections of the array, transforming them, and accumulating the result into a different array
- initializing the array using put operations
All RMA operations are blocking. A put returns as soon as the data is in the network buffer, which is not necessarily when it has reached the array; a get returns once the data has been copied into the supplied buffer. Performing synchronisation ensures that all RMA operations have completed.
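The difference between a put returning and a put completing can be illustrated with a small single-process model. This is only a sketch: `ToyRmaArray`, its pending queue, and `demo_put_then_sync` are hypothetical names that stand in for the array, the network buffer, and a caller; none of them are part of DistrArray. A real implementation may make the data visible before synchronisation; the model shows the worst case, where it never does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single-process model of blocking RMA; not the DistrArray API.
class ToyRmaArray {
public:
  explicit ToyRmaArray(std::size_t n) : m_data(n, 0.0) {}

  // Blocking put: returns once the data has been copied into the pending queue
  // (standing in for the network buffer). The array itself is not updated yet.
  void put(std::size_t lo, std::size_t hi, const double* data) {
    m_pending.push_back({lo, std::vector<double>(data, data + (hi - lo))});
  }

  // Blocking get: returns once array[lo:hi) has been copied into buf.
  void get(std::size_t lo, std::size_t hi, double* buf) const {
    for (std::size_t i = lo; i < hi; ++i) buf[i - lo] = m_data[i];
  }

  // Synchronisation: completes every outstanding put.
  void sync() {
    for (const auto& op : m_pending)
      for (std::size_t i = 0; i < op.second.size(); ++i) m_data[op.first + i] = op.second[i];
    m_pending.clear();
  }

private:
  std::vector<double> m_data;
  std::vector<std::pair<std::size_t, std::vector<double>>> m_pending;
};

// Reads element 1 before and after sync(); returns {before, after}.
inline std::pair<double, double> demo_put_then_sync() {
  ToyRmaArray a(4);
  const double values[2] = {1.5, 2.5};
  a.put(1, 3, values); // returns immediately, data is only "in flight"
  double before = -1, after = -1;
  a.get(1, 2, &before);
  a.sync();
  a.get(1, 2, &after);
  return {before, after};
}
```

In this model the first read still sees the initial zero and only the read after `sync()` sees the new value, which is why code that reads data written by another process should synchronise first.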
The LocalBuffer nested class gives access to the section of the distributed array that is stored on the current process. It is up to the specific implementation of DistrArray whether exclusive access to the buffer is granted.
The linear algebra operations can be collective or non-collective. In the former, all processes have to participate in the function call, for example dot which requires a collective broadcast. In case of the latter, each process only needs to operate on the local section of the array and no communication is required, for example scaling array by a constant. Collective operations are naturally synchronised.
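The distinction between the two kinds of operation can be sketched by simulating processes as separate local sections. Everything here is an assumption made for illustration: `LocalSections`, `scal_local`, and `dot_collective` are hypothetical names, and the loop over sections stands in for what would be a reduction over a communicator in a real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per simulated process; each entry holds only that process's local section.
using LocalSections = std::vector<std::vector<double>>;

// Non-collective: a process scales its own section and needs no communication.
inline void scal_local(std::vector<double>& local, double a) {
  for (double& v : local) v *= a;
}

// Collective: every process contributes a partial sum over its local section,
// the partial sums are reduced, and each process receives the same total.
inline std::vector<double> dot_collective(const LocalSections& x, const LocalSections& y) {
  double total = 0.0;
  for (std::size_t p = 0; p < x.size(); ++p) {
    double partial = 0.0;
    for (std::size_t i = 0; i < x[p].size(); ++i) partial += x[p][i] * y[p][i];
    total += partial; // reduction step
  }
  return std::vector<double>(x.size(), total); // one copy of the result per process
}
```

The collective version cannot produce a result unless every section takes part, whereas `scal_local` can be called by any single process at any time.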
- Warning
- The base class does not enforce any locking or exclusive access mechanism. It is up to the specific implementation to decide whether this is necessary. The only rule is that a synchronisation call must complete any outstanding RMA and linear algebra operations.
Example: blocked matrix-vector multiplication
    auto x = Array(n, comm);
    auto y = Array(n, comm);
    auto A = Array(n*n, comm);
    x.allocate();
    x.zero();
    y.zero();
    x.sync();
    // only one process initializes x
    if (rank == 0) {
      x.put(lo, hi, values);
      x.scatter(indices, values2);
    }
    initialize(A);
    x.sync(); // make sure x is complete before any process reads it
    std::vector<double> result_block(bs);
    std::vector<double> x_block(bs);
    std::vector<double> a_block(bs*bs);
    for (auto i = 0; i < nb; ++i) {
      auto i_lo = i * bs;
      auto i_hi = i_lo + bs; // hi is past-the-end
      std::fill_n(begin(result_block), bs, 0.);
      for (auto j = 0; j < nb; ++j) {
        auto j_lo = j * bs;
        auto j_hi = j_lo + bs; // hi is past-the-end
        if (NextTask()) {
          x.get(j_lo, j_hi, x_block.data());
          // assumes block (i, j) of A is stored contiguously as bs*bs elements
          A.get((i * nb + j) * bs * bs, (i * nb + j + 1) * bs * bs, a_block.data());
          matrix_vector_multiply(a_block, x_block, result_block);
        }
      }
      y.acc(i_lo, i_hi, result_block.data());
    }
    y.sync();
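For reference, the arithmetic of the blocked algorithm can be checked with a serial version that needs no distributed array. This is a sketch under stated assumptions: `blocked_matvec` is a hypothetical helper, and the matrix is assumed to be stored block by block, with block (i, j) occupying `bs*bs` contiguous elements in row-major order starting at offset `(i*nb + j)*bs*bs`. The pointer arithmetic replaces the get calls and the final loop replaces the accumulate call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial sketch of y = A * x processed in bs-by-bs blocks.
// Assumed layout: block (i, j) of A starts at offset (i * nb + j) * bs * bs.
inline std::vector<double> blocked_matvec(const std::vector<double>& A_blocks,
                                          const std::vector<double>& x,
                                          std::size_t nb, std::size_t bs) {
  std::vector<double> y(nb * bs, 0.0);
  std::vector<double> result_block(bs);
  for (std::size_t i = 0; i < nb; ++i) {
    std::fill(result_block.begin(), result_block.end(), 0.0);
    for (std::size_t j = 0; j < nb; ++j) {
      // Stand-ins for A.get(...) and x.get(...): point at the local copies.
      const double* a_block = A_blocks.data() + (i * nb + j) * bs * bs;
      const double* x_block = x.data() + j * bs;
      for (std::size_t r = 0; r < bs; ++r)
        for (std::size_t c = 0; c < bs; ++c)
          result_block[r] += a_block[r * bs + c] * x_block[c];
    }
    // Stand-in for accumulating result_block into y[i*bs : (i+1)*bs).
    for (std::size_t r = 0; r < bs; ++r) y[i * bs + r] += result_block[r];
  }
  return y;
}
```

Because each row block of the result is accumulated rather than assigned, the distributed version gives the same answer no matter which process picks up which (i, j) task.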
|
virtual | ~DistrArray ()=default |
|
MPI_Comm | communicator () const |
| Returns a copy of the communicator.
|
virtual void | sync () const |
| Synchronizes all processes in this group and ensures that any outstanding operations on the array have completed.
|
size_t | size () const |
| Total number of elements, same as the overall dimension of the array.
|
bool | compatible (const DistrArray &other) const |
| Checks that arrays are of the same dimensionality.
|
virtual void | zero () |
| Sets all local elements to zero.
|
virtual void | error (const std::string &message) const |
| Stops the application with an error.
|
value_type | operator[] (size_t index) |
|
Access the section of the array local to this process
|
virtual std::unique_ptr< LocalBuffer > | local_buffer ()=0 |
| Access the buffer local to this process.
|
virtual std::unique_ptr< const LocalBuffer > | local_buffer () const =0 |
|
virtual const Distribution & | distribution () const =0 |
| Access the distribution of the array among processes.
|
|
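To make the idea of a local section concrete, the sketch below shows one way a contiguous distribution could assign global index ranges to processes. This is purely an assumption for illustration: this page does not specify how `Distribution` divides the array, and `local_range` is a hypothetical helper, not a member of any class documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical contiguous distribution: n elements over nproc processes,
// with the first (n % nproc) processes holding one extra element.
// Returns the half-open range [lo, hi) of global indices owned by `rank`.
inline std::pair<std::size_t, std::size_t> local_range(std::size_t n, std::size_t nproc,
                                                       std::size_t rank) {
  const std::size_t base = n / nproc;
  const std::size_t extra = n % nproc;
  const std::size_t lo = rank * base + std::min(rank, extra);
  const std::size_t hi = lo + base + (rank < extra ? 1 : 0);
  return {lo, hi};
}
```

With such a mapping, element `k` of a process's local buffer would correspond to global index `lo + k`.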
One-sided remote-memory-access operations. They are non-collective.
|
virtual value_type | at (index_type ind) const =0 |
|
virtual void | set (index_type ind, value_type val)=0 |
| Sets one element to a scalar. Global operation.
|
virtual void | get (index_type lo, index_type hi, value_type *buf) const =0 |
| Copies array[lo:hi) from the global array into buf (hi is past-the-end). Blocking.
|
virtual std::vector< value_type > | get (index_type lo, index_type hi) const =0 |
|
virtual void | put (index_type lo, index_type hi, const value_type *data)=0 |
| array[lo:hi) = data[:] (hi is past-the-end). Blocking.
|
virtual void | acc (index_type lo, index_type hi, const value_type *data)=0 |
| array[lo:hi) += data[:] (hi is past-the-end). Blocking.
|
virtual std::vector< value_type > | gather (const std::vector< index_type > &indices) const =0 |
| Gets elements with discontinuous indices from the array. Blocking.
|
virtual void | scatter (const std::vector< index_type > &indices, const std::vector< value_type > &data)=0 |
| array[indices[i]] = data[i]. Puts values into elements with discontinuous indices of the array. Blocking.
|
virtual void | scatter_acc (std::vector< index_type > &indices, const std::vector< value_type > &data)=0 |
| array[indices[i]] += data[i]. Accumulates values into elements with discontinuous indices of the array. Atomic, blocking, with one-sided communication.
|
virtual std::vector< value_type > | vec () const =0 |
| Copies the whole buffer into a vector. Blocking.
|
|
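The index conventions of the RMA operations can be written out on a plain `std::vector`. This is a minimal sketch of the semantics only, with no communication or atomicity: the `toy` namespace and its free functions are hypothetical and merely mirror the names of the members listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy {
using index_type = std::size_t;
using value_type = double;
using Array = std::vector<value_type>;

// buf[:] = array[lo:hi), hi is past-the-end
inline void get(const Array& array, index_type lo, index_type hi, value_type* buf) {
  for (index_type i = lo; i < hi; ++i) buf[i - lo] = array[i];
}

// array[lo:hi) = data[:], hi is past-the-end
inline void put(Array& array, index_type lo, index_type hi, const value_type* data) {
  for (index_type i = lo; i < hi; ++i) array[i] = data[i - lo];
}

// array[lo:hi) += data[:], hi is past-the-end
inline void acc(Array& array, index_type lo, index_type hi, const value_type* data) {
  for (index_type i = lo; i < hi; ++i) array[i] += data[i - lo];
}

// result[i] = array[indices[i]]
inline Array gather(const Array& array, const std::vector<index_type>& indices) {
  Array out;
  out.reserve(indices.size());
  for (index_type ind : indices) out.push_back(array[ind]);
  return out;
}

// array[indices[i]] = data[i]
inline void scatter(Array& array, const std::vector<index_type>& indices, const Array& data) {
  for (std::size_t i = 0; i < indices.size(); ++i) array[indices[i]] = data[i];
}

// array[indices[i]] += data[i]; repeated indices accumulate more than once
inline void scatter_acc(Array& array, const std::vector<index_type>& indices, const Array& data) {
  for (std::size_t i = 0; i < indices.size(); ++i) array[indices[i]] += data[i];
}
} // namespace toy
```

Note that a range `[lo, hi)` always transfers `hi - lo` elements, so a block of size `bs` starting at `lo` ends at `lo + bs`.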
virtual void | fill (value_type val) |
|
virtual void | copy (const DistrArray &y) |
|
virtual void | copy_patch (const DistrArray &y, index_type start, index_type end) |
| Copies elements in a patch of y. If both arrays are empty, then does nothing. If only one is empty, throws an error.
|
virtual void | axpy (value_type a, const DistrArray &y) |
| this[:] += a * y[:]. Adds a multiple of another array to this one. Throws an error if any array is empty. Blocking, collective.
|
virtual void | axpy (value_type a, const SparseArray &y) |
|
virtual void | scal (value_type a) |
| Scale by a constant. Local.
|
virtual void | add (const DistrArray &y) |
| Add another array to this. Local. Throws an error if any array is empty.
|
virtual void | add (value_type a) |
| Add a constant. Local.
|
virtual void | sub (const DistrArray &y) |
| Subtract another array from this. Local. Throws an error if any array is empty.
|
virtual void | sub (value_type a) |
| Subtract a constant. Local.
|
virtual void | recip () |
| Take the element-wise reciprocal of this. Local. No checks are made for zero values.
|
virtual void | times (const DistrArray &y) |
| this[i] *= y[i]. Throws an error if any array is empty.
|
virtual void | times (const DistrArray &y, const DistrArray &z) |
| this[i] = y[i]*z[i]. Throws an error if any array is empty.
|
|
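The element-wise formulas above can be sketched on a single local section. The functions below are hypothetical free functions in a `toy` namespace, not the class members. The error on empty input follows the descriptions above; the check for equal sizes is an additional assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace toy {
using Section = std::vector<double>;

// Throws if either section is empty (as documented) or if sizes differ (assumed here).
inline void check(const Section& a, const Section& b) {
  if (a.empty() || b.empty()) throw std::invalid_argument("empty array");
  if (a.size() != b.size()) throw std::invalid_argument("incompatible arrays");
}

// self[:] += a * y[:]
inline void axpy(Section& self, double a, const Section& y) {
  check(self, y);
  for (std::size_t i = 0; i < self.size(); ++i) self[i] += a * y[i];
}

// self[i] = y[i] * z[i]
inline void times(Section& self, const Section& y, const Section& z) {
  check(self, y);
  check(self, z);
  for (std::size_t i = 0; i < self.size(); ++i) self[i] = y[i] * z[i];
}

// self[i] = 1 / self[i]; no check for zero values, matching the description of recip
inline void recip(Section& self) {
  for (double& v : self) v = 1.0 / v;
}
} // namespace toy
```

Each function touches only the section it is given, which is what allows these operations to run without communication between processes.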
virtual value_type | dot (const DistrArray &y) const |
| Scalar product of two arrays. Collective. Throws an error if any array is empty. Both arrays should be part of the same processor group (same communicator). The result is broadcast to each process.
|
virtual value_type | dot (const SparseArray &y) const |
|
void | divide (const DistrArray &y, const DistrArray &z, value_type shift=0, bool append=false, bool negative=false) |
| this[i] = y[i]/(z[i]+shift). Collective. Throws an error if any array is empty.
|
std::list< std::pair< index_type, value_type > > | min_n (int n) const |
| Returns the n smallest elements in the array. Collective operation, must be called by all processes in the group.
|
std::list< std::pair< index_type, value_type > > | max_n (int n) const |
| Returns the n largest elements in the array. Collective operation, must be called by all processes in the group.
|
std::list< std::pair< index_type, value_type > > | min_abs_n (int n) const |
| Returns the n elements that are smallest by absolute value in the array. Collective operation, must be called by all processes in the group.
|
std::list< std::pair< index_type, value_type > > | max_abs_n (int n) const |
| Returns the n elements that are largest by absolute value in the array. Collective operation, must be called by all processes in the group.
|
std::vector< index_type > | min_loc_n (int n) const |
| Finds the indices of the n smallest components in the array. Collective operation, must be called by all processes in the group.
|
std::map< size_t, value_type > | select_max_dot (size_t n, const DistrArray &y) const |
|
std::map< size_t, value_type > | select_max_dot (size_t n, const SparseArray &y) const |
|
std::map< size_t, value_type > | select (size_t n, bool max=false, bool ignore_sign=false) const |
|
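The selection functions return (index, value) pairs. A serial sketch of that selection on a plain vector is shown below. It rests on several assumptions: the `toy` namespace, `select_n`, and the free functions are hypothetical; the result is ordered best-first, which this page does not state; and `n` is a `std::size_t` rather than `int`. A distributed version would additionally have to combine candidates from every process, which is why the real operations are collective.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

namespace toy {
using index_type = std::size_t;
using value_type = double;
using Item = std::pair<index_type, value_type>;
using Selection = std::list<Item>;

// Returns the first n (index, value) pairs under the ordering comes_first(a, b),
// which must return true when value a should be listed before value b.
template <class Compare>
Selection select_n(const std::vector<value_type>& array, std::size_t n, Compare comes_first) {
  std::vector<Item> items;
  items.reserve(array.size());
  for (index_type i = 0; i < array.size(); ++i) items.emplace_back(i, array[i]);
  const auto last = items.begin() + static_cast<std::ptrdiff_t>(std::min(n, items.size()));
  std::partial_sort(items.begin(), last, items.end(), [&](const Item& l, const Item& r) {
    return comes_first(l.second, r.second);
  });
  return Selection(items.begin(), last);
}

inline Selection min_n(const std::vector<value_type>& a, std::size_t n) {
  return select_n(a, n, [](value_type l, value_type r) { return l < r; });
}

inline Selection max_abs_n(const std::vector<value_type>& a, std::size_t n) {
  return select_n(a, n, [](value_type l, value_type r) { return std::abs(l) > std::abs(r); });
}

inline Selection min_abs_n(const std::vector<value_type>& a, std::size_t n) {
  return select_n(a, n, [](value_type l, value_type r) { return std::abs(l) < std::abs(r); });
}
} // namespace toy
```

Keeping the index alongside each value lets the caller locate the selected elements in the global array afterwards, for example to fetch them again with a gather.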