Contact: Victor Eijkhout
Res:
Terminology
- Socket: the processor chip
- Processor: we don’t use that word
- Core: one instruction-stream processing unit
- Process: preferred terminology in talking about MPI.
- SPMD model:
single program, multiple data (sketch below)
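A minimal SPMD sketch: every process runs the same executable and branches on its rank.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  // same program everywhere; the rank decides what each process does
  if (rank == 0)
    printf("root of %d processes\n", nprocs);
  else
    printf("worker with rank %d\n", rank);
  MPI_Finalize();
  return 0;
}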
MPI
prototype
: declaration
Collectives
- Gathering
- reduction: reduce, n to 1
MPI_Reduce
- gather: collect subsets into one set
- Spreading
- broadcast: identical, 1 to n
- scatter: subsetting
MPI_Scatterv
int MPI_Reduce(
  void* send_data,        // the buffer, e.g. &x
  void* recv_data,
  int count,              // size of x
  MPI_Datatype datatype,
  MPI_Op op,
  int root,               // not needed for MPI_Allreduce
  MPI_Comm communicator)
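A minimal sketch of using the prototype above: every process contributes a scalar x, the root gets the sum (with MPI_Allreduce the root argument disappears and every rank gets the result). The local values are made up for illustration.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  double x = rank + 1.0, sum = 0.0;  // local contribution (example value)
  // &x and &sum fill the void* buffer arguments
  MPI_Reduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("sum = %g\n", sum);       // only the root has the result
  MPI_Finalize();
  return 0;
}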
int MPI_Gather(
void *sendbuf, int sendcnt, MPI_Datatype sendtype,
void *recvbuf, int recvcnt, MPI_Datatype recvtype,
int root, MPI_Comm comm)
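A sketch of a gather: each rank contributes one int, the root collects them into an array of length nprocs; the receive buffer only matters on the root. The contributed values are made up.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  int mine = rank * rank;            // one element per process (example value)
  int *all = NULL;
  if (rank == 0)                     // receive buffer only needed on the root
    all = malloc(nprocs * sizeof(int));
  MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
  if (rank == 0) {
    for (int i = 0; i < nprocs; i++)
      printf("from rank %d: %d\n", i, all[i]);
    free(all);
  }
  MPI_Finalize();
  return 0;
}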
int MPI_Scatter
(void* sendbuf, int sendcount, MPI_Datatype sendtype,
void* recvbuf, int recvcount, MPI_Datatype recvtype,
int root, MPI_Comm comm)
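A sketch of a scatter, the mirror image of the gather above: the root hands out one element of its array to every rank; MPI_Scatterv would allow unequal pieces.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  int *data = NULL;
  if (rank == 0) {                   // only the root fills the send buffer
    data = malloc(nprocs * sizeof(int));
    for (int i = 0; i < nprocs; i++) data[i] = 10 * i;   // example values
  }
  int mine;                          // every rank receives one element
  MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("rank %d got %d\n", rank, mine);
  if (rank == 0) free(data);
  MPI_Finalize();
  return 0;
}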
- MPI_MAX - Returns the maximum element.
- MPI_MIN - Returns the minimum element.
- MPI_SUM - Sums the elements.
- MPI_PROD - Multiplies all elements.
- MPI_LAND - Performs a logical and across the elements.
- MPI_LOR - Performs a logical or across the elements.
- MPI_BAND - Performs a bitwise and across the bits of the elements.
- MPI_BOR - Performs a bitwise or across the bits of the elements.
- MPI_MAXLOC - Returns the maximum value and the rank of the process that owns it.
- MPI_MINLOC - Returns the minimum value and the rank of the process that owns it.
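MPI_MAXLOC/MPI_MINLOC reduce value/index pairs; a sketch using the predefined MPI_DOUBLE_INT pair type, with made-up local values:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  struct { double value; int rank; } in, out;   // layout matches MPI_DOUBLE_INT
  in.value = (rank * 37) % 11;                  // some local value (example)
  in.rank  = rank;
  MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("max %g found on rank %d\n", out.value, out.rank);
  MPI_Finalize();
  return 0;
}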
buffer
void pointer
: memory address of the data
C
: write &x or (void*)&x for a scalar
Python
: comm.recv: slow, but handles any Python object; comm.Recv: fast, buffer-based
Scan or 'parallel prefix'
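Sketch of an inclusive scan: rank i ends up with the sum of the contributions of ranks 0..i.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int x = rank + 1, prefix = 0;      // local contribution (example value)
  // inclusive prefix: rank i receives x_0 + ... + x_i
  MPI_Scan(&x, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  printf("rank %d: prefix sum = %d\n", rank, prefix);
  MPI_Finalize();
  return 0;
}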
Barrier: Synchronize procs
- almost never needed
- only for timing
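Typical timing pattern: barrier, start the clock, do the work, barrier again so the slowest rank is included. The timed region is left as a placeholder here.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Barrier(MPI_COMM_WORLD);       // make sure everyone starts together
  double t = MPI_Wtime();
  /* ... work to be timed ... */
  MPI_Barrier(MPI_COMM_WORLD);       // wait for the slowest rank
  t = MPI_Wtime() - t;
  if (rank == 0)
    printf("elapsed: %g s\n", t);
  MPI_Finalize();
  return 0;
}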
Naive realization
- root sends to all: $\alpha + \beta n$ per message
- better realizations exist, still built from p2p messages
Distributed data
- each process has a local_index and its rank
- together these determine the global_index
Load balancing
$f(i) = \lfloor iN/p \rfloor$; proc $i$ owns indices $[f(i), f(i+1))$
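A sketch of this block distribution, with N chosen arbitrarily: process i owns global indices f(i) up to (not including) f(i+1), and a global index is the owner's first index plus the local index.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  long N = 1000;                          // global problem size (example value)
  long first = rank * N / nprocs;         // f(rank)   = floor(rank*N/p)
  long last  = (rank + 1) * N / nprocs;   // f(rank+1), exclusive upper bound
  long nlocal = last - first;
  // global index = first + local index
  printf("rank %d owns [%ld,%ld), %ld elements\n", rank, first, last, nlocal);
  MPI_Finalize();
  return 0;
}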
Local info. exchange
Matrices in parallel: distribute domain, not the matrix
p2p: ping-pong
- two sides: A & B
- match: send & recv
int MPI_Send(
const void* buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm)
Semantics:
IN buf: initial address of send buffer (choice)
IN count: number of elements in send buffer (non-negative integer)
IN datatype: datatype of each send buffer element (handle)
IN dest: rank of destination (integer)
IN tag: message tag (integer)
IN comm: communicator (handle)
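A ping-pong sketch assuming at least two processes: rank 0 is side A, rank 1 is side B; each send is matched by a receive with the same tag.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  double x = 3.14;                   // the "ball" (example value)
  int tag = 0;
  if (rank == 0) {                   // side A: ping, then wait for the pong
    MPI_Send(&x, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    MPI_Recv(&x, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("A got the ball back: %g\n", x);
  } else if (rank == 1) {            // side B: receive, then send back
    MPI_Recv(&x, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&x, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}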
Communication across nodes is roughly 100x slower than within a node.
Blocking
send & recv are blocking operations
- deadlock: both sides post the receive first, then the send
- might work: both sides post the send first, then the receive (depends on system buffering)
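The "might work" ordering as code, assuming exactly two processes: both ranks send first, which only succeeds while the message fits in system buffers; posting the receive first on both ranks would deadlock unconditionally.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int other = 1 - rank;              // assume exactly two processes
  double mine = rank, theirs;
  // UNSAFE ordering: both ranks send, then receive.
  // Works only while the message fits in system buffers.
  MPI_Send(&mine, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
  MPI_Recv(&theirs, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  printf("rank %d received %g\n", rank, theirs);
  MPI_Finalize();
  return 0;
}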
Bypass blocking
- odds & evens: exchange sort, compare-and-swap
- pairwise exchange
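A safe pairwise exchange sketch, assuming an even number of processes: order the send and receive by rank parity, as in the odd/even exchange sort.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int other = rank ^ 1;              // partner: 0<->1, 2<->3, ... (assume even nprocs)
  double mine = rank, theirs;
  if (rank % 2 == 0) {               // even ranks: send first, then receive
    MPI_Send(&mine, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(&theirs, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  } else {                           // odd ranks: receive first, then send
    MPI_Recv(&theirs, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&mine, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
  }
  printf("rank %d received %g\n", rank, theirs);
  MPI_Finalize();
  return 0;
}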
Non-blocking
MPI_Isend, MPI_Irecv
- need MPI_Wait (which blocks) before the buffer may be reused; in blocking communication the buffer is safe as soon as the call returns
Latency hiding
: no need to wait until the communication is done
MPI_Test
: non-blocking alternative to wait; do local work if no message has arrived yet
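A non-blocking sketch, again assuming an even number of processes: post MPI_Irecv/MPI_Isend, do local work while the messages are in flight, and wait on the requests before touching the buffers; MPI_Test would let you poll instead of waiting.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int other = rank ^ 1;              // partner, assuming an even number of processes
  double mine = rank, theirs;
  MPI_Request reqs[2];
  MPI_Irecv(&theirs, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(&mine, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);
  /* latency hiding: do local work here that does not touch the buffers */
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   // now the buffers are safe to reuse
  printf("rank %d received %g\n", rank, theirs);
  MPI_Finalize();
  return 0;
}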
More
- MPI_Bsend, MPI_Ibsend: buffered send
- MPI_Ssend, MPI_Issend: synchronous send
- MPI_Rsend, MPI_Irsend: ready send
- One-sided communication: ‘just’ put/get the data somewhere
- Derived data types: send strided/irregular/inhomogeneous data
- Sub-communicators: work with subsets of MPI_COMM_WORLD
- I/O: efficient file operations
- Non-blocking collectives
Comments!