Here is an MPI rank file generator script written in bash for use on Blue Biou. It assigns tasks (individual processes in an MPI program) to hardware threads on a Power7 node with four eight-core processors, where each core runs four hardware threads (32 cores and 128 threads per node).
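If you run 32 or fewer tasks per node, it uses only the first hardware thread of each core (every fourth slot), so each task gets a core to itself.
If you run 33 to 64 tasks per node, it assigns threads 1 and 2 of each core, skipping 3 and 4.
If you run 65 to 96 tasks per node, it assigns threads 1, 2 and 3 of each core, skipping 4.
Above 96 tasks per node, it assigns slots consecutively.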
The script takes as input the filename of a PBS-generated node file. Inside a job submission script it is safe to pass the environment variable $PBS_NODEFILE. Note that the script checks that it is running on the first node listed in that file.
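For context, a PBS node file lists one hostname per line, repeated once for each task assigned to that host. The following is a minimal sketch of how the node, task and tasks-per-node counts can be derived from such a file; the hostnames `node001` and `node002` and the temporary file are made up for illustration.

```shell
#!/bin/bash
# Hypothetical two-node job with two tasks per node.
# node001 and node002 are made-up hostnames, not real Blue Biou nodes.
nodefile=$(mktemp)
printf '%s\n' node001 node001 node002 node002 > "$nodefile"

NNODES=$(uniq "$nodefile" | wc -l)   # distinct hosts (adjacent duplicates collapsed)
NTASKS=$(wc -l < "$nodefile")        # one line per task
PPN=$((NTASKS / NNODES))             # tasks per node

echo "nodes=$NNODES tasks=$NTASKS ppn=$PPN"
rm -f "$nodefile"
```

Because `uniq` only collapses adjacent duplicates, this relies on the node file grouping each host's lines together. The full script follows.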
#!/bin/bash
# Given a PBS nodes file, generate a rank file for openmpi
#
# Chandler Wilkerson
# Fri Mar 12 2010
function usage() {
    echo "Usage: $0 PBS_NODEFILE"
    echo
    exit 1
}
test $# -lt 1 && usage
test -s "$1" || usage
# Quote the filename so paths containing spaces still work
NODES=$(uniq "$1")
NNODES=$(uniq "$1" | wc -l)
NTASKS=$(wc -l < "$1")
PPN=$(($NTASKS / $NNODES))
FIRSTNODE=$(head -n 1 "$1")
# Sanity checks
#
# Every node must have the same number of tasks
if [ $NTASKS -ne $(($PPN * $NNODES)) ]; then
    echo "Tasks are not evenly distributed across nodes?"
    exit 1
fi
# This script needs to run on the first node.
if [ "$HOSTNAME" != "$FIRSTNODE" ]; then
    echo "This script must be run on the first node"
    echo " (If this is the case, this script is getting confused)"
    exit 1
fi
# Make it so:
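if [ $PPN -le 32 ]; then
    # One task per core: first hardware thread of each core
    stride=4
    threads=0
    nthreads=1
elif [ $PPN -le 64 ]; then
    # Two tasks per core: first two hardware threads
    stride=4
    threads="0 1"
    nthreads=2
elif [ $PPN -le 96 ]; then
    # Three tasks per core: first three hardware threads
    stride=4
    threads="0 1 2"
    nthreads=3
else
    # More than 96 tasks: assign slots consecutively
    stride=1
    threads=0
    nthreads=1
fi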
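CURRTASK=0
for node in $NODES; do
    placed=0    # ranks placed on this node so far
    # Round the core count up so a PPN that is not a multiple of
    # nthreads still gets all of its ranks
    for task in $(seq 0 $(( ($PPN + $nthreads - 1) / $nthreads - 1 )) ); do
        for thread in $threads; do
            test $placed -ge $PPN && break
            CURRSLOT=$(($task*$stride + $thread))
            echo "rank ${CURRTASK}=$node slot=$CURRSLOT"
            CURRTASK=$(($CURRTASK+1))
            placed=$(($placed+1))
        done
    done
done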
Here’s an example of how to use the script:
PATH_TO_SCRIPT/build_rankfile.sh $PBS_NODEFILE > $TMPDIR/rankfile.txt
...
mpiexec -rf $TMPDIR/rankfile.txt ...
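To check the slot mapping without submitting a job, the stride and thread rules can be reproduced in a small standalone function. This is only a sketch: `slots_for_ppn` is a hypothetical helper, not part of the script above. It rounds the core count up when the task count is not a multiple of the threads used per core, which is an assumption about the intended behaviour.

```shell
#!/bin/bash
# slots_for_ppn: hypothetical helper that prints the slot numbers a single
# node would receive for a given tasks-per-node count.
slots_for_ppn() {
    local ppn=$1 stride threads nthreads task thread placed=0
    if   [ "$ppn" -le 32 ]; then stride=4; threads="0";     nthreads=1
    elif [ "$ppn" -le 64 ]; then stride=4; threads="0 1";   nthreads=2
    elif [ "$ppn" -le 96 ]; then stride=4; threads="0 1 2"; nthreads=3
    else                         stride=1; threads="0";     nthreads=1
    fi
    # Assumption: round up so that every requested task gets a slot
    local ncores=$(( (ppn + nthreads - 1) / nthreads ))
    for (( task = 0; task < ncores; task++ )); do
        for thread in $threads; do
            [ "$placed" -ge "$ppn" ] && break
            printf '%d ' $(( task * stride + thread ))
            placed=$(( placed + 1 ))
        done
    done
    echo
}

slots_for_ppn 4     # one task per core: slots 0 4 8 12
slots_for_ppn 40    # two tasks per core: slots 0 1 4 5 8 9 ...
```

Each printed number corresponds to the `slot=` value on one line of the generated rank file.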