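Here is an MPI rank file generator script, written in bash, for use on Blue Biou. It assigns tasks (individual processes in an MPI program) to hardware threads on a Power7 machine with four eight-core, quad-thread processors, which gives 32 cores and 128 hardware threads per node. The goal is to spread tasks across cores before doubling up on any one core.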
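If you run <= 32 tasks per node, it assigns every fourth hardware thread, so each task gets a core to itself.
If you run <= 64 tasks per node, it assigns the first two threads of each core (thread offsets 0 and 1 in the script) and skips the other two. In the same way, up to 96 tasks per node use the first three threads of each core, and anything above 96 fills the hardware threads in order.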
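The mapping rule described above can be sketched as a small bash function. This is a minimal sketch rather than the script itself: the function name `slots_for_ppn` is hypothetical, and it assumes slots are numbered from 0 with four consecutive slots per core.

```bash
#!/bin/bash
# Hypothetical helper: print the slot numbers used on one node for a given
# number of tasks per node (PPN). Assumes 4 hardware threads per core,
# numbered consecutively starting at 0.
slots_for_ppn() {
    local ppn=$1 stride threads nthreads task thread
    if [ "$ppn" -le 32 ]; then
        stride=4; threads="0"; nthreads=1        # one task per core
    elif [ "$ppn" -le 64 ]; then
        stride=4; threads="0 1"; nthreads=2      # two tasks per core
    elif [ "$ppn" -le 96 ]; then
        stride=4; threads="0 1 2"; nthreads=3    # three tasks per core
    else
        stride=1; threads="0"; nthreads=1        # fill every thread in order
    fi
    for (( task = 0; task < ppn / nthreads; task++ )); do
        for thread in $threads; do
            echo $(( task * stride + thread ))
        done
    done
}

slots_for_ppn 4    # prints 0, 4, 8, 12 (one per line)
```

With 64 tasks per node the same function starts 0, 1, 4, 5, 8, 9 and so on, leaving the last two threads of every core idle.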
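The script takes as input the filename of a PBS-generated node file. It is safe to pass the environment variable $PBS_NODEFILE if using this script within a job submission script. The script checks that it is running on the first node listed in that file and exits with an error otherwise.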
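As a rough illustration of what the script reads from the node file: the file holds one line per task, with each host name repeated once for every task placed on that host. The sketch below builds a sample node file with two made-up host names (`node01` and `node02` are placeholders, not real hosts) and derives the node count, task count, and tasks per node the same way the script does.

```bash
#!/bin/bash
# Build a sample node file: two hypothetical hosts, four tasks each.
nodefile=$(mktemp)
for host in node01 node02; do
    for i in 1 2 3 4; do
        echo "$host"
    done
done > "$nodefile"

nnodes=$(uniq "$nodefile" | wc -l)   # distinct hosts (adjacent duplicates collapsed)
ntasks=$(wc -l < "$nodefile")        # one line per task
ppn=$(( ntasks / nnodes ))           # tasks per node

# $(( )) strips any padding that wc adds on some systems.
echo "nodes=$((nnodes)) tasks=$((ntasks)) ppn=$ppn"
rm -f "$nodefile"
```

Note that `uniq` only collapses adjacent duplicates, so this approach relies on the node file listing all entries for one host together.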
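#!/bin/bash
# Given a PBS nodes file, generate a rank file for openmpi
#
# Chandler Wilkerson
# Fri Mar 12 2010

function usage() {
    # Messages go to stderr so they never end up in a redirected rank file.
    echo "Usage: $0 PBS_NODEFILE" >&2
    echo >&2
    exit 1
}

test $# -lt 1 && usage
test -s "$1" || usage

# The node file has one line per task, so the unique host names are the
# nodes and the line count is the total number of tasks.
NODES=$(uniq "$1")
NNODES=$(uniq "$1" | wc -l)
NTASKS=$(wc -l < "$1")
PPN=$(($NTASKS / $NNODES))
FIRSTNODE=$(head -n 1 "$1")

# Sanity checks
#
if [ $NTASKS -ne $(($PPN * $NNODES)) ]; then
    echo "Number of tasks per node is not even?" >&2
    exit 1
fi

# This script needs to run on the first node.
if [ "$HOSTNAME" != "$FIRSTNODE" ]; then
    echo "This script must be run on the first node" >&2
    echo " (If this is the case, this script is getting confused)" >&2
    exit 1
fi

# Make it so:
# Pick how many hardware threads to use on each core. A stride of 4
# moves from one core to the next.
if [ $PPN -le "32" ]; then
    stride=4
    threads=0
    nthreads=1
elif [ $PPN -le "64" ]; then
    stride=4
    threads="0 1"
    nthreads=2
elif [ $PPN -le "96" ]; then
    stride=4
    threads="0 1 2"
    nthreads=3
else
    stride=1
    threads=0
    nthreads=1
fi

# Each core used gets $nthreads tasks, so PPN must divide evenly;
# otherwise the loop below would emit fewer ranks than tasks.
if [ $(($PPN % $nthreads)) -ne 0 ]; then
    echo "Tasks per node ($PPN) is not a multiple of $nthreads" >&2
    exit 1
fi

CURRTASK=0
for node in $NODES; do
    for task in $(seq 0 $(($PPN/$nthreads - 1)) ); do
        for thread in $threads; do
            CURRSLOT=$(($task*$stride + $thread))
            echo "rank ${CURRTASK}=$node slot=$CURRSLOT"
            CURRTASK=$(($CURRTASK+1))
        done
    done
done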
Here’s an example of how to use the script:
PATH_TO_SCRIPT/build_rankfile.sh $PBS_NODEFILE > $TMPDIR/rankfile.txt
...
mpiexec -rf $TMPDIR/rankfile.txt ...
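Before launching, it can be worth confirming that the rank file has one entry per task, since the script's output is redirected and a problem may otherwise go unnoticed. The following is only a sketch: `check_rankfile` is a hypothetical helper, not part of the original script, and it assumes every rank line has the form `rank N=host slot=M` as the script prints it.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the rank file has exactly one
# "rank N=host slot=M" line for every line (task) in the node file.
check_rankfile() {
    local nodefile=$1 rankfile=$2 expected actual
    expected=$(( $(wc -l < "$nodefile") ))
    # grep -c exits non-zero when it counts 0 matches, hence the "|| true".
    actual=$(grep -c '^rank [0-9][0-9]*=.* slot=[0-9][0-9]*$' "$rankfile" || true)
    actual=${actual:-0}   # treat an unreadable rank file as zero ranks
    if [ "$expected" -ne "$actual" ]; then
        echo "rank file has $actual ranks, expected $expected" >&2
        return 1
    fi
}
```

In a job script this could sit between the two commands above, for example as `check_rankfile "$PBS_NODEFILE" "$TMPDIR/rankfile.txt" || exit 1`.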