dslab:cluster_setup_guide_ydl

This guide is meant to explain how to create a cluster on computers running Yellow Dog Linux (YDL) using OpenMPI. It was created using 64-bit XServe G5s running YDL 6.2.

This guide assumes that the reader:

  • has already installed YDL on each of the computers that will be in the cluster
  • has assigned each computer a static IP Address
  • is familiar with navigating the file system via command line

Keyless ssh Entries

Taken, with minor changes, from Cluster Setup Guide - Debian

In order to ssh into each system without entering a password each time, the local computer's public RSA key needs to be added to ~/.ssh/authorized_keys on the remote system. Sound complicated? It’s pretty easy; just follow along.

On each system, run the following commands:

yum install openssh-server openssh-clients
ssh-keygen

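  • Type Y when asked to confirm the installation.
  • ssh-keygen will give you three prompts. Leave all three blank (just press Enter) to accept the default key location and an empty passphrase.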

Now, from one node, copy every node's public key into that node's ~/.ssh directory:

for i in 0 1 2; do scp system$i:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub.system$i; done

Put the contents of all files just made into authorized_keys:

for i in 0 1 2; do cat ~/.ssh/id_rsa.pub.system$i >> ~/.ssh/authorized_keys; done

Lastly, copy the newly created authorized_keys file onto all the nodes:

for i in 0 1 2; do scp ~/.ssh/authorized_keys system$i:~/.ssh/; done

The next time you ssh between nodes, you shouldn’t be asked for a password!
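
To confirm that key-based login really works on every node, a quick check along these lines may help. It is only a sketch: check_keyless and SSH_CMD are names made up for this example, and the node names follow the examples above. The BatchMode=yes option tells ssh to fail rather than prompt for a password.

```shell
#!/bin/sh
# Sketch only: check_keyless and SSH_CMD are hypothetical names, not part of ssh.
# SSH_CMD may be overridden for a dry run; by default the real ssh client is used.
SSH_CMD=${SSH_CMD:-ssh}

# Try a password-less login to each host given as an argument.
# Prints one status line per host; returns non-zero if any host failed.
check_keyless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (assumes the example node names used above):
#   check_keyless system0 system1 system2
```

A node that has never been connected to before may report FAILED until its host key has been accepted once with an ordinary interactive ssh.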

Setup Hosts File

Taken verbatim from Cluster Setup Guide - Debian

This file is used for any system on your local network that you want to connect to (ssh, scp, etc) using an alias instead of using the full IP address.

Edit /etc/hosts

* Add a line at the top of the document for each system in the cluster. Here’s an example hosts file.

127.0.0.1	localhost
192.168.1.100	system0
192.168.1.101	system1
192.168.1.102	system2

# The following lines are…

* Also, be sure to remove the default line with 127.0.1.1 from the file.

BE SURE TO copy this hosts file to every system in the cluster.
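
For clusters with more than a few nodes, the host lines can be generated rather than typed, and the finished file pushed out with scp. The sketch below assumes the naming and addressing scheme of the example above (system0, system1, and so on, on consecutive addresses) and that root logins over ssh are allowed on each node. hosts_entries and push_hosts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: hosts_entries and push_hosts are hypothetical helpers.

# Print /etc/hosts lines for COUNT nodes named system0, system1, ...
# on consecutive addresses starting at PREFIX.START.
# Usage: hosts_entries PREFIX START COUNT
hosts_entries() {
    prefix=$1
    start=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%d\tsystem%d\n' "$prefix" "$((start + i))" "$i"
        i=$((i + 1))
    done
}

# Copy the local /etc/hosts to each host given as an argument.
# Assumes root logins over ssh are allowed on the nodes.
push_hosts() {
    for host in "$@"; do
        scp /etc/hosts "root@$host:/etc/hosts"
    done
}

# Usage:
#   hosts_entries 192.168.1 100 3    # prints the three example node lines
#   push_hosts system1 system2
```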

Installing OpenMPI

First, the GNU Compiler Collection, or gcc, needs to be installed:

yum install gcc

On 64-bit machines, a header file indirectly referenced by stdio.h is missing, leading to a compile error. To fix this:

yum install glibc-devel.ppc64

One or two of these three packages may be unnecessary, but it's better to be safe:

yum install openmpi.ppc64
yum install openmpi-devel.ppc64
yum install openmpi

Now MPI is installed, but it is inconvenient to have to type things like /usr/lib64/openmpi/1.2.5-gcc/bin/mpirun to access it. The solution is to add /usr/lib64/openmpi/1.2.5-gcc/bin/ to your shell's $PATH variable when you log in. If the path to this folder is different on your computer, adjust accordingly. Here's how it's done.

for root user

Edit the text file /root/.bash_profile. Add a line that reads

PATH=$PATH:/usr/lib64/openmpi/1.2.5-gcc/bin/ 

before the line that reads “export PATH”. Save the file.

for all other users

Follow the same steps as for the root user, except edit /etc/profile instead.
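
After logging out and back in, it may be worth confirming that the directory actually made it onto the PATH. This is a small sketch that assumes the same install path as above; dir_in_path and MPI_BIN are hypothetical names written for this example.

```shell
#!/bin/sh
# Sketch only: dir_in_path and MPI_BIN are hypothetical names.
# Adjust MPI_BIN if Open MPI lives elsewhere on your system.
MPI_BIN=/usr/lib64/openmpi/1.2.5-gcc/bin

# Succeeds if directory $1 appears in the colon-separated list $2.
# A trailing slash on either side is ignored.
dir_in_path() {
    want=${1%/}
    case ":$2:" in
        *":$want:"*|*":$want/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_in_path "$MPI_BIN" "$PATH"; then
    echo "Open MPI bin directory is on PATH"
else
    echo "Open MPI bin directory is NOT on PATH"
fi
```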

MPI should work at this point, although all of the processes will run on the local computer. It is still worth testing that MPI works before going any further.

Write Program

Taken verbatim from Cluster Setup Guide - Debian

Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:

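/* The Parallel Hello World Program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
	int node;	/* rank of this process */

	MPI_Init(&argc, &argv);			/* start MPI */
	MPI_Comm_rank(MPI_COMM_WORLD, &node);	/* ask for this process's rank */

	printf("Hello World from Node %d\n", node);

	MPI_Finalize();				/* shut MPI down cleanly */
	return 0;
}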

Compile Program

MPI programs written in C should be compiled with mpicc. It takes much the same arguments as gcc. So, to compile the hello.c example:

mpicc -o hello hello.c

Run Program

To run an MPI program, type the following:

mpirun -np N ./hello

Where N is the number of processes and hello is the executable file.

Configuring OpenMPI to run jobs across the cluster

If you encounter problems running the program across nodes, a firewall may be blocking traffic between them. To stop the firewall:

/etc/init.d/iptables stop

If you wish to prevent the firewall from starting at boot, you can run the following command:

chkconfig iptables off

And to be safe:

chkconfig ip6tables off
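
This page stops at the firewall settings and does not show how mpirun is told which machines to use. One common way with Open MPI, offered here as an assumption rather than as what was done on this cluster, is a hostfile with one line per node, passed to mpirun with --hostfile. The helper name write_hostfile, the file name, and the slot count of 2 per node are illustrative; set slots to the number of processes each node should run.

```shell
#!/bin/sh
# Sketch only: write_hostfile is a hypothetical helper. The "host slots=N"
# line format and mpirun's --hostfile option are standard Open MPI.

# Write a hostfile listing each host with the same slot count.
# Usage: write_hostfile FILE SLOTS HOST...
write_hostfile() {
    file=$1
    slots=$2
    shift 2
    : > "$file"                      # create or truncate the file
    for host in "$@"; do
        echo "$host slots=$slots" >> "$file"
    done
}

# Usage (file name and counts are examples):
#   write_hostfile ~/mpi_hosts 2 system0 system1 system2
#   mpirun --hostfile ~/mpi_hosts -np 6 ./hello
```

The executable has to exist at the same path on every node, so either copy it to each node before running or share the directory over NFS.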

Configuring NFS (optional)

If you wish to avoid having to copy executables to each node prior to every run and ensure that everything is synced, you can configure the master node as an NFS Server and the other nodes as clients.

I successfully accomplished this by using this guide with the following changes:

  • In place of the instructions in Sections 3.3.2 (Starting the Portmapper) and 3.3.3 (The Daemons), I ran:

    chkconfig nfs on

  • The “rpcinfo quota” command is “rpcinfo -p” on YDL.

dslab/cluster_setup_guide_ydl.txt · Last modified: 2011/07/21 13:51 by srk3