dslab:cluster_setup_guide

Before reading this guide, there are a few concepts that are assumed as prior knowledge.

  • In Unix, (almost) everything is a file. All settings and preferences are contained in various files throughout the file system. If you want to change a setting somewhere, find the appropriate file and edit it to your specifications.
  • Basic networking skills are assumed, like knowing how to make sure your computer is actually online, how to assign static IP addresses, and how to maintain a standard home network.
  • Knowledge of how to use vim, emacs, pico, or some other command line text editor is essential.

Install Debian

Download a copy of the Debian 4.0 (Etch) netinst.iso image file

As of when this guide was written, this image can be found at: http://www.debian.org/CD/

Burn the image file to a CD. (If you can’t figure out how to complete this step, turn back now.)

Use the CD to install the base Debian OS on all of the systems that are going to be part of the cluster.

  • You can pretty much use the defaults for anything that’s not listed below.
  • For the hostname, pick some base name then add a number to it, incrementing for each node of the cluster.
  • For the domain, choose anything but make sure it's the same on all nodes.
  • You will also be asked for a root password, an initial user’s full name, user name, and password.
    • If you know that you’re going to be installing an NIS or LDAP server for user authentication on another system, you can forgo creating a username on the current system by pressing ESC and selecting the next task in the list.
  • Don't use a network mirror.
  • Don't install any of the optional packages, including the desktop software.
  • Say yes to GRUB.

Update

Once the system restarts, log in as root and edit the /etc/apt/sources.list file.

  • This file tells apt where to check for updates. For our labs, comment out whatever is there and add the following lines:
deb http://mirror/debian etch main
deb-src http://mirror/debian etch main
deb http://mirror/security etch/updates main

Once that’s done, perform the following compound command:

machine:~# apt-get update && apt-get upgrade

Type ‘y’ whenever needed. This will update your OS and all installed packages to the newest version.
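The sources.list edit described above can be sketched as a small shell function. This is only a sketch under assumptions: the function name rewrite_sources is made up for illustration, and it prints the new file to standard output instead of touching /etc/apt/sources.list, so the result can be reviewed before it is copied into place.

```shell
# Hypothetical helper (not part of Debian): print a new sources.list in which
# every existing entry is commented out and the lab mirror lines are appended.
rewrite_sources() (
    src=$1
    # Prefix each line that is not blank and not already a comment with "# ".
    sed -e 's/^\([^#].*\)$/# \1/' "$src"
    # Append the three lab entries given in this guide.
    printf '%s\n' \
        'deb http://mirror/debian etch main' \
        'deb-src http://mirror/debian etch main' \
        'deb http://mirror/security etch/updates main'
)
```

One way to use it is to run rewrite_sources /etc/apt/sources.list > /tmp/sources.list.new, check the result, and then copy it over the original as root.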

ATTENTION: Everything below this point needs to be edited… the dokuwiki syntax is a tad bit different from mediawiki syntax. Once it is corrected this document will look more readable.. some of the content is still out of date and needs updating.

Setup Static IP

If you are going to set up any of the other systems described in this guide, you’ll want to assign static IP addresses to each of the systems in your cluster.

To do this, edit the /etc/network/interfaces file.

  • In the line ‘iface eth0 inet dhcp’, change ‘dhcp’ to ‘static’. Add the following lines with the correct addresses:

iface eth0 inet static	
	address 	IP to be assigned ex. 192.168.1.100
	netmask		the netmask ex. 255.255.255.0
	network		the base network IP. ex. 192.168.1.0
	broadcast	ex. 192.168.1.255
	gateway		IP of the gateway ex. 192.168.1.254
	dns-nameservers	IP of the dns server ex. 192.168.1.1
	dns-search	dns domain name, if used

  • For our case here in the lab, the following values should be entered:

	netmask		255.255.255.0
	network		137.238.7.0
	broadcast	137.238.7.255
	gateway		137.238.7.254
	dns-nameservers	137.238.7.1
	dns-search	ds.geneseo.edu

Once those lines are added, reboot. This is the easiest way to ensure that the system picks up the new IP address.
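The network and broadcast values follow from the address and the netmask: the network is the address ANDed with the netmask, and the broadcast is the network with all host bits set. The sketch below works that arithmetic out and prints a stanza in the format shown above. The function name static_stanza is hypothetical, the interface is assumed to be eth0, and the dns-* lines are left out.

```shell
# Hypothetical helper: print a static eth0 stanza for /etc/network/interfaces.
#   network   = address AND netmask
#   broadcast = network OR (inverted netmask)
static_stanza() (
    addr=$1 mask=$2 gateway=$3
    IFS=.                                   # split dotted quads on "."
    set -- $addr; a1=$1 a2=$2 a3=$3 a4=$4
    set -- $mask; m1=$1 m2=$2 m3=$3 m4=$4
    n1=$((a1 & m1)) n2=$((a2 & m2)) n3=$((a3 & m3)) n4=$((a4 & m4))
    # 255 - m is the inverted netmask octet.
    b1=$((n1 | (255 - m1))) b2=$((n2 | (255 - m2)))
    b3=$((n3 | (255 - m3))) b4=$((n4 | (255 - m4)))
    printf 'iface eth0 inet static\n'
    printf '\taddress %s\n' "$addr"
    printf '\tnetmask %s\n' "$mask"
    printf '\tnetwork %s\n' "$n1.$n2.$n3.$n4"
    printf '\tbroadcast %s\n' "$b1.$b2.$b3.$b4"
    printf '\tgateway %s\n' "$gateway"
)

# Example values taken from this guide.
static_stanza 192.168.1.100 255.255.255.0 192.168.1.254
```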

Keyless ssh Entries

In order to ssh into each system without entering a password each time, an RSA public key from the local computer needs to be placed in the proper file on the remote system. Sound complicated? It’s pretty easy, just follow along.

On every node in the cluster, run the following commands:

apt-get install ssh
ssh-keygen

  • Type Y when apt-get asks for confirmation.
  • ssh-keygen will give you three prompts. Leave all three blank.

Now, copy each node's public key to the same place:

for i in 0 1 2; do scp system$i:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub.system$i; done

Put the contents of all files just made into authorized_keys:

for i in 0 1 2; do cat ~/.ssh/id_rsa.pub.system$i >> ~/.ssh/authorized_keys; done

Lastly, copy the newly created authorized_keys file onto all the nodes:

for i in 0 1 2; do scp ~/.ssh/authorized_keys system$i:~/.ssh/; done

The next time you ssh between nodes, you shouldn’t be asked for a password!
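Because the loop above appends with >>, running it a second time leaves duplicate keys in authorized_keys. A minimal sketch of a merge step that can be re-run safely is shown below. The name merge_keys is hypothetical. The chmod is there because OpenSSH, with its default StrictModes setting, may ignore key files that other users can write to.

```shell
# Hypothetical helper: merge public key files into an authorized_keys file,
# dropping duplicate lines and keeping the file private to its owner.
merge_keys() (
    out=$1
    shift
    umask 077                      # new files are accessible by the owner only
    touch "$out"
    # sort -u drops keys that are already present in the output file.
    cat "$out" "$@" | sort -u > "$out.tmp" && mv "$out.tmp" "$out"
    chmod 600 "$out"
)
```

With the file names used in this guide, a call would look like merge_keys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub.system0 ~/.ssh/id_rsa.pub.system1 ~/.ssh/id_rsa.pub.system2.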

Setup Hosts File

This file is used for any system on your local network that you want to connect to (ssh, scp, etc) using an alias instead of using the full IP address.

Edit /etc/hosts

  • Add a line at the top of the document for each system in the cluster. Here’s an example hosts file.

127.0.0.1	localhost
192.168.1.100	system0
192.168.1.101	system1
192.168.1.102	system2

# The following lines are…

  • Also, be sure to remove the default line with 127.0.1.1 from the file.

BE SURE TO: Copy this hosts file to every system in the cluster.
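For clusters with more than a handful of nodes, the entries can be generated instead of typed. The sketch below assumes the naming scheme from the install step (a base name plus an incrementing number) and consecutive addresses. The function name gen_hosts and its argument order are made up for illustration.

```shell
# Hypothetical helper: print hosts entries for COUNT nodes named BASE0, BASE1, ...
# with addresses PREFIX.FIRST, PREFIX.(FIRST+1), ...
gen_hosts() (
    base=$1 prefix=$2 first=$3 count=$4
    printf '127.0.0.1\tlocalhost\n'
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%d\t%s%d\n' "$prefix" "$((first + i))" "$base" "$i"
        i=$((i + 1))
    done
)

# Reproduces the example entries above: system0..system2 at 192.168.1.100-102.
gen_hosts system 192.168.1 100 3
```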

Setup NIS Server and Clients

This guide was written after following this other guide: http://lyre.mit.edu/~powell/debian-howto/nis.html


Pick a system to use as an NIS server, that is, where you want the other systems to grab their user credentials from.

All Systems

Perform the following steps on every system (server and clients):

Install NIS using the following command:

apt-get install nis

  • When installing NIS you will be asked to enter an NIS domain. Typical practice is to just copy your DNS domain, but this can be anything, as long as it matches on every system that is to be in the cluster. This can also be changed later in the /etc/defaultdomain file.

Edit the /etc/nsswitch.conf file.

  • This file tells the computer where to check for user credentials. You want it to more or less match the following example:

passwd:	files nis
group:		files nis
shadow:	files nis
…  
netgroup:	nis

  • Note: If you list ‘files’ first then ‘nis’, then it will check locally before it checks the NIS server. If you list them vice-versa, then it will check in the opposite order. This is important for troubleshooting issues with duplicate user accounts.
  • Note: As of writing this guide, setting these to ‘compat’ doesn’t currently work.
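A quick way to confirm that a node's nsswitch.conf matches the example is to check that the passwd, group, and shadow lines each list nis as a source. The following is a rough sketch, with check_nsswitch as a hypothetical name. It only looks for the word nis on each line and does not check the order of the sources.

```shell
# Hypothetical helper: report databases in an nsswitch.conf that do not list nis.
# Exits non-zero if any of passwd, group, or shadow is missing it.
check_nsswitch() (
    conf=$1
    status=0
    for db in passwd group shadow; do
        # Match "db:" followed somewhere by the whole word "nis".
        if ! grep -Eq "^$db:.*[[:space:]]nis([[:space:]]|\$)" "$conf"; then
            echo "$db: no nis source listed"
            status=1
        fi
    done
    exit "$status"
)
```

Running check_nsswitch /etc/nsswitch.conf on each node prints nothing when all three lines are set up as shown.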

Server

Perform the following steps on the system designated as the server:

Edit the /etc/ypserv.securenets file.

  • Comment out the line giving access to everyone.
  • Add a line for each system in the cluster (including the server). Follow the example below:

…
host	192.168.1.100
host	192.168.1.101
host	192.168.1.102

Edit the /etc/default/nis file.

  • This file has actual fields that require specific values. For the server, match the below example:

…
NISSERVER=master
…
NISCLIENT=true
…

  • Note: It is required that the NIS server also be a client itself.

Edit the /etc/passwd file.

  • Add the following line to the bottom of the file:

+::::::

  • This file also has lines that are associated with user accounts on the computer. Those lines should be moved to below the new line in order for those users to be shared.

Edit the /etc/group file.

  • Add the following line to the bottom of the file:

+:::

  • Like the last file, move the lines associated with shared users to below this new line.

Edit the /etc/shadow file.

  • Add the following line to the bottom of the file:

+::::::::

  • Again, move all lines associated with users you wish to share to below this new line.

Run the following command:

cd /usr/lib/yp && ./ypinit -m

  • When running this command, it will ask you to edit the list of NIS servers. If there’s already an entry, ignore it. Just add the IP address of the current system to the list.

Run the following command:

cd /var/yp && make

  • This command will build the NIS map files containing all user credentials to be served out. Any time there are changes made to user credentials, this has to be re-run. A simple way for that to happen is the next step.

Edit the /etc/crontab file.

  • Add the following line to the bottom of the file:

*/15 * * * * root cd /var/yp && make && cd /

  • This line will make the system rebuild the NIS map files once every 15 minutes. This is invaluable.
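If the crontab edit is scripted rather than done by hand, appending blindly would add the same entry again on every run. A minimal sketch of an append-once helper follows. The name add_line_once is hypothetical, and the commented call assumes the schedule of every 15 minutes (written */15 in the minute field) described above.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is not
# already there.
add_line_once() (
    file=$1 line=$2
    touch "$file"
    # -F: fixed string (so "*" is not a pattern), -x: match the whole line.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# As root on the NIS server, the call would be:
# add_line_once /etc/crontab '*/15 * * * * root cd /var/yp && make && cd /'
```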

The last thing you need to do to the server is restart the NIS service.

  • Run the following command:

/etc/init.d/nis restart

Clients

Perform the following steps on all client systems (server included):

Edit the /etc/yp.conf file.

  • Add the following line to the bottom:

ypserver	IP address of the NIS server ex. 192.168.1.100

  • NOTE: For the server, using the IP address 127.0.0.1 is fine.

Edit the /etc/default/nis file.

  • Make sure NISCLIENT=true is set.

Run the following command:

/etc/init.d/nis restart

Setup NFS Server and Clients

This guide was written after following this other guide: http://nfs.sourceforge.net/nfs-howto/

All Systems

Run the following command on all systems (server and clients):

apt-get install nfs-common

Server

Run the following command on the server:

apt-get install nfs-kernel-server

Edit the /etc/exports file.

  • Add the following line to the file:

/dir		ip/mask(permission)

  • dir - the directory you wish to share
  • ip - the base IP address of the domain you wish to share to
  • mask - the netmask for that domain
  • permission - permissions given to the person mounting the share

An example line for sharing all user home directories (if the NIS Server and NFS Server are the same system) is below:

/home		192.168.1.0/255.255.255.0(rw)

  • In this example, the base IP for the domain is 192.168.1.0 and the subnet mask is 255.255.255.0. The rw means that everyone within the given subnet will have read and write privileges.

This can also be done by listing specific IP addresses on the same line, in the following fashion:

/home		192.168.1.100(rw) 192.168.1.101(ro)
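According to the exports(5) manual page on current systems, the mask may also be given as a prefix length, for example 192.168.1.0/24. Whether the nfs-utils version on your nodes accepts that form is an assumption worth checking. The sketch below converts a dotted netmask to its prefix length by counting the set bits. The name mask_to_prefix is hypothetical, and the function does not verify that the mask bits are contiguous.

```shell
# Hypothetical helper: convert a dotted netmask to a prefix length by
# counting the 1 bits in each octet.
mask_to_prefix() (
    IFS=.
    set -- $1                      # split the mask into its four octets
    bits=0
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
)

# Build the /home example from above in prefix-length form.
printf '/home\t192.168.1.0/%s(rw)\n' "$(mask_to_prefix 255.255.255.0)"
```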

Edit the /etc/hosts.deny file.

  • This file is only looked at after first checking the /etc/hosts.allow file for any explicitly allowed systems. For security reasons, this file should not be left blank. Add the following lines to the bottom of the file:

portmap:ALL
lockd:ALL
mountd:ALL
rquotad:ALL
statd:ALL

Edit the /etc/hosts.allow file.

  • This file is used to explicitly allow systems to mount shares. Add the following line to the bottom of the file:

ALL:ip/mask

  • Here’s an example line.

ALL:192.168.1.0/255.255.255.0

  • Just like the /etc/exports file, these IP addresses can be defined on an individual basis.

Reboot the system.

Mount/Unmount an NFS Share

To mount a share, run the following command:

mount serverIP:/shareDir /mountDir

Here’s an example mount command:

mount 192.168.1.100:/home/user /home/user

To unmount a share, run the following command:

umount /home/user
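When mounting from a script, it helps to check first whether the share is already mounted. A minimal sketch follows, assuming a Linux system where /proc/mounts lists active mounts with the mount point in the second column. The name is_mounted is hypothetical, and the optional second argument exists only so the function can be tried against a sample file.

```shell
# Hypothetical helper: succeed if TARGET appears as a mount point in the
# mount table (default: /proc/mounts on Linux).
is_mounted() (
    target=$1 table=${2:-/proc/mounts}
    # Field 2 of each line is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
)

# Example use with the share from this guide (run as root):
# is_mounted /home/user || mount 192.168.1.100:/home/user /home/user
```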

Setup AutoFS on Clients

Run the following command on all NFS client systems:

apt-get install autofs

Edit the /etc/auto.master file.

  • Add the following line to the bottom of the file.

/shareDir	/etc/auto.shareName --timeout=60

  • Here’s an example line.

/home		/etc/auto.home --timeout=60

Make a new file with the name from the above inserted line. In the example case, the file would be /etc/auto.home

  • Add the following line to the file. The leading * is a wildcard key that matches any directory name, and the & is replaced by that name.

*	serverIP:/shareDir/&

  • Here’s an example line.

*	192.168.1.100:/home/&

Restart the autofs service with the following command:

/etc/init.d/autofs restart
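In an autofs map, a * key matches whatever directory name is requested, and an & in the location is replaced by that name. The sketch below imitates that substitution for a single key so you can see which server path a lookup would resolve to. The function name expand_map_entry and the user name alice are made up for illustration, and only the first & is handled.

```shell
# Hypothetical helper: show the location autofs would use for one key,
# replacing the first "&" in the map entry's location with that key.
expand_map_entry() {
    key=$1
    location=$2
    case $location in
        *'&'*) printf '%s%s%s\n' "${location%%\&*}" "$key" "${location#*\&}" ;;
        *)     printf '%s\n' "$location" ;;   # nothing to substitute
    esac
}

# With the example map line, a lookup of /home/alice resolves to:
expand_map_entry alice '192.168.1.100:/home/&'   # prints 192.168.1.100:/home/alice
```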

MPI Jobs

Install MPI and Lamboot (and gcc)

Run the following command:

apt-get install lam4-dev lam-runtime lam-mpidoc gcc

Create Hosts File

Create a file anywhere with any name. Just remember where and what it is. This will be a list of the computers to be lambooted. For this guide, the file will be called 'hosts' and be stored in the current user's home directory.

Add every computer to be lambooted to the file. Here's an example:

system0
system1
system2
system3

You can also add them via their IP addresses:

192.168.1.100
192.168.1.101
192.168.1.102
192.168.1.103

Lamboot Cluster

Run the following command:

lamboot -v ~/hosts

  • This is an example line, using the example file we set up. Make sure to point it at your own hosts file. The -v is for verbose mode, meaning it will print out full details of what it's doing.
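A name in the lamboot hosts file that the system cannot resolve is an easy mistake to make. The sketch below lists every entry of the lamboot file that does not appear anywhere in a hosts file, on the assumption that the nodes are resolved through /etc/hosts as set up earlier in this guide. The function name missing_hosts is hypothetical.

```shell
# Hypothetical helper: print each entry of the lamboot hosts file SCHEMA that
# does not appear as an address or name in HOSTS (default: /etc/hosts).
missing_hosts() (
    schema=$1 hosts=${2:-/etc/hosts}
    while read -r name rest || [ -n "$name" ]; do
        # Skip blank lines and comments.
        case $name in ''|'#'*) continue ;; esac
        # Look for the entry as a whole field on any line of the hosts file.
        awk -v n="$name" '
            { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts" || echo "$name"
    done < "$schema"
)
```

Running missing_hosts ~/hosts before lamboot prints nothing when every entry is known.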

Compile and Run an MPI Program

I'm going to walk you through making, compiling, and running a simple MPI program that will have each system print out a name and number.

Write Program

Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:

/*The Parallel Hello World Program*/
#include <stdio.h>
#include <mpi.h>

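int main(int argc, char **argv)
{
	int node;

	/* Start MPI and ask which rank (node number) this process has. */
	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &node);

	printf("Hello World from Node %d\n", node);

	/* Shut MPI down cleanly before exiting. */
	MPI_Finalize();
	return 0;
}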

Compile Program

While in the directory of the file run the following command, substituting in the names you're using:

mpicc -o hello hello.c

Run Program

Run the following command, again making any proper substitutions:

mpirun -np 4 ./hello

In this example, we've told mpirun to start four processes, one on each of four currently lambooted systems. This number can be changed to match however many of the lambooted systems you want to use. For example, you can run this with a two even if four or more systems are lambooted.

Lamhalt Cluster

When you're all done, it's standard procedure to stop the lamboot. To do this, run the following command:

lamhalt -v

You don't need the -v, but it can be useful.

dslab/cluster_setup_guide.txt · Last modified: 2010/09/15 14:50 by wedge