This guide assumes a few concepts as prior knowledge.
Download a copy of the Debian 4.0 (Etch) netinst.iso image file.
At the time this guide was written, the image could be found at: http://www.debian.org/CD/
Burn the image file to a CD. (If you can’t figure out how to complete this step, turn back now.)
Use the CD to install the base Debian OS on all of the systems that are going to be part of the cluster.
Once the system restarts, log in as root and edit the /etc/apt/sources.list file.
deb http://mirror/debian etch main
deb-src http://mirror/debian etch main
deb http://mirror/security etch/updates main
Once that’s done, run the following compound command (still as root):
apt-get update && apt-get upgrade
Type ‘y’ whenever prompted. This refreshes the package lists, then updates your OS and all installed packages to the newest available versions.
ATTENTION: Everything below this point needs to be edited. The DokuWiki syntax is a bit different from MediaWiki syntax; once it is corrected, this document will be more readable. Some of the content is also out of date and needs updating.
=Setup Static IP=
If you are going to set up any of the other systems described in this guide, you’ll want to assign static IP addresses to each of the systems in your cluster.
To do this, edit the /etc/network/interfaces file.
* In the line ‘iface eth0 inet dhcp’, change ‘dhcp’ to ‘static’, then add the following lines below it with the correct addresses:
iface eth0 inet static
    address          IP to be assigned, e.g. 192.168.1.100
    netmask          the netmask, e.g. 255.255.255.0
    network          the base network IP, e.g. 192.168.1.0
    broadcast        the broadcast address, e.g. 192.168.1.255
    gateway          IP of the gateway, e.g. 192.168.1.254
    dns-nameservers  IP of the DNS server, e.g. 192.168.1.1
    dns-search       DNS domain name, if used
* For our case here in the lab, the following values should be entered:
    netmask 255.255.255.0
    network 137.238.7.0
    broadcast 137.238.7.255
    gateway 137.238.7.254
    dns-nameservers 137.238.7.1
    dns-search ds.geneseo.edu
Once those lines are added, reboot. This is the easiest way to ensure that the system comes up with its new static IP address.
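If you are unsure what to enter for network and broadcast, both follow from the address and netmask: the network is the address ANDed with the netmask, and the broadcast is that network with every host bit set to 1. The shell sketch below shows the arithmetic; ip_to_int, int_to_ip, net_addr and bcast_addr are hypothetical helper names, not part of any Debian package.

```shell
#!/bin/sh
# Sketch: derive the "network" and "broadcast" values for
# /etc/network/interfaces from a host address and its netmask.
# All function names here are hypothetical helpers.

# Convert a dotted-quad address (a.b.c.d) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1              # split a.b.c.d into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# network = address AND netmask
net_addr() {
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# broadcast = network OR (inverted netmask, kept to 32 bits)
bcast_addr() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( (ip & mask) | (~mask & 0xFFFFFFFF) ))
}
```

For example, with a made-up lab host address, net_addr 137.238.7.42 255.255.255.0 prints 137.238.7.0 and bcast_addr with the same arguments prints 137.238.7.255, matching the lab values.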
=Keyless ssh Entries=
In order to ssh into each system without entering a password each time, your RSA public key must be placed in the ~/.ssh/authorized_keys file on the remote system. Sound complicated? It’s pretty easy; just follow along.
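On each node (system0, system1 and system2 in the examples below), run the following commands:
apt-get install ssh
ssh-keygen
* Type Y when asked.
* ssh-keygen will give you three prompts. Leave all three blank (this accepts the default key file and sets an empty passphrase).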
Next, on one node, collect every node's public key into that node's ~/.ssh directory:
for i in 0 1 2; do scp system$i:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub.system$i; done
Put the contents of all files just made into authorized_keys:
for i in 0 1 2; do cat ~/.ssh/id_rsa.pub.system$i >> ~/.ssh/authorized_keys; done
Lastly, copy the newly created authorized_keys file onto all the nodes:
for i in 0 1 2; do scp ~/.ssh/authorized_keys system$i:~/.ssh/; done
The next time you ssh between nodes, you shouldn’t be asked for a password!
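Running the cat loop a second time appends every key again. A minimal sketch that rebuilds authorized_keys without duplicate lines, assuming the id_rsa.pub.system* files fetched by the scp loop; merge_keys is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: merge public key files into one authorized_keys file, keeping
# each distinct key once. merge_keys is a hypothetical helper.
#   $1     = output file (e.g. ~/.ssh/authorized_keys)
#   $2...  = key files to merge (e.g. ~/.ssh/id_rsa.pub.system*)
merge_keys() {
    out=$1
    shift
    # awk prints each distinct line the first time it is seen,
    # so the original order is kept; write to a temp file, then rename
    awk '!seen[$0]++' "$@" > "$out.tmp" && mv "$out.tmp" "$out"
    # sshd's default StrictModes check rejects key files others can write to
    chmod 600 "$out"
}
```

For example, merge_keys ~/.ssh/authorized_keys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub.system* keeps any keys already present, because the inputs are read before the temporary file replaces the output. Copy the result out with the scp loop as before.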
=Setup Hosts File=
The /etc/hosts file lets you connect (with ssh, scp, etc.) to any system on your local network by name instead of by its full IP address.
Edit /etc/hosts
* Add a line at the top of the document for each system in the cluster. Here’s an example hosts file.
127.0.0.1       localhost
192.168.1.100   system0
192.168.1.101   system1
192.168.1.102   system2
# The following lines are…
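* Also, be sure to remove the default line beginning with 127.0.1.1 from the file. Debian adds this line to map the system’s own hostname to a loopback address; removing it lets each system’s name resolve to its LAN address instead.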
'''BE SURE TO''': Copy this hosts file to every system in the cluster.
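With more than a few nodes, typing the entries by hand gets error-prone. A small sketch that prints entries in the format of the example, assuming nodes named system0, system1, ... on consecutive addresses; hosts_entries is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print /etc/hosts lines for nodes system0 .. system(N-1).
# hosts_entries is a hypothetical helper:
#   $1 = first three octets (e.g. 192.168.1)
#   $2 = last octet of system0 (e.g. 100)
#   $3 = number of nodes
hosts_entries() {
    prefix=$1
    first=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%s\tsystem%s\n' "$prefix" "$((first + i))" "$i"
        i=$((i + 1))
    done
}
```

hosts_entries 192.168.1 100 3 prints the three system lines from the example, ready to paste below the localhost line. Afterwards, grep '^127\.0\.1\.1' /etc/hosts should print nothing.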
=Setup NIS Server and Clients=
This guide was written after following this other guide. [http://lyre.mit.edu/~powell/debian-howto/nis.html]
Pick a system to use as the NIS server, that is, the system the others will get their user credentials from.
Perform the following steps on every system (server and clients):
Install NIS using the following command:
apt-get install nis
* When installing NIS, you will be asked to enter an NIS domain. Typical practice is to reuse your DNS domain, but it can be anything, as long as it matches on every system in the cluster. It can also be changed later in the /etc/defaultdomain file.
Edit the /etc/nsswitch.conf file.
* This file tells the computer where to check for user credentials. You want it to more or less match the following example:
passwd:     files nis
group:      files nis
shadow:     files nis
…
netgroup:   nis
* '''Note''': If you list ‘files’ before ‘nis’, the system checks local files before it queries the NIS server; if you list them the other way around, it queries NIS first. This matters when troubleshooting duplicate user accounts.
* '''Note''': At the time of writing, setting these entries to ‘compat’ does not work.
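To double-check the edit, a sketch that reports which of the passwd, group and shadow lines do not list nis as a source. check_nsswitch is a hypothetical helper; it only looks for the word nis, not at its order relative to files.

```shell
#!/bin/sh
# Sketch: report databases in an nsswitch.conf-style file whose line
# does not list "nis". check_nsswitch is a hypothetical helper;
# $1 = file to check (normally /etc/nsswitch.conf).
check_nsswitch() {
    for db in passwd group shadow; do
        # find the "db:" line, then look for "nis" as a whole word
        if ! grep -E "^${db}:" "$1" | grep -qw nis; then
            echo "$db: nis missing"
        fi
    done
}
```

check_nsswitch /etc/nsswitch.conf prints nothing when all three lines include nis.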
Perform the following steps on the system designated as the server:
Edit the /etc/ypserv.securenets file.
* Comment out the line giving access to everyone.
* Add a line for each system in the cluster (including the server). Follow the example below:
…
host 192.168.1.100
host 192.168.1.101
host 192.168.1.102
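Each host line admits a single address; a whole subnet can instead be admitted with a netmask/network pair such as 255.255.255.0 192.168.1.0. A sketch that prints entries in the host-per-line style of the example, assuming you also want a localhost entry (255.0.0.0 127.0.0.0), which the server needs when it queries itself via 127.0.0.1. securenets_hosts is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print ypserv.securenets entries for the given node addresses.
# securenets_hosts is a hypothetical helper; arguments are node IPs.
securenets_hosts() {
    # netmask/network pair admitting the loopback network
    echo "255.0.0.0 127.0.0.0"
    for ip in "$@"; do
        # "host" admits exactly one address
        echo "host $ip"
    done
}
```

Redirecting its output, e.g. securenets_hosts 192.168.1.100 192.168.1.101 192.168.1.102 > /etc/ypserv.securenets, replaces the whole file, so only do that if nothing else in it is worth keeping.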
Edit the /etc/default/nis file.
* This file has fields that require specific values. For the server, match the example below:
…
NISSERVER=master
…
NISCLIENT=true
…
* '''Note:''' The NIS server must also be a client of itself.
Edit the /etc/passwd file.
* Add the following line to the bottom of the file:
+::::::
* This file also contains lines for the user accounts on the computer. Move the lines for the users you want to share below the new line.
Edit the /etc/group file.
* Add the following line to the bottom of the file:
+:::
* Like the last file, move the lines associated with shared users to below this new line.
Edit the /etc/shadow file.
* Add the following line to the bottom of the file:
+::::::::
* Again, move all lines associated with users you wish to share to below this new line.
Run the following command:
cd /usr/lib/yp && ./ypinit -m
* When running this command, it will ask you to edit the list of NIS servers. If there’s already an entry, ignore it. Just add the IP address of the current system to the list.
Run the following command:
cd /var/yp && make
* This command builds the NIS map files containing all the user credentials to be served out. It must be re-run any time user credentials change; the next step automates this.
Edit the /etc/crontab file.
* Add the following line to the bottom of the file:
* This line will make the system re-make the NIS map files once every 15 minutes. This is invaluable.
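The crontab line itself is missing from this copy of the guide. A sketch of an equivalent entry, assuming the standard /etc/crontab field order (minute, hour, day of month, month, day of week, user, command) and the same cd /var/yp && make as the previous step. add_nis_cron and the CRONTAB override are hypothetical; the redirect to /dev/null only keeps cron from mailing make's normal output every 15 minutes (errors still go to stderr and get mailed).

```shell
#!/bin/sh
# Sketch: append the NIS map rebuild to /etc/crontab, once.
# add_nis_cron and the CRONTAB override are hypothetical.
CRONTAB=${CRONTAB:-/etc/crontab}
# every 15 minutes, as root: rebuild the NIS maps in /var/yp
NIS_CRON_LINE='*/15 * * * * root cd /var/yp && make > /dev/null'

add_nis_cron() {
    # append only if an identical line is not already present
    grep -qxF "$NIS_CRON_LINE" "$CRONTAB" 2>/dev/null ||
        printf '%s\n' "$NIS_CRON_LINE" >> "$CRONTAB"
}
```

Because of the grep check, running add_nis_cron more than once leaves a single entry.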
The last thing you need to do to the server is restart the NIS service.
* Run the following command:
/etc/init.d/nis restart
Perform the following steps on all client systems (server included):
Edit the /etc/yp.conf file.
* Add the following line to the bottom:
ypserver   IP address of the NIS server, e.g. 192.168.1.100
* '''Note''': On the server itself, using the IP address 127.0.0.1 is fine.
Edit the /etc/default/nis file.
* Make sure NISCLIENT=true is set.
Run the following command:
/etc/init.d/nis restart
=Setup NFS Server and Clients=
This guide was written after following this other guide. [http://nfs.sourceforge.net/nfs-howto/]
Run the following command on all systems (server and clients):
apt-get install nfs-common
Run the following command on the server:
apt-get install nfs-kernel-server
Edit the /etc/exports file.
* Add the following line to the file:
/dir ip/mask(permission)
* dir - the directory you wish to share
* ip - the base IP address of the domain you wish to share to
* mask - the netmask for that domain
* permission - the permissions given to clients mounting the share
An example line for sharing all user home directories (if the NIS Server and NFS Server are the same system) is below:
/home 192.168.1.0/255.255.255.0(rw)
* In this example, the base IP for the domain is 192.168.1.0 and the subnet mask is 255.255.255.0. The rw means that everyone within the given subnet will have read and write privileges.
This can also be done by listing specific IP addresses on the same line, in the following fashion:
/home 192.168.1.100(rw) 192.168.1.101(ro)
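/etc/exports also accepts a network written with a prefix length, so 192.168.1.0/24 names the same network as 192.168.1.0/255.255.255.0. A sketch that converts a dotted netmask to that length, assuming a contiguous mask; mask_to_prefix is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: convert a dotted-quad netmask to a prefix length by counting
# its 1-bits. mask_to_prefix is a hypothetical helper; it assumes a
# contiguous mask such as 255.255.252.0.
mask_to_prefix() {
    old_ifs=$IFS
    IFS=.
    set -- $1              # split the mask into its four octets
    IFS=$old_ifs
    bits=0
    for octet in "$@"; do
        # add up the 1-bits in this octet
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
}
```

mask_to_prefix 255.255.255.0 prints 24, so the example export line could also be written as /home 192.168.1.0/24(rw).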
Edit the /etc/hosts.deny file.
* This file is only consulted after the /etc/hosts.allow file has been checked for explicitly allowed systems. For security reasons, do not leave it blank: add the following lines to the bottom of the file so these services are denied to everyone not explicitly allowed:
portmap:ALL
lockd:ALL
mountd:ALL
rquotad:ALL
statd:ALL
Edit the /etc/hosts.allow file.
* This file is used to explicitly allow systems to mount shares. Add the following line to the bottom of the file:
ALL:ip/mask
* Here’s an example line.
ALL:192.168.1.0/255.255.255.0
* Just like the /etc/exports file, these IP addresses can be defined on an individual basis.
Reboot the system.
=Mount/Unmount an NFS Share=
To mount a share, run the following command:
mount serverIP:/shareDir /mountDir
Here’s an example mount command:
mount 192.168.1.100:/home/user /home/user
To unmount a share, run the following command:
umount /home/user
=Setup AutoFS on Clients=
Run the following command on all NFS client systems:
apt-get install autofs
Edit the /etc/auto.master file.
* Add the following line to the bottom of the file.
/shareDir /etc/auto.shareName --timeout=60
* Here’s an example line.
/home /etc/auto.home --timeout=60
Make a new file with the name from the above inserted line. In the example case, the file would be /etc/auto.home.
* Add the following line to the file.
* Here’s an example line.
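The map line and its example are missing here. A sketch of one common way to write /etc/auto.home, assuming the NFS server exports /home as in the NFS section and using the autofs wildcard key; write_home_map is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write an autofs map for NFS-mounted home directories.
# write_home_map is a hypothetical helper:
#   $1 = NFS server IP (e.g. 192.168.1.100)
#   $2 = map file (default /etc/auto.home)
write_home_map() {
    server=$1
    map=${2:-/etc/auto.home}
    # "*" matches any name looked up under /home; "&" in the location
    # is replaced by that name, so each user's directory mounts on demand
    printf '*\t-fstype=nfs,rw\t%s:/home/&\n' "$server" > "$map"
}
```

With this map in place, accessing /home/alice (alice being a hypothetical NIS user) should make autofs mount 192.168.1.100:/home/alice, then unmount it after 60 seconds of inactivity per the --timeout in auto.master.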
Restart the autofs service with the following command:
/etc/init.d/autofs restart
=MPI Jobs=
Run the following command:
apt-get install lam4-dev lam-runtime lam-mpidoc gcc
Create a file anywhere, with any name; just remember where it is and what it is called. This will be the list of computers to be lambooted. For this guide, the file will be called 'hosts' and be stored in the current user's home directory.
Add every computer to be lambooted to the file. Here's an example:
system0
system1
system2
system3
You can also add them via their IP addresses:
192.168.1.100
192.168.1.101
192.168.1.102
192.168.1.103
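For larger clusters the file can be generated. A sketch assuming the systemN naming used throughout this guide; write_lam_hosts is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a LAM hosts file listing system0 .. system(N-1),
# one name per line. write_lam_hosts is a hypothetical helper:
#   $1 = output file, $2 = number of nodes
write_lam_hosts() {
    out=$1
    count=$2
    : > "$out"             # start from an empty file
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "system$i" >> "$out"
        i=$((i + 1))
    done
}
```

write_lam_hosts ~/hosts 4 produces a file listing system0 through system3, one per line, which lamboot can then read.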
Run the following command:
lamboot -v ~/hosts
* This example uses the hosts file set up above; point it at wherever you saved yours. The -v is for verbose mode, meaning lamboot prints full details of what it's doing.
I'm going to walk you through making, compiling, and running a simple MPI program that will have each system print out a name and number.
Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:
/*The Parallel Hello World Program*/
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc, &argv);                 /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &node);   /* this process's rank: 0, 1, 2, ... */
    printf("Hello World from Node %d\n", node);
    MPI_Finalize();                         /* shut MPI down before exiting */
    return 0;
}
While in the directory of the file run the following command, substituting in the names you're using:
mpicc -o hello hello.c
Run the following command, again making any proper substitutions:
mpirun N -npty ./hello
In this example, N tells mpirun to start one copy of the program on every lambooted node, so with four systems lambooted you should see four greetings. To run a specific number of copies instead, replace N with -np and a count; for example, mpirun -np 2 ./hello runs two copies even if more than two systems are lambooted.
When you're all done, it's standard procedure to stop the lamboot. To do this, run the following command:
lamhalt -v
You don't need the -v, but it can be useful.