Before reading this guide, you should be comfortable with a few concepts:

* In Unix, (almost) everything is a file. All settings and preferences are stored in various files throughout the file system. To change a setting, find the appropriate file and edit it.
* Basic networking skills are assumed: knowing how to make sure your computer is actually online, how to assign static IP addresses, and how to maintain a standard home network.
* Knowledge of vim, emacs, pico, or some other command-line text editor is essential.

=====Install Debian=====

Download a copy of the Debian 4.0 (Etch) netinst.iso image file. As of when this guide was written, this image can be found at: http://www.debian.org/CD/

Burn the image file to a CD. (If you can't figure out how to complete this step, turn back now.)

Use the CD to install the base Debian OS on all of the systems that are going to be part of the cluster.

* You can pretty much use the defaults for anything that's not listed below.
* For the hostname, pick a base name and append a number, incrementing it for each node of the cluster (for example, system0, system1, system2).
* For the domain, choose anything, but make sure it's the same on all nodes.
* You will also be asked for a root password and an initial user's full name, user name, and password.
* If you know that you're going to install an NIS or LDAP server for user authentication on another system, you can skip creating a user on the current system by pressing ESC and selecting the next task in the list.
* Don't use a network mirror.
* Don't install any of the optional packages, including the desktop software.
* Say yes to GRUB.

=====Update=====

Once the system restarts, log in as root and edit the /etc/apt/sources.list file.

* This file tells apt where to check for updates.
For our labs, comment out whatever is there and add the following lines:

  deb http://mirror/debian etch main
  deb-src http://mirror/debian etch main
  deb http://mirror/security etch/updates main

Once that's done, run the following compound command. You are already logged in as root, so sudo is not needed (and on a minimal install it may not be present):

  machine:~# apt-get update && apt-get upgrade

Type 'y' whenever needed. This will update your OS and all installed packages to the newest versions.

ATTENTION: Everything below this point still needs to be edited. The dokuwiki syntax is a bit different from mediawiki syntax; once it is corrected, this document will be more readable. Some of the content is also out of date and needs updating.

=Setup Static IP=

If you are going to set up any of the other systems described in this guide, you'll want to assign a static IP address to each system in your cluster. To do this, edit the /etc/network/interfaces file.

* In the line 'iface eth0 inet dhcp', change 'dhcp' to 'static'.

Then add the following lines below it, replacing each description in parentheses with the correct value:

  iface eth0 inet static
    address         (IP to be assigned, ex. 192.168.1.100)
    netmask         (the netmask, ex. 255.255.255.0)
    network         (the base network IP, ex. 192.168.1.0)
    broadcast       (the broadcast address, ex. 192.168.1.255)
    gateway         (IP of the gateway, ex. 192.168.1.254)
    dns-nameservers (IP of the DNS server, ex. 192.168.1.1)
    dns-search      (DNS domain name, if used)

* For our case here in the lab, the following values should be entered:

  netmask 255.255.255.0
  network 137.238.7.0
  broadcast 137.238.7.255
  gateway 137.238.7.254
  dns-nameservers 137.238.7.1
  dns-search ds.geneseo.edu

Once those lines are added, ''reboot''. This is the easiest way to ensure that the system picks up the new IP address.

=Keyless ssh Entries=

In order to ssh into each system without entering a password each time, an RSA key from the local computer needs to be placed in the proper file on the remote system. Sound complicated? It's pretty easy, just follow along.
On your local system, and likewise on every other node, run the following commands (each node needs its own key pair, because the next step collects every node's public key):

  apt-get install ssh
  ssh-keygen

* Type Y when apt-get asks for confirmation.
* ssh-keygen will give you three prompts. Leave all three blank.

Now, copy each node's public key to the same place:

  for i in 0 1 2; do scp system$i:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub.system$i; done

Append the contents of all the files just copied to authorized_keys:

  for i in 0 1 2; do cat ~/.ssh/id_rsa.pub.system$i >> ~/.ssh/authorized_keys; done

Lastly, copy the newly created authorized_keys file onto all the nodes:

  for i in 0 1 2; do scp ~/.ssh/authorized_keys system$i:~/.ssh/; done

The next time you ssh between nodes, you shouldn't be asked for a password!

=Setup Hosts File=

The /etc/hosts file lets you refer to any system on your local network (for ssh, scp, etc.) by an alias instead of its full IP address.

Edit /etc/hosts.

* Add a line at the top of the file for each system in the cluster. Here's an example hosts file:

  127.0.0.1 localhost
  192.168.1.100 system0
  192.168.1.101 system1
  192.168.1.102 system2
  # The following lines are…

* Also, be sure to remove the default line with ''127.0.1.1'' from the file.

'''BE SURE TO''': Copy this hosts file to every system in the cluster.

=Setup NIS Server and Clients=

This guide was written after following this other guide: [http://lyre.mit.edu/~powell/debian-howto/nis.html]

----

Pick a system to use as the NIS server, that is, the system from which the other systems will get their user credentials.

==All Systems==

Perform the following steps on every system (server and clients).

Install NIS using the following command:

  apt-get install nis

* During installation you will be asked to enter an NIS domain. Typical practice is to reuse your DNS domain, but it can be anything, as long as it matches on every system in the cluster. It can also be changed later in the /etc/defaultdomain file.

Edit the /etc/nsswitch.conf file.

* This file tells the computer where to check for user credentials.
You want it to more or less match the following example:

  passwd: files nis
  group: files nis
  shadow: files nis
  …
  netgroup: nis

* '''Note''': If you list 'files' first and then 'nis', the system checks locally before it checks the NIS server. If you list them the other way around, it checks in the opposite order. This matters when troubleshooting duplicate user accounts.
* '''Note''': As of writing this guide, setting these to 'compat' doesn't work.

==Server==

Perform the following steps on the system designated as the server.

Edit the /etc/ypserv.securenets file.

* Comment out the line giving access to everyone.
* Add a line for each system in the cluster (including the server). Follow the example below:

  …
  host 192.168.1.100
  host 192.168.1.101
  host 192.168.1.102

Edit the /etc/default/nis file.

* This file has fields that require specific values. For the server, match the example below:

  …
  NISSERVER=master
  …
  NISCLIENT=true
  …

* '''Note:''' The NIS server must also be a client of itself.

Edit the /etc/passwd file.

* Add the following line to the bottom of the file:

  +::::::

* This file also contains one line for each user account on the computer. Move the lines for the users you want to share to below the new line.

Edit the /etc/group file.

* Add the following line to the bottom of the file:

  +:::

* As with the previous file, move the lines associated with shared users to below this new line.

Edit the /etc/shadow file.

* Add the following line to the bottom of the file:

  +::::::::

* Again, move all lines associated with users you wish to share to below this new line.

Run the following command:

  cd /usr/lib/yp && ./ypinit -m

* This command asks you to edit the list of NIS servers. If there's already an entry, leave it alone. Just add the IP address of the current system to the list.
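The three '+' lines added to /etc/passwd, /etc/group, and /etc/shadow above differ only in how many colons they carry, which is easy to mistype. The sketch below shows where the counts come from: a passwd entry has 7 colon-separated fields, a group entry has 4, and a shadow entry has 9, so each '+' line needs one colon fewer than the field count. The helper name plus_entry is hypothetical and exists only for this illustration; it is not part of NIS.

```shell
#!/bin/sh
# Hypothetical helper (not part of NIS): print a "+" entry for a file
# whose records have the given number of colon-separated fields.
plus_entry() {
    fields=$1
    entry="+"
    i=1
    # N fields are separated by N-1 colons.
    while [ "$i" -lt "$fields" ]; do
        entry="$entry:"
        i=$((i + 1))
    done
    printf '%s\n' "$entry"
}

plus_entry 7   # /etc/passwd has 7 fields
plus_entry 4   # /etc/group has 4 fields
plus_entry 9   # /etc/shadow has 9 fields
```

Comparing the printed lines with what you typed into each file is a quick sanity check before building the NIS maps.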
Run the following command:

  cd /var/yp && make

* This command builds the NIS map files containing all user credentials to be served out. It has to be re-run any time user credentials change. A simple way to make that happen is the next step.

Edit the /etc/crontab file.

* Add the following line to the bottom of the file:

  */15 * * * * root cd /var/yp && make && cd /

* This line makes the system rebuild the NIS map files once every 15 minutes, so credential changes reach the clients without a manual rebuild.

The last thing you need to do on the server is restart the NIS service.

* Run the following command:

  /etc/init.d/nis restart

==Clients==

Perform the following steps on all client systems (server included).

Edit the /etc/yp.conf file.

* Add the following line to the bottom, replacing the example address with the IP address of the NIS server:

  ypserver 192.168.1.100

* '''NOTE''': On the server itself, using the IP address 127.0.0.1 is fine.

Edit the /etc/default/nis file.

* Make sure NISCLIENT=true is set.

Run the following command:

  /etc/init.d/nis restart

=Setup NFS Server and Clients=

This guide was written after following this other guide: [http://nfs.sourceforge.net/nfs-howto/]

==All Systems==

Run the following command on all systems (server and clients):

  apt-get install nfs-common

==Server==

Run the following command on the server:

  apt-get install nfs-kernel-server

Edit the /etc/exports file.

* Add a line of the following form to the file:

  /dir ip/mask(permission)

* dir - the directory you wish to share
* ip - the base IP address of the network you wish to share to
* mask - the netmask for that network
* permission - the permissions given to whoever mounts the share

An example line for sharing all user home directories (if the NIS server and NFS server are the same system) is below:

  /home 192.168.1.0/255.255.255.0(rw)

* In this example, the base IP for the network is 192.168.1.0 and the subnet mask is 255.255.255.0. The rw means that every system within the given subnet will have read and write access.
This can also be done by listing specific IP addresses on the same line, each with its own permissions, in the following fashion:

  /home 192.168.1.100(rw) 192.168.1.101(ro)

Edit the /etc/hosts.deny file.

* This file is consulted only after /etc/hosts.allow has been checked for explicitly allowed systems. For security reasons, do not leave it blank. Add the following lines to the bottom of the file:

  portmap:ALL
  lockd:ALL
  mountd:ALL
  rquotad:ALL
  statd:ALL

Edit the /etc/hosts.allow file.

* This file is used to explicitly allow systems to mount shares. Add a line of the following form to the bottom of the file:

  ALL:ip/mask

* Here's an example line:

  ALL:192.168.1.0/255.255.255.0

* Just like in the /etc/exports file, these IP addresses can be listed individually.

''Reboot'' the system.

=Mount/Unmount an NFS Share=

To mount a share, run the following command:

  mount serverIP:/shareDir /mountDir

Here's an example mount command:

  mount 192.168.1.100:/home/user /home/user

To unmount a share, run the following command:

  umount /home/user

=Setup AutoFS on Clients=

Run the following command on all NFS client systems:

  apt-get install autofs

Edit the /etc/auto.master file.

* Add a line of the following form to the bottom of the file:

  /shareDir /etc/auto.shareName --timeout=60

* Here's an example line:

  /home /etc/auto.home --timeout=60

Make a new file with the name used in the line you just added. In the example case, the file would be /etc/auto.home.

* Add a line of the following form to the file. The leading * is part of the line: it is the autofs wildcard key, which matches any directory name requested under the mount point.

  * serverIP:/shareDir

* Here's an example line. The & is replaced by whatever name the * matched, so a request for /home/user mounts 192.168.1.100:/home/user.

  * 192.168.1.100:/home/&

Restart the autofs service with the following command:

  /etc/init.d/autofs restart

=MPI Jobs=

==Install MPI and Lamboot (and gcc)==

Run the following command:

  apt-get install lam4-dev lam-runtime lam-mpidoc gcc

==Create Hosts File==

Create a file anywhere, with any name. Just remember where it is and what it's called. This will be a list of the computers to be lambooted.
For this guide, the file will be called 'hosts' and be stored in the current user's home directory.

Add every computer to be lambooted to the file, one per line. Here's an example:

  system0
  system1
  system2
  system3

You can also add them by IP address:

  192.168.1.100
  192.168.1.101
  192.168.1.102
  192.168.1.103

==Lamboot Cluster==

Run the following command:

  lamboot -v ~/hosts

* This is an example line, using the example file we set up. Point it at wherever your own hosts file lives. The -v is for verbose mode, meaning it prints full details of what it's doing.

==Compile and Run an MPI Program==

I'm going to walk you through writing, compiling, and running a simple MPI program that has each system print a greeting and its node number.

===Write Program===

Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:

  /* The Parallel Hello World Program */
  #include <stdio.h>   /* printf */
  #include <mpi.h>     /* MPI_Init, MPI_Comm_rank, MPI_Finalize */
  int main(int argc, char **argv)
  {
      int node;
      MPI_Init(&argc, &argv);                /* start MPI */
      MPI_Comm_rank(MPI_COMM_WORLD, &node);  /* get this process's rank */
      printf("Hello World from Node %d\n", node);
      MPI_Finalize();                        /* shut MPI down cleanly */
      return 0;
  }

===Compile Program===

While in the directory containing the file, run the following command, substituting the names you're using:

  mpicc -o hello hello.c

===Run Program===

Run the following command, again making any necessary substitutions:

  mpirun N -npty ./hello

In this example, mpirun uses the four currently lambooted systems: in LAM, N means one copy of the program on every lambooted node. The number of processes can be changed to anything in the range of the number of systems lambooted by passing -np with a count in place of N. For example, you can run this with two processes even if four or more systems are lambooted.

==Lamhalt Cluster==

When you're all done, it's standard procedure to stop the LAM runtime that lamboot started. To do this, run the following command:

  lamhalt -v

You don't need the -v, but it can be useful.
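The MPI steps above can be collected in one place. What follows is a minimal sketch, assuming the nodes follow this guide's naming scheme (a base name plus a running number, such as system0 through system3). The helper make_lam_hosts is hypothetical and not part of LAM/MPI; it only generates the list of hostnames. The LAM commands themselves are copied from this guide and left commented out, since they need a working LAM install on every node.

```shell
#!/bin/sh
# Sketch only: make_lam_hosts is a hypothetical helper, not part of LAM/MPI.
# It prints one hostname per line (base name plus a running number),
# which is the layout the hosts file in this guide uses.
make_lam_hosts() {
    base=$1     # e.g. "system"
    count=$2    # number of nodes
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s%d\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Print a four-node list: system0 through system3.
make_lam_hosts system 4

# Typical session built from the commands in this guide
# (commented out because they require LAM on every node):
#   make_lam_hosts system 4 > ~/hosts
#   lamboot -v ~/hosts
#   mpicc -o hello hello.c
#   mpirun N -npty ./hello
#   lamhalt -v
```

Generating the hosts file from the base name keeps it in step with the names in /etc/hosts, so adding a node only means raising the count.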