EIGHT MACHINE HPC CLUSTER PROJECT

John T. Rine

Objective

The objective of this exercise is to build an 8-machine HPC cluster capable of running HPC tasks.

Materials/Reading/Prerequisites

To do this project, you will need to have successfully completed on 8 computers:

To accomplish this project you will need:

Item                     Description                                                          Qty
computer                 HPC cluster                                                          8
student shelf KVM setup  The location for this installation is largely established and fixed  1

Background

Recently, HPC systems have shifted from supercomputing to computing clusters and grids. Open MPI is one of a number of HPC technologies used to implement and create high performance computing systems.

Procedure

These instructions assume that the boot sequence of every computer in the cluster is configured to boot from the hard drive before booting from the network. If a computer's boot order is set to boot from the network first, change the boot order.

  1. Set the master KVM bank switch to bank one.
  2. Set the master KVM computer switches to the computer on which the operating system is to be installed.
  3. Turn on the computer.
  4. As the computer boots, press F2 on the keyboard to bypass the “System Setup” screen and go directly to the CMOS setup.
  5. In the CMOS setup, use the right or left arrow keys to select “Boot” from the main menu.
  6. Verify that “PXE boot to LAN” is enabled. If it isn't enabled, enable it.
  7. In the CMOS setup, use the right or left arrow keys to select “Advanced” from the main menu.
  8. Arrow down to “Peripheral Configuration”.
  9. Verify that “Onboard LAN” is enabled. If it isn't enabled, enable it.
  10. If CMOS settings were changed, save and exit the CMOS setup utility. If no changes were made, select the exit without saving option.
  11. The computer reboots.
  12. Press F10 to access the boot menu.
  13. On the boot menu, arrow down to “IBA GE Slot 0208 V1210” and select it.
  14. Next, the Lair Network Boot Menu loads.
  15. Select “Debian/i386 Netboot” and press enter.
  16. Arrow to “Install Squeeze/testing [text]” if required and press enter.
  17. At this point, the installation begins (defaults were selected throughout MOST but not all of the installation).
  18. The “Select a language” screen is displayed. The default is “English”, press enter to select this item.
  19. The “Select a location” screen is displayed. The default is “United States”, press enter to select this item.
  20. The “Select a keyboard layout” screen is displayed. The default is “American English”, press enter to select this item.
  21. The “Configure the network” screen is displayed. On this screen, the Host name: “dhcp-175” is entered automatically. Press enter to continue.
  22. The “Configure the network” screen is displayed. On this screen, the Domain name: “offbyone.lan” is entered automatically. Press enter to continue.
  23. The “Choose a mirror of the Debian archive” screen is displayed. The default is “United States”. Arrow up to “Enter information manually” item and press enter.
  24. The “Choose a mirror of the Debian archive” screen is displayed. The Debian mirror host name: “mirror” is entered automatically. Press enter to continue.
  25. The “Choose a mirror of the Debian archive” screen is displayed. The Debian archive directory: “/debian/” is entered automatically. Press enter to continue.
  26. The “Choose a mirror of the Debian archive” screen is displayed. HTTP proxy information is blank by default. Press enter to continue.
  27. During the first installation of Debian Squeeze, the installation continued from here. During the second installation, however, it failed because a file could not be installed from the network archive; a newer version of the file was available. When the failure occurred, the installer was prompted to either retry or change the mirror. During the second installation the mirror was changed to the RIT mirror.
  28. Next, the “Set up users and passwords” screen is displayed. The Root password I entered was “bob”.
  29. The “Set up users and passwords” screen is displayed. The installer is prompted to re-enter the password. I entered “bob”.
  30. The “Set up users and passwords” screen is displayed. The installer is prompted to enter “The full name for the new user:” I entered “bob”.
  31. The “Set up users and passwords” screen is displayed. The installer is prompted to enter the username for the account. I entered “bob”.
  32. The “Set up users and passwords” screen is displayed. The installer is prompted to enter a password for the new user. I entered “bob”.
  33. Next, the “Configure the clock” screen is displayed. For Select your time zone: “Eastern” is the default. Press enter to continue.
  34. The “Partition disk” screen is displayed. The default is “Guided-use entire disk”, press enter to continue.
  35. The “Partition disk” screen is displayed. The disk to partition: SCSI 1 (0,0,0) (sda)-xxx.xxGB ATA Maxtor. has been entered automatically. Press enter to continue.
  36. The “Partition disk” screen is displayed. Partitioning scheme: “All files in one partition (recommended for new users)” is the default. Press enter to continue.
  37. The “Partition disk” screen is displayed. The default “Finish partitioning and write changes to disk” is displayed. Press enter to continue.
  38. Partitioning message-“The following partitions are going to be formatted: Partition #1 of SCSI1 (0,0,0) sda as ext3; Partition #5 of SCSI1 (0,0,0) sda as swap.”
  39. Popularity contest message “Configuring popularity contest: Participate in the package usage survey?”. Select <no>.
  40. The “Software selection” screen is displayed. The default is “Graphical desktop environment”. Deselect this selection using the spacebar. Using the space bar, select the ssh server. Press enter to continue.
  41. Configuring grub message: “Configuring grub-pc: Install grub boot loader to master boot record”, <yes>.
  42. Installation complete <continue>.
  43. Finishing the installation..
  44. Computer reboots automatically.
  45. Repeat steps 2 through 44 for all computers in the cluster.
  46. Change the KVM computer switch to the first position, node 00.
  47. Change the working directory to /etc/apt. At the command prompt enter “cd /etc/apt”.
  48. Rename the existing sources.list to sources.bak. At the command prompt enter “mv sources.list sources.bak”.
  49. Now download sources.list. At the command prompt, enter “wget http://10.80.2.6/files/students/sources.list”.
  50. Next, open sources.list with a text editor and replace all occurrences of “debversion” with “squeeze”. At the command prompt, enter “vi sources.list”. In vi, enter “:%s/debversion/squeeze/g”. To save the changes and quit vi, enter “:wq”.
  51. On the command line, enter and execute: “ssh-keygen -t rsa -C clusterkey”.
  52. Change the working directory to “.ssh”.
  53. On the command line, enter and execute the following command: “cat id_rsa.pub >> authorized_keys”.
  54. On the command line, enter and execute the following commands:
      for i in 1 2 3 4 5 6 7; do
        ssh node0$i "mkdir -p .ssh; chmod 700 .ssh"
        scp id_rsa.pub node0$i:.ssh/
        ssh node0$i "cat .ssh/id_rsa.pub >> .ssh/authorized_keys; chmod 600 .ssh/*"
        scp /etc/apt/sources.list node0$i:/etc/apt
        ssh node0$i "aptitude update; aptitude upgrade"
      done
  55. On node00, install Open MPI and the build tools. At the command prompt enter “aptitude install openmpi-bin openmpi-libs0 openmpi-dev build-essential”.
  56. On the command line, enter and execute the following command: “for i in 1 2 3 4 5 6 7; do ssh node0$i aptitude install openmpi-bin openmpi-libs0 openmpi-doc; done”.
  57. On node00, change directory to /etc/openmpi and open the file “openmpi-default-hostfile” with “vi openmpi-default-hostfile”. Add the following lines:
node00 slots=1 max_slots=1
node01 slots=1 max_slots=1
node02 slots=1 max_slots=1
node03 slots=1 max_slots=1
node04 slots=1 max_slots=1
node05 slots=1 max_slots=1
node06 slots=1 max_slots=1
node07 slots=1 max_slots=1

Then save the file and close vi with “:wq”.
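The eight hostfile lines in step 57 follow a single pattern, so they could also be generated rather than typed. The following is a minimal sketch, assuming the node names node00 through node07 used above; `make_hostfile` is a hypothetical helper name, not part of Open MPI.

```shell
#!/bin/sh
# make_hostfile is a hypothetical helper (an assumption, not an Open MPI tool).
# It prints one hostfile line per node, node00 through node07.
make_hostfile() {
    for i in 0 1 2 3 4 5 6 7; do
        printf 'node0%s slots=1 max_slots=1\n' "$i"
    done
}

# Print to the terminal for review; redirect the output into the
# hostfile from step 57 once it looks right.
make_hostfile
```

Once the hostfile is in place, a common smoke test is “mpirun -np 8 hostname”, which should report each node's name once if the cluster is set up as described.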

As part of the HPC cluster, we used a network file system (NFS) to share resources across the cluster.
We installed the NFS server on vm03, a virtual machine, and the NFS client on all of the machines in the cluster.
On the NFS server (vm03 in this case), perform the following steps:

  - Log into the virtual machine as root
  - aptitude install nfs-kernel-server
  - mkdir -p /export/home
  - reboot the vm using the reboot command
  - Log into the vm
  - edit the file /etc/exports; add line: /export/home 10.80.3.0/24(rw,sync,fsid=0,crossmnt,no_subtree_check)
  - exportfs -rva
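The edit to /etc/exports can also be scripted so that running it twice does not duplicate the line. The sketch below is one way to do that, assuming the export line is written without spaces between options and with `sync`, which is the form exports(5) expects; `add_export` is a hypothetical helper name.

```shell
#!/bin/sh
# The export line for the shared home directory, with no spaces inside
# the option list.
EXPORT_LINE='/export/home 10.80.3.0/24(rw,sync,fsid=0,crossmnt,no_subtree_check)'

# add_export is a hypothetical helper: append EXPORT_LINE to the given
# file unless an identical line is already present.
add_export() {
    exports_file=$1
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$EXPORT_LINE" "$exports_file" 2>/dev/null ||
        printf '%s\n' "$EXPORT_LINE" >> "$exports_file"
}

# Try it on a scratch file first; pass /etc/exports as root for real use.
scratch=$(mktemp)
add_export "$scratch"
add_export "$scratch"   # second call adds nothing
cat "$scratch"
```

After the real file is updated, “exportfs -rva” still has to be run as in the last step above.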

On each of the machines in the cluster perform the following steps:

  - Log into a machine as root
  - aptitude install nfs-common
  - reboot the machine using the reboot command
  - Log into the machine
  - mount -t nfs vm03:/export/home /home
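Because the SSH keys were distributed earlier in the procedure, the client steps could be driven from node00 instead of logging into each machine. The sketch below only prints the commands so they can be reviewed before anything is run. The server name is a placeholder held in `NFS_SERVER`; set it to whichever host actually exports /export/home. `client_setup_cmd` is a hypothetical helper name, and the sketch skips the reboot used in the manual steps.

```shell
#!/bin/sh
# NFS_SERVER is an assumption; override it to match the real NFS server.
NFS_SERVER=${NFS_SERVER:-vm03}

# client_setup_cmd is a hypothetical helper that prints the ssh command
# for one node instead of running it (a dry run).
client_setup_cmd() {
    node=$1
    printf 'ssh %s "aptitude -y install nfs-common && mount -t nfs %s:/export/home /home"\n' \
        "$node" "$NFS_SERVER"
}

# Print the command for every node in the cluster.
for i in 0 1 2 3 4 5 6 7; do
    client_setup_cmd "node0$i"
done
```

Each printed line can be pasted into the shell on node00 once it has been checked.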
