=====Configure DSLAB video wall for MPI=====
The following are the steps that need to be taken on the video wall machines to get them ready to participate in an MPI environment.
=====Premise=====
<code text>
On 9/14/10 1:40 PM, a person wrote:
>
> I'd like to start trying to begin a project using a combination of
> OpenGL and MPI to create something across the entire video wall, but not
> just with one X screen spread across them. It seems that the video wall
> doesn't have MPI installed yet. I can't
> seem to find the install howto and was wondering if you knew where to
> find it, or even know what doc I'm talking about. I searched the
> wiki, but didn't really find anything.
>
</code>

The video wall machines are still from an era pre-regular documentation, which is why there is no install howto for them on the wiki.

What we'll have to do in order to accommodate your request is to install the necessary MPI packages on them... I've gone and done it on node132, so if you could duplicate the following instructions on nodes 133 through 147, it should get us most of the way there.
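Since the same steps have to be repeated on nodes 133 through 147, it can be handy to drive them from a loop. A minimal sketch (**run_on_wall** is a name invented here, not an existing DSLAB script; it only prints the commands until you uncomment the ssh line, and it assumes your SSH keys are set up per the section further down):

```shell
# Print (or, with the ssh line uncommented, run) one command on every
# remaining video wall node, node133 through node147.
run_on_wall() {
    for i in $(seq 133 147); do
        echo "node$i: $1"
        # ssh "node$i" "$1"
    done
}

run_on_wall "sudo aptitude update"
```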
+ | |||
+ | =====Install OpenMPI===== | ||
+ | |||
+ | ====Step -1: Log onto machine, become root==== | ||
+ | |||
+ | Most of this stuff you can't do without root privileges. | ||
+ | |||
+ | ====Step 0: Update sources.list==== | ||
+ | |||
+ | |||
+ | Edit / | ||
+ | |||
+ | <file conf sources.list> | ||
+ | deb | ||
+ | deb | ||
+ | </ | ||
+ | |||
+ | <WRAP round info box> | ||
+ | |||
====Step 1: Update db and packages====
<cli>
node1##:~$ sudo aptitude update && sudo aptitude upgrade
</cli>

====Step 2: Clean up====
<cli>
node1##:~$ sudo aptitude update && sudo aptitude clean
</cli>

<WRAP round info box>I have you run an update again in case anything has changed in the interim; **clean** then clears out the locally cached package files.</WRAP>

====Step 3: Install OpenMPI packages====
<cli>
node1##:~$ sudo aptitude install openmpi-bin openmpi-dev openmpi-common
</cli>

====Step 4: Configure OpenMPI====
Edit **/etc/openmpi/openmpi-default-hostfile** and list every machine in the video wall:

<file conf openmpi-default-hostfile>
node132
node133
node134
node135
node136
node137
node138
node139
node140
node141
node142
node143
node144
node145
node146
node147
</file>

At this point, once performed on all the video wall machines, MPI should be installed and configured.
+ | |||
+ | =====Configuring LDAP===== | ||
+ | The next step is to perform some additional configurations so you can log into them with your DSLAB user account, do development, | ||
+ | |||
+ | You still want to be root. | ||
+ | |||
+ | First up, we'll tackle LDAP: | ||
+ | |||
+ | ====Step 0: Install necessary packages==== | ||
+ | <cli> | ||
+ | node##:~$ sudo aptitude install libpam-ldap libnss-ldap | ||
+ | </ | ||
+ | |||
====Step 1: Configure PAM for LDAP====
Edit **/etc/pam_ldap.conf** and make sure it contains the following:

<file conf pam_ldap.conf>
host auth
base dc=dslab,...
ldap_version 3

nss_base_passwd ...
nss_base_passwd ...
nss_base_passwd ...

nss_base_shadow ...
nss_base_shadow ...
nss_base_shadow ...

nss_base_group ...
nss_base_group ...
nss_base_group ...
</file>

====Step 2: Configure NSS for LDAP====
There are two elements to NSS configuration.

===Part A: Configure system-wide NSS settings===
Edit **/etc/nsswitch.conf** and make sure it resembles the following:

<file conf nsswitch.conf>
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for details about this file.

passwd:         ...
group:          ...
shadow:         ...

hosts:          ...
networks:       ...

protocols:      ...
services:       ...
ethers:         ...
rpc:            files

netgroup:       ...
</file>

===Part B: Configure NSS LDAP functionality===
Edit **/etc/libnss-ldap.conf** and give it the same settings as **pam_ldap.conf** above:

<file conf libnss_ldap.conf>
host auth
base dc=dslab,...
ldap_version 3

nss_base_passwd ...
nss_base_passwd ...
nss_base_passwd ...

nss_base_shadow ...
nss_base_shadow ...
nss_base_shadow ...

nss_base_group ...
nss_base_group ...
nss_base_group ...
</file>

====Step 3: Restart NSCD and related services====
To let the system know something has changed with regard to system authentication, we need to restart the following services:

  * cron - schedule execution of commands
  * sshd - secure shell daemon
  * nscd - name server caching daemon

We can do that by issuing the following commands (sample output provided):

<cli>
node1##:~$ sudo /etc/init.d/cron restart
Restarting periodic command scheduler: crond.
node1##:~$ sudo /etc/init.d/ssh restart
Restarting OpenBSD Secure Shell server: sshd.
node1##:~$ sudo /etc/init.d/nscd restart
Restarting Name Service Cache Daemon: nscd.
node1##:~$
</cli>

====Step 4: Test for LDAP functionality====
While still logged onto the video wall machine, check to see if we can do a user lookup. I'll show you successful operation and unsuccessful operation.

===Successful LDAP user lookup===
This is what you'll see when LDAP is configured properly:

<cli>
node1##:~$ id mbw6
uid=3032(mbw6) gid=3000(dslab) groups=3000(dslab)
node1##:~$
</cli>

If this is what you see, success has occurred, and LDAP is working. We can now move on to setting up **autofs**.

===Unsuccessful LDAP user lookup===
If something is still out of whack, that same attempt would instead show:

<cli>
node1##:~$ id mbw6
id: mbw6: No such user
node1##:~$
</cli>

If that happens, check for typos in the configuration files, and make sure you restarted the services mentioned above.

=====Configuring AUTOFS=====
AutoFS will, upon user login, NFS mount your home directory from the fileserver, making it appear that your home directory is on the local system. This is essential for a cluster environment, where your files need to be available on every node you use.

We will go through installing and configuring AutoFS on the video wall machines.

Again, make sure you are root.

====Step 0: Install packages====
<cli>
node##:~$ sudo aptitude install autofs
</cli>

====Step 1: Configure automount maps====

There are two files involved in this step. One will exist and need some modifications; the other will need to be created.

===Part A: The master map===
Edit **/etc/auto.master** and make sure it contains the following:

<file conf auto.master>
#
# $Id: auto.master,v ...
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).

/home   /etc/auto.home
</file>

===Part B: The home map===
This file is referenced by **/etc/auto.master** (the entry we just added).

Edit (or create) **/etc/auto.home** and give it the following contents:

<file conf auto.home>
*       ...
</file>

Short and simple. It instructs the automounter that, for any user, it should attempt to mount the appropriate home directory from the DSLAB file server (data).

====Step 2: Restart autofs daemon====
Once this is set, we need to restart the **autofs** daemon, so it will know changes to the configuration files have taken place.

To restart it, do the following:

<cli>
node1##:~$ sudo /etc/init.d/autofs restart
Stopping automounter: ...
Starting automounter: ...
node1##:~$
</cli>

At this point you should be able to ssh into the video wall machine in question with your regular DSLAB user account. When you log in, your home directory should be there.

If you can log in, but your home directory is missing, something still needs a whack. Check for typos, and be sure to restart the autofs daemon after making any changes.

I'd also recommend the following as common diagnostic steps:

  * Ensure that **/home** exists; if not, run (as root): **mkdir -p /home**, then restart **autofs**
  * Reboot the system

If it still doesn't work after that, get in touch and we'll dig deeper.

=====Setting up SSH keys=====
MPI requires access to all the machines in the cluster in order to function, and to make our lives easier, we'll just grant it the ability to log in to machines as it needs to, without our intervention.

To accomplish this, we are going to set up "password-less" SSH logins between the machines.

This is a task which can be done by creating a public/private SSH keypair for your account.

We then need to introduce ourselves to each machine to ensure that any **known host** functionality is sure we've previously established contact, and that it should just allow the connection to take place.

Luckily, some of this functionality has already been automated, in order to accommodate the various users using the main cluster.

====Creating the keys====
We're going to run the same script that users of the main cluster run in order to run a cluster job. To do this, we need to log into **node00.dslab.lan** as the user we wish to enable this functionality for (yes, **node00** of the main cluster), and run the script: **setupkeys**

<cli>
node00:~$ setupkeys
...
</cli>

At this point, we can return to the video wall machines.

====Introducing ourselves via SSH to the video wall machines====
To make everything happy, we need to ensure that ssh has no issues logging us into each of the machines on the video wall. To do this, we are going to run the following (just type it in at the prompt, and follow the prompts):

<cli>
node132:~$ for((i=132; i<=147; i++)); do
> echo -n "node$i: "
> ssh node$i "date"
> done
</cli>

As soon as you hit enter after the **done** (and assuming you have everything typed in correctly), it will start to try and log into each machine on the video wall, one at a time, starting with node132.

If you've never logged into a particular machine before from your account, you'll be prompted to authorize the connection (say "yes").

If your ssh key is set up correctly, it will then display the date (as it has just logged you into that machine and run the date command, then logged back out).

Once you've performed any initial introductions, the loop should run straight through every machine without prompting you for anything.

With this in place, you should be able to launch MPI jobs.
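Before launching a first real job, it is worth confirming that every hostname in the hostfile resolves from the node you are on; an unresolvable name is a common reason for MPI launches to hang or fail:

```shell
# Check that node132..node147 (the hosts from the hostfile above) can
# be resolved; prints one status line per node.
for i in $(seq 132 147); do
    if getent hosts "node$i" > /dev/null; then
        echo "node$i: ok"
    else
        echo "node$i: UNRESOLVED"
    fi
done
```

If all sixteen come back ok, a simple smoke test is **mpirun -np 16 hostname**, which should print each node's hostname once.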