
Configure DSLAB video wall for MPI

The following steps need to be taken on the video wall machines to get them ready to participate in an MPI environment, along with configuring them for DSLAB user logins.

Premise

On 9/14/10 1:40 PM, a person wrote:
>
> I'd like to start trying to begin a project using a combination of 
> OpenGL and MPI to create something across the entire video wall, but not 
> just with one X screen spread across them.  It seems that the video wall 
> doesn't have MPI (or at least the mpicc and mpirun commands) and I can't 
> seem to find the install howto and was wondering if you knew where to 
> find it, or even know what doc I'm talking about.  I checked on the 
> wiki, but didn't really find anything.
>

The video wall machines are still from an era before we documented things regularly, so there likely won't be anything… this is something that, going forward, we should make an effort to rectify.

What we'll have to do in order to accommodate your request is to install the necessary MPI packages on them… I've gone and done it on node132, so if you could duplicate the following instructions on nodes 133 through 147, it should get us most of the way there.

Install OpenMPI

Step -1: Log onto machine, become root

Most of this stuff you can't do without root privileges.
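
The commands below are written with sudo from a regular prompt, but if you'd rather work from an actual root shell, something like the following should get you one (assuming your account has sudo rights on these machines):

node1##:~$ sudo -s
node1##:~#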

Step 0: Update sources.list

Edit /etc/apt/sources.list, and replace the entire contents with the following:

sources.list
deb     http://mirror/debian/     etch          main contrib non-free
deb     http://mirror/security/   etch/updates  main contrib non-free

Note: Make sure the node is running Debian 4.0 rather than 5.0. If it's 5.0, then point it to the lenny sources instead of the etch sources. I'll try to post more detailed instructions when I get the chance.
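
A quick way to check which release a node is running is to look at /etc/debian_version; a 4.x value means etch, a 5.x value means lenny:

node1##:~$ cat /etc/debian_version
4.0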

Step 1: Update db and packages

node1##:~$ sudo aptitude update && sudo aptitude upgrade

node132 had about 86MB of updates, and it has likely received more attention than the others, so the remaining nodes may have even more.

Step 2: Clean up

node1##:~$ sudo aptitude update && sudo aptitude clean

I have you run an update again in case a repository key was upgraded along the way.

Step 3: Install OpenMPI packages

node1##:~$ sudo aptitude install openmpi-bin openmpi-dev openmpi-common
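
Once that finishes, a quick sanity check is to confirm the MPI commands are now on the PATH (the paths shown are typical for the Debian packages, but may differ):

node1##:~$ which mpicc mpirun
/usr/bin/mpicc
/usr/bin/mpirun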

Step 4: Configure OpenMPI

Edit /etc/openmpi/openmpi-default-hostfile, and replace the entire contents with the following:

openmpi-default-hostfile
node132
node133
node134
node135
node136
node137
node138
node139
node140
node141
node142
node143
node144
node145
node146
node147

Once this has been performed on all the video wall machines, MPI should be installed and configured.
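
Multi-node runs won't work until passwordless SSH is set up (covered below), but a quick single-machine smoke test is possible by pinning the run to the local node with --host:

node132:~$ mpirun -np 2 --host node132 hostname
node132
node132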

Configuring LDAP

The next step is to perform some additional configuration so you can log into them with your DSLAB user account for development, compilation, and execution.

You still want to be root.

First up, we'll tackle LDAP:

Step 0: Install necessary packages

node1##:~$ sudo aptitude install libpam-ldap libnss-ldap

Step 1: Configure PAM for LDAP

Edit /etc/pam_ldap.conf, and replace the contents with the following:

pam_ldap.conf
host auth
base dc=dslab,dc=lan
ldap_version 3
 
nss_base_passwd     ou=People,dc=dslab,dc=lan?one
nss_base_passwd     ou=People,dc=lair,dc=lan?one
nss_base_passwd     ou=People,dc=sunyit,dc=lan?one
 
nss_base_shadow     ou=People,dc=dslab,dc=lan?one
nss_base_shadow     ou=People,dc=lair,dc=lan?one
nss_base_shadow     ou=People,dc=sunyit,dc=lan?one
 
nss_base_group      ou=Group,dc=dslab,dc=lan?one
nss_base_group      ou=Group,dc=lair,dc=lan?one
nss_base_group      ou=Group,dc=sunyit,dc=lan?one

There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.
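
If you'd like to keep the original around for comparison, save a copy before overwriting it (the .orig name is just a suggestion):

node1##:~$ sudo cp /etc/pam_ldap.conf /etc/pam_ldap.conf.orig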

Step 2: Configure NSS for LDAP

There are two elements to NSS configuration.

Part A: Configure system-wide NSS settings

Edit /etc/nsswitch.conf, and replace the contents with the following:

nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
 
passwd:         files [SUCCESS=return] ldap
group:          files [SUCCESS=return] ldap
shadow:         files [SUCCESS=return] ldap
 
hosts:          files dns
networks:       files
 
protocols:      files
services:       files
ethers:         files
rpc:            files
 
netgroup:       files

There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.

Part B: Configure NSS LDAP functionality

Edit /etc/libnss_ldap.conf, and replace the contents with the following:

libnss_ldap.conf
host auth
base dc=dslab,dc=lan
ldap_version 3
 
nss_base_passwd     ou=People,dc=dslab,dc=lan?one
nss_base_passwd     ou=People,dc=lair,dc=lan?one
nss_base_passwd     ou=People,dc=sunyit,dc=lan?one
 
nss_base_shadow     ou=People,dc=dslab,dc=lan?one
nss_base_shadow     ou=People,dc=lair,dc=lan?one
nss_base_shadow     ou=People,dc=sunyit,dc=lan?one
 
nss_base_group      ou=Group,dc=dslab,dc=lan?one
nss_base_group      ou=Group,dc=lair,dc=lan?one
nss_base_group      ou=Group,dc=sunyit,dc=lan?one

There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.

Step 3: Restart services

To let the system know something has changed with regard to system authentication, we need to restart the following services:

  • cron - schedule execution of commands
  • sshd - secure shell daemon
  • nscd - name server caching daemon

We can do that by issuing the following commands (sample output provided):

node1##:~$ sudo /etc/init.d/cron restart
Restarting periodic command scheduler: crond.
node1##:~$ sudo /etc/init.d/ssh restart
Restarting OpenBSD Secure Shell server: sshd.
node1##:~$ sudo /etc/init.d/nscd restart
Restarting Name Service Cache Daemon: nscd.
node1##:~$ 

Step 4: Test for LDAP functionality

While still logged onto the video wall machine, check to see if we can do a user lookup. I'll show both successful and unsuccessful operation.

Successful LDAP user lookup

This is what you'll see when LDAP is configured properly:

node1##:~$ id mbw6
uid=3032(mbw6) gid=3000(dslab) groups=3000(dslab)
node1##:~$ 

If this is what you see, success has occurred, and LDAP is working. We can now move on to setting up autofs.

Unsuccessful LDAP user lookup

If something is still out of whack, that same attempt would instead show:

node1##:~$ id mbw6
id: mbw6: No such user
node1##:~$ 

Check to make sure no typos were made, and that you restarted the services mentioned above.
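
One more diagnostic that can help narrow things down: getent performs the same NSS lookup that id does, so if the following prints nothing, the system isn't consulting LDAP for passwd data at all (mbw6 is just the example user from above):

node1##:~$ getent passwd mbw6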

Configuring AUTOFS

AutoFS will, upon user login, NFS mount your home directory from the fileserver, making it appear that your home directory is on the local system. This is essential for a cluster environment, as each system will need access to the pertinent data, and the best way, traditionally, to accomplish this is via an NFS mount.

We will go through installing and configuring AutoFS on the video wall machines.

Again, make sure you are root.

Step 0: Install packages

node1##:~$ sudo aptitude install autofs

Step 1: Configure automount maps

There are two files involved in this step. One will already exist and need some modifications; the other we'll likely be creating (unless an old config exists, in which case we'll just overwrite it).

Part A: The master map

Edit /etc/auto.master, and replace the contents with the following:

auto.master
#
# $Id: auto.master,v 1.4 2005/01/04 14:36:54 raven Exp $
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
 
/home   /etc/auto.home --timeout=60 -fstype=nfs4,rw

Part B: The home map

This file is referenced by /etc/auto.master; the automounter will read it when activity takes place in /home (which will occur when a user logs in).

Edit (or create) /etc/auto.home, and replace the contents with the following:

auto.home
*   data:/home/&

Short and simple. It instructs the automounter that, for any user, it should attempt to mount the appropriate home directory from the DSLAB file server (data).
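
To make the wildcard concrete: the * matches whatever name is looked up under /home, and the & is replaced with that same name. A lookup for our example user mbw6 therefore behaves as if the map contained:

mbw6   data:/home/mbw6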

Step 2: Restart autofs daemon

Once this is set, we need to restart the autofs daemon, so it will know changes to the configuration files have taken place.

To restart it, do the following:

node1##:~$ sudo /etc/init.d/autofs restart
Stopping automounter: done.
Starting automounter: done.
node1##:~$ 

At this point you should be able to ssh into the video wall machine in question with your regular DSLAB user account. When you log in, your home directory should be there.
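
To verify the home directory really is an NFS mount from data (and not a leftover local directory), check the mount table; exact options will vary, but it should look roughly like this for our example user:

node1##:~$ mount | grep /home
data:/home/mbw6 on /home/mbw6 type nfs4 (rw,...)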

If you can log in, but your home directory is missing, something still needs a whack. Check for typos, and be sure to restart the autofs daemon after making any changes.

I'd also recommend the following as common diagnostic steps:

  • Ensure that /home exists, run (as root): mkdir -p /home; /etc/init.d/autofs restart
  • Reboot the system

If it still doesn't work, check to make sure packages like nfs-common have been installed, and that running modprobe nfs does not result in an error (if you end up needing to install nfs-common, be sure to restart autofs or reboot the system for the changes to take effect).
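
Spelled out, those checks look like this (dpkg -l prints "ii" at the start of the line for installed packages, and modprobe returns silently on success):

node1##:~$ dpkg -l nfs-common
node1##:~$ sudo modprobe nfs
node1##:~$ sudo aptitude install nfs-common   # only if it was missing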

Setting up SSH keys

MPI requires access to all the machines in the cluster in order to function, and to make our lives easier, we'll just grant it the ability to log in to machines as it needs to, without our intervention.

To accomplish this, we are going to set up “passwordless SSH” access to the cluster machines.

This can be done by creating a public/private SSH key pair and appending the public key to our user's ~/.ssh/authorized_keys file.
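
For reference, the usual manual way to accomplish this is roughly the following (the setupkeys script mentioned below should take care of it for you, and since home directories are NFS-mounted everywhere, a key added to your own authorized_keys is valid on every node):

node00:~$ ssh-keygen -t rsa
node00:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
node00:~$ chmod 600 ~/.ssh/authorized_keys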

We then need to introduce ourselves to each machine, so that SSH's known-host checking has a record of previous contact and will allow the connection to take place without prompting.

Luckily, some of this functionality has already been automated, in order to accommodate the various users using the main cluster.

Creating the keys

We're going to run the same script that users of the main cluster use when setting up to run a cluster job. To do this, we need to log into node00.dslab.lan as the user we wish to enable this functionality for (yes, node00 of the main cluster), and run the script: setupkeys

node00:~$ setupkeys
...

At this point, we can return to the video wall machines.

Introducing ourselves via SSH to the video wall machines

To make everything happy, we need to ensure that ssh has no issues logging us into each of the machines on the video wall. To do this, type the following in at the prompt, and follow any prompts that come up:

node132:~$ for((i=132; i<=147; i++)); do
> echo -n "[node$i] "
> ssh node$i "date"
> done

As soon as you hit enter on the done (and assuming you have everything typed in correctly), it will try to log into each machine on the video wall, one at a time, starting with node132.

If you've never logged into a particular machine before from your account, you'll be prompted to authorize the connection (say “yes”– the whole word, not just 'y', and hit enter).

If your SSH key is set up correctly, it will then display the date (as it has just logged you into that machine and run the date command, then logged back out).

Once you've performed any initial introductions, go ahead and run the above again… you basically want everything to go without a hitch (no prompting for passwords, no other prompting at all– just a list of the machines and their dates should be output to the screen).

With this in place, you should be able to launch MPI jobs.
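
For example, with the default hostfile from earlier in place, a 16-process run should place one process on each node of the wall and report back from all of them (output order may vary):

node132:~$ mpirun -np 16 hostname
node132
node133
...
node147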
