The following are the steps needed to get the video wall machines ready to participate in an MPI environment, along with configuring them for DSLAB user logins.
On 9/14/10 1:40 PM, a person wrote:

> I'd like to start a project using a combination of OpenGL and MPI to create something across the entire video wall, not just with one X screen spread across them. It seems that the video wall doesn't have MPI (or at least the mpicc and mpirun commands), and I can't seem to find the install howto. I was wondering if you knew where to find it, or even know what doc I'm talking about. I checked on the wiki, but didn't really find anything.
The video wall machines date from before we kept regular documentation, so there likely isn't anything to find… that's something we should make an effort to rectify going forward.
What we'll have to do in order to accommodate your request is to install the necessary MPI packages on them… I've gone and done it on node132, so if you could duplicate the following instructions on nodes 133 through 147, it should get us most of the way there.
Most of this stuff you can't do without root privileges.
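If you'd rather work from a root shell than prefix every command with sudo, something like this should do (assuming your account has sudo rights on the video wall nodes; the instructions below keep the sudo prefixes for clarity):

node1##:~$ sudo -s
node1##:~#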
Edit /etc/apt/sources.list, and replace the entire contents with the following:
deb http://mirror/debian/ etch main contrib non-free
deb http://mirror/security/ etch/updates main contrib non-free
Note: Make sure the node is running Debian 4.0 rather than 5.0. If it's 5.0, then point it to the lenny sources instead of the etch sources. I'll try to post more detailed instructions when I get the chance.
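If you're not sure which release a node is running, a quick check is:

node1##:~$ cat /etc/debian_version
4.0

Should you find a 5.0 node, the lenny equivalent of the sources above would presumably be the following (assuming the local mirror carries lenny under the same paths, which I haven't verified):

deb http://mirror/debian/ lenny main contrib non-free
deb http://mirror/security/ lenny/updates main contrib non-free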
node1##:~$ sudo aptitude update && sudo aptitude upgrade
node132 had about 86MB of updates, and it has likely received more attention than the others, so expect the rest to need at least as much.
node1##:~$ sudo aptitude update && sudo aptitude clean
I have you run an update again in case a repository key was upgraded along the way.
node1##:~$ sudo aptitude install openmpi-bin openmpi-dev openmpi-common
Edit /etc/openmpi/openmpi-default-hostfile, and replace the entire contents with the following:
node132
node133
node134
node135
node136
node137
node138
node139
node140
node141
node142
node143
node144
node145
node146
node147
At this point, once performed on all the video wall machines, MPI should be installed and configured.
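A quick sanity check on any one node is to make sure the MPI commands actually landed in the path (the paths shown are what I'd expect from the Debian packages; yours should look similar):

node1##:~$ which mpicc mpirun
/usr/bin/mpicc
/usr/bin/mpirun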
The next step is to perform some additional configuration so you can log into them with your DSLAB user account and do development, compilation, and execution.
You still want to be root.
First up, we'll tackle LDAP:
node##:~$ sudo aptitude install libpam-ldap libnss-ldap
Edit /etc/pam_ldap.conf, and replace the contents with the following:
host auth
base dc=dslab,dc=lan
ldap_version 3
nss_base_passwd ou=People,dc=dslab,dc=lan?one
nss_base_passwd ou=People,dc=lair,dc=lan?one
nss_base_passwd ou=People,dc=sunyit,dc=lan?one
nss_base_shadow ou=People,dc=dslab,dc=lan?one
nss_base_shadow ou=People,dc=lair,dc=lan?one
nss_base_shadow ou=People,dc=sunyit,dc=lan?one
nss_base_group ou=Group,dc=dslab,dc=lan?one
nss_base_group ou=Group,dc=lair,dc=lan?one
nss_base_group ou=Group,dc=sunyit,dc=lan?one
There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.
There are two elements to NSS configuration.
Edit /etc/nsswitch.conf, and replace the contents with the following:
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:     files [SUCCESS=return] ldap
group:      files [SUCCESS=return] ldap
shadow:     files [SUCCESS=return] ldap

hosts:      files dns
networks:   files

protocols:  files
services:   files
ethers:     files
rpc:        files

netgroup:   files
There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.
Edit /etc/libnss_ldap.conf, and replace the contents with the following:
host auth
base dc=dslab,dc=lan
ldap_version 3
nss_base_passwd ou=People,dc=dslab,dc=lan?one
nss_base_passwd ou=People,dc=lair,dc=lan?one
nss_base_passwd ou=People,dc=sunyit,dc=lan?one
nss_base_shadow ou=People,dc=dslab,dc=lan?one
nss_base_shadow ou=People,dc=lair,dc=lan?one
nss_base_shadow ou=People,dc=sunyit,dc=lan?one
nss_base_group ou=Group,dc=dslab,dc=lan?one
nss_base_group ou=Group,dc=lair,dc=lan?one
nss_base_group ou=Group,dc=sunyit,dc=lan?one
There will likely be similar looking data already in the file… but the data is not identical, so please replace it in its entirety with this information.
To let the system know something has changed with regard to system authentication, we need to restart the services that care about it: cron, ssh, and nscd. We can do that by issuing the following commands (sample output provided):
node1##:~$ sudo /etc/init.d/cron restart
Restarting periodic command scheduler: crond.
node1##:~$ sudo /etc/init.d/ssh restart
Restarting OpenBSD Secure Shell server: sshd.
node1##:~$ sudo /etc/init.d/nscd restart
Restarting Name Service Cache Daemon: nscd.
node1##:~$
While still logged onto the video wall machine, check whether we can do a user lookup. I'll show what both successful and unsuccessful operation look like.
This is what you'll see when LDAP is configured properly:
node1##:~$ id mbw6
uid=3032(mbw6) gid=3000(dslab) groups=3000(dslab)
node1##:~$
If this is what you see, success has occurred, and LDAP is working. We can now move on to setting up autofs.
If something is still out of whack, that same attempt would instead show:
node1##:~$ id mbw6
id: mbw6: No such user
node1##:~$
If that's the case, check to make sure no typos were made, and that you restarted the services mentioned above.
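One more check that can help narrow things down: getent queries NSS directly, so it exercises the same lookup path that id does (substitute your own username; the trailing fields are elided here):

node1##:~$ getent passwd mbw6
mbw6:x:3032:3000:...

If getent comes back empty, the problem is in the NSS/LDAP configuration above rather than anything MPI- or autofs-related.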
AutoFS will, upon user login, NFS mount your home directory from the fileserver, making it appear that your home directory is on the local system. This is essential for a cluster environment, as each system will need access to the pertinent data, and the best way, traditionally, to accomplish this is via an NFS mount.
We will go through installing and configuring AutoFS on the video wall machines.
Again, make sure you are root.
node##:~$ sudo aptitude install autofs
There are two files involved in this step. One will exist and need some modifications, the other we'll likely be creating (unless an old config exists, in which case we're just going to overwrite it).
Edit /etc/auto.master, and replace the contents with the following:
#
# $Id: auto.master,v 1.4 2005/01/04 14:36:54 raven Exp $
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
/home /etc/auto.home --timeout=60 -fstype=nfs4,rw
The next file, /etc/auto.home, is referenced by /etc/auto.master; the automounter will read it when activity takes place in /home (which will occur when a user logs in).
Edit (or create) /etc/auto.home, and replace the contents with the following:
* data:/home/&
Short and simple. It instructs the automounter that, for any user, it should attempt to mount the appropriate home directory from the DSLAB file server (data); the * matches whatever directory name is requested under /home, and the & substitutes that same name into the server-side path.
Once this is set, we need to restart the autofs daemon, so it will know changes to the configuration files have taken place.
To restart it, do the following:
node1##:~$ sudo /etc/init.d/autofs restart
Stopping automounter: done.
Starting automounter: done.
node1##:~$
At this point you should be able to ssh into the video wall machine in question with your regular DSLAB user account. When you log in, your home directory should be there.
If you can log in, but your home directory is missing, something still needs a whack. Check for typos, and be sure to restart the autofs daemon after making any changes.
I'd also recommend the following common diagnostic steps:
If it still doesn't work, check to make sure packages like nfs-common have been installed, and that a modprobe nfs does not result in an error (if you end up needing to install nfs-common, be sure to restart autofs or reboot the system to get changes to take effect).
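For example, those checks might look like this (a sketch; exact output will vary, and the mount line only shows up after a home directory has actually been accessed):

node1##:~$ dpkg -l | grep nfs-common
node1##:~$ sudo modprobe nfs
node1##:~$ mount | grep /home

The first command should list an installed nfs-common package, the second should return silently, and the third should show your home directory mounted from data once you've logged in or otherwise touched it.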
MPI requires access to all the machines in the cluster in order to function, and to make our lives easier, we'll just grant it the ability to log in to machines as it needs to, without our intervention.
To accomplish this, we are going to set up “passwordless SSH” access to the cluster machines.
This can be done by creating a public/private SSH key pair and adding the public key to our user's ~/.ssh/authorized_keys file.
We then need to introduce ourselves to each machine, so that SSH's known-hosts checking already knows we've established contact and will let the connection proceed without complaint.
Luckily, some of this functionality has already been automated, in order to accommodate the various users using the main cluster.
We're going to use the same script that users of the main cluster run before launching a cluster job. To do this, we need to log into node00.dslab.lan as the user we wish to enable this functionality for (yes, node00 of the main cluster), and run the script: setupkeys
node00:~$ setupkeys
...
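For reference, what setupkeys accomplishes is roughly the following (a sketch only, since the actual script may differ; the RSA key type and empty passphrase are assumptions on my part):

node00:~$ ssh-keygen -t rsa
node00:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
node00:~$ chmod 600 ~/.ssh/authorized_keys

Accept the default key location and leave the passphrase empty when ssh-keygen asks. Because DSLAB home directories come from the fileserver, the same ~/.ssh contents should be visible on the video wall nodes once autofs is working, which is what makes the single authorized_keys entry sufficient.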
At this point, we can return to the video wall machines.
To make everything happy, we need to ensure that ssh has no issues logging us into each of the machines on the video wall. To do this, we are going to run the following (just type it in at the prompt, and follow the prompts):
node132:~$ for((i=132; i<=147; i++)); do
> echo -n "[node$i] "
> ssh node$i "date"
> done
As soon as you hit enter on the done (and assuming you have everything typed in correctly), it will start to try and log into each machine on the video wall, one at a time, starting with node132.
If you've never logged into a particular machine before from your account, you'll be prompted to authorize the connection (type “yes”, the whole word, not just 'y', and hit enter).
If your ssh key is set up correctly, it will then display the date (as it has just logged you into that machine, run the date command, and logged back out).
Once you've performed any initial introductions, go ahead and run the above again… you basically want everything to go without a hitch: no prompting for a password, no other prompting at all, just a list of the machines and their dates output to the screen.
With this in place, you should be able to launch MPI jobs.
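As a first smoke test (a sketch; it assumes Open MPI picks up the default hostfile we populated earlier, and the output order will vary), you can have mpirun fan a trivial command out across the wall:

node132:~$ mpirun -np 16 hostname
node132
node133
...
node147

One hostname per node, sixteen lines total, means MPI can reach and launch on every machine; from there, mpicc-compiled programs started the same way should behave.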