=MPI Jobs=
Run the following command:
apt-get install lam4-dev lam-runtime lam-mpidoc gcc
Create a file anywhere with any name. Just remember where and what it is. This will be a list of the computers to be lambooted. For this guide, the file will be called 'hosts' and be stored in the current user's home directory.
Add every computer to be lambooted to the file, one per line. Here's an example:
system0
system1
system3
system4
You can also add them by their IP addresses:
192.168.1.100
192.168.1.101
192.168.1.102
192.168.1.103
Run the following command:
lamboot -v ~/hosts
This example line uses the example file set up above; make sure to point it at wherever you saved yours. The -v is for verbose mode, meaning lamboot will print out full details of what it's doing.
I'm going to walk you through making, compiling, and running a simple MPI program that will have each system print out a name and number.
Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:
/* The Parallel Hello World Program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);   /* this process's rank */
    printf("Hello World from Node %d\n", node);
    MPI_Finalize();
    return 0;
}
From the directory containing the file, run the following command, substituting in the names you're using:
mpicc -o hello hello.c
Run the following command, again making any proper substitutions:
mpirun -np 4 ./hello
In this example, we've told mpirun to use four of the currently lambooted systems. You can change this number to anything from one up to the number of lambooted systems. For example, you can run this with a two even if there are more than four systems lambooted.
When you're all done, it's standard procedure to stop the lamboot. To do this, run the following command:
lamhalt -v
You don't need the -v, but it can be useful.
=Xen=
This guide was written after loosely following this other guide. [http://www.howtoforge.com/debian_etch_xen_3.1]
This guide covers two ways to install Xen: from source, and from custom archives created after the first build from source.
Download source code from website and extract it:
cd /usr/src
wget http://bits.xensource.com/oss-xen/release/3.1.0/src.tgz/xen-3.1.0-src.tgz
tar -xvzf xen-3.1.0-src.tgz
Run:
cd xen-3.1.0-src
Edit the following line in Config.mk to match the one below (or to that effect):
XEN_ENABLE_PAE = n
Run:
time make -j 4 world
cd dist
./install.sh
Change to the kernel build directory (wherever that may be) and run:
make menuconfig
* Make sure memory support is set to 4GB (non-PAE).
* Set the ID string (typically with a name and date).
* Make any other customizations.
Run:
time make -j 4
Copy the appropriate kernel, system map, and .config files to /boot and rename them appropriately.
Run:
make modules_install
mkinitramfs -o /boot/initrd.img-<kernel name> <kernel name>
Run the following commands as root, making the appropriate substitutions for localities:
apt-get install iproute bridge-utils python-twisted binutils zlib1g-dev python-dev transfig bzip2 screen ssh debootstrap libcurl3-dev libncurses5-dev x-dev build-essential gettext
cd /usr/src
scp user@location.of:/xensrc.tar .
tar -xvf xensrc.tar
./xen-3.1.0-src/dist/install.sh
rm xensrc.tar
cd /
scp user@location.of:/kernel.tar .
tar -xvf kernel.tar
rm kernel.tar
Edit /boot/grub/menu.lst and add the new kernel to the grub menu above the other kernels (near the bottom):
title Debian GNU/Linux, kernel kernelname
root (hd0,0)
kernel /boot/xen-3.1.0.gz
module /boot/vmlinuz-kernelname root=/dev/hda1 ro console=tty0 max_loop=255
module /boot/initrd.img-kernelname
savedefault
Run the following commands as root:
depmod kernelname
update-rc.d xend defaults 20 21
update-rc.d xendomains defaults 21 20
mv /lib/tls /lib/tls.disabled
reboot
These commands will make an archive containing the Xen source:
cd /usr/src
tar cvf xensrc.tar xen-3.1.0-src
These commands will make an archive containing the finished kernel and its modules:
cd /
tar cvf kernel.tar /boot/vmlinuz-kernelname /boot/config-kernelname /boot/System.map-kernelname /boot/initrd.img-kernelname /lib/modules/kernelname
Be sure to make all appropriate substitutions for kernelname.
NOTE TO THE READER: This section uses a specific setup as an example.
Run the following commands as root:
apt-get install xen-tools
mkdir /vserver
Edit the /etc/xen-tools/xen-tools.conf file. These are the lines changed for this example:
dir=/vserver
debootstrap=1
size=2Gb
dist=etch
image=full
gateway=137.238.7.254
netmask=255.255.255.0
passwd=1
kernel=/boot/vmlinuz-kernelname
initrd=/boot/initrd.img-kernelname
mirror=http://137.238.7.148:9999/debian/
You can now make the disk image for a xen virtual machine using the config file you just edited as a basis.
xen-create-image --hostname=____ --ip=____ --ide
You now have a xen virtual machine disk image, swap image, and cfg file for your virtual machine.
xm create -c /etc/xen/____.cfg
This will create a virtual machine from the specified cfg file and connect you to the console (-c).
Detach from the console by pressing CTRL+]
xm list
This will list all running virtual machines (and the host).
xm console <name>
This will re-attach you to a virtual machine's console.
xm save <name> <file>
This will save the current state of a virtual machine to <file> and stop it. Saved states are conventionally kept in /var/lib/xen/save.
xm restore <file>
This will restore a virtual machine from a saved state file. Keep in mind that this will not delete the saved state file.
xm shutdown <name>
This will shut down the virtual machine properly.
xm migrate --live <name> <newhost>
This will migrate a virtual machine from one Xen host to another. This is covered in more detail in the next section.
NOTE TO THE READER: This section uses a specific setup as an example.
To migrate a virtual machine, its image file has to be 'locally' accessible to both the current host and the new host. The simplest way to do this is to store all the image files on an NFS server and mount an NFS share at boot time. This is not done using AutoFS, because AutoFS only mounts a share when something first accesses it, not at boot. It is also not done using fstab, because the mount must take place well after the network connection is established; /etc/rc.local runs at the end of the boot sequence. So we do the following:
* Copy all image files to a folder on the NFS server.
* Add that folder as an NFS share.
* Edit /etc/rc.local and add an appropriate mount line near the bottom (above the final exit 0 line, if there is one).
Edit the following line in /etc/xen/xend-config.sxp (or something to that effect) from this:
(xend-relocation-hosts-allow '^localhost$')
To something like this:
(xend-relocation-hosts-allow '^localhost$ ^137[.]238[.]7[.][0-9][0-9][0-9]$ ^shiznit[0-9][.]ds[.]geneseo[.]edu$')
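It is easy to get these patterns subtly wrong, so it can help to run a few host names and addresses through them. The sketch below uses POSIX extended regular expressions from `<regex.h>` as a stand-in for xend's own matcher (xend is written in Python). This assumes the patterns use only syntax that both engines understand, which is true of the ones above. `relocation_allowed` is a hypothetical helper. The check shows one thing clearly: the IP pattern requires exactly three digits in the last octet, so an address such as 137.238.7.4 would be refused. If that is not intended, `[0-9]{1,3}` is the broader form.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if 'host' matches any of the space-separated regular
 * expressions in 'allow' (the string inside xend-relocation-hosts-allow).
 * POSIX extended regexes stand in for xend's own matcher here. */
bool relocation_allowed(const char *allow, const char *host)
{
    char pattern[256];
    const char *p = allow;

    while (*p != '\0') {
        /* Skip the spaces that separate patterns. */
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;

        /* Copy one pattern (up to the next space) into a buffer. */
        size_t len = strcspn(p, " ");
        if (len >= sizeof pattern) {
            p += len;                   /* skip patterns too long for this sketch */
            continue;
        }
        memcpy(pattern, p, len);
        pattern[len] = '\0';
        p += len;

        regex_t re;
        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;                   /* skip patterns that don't compile */
        int rc = regexec(&re, host, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return true;                /* first match wins */
    }
    return false;
}
```

For example, with the allow string above, `relocation_allowed(allow, "137.238.7.148")` is true and `relocation_allowed(allow, "137.238.7.4")` is false.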
=Various Tidbits=
This guide is specifically for a Dell Optiplex GX260 and its integrated (read: crappy) graphics.
Install Xorg (as root) with the following command:
apt-get install xorg
Have X handle most of the configuration on its own (thus helping you out) with the following command:
Xorg -configure
Still having problems? That's because Xorg rarely configures things properly. The following changes have to be made to the newly created /root/xorg.conf.new file (changed or added values are shown in italics):
...
Section "InputDevice"
    ...
    Option "Device" "/dev/''psaux''"
    ...
EndSection

Section "Monitor"
    ...
    ModelName "Monitor Model"
    ''HorizSync 31.5-79.0''
    ''VertRefresh 56.0-76.0''
    ''Option "DPMS"''
    ''Modeline "1280x1024" 109.62 1280 1336 1472 1720 1024 1024 1026 1062''
EndSection
...
Section "Device"
    ...
    Driver "i810"
    ''VideoRam 8192''
    ...
    BusID "PCI:0:2:0"
    ''Option "UseFBDev" "true"''
    ''Option "VBERestore" "true"''
EndSection

Section "Screen"
    ...
    SubSection "Display"
        ...
        ''Modes "1280x1024" "1024x768" "800x600"''
    EndSubSection
EndSection
In Section "Screen", add the Modes line to every SubSection "Display".
Now, copy that configuration to the proper place (so that you don't have to always tell X what configuration to use) with the following command:
cp /root/xorg.conf.new /etc/X11/xorg.conf
* It might be wise to back up the original configuration beforehand.
Now, just run:
startx
If you see a grey screen with a black X in the middle, it works! Continue tweaking it to your heart's content. Once you get it working, try installing an X Window Manager!
Synergy is a fun utility that allows one computer (the Synergy server) to share its mouse and keyboard with other computers (the clients), so you can use one keyboard and mouse across multiple machines. It is cross-platform and fairly simple to set up. Here are a few links to places that describe how to do so:
* The article that led me to look at Synergy in the first place [http://lifehacker.com/software/top/hack-attack-control-multiple-computers-with-a-single-keyboard-and-mouse-254648.php]
* The Synergy SourceForge project page; contains source, Unix binaries, and a Windows GUI [http://sourceforge.net/projects/synergy2/]
* SynergyKM, a GUI tie-in for the Mac [http://software.landryhetu.com/synergy/]
* A nice descriptive Synergy guide [http://www.linux.com/articles/54628]
Edit the file ~/.bashrc:
* Add the following line to the bottom of the file.
alias name='command'
* Here are a few useful alias examples.
alias ls='ls -a --color=auto'
alias vi='vim'
alias ftp='ftp -v'
alias lamboot='lamboot -v'
alias lamhalt='lamhalt -v'
alias grep='grep --color=auto'
For a Mac OS X installation, edit the ~/.profile file. It will probably not exist yet.
Edit (or create) the file ~/.vimrc:
* Add whatever you want as Vim defaults to the file. An example file is as follows:
syn on
set tabstop=4
set shiftwidth=4
Run the following commands:
apt-get install ntpdate
ntpdate ntp.geneseo.edu
There are two login messages, one before the login prompt, and one after. The one before is only seen on the local machine, but the one after is also seen during ssh sessions.
The message before the login prompt can be found and edited in the /etc/issue file.
The message after the login prompt can be found and edited in the /etc/motd.tail file.
* The /etc/motd file is regenerated at boot time from the contents of motd.tail.
If there's ever a problem trying to work with a display, go onto that machine and run:
xhost +
That might do the trick.
Commands to try showing off:
* xeyes
* xloadimage
* xsetroot -solid "{COLOR}"
* any non-OpenGL xscreensaver (run it on the root window with -root)

* vim
* screen
* ssh
* ftp
* less
* bsdgames
* ntpdate
* pciutils
Reattach to a detached screen session:
screen -r
Detach from the current screen session:
ctrl-a d
List all windows in the current screen session:
ctrl-a "
Create a new window in the current screen session:
ctrl-a c
The new universe in the DSLAB will use the internal domain name “dslab.lan”, and complements the LAIR's current domain name “offbyone.lan”.
juicebar (10.81.1.1) is the current DNS server, so perform all changes there for now.
Files of interest on juicebar include:
/var/named/
This is the location of the name server configuration files, as well as the actual DNS zone definitions.
/var/named/master - Zone definitions that the current server is responsible for.
/var/named/slave - Zone definitions that the current server is also hosting.
/var/named/standard - Zone definitions for basic DNS services (loopback, etc.).
The only place you'll ever need to go is into master. Any changes made to the slave configurations will probably be overwritten when the servers synchronize.
Inside /var/named/master are (at least) two files:
10.81.1 - reverse lookup
dslab.lan - domain definition
dslab.lan defines all the names that are present in the dslab.lan domain, as well as what IP they should be mapped to.
10.81.1 does the opposite: it maps the last octet of each IP address back to a host name (reverse lookup).
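To see how the two directions line up, here is a small sketch that turns an address into the name a reverse lookup actually queries. Reverse zones live under in-addr.arpa with the octets reversed. The file named 10.81.1 is therefore presumably the zone 1.81.10.in-addr.arpa, which is why its entries only need the last octet. That file-to-zone mapping is an assumption about juicebar's setup, and `ptr_name` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds the reverse-lookup (PTR) name for a dotted-quad IPv4 address,
 * e.g. "10.81.1.132" -> "132.1.81.10.in-addr.arpa".
 * Returns 0 on success, -1 if the address is malformed or 'out' is too small. */
int ptr_name(const char *ip, char *out, size_t outlen)
{
    unsigned a, b, c, d;
    char trailing;

    /* Require exactly four numbers and nothing after the last one. */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    /* Octets are reversed under in-addr.arpa. */
    int n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa", d, c, b, a);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```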
Any changes/additions/modifications must be made, as appropriate, to BOTH files. Follow the existing format.
ADDITIONALLY: Whenever you make any change, you MUST update the serial number. This is how the servers recognize changes. If you make changes but do not update the serial number, the other DNS server will keep serving the old data!
Standard format for the serial number is to use the date. For example, if the date is August 2, 2007, and this is your first change of the day, the serial number should be:
2007080201
If you make additional changes during the same day, obviously the date would not have changed, but you can increment the last 2 digits by one… so the second change would appear as:
2007080202
…and so on. This way, you can make up to 99 changes per day.
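The serial convention can be written as a small function. This is a sketch: `next_serial` is a hypothetical helper, and returning 0 when the day's counter is already at 99 is an arbitrary choice for illustration. The property that matters is that the new serial is always larger than the old one, because a secondary server only fetches a zone whose serial has increased.

```c
/* Returns the next zone serial under the YYYYMMDDNN convention described
 * above, given the serial currently in the zone file and today's date.
 * Returns 0 (an illustrative "no more changes today" marker) if today's
 * counter is already at 99. */
unsigned long next_serial(unsigned long current, int year, int month, int day)
{
    /* e.g. 2007, 8, 2 -> 2007080200 */
    unsigned long today = (unsigned long)year * 1000000UL
                        + (unsigned long)month * 10000UL
                        + (unsigned long)day * 100UL;

    if (current < today + 1)
        return today + 1;                 /* first change today: NN = 01 */
    if (current / 100 == today / 100 && current % 100 == 99)
        return 0;                         /* 99 changes already made today */
    return current + 1;                   /* another change: bump the serial */
}
```

For the example above, `next_serial(2007080201, 2007, 8, 2)` gives 2007080202.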
When you've completed changes to both files (good practice is to synchronize the serial numbers in both), be sure to restart the service:
juicebar:~# dhcpupdate
No changes need to be done to synchronize, although it may take some time for changes to be recognized by the other DNS server, or even other machines on the network, as DNS requests are commonly cached to speed up lookups.
In conjunction with DNS, DHCP helps to automate the network configuration and location of machines.
Currently, juicebar (10.81.1.4) is also running the DHCP server.
First off, the relevant file for DHCP is:
/etc/dhcpd.conf
Editing this file shows a bunch of information… perhaps slowly read over it to try and make any sense of it all.
The big focus will be the actual machine entries. Scroll/page down until you find the Dell cluster entries; node132's is presented here as an example:
host node132.dslab.lan {
    hardware ethernet 00:08:74:D1:64:BD;
    fixed-address node132.dslab.lan;
}
Note that there is no literal IP address here; dhcpd resolves the fixed-address hostname through DNS. However, do note the presence of the MAC address: this is how each machine receives the same IP every time.
So, to make additional entries, copy an existing entry and fill it out appropriately: change the host and fixed-address, and put in the correct MAC address of the machine. Because fixed-address is looked up in DNS, the new name also needs to be in the DNS zone files.
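Each new entry is just the node132 entry with three values changed, so it can be produced mechanically. The sketch below assumes the same layout as that entry; `dhcp_host_entry` and `valid_mac` are hypothetical helpers. The MAC check (six colon-separated hex pairs) simply catches typos before they reach dhcpd, which would otherwise fail to start.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if 'mac' is six colon-separated pairs of hex digits.
 * Stops at the first mismatch, so it never reads past a short string's end. */
int valid_mac(const char *mac)
{
    for (int i = 0; i < 17; i++) {
        if (i % 3 == 2) {
            if (mac[i] != ':')
                return 0;
        } else if (!isxdigit((unsigned char)mac[i])) {
            return 0;
        }
    }
    return mac[17] == '\0';
}

/* Writes a dhcpd.conf host entry shaped like the node132 example.
 * Returns 0 on success, -1 on a malformed MAC or a too-small buffer. */
int dhcp_host_entry(const char *fqdn, const char *mac, char *out, size_t outlen)
{
    if (!valid_mac(mac))
        return -1;
    int n = snprintf(out, outlen,
                     "host %s {\n"
                     "    hardware ethernet %s;\n"
                     "    fixed-address %s;\n"
                     "}\n",
                     fqdn, mac, fqdn);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Printing the result for a new node and pasting it next to the existing entries keeps the file consistent.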
When finished, restart DHCP:
juicebar:~# dhcpupdate
If there are errors, DHCP may fail to start. You can check the system logs (/var/log/daemon.log) to see if there are any problems, and they'll usually indicate the line number of the error.
There is a VPN server for DSLAB users to connect to. What follows is information to help establish a working configuration, certificate/key, and hopefully a working connection. This information is provided from the point of view of the server; specific client deployments may vary (Linux is easy, Mac OS X is also fairly easy, Windows… good luck).
I've set up a sample config file that can be modified for use. Please substitute all occurrences of the string “USER” with the actual user name associated with the client certificate/key.
You will want to make a COPY of this file, and place it on YOUR machine that will be attempting to VPN in… your specific config file does NOT need to reside on juicebar. The only reason it is here is so you can obtain a copy to use.
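Any text editor's search-and-replace will do the substitution. For completeness, here is a C sketch of the same operation. `replace_all` is a hypothetical helper, and "jdoe" in the test below is a made-up user name.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of 'text' with every occurrence of 'from'
 * replaced by 'to'. Returns NULL if 'from' is empty or memory runs out.
 * The caller must free() the result. */
char *replace_all(const char *text, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    if (from_len == 0)
        return NULL;

    /* First pass: count non-overlapping occurrences to size the output. */
    size_t count = 0;
    for (const char *p = strstr(text, from); p != NULL; p = strstr(p + from_len, from))
        count++;

    size_t out_len = strlen(text) - count * from_len + count * to_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text, substituting each occurrence. */
    char *dst = out;
    const char *src = text;
    const char *hit;
    while ((hit = strstr(src, from)) != NULL) {
        size_t before = (size_t)(hit - src);
        memcpy(dst, src, before);
        dst += before;
        memcpy(dst, to, to_len);
        dst += to_len;
        src = hit + from_len;
    }
    strcpy(dst, src);   /* the remainder, including the terminating NUL */
    return out;
}
```

To use it, read the sample file into a string, call `replace_all(text, "USER", "jdoe")`, and write the result out as client-jdoe.conf.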
##############################################################################
#
# DSLAB OpenVPN Client Configuration File (sample)
#
# This configuration is to facilitate the joining of the DSLAB VPN.
#
# Please replace all instances of USER with the actual user name (also the
# name on the VPN certificate/key).
#
##############################################################################

##############################################################################
# VPN Server Information
##############################################################################
remote 137.238.7.4            # IP of remote OpenVPN server
port 1194                     # Port on which to connect on server
proto udp                     # Type of traffic {tcp-client|udp}

##############################################################################
# Network Interfaces
##############################################################################
dev-type tap                  # Type of interface to use {tap|tun}
dev tap0                      # Interface name (tap0 or tun0)

##############################################################################
# Credentials
##############################################################################
cd /etc/openvpn               # establish proper working directory
key dslab/client-USER.key     # Client key (private)
ca dslab/ca.crt               # CA certificate (public)
cert dslab/client-USER.crt    # Client certificate (public)
tls-cipher EDH-RSA-DES-CBC3-SHA   # set tls cipher type

##############################################################################
# Client Settings
##############################################################################
comp-lzo                      # use fast LZO compression
keepalive 10 120              # send packets to keep sessions alive
nobind                        # don't bind to local address & port
persist-key                   # don't re-read keys across restarts
persist-tun                   # on restart, don't reset tun device
pull                          # follow route suggestions of server
resolv-retry infinite         # keep retrying if server name resolution fails
route-delay 8                 # delay setting routes for 8 seconds
tls-client                    # enable TLS and assume client role

##############################################################################
# System Options
##############################################################################
chroot /etc/openvpn           # run in a chroot of VPN directory
user nobody                   # after launching, drop privs
group nobody                  # after launching, drop privs
daemon                        # detach and run in background

##############################################################################
# Verbosity/Logging Options
##############################################################################
#status log/status.log        # status log file
log-append log/backbone.log   # log file
verb 3                        # level of activity to log (0-11)
mute 20                       # log at most N consecutive messages
##############################################################################
On the client system, you'll need a place to put the keys/cert, config, and have log files stored… on Linux systems, a good place to put this would be in /etc/openvpn. A recommended directory layout would be as follows:
/etc/openvpn/
/etc/openvpn/client-USER.conf   # again, please replace 'USER' with your username
/etc/openvpn/dslab/
/etc/openvpn/dslab/ca.crt
/etc/openvpn/dslab/client-USER.key
/etc/openvpn/dslab/client-USER.crt
/etc/openvpn/log/
Create any of these directories that don't already exist.
We will need to create the necessary authenticating bits, which can be done on juicebar (as root). Follow the example below:
First up, go to the right place and get the variables loaded:
juicebar:~# cd /etc/openvpn/easy-rsa/
juicebar:/etc/openvpn/easy-rsa# . ./vars
NOTE: If you run ./clean-all, I will be doing a rm -rf on /etc/openvpn/easy-rsa/keys
juicebar:/etc/openvpn/easy-rsa#
NOTE: Make sure you do *NOT* run ./clean-all, as that will delete all the keys, and we'd have to start again from scratch.
NOTE: Really, do *NOT* run ./clean-all, it would be VERY BAD (well, we'd have to recreate certs/keys for EVERYONE, which would not be a fun time). So DON'T. Please.
Next, let's create a key. In the example below, replace USER with the desired client name, typically a username or a short identifying word; common practice is simply to use your normal username:
juicebar:/etc/openvpn/easy-rsa# ./build-key client-USER
Generating a 1024 bit RSA private key
...................................................................................................++++++
.....++++++
writing new private key to 'client-USER.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [US]:
State or Province Name (full name) [NY]:
Locality Name (eg, city) [Upstate]:
Organization Name (eg, company) [BITS]:
Organizational Unit Name (eg, section) []:DSLAB
Common Name (eg, your name or your server's hostname) [client-USER]:
Email Address [haas@corning-cc.edu]:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
Using configuration from /etc/openvpn/easy-rsa/openssl.cnf
DEBUG[load_index]: unique_subject = "yes"
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
countryName           :PRINTABLE:'US'
stateOrProvinceName   :PRINTABLE:'NY'
localityName          :PRINTABLE:'Upstate'
organizationName      :PRINTABLE:'BITS'
organizationalUnitName:PRINTABLE:'DSLAB'
commonName            :PRINTABLE:'client-USER'
emailAddress          :IA5STRING:'haas@corning-cc.edu'
Certificate is to be certified until Apr 25 02:47:12 2019 GMT (3650 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
juicebar:/etc/openvpn/easy-rsa#
Finally, tar up the necessary files:
juicebar:/etc/openvpn/easy-rsa# cd keys
juicebar:/etc/openvpn/easy-rsa/keys# tar cvf archive/client-USER.tar client-USER.crt client-USER.key ca.crt
client-USER.crt
client-USER.key
ca.crt
juicebar:/etc/openvpn/easy-rsa/keys# gzip -9 archive/client-USER.tar
juicebar:/etc/openvpn/easy-rsa/keys#
Distribute the result (archive/client-USER.tar.gz) to the appropriate client, along with a custom config file named appropriately, so it can be used and great happiness will ensue.
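Next, give the new client its own file in the client-config directory by copying the template (the file name must match the certificate's Common Name, here client-USER):
juicebar:~# cd /etc/openvpn/ccd
juicebar:/etc/openvpn/ccd# cat client-template
push "route 10.80.1.0 255.255.255.0 10.81.1.1"
push "route 10.80.2.0 255.255.255.0 10.81.1.1"
push "route 10.80.3.0 255.255.255.0 10.81.1.1"
juicebar:/etc/openvpn/ccd# cp client-template client-USER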
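OpenVPN probably doesn't need to be restarted, since it reads a client's file in the ccd directory when that client connects. If you send it a SIGHUP, it will fail to restart because it has already dropped root privileges, so you would have to start it up again by hand.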
NOTE: As I just said, don't try to restart the OpenVPN daemon. It is set.
It is often helpful, while diagnosing a new client connection, to check the logs. Checking them on the server can be done as follows (as root):
juicebar:~# cd /etc/openvpn/log
juicebar:/etc/openvpn/log#
There are 2 files normally in this directory:
backbone.log
dslab.log
The file “backbone.log” is for the DSLAB-LAIR VPN connection. Any new VPN connections will NOT be using this, so looking there will not help you.
You will want to tail the “dslab.log” file, which can be done as follows:
juicebar:/etc/openvpn/log# tail -f dslab.log
Your terminal will then show the end of the client connection log. Leave this running and attempt a client connection; you should see messages appear, which MIGHT prove of some value if the connection doesn't work as expected.
There are also log files created on the client's computer (under the “log/” subdirectory– which would be in the same place as the client config file, and next to the “dslab/” directory you'll have to make), and you can “tail -f” that/those files as well to ascertain that side of the connection.
The CoRAID SR420 is our new file storage array. Utilizing the ATA-over-Ethernet (AoE) protocol, it allows for shared storage amongst several machines.
Helping to provide that shared storage is the current use of OCFS2 (the Oracle Cluster FileSystem, version 2), which is a cluster filesystem capable of having multiple entities (peers) read/write to the same volume.
While it still needs more thorough testing, here is the low-down on its current configuration.
make 0 raid5 0.0 0.1 0.2 0.3
Creates logical blade 0 incorporating drives 0.0, 0.1, 0.2, 0.3
Don't forget to put it online:
online 0
Once it is online, you can list it.
Debian provides a package for AoE. Installing it, along with having a compiled kernel module in place, is all that is needed.
machine:~# apt-get install aoetools
Debian provides a package for OCFS2 as well:
machine:~# apt-get install ocfs2-tools
Be sure to have the kernel module compiled.
To check to see if there are any AoE devices on the network, perform the following:
machine:~# aoe-discover
This broadcasts a request asking any AoE-compliant devices to identify themselves. Then run:
machine:~# aoe-stat
e0.0 2250.468GB eth0 up
Which gives us the status of any detected AoE devices. In this case, we see (as e0.0) the 2.1TB volume on the CoRAID that was detected over the current machine's eth0 network interface.
The directory:
/dev/etherd/
Contains various AoE-related devices, and amongst them should be an “e0.0” device, which is the block device for the 2.1TB volume.
To mount the volume, simply do the following:
machine:~# mount /dev/etherd/e0.0 /mnt
Or wherever you wish to mount it. It may take a few seconds (this seems to be normal), but then it will mount and can be accessed. You can also mount it concurrently on other machines and go about as normal.
This does assume, of course, that the volume is formatted with a cluster filesystem (i.e., OCFS2). An ordinary filesystem such as ext3 mounted read/write on several machines at once will be corrupted.
For starters, there are really only 2 files that need to be created/configured.
/etc/ocfs2/cluster.conf
/etc/default/o2cb
We'll start with o2cb first:
node134:~# cat /etc/default/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running 'dpkg-reconfigure ocfs2-tools'.
# Please use that method to modify this file.
#

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=data

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=7
Of importance: setting O2CB_ENABLED to true (it is false by default) will cause the appropriate services to start up on boot.
O2CB_BOOTCLUSTER should also be set to the name of the OCFS2 “cluster”, which for now has been named “data”.
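O2CB_HEARTBEAT_THRESHOLD is a count of heartbeat iterations, not a time. The OCFS2 documentation gives a rule for converting it: the disk heartbeat is written every 2 seconds, and the timeout is (threshold - 1) × 2 seconds. Assuming that rule, and writing T for O2CB_HEARTBEAT_THRESHOLD, the value of 7 above works out as:

```latex
t_{\text{dead}} = (T - 1) \times 2\,\text{s}, \qquad T = 7 \;\Rightarrow\; t_{\text{dead}} = (7 - 1) \times 2\,\text{s} = 12\,\text{s}
```

By the same rule, a threshold of 31 gives 60 seconds. A larger value means a node that stops heartbeating takes longer to be declared dead.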
Next, we have cluster.conf:
node:
	ip_port = 7777
	ip_address = 192.168.6.132
	number = 0
	name = node132
	cluster = data

node:
	ip_port = 7777
	ip_address = 192.168.6.133
	number = 1
	name = node133
	cluster = data

...

node:
	ip_port = 7777
	ip_address = 192.168.6.147
	number = 15
	name = node147
	cluster = data

cluster:
	node_count = 16
	name = data
Here, we set up all the machines that will be a part of accessing the shared volume. Each machine is given an entry and assigned a number.
Also note the cluster: directive at the very end indicating total node count as well as the name of the OCFS2 “cluster”.
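The file has 16 nearly identical stanzas, so it is easier to generate than to type. This sketch assumes the entries elided above follow the same pattern as the ones shown: nodeN at 192.168.6.N, port 7777, and numbers counting up from 0. `write_cluster_conf` is a hypothetical helper. Stanza headers start in the first column, and parameters are indented with a tab, which is the layout the OCFS2 tools expect.

```c
#include <stdio.h>

/* Writes an OCFS2 cluster.conf covering hosts node<first>..node<last>.
 * ASSUMES the pattern shown above: nodeN lives at 192.168.6.N, listens on
 * port 7777, and node numbers count up from 0. Stanza headers start in
 * column one; parameters are tab-indented, with a blank line between
 * stanzas. */
void write_cluster_conf(FILE *out, const char *cluster, int first, int last)
{
    for (int host = first; host <= last; host++) {
        fprintf(out, "node:\n");
        fprintf(out, "\tip_port = 7777\n");
        fprintf(out, "\tip_address = 192.168.6.%d\n", host);
        fprintf(out, "\tnumber = %d\n", host - first);
        fprintf(out, "\tname = node%d\n", host);
        fprintf(out, "\tcluster = %s\n\n", cluster);
    }
    /* The closing stanza names the cluster and gives the node total. */
    fprintf(out, "cluster:\n");
    fprintf(out, "\tnode_count = %d\n", last - first + 1);
    fprintf(out, "\tname = %s\n", cluster);
}
```

`write_cluster_conf(stdout, "data", 132, 147)` prints the whole file. Redirect it to /etc/ocfs2/cluster.conf and copy that same file to every participating node.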
IMPORTANT: Changes to cluster.conf MUST be propagated to ALL PARTICIPATING MEMBERS. I would imagine unmounting/shutting down OCFS2 services on each machine would be a very prudent thing to do. This is the complexity of the cluster filesystem– it MUST be aware of anyone who can touch it, lest weird things happen.
The script:
/etc/init.d/o2cb
Can be used to start, stop, check status of the OCFS2 cluster.
So, to bring it online on a particular machine, do the following:
/etc/init.d/o2cb start
There is a status argument that can be passed as well:
machine:~# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster data: Online
Checking heartbeat: Not active
This appears to be normal, operational output. The heartbeat shows as not active because OCFS2 only starts its disk heartbeat once a volume is mounted.
And, of course, one can stop O2CB services:
machine:~# /etc/init.d/o2cb stop
OCFS2 has some utilities that can be used to administer certain functions, such as formatting, filesystem checking, and filesystem tuning.
Here is what I did to format the 2.1TB volume on the CoRAID with OCFS2:
machine:~# mkfs.ocfs2 -v -N 24 /dev/etherd/e0.0
The -N option sets the number of node slots, i.e. the maximum number of machines that can have the volume mounted at the same time. If you leave off “-N 24”, the filesystem will default to only 4 slots; once 4 machines have it mounted, further mount attempts are refused.
We will have to test and see what sort of performance impact more mounted OCFS2 nodes have on the whole situation.
The tunefs.ocfs2 tool can be used to change settings on an existing OCFS2 volume.
Here is what I did to extend the number of node slots from 4 to 24. I didn't set -N when I first formatted the volume; formatting with the mkfs.ocfs2 command above makes this step unnecessary:
machine:~# tunefs.ocfs2 -N 24 -v /dev/etherd/e0.0
Why 24? Well, we've got the 16 video wall machines, and I figured I'd add in some breathing room in case I wanted to test things. Overall, though, I would imagine that the fewer machines mounting the volume, the better. Only further testing will yield what the optimal resulting configuration will end up being.
If you encounter kernel log messages of the following:
peth0: received packet with own address as source address
It means the same bridge MAC address is in use on more than one machine. The cause is unclear, but the fix is to change the MAC address of the bridge when it comes up.
There is a modified /etc/xen/scripts/network-bridge script in place on the Dell cluster which has a user-specified value to insert into the MAC, thereby making it different.
Setting this value and rebooting the machine tends to clear up the issue.
* D0 node149
* D1 node152
* D2 node148
* E0 node132
* E1 node133
* E2 node134
* E3 node135
* E4 node136
* E5 node137
* E6 node138
* E7 node139
* E8 node140
* E9 node141
* EA node142
* EB node143
* EC node144
* ED node145
* EE node146
* EF node147
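This sketch assumes that the two-digit hex codes listed above are the per-machine values fed to the modified network-bridge script; the listing itself does not say so. Under that assumption, the idea is a table lookup that places each machine's code in the last octet of the bridge MAC. The fe:ff:ff:ff:ff prefix is also an assumption for illustration, and `bridge_mac_for` is a hypothetical helper. Whatever prefix is used, its first octet should have the locally-administered bit set and the multicast bit clear, as 0xfe does. That keeps the address out of the vendor-assigned range.

```c
#include <stdio.h>
#include <string.h>

/* Per-machine codes, copied from the listing above (ASSUMED to be the
 * values the modified network-bridge script inserts into the MAC). */
struct mac_code {
    const char *host;
    unsigned char code;
};

static const struct mac_code codes[] = {
    {"node149", 0xD0}, {"node152", 0xD1}, {"node148", 0xD2},
    {"node132", 0xE0}, {"node133", 0xE1}, {"node134", 0xE2},
    {"node135", 0xE3}, {"node136", 0xE4}, {"node137", 0xE5},
    {"node138", 0xE6}, {"node139", 0xE7}, {"node140", 0xE8},
    {"node141", 0xE9}, {"node142", 0xEA}, {"node143", 0xEB},
    {"node144", 0xEC}, {"node145", 0xED}, {"node146", 0xEE},
    {"node147", 0xEF},
};

/* Writes "fe:ff:ff:ff:ff:XX" for 'host' into 'out', where XX is the host's
 * code. The fe:ff:ff:ff:ff prefix is an illustrative ASSUMPTION.
 * Returns 0 on success, -1 if the host is unknown or 'out' is too small. */
int bridge_mac_for(const char *host, char *out, size_t outlen)
{
    for (size_t i = 0; i < sizeof codes / sizeof codes[0]; i++) {
        if (strcmp(codes[i].host, host) == 0) {
            int n = snprintf(out, outlen, "fe:ff:ff:ff:ff:%02x",
                             (unsigned)codes[i].code);
            return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
        }
    }
    return -1;  /* host not in the table */
}
```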