=MPI Jobs=


==Install MPI and Lamboot (and gcc)==

Run the following command:
  apt-get install lam4-dev lam-runtime lam-mpidoc gcc

==Create Hosts File==

Create a file anywhere with any name. Just remember where and what it is. This will be a list of the computers to be lambooted. For this guide, the file will be called 'hosts' and be stored in the current user's home directory.


Add every computer to be lambooted to the file. Here's an example:
  system0
  system1
  system3
  system4
You can also add them via their IP addresses:
  192.168.1.100
  192.168.1.101
  192.168.1.102
  192.168.1.103
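
For a larger cluster with consecutive addresses, the list can be generated instead of typed by hand. This is only a minimal sketch and not part of the original setup: the function names host_lines and write_hosts_file are made up for illustration, and it assumes the machines use consecutive IPv4 addresses like the example above.

```python
import ipaddress


def host_lines(first_ip, count):
    """Return `count` consecutive IPv4 addresses starting at `first_ip`."""
    start = ipaddress.IPv4Address(first_ip)
    return [str(start + offset) for offset in range(count)]


def write_hosts_file(path, addresses):
    """Write one address per line, matching the hosts file layout above."""
    with open(path, "w") as handle:
        for address in addresses:
            handle.write(address + "\n")
```

Calling write_hosts_file("hosts", host_lines("192.168.1.100", 4)) should reproduce the IP address example above in a file called 'hosts' in the current directory.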


==Lamboot Cluster==

Run the following command:
  lamboot -v ~/hosts
This is an example line, using the example file we set up. Make sure to point it to where you want it. The -v is for verbose mode, meaning it will print out full details of what it's doing.


==Compile and Run an MPI Program==

I'm going to walk you through making, compiling, and running a simple MPI program that will have each system print out a name and number.


===Write Program===

Make a new file. It will be ~/hello.c for this example. Copy the contents below into the file:
	
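  /* The Parallel Hello World Program */
  #include <stdio.h>
  #include <mpi.h>
  
  int main(int argc, char **argv)
  {
  	int node;	/* rank of this process within MPI_COMM_WORLD */
  	
  	MPI_Init(&argc, &argv);			/* start up MPI */
  	MPI_Comm_rank(MPI_COMM_WORLD, &node);	/* ask for this process's rank */
  	
  	printf("Hello World from Node %d\n", node);
  	
  	MPI_Finalize();				/* shut down MPI cleanly */
  	return 0;
  }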


===Compile Program===

While in the directory of the file run the following command, substituting in the names you're using:
  mpicc -o hello hello.c


===Run Program===

Run the following command, again making any proper substitutions:
  mpirun -np 4 ./hello
In this example, we've told mpirun to start four processes (-np 4) across the currently lambooted systems. This number can be changed. For example, you can run this with -np 2 even if more than two systems are lambooted.


==Lamhalt Cluster==

When you're all done, it's standard procedure to stop the lamboot. To do this, run the following command:
  lamhalt -v
You don't need the -v, but it can be useful.


=Xen=

This guide was written after loosely following this other guide. [http://www.howtoforge.com/debian_etch_xen_3.1]
----


==Install Xen==

For this guide, there are two ways to install Xen. One is from source, and another is from custom archives made from the first time building from source. Both will be covered in this guide.


===Xen from Source===

Download source code from website and extract it:
  cd /usr/src
  wget http://bits.xensource.com/oss-xen/release/3.1.0/src.tgz/xen-3.1.0-src.tgz
  tar -xvzf xen-3.1.0-src.tgz


Run:
  cd xen-3.1.0-src


Edit a line in Config.mk to match below ''(or appropriate to this effect)'':
  XEN_ENABLE_PAE = n


Run:
  time make -j 4 world
  cd dist
  ./install.sh


Change Directory to kernel build folder ''(wherever that may be)'':
  make menuconfig
* make sure memory support is set to 4GB (non-PAE)
* set ID string (typically with name and date)
* make any other customizations


Run:
  time make -j 4


Copy the appropriate kernel, system map, and .config files to /boot and rename them appropriately.


Run:
  make modules_install
  mkinitramfs -o /boot/initrd.img-<kernel name> <kernel name>


===Xen from Archives===

Run the following commands as root, making the appropriate substitutions for localities:
  apt-get install iproute bridge-utils python-twisted binutils zlib1g-dev python-dev transfig \
  		bzip2 screen ssh debootstrap libcurl3-dev libncurses5-dev x-dev build-essential gettext
  cd /usr/src
  scp user@location.of:/xensrc.tar .
  tar -xvf xensrc.tar
  ./xen-3.1.0-src/dist/install.sh
  rm xensrc.tar
  cd /
  scp user@location.of:/kernel.tar .
  tar -xvf kernel.tar
  rm kernel.tar

===Both===

Edit /boot/grub/menu.lst and add the new kernel to the grub menu above the other kernels (near the bottom):
  title		Debian GNU/Linux, kernel kernelname
  root		(hd0,0)
  kernel		/boot/xen-3.1.0.gz
  module		/boot/vmlinuz-kernelname root=/dev/hda1 ro console=tty0 max_loop=255
  module		/boot/initrd.img-kernelname
  savedefault


Run the following commands as root:
  depmod kernelname
  update-rc.d xend defaults 20 21
  update-rc.d xendomains defaults 21 20
  mv /lib/tls /lib/tls.disabled
  reboot

==Make Xen Archives==

These commands will make an archive containing the Xen source:
  cd /usr/src
  tar cvf xensrc.tar xen-3.1.0-src


These commands will make an archive containing the finished kernel and its modules:
  cd /
  tar cvf kernel.tar /boot/vmlinuz-kernelname /boot/config-kernelname /boot/System.map-kernelname /boot/initrd.img-kernelname /lib/modules/kernelname
Be sure to make all appropriate substitutions for kernelname.
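
Since the same kernelname has to be substituted in five places, it is easy to miss one. The following is a small sketch (not something used in the original setup) that builds the tar command from a single kernel name. The function names are made up for illustration.

```python
import shlex


def kernel_archive_members(kernelname):
    """List the files and directories that belong in kernel.tar."""
    boot_files = ["vmlinuz", "config", "System.map", "initrd.img"]
    members = [f"/boot/{name}-{kernelname}" for name in boot_files]
    members.append(f"/lib/modules/{kernelname}")
    return members


def tar_command(kernelname, archive="kernel.tar"):
    """Build the tar command line shown above for a given kernel name."""
    parts = ["tar", "cvf", archive] + kernel_archive_members(kernelname)
    return " ".join(shlex.quote(part) for part in parts)
```

Printing tar_command with your own kernel name gives a line that can be pasted into a root shell after running cd /.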


==Create Virtual Machine==

NOTE TO THE READER: This section uses a specific setup as an example.
----

Run the following commands as root:
  apt-get install xen-tools
  mkdir /vserver


Edit the /etc/xen-tools/xen-tools.conf file. These are the lines changed for this example:
  dir=/vserver
  debootstrap=1
  size=2Gb
  dist=etch
  image=full
  gateway=137.238.7.254
  netmask=255.255.255.0
  passwd=1
  kernel=/boot/vmlinuz-kernelname
  initrd=/boot/initrd.img-kernelname
  mirror=http://137.238.7.148:9999/debian/


You can now make the disk image for a xen virtual machine using the config file you just edited as a basis.
  xen-create-image --hostname=____ --ip=____ --ide


You now have a xen virtual machine disk image, swap image, and cfg file for your virtual machine.

==Standard Xen Commands==

  xm create -c /etc/xen/____.cfg
This will create a virtual machine from the specified cfg file and connect you to the console (-c).

Detach from the console by pressing CTRL+]


  xm list
This will list all running virtual machines (and the host).


  xm console <name>
This will re-attach you to a virtual machine's console.


  xm save <name> <statefile>
This will save the current state of a virtual machine to the given state file. In this setup, saved virtual machines are stored in /var/lib/xen/save.


  xm restore <statefile>
This will restore a virtual machine from a saved state file. Keep in mind that this will not delete the saved state file.

  xm shutdown <name>
This will shut down the virtual machine properly.


  xm migrate --live <name> <newhost>
This will migrate a virtual machine from one Xen host to another. This is covered in more detail in the next section.

==Migrate Virtual Machines==

NOTE TO THE READER: This section uses a specific setup as an example.
----


In order to migrate a virtual machine, the image file for that machine has to be 'locally' accessible to both the current host and the new host. The simplest way to do this would be to have the image files all stored on an NFS server and mount an NFS share at boot time. This is not done using AutoFS, because those mounts are not made until login. It will also not be done using fstab, in order to ensure that the mount would take place long after a network connection is established. So we do the following:
* Copy all image files to a folder on the NFS server.
* Add that folder as an NFS share.
* Edit /etc/rc.local and add an appropriate mount line to the bottom.


Edit the following line in /etc/xen/xend-config.sxp (or something to that effect) from this:
  (xend-relocation-hosts-allow '^localhost$')
To something like this:
  (xend-relocation-hosts-allow '^localhost$ ^137[.]238[.]7[.][0-9][0-9][0-9]$ ^shiznit[0-9][.]ds[.]geneseo[.]edu$')
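
The value is a space-separated list of regular expressions, and a host is allowed if any one of them matches. Before restarting xend, it can help to try the patterns against a few names and addresses. The sketch below only approximates how xend applies the list (the function name is_allowed is made up, and the sample hosts in the usage note are illustrative).

```python
import re

# The example value from above: three space-separated regular expressions.
ALLOW = (
    "^localhost$ "
    "^137[.]238[.]7[.][0-9][0-9][0-9]$ "
    "^shiznit[0-9][.]ds[.]geneseo[.]edu$"
)


def is_allowed(host, allow=ALLOW):
    """Return True if `host` matches at least one pattern in the list."""
    return any(re.search(pattern, host) for pattern in allow.split())
```

Note that [0-9][0-9][0-9] requires exactly three digits, so is_allowed("137.238.7.148") is true but an address with a shorter last octet, such as 137.238.7.4, would not match by IP. Likewise, shiznit[0-9] only covers single-digit host numbers.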

=Various Tidbits=


==Setting Up X(org)==

This guide is specifically for a Dell Optiplex GX260 and its integrated (see ''crappy'') graphics.

Install Xorg (as root) with the following command:
  apt-get install xorg


Have X handle most of the configuration on its own (thus helping you out) with the following command:
  Xorg -configure


Still having problems? That's because Xorg rarely configures things properly. The following changes have to be made to the newly made /root/xorg.conf.new file. (Sorry about the poor readability)
  ...
  Section "InputDevice"
  	...
  	Option	"Device" "/dev/''psaux''"
  	...
  EndSection
  
  Section "Monitor"
  	...
  	ModelName	"Monitor Model"
  	''HorizSync 31.5-79.0''
  	''VertRefresh 56.0-76.0''
  	''Option "DPMS"''
  	''Modeline "1280x1024" 109.62 1280 1336 1472 1720 1024 1024 1026 1062''
  EndSection
  ...
  Section "Device"
  	...
  	Driver		"i810"
  	''VideoRam	8192''
  	...
  	BusID       "PCI:0:2:0"
  	''Option "UseFBDev" "true"''
  	''Option "VBERestore" "true"''
  EndSection
  
  Section "Screen"
  Add line:
  		''Modes  	"1280x1024" "1024x768" "800x600"''
  to each SubSection "Display"


Now, copy that configuration to the proper place (so that you don't have to always tell X what configuration to use) with the following command:
  cp /root/xorg.conf.new /etc/X11/xorg.conf
* It might be wise to back up the original configuration beforehand.


Now, just run:
  startx
If you see a grey screen with a black X in the middle, it works! Continue tweaking it to your heart's content. Once you get it working, try installing an X Window Manager!

==Synergy==

Synergy is a fun utility that allows a computer (the synergy server) to share its mouse and keyboard with other computers (the clients). It is cross-platform and allows you to use one keyboard and mouse across multiple machines. It's fairly simple to set up. Included here are a few links to places that describe how to do so.
* The Article that led me to looking at Synergy in the first place [http://lifehacker.com/software/top/hack-attack-control-multiple-computers-with-a-single-keyboard-and-mouse-254648.php]
* The Synergy Source Forge Project Page. Contains Source, Unix Binaries, and Windows GUI [http://sourceforge.net/projects/synergy2/]
* SynergyKM, a GUI tie-in for the Mac [http://software.landryhetu.com/synergy/]
* Nice Descriptive Synergy Guide [http://www.linux.com/articles/54628]

==Add Command Aliases to User Account==

Edit the file ~/.bashrc
* Add the following line to the bottom of the file.
  alias name='command'
* Here are a few useful alias examples.
  alias ls='ls -a --color=auto'
  alias vi='vim'
  alias ftp='ftp -v'
  alias lamboot='lamboot -v'
  alias lamhalt='lamhalt -v'
  alias grep='grep --color=auto'


For a Mac OS X installation, edit the ~/.profile file. It will probably not exist yet.

==VIM Defaults==

Edit (or create) the file ~/.vimrc
* Add whatever you want as VIM defaults to the file. An example file is as follows:
  syn on
  set tabstop=4
  set shiftwidth=4


==Update System Time==

Run the following commands:
  apt-get install ntpdate
  ntpdate ntp.geneseo.edu


==Change Login Messages==

There are two login messages, one before the login prompt, and one after. The one before is only seen on the local machine, but the one after is also seen during ssh sessions.


The message before the login prompt can be found and edited in the /etc/issue file.

The message after the login prompt can be found and edited in the /etc/motd.tail file.
* The /etc/motd file is reset at boot time, using the contents of motd.tail


==Video Wall Tidbits==

If there's ever a problem trying to work with a display, get onto that machine and run:
  xhost +
That might do the trick.


Commands to try showing off:
* xeyes
* xloadimage
* xsetroot -solid "{COLOR}"
* any non-openGL xscreensaver (run on root screen with -root)


==Useful Packages==

* vim
* screen
* ssh
* ftp
* less
* bsdgames
* ntpdate
* pciutils

==Basic Screen Commands==

Reattach to a detached screen session:
  screen -r

Detach from current screen session:
  ctrl-a d

List all windows in the current screen session:
  ctrl-a "

Create a new window in the current screen session:
  ctrl-a c

==DNS==
The new universe in the DSLAB will use the internal domain name "dslab.lan", and complements the LAIR's current domain name "offbyone.lan".

juicebar (10.81.1.1) is the current DNS server, so perform all changes there for now.

Files of interest on juicebar include:

  /var/named/

This is the location of the name server configuration files, as well as the actual DNS zone definitions.

  /var/named/master    -  Zone definitions that the current server is responsible for.
  /var/named/slave     -  Zone definitions that the current server is also hosting.
  /var/named/standard  -  Zone definitions for basic DNS services (loopback, etc.).

The only place where you'll ever have a need to go is into master. Any changes made to the slave configurations will probably be changed when the servers synchronize.

Inside /var/named/master are (at least) two files:

  10.81.1 - reverse lookup
  dslab.lan   - domain definition

dslab.lan defines all the names that are present in the dslab.lan domain, as well as what IP they should be mapped to.

10.81.1 does the opposite: it maps the last octet of each IP address back to the corresponding name (reverse lookup).

Any changes/additions/modifications must be made, as appropriate, to BOTH files. Follow the existing format.

ADDITIONALLY: Whenever you make a change, you MUST update the serial number. This is how the servers recognize changes. If you make changes but do not update the serial number, confusion may occur!

Standard format for the serial number is to use the date. For example, if the date is August 2, 2007, and this is your first change of the day, the serial number should be:

  2007080201

If you make additional changes during the same day, the date part stays the same and you increment the last 2 digits by one, so the second change would appear as:

  2007080202

...and so on. This way, you can make up to 99 changes per day.
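
The serial number rule above can be sketched in a few lines. This is only an illustration of the arithmetic (the function name next_serial is made up and nothing like it exists on juicebar): it takes the serial currently in the zone file and today's date, and returns the serial to write.

```python
from datetime import date


def next_serial(current, today):
    """Return the next zone serial in YYYYMMDDnn form."""
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base + 1:
        # First change of the day: today's date followed by 01.
        return base + 1
    if current >= base + 99:
        # Either all 99 changes are used up, or the serial is from a later date.
        raise ValueError("no serial number left for this date")
    # Another change on the same day: bump the last two digits.
    return current + 1
```

With the dates from the example, next_serial(2007080201, date(2007, 8, 2)) gives 2007080202, and a serial left over from an earlier day gives 2007080201.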

When you've completed changes to both files (good practice is to synchronize the serial numbers in both), be sure to restart the service:

  juicebar:~# dhcpupdate

No changes need to be done to synchronize, although it may take some time for changes to be recognized by the other DNS server, or even other machines on the network, as DNS requests are commonly cached to speed up lookups.

==DHCP==

In conjunction with DNS, DHCP helps to automate the network configuration and location of machines.

Currently, juicebar (10.81.1.4) is also running the DHCP server.

First off, the relevant file for DHCP is:

  /etc/dhcpd.conf

This file contains a lot of information. Read over it slowly to get a sense of how it is organized.

The main focus will be the actual machine entries. Scroll/page down until you find the Dell cluster entries. node132's is presented here as an example:

                host node132.dslab.lan {
                        hardware ethernet 00:08:74:D1:64:BD;
                        fixed-address node132.dslab.lan;
                }

Note the lack of any IP address: this is provided via DNS. However, do note the presence of the MAC address. This is how each machine will receive the same IP each time.

So, to make additional entries, make a copy of an existing entry and fill it out appropriately: change the host and fixed-address, and put in the correct MAC address of the machine.
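
If several machines need to be added at once, the entries can be generated instead of copied by hand. The following is a minimal sketch under a few assumptions: the function name dhcp_host_entry is made up, the indentation is simplified compared to the real file, and the MAC check only looks at the format (six colon-separated hex pairs).

```python
import re

# Six pairs of hex digits separated by colons, e.g. 00:08:74:D1:64:BD
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def dhcp_host_entry(fqdn, mac):
    """Return a host block in the same shape as the node132 example."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        f"host {fqdn} {{\n"
        f"        hardware ethernet {mac.upper()};\n"
        f"        fixed-address {fqdn};\n"
        f"}}\n"
    )
```

Calling dhcp_host_entry("node132.dslab.lan", "00:08:74:D1:64:BD") returns text equivalent to the entry shown above, ready to paste into /etc/dhcpd.conf.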

When finished, restart DHCP:

  juicebar:~# dhcpupdate

If there are errors, DHCP may fail to start. You can check the system logs (/var/log/daemon.log) to see if there are any problems, and they'll usually indicate the line number of the error.

==VPN==
There exists a VPN server for DSLAB users to connect to. The following information should help in setting up a working configuration, certificate/key, and hopefully a working connection. It is written from the point of view of the server; specific client deployments may vary (Linux is easy, MacOS X also somewhat, Windows... good luck).

===Configure the OpenVPN client config (/etc/openvpn/easy-rsa/keys/archive/conf/)===

I've set up a sample config file that can be modified for use. Please substitute all occurrences of the string "USER" with the actual user name associated with the client certificate/key.

You will want to make a COPY of this file, and place it on YOUR machine that will be attempting to VPN in... your specific config file does NOT need to reside on juicebar. The only reason it is here is so you can obtain a copy to use.

  ##############################################################################
  #
  #   DSLAB OpenVPN Client Configuration File (sample)
  #
  #   This configuration is to facilitate the joining of the DSLAB VPN.
  #
  #   Please replace all instances of USER with the actual user name (also the
  #   name on the VPN certificate/key).
  #
  ##############################################################################
  
  ##############################################################################
  #   VPN Server Information
  ##############################################################################
  remote          137.238.7.4             # IP of remote OpenVPN server
  port            1194                    # Port on which to connect on server
  proto           udp                     # Type of traffic {tcp-client|udp}
  
  ##############################################################################
  #   Network Interfaces
  ##############################################################################
  dev-type        tap                     # Type of interface to use {tap|tun}
  dev             tap0                    # Interface name (tap0 or tun0)
  
  ##############################################################################
  #   Credentials
  ##############################################################################
  cd              /etc/openvpn            # establish proper working directory
  key             dslab/client-USER.key   # Client key (private)
  ca              dslab/ca.crt            # CA certificate (public)
  cert            dslab/client-USER.crt   # Client certificate (public)
  tls-cipher      EDH-RSA-DES-CBC3-SHA    # set tls cipher type
  
  ##############################################################################
  #   Client Settings
  ##############################################################################
  comp-lzo                                # use fast LZO compression
  keepalive       10      120             # send packets to keep sessions alive
  nobind                                  # don't bind to local address & port
  persist-key                             # don't re-read keys across restarts
  persist-tun                             # on restart, don't reset tun device
  pull                                    # Follow route suggestions of server
  resolv-retry    infinite                # keep trying to connect if failure
  route-delay     8                       # delay setting routes for 8 seconds
  tls-client                              # enable TLS and assume client role
  
  ##############################################################################
  #   System Options
  ##############################################################################
  chroot          /etc/openvpn            # run in a chroot of VPN directory
  user            nobody                  # after launching, drop privs
  group           nobody                  # after launching, drop privs
  daemon                                  # detach and run in background
  
  ##############################################################################
  #   Verbosity/Logging Options
  ##############################################################################
  #status         log/status.log          # status log file
  log-append      log/backbone.log        # log file
  verb            3                       # level of activity to log (0-11)
  mute            20                      # log at most N consecutive messages
  
  ##############################################################################
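
Replacing USER by hand is easy to get wrong, because the file also contains the lowercase "user" option, which must stay as it is. Here is a small sketch of a case-sensitive, whole-word substitution. The function name personalize and the username in the usage note are made up for illustration.

```python
import re


def personalize(template, username):
    """Replace every whole-word, uppercase USER with the given username."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", username):
        raise ValueError(f"unexpected characters in username: {username!r}")
    # \b keeps the match to the standalone word USER; the lowercase
    # "user nobody" option is untouched because the match is case-sensitive.
    return re.sub(r"\bUSER\b", username, template)
```

For a hypothetical user jdoe, the line "key dslab/client-USER.key" becomes "key dslab/client-jdoe.key", while "user nobody" is left alone. The comment columns may shift slightly, which is harmless.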

====Client Config====
On the client system, you'll need a place to put the keys/cert and config, and to store log files. On Linux systems, a good place to put this would be in /etc/openvpn. A recommended directory layout would be as follows:

  /etc/openvpn/
  /etc/openvpn/client-USER.conf      # again, please replace 'USER' with your username
  /etc/openvpn/dslab/
  /etc/openvpn/dslab/ca.crt
  /etc/openvpn/dslab/client-USER.key
  /etc/openvpn/dslab/client-USER.crt
  /etc/openvpn/log/

Creating any of these directories that don't exist would be a very good thing for ultimate success.

===Create client certificate/key===
We will need to create the necessary authenticating bits, which can be done on juicebar (as root). Follow the example below:

First up, go to the right place and get the variables loaded:

  juicebar:~# cd /etc/openvpn/easy-rsa/
  juicebar:/etc/openvpn/easy-rsa# . ./vars
  NOTE: If you run ./clean-all, I will be doing a rm -rf on /etc/openvpn/easy-rsa/keys
  juicebar:/etc/openvpn/easy-rsa#

NOTE: Make sure you do *NOT* run ./clean-all, as that will delete all the keys, and we'd have to start again from scratch.

NOTE: Really, do *NOT* run ./clean-all, it would be VERY BAD (well, we'd have to recreate certs/keys for EVERYONE, which would not be a fun time). So DON'T. Please.

Next, let's create a key. We'll use USER as the example; replace USER in the example below with the desired client name, typically a short identifying word (common practice is just to use your normal username):

  juicebar:/etc/openvpn/easy-rsa# ./build-key client-USER
  Generating a 1024 bit RSA private key
  ...................................................................................................++++++
  .....++++++
  writing new private key to 'client-USER.key'
  -----
  You are about to be asked to enter information that will be incorporated
  into your certificate request.
  What you are about to enter is what is called a Distinguished Name or a DN.
  There are quite a few fields but you can leave some blank
  For some fields there will be a default value,
  If you enter '.', the field will be left blank.
  -----
  Country Name (2 letter code) [US]:
  State or Province Name (full name) [NY]:
  Locality Name (eg, city) [Upstate]:
  Organization Name (eg, company) [BITS]:
  Organizational Unit Name (eg, section) []:DSLAB
  Common Name (eg, your name or your server's hostname) [client-USER]:
  Email Address [haas@corning-cc.edu]:
  
  Please enter the following 'extra' attributes
  to be sent with your certificate request
  A challenge password []:
  An optional company name []:
  Using configuration from /etc/openvpn/easy-rsa/openssl.cnf
  DEBUG[load_index]: unique_subject = "yes"
  Check that the request matches the signature
  Signature ok
  The Subject's Distinguished Name is as follows
  countryName           :PRINTABLE:'US'
  stateOrProvinceName   :PRINTABLE:'NY'
  localityName          :PRINTABLE:'Upstate'
  organizationName      :PRINTABLE:'BITS'
  organizationalUnitName:PRINTABLE:'DSLAB'
  commonName            :PRINTABLE:'client-USER'
  emailAddress          :IA5STRING:'haas@corning-cc.edu'
  Certificate is to be certified until Apr 25 02:47:12 2019 GMT (3650 days)
  Sign the certificate? [y/n]:y
  
  
  1 out of 1 certificate requests certified, commit? [y/n]y
  Write out database with 1 new entries
  Data Base Updated
  juicebar:/etc/openvpn/easy-rsa#

Finally, tar up the necessary files:

  juicebar:/etc/openvpn/easy-rsa# cd keys
  juicebar:/etc/openvpn/easy-rsa/keys# tar cvf archive/client-USER.tar client-USER.crt client-USER.key ca.crt
  client-USER.crt
  client-USER.key
  ca.crt
  juicebar:/etc/openvpn/easy-rsa/keys# gzip -9 archive/client-USER.tar
  juicebar:/etc/openvpn/easy-rsa/keys#

Distribute it (archive/client-USER.tar.gz) to the appropriate client (along with a custom config file, named appropriately) so it can be used and great happiness will ensue.

===Configure client-specific OpenVPN settings===

  juicebar:~# cd /etc/openvpn/ccd
  juicebar:/etc/openvpn/ccd# cat client-template
  push "route 10.80.1.0 255.255.255.0 10.81.1.1"
  push "route 10.80.2.0 255.255.255.0 10.81.1.1"
  push "route 10.80.3.0 255.255.255.0 10.81.1.1"
  juicebar:/etc/openvpn/ccd# cp client-template client-USER

The OpenVPN daemon probably doesn't need any restarting. If you SIGHUP it, it will fail to restart due to having dropped root privs, so you'd need to start it up again.

NOTE: As I just said, don't try to restart the OpenVPN daemon. It is set.

===Troubleshooting Client Connection===
It is often helpful, while diagnosing a new client connection, to check the logs. Checking them on the server can be done as follows (as root):

  juicebar:~# cd /etc/openvpn/log
  juicebar:/etc/openvpn/log#

There are 2 files normally in this directory:

  backbone.log
  dslab.log

The file "backbone.log" is for the DSLAB-LAIR VPN connection. Any new VPN connections will NOT be using this, so looking there will not help you.

You will want to tail the "dslab.log" file, which can be done as follows:

  juicebar:/etc/openvpn/log# tail -f dslab.log

Your terminal will then contain the end of the client connecting log file... have this running, and then attempt a client connection- you should see messages appear, which MIGHT prove of some value if the connection doesn't work as expected.

There are also log files created on the client's computer (under the "log/" subdirectory-- which would be in the same place as the client config file, and next to the "dslab/" directory you'll have to make), and you can "tail -f" that/those files as well to ascertain that side of the connection.

==CoRAID/OCFS2==
The CoRAID SR420 is our new file storage array. Utilizing the ATA-over-Ethernet (AoE) protocol, it allows for shared storage amongst several machines.

Helping to provide that shared storage is the current use of OCFS2 (the Oracle Cluster FileSystem, version 2), which is a cluster filesystem capable of having multiple entities (peers) read/write to the same volume.

While it still needs more thorough testing, here is the low-down on its current configuration.

===Create new logical storage blade===
  make 0 raid5 0.0 0.1 0.2 0.3

Creates logical blade 0 incorporating drives 0.0, 0.1, 0.2, 0.3

Don't forget to put it online:

  online 0

That way, you can now list it.

===Install AoE tools===
Debian provides a package for AoE. Installing it, along with having a compiled kernel module in place, is all that is needed.

  machine:~# apt-get install aoetools

===Install OCFS2 support===
Debian provides a package for OCFS2 as well:

  machine:~# apt-get install ocfs2-tools

Be sure to have the kernel module compiled.

===Check for AoE devices===
To check to see if there are any AoE devices on the network, perform the following:

  machine:~# aoe-discover

This sends out a broadcast packet asking any AoE-compliant devices to identify themselves. Then:

  machine:~# aoe-stat
             e0.0      2250.468GB   eth0 up

Which gives us the status of any detected AoE devices. In this case, we see (as e0.0) the 2.1TB volume on the CoRAID that was detected over the current machine's eth0 network interface.
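
When checking many machines, the aoe-stat output can be picked apart by a script. The sketch below is an assumption-laden illustration: the function name parse_aoe_stat is made up, and it only understands lines shaped like the sample above (device, size in GB, interface, state).

```python
def parse_aoe_stat(output):
    """Turn aoe-stat output lines into a list of dictionaries."""
    devices = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        name, size, interface, state = fields[:4]
        if not size.endswith("GB"):
            continue  # this sketch only handles sizes reported in GB
        devices.append({
            "device": "/dev/etherd/" + name,
            "size_gb": float(size[:-2]),
            "interface": interface,
            "up": state == "up",
        })
    return devices
```

Fed the sample line above, this yields one entry for /dev/etherd/e0.0 on eth0 with its state marked as up.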

The directory:

  /dev/etherd/

Contains various AoE-related devices, and amongst them should be an "e0.0" device, which is the block device for the 2.1TB volume.

===Mount AoE volume===
To mount the volume, simply do the following:

  machine:~# mount /dev/etherd/e0.0 /mnt

Or wherever you wish to mount it. It may take a few seconds (this seems to be normal), but then it will mount. It can then be accessed. You can also concurrently mount it on other machines and go about as normal.

This does assume, of course, that the volume is formatted with a cluster filesystem (i.e. OCFS2) so that all chaos does not break loose.

===Configure OCFS2===
For starters, there are really only 2 files that need to be created/configured.

  /etc/ocfs2/cluster.conf
  /etc/default/o2cb

We'll start with o2cb first:

  node134:~# cat /etc/default/o2cb
  #
  # This is a configuration file for automatic startup of the O2CB
  # driver.  It is generated by running 'dpkg-reconfigure ocfs2-tools'.
  # Please use that method to modify this file.
  #
  
  # O2CB_ENABLED: 'true' means to load the driver on boot.
  O2CB_ENABLED=true
  
  # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
  O2CB_BOOTCLUSTER=data
  
  # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
  O2CB_HEARTBEAT_THRESHOLD=7
  
Of importance-- setting O2CB_ENABLED to true (it is false by default) will cause the appropriate services to start up on boot.

O2CB_BOOTCLUSTER should also be set to the name of the OCFS2 "cluster", which for now has been named "data".

Next, we have cluster.conf:

  node:
        ip_port = 7777
        ip_address = 192.168.6.132
        number = 0
        name =  node132
        cluster = data
  
  node:
        ip_port = 7777
        ip_address = 192.168.6.133
        number = 1
        name =  node133
        cluster = data
  
  ...
  
  node:
        ip_port = 7777
        ip_address = 192.168.6.147
        number = 15
        name =  node147
        cluster = data
  
  cluster:
        node_count = 16
        name = data
  
Here, we set up all the machines that will be a part of accessing the shared volume. Each machine is given an entry and assigned a number.

Also note the cluster: directive at the very end indicating total node count as well as the name of the OCFS2 "cluster".
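
Because the node entries follow a regular pattern (node132 through node147, numbered 0 through 15), the file can be generated rather than typed. The following is a minimal sketch, assuming node names and addresses stay consecutive as in the example. The function name cluster_conf is made up, and it indents parameters with a tab, which as far as I know is what the o2cb tools expect.

```python
def cluster_conf(first, count, cluster="data", subnet="192.168.6", port=7777):
    """Build cluster.conf text for `count` nodes starting at node<first>."""
    stanzas = []
    for number in range(count):
        host = first + number
        stanzas.append(
            "node:\n"
            f"\tip_port = {port}\n"
            f"\tip_address = {subnet}.{host}\n"
            f"\tnumber = {number}\n"
            f"\tname = node{host}\n"
            f"\tcluster = {cluster}\n"
        )
    # The closing cluster: stanza carries the total node count.
    stanzas.append(
        "cluster:\n"
        f"\tnode_count = {count}\n"
        f"\tname = {cluster}\n"
    )
    return "\n".join(stanzas)
```

Calling cluster_conf(132, 16) produces the 16 node stanzas plus the cluster stanza. The same generated text still has to be copied to every participating machine.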

IMPORTANT: Changes to cluster.conf MUST be propagated to ALL PARTICIPATING MEMBERS. I would imagine unmounting/shutting down OCFS2 services on each machine would be a very prudent thing to do. This is the complexity of the cluster filesystem-- it MUST be aware of anyone who can touch it, lest weird things happen.

===Manually dealing with O2CB===
The script:

  /etc/init.d/o2cb

Can be used to start, stop, or check the status of the OCFS2 cluster.

====Starting O2CB====
So, to bring it online on a particular machine, do the following:

  /etc/init.d/o2cb start

====Checking status of O2CB====
There is a status argument that can be passed as well:

  machine:~# /etc/init.d/o2cb status
  Module "configfs": Loaded
  Filesystem "configfs": Mounted
  Module "ocfs2_nodemanager": Loaded
  Module "ocfs2_dlm": Loaded
  Module "ocfs2_dlmfs": Loaded
  Filesystem "ocfs2_dlmfs": Mounted
  Checking cluster data: Online
  Checking heartbeat: Not active

The output given is what appears to be an operational output (not sure about the heartbeat setting--- something to look into).

====Stopping O2CB====
And, of course, one can stop O2CB services:

  machine:~# /etc/init.d/o2cb stop

===OCFS2 operations===
OCFS2 has some utilities that can be used to administer certain functions, such as formatting, filesystem checking, and filesystem tuning.

====Format an OCFS2 volume====
Here is what I did to format the 2.1TB volume on the CoRAID with OCFS2:

  machine:~# mkfs.ocfs2 -v -N 24 /dev/etherd/e0.0

Note that if you leave off the "-N 24", the filesystem will default to only 4 nodes. Once that limit is reached, further mount attempts from additional machines are refused.

We will have to test and see what sort of performance impact more mounted OCFS2 nodes have on the whole situation.

====Tune OCFS2 settings====
The tunefs.ocfs2 tool can be used to change settings on an existing OCFS2 volume.

Here is what I did to extend the number of nodes from 4 to 24 (because I didn't originally do it, but my example above does, which would obsolete this step):

  machine:~# tunefs.ocfs2 -N 24 -v /dev/etherd/e0.0

Why 24? Well, we've got the 16 video wall machines, and I figured I'd add in some breathing room in case I wanted to test things. Overall, though, I would imagine that the fewer machines mounting the volume, the better. Only further testing will yield what the optimal resulting configuration will end up being.

==Xen error==
If you encounter kernel log messages like the following:

  peth0: received packet with own address as source address

It means that more than one machine is using the same bridge MAC address. No idea why this springs up, but the way to fix it is to change the MAC address of the bridge when it comes up.

There is a modified /etc/xen/scripts/network-bridge script in place on the Dell cluster which has a user-specified value to insert into the MAC, thereby making it different.

Setting this value and rebooting the machine tends to clear up the issue.

===peth0 MAC address table===

  D0    node149
  D1    node152
  D2    node148
  E0    node132
  E1    node133
  E2    node134
  E3    node135
  E4    node136
  E5    node137
  E6    node138
  E7    node139
  E8    node140
  E9    node141
  EA    node142
  EB    node143
  EC    node144
  ED    node145
  EE    node146
  EF    node147
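
Since the whole point of the table is that no two machines share a value, it is worth checking for duplicates whenever a node is added. This is just a sketch: the function names are made up, and the table text is copied from above.

```python
from collections import defaultdict

TABLE = """\
D0 node149
D1 node152
D2 node148
E0 node132
E1 node133
E2 node134
E3 node135
E4 node136
E5 node137
E6 node138
E7 node139
E8 node140
E9 node141
EA node142
EB node143
EC node144
ED node145
EE node146
EF node147
"""


def parse_table(text):
    """Map each node name to its value, read as a hexadecimal number."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            value, node = fields
            mapping[node] = int(value, 16)
    return mapping


def duplicate_values(mapping):
    """Return the values that are assigned to more than one node."""
    by_value = defaultdict(list)
    for node, value in mapping.items():
        by_value[value].append(node)
    return {value: nodes for value, nodes in by_value.items() if len(nodes) > 1}
```

For the table as it stands, duplicate_values(parse_table(TABLE)) comes back empty, which is what you want to see.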