STATUS updates
=====TODO=====
* How to handle UNIX journal keywords?
* Need to finish writing up HPC0 projects
* the formular plugin is giving me errors, need to figure this out (email assignment form)
* use include plugin to include a page containing various prior month status pages
* can I install writer2latex on wildebeest herd without needing gcj??
* update lair-nfs for new idmap Domain of "lair"
* put UNIX course listing examples in public directory
=====URLs=====
Some links of interest:
* http://www.freelists.org/post/dokuwiki/invoke-mediamanager-in-a-plugin,2
* unrelated: http://infoworld.com/d/adventures-in-it/run-it-business-why-thats-train-wreck-waiting-happen-477
* [[http://www.youtube.com/watch?v=ggB33d0BLcY&feature=player_embedded#|laddergoat]]
* [[http://www.llvm.org/|LLVM]]
* [[http://fluxbox-wiki.org/index.php?title=Howto_set_the_background|Fluxbox config]]
* [[http://www.reocities.com/harpin_floh/glglobe_page.html|GLglobe]]
* [[http://www.heavens-above.com/|Heavens Above]]
=====Other Days=====
=====April 25th, 2010=====
====irssi window moving====
I ended up with a non-optimal situation where some of my common irssi windows were on different window numbers than I wanted.
I googled it and arrived at a fix... let's say I wanted #unix on window id 4, but it was on 12.
First, I'd switch to #unix in id 12, then do the following:
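/window move 4
That moves the current window (#unix in this case) to number 4.
====g7 backup sshd config====
I also put some extra secure sshd_config settings in place for the g7 backups coming through the RR connection: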
##
## Extra Secure Settings for g7 backups through RR connection
##
PasswordAuthentication no
AllowUsers user1 user2 user3 user4
LoginGraceTime 4
X11Forwarding no
PermitRootLogin no
Basically, lock it down and lock it down tight: allow only the most authorized of users in, and with an exceptionally small login window.
Obviously there are some problems with this (g7 isn't fast enough to make the login window).
So what I'm going to end up doing instead is rig up a temporal opening (ie fire up the server with permissive settings only during the times of g7 backups, then immediately close it off afterward). This is a more ideal approach anyway, because nobody has any business getting to the machine externally (VPN access, sure).
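A rough sketch of how that temporal opening might work (times and file names are hypothetical), driven by cron swapping between a permissive and a locked-down sshd_config:
# /etc/cron.d/g7-backup-window -- hypothetical schedule and file names
# open sshd up just before the nightly g7 backup run...
55 2 * * * root cp /etc/ssh/sshd_config.open /etc/ssh/sshd_config && /etc/init.d/ssh restart
# ...and slam it shut again once the backup window has passed
30 3 * * * root cp /etc/ssh/sshd_config.closed /etc/ssh/sshd_config && /etc/init.d/ssh restart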
====lair-nfs====
I finally got around to resolving the idmap.conf settings in lair-nfs (cleaning up the idmap issues left over from the power issues, I finally set the domain to "lair" instead of "localdomain")... this of course means that EVERY machine that utilizes nfs must be updated to use that domain.
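The actual change on each machine lands in /etc/idmapd.conf, something like:
[General]
Verbosity = 0
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = lair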
At the time I did it manually.
So I finally put the changes in, and also figured I'd resolve the other longstanding issue with lair-nfs: installation on etch systems. The issue is that lair-nfs modprobes the 'nfs4' module... it turns out that etch does not have a module called 'nfs4', just 'nfs'.
To fix, I added the following logic (and hence 1.2.0-6 was created and added to the repository):
# Ensure NFS support is available, otherwise package install will fail.
echo -n "Loading NFS4 kernel module ... "
# a count of 1 means /etc/debian_version contains "4.0" (ie etch)
is_etch="`cat /etc/debian_version | grep '4.0' | wc -l`"
if [ "$is_etch" -eq 0 ]; then
    modprobe nfs4 && echo "done." || echo "failed!"
else # etch doesn't have an 'nfs4' module, just 'nfs'
    modprobe nfs && echo "done." || echo "failed!"
fi
Just a simple version check leading to the appropriate module insertion. Bam! Now a seamless install whether on etch or lenny.
====LAIR packages====
Just a quick review of the LAIR package structure.
LAIR packages are no longer hosted on web! Do not store them there.
Instead, they are on nfs! Under the /export/packages directory. All necessary scripts have been moved there.
In fact, I just removed the old directories on web (/var/www/pages/packages/) to avoid any future confusion... because it got me good, even though I'm the one who made the change.
====VMs on NFS serving Sokraits====
In an effort to reintroduce some of the cool Xen functionality, I decided to start moving VMs onto NFS under /export/xen (basically take the entire /xen directory tree that exists on sokraits and halfadder, and put it on nfs), then have them NFS mount it... this will afford us the flexibility of live migration of VMs, and gives us a little bit of data insurance, as we're no longer storing the only live copy of a VM on one hard disk (nfs sits on DRBD... so if we lose "a" disk, it isn't so bad).
===nfs config===
Configuration on nfs was in /etc/exports:
/export/xen 10.80.2.42/32(rw,sync,no_root_squash,no_subtree_check,fsid=2) 10.80.2.46/32(rw,sync,no_root_squash,no_subtree_check,fsid=2) 10.80.2.47/32(rw,sync,no_root_squash,no_subtree_check,fsid=2)
And then exporting the new share:
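exportfs -ra
===xend config===
To actually get live migration going, the relocation server needs enabling in /etc/xen/xend-config.sxp on the VM servers: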
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$ ^10.80.2.42$ ^10.80.2.46$ ^10.80.2.47$ ^halfadder$ ^halfadder.offbyone.lan$ ^sokraits$ ^sokraits.offbyone.lan$ ^yourmambas$ ^yourmambas.offbyone.lan$')
Again, I restricted access to JUST the VM servers.
In sokraits' /etc/rc.local, the following:
# Try to fix NFS callback stupidity
modprobe nfs4
sysctl fs.nfs.nfs_callback_tcpport=2049
/etc/init.d/nfs-common restart
# Mount Xen NFS share
mount -t nfs4 -o proto=tcp,intr nfs:/xen /xen
Rebooted it, and bam! We're up and running.
I shut down antelope, gnu, repos, and www, and copied their images over to nfs.
Then, I xm created them on sokraits. All four of those VMs are now up and running on sokraits (relieving halfadder of the heavier VM load it has shouldered for a few weeks now), but all VM data is being retrieved from NFS.
It does drive load up a little, especially spiking during a VM boot or significant VM disk activity (an aptitude upgrade, for instance). We'll have to see if it is worth it to put ALL our production VMs on there (I figured www would be a good semi-test, as it likely sees usage as significant as lab46's).
But give it a few minutes to settle down, and load seems fine. Again, this is only with 4 VMs (out of 14 total) running under this new setup. I'll slowly push some more over and see how it handles the load.
This is one of those areas where Ceph would likely shine for us.
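With /xen now on shared storage, live migration between the two VM servers should just be a matter of something like (using www as an example):
# push the www VM from sokraits over to halfadder, live
xm migrate --live www halfadder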
====VMs updated====
In addition to being moved from halfadder onto nfs (and launched via NFS on sokraits), antelope, gnu, repos, and www each had an "aptitude update && aptitude upgrade && aptitude clean" performed on them today.
I also moved over web, db, and lab46db, bringing the total number of VMs running on sokraits off nfs to 7 (equally balancing VMs between sokraits and halfadder once again), updating them the same way as above.
Load on nfs still seems okay... I'll be looking at the lrrd load reports tomorrow to see how it made out... I have a feeling that, aside from the higher spikes due to startup and disk-heavy maintenance tasks, things will hopefully not be overburdened... we really won't know until everyone is sitting in the LAIR logging into the pod machines, with people running stuff on Lab46 and multiple logins taking place.
=====April 16th, 2010=====
====mambas/sokraits====
Finding the urge to resume some ongoing projects, I wandered into the LAIR and fired up mambas and sokraits. Performed updates on both.
I had left sokraits powered off since our electric circuit adventures... and seeing as those are probably over for a while, I figured I'd start migrating VMs back to LAIReast.
For mambas, I'd like to use it as a potential upgrade environment for VMs (ie the Lab46 rebuild).
For now, mambas is running Ubuntu 9.10, but I may upgrade it to 10.04 once it is released, as it is going to be an LTS (Long Term Support) release-- figured if I make the Ubuntu switch it is likely preferable to use an LTS release, so if things remain running for long periods of time, we'll at least still get some level of updates being released.
In its current form, Ubuntu actually lacks Xen dom0 kernel support (Ubuntu machines can be domUs out of the box, but Ubuntu hasn't been packaging Xen-dom0-bootable kernels). According to one of the Ubuntu maintainers, it is really more a matter of waiting for the pv_ops dom0 support to be more formalized in the mainline kernel. All the userspace tools and daemons are in the package repository.
And amusingly, one of their preferred methods for running Xen on Ubuntu is to launch it with a Debian Xen kernel.
Some links:
* https://help.ubuntu.com/community/Xen
* http://www.chrisk.de/blog/2008/12/how-to-run-xen-in-ubuntu-intrepid-without-compiling-a-kernel-by-yourself/
* http://mediakey.dk/~cc/ubuntu-howto-install-xen/
* http://mediakey.dk/~cc/xen-howto-install-windows/
* http://mediakey.dk/~cc/howto-install-windows-xp-vista-on-xen/
* http://ubuntuforums.org/showthread.php?t=1320412
I also want to explore KVM... they seem to be doing some interesting things. There's a particularly intriguing project called Sheepdog:
* http://www.osrg.net/sheepdog/
That looks like it would, in some ways, nicely add a level of redundancy to VM storage... of course, such things would also be solved once Ceph's DFS is production worthy.
After some playing.. Ubuntu is looking less and less usable for my needs. Originally I had wanted to use it for the following reasons:
* more up-to-date software
* more up-to-date kernel and Xen
* more software (less strict licensing than Debian)
* includes 64-bit userspace
* give me an opportunity to get used to more of the ubuntuisms
But, Ubuntu-managed Xen is currently a no-go, and I just realized that Lenny includes 64-bit userspace. Debian's 6.0 "Squeeze" release is just around the corner, we're used to Debian, it includes Xen and all its fixings... and aside from it getting a little long in the tooth, it works. So I will likely be reinstalling mambas with Debian to proceed with my Evil Plans(tm).
And now, reading up on the progress of Debian squeeze... it is (unsurprisingly) behind schedule... they were aiming for a Spring 2010 release, which was pushed to Summer 2010, and now it's looking even farther out. I really didn't want to go with Lenny, simply because I know Lab46 will be deployed for a couple of years before a rebuild (even if I have better intentions).
====exploring KVM on mambas====
So, with nothing set in stone, I decided to play with KVM a bit.
Some links:
* http://www.wepoca.net/doc/kvm-install-ubuntu-910
* http://www.howtoforge.com/virtualization-with-kvm-on-ubuntu-9.10
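Following along with those, the bridge for KVM guests gets set up in /etc/network/interfaces: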
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet manual

# The bridge interface (for KVM)
auto br0
iface br0 inet static
    address 10.80.2.42
    network 10.80.2.0
    netmask 255.255.255.0
    broadcast 10.80.2.255
    gateway 10.80.2.1
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
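For bridge_ports to actually work, bridge-utils needs to be installed; a quick sketch of pulling in the remaining pieces (package names as found in karmic):
aptitude install bridge-utils qemu-kvm    # brctl plus the KVM userspace
/etc/init.d/networking restart            # bring up the new br0 bridge
egrep -c '(vmx|svm)' /proc/cpuinfo        # non-zero means the CPU can do hardware virt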
=====April 10th, 2010=====
Sometime around 3:10AM this morning, Lab46 locked up again.
Did the usual. It is back up.
=====April 3rd, 2010=====
Around 2:26PM, Lab46 locked up. Got it all restarted.
As an aside, when the system is really busy, it starts spewing clock skew errors... when jjansen4 runs his plethora of bots, that seems to aggravate the situation.
I removed libc6-xen; that seemed to mitigate the libc "error 4" messages I thought may have been causing the lockups... but as we can see from the April 10th lockup, that doesn't seem to have fixed it.