
STATUS updates

TODO

  • How to handle UNIX journal keywords?
  • Need to finish writing up HPC0 projects
  • the formular plugin is giving me errors, need to figure this out (email assignment form)
  • use include plugin to include a page containing various prior month status pages
  • can I install writer2latex on wildebeest herd without needing gcj??
  • update lair-nfs for new idmap Domain of “lair”
  • put UNIX course listing examples in public directory

URLs

Other Days

March 28th, 2010

Lab46 locked

Around 2:28pm today, Lab46 decided to lock up again. A good, hard lock.

Restarted it. Hmmm….

March 24th, 2010

lrrd on nfs2

It turns out nfs2 was still not reporting in lrrd. I went in and restarted nut and lrrd, and also changed the hostname back to 'nfs2', after which it resumed reporting. No apparent problems from the hostname change, which is good, as I'd prefer it to be 'nfs2'.

idmap verbosity turned down

Since everything has operated successfully following our power outages last week, I figured there's no need to keep viewing all the idmap verbosity, so on nfs2 in /etc/idmapd.conf, I set Verbosity back to 0 (from the 7 I had set it to in order to get much more verbose output).

Restarted nfs-common, no problems were apparent.
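For reference, the Verbosity change could be scripted roughly as follows. This is only a sketch: the helper name set_idmap_verbosity is made up for illustration, and it assumes the file already contains a "Verbosity = N" line.

```shell
#!/bin/sh
# Hypothetical helper: set the Verbosity key in an idmapd.conf-style file.
# Usage: set_idmap_verbosity FILE LEVEL
set_idmap_verbosity() {
    conf="$1"
    level="$2"
    # Rewrite only the "Verbosity = N" line; a .bak copy of the original is kept.
    sed -i.bak "s/^[[:space:]]*Verbosity[[:space:]]*=.*/Verbosity = ${level}/" "$conf"
}

# On nfs2 this would be followed by the nfs-common restart (as root):
#   set_idmap_verbosity /etc/idmapd.conf 0
#   /etc/init.d/nfs-common restart
```

Other keys in the file (such as Domain) are left untouched, since only the Verbosity line matches the pattern.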

the uptime to rule them all

Although not directly in the realm of the LAIR, a distant ancestor, g7, as of 3:37PM this afternoon, has been up for 600 days.

galaxy7:~$ w
 3:37PM  up 600 days, 1 user, load averages: 0.12, 0.09, 0.08

It is a 50MHz Sun SPARCstation LX that sees very little usage or attention (aside from mine), and it is nice to be able to see such things as this from time to time.

Huzzah!

March 23rd, 2010

Lab46 locked

Lab46 decided to be problematic and hardlock during my Tuesday morning ASM class.

No apparent evidence of aggravation, just locked up. I destroyed and re-started the VM.

Hopefully not indicative of any “wildebeast”-esque problems (I intend to rebuild Lab46 this summer, so hopefully it holds out until then).

non-reporting of NFSes in LRRD

It turns out that the “nut” daemon was not running on either NFS server after the power issues. As a result, elements of lrrd tried to read from the serial port themselves, were unable to do so (permission problems, actually), and more or less silently failed, which confused lrrd.

Ian fixificated the situation.
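A quick check for this sort of silent failure might look like the sketch below. The function name report_missing_daemons is hypothetical, and the process names in the usage comment (upsd, upsmon) are an assumption about how nut is set up on the NFS servers; adjust them to whatever actually runs there.

```shell
#!/bin/sh
# Report which of the named daemons have no running process.
# Usage: report_missing_daemons NAME...
# Prints one line per missing daemon; returns non-zero if any is missing.
report_missing_daemons() {
    missing=0
    for name in "$@"; do
        # pgrep -x matches the exact process name.
        if ! pgrep -x "$name" >/dev/null 2>&1; then
            echo "not running: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed process names for nut; restart it however it is managed locally:
#   report_missing_daemons upsd upsmon || echo "nut needs a restart" >&2
```

Running something like this after a power event would flag a dead daemon before lrrd starts failing quietly.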

March 18th, 2010

Power overhaul largely completed… some initial circuit testing was done, resulting in the LAIRwest rack losing power, so once again everything went down.

Some circuit testing still needs to be done, but that may not take place for some time; for the time being, I've located virtually the entire universe on LAIRwest, as it is currently plugged into a dedicated circuit.

NFS resumption info

I ran into a few quick issues getting the NFSes back up and running. These are the steps I followed:

# Bring the drbd volume to primary status on this peer
/sbin/drbdadm primary r0
 
# Mount the DRBD volume on /export
/bin/mount /dev/drbd0 /export
 
# Set hostname... this may not be needed, I'll have to test it (as it somewhat breaks LRRD)
hostname nfs
 
# Restart the NFS server
/etc/init.d/nfs-kernel-server restart
/etc/init.d/nfs-common restart
 
# Once /export is mounted, we can bring our repository access online.
/etc/init.d/thttpd start
 
# Restart other services
/etc/init.d/cron restart
/etc/init.d/nscd restart

These steps are in /etc/rc.local on nfs2.
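Since thttpd depends on /export being mounted, a guard along these lines could be added before starting it. This is a sketch, not what is currently in rc.local: is_mounted is a hypothetical helper, and it reads /proc/mounts by default, which is Linux-specific.

```shell
#!/bin/sh
# Return success if DIR is listed as a mount point in a mounts table.
# Usage: is_mounted DIR [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
    dir="$1"
    table="${2:-/proc/mounts}"
    # Field 2 of each line in the mounts table is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Sketch of how rc.local might use it:
#   if is_mounted /export; then
#       /etc/init.d/thttpd start
#   else
#       echo "/export is not mounted; not starting thttpd" >&2
#   fi
```

The optional second argument only exists so the check can be tried against a sample file without touching the real mount table.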

I ran into some issues involving idmap for NFS… I ended up with the following (propagated to all LAIR systems):

###LAIRCONF###
[General]
Domain = lair
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Verbosity = 0

[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup

[Translation]
Method = nsswitch

The key here was: Domain = lair

Previously it was set to localhost. I doubt this was the cause of the problem (I probably just needed to restart nfs-common on nfs2, which I didn't do until after I changed the Domain to lair). But I've been wanting to do that for some time anyway.
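To confirm that a given system picked up the new Domain, something like the following could be run on each machine. It is a minimal sketch; get_idmap_domain is a hypothetical helper, not part of any LAIR package.

```shell
#!/bin/sh
# Print the value of the Domain key from an idmapd.conf-style file.
# Usage: get_idmap_domain [FILE]   (FILE defaults to /etc/idmapd.conf)
get_idmap_domain() {
    conf="${1:-/etc/idmapd.conf}"
    # Split on "=", match the Domain line, and trim whitespace around the value.
    awk -F= '/^[ \t]*Domain[ \t]*=/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)
        print $2
        exit
    }' "$conf"
}

# Example check:
#   [ "$(get_idmap_domain)" = "lair" ] || echo "idmap Domain is not lair" >&2
```

The Domain needs to agree between the server and its clients, so the same check applies on both sides.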

I need to update that in lair-nfs. Basically, this problem manifested itself by showing all file ownerships as nobody:nobody, which obviously isn't preferable.
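The symptom itself is easy to test for. The sketch below checks whether an owner:group string looks like the fallback mapping; is_fallback_owner is a hypothetical name, and the file path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Succeed if an "owner:group" string looks like the idmap fallback mapping
# (the Nobody-User / Nobody-Group values, or nobody as the group).
# Usage: is_fallback_owner OWNER:GROUP
is_fallback_owner() {
    case "$1" in
        nobody:*|*:nobody|*:nogroup) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on an NFS client (stat -c is GNU coreutils syntax; the path is a
# placeholder for any file that should have a real owner):
#   is_fallback_owner "$(stat -c '%U:%G' /path/to/some/file)" \
#       && echo "ownership is falling back to nobody; check idmap" >&2
```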

March 17th, 2010

Power! A campus electrician showed up at the LAIR to look into our power issues (or how to avoid them), and ended up running us some dedicated circuits to the LAIR shelving area.

In the process, some breakers were flipped and we lost power to the LAIReast rack, taking down pretty much everything (whoops).


haas/status/status_201003.txt · Last modified: 2010/09/16 18:31 by 127.0.0.1