
STATUS updates


TODO

  • the formular plugin is giving me errors, need to figure this out (email assignment form)
  • set up symlink and cron job for userhomedirbackup.sh on nfs2; update wiki page for nfs
  • update grade not-z scripts to handle/be aware of winter terms
  • update system page for (new)www
  • update system page for db
  • migrate nfs1/nfs2 system page to current wiki
  • flake* multiseat page

URLs

Other Days

April 30th, 2011

VirtualBox OpenBSD 4.9 installs

I installed 2 additional OpenBSD systems as virtual machines… one i386, the other amd64.

Ports

Instead of installing binary packages, I opted to go the ports system route, which builds the packages from source and then installs them (I mostly knew that, but hadn't fully internalized it, so this is a good realization).

I made the following ports (make and make install):

  • bash-4.1.9p0.tgz
  • bzip2-1.0.6.tgz (created as a dependency)
  • fileutils-4.1p5.tgz
  • gettext-0.18.1p0.tgz (dependency)
  • gmake-3.81p1.tgz
  • gnuls-4.1p2.tgz (created with fileutils)
  • gperf-3.0.4.tgz (dep)
  • groff-1.15.4.7p3.tgz (dep)
  • groff-mdoc-0.0.tgz (dep)
  • kermit-8.0.211.tgz (minicom needs it)
  • libiconv-1.13p2.tgz (dep)
  • libidn-1.19.tgz (dep)
  • lrzsz-0.12.20p0.tgz (minicom needs it)
  • lzo2-2.04.tgz (dep)
  • minicom-2.2.tgz
  • openvpn-2.1.4.tgz
  • screen-4.0.3p2.tgz
  • vim-7.3.3p1-no_x11.tgz
  • vim-lang-7.3.3p1-no_x11.tgz
  • wget-1.12p1.tgz

Really, one just needs to cd into /usr/ports and then into the appropriate subdirectory, and run make followed by make install.
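
As a concrete example, building and installing the screen port goes something like this (assuming it lives under misc/screen; the prompt here is just a generic stand-in for the new VMs):

openbsd:~# cd /usr/ports/misc/screen
openbsd:/usr/ports/misc/screen# make
openbsd:/usr/ports/misc/screen# make install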

vim no-X

To build the non-X11 version of vim, I needed to make a change to the Makefile in /usr/ports/editors/vim… basically, look for the following lines:

FLAVORS=        huge gtk2 athena motif no_x11 perl python ruby
FLAVOR?=        gtk2

And change the FLAVOR?= line from gtk2 to no_x11… continue as usual.
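
Alternatively, it should also be possible (untested here) to set FLAVOR in the environment for a one-off build instead of editing the Makefile:

openbsd:/usr/ports/editors/vim# env FLAVOR=no_x11 make install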

April 29th, 2011

LAIRwall @ CCC

Yesterday, we set up the LAIRwall at CCC for the 2011 Student Art Show. Everything tested out fine.

This morning, it appears that only wall01, wall03, and wall05 powered on at the designated time… wall02, wall04, and wall06 did not power on until sometime later (~8:11am; they were supposed to power on around 7:32-7:40am).

My first remedy is to ensure all clocks are synchronized accordingly.

April 28th, 2011

PXE booting OpenBSD 4.9

My preorder of OpenBSD 4.9 arrived today, and I happily set about getting things set up to PXE boot from it, primarily focusing on the ALIX.2 board.

I had to configure PXE to force the serial console on install, which was made possible by creating an etc/boot.conf file within /export/tftpboot on NFS containing:

set tty com0
boot tftp:/distros/openbsd/4.9/bsd.i386.rd

I also configured a DHCP and DNS entry for the ALIX board, so I could purposefully steer it in the right direction (since the LAIR netboot menu couldn't exactly appear and be usable).
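
The DHCP side is just a host entry in dhcpd.conf along these lines (the MAC, addresses, and hostname below are placeholders, not the actual values used):

host alix {
        hardware ethernet 00:0d:b9:12:34:56;   # placeholder MAC
        fixed-address 10.80.1.200;             # placeholder IP for the ALIX
        next-server 10.80.1.3;                 # placeholder TFTP server
        filename "pxeboot";                    # OpenBSD's PXE bootloader
}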

I got OpenBSD 4.9/i386 installed on it.

Some links:

Read-Only OpenBSD:

April 25th, 2011

calcweek logic bug

Turns out my calcweek script has a logic bug! It was a case where a calculation mixing a two-digit and a three-digit day-of-year value would yield woefully incorrect values for the week (tripping up the logic, making it think we're past the end of the semester, and forcing week to be 0).

I manually overrode the gn scripts last week, but forgot to look into it. This morning, I fixed it.

The fix: we need to add a leading “0” to the $sem_start variable during the week calculation when the current day of the year ($sem_today) is greater than or equal to 100 while $sem_start is still a two-digit value.

##
## Perform THE calculation
##
    fill=""                                                                           
    if [ "$sem_start" -lt 100 ]; then
        if [ "$sem_today" -ge 100 ]; then
            fill="0"
        fi
    fi
 
    week="$((((((${cur_year}${sem_today})-(${sem_year}${fill}${sem_start}))/7)+1)-$boffset))"

If the conditions are right, $fill becomes “0”… otherwise, it remains null.

Note that we couldn't just slap a leading 0 onto variables if they were less than 100, as that would automatically kick on “I'm an octal number” functionality, which we definitely do not want!
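
A quick illustration of the octal pitfall (in bash arithmetic, a leading 0 means octal):

lab46:~$ echo $(( 010 + 1 ))    # 010 is read as octal 8
9
lab46:~$ echo $(( 090 + 1 ))    # errors out: 9 is not a valid octal digit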

April 23rd, 2011

lair.lan pings/access from VPN

I discovered this morning that, although I was able to VPN in as I always do… I was unable to ping or ssh into anything on the lair.lan side of the universe.

I could ssh into juicebox, and anything on the offbyone.lan or student.lab portions of the universe, but not places like ahhcobras.

I went in and enabled a skip on the tun0 and tun1 interfaces on jb, reloaded the rules, and things lit back up.
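
In pf.conf terms the skip presumably amounts to the following, with a pfctl -f /etc/pf.conf afterwards to reload:

# skip filtering entirely on the VPN tunnel interfaces
set skip on { tun0 tun1 }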

maint.sh

After seeing some output errors from maint.sh for several weeks, I thought to finally do something about it. So I went in and tweaked the logic.

LDAP attributes not indexed

I thought to look (again) into fixing that “attribute not indexed” log message that profusely pops up.

I stumbled across a link I had visited before, this time it appeared to work:

auth1:~# echo "dn: olcDatabase={1}hdb,cn=config
changetype: modify
replace: olcDbIndex
olcDbIndex: uid,uidNumber,gidNumber,memberUid,uniqueMember,objectClass,cn eq" > indexchanges.ldif
auth1:~# sudo ldapmodify -f indexchanges.ldif -D cn=admin,cn=config -x -y /etc/ldap.secret
auth1:~# sudo /etc/init.d/slapd stop
auth1:~# sudo su -s /bin/bash -c slapindex openldap
auth1:~# sudo /etc/init.d/slapd start

The trick is the slapindex that I never noticed before and therefore never ran.
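
To double-check that the change stuck, something like this should display the updated olcDbIndex values (same bind credentials as above):

auth1:~# sudo ldapsearch -x -D cn=admin,cn=config -y /etc/ldap.secret \
    -b 'olcDatabase={1}hdb,cn=config' olcDbIndex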

bigworld spam broadcast udp packets

With my campus network explorations, I have discovered some background packet noise taking place on our network, emanating from mgough's bigworld server.

Specifically, it is some UDP traffic sent from port 20018 to port 20018 on 255.255.255.255 (the limited broadcast address), and at least everyone on 10.80.2.x sees it (annoying but harmless as far as I am concerned).

A packet sniff for it would show the following:

machine:~# tcpdump -i eth0 host 10.80.2.60
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:27:55.519034 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:27:58.106817 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:01.500528 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:04.339277 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:06.859047 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:09.531102 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:13.387211 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:16.463583 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20
16:28:20.451743 IP bigworld.offbyone.lan.20018 > 255.255.255.255.20018: UDP, length 20

So, as I've been “getting around to” fixing small things today, I finally put a lid on this one, via /etc/rc.local so it'll be fixed on subsequent boots:

iptables -A OUTPUT -p udp --dport 20018 -j DROP

bam! Traffic stopped. The world is a little bit quieter.

April 22nd, 2011

improving www performance

I've continued to look into possible configuration issues that would impact our campus network performance problems. I continue to come up empty handed. There is NOTHING I am setting that is directly causing an adverse reaction.

mod_deflate

I did have an idea about looking into compressing traffic (especially after my experiences with latency yesterday). After some looking, I discovered mod_deflate.

I enabled it, and restarted apache2.
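
Assuming www is running the stock Debian apache2 packaging, that boils down to something like:

www:~# a2enmod deflate
www:~# /etc/init.d/apache2 restart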

According to http://www.whatsmyip.org/http_compression/, traffic is now compressed (typically around 70%), so the amount of data transferred is much less. Maybe this will help (but only in an “avoiding the problem” sort of way).

juicebox PF

I set about implementing my PF rule improvements on juicebox this morning. Aside from accidentally blocking ICMP traffic for about 20-30 minutes, the transition appears to have gone smoothly, and will enable me to try some other experiments re: the campus network performance.

PF version

We may need to wait until JB is upgraded, as I think I want to try playing with relayd and do some TCP proxying (a local connection to port 80 on JB would then get directed off somewhere else… so connections would appear to come from JB itself, not from out on the internet).
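
A rough sketch of the sort of relayd.conf I have in mind (names and addresses below are placeholders, and this assumes a relayd new enough to do plain TCP relays):

table <wwwhosts> { 10.80.1.10 }        # placeholder backend address

relay "wwwproxy" {
        listen on 127.0.0.1 port 80
        forward to <wwwhosts> port 80
}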

caprisun PF queues

I've added some additional queues:

  • vpn
    • backbone - traffic specifically on the backbone VPN link
    • offbyone - traffic on the offbyone user VPN
    • cloudvpn - cloudVPN traffic (likely the LAIRwall when it is traveling)
  • web
    • web_in - incoming to www traffic (lab46 web sites)
    • web_out - offbyone.lan outgoing to internet web traffic

I apparently need to do a full PF flush at some point, because the VPN sub-queues do not seem to be working (all traffic is still being tallied in the master vpn queue). All the other queues/sub-queues are working as they should.
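
For reference, the new sub-queue definitions look roughly like this in pf.conf (the bandwidth splits shown here are illustrative, not the exact values in use):

    queue vpn bandwidth 20% priority 6 cbq(borrow red) { backbone, offbyone, cloudvpn }
        queue backbone bandwidth 50% priority 6 cbq(borrow red)
        queue offbyone bandwidth 30% priority 6 cbq(borrow red)
        queue cloudvpn bandwidth 20% priority 6 cbq(borrow red)
    queue web bandwidth 30% priority 4 cbq(borrow red) { web_in, web_out }
        queue web_in  bandwidth 60% priority 4 cbq(borrow red)
        queue web_out bandwidth 40% priority 4 cbq(borrow red)

When there's a quiet moment, a pfctl -F queue followed by a pfctl -f /etc/pf.conf reload should serve as the full flush.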

random idea

Since the MTU downsizing didn't seem to have any appreciable impact, and disabling DNS didn't make a difference… a random idea I should try at some point is to forcibly lower the TTL on outgoing packets on capri (preferably on returned packets of an established HTTP connection) to see if that makes any difference when packets are lost.

After all, when coming from campus to the BDC… we are still in somewhat of a closed loop, so there shouldn't be that many routers in the way.

April 21st, 2011

network latency

I had an opportunity to experience some network latency first-hand while up on campus. From a 10.100.60.210 IP (the computer in my office up on campus)… I was experiencing the horrific delays and page load times.

Unfortunately, my MTU theories did not seem to rectify it (although occasionally I would get an amazing page load transaction).

Typically, it appears as though traffic bursts and then halts, bursts again and then halts… etc. until done.

I was able to witness this by running telnet 143.66.50.18 80 and issuing a get / by hand.
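
i.e., along these lines (the prompt is generic, and a full "GET / HTTP/1.0" plus a blank line works just as well):

machine:~$ telnet 143.66.50.18 80
GET /

…and then watching the response trickle out in those bursts.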

I wonder if those lost packets are mucking with the transaction, causing delays as things time out and re-ACK.

At any rate, it is good to realize I now have more ready access to a likely troublesome machine for testing.

April 20th, 2011

capri scrubbing / ssh queueing

I discovered that tagging all outgoing packets as ToS “low delay” broke ALTQ's ability to distinguish between interactive ssh traffic and “bulk” scp/sftp ssh traffic.

I have removed this option, and things are back to where they should be.

April 19th, 2011

Wireshark on OpenBSD

As I'm investigating the network performance issues on the campus network, I definitely need to learn to use a packet sniffing tool. In the long term I want that tool to be tcpdump, but in the present time Wireshark (if only deceptively) gives me the feeling I have some flexibility over the information I can access.

So, following on that, I need to have wireshark deployed in all the places I need to analyze traffic.

This means I need to put wireshark on capri, and that's exactly what I've been working on.

Installing Wireshark from source

The process of installing Wireshark…

First up, I need to have gtk+ 2 installed (since it is an X/GTK2 application). This means pulling in the various packages (there are many prerequisites).

caprisun:~# export PKG_PATH=ftp://ftp.openbsd.org/pub/OpenBSD/4.4/packages/i386/
caprisun:~# pkg_add gtk+2-2.12.11.tgz
...

As a point of information, packages being pulled in off the internet via ftp (on capri) are appropriately queued in the “etc” queue, thereby not interfering with existing ssh, vpn, or web traffic.

Once that is done, download/extract the wireshark source, and run configure, then 'gmake'.

caprisun:~$ wget http://wiresharkdownloads.riverbed.com/wireshark/src/wireshark-1.4.6.tar.bz2
...
caprisun:~$ tar -jxf wireshark-1.4.6.tar.bz2
...
caprisun:~$ cd wireshark-1.4.6
caprisun:~/wireshark-1.4.6$ ./configure
...
caprisun:~/wireshark-1.4.6$ gmake
...

Also of note: make sure the OpenBSD packages 'gmake' and 'gtar' are installed (I had tar aliased to gtar; the default BSD tar doesn't recognize -j).

adding paths to ldconfig

I found I needed to add some more libraries to the system library path. ldconfig is the way to do this:

ldconfig -R /usr/local/lib
ldconfig -R /usr/X11R6/lib

OpenBSD X forwarding

Assuming X11 forwarding is enabled in sshd (kill -1 to reload the config), and you ssh in with -X specified, you can run applications and it should work.
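
A minimal sketch of the whole dance (assumes sshd_config on caprisun has X11Forwarding yes; the workstation name is a placeholder):

caprisun:~# kill -1 $(cat /var/run/sshd.pid)       # reload sshd after editing sshd_config

workstation:~$ ssh -X caprisun
caprisun:~$ cd wireshark-1.4.6 && ./wireshark &    # run the freshly built (not yet installed) wireshark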

April 18th, 2011

campus network performance

So far I've heard that the latency problem hasn't gone away, though the only responses from students so far have been along the lines of “seems good”. So we will see.

Empty Zones / RFC1918

I added a few more entries for empty zones into capri's DNS config, to handle some campus subnets (especially computer labs) to see if this helps to make any difference in the on-going campus network performance investigations.
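
Assuming capri's resolver is the stock BIND, each of these is just another empty zone declaration in named.conf, roughly like so (the subnet and zone file path below are placeholders):

// placeholder: empty reverse zone covering a 10.100.x.x campus lab subnet
zone "100.10.in-addr.arpa" {
        type master;
        file "standard/empty";      // a minimal SOA/NS-only zone file
};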

campus network performance, cont'd

My explorations continue….

First up, more useful links:

Learned that ICMP is pretty much all blocked, so Path MTU Discovery is out of the question and we must resort to packet fragmentation (the first link, to the Cisco whitepaper, is quite informative).

I disabled Path MTU Discovery on capri (via sysctl).
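
On OpenBSD the relevant knob should be net.inet.tcp.mtudisc (0 disables it; mirrored into /etc/sysctl.conf to persist):

caprisun:~# sysctl net.inet.tcp.mtudisc=0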

Also adjusted the ext_if scrub rule in pf.conf:

scrub      on $ext_if all random-id min-ttl 64 max-mss 1400 \
                          set-tos lowdelay reassemble tcp fragment reassemble

I'm starting to think that we may not need ALL the scrubbing rules, and perhaps just one per interface will do.

Still need to do lots of testing…

April 17th, 2011

prioritizing ssh traffic

An additional feature to add to my queueing setup on caprisun was the differentiation of ssh traffic (interactive ssh sessions vs. scp/sftp transfers).

As it turns out, pf can distinguish between the two types of traffic by analyzing the ToS (Type of Service) flag, where interactive ssh sessions flip the “low delay” setting, and the bigger “bulk” transfers do not.

pf makes this nice with the queue keyword… actual syntax follows:

altq on $ext_if cbq bandwidth 1980Kb queue { ssh, vpn, web, etc }
    queue ssh bandwidth 40% priority 7 cbq(borrow red) { ssh_login, ssh_bulk }
        queue ssh_login bandwidth 60% priority 7 cbq(borrow red)
        queue ssh_bulk  bandwidth 40% priority 5 cbq(borrow red)
    queue vpn bandwidth 20% priority 6 cbq(borrow red)
    queue web bandwidth 30% priority 4 cbq(borrow red)
    queue etc bandwidth 10% priority 1 cbq(default borrow)

...

pass in quick on { $int_if, $bbn_if } from any to any tagged SSHTAG \
    queue (ssh_bulk, ssh_login)

pass in quick on { $ext_if } proto tcp from $approved tagged SSHTAG \
    queue (ssh_bulk, ssh_login)

pass in quick on { $ext_if } proto tcp tagged SSHTAG \
    queue (ssh_bulk, ssh_login) flags S/SA keep state \
    (max-src-conn 48, max-src-conn-rate 6/60, overload <brutes> flush global)

Note the queue (ssh_bulk, ssh_login)… that's the magic… the 2nd argument is for ToS of low delay or content-less ACK packets. So bam! Just have unique queues set aside, and assign as appropriate.

awesome OpenBSD site

I stumbled across this site, calomel.org, which has tons of nifty tutorials on PF and related OpenBSD thingies.

This site also has some nifty things:

packet sniffing fun

I've been revisiting the traffic sniffing adventures from earlier in the week.

To start off with, useful links:

Some additional vectors of attack include:

  • are ACKs getting returned fast enough?
  • disabling of hardware checksum offloading (disabled on www and VM server, can't really disable it on capri)
  • MTU/MSS stuff

So far, I haven't had much luck with the MTU sizing stuff… with all the examples I try, I can never get the “payload too big” message, even when I crank the size up to really obscene values.
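
One way to probe this (the target host and size here are placeholders) is to send pings with the DF bit set and see at what size they stop getting through; on Linux that's -M do, on OpenBSD it's -D:

www:~# ping -c 3 -M do -s 1472 10.80.1.1    # 1472 bytes of payload + 28 of headers = 1500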

rebooted caprisun

While toying with MTU values, I apparently set one too small (to 512). This created a bout of unhappiness, and to make sure everything was happy, I gave capri a reboot. Lesson learned.

Path MTU Discovery (PMTU-D)

My network explorations have led me into the realm of Path MTU Discovery.

Links!

caprisun sysctl tweaks

Referencing the Network Tuning and Performance guide at:

I ended up modifying the following sysctl's on caprisun (set in /etc/sysctl.conf):

net.inet.tcp.mssdflt=1400       # Set MSS
net.inet.ip.ifq.maxlen=768      # Maximum allowed input queue len (256*# of interfaces)
net.inet.tcp.ackonpush=1        # acks for packets with push bit set shouldn't be delayed
net.inet.tcp.ecn=1              # Explicit Congestion Notification enabled
net.inet.tcp.recvspace=262144   # Increase TCP "receive" window size to increase perf

From system defaults, net.inet.ip.forwarding has also been changed (set to 1) to enable IP forwarding.
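
Values in /etc/sysctl.conf only take effect at boot; they can also be applied to the running system directly, e.g.:

caprisun:~# sysctl net.inet.tcp.mssdflt=1400
caprisun:~# sysctl net.inet.ip.forwarding=1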

April 16th, 2011

more pf playing

My quest to better learn, and subsequently optimize, our PF rulesets continues!

Some links:

I installed pftop on capri, which allows me to see lots of neat things all condensed and organized.

pftop -v rules

has been particularly useful.

April 15th, 2011

poor web traffic- depending on your network

Kelly came and sat with me yesterday to help debug the on-going performance issues experienced from SOME CCC networks to the Lab46 web server.

Specifically, the following network VLANs experience sub-optimal delays (9000+ ms):

  • student (10.100)
  • staff (10.100)
  • server (143.66)

But the following consistently demonstrate adequate (200 ms) responses:

  • wireless (192.168)
  • vpn (10.200)
  • external to the college (internet)

These results were gathered from performing “httping”s from the various networks.
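
httping times HTTP requests the way ping times ICMP echoes; the runs were along the lines of the following (the URL below is a placeholder for the lab46 front page):

machine:~$ httping -c 10 -g http://www.example.edu/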

What is confusing is that:

  • we (the LAIR) are not doing any filtering based on network location
  • it looks like an external-to-us network issue (specifically with how those 3 problematic VLANs are configured)
  • we narrowed the problem with those networks to the apache2 install + content served on www

A bit more on that last point:

  • we tried different experiments testing http packet latency by plugging in different web servers at different parts of the network (right into the wall to bypass the router, reachable through the router, running on a different IP address, etc.)
  • the non-LAIRified apache2 instance on www consistently showed no problems
  • our router is not a problem
  • hitting a non-wiki page on the lab46 web server showed zero problems (so it points even more towards content, or dokuwiki/websvn, both of which were tested).

So, while the symptoms may indicate that the problem is specific to our web server (either the config, or the config+certain content)… I am not entirely convinced that is the problem.

So could dokuwiki somehow be selectively filtering connections from those networks but not others, even though we have configured no such thing? How would it know the difference between 10.100 and 10.200 when nowhere in its config have we ever mentioned any of those addresses?

One thing I did notice during some of our network investigations was that wireshark identified some TCP fragment reassembly errors.

In the process of investigating the apache2 config, I thought to search for the combination of two clearly unrelated things… “apache2 tcp fragment reassembly”, and variations thereof.

This actually turned up some very interesting things. The useful URLs:

It indicated a possible MSS/MTU problem on the network, which would cause such problems when sufficiently large pages are served, resulting in painfully slow page loads (exactly the issue, on those networks). This refined exploration led me to the following informative page:

Which goes into some nice detail about MTU, ICMP filtering, and related things.

The interesting bit I read from this document was:

Many network administrators have decided to filter ICMP at a router or firewall. There are valid (and many invalid) reasons for doing this, however it can cause problems. ICMP is an integral part of the Internet and can not be filtered without due consideration for the effects.

In this case, if the ICMP “can't fragment” errors can not get back to the source host due to a filter, the host will never know that the packets it is sending are too large. This means it will keep trying to send the same large packet, and it will keep being dropped, silently dropped from the view of any system on the other side of the filter.

We know that campus blocks ICMP, and is likely running a tighter network on the troublesome VLANs than the non-troublesome ones.

So my current investigations will be exploring MTU size (and where that needs to be set to hopefully make a difference) and then moving up from there.

apache2 config optimizing

As a result of the performance investigations, I did uncover some areas of our apache2 config that could see some improvements.

The following documents proved useful in this regard:

pf configuration fun

Some useful pf information:

I found some tweaks to apply to capri's pf.conf.

MTU experiments

I set an MTU of 1400 on the following machines/interfaces (manually set, will go away on reboot):

  • lab46:eth0
  • www:eth0
  • caprisun:pcn0
  • caprisun:em1
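
For the record, setting these manually amounts to plain ifconfig invocations (hence they don't persist across a reboot), roughly:

lab46:~# ifconfig eth0 mtu 1400       # Linux side (lab46, www)
caprisun:~# ifconfig pcn0 mtu 1400    # OpenBSD side (likewise for em1)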

No discernible difference in performance yet (at least on the “known working” networks)… will be testing this and my other changes later on our test environment:

  • potentially better packet scrubbing (including a max-mss of 1300, to complement the MTUs of 1400).
  • maybe better icmp packet passage
  • apache2 optimizations

The MSS/MTU ratio is not optimal, but is somewhat intentionally undercut to see if it makes a difference. We'll be sending more, but smaller, packets. Let's see how this works.

Update

After some further searching, playing, and exploring, I have refined the values a bit.

On the above-mentioned systems, MTU is now set to 1440, and MSS is at 1400.

Still have to do much more testing to see if this actually made any difference whatsoever.

Neat links that have some interesting information:

April 14th, 2011

queue fun on capri

Started tinkering with altq on pf on caprisun…

Some useful links:

April 13th, 2011

caprisun pcn0

While debugging the “lab46 slowness” and following up on the LAIR's interaction with IT, I discovered that the subnet mask and broadcast address associated with 143.66.50.18 were incorrect. So we're going to try setting them manually with the correct values.

Before:

!ifconfig $if lladdr 00:16:3e:5d:88:d6 metric 0
dhcp

After:

!ifconfig $if lladdr 00:16:3e:5d:88:d6 metric 0
inet 143.66.50.18 255.255.255.248 143.66.50.23
!route add -net default 143.66.50.22

Caprisun was up for 300 days.

hostname.em1

Further optimizations.

Before:

inet 10.80.2.1 255.255.255.0 10.80.2.255
inet alias 192.168.10.248 255.255.255.0 192.168.10.255

After:

inet 10.80.2.1 255.255.255.0 10.80.2.255

hostname.em0

And optimizations here too.

Before:

inet 10.10.10.2 255.255.255.0 10.10.10.255
!route add 10.80.1.0/24 10.10.10.1
!route add 10.80.3.0/24 10.10.10.3

After:

inet 10.10.10.2 255.255.255.0 10.10.10.255

rc.local nullmailer

Boot sequence was hanging on the start of nullmailer-send, because it was grabbing the foreground.

This was fixed by doing the following:

(/usr/local/sbin/nullmailer-send 2>&1 &) && echo -n "nullmailer "

Still spits out “rescanning queue”, but does what we want.

April 5th, 2011

Updates

Did an aptitude update && aptitude upgrade on the wildebeest herd, nfs1 and nfs2. Minor updates were applied.

LAIRstation updates

Also updated the lairstations, bringing lairstation4 back on-line.

Many updates were available to install, and then I also installed SDL and the Java JDK (had to enable the Canonical entry in sources.list).

Total package list is as follows:

lairstationX:~# aptitude install libsdl-ttf2.0-0 libsdl-ttf2.0-dev libsdl-sound1.2 libsdl-sound1.2-dev libsdl-net1.2 libsdl-net1.2-dev libsdl-mixer1.2 libsdl-mixer1.2-dev libsdl-gfx1.2-4 libsdl-gfx1.2-dev libsdl-image1.2 libsdl-image1.2-dev sun-java6-jdk

April 4th, 2011

Symbol Fun

Squirrel was playing with some symbol manipulation in his SunPC decoding efforts, and I found some interesting knowledge worth documenting.

First up, two files, main.c and thing.c, defined as follows:

main.c

#include <stdio.h>
 
void thing();
 
int main()
{
    thing();
    return(0);
}
 
void thing()
{
    printf("Local thing\n");
}

thing.c

#include <stdio.h>
 
void thing()
{
    printf("External thing\n");
}

Compiling both

lab46:~$ gcc -c main.c
lab46:~$ gcc -c thing.c

Checking symbols

lab46:~$ nm main.o
0000000000000000 T main
                 U puts
0000000000000015 T thing
lab46:~$ nm thing.o
                 U puts
0000000000000000 T thing
lab46:~$ 

Weakening symbols

The intention is to have main() call the thing() from thing.o instead of the one in main.o… but we can't do this by default, because the linker sees the strong thing() already defined in main.o (and linking the two objects as-is results in a multiple-definition error).

To get around this, one way is to alter the state of the local thing symbol to a “weakened” state, which can be accomplished via the objcopy command:

lab46:~$ objcopy -Wthing main.o
lab46:~$ nm main.o
0000000000000000 T main
                 U puts
0000000000000015 W thing
lab46:~$ 

Note from the previous nm output the change from 'T' to 'W' states for the thing symbol. This means thing as part of main.o is now weakened.

Compiling it all together

We can then perform the desired operation:

lab46:~$ gcc -o final main.o thing.o
lab46:~$ ./final
External thing
lab46:~$ 

Pretty cool.

Question is… is there a way to remove the thing symbol entirely from main.o, as if it had never been implemented there in the first place? That remains an open question.

Some pages I found that might be useful:

April 3rd, 2011

commitchk error

It was discovered (last week) that wiki edits were being incorrectly reported by the commitchk script (a component of the grade not-z).

I finally got around to looking at it, re-remembering what I was doing and starting the debug process, when I noticed an instance of “notes/data” hardcoded into some filtering logic. Aha! Of course the wiki edits were coming up empty… the script was never allowed to look in the right place!

Changed “notes/data” to “notes/${CLASS}” as it should have been, and it lit right up, no other changes to the script needed.

Fixed.

April 2nd, 2011

Plan9 on VirtualBox

On a whim I did a google search for running Plan9 under VirtualBox (after a few unsuccessful attempts at getting a MacOS X VM up and running)… turns out that, as of ~ version 4.0.2 of VirtualBox, Plan9 actually appears to run, and runs well.

I confirmed this after downloading last night's ISO (for both the main Plan9 distro and 9atom)… the regular Plan9 distro installed without a hitch… in fact did better than I expected, as I felt I pushed it a little (1280x1024x24!)

Got networking up and running via mostly manual config, and proceeded to install some of the necessary contrib packages (vim!) so I can complete the actual system configuration. Of course, existing entirely within rio from the start (doing a graphical install of Plan9 is a pleasant change from what I'm used to) makes a lot of things a lot easier.

DSLAB cluster back up

A few days ago, a daily script of mine reported a loss of contact with the entire DSLAB cluster… data was still up, but none of the spartas were.

It turned out a power cable of some sort had been accidentally unplugged. Upon fixing this, the spartas were still inaccessible. Apparently a problem with DNS entries was the issue.

The cluster resumed operations, with all appropriate filesystems mounted.

LAIR backups

First Saturday of the month; backups appear to have gone off on schedule.

MacOS X VM audio

Apparently there are ways to get audio working in a MacOS X VM:

April 1st, 2011

Power blip

Squirrel reported a brief power blip in the LAIR. Nothing appears to have been adversely affected. No machines (even non-UPS machines) appear to have gone off-line.

NFS IRQ errors

This was noticed prior to this day, but I'm only getting around to reporting it now: I noticed a slew of these appear in the logs for nfs one day earlier this week:

[38622060.304192] eth0: too many iterations (6) in nv_nic_irq.
[38622060.316295] eth0: too many iterations (6) in nv_nic_irq.
[38622060.447023] eth0: too many iterations (6) in nv_nic_irq.
[38622064.098704] eth0: too many iterations (6) in nv_nic_irq.
[38622064.112054] eth0: too many iterations (6) in nv_nic_irq.
[38622064.126185] eth0: too many iterations (6) in nv_nic_irq.
[38622065.153317] eth0: too many iterations (6) in nv_nic_irq.

Not what I'd like to see… but the machine still appears to be hauling along. The great news is that nfs1 had been running uninterrupted for 377 days (so the errors occurred Tuesday afternoon, March 30th)! Over a year of uptime… that puts it in the uptime club with cobras (which we never could get an exact figure for)… not sure if we've got any other long running gems plugging away in the LAIR.

haas/status/status_201104.txt · Last modified: 2011/05/01 01:12 by 127.0.0.1