
STATUS updates

TODO

  • update grade not-z scripts to handle/be aware of winter terms
  • update system page for (new)www
  • load balance/replicate www/wiki content between LAIR and DSLAB
  • update system page for db

Other Days

April 10th, 2015

DSLAB cluster possibly back up

After failing to get a new version of OpenMPI up and running, I reverted to the squeeze packages and downloaded version 5.1.0 of LS-DYNA. It seems to run, although the sample test file I have doesn't seem to work.

I had to make a couple of manual symlinks to shared object libraries for OpenMPI before everything finally lit up.
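For reference, a minimal sketch of scripting that kind of symlink fix so it can be repeated after a rebuild. The helper name ensure_symlink is hypothetical, and the actual library names are not recorded here, so any paths passed to it are placeholders.

```python
from pathlib import Path


def ensure_symlink(target, link):
    """Create link -> target unless something already exists at link.

    Returns True if the symlink was created, False if it was left alone.
    Both arguments are placeholders; the real OpenMPI library names
    are not recorded in this log.
    """
    link = Path(link)
    # is_symlink() also catches dangling links, which exists() would miss.
    if link.is_symlink() or link.exists():
        return False
    link.symlink_to(target)
    return True
```

Calling it once per missing library name leaves existing files and links untouched, so it is safe to re-run.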

Now all I need is to regain access to the serial console on the CoRAID, rig up some sort of status notification, and also rig up something so that periodic backups can be performed.

April 4th, 2015

DSLAB cluster/fileserver/CoRAID

The backup drive has been attached to node00, so I set about re-remembering how to mount an exFAT-formatted filesystem. It seems I only needed to set up apt to check squeeze-backports and then install fuse-exfat and exfat-utils (the latter was already installed).

I was then able to copy back some of the home directory files to facilitate getting cluster operations back up and running.

The only outstanding item is a current-year DYNA license, which I am hoping is lingering as an e-mail attachment somewhere.

Everything is a file

I wrote a quick little program that read bytes from STDIN and sent them to STDERR… and I redirected /dev/ttyUSB0 as STDIN and STDOUT, so I could use it to communicate with the CoRAID, in order to obtain a status report of the drives (possibly as a daily cron job).

It actually “worked”… there were a few issues with early termination, and it didn't exactly like me overshooting things… the kernel ended up throwing an exception with the module, so I had to reboot the fileserver. Still pretty darn neat.
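A minimal sketch of that kind of byte relay, written in Python (the language of the original program is not recorded here). The function name relay and the one-byte default chunk size are assumptions for illustration.

```python
def relay(src, dst, chunk_size=1):
    """Copy bytes from src to dst until src reports end of input.

    src and dst are binary streams (anything with read/write/flush).
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means end of input, so stop cleanly.
            break
        dst.write(chunk)
        # Flush each chunk so output appears as soon as it arrives.
        dst.flush()
        total += len(chunk)
    return total
```

Called as relay(sys.stdin.buffer, sys.stderr.buffer) with the serial device redirected as standard input, this would echo whatever the device sends to the terminal. Line settings such as the baud rate still have to be configured on the device node separately.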

April 1st, 2015

DSLAB cluster

About a week or so ago, we had a two-drive failure in the CoRAID device, rendering the data volume corrupt.

After some diagnostics (serialing in to the CoRAID) and scavenging replacement drives, we are slowly rebuilding.

Some improvements underway:

  • RAID 10, for a 1.5TB volume that can survive up to 2 drives failing (provided the 2nd failed drive is not the mirror of the 1st)
  • Possibly new version of DYNA (possibly even the latest) due to
  • Possibly building and installing a newer (possibly latest) version of OpenMPI
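To illustrate the caveat on the RAID 10 item above: the array survives two failures only if the failed drives are not both halves of the same mirror. The sketch below enumerates the cases for a hypothetical four-drive array (two mirrored pairs). The drive count and names are assumptions, since the actual CoRAID layout is not given here.

```python
from itertools import combinations


def surviving_failures(mirrors, failures=2):
    """Return the failure combinations a RAID 10 layout can survive.

    mirrors is a list of tuples, each tuple naming the drives in one mirror.
    """
    drives = [drive for pair in mirrors for drive in pair]
    survivable = []
    for failed in combinations(drives, failures):
        # The array survives while every mirror keeps at least one working drive.
        if all(any(d not in failed for d in pair) for pair in mirrors):
            survivable.append(failed)
    return survivable


# Hypothetical layout: four drives arranged as two mirrored pairs.
mirrors = [("d0", "d1"), ("d2", "d3")]
survivable = surviving_failures(mirrors)
```

For this assumed layout, four of the six possible two-drive failures are survivable; the two fatal cases are the ones that take out both drives of a single mirror.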

I did away with LVM, so we now have a straight mount from the CoRAID, formatted with ext3 instead of XFS (simpler, and possibly able to take more of a beating).

Waiting on the last offline backup to be retrieved, so various configuration files can be restored.

So far it appears promising that new versions of DYNA and OpenMPI may be deployed, giving us an unexpected upgrade (at least making the whole endeavor more exciting).

haas/status/status_201504.txt · Last modified: 2015/05/03 13:59 by wedge