STATUS updates

TODO

  • the formular plugin is giving me errors, need to figure this out (email assignment form)
  • update grade not-z scripts to handle/be aware of winter terms
  • update system page for (new)www
  • load balance/replicate www/wiki content between LAIR and DSLAB
  • update system page for db
  • migrate nfs1/nfs2 system page to current wiki

URLs

Some links of interest:

Other Days

October 22nd, 2014

pod00 flaked out

pod00 finally gave up the ghost, prompting an early morning swap-out prior to class Thursday.

Now running on an Optiplex 755.

October 11th, 2014

rsync delete locally deleted files on remote

Turns out my rsync setup was not removing files deleted on the local side of things. A quick tweak seems to have fixed the problem nicely:

sokraits:~# rsync -avF --delete --one-file-system / data.lair.lan:/export/tftpboot/netboot/sokraits/disk/

The -F argument (possibly not even needed in this scenario) and, more importantly, --delete, took care of the issue. Added this to the cron job on all applicable machines. This should now bring my initramfs sizes down on sokraits and halfadder.

Useful URL:

package tweaking on halfadder

I was feeling like living on the edge, so I removed tasksel remotely (which due to dependencies also removed the ssh server)… turns out it did NOT kill my active connection, but just in case it did, I temporarily installed a telnet server (telnetd) and had an established connection via that.

Also: root logins via telnet are disabled by default, due to a securetty module being loaded by pam… to make it work in a pinch, edit /etc/pam.d/login and comment out the securetty line.

Removed my desired redundant packages, reinstalled openssh-server (backed up the ssh keys to avoid identity issues), and uninstalled telnetd. All set.

Seems I did not migrate any VMs over to halfadder… fixing that presently.

Useful URL:

October 10th, 2014

data2 back up, with ramdisk

After my Wednesday grub wranglings, data2 was still not back up and running out of a ramdisk. After some further tweaking, I discovered it had regenerated its /etc/fstab file and was therefore refusing to boot properly. Once I fixed it (no device for /), it was happy.

grub

I figured out how to disable VGA modes for the grub menu.

First, edit /etc/default/grub and enable:

GRUB_TERMINAL=console

Reportedly the following also does the job (I set both, although very likely only one is needed and it is an either/or situation):

GRUB_GFXMODE=text

Next, I re-ran:

data2:~# update-grub

I could then edit /boot/grub/grub.cfg and change the initrd to my custom ramboot version.

Reboot, and bam! We're all set (and can see the grub menu at boot).
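
The /etc/default/grub edit can be scripted. A minimal sketch, assuming the setting ships commented out as #GRUB_TERMINAL=console (as it does in a stock Debian file; check yours first). The function name enable_grub_console and the scratch file are made up for illustration, and the in-place edit relies on GNU sed's -i option:

```shell
# Uncomment GRUB_TERMINAL=console in the grub defaults file given as $1.
# (enable_grub_console is a hypothetical helper, not part of grub.)
enable_grub_console() {
    sed -i 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' "$1"
}

# Try it on a scratch copy rather than the live /etc/default/grub.
scratch=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=5' '#GRUB_TERMINAL=console' > "$scratch"
enable_grub_console "$scratch"
grep '^GRUB_TERMINAL' "$scratch"
```

Pointed at the real file, this would still need the update-grub run described above before it takes effect.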

October 8th, 2014

halfadder

With some (hopefully) free time Wednesday, I commenced the transition of halfadder over to the full-system-in-initramfs setup.

Hoping to figure out the exact details to make it go.

  • Specifying root=/dev/ram0 does not seem to matter
  • Attempting a boot with core /dev files copied into /dev … no go
  • It seems having init located in / was the trick, even though I tried a boot specifying init to be in /sbin… so just something to remember.

halfadder is now back up and running.
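
Since a missing /init was the sticking point, a quick check on the staging tree before packing it into the initrd may save a reboot cycle. The kernel's initramfs documentation describes /init as what gets executed from an initramfs, with rdinit= as the override, which fits what was seen here. This is only a sketch: has_root_init is a hypothetical helper, and it assumes /init is either a regular executable or a symlink such as /init -> /sbin/init.

```shell
# Succeed if the tree rooted at $1 contains a usable /init.
# (has_root_init is a made-up name for this sketch.)
has_root_init() {
    tree=$1
    if [ -L "$tree/init" ]; then
        target=$(readlink "$tree/init")
        case $target in
            /*) [ -x "$tree$target" ] ;;   # absolute link: resolve inside the tree
            *)  [ -x "$tree/$target" ] ;;  # relative link: relative to the tree's /
        esac
    else
        [ -x "$tree/init" ]
    fi
}

# Demo on a scratch tree that mimics /init -> /sbin/init.
demo=$(mktemp -d)
mkdir -p "$demo/sbin"
: > "$demo/sbin/init"
chmod 755 "$demo/sbin/init"
ln -s /sbin/init "$demo/init"
has_root_init "$demo" && echo "init found"
```

Absolute symlinks are resolved against the tree rather than the host's own /sbin/init, which is the easy mistake when checking a staged copy.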

nfs2 redo

With an SSD now freed up, I am going to replicate data1's boot drive and install that into data2, so we'll be using one fewer magnetic drive. Also, it will give me an opportunity to fix some issues on data2's current install.

October 6th, 2014

sokraits/halfadder kernel set to 3.14

Due to potential (current) instability with ocfs2 modules in the 3.16 kernel, I have reverted to 3.14 for the time being.

In time I will give it another shot, but for now, I err on the side of stability.

lair package repo back up

After far too long being unavailable, I finally looked into getting the lair package repository back up and running.

Turns out mini-httpd wasn't up to the task, so I went back to the original (and preferred) micro-httpd, which has no config and is run via inetd.

Basically, the configuration is as follows:

www	stream	tcp	nowait nobody	/usr/sbin/micro-httpd /usr/sbin/micro-httpd /export/repository

Bam! The mirror is accessible, and aptitude (configured for the lair repository) lights up during an update.

Now to finally update those lair packages…

sokraits netboot, cont'd

After taking a break for a day, it is time to resume my efforts to get sokraits with its state-of-the-LAIR-art system-in-an-initramfs up and running. I have a few variables to try, hopefully it is just a matter of having “/init” in place (or, specifying a path to init from the kernel command-line).

updates

Still getting a panic… I caught a quick glimpse of an error that had something to do with initramfs. It would be nice if I could get the output to hang around long enough to determine what the problem is.

Useful URLs:

For the moment assuming the problem is a lack of proper filesystem mounting and/or lack of basic device files.

I've copied over some of the basic device files (hopefully this doesn't conflict with me also mounting devtmpfs in /etc/fstab).

Another variable to try is booting with the older but known-working 3.14 kernel, instead of the brand new 3.16 kernel that has yet to be verified working.

moar updates (working!)

SUCCESS!

Seems the /etc/fstab was the key… sokraits is now back up and running. Currently re-syncing its drbd volume with halfadder.

Had to do a bit of intervention to get it talking again. Turns out following the “manual split brain recovery” instructions did the trick:

Essentially:

sokraits:~# drbdadm disconnect xen_data
sokraits:~# drbdadm secondary xen_data
sokraits:~# drbdadm connect --discard-my-data xen_data

Then on halfadder:

halfadder:~# drbdadm connect xen_data

Apparently I needed to disconnect, which I wasn't doing.

But hey, that is resolved. Once data is synced, just a few more tweaks and we should be production ready… probably leave it up overnight to see how it fares (or through to Wednesday, depending), and then have it take over while I perform similar enhancements on halfadder.

October 4th, 2014

pxeboot stuff

So, my first attempt at netbooting sokraits with its filesystem-in-an-initrd was a resounding success. Unfortunately, I had forgotten to instruct it not to activate swap, so a kernel panic ensued.

But the sucker not only booted, it did so exactly as I intended: hypervisor, kernel, initrd.

That is beyond fantastic.

Also: syslinux files need to be chmod 755/644 in order to be read. Live and learn.
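
One way to apply those permissions across a tftp tree, reading "755/644" as 755 for directories and 644 for files (an assumption; the note above does not spell out the split). fix_tftp_perms is a made-up name, and the demo runs on a scratch directory rather than /export/tftpboot:

```shell
# Make every directory under $1 traversable (755) and every file
# world-readable (644) so the tftp daemon can serve them.
fix_tftp_perms() {
    find "$1" -type d -exec chmod 755 {} +
    find "$1" -type f -exec chmod 644 {} +
}

# Demo: build a scratch tree with overly tight permissions, then fix it.
demo=$(mktemp -d)
mkdir -p "$demo/pxelinux.cfg"
: > "$demo/pxelinux.cfg/default"
chmod 700 "$demo/pxelinux.cfg"
chmod 600 "$demo/pxelinux.cfg/default"
fix_tftp_perms "$demo"
ls -lR "$demo"
```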

Currently updating the initrd images with my fstab fix, and perhaps I'll wander in later and give it a second try. It would be wonderful to get sokraits back up and running, and have it in production before my usual classweek busy-ness commences.

update

Fixing the fstab (and updating /etc/mdadm/mdadm.conf) did not resolve the issue.

Upon further investigation, I found mention in this document:

That:

The kernel will always panic if PID 1 exits

Which is pretty much my problem: it is unable to locate a suitable init process. So I created /init as a symlink pointing to /sbin/init.

If it can handle the symlink, this should be enough to make it happy.

Questions still needing answers:

  • do we need ro option since it is the ramdisk?
  • do we need to specify root device since it is the ramdisk?
  • are we using tmpfs / initramfs? Or are we doing some sort of initrd still? (likely figure this out once the thing works).

Certainly a few variations to try. I am still excited as it feels close: just have to connect the last few dots, and this amazing and long-sought realm will be at hand!

lairdump capri

Not sure why it didn't do it before, but capri claimed an ssh key mismatch when doing its lairdump this morning. Cleaned them out of known_hosts, and all works fine. Perhaps the keys on data1/data2 are different? I should fix that when I rebuild data2.

Same deal with juicebox… both now resolved.

removing 'rc' status packages

Sometimes, after uninstalling or upgrading a package, remnants of older packages stay around, and have a status listing of 'rc' in the dpkg -l output.

I finally thought to look into this, after an aptitude remove --purge failed to resolve it.

The solution? dpkg -P <pkgname>

Works like a charm. Even wrote a little for loop to take care of all the ones installed on the current system:

machine:~# for pkgname in `dpkg -l | grep '^rc' | sed 's/  */;/g' | cut -d';' -f 2`; do dpkg -P ${pkgname}; done
...
for pkgname in `dpkg -l | grep '^rc' | sed 's/  */;/g' | cut -d';' -f 2`; do
    dpkg -P ${pkgname}
done
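
The grep/sed/cut chain can also be written as a single awk filter, which makes it easy to preview the list before purging anything. A sketch only: rc_packages is a hypothetical helper, and the two sample lines fed to it are invented to mimic dpkg -l output.

```shell
# Read dpkg -l output on stdin and print the names of packages in the
# "rc" state (removed, config files remaining).
rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Demo on invented sample lines; only the rc entry is printed.
printf '%s\n' \
    'ii  bash     4.3-11  amd64  GNU Bourne Again SHell' \
    'rc  tasksel  3.29    all    tool for selecting tasks' | rc_packages

# On a real Debian system, preview with:
#   dpkg -l | rc_packages
# and, once the list looks right, purge with:
#   dpkg -l | rc_packages | xargs -r dpkg -P
```

Matching on the whole first field avoids catching a status that merely starts with "rc", and the preview step gives a chance to back out before dpkg -P runs.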

Useful URL:

diskless NFSroot

URL

First and foremost, this document looks amazing:

This one is also proving informative:

sokraits redux: first attempt

For my first attempt at reviving sokraits, I have done an rsync of halfadder's filesystem (onto data), made a copy to become the new sokraits, tweaked the pertinent config files to give it its identity, and restored unique data (like system ssh keys).

Thanks to my sometimes prolific documenting on the wiki, I saved myself some headaches (I neglected to back up the udev persistent net rules, and mdadm.conf, which had unique data in them).

initial ramdisk with entire system in it

Once the copy was complete, I stuffed it all in its own initrd file:

data1:/export/tftpboot/netboot/sokraits/disk# find . | cpio -c -o | gzip -9 > ../initrd.sokraits.gz

Repeat the same step for halfadder.

configure custom pxe boot

In /export/tftpboot/pxelinux.cfg/ are the IP-based config files (or default if there are no matches; default is what brings up the LAIR netboot menu).

Created two new files (0A50012F for halfadder, 0A50012E for sokraits), and configured them as follows:

sokraits

default netboot
prompt 1
timeout 2

label netboot
    kernel netboot/mboot.c32
    append netboot/sokraits/xen-4.4-amd64.gz --- netboot/sokraits/linux console=tty0 root=/dev/ram0 ro --- netboot/sokraits/initrd.gz

label memtest
    kernel distros/memtest/memtest86+

halfadder is the same, just switch out the hostname directory appropriately.

So then, we just have to ensure that the hypervisor, kernel, and initial ramdisk are in the netboot/sokraits/ directory, and we may be good to go.
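
The file names 0A50012E and 0A50012F are the clients' IPv4 addresses written as uppercase hex, which is how pxelinux looks up per-host configs; they decode to 10.80.1.46 and 10.80.1.47. A small sketch for generating such a name from a dotted-quad address (pxe_hex_name is a made-up helper name):

```shell
# Print the pxelinux.cfg file name for the IPv4 address in $1.
# The body runs in a subshell so the IFS change does not leak out.
pxe_hex_name() (
    IFS=.
    set -- $1    # split the address on dots into $1..$4
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

pxe_hex_name 10.80.1.46    # prints 0A50012E
```

syslinux also ships a gethostip utility that reports the same value, if it is installed.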

Helpful URLs:

halfadder FS synced

In preparation for bringing sokraits back into action, I've done the initial step of duplicating halfadder's data to the fileserver, into /export/tftpboot/diskless/halfadder/, then making a copy of it in /export/tftpboot/diskless/sokraits/

Magic rsync line is:

  • rsync -av --one-file-system / data.lair.lan:/export/tftpboot/diskless/halfadder/

It performs the transfer over ssh. I have added a cronjob to keep it up-to-date.

Now I can reintegrate some unique sokraits data bits and set up the pxe booting side of things and see if this thing actually works.

Useful URL:

October 3rd, 2014

DSLAB

During a Geneseo visit, I made sure to scrounge up and set aside two drives for use as spares, as some sort of disk failure will likely occur over the course of the year. Drives are placed next to the fileserver in static bags.

haas/status/status_201410.txt · Last modified: 2014/11/01 05:12 by 127.0.0.1