=====Overview=====
My standing policy on lab46 account persistence has been that accounts will remain so long as:
* no more than 2 years of time has gone by since last account logon
Fairly simple, but as the capabilities of Lab46 have grown over the years, especially within the LAIR, some complications have been introduced:
* web-only users (i.e. sftp) never log into the system (lastlog never reflects a login)
* lab46 upgrades over the years have lost original lastlog data
* maildir data is on mail, homedir data is on nfs, auth data is on auth, login data on lab46
The idea of a comprehensive stale user grooming script had snags back then, but even more now.
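The two-year rule itself is easy to state as a predicate; the hard part, as noted above, is finding a trustworthy "last activity" time to feed it. A minimal sketch, assuming that time is already available as epoch seconds. The function name `is_stale` and the approximation of two years as 730 days are illustrative assumptions, not part of the actual policy tooling:

```shell
#!/bin/bash
# is_stale LAST_EPOCH [NOW_EPOCH]
# Succeeds (exit 0) when more than two years lie between LAST_EPOCH and
# NOW_EPOCH. NOW_EPOCH defaults to the current time.
# Assumption: "2 years" is approximated as 730 days.
is_stale() {
    local last=$1
    local now=${2:-$(date +%s)}
    local limit=$((730 * 24 * 60 * 60))
    [ $((now - last)) -gt "$limit" ]
}
```

For example, `is_stale "$last_seen" && echo "candidate"` would flag an account, where `$last_seen` is whatever timestamp ends up being trusted for that user.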
=====20100105 attempt=====
As I found myself debugging the newly established mail functionality, I figured it would be easier if I didn't waste my efforts on accounts that would likely never be used again. So I set about seeing what I could do about trimming down the number of accounts I'd have to deal with.
====Script 1: never-logged-in users and public_html====
My first script focused on the following criteria:
* users in group 'lab46'
* ~/real.bash_profile still exists (hinting that the user never logged in, though as I found out, that is not always true)
* lastlog would report last system login (or ** Never Logged In **)
* public_html's timestamp would be analyzed
That script is as follows:
#!/bin/bash
#
for user in `/bin/ls -1`; do
if [ -e $user/real.bash_profile ] && [ `groups $user | grep lab46 | wc -l` -eq 1 ]; then
loginchk="[ `lastlog | grep $user | cut -c 44-73 2>/dev/null` ]"
htmlout="`ls -ld $user/public_html | sed -e \"s/drwx.*$user lab46 [1-9][0-9]* \(20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]\) $user\//\1 /\"`"
printf "%10s ... " $user
echo "$loginchk $htmlout @$user"
fi
done
The problem, I discovered, is that some users (even very much still-active ones) STILL possessed their ~/real.bash_profile files. They must have interrupted the script when it prompted for a password change, never finishing the process that would have removed that file.
At any rate, this would produce output, and I tweaked it favorably by doing the following when I ran it:
lab46:/home$ sudo /path/to/myscript | grep '200[0-7]' | cut -d'@' -f2 > /tmp/userdelcandidates
This would output any user whose data (either the lastlog output OR the public_html timestamp) fell in that range of years. Conveniently, it worked out quite nicely; there was certainly room for error, but upon personally reviewing the list of users, I didn't see any that were in fact recent, so I went ahead with the process of removing them.
The following script was used to back them up (on both nfs and mail), preserving permissions, so that if an error was made, they could at least be restored relatively easily, without any loss of data.
cd /home # basically, cd to where the data is
for user in `cat /tmp/userdelcandidates`; do
echo "$user ..."
tar cpf /tmp/userbackup/$user-20100105.tar $user
gzip -9 /tmp/userbackup/$user-20100105.tar
rm -rf $user
done
In all, 59 users were able to be scrubbed as a result of this check.
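The backup loop above removes each directory whether or not the archive was written successfully. A slightly more defensive variant is sketched below, assuming GNU tar; the function name `backup_then_remove` and its argument order are hypothetical, and it uses tar's built-in gzip (default compression level) rather than the separate `gzip -9` step:

```shell
#!/bin/bash
# backup_then_remove USER SRC_DIR DEST_DIR STAMP
# Archives SRC_DIR/USER to DEST_DIR/USER-STAMP.tar.gz, preserving
# permissions, and removes the directory only if the archive was written
# and can be listed back.
backup_then_remove() {
    local user=$1 src=$2 dest=$3 stamp=$4
    local archive="$dest/$user-$stamp.tar.gz"

    if [ -z "$user" ] || [ ! -d "$src/$user" ]; then
        echo "skip '$user': no such directory" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1

    # -C keeps the paths inside the archive relative to SRC_DIR
    tar -C "$src" -czpf "$archive" "$user" || return 1
    # Make sure the archive is readable before deleting anything
    tar -tzf "$archive" > /dev/null || return 1

    # ${var:?} aborts instead of expanding to an empty path
    rm -rf -- "${src:?}/${user:?}"
}
```

It could be driven by the same candidate list, e.g. `backup_then_remove "$user" /home /tmp/userbackup 20100105` inside the loop.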
====Script 2: .bash_history may hold the key====
I realized through some variations of the first script that there are some users who have logged in, but their accounts have fallen into disuse. Also, their accounts were created before public_html was automatically created on account login (instead, users were expected to create it if they desired to use it... how times have changed).
For lab46 users who once actively logged in, there IS a file that will hold a nice timestamp of their effective last login: ~/.bash_history.
Let's see who this might affect:
lab46:/home$ sudo /bin/ls -l */.bash_history | grep '200[0-7]' | cut -c14-24 > /root/userdelcandidates2
This would not be the final list. I'd have to cross-check it against some other factors (public_html, or heck, even every other file in the directory for timestamps).
But as it is, that action alone tagged 107 users. Scanning the list, I do not see any false positives, and likely could just go ahead and backup/remove those listed (but I won't, for the sake of completeness).
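The one-liner above depends on the column positions of `ls -l` output. As an alternative, the modification year of each ~/.bash_history can be read directly. This is only a sketch, assuming GNU date (where `-r FILE` reports the file's modification time); the function names `history_year` and `stale_by_history` are made up for illustration:

```shell
#!/bin/bash
# history_year HOME_DIR
# Prints the four-digit year in which HOME_DIR/.bash_history was last
# modified; fails if the file does not exist.
# Assumption: GNU date, where "-r FILE" means "use FILE's mtime".
history_year() {
    local hist="$1/.bash_history"
    [ -f "$hist" ] || return 1
    date -r "$hist" +%Y
}

# stale_by_history BASE_DIR FIRST_YEAR LAST_YEAR
# Lists each directory name under BASE_DIR whose .bash_history year falls
# within FIRST_YEAR..LAST_YEAR (inclusive).
stale_by_history() {
    local base=$1 first=$2 last=$3 dir year
    for dir in "$base"/*/; do
        year=$(history_year "${dir%/}") || continue
        if [ "$year" -ge "$first" ] && [ "$year" -le "$last" ]; then
            basename "$dir"
        fi
    done
}
```

Something like `stale_by_history /home 2000 2007` would then produce a candidate list comparable to the one above, with users lacking a .bash_history simply skipped.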
Processing users whose .bash_history timestamp falls within the designated "stale" range, I came up with the following, which checks ALL files within that particular user's home directory for any sign of modern (read: 2009 or 2010) activity:
for user in `cat /root/userdelcandidates2`; do
echo -n "$user:"
/bin/ls -l $user/* $user/.[a-zA-Z]* | sed -e 's/^.......... .*lab46 *[1-9][0-9]* //g' | \
egrep '(2009|2010)' | grep -v pine | grep -v '2009-04-07' | wc -l
done > /root/userdelcandidates3
Which then allows me to do this:
for user in `grep ':0$' /root/userdelcandidates3 | cut -d: -f1`; do
tar cpf /tmp/userbackup/$user-20100105.tar $user
echo "$user ..."
gzip -9 /tmp/userbackup/$user-20100105.tar
rm -rf $user
done
106 users with that run (59+106 = 165 down, with 200 as my goal).
Although there were 2 false positives:
* one user had a filename that contained the substring '2010'
* one user who was in no way a stale user showed up in the list, because the size of their ~/.bash_history was '2005' blocks
So it sounds like I need to incorporate a clean separation of file names and sizes from time stamps. This would eliminate those above-encountered false positives nicely.
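One way to get that separation is to never parse `ls -l` at all and ask for the timestamp alone. A minimal sketch, assuming GNU stat (where `-c %y` prints the modification time as `YYYY-MM-DD HH:MM:SS...`); the function name `recent_file_count` is hypothetical, and unlike the `ls -l $user/*` approach it only looks at regular files directly inside the directory:

```shell
#!/bin/bash
# recent_file_count DIR YEAR...
# Counts regular files directly inside DIR (dotfiles included) whose
# modification year is one of the listed YEARs. Only the timestamp is
# inspected, so file names and sizes can never match by accident.
# Assumption: GNU stat ("-c %y" = human-readable mtime).
recent_file_count() {
    local dir=$1; shift
    local count=0 file year wanted
    for file in "$dir"/* "$dir"/.[!.]*; do
        [ -f "$file" ] || continue
        year=$(stat -c %y "$file" | cut -c1-4)
        for wanted in "$@"; do
            if [ "$year" = "$wanted" ]; then
                count=$((count + 1))
                break
            fi
        done
    done
    echo "$count"
}
```

With this, a file named `report-2010.txt` last touched in 2005 contributes nothing to `recent_file_count "$user" 2009 2010`, and a .bash_history of size 2005 is no longer mistaken for one dated 2005.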
====Script 3: The script that may rule them all====
As I considered the previous scripts, something occurred to me: be they a shell user or web-only user, the FILES hold the truth.
As I discovered with the .bash_history exploration above, I ended up scanning all files in the directory and then scoring each user by how many times 2009 or 2010 appeared. Because the candidate list came from .bash_history timestamps of 2000-2007 and only 2009 and 2010 counted as activity, some users who last used the system in 2008 could have been unintentionally snagged... whoops. I didn't see anyone that stuck out though, and I DID back up their data, so not a big deal in the long run.
What if we were to use the following script, in general (a lot more processor intensive, but then it makes it seem like lots of important work is being done):
for user in `/bin/ls -1`; do
echo -n "$user:"
/bin/ls -l $user/* $user/.[a-zA-Z]* | grep -v 'pine' | \
sed -e "s/^.*$user *[a-zA-Z0-9][a-zA-Z0-9]* *[0-9][0-9]* *\(20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] *[0-9][0-9]:[0-9][0-9]\) *.*$/\1/" | \
egrep '(2008|2009|2010)' | grep -v '2009-04-07' | wc -l
done | grep ':0$' > /tmp/userdelcandidates
Bam! Looks like that problem is solved. As it turns out, only one additional user popped up, so the grand total ends up at 166... not quite the 200 I thought I'd hit... but still, that's 166 fewer accounts to deal with.
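For comparison, the same "files hold the truth" idea can be expressed with find and a cutoff date instead of matching year strings in `ls -l` output. This is a sketch under assumptions, not the script that was actually run: it assumes a find that supports `-newermt` (GNU findutils does), the function name `activity_since` is hypothetical, and it mirrors the pine exclusion but not the special-casing of 2009-04-07:

```shell
#!/bin/bash
# activity_since BASE_DIR CUTOFF_DATE
# For each directory under BASE_DIR prints "name:count", where count is
# the number of regular files (at any depth, pine files excluded) that
# were modified after CUTOFF_DATE.
# Assumption: find supports -newermt.
activity_since() {
    local base=$1 cutoff=$2 dir count
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue
        count=$(find "$dir" -type f ! -name '*pine*' -newermt "$cutoff" | wc -l)
        echo "$(basename "$dir"):$((count))"
    done
}
```

The output has the same `user:count` shape as the script above, so `activity_since /home 2008-01-01 | grep ':0$'` would yield a comparable candidate list.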