projects

  • uxi0 (due 20150128)
  • arc0 (due 20150204)
  • pbx0 (due 20150211)
  • pbx1 (due 20150225)
  • udr0 (due 20150311)
  • udr1 (due 20150318)
  • udr2 (due 20150408)
  • EoCE - bottom of Opus (due 20150514 by 4:30pm)
Corning Community College

CSCS1730 UNIX/Linux Fundamentals

Project: UNIX DATA RECOVERY (udr2)

Errata

Typos and bug fixes:

  • no fixes of note

Objective

Continuing our “1337 haxxing” series of projects, we've found that considerable self-imposed conceptual roadblocks keep us from applying otherwise simple computing principles (that data is a series of bytes and, ultimately, that everything is a file).

We resume our exploration with another practical example, this time based on real data generated by an EEG device. The intersection of hardware, software, and logic plays a vital role in problem-solving activities (even if it just enables analysts to make more educated guesses), and working at that intersection seems to be a skill that is increasingly taken for granted, yet increasingly alien.

Background

An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition.

Sleep is a common area of study where this is particularly applicable, and is even something of a modern-day fad: everything from smartphone apps to special wristbands can be used to monitor aspects of our sleep quality, and more products are coming to market all the time.

We will be analyzing data generated by a consumer-grade EEG headset: basically a device one wears when going to sleep that, via conductive pads in contact with the skin on the forehead, monitors the wearer's brainwaves and can determine their level of activity (especially with regard to whether the wearer is asleep, and at what level of sleep).

The data was obtained from a live session (me, sleeping) during my initial polyphasic sleep adaptation a few years ago, so there'll be opportunities to see “normal, boring” sleep patterns, transitions, and even more optimized sleep sessions (including rather restful 20-minute power naps).

The device used generated bytes of raw data, which I captured into individual data files. We will be learning how that data is structured so that we may parse it, and ultimately derive information such as sleep duration, type of sleep, etc.

Like udr0 and udr1… we're just manipulating (reading/writing) bytes of data, and applying specific rules and methods to how we interpret various bytes, or sequences of bytes.

Once again there is a conceptual as well as practical angle… some people will struggle more with one over the other, and as always: questions are not just encouraged, they are expected for success!

EEG data packet format

EEG data is represented in the form of data packets– collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).

It is important to note that multibyte values in the data are stored in little-endian order (least significant byte first).
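To make the byte ordering concrete, here is a minimal Python sketch of decoding a two-byte little-endian value. The bytes 0x34 0x12 are a made-up example, not taken from the project data.

```python
# Hypothetical two-byte field as it would sit in the file: low byte first.
raw = bytes([0x34, 0x12])

# Let Python do the reordering...
value = int.from_bytes(raw, byteorder="little")

# ...or do it by hand: low byte, plus the high byte shifted up 8 bits.
manual = raw[0] | (raw[1] << 8)

print(hex(value), hex(manual))
```

Both approaches give 0x1234, not 0x3412, which is what reading the bytes in file order would suggest.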

The format of the data packet is as follows:

Data Packet

Field      | Length (bytes) | Description
'A' (0x41) | 1              | character starting the packet
4 (0x04)   | 1              | the protocol “version”, of which only 4 is currently supported
checksum   | 1              | a one-byte checksum formed by summing the identifier byte and all the data bytes
msglen     | 2              | a two-byte message length (little endian); this length includes the size of the data payload plus the identifier
inv_msglen | 2              | the inverse of msglen, sent for redundancy; if msglen does not match ~inv_msglen, we can start looking for the next packet immediately, instead of reading some arbitrary number of bytes based on a bad length
time_sec   | 1              | the lower 8 bits of the current unix time (when session was recorded)
sub_sec    | 2              | the 16-bit sub-second (runs through 0xFFFF in 1 second), LSB first
seqnum     | 1              | the 8-bit sequence number
datatype   | 1              | the datatype (see data type subfield table below)
datablock  | variable       | the array of binary data
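As a rough illustration of how this layout could be walked in practice, here is a Python sketch that parses packets and resynchronizes after a bad one. It rests on a few assumptions not spelled out above: that the “identifier” is the datatype byte, that the checksum is the byte sum truncated to 8 bits, and that msglen therefore counts the datatype byte plus the datablock. The function names and the sample packet are invented for the example.

```python
import struct

# The 11 fixed bytes that precede the datatype byte (little endian, no padding):
# c=start char, B=version, B=checksum, H=msglen, H=inv_msglen,
# B=time_sec, H=sub_sec, B=seqnum
HEADER = struct.Struct("<cBBHHBHB")


def parse_packet(buf, pos=0):
    """Return (packet_dict, next_pos) for a valid packet at buf[pos], else None."""
    if len(buf) - pos < HEADER.size:
        return None
    (start, version, checksum, msglen, inv_msglen,
     time_sec, sub_sec, seqnum) = HEADER.unpack_from(buf, pos)
    if start != b"A" or version != 4:
        return None
    # msglen must equal the bitwise inverse of inv_msglen (as 16-bit values).
    if msglen < 1 or msglen != (~inv_msglen & 0xFFFF):
        return None
    body_start = pos + HEADER.size
    body = buf[body_start:body_start + msglen]  # datatype byte + datablock
    if len(body) != msglen:
        return None
    # Assumption: the one-byte checksum is the sum of these bytes modulo 256.
    if sum(body) & 0xFF != checksum:
        return None
    packet = {
        "time_sec": time_sec,
        "sub_sec": sub_sec,
        "seqnum": seqnum,
        "datatype": body[0],
        "datablock": body[1:],
    }
    return packet, body_start + msglen


def scan_packets(buf):
    """Yield every valid packet in buf, moving on one byte at a time after errors."""
    pos = 0
    while pos < len(buf):
        result = parse_packet(buf, pos)
        if result is None:
            pos += 1  # bad packet: start looking for the next 'A' right away
            continue
        packet, pos = result
        yield packet


# Hypothetical packet: datatype 0x9D with a one-byte datablock of 0x03.
body = bytes([0x9D, 0x03])
sample = HEADER.pack(b"A", 4, sum(body) & 0xFF, len(body),
                     ~len(body) & 0xFFFF, 0x10, 0x8000, 1) + body

# Two junk bytes followed by two good packets.
packets = list(scan_packets(b"\x00\x00" + sample + sample))
print(len(packets), hex(packets[0]["datatype"]), packets[0]["datablock"])
```

Checking msglen against inv_msglen before trusting the length is what lets the scanner skip garbage cheaply: a bad length is rejected without reading a possibly huge, meaningless payload.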

Data Type Subfield of Data Packet

These are the data types generated by the EEG device that could appear in the data file. Note that this value is contained in the datatype field of the data packet, and any follow-up data will be present in the datablock array field.

Type ID (hex) | Type Name      | Description
0x00          | event          | an event has occurred (see event table below)
0x02          | slice_end      | marks the end of a slice of data (a slice can span multiple packets)
0x03          | version        | version of the raw data output
0x80          | waveform       | raw time domain brainwave
0x83          | frequency_bins | frequency bins derived from waveform
0x84          | signal         | signal quality range of waveform (0-30)
0x8A          | timestamp      | full timestamp from EEG device's RTC
0x97          | impedance      | impedance across the headband
0x9C          | badsignal      | signal contains artifacts
0x9D          | sleepstage     | current sleep stage (produced in 30 second samples, see sleepstage table below)
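One way to put this table to work is as a lookup dictionary, for instance to tally which packet types show up in a file. The sketch below is only illustrative: the helper names and the list of IDs in `seen` are invented, and 0x42 is included deliberately as an ID that is not in the table.

```python
from collections import Counter

# Datatype IDs exactly as listed in the table above.
DATATYPES = {
    0x00: "event",
    0x02: "slice_end",
    0x03: "version",
    0x80: "waveform",
    0x83: "frequency_bins",
    0x84: "signal",
    0x8A: "timestamp",
    0x97: "impedance",
    0x9C: "badsignal",
    0x9D: "sleepstage",
}


def datatype_name(type_id):
    """Translate a datatype byte into its name, flagging IDs not in the table."""
    return DATATYPES.get(type_id, "unknown (0x%02X)" % type_id)


def tally(type_ids):
    """Count how often each datatype name occurs in a sequence of IDs."""
    return Counter(datatype_name(t) for t in type_ids)


# Hypothetical sequence of datatype bytes pulled from successive packets.
seen = [0x8A, 0x80, 0x80, 0x9D, 0x00, 0x42]
counts = tally(seen)
for name, n in counts.most_common():
    print(name, n)
```

Reporting unknown IDs rather than silently dropping them is a cheap sanity check: a file full of unknown types usually means the packet boundaries are being misread.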

Event table

These are the possible events generated by the EEG device that could appear in the data file. Note that this value is contained in the datablock field (the array) of the data packet when the datatype has been identified as an event.

Event ID | Event Name         | Description
0x05     | session_start      | data acquisition session has commenced
0x07     | sleep_start        | user is asleep
0x0E     | headset_disengaged | EEG headset has been set on dock
0x0F     | headset_engaged    | EEG headset taken off dock
0x10     | alarm_off          | user turned off alarm functionality
0x11     | alarm_snooze       | user hit snooze (snooze delay enabled on alarm functionality)
0x13     | alarm_play         | set alarm is now going off
0x15     | session_end        | data acquisition session has ceased
0x24     | headset_introduce  | a new headband ID has been read
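Decoding an event then comes down to a table lookup on the datablock. The width of the event value inside the datablock is not stated here, so this Python sketch assumes the ID sits in the first byte; since the data is little endian and every ID fits in one byte, that should hold whether the value is stored in one byte or several. The function name is made up for the example.

```python
# Event IDs exactly as listed in the table above.
EVENTS = {
    0x05: "session_start",
    0x07: "sleep_start",
    0x0E: "headset_disengaged",
    0x0F: "headset_engaged",
    0x10: "alarm_off",
    0x11: "alarm_snooze",
    0x13: "alarm_play",
    0x15: "session_end",
    0x24: "headset_introduce",
}


def event_name(datablock):
    """Name the event held in an event packet's datablock.

    Assumption: the event ID is the first (least significant) byte.
    """
    if not datablock:
        return "empty datablock"
    event_id = datablock[0]
    return EVENTS.get(event_id, "unknown event (0x%02X)" % event_id)


# Hypothetical datablocks: a one-byte form and a padded four-byte form.
print(event_name(bytes([0x07])))
print(event_name(bytes([0x15, 0x00, 0x00, 0x00])))
```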

Sleep Stage table

These are the possible sleep stages recognized by the EEG device. This data will be located in the datablock field (the array) of the data packet when the data type has been identified as a sleepstage.

SleepStage ID | SleepStage Name | Description
0x00          | undefined       | insufficient data to determine sleep stage
0x01          | conscious       | user is in an awakened state
0x02          | rem             | user is experiencing REM (Rapid Eye Movement) sleep
0x03          | light           | user is experiencing light sleep
0x04          | deep            | user is experiencing deep sleep (SWS)
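Because sleep stages are produced in 30 second samples, time spent in each stage can be estimated by counting samples. Here is a small Python sketch of that arithmetic; the list of stage IDs is invented for illustration, and `stage_minutes` is a hypothetical helper, not part of any provided tool.

```python
# Sleep stage IDs exactly as listed in the table above.
STAGES = {
    0x00: "undefined",
    0x01: "conscious",
    0x02: "rem",
    0x03: "light",
    0x04: "deep",
}

SAMPLE_SECONDS = 30  # one sleepstage value is produced per 30 seconds


def stage_minutes(stage_ids):
    """Return minutes spent in each stage, given one stage ID per sample."""
    totals = {}
    for sid in stage_ids:
        name = STAGES.get(sid, "unknown")
        totals[name] = totals.get(name, 0) + SAMPLE_SECONDS
    return {name: secs / 60 for name, secs in totals.items()}


# Hypothetical five minutes of samples: awake, then light, deep, and REM.
samples = [1, 1, 3, 3, 3, 3, 4, 4, 2, 2]
summary = stage_minutes(samples)
for name, minutes in summary.items():
    print(name, minutes)
```

Ten samples cover 10 × 30 = 300 seconds, so the per-stage minutes should always add up to half the sample count.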

Frequency Bins table

Frequency Bins are a measurement of the current waveform frequency being experienced, which is analyzed by the EEG device and factors into the Sleep Stage determination. This would be considered a more raw form of data, should additional analysis be desired.

ID   | Range (Hz) | Band
0x00 | 2-4        | Delta
0x01 | 4-8        | Theta
0x02 | 8-13       | Alpha
0x03 | 13-18      | Beta
0x04 | 18-21      | Beta
0x05 | 11-14      | Beta (sleep spindles)
0x06 | 30-50      | Gamma

Obtain the file

This week's project is located in the spring2015/udr1/ directory of the UNIX Public Directory, in a file called: data.file

Make a copy of this into your home directory somewhere and set to work.

NOTE: Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under src/unix/, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)

Process

The data you seek (2 files) is obfuscated and contained within this file.

Plain text directions give clues on how to find both pieces of information, and it is up to you to use your skills to extract the necessary data.

Some additional information:

  • The first file should be named udr1.text and be properly oriented.
  • The second (big) file runs from the starting point until the very end of the file
  • It should be named 'gizmo', and reside in your current working directory.
  • gizmo is binary data, and entirely reversed: you need to get its bytes back in order (last byte should be first byte, 2nd to last should be 2nd, etc.)
    • You are to write a shell script to perform the de-reversal of the data, reading from data.file and through whatever processing is needed, produce the file called gizmo.
  • The urev tool has some additional constraints with respect to gizmo… running it should notify you of any details you are lacking.
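The project calls for a shell script, so treat the following Python sketch purely as a way to reason about (or cross-check) the de-reversal step. It assumes the starting offset is already known; the offset of 6, the temporary file names, and the file contents in the demonstration are all invented.

```python
import os
import tempfile


def extract_reversed(src_path, dst_path, offset):
    """Copy everything from offset to the end of src_path into dst_path, reversed."""
    with open(src_path, "rb") as src:
        src.seek(offset)          # skip whatever precedes the starting point
        tail = src.read()         # read through to the very end of the file
    with open(dst_path, "wb") as dst:
        dst.write(tail[::-1])     # last byte becomes first, and so on
    return len(tail)


# Demonstration on a throwaway file: 6 bytes of filler, then 3 reversed bytes.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "data.file")
    dst_path = os.path.join(workdir, "gizmo")
    with open(src_path, "wb") as f:
        f.write(b"FILLER" + b"\x03\x02\x01")
    count = extract_reversed(src_path, dst_path, offset=6)
    with open(dst_path, "rb") as f:
        recovered = f.read()

print(count, recovered)
```

This reads the whole tail into memory, which is fine for modestly sized files. Comparing its output against the result of your shell script (with cmp(1), for instance) is a quick way to confirm the script got every byte.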

Useful tools

You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered):

  • dd(1)
  • bc(1)
  • du(1)
  • bash(1) shell scripting
  • od(1)
  • bvi(1)
  • hexedit(1)

… along with other tools previously encountered.

Submission

Successful completion will result in the following criteria being met:

  • The resulting file, with proper settings, should enable you to run the urev tool.
  • You have completed all weekly exercises (96, I think) before the deadline, being mindful of the intentionally-paced nature of urev.
    • Bonus opportunity: while still performing a minimum of 3 distinct urev sessions, how could you get around the urev-imposed time limit? (Without copying/changing urev).
  • When all is said and done, you will submit 3 files:
    • udr1.text
      • Append the dd line(s) as well as any other command lines needed to extract and properly re-orient the file. Also be sure to indicate what is in the file you found (content, not just type of data).
    • your bash script enabling the processing of data.file to produce gizmo
      • Be sure to include comments indicating the reasoning behind actions taken
    • Your extracted/processed gizmo file

Submit

Please submit as follows:

lab46:~/src/unix/udr1$ submit unix udr1 udr1.text getgizmo.bash gizmo
Submitting unix project "udr1":
    -> udr1.text(OK) 
    -> getgizmo.bash(OK)
    -> gizmo(OK) 

SUCCESSFULLY SUBMITTED
lab46:~/src/unix/udr1$ 
haas/spring2015/unix/projects/udr2.1426509302.txt.gz · Last modified: 2015/03/16 12:35 by wedge