Project: UNIX DATA RECOVERY (udr2)

Errata

Typos and bug fixes:

no fixes of note

Objective

Continuing our “1337 haxxing” series of projects, we've found that considerable self-imposed conceptual roadblocks keep us from applying otherwise simple computing properties (that data is a series of bytes and, ultimately, that everything is a file).

We resume our exploration with another practical example, this time based on real data generated by an EEG device. The intersection of hardware, software, and logic plays a vital role in problem solving (even if it just enables analysts to make more educated guesses), yet working at that intersection seems to be a skill that is increasingly taken for granted, and increasingly alien.

Background

An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition.

Sleep is a common area of study where this is particularly applicable, and is even something of a modern-day fad: everything from smartphone apps to special wristbands can be used to monitor aspects of our sleep quality, and more products are coming to market all the time.

We will be analyzing data generated by a consumer-grade EEG headset: basically a device one wears when going to sleep that, via conductive pads in contact with the skin on the forehead, monitors the wearer's brainwaves and can determine their level of activity (especially whether the wearer is asleep, and at what level of sleep).

The data was obtained from a live session (me, sleeping) during my initial polyphasic sleep adaptation a few years ago, so there'll be opportunities to see “normal, boring” sleep patterns, transitions, and even more optimized sleep sessions (including rather restful 20-minute power naps).

The device used generated bytes of raw data, which I captured into individual data files. We will be learning how that data is structured so that we may parse it, and ultimately derive information such as sleep duration, type of sleep, etc.

Like udr0 and udr1… we're just manipulating (reading/writing) bytes of data, and applying specific rules and methods to how we interpret various bytes, or sequences of bytes.

Once again there is a conceptual as well as practical angle… some people will struggle more with one over the other, and as always: questions are not just encouraged, they are expected for success!

EEG data packet format

EEG data is represented in the form of data packets– collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).

It is important to note that multibyte values in the data are stored in little endian order (least significant byte first).
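As a small illustration of little endian ordering, the following sketch decodes a two-byte value whose least significant byte comes first. The byte values are made up for the example and are not taken from the data file.

```python
# Two bytes as they might appear in a file, least significant byte first.
raw = bytes([0x0B, 0x01])

# byteorder="little" treats raw[0] as the low byte and raw[1] as the high byte.
value = int.from_bytes(raw, byteorder="little")

# The same computation done by hand: low byte OR (high byte shifted left 8 bits).
manual = raw[0] | (raw[1] << 8)

print(hex(value), value == manual)
```

Read as big endian, the same two bytes would instead give 0x0B01, which is why the byte order has to be known before any multibyte field can be interpreted.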

The format of the data packet is as follows:

Data Packet

Field	Length (bytes)	Description
'A' (0x41)	1	character starting the packet
4 (0x04)	1	the protocol “version”, of which only 4 is currently supported
checksum	1	a one byte checksum formed by summing the identifier byte and all the data bytes
msglen	2	a two byte message length (little endian). This length includes the size of the data payload plus the identifier
inv_msglen	2	the bitwise inverse of msglen, sent for redundancy. If msglen does not match ~inv_msglen, we can start looking for the next packet immediately, instead of reading some arbitrary number of bytes based on a bad length
time_sec	1	the lower 8 bits of the current unix time (when session was recorded)
sub_sec	2	the 16-bit sub-second (runs through 0xFFFF in 1 second), LSB first
seqnum	1	the 8-bit sequence number
datatype	1	the datatype (see data type subfield table below)
datablock	variable	the array of binary data
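The table above can be turned into a decoder fairly directly. What follows is only a sketch under stated assumptions, not the device's official parser: it assumes the checksum is the byte sum truncated to its low 8 bits, that msglen counts the datatype byte plus the datablock, and that one sub_sec tick is 1/65536 of a second. The names parse_packet and iter_packets, and the sample packet, are hypothetical.

```python
import struct

# Fixed-size fields in table order: start byte, version, checksum, msglen,
# inv_msglen, time_sec, sub_sec, seqnum ("<" means little endian, no padding).
HEADER = struct.Struct("<cBBHHBHB")  # 11 bytes

def parse_packet(buf, offset=0):
    """Decode the packet at buf[offset]; raise ValueError if any check fails."""
    if len(buf) - offset < HEADER.size:
        raise ValueError("truncated header")
    (start, version, checksum, msglen, inv_msglen,
     time_sec, sub_sec, seqnum) = HEADER.unpack_from(buf, offset)
    if start != b"A" or version != 4:
        raise ValueError("not at the start of a version 4 packet")
    # inv_msglen is the bitwise complement of msglen, kept to 16 bits.
    if msglen != (~inv_msglen & 0xFFFF):
        raise ValueError("msglen does not match ~inv_msglen")
    body_start = offset + HEADER.size
    body = buf[body_start:body_start + msglen]  # datatype byte + datablock
    if msglen < 1 or len(body) != msglen:
        raise ValueError("truncated packet")
    # Assumption: the one byte checksum is the sum of the body, modulo 256.
    if sum(body) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    return {
        "time_sec": time_sec,
        "seconds_fraction": sub_sec / 65536,  # assumed 1/65536 s per tick
        "seqnum": seqnum,
        "datatype": body[0],
        "datablock": body[1:],
        "next_offset": body_start + msglen,
    }

def iter_packets(buf):
    """Yield each valid packet in buf, resynchronising after bad bytes."""
    offset = 0
    while offset < len(buf):
        try:
            pkt = parse_packet(buf, offset)
        except ValueError:
            offset += 1  # skip one byte and look for the next start byte
            continue
        yield pkt
        offset = pkt["next_offset"]

# A made-up packet: datatype 0x9D followed by a single data byte 0x03.
body = bytes([0x9D, 0x03])
sample = HEADER.pack(b"A", 4, sum(body) & 0xFF, len(body),
                     ~len(body) & 0xFFFF, 0x2A, 0x8000, 7) + body

for pkt in iter_packets(b"\x00\xff" + sample + sample):
    print(hex(pkt["datatype"]), pkt["datablock"].hex(), pkt["seqnum"])
```

Checking msglen against ~inv_msglen before reading the body is what lets iter_packets skip forward one byte at a time after corruption, rather than trusting a bad length.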

Data Type Subfield of Data Packet

These are the data types generated by the EEG device and could manifest within the data file. Note that this data will be contained in the datatype field of the data packet, and any follow-up data will be present in the datablock array field.

Type ID (hex)	Type Name	Description
0x00	event	an event has occurred (see event table below)
0x02	slice_end	marks the end of a slice of data (a slice can span multiple packets)
0x03	version	version of the raw data output
0x80	waveform	raw time domain brainwave
0x83	frequency_bins	frequency bins derived from waveform
0x84	signal	signal quality range of waveform (0-30)
0x8A	timestamp	full timestamp from EEG device's RTC
0x97	impedance	impedance across the headband
0x9C	badsignal	signal contains artifacts
0x9D	sleepstage	current sleep stage (produced in 30 second samples, see sleepstage table below)
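One simple way to use this table while exploring the data is as a lookup from the datatype byte to its name. The sketch below assumes nothing beyond the table itself. The helper name datatype_name is hypothetical, and reporting IDs that are not in the table as "unknown" is a choice made for this example.

```python
# Type IDs and names copied from the data type table above.
DATATYPES = {
    0x00: "event",
    0x02: "slice_end",
    0x03: "version",
    0x80: "waveform",
    0x83: "frequency_bins",
    0x84: "signal",
    0x8A: "timestamp",
    0x97: "impedance",
    0x9C: "badsignal",
    0x9D: "sleepstage",
}

def datatype_name(type_id):
    """Return the table name for a datatype byte, or flag it as unknown."""
    return DATATYPES.get(type_id, f"unknown(0x{type_id:02X})")

print(datatype_name(0x9D))
print(datatype_name(0x55))
```

Flagging unlisted IDs instead of raising an error keeps a scan of the file going when it meets a byte that is not a real datatype.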

Event table

These are the possible events generated by the EEG device and could manifest within the data file. Note that this data will be contained in the datablock field (the array) of the data packet when the datatype has been identified as an event.

Event ID	Event Name	Description
0x05	session_start	data acquisition session has commenced
0x07	sleep_start	user is asleep
0x0E	headset_disengaged	EEG headset has been set on dock
0x0F	headset_engaged	EEG headset taken off dock
0x10	alarm_off	user turned off alarm functionality
0x11	alarm_snooze	user hit snooze, enabling the snooze delay on the alarm functionality
0x13	alarm_play	set alarm is now going off
0x15	session_end	data acquisition session has ceased
0x24	headset_introduce	a new headband ID has been read

Sleep Stage table

These are the possible sleep stages recognized by the EEG device. This data will be located in the datablock field (the array) of the data packet when the data type has been identified as a sleepstage.

SleepStage ID	SleepStage Name	Description
0x00	undefined	insufficient data to determine sleep stage
0x01	conscious	user is in an awakened state
0x02	rem	user is experiencing REM (Rapid Eye Movement) sleep
0x03	light	user is experiencing light sleep
0x04	deep	user is experiencing deep sleep (SWS)
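Because sleep stages are produced in 30 second samples, time spent in each stage can be estimated by counting samples. The following is a minimal sketch: the list of stage IDs is made up, the function name summarize is hypothetical, and how the IDs get extracted from each datablock is left aside.

```python
from collections import Counter

# Stage IDs and names copied from the sleep stage table above.
SLEEP_STAGES = {0: "undefined", 1: "conscious", 2: "rem", 3: "light", 4: "deep"}
SECONDS_PER_SAMPLE = 30  # one sleepstage value per 30 second sample

def summarize(stage_ids):
    """Return the number of seconds attributed to each named stage."""
    # IDs outside the table are grouped under "unknown" (a choice made here).
    counts = Counter(SLEEP_STAGES.get(s, "unknown") for s in stage_ids)
    return {name: n * SECONDS_PER_SAMPLE for name, n in counts.items()}

# Made-up sequence of consecutive sleepstage samples.
stages = [1, 1, 3, 3, 3, 4, 4, 2]
totals = summarize(stages)
print(totals)
```

For the sample list this attributes 60 seconds to conscious, 90 to light, 60 to deep, and 30 to rem.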

Frequency Bins table

Frequency Bins are a measurement of the current waveform frequency being experienced, which is analyzed by the EEG device and factors into the Sleep Stage determination. This would be considered a more raw form of data, should additional analysis be desired.

ID	Frequency Range (Hz)	Band Name
0x00	2-4	Delta
0x01	4-8	Theta
0x02	8-13	Alpha
0x03	13-18	Beta
0x04	18-21	Beta
0x05	11-14	Beta (sleep spindles)
0x06	30-50	Gamma

Obtain the file

This week's project is located in the spring2015/udr1/ directory of the UNIX Public Directory, in a file called: data.file

Make a copy of this into your home directory somewhere and set to work.

NOTE: Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under src/unix/, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)

Process

The data you seek (2 files) is obfuscated and contained within this file.

Plain text directions give clues on how to find both pieces of information, and it is up to you to use your skills to extract the necessary data.

Some additional information:

The first file should be named udr1.text and be properly oriented.
The second (big) file runs from the starting point until the very end of the file
It should be named 'gizmo', and reside in your current working directory.
gizmo is binary data, and entirely reversed- you need to get its bytes back in order (last byte should be first byte, 2nd to last should be 2nd, etc.)
- You are to write a shell script to perform the de-reversal of the data, reading from data.file and through whatever processing is needed, produce the file called gizmo.
The urev tool has some additional constraints with respect to gizmo… running it should notify you of any details you are lacking.
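To make “entirely reversed” concrete, here is a small sketch of the idea in Python. The project itself asks for a shell script, so this only illustrates the concept: the function name reverse_tail, the sample bytes, and the offset of 6 are all made up, and finding the real starting point in data.file is still up to you.

```python
def reverse_tail(data, start):
    """Return the bytes from offset `start` to the end, in reverse order."""
    return data[start:][::-1]

# Six bytes of made-up leading content, then three payload bytes stored backwards.
blob = b"HEADER" + b"\x03\x02\x01"

restored = reverse_tail(blob, 6)
print(restored.hex())
```

The last stored byte becomes the first, the second to last becomes the second, and so on, which is the reordering gizmo needs.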

Useful tools

You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered):

dd(1)
bc(1)
du(1)
bash(1) shell scripting
od(1)
bvi(1)
hexedit(1)

… along with other tools previously encountered.

Submission

Successful completion will result in the following criteria being met:

The resulting file, with proper settings, should enable you to run the urev tool.
You have completed all weekly exercises (96, I think) before the deadline, being mindful of the intentionally-paced nature of urev.
- Bonus opportunity: while still performing a minimum of 3 distinct urev sessions, how could you get around the urev-imposed time limit? (Without copying/changing urev).
When all is said and done, you will submit 3 files:
- udr1.text
  - Append the dd line(s) as well as any other command lines needed to extract and properly re-orient the file. Also be sure to indicate what is in the file you found (content, not just type of data).
- your bash script enabling the processing of data.file to produce gizmo
  - Be sure to include comments indicating the reasoning behind actions taken
- Your extracted/processed gizmo file

Submit

Please submit as follows:

lab46:~/src/unix/udr1$ submit unix udr1 udr1.text getgizmo.bash gizmo
Submitting unix project "udr1":
    -> udr1.text(OK) 
    -> getgizmo.bash(OK)
    -> gizmo(OK) 

SUCCESSFULLY SUBMITTED
lab46:~/src/unix/udr1$
