Corning Community College
CSCS2700 Data Communications
Sleep Data Exploration
Typos and bug fixes:
An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition.
Sleep is a common area of study where this is particularly applicable, and is even somewhat of a modern-day fad: everything from smartphone apps to special wristbands can be used to monitor aspects of our sleep quality, and more products are coming to market all the time.
We will be analyzing data generated by a consumer-grade EEG headset: a device worn while sleeping that monitors brainwaves through conductive pads in contact with the skin of the forehead, and from them determines the wearer's level of activity (especially whether they are asleep, and at what level of sleep).
The data was obtained from a live session (me, sleeping) during my initial polyphasic sleep adaptation a few years ago, so there'll be opportunities to see “normal, boring” sleep patterns, transitions, and even more optimized sleep sessions (including rather restful 20-minute power naps).
The device used generated bytes of raw data, which I captured into individual data files. We will be learning how that data is structured so that we may parse it, and ultimately derive information such as sleep duration, type of sleep, etc.
We're just manipulating (reading/writing) bytes of data, and applying specific rules and methods to how we interpret various bytes, or sequences of bytes.
Once again there is a conceptual as well as practical angle… some people will struggle more with one over the other, and as always: questions are not just encouraged, they are expected for success!
EEG data is represented in the form of data packets– collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).
It is important to note that the data, in cases of multibyte values, is little endian in orientation.
The format of the data packet is as follows:
Field | Length (bytes) | Description |
---|---|---|
'A' (0x41) | 1 | character starting the packet |
'4' (0x34) | 1 | the protocol “version”, of which only '4' is currently supported |
checksum | 1 | a one-byte checksum formed by summing the datatype byte and all the datablock bytes, keeping only the low 8 bits of the sum |
msglen | 2 | a two-byte message length (little endian). This length counts the datatype byte plus the bytes of the datablock |
inv_msglen | 2 | the bitwise inverse (~) of msglen, sent for redundancy. If msglen does not match ~inv_msglen, we can start looking for the next packet immediately, instead of reading some arbitrary number of bytes based on a bad length |
time_sec | 1 | the lower 8 bits of the current unix time (when session was recorded) |
sub_sec | 2 | the 16-bit sub-second (runs through 0xFFFF in 1 second), LSB first |
seqnum | 1 | the 8-bit sequence number |
datatype | 1 | the datatype (see data type subfield table below) |
datablock | variable | the array of binary data |
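To make the field layout concrete, here is a minimal Python sketch of splitting one packet into its fields with the standard `struct` module. The names `HEADER`, `Packet`, and `parse_packet` are made up for illustration (they are not part of the project files), and the sample bytes are one of the packets visible in the hexdump of session-201211020309.raw later on this page. It does no checksum or length validation; it only slices the bytes apart.

```python
import struct
from typing import NamedTuple

# Everything before the datablock is 12 bytes. "<" means little endian with
# no padding: 'A', '4', checksum, msglen, inv_msglen, time_sec, sub_sec,
# seqnum, datatype.
HEADER = struct.Struct("<ccBHHBHBB")


class Packet(NamedTuple):
    checksum: int
    msglen: int
    inv_msglen: int
    time_sec: int
    sub_sec: int
    seqnum: int
    datatype: int
    datablock: bytes


def parse_packet(buf: bytes) -> Packet:
    """Split the packet starting at buf[0] into fields (hypothetical helper)."""
    (start, version, checksum, msglen, inv_msglen,
     time_sec, sub_sec, seqnum, datatype) = HEADER.unpack_from(buf)
    if start != b"A" or version != b"4":
        raise ValueError("buffer does not start with 'A4'")
    # msglen counts the datatype byte plus the datablock, so the datablock
    # is msglen - 1 bytes long and begins right after the 12-byte header.
    datablock = buf[HEADER.size:HEADER.size + msglen - 1]
    return Packet(checksum, msglen, inv_msglen, time_sec,
                  sub_sec, seqnum, datatype, datablock)


# A timestamp (0x8A) packet copied from the hexdump of the session file.
sample = bytes.fromhex("41 34 7F 05 00 FA FF D8 06 00 35 8A D8 3A 93 50")
pkt = parse_packet(sample)
print(pkt)
```

Using a fixed `struct` format for the first 12 bytes and slicing the rest keeps the variable-length datablock separate from the fixed-size fields.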
These are the data types generated by the EEG device and could manifest within the data file. Note that this data will be contained in the datatype field of the data packet, and any follow-up data will be present in the datablock array field.
Type ID (hex) | Type Name | Description |
---|---|---|
0x00 | event | an event has occurred (see event table below) |
0x02 | slice_end | marks the end of a slice of data (a slice can span multiple packets) |
0x03 | version | version of the raw data output |
0x80 | waveform | raw time domain brainwave |
0x83 | frequency_bins | frequency bins derived from waveform |
0x84 | signal | signal quality range of waveform (0-30) |
0x8A | timestamp | full timestamp from EEG device's RTC |
0x97 | impedance | impedance across the headband |
0x9C | badsignal | signal contains artifacts |
0x9D | sleepstage | current sleep stage (produced in 30 second samples, see sleepstage table below) |
These are the possible events generated by the EEG device and could manifest within the data file. Note that this data will be contained in the datablock field (the array) of the data packet when the datatype has been identified as an event.
Event ID | Event Name | Description |
---|---|---|
0x05 | session_start | data acquisition session has commenced |
0x07 | sleep_start | user is asleep |
0x0E | headset_disengaged | EEG headset has been set on dock |
0x0F | headset_engaged | EEG headset taken off dock |
0x10 | alarm_off | user turned off alarm functionality |
0x11 | alarm_snooze | user hit snooze, enabling the snooze delay on alarm functionality |
0x13 | alarm_play | set alarm is now going off |
0x15 | session_end | data acquisition session has ceased |
0x24 | headset_introduce | a new headband ID has been read |
These are the possible sleep stages recognized by the EEG device. This data will be located in the datablock field (the array) of the data packet when the data type has been identified as a sleepstage.
SleepStage ID | SleepStage Name | Description |
---|---|---|
0x00 | undefined | insufficient data to determine sleep stage |
0x01 | conscious | user is in an awakened state |
0x02 | rem | user is experiencing REM (Rapid Eye Movement) sleep |
0x03 | light | user is experiencing light sleep |
0x04 | deep | user is experiencing deep sleep (SWS) |
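The three lookup tables above translate naturally into dictionaries. The sketch below is one way to turn a datatype byte and its datablock into a readable label. Note the assumption: it treats the event or sleep stage ID as a little-endian integer occupying the whole datablock, which this page does not state outright, so verify it against real packets. The names `DATATYPES`, `EVENTS`, `SLEEP_STAGES`, and `describe` are invented for this example.

```python
DATATYPES = {
    0x00: "event", 0x02: "slice_end", 0x03: "version", 0x80: "waveform",
    0x83: "frequency_bins", 0x84: "signal", 0x8A: "timestamp",
    0x97: "impedance", 0x9C: "badsignal", 0x9D: "sleepstage",
}

EVENTS = {
    0x05: "session_start", 0x07: "sleep_start", 0x0E: "headset_disengaged",
    0x0F: "headset_engaged", 0x10: "alarm_off", 0x11: "alarm_snooze",
    0x13: "alarm_play", 0x15: "session_end", 0x24: "headset_introduce",
}

SLEEP_STAGES = {
    0x00: "undefined", 0x01: "conscious", 0x02: "rem",
    0x03: "light", 0x04: "deep",
}


def describe(datatype: int, datablock: bytes) -> str:
    """Return a short human-readable label for one packet's payload."""
    name = DATATYPES.get(datatype, f"unknown(0x{datatype:02X})")
    # ASSUMPTION: the ID fills the datablock as a little-endian integer.
    value = int.from_bytes(datablock, "little")
    if name == "event":
        return f"event: {EVENTS.get(value, hex(value))}"
    if name == "sleepstage":
        return f"sleepstage: {SLEEP_STAGES.get(value, hex(value))}"
    return name


print(describe(0x9D, bytes([0x02, 0x00, 0x00, 0x00])))
print(describe(0x00, bytes([0x05, 0x00, 0x00, 0x00])))
```

Unknown IDs fall back to their hex value rather than raising an error, which is handy when exploring a file whose contents you do not fully know yet.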
Frequency Bins are a measurement of the current waveform frequency being experienced, which is analyzed by the EEG device and factors into the Sleep Stage determination. This would be considered a more raw form of data, should additional analysis be desired.
ID | Frequency Range (Hz) | Band Name |
---|---|---|
0x00 | 2-4 | Delta |
0x01 | 4-8 | Theta |
0x02 | 8-13 | Alpha |
0x03 | 13-18 | Beta |
0x04 | 18-21 | Beta |
0x05 | 11-14 | Beta (sleep spindles) |
0x06 | 30-50 | Gamma |
With the use of a hex editor, we can manually identify and decode the EEG data packets, using the information provided above.
In the file session-201211020309.raw (November 2nd, 2012, core sleep session starting at 3:09am), the following data can be seen (snippeted from a bvi session):
00002580 3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA ?.......4.Q...A4....
00002594 FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 ....5..:.PA4........
000025A8 36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA 6.....A4@.......7.R.
000025BC 12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34 ..A4........8..:.PA4
If you look over in the ASCII field on the far right of the line started by offset 00002594, you will see a “:.PA4”… according to the data packet field breakdown above, the start of the packet will be an 'A', followed by a '4'… so seeing a fairly isolated “A4” is an excellent indication we are looking at a new data packet.
bvi informs us that the lone “A4” 2-byte sequence ('A' byte followed by '4' byte) is at offset 0000259E.
The byte prior to the next “A4” (the next line– 000025A8) occurs at offset 000025AD.
It would seem (especially upon converting 259E and 25AD to decimal) that there is a 15-byte difference (so a 16-byte length) to this particular packet. Let's dig deeper…
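Spotting “A4” by eye works for one packet; the same hunt can be sketched in Python. The example below searches the 80 bytes from the bvi snippet above for every 'A' '4' pair and reports the offsets as they would appear in the file. `find_starts` and `BASE` are names invented here. Keep in mind that an “A4” pair could also turn up by chance inside a datablock, so these are only candidate packet starts until the length and checksum tests pass.

```python
# The 80 bytes shown in the bvi snippet, which begins at file offset 0x2580.
dump = bytes.fromhex(
    "3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA "
    "FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 "
    "36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA "
    "12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34"
)
BASE = 0x2580


def find_starts(data: bytes, base: int = 0) -> list:
    """Return the offsets (plus base) of every 'A4' byte pair in data."""
    offsets = []
    pos = data.find(b"A4")
    while pos != -1:
        offsets.append(base + pos)
        pos = data.find(b"A4", pos + 1)
    return offsets


starts = find_starts(dump, BASE)
print([f"{offset:08X}" for offset in starts])
# Distance between consecutive candidates = length of each packet in between.
print([b - a for a, b in zip(starts, starts[1:])])
```

The second list should agree with the 16-byte length worked out by hand for the packet at 0000259E.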
First, to reduce analysis paralysis, let us extract specifically this packet.
We need the decimal equivalents of 259E and 25AD:
$ echo "ibase=16; 259E" | bc
9630
$ echo "ibase=16; 25AD" | bc
9645
$
And then, calculate their difference (how long is this packet):
$ echo "9645-9630" | bc
15
$
Okay, so we have 15 bytes of data following offset 9630 (decimal). We need to remember to include the byte at offset 9630, so 15+1=16 total bytes in this packet. Let us extract just that packet into a file for further analysis:
$ dd if=session-201211020309.raw of=packet bs=1 skip=9630 count=16
16+0 records in
16+0 records out
16 bytes (16 B) copied, 0.141976 s, 0.1 kB/s
$
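For comparison, the bc arithmetic and the dd extraction can be sketched in Python as well. The `extract` function is a made-up helper that mimics `dd bs=1 skip=... count=...`. Because the session file may not be sitting next to the script, the demo runs against a throwaway temporary file; pointing `extract` at session-201211020309.raw with `skip=first` and `count=length` would be the real-world use.

```python
import os
import tempfile


def extract(path: str, skip: int, count: int) -> bytes:
    """Return count bytes of the file at path, starting at byte offset skip."""
    with open(path, "rb") as f:
        f.seek(skip)
        return f.read(count)


first = int("259E", 16)    # offset of the 'A' that starts the packet
last = int("25AD", 16)     # last byte before the next 'A4'
length = last - first + 1  # +1 because both end bytes belong to the packet
print(first, last, length)

# Stand-in demo: a temporary file holding the byte values 0 through 255.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
demo = extract(tmp.name, 10, 4)
os.remove(tmp.name)
print(demo.hex())
```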
Finally, let's get a hexdump and further decode this arbitrary packet:
$ od -A x -t x1z -v packet
000000 41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00  >A4........6.....<
000010
$
Note that with our extraction from the data file, the original offset is no longer valid (we now have a file with JUST our packet in it, and our file begins at offset 0).
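Okay… let's break this down (reference the info tables above):
Bytes (hex) | Field | Value |
---|---|---|
41 | 'A' | start of packet |
34 | '4' | protocol version 4 |
06 | checksum | 0x06 (we will verify this later) |
05 00 | msglen | 0x0005 (little endian) |
fa ff | inv_msglen | 0xFFFA (little endian) |
d8 | time_sec | 0xD8 |
08 00 | sub_sec | 0x0008 (little endian) |
36 | seqnum | 0x36 |
03 | datatype | 0x03 (version) |
03 00 00 00 | datablock | the remaining 4 bytes |
According to this, our message length is 0x0005 (5 bytes in decimal).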
If you have questions about bit inversion: it merely flips every 0 to 1, and every 1 to 0. For our msglen of 0x0005, we have this:
msglen: 0x0005 = 0000 0000 0000 0101
inv_msglen: 0xFFFA = 1111 1111 1111 1010
The EEG device inverts the message length and places the result in the data packet so we can use it as a form of data validation, to make sure we're looking at a real packet (strategies like this are not uncommon: it is part of interfacing real world devices to the digital environments of computers).
And we see that the inverted message length checks out with the regular message length… we've passed one of the tests ensuring this is a valid packet.
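The same length test can be sketched in a few lines of Python, using the 16 bytes of the packet we extracted. One wrinkle worth knowing: Python integers have no fixed width, so `~` on its own produces a negative number and has to be masked down to 16 bits. The variable names here are illustrative only.

```python
import struct

# The extracted packet, exactly as od displayed it.
packet = bytes.fromhex("41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00")

# msglen and inv_msglen are two little-endian 16-bit values at offset 3.
msglen, inv_msglen = struct.unpack_from("<HH", packet, 3)

# Mask with 0xFFFF so the inverted value is confined to 16 bits.
lengths_agree = (~inv_msglen & 0xFFFF) == msglen

print(f"msglen=0x{msglen:04X} inv_msglen=0x{inv_msglen:04X} agree={lengths_agree}")
```

An equivalent check is `msglen ^ inv_msglen == 0xFFFF`: a value XORed with its own inverse has every bit set.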
It would seem the “message length” consists of the data type byte plus the length of the datablock. We see from the 2-byte message length sequence above that the msglen is 5 bytes… 1 of those bytes is the data type byte, which leaves 4 bytes remaining for the datablock array.
As the datablock is multibyte, it needs to be treated as little endian (lowest-order byte first, followed by the higher-order bytes)… we see from our hex display there are 4 bytes remaining in our packet:
03 00 00 00
So, doing a straight reversal, that would give us: 00 00 00 03, a 32-bit (4-byte) value, containing the number 3, the apparent version of things (different from the packet format version above).
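In Python the byte reversal is handled by `int.from_bytes`, which takes the byte order as an argument. This small sketch decodes the four datablock bytes both ways to show why the orientation matters (the variable names are just for illustration):

```python
datablock = bytes([0x03, 0x00, 0x00, 0x00])

# Little endian: the first byte is the least significant one.
version = int.from_bytes(datablock, byteorder="little")

# Reading the same bytes as big endian gives a very different number.
misread = int.from_bytes(datablock, byteorder="big")

print(version, misread)
```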
Let's address the checksum calculation skipped above… now that we know our data type + datablock bytes (all 5 of them), the checksum is calculated by adding together all 5 of those bytes (but only storing the result in a 1 byte storage space, which will likely mean wraparounds like it is nobody's business with more exotic values). Let's trace it out:
0x03 (data type) + 0x03 (first byte of datablock) + 0x00 (second byte of datablock) + 0x00 (third byte of datablock) + 0x00 (fourth byte of data block) = 0x06.
What was the value stored in the checksum field of our extracted data packet (byte #2, counting from 0)? 0x06. Aha! The sum of the data checks out (this is our other test to ensure packet data validity).
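As a sketch, the checksum rule fits in a one-line Python function: add the datatype byte to all the datablock bytes and keep only the low 8 bits. The function name `checksum` is invented for this example. The second call uses the timestamp packet from the earlier hexdump (datatype 0x8A, checksum byte 0x7F), where the sum really does overflow one byte, so the wraparound is visible.

```python
def checksum(datatype: int, datablock: bytes) -> int:
    """Sum the datatype byte and every datablock byte, truncated to 8 bits."""
    return (datatype + sum(datablock)) & 0xFF


# The version packet decoded above: 0x03 + 0x03 + 0 + 0 + 0.
print(hex(checksum(0x03, bytes([0x03, 0x00, 0x00, 0x00]))))

# The timestamp packet from the hexdump: the raw sum is 0x27F, stored as 0x7F.
print(hex(checksum(0x8A, bytes.fromhex("D83A9350"))))
```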
There we have it… one decoded packet, of potentially many.
Pretty awesome, right?
This week's project is located in the fall2015/sde0/ directory of the UNIX Public Directory, in an archive called: sleepfun.tar.bz2
Make a copy of this into your home directory somewhere and set to work.
NOTE: Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under src/datacomm/, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)
Upon extraction of the files in sleepfun.tar.bz2, you should have the following files:
Session files are named with the date and time of the start of the particular sleep session, encoded as follows (YYYYMMDDhhmm):
So, 201211020309 means 2012/11/02 at 3:09am was the recorded time of the start of this particular sleep session (I was experimenting with a dual core sleep schedule around this time, so this would have been my 2nd core).
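If you want your program to recover the start time from a file name, here is a minimal sketch using Python's `datetime.strptime`. The `session_start` helper and the assumption that every file is named session-YYYYMMDDhhmm.raw are based only on the one file name shown on this page.

```python
import re
from datetime import datetime


def session_start(filename: str) -> datetime:
    """Parse the start time out of a name like session-YYYYMMDDhhmm.raw."""
    match = re.fullmatch(r"session-(\d{12})\.raw", filename)
    if match is None:
        raise ValueError(f"unexpected session file name: {filename}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


start = session_start("session-201211020309.raw")
print(start.isoformat())
```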