Corning Community College

CSCS2700 Data Communications

Sleep Data Exploration

Project: SLEEP DATA EXPLORATION (sde0)

Errata

Typos and bug fixes:

  • <DESCRIPTION> (DATESTRING)

Background

An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition.

Sleep is a common area of study where this is particularly applicable, and it has become something of a modern-day fad: everything from smartphone apps to special wristbands can be used to monitor aspects of our sleep quality, and more products are coming to market all the time.

We will be analyzing data generated by a consumer-grade EEG headset: a device one wears when going to sleep that, via conductive pads in contact with the skin on the forehead, monitors the wearer's brainwaves and determines their level of activity (especially whether the wearer is asleep, and at what stage of sleep).

The data was obtained from live sessions (me, sleeping) during my initial polyphasic sleep adaptation a few years ago, so there'll be opportunities to see “normal, boring” sleep patterns, transitions, and even more optimized sleep sessions (including rather restful 20-minute power naps).

The device used generated bytes of raw data, which I captured into individual data files. We will be learning how that data is structured so that we may parse it, and ultimately derive information such as sleep duration, type of sleep, etc.

We're just manipulating (reading/writing) bytes of data, and applying specific rules and methods to how we interpret various bytes, or sequences of bytes.

Once again there is a conceptual as well as practical angle… some people will struggle more with one over the other, and as always: questions are not just encouraged, they are expected for success!

EEG data packet format

EEG data is represented in the form of data packets: collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).

It is important to note that multibyte values in the data are little endian (the lowest-order byte is stored first).

The format of the data packet is as follows:

Data Packet

Field         Length (bytes)  Description
'A' (0x41)    1               character starting the packet
'4' (0x34)    1               the protocol “version”, of which only '4' is currently supported
checksum      1               a one-byte checksum formed by summing the identifier (datatype) byte and all the data bytes
msglen        2               a two-byte message length (little endian); this length includes the size of the data payload plus the identifier (datatype) byte
inv_msglen    2               the bitwise inverse of msglen, sent for redundancy; if msglen does not match ~inv_msglen, we can start looking for the next packet immediately, instead of reading some arbitrary number of bytes based on a bad length
time_sec      1               the lower 8 bits of the current UNIX time (when the session was recorded)
sub_sec       2               the 16-bit sub-second counter (runs through 0xFFFF in 1 second), LSB first
seqnum        1               the 8-bit sequence number
datatype      1               the datatype (see data type subfield table below)
datablock     variable        the array of binary data
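As a rough sketch of how the table above maps onto code, the fixed part of a packet can be unpacked in one step with Python's struct module. The names HEADER, Packet, and parse_packet are made up for this illustration, and the sample bytes are the packet that gets decoded by hand in the Example Analysis section further down this page. This assumes the buffer starts exactly on a packet boundary and that msglen is at least 1.

```python
import struct
from collections import namedtuple

# Hypothetical names; the field order follows the packet table on this page.
# "<" = little endian with no padding: 2s B H H B H B adds up to 11 bytes.
HEADER = struct.Struct("<2sBHHBHB")
Packet = namedtuple(
    "Packet",
    "checksum msglen inv_msglen time_sec sub_sec seqnum datatype datablock",
)

def parse_packet(buf):
    """Split one packet (starting at buf[0]) into its fields."""
    magic, checksum, msglen, inv_msglen, time_sec, sub_sec, seqnum = \
        HEADER.unpack_from(buf, 0)
    if magic != b"A4":
        raise ValueError("not at the start of a version-4 packet")
    # msglen counts the datatype byte plus the datablock bytes.
    payload = buf[HEADER.size:HEADER.size + msglen]
    return Packet(checksum, msglen, inv_msglen, time_sec, sub_sec, seqnum,
                  payload[0], payload[1:])

sample = bytes.fromhex("41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00")
p = parse_packet(sample)
print(p.msglen, hex(p.inv_msglen), hex(p.datatype), p.datablock.hex())
# 5 0xfffa 0x3 03000000
```

Note that this does no validation beyond the two start bytes; the length and checksum tests still need to be applied before trusting the fields.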

Data Type Subfield of Data Packet

These are the data types generated by the EEG device that may appear within the data file. Note that the type ID will be contained in the datatype field of the data packet, and any follow-up data will be present in the datablock array field.

Type ID (hex)  Type Name       Description
0x00           event           an event has occurred (see event table below)
0x02           slice_end       marks the end of a slice of data (a slice can span multiple packets)
0x03           version         version of the raw data output
0x80           waveform        raw time domain brainwave
0x83           frequency_bins  frequency bins derived from waveform
0x84           signal          signal quality range of waveform (0-30)
0x8A           timestamp       full timestamp from EEG device's RTC
0x97           impedance       impedance across the headband
0x9C           badsignal       signal contains artifacts
0x9D           sleepstage      current sleep stage (produced in 30 second samples, see sleepstage table below)

Event table

These are the possible events generated by the EEG device that may appear within the data file. Note that the event ID will be contained in the datablock field (the array) of the data packet when the datatype has been identified as an event.

Event ID  Event Name          Description
0x05      session_start       data acquisition session has commenced
0x07      sleep_start         user is asleep
0x0E      headset_disengaged  EEG headset has been set on dock
0x0F      headset_engaged     EEG headset taken off dock
0x10      alarm_off           user turned off alarm functionality
0x11      alarm_snooze        user hit snooze, enabling the snooze delay on the alarm functionality
0x13      alarm_play          set alarm is now going off
0x15      session_end         data acquisition session has ceased
0x24      headset_introduce   a new headband ID has been read
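One possible way to turn the two tables above into something a program can use is a pair of lookup dictionaries. The function name describe is invented for this sketch, and it assumes the event ID is stored as a little-endian integer filling the datablock (this page does not spell out the exact width, so treat that as a guess to verify against real data).

```python
# Tables transcribed from this page; describe() is a made-up helper name.
DATATYPES = {
    0x00: "event", 0x02: "slice_end", 0x03: "version", 0x80: "waveform",
    0x83: "frequency_bins", 0x84: "signal", 0x8A: "timestamp",
    0x97: "impedance", 0x9C: "badsignal", 0x9D: "sleepstage",
}
EVENTS = {
    0x05: "session_start", 0x07: "sleep_start", 0x0E: "headset_disengaged",
    0x0F: "headset_engaged", 0x10: "alarm_off", 0x11: "alarm_snooze",
    0x13: "alarm_play", 0x15: "session_end", 0x24: "headset_introduce",
}

def describe(datatype, datablock):
    """Return a readable name for a packet's datatype (and event, if any)."""
    name = DATATYPES.get(datatype, "unknown(0x%02X)" % datatype)
    if name == "event":
        # Assumption: the event ID is a little-endian integer in the datablock.
        event_id = int.from_bytes(datablock, "little")
        return "event: " + EVENTS.get(event_id, "unknown(0x%02X)" % event_id)
    return name

print(describe(0x00, bytes([0x05, 0x00, 0x00, 0x00])))  # event: session_start
print(describe(0x8A, bytes.fromhex("d83a9350")))        # timestamp
print(describe(0x42, b""))                              # unknown(0x42)
```

Falling back to an "unknown" label rather than raising an error keeps a decoder running when it meets an ID that is not in these tables.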

Sleep Stage table

These are the possible sleep stages recognized by the EEG device (this data will be located in the datablock field (the array) of the data packet when the data type has been identified as a sleepstage).

SleepStage ID  SleepStage Name  Description
0x00           undefined        insufficient data to determine sleep stage
0x01           conscious        user is in an awakened state
0x02           rem              user is experiencing REM (Rapid Eye Movement) sleep
0x03           light            user is experiencing light sleep
0x04           deep             user is experiencing deep sleep (SWS)
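Since a sleepstage value is produced once per 30-second sample, a tally of stage IDs converts directly into time spent in each stage. The sketch below assumes the stage IDs have already been pulled out of the sleepstage packets; the function name and the input list are invented purely for illustration and are not taken from any session file.

```python
from collections import Counter

STAGES = {0: "undefined", 1: "conscious", 2: "rem", 3: "light", 4: "deep"}
SECONDS_PER_SAMPLE = 30  # per the datatype table: one sleepstage per 30 seconds

def minutes_per_stage(stage_ids):
    """Map each stage name to the number of minutes spent in it."""
    counts = Counter(STAGES.get(s, "unknown") for s in stage_ids)
    return {name: n * SECONDS_PER_SAMPLE / 60 for name, n in counts.items()}

# Made-up sequence for illustration only.
example = [1, 1, 3, 3, 3, 3, 4, 4, 2, 2]
print(minutes_per_stage(example))
# {'conscious': 1.0, 'light': 2.0, 'deep': 1.0, 'rem': 1.0}
```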

Frequency Bins table

Frequency Bins are a measurement of the current waveform frequency being experienced, which is analyzed by the EEG device and factors into the Sleep Stage determination. This would be considered a more raw form of data, should additional analysis be desired.

ID    Named Range (Hz)  Description
0x00  2-4               Delta
0x01  4-8               Theta
0x02  8-13              Alpha
0x03  13-18             Beta
0x04  18-21             Beta
0x05  11-14             Beta (sleep spindles)
0x06  30-50             Gamma

Example Analysis

With the use of a hex editor, we can manually identify and decode the EEG data packets, using the information provided above.

In the file session-201211020309.raw (November 2nd, 2012, core sleep session starting at 3:09am), the following data can be seen (excerpted from a bvi session):

00002580  3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA ?.......4.Q...A4....
00002594  FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 ....5..:.PA4........
000025A8  36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA 6.....A4@.......7.R.
000025BC  12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34 ..A4........8..:.PA4

If you look over in the ASCII field on the far right of the line started by offset 00002594, you will see a “:.PA4”… according to the data packet field breakdown above, the start of the packet will be an 'A', followed by a '4'… so seeing a fairly isolated “A4” is an excellent indication we are looking at a new data packet.

bvi informs us that the lone “A4” 2-byte sequence ('A' byte followed by '4' byte) is at offset 0000259E.

The byte prior to the next “A4” (on the next line, which begins at offset 000025A8) is at offset 000025AD.

It would seem (especially upon converting 259E and 25AD to decimal) that there is a 15-byte difference (so a 16-byte length) to this particular packet. Let's dig deeper…

First, to reduce analysis paralysis, let us extract specifically this packet.

We need the decimal equivalents of 259E and 25AD:

$ echo "ibase=16; 259E" | bc
9630
$ echo "ibase=16; 25AD" | bc
9645
$ 

And then, calculate their difference (how long is this packet):

$ echo "9645-9630" | bc
15
$ 

Okay, so we have 15 bytes of data following offset 9630 (decimal). We need to remember to include the byte at offset 9630, so 15+1=16 total bytes in this packet. Let us extract just that packet into a file for further analysis:

$ dd if=session-201211020309.raw of=packet bs=1 skip=9630 count=16
16+0 records in
16+0 records out
16 bytes (16 B) copied, 0.141976 s, 0.1 kB/s
$ 
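For comparison, the same bc-plus-dd steps can be sketched in a few lines of Python: convert the two hex offsets, work out the inclusive length, then seek and read. The helper name extract is made up for this example, and the commented-out call assumes the session file is in the current directory.

```python
def extract(path, first_hex, last_hex):
    """Return the bytes from offset first_hex through last_hex, inclusive."""
    first = int(first_hex, 16)    # e.g. "259E" -> 9630
    last = int(last_hex, 16)      # e.g. "25AD" -> 9645
    count = last - first + 1      # inclusive range, hence the +1
    with open(path, "rb") as f:
        f.seek(first)             # same role as dd's skip= with bs=1
        return f.read(count)      # same role as dd's count=

print(int("259E", 16), int("25AD", 16))   # 9630 9645

# With the session file present:
# packet = extract("session-201211020309.raw", "259E", "25AD")
# print(len(packet))
```

Reading with seek avoids dd's byte-at-a-time copy, though for a 16-byte packet the difference hardly matters.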

Finally, let's get a hexdump and further decode this arbitrary packet:

$ od -A x -t x1z -v packet 
000000 41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00  >A4........6.....<
000010
$ 

Note that with our extraction from the data file, the original offset is no longer valid (we now have a file with JUST our packet in it, and our file begins at offset 0).

Okay… let's break this down (reference the info tables above):

  • byte 0: packet start (0x41 – 'A')
  • byte 1: protocol version (0x34 – '4')
  • byte 2: checksum– see below for calculation (0x06)
  • byte 3: lower-order byte of message length (0x05)
  • byte 4: upper-order byte of message length (0x00)

According to this, our message length is 0x0005 (or 5 in decimal) bytes long.

  • byte 5: lower-order inverted byte of message length (0xFA, the inverse of the 0x05 above)
  • byte 6: upper-order inverted byte of message length (0xFF, the inverse of the 0x00 above)

If you have questions about bit inversions, it is merely flipping 0 to 1, and 1 to 0. In our 0x05 example, we have this:

  • normal: 00000101 (05) or 0000 (0) 0101 (5)
  • inverted: 11111010 (FA) or 1111 (F) 1010 (A)

The EEG device is inverting the message length data and placing them in our data packet so we can use it as a form of data validation, to make sure we're looking at a real packet (strategies like this are not uncommon– it is part of interfacing real world devices to the digital environments of computers).

And we see that the inverted message length checks out with the regular message length… we've passed one of the tests ensuring this is a valid packet.
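The length test above might look like this in code. One Python-specific wrinkle: integers there have no fixed width, so the complement has to be masked back down to 16 bits before comparing. The function name is invented for this sketch.

```python
def length_is_valid(msglen, inv_msglen):
    """True when inv_msglen is the 16-bit bitwise inverse of msglen."""
    # ~x on a Python int gives a negative number; & 0xFFFF keeps the low 16 bits.
    return msglen == (~inv_msglen & 0xFFFF)

msglen = int.from_bytes(bytes([0x05, 0x00]), "little")      # 0x0005
inv_msglen = int.from_bytes(bytes([0xFA, 0xFF]), "little")  # 0xFFFA

print(length_is_valid(msglen, inv_msglen))   # True
print(length_is_valid(msglen, 0xFFFB))       # False
```

In a language with fixed-width unsigned types (a 16-bit unsigned integer in C, for instance), the mask still does no harm and guards against integer promotion surprises.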

  • byte 7: lower-order byte of 32-bit UNIX time (0xd8) – this will make more sense in the context of the actual time (once known)
  • byte 8: lower-order byte of subsecond (0x08)
  • byte 9: upper-order byte of subsecond (0x00)
  • byte 10: sequence number (0x36)
  • byte 11: data type (0x03) – according to the table, 0x03 is a 'version'
  • byte 12 onward: datablock (msglen-1 bytes)
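As for byte 7 "making more sense once the actual time is known": the packet just before ours in the hex dump (starting at offset 0000258E) has datatype 0x8A (timestamp) and a datablock of D8 3A 93 50. A small sketch, assuming that datablock is a 32-bit little-endian count of seconds since the UNIX epoch, shows where the 0xD8 in time_sec comes from. Whether the device's RTC keeps UTC or local time is not stated on this page, so the printed date is only an interpretation.

```python
from datetime import datetime, timezone

# Datablock of the 0x8A (timestamp) packet at offset 0x258E in the dump.
full = int.from_bytes(bytes.fromhex("d8 3a 93 50"), "little")

print(hex(full))          # 0x50933ad8
print(hex(full & 0xFF))   # 0xd8, the same value seen in time_sec

# Assumption: seconds since the UNIX epoch, displayed here as UTC.
print(datetime.fromtimestamp(full, tz=timezone.utc))
# 2012-11-02 03:15:36+00:00
```

The date lands on the morning of November 2nd, 2012, a few minutes after the 3:09am in the session's file name, which is at least consistent with the assumption.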

It would seem the “message length” consists of the data type byte plus the length of the datablock. We see from the 2-byte message length sequence above that the msglen is 5 bytes… 1 of those bytes is the data type byte, which leaves 4 bytes remaining for the datablock array.

As it is multibyte, it needs to be treated as little endian (lower-order byte first, followed by upper-order bytes)… we see from our hex display there are 4 bytes remaining in our packet:

03 00 00 00

So, doing a straight reversal, that would give us: 00 00 00 03, a 32-bit (4-byte) value, containing the number 3, the apparent version of things (different from the packet format version above).
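The reversal can be double-checked in code. This sketch shows three equivalent routes in Python: reversing by hand, asking for a little-endian conversion directly, and using struct with an explicit 32-bit format.

```python
import struct

datablock = bytes([0x03, 0x00, 0x00, 0x00])   # as stored in the file

by_hand = int.from_bytes(datablock[::-1], "big")   # reverse, then read left to right
direct = int.from_bytes(datablock, "little")       # let Python do the reversal
via_struct = struct.unpack("<I", datablock)[0]     # "<I" = little-endian unsigned 32-bit

print(by_hand, direct, via_struct)   # 3 3 3
```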

Let's address the checksum calculation skipped above… now that we know our data type + datablock bytes (all 5 of them), the checksum is calculated by adding together all 5 of those bytes (but only storing the result in a 1 byte storage space, which will likely mean wraparounds like it is nobody's business with more exotic values). Let's trace it out:

0x03 (data type) + 0x03 (first byte of datablock) + 0x00 (second byte of datablock) + 0x00 (third byte of datablock) + 0x00 (fourth byte of data block) = 0x06.

What was the value stored in the checksum field of our extracted data packet (byte #2)? 0x06. Aha! The sum of the data checks out (this is our other test to ensure packet data validity).
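Here is a minimal sketch of that checksum, with the wraparound made explicit by masking to 8 bits. The second call uses the timestamp packet that sits just before ours in the hex dump (checksum byte 0x7F), where the raw sum does overflow a single byte. The function name is made up for the example.

```python
def checksum(payload):
    """Sum the datatype byte and datablock bytes, keeping only the low 8 bits."""
    return sum(payload) & 0xFF

# Our packet: datatype 0x03 followed by datablock 03 00 00 00.
print(hex(checksum(bytes.fromhex("03 03 00 00 00"))))   # 0x6

# Neighboring packet: datatype 0x8A, datablock D8 3A 93 50.
# The raw sum is 0x27f; only the low byte, 0x7f, is stored.
print(hex(checksum(bytes.fromhex("8a d8 3a 93 50"))))   # 0x7f
```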

There we have it… one decoded packet, of potentially many.

Pretty awesome, right?
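To go from one hand-decoded packet to many, the same two tests can drive a simple scanner. The sketch below is only one possible approach, with made-up names: it walks a buffer looking for "A4", applies the inverted-length and checksum tests, and moves ahead one byte whenever a candidate fails. The DUMP bytes are the four lines of the hex dump above, so the excerpt begins with the tail of an earlier packet and ends with the start of a later one.

```python
# The 80 bytes shown in the hex dump above (offsets 0x2580 through 0x25CF).
DUMP = bytes.fromhex(
    "3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA "
    "FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 "
    "36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA "
    "12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34"
)
HEADER_LEN = 11   # bytes before the datatype byte

def find_packets(buf):
    """Yield (offset, seqnum, datatype, datablock) for packets passing both tests."""
    i = 0
    while i + HEADER_LEN <= len(buf):
        if buf[i:i + 2] == b"A4":
            msglen = int.from_bytes(buf[i + 3:i + 5], "little")
            inv = int.from_bytes(buf[i + 5:i + 7], "little")
            end = i + HEADER_LEN + msglen
            if msglen >= 1 and msglen == (~inv & 0xFFFF) and end <= len(buf):
                payload = buf[i + HEADER_LEN:end]
                if (sum(payload) & 0xFF) == buf[i + 2]:
                    yield i, buf[i + 10], payload[0], payload[1:]
                    i = end          # jump straight to the next packet
                    continue
        i += 1                       # no valid packet here; resync one byte on

for off, seq, dtype, data in find_packets(DUMP):
    print(hex(0x2580 + off), hex(seq), hex(dtype), data.hex())
# 0x258e 0x35 0x8a d83a9350
# 0x259e 0x36 0x3 03000000
# 0x25ae 0x37 0x2 52da1200
# 0x25be 0x38 0x8a d93a9350
```

The incomplete packets at either end of the excerpt are skipped rather than reported, and the sequence numbers of the four complete ones count up by one, which is a handy extra sanity check.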

Obtain the files

This week's project is located in the fall2015/sde0/ directory of the UNIX Public Directory, in an archive called: sleepfun.tar.bz2

Make a copy of this into your home directory somewhere and set to work.

NOTE: Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under src/datacomm/, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)

Data Files

Upon extraction of the files in sleepfun.tar.bz2, you should have the following files:

  • session-201211020309.raw (5866460 bytes, or 5.6MB) – core sleep session
  • session-201301041418.raw (360135 bytes) – nap
  • session-201301311908.raw (4955855 bytes) – core sleep session
  • session-201302010218.raw (2719296 bytes) – core sleep session
  • session-201302200614.raw (524705 bytes) – nap
  • session-201303051015.raw (511190 bytes) – nap

Session files are named with the date and time of the start of the particular sleep session, encoded as follows (YYYYMMDDhhmm):

  • YYYY - 4-digit year (2012)
  • MM - 2-digit month (11)
  • DD - 2-digit day (02)
  • hh - 2-digit hour (24-hour time, so 03 means 3am)
  • mm - 2-digit minute (09)

So, 201211020309 means 2012/11/02 at 3:09am was the recorded time of the start of this particular sleep session (I was experimenting with a dual core sleep schedule around this time, so this would have been my 2nd core).
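The file name encoding maps neatly onto a strptime format string. A small sketch follows; the function name is invented, and the result carries no timezone because the file name does not record one.

```python
from datetime import datetime

def session_start(filename):
    """Parse 'session-YYYYMMDDhhmm.raw' into a datetime (no timezone attached)."""
    stamp = filename[len("session-"):-len(".raw")]
    return datetime.strptime(stamp, "%Y%m%d%H%M")

print(session_start("session-201211020309.raw"))   # 2012-11-02 03:09:00
print(session_start("session-201301041418.raw"))   # 2013-01-04 14:18:00
```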

haas/spring2020/comporg/projects/sleepdata.txt · Last modified: 2015/09/02 14:38 by 127.0.0.1