This shows you the differences between two versions of the page.
haas:spring2015:unix:projects:udr2 [2015/03/16 10:54] – created wedge | haas:spring2015:unix:projects:udr2 [2015/03/30 16:39] (current) – [Errata] wedge | ||
---|---|---|---|
Line 11: | Line 11: | ||
Typos and bug fixes: | Typos and bug fixes: | ||
- | * no fixes of note | + | * bgrep was giving the address |
+ | * This should not change anything, save for saving you an additional calculation to determine the start of the packet. | ||
+ | * My aforementioned fix did not work; reverted **bgrep** to the original version (20150324) | |
+ | * Implemented new fix: **bgrep** should now correctly report the starting address of the matched pattern. No change is needed on your part; just start using it (and be aware that the address represents the start of the pattern, not the end) (20150330) | |
=====Objective===== | =====Objective===== | ||
Continuing our "1337 haxxing" | Continuing our "1337 haxxing" | ||
Line 21: | Line 23: | ||
An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition. | An electroencephalogram (EEG) is a test that detects electrical activity in your brain. Brain cells communicate via electrical impulses and are active all the time, even when you're asleep. This activity can be visualized as wavy lines on an EEG recording, but ultimately is sourced from raw bytes sampled from the device performing the data acquisition. | ||
- | Sleep | + | Sleep is a common area of study where this is particularly applicable, and it has even become something of a modern-day fad: everything from smartphone apps to special wristbands can be used to monitor aspects of our sleep quality, and more products are coming to market all the time. | |
- | =====Obtain | + | We will be analyzing data generated by a consumer-grade EEG headset: a device one wears when going to sleep which, via conductive pads in contact with the skin on the forehead, monitors brainwaves and can determine their level of activity (especially whether the wearer is asleep, and what stage of sleep they are in). | |
- | This week's project is located in the **spring2015/ | + | |
+ | The data was obtained from a live session (me, sleeping) during my initial polyphasic sleep adaptation a few years ago-- so there' | ||
+ | |||
+ | The device generated raw bytes of data, which I captured into individual data files. We will be learning how that data is structured so that we may parse it, and ultimately derive information such as sleep duration, type of sleep, etc. | |
+ | |||
+ | Like udr0 and udr1... we're just manipulating (reading/ | ||
+ | |||
+ | Once again there is a conceptual as well as a practical angle... some people will struggle more with one than the other, and as always: questions are not just encouraged, they are expected for success! | |
+ | =====EEG data packet format===== | ||
+ | EEG data is represented in the form of data packets-- collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.). | ||
+ | |||
+ | It is important to note that multibyte values in the data are stored in little endian order (lower-order byte first). | |
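To make the byte order concrete, here is a minimal POSIX sh sketch that combines hex bytes, given in file order as **od** prints them, into a single value. The helper names `le16` and `le32` are hypothetical, not part of the project's tools; the sample bytes come from the worked example further down this page.

```shell
# le16 LO HI: combine two hex bytes (lower-order byte first) into a decimal value
le16() {
    echo $(( 0x$2 << 8 | 0x$1 ))
}

# le32 B0 B1 B2 B3: combine four hex bytes (lowest-order byte first) into a decimal value
le32() {
    echo $(( 0x$4 << 24 | 0x$3 << 16 | 0x$2 << 8 | 0x$1 ))
}

le16 05 00          # message length bytes of the example packet: prints 5
le32 03 00 00 00    # version datablock of the example packet: prints 3
```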
+ | |||
+ | The format of the data packet is as follows: | ||
+ | |||
+ | ====Data Packet==== | ||
+ | |||
+ | ^ Field ^ Length (bytes) | ||
+ | | ' | ||
+ | | ' | ||
+ | | checksum | ||
+ | | msglen | ||
+ | | inv_msglen | ||
+ | | time_sec | ||
+ | | sub_sec | ||
+ | | seqnum | ||
+ | | datatype | ||
+ | | datablock | ||
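As a rough sketch, the header can be pulled apart with **od**. The byte offsets used here are an assumption read off the worked example later on this page (12 header bytes, then msglen-1 bytes of datablock), and `dump_header` is a hypothetical name:

```shell
# dump_header FILE: print the header fields of the packet at the start of FILE.
# Offsets (from the worked example): 0 'A', 1 '4', 2 checksum, 3-4 msglen,
# 5-6 inv_msglen, 7 time_sec, 8-9 sub_sec, 10 seqnum, 11 datatype, 12+ datablock
dump_header() {
    # read the first 12 bytes as space-separated two-digit hex values
    set -- $(od -A n -t x1 -v -N 12 "$1")
    echo "start:      $1 $2"
    echo "checksum:   $3"
    echo "msglen:     $(( 0x$5 << 8 | 0x$4 ))"     # little endian: byte 4 is high
    echo "inv_msglen: $(( 0x$7 << 8 | 0x$6 ))"
    echo "time_sec:   $8"
    echo "sub_sec:    $(( 0x${10} << 8 | 0x$9 ))"
    echo "seqnum:     ${11}"
    echo "datatype:   ${12}"
}
```

Running `dump_header packet` on a 16-byte extracted packet should print each field, with the multibyte fields already combined lower-order byte first.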
+ | |||
+ | |||
+ | ====Data Type Subfield of Data Packet==== | ||
+ | These are the data types generated by the EEG device that may appear in the data file. Note that this data will be contained in the **datatype** field of the data packet, and any follow-up data will be present in the **datablock** array field. | |
+ | |||
+ | ^ Type ID (hex) ^ Type Name ^ Description | ||
+ | 0x00 | event | an event has occurred (see event table below) | ||
+ | | 0x02 | slice_end | ||
+ | | 0x03 | version | ||
+ | | 0x80 | waveform | ||
+ | | 0x83 | frequency_bins | ||
+ | | 0x84 | signal | ||
+ | | 0x8A | timestamp | ||
+ | | 0x97 | impedance | ||
+ | | 0x9C | badsignal | ||
+ | | 0x9D | sleepstage | ||
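A lookup like the following is one way to turn the **datatype** byte into a name while scanning packets. It is a sketch: `datatype_name` is a hypothetical helper, and it accepts both lowercase (as **od** prints) and uppercase (as **bvi** shows) hex digits.

```shell
# datatype_name HEX: map a datatype byte (two hex digits) to its name from the table
datatype_name() {
    case "$1" in
        00)      echo event ;;
        02)      echo slice_end ;;
        03)      echo version ;;
        80)      echo waveform ;;
        83)      echo frequency_bins ;;
        84)      echo signal ;;
        8[aA])   echo timestamp ;;
        97)      echo impedance ;;
        9[cC])   echo badsignal ;;
        9[dD])   echo sleepstage ;;
        *)       echo "unknown($1)" ;;
    esac
}
```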
+ | |||
+ | ====Event table==== | ||
+ | |||
+ | These are the possible events generated by the EEG device that may appear in the data file. Note that this data will be contained in the **datablock** field (the array) of the data packet when the datatype has been identified as an **event**. | |
+ | |||
+ | ^ Event ID ^ Event Name ^ Description | ||
+ | | 0x05 | session_start | ||
+ | | 0x07 | sleep_start | ||
+ | | 0x0E | headset_disengaged | ||
+ | | 0x0F | headset_engaged | ||
+ | | 0x10 | alarm_off | ||
+ | | 0x11 | alarm_snooze | ||
+ | | 0x13 | alarm_play | ||
+ | | 0x15 | session_end | ||
+ | | 0x24 | headset_introduce | ||
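Event packets can be recognized in a similar fashion. This sketch assumes that the event ID is the first byte of the datablock (packet offset 12, right after the datatype byte at offset 11); both function names are hypothetical:

```shell
# event_name HEX: map an event ID byte to its name from the event table
event_name() {
    case "$1" in
        05)      echo session_start ;;
        07)      echo sleep_start ;;
        0[eE])   echo headset_disengaged ;;
        0[fF])   echo headset_engaged ;;
        10)      echo alarm_off ;;
        11)      echo alarm_snooze ;;
        13)      echo alarm_play ;;
        15)      echo session_end ;;
        24)      echo headset_introduce ;;
        *)       echo "unknown($1)" ;;
    esac
}

# packet_event FILE: if FILE holds an event packet (datatype 0x00), print the event.
# Assumption: the event ID is the first datablock byte (offset 12).
packet_event() {
    set -- $(od -A n -t x1 -v -j 11 -N 2 "$1")   # datatype, first datablock byte
    [ "$1" = "00" ] && event_name "$2"
}
```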
+ | |||
+ | ====Sleep Stage table==== | ||
+ | These are the possible sleep stages recognized by the EEG device. This data will be located in the **datablock** field (the array) of the data packet when the data type has been identified as a **sleepstage**. | |
+ | |||
+ | ^ SleepStage ID ^ SleepStage Name ^ Description | ||
+ | | 0x00 | undefined | ||
+ | | 0x01 | conscious | ||
+ | 0x02 | rem | user is experiencing REM (Rapid Eye Movement) sleep | | ||
+ | | 0x03 | light | user is experiencing light sleep | | ||
+ | | 0x04 | deep | user is experiencing deep sleep (SWS) | | ||
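For totaling time spent asleep, it may help to separate the sleeping stages (rem, light, deep) from the others. A sketch, assuming the stage ID is the first datablock byte of a **sleepstage** packet; the helper names are hypothetical:

```shell
# sleepstage_name HEX: map a sleep stage ID to its name from the table
sleepstage_name() {
    case "$1" in
        00) echo undefined ;;
        01) echo conscious ;;
        02) echo rem ;;
        03) echo light ;;
        04) echo deep ;;
        *)  echo "unknown($1)" ;;
    esac
}

# is_asleep HEX: succeed only for rem, light, or deep (not undefined, not conscious)
is_asleep() {
    case "$1" in
        02|03|04) return 0 ;;
        *)        return 1 ;;
    esac
}
```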
+ | |||
+ | ====Frequency Bins table==== | ||
+ | Frequency Bins are a measurement of the current waveform frequency being experienced, | ||
+ | |||
+ | ^ ID ^ Named Range (Hz) ^ Description | ||
+ | | 0x00 | 2-4 | Delta | | ||
+ | | 0x01 | 4-8 | Theta | | ||
+ | | 0x02 | 8-13 | Alpha | | ||
+ | | 0x03 | 13-18 | Beta | | ||
+ | | 0x04 | 18-21 | Beta | | ||
+ | | 0x05 | 11-14 | Beta (sleep spindles) | ||
+ | | 0x06 | 30-50 | Gamma | | ||
+ | |||
+ | =====Example Analysis===== | ||
+ | With the use of a hex editor, we can manually identify and decode the EEG data packets, using the information provided above. | ||
+ | |||
+ | In the file **session-201211020309.raw** (November 2nd, 2012, core sleep session starting at 3:09am), the following data can be seen (excerpted from a **bvi** session): | |
+ | |||
+ | < | ||
+ | 00002580 | ||
+ | 00002594 | ||
+ | 000025A8 | ||
+ | 000025BC | ||
+ | </ | ||
+ | |||
+ | If you look over in the ASCII field on the far right of the line started by offset **00002594**, | ||
+ | |||
+ | **bvi** informs us that the lone " | ||
+ | |||
+ | The byte prior to the next " | ||
+ | |||
+ | It would seem (especially upon converting 259E and 25AD to decimal) that there is a 15-byte difference between these offsets, making this particular packet 16 bytes long. Let's dig deeper... | |
+ | |||
+ | First, to reduce analysis paralysis, let us extract just this packet. | |
+ | |||
+ | We need the decimal equivalents of 259E and 25AD: | ||
+ | |||
+ | < | ||
+ | $ echo " | ||
+ | 9630 | ||
+ | $ echo " | ||
+ | 9645 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | Then calculate their difference (how long is this packet?): | |
+ | |||
+ | < | ||
+ | $ echo " | ||
+ | 15 | ||
+ | $ | ||
+ | </ | ||
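The same numbers can also be had without **bc**, since POSIX shell arithmetic accepts 0x-prefixed hex constants. A small sketch using the offsets from the bvi session above:

```shell
# offsets (in hex) of this packet's first byte and of the byte before the next 'A'
start=$(( 0x259E ))
end=$(( 0x25AD ))
echo "start=$start end=$end"           # start=9630 end=9645
echo "length=$(( end - start + 1 ))"   # length=16 (both endpoints are included)
```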
+ | |||
+ | Okay, so we have 15 bytes of data following offset 9630 (decimal). We need to remember to include the byte at offset 9630 itself, so 15+1=16 total bytes in this packet. Let us extract just that packet into a file for further analysis: | |
+ | |||
+ | < | ||
+ | $ dd if=session-201211020309.raw of=packet bs=1 skip=9630 count=16 | ||
+ | 16+0 records in | ||
+ | 16+0 records out | ||
+ | 16 bytes (16 B) copied, 0.141976 s, 0.1 kB/s | ||
+ | $ | ||
+ | </ | ||
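Rather than computing the count by hand for every packet, one could read msglen from the packet header and derive the length (12 header bytes plus msglen-1 datablock bytes, so msglen+11). A sketch, with `extract_packet` as a hypothetical helper that takes the offset in hex without a 0x prefix, as **bvi** and **bgrep** display it:

```shell
# extract_packet FILE HEXOFFSET OUT: copy the packet starting at HEXOFFSET into OUT
extract_packet() {
    file=$1
    off=$(( 0x$2 ))
    out=$3
    # msglen is stored little endian at packet offsets 3 (low) and 4 (high)
    set -- $(od -A n -t x1 -v -j $(( off + 3 )) -N 2 "$file")
    msglen=$(( 0x$2 << 8 | 0x$1 ))
    # copy header (12 bytes) + datablock (msglen - 1 bytes) = msglen + 11 bytes
    dd if="$file" of="$out" bs=1 skip="$off" count=$(( msglen + 11 )) 2>/dev/null
}
```

For the packet above, `extract_packet session-201211020309.raw 259E packet` should produce the same 16-byte file as the **dd** command. It trusts whatever the header says, so it is best paired with the validity checks covered below.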
+ | |||
+ | Finally, let's get a hexdump and further decode this arbitrary packet: | ||
+ | |||
+ | < | ||
+ | $ od -A x -t x1z -v packet | ||
+ | 000000 41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00 > | ||
+ | 000010 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | Note that with our extraction from the data file, the original offset is no longer valid (we now have a file with JUST our packet in it, and our file begins at offset 0). | ||
+ | |||
+ | Okay... let's break this down (reference the info tables above): | ||
+ | |||
+ | * byte 0: packet start (0x41 -- ' | ||
+ | * byte 1: protocol version (0x34 -- ' | ||
+ | * byte 2: checksum-- see below for calculation (0x06) | ||
+ | * byte 3: lower-order byte of message length (0x05) | ||
+ | * byte 4: upper-order byte of message length (0x00) | ||
+ | |||
+ | According to this, our message length is 0x0005 (or 5 in decimal) bytes long. | ||
+ | |||
+ | * byte 5: lower-order inverted byte of message length (was 0x05 above, should be 0xFA) | ||
+ | * byte 6: upper-order inverted byte of message length (was 0x00 above, should be 0xFF) | ||
+ | |||
+ | If you have questions about bit inversion: it merely flips each 0 to 1, and each 1 to 0. In our 0x05 example, we have this: | |
+ | |||
+ | * normal: 00000101 (05) or 0000 (0) 0101 (5) | ||
+ | * inverted: 11111010 (FA) or 1111 (F) 1010 (A) | ||
+ | |||
+ | The EEG device inverts the message length and places the result in our data packet so we can use it as a form of data validation, to make sure we're looking at a real packet (strategies like this are not uncommon; they are part of interfacing real-world devices with the digital environments of computers). | |
+ | |||
+ | And we see that the inverted message length checks out with the regular message length... we've passed one of the tests ensuring this is a valid packet. | ||
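The inversion check reduces to an XOR with 0xFFFF: flipping all 16 bits of msglen should reproduce inv_msglen. A minimal sketch with a hypothetical `check_inverted` helper taking decimal values:

```shell
# check_inverted MSGLEN INV: succeed if INV is the 16-bit bitwise inversion of MSGLEN
check_inverted() {
    [ $(( $1 ^ 0xFFFF )) -eq "$2" ]
}

# example packet: msglen 0x0005, inv_msglen 0xFFFA
if check_inverted $(( 0x0005 )) $(( 0xFFFA )); then
    echo "inverted length checks out"
fi
```

XOR with 0xFFFF is used instead of the shell's `~` operator because `~` inverts every bit of the shell's (typically 64-bit) integer, not just the low 16.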
+ | |||
+ | * byte 7: lower-order byte of 32-bit UNIX time (0xd8) -- this will make more sense in the context of the actual time (once known) | ||
+ | * byte 8: lower-order byte of subsecond (0x08) | ||
+ | * byte 9: upper-order byte of subsecond (0x00) | ||
+ | * byte 10: sequence number (0x36) | ||
+ | * byte 11: data type (0x03) -- according to the table, 0x03 is a ' | ||
+ | * bytes 12 onward: datablock (msglen-1 bytes, so 4 bytes here) | |
+ | |||
+ | It would seem the " | ||
+ | |||
+ | As it is multibyte, it needs to be treated as little endian (lower-order byte first, followed by upper-order bytes)... we see from our hex display there are 4 bytes remaining in our packet: | ||
+ | |||
+ | < | ||
+ | 03 00 00 00 | ||
+ | </ | ||
+ | |||
+ | So, doing a straight reversal, that would give us: **00 00 00 03**, a 32-bit (4-byte) value, containing the number **3**, the apparent version of things (different from the packet format version above). | ||
+ | |||
+ | Let's address the checksum calculation skipped above... now that we know our data type + datablock bytes (all 5 of them), the checksum is calculated by adding together all 5 of those bytes (but only storing the result in a 1 byte storage space, which will likely mean wraparounds like it is nobody' | ||
+ | |||
+ | 0x03 (data type) + 0x03 (first byte of datablock) + 0x00 (second byte of datablock) + 0x00 (third byte of datablock) + 0x00 (fourth byte of data block) = 0x06. | ||
+ | |||
+ | What was the value stored in the checksum field of our extracted data packet (byte #2)? 0x06. Aha! The sum of the data checks out (this is our other test to ensure packet data validity). | |
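This test can be automated as well. In the sketch below (the name `checksum_ok` is hypothetical), the summed bytes are the datatype byte at offset 11 plus the msglen-1 datablock bytes, i.e. msglen bytes starting at offset 11, and the total is reduced modulo 256 to mimic the 1-byte storage:

```shell
# checksum_ok FILE: recompute the checksum of the packet in FILE and compare it
# with the stored checksum byte (offset 2)
checksum_ok() {
    file=$1
    set -- $(od -A n -t u1 -v -N 5 "$file")   # bytes 0-4 as unsigned decimals
    stored=$3
    msglen=$(( $5 << 8 | $4 ))                 # little endian msglen
    sum=0
    for b in $(od -A n -t u1 -v -j 11 -N "$msglen" "$file"); do
        sum=$(( (sum + b) % 256 ))             # wrap around like a 1-byte counter
    done
    [ "$sum" -eq "$stored" ]
}
```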
+ | |||
+ | There we have it... one decoded packet, of potentially many. | ||
+ | |||
+ | Pretty awesome, right? | ||
+ | =====Obtain the files===== | ||
+ | This week's project is located in the **spring2015/ | ||
Make a copy of this into your home directory somewhere and set to work. | Make a copy of this into your home directory somewhere and set to work. | ||
**NOTE:** Hopefully it has been standard practice to locate project files in their own unique subdirectory, | **NOTE:** Hopefully it has been standard practice to locate project files in their own unique subdirectory, | ||
- | |||
- | =====Process===== | ||
- | The data you seek (2 files) is obfuscated and contained within this file. | ||
- | Plain text directions give clues on how to find both pieces | + | =====Data Files===== |
+ | Upon extraction | ||
- | Some additional information: | + | * session-201211020309.raw (5866460 bytes, or 5.6MB) -- core sleep session |
+ | * session-201301041418.raw (360135 bytes) -- nap | ||
+ | * session-201301311908.raw (4955855 bytes) -- core sleep session | ||
+ | * session-201302010218.raw (2719296 bytes) -- core sleep session | ||
+ | * session-201302200614.raw (524705 bytes) -- nap | ||
+ | * session-201303051015.raw (511190 bytes) -- nap | ||
- | * The first file should be named **udr1.text** | + | Session files are named with the date and time of the start of the particular sleep session, encoded as follows |
- | * The second (big) file runs from the starting point until the very end of the file | + | |
- | * It should be named ' | + | |
- | * gizmo is binary data, and entirely reversed- you need to get its bytes back in order (last byte should be first byte, 2nd to last should be 2nd, etc.) | + | |
- | * You are to write a shell script to perform the de-reversal of the data, reading from data.file and through whatever processing is needed, produce the file called **gizmo**. | + | |
- | * The **urev** tool has some additional constraints with respect to gizmo... running it should notify you of any details you are lacking. | + | |
+ | * YYYY - 4-digit year (2012) | ||
+ | * MM - 2-digit month (11) | ||
+ | * DD - 2-digit day (02) | ||
+ | * hh - 2-digit hour (24-hour time, so 03 means 3am) | ||
+ | * mm - 2-digit minute (09) | ||
+ | |||
+ | So, 201211020309 means this sleep session started at 3:09am on 2012/11/02 (I was experimenting with a dual core sleep schedule around this time, so this would have been my 2nd core). | |
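The naming scheme can be decoded mechanically with parameter expansion and **sed**; a sketch with a hypothetical `session_start_time` helper:

```shell
# session_start_time NAME: turn session-YYYYMMDDhhmm.raw into "YYYY/MM/DD hh:mm"
session_start_time() {
    stamp=${1##*session-}    # strip any directory and the "session-" prefix
    stamp=${stamp%.raw}      # strip the extension
    echo "$stamp" | sed 's#^\(....\)\(..\)\(..\)\(..\)\(..\)$#\1/\2/\3 \4:\5#'
}

session_start_time session-201211020309.raw   # prints 2012/11/02 03:09
```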
+ | |||
+ | =====Task===== | ||
+ | With the provided data files, I'd like for you to do the following (be sure to provide commands for each as well as the answer you got): | ||
+ | |||
+ | * determine the number of data packets in each file | ||
+ | * determine the total time elapsed in the session file | ||
+ | * determine the total time in a sleep state (not undefined, not conscious) | ||
+ | * find a data packet during a time of rem or deep sleep that stores the complete timestamp, and: | ||
+ | * extract that packet from the pertinent data file (provide command) | ||
+ | * what is the timestamp (as a 32-bit value) | ||
+ | * what is the calendar date and time of that timestamp, when appropriately translated? | ||
+ | * which file had the most deep sleep? | ||
+ | * how much took place? | ||
+ | * how did you figure this out? | ||
+ | * what was the approximate time? | ||
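For the timestamp questions, something like the following may help. It rests on an assumption to verify against the data: that a **timestamp** packet's datablock holds a 32-bit little-endian UNIX time in its first four bytes. It also uses GNU **date**'s `-d @SECONDS` form (BSD date uses `-r SECONDS` instead). The helper name and the sample bytes are made up for illustration:

```shell
# ts_to_date B0 B1 B2 B3: decode four little-endian hex bytes as a 32-bit UNIX time,
# print the raw value, then print it as a UTC calendar date (GNU date syntax)
ts_to_date() {
    secs=$(( 0x$4 << 24 | 0x$3 << 16 | 0x$2 << 8 | 0x$1 ))
    echo "$secs"
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S UTC'
}

ts_to_date 78 56 34 12   # made-up bytes, only to show the byte order
```

If the assumption holds, the lowest byte (B0) should presumably match the time_sec byte of packets recorded around the same moment, which gives a quick sanity check.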
=====Useful tools===== | =====Useful tools===== | ||
You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered): | You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered): | ||
Line 49: | Line 263: | ||
* **dd**(1) | * **dd**(1) | ||
* **bc**(1) | * **bc**(1) | ||
- | * **du**(1) | + | * **od**(1) - as I've said to others, |
- | | + | |
- | | + | |
* **bvi**(1) | * **bvi**(1) | ||
* **hexedit**(1) | * **hexedit**(1) | ||
+ | * **grep**(1) - can be contorted to cooperate | ||
+ | * **date**(1) - might be useful for time/date manipulations | ||
+ | * **bgrep** (see below for usage) | ||
... along with other tools previously encountered. | ... along with other tools previously encountered. | ||
+ | ====bgrep==== | ||
+ | To assist you with this project, a special " | ||
+ | |||
+ | It supports space-separated (or not) bytes of data, and even allows the use of ' | ||
+ | |||
+ | ===Example Usage=== | ||
+ | Let's say you wanted to search for the consecutive bytes 0x12 and 0x34 within a binary file: | ||
+ | |||
+ | <cli> | ||
+ | $ cat session-201302200614.raw | bgrep '12 34' | ||
+ | 533b:12 34 | ||
+ | 29af3:12 34 | ||
+ | 29dff:12 34 | ||
+ | 29f85:12 34 | ||
+ | 2a8a9:12 34 | ||
+ | 2aa2f:12 34 | ||
+ | 2abb5:12 34 | ||
+ | 2aec1:12 34 | ||
+ | 2b353:12 34 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | What you see are the addresses (in hex) that denote the start of this requested pattern (0x12 immediately followed by 0x34). | ||
+ | |||
+ | If we wanted 0x12 followed by any byte, followed by 0x45, we'd do: | |
+ | |||
+ | <cli> | ||
+ | $ cat session-201302200614.raw | bgrep '12 .. 45' | ||
+ | 3326:12 e0 45 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | In this case, there is only one such match in the entire file. | ||
+ | |||
+ | The ' | ||
+ | |||
+ | <cli> | ||
+ | $ cat session-201302200614.raw | bgrep '12 e.' | ||
+ | 1cf4:12 ee | ||
+ | 206d:12 e0 | ||
+ | 3325:12 e0 | ||
+ | 3907:12 e0 | ||
+ | 4077:12 e0 | ||
+ | 4795:12 e0 | ||
+ | 50a1:12 e0 | ||
+ | 552b:12 e0 | ||
+ | 5edb:12 e0 | ||
+ | 73e7:12 e0 | ||
+ | 81b9:12 e0 | ||
+ | 8df9:12 e0 | ||
+ | 8fcf:12 e0 | ||
+ | aae3:12 e0 | ||
+ | aae7:12 e0 | ||
+ | b859:12 e0 | ||
+ | 3415c:12 e9 | ||
+ | 4e11f:12 e0 | ||
+ | 6bd5b:12 ed | ||
+ | 796f7:12 e0 | ||
+ | 7b877:12 e0 | ||
+ | 7d3df:12 e0 | ||
+ | 7e7e1:12 e0 | ||
+ | 7e7f5:12 e0 | ||
+ | 7ecf7:12 e0 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | We can see the lower 4 bits vary across the matches, as our pattern allows. | |
+ | |||
+ | Finally, a pattern where the upper 4 bits can be anything and the lower 4 bits must be 0xc, followed by 0x34: | |
+ | |||
+ | <cli> | ||
+ | $ cat session-201302200614.raw | bgrep ' | ||
+ | 91c1:3c 34 | ||
+ | 29029:8c 34 | ||
+ | 297e5:0c 34 | ||
+ | 322d3:ec 34 | ||
+ | 6152b:dc 34 | ||
+ | 6a683:0c 34 | ||
+ | 6ef95:6c 34 | ||
+ | $ | ||
+ | </ | ||
+ | |||
+ | Notice that in this last pattern we opted not to space-separate the bytes... it works either way (output will be space-separated regardless). | |
+ | |||
+ | This will hopefully prove to be a useful tool in your binary analysis endeavors. | ||
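When **bgrep** is unavailable, GNU **grep** can be contorted into a rough substitute using its Perl-compatible mode (`-P`), byte offsets (`-b`), and only-matching output (`-o`). This is a sketch with real limitations: grep works line by line, so no match can span a 0x0a byte (and `.` never matches 0x0a), and bgrep's nibble wildcards have to be spelled as byte ranges (e.g. `'\x12[\xe0-\xef]'` for `'12 e.'`). `grep_bytes` is a hypothetical name:

```shell
# grep_bytes PATTERN FILE: print hex offsets where PATTERN (written with \xHH
# escapes, e.g. '\x12\x34' or '\x12.\x45') starts in FILE, similar to bgrep
grep_bytes() {
    # LC_ALL=C makes PCRE treat the data as raw bytes; -a forces text mode,
    # -b -o report the decimal byte offset of each individual match
    LC_ALL=C grep -obUaP "$1" "$2" | cut -d: -f1 | while read -r off; do
        printf '%x\n' "$off"    # convert decimal offset to hex, as bgrep shows it
    done
}
```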
=====Submission===== | =====Submission===== | ||
Successful completion will result in the following criteria being met: | Successful completion will result in the following criteria being met: | ||
- | | + | * When all is said and done, you will submit: |
- | * You have completed all weekly exercises (96, I think) before the deadline, being mindful of the intentionally-paced nature of urev. | + | * **udr2.text**, containing |
- | * Bonus opportunity: | + | |
- | | + | |
- | * **udr1.text** | + | |
- | * Append | + | |
- | * your bash script enabling the processing of data.file to produce gizmo | + | |
- | * Be sure to include comments indicating the reasoning behind actions taken | + | |
- | * Your extracted/ | + | |
====Submit==== | ====Submit==== | ||
Please submit as follows: | Please submit as follows: | ||
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | Submitting unix project "udr1": | + | Submitting unix project "udr2": |
- | -> udr1.text(OK) | + | -> udr2.text(OK) |
- | -> getgizmo.bash(OK) | + | |
- | -> gizmo(OK) | + | |
SUCCESSFULLY SUBMITTED | SUCCESSFULLY SUBMITTED | ||
- | lab46: | + | lab46: |
</ | </ |