Current revision: 2015/03/30 16:39 by wedge (Errata); previous revision: 2015/03/16 11:37 by wedge (Obtain the file)
Typos and bug fixes:
  
  * **bgrep** was reporting the address of the last byte of a matched pattern instead of the address of its first byte (the intended behavior). This has been corrected, and **bgrep** has been updated. (20150322)
    * This should not change anything, other than saving you an extra calculation to determine the start of the packet.
  * The fix above did not work, so **bgrep** has been reverted to its original version. (20150324)
  * Implemented a new fix: **bgrep** should now correctly report the starting address of the matched pattern. No change is needed on your part, just start using it (and be aware that the address represents the start of the pattern, not the end). (20150330)
=====Objective=====
Continuing our "1337 haxxing" series of projects, we have found considerable self-imposed conceptual roadblocks keeping us from employing otherwise simple computing properties (that data is a series of bytes and, ultimately, that **everything is a file**).
Once again there is both a conceptual and a practical angle... some people will struggle more with one than the other, and as always: questions are not just encouraged, they are expected for success!
=====EEG data packet format=====
EEG data is represented in the form of data packets: collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).

It is important to note that multibyte values in the data are stored in little endian order (lower-order byte first).

The format of the data packet is as follows:

====Data Packet====
  
Packet layout mnemonic (one letter per byte, with ''d'' standing for the variable-length datablock): ''AncllLLTttsid''

^  Field  ^  Length (bytes)  ^  Description  |
|  'A' (0x41)  |  1  |  character starting the packet  |
|  '4' (0x34)  |  1  |  the protocol “version”, of which only '4' is currently supported  |
|  checksum  |  1  |  a one-byte checksum formed by summing the identifier (datatype) byte and all the data bytes  |
|  msglen  |  2  |  a two-byte message length (little endian). This length includes the size of the data payload plus the identifier  |
|  sub_sec  |  2  |  the 16-bit sub-second (runs through 0xFFFF in 1 second), LSB first  |
|  seqnum  |  1  |  the 8-bit sequence number  |
|  datatype  |  1  |  the datatype (see data type subfield table below)  |
|  datablock  |  variable (msglen - 1)  |  the array of binary data  |

====Data Type Subfield of Data Packet====
These are the data types generated by the EEG device, any of which may appear within the data file. This value is contained in the **datatype** field of the data packet, and any accompanying data is present in the **datablock** array field.

^  Type ID (hex)  ^  Type Name  ^  Description  |
|  0x00  |  event  |  an event has occurred (see event table below)  |
|  0x02  |  slice_end  |  marks the end of a slice of data (a slice can span multiple packets)  |
|  0x03  |  version  |  version of the raw data output  |
|  0x80  |  waveform  |  raw time domain brainwave  |
|  0x83  |  frequency_bins  |  frequency bins derived from waveform  |
|  0x84  |  signal  |  signal quality range of waveform (0-30)  |
|  0x8A  |  timestamp  |  full timestamp from EEG device's RTC  |
|  0x97  |  impedance  |  impedance across the headband  |
|  0x9C  |  badsignal  |  signal contains artifacts  |
|  0x9D  |  sleepstage  |  current sleep stage (produced in 30 second samples, see sleepstage table below)  |
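When decoding by hand, it can help to have the table above available as a lookup. The following is a minimal POSIX **sh** sketch that simply mirrors the table. The function name **datatype_name** is hypothetical and is not one of the course tools.

```shell
#!/bin/sh
# Hypothetical helper: map a datatype byte (hex, e.g. "9D", "9d" or "0x9D")
# to its name from the Data Type Subfield table.
datatype_name() {
    # Uppercase the input, then drop an optional "0X" prefix.
    id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    id=${id#0X}
    case "$id" in
        00) echo event ;;
        02) echo slice_end ;;
        03) echo version ;;
        80) echo waveform ;;
        83) echo frequency_bins ;;
        84) echo signal ;;
        8A) echo timestamp ;;
        97) echo impedance ;;
        9C) echo badsignal ;;
        9D) echo sleepstage ;;
        *)  echo unknown ;;
    esac
}

datatype_name 03      # prints: version
datatype_name 0x9d    # prints: sleepstage
```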

====Event table====

These are the possible events generated by the EEG device, any of which may appear within the data file. This data is contained in the **datablock** field (the array) of the data packet when the datatype has been identified as an **event**.

^  Event ID  ^  Event Name  ^  Description  |
|  0x05  |  session_start  |  data acquisition session has commenced  |
|  0x07  |  sleep_start  |  user is asleep  |
|  0x0E  |  headset_disengaged  |  EEG headset has been set on dock  |
|  0x0F  |  headset_engaged  |  EEG headset taken off dock  |
|  0x10  |  alarm_off  |  user turned off alarm functionality  |
|  0x11  |  alarm_snooze  |  user hit snooze, enabling the snooze delay on the alarm  |
|  0x13  |  alarm_play  |  set alarm is now going off  |
|  0x15  |  session_end  |  data acquisition session has ceased  |
|  0x24  |  headset_introduce  |  a new headband ID has been read  |

====Sleep Stage table====
These are the possible sleep stages recognized by the EEG device. This data is located in the **datablock** field (the array) of the data packet when the data type has been identified as a **sleepstage**.

^  SleepStage ID  ^  SleepStage Name  ^  Description  |
|  0x00  |  undefined  |  insufficient data to determine sleep stage  |
|  0x01  |  conscious  |  user is in an awakened state  |
|  0x02  |  rem  |  user is experiencing REM (Rapid Eye Movement) sleep  |
|  0x03  |  light  |  user is experiencing light sleep  |
|  0x04  |  deep  |  user is experiencing deep sleep (SWS)  |
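Sleep stages are produced in 30 second samples, so one rough way to tally time asleep is to count the rem, light and deep samples and multiply by 30. The sketch below makes two assumptions: each **sleepstage** packet corresponds to exactly one 30-second sample, and the stage bytes have already been pulled out of the packets (one hex byte per line). The function name **sleep_seconds** is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read sleepstage datablock values (one hex byte per
# line on STDIN, e.g. "03" or "0x03") and print the seconds spent asleep.
# ASSUMPTION: each value is one 30-second sample; rem (0x02), light (0x03)
# and deep (0x04) count as asleep, undefined/conscious do not.
sleep_seconds() {
    awk '
        { stage = tolower($1); sub(/^0x/, "", stage) }
        stage == "02" || stage == "03" || stage == "04" { asleep++ }
        END { print asleep * 30 }
    '
}

printf '01\n03\n04\n02\n00\n' | sleep_seconds    # prints: 90
```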

====Frequency Bins table====
Frequency bins measure the waveform frequencies currently being experienced. The EEG device analyzes them and factors them into its sleep stage determination. They are a rawer form of data, should additional analysis be desired.

^  ID  ^  Range (Hz)  ^  Band Name  |
|  0x00  |  2-4  |  Delta  |
|  0x01  |  4-8  |  Theta  |
|  0x02  |  8-13  |  Alpha  |
|  0x03  |  13-18  |  Beta  |
|  0x04  |  18-21  |  Beta  |
|  0x05  |  11-14  |  Beta (sleep spindles)  |
|  0x06  |  30-50  |  Gamma  |

=====Example Analysis=====
With the use of a hex editor, we can manually identify and decode the EEG data packets, using the information provided above.

In the file **session-201211020309.raw** (November 2nd, 2012, core sleep session starting at 3:09am), the following data can be seen (excerpted from a **bvi** session):

<cli>
00002580  3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA ?.......4.Q...A4....
00002594  FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 ....5..:.PA4........
000025A8  36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA 6.....A4@.......7.R.
000025BC  12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34 ..A4........8..:.PA4
</cli>

If you look over at the ASCII field on the far right of the line starting at offset **00002594**, you will see ":.PA4"... according to the data packet field breakdown above, a packet starts with an 'A', followed by a '4'... so seeing a fairly isolated "A4" is an excellent indication that we are looking at the start of a new data packet.

**bvi** informs us that this lone "A4" 2-byte sequence (an 'A' byte followed by a '4' byte) starts at offset **0000259E**.

The byte just before the next "A4" (on the next line, **000025A8**) is at offset **000025AD**.

It would seem (especially upon converting 259E and 25AD to decimal) that there is a 15-byte difference between those offsets, so this particular packet spans 16 bytes. Let's dig deeper...

First, to reduce analysis paralysis, let us extract just this packet.

We need the decimal equivalents of 259E and 25AD:

<cli>
$ echo "ibase=16; 259E" | bc
9630
$ echo "ibase=16; 25AD" | bc
9645
$ 
</cli>

Then, calculate their difference (how long is this packet?):

<cli>
$ echo "9645-9630" | bc
15
$ 
</cli>

Okay, so we have 15 bytes of data following offset 9630 (decimal). We need to remember to include the byte at offset 9630 itself, so 15+1=16 total bytes in this packet. Let us extract just that packet into a file for further analysis:

<cli>
$ dd if=session-201211020309.raw of=packet bs=1 skip=9630 count=16
16+0 records in
16+0 records out
16 bytes (16 B) copied, 0.141976 s, 0.1 kB/s
$ 
</cli>
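The hex-to-decimal conversion, the subtraction, the +1 and the **dd** call can all be rolled into one small function. This sketch uses POSIX **sh**, whose arithmetic expansion accepts 0x-prefixed hex directly, so **bc** is not needed. The name **extract_packet** and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the bytes from START_HEX through END_HEX
# (inclusive) of FILE into OUT.
# Usage: extract_packet FILE START_HEX END_HEX OUT
extract_packet() {
    start=$((0x$2))              # hex offset -> decimal
    end=$((0x$3))
    count=$((end - start + 1))   # +1 so the byte at START is included
    dd if="$1" of="$4" bs=1 skip="$start" count="$count" 2>/dev/null
}

# Example, equivalent to the bc/dd steps above:
# extract_packet session-201211020309.raw 259E 25AD packet
```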

Finally, let's get a hexdump and further decode this arbitrary packet:

<cli>
$ od -A x -t x1z -v packet 
000000 41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00  >A4........6.....<
000010
$ 
</cli>

Note that with our extraction from the data file, the original offset is no longer valid (we now have a file with JUST our packet in it, and our file begins at offset 0).

Okay... let's break this down (reference the info tables above):

  * byte 0: packet start (0x41 -- 'A')
  * byte 1: protocol version (0x34 -- '4')
  * byte 2: checksum-- see below for calculation (0x06)
  * byte 3: lower-order byte of message length (0x05)
  * byte 4: upper-order byte of message length (0x00)

According to this, our message length is 0x0005 (5 in decimal) bytes.

  * byte 5: lower-order inverted byte of message length (was 0x05 above, should be 0xFA)
  * byte 6: upper-order inverted byte of message length (was 0x00 above, should be 0xFF)

If you have questions about bit inversion, it merely flips every 0 to a 1, and every 1 to a 0. In our 0x05 example, we have this:

  * normal: 00000101 (05) or 0000 (0) 0101 (5)
  * inverted: 11111010 (FA) or 1111 (F) 1010 (A)

The EEG device inverts the message length bytes and places them in our data packet so we can use them as a form of data validation, to make sure we're looking at a real packet. Strategies like this are not uncommon: they are part of interfacing real-world devices with the digital environments of computers.

And we see that the inverted message length checks out against the regular message length... we've passed one of the tests ensuring this is a valid packet.
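The inversion test can also be expressed with shell arithmetic. XOR-ing a byte with 0xFF flips all 8 of its bits, so each inverted byte should equal its counterpart XOR 0xFF. Here is a minimal POSIX **sh** sketch. The function name **inverted_ok** is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the inverted message-length bytes
# (bytes 5-6) are the bitwise complement of the message-length bytes
# (bytes 3-4). Arguments are hex bytes in packet order:
#   inverted_ok LEN_LO LEN_HI INV_LO INV_HI
inverted_ok() {
    # XOR with 0xFF flips every bit of a byte (0 -> 1, 1 -> 0).
    [ $(( 0x$1 ^ 0xFF )) -eq $(( 0x$3 )) ] &&
    [ $(( 0x$2 ^ 0xFF )) -eq $(( 0x$4 )) ]
}

inverted_ok 05 00 FA FF && echo "length check passed"
```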

  * byte 7: lower-order byte of 32-bit UNIX time (0xd8) -- this will make more sense in the context of the actual time (once known)
  * byte 8: lower-order byte of subsecond (0x08)
  * byte 9: upper-order byte of subsecond (0x00)
  * byte 10: sequence number (0x36)
  * byte 11: data type (0x03) -- according to the table, 0x03 is a 'version'
  * bytes 12-15: datablock (msglen-1, or 4 bytes here)

It would seem the "message length" consists of the data type byte plus the length of the datablock. We see from the 2-byte message length sequence above that msglen is 5 bytes... 1 of those bytes is the **data type** byte, which leaves 4 bytes for the **datablock** array.
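Putting the byte breakdown together: bytes 0 through 10 form a fixed 11-byte header, and the msglen bytes that follow cover the data type plus the datablock. A packet should therefore be 11 + msglen bytes long (11 + 5 = 16 here, matching what we measured). Below is a small sketch of that calculation. It assumes this layout holds for every packet, and **packet_size** is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: total packet size from the two msglen bytes
# (byte 3 = lower-order, byte 4 = upper-order, little endian).
# ASSUMPTION: an 11-byte fixed header (bytes 0-10) precedes the
# msglen bytes of data type + datablock.
packet_size() {
    msglen=$(( 0x$1 + 0x$2 * 256 ))
    echo $(( 11 + msglen ))
}

packet_size 05 00    # prints: 16
```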

As it is multibyte, it needs to be treated as little endian (lower-order byte first, followed by the higher-order bytes)... we see from our hex display there are 4 bytes remaining in our packet:

<cli>
03 00 00 00
</cli>

So, reversing the byte order gives us **00 00 00 03**, a 32-bit (4-byte) value containing the number **3**: the apparent version of the raw data output (different from the packet protocol version above).
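Reversing by hand works, but shell arithmetic can do the same little-endian reading: each successive byte is worth 256 times more than the one before it. This is a sketch in POSIX **sh**. The name **le_value** is hypothetical, and the bytes are given in file order (lowest-order first).

```shell
#!/bin/sh
# Hypothetical helper: interpret hex bytes given in file order as one
# little-endian unsigned value (first argument = lowest-order byte).
le_value() {
    value=0
    shift_bits=0
    for byte in "$@"; do
        value=$(( value + (0x$byte << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))   # next byte is 256x more significant
    done
    echo "$value"
}

le_value 03 00 00 00    # prints: 3 (the datablock above)
le_value 08 00          # prints: 8 (the sub_sec bytes above)
```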

Let's address the checksum calculation skipped above... now that we know our data type + datablock bytes (all 5 of them), the checksum is calculated by adding together all 5 of those bytes, storing the result in only 1 byte of space (which, with more exotic values, will mean wraparounds like it is nobody's business). Let's trace it out:

0x03 (data type) + 0x03 (first byte of datablock) + 0x00 (second byte of datablock) + 0x00 (third byte of datablock) + 0x00 (fourth byte of datablock) = 0x06.

What was the value stored in the checksum field (byte #2) of our extracted data packet? 0x06. Aha! The sum of the data checks out (this is our other test to ensure packet data validity).
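The checksum test can be sketched the same way. Add up the data type and datablock bytes, keeping only the low 8 bits (modulo 256) so that larger sums wrap around as described. The function name **checksum** is hypothetical, and the bytes are passed in the order they appear in the packet.

```shell
#!/bin/sh
# Hypothetical helper: one-byte checksum over the data type byte and the
# datablock bytes, i.e. their sum kept to 8 bits (modulo 256).
# Usage: checksum 03 03 00 00 00
checksum() {
    sum=0
    for byte in "$@"; do
        sum=$(( (sum + 0x$byte) % 256 ))   # wrap around like a 1-byte store
    done
    printf '%02X\n' "$sum"
}

checksum 03 03 00 00 00    # prints: 06 (matches byte 2 of the packet)
```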

There we have it... one decoded packet, of potentially many.

Pretty awesome, right?
=====Obtain the files=====
This week's project is located in the **spring2015/udr2/** directory of the UNIX Public Directory, in an archive called: **sleepfun.tar.bz2**

Make a copy of this into your home directory somewhere and set to work.

**NOTE:** Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under **src/unix/**, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)

=====Data Files=====
Upon extracting **sleepfun.tar.bz2**, you should have the following files:

  * session-201211020309.raw (5866460 bytes, or 5.6MB) -- core sleep session
  * session-201301041418.raw (360135 bytes) -- nap
  * session-201301311908.raw (4955855 bytes) -- core sleep session
  * session-201302010218.raw (2719296 bytes) -- core sleep session
  * session-201302200614.raw (524705 bytes) -- nap
  * session-201303051015.raw (511190 bytes) -- nap

Session files are named with the date and time of the start of the particular sleep session, encoded as follows (YYYYMMDDhhmm):

  * YYYY - 4-digit year (2012)
  * MM - 2-digit month (11)
  * DD - 2-digit day (02)
  * hh - 2-digit hour (24-hour time, so 03 means 3am)
  * mm - 2-digit minute (09)

So, 201211020309 means 2012/11/02 at 3:09am was the recorded time of the start of this particular sleep session (I was exploring with a dual core sleep schedule around this time, so this would have been my 2nd core).
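If you want to decode these names without counting digits, a small POSIX **sh** sketch using **sed** can do the splitting. The name **session_start** is hypothetical, and the sketch assumes the file name follows the session-YYYYMMDDhhmm.raw pattern shown above.

```shell
#!/bin/sh
# Hypothetical helper: turn session-YYYYMMDDhhmm.raw into "YYYY/MM/DD hh:mm".
session_start() {
    stamp=${1##*session-}    # drop any leading path and the "session-" prefix
    stamp=${stamp%.raw}      # drop the ".raw" extension
    echo "$stamp" | sed 's|^\(....\)\(..\)\(..\)\(..\)\(..\)$|\1/\2/\3 \4:\5|'
}

session_start session-201211020309.raw    # prints: 2012/11/02 03:09
```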

=====Task=====
With the provided data files, I'd like you to do the following (be sure to provide the commands for each, as well as the answer you got):

  * determine the number of data packets in each file
  * determine the total time elapsed in the session file
  * determine the total time in a sleep state (not undefined, not conscious)
  * find a data packet during a time of rem or deep sleep that stores the complete timestamp, and:
    * extract that packet from the pertinent data file (provide command)
    * what is the timestamp (as a 32-bit value)?
    * what is the calendar date and time of that timestamp, when appropriately translated?
  * which file had the most deep sleep?
    * how much took place?
    * how did you figure this out?
    * what was the approximate time?
=====Useful tools=====
You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered):
  * **dd**(1)
  * **bc**(1)
  * **od**(1) - as I've said to others, **od** is like **cat**, but for binary data
  * **bvi**(1)
  * **hexedit**(1)
  * **grep**(1) - can be contorted to cooperate
  * **date**(1) - might be useful for time/date manipulations
  * **bgrep** (see below for usage)

... along with other tools previously encountered.

====bgrep====
To assist you with this project, a special "binary grep" called **bgrep** has been deployed on the system. bgrep searches for byte patterns in binary data read from STDIN.

It accepts bytes with or without space separation, and even allows the use of '.' to denote any single hex digit (remember, it takes 2 hex digits to make up a byte).

===Example Usage===
Let's say you wanted to search for the consecutive bytes 0x12 and 0x34 within a binary file:

<cli>
$ cat session-201302200614.raw | bgrep '12 34'
533b:12 34
29af3:12 34
29dff:12 34
29f85:12 34
2a8a9:12 34
2aa2f:12 34
2abb5:12 34
2aec1:12 34
2b353:12 34

</cli>

What you see are the addresses (in hex) that denote the start of this requested pattern (0x12 immediately followed by 0x34).
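**dd**'s skip= expects a decimal count (in blocks, which are single bytes when bs=1), so it can be handy to convert bgrep's hex addresses in bulk. Here is a minimal POSIX **sh** sketch. It assumes bgrep's output lines look like the ones above (hex address, a colon, then the matched bytes), and **bgrep_to_decimal** is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read bgrep output ("hexaddr:bytes" per line) on
# STDIN and print each address in decimal, one per line.
bgrep_to_decimal() {
    # "bytes" receives the matched bytes after the colon (unused here).
    while IFS=: read -r addr bytes || [ -n "$addr" ]; do
        [ -n "$addr" ] || continue    # skip blank lines
        echo "$(( 0x$addr ))"
    done
}

# Example:
# cat session-201302200614.raw | bgrep '12 34' | bgrep_to_decimal
```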

If you wanted 0x12 followed by anything, followed by 0x45, we'd do:

<cli>
$ cat session-201302200614.raw | bgrep '12 .. 45'
3326:12 e0 45

</cli>

In this case, there is only one such match in the entire file.

The '.' pattern can also be applied to only part of a byte... 0x12 followed by 0xe# (we don't care what the lower-order 4 bits are, but the upper 4 bits of the second byte MUST be 0xe):

<cli>
$ cat session-201302200614.raw | bgrep '12 e.'
1cf4:12 ee
206d:12 e0
3325:12 e0
3907:12 e0
4077:12 e0
4795:12 e0
50a1:12 e0
552b:12 e0
5edb:12 e0
73e7:12 e0
81b9:12 e0
8df9:12 e0
8fcf:12 e0
aae3:12 e0
aae7:12 e0
b859:12 e0
3415c:12 e9
4e11f:12 e0
6bd5b:12 ed
796f7:12 e0
7b877:12 e0
7d3df:12 e0
7e7e1:12 e0
7e7f5:12 e0
7ecf7:12 e0

</cli>

We can see variations in the lower 4 bits among the matches of our desired pattern.

Finally, a byte whose upper 4 bits can be anything but whose lower 4 bits must be 0xc, followed by 0x34:

<cli>
$ cat session-201302200614.raw | bgrep '.c34'
91c1:3c 34
29029:8c 34
297e5:0c 34
322d3:ec 34
6152b:dc 34
6a683:0c 34
6ef95:6c 34

</cli>

Notice in this last pattern, we opted not to space-separate the pattern... it works either way (output will be space-separated regardless).

This will hopefully prove to be a useful tool in your binary analysis endeavors.
=====Submission=====
Successful completion will result in the following criteria being met:

  * When all is said and done, you will submit:
    * **udr2.text**, containing the answers/responses to all the above questions (including commands used to pull off the project)
====Submit====
Please submit as follows:

<cli>
lab46:~/src/unix/udr2$ submit unix udr2 udr2.text
Submitting unix project "udr2":
    -> udr2.text(OK)

SUCCESSFULLY SUBMITTED
lab46:~/src/unix/udr2$ 
</cli>
haas/spring2015/unix/projects/udr2.1426505823.txt.gz · Last modified: 2015/03/16 11:37 by wedge