Differences

This shows you the differences between two versions of the page.

--- haas:spring2015:unix:projects:udr2 [2015/03/16 11:37] – [Obtain the file] wedge
+++ haas:spring2015:unix:projects:udr2 [2015/03/30 16:39] (current) – [Errata] wedge
@@ Line 11: / Line 11: @@
 Typos and bug fixes:
-  * no fixes of note
+  * bgrep was giving the address of the last byte matched of a pattern, vs. the address of the start of the matched pattern (the intended action). This has been corrected, and **bgrep** has been updated. (20150322)
+    * This should not change anything, other than sparing you an additional calculation to determine the start of the packet.
+  * My aforementioned fix did not work, reverted **bgrep** to original version (20150324)
+  * Implemented new fix: **bgrep** should now be correctly reporting the starting address of the matched pattern -- no change on your part, just start using it (and be aware that the address represents the start of the pattern, and not the end) (20150330)
 =====Objective=====
 Continuing our "1337 haxxing" series of projects, we've found considerable self-imposed conceptual roadblocks hindering our use of otherwise simple computing properties (that data is a series of bytes, and ultimately, that **everything is a file**).
@@ Line 33: / Line 35: @@
 Once again there is a conceptual as well as practical angle... some people will struggle more with one over the other, and as always: questions are not just encouraged, they are expected for success!
 =====EEG data packet format=====
-EEG data is represented in the form of data packets-- collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.). The format of the data packet is as follows:
+EEG data is represented in the form of data packets-- collections of bytes that can be decoded to convey a particular meaning (state of sleep, timestamp, signal strength, etc.).
+It is important to note that the data, in cases of multibyte values, is little endian in orientation.
+The format of the data packet is as follows:
+====Data Packet====
 ^  Field  ^  Length (bytes)  ^  Description  |AncllLLTttsid
 |  'A' (0x41)  |  1  |  character starting the packet  |
-|  4 (0x04)  |  1  |  the protocol “version”, of which only 4 is currently supported  |
+|  '4' (0x34)  |  1  |  the protocol “version”, of which only '4' is currently supported  |
 |  checksum  |  1  |  a one byte checksum formed by summing the identifier byte and all the data bytes  |
 |  msglen  |  2  |  a two byte message length (little endian). This length includes the size of the data payload plus the identifier  |
@@ Line 44: / Line 52: @@
 |  sub_sec  |  2  |  the 16-bit sub-second (runs through 0xFFFF in 1 second), LSB first  |
 |  seqnum  |  1  |  the 8-bit sequence number  |
-                i is the datatype
+|  datatype  |  1  |  the datatype (see data type subfield table below)  |
-                d is the array of binary data
+|  datablock  |  variable  |  the array of binary data  |
+====Data Type Subfield of Data Packet====
+These are the data types generated by the EEG device that could appear within the data file. Note that the type ID will be contained in the **datatype** field of the data packet, and any follow-up data will be present in the **datablock** array field.
-=====Obtain the file=====
+^  Type ID (hex)  ^  Type Name  ^  Description  |
-This week's project is located in the **spring2015/udr1/** directory of the UNIX Public Directory, in a file called: **data.file**
+|  0x00  |  event  |  an event has occurred (see event table below)  |
+|  0x02  |  slice_end  |  marks the end of a slice of data (a slice can span multiple packets)  |
+|  0x03  |  version  |  version of the raw data output  |
+|  0x80  |  waveform  |  raw time domain brainwave  |
+|  0x83  |  frequency_bins  |  frequency bins derived from waveform  |
+|  0x84  |  signal  |  signal quality range of waveform (0-30)  |
+|  0x8A  |  timestamp  |  full timestamp from EEG device's RTC  |
+|  0x97  |  impedance  |  impedance across the headband  |
+|  0x9C  |  badsignal  |  signal contains artifacts  |
+|  0x9D  |  sleepstage  |  current sleep stage (produced in 30 second samples, see sleepstage table below)  |
+====Event table====
+These are the possible events generated by the EEG device that could appear within the data file. Note that the event ID will be contained in the **datablock** field (the array) of the data packet when the datatype has been identified as an **event**.
+^  Event ID  ^  Event Name  ^  Description  |
+|  0x05  |  session_start  |  data acquisition session has commenced  |
+|  0x07  |  sleep_start  |  user is asleep  |
+|  0x0E  |  headset_disengaged  |  EEG headset has been set on dock  |
+|  0x0F  |  headset_engaged  |  EEG headset taken off dock  |
+|  0x10  |  alarm_off  |  user turned off alarm functionality  |
+|  0x11  |  alarm_snooze  |  user hit snooze, enabling the snooze delay on the alarm  |
+|  0x13  |  alarm_play  |  set alarm is now going off  |
+|  0x15  |  session_end  |  data acquisition session has ceased  |
+|  0x24  |  headset_introduce  |  a new headband ID has been read  |
+====Sleep Stage table====
+These are the possible sleep stages recognized by the EEG device. This data will be located in the **datablock** field (the array) of the data packet when the data type has been identified as a **sleepstage**.
+^  SleepStage ID  ^  SleepStage Name  ^  Description  |
+|  0x00  |  undefined  | insufficient data to determine sleep stage  |
+|  0x01  |  conscious  | user is in an awakened state  |
+|  0x02  |  rem  |  user is experiencing REM (Rapid Eye Movement) sleep  |
+|  0x03  |  light  |  user is experiencing light sleep  |
+|  0x04  |  deep  |  user is experiencing deep sleep (SWS)  |
+====Frequency Bins table====
+Frequency Bins are a measurement of the current waveform frequency being experienced, which is analyzed by the EEG device and factors into the Sleep Stage determination. This would be considered a more raw form of data, should additional analysis be desired.
+^  ID  ^  Named Range (Hz)  ^  Description  |
+|  0x00  |  2-4  |  Delta  |
+|  0x01  |  4-8  |  Theta  |
+|  0x02  |  8-13  |  Alpha  |
+|  0x03  |  13-18  |  Beta  |
+|  0x04  |  18-21  |  Beta  |
+|  0x05  |  11-14  |  Beta (sleep spindles)  |
+|  0x06  |  30-50  |  Gamma  |
+=====Example Analysis=====
+With the use of a hex editor, we can manually identify and decode the EEG data packets, using the information provided above.
+In the file **session-201211020309.raw** (November 2nd, 2012, core sleep session starting at 3:09am), the following data can be seen (snippeted from a **bvi** session):
+<cli>
+00002580  3F 05 00 FA FF D7 0E 00 34 02 51 DA 12 00 41 34 7F 05 00 FA ?.......4.Q...A4....
+00002594  FF D8 06 00 35 8A D8 3A 93 50 41 34 06 05 00 FA FF D8 08 00 ....5..:.PA4........
+000025A8  36 03 03 00 00 00 41 34 40 05 00 FA FF D8 0E 00 37 02 52 DA 6.....A4@.......7.R.
+000025BC  12 00 41 34 80 05 00 FA FF D9 04 00 38 8A D9 3A 93 50 41 34 ..A4........8..:.PA4
+</cli>
+If you look over in the ASCII field on the far right of the line started by offset **00002594**, you will see a ":.PA4"... according to the data packet field breakdown above, the start of the packet will be an 'A', followed by a '4'... so seeing a fairly isolated "A4" is an excellent indication we are looking at a new data packet.
+**bvi** informs us that the lone "A4" 2-byte sequence ('A' byte followed by '4' byte) is at offset **0000259E**.
+The byte prior to the next "A4" (the next line-- **000025A8**) occurs at offset **000025AD**.
+It would seem (especially upon converting 259E and 25AD to decimal), there is a 15 byte difference (so a 16-byte duration) to this particular packet. Let's dig deeper...
+First, to reduce analysis paralysis, let us extract specifically this packet.
+We need the decimal equivalents of 259E and 25AD:
+<cli>
+$ echo "ibase=16; 259E" | bc
+9630
+$ echo "ibase=16; 25AD" | bc
+9645
+$
+</cli>
+And then, calculate their difference (how long is this packet):
+<cli>
+$ echo "9645-9630" | bc
+15
+$
+</cli>
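+The same arithmetic can also be done by the shell itself, without **bc**. What follows is only a sketch: **packet_len** is a hypothetical helper name (not something installed on the system), and it assumes you hand it the offsets of the first and last byte of the packet in hex.

```shell
# Hypothetical helper: count the bytes between two hex offsets, inclusive.
packet_len() {
    local first=$(( 0x$1 ))   # offset of the first byte of the packet
    local last=$(( 0x$2 ))    # offset of the last byte of the packet
    echo $(( last - first + 1 ))   # +1 because both end bytes belong to the packet
}

packet_len 259E 25AD   # prints 16
```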
+Okay, so we have 15 bytes of data following offset 9630 (decimal). We need to remember to include the byte at offset 9630, so 15+1=16 total bytes in this packet. Let us extract just that packet into a file for further analysis:
+<cli>
+$ dd if=session-201211020309.raw of=packet bs=1 skip=9630 count=16
+16+0 records in
+16+0 records out
+16 bytes (16 B) copied, 0.141976 s, 0.1 kB/s
+$
+</cli>
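+If you end up extracting many packets, it may be worth wrapping that **dd** line so it takes the hex offset directly. This is a minimal sketch, not a provided tool: **extract_bytes** is a made-up name, and the demo runs against a throwaway file rather than the real session data.

```shell
# Hypothetical helper (not part of the project files): copy COUNT bytes,
# starting at a hexadecimal OFFSET, from FILE to STDOUT.
extract_bytes() {
    # $1 = file, $2 = offset in hex (no 0x prefix), $3 = byte count
    dd if="$1" bs=1 skip=$(( 0x$2 )) count="$3" 2>/dev/null
}

# Demo on a throwaway file so the sketch runs anywhere.
demo=$(mktemp)
printf '0123456789abcdef' > "$demo"
extract_bytes "$demo" A 3; echo    # bytes at offsets 0xA through 0xC: abc
rm -f "$demo"
```

+For the packet above, the equivalent call would be: extract_bytes session-201211020309.raw 259E 16 > packet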
+Finally, let's get a hexdump and further decode this arbitrary packet:
+<cli>
+$ od -A x -t x1z -v packet
+000000 41 34 06 05 00 fa ff d8 08 00 36 03 03 00 00 00  >A4........6.....<
+000010
+$
+</cli>
+Note that with our extraction from the data file, the original offset is no longer valid (we now have a file with JUST our packet in it, and our file begins at offset 0).
+Okay... let's break this down (reference the info tables above):
+  * byte 0: packet start (0x41 -- 'A')
+  * byte 1: protocol version (0x34 -- '4')
+  * byte 2: checksum-- see below for calculation (0x06)
+  * byte 3: lower-order byte of message length (0x05)
+  * byte 4: upper-order byte of message length (0x00)
+According to this, our message length is 0x0005 (or 5 in decimal) bytes long.
+  * byte 5: lower-order inverted byte of message length (was 0x05 above, should be 0xFA)
+  * byte 6: upper-order inverted byte of message length (was 0x00 above, should be 0xFF)
+If you have questions about bit inversions, it is merely flipping 0 to 1, and 1 to 0. In our 0x05 example, we have this:
+  * normal: 00000101 (05) or 0000 (0) 0101 (5)
+  * inverted: 11111010 (FA) or 1111 (F) 1010 (A)
+The EEG device is inverting the message length data and placing them in our data packet so we can use it as a form of data validation, to make sure we're looking at a real packet (strategies like this are not uncommon-- it is part of interfacing real world devices to the digital environments of computers).
+And we see that the inverted message length checks out with the regular message length... we've passed one of the tests ensuring this is a valid packet.
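+The inversion test is easy to automate: a value XORed with its own bit inversion has every bit set, so a 16-bit length XORed with its inverted copy should come out to 0xFFFF. A sketch, assuming you pass the four bytes (bytes 3 through 6 of the packet) as hex; **msglen_is_valid** is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the inverted length matches the length.
# Arguments (hex, no 0x prefix): len_lo len_hi inv_lo inv_hi
msglen_is_valid() {
    local len=$(( 0x$1 | (0x$2 << 8) ))   # little endian: low byte comes first
    local inv=$(( 0x$3 | (0x$4 << 8) ))
    [ $(( len ^ inv )) -eq 65535 ]        # 65535 is 0xFFFF, all 16 bits set
}

if msglen_is_valid 05 00 fa ff; then echo "valid"; else echo "invalid"; fi   # prints valid
```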
+  * byte 7: lower-order byte of 32-bit UNIX time (0xd8) -- this will make more sense in the context of the actual time (once known)
+  * byte 8: lower-order byte of subsecond (0x08)
+  * byte 9: upper-order byte of subsecond (0x00)
+  * byte 10: sequence number (0x36)
+  * byte 11: data type (0x03) -- according to the table, 0x03 is a 'version'
+  * bytes 12-15: datablock (msglen-1 bytes; here 03 00 00 00)
+It would seem the "message length" consists of the data type byte plus the length of the datablock. We see from the 2-byte message length sequence above that the msglen is 5 bytes... 1 of those bytes is the **data type** byte, which leaves 4 bytes remaining for the **datablock** array.
+As it is multibyte, it needs to be treated as little endian (lower-order byte first, followed by upper-order bytes)... we see from our hex display there are 4 bytes remaining in our packet:
+<cli>
+03 00 00 00
+</cli>
+So, doing a straight reversal, that would give us: **00 00 00 03**, a 32-bit (4-byte) value, containing the number **3**, the apparent version of things (different from the packet format version above).
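+Rather than reversing bytes by hand, you can let shell arithmetic assemble a little endian value: each successive byte is worth 8 more bits than the one before it. A sketch with a hypothetical function name (**le_value**), taking the bytes in file order as hex.

```shell
# Hypothetical helper: combine hex bytes, given in file (little endian) order,
# into a single decimal value.
le_value() {
    local value=0 bits=0 byte
    for byte in "$@"; do
        value=$(( value | (0x$byte << bits) ))   # later bytes are more significant
        bits=$(( bits + 8 ))
    done
    echo "$value"
}

le_value 03 00 00 00   # prints 3 (the datablock above)
le_value 05 00         # prints 5 (the message length)
```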
+Let's address the checksum calculation skipped above... now that we know our data type + datablock bytes (all 5 of them), the checksum is calculated by adding together all 5 of those bytes (but only storing the result in a 1 byte storage space, which will likely mean wraparounds like it is nobody's business with more exotic values). Let's trace it out:
+0x03 (data type) + 0x03 (first byte of datablock) + 0x00 (second byte of datablock) + 0x00 (third byte of datablock) + 0x00 (fourth byte of datablock) = 0x06.
+What was the value stored in the checksum field of our extracted data packet (byte #2)? 0x06. Aha! The sum of the data checks out (this is our other test to ensure packet data validity).
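+Here is one way the checksum trace could be scripted. It is a sketch under the assumption that the checksum is a plain byte sum kept to 8 bits (which is what the trace above suggests); **checksum** is a hypothetical function name, and the bytes are passed as hex.

```shell
# Hypothetical helper: sum the data type byte and all datablock bytes,
# keeping only the low 8 bits (the 1-byte wraparound).
checksum() {
    local sum=0 byte
    for byte in "$@"; do
        sum=$(( (sum + 0x$byte) & 0xFF ))   # & 0xFF discards any carry past one byte
    done
    printf '%02x\n' "$sum"
}

checksum 03 03 00 00 00   # prints 06
checksum 8a d8 3a 93 50   # prints 7f (the timestamp packet at 00002580 wraps around)
```

+The second call uses the packet that starts on the **00002580** line of the dump, where the stored checksum byte is 7F.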
+There we have it... one decoded packet, of potentially many.
+Pretty awesome, right?
+=====Obtain the files=====
+This week's project is located in the **spring2015/udr2/** directory of the UNIX Public Directory, in an archive called: **sleepfun.tar.bz2**
 Make a copy of this into your home directory somewhere and set to work.
 **NOTE:** Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under **src/unix/**, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)
-=====Process=====
-The data you seek (2 files) is obfuscated and contained within this file.
-Plain text directions give clues on how to find both pieces of information, and it is up to you to use your skills to extract the necessary data.
+=====Data Files=====
+Upon extraction of the files in **sleepfun.tar.bz2**, you should have the following files:
-Some additional information:
+  * session-201211020309.raw (5866460 bytes, or 5.6MB) -- core sleep session
+  * session-201301041418.raw (360135 bytes) -- nap
+  * session-201301311908.raw (4955855 bytes) -- core sleep session
+  * session-201302010218.raw (2719296 bytes) -- core sleep session
+  * session-201302200614.raw (524705 bytes) -- nap
+  * session-201303051015.raw (511190 bytes) -- nap
-  * The first file should be named **udr1.text** and be properly oriented.
+Session files are named with the date and time of the start of the particular sleep session, encoded as follows (YYYYMMDDhhmm):
-  * The second (big) file runs from the starting point until the very end of the file
-  * It should be named 'gizmo', and reside in your current working directory.
-  * gizmo is binary data, and entirely reversed- you need to get its bytes back in order (last byte should be first byte, 2nd to last should be 2nd, etc.)
-    * You are to write a shell script to perform the de-reversal of the data, reading from data.file and through whatever processing is needed, produce the file called **gizmo**.
-  * The **urev** tool has some additional constraints with respect to gizmo... running it should notify you of any details you are lacking.
+  * YYYY - 4-digit year (2012)
+  * MM - 2-digit month (11)
+  * DD - 2-digit day (02)
+  * hh - 2-digit hour (24-hour time, so 03 means 3am)
+  * mm - 2-digit minute (09)
+So, 201211020309 means 2012/11/02 at 3:09am was the recorded time of the start of this particular sleep session (I was exploring with a dual core sleep schedule around this time, so this would have been my 2nd core).
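+If you want the start time in a script instead of decoding the name by eye, the fixed-width fields can be picked apart with **sed**. This is only a sketch; **session_start** is a hypothetical name, and it assumes the file follows the session-YYYYMMDDhhmm.raw naming shown above.

```shell
# Hypothetical helper: turn a session file name into "YYYY/MM/DD hh:mm".
session_start() {
    # basename strips any directory and the .raw suffix; sed splits the digits.
    basename "$1" .raw |
        sed -E 's|^session-([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$|\1/\2/\3 \4:\5|'
}

session_start session-201211020309.raw   # prints 2012/11/02 03:09
```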
+=====Task=====
+With the provided data files, I'd like for you to do the following (be sure to provide commands for each as well as the answer you got):
+  * determine the number of data packets in each file
+  * determine the total time elapsed in the session file
+  * determine the total time in a sleep state (not undefined, not conscious)
+  * find a data packet during a time of rem or deep sleep that stores the complete timestamp, and:
+    * extract that packet from the pertinent data file (provide command)
+    * what is the timestamp (as a 32-bit value)
+    * what is the calendar date and time of that timestamp, when appropriately translated?
+  * which file had the most deep sleep?
+    * how much took place?
+    * how did you figure this out?
+    * what was the approximate time?
 =====Useful tools=====
 You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered):
@@ Line 75: / Line 263: @@
   * **dd**(1)
   * **bc**(1)
-  * **du**(1)
+  * **od**(1) - as I've said to others, **od** is like **cat**, but for binary data
-  * **bash**(1) shell scripting
-  * **od**(1)
   * **bvi**(1)
   * **hexedit**(1)
+  * **grep**(1) - can be contorted to cooperate
+  * **date**(1) - might be useful for time/date manipulations
+  * **bgrep** (see below for usage)
 ... along with other tools previously encountered.
+====bgrep====
+To assist you with this project, a special "binary grep" has been deployed on the system, called **bgrep**. bgrep searches for patterns in binary data read from STDIN.
+It supports space-separated (or not) bytes of data, and even allows the use of '.' to denote any hex digit (remember, it takes 2 hex digits to occupy a byte).
+===Example Usage===
+Let's say you wanted to search for the consecutive bytes 0x12 and 0x34 within a binary file:
+<cli>
+$ cat session-201302200614.raw | bgrep '12 34'
+b:12 34
+af3:12 34
+dff:12 34
+f85:12 34
+a8a9:12 34
+aa2f:12 34
+abb5:12 34
+aec1:12 34
+b353:12 34
+$
+</cli>
+What you see are the addresses (in hex) that denote the start of this requested pattern (0x12 immediately followed by 0x34).
+If you wanted 0x12 followed by anything, followed by 0x45, we'd do:
+<cli>
+$ cat session-201302200614.raw | bgrep '12 .. 45'
+:12 e0 45
+$
+</cli>
+In this case, there is only one such match in the entire file.
+The '.' pattern can also be applied to only part of a byte... 0x12 0xe# (we don't care what the lower order 4-bits are, but the upper 4-bits of the second byte MUST be an 0xe):
+<cli>
+$ cat session-201302200614.raw | bgrep '12 e.'
+cf4:12 ee
+d:12 e0
+:12 e0
+:12 e0
+:12 e0
+:12 e0
+a1:12 e0
+b:12 e0
+edb:12 e0
+e7:12 e0
+b9:12 e0
+df9:12 e0
+fcf:12 e0
+aae3:12 e0
+aae7:12 e0
+b859:12 e0
+c:12 e9
+e11f:12 e0
+bd5b:12 ed
+f7:12 e0
+b877:12 e0
+d3df:12 e0
+e7e1:12 e0
+e7f5:12 e0
+ecf7:12 e0
+$
+</cli>
+We can see variations in the lower 4-bits as it matches our desired pattern.
+Finally, upper 4-bits can be anything, lower 4 must be 0xc, followed by 0x34:
+<cli>
+$ cat session-201302200614.raw | bgrep '.c34'
+c1:3c 34
+:8c 34
+e5:0c 34
+d3:ec 34
+b:dc 34
+a683:0c 34
+ef95:6c 34
+$
+</cli>
+Notice in this last pattern, we opted not to space separate the pattern... it works either way (output will be space-separated regardless).
+This will hopefully prove to be a useful tool in your binary analysis endeavors.
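+Should you find yourself somewhere without **bgrep**, regular **grep** can be contorted into a rough stand-in for fixed byte sequences. The sketch below assumes GNU grep (for the -o, -b and -a options) and uses a hypothetical function name, **find_offsets**; it has no '.' wildcard support, and the pattern cannot contain a NUL or newline byte.

```shell
# Hypothetical stand-in for simple bgrep searches, assuming GNU grep.
# $1 = literal byte string to look for; the data is read from STDIN.
find_offsets() {
    # -a: treat binary as text, -o: report each match, -b: prefix the byte
    # offset, -F: literal pattern. LC_ALL=C keeps grep working byte by byte.
    LC_ALL=C grep -obaF -- "$1" | cut -d: -f1 |
        while read -r offset; do printf '%x\n' "$offset"; done   # hex, like bgrep
}

printf 'xxA4yyyyyyyyA4' | find_offsets 'A4'   # prints 2 and c
```

+For non-printable bytes, build the pattern with printf octal escapes, such as find_offsets "$(printf '\022\064')" for 0x12 0x34.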
 =====Submission=====
 Successful completion will result in the following criteria being met:
-  * Resulting file with proper settings should enable you to run **urev** tool.
+  * When all is said and done, you will submit:
-  * You have completed all weekly exercises (96, I think) before the deadline, being mindful of the intentionally-paced nature of urev.
+    * **udr2.text**, containing the answers/responses to all the above questions (including commands used to pull off the project)
-    * Bonus opportunity: while still performing a minimum of 3 distinct **urev** sessions, how could you get around the urev-imposed time limit? (Without copying/changing urev).
-  * When all is said and done, you will submit 3 files:
-    * **udr1.text**
-      * Append the dd line(s) as well as any other command lines needed to extract and properly re-orient the file. Also be sure to indicate what is in the file you found (content, not just type of data).
-    * your bash script enabling the processing of data.file to produce gizmo
-      * Be sure to include comments indicating the reasoning behind actions taken
-    * Your extracted/processed **gizmo** file
 ====Submit====
 Please submit as follows:
 <cli>
-lab46:~/src/unix/udr1$ submit unix udr1 udr1.text getgizmo.bash gizmo
+lab46:~/src/unix/udr2$ submit unix udr2 udr2.text
-Submitting unix project "udr1":
+Submitting unix project "udr2":
-    -> udr1.text(OK)
+    -> udr2.text(OK)
-    -> getgizmo.bash(OK)
-    -> gizmo(OK)
 SUCCESSFULLY SUBMITTED
-lab46:~/src/unix/udr1$
+lab46:~/src/unix/udr2$
 </cli>
