Corning Community College
CSCS2700 Data Communications
This section will document any updates applied to the project since original release:
A simultaneous review of prior programming along with the start of a new direction in our programming pursuits.
As we see in the name of the course, Data Communications, there are two important words in the title, and their meanings (both separate and together) are valid topics of exploration for this course:
Looking at those 3 definitions, some important things come to mind:
In this course, we will be exploring different ways of manipulating data (some extending existing experience, others in entirely new scenarios). In many ways this is itself a form of communication, if only within the same program (input to output). We will also look at scenarios of “data communication” where more than one entity is involved in the transaction (be it program, computer, etc.).
So, we are starting simple, with data, somewhat reviewing, although somewhat breaking new ground. I've envisioned a sequence of projects that will provide a common theme, hopefully facilitating our explorations a bit.
In the DATACOMM public directory will be a subdirectory called pds0/; in this directory will be 6 files, named dataset0.txt through dataset5.txt.
Please copy these files (or reference them via absolute path) in your program implementation.
These files are datasets containing intraday stock data for various securities, over varying blocks of time.
The format of the files is as follows:
EXCHANGE%3DOTCMKTS
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-240
a1500903000,0.097,0.097,0.0965,0.0965,53758
1,0.096,0.097,0.096,0.097,102502
2,0.0974,0.099,0.095,0.09525,159489
3,0.099,0.099,0.097,0.0975,238832
...
What we have is some lead-in information, sometimes known as a header, which provides some initial values we can use to calibrate our program logic to better fit the data.
In the example above, the header would be the first 7 lines:
EXCHANGE%3DOTCMKTS
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-240
What this is basically telling us is: which stock exchange this data pertains to (somewhat unimportant for our current project); the minute of the day, counted from midnight, at which the markets opened and closed (potentially important for what we are doing; 570 is 9:30am and 960 is 4:00pm); the interval of data being reported (in units of seconds); the overall format of the data (date, close, high, etc.); a seemingly unused (maybe reserved?) DATA option; and finally a timezone offset (what timezone is this data being reported in?).
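As a rough sketch of how the numeric header options might be picked apart in C: the struct header type and the function names parse_header_line and minute_to_clock below are hypothetical, chosen only for illustration, and the reading of TIMEZONE_OFFSET as minutes is an assumption (the file itself does not state the unit).

```c
#include <stdio.h>

/* Hypothetical container for the header values we care about. */
struct header {
    int open_minute;   /* MARKET_OPEN_MINUTE: minutes after midnight  */
    int close_minute;  /* MARKET_CLOSE_MINUTE: minutes after midnight */
    int interval;      /* INTERVAL: seconds between data points       */
    int tz_offset;     /* TIMEZONE_OFFSET: assumed to be in minutes   */
};

/* Try to interpret one line as a numeric header option.
 * Returns 1 if a value was stored in *h, 0 for any other line
 * (EXCHANGE, COLUMNS, DATA=, or actual data). */
int parse_header_line(const char *line, struct header *h)
{
    if (sscanf(line, "MARKET_OPEN_MINUTE=%d", &h->open_minute) == 1)
        return 1;
    if (sscanf(line, "MARKET_CLOSE_MINUTE=%d", &h->close_minute) == 1)
        return 1;
    if (sscanf(line, "INTERVAL=%d", &h->interval) == 1)
        return 1;
    if (sscanf(line, "TIMEZONE_OFFSET=%d", &h->tz_offset) == 1)
        return 1;
    return 0;
}

/* Split a minute-of-day value into hours and minutes. */
void minute_to_clock(int minute, int *hour, int *min)
{
    *hour = minute / 60;
    *min  = minute % 60;
}
```

Feeding each line of the file to parse_header_line until it stops matching is one way to get through the header; minute_to_clock turns 570 into 9 hours and 30 minutes. The non-numeric options (EXCHANGE, COLUMNS, DATA) are deliberately ignored here.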
Following the header we have a stanza pertaining to a day, which will kick off with a line like this:
a1500903000,0.097,0.097,0.0965,0.0965,53758
This is effectively kicking off item 0 in the reported interval.
That first field (note that the line is a comma-separated list) is actually an encoded UNIX time value (the digits following the leading a), which we'll want to decode to report more recognizable date information (YYYY-MM-DD HH:MM).
The successive fields correspond, in order, with the values laid out in the COLUMNS option in the header (after DATE comes the prior CLOSE, then the HIGH, the LOW, the OPEN, and finally the VOLUME).
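One possible way to do that decoding with the standard C time functions is sketched below. The function name format_stamp is hypothetical, and the sketch assumes TIMEZONE_OFFSET is expressed in minutes relative to UTC; it shifts the value by that offset and then formats it with gmtime so the result does not depend on the local timezone of the machine.

```c
#include <stdio.h>
#include <time.h>

/* Format a UNIX time value as "YYYY-MM-DD HH:MM".
 * tz_offset_min is the header's TIMEZONE_OFFSET, assumed here
 * to be minutes relative to UTC (so -240 would mean UTC-4).
 * Returns the number of characters written, or 0 on failure. */
size_t format_stamp(long stamp, int tz_offset_min, char *buf, size_t len)
{
    /* Shift into the reported timezone, then break down as if UTC. */
    time_t shifted = (time_t)(stamp + (long)tz_offset_min * 60);
    struct tm *parts = gmtime(&shifted);

    if (parts == NULL)
        return 0;

    return strftime(buf, len, "%Y-%m-%d %H:%M", parts);
}
```

With the example value, format_stamp(1500903000L, -240, buf, sizeof buf) fills buf with 2017-07-24 09:30, which lines up with MARKET_OPEN_MINUTE=570.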
With the exception of DATE and VOLUME, everything else is represented as a decimal cost (you may assume dollars).
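To make the field layout concrete, here is a minimal sketch of a struct holding one entry and a function that parses the first line of a stanza with sscanf. The names struct tick and parse_first_line are hypothetical, not a required design, and the field order simply follows the COLUMNS option shown above.

```c
#include <stdio.h>

/* Hypothetical record layout; one of these per line of data. */
struct tick {
    long   stamp;    /* UNIX time value taken from the DATE field */
    double close;
    double high;
    double low;
    double open;
    long   volume;
};

/* Parse the first line of a stanza, for example
 *   a1500903000,0.097,0.097,0.0965,0.0965,53758
 * The literal 'a' in the format must match the leading 'a'.
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_first_line(const char *line, struct tick *t)
{
    int got = sscanf(line, "a%ld,%lf,%lf,%lf,%lf,%ld",
                     &t->stamp, &t->close, &t->high,
                     &t->low, &t->open, &t->volume);
    return got == 6;
}
```

Because the format insists on the leading a, header lines and offset lines are rejected, which gives a cheap way to tell where each day's stanza begins.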
Subsequent lines in the stanza are merely offset intervals from the first, for instance:
1,0.096,0.097,0.096,0.097,102502
2,0.0974,0.099,0.095,0.09525,159489
3,0.099,0.099,0.097,0.0975,238832
4,0.0975,0.099,0.097,0.097,21000
No UNIX time value to decode here; the first field is merely an offset, counted in intervals, from that initial UNIX time value (so with INTERVAL=60, offset 3 is 180 seconds after it).
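The arithmetic for those follow-on lines can be sketched as below, assuming the offset counts whole intervals, so that it gets multiplied by the header's INTERVAL (in seconds) before being added. The function names offset_to_stamp and parse_offset are hypothetical.

```c
#include <stdio.h>

/* Turn an interval offset into an absolute UNIX time value.
 * first_stamp comes from the stanza's leading "a..." line and
 * interval is the header's INTERVAL, in seconds. */
long offset_to_stamp(long first_stamp, long offset, int interval)
{
    return first_stamp + offset * (long)interval;
}

/* Read just the leading offset from a follow-on line such as
 *   3,0.099,0.099,0.097,0.0975,238832
 * Returns 1 on success, 0 if the line does not start with a number. */
int parse_offset(const char *line, long *offset)
{
    return sscanf(line, "%ld,", offset) == 1;
}
```

For the example stanza, offset 3 with INTERVAL=60 gives 1500903000 + 3 * 60 = 1500903180, three minutes after the first entry.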
Your job is to write a program that, when provided one of these dataset files as a command-line argument, will open and read its contents into memory. I'm leaving the structure of how you store it somewhat flexible for now, but let's just say it may make a whole lot of sense to use a struct to aid in storing this data, perhaps even an array of structs… The program should then be able to interactively (perhaps via a menu?) report:
Results for now should just be displayed to STDOUT.
Clearly, there are a lot of different directions we can go from here, but for now we're aiming to establish a baseline (can we interact with and parse known data in expected ways?). Once we have that down, we can get into more sophisticated variations.
Submission is via the lab46 submit tool, by the posted deadline, for the source code (able to compile and run without issue on lab46).