Dave Schoeffler's Fall 2011 Opus

:)

Introduction

Me

Hey all. My name is Dave Schoeffler but most people call me by my nickname Sheffy. I am a Computer Science major at Corning Community College. This is my 5th semester at CCC and my 4th as a full-time student. This is my second-to-last semester here at Corning Community College, and upon completion I will be transferring to a yet-to-be-determined 4 year school. My plan is to get a Bachelor's in Computer Science with a Minor in Math and possibly go on to get a Master's in Computer Science, but we will see how long I can put up with school :).

Some more stuff about me

I've enjoyed my experience at Corning so far, mainly because of my awesome teachers. Last semester I took UNIX/Linux with Matt and really enjoyed learning a new operating system. I look forward to using that knowledge to be a better programmer. I currently have my own lawn mowing business to pay the bills but I can't wait to be able to get a computer-based job. I am not sure what job field I would like to get into yet but I know I wanna work with computers.

What else can I tell you about myself? Well I play guitar and like most any sport (except for soccer because it's not really a sport). I'll be turning 19 pretty soon and am really looking forward to this class. I'm looking forward to expanding my current knowledge of C/C++ to be able to write some wicked cool programs.

Part 1

Entries

September 5th, 2011

Wow! I'm a little late getting started with my opus but I have started now and I will soon get all caught up. I'm actually entering this addition to my opus on September 19th but I did this work earlier this month. The first thing I did for this class was join the class IRC channel. I first had to start a screen session

lab46:~$ screen 

After that I had to connect to the server

lab46:~$ irssi
[(status)] /server irc

I then had to join the Data Structures channel

[(status)] /join #data

Bam! Success… I can now happily discuss numerous topics ranging from linked lists to zombies with my fellow classmates. Can life get better? I submit that it cannot!

September 18th, 2011

After checking the class IRC I discovered that I completely forgot to sign up for the class mailing list. I promptly fixed this by sending a request to join the class mailing list here: http://lab46.corning-cc.edu/mailman/listinfo/data. After receiving the email I replied with the confirmation code to complete this task.

September 27th, 2011

Another day of programming, another day of fun. Today and yesterday I've been busting out keywords and the course objective. I've come across a lot of topics that have been incredibly informative. In my use of linked lists for my project I've needed an understanding of everything from pointers to memory allocation to structs. I currently have a successful program that implements different functions that manipulate linked lists. The only function that I do not currently have working is the copy() function that is supposed to make an exact copy of a given linked list. I'm working on it now and will hopefully have a solution by the end of the month. I am also working on error handling in my functions.

September 30th, 2011

Down to the wire! Today in data structs we started to learn about stacks. Wikipedia says this about stacks “In computer science, a stack is a last in, first out (LIFO) abstract data type and data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack, or initializes the stack if it is empty. If the stack is full and does not contain enough space to accept the given item, the stack is then considered to be in an overflow state. The pop operation removes an item from the top of the stack. A pop either reveals previously concealed items, or results in an empty stack, but if the stack is empty then it goes into underflow state (It means no items are present in stack to be removed). The stack top operation gets the data from the top-most position and returns it to the user without deleting it. The same underflow state can also occur in stack top operation if stack is empty.”

To use stacks, you need to have a good understanding of linked lists. We are gonna be writing our own push and pop functions for our own stack implementation, which I am in the process of writing. As we get more into stacks I'm really looking forward to some of the projects I will be able to explore.

DATA Topics

Version Control

Version control allows you to take a snapshot of an instance of a file. This system lets you record incremental revisions of files over a period of time. This becomes very useful when working on a project, since you can go back and review your progress at different stages.

The repository is where all of these file snapshots are stored. The difference between the repository and a typical file server is that the repository remembers each snapshot of the files. When you want to look up a file, you specify the file and the date and you can view that copy of the file.

By continually saving your work to the repository, you can save a lot of heartache when you accidentally delete/overwrite your file. If you have been faithful about updating your repository you can quickly pull up your most recent copy with minimal data loss.

The use of the repository on lab46 is quite simple. You can set up a folder for version control here: http://lab46.corning-cc.edu/haas/spring 2011/unix/cs/cs0. After you have that completely working you can use it like this.

lab46:~/src/data$ svn add example.c
A         example.c
lab46:~/src/data$ svn commit -m "Submitted example.c"
Adding         data/example.c
Transmitting file data .....
Committed revision 40.
lab46:~/src/data$

The use of this resource is incredibly valuable!
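
If you want to look back at an earlier snapshot later on, Subversion has commands for that too. These aren't from my actual session, just commands you could run: svn log shows a file's revision history, svn cat -r views the copy stored at a given revision, and svn update -r pulls an older revision back into your working copy (the revision number 40 is just the one from the commit above).

lab46:~/src/data$ svn log example.c
lab46:~/src/data$ svn cat -r 40 example.c > example_r40.c
lab46:~/src/data$ svn update -r 40 example.c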

References: http://svnbook.red-bean.com/nightly/en/svn.basic.version-control-basics.html

Pointers

A pointer references (or points to) a location in memory where a value is held. Accessing that data value through the pointer is called dereferencing the pointer. The following code gives an example of a simple integer pointer.

#include<stdio.h>

int main()
{
   int number = 10;//int variable number
   int *pointer = &number;//int pointer pointing to the address of number
   printf("number is: %d\n",number);
   printf("number is stored at 0x%x\n", &number);
   printf("pointer points to 0x%x\n", pointer);
   printf("The value of pointer is: %d\n", *pointer);//dereferencing a pointer

   return 0;
}

Execution of code

lab46:~/src/data$ ./pointer
number is: 10
number is stored at 0x594780d4
pointer points to 0x594780d4
The value of pointer is: 10
lab46:~/src/data$

arrays, pointer arithmetic

Pointers can point not only to single variables, but also to elements of an array. The following code shows an array of size 10 and displays the memory address of each element. The next part declares an integer pointer and then allocates space for ten integer values. It then displays the memory addresses by using pointer arithmetic. The arithmetic is simply manipulating the memory address by adding or subtracting. In this case, we add 1 to the pointer to access the next element's location.

#include<stdio.h>
#include<stdlib.h>
int main()
{
   int array1[10];//static allocation of an array of 10 elements
   int *array2;//declaration of an integer pointer
   int *base;//keep a copy of the original address so the memory can be freed later
   array2 = (int*)  malloc(sizeof(int)*10);//dynamically allocating memory for 10 integers
   base = array2;
   int i;

   for( i = 0; i < 10; i++)//This loop displays the address of the first element of the array
   {                       //and then increments i by one to move to the next location
      printf("Element %d is stored at 0x%x\n", i , &array1[i]);
   }
   printf("\n");
   for( i = 0; i < 10; i++)//This next loop uses pointer arithmetic to display the memory addresses
   {
      printf("Element %d is stored at 0x%x\n", i , array2);
      array2++;
   }
   free(base);//give the dynamically allocated memory back

   return 0;
}

The following is the execution of the above code.

lab46:~/src/data$ ./example
Element 0 is stored at 0x2b5e9a0
Element 1 is stored at 0x2b5e9a4
Element 2 is stored at 0x2b5e9a8
Element 3 is stored at 0x2b5e9ac
Element 4 is stored at 0x2b5e9b0
Element 5 is stored at 0x2b5e9b4
Element 6 is stored at 0x2b5e9b8
Element 7 is stored at 0x2b5e9bc
Element 8 is stored at 0x2b5e9c0
Element 9 is stored at 0x2b5e9c4

Element 0 is stored at 0xfa4010
Element 1 is stored at 0xfa4014
Element 2 is stored at 0xfa4018
Element 3 is stored at 0xfa401c
Element 4 is stored at 0xfa4020
Element 5 is stored at 0xfa4024
Element 6 is stored at 0xfa4028
Element 7 is stored at 0xfa402c
Element 8 is stored at 0xfa4030
Element 9 is stored at 0xfa4034
lab46:~/src/data$

References: The C Programming Language, Brian Kernighan

pointers to pointers

Since a pointer is just a variable, it is possible to have a pointer pointing to another pointer. This can be useful when trying to return a pointer from a function along with many other things.

The following demonstrates how the pointer and the pointer to the pointer all refer to the same value :) confused yet? Just look at the code below.

#include<iostream>
using namespace std;
int main()
{
   int **num1;//declaration of a double pointer
   int *num2;//declaration of a single pointer
   int num3 = 0;//setting value of num3 to 0
   num2 = &num3;//assigning num3's address to num2
   num1 = &num2;//assigning the address of the single pointer to the double pointer
 
   //now **num1, *num2 and num3 all = 0
   cout << "**num1 = " << **num1 << endl;
   cout << "*num2 = " << *num2 << endl;
   cout << "num3 = " << num3 << endl;
   return 0;
}

References: The C Programming Language, Brian Kernighan

Null pointer

“A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types, null values in relational databases and to the Nothing value in an option type.” (Wikipedia)

Wikipedia does a much better job at explaining what a null pointer is so I will let you read about it there.
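
To tie it back to linked lists, here is a tiny sketch of my own (not from the article above) where a NULL next pointer marks the end of the list and tells the loop when to stop.

#include<stdio.h>
#include<stdlib.h>

struct node {
   int value;
   struct node *next;   // NULL in the last node marks the end of the list
};

int main()
{
   struct node *start = malloc(sizeof(struct node));
   struct node *tmp;

   start -> value = 7;
   start -> next = NULL;                // no node after this one

   for (tmp = start; tmp != NULL; tmp = tmp -> next)   // the NULL pointer tells us when to stop
   {
      printf("value = %d\n", tmp -> value);
   }
   free(start);
   return 0;
}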

References: http://en.wikipedia.org/wiki/Pointer_%28computing%29#Null_pointer

Void pointers

Unlike integer pointers or character pointers, void pointers are generic. This means that they can store the memory address of any data type. In C a void pointer can be converted to any other pointer type upon assignment, but it must be explicitly cast if dereferenced inline. Wikipedia has a nice example of this in the code here.

int x = 4;
void* q = &x;
int* p = q;  /* void* implicitly converted to int*: valid C, but not C++ */
int i = *p;
int j = *(int*)q; /* when dereferencing inline, there is no implicit conversion */

NOTE: C++ does not allow the implicit conversion from void* to other pointer types.

References: http://en.wikipedia.org/wiki/Pointer_%28computing%29#Support_in_various_programming_languages

Function pointers

Function pointers are pointers that point to functions. I really have never used them before so my knowledge of them is limited, but from what I've gathered they do have some usefulness.

A normal function declaration accepting two int variables and returning an int,

int exampleFunc(int x, int y)

A pointer to a function that takes two ints and returns an int,

int (*exampleFunc)(int x, int y)

I also have read that functions cannot be declared inside of a struct in C. But if you use function pointers, you can still get much the same functionality by storing a function pointer in the struct and calling through it, as the sketch below shows.
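
Here is a small sketch I put together to show the idea (the names add, sub and mathOps are just made up for illustration). A function pointer is declared, pointed at a real function, called, and then stored inside a struct and called through the struct member.

#include<stdio.h>

int add(int x, int y) { return x + y; }
int sub(int x, int y) { return x - y; }

struct mathOps {
   int (*operation)(int x, int y);   // a function pointer stored inside a struct
};

int main()
{
   int (*exampleFunc)(int x, int y); // declaration like the one above
   struct mathOps ops;

   exampleFunc = add;                // point it at a real function
   printf("add gives %d\n", exampleFunc(2, 3));

   ops.operation = sub;              // structs can hold function pointers
   printf("sub gives %d\n", ops.operation(5, 2));
   return 0;
}

Calling ops.operation(5, 2) looks almost like calling a member function, which is why this trick gets used in C.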

Static allocation vs. Dynamic allocation

Static memory allocation is the process of allocating memory at compile time. This is the first type of memory allocation you are taught. Dynamic memory allocation, on the other hand, happens at runtime. That memory exists until it is released manually. Dynamic memory allocation can be a lot more efficient than static allocation because you only ask for the memory you actually need, when you need it.

Memory allocation (malloc(), new)

As stated previously, memory allocation can generally be done two ways: statically or dynamically. The first way is seen here.

int number;//allocates 4 bytes for the int variable number
char ch;//allocates 1 byte for the char variable ch
double db;//allocates 8 bytes for the double variable db

Alternatively, the following code uses dynamic memory allocation to do the same thing. The C way.

int *number = NULL;
char *ch = NULL;
double *db =NULL;
 
number = malloc(sizeof(int));
ch = malloc(sizeof(char));
db = malloc(sizeof(double));

The C++ way

int *number;
char *ch;
double *db;
 
number = new int;
ch = new char;
db = new double;

The memory for these variables isn't allocated until runtime. Make sure you deallocate your memory though, otherwise the FBI might show up at your front door wanting to ask you some questions.

Memory De-allocation (free(), delete)

When you dynamically allocate memory in a program you must remember to deallocate it once you are done. Using the code examples above, deallocating the memory goes as follows.

The C way

free(number);
free(ch);
free(db);

The C++ way

delete number;
delete ch;
delete db;

Failure to free (or delete) memory can result in memory leaks. Let's just say you don't want this to happen. Don't dynamically allocate and forget to free memory. Only you can prevent memory leaks.

structures(structs/records)

A struct is a collection/grouping of variables accessed under one name. Unlike an array, a struct can hold variables of different types (in C++, the only real difference between a struct and a class is the default access level). The following is a struct definition and declaration that I used for a C++ program I wrote a while ago.

struct studentType
{
    string studentFName;
    string studentLName;
    int testScore;
    char grade;
};
studentType student;//struct variable declaration

When you declare a struct variable, enough memory is allocated to hold information for the four members inside it.

If you want to access a specific member of a struct you can do as follows.

//StructVariableName.membername
//in our example. to print the students last name you write this line
cout << "The students last name is " << student.studentLName << endl;

Another incredibly useful application of structs is in linked lists. Check out the linked list keyword for more on this awesome use.

Structure pointers

If you think of a struct as just another data type (a very complex data type) then it makes sense to be able to have pointers to them. To help explain this a little better I will use a function that I have written for my project.

#include "linkedlist.h"
#include <stdlib.h>
 
struct node {
   char value;
   struct node *next;
   struct node *prev;
};
typedef struct node Node;
 
void append(Node** end, char input)
{
      Node* tmp;
      tmp = *end;
 
      tmp = (Node *) malloc (sizeof(Node));
      (*end) -> next = tmp; // tack new node onto end of list
      tmp -> prev =(*end); // new node points to current end
      (*end) = (*end) -> next; // advance end to new end
      (*end) -> value = input; // put input in node
      (*end) -> next = NULL; // nothing beyond end
}

In this function we have a struct definition that contains two struct pointers (*next and *prev). These two pointers allow us to point to the next and previous struct/node in our linked list. We then see in the append function the declaration of a struct pointer, *tmp. Struct pointers can be used for many things. In upcoming keywords you will see their use in more detail.

DATA Objectives

Write a program that uses linked lists

This course objective is to successfully write a program implementing the use of linked lists.

Method

To do this successfully, I will have to understand the basic concepts of a linked list. This requires a knowledge of pointers, structs, struct pointers, memory allocation and memory deallocation. Success will be measured by the completion of a program that implements these elements efficiently.

Measurement

The keywords that I have entered at the top of this page show that I have successfully researched the concepts required to make a working linked list. Besides having a knowledge of the concepts, the only other measure of success would be to have a program implementing these concepts. You can find such a program here → Project: Doubly Linked List Library Implementation

Analysis

Upon completion of this exercise, I believe I have successfully completed the course objective of creating a program that implements a linked list.

  1. I believe I could improve my functions so that they could be more flexible. For example, I currently only allow each node's value to be a character. I could change a few things around and allow each node to hold more information.
  2. Since the measurement process is based upon having a knowledge of the concepts required to build a linked list and then implementing them in a program, I don't see too many ways to improve it using the current measurement system. You either have a working program or not.
  3. If you altered the course objective (write a program that uses linked lists) to target a specific application, such as storing employee information for a company, I can see where that could be a little more applicable. But all that aside, if you have an understanding of linked lists you can use that knowledge anywhere, whatever the application.

Experiments

Experiment 1

Question

We have been using structs to build our linked lists. The question that I am posing is how does the system allocate memory for the structs we create?

Resources

Hypothesis

I believe that the size of the struct will be the sum of its components. For example, if the struct contains a char(1 Byte) and an int(4 Bytes), the system would allocate 5 bytes of memory for that struct.

Experiment

To test this hypothesis, I will write a struct definition and a program that uses sizeof() to output the size of the struct.

Data

Original

struct node1 {
   char value;//1 byte
 
};
struct node2 {
   char value;//1 byte
   int number;//4 bytes
};
typedef struct node1 Node1;
typedef struct node2 Node2;
#include<stdio.h>
int main()
{
 
   char v1 = 0;
   int v2 = 0;
 
   printf("char is %d bytes\n", sizeof(v1));
   printf("int is %d bytes\n", sizeof(v2));
   printf("node1 is %d bytes\n", sizeof(Node1));
   printf("node2 is %d bytes\n", sizeof(Node2));
 
   return 0;                                                                                                                                                  
}

Execution of the code

lab46:~/src/data$ ./size
char is 1 bytes
int is 4 bytes
node1 is 1 bytes
node2 is 8 bytes
lab46:~/src/data$

The first struct contains a single character and takes up one byte in memory. The second struct contains a char and an int (5 bytes), yet it takes up 8 bytes in memory.

Let's try this again but with some struct pointers.

struct node1 {
   char value;
   struct node1 *tmp1;
};
struct node2 {
   char value;
   int number;
   struct node2 *tmp1;
   struct node2 *tmp2;
   struct node2 *tmp3;
};
typedef struct node1 Node1;
typedef struct node2 Node2;

The code above now declares a struct pointer in the first struct and three struct pointers in the second. With some simple math (each pointer is 8 bytes on lab46) you can surmise that the first struct's components add up to 9 bytes and the second struct's components add up to 29 bytes. Let's see how the system will allocate memory.

lab46:~/src/data$ ./size
char is 1 bytes
int is 4 bytes
node1 is 16 bytes
node2 is 32 bytes
lab46:~/src/data$

Although the first struct's components add up to 9 bytes, the system allocates 16 bytes of memory for the struct. Likewise, although the second struct's components add up to 29 bytes, the system allocates 32 bytes.

Analysis

Based on the data collected:

  • My original hypothesis stated that the size of the struct would be the sum of its components. This was not completely accurate.
  • The hypothesis was correct in assuming that the size of the struct is related to the sum of its components, but the memory allocated was padded to the next multiple of 8 bytes.
  • Since I performed this experiment on lab46 (64-bit), it might have different results if run on a different system (32-bit possibly). This all depends on how that system aligns and allocates the memory.

Conclusions

After performing this experiment, I can conclude that when the system allocates memory for a struct it pads the struct out to a multiple of the size of its largest member (8 bytes here, since these structs contain pointers). According to my sources, this has to do with system performance and optimization, by keeping each member aligned on its natural boundary. I believe this experiment performed on a different system may not always provide the exact same result.
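
A quick way to actually see where the padding goes is offsetof() from stddef.h, which reports the byte offset of each member inside the struct. This little sketch (my own, reusing the node2 definition from above) should show the 3 bytes of padding inserted after the char so that the int and the pointers land on their natural boundaries; the exact numbers could differ on another system.

#include<stdio.h>
#include<stddef.h>    // for offsetof()

struct node2 {
   char value;
   int number;
   struct node2 *tmp1;
   struct node2 *tmp2;
   struct node2 *tmp3;
};

int main()
{
   // offsetof() tells us where each member starts inside the struct
   printf("value  starts at byte %zu\n", offsetof(struct node2, value));
   printf("number starts at byte %zu\n", offsetof(struct node2, number));
   printf("tmp1   starts at byte %zu\n", offsetof(struct node2, tmp1));
   printf("total size: %zu bytes\n", sizeof(struct node2));
   return 0;
}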

Experiment 2

Question

This is gonna be a short experiment, but since we've been dealing with linked lists so much I thought this might be helpful. How much more memory does a linked list of integers take up than an array?

Resources

The C Programming Language, Brian Kernighan

Hypothesis

I think the linked list will take up substantially more memory than the array.

Experiment

To test this I will write a short program creating an array of 10000 integers. I will also create a struct that contains an int value (plus a pointer to the next node). I will find the size of that struct, multiply the value by 10000 and then compare the two results.

Data

This is a short program that will examine the memory allocating of two different data structures.

#include<stdio.h>
struct node1 {
   int value;
   struct node1 *tmp1;
};
typedef struct node1 Node1;
int main()
{
   int array[10000];
   int i;
   for (i = 0; i < 10000; i++)
   {
      array[i] = i;
   }
 
   printf("node1 is %d bytes\n", (sizeof(Node1)*10000));
   printf("array is %d bytes\n", (sizeof(array[0])*10000));
 
   return 0;
}

Execution of code.

lab46:~/src/data$ ./size
node1 is 160000 bytes
array is 40000 bytes
lab46:~/src/data$

Analysis

Based on the data collected:

  • My hypothesis was correct. The linked list took up substantially more memory than the array.
  • The experiment was pretty straightforward and to the point.

Conclusions

The purpose of this experiment was to show that when going for efficiency, using an array can be more memory-efficient in some cases. The linked list took up four times the memory of the array. The downside is that an array's size has to be decided when it is allocated, whereas a linked list can grow and shrink dynamically one node at a time.

Experiment 3

Question

How much memory is a user on lab46 allowed to use?

Resources

Hypothesis

Since dynamic memory is allocated at runtime, the program should compile fine. But once the program is run, the system will eventually run out of memory to allocate. I really haven't the slightest idea how much memory each user can have.

Experiment

I will create an infinite loop with 10 malloc statements in it and keep a counter in the loop. The body of the loop will allocate memory of size int (4 bytes) ten times per iteration. I will then multiply the final count by 4 to estimate the total memory allowance per user.

Data

Test Code.

#include<stdio.h>
#include<stdlib.h>
 
int main()
{
   int x = 0;
   int *a,*b,*c,*d,*e,*f,*g,*h,*i,*j;//pointers to hold the addresses malloc gives back
   int iterations = 0;
   while (x == 0)
   {
      //the memory is never freed on purpose: the point is to see when the system runs out
      a = malloc(sizeof(int));
      b = malloc(sizeof(int));
      c = malloc(sizeof(int));
      d = malloc(sizeof(int));
      e = malloc(sizeof(int));
      f = malloc(sizeof(int));
      g = malloc(sizeof(int));
      h = malloc(sizeof(int));
      i = malloc(sizeof(int));
      j = malloc(sizeof(int));
      iterations = iterations + 10;
      printf("iterations: %d\n",iterations);
   }
   return 0;
}

Execution

lab46:~$ ./malloc
iterations: 1
.
.
.
.
.
.
iterations: 38220610
iterations: 38220620
iterations: 38220630
iterations: 38220640
iterations: 38220650
iterations: 38220660
iterations: 38220670
Killed
lab46:~/src/data$

I ran the command ps u during the runtime of this program and got this under memory usage for ./malloc

lab46:~$ ps u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
dschoeff  1152  0.0  0.0  13660    12 pts/7    SNs  Sep16   0:00 /bin/bash
dschoeff  1155  0.0  0.1  42544  1428 pts/7    SN+  Sep16   1:59 irssi
dschoeff  3547  0.0  0.0  13632   864 pts/47   SNs  21:42   0:00 -bash
dschoeff 19715  0.0  0.1  13624  1488 pts/45   SNs  23:43   0:00 -bash
dschoeff 20561  9.0 80.0 1054040 836352 pts/47 SN+  23:49   0:32 ./malloc
dschoeff 21328  0.0  0.0   8584  1028 pts/45   RN+  23:55   0:00 ps u
lab46:~$ 

Notice the 80% memory usage

Analysis

Based on the data collected:

  • My hypothesis was correct. The system finally ran out of memory.
  • is there more going on than you originally thought? When I was running the program and running the ps command in another terminal, I noticed at first a steady rise in memory usage. After a while though, I noticed memory usage go down a bit before it began to rise again. This seemed a little strange.
  • what shortcomings might there be in your experiment? I'm sure there may have been an easier way to find out how much memory each user is allowed (maybe ask Matt) but this seemed so much more entertaining.

Conclusions

Based upon the final number of iterations, I can conclude that each user is allowed around 150 megabytes of memory (the final count times 4 bytes, with 1 megabyte being approximately 1,000,000 bytes). It will be interesting to find out how close this number is to the actual number or if I am completely off :)

Part 2

Entries

October 20th, 2011

So I'm a little late making my first entry for this month (okay, a lot late) but I have been working on stuff for data structures. What I have done so far this month is complete my doubly linked list library implementation project. I had a few hiccups with that but I was able to work through them and move on. Since then I have started two more projects. One of them deals with a stack implementation and the other deals with a queue implementation. The stack project has given me a few small problems but I am currently working through them and will hopefully get it working soon. I've really been enjoying the stuff that I've been learning and hope to get an even better understanding as the weeks go by.

October 24th, 2011

I'm back dating this entry because I forgot to add it on Monday. Today I worked on my Stack project. I was able to work through the problems I was having and got two out of the three main functions working. It ended up being a stupid compiling mistake that I made, and now I have them working. I have also been studying up on the queue library implementation and hope to have that nearly complete by the end of the month. I'm really enjoying these projects that I'm doing and finding it fun to be able to do a lot of the exploring on my own.

October 27th, 2011

SNOW!!!! It snowed today up here on the hill and that was majorly depressing. Nothing like a full day of homework and then snow to brighten up your day. Despite being depressed I was able to make really good progress on my keywords today. I was able to define stacks and all the operations that go along with them. I find that having to write down what you know about the keywords really makes you think about how they actually function. I feel like I have a better understanding of them now that I have been able to write them down. I also found out that the project proposal that I had for my stack library didn't save and all the work I did was for nothing. Oh well, I guess I'll have to write it up again.

October 28th, 2011

So far today I have finally written up my project proposal for Matt and got that sent to him. I did it a couple days ago but for some reason it didn't save and I lost all my work. I am also currently working on writing up my queue library implementation and will send that to Matt soon as well.

And I learned to always have one of Karl's nuts in your hand at all times!

Data Topics

Linked lists

Wikipedia defines a linked list as this: “a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference (in other words, a link) to the next node in the sequence; more complex variants add additional links. This structure allows for efficient insertion or removal of elements from any position in the sequence.” Each “node” of a linked list is a struct containing some data value along with a struct pointer to the next node.

You may ask yourself, “why would I use a linked list to store data when I could just use an array?”. This is a very good question. The advantages of using a linked list to store data are as follows.

  1. insertion of data into the data structure
  2. deletion of data from data structure
  3. adding data to list

Inserting data into a specific location in an array requires a lot of work. Deletion is easy but leaves vacant spots in memory and hinders optimal performance. Also, when you want to add data to an array, you need to reallocate memory, which is an expensive operation. You do not have any of these problems with a linked list. All you have to do is manipulate a few pointers and bam! You are set.

Below is an example of the definition of a node structure, the creation of a singly linked list and the displaying of a list's contents. The credit for this code goes to Matthew Haas.

#include<stdio.h>
#include<stdlib.h>
 
struct node
{
   int value;
   struct node *next;
};
typedef struct node Node;
 
int main()
{
 
   Node *start, *tmp;
   int input, i = 0;//i starts at 0 for the display loop below
 
   start = tmp = NULL;
 
   printf("Enter a value -1 to exit: ");
   scanf("%d", &input);
 
   while( input != -1)
   {
      if(start == NULL)
      {
         start = (Node *)malloc(sizeof(Node));
         start->next=NULL;
         tmp = start;
         start->value=input;
      }
      else
      {
         tmp->next = (Node *)malloc(sizeof(Node));
         tmp = tmp->next;
         tmp->next = NULL;
         tmp->value=input;
      }
      printf("Enter a value -1 to exit: ");
      scanf("%d", &input);
   }
 
 
   printf("The list contains the following elements\n");
   tmp = start;
   while (tmp != NULL)
   {
      printf("[%d]value = %d\n", i, tmp -> value);
      tmp = tmp -> next;
      i++;
   }
 
   return 0;
}

Execution of the code:

lab46:~/src/data$ ./linkedlist
Enter a value -1 to exit: 1
Enter a value -1 to exit: 2
Enter a value -1 to exit: 3
Enter a value -1 to exit: 4
Enter a value -1 to exit: 5
Enter a value -1 to exit: 6
Enter a value -1 to exit: -1
The list contains the following elements
[0]value = 1
[1]value = 2
[2]value = 3
[3]value = 4
[4]value = 5
[5]value = 6
lab46:~/src/data$ 

References: http://en.wikipedia.org/wiki/Linked_list#Linked_lists_vs._dynamic_arrays

Doubly Linked List

The only difference between a singly linked list and a doubly linked list is that in a doubly linked list, the node structure also has a pointer to the previous node. This slight difference, though, gives a lot more functionality to the list. Now you are able to traverse the list in either direction, whereas in a singly linked list you have to start at the beginning. This can be useful in many applications.

The code below shows the declaration of a node that has two struct pointers

struct node
{
   int value;
   struct node *next;
   struct node *prev;
};
typedef struct node Node;
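
To show what that buys you, here is a short sketch of my own that builds a tiny two-node list by hand and then walks it forward using the next pointers and backward using the prev pointers.

#include<stdio.h>
#include<stdlib.h>

struct node
{
   int value;
   struct node *next;
   struct node *prev;
};
typedef struct node Node;

int main()
{
   Node *start, *end, *tmp;

   // build a tiny two-node list by hand
   start = malloc(sizeof(Node));
   end   = malloc(sizeof(Node));
   start -> value = 1; start -> prev = NULL;  start -> next = end;
   end -> value   = 2; end -> prev   = start; end -> next   = NULL;

   printf("forward:  ");
   for (tmp = start; tmp != NULL; tmp = tmp -> next)   // follow the next pointers
      printf("%d ", tmp -> value);

   printf("\nbackward: ");
   for (tmp = end; tmp != NULL; tmp = tmp -> prev)     // follow the prev pointers
      printf("%d ", tmp -> value);
   printf("\n");

   free(start);
   free(end);
   return 0;
}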

Stacks

A stack is a last in, first out data structure. The type of stacks we covered in class were linked list based. There are also array based stacks but they have their own set of limitations. A stack is simply a linked list that can only be accessed from the top. This is known as a last in, first out data structure (more on that later).

The following struct “stack” contains the data needed to implement a stack. Each Stack has a node pointer pointing to the top of the stack (top) and one pointing to the bottom (data). It also contains a node pointer tmp and an int variable count which can be used for various other needs.

struct stack {
   int count;
   struct node *top;
   struct node *data;
   struct node *tmp;
};
typedef struct stack Stack;

The creation of a stack is also very similar to that of a linked list. The creation of a stack is typically done by the push() function, which is explained below.

Pushing

Pushing is the operation that is used to add a node to a stack. The function is typically called push() and it allows you to push a node on top of the stack. There is no other way to add a value to a stack but by using this function. The push() function should also be able to initialize a stack if there is not a stack to add the value to. Finally, the push() function should also check for stack overflow. More about that later.

The following code demonstrates a push function.

/***********************************************************
* Function push() will create a stack if no stack currently*
* exists. If a stack does exist, push() will add a node to *
* the top of the stack. It will then return the updated    *
* stack.                                                   *
* Compile with: gcc -c push.c                              *
*                                                          *
***********************************************************/
#include "stack.h"
#include<stdio.h>
#include<stdlib.h>
Stack *push(Stack *start, char input)
{
   Node *tmp;
   if(start == NULL)// if no stack, create one
   {
      start = (Stack *) malloc (sizeof(Stack));
      start -> data = malloc(sizeof(Node));
      start -> top = malloc(sizeof(Node));
      start -> tmp = malloc(sizeof(Node));
      start -> top -> value = input;
      start -> top -> next = NULL;
      start -> data = start -> top;
   }
   else//if there is a stack, add node to top of it
   {
      tmp = (Node *) malloc (sizeof(Node));
      tmp -> value = input;
      tmp -> next = start -> top;
      start -> top = tmp;
   }
   return (start);
}

Popping

Popping is the opposite of pushing. Popping allows you to remove a node from a stack. It is important to realize that in a stack, you only have access to the topmost node. If you would like to remove a node that is not on the top, you have to remove all the nodes on top of the one you want to remove. That may seem stupid but it is the nature of a stack. Typically the function used to pop nodes is called pop(). A good pop function will also check for stack underflow (I'll explain more about that later).

The following code demonstrates a pop() function.

/******************************************************************
* Function pop() accepts as an argument a stack. It will then pop *
* the node that is currently on the top and make the node under   *
* that the new top. It will then return the value that from the   *
* popped node as a character. If the stack that is passed is NULL *
* (meaning it does not exist), the function cannot pop a node that*
* doesn't exist. Therefore, the function will return the null     *
* terminating character.                                          *
*                                                                 *
* Compile with: gcc -c pop.c                                      *
*                                                                 *
**************************************************************** */
#include "stack.h"
#include <stdlib.h>
char pop(Stack *start)
{
  char chartmp;
  start -> tmp = start -> top;
  if(start -> tmp != start -> data)
  {
     start -> top = start -> top -> next;
     start -> tmp -> next = NULL;
     chartmp = start -> tmp -> value;
  }
  else if(start -> data == NULL)
  {
     chartmp = '\n';
  }
  else
  {
     chartmp = start -> top -> value;
     start -> top = start -> tmp = start -> data = NULL;
  }
 
  return(chartmp);
}

Top

The top operation is a very simple one. The top() (or peek()) function allows you to view the contents of the node that is currently on top of the stack. That is it. It just returns whatever value is on top of the stack. It doesn't manipulate the nodes or change them in any way.
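
My project's actual peek code isn't shown here, but based on the Stack struct above a minimal version might look something like this (it assumes the caller has already made sure the stack isn't empty).

#include "stack.h"

/* Return the value in the top node without removing it.
 * Sketch only: assumes start points to a valid, non-empty stack. */
char peek(Stack *start)
{
   return start -> top -> value;
}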

The following shows the execution of my stack implementation.

lab46:~/src/data/stackProject$ ./stack
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
2
Enter value for new node: a
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
3
The top node contains the value: a
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
2
Enter value for new node: b
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
3
The top node contains the value: b
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
q
lab46:~/src/data/stackProject$

Notice that the most recent node pushed onto the stack becomes the top. If the top node is popped, then the node before that now becomes the top.

Overflow Condition

Stack overflow occurs when too much memory is used on the stack. Wikipedia states “The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash”(Wikipedia). Note that the quote is describing the call stack a program runs on; for the stack data structure we are building, overflow simply means trying to push a value when there is no room left to store it. Basically stack overflow is not a good thing and your push() function should be able to detect and handle that condition if it comes up. When using an array-based stack, the maximum size for your stack is the size of the array. Let's say you have an array of 100 elements.

example)

int testArray[100];
int i;
for(i = 0; i < 200; i++)
{
    testArray[i] = i;
}

This loop tries to load 200 values into a 100 element array.

Executing this for loop would result in an overflow because you would be writing values outside of the given space. In an array-based stack, the equivalent mistake is pushing an element when the array is already full. When you have a linked list based stack, overflow occurs when you run out of memory.
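
For an array-based stack, the overflow check is just a bounds test before storing the new value. Here is a sketch of my own (STACK_SIZE, stackArray and the return codes are made-up names for illustration, not part of my project):

#include<stdio.h>

#define STACK_SIZE 100

int stackArray[STACK_SIZE];
int top = -1;                         // index of the top element, -1 means empty

/* push() returns 0 on success, -1 on overflow */
int push(int value)
{
   if (top >= STACK_SIZE - 1)         // stack is full: overflow condition
   {
      printf("stack overflow!\n");
      return -1;
   }
   stackArray[++top] = value;
   return 0;
}

int main()
{
   int i;
   for (i = 0; i < 200; i++)          // try to push 200 values onto a 100 element stack
   {
      if (push(i) == -1)
         break;                       // stop instead of writing past the array
   }
   printf("stopped after %d pushes\n", top + 1);
   return 0;
}

Running it, the 101st push reports the overflow and the loop stops instead of writing past the end of the array.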

Underflow condition

When you use the pop() function on a stack but the stack is empty, you go into what is called a state of underflow. This means that there is no value to pop. If this happens, zombies will come out of the computer monitor and try to eat your soul…. NOT GOOD! Seriously, a stack underflow will result in a runtime error in your program. Your pop() function should check to see if the stack is empty before trying to pop a node.

The following shows what a program should do when trying to pop a node off of a stack when there is no node to pop.

lab46:~/src/data/stackProject$ ./stack
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
2
Enter value for new node: a
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
1
The value that was pooped is: a
Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
1
The value that was pooped is: 

Enter - 1 - to pop a Node off of the stack
Enter - 2 - to push a Node on the stack
Enter - 3 - to peek at the value on top of the stack
Enter - q - to quit
q
lab46:~/src/data/stackProject$

The value 'a' was pushed onto the stack and then popped off. This left the stack with no values. I then tried to run the pop() function again. Instead of getting a segmentation fault, the pop() function returned an empty value, indicating that there was no value to pop.

LIFO

LIFO stands for “last in, first out”. This describes what we have just been talking about in stacks.

Here is a picture representing a stack.

The red sheet here represents the top of the stack. The red sheet is the last sheet added to the stack and will be the first one out if the pop() function were to be run. I hope this helps your understanding of a LIFO stack.

Queue

A queue (pronounced like the letter 'Q') is a FIFO data structure (more about FIFO later). It is basically a linked list that you can only leave from the end and join at the beginning. Think of the line at the Verizon store. You go into the store and reserve your spot in line. Doing this places you into a queue. You will not be able to leave the queue until all the people in front of you have been waited on. In the meantime, more people have been added to the queue behind you as well. Several hours later you will finally be able to leave the queue and be waited on. Queues are commonly built from the same kind of linked nodes as a linked list, and they have a number of useful applications.

Please refer to this link for the basic structure of a queue → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start?&#linked_lists

Enqueuing

Like in a stack, only certain operations can be performed on a queue. The first one I will mention is enqueue. enqueue() is a function that allows you to add a node to the beginning of the queue. This is very similar to a prepend() function for a linked list. This operation takes a node and adds it to the beginning of a queue, therefore making that new node the first node in the queue. That node cannot be removed until it becomes the last node in the queue.

The following code demonstrates an enqueue operation.

//compile with: gcc -c enqueue.c                                                                                               
//
//Function enqueue() enqueues a node to the queue. If there is no queue in existence
//it will create one. This function is almost exactly like the insert function in my 
//doubly linked library.
#include "queue.h"
#include <stdlib.h>
void enqueue(Node** start, Node** end, int index, char ch) 
{
 
   int i;
   Node* tmp;
   Node* tmp2;
   tmp  = *start;
 
   if (*start == NULL) // if our queue is empty
   {
      (*start) = (Node *) malloc (sizeof(Node));
      (*end) = tmp = *(start);
      (*start) -> prev = NULL;
      (*start) -> next = NULL; // nothing else in the queue yet
      (*end) -> prev = NULL;
      (*start) -> value = ch;
   }
   else// we are enqueing a node
   {   
      tmp2 = (Node *) malloc (sizeof(Node));
      tmp2 -> next = (*start); // point new node's next at (*start)
      (*start) -> prev = tmp2; // point (*start)'s prev at tmp2
      (*start) = (*start) -> prev; // move (*start) to new node
      (*start) -> value = ch; // put value in node
   }   
}

Dequeuing

Queues would be pretty useless if you could only add nodes to the beginning of the queue. This is where dequeuing comes in. dequeue() is a function that allows you to remove the node on the very end of the queue. This is the only way you can remove a node from a queue. The dequeue() function is very similar to pop() for stacks and removeNode() for linked lists. They operate basically the same way.

The following demonstrates the dequeuing operation.

//compile with: gcc -c dequeue.c
//
// Function dequeue() removes a node from a given queue and frees the
// memory for that node
#include "linkedlist.h"
#include <stdlib.h>
char dequeue(Node** start, Node** end, int index)
{
   int i;
   char ch;
   Node* tmp;
   tmp = *start;
   if((*start) != NULL)
   {
      // Move tmp to exact node we wish to delete
      for( i=0; i<index; i++)
      {
         tmp = tmp -> next;
      }
      ch = tmp -> value;
      if ((tmp != *start) && (tmp != *end))
      {
         tmp -> next -> prev = tmp -> prev;
         // The node after points to the node before tmp
         tmp -> prev -> next = tmp -> next;
         // Now, the node before points to the node after tmp
         tmp -> prev = NULL;
         tmp -> next = NULL;
         // tmp is now disconnected from the queue
         free (tmp); // deallocate the node tmp is pointing to
      }
      else if (*start == *end) // queue is only a single node
      {
         free(tmp);
         (*start) = (*end) = NULL;
      }
      else if (tmp == *start)
      {
         tmp -> next -> prev = NULL;
         // The node following tmp is no longer pointing to us
         (*start) = (*start) -> next;
         tmp -> next = NULL; // Disconnect tmp from the queue
         free(tmp); // deallocate the node tmp is pointing to
      }
      else // tmp is equal to end
      {
         tmp -> prev -> next = NULL;
         // The node preceding tmp is no longer pointing to us
         (*end) = (*end) -> prev;
         // Adjust end to point to the new end of the queue
         tmp -> prev = NULL; // Disconnect tmp from the queue
         free(tmp); // deallocate the node tmp is pointing to
      }
   }
   else
   {
      ch = '\n';
   }
   return(ch);
}

Data Objective

Write a program that uses stacks

This course objective is to successfully write a program implementing the use of stacks.

Method

To do this successfully, I will have to understand the basic concepts of a stack. This requires a knowledge of pointers, structs, struct pointers, memory allocation and memory deallocation. It also requires a knowledge of linked lists. Success will be measured by the completion of a program that implements these elements efficiently.

Measurement

The keywords that I have entered at the top of this page show that I have successfully researched the concepts required to make a working stack. Besides having a knowledge of the concepts, the only other measure of success would be to have a program implementing these concepts. You can find such a program here → http://lab46.corning-cc.edu/user/dschoeff/portfolio/libstack

Analysis

Upon completion of this exercise, I believe I have successfully completed the course objective of creating a program that implements a stack.

  1. I believe I could improve my functions so that they could be more flexible. For example, I currently only allow each node's value to be a character. I could change a few things around and allow each node to hold more information.
  2. Since the measurement process is based upon having a knowledge of the concepts required to build a stack and then implementing them in a program, I don't see too many ways to improve it using the current measurement system. You either have a working program or not.
  3. If you altered the course objective (write a program that uses stacks) to be implemented in a real life scenario, I can see where that could be a little more applicable. But all that aside, if you have an understanding of stacks you can use that knowledge anywhere, whatever the application.

Experiments

Experiment 1

Question

How does the system handle unexpected input, such as a character when an integer is expected?

Resources

Hypothesis

I believe that when you enter a character when an integer is expected the program will output an error and terminate.

This will happen because the system does not know what to do with the character value.

Experiment

I will test my hypothesis by creating a program that accepts an int value as input. I will then input a character value and see how the program runs.

Data

Here is the code for my experiment.

#include<iostream>
using namespace std;
int main()
{
 
   int num ;
   cout << "Enter integer: ";
   cin >> num;
   cout << " Int is " << num << endl;
   return 0;
}

Here is the execution

lab46:~/src/data$ ./inttest
Enter integer: 1
 Int is 1
lab46:~/src/data$ ./inttest
Enter integer: q
 Int is 0
lab46:~/src/data$ 

Analysis

Based on the data collected:

  • My hypothesis was not correct. I thought that the program would result in a runtime error but that was not the case. Instead, the program ran to completion without any errors.
  • is there more going on than you originally thought? Yes, something I did not take into account was the input method I was using.
  • what shortcomings might there be in your experiment? In this experiment I used cin to get my input. I did not take into account other ways of getting data (command line, scanf(), fgets(), etc…).

Conclusions

I can conclude from this experiment that when using cin to get an integer value and a character is entered instead, the program will still run without error. Though the program runs without error, the output you get will not be correct. I found that when entering a character value when an int value was expected, the value stored in the variable was 0 (cin actually goes into a fail state when the conversion fails, and it ignores further extraction attempts until that state is cleared). BE CAREFUL WITH YOUR INPUT!
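
The same kind of problem exists in C. One way to guard against it (a sketch of my own using scanf() instead of cin, since scanf() reports how many items it successfully converted) is to check the return value before trusting the variable.

#include<stdio.h>

int main()
{
   int num;
   printf("Enter integer: ");
   if (scanf("%d", &num) != 1)       // scanf returns the number of items it converted
   {
      printf("That was not an integer!\n");
      return 1;
   }
   printf("Int is %d\n", num);
   return 0;
}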

Experiment 2

Question

How much time does the system take to count up to a billion, 1 at a time?

Resources

Matthew Haas; the man page for clock()

Hypothesis

I think that it will take 1 minute to add up to a billion.

The computer is very powerful and can perform calculations at a crazy fast speed.

Experiment

To test this, I will write a program that adds one to a variable until it reaches a billion. I will also include the time.h header file so I can measure how long the program takes to run.

Data

The following program will allow me to test the runtime of the program.

#include<time.h>
#include<stdio.h>
#include<stdlib.h>
int main()
{
   unsigned long int i = 0;
   unsigned long int abc;
   while(i < 1000000000)
   {
      i++;
   }

   abc = clock();//clock() reports the CPU time used since the program started
   abc = abc/CLOCKS_PER_SEC;
   printf("time in seconds = %lu\n",abc);
   return (0);
}

The execution of the program.

lab46:~/src/data$ ./clock
time in seconds = 4
lab46:~/src/data$

Analysis

Based on the data collected:

  • my original hypothesis was not correct.
  • I originally stated that it would take one minute to add up to a billion. After execution I found that it only took 4 seconds.
  • The experiment was pretty straight forward. I don't think too much was going on that I did not account for.

Conclusions

After completing the experiment, I feel I have a better understanding of how fast the processor can handle tasks. This experiment also made me become familiar with using the time.h header file with its built-in function clock().

Experiment 3

Question

This semester I am taking Java, a higher level programming language than C, and have learned that with Java there is more “overhead”. The question I am posing is this: does more overhead equate to slower running programs?

Resources

  • JAVA Programming: D.S. Malik & Robert P. Burton
  • The C Programming Language: Brian Kernighan
  • The man page for time

Hypothesis

I believe that since Java has more overhead and is a higher level language than C, it will take longer to do the same task.

Experiment

I will test my hypothesis by writing a program in both C and Java that does the same basic task. I will then compile and run these programs, recording how long it took each one to run to completion. I will then compare the results with my original hypothesis.

Data

I have written two programs that increment a double variable by .025 until it reaches one billion.

In C

int main()
{
   double i = 0;
   while(i < 1000000000)
   {
      i = i + .025;
   }
   return (0);
}

In Java

public class Cvsjava
{
   public static void main(String[] args)
   {
      double i = 0;
      while(i < 1000000000)
      {
           i = i + .025;
      }
   }
}

Execution of C code

lab46:~/src/data$ time ./clock

real    4m13.026s
user    4m12.956s
sys     0m0.036s
lab46:~/src/data$

Notice the total time is 4 minutes and 13 seconds

Execution of Java code

lab46:~/src/data$ time java Cvsjava

real    3m47.261s
user    3m47.230s
sys     0m0.076s
lab46:~/src/data$

Notice the total time is 3 minutes and 47 seconds.

Analysis

Based on the data collected:

  • was your hypothesis correct? No. The total runtime of the Java program was shorter than the C program's.
  • was your hypothesis not applicable? No, I don't think so.
  • is there more going on than you originally thought? I think there may be. I originally equated a higher level programming language, with its extra overhead, to a slower running program. I don't think this is correct.
  • what shortcomings might there be in your experiment? I only tested how the languages handle adding numbers. I could've tested a lot more things such as function calls, memory allocation, output, etc… This may have led to a better picture of the differences between the languages.
  • what shortcomings might there be in your data? I believe the data in itself is fine.

Conclusions

According to my findings, I think I need to do a lot more research on this topic. I find it very interesting that the Java program (written in the higher level language) seemed to run faster than the one written in C. I would also like to find out if other factors are affecting how the programs run (like the JVM or something like that). I would like to investigate this more and get more opinions as to what is going on behind the scenes.

As I continue to program in both languages I find pros and cons to both. I look forward to learning more about these two popular programming languages.

Part 3

Entries

November 7th, 2011

Happy November! This being my first official dated entry of the new month brings along great news! I have officially completed my stack library implementation and it is awesome. You should definitely check it out here → Project: Stack Library Implementation. I've learned so much doing this stack implementation. I ran into a lot of problems doing this partly because I didn't have a good understanding of stacks going into the project. This led me to research about stacks more and increase my knowledge of how they worked. I don't think I could've done this project without the use of the debugger gdb. It came in so handy when trying to find out why I was getting a segmentation fault.

I still want to increase my knowledge of struct pointers because having a better understanding of them will help lead to better code in the future. I see the power of linked list type data structures and can't wait to explore more of them.

November 11th, 2011

Well today has been fairly productive. I successfully completed my queue library implementation. I also submitted all my files to the subversion repository. You can view my queue project here → http://lab46.corning-cc.edu/user/dschoeff/portfolio/libqueue. I was able to work through a few problems I was having with my dequeue() function, but I stopped by the lair today and Matt helped me out. My next thing I want to work on is my keywords and experiments. I hope to get these finished up soon.

November 14th, 2011

Today I decided I would go do corrections to my opus. I had completed my opus on time and everything but I didn't do that good of a job of demonstrating some of the keywords I defined. So I went back and tried to better demonstrate them with code snippets and programs. I also included some command line execution of programs I had written. I think the keywords can be understood a lot better now with the new demonstrations.

November 18th, 2011

So I'm back dating this entry cause I forgot to do it on Friday.

Today was pretty cool. In Data Structures we started talking about various sorting algorithms. What we did was basically look on Wikipedia at the different sorting algorithms out there. Each type of sorting algorithm has its pros and cons but they all basically do the same thing, which is to sort data. When choosing a sorting algorithm to use you need to take into account your own specific implementation and what it will be used for.

After learning about these things in class Matt let us loose and had us write our own simple algorithm. I chose to write a bubble sort algorithm which is not the most efficient algorithm but is simple to understand. I plan on writing a few more sorting algorithms implementing different methods.

After writing this algorithm for a project I decided to do some experiments on how efficient the algorithms really were. It's nice to see for yourself how efficient the algorithms are at sorting data rather than just taking someone else's word for it.

Data Topics

Queue Overrun

When referring to queues, an overrun condition occurs when the queue exceeds the space available to store data. In an array-based queue, this would happen when trying to add a value to an array that is already full. Since I haven't dealt much with array-based queue implementations I will stick mainly to explaining linked list based implementations. With a linked list based queue, it would seem to me that an overrun condition would be much harder to reach. When you think about linked lists, they do not have a fixed size. The only size limitation the list has (unless the program designer were to arbitrarily set a size limit for the list) is how much memory is available to the system. Although the chance of overrun is somewhat unlikely, a function that enqueues a node to a queue should check for this condition and be flexible enough to handle that situation. I have done an experiment dealing with system memory management here → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#experiment_3. This experiment explores dynamic memory allocation and how lab46 deals with a lot of memory usage.

The following is a snippet of my enqueue function. If the system running your program ran out of memory, a problem would occur at line 26, where you try to allocate memory for a new node when there is no more memory available.

 26       tmp2 = (Node *) malloc (sizeof(Node));
 27       tmp2 -> next = (*start); // point new node's next at (*start)
 28       (*start) -> prev = tmp2; // point (*start)'s prev at tmp2
 29       (*start) = (*start) -> prev; // move (*start) to new node
 30       (*start) -> value = ch; // put value in node
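
One way to guard against that is sketched roughly below. This is not my actual library code, just the pattern; it assumes the enqueue function returns an int where 0 means success and -1 means failure, and that stdio.h is included for fprintf.

tmp2 = (Node *) malloc (sizeof(Node));
if (tmp2 == NULL)                  // malloc hands back NULL when no memory is available
{
   fprintf(stderr, "enqueue: out of memory, node not added\n");
   return(-1);                     // report the overrun instead of dereferencing NULL
}
tmp2 -> next = (*start);           // safe to link in the new node now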

Queue Underrun

Queue underrun occurs when you try to dequeue a node from a queue that is already empty. When dealing with linked list based queues, underrun is much more likely to happen than overrun. Your dequeue function must check whether the queue is empty before trying to dequeue a node. If it does not, you will likely get a segmentation fault.

char dequeue(Node** start, Node** end, int index)
{
   int i;
   char ch;
   Node* tmp;
   tmp = *start;
   if((*start) != NULL)
   {
        //STUFF
   }
   else
   {
      ch = '\n';
   }
   return(ch);
}

The first conditional statement makes sure the queue actually has a node before trying to dequeue one. Instead of causing a segmentation fault, the function returns a newline character to indicate there was no node to dequeue. With this check in place you should not have a problem with queue underrun.

FIFO

The term FIFO stands for “First In First Out” and refers to how data is added to and removed from the data structure. A queue is a FIFO data structure because the first node added to a queue will be the first node removed.

The following picture represents how the FIFO queue works.

For more explanation of a queue refer to the queue keyword definition → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#queue
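
As a quick illustration of the first in, first out behavior, here is a tiny self-contained sketch. It just uses a plain array standing in for the queue, not my linked list library.

#include <stdio.h>

int main()
{
   char queue[3] = { 'A', 'B', 'C' };   // 'A' was enqueued first, then 'B', then 'C'
   int front = 0;                       // index of the oldest element still in the queue

   // dequeue everything: items come back out in the same order they went in
   while (front < 3)
   {
      printf("dequeued: %c\n", queue[front]);
      front++;
   }
   return (0);
}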

Computational Complexity

Wikipedia states, “Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other. In this context, a computational problem is understood to be a task that is in principle amenable to being solved by a computer (which basically means that the problem can be stated by a set of mathematical instructions). Informally, a computational problem consists of problem instances and solutions to these problem instances. For example, primality testing is the problem of determining whether a given number is prime or not. The instances of this problem are natural numbers, and the solution to an instance is yes or no based on whether the number is prime or not.”

From what I understand, Computational Complexity is the classification of a problem according to how difficult it is to solve. This difficulty is based upon the amount of resources needed to solve the problem. For example, the arithmetic problem 2 + 2 is a lot less difficult to solve than (13^2/78) * 10000 and would require fewer resources.

References: http://en.wikipedia.org/wiki/Computational_complexity_theory

Big O/Theta, Bounds

Big O notation is used to classify algorithms by how they respond to larger and larger input sizes. This notation is used in computer science when referring to the efficiency of an algorithm. It is helpful when trying to analyze how an algorithm handles data in what are called the best case, worst case and average case.

When describing a function using big O notation, only the dominant term is given; lower order terms and constant factors are dropped. This is justified because as input sizes become larger, those other terms become less and less influential.

When you see Big O notation it will look something like this: O(1), O(n), O(n log n), O(n^2), O(2^n)

Wikipedia states “Big O notation is useful when analyzing algorithms for efficiency. For example, the time (or the number of steps) it takes to complete a problem of size n might be found to be T(n) = 4n^2 − 2n + 2. As n grows large, the n^2 term will come to dominate, so that all other terms can be neglected; for instance when n = 500, the term 4n^2 is 1000 times as large as the 2n term. Ignoring the latter would have negligible effect on the expression's value for most purposes. Further, the coefficients become irrelevant if we compare to any other order of expression, such as an expression containing a term n^3 or n^4. Even if T(n) = 1,000,000n^2, if U(n) = n^3, the latter will always exceed the former once n grows larger than 1,000,000 (T(1,000,000) = 1,000,000^3 = U(1,000,000)). Additionally, the number of steps depends on the details of the machine model on which the algorithm runs, but different types of machines typically vary by only a constant factor in the number of steps needed to execute an algorithm.”
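
Just to convince myself of that, here is a little sketch I could run. It is my own toy program, using the T(n) from the quote above, and it prints how close T(n) gets to its leading 4n^2 term as n grows.

#include <stdio.h>

int main()
{
   long n;
   // compare the full expression T(n) = 4n^2 - 2n + 2 against just its 4n^2 term
   for (n = 10; n <= 1000000; n = n * 10)
   {
      double full = 4.0 * n * n - 2.0 * n + 2.0;
      double lead = 4.0 * n * n;
      printf("n = %7ld   T(n)/4n^2 = %f\n", n, full / lead);   // ratio approaches 1
   }
   return (0);
}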

Big Theta notation is very similar to Big O notation. The only difference between these two notations is that when you use big theta notation to represent a function, the upper and lower bounds are of the same order.

References: http://en.wikipedia.org/wiki/Big_O_notation

Sorting Algorithms

A sorting algorithm is an algorithm that sorts a given set of data. The most common types of sorting algorithms sort data in numerical or alphabetical order (also commonly known as lexicographical order). Sorting algorithms are very useful because many other tasks, like searching, become much easier once the data is in a particular order.

Not all sorting algorithms are created equal, though. There are many commonly used sorting algorithms out there, each with its own particular twist on how it sorts data.

A few common types of sorting algorithms are defined below. The big thing when choosing an algorithm is how it performs on varying degrees of randomness in the data it will sort. Algorithms are typically described using Big O notation for the best case scenario, worst case scenario and average case scenario. The best case is how the algorithm responds to data that is already in the desired order. For example, the Bubble Sort algorithm, which will be described later, has a best case performance of O(n), meaning it only needs a single pass of about n comparisons before it can finish. Its worst case performance (the data is completely out of order) is O(n^2), meaning the number of comparisons grows on the order of n^2. The average case performance is also O(n^2).

Different algorithms perform differently to different types of data so it is best to know what you are trying to do before you choose a sorting algorithm.

Selection Sort

The Selection Sort is a simple sorting algorithm. It works by taking a data structure (let's say an array) and traversing the entire list to find the minimum value (if you are sorting from smallest to largest). Once it has that, it swaps the element in the first position with the smallest element. At this point the smallest element is in its desired position. It then repeats this process starting from the second element, and so on, until it gets to the last element.

The selection sort has a worst case performance O(n^2), a best case performance O(n^2) and an average case performance O(n^2).

Here is an example from Wikipedia showing the progression of this sort.

64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64

This is a particularly inefficient way to sort data. If you are not concerned about efficiency or do not have a large amount of data to sort, it is a reasonable choice because of its simplicity.
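
Here is a minimal sketch of the idea in C. It is a toy version I put together against the same values as the example above, not a tuned library routine.

#include <stdio.h>

int main()
{
   int array[5] = { 64, 25, 12, 22, 11 };   // same values as the example above
   int i, j, min, tmp;

   for (i = 0; i < 4; i++)
   {
      min = i;                              // assume the element at i is the smallest so far
      for (j = i + 1; j < 5; j++)
      {
         if (array[j] < array[min])         // found something smaller, remember where it is
            min = j;
      }
      tmp = array[i];                       // swap the smallest element into position i
      array[i] = array[min];
      array[min] = tmp;
   }

   for (i = 0; i < 5; i++)
      printf("[%d]: %d\n", i, array[i]);

   return (0);
}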

References: http://en.wikipedia.org/wiki/Selection_sort

Bubble Sort

The Bubble Sort is a sorting algorithm that steps through a list of elements comparing adjacent elements and swapping them if they are out of order. It keeps making passes through the list until no more swaps are needed. The logic behind this algorithm is very simple.

The bubble sort has a worst case performance O(n^2), a best case performance O(n) and an average case performance O(n^2).

The following code shows an implementation of a bubble sort that allows the user to enter 10 integers and then sorts them in ascending order.

#include <stdio.h>
#include <stdlib.h>
int main()
{
   int *array;
   array = (int*) malloc(sizeof(int)*10);
   int i = 0, input = 0;
   int swapped = 1, tmp;
   //loads ten values from the user into the array
   for(i = 0; i < 10; i++)
   {
      printf("enter 10 different integers: ");
      scanf("%d",&input);
      *(array+i) = input;
   }
   //meat of the bubble sort: keep making passes until no swaps happen
   while( swapped == 1)
   {
      swapped = 0;
      for(i = 1; i < 10; i++)
      {
         if(*(array+(i-1)) > *(array+i))//this statement checks to see
         {                              //if the elements are out of order
            tmp = *(array+i);
            *(array+i) = *(array+(i-1));
            *(array+(i-1)) = tmp;
            swapped = 1;
         }
      }
   }
   //displays the sorted array
   for(i = 0; i < 10; i++)
   {
      printf("[%d]: %d\n", i , *(array+i));
   }
   free(array);   //give the memory back

   return (0);
}

Execution of this code.

lab46:~/src/data/sorts$ ./bubble
enter 10 different integers: 5
enter 10 different integers: 2
enter 10 different integers: 8
enter 10 different integers: 6
enter 10 different integers: 1
enter 10 different integers: 0
enter 10 different integers: 7
enter 10 different integers: 3
enter 10 different integers: 6
enter 10 different integers: 9
[0]: 0
[1]: 1
[2]: 2
[3]: 3
[4]: 5
[5]: 6
[6]: 6
[7]: 7
[8]: 8
[9]: 9
lab46:~/src/data/sorts$

I have done an experiment on how the bubble sort handles differing sets of data. You should check it out here → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#experiment_12

Overall, the bubble sort is inefficient in its average case and you would be much better off choosing another algorithm to sort your data.

Insertion Sort

The Insertion Sort is a simple sorting algorithm that is good to use when you do not have a large amount of data to sort.

The Insertion Sort has a worst case performance O(n^2), a best case performance O(n) and an average case performance O(n^2).

The easiest way to understand this sort is to see an example. The following example was found on Wikipedia.

Example: The following table shows the steps for sorting the sequence {3, 7, 4, 9, 5, 2, 6, 1}. In each step the item under consideration is underlined, and the item moved (or held in place if it was the biggest yet considered) in the previous step is shown in bold.

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 5 7 9 2 6 1

2 3 4 5 7 9 6 1

2 3 4 5 6 7 9 1

1 2 3 4 5 6 7 9

Every repetition of an insertion sort guarantees that one more element will be put in the right place. Looking at the list from left to right, the sort takes the first element and, since it is the first element, leaves it where it is. The first element is now sorted. It then looks at the next element and removes it from the list. If it is greater than the first value it is put back in its place; if it is less than the first element, the first element is moved to the right and the second element is placed in the first position. There are now two sorted elements. This process is repeated until all the items are in their proper place.
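
Here is a minimal sketch of that idea in C. It is a toy version I wrote using the same sequence as the example, not a carefully tuned implementation.

#include <stdio.h>

int main()
{
   int array[8] = { 3, 7, 4, 9, 5, 2, 6, 1 };   // same sequence as the example above
   int i, j, key;

   for (i = 1; i < 8; i++)
   {
      key = array[i];                  // the element being inserted this pass
      j = i - 1;
      while (j >= 0 && array[j] > key) // shift larger elements one spot to the right
      {
         array[j + 1] = array[j];
         j--;
      }
      array[j + 1] = key;              // drop the element into its sorted position
   }

   for (i = 0; i < 8; i++)
      printf("%d ", array[i]);
   printf("\n");

   return (0);
}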

References: http://en.wikipedia.org/wiki/Insertion_sort

Quick Sort

The Quick Sort algorithm is more advanced than the previous algorithms described above. Wikipedia does a good job of explaining how the algorithm functions.

“Quicksort is a divide and conquer algorithm. Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists.

The steps are:

  1. Pick an element, called a pivot, from the list.
  2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
  3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

The base case of the recursion are lists of size zero or one, which never need to be sorted.”(Wikipedia)

The Quick Sort has a worst case performance O(n^2), a best case performance O(nlogn) and an average case performance O(nlogn).
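
Here is a rough sketch of those steps in C. It is my own toy version that always picks the last element as the pivot; real implementations often choose the pivot more carefully.

#include <stdio.h>

// recursively sort array[low..high] by partitioning around a pivot
void quick_sort(int array[], int low, int high)
{
   int pivot, i, j, tmp;

   if (low >= high)                 // base case: zero or one element, already sorted
      return;

   pivot = array[high];             // simplest choice: use the last element as the pivot
   i = low - 1;
   for (j = low; j < high; j++)
   {
      if (array[j] < pivot)         // move anything smaller than the pivot to the left side
      {
         i++;
         tmp = array[i]; array[i] = array[j]; array[j] = tmp;
      }
   }
   tmp = array[i + 1]; array[i + 1] = array[high]; array[high] = tmp;   // pivot to its spot

   quick_sort(array, low, i);       // sort the lesser elements
   quick_sort(array, i + 2, high);  // sort the greater elements
}

int main()
{
   int array[8] = { 5, 2, 8, 6, 1, 0, 7, 3 };
   int i;

   quick_sort(array, 0, 7);
   for (i = 0; i < 8; i++)
      printf("%d ", array[i]);
   printf("\n");
   return (0);
}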

Merge Sort

The Merge Sort algorithm is what you would call a reliable one. The merge sort has a worst case performance O(nlogn), a best case performance O(nlogn) and an average case performance O(nlogn). It is known as a divide and conquer algorithm because it divides the list of elements down into single elements and then merges them back together in sorted order.

The process goes as follows: “

  1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
  2. Divide the unsorted list into two sublists of about half the size.
  3. Sort each sublist recursively by re-applying the merge sort.
  4. Merge the two sublists back into one sorted list.”(Wikipedia).

The following picture shows a graphical representation of how the algorithm functions.

Overall, the merge sort is very reliable, meaning that even its worst case performance is O(nlogn), which is much better than some of the algorithms listed above.
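
Here is a rough sketch of those steps in C. It is my own toy version with a small hard-coded array; it allocates a temporary buffer for each merge, which a more careful implementation would avoid.

#include <stdio.h>
#include <stdlib.h>

// merge the two already-sorted halves array[low..mid] and array[mid+1..high]
void merge(int array[], int low, int mid, int high)
{
   int *tmp = (int*) malloc(sizeof(int) * (high - low + 1));
   int i = low, j = mid + 1, k = 0;

   while (i <= mid && j <= high)              // take the smaller front element each time
      tmp[k++] = (array[i] <= array[j]) ? array[i++] : array[j++];
   while (i <= mid)  tmp[k++] = array[i++];   // copy whatever is left in the left half
   while (j <= high) tmp[k++] = array[j++];   // copy whatever is left in the right half

   for (k = 0; k <= high - low; k++)          // copy the merged result back
      array[low + k] = tmp[k];
   free(tmp);
}

void merge_sort(int array[], int low, int high)
{
   int mid;
   if (low >= high)                  // a list of 0 or 1 elements is already sorted
      return;
   mid = (low + high) / 2;
   merge_sort(array, low, mid);      // sort each half recursively...
   merge_sort(array, mid + 1, high);
   merge(array, low, mid, high);     // ...then merge the two sorted halves
}

int main()
{
   int array[8] = { 5, 2, 8, 6, 1, 0, 7, 3 };
   int i;

   merge_sort(array, 0, 7);
   for (i = 0; i < 8; i++)
      printf("%d ", array[i]);
   printf("\n");
   return (0);
}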

References: http://en.wikipedia.org/wiki/Merge_sort

Binary Search

A Binary Search is an algorithm used to search for a value in a sorted list. It works by comparing the value being searched for with the middle element of the array. Since the list is sorted, you then know whether to continue searching in the left or right half. It repeats this process until it finds the value or the remaining portion of the array is reduced to zero.

given the following array

array[] = |1|13|22|28|30|45|58|98|100|

Lets say the value being searched for is 13.

It starts by locating the middle element, in this case 30. Since 13 is less than 30, if 13 is in the array it must be in the first half.

Therefore, the remaining list to be searched is |1|13|22|28|

It then looks at the middle element of that section of the array, 22. Since 13 is less than 22, the value has to be either the first or second element of that subsection. It will then find 13 as the second element of that subsection.

When this logic is put into an algorithm it will use recursion. Pseudocode for an implementation is here.

binary_search(A[0..N-1], value, low, high):
       if (high < low):
           return -1 // not found
       mid = (low + high) / 2
       if (A[mid] > value):
           return binary_search(A, value, low, mid-1)
       else if (A[mid] < value):
           return binary_search(A, value, mid+1, high)
       else:
           return mid // found
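
Translating that pseudocode into C gives something like the following sketch. It uses the same array as the example above; the only change from the pseudocode is computing the midpoint in a way that avoids integer overflow.

#include <stdio.h>

// return the index of value in the sorted array a[low..high], or -1 if it is not there
int binary_search(int a[], int value, int low, int high)
{
   int mid;
   if (high < low)
      return -1;                           // the range is empty, value not found
   mid = low + (high - low) / 2;           // same as (low+high)/2 but avoids overflow
   if (a[mid] > value)
      return binary_search(a, value, low, mid - 1);   // search the left half
   else if (a[mid] < value)
      return binary_search(a, value, mid + 1, high);  // search the right half
   else
      return mid;                          // found it
}

int main()
{
   int array[9] = { 1, 13, 22, 28, 30, 45, 58, 98, 100 };

   printf("13 found at index %d\n", binary_search(array, 13, 0, 8));
   printf("99 found at index %d\n", binary_search(array, 99, 0, 8));
   return (0);
}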
 

Data Objective

Write a program that uses queues

This course objective is to successfully write a program implementing the use of queues.

Method

To do this successfully, I will have to understand the basic concepts of a queue. This requires a knowledge of pointers, structs, struct pointers, memory allocation, memory deallocation and an understanding of linked lists. Success will be measured by the completion of a program that implements these elements efficiently.

Measurement

The keywords that I have entered at the top of this page show that I have successfully researched the concepts required to make a working queue. Besides having a knowledge of the concepts, the only other measure of success would be to have a program implementing them. You can find such a program here → http://lab46.corning-cc.edu/user/dschoeff/portfolio/libqueue

Analysis

Upon completion of this exercise, I believe I have successfully completed the course objective of creating a program that implements a queue.

  1. I believe I could improve my functions so that they are more flexible. For example, I currently only allow each node's value to be a single character. I could change a few things around and allow each node to hold more information, as sketched after this list.
  2. Since the measurement process is based upon having a knowledge of the concepts required to build a queue and then implementing them in a program, I don't see too many ways to improve the current measurement system. You either have a working program or you don't.
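
For the first point, here is a hedged sketch of what a more flexible node could look like. The struct names here are hypothetical and not what my library actually uses; the idea is that each node holds a pointer to a payload instead of a single char.

#include <stdio.h>
#include <stdlib.h>

// a hypothetical, more flexible node: instead of holding a single char,
// each node carries a pointer to whatever payload the caller wants to store
struct payload {
   int  id;
   char grade;
};

typedef struct node {
   void        *data;    // points at a struct payload (or anything else)
   struct node *next;
   struct node *prev;
} Node;

int main()
{
   struct payload *p = (struct payload *) malloc (sizeof(struct payload));
   Node *n = (Node *) malloc (sizeof(Node));

   p -> id = 42;                         // fill in the payload
   p -> grade = 'A';
   n -> data = p;                        // the node just points at it
   n -> next = NULL;
   n -> prev = NULL;

   printf("node holds id %d, grade %c\n",
          ((struct payload *) n -> data) -> id,
          ((struct payload *) n -> data) -> grade);

   free(p);
   free(n);
   return (0);
}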

Experiments

Experiment 1

Question

Today I learned about various sorting algorithms. Many of them have their advantages and disadvantages. One of the things we saw when exploring the different methods is what is known as their computational complexity. The computational complexity is basically how the algorithm handles data that is:

  1. completely in order
  2. completely out of order
  3. somewhere in the middle.

What I want to know is how long it will take the bubble sort algorithm to sort each of the 3 different scenarios listed above.

Resources

My knowledge of the bubble sort algorithm has been obtained from Matthew Haas and this webpage → http://en.wikipedia.org/wiki/Bubble_sort

I also have done a project implementing a simple bubble sort algorithm. I think the combined knowledge of the above things will help me in this experiment.

Hypothesis

I believe that the array with values that are completely out of order will take the longest. After that, the array loaded with random values will take the next longest to sort. Finally, the array with values that are already in order will take the least amount of time.

Experiment

I will test my hypothesis by keeping track of the run time of 3 different programs. Each of these programs will sort an array of integers. The first array will have its elements completely out of order. The second array I will load with completely random values. The final array I will load with values that are already in the right order. I will then compare the run times of each of these programs and ponder the results.

Data

Execution of the code that I wrote that used a bubble sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    1m44.096s
user    1m43.898s
sys     0m0.004s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 1 minute and 44 seconds to complete this task.

test.txt

[0]: 900001
[1]: 900002
.
.
.
.
.
[99992]: 999993
[99993]: 999994
[99994]: 999995
[99995]: 999996
[99996]: 999997
[99997]: 999998
[99998]: 999999
[99999]: 1000000

Now I will test to see if there is any difference between an array that is completely out of order and an array that may only be partially out of order. I will load an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithm and observe the results.

lab46:~/src/data/sorts$ time ./bubbleTest > random.txt

real    1m49.129s
user    1m48.951s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about the same time to run as the list that was completely out of order.

Finally we will test how long it will take to sort through a list that is already in the right order using the bubble sort algorithm.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    0m0.300s
user    0m0.072s
sys     0m0.012s
lab46:~/src/data/sorts$ 

Notice that the program completed in a fraction of a second. This was to be expected because the algorithm didn't really have to do anything.

Analysis

Based on the data collected:

  • was your hypothesis correct? Not entirely. I did state that the array that was already in order would take the least amount of time to sort, but I did not notice any significant difference between the array that was completely out of order and the one that was only partially out of order.
  • was your hypothesis not applicable? I think it was applicable just not entirely correct.
  • is there more going on than you originally thought? I don't believe so. I think I have a fairly good understanding of how the sorting algorithm works.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the sorting algorithm inside the program. This may affect the results slightly but I don't believe it is a big problem. A sketch of how I could time just the sort itself appears after this list.
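
One way I could tighten that up is to time only the sorting loop itself inside the program with clock() from time.h, instead of timing the whole run with the shell's time command. Here is a rough sketch of the idea using a toy 10 element array, not my actual 100000 element test programs.

#include <stdio.h>
#include <time.h>

int main()
{
   int array[10] = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
   int i, swapped = 1, tmp;
   clock_t start, stop;

   start = clock();                 // start the clock right before the sort itself
   while (swapped == 1)
   {
      swapped = 0;
      for (i = 1; i < 10; i++)
      {
         if (array[i - 1] > array[i])
         {
            tmp = array[i];
            array[i] = array[i - 1];
            array[i - 1] = tmp;
            swapped = 1;
         }
      }
   }
   stop = clock();                  // stop it right after, before any printing

   printf("sort took %f seconds of CPU time\n",
          (double)(stop - start) / CLOCKS_PER_SEC);
   return (0);
}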

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. The bubble sort is a very effective sorting algorithm when the list of elements is already in order.
  2. The bubble sort's average case, where the elements are partially out of order, is not much faster than its worst case, where the list is completely out of order.

Experiment 2

Question

Over the past couple of weeks I have been learning various sorting algorithms. Many of them have their advantages and disadvantages. One of the things I have seen when exploring the different methods of sorting data is known as their computational complexity. The computational complexity is basically how the algorithm handles data that is:

  1. completely in order
  2. completely out of order
  3. somewhere in the middle.

What I want to know is how long it will take the selection sort algorithm to sort each of the 3 different scenarios listed above.

Resources

My knowledge of the selection sort algorithm has been obtained from C++ Programming by D.S. Malik and this webpage → http://en.wikipedia.org/wiki/Selection_sort

I also have done a project implementing a simple selection sort algorithm. I think the combined knowledge of the above things will help me in this experiment.

Hypothesis

I believe that all the arrays will take about the same amount of time to sort. I believe this because the algorithm has no way to check whether the array is already sorted. Therefore it must perform the same amount of work no matter how the data is already ordered.

Experiment

I will test my hypothesis by keeping track of the run time of 3 different programs. Each of these programs will sort an array of integers. The first array will have its elements completely out of order. The second array I will load with completely random values. The final array I will load with values that are already in the right order. I will then compare the run times of each of these programs and ponder the results.

Data

Execution of the code that I wrote that used a selection sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m33.734s
user    0m33.494s
sys     0m0.016s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 34 seconds to complete this task.

test.txt

[0]: 900001
[1]: 900002
.
.
.
.
.
[99992]: 999993
[99993]: 999994
[99994]: 999995
[99995]: 999996
[99996]: 999997
[99997]: 999998
[99998]: 999999
[99999]: 1000000

Now I will test to see if there is any difference between an array that is completely out of order and an array that may only be partially out of order. I will load an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithm and observe the results.

lab46:~/src/data/sorts$ time ./selectionTest > random.txt

real    0m32.299s
user    0m32.082s
sys     0m0.004s
lab46:~/src/data/sorts$

Notice that it took about the same time to run as the list that was completely out of order.

Finally we will test how long it will take to sort through a list that is already in the right order using the selection sort algorithm.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m32.272s
user    0m32.082s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that the program ran in about the same time as the others. This was to be expected because of how the algorithm functions.

Analysis

Based on the data collected:

  • was your hypothesis correct? Yes. I stated that the algorithm would handle the 3 different data sets in about the same time, and it did.
  • was your hypothesis not applicable? I think it was applicable.
  • is there more going on than you originally thought? I don't believe so. I think I have a fairly good understanding of how the sorting algorithm works.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the sorting algorithm inside the program. This may affect the results slightly but I don't believe it is a big problem.

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. With the selection sort algorithm you know what you are going to get. Whether the data is in order or out of order, it will take about the same amount of time to run. That is not good, because it is O(n^2) in every case.
  2. Overall, this is not a good sorting algorithm. You will be much better off using a different one.

Experiment 3

Question

I have been studying various sorting algorithms lately and have wondered how two different ones would match up in a head to head race.

My question is this. Which sorting algorithm sorts data faster? The selection sort or the bubble sort?

Resources

My knowledge of these sorting algorithms have been obtained from the projects I have done on them. You can find them here → http://lab46.corning-cc.edu/user/dschoeff/portfolio

Hypothesis

I believe that the bubble sort will perform better than the selection sort on data that is already in order where as the selection sort will perform better on data that is not in order.

Experiment

I will test my hypothesis by running both sorting algorithms on 3 different sets of data. The first set of integers will be completely out of order. The next set will be randomly filled with integers. The final set will be filled with all the integers in the correct order.

Data

Execution of the code that I wrote that used a bubble sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    1m44.096s
user    1m43.898s
sys     0m0.004s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 1 minute and 44 seconds to complete this task.

Execution of the code that I wrote that used a selection sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m33.734s
user    0m33.494s
sys     0m0.016s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 34 seconds to complete this task. Round 1 goes to the selection sort.

Now I will test the two different sorts on an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithms and observe the results. First the bubble sort.

lab46:~/src/data/sorts$ time ./bubbleTest > random.txt

real    1m49.129s
user    1m48.951s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about 1 minute 50 seconds to complete.

Next, lets run the selection sort on the same data.

lab46:~/src/data/sorts$ time ./selectionTest > random.txt

real    0m32.299s
user    0m32.082s
sys     0m0.004s
lab46:~/src/data/sorts$

Notice that it took about 33 seconds to run. Round 2 goes to the selection sort.

Finally we will test how long it will take to sort through a list that is already in the right order using the bubble sort algorithm.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    0m0.300s
user    0m0.072s
sys     0m0.012s
lab46:~/src/data/sorts$ 

Notice that the program completed in a fraction of a second. This was to be expected because the algorithm didn't really have to do anything.

Next lets try the selection sort.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m32.272s
user    0m32.082s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about 33 seconds to run. Round 3 goes to the bubble sort.

Analysis

Based on the data collected:

  • was your hypothesis correct? Yes. The bubble sort performed best on the data set that was already in order and the selection sort performed better on the other two.
  • was your hypothesis not applicable? I think it was applicable.
  • is there more going on than you originally thought? I don't believe so.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the sorting algorithm inside the program. This may affect the results slightly but I don't believe it is a big problem.

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. The bubble sort is better to use than the selection sort when the list is already in order.
  2. The selection sort is better to use when the data is not in order.
  3. Since your data is almost never already in the right order, the selection sort seems like the better one to use overall.