
Part 3

Entries

November 7th, 2011

Happy November! This being my first official dated entry of the new month, it brings along great news! I have officially completed my stack library implementation and it is awesome. You should definitely check it out here → Project: Stack Library Implementation. I've learned so much doing this stack implementation. I ran into a lot of problems, partly because I didn't have a good understanding of stacks going into the project. This led me to research stacks further and increase my knowledge of how they work. I don't think I could've done this project without the debugger gdb. It came in so handy when trying to find out why I was getting a segmentation fault.

I still want to increase my knowledge of struct pointers because having a better understanding of them will help lead to better code in the future. I see the power of linked list type data structures and can't wait to explore more of them.

November 11th, 2011

Well, today has been fairly productive. I successfully completed my queue library implementation. I also submitted all my files to the subversion repository. You can view my queue project here → http://lab46.corning-cc.edu/user/dschoeff/portfolio/libqueue. I was having a few problems with my dequeue() function, but I stopped by the Lair today and Matt helped me out. The next things I want to work on are my keywords and experiments. I hope to get these finished up soon.

November 14th, 2011

Today I decided to go back and make corrections to my opus. I had completed my opus on time and everything, but I didn't do a very good job of demonstrating some of the keywords I defined. So I went back and tried to better demonstrate them with code snippets and programs. I also included some command line execution of programs I had written. I think the keywords can be understood a lot better now with the new demonstrations.

November 18th, 2011

So I'm backdating this entry because I forgot to do it on Friday.

Today was pretty cool. In Data Structures we started talking about various sorting algorithms. What we did was basically look on Wikipedia at the different sorting algorithms out there. Each type of sorting algorithm has its pros and cons, but they all basically do the same thing, which is to sort data. When choosing a sorting algorithm, you need to take into account your own specific implementation and what it will be used for.

After learning about these things in class, Matt let us loose and had us write our own simple algorithm. I chose to write a bubble sort algorithm, which is not the most efficient algorithm but is simple to understand. I plan on writing a few more sorting algorithms implementing different methods.

After writing this algorithm for a project, I decided to do some experiments on how efficient the algorithms really were. It's nice to see for yourself how efficient the algorithms are at sorting data rather than just taking someone else's word for it.

Data Topics

Queue Overrun

When referring to queues, an overrun condition occurs when the queue exceeds the space available to store data. In an array-based queue, this would happen when trying to add a value to an array that is already full. Since I haven't dealt much with array-based queue implementations, I will stick mainly to explaining linked list based implementations. With a linked list based queue, it would seem to me that an overrun condition would be much harder to get. When you think about linked lists, they do not have a fixed size. The only size limitation the list has (unless the program designer were to arbitrarily set a size limit) is how much memory is available to the system. Although the chance of overrun is somewhat unlikely, a function that enqueues a node should check for this condition and be flexible enough to handle it. I have done an experiment dealing with system memory management here → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#experiment_3. This experiment explores dynamic memory allocation and how lab46 deals with a lot of memory usage.

The following is a snippet of my enqueue function. If the system running your program ran out of memory, a problem would occur at line 26: malloc() would fail and return NULL, and the following lines would crash when they try to dereference the pointer.

 26       tmp2 = (Node *) malloc (sizeof(Node));
 27       tmp2 -> next = (*start); // point new node's next at (*start)
 28       (*start) -> prev = tmp2; // point (*start)'s prev at tmp2
 29       (*start) = (*start) -> prev; // move (*start) to new node
 30       (*start) -> value = ch; // put value in node
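One way to guard against this is to check malloc()'s return value before touching the new node. Here is a sketch of what that could look like, assuming the same Node struct as above (the return value is my own addition for illustration; the real function would need some agreed-upon way to report failure):

      tmp2 = (Node *) malloc (sizeof(Node));
      if (tmp2 == NULL)            // overrun: no memory left for a new node
      {
         return(-1);               // hypothetical error code for the caller
      }
      tmp2 -> next = (*start);     // safe to use tmp2 from here on
      (*start) -> prev = tmp2;     // point (*start)'s prev at tmp2
      (*start) = (*start) -> prev; // move (*start) to new node
      (*start) -> value = ch;      // put value in node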

Queue Underrun

Queue underrun occurs when you try to dequeue a node from a queue that is already empty. When dealing with linked list based queues, underrun is much more likely to happen than overrun. In your dequeue function, you must check whether the queue is empty before trying to dequeue a node. If you do not, you will likely get a segmentation fault.

char dequeue(Node** start, Node** end, int index)
{
   int i;
   char ch;
   Node* tmp;
   tmp = *start;
   if((*start) != NULL)
   {
        //STUFF
   }
   else
   {
      ch = '\n';
   }
   return(ch);
}

The first conditional statement makes sure the queue has a value before trying to dequeue a node. Instead of causing a segmentation fault, the function returns a newline character ('\n') to indicate there is no node to dequeue. If this check is in place, you should not have a problem with queue underrun.
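For completeness, here is one possible way to fill in the //STUFF section above. This is only a sketch: it assumes nodes are enqueued at (*start) and dequeued from (*end) (matching the enqueue snippet earlier), and it ignores the index parameter for simplicity.

char dequeue(Node** start, Node** end, int index)
{
   char ch;
   Node* tmp;
   if((*start) != NULL)           // underrun check: is the queue empty?
   {
      ch  = (*end) -> value;      // grab the value of the oldest node
      tmp = (*end) -> prev;       // remember the next-oldest node
      free(*end);                 // release the dequeued node
      (*end) = tmp;
      if((*end) != NULL)
         (*end) -> next = NULL;   // detach the freed node from the list
      else
         (*start) = NULL;         // the queue is now empty
   }
   else
   {
      ch = '\n';                  // underrun: nothing to dequeue
   }
   return(ch);
}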

FIFO

The term FIFO stands for “First In First Out”. The term refers to how data is added to and removed from the data structure. A queue is a FIFO data structure because the first node added to a queue will be the first node removed.

The following picture represents how the FIFO queue works.

For more explanation of a queue refer to the queue keyword definition → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#queue
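As a quick demonstration of FIFO ordering, here is a tiny sketch in C (array-based for brevity, unlike my linked-list library). The values come out in exactly the order they went in:

#include <stdio.h>

int main()
{
   char queue[3];
   int head = 0, tail = 0, i;

   queue[tail++] = 'A';   // enqueue 'A' (first in)
   queue[tail++] = 'B';   // enqueue 'B'
   queue[tail++] = 'C';   // enqueue 'C'

   for(i = 0; i < 3; i++)
      printf("dequeued: %c\n", queue[head++]);   // prints A, then B, then C

   return (0);
}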

Computational Complexity

Wikipedia states, “Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other. In this context, a computational problem is understood to be a task that is in principle amenable to being solved by a computer (which basically means that the problem can be stated by a set of mathematical instructions). Informally, a computational problem consists of problem instances and solutions to these problem instances. For example, primality testing is the problem of determining whether a given number is prime or not. The instances of this problem are natural numbers, and the solution to an instance is yes or no based on whether the number is prime or not.”

From what I understand, computational complexity is the classification of a problem according to how difficult it is to solve. This difficulty is based upon the amount of resources needed to solve the problem. For example, the mathematical problem 2 + 2 is a lot less difficult to find a solution for than (13^2/78)*10000, and would require fewer resources to solve.

References: http://en.wikipedia.org/wiki/Computational_complexity_theory

Big O/Theta, Bounds

Big O notation is used to classify algorithms by how they respond to larger and larger input sizes. This notation is used in computer science when referring to the efficiency of an algorithm. It is helpful when trying to analyze how an algorithm handles data in what are called the best case, worst case and average case scenarios.

When describing a function using big O notation, only the dominant term is kept, and it serves as an upper bound on the function's growth. This works because as input sizes become larger, the other terms become less and less influential.

When you see Big O notation it will look something like this: O(n), O(n^2), O(nlogn), O(2^n)

Wikipedia states, “Big O notation is useful when analyzing algorithms for efficiency. For example, the time (or the number of steps) it takes to complete a problem of size n might be found to be T(n) = 4n^2 − 2n + 2. As n grows large, the n^2 term will come to dominate, so that all other terms can be neglected — for instance when n = 500, the term 4n^2 is 1000 times as large as the 2n term. Ignoring the latter would have negligible effect on the expression's value for most purposes. Further, the coefficients become irrelevant if we compare to any other order of expression, such as an expression containing a term n^3 or n^4. Even if T(n) = 1,000,000n^2, if U(n) = n^3, the latter will always exceed the former once n grows larger than 1,000,000 (T(1,000,000) = 1,000,000^3 = U(1,000,000)). Additionally, the number of steps depends on the details of the machine model on which the algorithm runs, but different types of machines typically vary by only a constant factor in the number of steps needed to execute an algorithm.”

Big Theta notation is very similar to Big O notation. The only difference between these two notations is that when you use big theta notation to represent a function, the upper and lower bounds are of the same order.
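For reference, the formal definitions behind these notations can be stated as follows (a standard formulation, written here in the plain ^ notation used elsewhere on this page):

f(n) = O(g(n)) means there exist constants c > 0 and n0 such that f(n) <= c*g(n) for all n >= n0. In other words, g is an upper bound on f's growth.

f(n) = Θ(g(n)) means there exist constants c1, c2 > 0 and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0. In other words, g bounds f's growth from both above and below.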

References: http://en.wikipedia.org/wiki/Big_O_notation

Sorting Algorithms

A sorting algorithm is an algorithm that sorts a given set of data. The most common types of sorting algorithms sort data in numerical or alphabetical order (also known as lexicographical order). Sorting algorithms are very useful because data in a particular order is much easier to search and work with.

Not all sorting algorithms were created equal, though; there are hundreds of commonly used sorting algorithms out there, each with its own particular twist on how it sorts data.

A few common types of sorting algorithms are defined below. The big thing when choosing an algorithm is how it performs on varying degrees of randomness in the data it will sort. Algorithms are typically described using Big O notation for the best case, worst case and average case scenarios. The best case is how the algorithm responds to data that is already in the desired order. For example, the Bubble Sort algorithm, which will be described later, has a best case performance of O(n), meaning it only has to make a single pass of roughly n comparisons before it completes. Its worst case performance (the data is completely out of order) is O(n^2), meaning the number of comparisons is on the order of n^2. The average case performance is also O(n^2).

Different algorithms perform differently on different types of data, so it is best to know what you are trying to do before you choose a sorting algorithm.

Selection Sort

The Selection Sort is a simple sorting algorithm. It works by taking a data structure (let's say an array) and traversing the entire list to find the minimum value (if you are sorting from smallest to largest). Once it has found it, it swaps the element at the first position with the smallest element. At this point the smallest element is in its desired position. It then repeats this process starting from the second element, and so on, until it gets to the last element.

The selection sort has a worst case performance O(n^2), a best case performance O(n^2) and an average case performance O(n^2).

Here is an example from Wikipedia showing the progression of this sort.

64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64

This is a particularly inefficient way to sort data. If you are not concerned about efficiency or do not have a large amount of data to sort, it is a good choice because of its simplicity.
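To make the process concrete, here is a minimal sketch of a selection sort in C (my own illustration, not code from a specific source), run on the example data above:

#include <stdio.h>

/* Selection sort: repeatedly find the minimum of the unsorted
 * portion and swap it into place. */
void selection_sort(int *array, int size)
{
   int i, j, min, tmp;
   for(i = 0; i < size - 1; i++)
   {
      min = i;                      // assume position i holds the minimum
      for(j = i + 1; j < size; j++)
         if(array[j] < array[min])
            min = j;                // found a smaller element
      tmp = array[i];               // swap it into position i
      array[i] = array[min];
      array[min] = tmp;
   }
}

int main()
{
   int data[] = { 64, 25, 12, 22, 11 };  // the example sequence above
   int i;
   selection_sort(data, 5);
   for(i = 0; i < 5; i++)
      printf("%d ", data[i]);             // prints 11 12 22 25 64
   printf("\n");
   return (0);
}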

References: http://en.wikipedia.org/wiki/Selection_sort

Bubble Sort

The Bubble Sort is a sorting algorithm that steps through a list of elements, comparing adjacent elements and swapping them if they are out of order. The logic behind this algorithm is very simple.

The bubble sort has a worst case performance O(n^2), a best case performance O(n) and an average case performance O(n^2).

The following code shows an implementation of a bubble sort that allows the user to enter 10 integers and then sorts them in ascending order.

#include <stdio.h>
#include <stdlib.h>
int main()
{
   int *array;
   array = (int*) malloc(sizeof(int)*10);
   int i = 0, input = 0;
   int swapped = 1, tmp;
   //loads ten values from the user into the array
   for(i = 0; i < 10; i++)
   {   
      printf("enter 10 different integers: ");
      scanf("%d",&input);
      *(array+i) = input;
   }   
   //meat of the bubble sort
   while( swapped == 1)
   {   
      swapped = 0;
      for(i = 1; i < 10; i++)
      {   
         if(*(array+(i-1)) > *(array+i))//this statements checks to see
         {                              //if the elements are out of order
            tmp = *(array+i);
            *(array+i) = *(array+(i-1));
            *(array+(i-1)) = tmp;
            swapped = 1;
         }   
      }   
   }   
   //displays the array
   for(i = 0; i < 10; i++)
   {   
      printf("[%d]: %d\n", i , *(array+i));
   }   
 
   free(array); // release the malloc'd array
   return (0);
}

Execution of this code.

lab46:~/src/data/sorts$ ./bubble
enter 10 different integers: 5
enter 10 different integers: 2
enter 10 different integers: 8
enter 10 different integers: 6
enter 10 different integers: 1
enter 10 different integers: 0
enter 10 different integers: 7
enter 10 different integers: 3
enter 10 different integers: 6
enter 10 different integers: 9
[0]: 0
[1]: 1
[2]: 2
[3]: 3
[4]: 5
[5]: 6
[6]: 6
[7]: 7
[8]: 8
[9]: 9
lab46:~/src/data/sorts$

I have done an experiment of how the bubble sort handles differing sets of data. You should check it out here → http://lab46.corning-cc.edu/opus/fall2011/dschoeff/start#experiment_12

Overall, the bubble sort is inefficient in its average case performance and you would be much better off choosing another algorithm to sort your data.

Insertion Sort

The Insertion Sort is a simple sorting algorithm that is good to use when you do not have a large amount of data to sort.

The Insertion Sort has a worst case performance O(n^2), a best case performance O(n) and an average case performance O(n^2).

The easiest way to understand this sort is to see an example. The following example was found on Wikipedia.

Example: The following table shows the steps for sorting the sequence {3, 7, 4, 9, 5, 2, 6, 1}. The first row is the starting sequence, and each following row shows the list after the next item has been considered and moved left into the sorted portion (or held in place if it was the biggest considered so far).

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 5 7 9 2 6 1

2 3 4 5 7 9 6 1

2 3 4 5 6 7 9 1

1 2 3 4 5 6 7 9

What happens in an insertion sort is that every repetition guarantees that one more element is put in the right place. Looking at the list from left to right, the sort takes the first element and, since it is the first, leaves it where it is. The first element is now sorted. It then looks at the next element and removes it from the list. If it is greater than the first value, it is put back in its place; if it is less than the first element, the first element is moved to the right and the second element is placed in the first position. There are now two sorted elements. This process is repeated until all the items are in their proper places. A minimal C sketch of this process is shown below.
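Here is that sketch (my own illustration, not code from the Wikipedia article), run on the sequence from the example above:

#include <stdio.h>

/* Insertion sort: shift larger sorted elements right until the
 * current element's spot is found, then drop it in. */
void insertion_sort(int *array, int size)
{
   int i, j, current;
   for(i = 1; i < size; i++)
   {
      current = array[i];           // element under consideration
      j = i - 1;
      while(j >= 0 && array[j] > current)
      {
         array[j + 1] = array[j];   // shift the bigger element right
         j--;
      }
      array[j + 1] = current;       // insert into its proper position
   }
}

int main()
{
   int data[] = { 3, 7, 4, 9, 5, 2, 6, 1 };  // sequence from the example
   int i;
   insertion_sort(data, 8);
   for(i = 0; i < 8; i++)
      printf("%d ", data[i]);                 // prints 1 2 3 4 5 6 7 9
   printf("\n");
   return (0);
}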

References: http://en.wikipedia.org/wiki/Insertion_sort

Quick Sort

The Quick Sort algorithm is more advanced than the previous algorithms defined above. Wikipedia does a good job of explaining how the algorithm functions.

“Quicksort is a divide and conquer algorithm. Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists.

The steps are:

  1. Pick an element, called a pivot, from the list.
  2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
  3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

The base case of the recursion are lists of size zero or one, which never need to be sorted.”(Wikipedia)

The Quick Sort has a worst case performance O(n^2), a best case performance O(nlogn) and an average case performance O(nlogn).
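Here is a minimal sketch of a quicksort in C. This is only one possible implementation (it picks the last element as the pivot, a scheme known as Lomuto partitioning); other pivot choices work too:

#include <stdio.h>

/* Quicksort: partition around a pivot, then recursively sort the
 * sub-list of lesser elements and the sub-list of greater elements. */
void quick_sort(int *array, int low, int high)
{
   int i, j, pivot, tmp;
   if(low >= high)
      return;                          // base case: 0 or 1 elements
   pivot = array[high];                // step 1: pick a pivot
   i = low;
   for(j = low; j < high; j++)         // step 2: the partition operation
   {
      if(array[j] < pivot)
      {
         tmp = array[i]; array[i] = array[j]; array[j] = tmp;
         i++;
      }
   }
   tmp = array[i]; array[i] = array[high]; array[high] = tmp;
   quick_sort(array, low, i - 1);      // step 3: recurse on each side
   quick_sort(array, i + 1, high);
}

int main()
{
   int data[] = { 5, 2, 8, 6, 1, 0, 7, 3, 6, 9 };
   int i;
   quick_sort(data, 0, 9);
   for(i = 0; i < 10; i++)
      printf("%d ", data[i]);           // prints 0 1 2 3 5 6 6 7 8 9
   printf("\n");
   return (0);
}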

Merge Sort

The Merge Sort algorithm is what you would call a reliable one. The merge sort has a worst case performance O(nlogn), a best case performance O(nlogn) and an average case performance O(nlogn). This algorithm is known as a divide and conquer algorithm because it divides the list of elements down into single elements.

The process goes as follows: “

  1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
  2. Divide the unsorted list into two sublists of about half the size.
  3. Sort each sublist recursively by re-applying the merge sort.
  4. Merge the two sublists back into one sorted list.”(Wikipedia).

The following picture shows a graphical representation of how the algorithm functions.

Overall, the merge sort is very reliable meaning that its worst case performance is only O(nlogn) which is much better than some of the algorithms listed above.
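Here is a minimal sketch of a recursive merge sort in C, my own illustration of the steps quoted above:

#include <stdio.h>
#include <stdlib.h>

/* Merge sort: split the list in half, sort each half recursively,
 * then merge the two sorted halves back together. */
void merge_sort(int *array, int size)
{
   int half = size / 2, i, j, k;
   int *tmp;
   if(size < 2)
      return;                              // length 0 or 1: already sorted
   merge_sort(array, half);                // sort the left half
   merge_sort(array + half, size - half);  // sort the right half
   tmp = (int*) malloc(sizeof(int) * size);
   i = 0; j = half; k = 0;
   while(i < half && j < size)             // merge the two sorted halves
      tmp[k++] = (array[i] <= array[j]) ? array[i++] : array[j++];
   while(i < half) tmp[k++] = array[i++];  // copy any leftovers
   while(j < size) tmp[k++] = array[j++];
   for(k = 0; k < size; k++)
      array[k] = tmp[k];
   free(tmp);
}

int main()
{
   int data[] = { 5, 2, 8, 6, 1, 0, 7, 3, 6, 9 };
   int i;
   merge_sort(data, 10);
   for(i = 0; i < 10; i++)
      printf("%d ", data[i]);               // prints 0 1 2 3 5 6 6 7 8 9
   printf("\n");
   return (0);
}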

References: http://en.wikipedia.org/wiki/Merge_sort

Binary Search

A Binary Search is an algorithm used to search for a value in a sorted list. It starts by comparing the value being searched for with the middle element of the array. Since the list is sorted, it then knows whether to search the left or right half. It repeats the process until it finds the value or the remaining portion of the array is reduced to zero.

given the following array

array[] = |1|13|22|28|30|45|58|98|100|

Lets say the value being searched for is 13.

It starts by locating the middle element, in this case 30. Since 13 is less than 30, 13 must be in the first half of the array (if it is there at all).

Therefore, the remaining list to be searched is |1|13|22|28|

It then looks at the middle element of that section of the array, 22. Since 13 is less than 22, the value has to be either the first or second element of that subsection. It will then find 13 as the second element of that subsection.

When this logic is put into an algorithm, it is natural to use recursion. Pseudocode for an implementation follows.

binary_search(A[0..N-1], value, low, high):
       if (high < low):
           return -1 // not found
       mid = (low + high) / 2
       if (A[mid] > value):
           return binary_search(A, value, low, mid-1)
       else if (A[mid] < value):
           return binary_search(A, value, mid+1, high)
       else:
           return mid // found
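And here is that pseudocode turned into a small runnable C program, using the example array from above:

#include <stdio.h>

/* Recursive binary search: returns the index of value in A,
 * or -1 if it is not present. */
int binary_search(int A[], int value, int low, int high)
{
   int mid;
   if(high < low)
      return -1;                                       // not found
   mid = (low + high) / 2;
   if(A[mid] > value)
      return binary_search(A, value, low, mid - 1);    // search left half
   else if(A[mid] < value)
      return binary_search(A, value, mid + 1, high);   // search right half
   else
      return mid;                                      // found
}

int main()
{
   int array[] = { 1, 13, 22, 28, 30, 45, 58, 98, 100 }; // the array above
   printf("13 found at index %d\n", binary_search(array, 13, 0, 8));
   return (0);
}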
 

Data Objective

Write a program that uses queues

This course objective is to successfully write a program implementing the use of queues.

Method

To do this successfully, I will have to understand the basic concepts of a queue. This requires a knowledge of pointers, structs, struct pointers, memory allocation, memory deallocation and an understanding of linked lists. Success will be measured by the completion of a program that implements these elements efficiently.

Measurement

The keywords that I have entered at the top of this page show that I have successfully researched the concepts required to make a working queue. Besides having a knowledge of the concepts, the only other measure of success would be to have a program implementing these concepts. You can find such a program here → http://lab46.corning-cc.edu/user/dschoeff/portfolio/libqueue

Analysis

Upon completion of this exercise, I believe I have successfully completed the course objective of creating a program that implements a queue.

  1. I believe I could improve my functions so that they are more flexible. For example, I currently only allow each node's value to be a character. I could change a few things around and allow each node to hold more information.
  2. Since the measurement process is based upon having a knowledge of the concepts required to build a queue and then implementing them in a program, I don't see too many ways to improve the current measurement system. You either have a working program or you don't.

Experiments

Experiment 1

Question

Today I learned about various sorting algorithms. Many of them have their advantages and disadvantages. One of the things we saw when exploring the different methods is what is known as their computational complexity. The computational complexity is basically how the algorithm handles data that is:

  1. completely in order
  2. completely out of order
  3. somewhere in the middle.

What I want to know is how long it will take the bubble sort algorithm to sort data in each of the 3 scenarios listed above.

Resources

My knowledge of the bubble sort algorithm has been obtained from Matthew Haas and this webpage → http://en.wikipedia.org/wiki/Bubble_sort

I also have done a project implementing a simple bubble sort algorithm. I think the combined knowledge of the above things will help me in this experiment.

Hypothesis

I believe that the array with values that are completely out of order will take the longest. After that, the array loaded with random values will take the next longest to sort. Finally, the array with values that are already in order will take the least amount of time.

Experiment

I will test my hypothesis by keeping track of the run time of 3 different programs. Each of these programs will sort an array of integers. The first array will have its elements completely out of order. The next array I will load with completely random values. The final array I will load with values that are already in the right order. I will then compare the run times of each of these programs and ponder the results.
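To give a picture of the setup, here is a sketch of what the worst-case test program might look like (the actual bubbleTest source may differ in details; the random and already-in-order cases would only change the loading loop):

#include <stdio.h>
#include <stdlib.h>

#define SIZE 100000

/* Load SIZE descending values (the completely-out-of-order case),
 * bubble sort them, and print the result for redirection to a file. */
int main()
{
   int *array = (int*) malloc(sizeof(int) * SIZE);
   int i, tmp, swapped = 1;

   for(i = 0; i < SIZE; i++)
      array[i] = 1000000 - i;          // descending from 1000000

   while(swapped == 1)                 // bubble sort, as in the project code
   {
      swapped = 0;
      for(i = 1; i < SIZE; i++)
      {
         if(array[i-1] > array[i])
         {
            tmp = array[i];
            array[i] = array[i-1];
            array[i-1] = tmp;
            swapped = 1;
         }
      }
   }

   for(i = 0; i < SIZE; i++)
      printf("[%d]: %d\n", i, array[i]);

   free(array);
   return (0);
}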

Data

Execution of the code that I wrote that used a bubble sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    1m44.096s
user    1m43.898s
sys     0m0.004s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 1 minute and 44 seconds to complete this task.

test.txt

[0]: 900001
[1]: 900002
.
.
.
.
.
[99992]: 999993
[99993]: 999994
[99994]: 999995
[99995]: 999996
[99996]: 999997
[99997]: 999998
[99998]: 999999
[99999]: 1000000

Now I will test to see if there is any difference in an array that is completely out of order to an array that may only be partially out of order. I will load an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithm and observe the results.

lab46:~/src/data/sorts$ time ./bubbleTest > random.txt

real    1m49.129s
user    1m48.951s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about the same time to run as the list that was completely out of order.

Finally we will test how long it will take to sort through a list that is already in the right order using the bubble sort algorithm.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    0m0.300s
user    0m0.072s
sys     0m0.012s
lab46:~/src/data/sorts$ 

Notice that the program completed in a fraction of a second. This was to be expected because the algorithm didn't really have to do anything.

Analysis

Based on the data collected:

  • was your hypothesis correct? Not entirely. I did state that the array that was already in order would take the least amount of time to sort but I did not notice any significant difference between the array completely out of order and the one that was only partially out of order.
  • was your hypothesis not applicable? I think it was applicable just not entirely correct.
  • is there more going on than you originally thought? I don't believe so. I think I have a fairly good understanding of how the sorting algorithm works.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the specific sorting algorithm inside the program. This may affect the results slightly, but I don't believe it is a big problem.

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. The bubble sort is a very efficient sorting algorithm when the list of elements is already in order.
  2. In the average case (elements partially out of order), the bubble sort is not much faster than it is when the list is completely out of order.

Experiment 2

Question

Over the past couple weeks I have been learning about various sorting algorithms. Many of them have their advantages and disadvantages. One of the things I have seen when exploring the different methods of sorting data is known as their computational complexity. The computational complexity is basically how the algorithm handles data that is:

  1. completely in order
  2. completely out of order
  3. somewhere in the middle.

What I want to know is how long it will take the selection sort algorithm to sort data in each of the 3 scenarios listed above.

Resources

My knowledge of the selection sort algorithm has been obtained from C++ Programming by D.S. Malik and this webpage → http://en.wikipedia.org/wiki/Selection_sort

I also have done a project implementing a simple selection sort algorithm. I think the combined knowledge of the above things will help me in this experiment.

Hypothesis

I believe that all the arrays will take about the same time to sort. I believe this is true because the algorithm has no way to check whether the array is already sorted. Therefore it must perform the same number of operations no matter how the data is ordered.

Experiment

I will test my hypothesis by keeping track of the run time of 3 different programs. Each of these programs will sort an array of integers. The first array will have its elements completely out of order. The next array I will load with completely random values. The final array I will load with values that are already in the right order. I will then compare the run times of each of these programs and ponder the results.

Data

Execution of the code that I wrote that used a selection sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m33.734s
user    0m33.494s
sys     0m0.016s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 34 seconds to complete this task.

test.txt

[0]: 900001
[1]: 900002
.
.
.
.
.
[99992]: 999993
[99993]: 999994
[99994]: 999995
[99995]: 999996
[99996]: 999997
[99997]: 999998
[99998]: 999999
[99999]: 1000000

Now I will test to see if there is any difference in an array that is completely out of order to an array that may only be partially out of order. I will load an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithm and observe the results.

lab46:~/src/data/sorts$ time ./selectionTest > random.txt

real    0m32.299s
user    0m32.082s
sys     0m0.004s
lab46:~/src/data/sorts$

Notice that it took about the same time to run as the list that was completely out of order.

Finally we will test how long it will take to sort through a list that is already in the right order using the selection sort algorithm.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m32.272s
user    0m32.082s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that the program ran in about the same time as the others. This was to be expected given how the algorithm functions.

Analysis

Based on the data collected:

  • was your hypothesis correct? Yes. I stated that the algorithm would handle the 3 different data sets about the same, and it did.
  • was your hypothesis not applicable? I think it was applicable.
  • is there more going on than you originally thought? I don't believe so. I think I have a fairly good understanding of how the sorting algorithm works.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the specific sorting algorithm inside the program. This may affect the results slightly, but I don't believe it is a big problem.

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. With the selection sort algorithm you know what you are going to get. No matter whether the data is in order or out of order, it will take the same amount of time to run. This is not good, because that is O(n^2) in every case.
  2. Overall, this is not a good sorting algorithm. You will be much better off using a different one.

Experiment 3

Question

I have been studying various different sorting algorithms lately and have wondered how two different ones would match up in a head to head race.

My question is this. Which sorting algorithm sorts data faster? The selection sort or the bubble sort?

Resources

My knowledge of these sorting algorithms has been obtained from the projects I have done on them. You can find them here → http://lab46.corning-cc.edu/user/dschoeff/portfolio

Hypothesis

I believe that the bubble sort will perform better than the selection sort on data that is already in order, whereas the selection sort will perform better on data that is not in order.

Experiment

I will test my hypothesis by running both sorting algorithms on 3 different sets of data. The first set of integers will be completely out of order. The next set will be randomly filled with integers. The final set will be filled with all the integers in the correct order.

Data

Execution of the code that I wrote that used a bubble sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    1m44.096s
user    1m43.898s
sys     0m0.004s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 1 minute and 44 seconds to complete this task.

Execution of the code that I wrote that used a selection sort algorithm to sort a list of numbers that were completely out of order. I loaded a 100000 element array with descending numbers starting from 1000000.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m33.734s
user    0m33.494s
sys     0m0.016s
lab46:~/src/data/sorts$ 

I then ran that code and outputted the content of the sorted array to a file. Notice that it took about 34 seconds to complete this task. Round 1 goes to the selection sort.

Now I will test the two different sorts on an array of 100000 elements with random numbers from 0 to 1000000. I will then run that array through the sorting algorithms and observe the results. First the bubble sort.

lab46:~/src/data/sorts$ time ./bubbleTest > random.txt

real    1m49.129s
user    1m48.951s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about 1 minute 50 seconds to complete.

Next, lets run the selection sort on the same data.

lab46:~/src/data/sorts$ time ./selectionTest > random.txt

real    0m32.299s
user    0m32.082s
sys     0m0.004s
lab46:~/src/data/sorts$

Notice that it took about 33 seconds to run. Round 2 goes to the selection sort.

Finally we will test how long it will take to sort through a list that is already in the right order using the bubble sort algorithm.

lab46:~/src/data/sorts$ time ./bubbleTest > test.txt

real    0m0.300s
user    0m0.072s
sys     0m0.012s
lab46:~/src/data/sorts$ 

Notice that the program completed in a fraction of a second. This was to be expected because the algorithm didn't really have to do anything.

Next lets try the selection sort.

lab46:~/src/data/sorts$ time ./selectionTest > test.txt

real    0m32.272s
user    0m32.082s
sys     0m0.012s
lab46:~/src/data/sorts$

Notice that it took about 33 seconds to run. Round 3 goes to the bubble sort.

Analysis

Based on the data collected:

  • was your hypothesis correct? Yes. The bubble sort performed best on the data set that was already in order and the selection sort performed better on the other two.
  • was your hypothesis not applicable? I think it was applicable.
  • is there more going on than you originally thought? I don't believe so.
  • what shortcomings might there be in your experiment? There are none that I am aware of.
  • what shortcomings might there be in your data? I am using lab46's built-in time command to record how long each program takes to run. I don't actually time only the specific sorting algorithm inside the program. This may affect the results slightly, but I don't believe it is a big problem.

Conclusions

After completing this experiment, I believe I can state some observations I made.

  1. The bubble sort is better to use than the selection sort when the list is already in order.
  2. The selection sort is better to use when the data is not in order.
  3. Since your data is almost never already in the right order, the selection sort seems like it would be the better one to use overall.