haas:spring2015:common:intro-to-gdb

Differences

This shows you the differences between two versions of the page.

haas:spring2015:common:intro-to-gdb [2015/04/04 14:28] – [Segfault mitigation] wedgehaas:spring2015:common:intro-to-gdb [2015/04/04 15:00] (current) – [Viewing program data] wedge
Line 51: Line 51:
 Let's take a program that will segfault on execution: Let's take a program that will segfault on execution:
  
-<code c>+<code c 1>
 #include <stdio.h> #include <stdio.h>
  
Line 106: Line 106:
 (gdb) run (gdb) run
 </cli> </cli>
 +
 +It will appear to pause; that is because it is expecting input. Type in something (such as hello) and hit ENTER to let it proceed. You should then see something resembling the following:
 +
 +<cli>
 +(gdb) run
 +Starting program: /home/user/input 
 +hello
 +just read: 'h' (104)
 +just read: 'e' (101)
 +just read: 'l' (108)
 +just read: 'l' (108)
 +just read: 'o' (111)
 +hello
 +
 +Program received signal SIGSEGV, Segmentation fault.
 +0x0000000000400682 in main () at input.c:23
 +23              stuff -> val = hi;
 +(gdb) 
 +</cli>
 +
 +Aha! A segfault! And look at what the debugger just told us... the EXACT line that, when processed, results in a segfault:
 +
 +<cli>
 +0x0000000000400682 in main () at input.c:23
 +23              stuff -> val = hi;
 +</cli>
 +
 +In fact, there are 3 important pieces of information that are immediately useful to us:
 +
 +  - This problem occurred within the **main()** function (narrowing our search)
 +  - The problem manifested itself specifically on line 23 of input.c, within the main() function
 +  - The problem is triggered by this piece of code: **stuff -> val = hi;**
 +
 +Now, we also know that the code compiled cleanly-- no warnings or errors. So there are no syntax errors.
 +
 +So what could the problem be?
 +
 +For that, more debugging steps are in order.
 +
 +First, if there were multiple function calls at work, it might help to know the order in which they took place. How did we get here? The problem may not be on this line at all, but in something that came before it. It is a good idea to perform a function call backtrace, which shows the chain of calls from where we are now back to where we started (we always start at **main()**). If there do not appear to be any problems here, we can then come up with strategies for testing the functions that ran earlier.
 +
 +To do a backtrace, simply type **bt** at the "**(gdb)**" prompt. In our case here, there's only one function, so it'll only report 1 thing (main):
 +
 +<cli>
 +(gdb) bt
 +#0  0x0000000000400682 in main () at input.c:23
 +(gdb) 
 +</cli>
 +
 +=====Setting a breakpoint=====
 +We now know our problem is on line 23 of main(). If just knowing that didn't lead to identifying and fixing the problem, we'll have to dig a little deeper. (You did go back and take a look, right? The debugger assists you in solving problems; it does not solve problems for you.)
 +
 +The next approach we should take is setting a breakpoint. A breakpoint is essentially a cue given to the debugger to STOP execution once it reaches a given line. It is important to realize that the line in question **HAS NOT** yet been run, but is **ABOUT** to be run.
 +
 +====set a breakpoint====
 +We know the problem is on line 23, so let us set a breakpoint there:
 +
 +<cli>
 +(gdb) break 23
 +Breakpoint 1 at 0x40067a: file input.c, line 23.
 +(gdb) 
 +</cli>
 +
 +====re-run the program====
 +Now, let us start execution once again:
 +
 +<cli>
 +(gdb) run
 +The program being debugged has been started already.
 +Start it from the beginning? (y or n) y
 +Starting program: /home/username/input 
 +hello
 +just read: 'h' (104)
 +just read: 'e' (101)
 +just read: 'l' (108)
 +just read: 'l' (108)
 +just read: 'o' (111)
 +hello
 +
 +Breakpoint 1, main () at input.c:23
 +23              stuff -> val = hi;
 +(gdb) 
 +</cli>
 +
 +Okay, we've done it: we re-ran the program, and this time stopped just short of where the segfault seems to be taking place.
 +
 +=====Viewing program data=====
 +Now it is time to take a look at what is actually going on. We THINK we know what is going on, but clearly what we think and what is actually happening are two different things (we think there shouldn't be a segfault, yet there is).
 +
 +So, looking at our suspect line:
 +
 +<code c>
 +23              stuff -> val = hi;
 +</code>
 +
 +Let us see what the states of these variables are.
 +
 +====printing values during debug====
 +To check the current state of a variable, we can use the **print** or **display** command to gdb.
 +
 +**print** will do a one time display of the state of a variable.
 +
 +**display** will set a display point, printing that variable's state each time the program stops, such as after every step or breakpoint hit (very useful for watching a loop play out).
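
To make the difference concrete, here is a minimal sketch of a loop worth watching. The function name highest_char and its body are assumptions made for illustration; they are not the code from input.c. With a breakpoint set on the assignment inside the loop, **print hi** would show the value once, while **display hi** would show it again every time execution stops.

```c
#include <stdio.h>

/* Hypothetical helper (not from input.c): returns the highest-valued
 * character in text, or 0 if text is empty. */
int highest_char(const char *text)
{
    int hi = 0;
    int i;

    for (i = 0; text[i] != '\0'; i++) {
        int c = (unsigned char) text[i];

        printf("just read: '%c' (%d)\n", c, c);
        if (c > hi)
            hi = c;    /* a convenient line for a breakpoint */
    }

    return hi;
}
```

Calling highest_char("hello") from a small main() and compiling with -g would give the debugger something to stop on inside the loop.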
 +
 +For now, let us take a look at both the **hi** and **stuff -> val** variables:
 +
 +<cli>
 +(gdb) print hi
 +$1 = 111 'o'
 +(gdb) print stuff -> val
 +$2 = -1991643855
 +</cli>
 +
 +That value of **hi** should make sense: it should be set to the highest character value encountered in the user's input. If you typed in "hello", the 'o' would have the highest numerical value, based on its placement in the ASCII table.
 +
 +The value of **stuff -> val** looks like random garbage. But we know that **stuff** is a pointer, and we didn't initialize it, so it holds whatever value happened to be left in that memory location, and **stuff -> val** shows whatever data sits at the address it happens to point to.
 +
 +Nothing seemingly out of place... let's check out the **stuff** variable itself:
 +
 +<cli>
 +(gdb) print stuff
 +$3 = (struct thing *) 0x4004d0 <_start>
 +</cli>
 +
 +Even that seems okay... it is a pointer, it should have an address.
 +
 +Okay, so everything seems in order... let's try executing this line (and just this one line) and see what happens.
 +
 +=====Single-Stepping=====
 +The debugger allows us to 'single step' through code, executing one line of source at a time. This is quite valuable as we can watch the state of our variables change, to better inform us as to what is going on.
 +
 +There are 2 stepping commands:
 +
 +  * **s**tep: execute the next line of code, descending into any function it calls
 +  * **n**ext: execute the next line of code, but do not descend into any called functions
 +
 +The **step** command lets us follow the thread of program execution, wherever it may lead. This can have its uses, but we have to be careful: we can only go where there is debugging support. While we compiled our program with debugging support, we linked against a non-debug C library, so we do **NOT** want to step into any of its functions (**fgetc()**, **fprintf()**).
 +
 +When faced with a call to a function without debug symbols, or when we simply do not wish to follow the thread of execution into that function, we can instead opt to step over it as if it were just a simple instruction. This is where the **next** command comes in handy.
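
As a rough illustration of the two commands, consider the sketch below. The functions square and sum_of_squares are hypothetical and exist only for this example. Stopped on the line that calls square(), **step** would move into square() (it is our own code, built with -g), while **next** would run the call and stop on the following line. On the fprintf() line, **next** is the safer choice, since the C library was not built with debugging support.

```c
#include <stdio.h>

/* Hypothetical helper: our own code, so step can descend into it. */
int square(int n)
{
    return n * n;
}

/* Hypothetical function: adds up the squares of 1 through limit. */
int sum_of_squares(int limit)
{
    int total = 0;
    int i;

    for (i = 1; i <= limit; i++) {
        total += square(i);                    /* step enters square() here */
        fprintf(stderr, "total: %d\n", total); /* library call: use next    */
    }

    return total;
}
```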
 +
 +Let us execute that variable assignment by issuing a **next** command (**n** for short); the line calls no functions, so **step** would behave the same way here:
 +
 +<cli>
 +(gdb) n
 +
 +Program received signal SIGSEGV, Segmentation fault.
 +0x0000000000400682 in main () at input.c:23
 +23              stuff -> val = hi;
 +(gdb) 
 +</cli>
 +
 +Everything seemed fine, but then when we tried to run it, bam- segfault.
 +
 +So something is clearly awry here.
 +
 +Knowing what those two variables are, **hi** likely isn't the problem, it is just a regular scalar variable.
 +
 +But **stuff** is a pointer. We know that when using pointers, we open the door to these kinds of problems.
 +
 +So what might the problem be?
 +
 +=====Solution=====
 +This solution requires knowledge of the program itself-- its purpose, and the code contained therein. So clearly, if you aren't familiar with the code, not even the debugger can help you get to some solutions.
 +
 +In this case, the problem was that while we declared **stuff** as a pointer to a thing struct, we neglected to **allocate** memory, or point it at an existing instance of a thing struct.
 +
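 +Adding this line before the assignment on line 23 would clear up the problem (**malloc()** is declared in **stdlib.h**, so that header needs to be included as well):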
 +
 +<code c>
 +stuff = (struct thing *) malloc (sizeof (struct thing));
 +</code>
 +
 +What also could have helped identify this problem would have been to initialize **stuff** to NULL (one should ALWAYS set variables to sane initial values). Printing **stuff** in the debugger would then have shown it to be NULL, making it obvious that there was NO **val** element to access (the attempt would still have caused a segfault, but the cause would have been plain to see).
 +
 +As it was, **stuff** WAS pointing somewhere, but an invalid location... so trying to modify the data there resulted in the operating system yelling at us.
 +
 +Seeing the NULL would have better clued us in that we had forgotten to **malloc()** the space, and we could have come to that solution more easily. As it was, we had to do a little bit of detective work to eventually figure out that it was the lack of memory allocation (and the uninitialized pointer pointing at an invalid location) that created our problem.
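
Putting both pieces of advice together might look like the sketch below. It rests on assumptions: the definition of struct thing is not shown here, so a single int member named val is assumed (it printed as an integer in the debugger), and make_thing is a hypothetical helper name, not something from input.c.

```c
#include <stdlib.h>

/* Assumed layout: the real definition is not shown in this section. */
struct thing {
    int val;
};

/* Hypothetical helper: allocates a thing and stores val in it.
 * Returns NULL if the allocation fails. */
struct thing *make_thing(int val)
{
    struct thing *stuff = NULL;    /* sane initial value */

    stuff = (struct thing *) malloc (sizeof (struct thing));
    if (stuff == NULL)
        return NULL;               /* nothing to assign into */

    stuff -> val = val;            /* safe: stuff points at real memory */
    return stuff;
}
```

The caller would check the result for NULL before using it and free() it when done.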
haas/spring2015/common/intro-to-gdb.1428157714.txt.gz · Last modified: 2015/04/04 14:28 by wedge