======Introduction to GDB======

=====Overview=====
Google defines a debugger as:

  * A computer program that assists in the detection and correction of errors in computer programs.

There is an important concept we must acknowledge here: a debugger is a program, just like the programs we've been writing. But it is a program that works with other programs. It is a program for programs. See the depth there?

GDB, the GNU Debugger, is certainly not the only debugger in existence, but it is likely among the most widely available general-purpose debuggers we can get our hands on. Learning one debugger well, like GDB, will introduce you to many important concepts that all debuggers share (some may enable certain actions better than others, or be more specialized for certain tasks, but as debuggers they all share a common underlying functionality).

The prime value we are looking to get out of our debugger is help in finding (and therefore fixing) runtime and logical errors in the programs we write. It is also a phenomenal investigative tool, allowing us to see what a particular piece of code is ACTUALLY doing, as far down as the assembly instructions, though in our case we will follow the line-by-line execution of our C programs.

=====Debugger Features=====
Common fundamental features a debugger may provide:

  * **stepping**: the ability to comb over your running program a line at a time. Often referred to as "single stepping": we execute one line of code (much as an interpreter might) and then analyze the state of the variables. This is great for seeing if things are as they should be at various points in a program's execution (on entry to a function, at a particular iteration of a loop, etc.)
  * **viewing data**: as indicated under stepping, in the debugger we can view the state of any existing variable or memory location at any stage of the program's execution.
  * **breakpoints**: the magic that makes single stepping even more useful. A breakpoint is how we tell the debugger where we'd like it to pause execution so that we may check out the current state of things. By strategically setting breakpoints throughout a program, we can run through the sections of code we're not interested in, then break to analyze areas of particular interest.
  * **continue**: leaving single-step mode when we wish to resume regular execution (until the next breakpoint is encountered).
  * **manipulating data**: we can also modify existing data, such as variables, setting them to desired values and watching what happens in our code.

There are many, many more features, but we're just getting started, and these are by far the most useful for us in our endeavors.

=====Compile-time Support=====
To take full advantage of the debugging environment, we must instruct the compiler at compile time to include debug-specific information. We can do this by adding **-g** to the compiler's argument list. It can go anywhere on the command line, provided its placement doesn't break up the other arguments (for example, it cannot sit between **-o** and the output file name).

  $ gcc -g -o hello hello.c

=====Using a debugger during program execution=====
A debugger acts as a sort of wrapper when running a program. It runs the desired program within it, so that we can use the debugger's features to better study what is going on.

As such, we need to start a debugging session as follows:

  $ gdb ./hello

Note that we name the program just as we would to run it normally, but we prefix it with "gdb" (so gdb knows which program to load).

=====Segfault mitigation=====
If your program is segfaulting, a quick and easy way to use the debugger is to let the program run as normal inside it; when the program crashes, the debugger will provide you with information on where the problem took place.
In some cases, this can be enough to identify and fix the problem (more involved use of the debugger requires you to know ever more specifically where the problem is taking place, so starting off with strategies like this will only help you in further debugging efforts).

Let's take a program that will segfault on execution:

  #include <stdio.h>
  
  struct thing {
      int val;
      struct thing *other;
  };
  
  int main()
  {
      struct thing *stuff;
      char c, *s, hi = 0, len = 0;
  
      while ((c = fgetc(stdin)) != '\n')
      {
          *(s+len) = c;
          fprintf(stdout, "just read: '%c' (%hhd)\n", *(s+len), *(s+len));
          len = len + 1;
          if (c > hi)
              hi = c;
      }
  
      fprintf(stdout, "%s\n", s);
      stuff -> val = hi;
      fprintf(stdout, "Highest value encountered was: '%c' (%hhd)\n", stuff -> val, stuff -> val);
  
      return(0);
  }

Type this in and name it (I'll use **input.c** as my example name). Compile it with debugging support:

  $ gcc -g -o input input.c
  $

The debugger is only useful with code free of syntax errors (because it requires the code to compile successfully before it can work). If your code does not compile, you cannot use the debugger to help fix the problem.

Now, let us start gdb with **input** as our debug target:

  $ gdb ./input

A smallish banner message will appear, and at the very bottom will be a "**(gdb)**" prompt. This is where you'll be entering various commands.

For starters, let us just run the program and see what happens. We do this by issuing the "**run**" command at the gdb prompt:

  (gdb) run

It'll appear to pause; that's because it is expecting input. Type in something (hello) and hit ENTER to allow it to proceed. You should then see something resembling the following:

  (gdb) run
  Starting program: /home/user/input
  hello
  just read: 'h' (104)
  just read: 'e' (101)
  just read: 'l' (108)
  just read: 'l' (108)
  just read: 'o' (111)
  hello
  
  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000400682 in main () at input.c:23
  23          stuff -> val = hi;
  (gdb)

Aha! A segfault! And look at what the debugger just told us...
the EXACT line that, when executed, results in a segfault:

  0x0000000000400682 in main () at input.c:23
  23          stuff -> val = hi;

In fact, there are 3 pieces of information here that are immediately useful to us:

  - The problem occurred within the **main()** function (narrowing our search)
  - The problem manifested itself specifically on line 23 of input.c
  - The offending piece of code is: **stuff -> val = hi;**

Now, we also know that the code compiled cleanly, with no warnings or errors, so there are no syntax errors. So what could the problem be?

For that, more debugging steps are in order. First, if there were multiple function calls at work, it might help to know the order in which they took place (how did we get here? The problem may not be here, but in something that came before). It is a good idea to perform a function call backtrace, which shows the chain of calls from where we are now back to where we started (we always start at **main()**). If there do not appear to be any problems here, we can then come up with strategies for testing the functions that ran earlier.

To do a backtrace, simply type **bt** at the "**(gdb)**" prompt. In our case there's only one function, so it'll only report 1 thing (main):

  (gdb) bt
  #0  0x0000000000400682 in main () at input.c:23
  (gdb)

=====Setting a breakpoint=====
Now that we know our problem is on line 23 of main(), and if just knowing that didn't lead to identifying and fixing the problem (you did go back and take a look, right? The debugger assists you in solving problems, it does not solve problems for you), we'll have to dig a little deeper.

The next approach we should take is setting a breakpoint. A breakpoint is essentially a cue given to the debugger to STOP execution once it reaches a given line. It is important to realize that the line in question **HAS NOT** yet been run, but is **ABOUT** to be run.
====set a breakpoint====
We know the problem is on line 23, so let us set a breakpoint there:

  (gdb) break 23
  Breakpoint 1 at 0x40067a: file input.c, line 23.
  (gdb)

====re-run the program====
Now, let us start execution once again:

  (gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y
  Starting program: /home/username/input
  hello
  just read: 'h' (104)
  just read: 'e' (101)
  just read: 'l' (108)
  just read: 'l' (108)
  just read: 'o' (111)
  hello
  
  Breakpoint 1, main () at input.c:23
  23          stuff -> val = hi;
  (gdb)

Okay, we've done it: we re-ran the program, and this time stopped just short of where the segfault seems to be taking place.

=====Viewing program data=====
Now it is time to take a look at what is actually going on. We THINK we know what is going on, but clearly what we think and what is actually happening are two different things (we think there shouldn't be a segfault, yet there is).

So, looking at our suspect line:

  23          stuff -> val = hi;

Let us see what the states of these variables are.

====printing values during debug====
To check the current state of a variable, we can use gdb's **print** or **display** command.

**print** does a one-time display of the state of a variable.

**display** sets a display point, printing that variable's state again every time the program stops, for example after each step (very useful for watching a loop play out).

For now, let us take a look at both the **hi** and **stuff -> val** variables:

  (gdb) print hi
  $1 = 111 'o'
  (gdb) print stuff -> val
  $2 = -1991643855

That value of **hi** should make sense: it should be set to the highest character value encountered in the user's input. If you typed in "hello", the 'o' has the highest numerical value, based on its placement in the ASCII table.

The member reached through **stuff** prints a seemingly random value.
But we know that **stuff** is a pointer, and we didn't initialize it, so we're seeing whatever garbage happened to be at the memory location it points to.

Nothing seemingly out of place... let's check out the **stuff** variable itself:

  (gdb) print stuff
  $3 = (struct thing *) 0x4004d0 <_start>

Even that seems okay... it is a pointer, so it should hold an address.

Okay, so everything seems in order. Let's try executing this line (and just this one line) and see what happens.

=====Single-Stepping=====
The debugger allows us to 'single step' through code, executing one line of source at a time. This is quite valuable, as we can watch the state of our variables change to better inform us as to what is going on.

There are 2 stepping commands:

  * **s**tep: execute the next line of source, descending into any function it calls
  * **n**ext: execute the next line of source, but do not descend into any called functions

The **step** command lets us follow the thread of program execution wherever it may lead. This can have its uses, but we have to be careful: we can only go where there is debugging support. While we compiled our program with debugging support, we linked against a non-debug C library, so we do **NOT** want to step into any of its functions (**fgetc()**, **fprintf()**).

When faced with a call to a function without debug symbols, or when we simply do not wish to follow the thread of execution into a function, we can instead step over the call as if it were a single simple instruction. This is where the **next** command comes in handy.

Let us execute that variable assignment by issuing a **next** command (abbreviated **n**). The line contains no function call, so **step** would behave the same way here:

  (gdb) n
  
  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000400682 in main () at input.c:23
  23          stuff -> val = hi;
  (gdb)

Everything seemed fine, but when we tried to run the line, bam: segfault.

So something is clearly awry here. Of the two variables involved, **hi** likely isn't the problem; it is just a regular scalar variable. But **stuff** is a pointer.
We know that when using pointers, we open the door to these kinds of problems. So what might the problem be?

=====Solution=====
This solution requires knowledge of the program itself: its purpose, and the code it contains. If you aren't familiar with the code, not even the debugger can get you to some solutions.

In this case, the problem was that while we declared **stuff** as a pointer to a thing struct, we neglected to **allocate** memory for it, or to point it at an existing instance of a thing struct.

Adding this line near the top of main(), before **stuff** is first used, would clear up the problem (malloc() is declared in **stdlib.h**, so that header must be included as well):

  stuff = (struct thing *) malloc (sizeof (struct thing));

The same mistake lurks in **s**: it, too, is written through without ever being pointed at valid memory. It merely happened not to crash in this run, and it needs storage of its own as well.

Initializing **stuff** to NULL would also have helped identify this problem (one should ALWAYS set variables to sane initial values). The assignment would still have segfaulted, since a NULL pointer has no **val** member to access, but **print stuff** would have shown 0x0 rather than a plausible-looking address.

As it was, **stuff** WAS pointing somewhere, but at an invalid location (gdb labeled the address <_start>, which places it inside the program's own code, memory we can read but typically cannot write), so trying to modify the data there resulted in the operating system yelling at us.

Seeing the NULL would have better clued us in that we had forgotten to **malloc()** the space, and we could have come to the solution more easily. As it was, we had to do a little bit of detective work to eventually figure out that the missing memory allocation (and the resulting stray pointer) created our problem.