Thursday, August 22, 2013

Debugging a kernel module using gdb

These are the steps I used to track down a simple bug caused by a NULL pointer dereference.

0. Install gdb. The command below is for yum-based distributions; on Debian or Ubuntu use apt-get install gdb instead.
# yum install gdb


1. Add the '-g -O' options (without quotes) to the EXTRA_CFLAGS variable in your Makefile. The -g option adds debugging symbols to the generated .ko module file.

For example:
EXTRA_CFLAGS += -O2 -Wall -Wno-unused -fno-strict-aliasing -Werror -g -O

2. Build the module:
# make
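
Before opening gdb, it can be worth checking that the rebuilt module really carries debug information. The sketch below is only one way to do it: it assumes readelf (from binutils) is installed, and has_debug_info is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: succeed if the object file has a .debug_info section.
# Assumes readelf (binutils) is available on the build machine.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Usage (the module name is the one from this post):
#   has_debug_info ./zncryptfs.ko && echo "debug symbols present"
```

If the check fails, the -g flag probably did not reach the compiler, so it is worth looking at the EXTRA_CFLAGS line again.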

3. Open the module in gdb:
agonzalez@kozlex:~/bugs/b/zncrypt/src/module/zncrypt$ gdb ./zncryptfs.ko
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/agonzalez/bugs/b/zncrypt/src/module/zncrypt/zncryptfs.ko...done.
The last line confirms that the symbols were read.

4. Alternatively, once you are in the gdb shell, load the symbols at address 0x0:

(gdb) add-symbol-file zncryptfs.ko 0x0
add symbol table from file "zncryptfs.ko" at
    .text_addr = 0x0
(y or n) y
Reading symbols from /home/agonzalez/bugs/b/zncrypt/src/module/zncrypt/zncryptfs.ko...done.
(gdb)

5. With the symbols loaded, you can disassemble the culprit function, in my case zncrypt_remove_inode():

(gdb) disassemble zncrypt_remove_inode
Dump of assembler code for function zncrypt_remove_inode:
   0x00000000000008d5 <+0>:    push   %rbp
   0x00000000000008d6 <+1>:    mov    %rsp,%rbp
   0x00000000000008d9 <+4>:    callq  0x8de <zncrypt_remove_inode+9>
   0x00000000000008de <+9>:    mov    (%rdi),%rax
   0x00000000000008e1 <+12>:    mov    0x10(%rdi),%rdx
   0x00000000000008e5 <+16>:    mov    0xc0(%rdx),%rdx
   0x00000000000008ec <+23>:    mov    %rdx,0x20(%rax)
   0x00000000000008f0 <+27>:    mov    (%rdi),%rax
   0x00000000000008f3 <+30>:    mov    0x8(%rdi),%rdx
   0x00000000000008f7 <+34>:    mov    0xd0(%rdx),%rdx
   0x00000000000008fe <+41>:    mov    %rdx,0x130(%rax)
   0x0000000000000905 <+48>:    mov    0x18(%rdi),%rdx
   0x0000000000000909 <+52>:    mov    0x20(%rdi),%rax
   0x000000000000090d <+56>:    mov    %rax,0x8(%rdx)
   0x0000000000000911 <+60>:    mov    %rdx,(%rax)
   0x0000000000000914 <+63>:    movl   $0x100100,0x18(%rdi)
   0x000000000000091b <+70>:    movl   $0xdead0000,0x1c(%rdi)
   0x0000000000000922 <+77>:    movl   $0x200200,0x20(%rdi)
   0x0000000000000929 <+84>:    movl   $0xdead0000,0x24(%rdi)
   0x0000000000000930 <+91>:    callq  0x935 <zncrypt_remove_inode+96>
   0x0000000000000935 <+96>:    pop    %rbp
   0x0000000000000936 <+97>:    retq  
End of assembler dump.
(gdb)

5. Find the offset in your stack trace. In my case I got an Oops, and the stack trace shows:

Aug 22 12:06:59 kozlex kernel: [ 464.774484] BUG: unable to handle kernel NULL pointer dereference at (null)
.....
Aug 22 12:06:59 kozlex kernel: [ 464.799669] <ffffffffa04ee9b9>zncrypt_remove_inode+0x39/0x40 zncryptfs
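
If you have many Oops lines to go through, the symbol+offset part can be pulled out with grep. This is a rough sketch that assumes a grep supporting -o and -E (GNU and BSD grep both do); parse_oops_location is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: extract "symbol+0xOFFSET" from one Oops trace line.
# The "/0x40" function-size suffix is left out of the match.
parse_oops_location() {
    printf '%s\n' "$1" | grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+' | head -n 1
}

# The trace line from this post:
line='Aug 22 12:06:59 kozlex kernel: [ 464.799669] <ffffffffa04ee9b9>zncrypt_remove_inode+0x39/0x40 zncryptfs'
location=$(parse_oops_location "$line")

echo "$location"            # zncrypt_remove_inode+0x39
echo "symbol: ${location%+*}"   # part before the '+'
echo "offset: ${location#*+}"   # part after the '+'
```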


6. The disassembly shows that zncrypt_remove_inode starts at 0x00000000000008d5. Add to it the offset from the stack trace entry, zncrypt_remove_inode+0x39.

7. Do the addition: 0x8d5 + 0x39 = 0x90e
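
The hex addition can be done in the shell instead of by hand. A minimal sketch, where resolve_addr is just an illustrative name:

```shell
# Add a function's start address and an Oops offset, printing the sum in hex.
# Both arguments are hex literals with the 0x prefix.
resolve_addr() {
    printf '0x%x\n' $(($1 + $2))
}

# Start address from the disassembly, offset from the stack trace:
resolve_addr 0x8d5 0x39   # prints 0x90e
```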

8. Then run the following command with the culprit address:

(gdb) list * 0x90e
0x90e is in zncrypt_remove_inode (include/linux/list.h:88).
83     * This is only for internal list manipulation where we know
84     * the prev/next entries already!
85     */
86    static inline void __list_del(struct list_head * prev, struct list_head * next)
87    {
88        next->prev = prev;
89        prev->next = next;
90    }
91   
92    /**
(gdb)

9. There you go: line 88 is where the NULL pointer is dereferenced.

10. Work out where the bad pointer comes from by disassembling the calling functions, then fix it as you normally would.
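
The lookup can also be run without an interactive session, using gdb's -batch and -ex options. The wrapper below is a sketch under that assumption; oops_list is a hypothetical name, and it expects a module built with -g as in step 1.

```shell
# Hypothetical wrapper: print the source lines around SYMBOL+OFFSET in MODULE.
# -batch makes gdb exit once the command given with -ex has run.
oops_list() {
    module=$1
    location=$2
    gdb -batch -ex "list *(${location})" "$module"
}

# Usage, with the values from this post:
#   oops_list ./zncryptfs.ko zncrypt_remove_inode+0x39
```

Passing symbol+offset lets gdb do the address arithmetic, so the start address does not have to be read from the disassembly first.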

Enjoy!
P.S. The first flag of the world is...

1 comment:

  1. A quicker way to find the line that caused the bug is to run this in gdb:

    (gdb) list *zncrypt_remove_inode+0x39

    That way you don't have to load the symbols or disassemble the code yourself: the symbols are already loaded when you run gdb on the module from the source directory, and 'list' resolves the address to a source line for you. :)

    So, is Six Flags a sponsor of your blog?
    I want a ticket.
