These steps show how to track down a simple bug caused by a NULL pointer dereference in a kernel module.
0. Make sure gdb is installed (on Debian/Ubuntu, use apt-get install gdb instead):
# yum install gdb
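1. Add the '-g -O' options (without the quotes) to the EXTRA_CFLAGS variable in your Makefile. The -g option adds debugging symbols to the generated .ko module file. Note that GCC honors only the last -O option on the command line, so the trailing -O replaces the earlier -O2, and the rebuilt code may not match the crashed module instruction for instruction.
For example: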
EXTRA_CFLAGS += -O2 -Wall -Wno-unused -fno-strict-aliasing -Werror -g -O
2. Build the module:
# make
3. Open the module in gdb:
agonzalez@kozlex:~/bugs/b/zncrypt/src/module/zncrypt$ gdb ./zncryptfs.ko
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/agonzalez/bugs/b/zncrypt/src/module/zncrypt/zncryptfs.ko...done.
The last line confirms that the symbols were read.
4. Alternatively, once you are in the gdb shell, you can load the symbols at address 0x0:
(gdb) add-symbol-file zncryptfs.ko 0x0
add symbol table from file "zncryptfs.ko" at
.text_addr = 0x0
(y or n) y
Reading symbols from /home/agonzalez/bugs/b/zncrypt/src/module/zncrypt/zncryptfs.ko...done.
(gdb)
5. With the symbols loaded, you can disassemble the culprit function, in my case zncrypt_remove_inode():
(gdb) disassemble zncrypt_remove_inode
Dump of assembler code for function zncrypt_remove_inode:
0x00000000000008d5 <+0>: push %rbp
0x00000000000008d6 <+1>: mov %rsp,%rbp
0x00000000000008d9 <+4>: callq 0x8de <zncrypt_remove_inode+9>
0x00000000000008de <+9>: mov (%rdi),%rax
0x00000000000008e1 <+12>: mov 0x10(%rdi),%rdx
0x00000000000008e5 <+16>: mov 0xc0(%rdx),%rdx
0x00000000000008ec <+23>: mov %rdx,0x20(%rax)
0x00000000000008f0 <+27>: mov (%rdi),%rax
0x00000000000008f3 <+30>: mov 0x8(%rdi),%rdx
0x00000000000008f7 <+34>: mov 0xd0(%rdx),%rdx
0x00000000000008fe <+41>: mov %rdx,0x130(%rax)
0x0000000000000905 <+48>: mov 0x18(%rdi),%rdx
0x0000000000000909 <+52>: mov 0x20(%rdi),%rax
0x000000000000090d <+56>: mov %rax,0x8(%rdx)
0x0000000000000911 <+60>: mov %rdx,(%rax)
0x0000000000000914 <+63>: movl $0x100100,0x18(%rdi)
0x000000000000091b <+70>: movl $0xdead0000,0x1c(%rdi)
0x0000000000000922 <+77>: movl $0x200200,0x20(%rdi)
0x0000000000000929 <+84>: movl $0xdead0000,0x24(%rdi)
0x0000000000000930 <+91>: callq 0x935 <zncrypt_remove_inode+96>
0x0000000000000935 <+96>: pop %rbp
0x0000000000000936 <+97>: retq
End of assembler dump.
(gdb)
5. Check the offset in your stack trace. In my case I had an Oops, and the stack trace shows:
Aug 22 12:06:59 kozlex kernel: [ 464.774484] BUG: unable to handle kernel NULL pointer dereference at (null)
.....
Aug 22 12:06:59 kozlex kernel: [ 464.799669] <ffffffffa04ee9b9>zncrypt_remove_inode+0x39/0x40 zncryptfs
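The frame line packs three fields together: the symbol, the offset of the faulting instruction inside it, and the size of the function as the running kernel sees it. A minimal sketch of pulling them apart with sed, assuming the "symbol+0xOFF/0xSIZE" layout shown in the trace above; the helper name parse_frame is made up for illustration:

```shell
#!/bin/sh
# parse_frame (hypothetical helper): print "symbol offset size" for one oops
# frame line. Assumes the frame looks like "...>symbol+0xOFF/0xSIZE ...".
parse_frame() {
    printf '%s\n' "$1" |
        sed -n 's|.*>\([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/\(0x[0-9a-f]*\).*|\1 \2 \3|p'
}

# Prints: zncrypt_remove_inode 0x39 0x40
parse_frame '[ 464.799669] <ffffffffa04ee9b9>zncrypt_remove_inode+0x39/0x40 zncryptfs'
```

The size field is a useful sanity check. Here the trace reports 0x40, while the disassembly above runs from 0x8d5 to 0x936, so the rebuilt module is probably not byte-identical to the one that crashed and the resolved source line should be read as approximate.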
6. The disassembly above shows that zncrypt_remove_inode starts at 0x00000000000008d5. To that start address, add the offset from your stack trace:
zncrypt_remove_inode+0x39 (taken from the Oops above)
7. Do the addition: 0x8d5 + 0x39 = 0x90e
8. Then run the following command with the culprit address:
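The same addition can be done in the shell, which helps when the addresses are longer. A minimal sketch; the helper name fault_address is made up for illustration:

```shell
#!/bin/sh
# fault_address (hypothetical helper): add a function's start address and the
# offset taken from the stack trace, and print the sum in hex.
fault_address() {
    printf '0x%x\n' "$(( $1 + $2 ))"
}

# Start of zncrypt_remove_inode plus the offset from the Oops; prints 0x90e
fault_address 0x8d5 0x39
```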
(gdb) list * 0x90e
0x90e is in zncrypt_remove_inode (include/linux/list.h:88).
83 * This is only for internal list manipulation where we know
84 * the prev/next entries already!
85 */
86 static inline void __list_del(struct list_head * prev, struct list_head * next)
87 {
88 next->prev = prev;
89 prev->next = next;
90 }
91
92 /**
(gdb)
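gdb can also do the address arithmetic itself: it accepts an expression of the form *(symbol+offset) as a location for list. A minimal sketch that builds a non-interactive gdb command line from the symbol and offset in the trace; the helper name gdb_list_cmd is made up for illustration, and it only prints the command instead of running it:

```shell
#!/bin/sh
# gdb_list_cmd (hypothetical helper): print a gdb batch command that lists the
# source line for symbol+offset in a module file. It only prints the command;
# copy it or pipe it to sh to actually run gdb.
gdb_list_cmd() {
    module=$1
    symbol=$2
    offset=$3
    printf "gdb -batch -ex 'list *(%s+%s)' %s\n" "$symbol" "$offset" "$module"
}

# Prints: gdb -batch -ex 'list *(zncrypt_remove_inode+0x39)' ./zncryptfs.ko
gdb_list_cmd ./zncryptfs.ko zncrypt_remove_inode 0x39
```

Running the printed command should give the same listing as the interactive session, without working out the absolute address by hand.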
9. There you go: line 88 is where the NULL pointer is dereferenced.
10. Figure out where your error is by disassembling the previous functions in the stack trace, and debug from there.
Enjoy!
P.S. The first
flag of the world is...