Published 2007-01-31 17:41:05

One of the projects I'm working on is a SyncML server, written from scratch in D. It's currently in testing, and we found that the server was mysteriously crashing. Unfortunately, since it's threaded and forks as a daemon, we didn't really want to run it under GDB (and GDB segfaults on startup anyway), so we were in a bit of a quagmire about how to find the bug.

So, after a bit of searching through code.google.com, I came across the idea of catching the SIGSEGV signal and calling backtrace and backtrace_symbols.

This little trick can produce output that looks something like this:
/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code
This initially seemed a bit cryptic, but putting it together with addr2line gives some great debugging information.
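
The pairing with addr2line comes down to pulling two fields out of each line: the path of the binary and the hex address inside the brackets. A minimal sketch of that step, written for current D (D2) rather than the D1 used in this post; `Frame`, `parseFrame` and `addr2lineArgs` are illustrative names, not part of any library:

```d
import std.string : indexOf, strip;

/// One parsed line of backtrace_symbols() output (hypothetical helper type).
struct Frame
{
    string binary;  // path printed before the '['
    string address; // hex digits without the "0x" prefix
}

/// Splits a line shaped like "/path/to/application [0x805e971]".
/// Returns false for lines that do not have that plain shape.
bool parseFrame(string line, out Frame frame)
{
    auto open = line.indexOf('[');
    auto close = line.indexOf(']');
    // Need at least "[0x", one digit and "]" after the path.
    if (open < 0 || close < open + 4)
        return false;
    auto binary = line[0 .. open].strip;
    // Skip empty paths and "binary(symbol+offset)" style entries.
    if (binary.length == 0 || binary.indexOf('(') >= 0)
        return false;
    if (line[open + 1 .. open + 3] != "0x")
        return false;
    frame = Frame(binary, line[open + 3 .. close]);
    return true;
}

/// Builds the argument list for: addr2line -f -e <binary> <address>
string[] addr2lineArgs(Frame frame)
{
    return ["addr2line", "-f", "-e", frame.binary, frame.address];
}
```

Passing the arguments as a list rather than a concatenated shell string is a design choice of this sketch: it avoids quoting problems if the binary's path contains spaces.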

This is my little backtrace logger for the daemon:
import std.process;   // system()
import std.string;    // toString(), split(), strip(), find()
import std.c.stdlib;  // free(), exit()

static void print_trace()
{
    void *btarray[10];
    int size;
    char **strings;

    // Collect up to 10 return addresses, then their "binary [0xADDR]" descriptions.
    size = backtrace(cast(void**)btarray, 10);
    strings = backtrace_symbols(cast(void**)btarray, size);

    // Start a fresh log for this crash.
    std.process.system("/bin/echo '----BACKTRACE------' " ~
        "> /var/log/myproject/backtrace.log");

    for (int i = 0; i < size; i++) {
        char[] line = std.string.toString(strings[i]);
        char[][] bits = std.string.split(line, "[");
        if (bits.length < 2) {
            continue;   // no "[0x...]" part on this line
        }
        char[] left = std.string.strip(bits[0]);
        if (!left.length) {
            continue;
        }
        // skip lines with ( in them...
        if (std.string.find(left, "(") > -1) {
            continue;
        }

        // Drop the leading "0x" and the trailing "]".
        char[] addr = bits[1][2..length-1];

        std.process.system("/bin/echo '----" ~ addr
            ~ "------' >> /var/log/myproject/backtrace.log");
        // Resolve the address to a function name and file:line.
        std.process.system("/usr/bin/addr2line -f -e " ~
            left ~ " " ~ addr ~ " >> /var/log/myproject/backtrace.log");
    }
    free(strings);
}

Of course, you need a few C externs to make this work:
extern (C) {
    int backtrace(void **__array, int __size);
    char** backtrace_symbols(void **__array, int __size);
    pid_t getpid();
    sighandler_t signal(int signum, sighandler_t handler);

    void sigsegv(int sig)
    {
        // reset the handler.
        signal(SIGSEGV, cast(sighandler_t) 0);
        print_trace();
        // really die
        exit(SIGSEGV);
    }
}

To add it to your application, put this somewhere in main():
signal(SIGSEGV, &sigsegv);
Testing it is quite simple. Just do this in D:

void testSegfault()
{
    class SegTest {
        void test() {}
    }
    SegTest a;   // never allocated, so a is null
    a.test();    // calling through the null reference segfaults
}
Now, looking at the debug file, you can work out where it failed:
----BACKTRACE------
----805e971------ (THIS IS MY OUTPUT CODE)
_D9myproject7cmdLine6daemon11print_traceFZv
init.c/src/myproject/cmdLine/daemon.d:306
----805e46b------
sigsegv
init.c/src/myproject/cmdLine/daemon.d:121
----804db18------ (AND NOW FOR THE LOCATION OF THE SEGFAULT)
_D9myprojectfort7manager7manager7runOptsFZAa
init.c/src/myproject/manager.d:50
----805617a------
_D9myproject10webRequest10webRequest5parseFAaAaKAaZAa
init.c/src/myproject/webRequest.d:89
----8050c3e------
_D9pmyproject14myprojectThread14myprojectThread18dealWithWebRequestFAaAaZv
init.c/src/myproject/myprojectThread.d:331
----80503d0------
_D9myproject14myprojectThread14myprojectThread3runFZi
init.c/src/myproject/myprojectThread.d:111
----8076260------
_D3std6thread6Thread11threadstartUPvZPv
??:0
----a7fd10bd------
??

I'm sure with some more work, you could get it to log to syslog...
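
A handler that shells out and allocates memory runs inside a process whose state may already be damaged. A lighter variant, sketched here under some assumptions, uses glibc's backtrace_symbols_fd, which writes the frames straight to a file descriptor instead of returning malloc'd strings. The sketch is in current D (D2), declares the two glibc functions by hand, and uses the hypothetical names `dumpBacktrace`, `onSegv` and `installSegvHandler`. It writes to stderr, so a daemon would need to point that (or another descriptor opened at startup) at a log file.

```d
import core.stdc.signal : raise, signal, SIGSEGV, SIG_DFL;
import core.sys.posix.unistd : STDERR_FILENO;

// glibc <execinfo.h> functions, declared by hand for this sketch.
extern (C) nothrow @nogc
{
    int backtrace(void** buffer, int size);
    void backtrace_symbols_fd(const(void*)* buffer, int size, int fd);
}

enum maxFrames = 32; // arbitrary depth chosen for this sketch

/// Writes the current call stack to fd, one frame per line.
/// Returns the number of frames captured.
int dumpBacktrace(int fd) nothrow @nogc
{
    void*[maxFrames] frames;
    int count = backtrace(frames.ptr, maxFrames);
    backtrace_symbols_fd(frames.ptr, count, fd);
    return count;
}

/// Signal handler: restore the default action, log, then die from the signal.
extern (C) void onSegv(int sig) nothrow @nogc
{
    signal(sig, SIG_DFL);         // a second fault now kills the process
    dumpBacktrace(STDERR_FILENO);
    raise(sig);                   // re-raise so the default action still applies
}

/// Call once at startup, before any threads are created.
void installSegvHandler()
{
    // Assumption: calling backtrace() once up front lets glibc do any
    // lazy setup outside the signal handler.
    void*[1] warmup;
    backtrace(warmup.ptr, 1);
    signal(SIGSEGV, &onSegv);
}
```

The raw addresses this prints can still be fed to addr2line afterwards, outside the crashing process.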

Comments

Say no to SIGSEGV
I am wary of code that attempts to catch SIGSEGV and do something meaningful with it. When you are in a SEGV state, you could be there because of a corrupted stack or all kinds of other bad things. Running further non-trivial code (and from the looks of your handler it is non-trivial) is likely to trigger another SEGV, in which case your handler is called again and you get another SEGV.

If your memory is corrupt, trying to do printf() which does complicated things like malloc() is likely going to dereference an invalid pointer, so you end up with a second coredump.

Much better to let the process die a natural death. Once it has dumped core, you can have a separate process that converts the core to a stacktrace. If your GDB is segfaulting on startup, I'd suggest spending more time investigating why that is instead of trying to catch fatal signals, which are fatal for a reason.
#0 - Andrei on 2007-02-01 00:46:25
D may be different
While dealing with PHP, segfaults have come from all over the place, and this solution would probably mess up as you describe.

For the time being though, the only reason I've ever seen my D code segfault is null pointer calls (which are unfortunately not assert tested yet, although that's apparently on the todo list). So stack corruption is not so much of a concern.

That said, I could do with working out why GDB segfaults. It would make life easier.
#1 - Alan Knowles on 2007-02-01 08:00:23
