Field practice on BMC coredump analysis

Main Track

Bytedance's practice of automatic coredump analysis, including

  • ipks collection in CI;
  • Coredump collection;
  • debug_dump tool introduction;
  • Integration of alarms and analysis.

Coredumps occur on the BMC, and the best tool for debugging them is gdb. The existing ipkdbg and bbdebg tools make it easy to debug a coredump.

In this talk, we (Bytedance) will present our practice of debugging coredumps from servers running in production, (mostly) automatically.

This talk involves the detailed steps of:

  • The CI job configuration to collect the files needed to debug a coredump;
  • The changes in the BMC to send coredumps to a remote server;
  • The remote server configuration to handle coredump files and events;
  • The integration of alarms for coredump events;
  • The automatic analysis based on the debug_dump tool;
  • The introduction of the new debug_dump tool, which prepares the gdb environment with a one-line command;
  • The one-click WebUI that runs the above steps automatically.

Notably, debug_dump is a Python tool built on the existing ipkdbg tool to analyze coredumps, and it will be submitted to the OpenBMC community.