Lessons Learned from a Large OpenBMC Deployment


Facebook has been working on an open-source Baseboard Management Controller (BMC) solution since 2014. This presentation examines several specific problems discovered as usage of this embedded Linux distribution has grown.

Out of Memory in 1 to 60 Days, or Why to Engage Upstream and Rebase Often

Two memory leaks, one in Linux v2.6 and the other in rsyslog, and how they were fixed upstream.

The Pain of Passwords, or Why to Invest in Security

Several shortcomings of passwords we have encountered, how to set up SSH Trusted CA and Authorized Principals, and password and key rotation considerations for image update and configuration mechanisms.
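
As a rough illustration of the Trusted CA and Authorized Principals setup, the sketch below renders the two pieces of text an image build step might install. `TrustedUserCAKeys` and `AuthorizedPrincipalsFile` are standard OpenSSH `sshd_config` directives. The function names, file paths, and principal names are hypothetical and are not taken from the presentation.

```python
def render_sshd_ca_config(ca_pubkey_path, principals_dir):
    """Return an sshd_config fragment that trusts user certificates
    signed by the CA whose public key is stored at ca_pubkey_path."""
    lines = [
        # Accept user certificates signed by this CA (path is an assumption).
        f"TrustedUserCAKeys {ca_pubkey_path}",
        # One principals file per local account; sshd expands %u to the user.
        f"AuthorizedPrincipalsFile {principals_dir}/%u",
    ]
    return "\n".join(lines) + "\n"


def render_principals_file(principals):
    """Return the body of a principals file: one allowed principal per line."""
    return "".join(f"{name}\n" for name in sorted(set(principals)))


if __name__ == "__main__":
    # Hypothetical paths and principal names, for illustration only.
    print(render_sshd_ca_config("/etc/ssh/trusted_ca.pub",
                                "/etc/ssh/auth_principals"), end="")
    print(render_principals_file(["bmc-admin", "bmc-oncall"]), end="")
```

With a layout like this, rotating a user's access means issuing a new short-lived certificate rather than rewriting a password on every BMC, and only the CA public key needs to ship in the image.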

Unresponsive Endpoints, or Why to Architect and Test for Resilience

Communication failures observed between the bootloader and the BMC over IPMI, and between BMC processes over a Unix socket, the code and system design changes that made these paths more reliable, and how testing can screen for these issues.
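
One general way to keep a caller from hanging on an unresponsive Unix socket peer is to bound every blocking call with a timeout and retry a limited number of times. The sketch below shows that pattern only; it is not the code change described in the talk, and `request_with_timeout` and its parameters are hypothetical.

```python
import socket
import time


def request_with_timeout(path, payload, timeout_s=2.0, retries=3, backoff_s=0.1):
    """Send payload to the Unix socket at path and return the full reply.

    Returns None if every attempt fails. Connect, send, and receive are
    each bounded by timeout_s, so a hung peer cannot block the caller forever.
    """
    for attempt in range(retries):
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout_s)
                sock.connect(path)
                sock.sendall(payload)
                # Half-close so the peer sees end-of-request.
                sock.shutdown(socket.SHUT_WR)
                chunks = []
                while True:
                    chunk = sock.recv(4096)
                    if not chunk:  # peer closed: reply is complete
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
        except OSError:
            # Covers timeouts, refused connections, and a missing socket file.
            time.sleep(backoff_s * (2 ** attempt))
    return None
```

A test can screen for this class of failure by pointing the client at a socket that accepts connections but never replies and asserting that the call returns within the expected bound.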
