02 DEC 2014

Defining Specifications for Open Hardware

Rajeev, an engineer in the Goldman Sachs Data Center Product Group and co-chair of the Open Compute Project (OCP) Hardware Management track, discusses the open hardware specifications process and current and upcoming initiatives.

Q: Tell us about your role at Goldman Sachs.
A: I started out as an architect for what we call the Compute Farm, a grid computing environment that runs the risk and pricing infrastructure for the firm. I had a small team of engineers, and we scaled the grid to tens of thousands of computers. After that, I worked to define the architecture and operation of Dynamic Computing, the firm’s private cloud, and wrote a scalable machine deployment solution. I’m now part of the Data Center Product Group (DCPG), which focuses on computing engineering. We’re using emerging technologies to drive innovations in server hardware, virtualization, operating systems and provisioning.

Q: How did you become involved with the Open Compute Project?
A: Goldman Sachs has been involved with the OCP almost from the beginning, when Don Duet (co-head of Technology at Goldman Sachs) joined the project’s Board of Directors in 2011. Grant Richard, who leads Global Data Center Engineering and Operations, was also a member of the OCP Incubation Committee and chair of the Hardware Management track. Since I manage server hardware for the DCPG, it was natural for me to look for viable alternatives to the hardware we’re currently using. I became engaged with Open Compute two years ago because I saw value for the firm in switching to OCP hardware.

Q: What is OCP hardware?
A: The OCP is a community formed on the open source software model to drive the design, operation and management of server, storage, network and data center hardware. It consists of engineers from both the vendor and client communities. Our goal is to create designs that are vanity-free, low in cost and use significantly fewer parts. At the same time, by keeping the management of these components simple and similar, we encourage designs that can scale to high-performance computing (HPC), cloud and other large deployment use cases.

The OCP is organized into eight tracks: Server, Storage, Data Center Design, Networking, Hardware Management, Certification, Open Rack and Solution Providers. I’ve been involved mainly with the Hardware Management track. I served as its chair for the past year, and am now co-chair with Badriddine Khessib from Microsoft.

Q: What does the Hardware Management track do?
A: Our charter within the OCP is to define specifications, focusing on firmware lifecycle, event alerts and logs, remote management and strategic technologies that enable vendor-agnostic operation and management of hardware. Our first spec was for single-node management. Now we’re close to finishing one for multi-node management.

The problem is, how do you uniformly manage a server that has multiple computers in a single chassis? Our specification will define standard terms such as node, chassis and sled, and what functions must be available to access those components and manage them remotely. The end result will be a specifications document that vendors will implement in order to be OCP-certified.
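As a rough illustration of the kind of vocabulary such a specification standardizes, the Python sketch below models a chassis that holds sleds, each holding nodes, with one uniform way to address any node. The containment hierarchy, class names and functions (set_power, power_report) are assumptions made for this example; the interview does not give the specification's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One independently manageable computer (assumed definition)."""
    node_id: str
    powered_on: bool = False


@dataclass
class Sled:
    """A tray holding one or more nodes (assumed definition)."""
    sled_id: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Chassis:
    """The enclosure that houses the sleds (assumed definition)."""
    chassis_id: str
    sleds: List[Sled] = field(default_factory=list)

    def all_nodes(self) -> List[Node]:
        # Flatten the chassis -> sled -> node hierarchy.
        return [node for sled in self.sleds for node in sled.nodes]

    def find_node(self, node_id: str) -> Node:
        for node in self.all_nodes():
            if node.node_id == node_id:
                return node
        raise KeyError(node_id)

    def set_power(self, node_id: str, on: bool) -> None:
        # Hypothetical management function: same call for every node.
        self.find_node(node_id).powered_on = on

    def power_report(self) -> Dict[str, bool]:
        return {n.node_id: n.powered_on for n in self.all_nodes()}


# Example: one chassis, two sleds, three nodes.
chassis = Chassis(
    "chassis-1",
    [
        Sled("sled-1", [Node("node-1"), Node("node-2")]),
        Sled("sled-2", [Node("node-3")]),
    ],
)
chassis.set_power("node-2", True)
```

The point of the sketch is only that a management tool can address every node through the same terms and calls, regardless of which sled or vendor it comes from.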

Q: How does the track process work?
A: The track is basically an open forum. We typically have two meetings per month: one is a high-level hardware management meeting; the other is the working track meeting for the multi-node management spec. We have 10 to 15 regular participants, and of course others come and go for shorter periods of time. Like the wider OCP community, we have both vendors and clients, representing financial institutions, social networking companies and so forth. Clients come up with the wish lists; vendors explain what can and can’t be done and the costs for each feature; and collectively we agree on the feature set.

My role as chair is to bring the team together in healthy discussion, keep people engaged, and make sure that the interests of clients and vendors are represented in a balanced way. We want to come up with something realistic and beneficial for the community at large. The overall process, starting from scratch, can take from six months to a year, but we should be done with the multi-node management document soon.

Q: What happens then?
A: After we’re done, the specification goes to the Incubation Committee (IC) for vetting. The IC may or may not come back with comments or changes, but once they approve, it becomes a spec. Meanwhile, the Hardware Management track will start looking at firmware specifications. One of the biggest challenges for anyone who consumes OCP hardware is that the method of applying firmware, updating firmware and managing configuration settings differs from vendor to vendor. This initiative will reach beyond hardware vendors to include BIOS vendors. Even though the payload will be different for each platform, we need to have the same methods for applying firmware. We need a similar approach for applying settings. Some vendors will have more settings than others, but they should all use the same nomenclature for the settings they have in common.
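The idea of "same methods, different payloads" can be sketched as a shared interface with vendor-specific internals. The Python example below is a hypothetical illustration, not an OCP or vendor API: FirmwareManager, VendorA, VendorB, the setting names and the payload bytes are all invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet

# Hypothetical setting names that every vendor is assumed to share.
COMMON_SETTINGS: FrozenSet[str] = frozenset({"boot_order", "hyperthreading"})


class FirmwareManager(ABC):
    """Vendor-neutral entry points (illustrative, not from any spec)."""

    def __init__(self) -> None:
        self.version = "0.0"
        self.settings: Dict[str, str] = {}
        self.bytes_flashed = 0

    @abstractmethod
    def _flash(self, payload: bytes) -> None:
        """Vendor-specific step that writes the platform's own payload."""

    @abstractmethod
    def supported_settings(self) -> FrozenSet[str]:
        """Names of the settings this platform exposes."""

    def apply_firmware(self, version: str, payload: bytes) -> None:
        # Same call for every vendor; only _flash differs underneath.
        if not payload:
            raise ValueError("empty firmware payload")
        self._flash(payload)
        self.version = version

    def apply_setting(self, name: str, value: str) -> None:
        if name not in self.supported_settings():
            raise KeyError(f"setting not supported: {name}")
        self.settings[name] = value


class VendorA(FirmwareManager):
    def _flash(self, payload: bytes) -> None:
        self.bytes_flashed = len(payload)  # stand-in for a real write

    def supported_settings(self) -> FrozenSet[str]:
        return COMMON_SETTINGS


class VendorB(FirmwareManager):
    def _flash(self, payload: bytes) -> None:
        self.bytes_flashed = len(payload)  # stand-in for a real write

    def supported_settings(self) -> FrozenSet[str]:
        # More settings than VendorA, but the shared ones keep their names.
        return COMMON_SETTINGS | {"fan_profile"}


# The operator's code is identical for both platforms.
payloads = {"A": b"vendor-a-image", "B": b"vendor-b-image-v2"}
machines = {"A": VendorA(), "B": VendorB()}
for key, machine in machines.items():
    machine.apply_firmware("1.2", payloads[key])
    machine.apply_setting("boot_order", "network,disk")
```

In this sketch an operator script never branches on the vendor: the payload differs per platform, while the calls and the names of shared settings stay the same.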

Some of the other things we are looking at are common REST-based remote management across vendors, and bringing remote management of storage, networking and data center hardware into the same framework as servers.

Q: Are we deploying OCP hardware at Goldman Sachs?
A: We have started deploying OCP hardware in both our Dynamic Computing and Compute Farm environments. We have also written a common hardware management framework, which we are considering donating to the OCP.

Q: One last question: I know you just returned from the first OCP EU Summit. What can you tell us about that?
A: The Open Compute summit in Paris had significant support from the European community and the French government. Philippe Dewost of Caisse des Dépôts, who leads the team driving public support for the French digital economy, delivered a keynote on the importance of OCP for the nation. Microsoft contributed its Open Cloud Server (OCS) v2 designs, which incorporate the Intel Haswell processor and include drawings and complete management software; five vendors are already planning to build them. In the Hardware Management track, we had several sessions on OCS v2 server management, as well as a session on a proposed implementation of multi-node management, which Microsoft has agreed to adopt for OCS once the specifications are finalized. We also opened discussions on a common approach to managing firmware and settings for OCP hardware, which drew significant interest from summit attendees. Finally, Goldman Sachs sponsored the Hardware Hackathon, won by students from Telecom SudParis.