HOW OpenAI IS USING KUBERNETES

Sarvjeet Jain
5 min read · Dec 29, 2020


If you want to know HOW KUBERNETES HELPS OpenAI OVERCOME ITS CHALLENGES, then WELCOME, you are in the right place.

Before moving to the Case Study, let’s first discuss WHAT KUBERNETES IS and WHY IT IS USED.

1- WHAT IS KUBERNETES?-

Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.

It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.

It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation. It aims to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts”. It works with a range of container tools and runs containers in a cluster, often with images built using Docker. Kubernetes originally interfaced with the Docker runtime through a “Dockershim”; however, the shim has since been deprecated in favor of directly interfacing with containerd or another CRI-compliant runtime.

Kubernetes is commonly used as a way to host a microservice-based implementation, because it and its associated ecosystem of tools provide all the capabilities needed to address key concerns of any microservice architecture.
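To make that idea concrete, here is a minimal sketch of what “automating deployment” looks like through the official `kubernetes` Python client. The app name, labels, and the nginx image are placeholders chosen for illustration, and the sketch assumes you already have a cluster and a local kubeconfig.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (~/.kube/config).
config.load_kube_config()

# Describe a Deployment: 3 replicas of a single-container pod.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "demo-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo-app"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.25")]
            ),
        ),
    ),
)

# Ask Kubernetes to create it; the control plane then keeps 3 pods running.
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Once this object exists, Kubernetes keeps reconciling the cluster toward the desired state: if a pod dies, a replacement is scheduled automatically.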

2- WHY USE KUBERNETES?-

Because Kubernetes is an open source project, you can use it to run your containerized applications anywhere without needing to change your operational tooling. Kubernetes is maintained by a large community of volunteers and is always improving. Additionally, many other open source projects and vendors build and maintain Kubernetes-compatible software that you can use to improve and extend your application architecture.

1. RUN APPLICATIONS AT SCALE

Kubernetes lets you define complex containerized applications and run them at scale across a cluster of servers (see the short scaling sketch after this list).

2. SEAMLESSLY MOVE APPLICATIONS

Using Kubernetes, containerized applications can be seamlessly moved from local development machines to production deployments on the cloud using the same operational tooling.

3. RUN ANYWHERE

Run highly available and scalable Kubernetes clusters on AWS while maintaining full compatibility with your Kubernetes deployments running on-premises.

4. ADD NEW FUNCTIONALITY

Because Kubernetes is an open source project, adding new functionality is easy. A large community of developers and companies build extensions, integrations, and plugins that help Kubernetes users do more.
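As a small illustration of the “run applications at scale” point above, the same Python client can resize a running Deployment on demand; the name and replica count below are placeholders continuing the earlier sketch.

```python
from kubernetes import client, config

config.load_kube_config()

# Scale the Deployment created earlier from 3 to 10 replicas.
client.AppsV1Api().patch_namespaced_deployment_scale(
    name="demo-app",
    namespace="default",
    body={"spec": {"replicas": 10}},
)
```

In practice, scaling decisions are usually automated instead of hand-edited, for example with the Horizontal Pod Autoscaler for pods or a cluster autoscaler for nodes.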

3- OpenAI -

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company, considered a competitor to DeepMind, conducts research in the field of artificial intelligence (AI) with the stated goal of promoting and developing friendly AI in a way that benefits humanity as a whole. The organization was founded in San Francisco in late 2015 by Elon Musk, Sam Altman, and others, who collectively pledged US$1 billion. Musk resigned from the board in February 2018 but remained a donor. In 2019, OpenAI LP received a US$1 billion investment from Microsoft.

So, let’s dive deep into the Case Study.

4- WHAT CHALLENGES DID OPENAI FACE BEFORE K8s?

Speed, Portability, and Cost were the main challenges.

An artificial intelligence research lab, OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.

5- HOW K8s HELPS OPENAI -

OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity.

“We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster,”

says Christopher Berner, Head of Infrastructure.

“This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration.”
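Batch scheduling here simply means submitting finite, run-to-completion workloads and letting the scheduler place them wherever capacity exists. Below is a minimal sketch of that pattern using the Kubernetes Job primitive via the Python client; the job name, container image, and GPU request are illustrative assumptions, not OpenAI’s actual configuration.

```python
from kubernetes import client, config

config.load_kube_config()

# A Job is Kubernetes' run-to-completion primitive: a good fit for batch experiments.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="experiment-001"),
    spec=client.V1JobSpec(
        backoff_limit=0,  # do not retry a failed run automatically
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="train",
                        image="registry.example.com/train:latest",  # placeholder image
                        command=["python", "train.py"],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}  # request one GPU
                        ),
                    )
                ],
            ),
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

Pods created by Jobs like this are exactly what an autoscaler can pack onto nodes while work is queued and scale back down once it finishes, which is the idle-node cost saving Berner describes.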

6- IMPACT -

The company has benefited from greater portability:

“Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters,”

says Berner.
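One concrete way that portability shows up in practice: with the Python client, the same code can target whichever cluster has free capacity simply by picking a different kubeconfig context. The context names below are placeholders, not OpenAI’s real cluster names.

```python
from kubernetes import client, config

# Same code, different cluster: choose the context (cluster) at load time.
config.load_kube_config(context="azure-cluster")       # placeholder context name
print(len(client.CoreV1Api().list_node().items), "nodes in the Azure cluster")

config.load_kube_config(context="onprem-cluster")      # placeholder context name
print(len(client.CoreV1Api().list_node().items), "nodes in the on-prem cluster")
```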

Being able to use its own data centers when appropriate is “lowering costs and providing us access to hardware that we wouldn’t necessarily have access to in the cloud,” he adds. “As long as the utilization is high, the costs are much lower there.”

Different teams at OpenAI currently run a couple dozen projects. While the largest-scale workloads manage bare cloud VMs directly, most of OpenAI’s experiments take advantage of Kubernetes’ benefits, including the portability described above. Launching an experiment on these clusters takes little more than some configuration:

“You can just write some Python code, fill out a bit of configuration with exactly how many machines you need and which types, and then it will prepare all of those specifications and send it to the Kube cluster so that it gets launched there,”

says Berner.
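OpenAI’s internal launcher is not public, but the pattern Berner describes can be sketched roughly: a small helper that turns “how many machines and which type” into a parallel Job. Everything below (the function name, the instance-type label value, the image) is a hypothetical illustration, not OpenAI’s actual tooling.

```python
from kubernetes import client, config

config.load_kube_config()

def launch_experiment(name: str, machines: int, machine_type: str, image: str):
    """Hypothetical helper: run `machines` copies of `image` on nodes of `machine_type`."""
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            parallelism=machines,   # how many pods run at once
            completions=machines,   # one completion per requested machine
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"experiment": name}),
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    # Well-known node label; pins pods to the requested machine type.
                    node_selector={"node.kubernetes.io/instance-type": machine_type},
                    containers=[client.V1Container(name="worker", image=image)],
                ),
            ),
        ),
    )
    return client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

# Example: 8 workers on GPU-class nodes (illustrative values only).
launch_experiment("dist-train-demo", machines=8,
                  machine_type="Standard_NC6",
                  image="registry.example.com/train:latest")
```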

The impact that Kubernetes has had at OpenAI is impressive. With Kubernetes and the surrounding frameworks and tooling, including the autoscaler, in place, launching experiments takes far less time. “One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days,”

says Berner.

“In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work.”

So we have seen how K8s solves OpenAI’s challenges. Like OpenAI, many other organizations and MNCs are using K8s as well.

That’s all,

THANK YOU FOR READING THIS ARTICLE…

For more articles like this, stay connected.
