Title: Elastic Cloud Services: Scaling Snowflake’s Control Plane
Snowflake’s “Data Cloud” enables data storage, processing, and analytic solutions in a performant, easy-to-use, and flexible manner. Although cloud service providers supply the foundational infrastructure to run and scale a variety of workloads, operating Snowflake on cloud infrastructure presents interesting challenges. Customers expect Snowflake to be available at all times and to run their workloads with high performance. Behind the scenes, the software that runs customer workloads needs to be serviced and managed. Additionally, failures in individual components such as virtual machines (VMs) need to be handled without disrupting running workloads. As a result, lifecycle management of compute artifacts, their scheduling and placement, software rollout (and rollback) processes, replication, failure detection, automatic scaling, and load balancing become extremely important.
In this talk, we will cover the design and operation of Snowflake’s Elastic Cloud Services (ECS) layer, which manages cloud resources at global scale to meet the needs of the Snowflake Data Cloud. It provides the control plane that enables elasticity, availability, fault tolerance, and efficient execution of customer workloads. ECS runs on multiple cloud service providers and provides capabilities such as cluster management, safe code rollout and rollback, management of pre-started pools of running VMs, horizontal and vertical autoscaling, throttling of incoming requests, VM placement, load balancing across availability zones, and cross-cloud and cross-region replication.
Ioannis Papapanagiotou is a director of engineering building a modern blockchain platform at Gemini. Ioannis is also a research assistant professor at the University of New Mexico. He holds a dual Ph.D. degree in Computer Engineering and Operations Research. His main focus is on data platforms, cloud computing, and corporate culture. In the past, Ioannis served as the senior manager of the Services organization at Snowflake, supporting the core services of Snowflake’s Data Cloud. Prior to that, Ioannis was a senior manager at Netflix’s Data Platform, building the storage and data integrations team from the ground up and also serving as the leader of the key/value stores and database streaming infrastructure. Ioannis has served in the faculty ranks of Purdue University (tenure-track) and NC State University, and was an engineer at IBM and a mentor to several startups. He has been awarded the NetApp faculty fellowship and established an Nvidia CUDA Research Center at Purdue University. Ioannis has also received the IBM Ph.D. Fellowship and the Academy of Athens Ph.D. Fellowship for his Ph.D. research, and best paper awards at several IEEE conferences for his academic contributions. Ioannis has authored a number of research articles and patents. Ioannis is a senior member of ACM and IEEE.