Case studies in cloud migration: Netflix, Pinterest, and Symantec

Chris Stokel-Walker

In October 2008, Neil Hunt, chief product officer at Netflix, gathered a meeting of a dozen or so of his engineering staffers in The Towering Inferno, the secluded top-floor meeting room at Netflix’s Los Gatos, CA headquarters. The room, which Netflix CEO Reed Hastings occasionally commandeers as his personal office, is away from the main office hustle and bustle of the start-up company, up a flight of stairs and across an outdoor wooden walkway up on the building’s rooftop—the ideal place for big-picture thinking.

Big thoughts were needed, because Netflix had a problem: its backend client architecture was, to put not too fine a point on it, crumbling more than the Colosseum and leaning more than the Tower of Pisa.

“We kept having issues with connections and threads,” Hunt recalled at an industry conference in Las Vegas, NV, six years later. “At one point we upgraded the machine to a fantastic $5 million box and it crashed immediately because the extra capacity on the thread pools meant we ran out of connection pools more quickly.”

It was an unenviable position to be in for the firm, which had introduced online streaming of its vast video library the year before. Netflix had just partnered with Microsoft to get its app on the Xbox 360, and had agreed to terms with the manufacturers of Blu-ray players and TV set-top boxes to service their customers. Millions of potential users of a new, game-changing technology were about to encounter what we now know as the multi-billion-dollar industry of online video streaming, one that would transform Netflix from a failing company that mailed DVDs to movie buffs into a television and movie studio rivaling some of Hollywood’s biggest names.

But, back in 2008, with a backend that couldn’t cope, the public wasn’t about to encounter anything—unless Netflix made some changes.

There were two points of failure in the physical technology, Hunt explained to the conference audience in Las Vegas: The disk array that ran Netflix’s database—a single Oracle database on an array of Blade webservers—and the single box that talked to it.

“We knew we were approaching a point where we needed to make this redundant,” said Hunt. But Netflix hadn’t yet forked out the cash for a second data center that would alleviate the problem. “We were vulnerable to those single points of failure.”

“Let’s rethink this completely, go back to first principles, and think about doing it in the cloud.”

That much became abundantly clear in 2008 when the company pushed a piece of firmware to the disk array. It corrupted Netflix’s database, and the company had to spend three days scrambling to recover. (One contemporary news story on the outage—and the customer outrage it sparked—noted that some customers even went back to Blockbuster, which Netflix had made seem decrepit, for their DVDs.) “That wasn’t a total catastrophe because most customers weren’t reliant on the system being up to get value from the service,” explained Hunt—but as Netflix’s DVD mailing arm wound down and its new streaming service caught on, it would become a problem.

“We thought: ‘Let’s rethink this completely, go back to first principles, and think about doing it in the cloud’,” said Hunt.

Over the course of several meetings in The Towering Inferno, Hunt and his team thrashed out a plan that would ensure that database corruption—and the many other issues with connections and threads that seemed to plague the company back in 2008—would never happen again. They’d move to the cloud.

Whether companies are looking to run applications serving millions of users or to underpin the databases and file servers of multinational businesses, the cloud provides a low-cost, flexible way to ensure reliable IT resources. Firms don’t need to worry about the physical upkeep of their own private data centers; they can build out capacity as and when it’s needed, lowering costs and increasing their adaptability—important features for a young startup with unpredictable (and potentially limitless) growth. It is a relatively recent boon, born of technological innovation, that helps power hundreds of thousands of companies, big and small, across the globe.

For Netflix, the move to the cloud proved a prescient decision: between December 2007 and December 2015, the number of hours of content streamed on Netflix increased one thousand times, and the company had eight times as many people signed up to the service at the end of its cloud migration process as it did at the start. Cloud infrastructure was able to stretch to meet this expanding demand in a way that traditional server racks in a data center could not (the number of requests per month made through Netflix’s API outstripped the capacity of its traditional data center near the end of 2010). It also proved to be a major cost-saving move.

But at the same time, the cloud was still an unproven, young technology. Amazon, the current leader in cloud computing, had only been offering its Amazon Web Services (AWS) infrastructure products since 2006. Caution was required. Netflix started small, moving a single page onto AWS to make sure the new system worked. “It’s nicely symbolic,” said Hunt. “We recognised that along the way we probably need to hire some new skills, bring in some new talent, and rethink our organisation.” The company chose AWS over alternative public cloud suppliers because of its breadth of features and its scale, as well as the broader range of APIs that AWS offered.

“When Netflix made the decision to go all-in on the cloud, most people were barely aware the cloud existed.”

Today, the cloud is many companies’ first choice when it comes to storing data and serving their customers. AWS is a $12 billion company, four times bigger than it was in 2013. It has—and has long had—a 40% market share in the public cloud sector, more than the cloud offerings of Microsoft, Google, and IBM combined, according to data collated by Synergy Research Group. Those that aren’t utilising the cloud often feel they want to, and are frustrated when they can’t: Four in 10 businesses have critical company data trapped in legacy systems that can’t be accessed or linked to cloud services, according to a survey by market research company Vanson Bourne for commercial software company Snaplogic, while three in four say that their organization misses out on opportunities because of disconnected data. Vendors’ revenue from the sales of infrastructure products—including server, storage and Ethernet switches—for cloud IT topped $8 billion in the first quarter of 2017, according to analysts IDC.

But none of that was the case when Netflix started its great migration, nor was it true when Ruslan Meshenberg started at Netflix in January 2011, two years into Netflix’s big move. As one of the first companies to move its services into the cloud, Netflix was literally writing the rulebook for many of the tasks it was undertaking. Meshenberg was thrown in at the deep end.

“That was the very first set of objectives I was given,” he explains. “A complete data-center-to-cloud migration for a core set of platform services. Day one.”

It involved a lot of outside the box thinking—and plenty of trailblazing. “When Netflix made the decision to go all-in on the cloud, most people were barely aware the cloud existed,” he explains. “We had to find solutions to a lot of problems, at a time when there were not a lot of standard, off-the-shelf solutions.”

And the problems, when tackling such an enormous task as the migration of a company the size of Netflix, were numerous—particularly for a team used to the mindset that their system operated in a physical data center.

“When you’re operating in a data center,” says Meshenberg, “you know all of your servers. Your applications are running only on a particular set of hardware units.” The goal for the company in a physical data center is a simple one: keep the hardware running at all times, at all costs. That’s not the case with the cloud. Your software runs on ephemeral instances that aren’t guaranteed to be up for any particular duration, or at any particular time. “You can either lament that ephemerality and try to counteract it, or you can try and embrace it and say: ‘I’m going to build a reliable, available system on top of something that is not.’”

Which is where Netflix’s famed Simian Army comes in. You have to build a system that can fail—in part—while keeping up as a whole. But in order to figure out if your systems have that ability baked into their design, you need to test it.

Netflix built a tool that would self-sabotage its system, and christened it Chaos Monkey. It would be unleashed on the cloud system, wreaking havoc, bringing down aspects of the system as it rampaged around. The notion might seem self-defeating, but it had a purpose. “We decided to simulate the conditions of a crash to make sure that our engineers can architect, write and test software that’s resilient in light of these failures,” explains Meshenberg.
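
Netflix’s production tool is more sophisticated and runs through the company’s own deployment tooling, but the core idea is small enough to sketch. The following is an illustrative, minimal Chaos Monkey-style script, not Netflix’s implementation: it assumes configured AWS credentials and a hypothetical `chaos-opt-in` tag marking instances whose owners have agreed to be tested.

```python
"""Minimal, illustrative Chaos Monkey-style instance killer.

This is NOT Netflix's Chaos Monkey; it only sketches the core idea:
randomly terminate one instance from an opted-in pool so that teams
must design services that survive the loss.
"""
import random

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")


def opted_in_instances(tag_key="chaos-opt-in", tag_value="true"):
    """Return IDs of running instances that have opted in via a tag."""
    response = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def unleash_once():
    """Terminate one randomly chosen opted-in instance, if any exist."""
    candidates = opted_in_instances()
    if not candidates:
        print("No opted-in instances found; nothing to do.")
        return
    victim = random.choice(candidates)
    print(f"Terminating {victim} to exercise failure handling.")
    ec2.terminate_instances(InstanceIds=[victim])


if __name__ == "__main__":
    unleash_once()
```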

In its early days, Chaos Monkey’s tantrums in the cloud were a dispiriting experience. “It was painful,” Meshenberg admits. “We didn’t have the best practices, and so many of our systems failed in production. But now, since our engineers have this built-in expectation that our systems will have to be tested by Chaos Monkey, in production they’re now writing their software using the best practices that can withstand such destructive testing.”

Even without Chaos Monkey, there were still early setbacks, including a significant outage across North America on Christmas Eve 2012 thanks to an AWS update to elastic load balancers that tipped Netflix offline—a chastening event. But the company adapted, and came through it. By 2015 all of Netflix’s systems—bar its customer and employee data management databases, and billing and payment components—had been migrated to AWS. It would take a little longer before Meshenberg’s team could celebrate a job complete, but the relatively bump-free path (and the easy scaling up of systems as Netflix’s customer base skyrocketed) vindicated the move.

“The crux of our decision to go into the cloud,” says Meshenberg, was a simple one: “It wasn’t core to our business to build and operate data centers. It’s not something our users get value from. Our users get value from enjoying their entertainment. We decided to focus on that and push the underlying infrastructure to a cloud provider like AWS.”

For Netflix, dipping their toe into the water of cloud computing wasn’t an option. They had to dive in headfirst.

That said, making the leap was a brave move—not least given that, particularly when Netflix began its migration in 2009 and even when Meshenberg joined the company in 2011, cloud storage was still a relatively unknown technology in the Valley, and an unknown term to the general public. (The Institute of Electrical and Electronics Engineers (IEEE) held just its fourth ever international conference on cloud computing in Washington DC in 2011; technology analysts Gartner were still able, back in 2011, to publish a $2,000 “Hype Cycle” report explaining a technology that was on the rise.) Though those in the know understood the benefits of migrating to the cloud, and had a hunch that the general consensus would follow them, early adopters were still just that—pioneers pushing out the boundaries for the technology.

Going all-in on the cloud required betting on the future—and hoping that others would follow. But for Netflix, dipping their toe into the water of cloud computing wasn’t an option. They had to dive in headfirst.

“We had little doubt that cloud was the future,” explains Meshenberg. “If it was, it didn’t make sense to hedge our bets and straddle both worlds, because that would mean we would lose the focus of getting something done completely to the end.”

There was another factor in the decision for Netflix, too: scalability. “Our business was growing a lot faster than we would be able to build the capacity ourselves,” Meshenberg recalls. “Every time you grow your business and your traffic grows by an order of magnitude, you have to rewrite the rules. The thing that worked for you at a smaller scale may no longer work at the bigger scale. We made a bet that the cloud would be a sufficient means in terms of capacity and capability to support our business, and the rest was figuring out the technical details of how.”

For Raj Patel, considering anything but the cloud was never really an option. Head of Cloud Engineering at Pinterest from 2014–2016, Patel joined a company that still had to engineer another move: from Amazon Web Services’ legacy cloud to a next-generation cloud system. “It wasn’t any different, frankly, than moving from a data center to the public cloud,” explains Patel. “We did a migration inside of Amazon.”

The move was one that some at the startup were wary of, even despite its benefits. “A cloud migration, in many cases, doesn’t necessarily get them anything,” says Patel. “The appeal has to be why they should do this before the five other things they were thinking about doing for their own group.”

At a small, nimble startup like Pinterest, time and resources are scarce, and an engineering team’s to-do list is as long as the sum of their collective arms. Getting people on-side with the cloud migration required deftness, discussion—and categorically not a top-down edict. It also required going person-to-person, winning small victories in support of the larger battle.

“You have to intuitively appeal or influence the motivations of an individual engineer to achieve your goal,” says Patel. “What I found was that at the earlier stages of the program I explicitly looked for folks that are early adopters or have a vested interest in doing that program or project, and you really focus on making them really successful. Then if the others see it they’ll get on board.”

Certain groups at Pinterest had pent-up frustrations with the older generation of Amazon’s cloud service, particularly when it came to the elasticity of potential future expansion. Data engineering-intensive applications ran up against walls with the old cloud server. Patel saw an in.

“We focused on those who would benefit the most,” he says, selling them on the idea of migrating over to a new cloud server, better equipped to deal with the developments they wanted to introduce. Patel’s team provided those early adopters with the tools to help them smoothly migrate over to the new cloud. That included embedding a consultant or solution engineer (rebranded “site reliability engineers” so as not to ruffle any feathers within the groups they joined) with each application team, who was able to provide the relevant tools and know-how to help ease the transition over to AWS. What the site reliability engineers from Patel’s team didn’t do, though, was impose any ideas or tools on the teams they joined.

"We focused on those who would benefit the most,” selling them on the idea of migrating over to a new cloud server, better equipped to deal with the developments they wanted to introduce.

“Any time you do a cloud migration—especially with engineers—there’s always this notion of: ‘Here’s my way of doing it, here’s your way of doing it: What’s the right way of doing it?’,” explains Patel. “If you had an outside group tell you this is the only way you’re going to do it, you’re going to run into a lot of friction.”

Rather, the teams worked collaboratively, engendering a sense of common purpose. Pinterest was, in truth, always going to make the move, and the company could have become forceful with its ideas, but Patel wanted a more consensual approach. “Their success is embedded with that application team,” says Patel. “Even though they might be talking about a central tool or approach, they’re perceived from the perspective of that application team.”

Like a pyramid scheme, the early adopters found success, and became proselytisers for the move. “When they talk to others at lunch, they say the migration is going really well; the guys doing it are really helpful, and it’s going just fantastic,” says Patel. “The next time you talk to the sceptics, they say: ‘Let’s go and do it.’”

At the same time, those systems that had successfully made the cloud-to-cloud migration were crowed about internally. Data democracy was crucial, says Patel, in getting across the message that the migration was something to be welcomed, not shunned. “We had important metrics about the progress we were making and would send it out to the whole engineering team to let them see it,” he explains. “People like data—engineers especially. They resonate with that progress.”

Six months later, Pinterest had transferred its backend to the more modern cloud system. The team held a party to celebrate the successful move, but truthfully, it was just another success for a company that has plenty of them.

“Think about it,” says Patel. “This was a company that was doubling or tripling in size every year. When I joined the company it was making $0 in revenue and the first year it was $100 million or something, then the next year something like three times that amount. That was the norm across the entire company. In some ways, it was just business as usual.”

When Patel moved to Symantec in April 2016 to become vice president for cloud platform engineering, things were far from business as usual.

“The magnitude of challenges are, I’d say, 5× with Symantec,” he explains. “That’s one of the things I’ve come to realize: While it’s interesting to talk about companies like Pinterest, Facebook or Instagram, their problem is already solved. They have some of the brightest engineers in the world, their applications are already designed for these cloud-type elastic architectures. In some ways, the challenge is not that interesting. But when you’re dealing with a 30-plus year-old company like Symantec, the challenge is a lot more interesting.”

For decades, Symantec had provided stability and assurance to customers—important, given its role as a security service. Unlike Pinterest, which was born in the cloud seven years ago, Symantec was founded in 1982, when computers were massive, hulking bits of hardware, hardwired to the wall. The company had already been in business for as long before the world wide web appeared as Pinterest has been in business, period. A publicly listed company—accountable to shareholders, with $3.6 billion of turnover—comes with more levels of hierarchy than a nimble, community-focused startup born in the Valley.

“There are more business units with general managers, instead of application teams,” explains Patel. “All those barriers are a lot more rigid in a larger enterprise than they are in the more nimble, engineering organisation approach you find in a startup.” There are also people who have been working in the company longer than some of Pinterest’s brightest young engineers—individuals who have decades of experience, and rightly should be listened to when they pass comment on the merits of such a move into the cloud. “Frankly, there were a lot of sceptics, and real architectural challenges in applications that simply have not been designed for the cloud,” says Patel. His work would end up closing down 27 separate data centers around the world and moving everything into the public cloud. The scale seemed almost insurmountable.

Even the business case for convincing staff at Symantec was more difficult; it simply wasn’t as easy an argument to make, because if it ain’t broke, why fix it?

“Your influencing job is probably 5× harder,” says Patel. “Because of the cultural transformation, you have to be a lot more convincing. You’re telling people to work differently which is very difficult, and sometimes the organization has the appetite to do those things, and sometimes they don’t.”

Much like Ruslan Meshenberg felt the need to win over his staff members, and just as Patel had to leverage the enthusiasm of early adopters at Pinterest to convince those who were less keen on taking the leap into the cloud, at Symantec Patel had to mount a similar “hearts and minds” campaign.

Guided by Patel’s boss, the executive vice president of the sector, his team decided to show, not tell, fellow Symantec staffers about the benefits of cloud migration. “We took all the major classes of application and did a proof of concept for each one of them,” he explains. Patel’s team broke down the challenge, piece by piece, drawing up a technical feasibility study for each application, working with each group’s architect, building a proof of concept that could convince them such a move would work— “as opposed to saying: ‘We’re just going to run off this cliff and it’s going to work.’”

The attitude was a simple one: “Let’s remove the risk, and show that.”

It worked. Conviction built around the move; the only thing left to discuss was how exactly to handle the migration.

Big legacy companies planning a move to the cloud face a choice between two options: they can go down the lift-and-shift path, or the fix-and-shift route.

The lift-and-shift route is the (comparatively) easy option. You take your pre-existing application as it presently works in a private data center, and make the minimum possible changes before moving it into the cloud. “I understand there’s going to be benefits to moving to the cloud, and I’m probably not going to realise most of them, but we’ll fix it later,” says Patel of the lift-and-shift approach.

Fix-and-shift is harder, but potentially more beneficial. You’re not just going to do the bare minimum work to ensure your application—which worked fine in an offline data center—will work in the cloud. You’re buying into the concept of moving to the cloud, fixing your culture along the way, and making it more adaptable to the new norm.

“A lot of the time what you’ll find is that traditional IT organisations tend to do lift-and-shift,” says Patel. “They’re taking the same thing they had in their private data centres and, whether it’s a corporate mandate or whatever, they say: ‘Let’s just go and move it to the cloud.’ They’re looking for roughly the same technical or organisational approaches to operating in the cloud before the cloud,” he adds. “And in my view, that’s why a lot of those efforts fail.”

It was the same choice that Neil Hunt and his team had considered back in The Towering Inferno conference room. “We could take the existing app, forklift it, and shove it into AWS, then start to chip away at it,” he explained. “That was unappealing. It would be easy to do but we’d bring along a lot of bad architecture and a lot of bad habits.”

Netflix’s second option was equally unappealing at first glance, simply because of the scale of the task. “We would run our existing infrastructure, and side by side run our AWS infrastructure, and migrate one piece at a time, from one system to another.” As the cloud migration occurred, Netflix totally transformed. Its application changed from a hulking, single monolith into a clutch of small microservices, each of which can be developed independently of the others. It recast the way the company thought about everything, completely changing the shape and makeup of the firm.

Years after Netflix’s brave decision to undergo the wholesale application and infrastructure refactor, Symantec came to the same decision: They’re fixing, then they’re shifting. Patel still has a way to go before he can breathe easily: The process has taken—and will take—time, but he’s hopeful about reaching the finish line that lingers temptingly on the horizon.

“I’ll personally feel a lot more excitement when we’re done here at Symantec, just because we’ll have done so much more organisationally,” he explains.

Patel already knows the jubilation that’s felt when you move an entire company into the cloud, and can’t wait to feel that again. For Ruslan Meshenberg, who had helped guide Netflix into the cloud without any major hitches, there was only one way to celebrate the achievement. It’s what Silicon Valley does best: Hold an amazing party.

“We had some fun, and we shared some battle stories,” says Meshenberg. The team shared a sense of achievement—personally and as a group. “Cloud migration involves every single person in a company, whether they’re engineering or not,” he adds.

Meshenberg, who had only known cloud migration in his time with the company, could move on from the project he was handed on the first day of his job, to task number two. It must’ve seemed easy-going in comparison, you’d think. “Relatively speaking,” he agrees — “but probably not less challenging. The only constant is change itself. Nothing stands still. We have to constantly re-evaluate our assumptions and ensure that our ecosystem evolves as well.”

“Cloud migration involves every single person in a company, whether they’re engineering or not.”

But Meshenberg still holds with him that sense of pride that his team and colleagues pulled off a major cloud migration without much of a hitch—and that they confounded the critics along the way, remaining ahead of the technical curve.

“When we went into the cloud we faced a lot of external scepticism, people saying this will never work, or that it may work but not for us,” he says. “It might not be secure enough, scalable enough—you name it.”

There’s a brief pause, a moment as Meshenberg collects his thoughts. Eventually, he comes out with 10 short words: “It was good to be able to get it done.”

About the author

Chris Stokel-Walker is a UK-based features journalist for The Economist, Bloomberg, the BBC, and Wired UK. His first book, YouTubers, was published in 2019, and his second, TikTok Boom, was published in July 2021.


5 lessons IT learned from the Netflix cloud journey

Keith Townsend

With all the talk of containers, cloud-native applications, and cluster management platforms, it’s assumed that the enterprise migration to the cloud is a foregone conclusion. I’ve wanted to provide a little perspective, and some of it comes from the poster child for cloud infrastructure: Netflix.

Netflix recently published a blog post championing their seven-year journey to the public cloud. Netflix has contributed significantly to the overall knowledge of operating a predominantly cloud-based service, but it’s only after years of preparation that they were able to migrate the streaming portion of their service completely to the public cloud. Here are five top takeaways from their journey.

1. It’s not about cost savings

One of the impressive attributes of the Netflix cloud use case is the clarity around the value of cloud. Netflix didn’t communicate cost reduction as an advantage. Instead, Netflix spoke of the advantages of scale and reliability in leveraging AWS. Netflix realized cost savings due to the elastic nature of their workloads. The company wasn’t burdened by the fixed cost of scaling their private data center for peak load. However, by building elasticity into their application, they were able to reap the additional benefit of cost savings.
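
Netflix’s own scaling machinery is far more elaborate, but the elasticity described here maps onto primitives any AWS customer can use. As a minimal sketch, the snippet below attaches a target-tracking policy to a hypothetical Auto Scaling group named `streaming-api`, so capacity follows demand rather than being provisioned for peak load.

```python
"""Illustrative sketch: let capacity track load instead of peak.

Assumes an existing EC2 Auto Scaling group called "streaming-api"
(hypothetical) and configured AWS credentials.
"""
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scale the group in and out automatically so that average CPU
# utilisation stays near 50%, so the group pays for what it uses
# rather than for peak-load headroom.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="streaming-api",
    PolicyName="track-average-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```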


2. Change management

3. Availability

It’s tough to achieve five nines (99.999%) availability in a private data center. It’s even harder to achieve that level of availability in a cloud-based application. Netflix has finally reached four nines of availability. In the past, I’ve used the rule of thumb that applications with two to three nines are targets for cloud migration. Just as Netflix was selective in the services they migrated to the public cloud, it’s critical that organizations understand the impact of service availability when migrating to the public cloud.
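
The distance between those “nines” is easiest to appreciate as allowed downtime. A quick back-of-the-envelope calculation:

```python
# Allowed downtime per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [
    ("two nines (99%)", 0.99),
    ("three nines (99.9%)", 0.999),
    ("four nines (99.99%)", 0.9999),
    ("five nines (99.999%)", 0.99999),
]:
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label:22s} ~{downtime_minutes:8.1f} minutes of downtime per year")
```

Roughly 53 minutes of allowable downtime a year at four nines versus about five minutes at five nines, which is why applications already living at two or three nines are the more forgiving migration candidates.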

4. Cloud-native

One of the reasons Netflix was able to save money on the public cloud vs. the private data center is their application architecture. Netflix didn’t simply lift and shift monolithic applications from their private data center to an AWS VM. Taking the forklift approach ticks the box for cloud without any of the real benefits. Since scale and reliability were the primary factors in Netflix’s cloud decision, it required a re-architecture. Organizations considering cloud need to evaluate their current application architecture. Lift and shift to the cloud may just shift your existing problems to a new platform, sans the insight from managing your infrastructure.


Netflix is held up as the gold standard for leveraging cloud computing. The median annual salary at Netflix is an eye-popping $180K. There are not very many organizations that have the resources or reputation to attract the talent needed to undertake a full migration to the public cloud. It’s likely that most organizations will seek outside assistance in their migration efforts. The Netflix journey highlights the need to investigate deeply the credentials of organizations looking to assist in your cloud journey.


Ten years on: How Netflix completed a historic cloud migration with AWS

Dave Hahn, a senior engineer at Netflix, explains how the company scaled to provide video streaming to 130 million subscribers.

In its 20-year history, Netflix has grown from a DVD rental website with 30 employees to a global streaming service with over 5,000 titles, 130 million subscribers and $11 billion annual revenue that has drastically transformed the entertainment industry.

At the Consumer Electronics Show in January 2016, Netflix CEO Reed Hastings launched the service in more than 130 countries.

“While you have been listening to me talk, the Netflix service has gone live in nearly every country of the world,” he announced on the Las Vegas Convention Centre stage. “Today, right now, you are witnessing the birth of a global TV network.”

This was all made possible by a radical transformation of a previously traditional IT operation, as Dave Hahn, a senior engineer in Netflix’s cloud operations and reliability team, explained recently at the Service Desk and IT Support Show in London.

“We flipped on the service for another 130 countries, and millions of new customers that we hadn’t previously been servicing,” said Hahn.

“I think you can imagine the amount of work and thinking and architecture design we had to do to open up to 130 countries, and millions of new customers just in that moment; the technical architecture, the research, the billing systems, the kind of people that we needed and the thinking about these kinds of problems in order to make that happen.”

The journey began when Netflix decided to move from its own data centres to the public cloud.

Migrating with micro-services

In 2008, Netflix was running relational databases in its own data centres when disaster struck. A data centre failure shut the entire service down and stopped DVD shipments for three days.

The company’s owners faced a choice: turn Netflix into a world-class data centre operations company or move the service to the public cloud.

Netflix was growing fast. The thousands of videos and tens of millions of customers were already generating an enormous quantity of data.

The company would struggle to rack the servers in their own data centres fast enough to handle the ever-growing volumes, but the cloud would let them add thousands of virtual servers and petabytes of storage within minutes.

A migration to the cloud was the clear choice. They soon became a poster child customer for Amazon Web Services (AWS), choosing the company for its scale and broad set of services and features.

The move would require a complete rearchitecting of the company’s traditional infrastructure though. They could have forklifted all of their monolithic enterprise systems out of the data centre and dropped them into AWS, but this would only have brought all of their old data centre problems to the cloud.

Instead, they chose to rebuild the Netflix technology in AWS and fundamentally change the way that the company operated.

“Software’s like anything else; if you can design it for the environment that it’s going to be living in it will do more of the things you want it to do, more often and more regularly,” said Hahn. “So we chose to move to micro-services.”

This made the infrastructure much more agile by breaking aspects of the service up into multiple micro-services, managed by their own small teams who understood how their service worked and interacted with other systems. This was pretty groundbreaking at the time.

This provides clear, specific insights that make it easier to change the service, which leads to smaller and faster deployments. It also allows them to isolate services to understand the various performance profiles, patterns and security characteristics of each micro-service, and move away from any individual piece that’s causing a problem.

“I don’t have to assemble all of these pieces built by other people in order to have a singular deployment,” said Hahn. “Any Netflix service team can deploy their service at any time. It requires no coordination, no scheduling, no crucible to get to production.”
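
Hahn doesn’t show code, but the shape of a micro-service is easy to illustrate: a small, independently deployable process that owns one concern, exposes it over HTTP, and offers a health endpoint the platform can probe. The sketch below uses Flask, with endpoints and names invented for illustration rather than taken from Netflix.

```python
"""A minimal, independently deployable micro-service sketch.

Illustrative only: one small concern (a recommendations stub) owned by
one team, exposed over HTTP, with a health endpoint for the platform.
"""
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/healthcheck")
def healthcheck():
    # Deployment tooling and load balancers probe this endpoint.
    return jsonify(status="ok"), 200


@app.route("/recommendations/<user_id>")
def recommendations(user_id):
    # Stand-in for the real logic; other services call this over HTTP
    # rather than linking against this team's code.
    return jsonify(user=user_id, rows=["Trending Now", "Because you watched..."])


if __name__ == "__main__":
    # Each team can deploy this process on its own schedule.
    app.run(host="0.0.0.0", port=8080)
```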

Benefits of the cloud

It took Netflix seven years to complete the migration to the cloud. In 2016, the last remaining data centres used by the streaming service were shut down.

In its place was a new cloud infrastructure running all of Netflix’s computing and storage needs, from customer information to recommendation algorithms.

The migration improved Netflix’s scalability and service availability, and the velocity with which the company could release new content, features, interfaces and interactions. It also freed up the capacity of engineers, cut the costs of streaming, drastically improved availability and added the experience and expertise of AWS.

“The other thing is that the cost model is really nice for us,” added Hahn. “You pay for what you use. That allows us to do a lot of experimentations.”

This gives them greater freedom to test new features and improve existing ones, such as the rows of content recommendations that are personalised every day.

“These large recommendation algorithms require a lot of compute work,” Hahn explained. “If I want to find out if a new one we’re playing with does better, I don’t want to turn off the old one, because you still need recommendations.

“I can now spin up an entirely new set of machines in the tens, or hundreds or thousands in an afternoon and chunk through my data and see if we’ve done better, and I only pay for the portions I use. It allows us an amazing amount of freedom in experimentation.”

Content delivery

The cloud is only one part of the Netflix user experience. Everything that happens before they hit play takes place in AWS, but the video content that follows comes from a separate system: Netflix OpenConnect, the company’s proprietary content delivery network (CDN). The OpenConnect appliances store the video content and deliver it to client devices.

CDNs are designed to deliver internet-based content to viewers by bringing it closer to where they’re watching. Netflix originally outsourced streaming video delivery to third-party CDN suppliers, but as the company grew, these vendors struggled to support the traffic. Netflix needed more control over the service and user experience.

The company decided to design a CDN tailored to its needs.

It now installs OpenConnect appliances that store and deliver content inside local Internet Service Provider (ISP) data centres, which isolates the Netflix service from the wider internet.

Popularity algorithms and storage techniques help distribute the content in ways that maximise offload efficiency. The system reduces the demand on upstream network capacity and helps Netflix work more closely with the ISP networks that host its traffic.

“We designed OpenConnect caching boxes to hold our content, and wherever we can we install them inside of your internet service provider’s network, so that when you see those video bits you aren’t actually transiting off of your operator’s network,” said Hahn.

The new system cut the appearances of the loathed buffering wheel by an order of magnitude. It also allowed Netflix to make the CDN software more intelligent.

Now, whenever a customer presses play their device can get its content from numerous places on the internet.
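
That client behaviour, knowing “somewhere else to go get the data”, comes down to trying a ranked list of content endpoints and quietly moving on when one fails. A simplified sketch, with placeholder URLs rather than real Open Connect appliances:

```python
"""Simplified client-side failover across several content endpoints.

The URLs are placeholders; real clients receive a ranked list of Open
Connect appliances from the control plane before playback starts.
"""
import urllib.error
import urllib.request

RANKED_ENDPOINTS = [
    "https://oca-1.example.net/video/segment-0001.ts",
    "https://oca-2.example.net/video/segment-0001.ts",
    "https://backup-cdn.example.net/video/segment-0001.ts",
]


def fetch_segment(endpoints, timeout=2.0):
    """Return the first successfully fetched segment, trying endpoints in order."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            # This endpoint is unreachable or too slow; fall through to
            # the next one so playback never notices.
            print(f"{url} failed ({exc}); trying next endpoint")
    raise RuntimeError("No endpoint could serve the segment")
```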

The investment paid off when a fire in an ISP data centre in Brazil burned down Netflix’s entire stack of machines. Customers who had been streaming from the ISP didn’t experience any change in their user experience.

“Their devices already knew somewhere else to go get the data,” said Hahn. “It didn’t interrupt even one frame of streaming, when we literally burned down the base.”

Chaos engineering culture

Netflix developers are well known these days for their unique approach to engineering culture. The company pioneered a self-service chaos engineering tool called the Chaos Automation Platform to test problems in its production environment, so engineers can be sure that their software will behave as they want during a failure.

“People press the play button on Netflix thousands of times a second,” said Hahn. “If the systems cannot auto recover, if they cannot handle bad situations, if they cannot self-repair, by the time I get a human involved, in the best-case scenario, minutes have gone by.

“You can get an idea of how many of our customers we’ve disappointed in the three or four or five minutes it may take to get a human involved, and in the right place and working. Chaos engineering is an excellent inoculation to failures.”

They use the chaos engineering method to ensure Netflix can survive a failure in any one of the three AWS regions it uses. Every month, they turn off one of the regions and test that they can move all of the customers that it was serving to another one within six minutes.
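
The real evacuation is performed by Netflix’s traffic-steering tooling rather than a script, but the arithmetic of the drill is simple to sketch: drop the failed region’s share to zero and renormalise the remainder. The region names and traffic shares below are invented for illustration.

```python
"""Illustrative arithmetic of a regional evacuation drill.

Region names and traffic shares are made up; the real shift is done by
Netflix's traffic-steering tooling, not a script like this.
"""


def evacuate(traffic_share, failed_region):
    """Redistribute a failed region's traffic proportionally across the survivors."""
    survivors = {r: s for r, s in traffic_share.items() if r != failed_region}
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("No surviving regions to absorb the traffic")
    return {r: s / total for r, s in survivors.items()}


before = {"us-east-1": 0.45, "us-west-2": 0.30, "eu-west-1": 0.25}
after = evacuate(before, "us-east-1")
print(after)  # {'us-west-2': 0.545..., 'eu-west-1': 0.454...}
```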

To embrace chaos without causing destruction, Netflix had to create a corporate culture that supported such ideas.

The central principles were formalised in the 127-slide Netflix Culture Deck, which Facebook COO Sheryl Sandberg said “may well be the most important document ever to come out of the Valley”.

A central tenet of the policy is balancing freedom with responsibility. Teams are given ownership of their micro-service and encouraged to act independently but not recklessly.

“Netflix managers do not set out tasks for their employees to do or design their projects,” said Hahn. “Their job is to give them the appropriate context so they can make the decisions, to hire excellent, stunning colleagues for them to work with, and to stay out of their way.”

The company avoids making too many rules beyond a set of fundamental principles such as never touching customer data. Hahn describes the approach as building guardrails but not gates, and claims he can count on one hand the number of times he’s had to tell an engineer exactly what to do.

“By making sure that that context is widely and regularly shared I can have someone design a billing system, someone else working on algorithmic systems, SREs on our reliability teams, and somebody else working on customer service, and they’ll understand the same context and march towards the same goals,” he said.

“That allows us to keep those teams loosely coupled. We don’t have lots of structures and controls, but we keep them highly aligned.”

(Reporting by Tom Macaulay, Computerworld)



The space between: Netflix's cloud migration story

Netflix isn’t the typical company and its cloud migration isn’t the typical story. The journey to Confluence Cloud started routinely but slowly went sideways. Layers of data cruft from a decades-old, on-prem install; gaps in the migration tools; and migration testing that felt more like a slow spiral to nowhere led to a crisis point. In this session you’ll hear how new tools were developed on the fly to close the spaces between and how that led to a migration low on drama and high on likes.


How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Netflix Technology Blog

By: Jitender Aswani (Data Engineering and Infrastructure), Sebastien de Larquier (Science & Analytics)

Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility and given the freedom to operate as they see fit to satisfy their mission. This freedom allows teams and individuals to move fast to deliver on innovation and feel responsible for the quality and robustness of their delivery. Central engineering teams enable this operational model by reducing the cognitive burden on innovation teams through solutions related to securing, scaling and strengthening (resilience) the infrastructure.

A majority of the Netflix product features are either partially or completely dependent on one of our many micro-services (e.g., the order of the rows on your Netflix home page, issuing content licenses when you click play, finding the Open Connect cache closest to you with the content you requested, and many more). All these micro-services are currently operated in AWS cloud infrastructure.

As a micro-service owner, a Netflix engineer is responsible for its innovation as well as its operation, which includes making sure the service is reliable, secure, efficient and performant. This operational component places some cognitive load on our engineers, requiring them to develop deep understanding of telemetry and alerting systems, capacity provisioning process, security and reliability best practices, and a vast amount of informal knowledge about the cloud infrastructure.

While our engineering teams have and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing and abstracting our infrastructure. This is where our data (engineering and science) teams come in: we leverage vast amounts of data produced by our platforms and micro-services to inform and automate decisions related to operating the many components of our cloud infrastructure reliably, securely and efficiently.

In the next section, we will highlight some high level areas of focus in each dimension of our infrastructure. In the last section, we will attempt to feed your curiosity by presenting a set of opportunities that will drive our next wave of impact for Netflix.

In the Security space, our data teams focus almost all our efforts on detecting suspicious or malicious activity using a collection of machine learning and statistical models. Historically, this has been focused on potentially compromised employee accounts, but efforts are in place to build a more agnostic detection framework that would consider any agent (human or machine). Our data teams also invest in building more transparency around our security and privacy to support progress in reducing threats and hazards faced by our micro-services or internal stakeholders.

In the Reliability space, our data teams focus on two main approaches. The first is on prevention: data teams help with making changes to our environment and its many tenants as safe as possible through contained experiments (e.g., Canaries), detection and improved KPIs. The second approach is on the diagnosis side: data teams measure the impact of outages and expose patterns across their occurrence, as well as provide a connected view of micro-service-level availability.
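
The “canary” idea can be reduced to a toy comparison: send a small slice of traffic to the new build, then promote it only if its key metrics stay within some tolerance of the baseline. The metrics and thresholds below are purely illustrative, not the ones Netflix uses.

```python
"""Toy canary comparison: promote only if the canary is not meaningfully
worse than the baseline. Thresholds and metrics are illustrative."""


def canary_passes(baseline, canary, max_error_ratio=1.2, max_latency_ratio=1.3):
    """Compare a canary's error rate and p99 latency against the baseline."""
    error_ok = canary["error_rate"] <= baseline["error_rate"] * max_error_ratio
    latency_ok = canary["p99_latency_ms"] <= baseline["p99_latency_ms"] * max_latency_ratio
    return error_ok and latency_ok


baseline = {"error_rate": 0.002, "p99_latency_ms": 180.0}
canary = {"error_rate": 0.0021, "p99_latency_ms": 195.0}

if canary_passes(baseline, canary):
    print("Canary healthy: continue the rollout.")
else:
    print("Canary degraded: roll back and investigate.")
```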

In the Efficiency space, our data teams focus on transparency and optimization. In Netflix’s Freedom and Responsibility culture, we believe the best approach to efficiency is to give every micro-service owner the right information to help them improve or maintain their own efficiency. Additionally, because our infrastructure is a complex multi-tenant environment, there are also many data-driven efficiency opportunities at the platform level. Finally, provisioning our infrastructure itself is also becoming an increasingly complex task, so our data teams contribute to tools for diagnosis and automation of our cloud capacity management.

In the Performance space, our data teams currently focus on the quality of experience on Netflix-enabled devices. The main motivation is that while the devices themselves have a significant role in overall performance, our network and cloud infrastructure has a non-negligible impact on the responsiveness of devices. There is a continuous push to build improved telemetry and tools to understand and minimize the impact of our infrastructure in the overall performance of Netflix application across a wide range of devices.

In the People space, our data teams contribute to consolidated systems of record on employees, contractors, partners and talent data to help central teams manage headcount planning, reduce acquisition cost, improve hiring practices, and other people analytics related use-cases.

Challenges & Opportunities in the Infra Data Space

Security Events Platform for Anomaly Detection

  • How can we develop a complex event processing system to ingest semi-structured data predicated on schema contracts from hundreds of sources and transform it into event streams of structured data for downstream analysis?
  • How can we develop templated detection modules (rules- and ML-based) and data streams to increase the speed of development?

See open source projects such as StreamAlert and Siddhi to get some general ideas.
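
Those projects are far richer, but the “templated detection module” idea boils down to registering small rule functions that each inspect structured events pulled off a stream. A bare-bones sketch, with event fields and rules invented for illustration:

```python
"""Bare-bones rule-based detection over a stream of structured events.

Event fields and rules are made up for illustration; production systems
(StreamAlert-style pipelines, for example) add schema validation, ML
scoring, alert routing and much more.
"""

RULES = []


def rule(func):
    """Register a detection rule: a function from event -> bool."""
    RULES.append(func)
    return func


@rule
def login_from_new_country(event):
    return event.get("type") == "login" and event.get("country") not in event.get("usual_countries", [])


@rule
def excessive_api_errors(event):
    return event.get("type") == "api_metrics" and event.get("error_count", 0) > 100


def evaluate(event_stream):
    """Yield (rule_name, event) pairs for every rule an event trips."""
    for event in event_stream:
        for detect in RULES:
            if detect(event):
                yield detect.__name__, event


events = [
    {"type": "login", "user": "alice", "country": "BR", "usual_countries": ["US"]},
    {"type": "api_metrics", "service": "billing", "error_count": 12},
]

for rule_name, event in evaluate(events):
    print(f"ALERT {rule_name}: {event}")
```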

Asset Inventory

  • How can we develop a dimensional data model representing relationships between apps, clusters, regions and other metadata, including AMI / software stack, to help with availability, resiliency and fleet management? (A minimal sketch follows this list.)
  • Can we develop learning models to enrich metadata with application vulnerabilities and risk scores?
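
One way to make the first question concrete: the dimensional model is a set of typed entities plus the relationships among them, keyed so that fleet-management questions (“which apps run on this AMI?”) become simple joins or lookups. A minimal sketch, with entity and field names assumed rather than taken from Netflix’s schema:

```python
"""Minimal sketch of an asset-inventory dimensional model.

Entity and field names are assumptions for illustration, not Netflix's
actual schema.
"""
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str                     # e.g. "us-east-1"


@dataclass
class Cluster:
    name: str
    region: Region
    ami_id: str                   # machine image the cluster runs
    software_stack: list[str] = field(default_factory=list)


@dataclass
class App:
    name: str
    owner_team: str
    clusters: list[Cluster] = field(default_factory=list)
    risk_score: float = 0.0       # enriched later by vulnerability models


def apps_on_ami(apps, ami_id):
    """Fleet-management style query: which apps run on a given image?"""
    return [a.name for a in apps if any(c.ami_id == ami_id for c in a.clusters)]


us_east = Region("us-east-1")
api = App("edge-api", "edge-team",
          clusters=[Cluster("edge-api-v042", us_east, "ami-0abc1234", ["java17", "zuul"])])
print(apps_on_ami([api], "ami-0abc1234"))  # ['edge-api']
```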

Reliability

  • How can we guarantee that a code change will be safe when rolled out to our production environment?
  • Can we adjust our auto-scaling policies to be more efficient without risking our availability during traffic spikes?

Capacity and Efficiency

  • Which resources (clusters, tables, …) are unused or under-utilized and why?
  • What will be the cost of rolling out the winning cell of an AB test to all users?

People Analytics

  • Can we support AB experiments related to recruiting and help improve candidate experience as well as attract solid talent?
  • Can we measure the impact of Inclusion and Diversity initiatives?

People & Security

  • How can we build a secure and restricted People Data Vault to provide a consolidated system of reference and allow apps to add additional metadata?
  • How can we automatically provision or de-provision access privileges?

Data Lineage

  • Can we develop a generalized lineage system to capture relationships among the various data artifacts stored across the Netflix data landscape? (See the sketch after this list.)
  • Can we leverage this lineage solution to help forecast SLA misses and address Data Lifecycle Management questions (job cost, table cost, and retention)?
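
Lineage questions like these generally reduce to maintaining a directed graph of “derived from” edges between data artifacts and walking it in both directions. A small sketch using networkx, with artifact names invented for illustration:

```python
"""Tiny lineage-graph sketch: artifacts as nodes, "derived from" as edges.

Artifact names are invented; a real system would ingest these edges from
job metadata across the whole data platform.
"""
import networkx as nx

lineage = nx.DiGraph()
# An edge A -> B means "B is derived from A".
lineage.add_edge("playback_events_raw", "playback_sessions")
lineage.add_edge("playback_sessions", "daily_engagement_report")
lineage.add_edge("subscriber_dim", "daily_engagement_report")

# Impact analysis: if the raw table's SLA slips, what is at risk downstream?
print(nx.descendants(lineage, "playback_events_raw"))
# {'playback_sessions', 'daily_engagement_report'}

# Root-cause analysis: what does the report depend on upstream?
print(nx.ancestors(lineage, "daily_engagement_report"))
# {'playback_events_raw', 'playback_sessions', 'subscriber_dim'}
```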

This was just a tiny glimpse into our fantastic world of Infrastructure Data Engineering, Science & Analytics. We are on a mission to help scale a world-class data-informed infrastructure and we are just getting started. Give us a holler if you are interested in a thought exchange.

Contributors: Sui Huang (S&A) is partnering on reimagining People initiatives.

Other Relevant Readings

  • More infrastructure-related posts from the Netflix Tech Blog
  • How Netflix Works?
  • Ten years on: How Netflix completed a historic cloud migration with AWS
  • Amazon Fleet Management: Meet the Man Who Keeps Amazon Servers Running, No Matter What | Amazon Web Services
  • ADS Framework at Palantir
  • Building a SOCless detection team
  • Lessons Learned in Detection Engineering
  • Engineering Trade-Offs and The Netflix API Re-Architecture
  • Evolving the Netflix Data Platform with Genie 3
  • Making No-distributed Database Distributed — Dynomite
  • Detecting Credential Compromises in AWS
  • Dredge Analysis
  • Dredge Case Study
  • Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability & Efficiency



Case Study: How Netflix uses Cloud for Innovation, Agility and Scalability


“Planning without taking action is the slowest route to victory. Taking action without planning is the noise before defeat.” – Sun Tzu, The Art of War

Introduction to cloud computing

The world now evolves at the speed of its technology. Organizations are constantly looking to new technologies such as cloud computing to meet their strategic goals and drive business value. This article examines how cloud computing can help organizations adopt more efficient technologies and improve productivity.

Cloud computing refers to computing on a network of remote servers, accessed over the internet, to store, manage, and process data. It uses the computing resources of cloud providers, such as their data centers, instead of requiring the organization to build its own local infrastructure.

Regardless of the industry a company belongs to (finance, retail, or real estate), it is always advisable to understand the technology other corporations are adopting, and to solidify a competitive advantage by examining the lessons learned and best practices developed along the way. To make this concrete, let us look at how Netflix used cloud services to reach its current level of success.

Amazon Web Services used by Netflix

The cloud is an enabling technology for AI, allowing it to mine and analyze data for deeply embedded insights. Cloud computing also delivered an important breakthrough in accelerators for AI software. An accelerator is a class of microprocessor or system designed to provide hardware acceleration for AI applications such as neural networks, computer vision and machine learning. It is currently difficult for on-premise hardware to match the processing power of the accelerator hardware residing in the worldwide data centers of cloud providers (Source: Gartner, The Google Guys). Furthermore, cloud providers hold one more advantage over non-cloud AI: their extensive global networks of data centers are better positioned to process the massive amounts of data being generated all over the world. This alone makes it substantially easier to train machine learning models and neural networks for data insights and pattern recognition.

Analyzing customer data creates customer insight for any organization. It helps management avoid making assumptions about customers, assumptions that customers can easily read as apathy. Data analytics and personalized customer assistance (PCA) features were the largest areas of cloud innovation for organizations up to 2019 (Source: InsideBigData, Google Cloud, IBM).

What sort of cloud services help in discovering data insights and building personalized assistance features for customers? For one, Netflix uses Amazon RDS and DynamoDB, managed database services that provide the structured, scalable data foundation on which an organization can build, train, and deploy custom machine learning models suited to its own goals and working environment.
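
To make this concrete, here is a hedged sketch (not Netflix’s actual schema) of how per-user viewing events might be written to and read from a hypothetical DynamoDB table called viewing_history using boto3; the table name, keys, and attributes are assumptions for illustration only:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key "user_id", sort key "watched_at" (ISO timestamp).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("viewing_history")

def log_viewing_event(user_id: str, title_id: str, watched_at: str, progress_pct: int) -> None:
    """Record that a user watched (part of) a title; ML jobs can later mine these rows."""
    table.put_item(Item={
        "user_id": user_id,
        "watched_at": watched_at,
        "title_id": title_id,
        "progress_pct": progress_pct,
    })

def recent_history(user_id: str, limit: int = 20) -> list[dict]:
    """Fetch a user's most recent viewing events, newest first."""
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq(user_id),
        ScanIndexForward=False,  # descending by sort key
        Limit=limit,
    )
    return resp["Items"]
```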

While deciding whether to produce “House of Cards”, whose 26 episodes cost $100 million to produce, Netflix judged it smarter to use data analytics to determine which fan bases the new drama should target. The relevant viewing data was captured in its database for analysis, and using machine learning, Netflix ended up targeting its marketing at fans of the British House of Cards, as well as long-time fans of actor Kevin Spacey and director David Fincher.

With cloud-based AI services, organizations can index their entire product or service catalogue against each customer’s profile. Age, location, gender, and other profile data help determine which products should be ranked first for each individual customer, so customers with different tastes and profiles see a personalized set of recommendations curated for them. These personalized services make users feel valued by the enterprise, rather than treated merely as a source of revenue, and so help retain the organization’s customer base.
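
A minimal sketch of this kind of profile-based ranking is shown below. It is a toy illustration rather than Netflix’s recommender, and the genre-overlap scoring, field names, and sample catalogue are all assumptions:

```python
from dataclasses import dataclass

@dataclass
class Title:
    title_id: str
    genres: set[str]
    popularity: float  # e.g. normalized view count in [0, 1]

def rank_catalogue(catalogue: list[Title], liked_genres: set[str]) -> list[Title]:
    """Rank titles by overlap with the user's liked genres, breaking ties by popularity."""
    def score(t: Title) -> float:
        overlap = len(t.genres & liked_genres)
        return overlap + 0.1 * t.popularity
    return sorted(catalogue, key=score, reverse=True)

catalogue = [
    Title("political-thriller-1", {"drama", "politics"}, 0.9),
    Title("baking-show-4", {"reality", "food"}, 0.7),
    Title("space-docu-2", {"documentary", "science"}, 0.5),
]
print([t.title_id for t in rank_catalogue(catalogue, liked_genres={"drama", "politics"})])
```

In a real system the profile signals and scoring model would be far richer, but the shape of the problem is the same: score every title against a profile, then sort.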

Cloud agility refers to the rapid provisioning of computing resources: a cloud environment can usually provide compute instances or storage in minutes. Before cloud providers took off with infrastructure as a service (IaaS), one had to email infrastructure suppliers and wait weeks for the requested provisions (Source: Netflix, Amazon Case Study on Netflix). IaaS is now delivered through the cloud providers’ consoles and APIs, allowing faster release of new features for users and reducing the time taken to develop, test, and deploy software applications.
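
To make the contrast concrete, here is a hedged sketch of on-demand provisioning with boto3; the AMI ID, tags, and instance type are placeholders, and in practice a request like this completes in minutes rather than the weeks a hardware order once took:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single small instance on demand (IDs below are placeholders).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "agility-demo"}],
    }],
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Requested instance {instance_id}; it should be running within minutes.")
```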

Most successful companies share a common trait: they started developing a product or service prototype well ahead of their peers. The reason for their success is obvious enough – the first-mover advantage. Cloud computing, with its rich variety of service offerings, is a technology designed to help organizations capture that advantage.

How did Netflix use the cloud’s agility during the migration of its operations? It rebuilt its customer-facing application functions in cloud-native development environments first, later moving application development for business operations as well. The large, cumbersome Netflix service of 2008 was refactored into microservices backed by scalable, loosely structured databases.
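
As a hedged sketch of the microservice style described above (not Netflix’s actual code), the service below exposes a single, narrowly scoped endpoint that can be deployed, scaled, and allowed to fail independently of the rest of the system; Flask, the route, and the in-memory data are illustrative choices:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# In a real system this would sit behind its own datastore and service discovery;
# here it is an in-memory stand-in so the sketch stays self-contained.
FAKE_RATINGS = {"title-42": {"average": 4.6, "count": 1_283}}

@app.route("/ratings/<title_id>", methods=["GET"])
def get_rating(title_id: str):
    """Return the aggregate rating for one title: the only job this microservice has."""
    rating = FAKE_RATINGS.get(title_id)
    if rating is None:
        return jsonify({"error": "unknown title"}), 404
    return jsonify({"title_id": title_id, **rating})

if __name__ == "__main__":
    app.run(port=8080)
```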

Netflix’s cloud database usage followed a pay-as-you-use model, which helped contain costs when it rolled out AI-based personalized recommendations (the AI has to mine data from the database, so the database has to be hosted properly and securely on Amazon’s cloud). The recommendation feature surfaced niche titles that would not be available on traditional cable networks but resembled content the user already liked. Users engaged with these niche titles more, generating more value, and Netflix no longer had to spend as much on acquiring new content. The savings were estimated at $1 billion by Netflix’s Vice President of Product Innovation and its Chief Product Officer in a research paper they published.

With such a progressive implementation, management became more strategic and better informed about budget evaluations and approvals. Hardware purchasing and the release schedule of the re-architected Netflix became more streamlined by the day. Gradually, Netflix was no longer constrained by physical compute resources and grew into the global Internet TV network everyone knows today.

Scalability

Scalability means that a software product or service keeps performing its intended function, without compromising quality, as the number of incoming customers grows. Users’ needs must still be met whatever changes, and response times should not lengthen. The points below highlight the relevance of cloud scalability to your organization.

By using services from cloud providers like AWS, together with its own Open Connect content delivery network for streaming, Netflix expanded its network of servers (both physical and virtual) from North America to the rest of the world, including regions such as Europe and India.

Netflix is one example of an organization using the cloud. By running on AWS, it has provided billions of hours of service to customers around the globe. Users can order its products and services from almost anywhere in the world, using PCs, tablets, or mobile devices. 10,000 customer orders were processed every second during Netflix’s last peak-demand season, a stark contrast to the few thousand DVD orders Netflix could handle in its early days before streaming and migrating to the cloud. With 86 million customers worldwide consuming 150 million hours of content daily, this is strong evidence of how the cloud has powered the scalability of Netflix’s business operations.

Cloud providers like AWS offer technologies such as container auto-scaling and application-level load balancing to support the kind of service Netflix provides, and they possess the resources to handle the gigantic operational loads of their client organizations. The compute resources they provide are globally available, so customers around the world can place orders whenever they like, and organizations with a small home-country market no longer need to fear global expansion.
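
As a hedged sketch of the auto-scaling idea (using EC2 Auto Scaling rather than any Netflix-specific tooling; the group name, policy name, and target value are placeholders), the snippet below attaches a target-tracking policy that adds or removes instances to hold average CPU near 50%:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Attach a target-tracking scaling policy to an existing Auto Scaling group
# (group and policy names are placeholders for illustration).
autoscaling.put_scaling_policy(
    AutoScalingGroupName="streaming-api-asg",
    PolicyName="keep-cpu-near-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
print("Scaling policy attached; the group now grows and shrinks with load.")
```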

Conclusion and Final thoughts

The most important action taken by enterprise organizations in 2018 was to engage a professional cloud vendor experienced in providing step-by-step solutions and enterprise-level cybersecurity.

Enterprise innovation is now centered on (though not limited to) cloud-native machine learning models and data analytics. These technologies offer a pleasant side benefit, helping organizations manage their vast amounts of customer and operations data. In the area of enterprise agility, cloud providers and third-party resellers have created intelligent software to help enterprises make important decisions faster than ever. In the area of scalability, the cloud has empowered organizations to serve their users and customers around the world with better availability and response times, making it easier to sell to customers beyond the home country.

It is important for organizations to understand global industrial change. The future of cloud computing spans several billion-dollar industries, such as AI innovation, blockchain, and cloud security (Source: Forbes). Hence, most organizations now find their boardroom discussions increasingly centered on the role of technology in business strategy.

“Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win”

― Sun Tzu, The Art of War


Netflix’s Cloud Migration

Netflix has easily become one of the top streaming services worldwide. It kickstarted DVDs into extinction and is one of the most prominent players in its field. But before it became such a powerhouse, Netflix, like many businesses, struggled with disasters and outages.


In 2008, Netflix was hit with a shock that changed the way it looked at and handled its databases. The IT department had assumed that if it made its systems close enough to perfect, developers would no longer need to worry about failures; Netflix was running on high-end, costly IBM P-series hardware and an Oracle database. That assumption proved false when a SAN hardware failure caused a days-long outage. Questions were raised across management, and the company eventually concluded that availability was an application concern: resilience had to be built into the software rather than bought in the form of ever more expensive hardware. Once the application became the focus, Netflix’s engineers realized there was no need for expensive hardware, and a more cost-friendly cloud infrastructure would do. The question was no longer simply whether the existing approach was working, but whether it could carry the company forward.

A year later, with Netflix’s growth, the company found itself needing more data center capacity. Its rapid growth left no time to estimate how much capacity would be required or where all the data should be stored, and future logistics were unpredictable because previous figures were based on shipping DVDs. With the shift in IT, scaling is now driven by systems of engagement and by how many customers a company has. In Netflix’s DVD business there were only one or two engagements per customer each week: customers browsed and ordered films a few times a week because they were limited by how quickly the discs arrived, and many traditional enterprises see similar patterns of customer interaction. With the now widely used streaming business, people binge-watch episodes of television daily. Streaming also involves more behind the scenes, such as progression logging: if a customer stops halfway through whatever they are watching, the system knows where they got to, and quality of service as a whole is more demanding. Interactions during streaming are also amplified compared with the DVD business; Netflix calculated a thousandfold increase in data center traffic from the streaming service. With streaming growing so quickly, Netflix faced the problem of building data centers just as fast.

Netflix was faced with two options: recruit an expensive, world-class data center operations team and guess how much capacity to build before it was needed, or use Amazon Web Services, run by one of Netflix’s biggest competitors. The latter route would save tremendous amounts of money that could go toward development and video content instead. Netflix chose option two, freeing up extra cash to invest back into the company.

Later in 2009, mitigating risk became one of the top priorities, with competition, capacity, business terms and publicity the four aspects examined most closely. Netflix learned more about AWS and established that it was run separately from Amazon Prime, Netflix’s competitor. It ran capacity experiments to see what worked, how quickly systems could be obtained, and in which data centers they would be deployed. Additionally, Netflix signed a business license agreement with AWS so it would not have to run the service on a click-through license tied to a credit card. Fast-forward to April 2010, and The New York Times wrote an article on the partnership between AWS and Netflix; Netflix was among the first major companies to move to cloud computing at that scale, and the article attracted a lot of publicity.

After transitioning to cloud computing on AWS, Netflix found it could obtain capacity on demand, which allowed the capacity backlog to be eliminated. As time passed, Netflix reached the point where, if it did not move all mission-critical applications to AWS, it would not survive; a hard deadline was set for the cloud migration to sharpen the company’s focus. The migration sequence started with the simplest API calls, then the simplest web pages, and finally the remaining web pages and APIs one by one. The gradual migration worked by directing some site traffic to the old data centers while AWS served the rest whenever possible.
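
One common way to implement that kind of gradual traffic shift is weighted DNS routing. The hedged sketch below uses boto3 and Route 53 to send roughly 80% of traffic to a legacy data center and 20% to AWS; the hosted zone ID, domain, targets, and weights are placeholders, not Netflix’s actual configuration:

```python
import boto3

route53 = boto3.client("route53")

def set_weighted_split(zone_id: str, name: str, legacy_target: str,
                       cloud_target: str, cloud_weight: int) -> None:
    """Split traffic between the legacy data center and the cloud by DNS weight (0-255)."""
    changes = []
    for identifier, target, weight in [
        ("legacy-dc", legacy_target, 255 - cloud_weight),
        ("aws", cloud_target, cloud_weight),
    ]:
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": identifier,
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": target}],
            },
        })
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Comment": "gradual cloud migration", "Changes": changes},
    )

# Placeholders: shift roughly 20% of traffic to the cloud endpoint.
set_weighted_split("Z0000000000EXAMPLE", "api.example.com.",
                   "origin.legacy-dc.example.com", "api-prod.aws.example.com", 51)
```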

Like any other business using the cloud, Netflix also maintained a rigorous data backup strategy to underpin its success. The Netflix we see and use today exists because it transitioned to cloud computing, and with the technology available now, any company seeking to migrate to the cloud faces an easier, simpler path.


7 Best Case Studies for Migrating from On-premise to Cloud

For most businesses considering cloud migration, it’s essential to understand that cloud platform benefits come alongside considerable challenges, including improving availability and latency, orchestrating auto-scaling, managing tricky connections, scaling the development process effectively, and addressing cloud security challenges.


A transformation example when moving from On-premise to Cloud

#1 Betabrand: Bare Metal to Cloud


Betabrand (est. 2005) is a crowd-funded, crowd-sourced retail clothing e-commerce company that designs, manufactures, and releases limited-quantity products via its website.

– Migration objective 

The company struggled with the maintenance difficulties and lack of scalability of the bare metal infrastructure supporting their operations. Planning for and adding capacity took too much time and added costs. They also needed the ability to handle website traffic surges better.

– Key Takeaways

  • With planning, cloud migration can be a simple process. Betabrand’s 2017 on-premise to cloud migration proved smooth and simple. Before actual migration, they created multiple clusters in GKE and performed several test migrations, identifying the right steps for a successful launch.
  • Cloud streamlines load testing.  Betabrand was able to quickly create a replica of its production services that they could use in load testing. Tests revealed poorly performing code paths that would only be revealed by heavy loads. They could fix the issues before Black Friday.
  • Cloud’s scalability is key to customer satisfaction.  As a fast-growing e-commerce business, Betabrand realized they couldn’t afford the downtime or delays of bare metal. Their cloud infrastructure scales automatically, helping them avoid issues and keep customers happy. This factor alone underlines the strategic importance of cloud computing in business organizations like Betabrand.

#2 Spotify: Bare Metal to Cloud


Spotify’s leadership and engineering team agreed: The company’s massive in-house data centers were difficult to provision and maintain, and they didn’t directly serve the company’s goal of being the “best music service in the world.” They wanted to free up Spotify’s engineers to focus on innovation. They started planning for migration to Google Cloud Platform (GCP) in 2015, hoping to minimize disruption to product development and minimize the cost and complexity of hybrid operation.

  • Gaining stakeholder buy-in is crucial.  Spotify was careful to consult its engineers about the vision. Once they could see what their jobs looked like in the future, they were all-in advocates.
  • Migration preparation shouldn’t be rushed.  Spotify’s dedicated migration team took the time to investigate various cloud strategies and build out the use case showing the benefits of cloud computing to the business. They carefully mapped all dependencies. They also worked with Google to identify and orchestrate the right cloud strategies and solutions.
  • Focus and dedication pay huge dividends.  Spotify’s dedicated migration team kept everything on track and in focus, making sure everyone involved was aware of experience and lessons already learned. In addition, since engineering teams were fully focused on the migration effort, they could complete it more quickly, reducing the disruption to product development

#3 Waze: Cloud to Multi-cloud

Waze (est. 2006; acquired by Google in 2013) is a GPS-enabled navigation application that uses real-time user location data and user-submitted reports to suggest optimized routes.

Though Waze moved to the cloud very early on, their fast growth quickly led to production issues that caused painful rollbacks, bottlenecks, and other complications. They needed to get faster feedback to users while mitigating or eliminating their production issues.

  • Some business models may be a better fit for multiple clouds.  Cloud strategies are not one-size-fits-all. Waze’s stability and reliability depend on avoiding downtime, deploying quick fixes to bugs, and ensuring the resiliency of their production systems. Running on two clouds at once helps make it all happen.
  • Your engineers don’t have to be cloud experts to deploy effectively. Spinnaker streamlines multi-cloud deployment for Waze such that developers can focus on development, rather than on becoming cloud experts.
  • Deploying software more frequently doesn’t have to mean reduced stability/reliability. Continuous delivery can get you to market faster, improving quality while reducing risk and cost.

#4 Dropbox: Cloud to Hybrid


Dropbox had developed its business by using the cloud — specifically, Amazon S3 (Simple Storage Service) — to house data while keeping metadata housed on-premise. Over time, they feared they’d become overly dependent on Amazon: not only were costs increasing as their storage needs grew, but Amazon was also planning a similar service offering, Amazon WorkDocs. Dropbox took back their storage to help them reduce costs, increase control, and maintain their competitive edge.

  • On-premise infrastructure may still be right for some businesses.  Since Dropbox’s core product relies on fast, reliable data access and storage, they need to ensure consistently high performance at a sustainable cost. Going in-house required an enormous investment, but improved performance and reduced costs may serve them better in the long run. Once Dropbox understood that big picture, they had to recalculate the strategic importance of cloud computing to their organization.
  • Size matters.  As Wired lays out in its article detailing the move, cloud businesses are not charities. There’s always going to be a margin somewhere. If a business is big enough — like Dropbox — it may make sense to take on the difficulties of building a massive in-house network. But it’s an enormous risk for businesses that aren’t big enough, or whose growth may stall.

#5 GitLab: Cloud to Cloud


GitLab’s core application enables software development teams to collaborate on projects in real time, avoiding both handoffs and delays. GitLab wanted to improve performance and reliability, accelerating development while making it as seamless, efficient, and error-free as possible. While they acknowledged Microsoft Azure had been a great cloud provider, they strongly believed that GCP’s Kubernetes was the future, calling it “a technology that makes reliability at massive scale possible.”

  • Containers are seen by many as the future of DevOps.  GitLab was explicit that they view Kubernetes as the future.   Indeed, containers provide notable benefits, including a smaller footprint, predictability, and the ability to scale up and down in real time. For GitLab’s users, the company’s cloud-to-cloud migration makes it easier to get started with using Kubernetes for DevOps.
  • Improved stability and availability can be an enormous benefit of cloud migration. In GitLab’s case, mean time between outage events pre-migration was 1.3 days. Excluding the first day post-migration, they’re up to 12 days between outage events. Pre-migration, they averaged 32 minutes of downtime weekly; post-migration, they’re down to 5.

#6 Cordant Group: Bare Metal to Hybrid


– Migration objective

Over the years, the Cordant Group had grown tremendously, requiring an extensive IT infrastructure to support their vast range of services. While they’d previously focused on capital expenses, they’d shifted to looking at OpEx, or operational expenses — which meant cloud’s “pay as you go” model made increasing sense. It was also crucial to ensure ease of use and robust data backups.

  • Business and user needs drive cloud needs.  That’s why cloud strategies will absolutely vary based on a company’s unique needs. The Cordant Group needed to revisit its cloud computing strategy when users were unable to quickly access the files they needed. In addition, with such a diverse user group, ease of use had to be a top priority.
  • Cloud ROI ultimately depends on how your business measures ROI.  The strategic importance of cloud computing in business organizations is specific to each organization. Cloud became the right answer for the Cordant Group when OpEx became the company’s dominant lens.

#7 Shopify: Cloud to Cloud


Shopify wanted to ensure they were using the best tools possible to support the evolution needed to meet increasing customer demand. Though they’d always been a cloud-based organization, building and running their e-commerce cloud with their own data centers, they sought to capitalize on the container-based cloud benefits of immutable infrastructure to provide better support to their customers. Specifically, they wanted to ensure predictable, repeatable builds and deployments; simpler and more robust rollbacks; and elimination of configuration management drift.

  • Immutable infrastructure vastly improves deployments.  Since cloud servers are never modified post-deployment, configuration drift — in which undocumented changes to servers can cause them to diverge from one another and from the originally deployed configuration — is minimized or eliminated. This means deployments are easier, simpler, and more consistent.
  • Scalability is central to meeting the changing needs of dynamic e-commerce businesses.  Shopify is home to online shops like Kylie Cosmetics, which hosts flash sales that can sell out in 20 seconds. Shopify’s cloud-to-cloud migration helped its servers flex to meet fluctuating demand, ensuring that commerce isn’t slowed or disrupted.

Which Cloud Migration Strategy Is Right for You?

As these 7 case studies show, cloud strategies are not one-size-fits-all. Choosing the right cloud migration strategy for your business depends on several factors, including your:

  • Goals. What business results do you want to achieve because of the migration? How does your business measure ROI? What problems are you trying to solve via your cloud migration strategy?
  • Business model.  What is your current state? What are your core products/services and user needs, and how are they affected by how and where data is stored? What are your development and deployment needs, issues, and constraints? What are your organization’s cost drivers? How is your business affected by lack of stability or availability? Can you afford downtime?
  • Security needs.  What are your requirements regarding data privacy, confidentiality, encryption, identity and access management, and regulatory compliance? Which cloud security challenges pose potential problems for your business?
  • Scaling needs.  Do your needs and usage fluctuate? Do you expect to grow or shrink?
  • Disaster recovery and business continuity needs. What are your needs and capabilities in this area? How might your business be affected in the event of a major disaster — or even a minor service interruption?
  • Technical expertise.  What expertise do you need to run and innovate your core business? What expertise do you have in-house? Are you allocating your in-house expertise to the right efforts?
  • Team focus and capacity.  How much time and focus can your team dedicate to the cloud migration effort?
  • Timeline.  What business needs constrain your timeline? What core business activities must remain uninterrupted? How much time can you allow for planning and testing your cloud migration strategy?

In short, armed with the list of questions above and these 7 case studies of successful cloud migrations, you can start with a plan grounded in your business’s goals and then choose the tools and cloud strategies that will work best for your business.


Source: Distillery


Distillery

10 Important Cloud Migration Case Studies You Need to Know

Aug 1, 2019 | Engineering


For most businesses considering cloud migration, the move is filled with promise and potential. Scalability, flexibility, reliability, cost-effectiveness, improved performance and disaster recovery, and simpler, faster deployment — what’s not to like? 

It’s important to understand that cloud platform benefits come alongside considerable challenges, including the need to improve availability and latency, orchestrate auto-scaling, manage tricky connections, scale the development process effectively, and address cloud security challenges. While advancements in virtualization and containerization (e.g., Docker, Kubernetes) are helping many businesses solve these challenges, cloud migration is no simple matter.

That’s why, when considering your organization’s cloud migration strategy, it’s beneficial to look at case studies and examples from other companies’ cloud migration experiences. Why did they do it? How did they go about it? What happened? What benefits did they see, and what are the advantages and disadvantages of cloud computing for these businesses? Most importantly, what lessons did they learn — and what can you learn from them? 

With that in mind, Distillery has put together 10 cloud migration case studies your business can learn from. While most of the case studies feature companies moving from on-premise, bare metal data centers to cloud, we also look at companies moving from cloud to cloud, cloud to multi-cloud, and even off the cloud. Armed with all these lessons, ideas, and strategies, you’ll feel readier than ever to make the cloud work for your business.

Challenges for Cloud Adoption: Is Your Organization Ready to Scale and Be Cloud-first?

We examine several of these case studies from a more technical perspective in our white paper on Top Challenges for Cloud Adoption in 2019. In this white paper, you’ll learn:

  • Why cloud platform development created scaling challenges for businesses
  • How scaling fits into the big picture of the Cloud Maturity Framework
  • Why advancements in virtualization and containerization have helped businesses solve these scaling challenges
  • How companies like Betabrand, Shopify, Spotify, Evernote, Waze, and others have solved these scaling challenges while continuing to innovate their businesses and provide value to users


#1 Betabrand: Bare Metal to Cloud


Betabrand (est. 2005) is a crowd-funded, crowd-sourced retail clothing e-commerce company that designs, manufactures, and releases limited-quantity products via its website. 

Migration Objective 

The company struggled with the maintenance difficulties and lack of scalability of the bare metal infrastructure supporting their operations. 

Planning for and adding capacity took too much time and added costs. They also needed the ability to better handle website traffic surges.

Migration Strategy and Results 

In anticipation of increased web traffic for Black Friday 2017, Betabrand migrated to a Google Cloud infrastructure managed by Kubernetes (Google Kubernetes Engine, or GKE). They experienced no issues related to the migration, and Black Friday 2017 was a success.

By Black Friday 2018, early load testing and auto-scaling cloud infrastructure helped them to handle peak loads with zero issues. The company hasn’t experienced a single outage since migrating to the cloud.
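
The load-testing step lends itself to a small illustration. The hedged sketch below hammers a staging replica with concurrent requests and reports latency percentiles; the URL, concurrency, and request count are placeholders, and it stands in for whatever tooling Betabrand actually used:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party: pip install requests

STAGING_URL = "https://staging.example.com/products"  # placeholder replica endpoint
REQUESTS_TOTAL = 200
CONCURRENCY = 20

def timed_get(_: int) -> float:
    """Issue one GET against the replica and return its latency in seconds."""
    start = time.perf_counter()
    requests.get(STAGING_URL, timeout=10)
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_get, range(REQUESTS_TOTAL)))

print(f"p50: {statistics.median(latencies) * 1000:.0f} ms")
print(f"p95: {latencies[int(len(latencies) * 0.95)] * 1000:.0f} ms")
```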

Key Takeaways

  • With advance planning, cloud migration can be a simple process. Betabrand’s 2017 on-premise to cloud migration proved smooth and simple. In advance of actual migration, they created multiple clusters in GKE and performed several test migrations, thereby identifying the right steps for a successful launch.
  • Cloud streamlines load testing. Betabrand was able to quickly create a replica of its production services that they could use in load testing. Tests revealed poorly performing code paths that would only be revealed by heavy loads. They were able to fix the issues before Black Friday. 
  • Cloud’s scalability is key to customer satisfaction. As a fast-growing e-commerce business, Betabrand realized they couldn’t afford the downtime or delays of bare metal. Their cloud infrastructure scales automatically, helping them avoid issues and keep customers happy. This factor alone underlines the strategic importance of cloud computing in business organizations like Betabrand. 

#2 Shopify: Cloud to Cloud


Shopify (est. 2006) provides a proprietary e-commerce software platform upon which businesses can build and run online stores and retail point-of-sale (POS) systems. 

Shopify wanted to ensure they were using the best tools possible to support the evolution needed to meet increasing customer demand. Though they’d always been a cloud-based organization, building and running their e-commerce cloud with their own data centers, they sought to capitalize on the container-based cloud benefits of immutable infrastructure to provide better support to their customers. Specifically, they wanted to ensure predictable, repeatable builds and deployments; simpler and more robust rollbacks; and elimination of configuration management drift. 

By building out their cloud with Google, building a “Shop Mover” database migration tool, and leveraging Docker containers and Kubernetes, Shopify has been able to transform its data center to better support customers’ online shops, meeting all their objectives. For Shopify customers, the increasingly scalable, resilient applications mean improved consistency, reliability, and version control.
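
As a hedged sketch of what an immutable, container-based deployment looks like in practice (a generic Kubernetes example, not Shopify’s Shop Mover tooling; the deployment name, namespace, and image tags are placeholders), rolling out a change means pointing a Deployment at a new image tag, and rolling back means pointing it at the previous tag; running servers are replaced, never edited in place:

```python
from kubernetes import client, config  # pip install kubernetes

def set_image(deployment: str, namespace: str, container: str, image: str) -> None:
    """Roll the deployment to a new immutable image; Kubernetes replaces pods rather than mutating them."""
    config.load_kube_config()  # assumes a configured kubeconfig
    apps = client.AppsV1Api()
    patch = {"spec": {"template": {"spec": {"containers": [{"name": container, "image": image}]}}}}
    apps.patch_namespaced_deployment(name=deployment, namespace=namespace, body=patch)

# Deploy a new build (placeholders throughout)...
set_image("storefront", "shops", "storefront", "registry.example.com/storefront:2024-05-01")
# ...and roll back by re-pointing at the previous, still-immutable image.
set_image("storefront", "shops", "storefront", "registry.example.com/storefront:2024-04-28")
```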

  • Immutable infrastructure vastly improves deployments. Since cloud servers are never modified post-deployment, configuration drift — in which undocumented changes to servers can cause them to diverge from one another and from the originally deployed configuration — is minimized or eliminated. This means deployments are easier, simpler, and more consistent.
  • Scalability is central to meeting the changing needs of dynamic e-commerce businesses. Shopify is home to online shops like Kylie Cosmetics, which hosts flash sales that can sell out in 20 seconds. Shopify’s cloud-to-cloud migration helped its servers flex to meet fluctuating demand, ensuring that commerce isn’t slowed or disrupted.

#3 Spotify: Bare Metal to Cloud


Spotify (est. 2006) is a media services provider primarily focused on its audio-streaming platform, which lets users search for, listen to, and share music and podcasts.

Spotify’s leadership and engineering team agreed: The company’s massive in-house data centers were difficult to provision and maintain, and they didn’t directly serve the company’s goal of being the “best music service in the world.” They wanted to free up Spotify’s engineers to focus on innovation. They started planning for migration to Google Cloud Platform (GCP) in 2015, hoping to minimize disruption to product development, and minimize the cost and complexity of hybrid operation. 

Spotify invested two years pre-migration in preparation, assigning a dedicated Spotify/Google cloud migration team to oversee the effort. Ultimately, they split the effort into two parts, services and data, which took a year apiece. For services migration, engineering teams moved services to the cloud in focused two-week sprints, pausing product development. For data migration, teams were allowed to choose between “forklifting” and rewriting, whichever best fit their needs. Ultimately, Spotify’s on-premise to cloud migration succeeded in increasing scalability while freeing up developers to innovate.

  • Gaining stakeholder buy-in is crucial. Spotify was careful to consult its engineers about the vision. Once they could see what their jobs looked like in the future, they were all-in advocates. 
  • Migration preparation shouldn’t be rushed. Spotify’s dedicated migration team took the time to investigate various cloud strategies and build out the use case demonstrating the benefits of cloud computing to the business. They carefully mapped all dependencies. They also worked with Google to identify and orchestrate the right cloud strategies and solutions. 
  • Focus and dedication pay huge dividends. Spotify’s dedicated migration team kept everything on track and in focus, making sure everyone involved was aware of past experience and lessons already learned. In addition, since engineering teams were fully focused on the migration effort, they were able to complete it more quickly, reducing the disruption to product development.

#4 Evernote: Bare Metal to Cloud


Evernote (est. 2008) is a collaborative, cross-platform note-taking and task management application that helps users capture, organize, and track ideas, tasks, and deadlines.

Evernote, which had maintained its own servers and network since inception, was feeling increasingly limited by its infrastructure. It was difficult to scale, and time-consuming and expensive to maintain. They wanted more flexibility, as well as to improve Evernote’s speed, reliability, security, and disaster recovery planning. To minimize service disruption, they hoped to conduct the on-premise to cloud migration as efficiently as possible. 

Starting in 2016, Evernote used an iterative approach: They built a strawman based on strategic decisions, tested its viability, and rapidly iterated. They then settled on a cloud migration strategy that used a phased cutover approach, enabling them to test parts of the migration before committing. They also added important levels of security by using GCP service accounts, achieving “encryption at rest,” and improving disaster recovery processes. Evernote successfully migrated 5 billion notes and 5 billion attachments to GCP in only 70 days.
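
A toy sketch of a phased cutover with rollback points might look like the following; the phase list, verification check, and rollback hook are all assumptions made purely for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cutover")

# Hypothetical migration phases, each a batch of users or services to move.
PHASES = ["internal-accounts", "1-percent-of-users", "10-percent-of-users", "everyone-else"]

def migrate(phase: str) -> None:
    """Stand-in for the real work: copy data, flip routing, etc."""
    log.info("migrating phase %s", phase)

def healthy(phase: str) -> bool:
    """Stand-in for post-migration verification (error rates, data checksums, ...)."""
    return True

def rollback(phase: str) -> None:
    """Stand-in for reverting routing/data for a phase."""
    log.warning("rolling back phase %s", phase)

for phase in PHASES:
    migrate(phase)
    if not healthy(phase):
        rollback(phase)  # rollback point: stop before touching later phases
        break
    log.info("phase %s verified; proceeding", phase)
```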

  • Cloud migration doesn’t have to happen all at once. You can migrate services in phases or waves grouped by service or user. Evernote’s phased cutover approach allowed for rollback points if things weren’t going according to plan, reducing migration risk.
  • Ensuring data security in the cloud may require extra steps. Cloud security challenges may require extra focus in your cloud migration effort. Evernote worked with Google to create the additional security layers their business required. GCP service accounts can be customized and configured to use built-in public/private key pairs managed and rotated daily by Google.
  • Cloud capabilities can improve disaster recovery planning. Evernote wanted to ensure that they would be better prepared to quickly recover customer data in the event of a disaster. Cloud’s reliable, redundant, and robust data backups help make this possible. 

#5 Etsy: Bare Metal to Cloud


Etsy (est. 2005) is a global e-commerce platform that allows sellers to build and run online stores selling handmade and vintage items and crafting supplies.

Etsy had maintained its own infrastructure from inception. In 2018, they decided to re-evaluate whether cloud was right for the company’s future. In particular, they sought to improve site performance, engineering efficiency, and UX. They also wanted to ensure long-term scalability and sustainability, as well as to spend less time maintaining infrastructure and more time executing strategy.

Migration Strategy and Results

Etsy undertook a detailed vendor selection process, ultimately identifying GCP as the right choice for their cloud migration strategy. Since they’d already been running their own Kubernetes cluster inside their data center, they had a partial solution for deploying to GKE. They initially deployed in a hybrid environment (private data center and GKE), providing redundancy, reducing risk, and allowing them to perform A/B testing. They’re on target to complete the migration and achieve all objectives.

Key Takeaways 

  • Business needs and technology fit should be periodically reassessed. While bare metal was the right choice for Etsy when it launched in 2005, improvements in infrastructure as a service (IaaS) and platform as a service (PaaS) made cloud migration the right choice in 2018.
  • Detailed analysis can help businesses identify the right cloud solution for their needs. Etsy took a highly strategic approach to assessment that included requirements definition, RACI (responsible, accountable, consulted, informed) matrices, and architectural reviews. This helped them ensure that their cloud migration solution would genuinely help them achieve all their goals.
  • Hybrid deployment can be effective for reducing cloud migration risk. Dual deployment on their private data center and GKE was an important aspect of Etsy’s cloud migration strategy. 

#6 Waze: Cloud to Multi-cloud


Waze (est. 2006; acquired by Google in 2013) is a GPS-enabled navigation application that uses real-time user location data and user-submitted reports to suggest optimized routes.

Though Waze moved to the cloud very early on, their fast growth quickly led to production issues that caused painful rollbacks, bottlenecks, and other complications. They needed to find a way to get faster feedback to users while mitigating or eliminating their production issues.  

Waze decided to run an active-active architecture across multiple cloud providers — GCP and Amazon Web Services (AWS) — to improve the resiliency of their production systems. This means they’re better positioned to survive a DNS DDOS attack, or a regional or global failure. An open source continuous delivery platform called Spinnaker helps them deploy software changes while making rollbacks easy and reliable. Spinnaker makes it easy for Waze’s engineers to deploy across both cloud platforms, using a consistent conceptual model that doesn’t rely on detailed knowledge of either platform.
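
The resilience argument can be illustrated with a toy client-side failover loop across two cloud endpoints; this is a deliberately simplified sketch (the endpoint URLs and health paths are assumptions), not how Waze or Spinnaker actually manage traffic:

```python
import requests  # pip install requests

# Hypothetical endpoints for the same service, deployed active-active on two clouds.
ENDPOINTS = [
    "https://api.gcp.example.com",
    "https://api.aws.example.com",
]

def healthy(base_url: str) -> bool:
    """Treat a fast 200 from /healthz as healthy; anything else as unhealthy."""
    try:
        return requests.get(f"{base_url}/healthz", timeout=2).status_code == 200
    except requests.RequestException:
        return False

def fetch_route(origin: str, destination: str) -> dict:
    """Ask whichever deployment is currently healthy; fail only if both are down."""
    for base_url in ENDPOINTS:
        if healthy(base_url):
            resp = requests.get(f"{base_url}/route",
                                params={"from": origin, "to": destination}, timeout=5)
            resp.raise_for_status()
            return resp.json()
    raise RuntimeError("both cloud deployments are unavailable")
```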

  • Some business models may be a better fit for multiple clouds. Cloud strategies are not one-size-fits-all. Waze’s stability and reliability depend on avoiding downtime, deploying quick fixes to bugs, and ensuring the resiliency of their production systems. Running on two clouds at once helps make it all happen.
  • Your engineers don’t necessarily have to be cloud experts to deploy effectively. Spinnaker streamlines multi-cloud deployment for Waze such that developers can focus on development, rather than on becoming cloud experts. 

Deploying software more frequently doesn’t have to mean reduced stability/reliability. Continuous delivery can get you to market faster, improving quality while reducing risk and cost.

#7 AdvancedMD: Bare Metal to Cloud


AdvancedMD (est. 1999) is a software platform used by medical professionals to manage their practices, securely share information, and manage workflow, billing, and other tasks. 

AdvancedMD was being spun off from its parent company, ADP; to operate independently, it had to move all its data out of ADP’s data center. Since they handle highly sensitive, protected patient data that must remain available to practitioners at a moment’s notice, security and availability were top priorities. They sought an affordable, easy-to-manage, and easy-to-deploy solution that would scale to fit their customers’ changing needs while keeping patient data secure and available.

AdvancedMD’s on-premise to cloud migration would avoid the need to hire in-house storage experts, save them and their customers money, ensure availability, and let them quickly flex capacity to accommodate fluctuating needs. It also offered the simplicity and security they needed. Since AdvancedMD was already running NetApp storage arrays in its data center, it was easy to use NetApp’s Cloud Volumes ONTAP to move their data to AWS. ONTAP also provides the enterprise-level data protection and encryption they require.

  • Again, ensuring data security in the cloud may require extra steps. Though cloud has improved or mitigated some security concerns (e.g., vulnerable OS dependencies, long-lived compromised servers), hackers have turned their focus to the vulnerabilities that remain. Thus, your cloud migration strategy may need extra layers of controls (e.g., permissions, policies, encryption) to address these cloud security challenges.
  • When service costs are a concern, cloud’s flexibility may help. AdvancedMD customers are small to mid-sized budget-conscious businesses. Since cloud auto-scales, AdvancedMD never pays for more cloud infrastructure than they’re actually using. That helps them keep customer pricing affordable.

#8 Dropbox: Cloud to Hybrid

Dropbox (est. 2007) is a file hosting service that provides cloud storage and file synchronization solutions for customers.

Dropbox had developed its business by using the cloud — specifically, Amazon S3 (Simple Storage Service) — to house data while keeping metadata housed on-premise. Over time, they began to fear they’d become overly dependent on Amazon: not only were costs increasing as their storage needs grew, but Amazon was also planning a similar service offering, Amazon WorkDocs. Dropbox decided to take back their storage to help them reduce costs, increase control, and maintain their competitive edge. 

While the task of moving all that data to an in-house infrastructure was daunting, the company decided it was worth it, at least in the US (Dropbox assessed that in Europe, AWS is still the best fit). Dropbox designed and built in-house a massive network of new-breed machines, orchestrated by software written in an entirely new programming language, and moved about 90% of its files back to its own servers. Dropbox’s expanded in-house capabilities have enabled them to offer Project Infinite, which provides desktop users with universal compatibility and unlimited real-time data access.

  • On-premise infrastructure may still be right for some businesses. Since Dropbox’s core product relies on fast, reliable data access and storage, they need to ensure consistently high performance at a sustainable cost. Going in-house required a huge investment, but improved performance and reduced costs may serve them better in the long run. Once Dropbox understood that big picture, they had to reassess how strategically important cloud computing really was to their organization.  
  • Size matters. As Wired lays out in its article detailing the move, cloud businesses are not charities. There’s always going to be margin somewhere. If a business is big enough, like Dropbox, it may make sense to take on the difficulties of building a massive in-house network. But it’s a huge risk for businesses that aren’t big enough, or whose growth may stall.

#9 GitLab: Cloud to Cloud

GitLab (est. 2011) is an open core company that provides a single application supporting the entire DevOps life cycle for more than 100,000 organizations. 

GitLab’s core application enables software development teams to collaborate on projects in real time, avoiding both handoffs and delays. GitLab wanted to improve performance and reliability, accelerating development while making it as seamless, efficient, and error-free as possible. While they acknowledged that Microsoft Azure had been a great cloud provider, they strongly believed that Kubernetes, which GCP offers as the managed Google Kubernetes Engine (GKE) service, was the future, calling it “a technology that makes reliability at massive scale possible.” 

In 2018, GitLab migrated from Azure to GCP so that its service could run as a cloud-native application on GKE. They used their own Geo product to migrate the data, initially mirroring the data between Azure and GCP. Post-migration, GitLab reported improved performance (including fewer latency spikes) and a 61% improvement in availability.

  • Containers are seen by many as the future of DevOps. GitLab was explicit that they view Kubernetes as the future. Indeed, containers provide notable benefits, including a smaller footprint, predictability, and the ability to scale up and down in real time. For GitLab’s users, the company’s cloud-to-cloud migration makes it easier to get started with using Kubernetes for DevOps.
  • Improved stability and availability can be a big benefit of cloud migration. In GitLab’s case, mean time between outage events pre-migration was 1.3 days. Excluding the first day post-migration, they’re up to 12 days between outage events. Pre-migration, they averaged 32 minutes of downtime weekly; post-migration, they’re down to 5 minutes (see the quick conversion below). 
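To put those weekly downtime minutes in more familiar availability terms, here is a quick back-of-the-envelope conversion. It assumes the figures are averages over a full 10,080-minute week, which is our reading rather than something GitLab states explicitly.

```c
#include <stdio.h>

int main(void)
{
    const double minutes_per_week = 7.0 * 24.0 * 60.0;  /* 10,080 minutes */
    const double downtime_before = 32.0;  /* minutes of downtime per week, pre-migration  */
    const double downtime_after = 5.0;    /* minutes of downtime per week, post-migration */

    /* Availability = uptime / total time, expressed as a percentage. */
    printf("Pre-migration availability:  %.2f%%\n",
           100.0 * (1.0 - downtime_before / minutes_per_week));
    printf("Post-migration availability: %.2f%%\n",
           100.0 * (1.0 - downtime_after / minutes_per_week));
    return 0;
}
```

That works out to roughly 99.68% availability before the move and roughly 99.95% after it, which is how uptime targets are usually quoted.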

#10 Cordant Group: Bare Metal to Hybrid

The Cordant Group (est. 1957) is a global social enterprise that provides a range of services and solutions, including recruitment, security, cleaning, health care, and technical and electrical services.

Over the years, the Cordant Group had grown tremendously, requiring an extensive IT infrastructure to support their vast range of services. While they’d previously focused on capital expenses, they’d shifted to looking at OpEx, or operational expenses — which meant cloud’s “pay as you go” model made increasing sense. It was also crucial to ensure ease of use and robust data backups.

They began by moving to a virtual private cloud on AWS, but found that being restricted to Windows DFS for file server resource management was creating access problems. NetApp Cloud ONTAP, a software storage appliance that runs on AWS server and storage resources, solved the issue. File and storage management is easier than ever, and backups are robust, which means that important data restores quickly. The solution also monitors resource costs over time, enabling more accurate planning that drives additional cost savings. 

  • Business and user needs drive cloud needs. That’s why cloud strategies will absolutely vary based on a company’s unique needs. The Cordant Group needed to revisit its cloud computing strategy when users were unable to quickly access the files they needed. In addition, with such a diverse user group, ease of use had to be a top priority.
  • Cloud ROI ultimately depends on how your business measures ROI. The strategic importance of cloud computing in business organizations is specific to each organization. Cloud became the right answer for the Cordant Group when OpEx became the company’s dominant lens. 

Which Cloud Migration Strategy Is Right for You?

As these 10 diverse case studies show, cloud strategies are not one-size-fits-all. Choosing the right cloud migration strategy for your business depends on several factors, including your:

  • Goals. What business results do you want to achieve as a result of the migration? How does your business measure ROI? What problems are you trying to solve via your cloud migration strategy? 
  • Business model. What is your current state? What are your core products/services and user needs, and how are they impacted by how and where data is stored? What are your development and deployment needs, issues, and constraints? What are your organization’s cost drivers? How is your business impacted by lack of stability or availability? Can you afford downtime? 
  • Security needs. What are your requirements regarding data privacy, confidentiality, encryption, identity and access management, and regulatory compliance? Which cloud security challenges pose potential problems for your business?
  • Scaling needs. Do your needs and usage fluctuate? Do you expect to grow or shrink? 
  • Disaster recovery and business continuity needs. What are your needs and capabilities in this area? How might your business be impacted in the event of a major disaster — or even a minor service interruption? 
  • Technical expertise. What expertise do you need to run and innovate your core business? What expertise do you have in-house? Are you allocating your in-house expertise to the right efforts? 
  • Team focus and capacity. How much time and focus can your team dedicate to the cloud migration effort? 
  • Timeline. What business needs constrain your timeline? What core business activities must remain uninterrupted? How much time can you allow for planning and testing your cloud migration strategy? 

Of course, this list isn’t exhaustive. These questions are only a starting point. But getting started — with planning, better understanding your goals and drivers, and assessing potential technology fit — is the most important step of any cloud migration process. We hope these 10 case studies have helped to get you thinking in the right direction. 

While the challenges of cloud migration are considerable, the right guidance, planning, and tools can lead you to the cloud strategies and solutions that will work best for your business. So don’t delay: Take that first step to helping your business reap the potential advantages and benefits of cloud computing. 

Ready to take the next step on your cloud journey? As a Certified Google Cloud Technology Partner, Distillery is here to help. Download our white paper on top challenges for cloud adoption to get tactical and strategic about using cloud to transform your business.

Easing the cloud migration journey

Huawei's Net5.5G converged IP network can improve cloud performance, reliability and security, says the company.

Sponsored Feature: The appetite across most vertical sectors for migrating applications and services to the cloud shows no sign of abating. Independent analyst firm Gartner estimates that global spending on public cloud services will hit $679 billion this year.

That figure is likely to exceed $1 trillion by 2027. Gartner expects that within the next few years the hosting of applications in the cloud will have moved on from simply being a disruptor of traditional IT to being considered a business necessity when it comes to maintaining competitiveness.

Organisations from worlds as diverse as finance, healthcare, retail, energy, transportation, manufacturing, education and government are all actively investing in cloud thanks to its power to drive innovation, unlock hidden value, create opportunities and boost customer experiences.

But even the best planned cloud migration project is only as good as the network that supports it. Along with cloud's benefits, which include reduced costs, improved availability and scalability of services, and greater manageability, come challenges and risks that traditional enterprise networks are not always able to resolve.

It's those cloud migration challenges that Huawei's new Net5.5G converged IP network solution is designed to help address. The manufacturer bills it as ideal for organisations in all industries looking to 'accelerate intelligent transformation and maximize digital productivity'. Companies across a variety of industry verticals can rely on the solution to improve the performance, reliability and security of the cloud-hosted applications and services they provide to their end users, says Huawei.

Connectivity issues that puncture the cloud

There's no doubt that poor network quality can affect the impact of a move to digital services. In many cases the deployment of technologies like cloud is not fully translated into digital success on account of network failings, often necessitating an upgrade to next-generation networks.

Let's consider the challenges around latency, reliability and security in more detail:

Latency: Network latency issues can seriously impact cloud application performance. Just a few milliseconds of delay can significantly degrade end users' experience and productivity, particularly for cloud-based video conferencing and online document editing tools. Associated lower data transfer speeds can also cause network bottlenecks for organisations handling large volumes of data.

Reliability: The success of any cloud migration strategy depends in no small part on providing users with consistent, reliable access to mission-critical applications from wherever they happen to be, and regardless of what device they are using. Downtime is effectively death for many digital services: imagine a transport ticketing system which sees unpredictable packet loss regularly disrupting purchases, for example. All cloud infrastructures need some level of fault tolerance to avoid that type of scenario.

Security: Ensuring network security is equally critical given the amount of data which flows in and out of the cloud, as well as the penalties imposed for the loss, theft or corruption of sensitive information. Organisations need to monitor and control who has access to cloud-based resources to help build their own stakeholders' trust in the cloud and avoid falling foul of data protection regulation. Legacy cyber security solutions built for on-prem environments rarely deliver the protection required, calling for dedicated, cloud native security solutions built for the task.

A solution at hand with Huawei Net5.5G

Designed specifically for the cloud era, the Huawei Net5.5G converged IP network solution is built to deliver the kind of high-quality connectivity that can overcome these challenges and help organisations drive their digital transformation initiatives. At its heart are Huawei NetEngine 8000 series routers, which provide an all-service, intelligent routing platform billed as the most powerful on the market. The top of the range offers 19.2Tbit/s of performance per slot, for example, with what Huawei says is the industry's largest switching capacity and highest density of ports, allowing users to build simplified, converged ultra-broadband networks with a modest cost of ownership and the lowest possible levels of latency.

These routers combine with Huawei's next generation Network Cloud Engine (NCE) to deliver full-lifecycle automation and proactive Operations and Maintenance (O&M). It's an intent-driven approach which can free organisations from the unnecessary constraints of legacy networking, says the manufacturer.

The solution's support for end-to-end IPsec and SRv6, as well as CGN and BNG functions, is central to its reliability and security. A large number of the wide area networks (WANs) deployed in sectors that range from government to financial services involve the leasing of third-party links. When data is travelling across these other networks, it needs to be encrypted end to end to ensure security, which is what Huawei says IPsec and SRv6 support guarantees. The company believes that most solutions on the market can't match this capability, because they only support IPsec at the edge.

Network security steps in

Huawei's Network Security Solution is differentiated by three features: rapid threat handling within seconds, industry-leading detection performance, and precise ransomware prevention.

Let's think about why this sort of fast and secure coverage throughout the network layer really matters. Imagine an organisation with widely distributed endpoints that needs to connect to a new, distant location at short notice. With end-to-end support for SRv6 and IPsec, it can enable link provisioning in minutes, all while ensuring 100 percent security for its data as it traverses the public network. It can, if it needs to, achieve this at scale, adding thousands of new nodes in short order without compromising the safety of data.

Combining SRv6 and network slicing, the solution deals with the restrictions of legacy connectivity in other ways. Cross-domain automatic connections can deliver one-hop access to the cloud, while minute-level service provisioning and both tenant- and application-level Service Level Agreements (SLAs) guarantee an improved user experience, says Huawei, ensuring smooth evolution away from legacy standards like MPLS. Support for intelligent traffic steering allows for on-demand scheduling of network resources and the configuration of SLAs for different services.

With the technology upgrade, the solution can also boost throughput for certain applications and services. At the service bearer layer, network slices can be upgraded using a fingerprint solution that allows them to be deployed based on user ID, for example. At the management layer, Huawei's Network Digital Map 2.0 supports application-level service visualization and optimization, designed to deliver accurate insights into the customer service experience.

Traditional O&M methods can only view the tunnel layer, leading to congestion when multiple applications exist in the same tunnel, says Huawei. But the company's Network Digital Map 2.0 supports application-level service visualization and optimization which can automatically locate and optimize congestion points. This means it can, for example, optimize and adjust video conferences separately, helping customers improve operation quality and experience.

Organisations everywhere are constantly on the lookout for new ways to exploit the cloud's potential to support their digital transformation goals. But as they migrate more and more of their mission-critical applications and services off premise, many find that existing enterprise WANs are just not up to the job. Secure, reliable low latency network links will be the foundation of the digital economy, and Huawei's new Net5.5G converged IP network solution was designed specifically for the task.

Sponsored by Huawei.

Case Study: Maintaining the World’s Fastest Content Delivery Network at Netflix on FreeBSD

Netflix is a global entertainment company that revolutionized the way people consume TV shows and movies with its streaming service. Headquartered in Los Gatos, California, Netflix has grown into one of the world’s leading streaming platforms, boasting millions of subscribers in over 190 countries. Known for its extensive catalog of films, television series, and documentaries, including critically acclaimed original productions, Netflix continues to shape the entertainment industry by investing in innovative content and technology.

The fastest and highest-trafficked network on the internet, all running FreeBSD

Gleb Smirnoff is a skilled software engineer and experienced FreeBSD committer who works at Netflix and manages the customized and performance-optimized FreeBSD-based firmware for Open Connect, the company’s content delivery network (CDN).

During his presentation at the FreeBSD Vendor Summit in November 2023, Smirnoff emphasized the massive scale of Netflix’s operations.

“We are one of the biggest sources of traffic on the internet – sending terabits per second, powered by thousands of servers or appliances, all running FreeBSD.” 

As Smirnoff notes, Netflix’s Open Connect originally operated on a standard FreeBSD platform, which was gradually improved for better performance. In 2012, a proof-of-concept CDN was built on vanilla FreeBSD 9.0-RELEASE and nginx, provisioned on servers equipped with a single 10 Gbit/s interface. 

Over time, it became evident that achieving rapid growth required exceeding the limits of the operating system’s current capabilities. The expected scale of Netflix’s Content Delivery Network (CDN) was so massive that it was worthwhile to invest in the ongoing open source development of FreeBSD. 

Netflix realized that when deploying a CDN at a global scale, even a single percentage point increase in performance results in savings worth hundreds of thousands of dollars. Netflix’s customized version of FreeBSD enabled deeper integration and more precise optimization at the kernel level, leading to significant performance improvements.

Tracking FreeBSD-CURRENT at Netflix

Netflix carefully balanced its modifications with the need to stay aligned with the FreeBSD project’s core codebase. This ensured that their custom enhancements improved the system’s capabilities without causing an unsustainable divergence from the original FreeBSD source. This delicate balance allowed Netflix to leverage FreeBSD’s strengths while creating a tailored solution that met their specific high-performance needs.

Another Open Connect team member, Drew Gallatin, detailed insights into FreeBSD’s customization at Netflix during his March 2024 talk at OpenFest Bulgaria, a prominent technology and open source conference. 

With over 25 years of experience contributing to FreeBSD, Gallatin shared his journey and challenges in optimizing FreeBSD for Netflix’s Open Connect and emphasized the strategic decision-making process behind tracking FreeBSD-CURRENT, stating: 

“We decided what we were doing was silly, and what we should do is track FreeBSD-CURRENT. It sounds crazy because that’s where everybody pushes all their stuff, but it’s actually the best thing in the world for us.” 

During his talk, he also shared anecdotes from the “Magical Mystery Merge,” illustrating the importance of running the CURRENT branch. Gallatin explained, reflecting Netflix’s proactive approach to maintaining system performance and stability:

“When we run FreeBSD-CURRENT, we catch things really fast. If there’s some regression, we catch it right away. There’s no two- or three-year delay between somebody committing something and us finding it’s a problem.” 

Adding to the narrative on the subtree integration, Gallatin pointed out the benefits of this approach, highlighting the streamlined development and maintenance processes that resulted from Netflix’s strategic alignment with FreeBSD-CURRENT:

“Our tree is almost identical to the upstream FreeBSD tree… it greatly reduces the technical debt we accumulate by keeping our own patches.” 

Strategic integration and performance optimization of FreeBSD

Netflix carefully manages the code flow between the in-house FreeBSD implementation and the wider FreeBSD community. A rigorous testing framework, continuous integration, and unit testing are the foundation of Netflix’s development strategy. Regular merges include upstream changes, and special focus is given to incorporating performance-enhancing patches ahead of their official inclusion in FreeBSD. A/B testing is performed for each merge to maintain or improve system performance and stability. 

The evolution of Netflix’s FreeBSD implementation involved refining the kernel to alleviate performance bottlenecks and handle ever-increasing data traffic. These refinements include RACK (Recent ACKnowledgment), a TCP stack developed by Randall Stewart and designed to improve the performance and reliability of data transmission. Other notable enhancements to FreeBSD by Netflix include asynchronous sendfile operations, which facilitate non-blocking data transfers, and advanced VM page caching techniques that improve data handling efficiency and network throughput.
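To give a flavor of the kernel path these optimizations target, here is a minimal sketch of FreeBSD’s sendfile(2) system call, which hands a file to a connected socket without copying the data through user space. It is illustrative only, not Netflix’s code: the connected TCP socket is assumed to be set up elsewhere, and enabling the RACK stack is a separate, system-level step (typically loading the tcp_rack kernel module and selecting it as the default TCP stack).

```c
/* Minimal sketch of FreeBSD's zero-copy sendfile(2) path (illustrative only).
 * Assumes `sock` is an already-connected TCP socket.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int serve_file(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    off_t sent = 0;
    /* nbytes == 0 means "send until end of file". The kernel moves pages
     * straight from the buffer cache to the socket; the data never passes
     * through a user-space buffer. */
    if (sendfile(fd, sock, 0, 0, NULL, &sent, 0) == -1) {
        perror("sendfile");
        close(fd);
        return -1;
    }

    printf("sent %jd bytes\n", (intmax_t)sent);
    close(fd);
    return 0;
}
```

The asynchronous sendfile work mentioned above is about removing the remaining blocking behaviour from this call when file pages still have to be read from disk, so a serving process is not stalled waiting on disk I/O.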

The Netflix CDN team has also notably collaborated with the FreeBSD community to enhance the security and efficiency of data transmissions using Kernel TLS (KTLS).

KTLS is a technology that moves the processing of TLS (Transport Layer Security) from user applications to the operating system kernel. This improves performance for file and web servers using sendfile(2) by encrypting the data in the kernel, where it resides, and avoiding extra copying of the data into and out of user space just to encrypt it. KTLS is helpful for high-throughput applications, like web servers, that require secure data transmission. It allows for efficient data handling and has enabled Netflix to achieve 400 Gb/s throughput on its CDN servers. Gallatin explains (a minimal code sketch of this data path follows his remarks below):

“We had the first 100 gigabit per second production CDN server in the world… due to Kernel TLS.”

“What’s kernel TLS? We moved bulk crypto into the kernel (from nginx) to preserve the sendfile pipeline.”

“With sendfile and kernel TLS, we can eliminate many of these memory bandwidth bottlenecks, and now things become much more possible. By accounting for bandwidth and CPU utilization, we get about 375 gigs at about 53% busy CPU with FreeBSD.” 
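To make that sendfile-plus-KTLS pipeline a little more concrete, here is a minimal sketch using OpenSSL 3.x, which can hand TLS record encryption to the FreeBSD kernel when KTLS is available. It is not Netflix’s production code, and it assumes a few things not shown: the SSL context was created with the SSL_OP_ENABLE_KTLS option, the kernel has KTLS enabled, and `ssl` is an already-accepted server-side connection. A real server would also handle partial writes and retries.

```c
/* Sketch: serve a file over TLS, using kernel TLS (KTLS) when available
 * (illustrative only). Assumes the SSL_CTX was configured with
 * SSL_OP_ENABLE_KTLS and that the kernel supports KTLS.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#include <openssl/bio.h>
#include <openssl/ssl.h>

static int serve_file_tls(SSL *ssl, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        close(fd);
        return -1;
    }

    int rc = -1;
    if (BIO_get_ktls_send(SSL_get_wbio(ssl))) {
        /* Kernel TLS is active: the kernel encrypts the file pages itself,
         * so the bulk data never detours through user space just to be
         * encrypted. */
        if (SSL_sendfile(ssl, fd, 0, (size_t)st.st_size, 0) >= 0)
            rc = 0;
    } else {
        /* Fallback: the conventional read-then-SSL_write copy loop that
         * KTLS was designed to avoid. */
        char buf[64 * 1024];
        ssize_t n;
        rc = 0;
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
            if (SSL_write(ssl, buf, (int)n) <= 0) {
                rc = -1;
                break;
            }
        }
        if (n < 0)
            rc = -1;
    }

    close(fd);
    return rc;
}
```

When BIO_get_ktls_send() reports that KTLS is active, the SSL_sendfile() call keeps the whole transfer on the zero-copy sendfile path; the same idea extends to the NIC offload described below, where supported adapters take over the record encryption entirely.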

Kernel TLS in FreeBSD is a large project and has undergone significant development through collaboration within the community. While at Netflix, Scott Long first proposed integrating TLS into the kernel. Together with Randall Stewart, they developed the foundational software TLS transmission mechanisms. Drew Gallatin contributed significantly to the project by introducing external pages mbufs and M_NOTREADY mbufs, which were essential for handling encrypted data within the kernel. He also developed a pluggable interface for various software TLS backends.

Later versions of KTLS made notable enhancements to the system. For instance, in FreeBSD 13, support was added for offloading TLS transmission to network interface cards (NICs). Drew Gallatin first implemented this feature in collaboration with Chelsio, which co-sponsored the project with Netflix for Chelsio T6 adapters. Later, Hans Petter Selasky extended this functionality to include Mellanox ConnectX-6 Dx adapters, enabling support for a wider range of hardware acceleration.

This ongoing development, backed by contributions from Netflix, Chelsio, and Mellanox, highlights the strong, community-driven efforts to enhance FreeBSD’s network security and performance capabilities.

Giving back to the community

Netflix’s strategy in managing its FreeBSD implementation for Open Connect reflects a deep commitment to the broader FreeBSD community. Smirnoff highlighted the significance of aligning closely with FreeBSD’s development: 

“It’s crucial to reduce the divergence of your operating system to FreeBSD, which means that you need to upstream your changes.”  

He also articulated the practical benefits of this strategy, explaining, 

“Tracking FreeBSD-CURRENT … allowed us to collaborate with upstream developers and get our changes into FreeBSD quickly.” 

This approach has minimized technical debt and facilitated rapid incorporation of the latest features and improvements, keeping Netflix at the forefront of technological innovation in streaming.

Lessons learned and best practices

The successful management of large-scale FreeBSD implementations, such as Netflix’s, provides valuable lessons on the importance of community involvement and open source collaboration. 

  • Engaging with the community early on and proactively contributing to the project is crucial to harnessing FreeBSD’s full potential. These practices ensure that any adaptations to the system align well with ongoing developments in the broader ecosystem.
  • Over time, refining the strategy for managing an organization’s FreeBSD implementation by prioritizing community engagement, regular testing, and strategic upstream contributions can yield significant benefits. 
  • Adopting new FreeBSD features and conducting thorough testing to identify potential system degradations early in development is essential. This proactive approach helps maintain a clear understanding of how an organization’s customized fork diverges from the primary FreeBSD Project, ensuring that enhancements improve the system’s capabilities without leading to unsustainable divergences.
  • Having a well-defined process for integrating external code and managing internal changes is critical. Setting clear protocols for code review, integration, and testing is vital to maintaining system integrity and performance. 

By adopting these practices, organizations can effectively manage their FreeBSD-based systems, ensuring they meet specific operational needs while staying ahead of technological advancements.

Future directions

Netflix is committed to using FreeBSD’s flexibility and performance capabilities and will continue collaborating with the community, focusing on growth and innovation. Netflix has set a precedent in the industry by successfully maintaining a customized FreeBSD implementation through strategic foresight, rigorous testing, and active community engagement.

Getting started with FreeBSD 

Reflecting on Netflix’s journey with FreeBSD, Netflix’s CDN team offers valuable advice to organizations considering using FreeBSD. They suggest proactively engaging with the FreeBSD community and leveraging resources like The FreeBSD Foundation, which can provide crucial support on technical issues, implementation challenges, and community connections. For Netflix, the strategy was not just about adopting FreeBSD but integrating it into its ecosystem, contributing to its development, and sharing its innovations upstream. If your organization is thinking about getting started with FreeBSD, email the Foundation using the Contact Us page of their website, or download FreeBSD to get started today.
