AWS re:Invent 2017: Tuesday Night Live with Peter DeSantis
Tuesday Night Live Peter DeSantis
Peter is VP of AWS Global Infrastructure, a role he has held for a little over a year, although he has been at Amazon for 20 years. This slot has traditionally been presented by James Hamilton, AWS Distinguished Engineer and one of the super techies.
Talk on the street is that the techie stuff they talk about in this session is actually what AWS was using two years ago and is only now ready to discuss; they're so far ahead that they keep their newest cards close to their chest.
There was a cool warm up band.
Update on AWS Global Infrastructure
Write your app once and deploy it to any region: “going global has never been so easy”, said Peter.
He went through the history of the global expansion and how it is accelerating. AWS has announced plans to expand with 17 new Availability Zones in six new geographic Regions: Bahrain, China, France, Hong Kong, Sweden, and a second AWS GovCloud Region in the US.
Just let that sink in: 11 regions in the first 10 years and 17 in two years. That's massive scale.
He touted AWS's commitment to renewable energy. He then drilled down into what a region looks like: it is made up of Availability Zones that are separately powered and redundantly cabled together.
Machine Learning
On computing at scale, he said, as expected, that AWS is adding more compute, not just in CPU count but specifically in GPUs and FPGAs. Matt Wood came on stage and talked through advanced hardware-assisted machine learning on NVIDIA silicon. ML frameworks have also been improved to “put ML in the hand of mere mortals”. The third part is the rapidly growing community; one example is Gluon, built together with Microsoft, which is a friendlier API for ML with no loss of performance. There is also the Open Neural Network Exchange (ONNX), an open specification for neural network models that provides interoperability between the various frameworks. This is such advanced stuff; imagine the kinds of machine learning models you could build into your applications that would have been unthinkable even a few months ago. If enterprises could just leverage this and do cloud properly, we could see massive changes in how literally everything is done.
EC2
Peter came back to talk about EC2 under the covers, and how innovation at scale means you can make large investments to optimise every part of your infrastructure, such as building a big network that shunts packets quickly, cheaply and reliably. He then talked through the hypervisor's responsibilities: networking, management, security and monitoring. The big goals for EC2 instances are security, performance and familiarity, which means instances that look like native hardware so customers can use them just like bare metal. EBS volumes now appear to the OS as NVMe drives, so it's not a kludge of having to manage an iSCSI initiator (yuck).
Peter went on to talk about the Nitro System, the specialised modular hardware AWS uses in conjunction with its hosts. The idea is to make EC2 indistinguishable from bare metal, so the Nitro System takes over the EC2 management layer as a separate module and customers get all the oomph from the hardware.
AWS then needed to redesign its custom hardware to take advantage of faster hardware. They looked at both FPGAs, which were costly, and custom ASICs, which are a huge investment yet way more flexible. The ASICs came from Annapurna Labs, which AWS acquired. C5 is the first instance type to use this to offload the entire EC2 software stack; what remains on the host is a hypervisor based on core KVM, highly customised and slimmed down. Nitro is in effect a hardware sidecar. This makes nearly 100% of the host's compute available to customer workloads and gives more security options from the hardware up.
EC2 Bare Metal Instances
Nitro is also used for VMware Cloud on AWS, so ESXi can get all the resources. What's next, and was announced, is EC2 Bare Metal Instances, which is what VMware Cloud on AWS is based on. These can be used for workloads that can't be virtualised or that need a particular hypervisor. I can see a future with other options, such as Nutanix running its nodes on bare metal instances, allowing native Nutanix in/on AWS.
The journey continues: a third-generation ASIC with twice as many transistors as the second version is currently being tested.
Autodesk then spoke about how they are using generative design to use machine learning and cloud scale to build new things “organically”.
Back to AWS and Load Balancing at Scale
Early load balancing ran on very limited hardware-based devices. This was then scaled out, but it became very costly and wasn't commodity; the devices were black boxes that AWS didn't know enough about and couldn't do anything special with. Management was a pain, as every service needed at least one VIP, and most needed several, which had to be managed very carefully.
S3 was a major driver for sorting out load balancing, as it needed massive scale. The design moved to dedicated switches and distributed systems algorithms to route traffic. State is replicated to at least 3 hosts, and operations were much simpler with higher utilisation. Current S3 traffic in a single region is 37 Tb/s! This was still a custom rack for S3, and they wanted to expand it to other services.
They created a new internal system called AWS Hyperplane, which underpins EFS, Managed NAT, the new Network Load Balancer (Hyperplane for customers) and lastly PrivateLink, which lets you expose a service privately to other VPCs in your own or other accounts. It also increases security and is available to partners. This is load balancing way beyond a web server; state is so important because connections can stay open for YEARS.
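The talk didn't say how traffic is routed or how the replica hosts are chosen. As a hedged illustration of the general idea only, here is a minimal Python sketch that assumes rendezvous (highest-random-weight) hashing as a stand-in technique: every router that knows the same host list independently agrees on which 3 hosts hold a flow's state, with no central coordinator. The names `replica_hosts` and `REPLICAS` and the host and flow strings are hypothetical; this is not AWS's actual algorithm.

```python
from __future__ import annotations

import hashlib

# Hypothetical sketch, NOT AWS's real S3/Hyperplane routing algorithm.
# Rendezvous (highest-random-weight) hashing: each (flow, host) pair gets a
# deterministic pseudo-random score, and the top-scoring hosts hold the
# flow's state. Any router with the same host list computes the same answer.

REPLICAS = 3  # the talk said state is replicated to at least 3 hosts


def _score(flow_key: str, host: str) -> int:
    """Deterministic 64-bit weight for a (flow, host) pair."""
    digest = hashlib.sha256(f"{flow_key}|{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replica_hosts(flow_key: str, hosts: list[str], n: int = REPLICAS) -> list[str]:
    """Return the n hosts that should hold state for this flow, best first."""
    if len(hosts) < n:
        raise ValueError(f"need at least {n} hosts, got {len(hosts)}")
    return sorted(hosts, key=lambda h: _score(flow_key, h), reverse=True)[:n]


if __name__ == "__main__":
    hosts = [f"lb-{i}" for i in range(8)]
    flow = "198.51.100.7:43512->203.0.113.10:443/tcp"
    print(replica_hosts(flow, hosts))
```

A nice property of this choice: if one of the chosen hosts fails, the two survivors stay in the set and only one new host is pulled in, so most of the state doesn't have to move.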
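Why state matters for connections that last years can be shown with a toy connection-tracking table. This is a minimal Python sketch under stated assumptions, not Hyperplane's design: a new flow is pinned to a target the first time it is seen (by hashing its 5-tuple), and later packets reuse that pinned target even after the target list changes. `FlowTable` and its methods are hypothetical names; a real system would also replicate this table and expire idle entries.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical connection-tracking sketch, not Hyperplane's real design.
# A flow is identified by its 5-tuple: (src_ip, src_port, dst_ip, dst_port, proto).
FiveTuple = Tuple[str, int, str, int, str]


@dataclass
class FlowTable:
    targets: List[str]
    flows: Dict[FiveTuple, str] = field(default_factory=dict)

    def _pick(self, flow: FiveTuple) -> str:
        """Choose a target for a brand-new flow by hashing its 5-tuple."""
        digest = hashlib.sha256(repr(flow).encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.targets)
        return self.targets[index]

    def route(self, flow: FiveTuple) -> str:
        """Return the target for a packet, pinning the flow on first sight."""
        if flow not in self.flows:
            self.flows[flow] = self._pick(flow)
        return self.flows[flow]


if __name__ == "__main__":
    table = FlowTable(targets=["10.1.0.10", "10.1.0.11"])
    conn = ("198.51.100.7", 50000, "10.0.0.1", 5432, "tcp")
    first = table.route(conn)
    table.targets.append("10.1.0.12")  # scale out: a new target joins
    # Plain hashing could now send this flow elsewhere; the table keeps it put.
    print(first, table.route(conn))
```

Without the table, adding a target changes the modulus and can silently move an established connection; with it, only new flows see the new target.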
Security at Scale
Security is at the heart of AWS; they are investing in automation and tooling along with machine learning. “If you keep the humans away from the data there are fewer chances of a problem”. Keeping up with the velocity of change across the business requires complete company culture buy-in.
AWS doesn't have a SOC: “if you need people watching, it's probably too late”. They have a single security engineer doing operations work on any given shift, relying on the tooling built into the system. Common “bad” behaviour is identified automatically and forensics are automated, with constant low-latency scanning for misconfiguration and automated ticketing. They use the same tools that are available to customers, with extensive use of Lambda.
Amazon Macie, released earlier this year, is a machine learning service to prevent data loss in AWS. Macie uses Natural Language Processing to help you understand your data, so it can find credit card information or street addresses. ML also comes in to understand how the data is used: from where, and by whom it is accessed. This is such an amazing way to find out what you have. For all those companies scrambling to classify data for GDPR, this is what you need, if only you could use it to reach internally into your file shares…or can you?!
Amazon GuardDuty was announced: intelligent threat detection at scale, enabled with a single click. It uses machine learning, so it can find a compromised EC2 instance that is mining Bitcoin, or work out that your account is compromised because someone is spinning up instances in a region you don't normally use, or spot that an API key has been used externally in a way that hasn't happened before. This is really powerful security at scale, getting the machines to spot anomalies.
What a wake-up call: bye bye security vendors who require visibility on the wire or agents in a VM. GuardDuty just ate your lunch! I've recently been working with a number of security-related products and I see how complicated and expensive they are. This kind of cloud security is what enterprises can only dream of, and yet there are still those who see the cloud as not secure.
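As a rough idea of what one automated misconfiguration check might look like, here is a minimal Python sketch that flags security group rules opening SSH or RDP to the whole internet and turns each finding into a ticket. The input dicts mirror the shape of the EC2 DescribeSecurityGroups response; the `scan` function, the ticket format and the list of risky ports are hypothetical choices, and in practice something like a Lambda function would fetch the groups and file the tickets.

```python
from __future__ import annotations

# Hypothetical misconfiguration check. Input dicts mirror the shape of the EC2
# DescribeSecurityGroups response; fetching them and filing real tickets is
# left out (that could be done from a Lambda function, for example).

RISKY_PORTS = {22: "SSH", 3389: "RDP"}
OPEN_TO_WORLD = {"0.0.0.0/0", "::/0"}


def _allows_tcp_port(rule: dict, port: int) -> bool:
    """True if an ingress rule lets TCP traffic reach the given port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol not in ("tcp", "6"):
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", -1)


def scan(groups: list[dict]) -> list[dict]:
    """Return one ticket per risky port exposed to the whole internet."""
    tickets = []
    for group in groups:
        for rule in group.get("IpPermissions", []):
            sources = {r["CidrIp"] for r in rule.get("IpRanges", [])}
            sources |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
            if not sources & OPEN_TO_WORLD:
                continue  # rule is limited to specific address ranges
            for port, name in RISKY_PORTS.items():
                if _allows_tcp_port(rule, port):
                    tickets.append({
                        "group": group["GroupId"],
                        "summary": f"{name} (port {port}) open to the internet",
                    })
    return tickets


if __name__ == "__main__":
    groups = [
        {"GroupId": "sg-example-open", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-example-web", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    ]
    for ticket in scan(groups):
        print(ticket)
```

Only the SSH rule produces a ticket here; HTTPS open to the world is treated as intentional for a web tier.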
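Macie's actual detection relies on NLP and ML, which is well beyond a snippet. As a much simpler stand-in that shows the basic idea of spotting credit card numbers in text, here is a hedged Python sketch using a regular expression for runs of 13 to 19 digits plus the Luhn checksum to weed out random numbers. `find_card_numbers` is a hypothetical helper, not part of any AWS API.

```python
from __future__ import annotations

import re

# Hypothetical rule-based stand-in for one thing Macie can find (card numbers).
# 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = ("Card on file: 4111 1111 1111 1111, backup 5555-5555-5555-4444, "
              "order id 1234567812345678.")
    print(find_card_numbers(sample))
```

The order id has the right length but fails the checksum, which is exactly the kind of false positive the Luhn step is there to drop.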
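GuardDuty's models aren't public, so the following is only a naive, rule-based stand-in for two of the examples above: a key used in a region it has never used before, or from a source IP it has never used before. This minimal Python sketch assumes simplified, flattened event dicts whose field names (`accessKeyId`, `awsRegion`, `sourceIPAddress`) are borrowed from CloudTrail records; the `FirstSeenDetector` class and its learning period are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Naive rule-based stand-in, NOT how GuardDuty works. Event dicts are a
# simplified, flattened take on CloudTrail records.


class FirstSeenDetector:
    """Flag the first use of an access key from a new region or source IP."""

    def __init__(self, learning_events: int = 100) -> None:
        self.learning_events = learning_events  # events used only to build a baseline
        self.events_seen = 0
        self.regions = defaultdict(set)  # access key -> regions seen so far
        self.ips = defaultdict(set)  # access key -> source IPs seen so far

    def observe(self, event: dict) -> list[str]:
        """Record an event and return any findings it triggers."""
        key = event["accessKeyId"]
        region = event["awsRegion"]
        ip = event["sourceIPAddress"]
        findings = []
        if self.events_seen >= self.learning_events:
            if region not in self.regions[key]:
                findings.append(f"{key}: first API call in region {region}")
            if ip not in self.ips[key]:
                findings.append(f"{key}: first API call from {ip}")
        self.regions[key].add(region)
        self.ips[key].add(ip)
        self.events_seen += 1
        return findings


if __name__ == "__main__":
    detector = FirstSeenDetector(learning_events=3)
    normal = {"accessKeyId": "AKIAEXAMPLE", "awsRegion": "eu-west-1",
              "sourceIPAddress": "198.51.100.10", "eventName": "DescribeInstances"}
    for _ in range(3):
        detector.observe(normal)  # baseline: the key's usual region and IP
    odd = dict(normal, awsRegion="ap-southeast-2", sourceIPAddress="203.0.113.99",
               eventName="RunInstances")
    print(detector.observe(odd))
```

A static "first seen" rule like this gets noisy fast in a real account, which is exactly why learning a baseline with ML, as GuardDuty does, is the interesting part.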
An interesting look under the covers of some of AWS.