DevOps Articles

Curated articles, resources, tips and trends from the DevOps World.

Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available

2 years ago aws.amazon.com

Summary: This is a summary of an article originally published by AWS DevOps Blog.

Deep learning (DL) models have been increasing in size and complexity over the last few years, pushing the time to train from days to weeks. To reduce model training times and enable machine learning (ML) practitioners to iterate fast, AWS has been innovating across chips, servers, and data center connectivity.

New Trn1 Instance Highlights

Trn1 instances are available today in two sizes and are powered by up to 16 AWS Trainium chips with 128 vCPUs.

Trn1 EC2 UltraClusters

For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre (https://aws.amazon.com/fsx/lustre/) high-performance storage and are deployed in EC2 UltraClusters.

Get Started with Trn1 Instances

In this example, I train a PyTorch model on an EC2 Trn1 instance using the available PyTorch Neuron packages.
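The original article's training example is not reproduced in this summary. As a rough sketch of what such a script might look like, the snippet below trains a toy linear model through the PyTorch/XLA device interface, which is the mechanism PyTorch Neuron uses to target Trainium. The model, the synthetic data, the `train` function, and the CPU fallback are all assumptions made for illustration and are not taken from the article.

```python
import torch
import torch.nn as nn

try:
    # torch_xla is installed alongside the PyTorch Neuron packages on Trn1.
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
    on_xla = True
except (ImportError, RuntimeError):
    # Assumption for illustration: fall back to CPU when no XLA device exists,
    # so the sketch can also be tried on an ordinary machine.
    xm = None
    device = torch.device("cpu")
    on_xla = False


def train(steps=50):
    """Fit y = 2x + 1 with a single linear layer; return the per-step losses."""
    torch.manual_seed(0)
    model = nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    # Hypothetical synthetic dataset, created on CPU and moved to the device.
    x = torch.rand(64, 1)
    y = 2.0 * x + 1.0
    x, y = x.to(device), y.to(device)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if on_xla:
            # XLA records operations lazily; mark_step() executes the graph.
            xm.mark_step()
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    history = train()
    print(f"device={device} first_loss={history[0]:.4f} last_loss={history[-1]:.4f}")
```

Reading the loss with `.item()` on every step forces a device synchronization, which is fine for a toy loop but would normally be done less often in a real training job.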
