Get started with P5 instances
P5 instances provide 8 NVIDIA H100 GPUs with 640 GB of high-bandwidth GPU memory. They feature 3rd generation AMD EPYC processors and provide 2 TB of system memory, 30 TB of local NVMe instance storage, 3,200 Gbps aggregated network bandwidth, and GPUDirect RDMA support. P5 instances also support Amazon EC2 UltraCluster technology, which provides lower latency and improved network performance using EFA.
The following table summarizes the p5.48xlarge specifications.
vCPUs | System memory | GPUs | GPU memory | Network bandwidth | GPUDirect RDMA | GPU peer to peer | Instance storage |
---|---|---|---|---|---|---|---|
192 | 2 TiB | 8 NVIDIA H100 GPUs | 640 GB HBM3 | 3200 Gbps with EFAv2 | Supported | 900 GB/s NVSwitch | 8 x 3,800 GB NVMe SSD volumes |
Software configuration
The easiest way to get started with P5 instances is to launch an instance using an AWS Deep Learning AMI that is preconfigured with all of the required software. For the latest AWS Deep Learning AMI for use with P5 instances, see the AWS Deep Learning Base GPU AMI (Ubuntu 20.04).
If you need to build a custom AMI for use with P5 instances, we recommend installing the following minimum software versions:
- NVIDIA driver 535.54.03 or later
- CUDA 12.1 or later
- NVIDIA GDRCopy 2.3 or later
- EFA installer 1.24.1 or later
- NCCL 2.18.3 or later
- aws-ofi-nccl plugin 1.7.2-aws or later
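When validating a custom AMI, it can help to compare the versions you find on the image against the minimums listed above. The following Python sketch shows one way to do that comparison. The component keys (such as "nvidia-driver") are hypothetical labels chosen for this example, and the sketch assumes you have already collected the installed version strings yourself; it does not query the system. Only the leading numeric part of each version is compared, so a suffix such as "-aws" is ignored.

```python
import re

# Minimum versions copied from the list above.
# The dictionary keys are hypothetical labels, not official package names.
MINIMUM_VERSIONS = {
    "nvidia-driver": "535.54.03",
    "cuda": "12.1",
    "gdrcopy": "2.3",
    "efa-installer": "1.24.1",
    "nccl": "2.18.3",
    "aws-ofi-nccl": "1.7.2-aws",
}


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if installed >= minimum, comparing numeric components in order."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "12.1" and "12.1.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_outdated(installed_versions):
    """Return the labels of components that are missing or below the minimum."""
    outdated = []
    for name, minimum in MINIMUM_VERSIONS.items():
        installed = installed_versions.get(name)
        if installed is None or not meets_minimum(installed, minimum):
            outdated.append(name)
    return outdated
```

For example, passing a dictionary in which "nccl" maps to "2.18.1" returns a list that contains "nccl", because 2.18.1 is lower than the 2.18.3 minimum.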
We also recommend that you configure the instance to not use deeper C-states. For more information, see High performance and low latency by limiting deeper C-states. The latest AWS Deep Learning Base GPU AMI is preconfigured to not use deeper C-states.
Ubuntu 20.04 specific recommendations
The following recommendations for Ubuntu 20.04 help prevent unpredictable interface naming on boot:
- Ensure you are running systemd 245.4-4ubuntu3.19 or later. To check the installed version, run the following command: systemd --version
- Ensure you have configured GRUB:
  - Open the /etc/default/grub configuration file in a text editor.
  - Edit the GRUB_CMDLINE_LINUX_DEFAULT entry to include net.naming-scheme=v247.
  - Apply the change by running sudo update-grub, and then reboot your instance.
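If you build AMIs with a script rather than by hand, the GRUB edit can be automated. The following Python sketch shows one possible approach: it takes the contents of /etc/default/grub as a string and returns the text with net.naming-scheme=v247 appended to GRUB_CMDLINE_LINUX_DEFAULT when the option is not already present. The function name add_naming_scheme is hypothetical, and the sketch assumes the entry uses double quotes on a single line. It does not write the file, run sudo update-grub, or reboot the instance.

```python
import re

NAMING_OPTION = "net.naming-scheme=v247"

# Matches a double-quoted, single-line GRUB_CMDLINE_LINUX_DEFAULT entry.
_ENTRY = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_naming_scheme(grub_text, option=NAMING_OPTION):
    """Return grub_text with option added to GRUB_CMDLINE_LINUX_DEFAULT if absent."""

    def _append(match):
        params = match.group(2).split()
        if option not in params:
            params.append(option)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = _ENTRY.subn(_append, grub_text)
    if count == 0:
        # Refuse to guess at other quoting styles or a missing entry.
        raise ValueError("no double-quoted GRUB_CMDLINE_LINUX_DEFAULT entry found")
    return new_text
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to call from a provisioning script that might run more than once.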
Networking and EFA configuration
P5 instances deliver 3,200 Gbps of networking bandwidth by using multiple EFA interfaces. P5 instances support 32 network cards. We recommend that you define a single EFA network interface per network card. To configure these interfaces at launch, we recommend the following settings:
- For network interface 0, specify device index 0
- For network interfaces 1 through 31, specify device index 1
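Writing out 32 interface definitions by hand is error-prone, so you might generate them instead. The following Python sketch builds one EFA interface specification per network card using the device index settings above. It assumes that "network interface N" corresponds to network card index N, and it uses key names modeled on the NetworkInterfaces parameter of the EC2 RunInstances API. The function name build_efa_interfaces and the subnet and security group values are placeholders for this example; the sketch only builds the list and does not call any AWS API.

```python
NETWORK_CARD_COUNT = 32  # P5 instances support 32 network cards


def build_efa_interfaces(subnet_id, security_group_ids, card_count=NETWORK_CARD_COUNT):
    """Build one EFA interface specification per network card."""
    interfaces = []
    for card in range(card_count):
        interfaces.append({
            "NetworkCardIndex": card,
            # Card 0 uses device index 0; every other card uses device index 1.
            "DeviceIndex": 0 if card == 0 else 1,
            "InterfaceType": "efa",
            "SubnetId": subnet_id,
            "Groups": list(security_group_ids),
        })
    return interfaces


# Placeholder IDs; replace them with values from your own account.
specs = build_efa_interfaces("subnet-EXAMPLE", ["sg-EXAMPLE"])
```

The resulting list has 32 entries and could be passed to whatever launch tooling you use, after you confirm that the key names match what that tooling expects.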
For more information about how to configure your P5 instances for EFA, see Get started with P5 instances and EFA.