Securing SageMaker notebooks – Amazon SageMaker Modeling – MLS-C01 Study Guide

Securing SageMaker notebooks

If you are reading this section of the chapter, then you have already learned how to use notebook instances, which types of training instances to choose, and how to configure and use endpoints. Now, let’s learn about securing those instances. The following aspects help to secure them:

  • Encryption: Securing something via encryption means safeguarding data. In practice, this means protecting data at rest with encryption, protecting data in transit with encryption, using KMS for better role separation, and keeping network traffic private through TLS 1.2 encryption. SageMaker notebook instances can be launched with encrypted volumes by using an AWS-managed KMS key, which helps to secure the data stored on the Jupyter Notebook server.
  • Root access: When a user opens a shell terminal from the Jupyter web UI, they are logged in as ec2-user, which is the default username in Amazon Linux. By default, this user can use sudo to become the root user, and with root access, a user can access and edit system files. In many use cases, an administrator might not want data scientists to manage, control, or modify the system of the notebook server, so root access needs to be restricted. This can be done by setting the RootAccess field to Disabled when you call CreateNotebookInstance or UpdateNotebookInstance. Data scientists still have access to their user space and can install Python packages, but they cannot sudo to the root user and make changes to the operating system.
  • IAM role: When you launch a notebook instance, you must either create an IAM execution role or use an existing one. The service-managed EC2 instance is launched with an instance profile associated with this role, and the policies attached to the role restrict which API calls can be made from the instance.
  • VPC connection: By default, a SageMaker notebook instance is created within the SageMaker service account, which has a service-managed VPC. In that default setup, the instance has access to the internet via an internet gateway that is managed by the service. If you only need to work with AWS services, then it is recommended that you launch the notebook instance in your own VPC, within a private subnet and with a well-customized security group. AWS services can then be invoked from the notebook instance via VPC endpoints attached to that VPC. The best practice is to attach endpoint policies to these VPC endpoints for finer control over which API calls are allowed, which helps to restrict data egress outside your VPC and secured environment. To capture all network traffic, you can turn on VPC flow logs, which can be monitored and tracked via CloudWatch.
  • Internet access: You can launch a Jupyter Notebook server without direct internet access. It can be launched in a private subnet and, if it still needs to reach the internet, do so through a NAT or a virtual private gateway. For training jobs and inference containers, you can set the EnableNetworkIsolation parameter to True when you call CreateTrainingJob, CreateHyperParameterTuningJob, or CreateModel. Network isolation can be used along with a VPC, and it ensures that the containers cannot make any outbound network calls.
  • Connecting a private network to your VPC: You can launch your SageMaker notebook instance inside a private subnet of your VPC. The instance can then access data in your private network, provided that you connect the private network to your VPC by using Amazon VPN or AWS Direct Connect.
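
Several of the settings above map onto request parameters of the SageMaker API. The following is a minimal sketch, not a definitive setup, of how they might be combined using boto3 parameter names. The helper function names, the ml.t3.medium instance type, and all ARNs and IDs passed in are assumptions made for illustration. The DirectInternetAccess parameter is not discussed above; it is included here on the assumption that the notebook instance should reach the network only through your VPC.

```python
from typing import Any, Dict, List


def notebook_instance_params(
    name: str,
    role_arn: str,
    kms_key_id: str,
    subnet_id: str,
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateNotebookInstance request."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",      # assumed instance type
        "RoleArn": role_arn,                 # IAM execution role
        "KmsKeyId": kms_key_id,              # key used to encrypt the storage volume
        "RootAccess": "Disabled",            # users cannot sudo to root
        "SubnetId": subnet_id,               # private subnet in your own VPC
        "SecurityGroupIds": security_group_ids,
        "DirectInternetAccess": "Disabled",  # assumption: traffic goes via the VPC only
    }


def isolated_model_params(
    name: str, role_arn: str, image_uri: str, model_data_url: str
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateModel request with network isolation."""
    return {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "EnableNetworkIsolation": True,      # container cannot make outbound calls
    }


def create_resources(notebook_params: Dict[str, Any], model_params: Dict[str, Any]) -> None:
    """Send both requests; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders above work without AWS access

    client = boto3.client("sagemaker")
    client.create_notebook_instance(**notebook_params)
    client.create_model(**model_params)
```

Keeping the parameter builders separate from the API calls makes it possible to review or unit test the security-relevant settings without touching an AWS account. The same RootAccess value can also be passed to update_notebook_instance for an existing instance.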

In this section, you learned several ways in which you can secure your SageMaker notebooks. In the next section, you will learn about SageMaker Debugger.