In this article, we are going to explore Amazon S3 and how to connect to it from Python. Amazon is one of the largest cloud providers, along with Microsoft Azure and Google Cloud Platform.
Amazon Simple Storage Service (Amazon S3) is a storage service that enables users to store any kind of file. It provides object storage through a web service interface. AWS S3 has impressive availability and durability, making it the standard way to store videos, images, and data.
Storage units in Amazon S3 are known as buckets. A bucket is synonymous with a root directory. Inside a bucket, you can store files under keys whose prefixes act like subdirectories. All the directories and files are considered objects within the S3 ecosystem.
You can use any of the following methods to create an S3 bucket.
- Using the UI – logging in to AWS Console and creating from the S3 dashboard
- Using the CLI – AWS provides a command-line interface (awscli) that can be used to create an S3 bucket
- Using an SDK – Amazon provides SDKs for most of the popular programming languages
- Using the REST APIs – Amazon provides REST APIs that you can connect to using any programming language
Objects or items that are stored using the AWS CLI or the REST APIs are limited to 5 TB in size, with 2 KB of metadata information.
In this article we will focus on using the Amazon SDK for Python (Boto3) to perform operations on S3.
Boto3 is the name of the Python SDK for AWS. It allows you to directly create, update, and delete AWS resources from your Python scripts.
Prerequisites
To follow along, ensure you have the following:
- A valid AWS account with permissions to create S3 buckets
- Python 3 with the pip package manager installed
- A text editor of your choice
Table of contents
- Setting up the Python environment
- Creating an S3 bucket
- Listing buckets
- Uploading a file to the bucket
- Downloading a file
- Copying an object between buckets
- Deleting an object
1. Setting up the Python environment
The recommended approach is to run the script in an isolated environment where we can install script-specific requirements. We will use Python's virtualenv to create an isolated environment for our script.
Using Python 3, let's create a virtualenv with this command:
python -m venv s3stuff
Activate the virtualenv:
source s3stuff/bin/activate
Now let's install the boto3 package, which provides the SDK we will use to connect to AWS:
pip install boto3
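You can confirm the installation by printing the installed Boto3 version:
python -c "import boto3; print(boto3.__version__)"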
Now that the dependencies are set up, let’s create a script that will have the functions necessary to do the S3 operations we want.
To make it run against your AWS account, you'll need to provide some valid credentials. If you already have an IAM user with full permissions to S3, you can use that user's credentials (the access key ID and the secret access key) without needing to create a new user. Otherwise, the easiest way is to create a new AWS user and then store the new credentials.
To create a new user, head to the AWS IAM console, then Users, and click Add user. Give the user a name and enable programmatic access. This ensures that the user will be able to work with any AWS-supported SDK or make separate API calls. For the new user, choose to attach an existing policy, then attach the AmazonS3FullAccess policy.
A new screen will show you the user’s generated credentials. Click on the Download .csv button to make a copy of the credentials. You will need them to complete your setup.
Now that you have your new user, create a new file, ~/.aws/credentials:
vim ~/.aws/credentials
Add this content to the file:
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
region = YOUR_PREFERRED_REGION
This will create a default profile, which Boto3 will use to interact with your AWS account.
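If you keep multiple profiles in that file, you can also point Boto3 at a specific one through a session. A minimal sketch; the profile name s3stuff is a hypothetical example:
import boto3

# Use a named profile from ~/.aws/credentials instead of the default one
session = boto3.Session(profile_name='s3stuff')
s3_client = session.client('s3')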
Client Versus Resource
Boto3 calls AWS APIs on your behalf. Boto3 offers two distinct ways of accessing these abstracted APIs:
- Client – for low-level service access
- Resource – for higher-level object-oriented service access
You can use either to interact with S3.
To connect to the low-level client interface, you must use Boto3's client() function. You then pass in the name of the service you want to connect to, in this case s3:
import boto3
s3_client = boto3.client('s3')
To connect to the high-level interface, you'll follow a similar approach, but use resource():
import boto3
s3_resource = boto3.resource('s3')
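Note that a resource carries its own low-level client, which you can reach via meta.client if you ever need an operation the resource interface does not expose:
import boto3

s3_resource = boto3.resource('s3')
# The resource exposes its underlying low-level client
low_level_client = s3_resource.meta.client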
Common Operations
2. Creating an S3 bucket
Before proceeding, we need a bucket that we can use to store our content.
This function creates an S3 bucket given a bucket name and an optional region. If the region is not supplied, it will default to us-east-1 – the default AWS region. The function returns True on success and False on failure, logging the error:
# Imports needed by this and the following snippets
import logging
import boto3
from botocore.exceptions import ClientError

def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region

    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).

    :param bucket_name: Bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """
    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
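A quick usage sketch; the bucket name below is a hypothetical placeholder, and S3 bucket names must be globally unique:
# 'citizix-demo-bucket' is a placeholder – pick your own, globally unique name
if create_bucket('citizix-demo-bucket', region='us-west-2'):
    print('Bucket created successfully')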
3. Listing buckets
This function retrieves the buckets in our account. After retrieving them, we loop through each bucket and print out its name:
def list_buckets():
    # Retrieve the list of existing buckets
    s3 = boto3.client('s3')
    response = s3.list_buckets()

    # Output the bucket names
    print('Existing buckets:')
    for bucket in response['Buckets']:
        print(f'  {bucket["Name"]}')
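If you prefer the resource interface, the same listing can be done through its buckets collection:
# List buckets through the high-level resource interface
for bucket in s3_resource.buckets.all():
    print(bucket.name)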
4. Uploading a file to the bucket
This function uploads a file to the S3 bucket, returning True if it succeeds:
import os

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = os.path.basename(file_name)

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
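A usage sketch, assuming a hypothetical local file report.csv and the bucket created earlier:
# Upload under the file's own name
upload_file('report.csv', 'citizix-demo-bucket')

# Or upload under a different object key
upload_file('report.csv', 'citizix-demo-bucket', object_name='backups/report.csv')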
We can also upload the file as binary data. The upload_fileobj method accepts a readable file-like object, which must be opened in binary mode, not text mode:
def upload_file_binary(file_name, bucket):
    # Open the file in binary mode and stream it under a 'files/' prefix
    s3 = boto3.client('s3')
    try:
        with open(file_name, "rb") as f:
            s3.upload_fileobj(f, bucket, "files/" + file_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
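Both upload_file and upload_fileobj also accept an optional ExtraArgs dictionary, useful for setting object metadata such as the content type. A sketch with hypothetical file and bucket names:
s3_client = boto3.client('s3')
# Set the content type of the uploaded object via ExtraArgs
s3_client.upload_file('photo.png', 'citizix-demo-bucket', 'photo.png',
                      ExtraArgs={'ContentType': 'image/png'})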
5. Downloading a File
The download_file method allows you to download a file. The Filename parameter maps to your desired local path; this snippet will download the file to the /home/citizix directory:
s3_resource.Object(
    bucket_name,
    file_name
).download_file(
    f'/home/citizix/{file_name}'
)
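The low-level client exposes the same operation directly, if you are working with the client interface instead:
# Equivalent download using the client interface
s3_client = boto3.client('s3')
s3_client.download_file(bucket_name, file_name, f'/home/citizix/{file_name}')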
6. Copying an Object Between Buckets
The boto3 library offers the .copy() method, which allows you to copy a file from one bucket to the other:
def copy_to_bucket(from_bucket, to_bucket, file_name):
    """Copy a file from one bucket to the other

    :param from_bucket: Source bucket
    :param to_bucket: Destination bucket
    :param file_name: Key of the object to copy
    """
    copy_source = {
        'Bucket': from_bucket,
        'Key': file_name
    }
    s3_resource.Object(to_bucket, file_name).copy(copy_source)
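A usage sketch with hypothetical bucket and key names:
copy_to_bucket('citizix-demo-bucket', 'citizix-backup-bucket', 'report.csv')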
7. Deleting an Object
The .delete() method allows you to delete a file:
s3_resource.Object(bucket_name, file_name).delete()
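The equivalent low-level call, if you are using the client interface, is delete_object:
# Equivalent delete using the client interface
s3_client = boto3.client('s3')
s3_client.delete_object(Bucket=bucket_name, Key=file_name)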
Conclusion
Up to this point, you can perform basic bucket and object operations using the Python SDK, boto3. You're now equipped to start working programmatically with S3.