Using AWS S3 Programmatically with Python and the Boto3 SDK

In this article, we are going to explore Amazon S3 and how to connect to it from Python. Amazon Web Services (AWS) is one of the largest cloud providers, along with Microsoft Azure and Google Cloud Platform.

Amazon Simple Storage Service (Amazon S3) is an object storage service that lets users store any kind of data through a web service interface. S3 offers impressive availability and durability, making it a standard way to store videos, images, and other data.

Storage units in Amazon S3 are known as buckets. A bucket is analogous to a root directory. Inside a bucket you store files as objects; S3 has no real subdirectories, but key prefixes containing slashes (for example data/report.csv) simulate a directory hierarchy. Everything stored within the S3 ecosystem is treated as an object.

You can use any of the following methods to create an S3 bucket.

  • Using the UI – log in to the AWS Console and create a bucket from the S3 dashboard
  • Using the CLI – AWS provides a command line interface (awscli) that can be used to create an S3 bucket
  • Using an SDK – Amazon provides SDKs for most popular programming languages
  • Using the REST APIs – Amazon provides REST APIs that you can call from any programming language

Objects stored in S3, whether via the CLI, SDKs, or REST APIs, are limited to 5 TB in size, with up to 2 KB of user-defined metadata per object.

In this article, we will focus on using the Amazon SDK for Python (Boto3) to perform operations against S3.
Boto3 is the official Python SDK for AWS. It allows you to create, update, and delete AWS resources directly from your Python scripts.

Prerequisites

To follow along, ensure you have the following:

  • A valid AWS account with permissions to create S3 buckets
  • Python 3 with pip package manager installed
  • Text editor of your choice

Table of contents

  1. Setting up the Python environment
  2. Creating an S3 bucket
  3. Listing buckets
  4. Uploading a file to the bucket
  5. Downloading a File
  6. Copying an Object Between Buckets
  7. Deleting an Object

1. Setting up the Python environment

The recommended approach is to run the script in an isolated environment where we can install script-specific requirements. We will use Python’s built-in venv module to create an isolated environment for our script.

Using Python 3, let’s create a virtual environment with this command:

python -m venv s3stuff

Activate the virtual environment:

source s3stuff/bin/activate

Now install the boto3 package, which provides the SDK we will use to connect to AWS:

pip install boto3
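
As a quick sanity check, you can confirm that the package imports cleanly and print its version (any recent release works for the examples in this article):

import boto3
print(boto3.__version__)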

Now that the dependencies are set up, let’s create a script containing the functions needed to perform the S3 operations we want.

To make it run against your AWS account, you’ll need to provide some valid credentials. If you already have an IAM user with full permissions to S3, you can use that user’s credentials (the access key ID and secret access key) without creating a new user. Otherwise, the easiest way is to create a new AWS user and store the new credentials.

To create a new user, head to the AWS IAM console, go to Users, and click Add user. Give the user a name and enable programmatic access. This ensures the user can work with any AWS-supported SDK or make API calls directly. For the new user, choose to attach an existing policy, then attach the AmazonS3FullAccess policy.

A new screen will show you the user’s generated credentials. Click on the Download .csv button to make a copy of the credentials. You will need them to complete your setup.

Now that you have your new user, create a new file, ~/.aws/credentials:

vim ~/.aws/credentials

Add this content to the file:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
region = YOUR_PREFERRED_REGION

This will create a default profile, which will be used by Boto3 to interact with your AWS account.
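
To verify that Boto3 is picking up the profile, a minimal sanity check (assuming the credentials are valid) is to call the STS GetCallerIdentity API, which returns the account and identity the credentials belong to:

import boto3

# Uses the [default] profile from ~/.aws/credentials
session = boto3.Session(profile_name='default')
identity = session.client('sts').get_caller_identity()
print(identity['Account'], identity['Arn'])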

Client Versus Resource

Boto3 calls AWS APIs on your behalf. Boto3 offers two distinct ways of accessing these abstracted APIs:

  • Client for low-level service access
  • Resource for higher-level object-oriented service access

You can use either to interact with S3.

To connect to the low-level client interface, you must use Boto3’s client(). You then pass in the name of the service you want to connect to, in this case, s3:

import boto3
s3_client = boto3.client('s3')

To connect to the high-level interface, you’ll follow a similar approach, but use resource():

import boto3
s3_resource = boto3.resource('s3')
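
As a quick illustration of the object-oriented style, the resource interface exposes buckets as Python objects that you can iterate over directly:

# Each item is a Bucket object with attributes like .name
for bucket in s3_resource.buckets.all():
    print(bucket.name)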

Common Operations

2. Creating an S3 bucket

Before proceeding, we need a bucket that we can use to store our content.

This function creates an S3 bucket given a bucket name and an optional region. If no region is supplied, the bucket is created in us-east-1, the default AWS region. The function returns True on success; on failure it logs the error and returns False:

import logging
import os  # used by the upload helper further below

import boto3
from botocore.exceptions import ClientError


def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region

    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).

    :param bucket_name: Bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """

    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
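
For example, to create a bucket in us-west-2 (the bucket name below is a placeholder; S3 bucket names must be globally unique):

if create_bucket('my-example-bucket-12345', region='us-west-2'):
    print('Bucket created')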

3. Listing buckets

This function retrieves the buckets in our account, then loops through them and prints each bucket’s name:

def list_buckets():
    # Retrieve the list of existing buckets
    s3 = boto3.client('s3')
    response = s3.list_buckets()

    # Output the bucket names
    print('Existing buckets:')
    for bucket in response['Buckets']:
        print(f'  {bucket["Name"]}')

4. Uploading a file to the bucket

This function uploads a file to the S3 bucket, returning True if it succeeds:

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = os.path.basename(file_name)

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
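
For example, to upload a local file report.csv (a hypothetical file name) under the key data/report.csv:

upload_file('report.csv', 'my-example-bucket-12345', 'data/report.csv')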

We can also upload a file from an open binary stream. The upload_fileobj method accepts a readable file-like object, which must be opened in binary mode, not text mode. Here the upload is wrapped in a small helper that stores the file under a files/ prefix:

def upload_file_obj(file_name, bucket):
    s3 = boto3.client('s3')
    try:
        with open(file_name, "rb") as f:
            s3.upload_fileobj(f, bucket, "files/" + file_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
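
Both upload_file and upload_fileobj also accept an optional ExtraArgs dictionary for setting object properties on upload. For example, to set the content type (assuming a CSV file):

s3_client.upload_file(
    file_name, bucket, object_name,
    ExtraArgs={'ContentType': 'text/csv'}  # stored as the object's Content-Type
)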

5. Downloading a File

The download_file method allows you to download a file. The Filename parameter maps to your desired local path; this snippet downloads the file to the /home/citizix directory:

s3_resource.Object(
    bucket_name,
    file_name
).download_file(
    f'/home/citizix/{file_name}'
)
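
The same operation is available on the low-level client, if you prefer that interface:

# Equivalent download using the client interface
s3_client.download_file(bucket_name, file_name, f'/home/citizix/{file_name}')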

6. Copying an Object Between Buckets

The boto3 library offers the .copy() method, which allows you to copy a file from one bucket to another:

def copy_to_bucket(from_bucket, to_bucket, file_name):
    """Copy a file from one bucket to the other

    :param from_bucket: Source bucket
    :param to_bucket: Destination bucket
    :param file_name: Key of the object to copy
    """
    copy_source = {
        'Bucket': from_bucket,
        'Key': file_name
    }
    s3_resource.Object(to_bucket, file_name).copy(copy_source)
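
For example (both bucket names are placeholders, and both buckets must already exist and be accessible with your credentials):

copy_to_bucket('my-source-bucket', 'my-destination-bucket', 'data/report.csv')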

7. Deleting an Object

The .delete() method allows you to delete a file:

s3_resource.Object(bucket_name, file_name).delete()
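
If you also want to remove the bucket itself when you are done, note that a bucket must be empty before it can be deleted. A minimal sketch using the resource interface:

bucket = s3_resource.Bucket(bucket_name)
bucket.objects.all().delete()  # delete every object in the bucket first
bucket.delete()                # then delete the now-empty bucket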

Conclusion

At this point, you can perform basic bucket and object operations programmatically using the Python SDK, boto3. You’re now equipped to start working with S3 from your own scripts.
