In today’s article, we will learn how to read files present in S3 buckets from our virtual EC2 instance using Python programming.
Let’s get started 🙂
What is Amazon S3?
S3, also known as Simple Storage Service, is object-level storage provided by AWS. With S3, we can store and retrieve any amount of data (structured or unstructured). In S3, we create buckets, which serve as containers for multiple objects.
Each object in an S3 bucket is assigned a unique URL.
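The URL pattern mentioned above can be sketched in a few lines of Python. The bucket, region, and key names here are made-up examples, not part of this tutorial:

```python
# Sketch: how an S3 object's virtual-hosted-style URL is composed.
# "my-demo-bucket", "us-east-1", and "data/salary.csv" are placeholder examples.
def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build the public URL for an object from its bucket, region, and key."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

print(s3_object_url("my-demo-bucket", "us-east-1", "data/salary.csv"))
# https://my-demo-bucket.s3.us-east-1.amazonaws.com/data/salary.csv
```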
Prerequisites –
- Download and install Anaconda Navigator.
- Launch a Linux based EC2 instance. You can refer to: Web-Page-Deployment-on-AWS-EC2
- Create an IAM role that gives your virtual machine full access to S3, and attach the role to your EC2 instance. If you get stuck, try checking: Assigning-IAM-Role-to-AWS-EC2
After completing the given prerequisites, you will be good to go forward with this tutorial where you will –
- Create an S3 bucket.
- Upload files to your bucket.
- Configure your AWS credentials on Anaconda Prompt.
- Run a Python program on your virtual EC2 machine to view your uploaded files.
Let’s preview what we’ll be exploring today:
a) Create an Amazon S3 bucket
Follow the given steps for creating an S3 bucket-
i) Sign in to the AWS Management Console and open the Amazon S3 console.
ii) Choose “Create bucket.”
- Under “Bucket name”, enter a name for your bucket. It must be unique across the whole of Amazon S3.
- In Region, choose the AWS Region where you want the bucket to reside.
iii) Choose “Next” and go to “configure options”. Here, keep all the values as default and choose “Next.”
iv) Under “set permissions” un-check the box which says “block all public access”. Check the box which asks for your acknowledgment.
v) Choose “Next” and review the information of your bucket. After verifying, choose “Create Bucket”.
vi) Your bucket is created. Click on your bucket name and then choose “permissions”. Make sure “Block all public access” is unchecked and your bucket can be publicly accessed.
Yeah!! Bucket is successfully created.
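For reference, the same bucket can also be created from Python with boto3. This is a minimal sketch, assuming boto3 is installed and credentials are configured; "my-unique-bucket-name" is a placeholder:

```python
# Sketch: creating the bucket with boto3 instead of the console.
def create_bucket_args(name: str, region: str) -> dict:
    """Build kwargs for s3.create_bucket: us-east-1 must omit the
    LocationConstraint, while every other region must include it."""
    args = {"Bucket": name}
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args

if __name__ == "__main__":
    import boto3  # actually creating the bucket needs valid AWS credentials
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(**create_bucket_args("my-unique-bucket-name", "us-east-1"))
```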
b) Upload files to S3 Bucket
Step-1: Click on your bucket name and choose “Overview”.
Step-2: Choose “Upload” and select the files (e.g., a .csv file) from your PC that you wish to upload.
Step-3: Choose “upload” and your file will be successfully uploaded to your bucket.
We took a Salary Data .csv file.
Our object/file is now stored in our bucket.
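The console upload above can also be done from Python with boto3's upload_file. A minimal sketch, where the bucket name is a placeholder:

```python
import os

# Sketch: uploading a local file with boto3 instead of the console.
def upload_args(local_path: str, bucket: str, key=None) -> dict:
    """Default the object key to the file's base name, mirroring a console upload."""
    return {
        "Filename": local_path,
        "Bucket": bucket,
        "Key": key or os.path.basename(local_path),
    }

if __name__ == "__main__":
    import boto3  # requires the IAM role / credentials set up earlier
    boto3.client("s3").upload_file(**upload_args("Salary_Data.csv", "my-unique-bucket-name"))
```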
c) Configure your AWS credentials on Anaconda Prompt
Moving forward, you need to configure your AWS credentials, i.e. your Access Key, Secret Access Key, etc. Follow the steps below:
i) Open Anaconda Prompt as an administrator.
ii) If pip is not installed, make sure to install it first (in Anaconda Prompt, one way is conda install pip).
iii) To start working on AWS with anaconda, you need to install awscli with the command-
pip install awscli
iv) Type “aws configure” in the prompt and press Enter. You will be asked for your AWS Access Key ID. To find your Access Key and Secret Access Key –
- Choose “My Security Credentials” from your AWS account as shown.
- Choose “Access Keys” and then “Create new access key”.
- Download your access keys.
v) Enter your AWS Secret Access Key. Enter your default region name (the region in which your EC2 instance is launched).
vi) Enter your default output format as shown in the figure (without square brackets).
We came so far 🌸. Let’s jump on the last rock and retrieve S3 data 👇
d) Launch EC2 Instance to view S3 Content using Python
Step-1: Connect to your EC2 instance using PuTTY. If you are unsure how to do this, refer to Web-Page-Deployment-on-AWS-EC2
Step-2: Elevate your privileges and become the root user with the following command.
sudo su -
Step-3: Install Python 3 on your virtual machine.
yum install python3
If the python3 package isn’t available, use the command below first.
sudo yum install python34 python34-pip
Step-4: Install pip to your virtual EC2 instance.
yum install python3-pip
Step-5: Install the libraries “pandas”, “s3fs” and “boto3” as we will need them further.
pip3 install pandas
pip3 install s3fs
pip3 install boto3
Step-6: Type python3 to open the Python interpreter. Run the following commands on your virtual machine –
import pandas as pd
import boto3
client = boto3.client('s3')
path = 's3://Your_bucket_name/Your_file_name'
df = pd.read_csv(path)
df.head()
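The s3:// path works because s3fs lets pandas read directly from S3. As an alternative sketch that skips the s3fs dependency, you can fetch the object with boto3 yourself and hand the bytes to pandas (bucket and key names remain placeholders):

```python
import io

# Alternative sketch: download the object with boto3, then parse it with pandas,
# avoiding the s3fs dependency.
def csv_bytes_to_df(raw: bytes):
    """Parse raw CSV bytes (e.g., an S3 object body) into a DataFrame."""
    import pandas as pd
    return pd.read_csv(io.BytesIO(raw))

if __name__ == "__main__":
    import boto3  # requires the IAM role / credentials set up earlier
    obj = boto3.client("s3").get_object(Bucket="Your_bucket_name", Key="Your_file_name")
    df = csv_bytes_to_df(obj["Body"].read())
    print(df.head())
```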
Congrats, we have successfully managed to view S3 bucket files from an EC2 instance using Python programming.
Class is over.
Go, grab a coffee now and create your bucket 😛
We would be glad if you could leave some claps here. Let us know your suggestions or any issues you encountered. The comment section is all yours!!
– Rishita Anand Sachdeva