Objective:

  • Create a Lambda function.
  • Trigger the Lambda function on upload of a CSV file to an S3 bucket.
  • Read the CSV data and insert it into an RDS MySQL database.
  • After data ingestion, remove the file from the S3 bucket.

Prerequisite:

i) Create IAM Role:

Create an IAM role for the Lambda function with the help of the screenshot below.
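The screenshot is not reproduced here. As a rough sketch, the role needs a trust relationship with lambda.amazonaws.com, permission to read and delete objects in your bucket, and permission to write CloudWatch logs. A minimal policy, assuming a hypothetical bucket name my-csv-ingest-bucket, might look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-csv-ingest-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    }
  ]
}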

ii) CSV File Model:

    Create a CSV file with the template below for this use case.
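The template image is not reproduced here. A minimal example matching what the Lambda code below expects (two comma-separated columns, with the second column used as the Name, and no header row) could be:

1,John
2,Priya
3,Ahmed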

iii) Lambda Function Zip File:

Create a zip file containing the Lambda function (lambda_function.py). The function reads the CSV file from the S3 bucket, inserts the data into the MySQL database, and, after processing the data, removes the CSV file from the bucket. The steps to create the Lambda function Python file are below.

After creating the Python file, zip the whole directory (including dependencies such as pymysql) into a single file. This zip file will be used to upload the Lambda function.
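The packaging commands are not shown here; one common approach, assuming a working directory that contains lambda_function.py, is to install pymysql next to the code and zip everything together:

pip install pymysql -t .
zip -r lambda_function.zip .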

lambda_function.py Code:

import json
import urllib.parse
import boto3
import pymysql

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Read the bucket name and object key from the S3 event record
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
    s3_file_name = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)

    data = resp['Body'].read().decode('utf-8')
    rows = data.split("\n")

    rds_endpoint = ""  # RDS MySQL endpoint
    username = ""      # username for RDS MySQL
    password = ""      # RDS MySQL password
    db_name = ""       # RDS MySQL DB name
    try:
        conn = pymysql.connect(host=rds_endpoint, user=username, passwd=password, db=db_name, connect_timeout=5)
    except pymysql.MySQLError as e:
        print("ERROR: Unexpected error: Could not connect to MySQL instance.")
        raise e

    # Create the table on the first invocation; subsequent runs are a no-op
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS s3databseori (id INT NOT NULL AUTO_INCREMENT, Name varchar(255) NOT NULL, PRIMARY KEY (id))")  # you can change the table cols and type
        conn.commit()

    with conn.cursor() as cur:
        for row in rows:  # Iterate over the S3 csv file content and insert into the MySQL database
            fields = row.strip().split(",")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            print(">>>>>>>" + str(fields))
            # Parameterized query avoids SQL injection from CSV content
            cur.execute("INSERT INTO s3databseori (Name) VALUES (%s)", (fields[1],))
        conn.commit()
        cur.execute("SELECT COUNT(*) FROM s3databseori")
        print("Row count after ingestion: " + str(cur.fetchone()[0]))

    print("Deleting the csv file from the S3 bucket")
    try:
        s3_client.delete_object(Bucket=bucket_name, Key=s3_file_name)
    except Exception as e:
        print(e)
        raise e

    conn.close()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
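To test the handler from the Lambda console without uploading a file, you can use a minimal S3 test event; the bucket and key below are hypothetical placeholders and include only the fields the handler actually reads:

{
  "Records": [
    {
      "s3": {
        "bucket": { "name": "my-csv-ingest-bucket" },
        "object": { "key": "employees.csv" }
      }
    }
  ]
}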

Step 1: Create RDS MySQL Database

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
  2. In the upper-right corner of the Amazon RDS console, choose the AWS Region in which you want to create the DB instance.
  3. In the navigation pane, choose Databases.
  4. Choose Create database.
  5. In Choose a database creation method, select Standard Create.
  6. In Engine options, choose MySQL as the engine type.
  7. For Version, choose the engine version.
  8. In Templates, choose the template that matches your use case.
  9. Settings for the master password:
    • In the Settings section, open Credential Settings.
    • Clear the Auto-generate a password check box.
    • (Optional) Change the Master username value, then enter the same password in Master password and Confirm password.
  10. For DB Instance Size, keep the default configuration.
  11. In the Connectivity section, keep the default configuration, but change Public access to Yes.
  12. Enter the database name under the "Additional Configuration" section.
  13. Disable "Maintenance and Deletion Protection" if it is not required for a specific purpose; for a development database this option is not recommended.
  14. Choose Create database.
  15. After the database is created, go to the security group attached to the RDS instance and make sure it allows inbound traffic on port 3306.
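Before wiring up the Lambda function, it can help to confirm that the instance is reachable from your machine; below is a quick connectivity-check sketch with pymysql, where the endpoint, credentials, and database name are placeholders for your own values:

import pymysql

# Placeholders: replace with your own RDS endpoint, credentials, and DB name
conn = pymysql.connect(host="your-instance.xxxxxx.us-east-1.rds.amazonaws.com",
                       user="admin", passwd="your-password", db="s3tords",
                       connect_timeout=5)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")  # simple round-trip check
    print(cur.fetchone())
conn.close()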

Using the third-party MySQL Workbench software, we can connect to the RDS database to check the values inserted into the tables; this is done in the last step.

Step 2: Create S3 Bucket

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. Choose Create bucket. The Create bucket page opens.
  3. In Bucket name, enter a DNS-compliant name for your bucket.
  4. The bucket name must:
    • Be unique across all of Amazon S3.
    • Be between 3 and 63 characters long.
    • Not contain uppercase characters.
    • Start with a lowercase letter or number.

Step 3: Create Lambda Function

  1. Visit the Lambda service page from the web console.
  2. Select Create function:
    • Select Author from scratch.
    • Provide a function name and select the runtime: Python 3.7.
    • Select the "Use an existing role" option and pick the role that was created in the prerequisite step.
    • Click on Create function.
  3. After creating the Lambda function, click on the "Add trigger" button.
  4. Under Add trigger:
    • Select S3; under the trigger configuration, select the S3 bucket created in the previous step.
    • Event type: All object create events
    • Suffix: .csv
    • Click on Add.
  The S3 trigger has now been added.
  5. Upload the Lambda function code zip file created in the prerequisite step. The Lambda function Python file, along with the required packages, will be loaded into the Lambda function from the zip.
  6. Update the "rds_endpoint" (for the MySQL database) in the Lambda function Python file with the endpoint of the RDS instance created in Step 1.
  7. Enter the username, password, and database name of the RDS MySQL database.
  8. Click on Deploy.

Step 4: Upload a CSV File into the S3 Bucket

  1. From the S3 dashboard, open your bucket and click on the Properties tab.
  2. Scroll down the page to "Event Notifications" and check whether the Lambda function created in the previous step is configured with your S3 bucket.
  3. Come back to the Objects tab and click on Upload.
  4. A dialog box will pop up; select the CSV file (created in the prerequisite step) and upload it into the bucket. Alternatively, you can upload from code, as shown in the sketch after this list.
  5. After the data is processed, the file will be removed from the bucket.
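If you prefer to trigger the pipeline from code rather than the console, a small boto3 sketch can perform the upload; the file and bucket names below are hypothetical placeholders:

import boto3

s3 = boto3.client('s3')
# Uploading an object with a .csv suffix fires the Lambda trigger from Step 3
s3.upload_file('employees.csv', 'my-csv-ingest-bucket', 'employees.csv')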

Step 5: CloudWatch for Lambda Function Logs

  1. Open the CloudWatch service page and, from the left-hand side, select "Log groups".
  2. From the log groups, select the one for the Lambda function that is configured with the S3 bucket.
  3. Select the latest log stream and verify the data printed in the logs.
  4. Once the CSV file data has been printed in the logs, verify that the file is no longer present in the S3 bucket.

Step 6: Connecting to MySQL RDS with MySQL Workbench

  1. Open the MySQL Workbench software.
  2. Click on Database, then Connect to Database.
  3. In the Connect to Database dialog box, provide the information from the RDS MySQL database that was created:
    • Hostname: RDS endpoint
    • Username: MySQL username
    • Password: click on "Store in Vault" and provide the password.
  4. Enter the username and password of the RDS MySQL database when prompted.
  5. After connecting to the database, verify the database name on the top left side of the window under "SCHEMAS" (in this use case, "s3tords" is the database name) and select it.
  6. Verify the generated tables under the selected database.



Select the table in which you want to check the rows, and run a query to list all the rows in the table. From the Result Grid we can verify that all the rows from the CSV file were inserted into the MySQL database; see the example query below.
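For example, to list everything the Lambda function inserted into the table it created:

SELECT * FROM s3databseori;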

Kindly comment here or mail us at support@data-stats.com and edugenixcloud@gmail.com in case of any doubts. We will get back to you as soon as possible.

