Calculating S3 data size with AWS Lambda (Python)
Inspiration for this story

While working on a task to calculate daily S3 usage per prefix (we store different types of files: videos, images, CSV files, and JSON files), I wrote a Python script using Boto3 that takes a prefix path, lists every object under it, and adds up all the size values to get the total. There is no direct way to read a whole folder's size from Boto3 in a single call. On EC2 everything went well: it didn't matter how long it took to compute the total, because all the microservices ran there without a time limit.

Problem:

The problem appeared when we migrated from EC2 to Lambda. Lambda has a short execution limit (15 minutes), and within that time I was unable to fetch all the objects and calculate the total size.

Solution:

Using the AWS CLI we can get the total size (even tens of GB) within 1-3 minutes, but that only worked on EC2.

Finally, I found a solution: package the AWS CLI as a Lambda layer.

AWS-CLI Lambda layer

  • Clone this repository

  • For Python 3, check out the awscli-v2-python37 branch

  • Make the required changes in the Makefile and save it

  • Run the command make build layer-zip layer-upload layer-publish

  • Copy the created layers.zip to an S3 bucket and paste the bucket URL into the layer configuration

  • Add this layer to the Lambda function and use the code below

import subprocess

def lambda_handler(event, context):
    # The AWS CLI from the layer is available under /opt/awscli/aws;
    # pass the command as a list of arguments, not one quoted string
    cmd = [
        '/opt/awscli/aws', 's3', 'ls',
        's3://path_to_bucket/folder_videos',
        '--recursive', '--human-readable', '--summarize',
    ]
    out = calc_size(cmd)
    return {
        'statusCode': 200,
        'body': out
    }

def calc_size(cmd: list):
    # text=True decodes the CLI output to str so it can be split into lines
    response = subprocess.check_output(cmd, text=True)
    lines = response.split("\n")
    for line in lines:
        print("output: ", line)
    return lines
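If you only need the grand total rather than the per-object listing, the --summarize flag makes the CLI print a "Total Size" line at the end of its output, which can be picked out of the returned lines. The helper below is a hypothetical addition for illustration (the sample lines mimic the CLI's output format):

```python
def parse_total_size(lines):
    # Find the summary line that `aws s3 ls --summarize` appends;
    # with --human-readable it looks like "   Total Size: 1.2 GiB"
    for line in lines:
        if line.strip().startswith("Total Size:"):
            return line.split("Total Size:")[1].strip()
    return None

# Sample output lines in the CLI's format (placeholder values):
sample = [
    "2023-01-01 10:00:00    1.2 GiB folder_videos/a.mp4",
    "Total Objects: 1",
    "   Total Size: 1.2 GiB",
]
print(parse_total_size(sample))  # -> 1.2 GiB
```

Returning just this value instead of every listed object also keeps the Lambda response body small.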