John Cherian – Medium

John Cherian

Pinned

What version of Apache Spark would you like to use on AWS?

In terms of ease of use and granular control, the options follow the sequence Amazon EMR on EKS, EMR on EC2, EMR Serverless and then…

Sep 7, 2022

Amazon Bedrock learning series

Are you ready to dive into the world of Generative AI and transform your career? With the evolving landscape of technology, the demand for…

Jan 4, 2024

Amazon Bedrock: The AWS Lambda of the Generative AI World

Amazon Bedrock emerges as a fully managed service, designed to simplify the development and deployment of generative AI applications…

Oct 10, 2023

Hosting GPTJ-6B on AWS SageMaker: Unleashing the Power of Large Language Models

Introduction

May 10, 2023

Spark Runtime for AWS Lambda

“Outpacing Spark with Pen and Paper: A Data Processing Dilemma.” Have you heard customers claim that manually inserting 10 rows of data is…

Apr 4, 2023

Performance Benchmarking for Pandas on AWS Lambda for CSV files

In this blog, we want to explore the data size limit for CSV format files on AWS S3 for the combination of AWS Wrangler and Pandas on AWS…

Apr 27, 2022

Running data processing containers on AWS Lambda

AWS Lambda supports Docker Container as a Function (CaaF). This adds other dimensions to AWS Lambda: portability, dependency handling and…

Apr 15, 2022

AWS SAM (Serverless Application Model) is an open source framework that enables AWS users to build…

It provides an environment where you can test AWS Lambda locally and fix bugs, saving cost and time. It integrates well with AWS Code…

Apr 13, 2022

Faster Java UDF in PySpark

Using UDFs (user-defined functions) in Spark is probably the last resort for building column-based data processing logic on Spark. The Spark…

Feb 11, 2022

Comparing SerDe Modules in Python

SerDe (serialization and deserialization) is a process in the programming world where we convert an object into byte stream format, so that…

Nov 11, 2021

Preserving the State of Python Object at a point in time

Nov 11, 2021

John Cherian

Data Engineering Architect
