John Cherian

10 Followers


What version of Apache Spark would you like to use on AWS?

My first take on Amazon EMR Serverless. What version of Apache Spark would you like to use on AWS? I faced the same dilemma at the ice cream shop when I was a child. There are 5 ways to run Spark jobs on AWS: AWS Glue, Amazon EMR-EC2, Amazon EMR on…

3 min read


Apr 27, 2022

Performance Benchmarking for Pandas on AWS Lambda for CSV files

In this blog, we explore the data size limit for CSV files on AWS S3 when AWS Wrangler and Pandas are used together on AWS Lambda. The dataset used for the benchmarking was downloaded from the geographical Society. The results of the performance test may vary…

Pandas

4 min read


Apr 15, 2022

Running data processing containers on AWS Lambda

AWS Lambda supports Docker Container as a Function (CaaF). This adds further dimensions to the serverless architecture: portability, dependency handling, and environment setup. CaaF supports use cases that require a standard environment setup, portability, unsupported languages like PHP, or hybrid cloud environments. There exists a…

Docker

4 min read


Apr 13, 2022

AWS SAM (Serverless Application Model) is an open source framework that enables AWS users to build…

AWS SAM (Serverless Application Model) is an open source framework that enables AWS users to build, test locally, and deploy AWS Lambda functions to the cloud. There are many tools to simplify the development of serverless functions: the cross-vendor Serverless Framework, the AWS-specific Serverless Application Model (SAM), and others. The AWS SAM is widely used…

Aws Sam

5 min read


Feb 11, 2022

Faster Java UDF in PySpark

Using UDFs (User Defined Functions) in Spark is probably the last resort for building column-based data processing logic. A Spark UDF is an expensive operation and is used only to extend or fill in missing functionality of Spark methods, or for libraries or frameworks that do not have a Python wrapper…

Java Udf

4 min read


Nov 11, 2021

Comparing SerDe Modules in Python

SerDe (serialization and deserialization) is the process of converting an object into a byte stream and back, so that the object can be reused in the same script or environment or across different ones. …

Python Programming

3 min read


May 5, 2021

Spark on Docker

Hi, I am John Cherian, a data engineering architect consulting for various clients in the DC area. My key area of focus is designing big data and low-latency ETL architectures on cloud platforms like AWS, Azure, and GCP. Recently I noticed that a lot of firms are moving…

Apache Spark

4 min read


Jan 22, 2020

Big data: File Compaction & Maintenance using Apache Spark

Have you heard comments like “throw the files into a filesystem and the big data framework will take care of the rest”, “Spark is 100x faster so it can handle any volume and any dataset”, “Why should I worry about partitioning and sorting in a big data ecosystem? It is an…

Spark

3 min read


Data Engineering Architect
