A Data Engineer's Guide to Amazon EMR

This white paper explains the key architecture components of Amazon Elastic MapReduce (EMR) and provides guides for getting started. Learn the building blocks of distributed workloads on EMR, including ETL with Spark, big data migration, and machine learning.

The authors have contributed best practices and lessons learned from their thousands of hours of combined experience.

About the authors: 

— Sam Portillo, Data Engineer

— Pooja Krishnan, Data Engineer

— Emma York, Data Engineer and Technical Manager

— Rodrigo Moran, Software / Data Engineer 


Our latest news

Articles 11/30/2022

IoT Cloud Infrastructure for the Win, A Startup’s Story of Leveling Up Past PoC

An early-stage startup in the wellness industry looked to Ippon to help drive its product forward. The client offers an immersive meditation experience that combines traditional meditation techniques with the proven benefits of vibration, sound, and light.

Articles 11/24/2022

OpenTelemetry And Friends

OpenTelemetry (OTel) is an open-source initiative to provide a standardised approach for the capture and distribution of metric, trace, and log data from applications. It defines not just APIs and schemas, but also a set of standard vendor-neutral SDKs and monitoring agents to facilitate collection and export.

Articles 11/22/2022

Python in Production Part 1 of 5

Writing Python code is fast and easy, which is why the language has gained immense popularity. Putting Python code into production can also be fast and easy if you follow a few guidelines. In this series of blog posts, I will cover the how and why of several different aspects of writing production-ready Python code.
