A Data Engineer's Guide to Amazon EMR

This white paper explains the key architectural components of Amazon Elastic MapReduce (EMR) and offers guidance for getting started. Learn the building blocks of distributed workloads on EMR, including ETL with Spark, big data migration, and machine learning.

The authors share best practices and lessons learned from thousands of hours of combined experience.

About the authors: 

— Sam Portillo, Data Engineer

— Pooja Krishnan, Data Engineer

— Emma York, Data Engineer and Technical Manager

— Rodrigo Moran, Software / Data Engineer 


