This white paper explains the key architecture components of Amazon Elastic MapReduce (EMR) and provides guides for getting started. Learn the building blocks of distributed workloads on EMR, including ETL with Spark, big data migration, and machine learning.
The authors have contributed best practices and lessons learned from their thousands of hours of combined experience.
About the authors:
— Sam Portillo, Data Engineer
— Pooja Krishnan, Data Engineer
— Emma York, Data Engineer and Technical Manager
— Rodrigo Moran, Software / Data Engineer
In that blog, I briefly examine Snowflake Procedures and discuss when Procedures should be used versus User Defined Functions (UDFs). In the following, I am going to examine Snowflake Procedures further.
In this blog post, I will explain how you can run all of your transformation processes using dbt directly on Airflow and take advantage of all its features. All of the code in this blog post is available at this GitHub repository.
In this guide, we will be building a CLI tool from scratch. No fancy frameworks or libraries -- instead, we are building our own highly minimal framework loosely based on Cobra. Here's a taste of what we're building: 🐟 gupi