If you
are looking to expand your knowledge in data engineering or want to level up
your portfolio by adding Spark programming to your skillset, then you are in
the right place. This course will help you understand Spark programming and
apply that knowledge to build data engineering solutions. This course is
example-driven and follows a working-session format: we will take a live-coding
approach and explain all the concepts needed along the way.
In this course, we will start
with a quick introduction to Apache Spark, then set up our environment by
installing and using Apache Spark. Next, we will learn about the Spark execution
model and architecture, followed by the Spark programming model and developer
experience. We will then cover the foundations of the Spark structured API
before moving on to Spark data sources and sinks. After that, we will work
through Spark DataFrame and Dataset transformations, cover aggregations in
Apache Spark, and finally look at Spark DataFrame joins.
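To give you a sense of where we are heading, here is a minimal, illustrative
PySpark sketch that touches on reading a source, DataFrame transformations, an
aggregation, a join, and writing to a sink. The file paths and column names are
hypothetical and are not taken from the course material.

from pyspark.sql import SparkSession
from pyspark.sql import functions as f

# Create a local Spark session -- the entry point to the structured API
spark = (SparkSession.builder
         .appName("CourseOverviewSketch")
         .master("local[3]")
         .getOrCreate())

# Source: read CSV files into DataFrames (hypothetical paths and columns)
orders_df = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv("data/orders.csv"))
customers_df = (spark.read
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("data/customers.csv"))

# Transformations: filter rows and derive a new column
recent_orders_df = (orders_df
                    .where(f.col("order_date") >= "2021-01-01")
                    .withColumn("amount_usd",
                                f.col("amount") * f.col("exchange_rate")))

# Aggregation: total spend per customer
spend_df = (recent_orders_df
            .groupBy("customer_id")
            .agg(f.sum("amount_usd").alias("total_spend")))

# Join: enrich the aggregate with customer attributes
report_df = spend_df.join(customers_df, "customer_id", "inner")

# Sink: write the result out as Parquet
report_df.write.mode("overwrite").parquet("output/customer_spend")

spark.stop()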
By the end of this course, you
will be able to build data engineering solutions using the Spark structured API
in Python. All the resources for the
course are available at
https://github.com/PacktPublishing/Spark-Programming-in-Python-for-Beginners-with-Apache-Spark-3