
Apache Spark Big Data Boot Camp

3 Days Classroom Session | 3 Days Live Online
Private Onsite Package

This course can be tailored to your needs for private, onsite delivery at your location.


Professional Credits


ASPE is an IIBA Endorsed Education Provider of business analysis training. Select Project Delivery courses offer IIBA continuing development units (CDU) in accordance with IIBA standards.


Select courses offer Leadership (PDU-L), Strategic (PDU-S) and Technical PMI professional development units that vary according to certification. Technical PDUs are available in the following types: ACP, PBA, PfMP, PMP/PgMP, RMP, and SP.


Learn to use Spark for your own applications in three packed hands-on days

This fast-paced 3-day course is for data engineers, data analysts, data scientists, developers and operations teams. It provides a thorough, hands-on overview of the Apache Spark platform and the technologies and paradigms it comprises.

  • We will explore Apache Spark, how it came into existence, and how it compares with Apache Hadoop (currently the de facto big data standard). We will also cover the new use cases that Apache Spark makes possible and how your current use cases can be made more performant and powerful.
  • We will look at Apache Spark’s streaming architecture, which can help meet most of your business’s time-constrained, real-time needs. We will also explore Apache Spark’s SQL architecture, which enables fast migration from slower traditional analytical tools such as Hive to Spark SQL.
  • We will spend some time on Apache Spark MLlib and ML, which provide an integrated architecture for both real-time and batch analytics.
  • Finally, we will look at Apache Spark GraphX, which deals with graph algorithms.

All these workshops are delivered with guided hands-on labs allowing attendees to explore the data and the techniques and familiarize themselves with the various paradigms.

Upcoming Dates and Locations
All Live Online times are listed in Eastern Time.
Oct 21, 2019 – Oct 23, 2019    8:30am – 4:30pm Live Online
Oct 21, 2019 – Oct 23, 2019    8:30am – 4:30pm Austin, Texas

Embassy Suites Austin Central
5901 North IH-35
Frontage Rd
Austin, TX 78723
United States

Dec 9, 2019 – Dec 11, 2019    8:30am – 4:30pm Atlanta, Georgia

Microtek Atlanta
1000 Abernathy Rd. NE Ste 194
Northpark Bldg 400
Atlanta, GA 30328
United States

Jan 6, 2020 – Jan 8, 2020    8:30am – 4:30pm Minneapolis, Minnesota

Embassy Suites Airport
7901 34th Avenue South
Bloomington, MN 55425
United States

Jan 6, 2020 – Jan 8, 2020    9:30am – 5:30pm Live Online
Mar 2, 2020 – Mar 4, 2020    8:30am – 4:30pm Live Online
Mar 2, 2020 – Mar 4, 2020    8:30am – 4:30pm Philadelphia, Pennsylvania

Hyatt Place
440 American Avenue
King Of Prussia, PA 19406
United States

May 4, 2020 – May 6, 2020    8:30am – 4:30pm Live Online
May 4, 2020 – May 6, 2020    8:30am – 4:30pm Washington, District of Columbia

Microtek-Washington, DC
1110 Vermont Avenue NW
Suite 700
Washington, DC 20005
United States

Jul 8, 2020 – Jul 10, 2020    8:30am – 4:30pm Live Online
Jul 8, 2020 – Jul 10, 2020    8:30am – 4:30pm Atlanta, Georgia

Microtek Atlanta
1000 Abernathy Rd. NE Ste 194
Northpark Bldg 400
Atlanta, GA 30328
United States

Sep 1, 2020 – Sep 3, 2020    8:30am – 4:30pm San Francisco, California

Learn IT
33 New Montgomery St.
Suite 300
San Francisco, CA 94105
United States

Sep 1, 2020 – Sep 3, 2020    11:30am – 7:30pm Live Online
Nov 2, 2020 – Nov 4, 2020    8:30am – 4:30pm Chicago, Illinois

Microtek Chicago
230 W. Monroe
Suite 900
Chicago, IL 60606
United States

Nov 2, 2020 – Nov 4, 2020    9:30am – 5:30pm Live Online
Course Outline

Part 1: Introduction to Big Data & Apache Spark

  1. Introduce Data Analysis
  2. Introduce Big Data
  3. Big Data Definition
  4. Introduce the techniques and challenges in Big Data
  5. Introduce the techniques and challenges in Distributed Computing
  6. Show how the functional programming approach is particularly useful in tackling these challenges
  7. Short overview of previous solutions: Google’s MapReduce and Apache Hadoop
  8. Introduce Apache Spark

Hands-on practice: Get exposure to Spark administration and setup

Part 2: Deploying & Understanding Apache Spark Architecture

  1. Spark Architecture in a Cluster
  2. Spark Ecosystem and Cluster Management
  3. Deploying Spark on a Cluster
  4. Deploying Spark on a Standalone Cluster
  5. Deploying Spark on a Mesos Cluster
  6. Deploying Spark on a YARN Cluster
  7. Cloud-based Deployment

Hands-on practice: Learn to deploy and begin using Spark

Part 3: Spark Core, RDDs and Spark Shell

  1. Dig deeper into Apache Spark
  2. Introduce Resilient Distributed Datasets (RDDs)
  3. Apache Spark installation (basic, local)
  4. Introduce the Spark Shell
  5. Actions and Transformations (Laziness)
  6. Caching
  7. Loading and Saving data files from the file system

Hands-on practice: Get hands-on with Spark Core and RDDs

Part 4: Deep Dive into RDD

  1. Tailored RDD
  2. Pair RDD
  3. NewHadoop RDD
  4. Aggregations
  5. Partitioning
  6. Broadcast Variables
  7. Accumulators

Hands-on practice: You’ll learn expanded RDD capabilities

Part 5: Spark SQL and DataFrames

  1. SparkSQL & DataFrames
  2. DataFrame & SQL API
  3. DataFrame Schema
  4. Datasets and Encoders
  5. Loading and Saving data
  6. Aggregations
  7. Joins

Hands-on practice: You’ll learn to use one of Spark’s most powerful features, DataFrames, which offer R-style data modeling backed by cluster computing

Part 6: Spark Streaming

  1. A brief introduction to streaming
  2. Spark Streaming
  3. Discretized Streams
  4. Structured Streaming
  5. Stateful / Stateless Transformations
  6. Checkpointing
  7. Interoperability with Streaming Platforms (Apache Kafka)

Hands-on practice: Work with another of Spark 2.1’s most exciting features: big data streaming, which overcomes the time constraints of previous big data solutions

Part 7: Spark MLlib and ML

  1. Introduction to Machine Learning
  2. Spark Machine Learning APIs
  3. Feature Extraction and Transformation
  4. Classification using Logistic Regression
  5. Best Practices in ML for Practitioners

Hands-on practice: Use Spark to make production-friendly calls to machine learning services for predictive analytics

Part 8: GraphX

  1. Brief Introduction to Graph Theory
  2. GraphX
  3. Vertex and Edge RDDs
  4. Graph operators
  5. Pregel API
  6. PageRank / Travelling Salesman Problem

Hands-on practice: Get hands-on practice using GraphX

Part 9: Testing and Debugging Spark

  1. Testing in a Distributed Environment
  2. Testing Spark Applications
  3. Debugging Spark Applications

Hands-on practice: You’ll get lab practice supporting Spark solutions, applying best practices for testing, debugging, and handling day-to-day production issues

Who should attend
  • Developers and Team Leads
  • Software Engineers
  • Business Analysts
  • System Analysts
  • Data Analysts
  • Data Scientists
  • Operations and DevOps Engineers
  • Java Developers
  • Big Data Engineers

Labs can be accessed by everyone using the cloud environment set up by the instructor. Participation is not mandatory; if they prefer, attendees can simply observe the instructor perform the lab example. Familiarity with Scala or Python is a nice-to-have skill that helps in understanding what is being done in the labs.

Additionally, although it is not mandatory, students who have completed the self-paced Fundamentals of DevOps eLearning course have found it very helpful when completing this course.

