Module 1: Getting Started with HDInsight - This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
- What is Big Data?
- Introduction to Hadoop
- Working with MapReduce functions
- Introducing HDInsight
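
As a rough illustration of the MapReduce paradigm named in Module 1, the sketch below counts words in plain Python. It is not Hadoop or HDInsight code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the map, shuffle, and reduce stages.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit one (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the list of values for one key into a single total.
    return key, sum(values)


def word_count(lines):
    # Run the three stages in sequence on an in-memory list of lines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

On a real cluster the map and reduce stages run in parallel across nodes and the shuffle moves data over the network; this single-process version only shows the data flow.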
Module 2: Deploying HDInsight Clusters - This module provides an overview of the Microsoft Azure HDInsight cluster types, and covers the creation and maintenance of HDInsight clusters.
- Identifying HDInsight cluster types
- Managing HDInsight clusters by using the Azure portal
- Managing HDInsight clusters by using Azure PowerShell
Module 3: Authorizing Users to Access Resources - This module provides an overview of non-domain and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters.
- Non-domain-joined clusters
- Configuring domain-joined HDInsight clusters
- Managing domain-joined HDInsight clusters
Module 4: Loading data into HDInsight - This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage.
- Storing data for HDInsight processing
- Using data loading tools
- Maximizing value from stored data
Module 5: Troubleshooting HDInsight - In this module, you will learn how to interpret logs associated with the various services of the Microsoft Azure HDInsight cluster to troubleshoot any issues you might have with these services.
- Analyze HDInsight logs
- YARN logs
- Heap dumps
- Operations Management Suite
Module 6: Implementing Batch Solutions - In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig.
- Apache Hive storage
- HDInsight data queries using Hive and Pig
- Operationalize HDInsight
Module 7: Design Batch ETL solutions for big data with Spark - This module provides an overview of Apache Spark, describing its main characteristics and key features.
- What is Spark?
- ETL with Spark
- Spark performance
Module 8: Analyze Data with Spark SQL - This module describes how to analyze data by using Spark SQL. After completing it, you will be able to explain the differences between RDDs, Datasets, and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning, and persistence.
- Implementing iterative and interactive queries
- Perform exploratory data analysis
Module 9: Analyze Data with Hive and Phoenix - In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP or Live Long and Process) and Apache Phoenix.
- Implement interactive queries for big data with Interactive Hive
- Perform exploratory data analysis by using Hive
- Perform interactive processing by using Apache Phoenix
Module 10: Stream Analytics - The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it an easy-to-use, flexible stream processing service in the cloud.
- Stream Analytics
- Process streaming data from Stream Analytics
- Managing Stream Analytics jobs
Module 11: Implementing Streaming Solutions with Kafka and HBase - In this module, you will learn how to use Kafka to build streaming solutions.
- Building and Deploying a Kafka Cluster
- Publishing, Consuming, and Processing data using the Kafka Cluster
- Using HBase to Store and Query Data
Module 12: Develop big data real-time processing solutions with Apache Storm - This module explains how to develop big data real-time processing solutions with Apache Storm.
- Persist long term data
- Stream data with Storm
- Create Storm topologies
- Configure Apache Storm
Module 13: Create Spark Streaming Applications - This module describes Spark Streaming; explains how to use discretized streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.
- Working with Spark Streaming
- Creating Spark Structured Streaming Applications
- Persistence and Visualization
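
To make the idea of discretized streams (DStreams) from Module 13 concrete, the sketch below groups timestamped events into fixed-length micro-batches in plain Python. It does not use the Spark Streaming API: `discretize` and its parameters are hypothetical names, and the batching rule (integer division of the timestamp by the batch interval) is an assumption made for illustration.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp_seconds, value) events into micro-batches.

    Each event falls into the batch whose index is the timestamp divided
    by batch_interval, rounded down. Batches are returned in time order;
    intervals that received no events are skipped.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    return [batches[index] for index in sorted(batches)]


if __name__ == "__main__":
    events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (4.0, "d")]
    # With a 2-second interval, each batch can then be processed as a
    # small static dataset, for example by counting its events.
    for batch in discretize(events, 2):
        print(len(batch), batch)
```

A streaming engine applies the same per-batch computation to each interval as it arrives, whereas this sketch operates on a finished list of events.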