Data Analysis and Machine Learning

This course covers several SQL interfaces (e.g., Hive, Drill, Impala, and Spark) used for data exploration, along with the statistical functions used to produce summary information and more advanced analysis. It also explores several open-source tools for scalable, parallel, and distributed machine learning. You will use the KNIME Analytics Platform and SDK, as well as Spark, to train, evaluate, and validate several predictive models, and you will learn to describe and apply basic clustering and classification algorithms.

You'll Walk Away With

  • The ability to use the SQL query tools Hive, Drill, Impala, and Spark to produce summary reports of data
  • Knowledge of how machine learning, including data mining algorithms, applies to data stored in Hadoop
  • The confidence to run classifications using Spark
  • Formal practice in using Naive Bayes and decision trees to assign data to categories or classes
  • The ability to apply association rule learning to find relationships in data
  • The ability to use clustering to discover similarities within data
