
Cloudera Data Science Workbench Training



About This Course

This course introduces Cloudera Data Science Workbench (CDSW) and prepares learners to use it for data science and machine learning workflows with the Python and R languages.

View the full course outline

Payment and Registration

You can purchase this course on its own, or as part of our Full Library subscription.


Course Length

This course includes over 6 hours of video content. Note: To complete the hands-on exercises, students must have access to CDSW through their organization, and that CDSW environment must be running Apache Spark 2.


Course Outline

Through narrated demonstrations and hands-on exercises, learners achieve proficiency in CDSW and develop the skills required to:

  • Navigate CDSW’s options and interfaces with confidence
  • Create projects in CDSW and collaborate securely with other users and teams
  • Develop and run reproducible Python and R code
  • Customize projects by installing packages and setting environment variables
  • Connect to a secure (Kerberized) Cloudera (CDH) or Hortonworks (HDP) cluster
  • Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
  • Perform end-to-end machine learning workflows in CDSW using Python or R (read, inspect, transform, visualize, and model data)
  • Measure, track, and compare machine learning models using CDSW’s Experiments capability
  • Deploy models as REST API endpoints serving predictions using CDSW’s Models capability
  • Work collaboratively using CDSW together with Git
  • Navigate the UI in CDSW 1.6
  • Use a third-party editor with CDSW 1.6
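
As a rough illustration of the model deployment skill listed above, the sketch below shows the kind of plain Python function that could sit behind a REST prediction endpoint. It assumes the deployed function takes a single dictionary of inputs and returns a JSON-serializable result. The function name, the field names, and the coefficients are all hypothetical and are not taken from the course materials.

```python
# Hypothetical sketch: the function name, field names, and coefficients
# are illustrative assumptions, not part of the course or of CDSW itself.

def predict(args):
    """Return a toy linear prediction for a JSON-style input dictionary."""
    # Inputs may arrive as strings or numbers, so coerce to float.
    petal_length = float(args["petal_length"])
    # Made-up coefficients standing in for a trained model.
    petal_width = 0.5 * petal_length - 0.5
    # Return a JSON-serializable dictionary.
    return {"petal_width": petal_width}


if __name__ == "__main__":
    print(predict({"petal_length": 4.0}))
```

In practice the body would load a trained model rather than hard-coded coefficients. Keeping the function's input and output as plain dictionaries makes it easy to test locally before deploying it.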


Audience and Prerequisites

This course is intended for Python and R users at organizations running Cloudera Data Science Workbench (CDSW) under a trial or commercial license. To complete the hands-on exercises, the learner must have access to a CDSW environment on a Cloudera (CDH) or Hortonworks (HDP) cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.