Cloudera Data Science Workbench Training

About This Course

Cloudera Data Science Workbench Training prepares learners to complete exploratory data science and machine learning projects using Cloudera Data Science Workbench (CDSW).

Payment and Registration

You can purchase this course on its own, or as part of our Full Library subscription.

Course Length

This course includes 4.5 hours of video content. Note: To complete the hands-on exercises, learners must have access to CDSW through their organization, and the CDSW environment must be running Apache Spark 2.

Course Outline

Through narrated demonstrations and hands-on exercises, learners gain familiarity with CDSW and develop the skills required to:

  • Navigate CDSW’s options and interfaces with confidence
  • Create projects in CDSW and collaborate securely with other users and teams
  • Develop and run reproducible Python and R code
  • Customize projects by installing packages and setting environment variables
  • Connect to a secure (Kerberized) Cloudera cluster
  • Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
  • Perform full exploratory data science and machine learning workflows in CDSW using Python or R—read, inspect, transform, visualize, and model data
  • Work collaboratively using CDSW together with Git

Audience and Prerequisites

This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. The learner must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.