About This Course
This course introduces Cloudera Data Science Workbench (CDSW) and prepares learners to use it for data science and machine learning workflows with the Python and R languages.
Payment and Registration
You can purchase this course on its own, or as part of our Full Library subscription.
- Purchase this course alone
- Purchase the full OnDemand library (includes courses for developers, administrators, and data analysts)
This course includes over 6 hours of video content. Note: To complete the hands-on exercises, students need access to CDSW through their organization, and the CDSW environment must be running Apache Spark 2.
Through narrated demonstrations and hands-on exercises, learners achieve proficiency in CDSW and develop the skills required to:
- Navigate CDSW’s options and interfaces with confidence
- Create projects in CDSW and collaborate securely with other users and teams
- Develop and run reproducible Python and R code
- Customize projects by installing packages and setting environment variables
- Connect to a secure (Kerberized) Cloudera (CDH) or Hortonworks (HDP) cluster
- Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
- Perform end-to-end machine learning workflows in CDSW using Python or R (read, inspect, transform, visualize, and model data)
- Measure, track, and compare machine learning models using CDSW’s Experiments capability
- Deploy models as REST API endpoints serving predictions using CDSW’s Models capability
- Work collaboratively using CDSW together with Git
- Navigate the UI in CDSW 1.6
- Use a third-party editor with CDSW 1.6
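As a taste of the Models capability listed above: CDSW deploys a named Python (or R) function as a REST endpoint, passing the JSON request body to the function as a dictionary and serializing its return value as the JSON response. The sketch below is illustrative only; the function name, argument names, and the stand-in linear model are hypothetical, not part of the course materials.

```python
# Minimal sketch of a function deployable with CDSW's Models capability.
# CDSW serves the chosen function as a REST API endpoint: the POST body
# {"petal_length": 1.4} arrives as the `args` dict, and the returned dict
# is sent back to the caller as JSON.

def predict(args):
    petal_length = float(args["petal_length"])
    # Stand-in for a trained model; a real project would load a fitted
    # model from the project's file system (e.g. with joblib or pickle).
    sepal_length = 0.9 * petal_length + 4.3
    return {"sepal_length": sepal_length}
```

In the CDSW UI you would point a new Model at the script containing this function and name `predict` as the function to serve; the course's machine learning workflow modules cover the full train-deploy-predict loop.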
Audience and Prerequisites
This course is intended for Python and R users at organizations running Cloudera Data Science Workbench (CDSW) under a trial or commercial license. To complete the hands-on exercises, the learner must have access to a CDSW environment on a Cloudera (CDH) or Hortonworks (HDP) cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.