DS-2002: Data Science Systems

Jon Tupitza
School of Data Science
University of Virginia

Course Schedule

Unit/Topic Topic(s) Labs/Activities
00 Course Intro: Data Engineering Basics An Introduction to Data Engineering
Beginners Guide to Data Engineering 1
Beginners Guide to Data Engineering 2
Beginners Guide to Data Engineering 3
Unit 1 Structured Data Systems (SQL Relational Databases) Activity 01: Installing/Using MySQL & MySQL Workbench
Topic 1 Online Transaction Processing (OLTP) Schema Design Activity 02: Creating and Populating the Northwind Database
Topic 2 SQL Language and Querying Fundamentals Lab 01: SQL Query Fundamentals
Topic 3 SQL Query Language: Advanced Topics Activity 03: Advanced SQL Querying Techniques
Topic 4 Online Analytical Processing (OLAP) Schema Design Activity 04: Creating the Northwind_DW Data Warehouse
Topic 5 Extract-Transform-Load (ETL) Processing Lab 02: Basic ETL Processing (with SQL)
Unit 2 Python Programming for Data Engineering Activity 01: Installing/Using Anaconda Python with Jupyter Notebooks
Topic 1 Python Fundamentals Activity 02: Python Language Basics in Jupyter Notebooks
Topic 2 Using Python to Interact with SQL Database Systems (MySQL) Activity 03: Using Python to Interact with MySQL in Jupyter Notebooks
Topic 3 Using Python to Interact with File System Data Activity 04: Using Python to Interact with Files in Jupyter Notebooks
Topic 4 Using Python to Interact with Application Program Interfaces (APIs) Activity 05: Using Python to Interact with APIs in Jupyter Notebooks
Topic 5 Using Python to Extract, Transform and Load Data Lab 03: Using Python to Perform Extract-Transform-Load (ETL) Processing
Project 1 Create a Data Warehouse Using Data from Various Sources
Unit 3 Semi-Structured Data Systems (NoSQL) Activity 01: Installing MongoDB & MongoDB Compass
Topic 1 Introduction to NoSQL Database Systems Activity 02: Provisioning MongoDB Atlas (Cloud Version)
Topic 2 Using Python to Interact with NoSQL Database Systems (MongoDB) Activity 03: Using Python to Interact with MongoDB in Jupyter Notebooks
Topic 3 Working with Polyschematic Data and JSON Activity 04: MongoDB Querying Fundamentals with JavaScript Object Notation (JSON)
Topic 4 Integrating MongoDB Data into the Northwind_DW Data Warehouse Lab 04: Extending the Northwind_DW Data Warehouse with Data from MongoDB
Unit 4 Data Lakehouse Architectures & Real-Time Streaming Systems Activity 01: Provisioning a Spark/PySpark Development Environment
Topic 1 Introduction to Apache Spark & PySpark Activity 02: Running and Configuring Apache Spark/PySpark
Topic 2 Spark SQL Language and Query Fundamentals Activity 03: Using Spark-SQL to Query File-based Data
Topic 3 Spark Files, Databases, Tables and Views Activity 04: Using PySpark to Create Tables and Views
Topic 4 Integrating Real-Time Data with Structured Streaming Lab 05: Incremental Updates with PySpark Structured Streaming
Topic 5 Data Integration & ETL Processing in Spark Lab 06: Using PySpark to Implement the Medallion Architecture
Project 2 Create a Data Lakehouse Using PySpark
Topic 6 Integrating Databases with Spark Activity 05: Connecting to MySQL and SQL Server with PySpark
Topic 7 Integrating NoSQL Databases with Databricks Activity 06: Connecting to MongoDB with PySpark