Part - the smallest piece of the curriculum: a notebook requiring ~10 hours of study time. Usually, 5 Parts make up a Sprint. A Part contains either a Project requiring corrections (usually the 5th Part of a regular Sprint) or theoretical material with practical exercises and a quiz (usually the first 4 Parts of a Sprint). To progress further in the course, you must complete either the quiz or the correction.
Project - a Part dedicated entirely to practical work. A Project aims to incorporate as many topics from the current and previous Sprints as possible so that you can practice your skills. Most Projects require 1 STL correction and 1 peer correction to be passed.
Sprint - a larger piece of the curriculum requiring ~50 hours to complete. It is either a collection of 5 Parts, one of which is a Project, or a single larger capstone project. A Sprint always requires a correction to be passed.
Capstone project - a practical task at the end of a Module that takes a whole Sprint (~50 hours) to complete. It allows you to practice all of the skills learned throughout a Module.
Module - the largest piece of the curriculum, usually made up of 3 regular Sprints and 1 capstone project Sprint. It takes about 200 hours to complete. Some Modules are optional.
Specialisation module - a Module that a learner chooses from a pool of options, depending on the data roles and companies they plan to apply to. The Module covers the tools, skills, and technologies needed for specific roles or companies. Most specialisation modules are prepared in cooperation with our Hiring Partners.
Course Structure*
Module 1: Introduction to Data Engineering
  Sprint 1: Intermediate Python & Git - Python data model, Python sequences, Git basics
  Sprint 2: Introduction to Relational Databases & SQL Basics - Python mutability and object references, SQL queries
  Sprint 3: Intermediate SQL - SQL joins, subqueries, sets, and strings
Module 2: Fundamentals of Data Engineering
  Sprint 1: Advanced Python & Linux Shell Commands - Linux distribution and architecture, shell commands, Python interfaces, and inheritance
  Sprint 2: Managing Relational Databases & Advanced SQL - database security and compliance, Python iterators and generators, SQL indices, transactions, and views
  Sprint 3: Working with Data Pipelines & Apache Airflow - constructing ETL pipelines, Airflow DAGs, and workflows
Module 3: Intermediate Data Engineering
  Sprint 1: Data Warehousing & dbt - enterprise data warehousing, defining data models with dbt
  Sprint 2: Data Mesh & ML Systems Design - architecture and principles of data mesh, feature engineering, model development, and evaluation
  Sprint 3: Docker & Intro to MLOps - Docker basics, the container concept and containerization principles, ML model monitoring, and continual learning
Specialisation modules
Module 4 (choose either A or B):
  A) Google Cloud Platform
  B) Amazon Web Services
Module 5 (optional): Data analysis and visualisation with Python
*Turing College reserves the right to update and (or) amend the course curriculum and its structure as well as release new course versions. Major changes are most likely in the first (pilot) batches of the course.
Choosing a specialisation module
You get to choose the optional modules after you complete the first 3 modules of the course. Some things to take into account when choosing:
Which areas do you want to get better in?
Which companies and positions are you most interested in - are they looking for specific skills that these modules offer?
You will make the choice in the platform. While most learners are expected to do either 4A or 4B, if you would like to do both, let us know via the support chat in the platform.