SDS 407: Fundamentals of Data Science

Course Title

Fundamentals of Data Science

Course Code

SDS 407

Course Type

Mandatory

Level

Master’s

Year / Semester

1st Semester

Instructor’s Name

Dr. Simone Bacchio (Lead Instructor), Dr. Leonidas Christodoulou

ECTS

5

Lectures / week

2

Laboratories / week

1

Course Purpose and Objectives

The aim of this course is to introduce students to data science, big data analysis and statistics for data science, providing both the necessary theoretical and practical skills.

This includes a focus on statistical methods for data scientists, including an introduction to probability theory and linear algebra, inference and estimation, as well as model evaluation and hypothesis testing. The course also aims to develop a set of practical skills and tools for visualizing, exploring, storing and processing data, and provides an introduction to big data and data science tools.

Learning Outcomes

By the end of the course, students will have a good grasp of the statistical knowledge related to data science and will be able to apply this knowledge to data using modern tools and libraries. Students will also be able to perform exploratory data analysis and visualization and data pre-processing, and to apply basic techniques for predictive analysis and model evaluation. The course is taught in Python, which will form the basis for all subsequent courses.

In summary, by the end of the course students should be able to:

  • Explain and apply statistical knowledge related to data science on real-world datasets, using modern tools and libraries
  • Perform exploratory data analysis and visualization on datasets with different properties
  • Perform data wrangling, manipulation, and cleaning to obtain usable datasets for downstream tasks
  • Know how to access, manage, and store big data for analysis, including the use of High-Performance Computing
  • Apply basic techniques in predictive analysis and model evaluation

Prerequisites

None

Requirements

None

Course Content

Week 1. Introduction to Data Programming and Statistical Learning. Review of basic mathematics; statistics for data science; introduction to programming environments for data science in Python.

Week 2. Data Wrangling. Learn how to access, clean, and process different types of raw and unstructured data, and prepare it for downstream tasks such as visualization, aggregation, and training.

Week 3. Data Representation and Visualization. Introduction to exploratory data analysis techniques, visualization, and data representations.

Week 4. Introduction to Statistical and Machine Learning. Regression and fitting techniques (e.g., linear regression and correlated fits) and classification.

Week 5. Evaluation and Validation. Hypothesis testing, statistical significance, and inference, with practical applications to real-world problems.

Week 6. Data Management for Data Analysis. Databases and database design for big data.

Week 7. HPC for Big Data in Python. Development of high-performance computing algorithms on CPU and GPU for processing big datasets.

Teaching Methodology

Lectures, Labs

Bibliography

  • C. Heumann, M. Schomaker, “Introduction to Statistics and Data Analysis”, Springer, 2016.
  • T. Haslwanter, “An Introduction to Statistics with Python”, Springer, 2016.
  • J. W. Tukey, “Exploratory Data Analysis”, Addison-Wesley, 1977.
  • J. Friedman, T. Hastie, R. Tibshirani, “The Elements of Statistical Learning”, Springer Series in Statistics, Springer, 2001.

Assessment

Combination of coursework and exam

Language

English

Publications & Media