SDS 407: Fundamentals of Data Science
Course Title |
Fundamentals of Data Science |
Course Code |
SDS 407 |
Course Type |
Mandatory |
Level |
Master’s |
Year / Semester |
1st Semester |
Instructor’s Name |
Dr. Simone Bacchio (Lead Instructor), Dr. Leonidas Christodoulou |
ECTS |
5 |
Lectures / week |
2 |
Laboratories / week |
1 |
Course Purpose and Objectives |
The aim of this course is to introduce students to data science, big data analysis, and statistics for data science, providing the necessary theoretical knowledge and practical skills. This includes a focus on statistical methods for data scientists, including an introduction to probability theory and linear algebra, inference and estimation, as well as topics on model evaluation and hypothesis testing. The course also develops a set of practical skills and tools for visualizing, exploring, storing, and processing data, and provides an introduction to big data and data science tools. |
Learning Outcomes |
By the end of the course, students will have a good grasp of the statistical knowledge related to data science and be able to apply it to data using modern tools and libraries. They will also be able to perform exploratory data analysis and visualization, data pre-processing, and basic techniques for predictive analysis and model evaluation. Students will work in Python, which forms the basis for all following courses. In summary, by the end of the course students should be able to:
Apply basic techniques in predictive analysis and model evaluation. |
Prerequisites |
None |
Requirements |
None |
Course Content |
Week 1. Introduction to Data Programming and Statistical Learning. Review of basic mathematics; statistics for data science. Introduction to programming environments for data science in Python.
Week 2. Data Wrangling. Learn how to access, clean, and process different types of raw/unstructured data and prepare it for downstream tasks such as visualization, aggregation, and training.
Week 3. Data Representation and Visualization. Introduction to exploratory data analysis techniques, visualization, and data representations.
Week 4. Introduction to Statistical and Machine Learning. Regression and fitting techniques (e.g., linear regression and correlated fits); classification.
Week 5. Evaluation and Validation. Hypothesis testing, statistical significance, inference, and practical applications in real-world problems.
Week 6. Data Management for Data Analysis. Study of databases and database design for big data.
Week 7. HPC for Big Data in Python. Development of high-performance computing algorithms on CPU and GPU for processing big datasets. |
Teaching Methodology |
Lectures, Labs |
Bibliography |
|
Assessment |
Combination of coursework and exam |
Language |
English |