ESG Data Science: Overview of modules
The core module Statistics covers fundamental statistical concepts and methods and consists of two courses. The first course, Statistical Reasoning and Inference, comprises (i) traditional and modern methods of statistical inference (maximum likelihood, composite likelihood, multiple testing, false discovery rate, etc.); (ii) Bayesian approaches, including computer-intensive Markov chain Monte Carlo (MCMC) methods; and (iii) computer-based inference approaches such as bootstrapping (resampling). The lecture is accompanied by an exercise class in which the lecture content is consolidated and numerical tools such as R are applied.
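To illustrate the resampling idea behind bootstrapping mentioned above, the following is a minimal sketch of a percentile bootstrap confidence interval. It is not course material: the function name `bootstrap_ci`, its parameters, and the toy measurements are assumptions chosen for illustration, and Python is used here only for brevity (the exercise class itself names R as a tool).

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Hypothetical helper for illustration: resample the data with
    replacement, recompute the statistic each time, and read the
    interval off the empirical quantiles of the resampled estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Toy measurements (invented for this example only).
measurements = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 3.1]
lower, upper = bootstrap_ci(measurements)
print(f"95% bootstrap interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it needs nothing beyond sorting the resampled estimates.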
The second course, Sampling and Experimental Design, introduces fundamental ideas of sampling and experimental design. With massive data (Big Data), computation and data analytics can be numerically very demanding or even infeasible. It may therefore be useful to draw a sample from the data (subsampling) instead of analysing the entire data source. The sampling idea is extended to network-related data and unequal probability sampling. A second focus of the lecture is on the analysis of “observational data” and related problems of potential bias. Finally, fundamental concepts of experimental design are introduced.
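As a rough illustration of subsampling with unequal probabilities, the sketch below estimates a population total from a small sample drawn with replacement, weighting each drawn value by the inverse of its draw probability (the Hansen-Hurwitz estimator). The function name, the toy population, and the choice of this particular estimator are assumptions made for the example, not a description of the course content.

```python
import random


def hansen_hurwitz_total(values, probs, n, seed=0):
    """Estimate sum(values) from n draws made with replacement.

    Unit i is drawn with probability probs[i] (the probabilities must
    sum to 1). Each drawn value is divided by its draw probability,
    which makes the average an unbiased estimate of the total.
    """
    rng = random.Random(seed)
    idx = rng.choices(range(len(values)), weights=probs, k=n)
    return sum(values[i] / probs[i] for i in idx) / n


# Toy population (invented): a few large units dominate the total.
population = [1.0, 2.0, 3.0, 4.0, 90.0]
total = sum(population)

# Equal-probability subsample versus a size-proportional design.
equal_probs = [1 / len(population)] * len(population)
size_probs = [v / total for v in population]

print(hansen_hurwitz_total(population, equal_probs, n=3))
print(hansen_hurwitz_total(population, size_probs, n=3))
```

When the draw probabilities are exactly proportional to the values, every term `values[i] / probs[i]` equals the true total, so the estimate has no sampling error. In practice the probabilities are chosen proportional to an auxiliary size measure that is only correlated with the quantity of interest.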
The core module Informatics gives an overview of the steps of the knowledge discovery process and consists of two courses. In the first course, Knowledge Discovery and Data Mining, different feature representations and similarity measures are explained. Based on these, various methods from the areas of data mining and pattern search are introduced (e.g. lazy learning, density-based clustering, k-medoids clustering, local outlier factor, the Apriori algorithm, FP-growth, suffix trees, gSpan).
The second course, Big Data Management, focuses on the implementation of analysis methods and information systems for large, complex, and volatile data sets. First, the implementation of established data mining methods in parallel, distributed, and streaming systems is introduced. Then, modern data processing frameworks are presented that are used for managing, processing, and distributing data in data science applications. These include batch processing systems (e.g. Hadoop, Apache Spark), streaming systems (e.g. Storm), and document-based database systems (e.g. Lucene).
Each student will be assigned to two courses from a variety of courses in advanced methods of statistics and informatics. These comprise lectures on statistical modelling, regression modelling, multivariate data analysis, advanced programming, and database systems, among others. At the end of the module, students will have reached a common level of expertise in both statistics and informatics.
The module Human Computation and Analytics covers those aspects of Data Science in which humans either produce data and process and analyse it with the help of algorithms, or are presented with data by a computer system. In the area of Human-Computer Interaction (HCI), the basics of human perception and cognition are introduced, as well as some approaches to the design of usable systems. The lecture part on Visual Analytics (VA) covers the visual analysis of data by the human user as well as some visualization techniques. The lecture part on Human Computation (HC) gives an introduction to distributed data collection by humans (crowdsourcing) and to the processing of data by humans, for example in the form of online games. The course includes lab meetings in which students develop their own concepts based on what they have learned in the lectures. In the practical, students implement their own concepts for HC/VA systems in the form of a working prototype.
Predictive Modelling, in particular by means of non-linear, non-parametric methods, has become a central part of modern data analysis in both computer science and statistics, where it is used to uncover complex patterns and relationships in data. The module covers models such as decision trees, neural networks, support vector machines, and ensembles (random forests, bagging, boosting) and concludes with advanced techniques for model selection, feature selection, and hyperparameter optimization.
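The hyperparameter optimization mentioned above can be sketched, under simplifying assumptions, as a grid search scored by cross-validation. The example tunes the number of neighbours k of a one-dimensional k-nearest-neighbour classifier; the model, the function names, the grid, and the simulated data are all illustrative choices and are not taken from the module description.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_error(data, k, n_folds=5):
    """Misclassification rate of k-NN, pooled over n_folds held-out folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = 0
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors += sum(knn_predict(train, x, k) != y for x, y in test_fold)
    return errors / len(data)


def select_k(data, grid=(1, 3, 5, 7, 9)):
    """Return the grid value of k with the lowest cross-validated error."""
    return min(grid, key=lambda k: cv_error(data, k))


# Simulated two-class data (invented): class 0 around 0, class 1 around 2.
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(50)]
data += [(rng.gauss(2, 1), 1) for _ in range(50)]
rng.shuffle(data)

best_k = select_k(data)
print("selected k:", best_k, "cross-validated error:", cv_error(data, best_k))
```

Scoring each candidate on held-out folds rather than on the training data is what keeps the search from always preferring the most flexible setting (here k = 1).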
The module Data Ethics and Data Security covers basic legal and ethical questions and challenges of data security. The module comprises two courses. The first course looks into methodological questions of data anonymisation and technical aspects of data security. The second course is a lecture series with (invited) talks by different speakers on ethical and legal aspects of data security. Students are introduced to the technical, legal, and ethical issues of data security, especially when dealing with personal data or when planning experiments in Data Science.
In the elective modules, students may choose courses in specialized fields from the regularly offered master courses in statistics, informatics, and computational linguistics. In addition, students may attend master-level courses at the partner universities. These include courses on image processing and mathematical statistics at TUM and on computational finance at Augsburg University.
In this module, publications of current research in Data Science will be discussed. Students will learn to work independently with scientific publications and to present newly acquired scientific knowledge. This module also comprises the summer schools, the focused tutorials, and the DataFest.
The module Data Science Practical plays a central role in the curriculum of the master program. Practical experience with the data-analytic methods taught in the core and elective modules is essential in order to generate knowledge from data. Students will work on practical problems in the field of data science, typically concrete projects provided by non-university partners. The focus of the course is therefore not only on tackling methodological challenges in the analysis of massive data (Big Data) but also on communicating the results and findings to the client.
The master thesis concludes the study program. The thesis may be either research-orientated or motivated by a practical problem, e.g. as an extension of a data science practical.