# Interviews

The interview is an **oral exam** in which the interviewers assess each candidate’s knowledge and skills, particularly in informatics and statistics, as well as their motivation for studying Data Science.

Please find below a catalogue of questions for both areas, as well as some reading recommendations. You will be asked some of these questions during the interview.

**Statistics**

- Explain quantities like expectation, median, variance, quantile, etc. for a random variable X.
- For a continuous random variable X, explain how the density is related to the distribution function.
- Assume that X is normally distributed with mean m and variance s². Explain what this means, and state some properties of the normal distribution. How can we calculate P(X < x)?
- Explain the principal ideas of Maximum Likelihood. Use Maximum Likelihood to estimate the parameter p of a Bernoulli(p) distribution, given n i.i.d. 0/1 observations.
- Explain the meaning of a confidence interval.
- What is the meaning of covariance and correlation? Give a mathematical definition. What is the difference between the theoretical and the empirical covariance?
- What is the purpose of a regression model? Give an example.
- Explain how a regression model is used for forecasting.
- Give an example of a 2×2 table. How is the odds ratio defined?
- What is the aim of a cluster analysis?
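
As an illustration of the Maximum Likelihood question above, the following minimal Python sketch estimates the parameter of a Bernoulli(p) distribution. For i.i.d. 0/1 observations the log-likelihood is maximised by the sample mean; the sketch also checks this numerically on a coarse grid. The function names and the sample data are made up for illustration and are not part of the official catalogue.

```python
import math


def bernoulli_log_likelihood(p, observations):
    """Log-likelihood of i.i.d. Bernoulli(p) observations (each 0 or 1)."""
    ones = sum(observations)
    zeros = len(observations) - ones
    return ones * math.log(p) + zeros * math.log(1 - p)


def bernoulli_mle(observations):
    """Closed-form maximiser of the log-likelihood: the sample mean."""
    return sum(observations) / len(observations)


# Hypothetical sample: 6 ones among 10 observations.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p_hat = bernoulli_mle(sample)

# Numerical sanity check: evaluate the log-likelihood on a grid of
# candidate values 0.01, 0.02, ..., 0.99 and pick the best one.
grid = [k / 100 for k in range(1, 100)]
best_on_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, sample))

print(p_hat, best_on_grid)
```

Setting the derivative of the log-likelihood to zero gives the same estimate analytically, which is the derivation one would typically present in an oral answer.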

### Literature

- F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A modern introduction to probability and statistics. Springer.
- G. Grimmett, D. Stirzaker. Probability and random processes. Oxford Univ. Pr.
- A.M. Mood, F.A. Graybill, D.C. Boes. Introduction to the theory of statistics. McGraw-Hill.

**Informatics**

- Name elementary data types and how they are represented within computer memory.
- What is an algorithm and which types of control structures are generally used to implement algorithms?
- What is a function and how can it be used for implementing recursion?
- What are arrays, compound data types (aka records) and references (aka pointers)?
- What is the difference between compiled and interpreted programming languages?
- What are the time complexity and the memory complexity of an algorithm?
- Name at least two commonly used sorting algorithms and explain their computational complexity.
- Name two approaches to representing a graph in a data structure. Name a method, based on one of these structures, for finding out whether a graph is connected.
- Name the advantages of a database system compared to storing data in plain text files.
- What are a relation, a tuple and a primary key in the relational data model?
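
As an illustration of the graph question above, the following minimal Python sketch stores an undirected graph as an adjacency list (a dictionary mapping each vertex to its neighbours) and uses breadth-first search to test connectivity. It assumes that every vertex appears as a key of the dictionary; the function name and the example graphs are hypothetical. An adjacency matrix combined with depth-first search would be an equally valid answer.

```python
from collections import deque


def is_connected(adjacency):
    """Return True if the undirected graph is connected.

    `adjacency` maps every vertex to an iterable of its neighbours.
    """
    if not adjacency:
        return True  # convention chosen here: the empty graph counts as connected

    start = next(iter(adjacency))
    visited = {start}
    queue = deque([start])

    # Breadth-first search: visit every vertex reachable from `start`.
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

    # The graph is connected exactly when the search reached all vertices.
    return len(visited) == len(adjacency)


# Hypothetical examples: a path a-b-c, and the same graph plus an isolated vertex d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
split_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}

print(is_connected(path_graph), is_connected(split_graph))
```

With an adjacency list, the search visits each vertex and each edge a constant number of times, so the running time is linear in the number of vertices plus the number of edges.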

### Literature

- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein. Introduction to algorithms, 3rd ed. MIT Press, 2009.
- David J. Eck. Introduction to programming using Java, 7th ed. Version 7.0, August 2014 (Version 7.0.2, with mostly typographical corrections, December 2016). Web: http://math.hws.edu/javanotes/
- Cay S. Horstmann. Computing concepts with Java essentials, 3rd ed. Wiley, 2003.
- R. Ramakrishnan, J. Gehrke. Database management systems, 3rd ed. McGraw Hill, 2002.

Last update: 10 December 2021