401-3620-70L Student Seminar in Statistics: Multiple Testing for Modern Data Science
| Semester | Autumn Semester 2020 |
| Lecturers | M. Löffler, A. Taeb |
| Periodicity | every semester recurring course |
| Language of instruction | English |
| Comment | Number of participants limited to 24. Mainly for students from the Mathematics Bachelor and Master Programmes who, in addition to the introductory course unit 401-2604-00L Probability and Statistics, have taken at least one core or elective course in statistics. Also offered in the Master Programmes Statistics and Data Science. |
Courses
| Number | Title | Hours | Lecturers |
|---|---|---|---|
| 401-3620-00 S | Student Seminar in Statistics: Multiple Testing for Modern Data Science | 2 hrs | M. Löffler, A. Taeb |
Catalogue data
| Abstract | The course encompasses a review of approaches to multiple testing. |
| Learning objective | The students understand the relevance of multiple testing in modern applications. Further, they learn about two commonly used error measures, the family-wise error rate (FWER) and the false discovery rate (FDR), and about approaches to control them. |
| Content | In modern statistical applications it is often necessary to perform thousands of statistical tests simultaneously. Performing a test at a desired level (e.g. 0.05) for each variable separately will result in many false positives. In science this is known as the ‘reproducibility crisis’. In this seminar we will review and discuss approaches to deal with this issue. First, we will consider the strong notion of FWER and how to control it via Bonferroni correction, permutation tests, step-up and hierarchical procedures, or Tukey’s higher criticism. In the second part of the seminar we will investigate the less conservative FDR, discussing the classical Benjamini-Hochberg procedure as well as more modern methods such as Knockoffs and Bayesian approaches. Throughout, we highlight the utility of the discussed methods for real-world applications. |
| Literature | Lecture 1: Bonferroni and Simes https://www.jstor.org/stable/4615733 Link Lecture 2: Permutation tests https://projecteuclid.org/download/pdf_1/euclid.ss/1056397487 https://arxiv.org/pdf/1106.2068.pdf Lecture 3: Hierarchical testing https://www.jstor.org/stable/27640041?seq=8#metadata_info_tab_contents https://stat.ethz.ch/~nicolai/hierarchical.pdf https://onlinelibrary.wiley.com/doi/epdf/10.1002/sim.3495 Lecture 4: Higher criticism Methodology: https://arxiv.org/pdf/1410.4743.pdf and for theoretical reference https://arxiv.org/pdf/math/0410072.pdf Application: https://ieeexplore.ieee.org/document/8192593 and for more reference https://hea-www.harvard.edu/astrostat/Stat310_fMMV/jjs_20051011.pdf Lecture 5: Benjamini-Hochberg (BH) with martingales https://www.jstor.org/stable/2346101?seq=1#metadata_info_tab_contents, https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2004.00439.x Lecture 6: FDR control under dependence https://projecteuclid.org/euclid.aos/1013699998 http://www.jmlr.org/papers/volume10/blanchard09a/blanchard09a.pdf Lecture 7: Empirical null distribution http://statweb.stanford.edu/~tibs/ftp/bradfdr.pdf https://arxiv.org/pdf/1912.03109.pdf Lecture 8: Bayes FDR methods https://projecteuclid.org/download/pdf_1/euclid.aos/1074290335 https://arxiv.org/abs/1808.09748 Lecture 9: SLOPE https://projecteuclid.org/euclid.aos/1151418235 https://arxiv.org/abs/1407.3824 Lecture 10: Knockoffs https://projecteuclid.org/euclid.aos/1438606853 https://www.biorxiv.org/content/10.1101/631390v3 Lecture 11: Generalization of FWER and connections to FDR https://arxiv.org/pdf/math/0507420.pdf http://www.people.vcu.edu/~mreimers/HTDA/Korn%20-%20Controlling%20FDR.pdf Lecture 12: Exploratory testing https://arxiv.org/pdf/1208.2841.pdf https://arxiv.org/abs/1803.06790 |
| Prerequisites / Notice | Every lecture will consist of an oral presentation by a pair of students, highlighting the key ideas of selected papers. Another two students will be responsible for asking questions during the presentation and for leading a discussion of the pros and cons of the papers at the end. Finally, an additional two students will be responsible for evaluating the quality of the presentation and discussion and for providing constructive feedback for improvement. |
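As a rough illustration of the two error measures named in the catalogue data above, the following Python sketch contrasts the Bonferroni correction (FWER control) with the Benjamini-Hochberg step-up procedure (FDR control). It is a minimal sketch and not part of the seminar material: the function names `bonferroni` and `benjamini_hochberg` and the p-values are hypothetical, chosen only for illustration, and the FDR guarantee stated in the comments assumes independent p-values.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; this controls the FWER at level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m
    and reject the hypotheses with the k smallest p-values.
    Controls the FDR at level alpha for independent p-values."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these made-up p-values the Benjamini-Hochberg procedure rejects one more hypothesis than Bonferroni, which reflects the sense in which FDR control is less conservative. For scale: if 1000 true null hypotheses are each tested separately at level 0.05, the expected number of false positives is 1000 × 0.05 = 50.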
Performance assessment
| Performance assessment information (valid until the course unit is held again) | |
| Performance assessment as a semester course | |
| ECTS credits | 4 credits |
| Examiners | M. Löffler, A. Taeb |
| Type | ungraded semester performance |
| Language of examination | English |
| Repetition | Repetition only possible after re-enrolling for the course unit. |
Learning materials
| No public learning materials available. | |
| Only public learning materials are listed. |
Groups
| No information on groups available. |
Restrictions
| Places | Limited number of places. Special selection procedure. |
| Beginning of registration period | Registration possible from 01.08.2020 |
| Priority | Registration for the course unit is only possible for the primary target group |
| Primary target group | Data Science MSc (261000); Mathematics BSc (404000) starting semester 05; Statistics MSc (436000); Mathematics MSc (437000); Applied Mathematics MSc (437100); Mathematics (Mobility) (448000) |
| Waiting list | until 23.09.2020 |
| End of registration period | Registration only possible until 11.09.2020 |
Offered in
| Programme | Section | Type |
|---|---|---|
| Data Science Master | Seminar | W |
| Mathematics Bachelor | Seminars | W |
| Mathematics Master | Seminars | W |
| Statistics Master | Seminar or Semester Paper | W |

