401-3620-70L  Student Seminar in Statistics: Multiple Testing for Modern Data Science

Semester: Autumn Semester 2020
Lecturers: M. Löffler, A. Taeb
Periodicity: every semester recurring course
Language of instruction: English
Comment: Number of participants limited to 24

Mainly for students from the Mathematics Bachelor and Master Programmes who, in addition to the introductory course unit 401-2604-00L Probability and Statistics, have taken at least one core or elective course in statistics. Also offered in the Master Programmes Statistics and Data Science.



Courses

Number | Title | Hours | Lecturers
401-3620-00 S | Student Seminar in Statistics: Multiple Testing for Modern Data Science | 2 hrs, Mon 16:00-18:00, ONLINE | M. Löffler, A. Taeb

Catalogue data

Abstract: The course encompasses a review of approaches to multiple testing.
Learning objective: The students understand the relevance of multiple testing in modern applications. Further, they learn about two commonly used error measures, namely the family-wise error rate (FWER) and the false discovery rate (FDR), and approaches to control them.
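For orientation, the two error measures can be written down in the usual way. The notation is an assumption introduced here, not taken from the course description: V denotes the number of true null hypotheses that are rejected (false positives) and R the total number of rejections.

```latex
\[
\mathrm{FWER} = \mathbb{P}(V \ge 1),
\qquad
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right].
\]
```

Since V / max(R, 1) is at most 1 and equals 0 whenever V = 0, the FDR never exceeds the FWER, so any procedure that controls the FWER at a given level also controls the FDR at that level.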
Content: In modern statistical applications it is often desired to perform thousands of statistical tests simultaneously. Performing each test separately at a desired level (e.g. 0.05) will result in many false positives: among 1,000 true null hypotheses tested at level 0.05, about 50 are rejected on average. In science, this is regarded as one contributor to the ‘reproducibility crisis’.
In this seminar we will review and discuss approaches to deal with this issue. First, we will consider the strong notion of FWER and how to control it via Bonferroni correction, permutation tests, step-up and hierarchical procedures, or Tukey’s higher criticism. In the second part of the seminar we will investigate the less conservative FDR, discussing the classical Benjamini-Hochberg procedure, as well as more modern methods such as Knockoffs and Bayesian approaches. Throughout, we highlight the utility of the discussed methods for real-world applications.
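To make the contrast between FWER and FDR control concrete, the following is a minimal sketch of the Bonferroni correction and the Benjamini-Hochberg step-up procedure in plain Python. It is an illustration only and not part of the course material; the function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; this controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m
    and reject the hypotheses with the k smallest p-values."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_rejections = bonferroni(example_pvalues)
bh_rejections = benjamini_hochberg(example_pvalues)
print(sum(bonferroni_rejections), sum(bh_rejections))  # prints: 1 2
```

On these ten p-values Bonferroni rejects only the smallest one (threshold 0.05 / 10 = 0.005), while Benjamini-Hochberg also rejects the second smallest (0.008 <= 2 * 0.05 / 10), which illustrates why FDR control is described as less conservative. Note that the FDR guarantee of Benjamini-Hochberg at level alpha is classically stated under independence or a positive dependence condition on the p-values.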
Literature:
Lecture 1: Bonferroni and Simes
https://www.jstor.org/stable/4615733
Lecture 2: Permutation tests
https://projecteuclid.org/download/pdf_1/euclid.ss/1056397487
https://arxiv.org/pdf/1106.2068.pdf
Lecture 3: Hierarchical testing
https://www.jstor.org/stable/27640041?seq=8#metadata_info_tab_contents
https://stat.ethz.ch/~nicolai/hierarchical.pdf
https://onlinelibrary.wiley.com/doi/epdf/10.1002/sim.3495
Lecture 4: Higher criticism
Methodology: https://arxiv.org/pdf/1410.4743.pdf; theoretical reference: https://arxiv.org/pdf/math/0410072.pdf
Application: https://ieeexplore.ieee.org/document/8192593; further reference: https://hea-www.harvard.edu/astrostat/Stat310_fMMV/jjs_20051011.pdf
Lecture 5: Benjamini-Hochberg (BH) with martingales
https://www.jstor.org/stable/2346101?seq=1#metadata_info_tab_contents
https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2004.00439.x
Lecture 6: FDR control under dependence
https://projecteuclid.org/euclid.aos/1013699998
http://www.jmlr.org/papers/volume10/blanchard09a/blanchard09a.pdf
Lecture 7: Empirical null distribution
http://statweb.stanford.edu/~tibs/ftp/bradfdr.pdf
https://arxiv.org/pdf/1912.03109.pdf
Lecture 8: Bayes FDR methods
https://projecteuclid.org/download/pdf_1/euclid.aos/1074290335
https://arxiv.org/abs/1808.09748
Lecture 9: SLOPE
https://projecteuclid.org/euclid.aos/1151418235
https://arxiv.org/abs/1407.3824
Lecture 10: Knockoffs
https://projecteuclid.org/euclid.aos/1438606853
https://www.biorxiv.org/content/10.1101/631390v3
Lecture 11: Generalization of FWER and connections to FDR
https://arxiv.org/pdf/math/0507420.pdf
http://www.people.vcu.edu/~mreimers/HTDA/Korn%20-%20Controlling%20FDR.pdf
Lecture 12: Exploratory testing
https://arxiv.org/pdf/1208.2841.pdf
https://arxiv.org/abs/1803.06790
Prerequisites / Notice: Every lecture will consist of an oral presentation by a pair of students highlighting the key ideas of selected papers. Another two students will be responsible for asking questions during the presentation and for leading a discussion of the pros and cons of the papers at the end. Finally, two further students are responsible for evaluating the quality of the presentation and discussion and for providing constructive feedback for improvement.

Performance assessment

Performance assessment information (valid until the course unit is held again)
Performance assessment as a semester course
ECTS credits: 4 credits
Examiners: M. Löffler, A. Taeb
Type: ungraded semester performance
Language of examination: English
Repetition: Repetition only possible after re-enrolling for the course unit.

Learning materials

No public learning materials available.
Only public learning materials are listed.

Groups

No information on groups available.

Restrictions

Places: Limited number of places. Special selection procedure.
Beginning of registration period: Registration possible from 01.08.2020
Priority: Registration for the course unit is only possible for the primary target group
Primary target group:
Data Science MSc (261000)
Mathematics BSc (404000) starting semester 05
Statistics MSc (436000)
Mathematics MSc (437000)
Applied Mathematics MSc (437100)
Mathematics (Mobility) (448000)
Waiting list: until 23.09.2020
End of registration period: Registration only possible until 11.09.2020

Offered in

Programme | Section | Type
Data Science Master | Seminar | W
Mathematics Bachelor | Seminars | W
Mathematics Master | Seminars | W
Statistics Master | Seminar or Semester Paper | W