263-5200-00L  Data Mining: Learning from Large Data Sets

Semester Frühjahrssemester 2014
Dozierende A. Krause
Periodizität jährlich wiederkehrende Veranstaltung
Lehrsprache Englisch


Kurzbeschreibung Many scientific and commercial applications require insights from massive, high-dimensional data sets. This courses introduces principled, state-of-the-art techniques from statistics, algorithms and discrete and convex optimization for learning from such large data sets. The course both covers theoretical foundations and practical applications.
Lernziel Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In this graduate-level course, we will study principled, state-of-the-art techniques from statistics, algorithms and discrete and convex optimization for learning from such large data sets. The course will both cover theoretical foundations and practical applications.
Inhalt Topics covered:
- Dealing with large data (Data centers; Map-Reduce/Hadoop; Amazon Mechanical Turk)
- Fast nearest neighbor methods (Shingling, locality sensitive hashing)
- Online learning (Online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
- Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
- Active learning (uncertainty sampling, pool-based methods, label complexity)
- Dimension reduction (random projections, nonlinear methods)
- Data streams (Sketches, coresets, applications to online clustering)
- Recommender systems
Voraussetzungen / Besonderes Prerequisites: Solid basic knowledge in statistics, algorithms and programming. Background in machine learning is helpful but not required.