CS 519 - Big Data Exploration and Analytics - Spring 2019

Instructor: Arash Termehchy


Tentative Schedule

This is the anticipated schedule and may be updated over the term.

Date

Topic

Reading

Presenter

Notes

April 2

Introduction

Challenges and Opportunities with Big Data, H. Jagadish et al. 2011.
The Claremont Report on Database Research
The Beckman Report on Database Research

Arash Termehchy

April 4

Query Languages

Foundations of Databases, Sections 3, 4, 5, 6.

Arash Termehchy

April 9

Query Languages

Arash Termehchy

April 11

Graph and Link Analysis

Authoritative Sources in a Hyperlinked Environment, J. Kleinberg, 1998
The PageRank Citation Ranking: Bringing Order to the Web, L. Page et al. 1999 (*)

Arash Termehchy

April 16

Ranking and Rank Aggregation

Optimal aggregation algorithms for middleware, R. Fagin et al. J. Computer and System Sciences 66 (2003) (up to Section 8). (*)

Arash Termehchy

Project Proposal Due

April 18

Sampling

On Random Sampling Over Joins (*)

Arash Termehchy

April 23

Sampling

Arash Termehchy

April 25

Data Integration

Answering Queries Using Views, A Survey (up to Section 7) (*)

Arash Termehchy

April 30

Data Exploration

Dynamic Prefetching of Data Tiles for Interactive Visualization (*)

Arash Termehchy
Sanad Saha

May 2

Scalable Learning

Scalable Linear Algebra on a Relational Database System (*)

Arash Termehchy
Pavel Grechunk

May 7

Midterm Project Presentation

Midterm Project Report Due

May 9

Scalable Inference

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS (*)

Arash Termehchy
Fransiskus Derian

May 14

Data Errors, Data Cleaning

ActiveClean: Interactive Data Cleaning For Statistical Modeling (*)
Detecting Data Errors: Where are we and what needs to be done? (*)

Arash Termehchy
John Davis
Chethan Nag Garapati

May 16

Data Cleaning I

Dependencies Revisited for Improving Data Quality (*)

Arash Termehchy
Prateek Dasgupta

May 21

Data Cleaning II

The Interaction between Record Matching and Data Repairing (*)

Arash Termehchy
Omeed Habibelahian

May 23

No Class

May 28

Data Cleaning, Continued

May 30

Data Cleaning
Data Transformation

BoostClean: Automated Error Detection and Repair for Machine Learning (*)
Wrangler: Interactive Visual Specification of Data Transformation Scripts (*)

Arash Termehchy
Christopher Buss
Lin You-Jen

June 4

Data Transformation

Interactive Program Synthesis (*)

Arash Termehchy
Aayam Shrestha

June 6

Confirmatory Data Analysis

A Firm Foundation for Private Data Analysis (*)
The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (*)

Arash Termehchy
Praveen Ilango