CSI 5311 -- READING LIST 


Peer-to-peer data management:


Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov: Schema mediation 
for large-scale semantic data sharing. VLDB J. 14(1): 68-83 (2005)

Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou: PeerDB: A P2P-based 
System for Distributed Data Sharing. ICDE 2003:633-644

Reza Akbarinia, Vidal Martins: Data Management in the APPA System. 
J. Grid Comput. 5(3): 303-317 (2007)

Reza Akbarinia, Esther Pacitti, Patrick Valduriez: Reducing network traffic 
in unstructured P2P systems using Top-k queries. Distributed and Parallel 
Databases 19(2-3): 67-86 (2006)

Mehedi Masud, Iluju Kiringa, Anastasios Kementsietsidis: Don't Mind Your 
Vocabulary: Data Sharing Across Heterogeneous Peers. OTM Conferences (1) 2005: 292-309 

Patricia Rodríguez-Gianolli, Maddalena Garzetti, Lei Jiang, Anastasios Kementsietsidis, 
Iluju Kiringa, Mehedi Masud, Renée J. Miller, John Mylopoulos: Data Sharing in the 
Hyperion Peer Database System. VLDB 2005: 1291-1294 


Data stream management:

Sudipto Guha, Andrew McGregor: Approximate quantiles and the order of the stream. 
PODS 2006: 273-279 

Peter A. Tucker, David Maier, Tim Sheard, Leonidas Fegaras: Exploiting Punctuation 
Semantics in Continuous Data Streams. IEEE Trans. Knowl. Data Eng. 15(3): 555-568 (2003)

Themistoklis Palpanas, Michail Vlachos, Eamonn J. Keogh, Dimitrios Gunopulos, Wagner 
Truppel: Online Amnesic Approximation of Streaming Time Series. ICDE 2004: 339-349

Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, Ke Yi: An information-theoretic 
approach to detecting changes in multi-dimensional data streams. Proc. Symp. on the Interface 
of Statistics, Computing Science, and Applications (2006)

Arvind Arasu, Shivnath Babu, Jennifer Widom: The CQL continuous query language: 
semantic foundations and query execution. VLDB J. 15(2): 121-142 (2006)


Cloud computing:

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung: The Google file system. SOSP 2003: 29-43

Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. 
OSDI 2004: 137-150

Jeffrey Dean, Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. 
Commun. ACM 51(1): 107-113 (2008)

Jeffrey Dean, Sanjay Ghemawat: MapReduce: a flexible data processing tool. Commun. 
ACM 53(1): 72-77 (2010)

Michael Stonebraker, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Erik Paulson, 
Andrew Pavlo, Alexander Rasin: MapReduce and parallel DBMSs: friends or foes? 
Commun. ACM 53(1): 64-71 (2010)


Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, 
Suresh Anthony, Hao Liu, Raghotham Murthy: Hive - a petabyte scale data warehouse 
using Hadoop. ICDE 2010: 996-1005

Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba Borthakur, Namit Jain, Joydeep Sen Sarma, 
Raghotham Murthy, Hao Liu: Data warehousing and analytics infrastructure at facebook. SIGMOD 
Conference 2010: 1013-1020