Performance Evaluation in Database Research: Principles and Experience

Ioana Manolescu, INRIA Saclay--Ile-de-France, Paris
Stefan Manegold, CWI, Amsterdam

A significant part of today's database research focuses on improving the performance of a specific system. Quantitative experiments are the best way to validate such results. However, performing experiments is not always easy. Beyond the complexity of the system under test, it is hard to design an experiment, choose the right environment and parameter values, analyze the gathered data, and report it to a third party in an expressive and intelligible way.

In this tutorial, we present a general road-map to the above steps, including tips and tricks on how to organize and present code that performs experiments, so that an outsider can repeat them.

The tutorial is primarily aimed at MS and PhD students seeking to improve their experiment practices, but more senior attendees may also find it interesting.
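
As one illustration of the repeatability point above, the following minimal Python sketch shows an experiment script that fixes its random seed, repeats each measurement, and records its parameters and environment next to the results. It is not taken from the tutorial: the names `workload` and `run_experiment`, the sorting task standing in for the system under test, and the parameter values are all assumptions made for illustration.

```python
import json
import platform
import random
import statistics
import sys
import time


def workload(data):
    """Hypothetical stand-in for the system under test: sort a list."""
    return sorted(data)


def run_experiment(sizes, repetitions=5, seed=42):
    """Time the workload per input size and return a self-describing record."""
    results = []
    for n in sizes:
        # Fixed seed: every run of the script sees the same input data.
        rng = random.Random(seed)
        data = [rng.randrange(n) for _ in range(n)]
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            workload(data)  # only the workload itself is timed
            timings.append(time.perf_counter() - start)
        results.append({
            "n": n,
            "runs_sec": timings,  # keep raw measurements, not just a summary
            "median_sec": statistics.median(timings),
        })
    return {
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        "parameters": {
            "sizes": list(sizes),
            "repetitions": repetitions,
            "seed": seed,
        },
        "results": results,
    }


if __name__ == "__main__":
    print(json.dumps(run_experiment([1_000, 10_000]), indent=2))
```

Storing the parameters and environment in the same record as the raw timings is one possible way to let an outsider see exactly what was run before trying to repeat it.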

Scalable OLAP and Mining of Information Networks

Jiawei Han (Univ. of Illinois at Urbana-Champaign, USA)
Xifeng Yan (IBM T. J. Watson Research Center, USA)
Philip S. Yu (University of Illinois at Chicago, USA)

With the ubiquity of information networks and their broad applications, there have been numerous studies on the construction, online analytical processing, and mining of information networks in multiple disciplines, including social network analysis, the World-Wide Web, database systems, data mining, machine learning, and networked communication and information systems. Given the strong demand for research in this direction, there is a need for a systematic introduction to methods for analyzing information networks across these disciplines. However, few systematic tutorials on this theme exist. In this tutorial, we present an organized picture of scalable OLAP (online analytical processing) and mining of information networks, covering the following topics:

  1. an introduction to information networks and information network analysis,
  2. general statistical behavior of information networks,
  3. mining frequent subgraphs in large graphs and networks,
  4. data integration, data cleaning and data validation in information networks,
  5. clustering graphs and information networks,
  6. classification of graphs and information networks,
  7. summarization and simplification of graphs and information networks,
  8. OLAP and multidimensional analysis of information networks,
  9. evolution of dynamic information networks, and
  10. research challenges on OLAP and mining of information networks.

Geographic Privacy-aware Knowledge Discovery and Delivery

Fosca Giannotti, Dino Pedreschi, Yannis Theodoridis

A flood of data pertinent to moving objects is available today, and even more will be available in the near future, particularly due to the automated collection of privacy-sensitive telecom data from mobile phones and other location-aware devices. Such a wealth of data, referenced both in space and time, may enable novel classes of applications of high societal and economic impact, provided that the discovery of consumable and concise knowledge out of these raw data is made possible. Recent research activities have developed theory, techniques and systems for geographic knowledge discovery and delivery, some of them based on privacy-preserving methods for extracting knowledge from large amounts of raw data referenced in space and time. All these efforts aim at devising knowledge discovery and analysis methods for trajectories of moving objects. The fundamental hypothesis is that it is possible, in principle, to aid citizens in their mobile activities by analysing the traces of their past activities by means of data mining techniques. For instance, behavioural patterns derived from mobile trajectories may make it possible to infer traffic flow information, which can help people travel efficiently, help public administrations in traffic-related decision making for sustainable mobility and security management, and help mobile operators optimise bandwidth and power allocation on the network. On the other hand, it is clear that the use of sensitive personal data raises concerns about citizens' privacy rights.

In this tutorial, we establish a framework for the challenges and the mining solutions for the geographic information collected by Moving Object Database (MOD) engines. We first discuss the challenges of collecting mobility data, and elaborate on the impact of trajectory data analysis in several modern applications. We then discuss methodologies and techniques to collect raw data, reconstruct trajectory information, and efficiently store it in MODs. We continue with an overview of knowledge discovery approaches for movement data. Finally, we propose a research agenda and identify areas where interdisciplinary studies are needed.
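
To make the trajectory reconstruction step mentioned above more concrete, here is a minimal Python sketch. It is not the method presented in the tutorial: it assumes raw records of the form (object_id, timestamp_sec, x, y) and uses a simple, illustrative rule that starts a new trajectory whenever two consecutive points of the same object are more than `max_gap_sec` apart. The function name, the record layout, and the threshold are all hypothetical.

```python
from collections import defaultdict


def reconstruct_trajectories(points, max_gap_sec=300):
    """Group raw (object_id, timestamp_sec, x, y) records into trajectories.

    Assumed rule (illustrative only): a temporal gap larger than
    max_gap_sec between consecutive points of one object ends the
    current trajectory and starts a new one.
    """
    # Collect the samples of each moving object.
    by_object = defaultdict(list)
    for obj_id, t, x, y in points:
        by_object[obj_id].append((t, x, y))

    trajectories = []
    for obj_id, samples in by_object.items():
        samples.sort()  # order each object's samples by timestamp
        current = [samples[0]]
        for prev, cur in zip(samples, samples[1:]):
            if cur[0] - prev[0] > max_gap_sec:
                # Gap too large: close the current trajectory.
                trajectories.append((obj_id, current))
                current = []
            current.append(cur)
        trajectories.append((obj_id, current))
    return trajectories


if __name__ == "__main__":
    raw = [("a", 0, 0, 0), ("a", 60, 1, 0), ("a", 1000, 5, 5), ("b", 10, 2, 2)]
    for obj_id, traj in reconstruct_trajectories(raw):
        print(obj_id, traj)
```

A real MOD pipeline would typically also have to handle noise, spatial gaps, and map matching; the sketch only illustrates the basic idea of turning a stream of raw points into per-object trajectories.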