Keynote Speakers

Victor Vianu

Victor Vianu is a Professor of Computer Science at the University of California, San Diego. He received his PhD in Computer Science from the University of Southern California in 1983. He has taught at the Ecole Normale Superieure and Ecole Nationale Superieure des Telecommunications in Paris, as well as the Sorbonne, and has spent sabbaticals as Invited Professor at INRIA. Vianu's interests include database theory, computational logic, and Web data. His most recent research focuses on static analysis of XML-based systems, and specification and verification of data-driven Web services.

Vianu's publications include over 100 research articles and a graduate textbook on database theory. He has given numerous invited talks, is a member of several editorial boards, and has served as General Chair of SIGMOD and Program Chair of the PODS and ICDT conferences. He was elected Fellow of the ACM in 2006.

Automatic Verification of Database-Driven Systems: A New Frontier

Software systems centered around a database are becoming pervasive in numerous applications. However, such systems are often very complex and prone to costly bugs, whence the need for verification of critical properties. Recently, a novel approach to verification of database-driven systems has been taking shape. Instead of applying general-purpose techniques with only partial guarantees of success, it aims to identify restricted but sufficiently expressive classes of applications and properties for which sound and complete verification can be performed in a fully automatic way. This leverages the emergence of high-level specification tools for database-centered applications that not only allow fast prototyping and improved programmer productivity but, as a side effect, provide convenient targets for automatic verification. We present theoretical and practical results on verification of database-driven systems. The results are quite encouraging and suggest that, unlike arbitrary software systems, significant classes of database-driven systems may be amenable to automatic verification. This relies on a novel marriage of database and model checking techniques, of relevance to both the database and the computer-aided verification communities.

Umesh Dayal

Umeshwar Dayal is an HP Fellow in the Intelligent Information Management Lab at Hewlett-Packard Laboratories, Palo Alto, California. In this role, he has initiated research programs in enterprise-scale data warehousing, scalable analytics, and information visualization.

Umesh has nearly 30 years of research experience in data management and has made fundamental contributions in the field, including developing some of the basic techniques for managing federated databases, defining mechanisms for triggering transactions and database rules, and investigating query optimization strategies, especially in heterogeneous systems. In addition, he has done important work in long-duration transactions, business process management, and database design. He has published over 160 research papers and holds over 25 patents in the areas of database systems, transaction management, business process management, business intelligence and information visualization. In 2001, he received (with two co-authors) the prestigious 10-year best paper award from the International Conference on Very Large Data Bases for his paper on a transactional model for long-running activities.

Prior to joining HP Labs, Umesh was a senior researcher at DEC's Cambridge Research Lab, Chief Scientist at Xerox Advanced Information Technology and Computer Corporation of America, and on the faculty at the University of Texas-Austin. He received his PhD from Harvard University.

Umesh has served on the Editorial Board of four international journals, and he has chaired and served on the Program Committees of numerous conferences. Most recently, he was General Program Chair of VLDB 2006. He has served as a member of the Board of the VLDB Endowment, the Executive Committee of the IEEE Technical Committee on Electronic Commerce, and the Steering Committee of the SIAM International Conference on Data Mining; and as a founding member of the Board of the International Foundation for Cooperative Information Systems.

Data Integration Flows for Business Intelligence

Umeshwar Dayal, Malu Castellanos, Alkis Simitsis, Kevin Wilkinson

Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today's BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases, and serves a variety of front-end querying, reporting, and analytic tools. The back-end of the architecture is a data integration pipeline for populating the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a serial, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity, and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These include reducing the latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.

Georg Gottlob

Georg Gottlob is a Professor of Computing Science at Oxford University and a Fellow of St Anne's College.

His interests include data extraction, data exchange, algorithms for semistructured data and XML processing, database theory, algorithms for games and auctions, graph- or hypergraph-based algorithms for problem decomposition, knowledge representation and reasoning, complexity in AI and logic programming, complexity theory, finite model theory, and computational complexity.

His current research deals with database theory, query languages, data exchange, and with graph-theoretic problem decomposition methods that can be used for recognizing large classes of tractable instances of hard problems. The latter methods have applications in query optimization, in constraint satisfaction, and in game theory and electronic commerce (e.g. winner determination in combinatorial auctions). Georg Gottlob is a founding member of the recently established Oxford-Man Institute of Quantitative Finance.

Datalog+/-: A Unified Approach to Ontologies and Integrity Constraints

Andrea Cali, Georg Gottlob, and Thomas Lukasiewicz

We report on a recently introduced family of expressive extensions of Datalog, called Datalog+/-, which is a new framework for representing ontological axioms in the form of integrity constraints, and for query answering under such constraints. Datalog+/- is derived from Datalog by allowing existentially quantified variables in rule heads, and by enforcing suitable properties in rule bodies, to ensure decidable and efficient query answering. We first present different languages in the Datalog+/- family, providing tight complexity bounds for all cases but one (where we have a low complexity AC0 upper bound). We then show that such languages are general enough to capture the most common tractable ontology languages. In particular, we show that the DL-Lite family of description logics and F-Logic Lite are expressible in Datalog+/-. We finally show how stratified negation can be added to Datalog+/- while keeping ontology querying tractable in the data complexity. Datalog+/- is a natural and very general framework that can be successfully employed in different contexts such as data integration and exchange. This survey mainly summarizes two recent papers.