News and Events

August 27, 2015, 12:00

MAT-MEX V14 line, 29, room 11.

1. Towards self-management in a distributed column-store system
George Chernyshev

Abstract: In this paper, we discuss a self-managed distributed column-store system that would adapt its physical design to changing workloads. The architectural novelties of column-stores hold great promise for the construction of an efficient self-managed database. First, we present a short survey of existing self-managed systems. Then, we provide some views on the organization of a self-managed distributed column-store system. We discuss its three core components: the alerter, the reorganization controller, and the set of physical design options (actions) available to such a system. We present possible approaches to each of these components and evaluate them. This study is the first step towards the creation of an adaptive distributed column-store system.
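The abstract does not say how the alerter decides when a reorganization is worthwhile. One common idea in online physical design tuning is to compare the estimated savings accumulated over the recent workload with the one-off cost of reorganizing. The sketch below assumes that per-query cost estimates for the current design and for one candidate design are available; the function name `first_trigger` and the clamp-at-zero rule are illustrative assumptions, not the design proposed in the paper.

```python
from typing import Iterable, Optional, Tuple


def first_trigger(workload: Iterable[Tuple[float, float]],
                  reorganization_cost: float) -> Optional[int]:
    """Hypothetical alerter rule: return the index of the query after which the
    accumulated estimated saving of a candidate physical design first covers the
    one-off cost of reorganizing to it, or None if it never does.

    Each workload item is (cost under current design, cost under candidate design),
    assumed to come from an optimizer's cost model.
    """
    benefit = 0.0
    for i, (cost_current, cost_candidate) in enumerate(workload):
        # Positive when the candidate would have been cheaper for this query.
        # Clamping at zero keeps a long history of unfavorable queries from
        # hiding a recent workload shift.
        benefit = max(0.0, benefit + cost_current - cost_candidate)
        if benefit >= reorganization_cost:
            return i
    return None


# Each query would save 6 cost units; reorganization costs 20 units.
print(first_trigger([(10.0, 4.0)] * 5, 20.0))  # 3
# The candidate is worse for every query, so the alerter stays silent.
print(first_trigger([(4.0, 10.0)] * 5, 20.0))  # None
```

In a full system the reorganization controller would then act on the trigger; that part is not modeled here.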

2. Query Skylines for Optimization and Approximate Evaluation
Anna Yarygina and Boris Novikov

The problem of effective and efficient approximate query evaluation is addressed. We consider this problem as a special case of multi-objective optimization with two criteria: the computational resources spent on query evaluation and the quality of its result.

We introduce a compact approximate representation of a Pareto set (called a skyline), adapt different optimization techniques to solve our problem over an extended algebra, and evaluate them experimentally.

The proposed optimization and execution model allows quality to be traded for speed interactively and is also suitable for systems with firm real-time constraints.
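For two criteria, the exact Pareto set of query plans can be computed with a single sweep after sorting by cost. The sketch below shows only this exact computation; the compact approximate skyline introduced in the paper is not reproduced, and the `Plan` type and the numbers are hypothetical.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    # Hypothetical plan descriptor used only for this illustration.
    name: str
    cost: float     # resources spent on evaluation (lower is better)
    quality: float  # quality of the result (higher is better)


def skyline(plans: List[Plan]) -> List[Plan]:
    """Return the Pareto-optimal plans, ordered by increasing cost."""
    # Sort by cost ascending; among equal costs, best quality first, so that
    # dominated ties are skipped by the sweep below.
    ordered = sorted(plans, key=lambda p: (p.cost, -p.quality))
    result: List[Plan] = []
    best_quality = float("-inf")
    for p in ordered:
        # p is dominated iff some plan that is no more expensive already has
        # at least its quality; otherwise it extends the skyline.
        if p.quality > best_quality:
            result.append(p)
            best_quality = p.quality
    return result


plans = [
    Plan("A", 100.0, 1.0),
    Plan("B", 10.0, 0.6),
    Plan("C", 50.0, 0.9),
    Plan("D", 60.0, 0.8),  # dominated by C: more expensive and worse
]
print([p.name for p in skyline(plans)])  # ['B', 'C', 'A']
```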

3. The automata complexity of regular expression recognition via IDS.
Dmitry Alexandrov

We consider regular languages defined by a union of regular expressions of the form .*R_1.*R_2.*, where R_1 and R_2 are arbitrary regular expressions. Such languages may lead to an "exponential explosion" in the number of states of the recognizing automata. We propose a method of regular expression modification that reduces automata complexity, and we estimate the number of states for the original and modified languages. We also analyze the practical efficiency of the proposed method by applying it to the regular expressions used in Snort.
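To see why unions of expressions of the form .*R_1.*R_2.* blow up, take the simplest case where R_1 and R_2 are single, pairwise distinct symbols and apply the textbook subset construction to the union of k such expressions. The sketch below counts the reachable DFA states; it illustrates the problem only, not the authors' modification method, and the counts refer to the unminimized DFA.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

State = Tuple[int, int]  # (index of the expression in the union, position 0..2)


def build_nfa(patterns: List[Tuple[str, str]], alphabet: str):
    """NFA for the union of expressions .*a.*b.*, where the single symbols a and b
    stand in for R_1 and R_2 (a simplifying assumption). Returns (start set, transitions)."""
    delta: Dict[Tuple[State, str], Set[State]] = {}
    start: Set[State] = set()
    for i, (a, b) in enumerate(patterns):
        start.add((i, 0))
        for pos in range(3):
            for sym in alphabet:
                # Each ".*" becomes a self-loop on every symbol.
                delta.setdefault(((i, pos), sym), set()).add((i, pos))
        delta[((i, 0), a)].add((i, 1))  # read R_1
        delta[((i, 1), b)].add((i, 2))  # read R_2 (state (i, 2) is accepting)
    return frozenset(start), delta


def count_dfa_states(patterns: List[Tuple[str, str]], alphabet: str) -> int:
    """Number of reachable states produced by the subset construction (no minimization)."""
    start, delta = build_nfa(patterns, alphabet)
    seen: Set[FrozenSet[State]] = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for sym in alphabet:
            nxt = frozenset(t for s in current for t in delta.get((s, sym), ()))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


letters = "abcdefgh"
for k in range(1, 5):
    # k expressions over pairwise distinct symbols: .*a.*b.* | .*c.*d.* | ...
    pats = [(letters[2 * j], letters[2 * j + 1]) for j in range(k)]
    print(k, count_dfa_states(pats, letters[: 2 * k]))  # 3, 9, 27, 81 states
```

Each expression independently tracks three stages (nothing seen, R_1 seen, R_2 seen), so the deterministic automaton for k expressions ends up with 3^k subset states.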

May 30, 2015, 10:00

MAT-MEX Peterhof, room 2414.

Graduation papers: Bachelors

May 21, 2015, 15:30

MAT-MEX V14 line, 29, room 11.

New member elections: Anna Yarygina
Progress report on current research activities
Kirill Cherednik

The implementation of fuzzy query processing systems is a problem of current interest. This talk describes the problem of query optimization in such a system, given that generally accepted approaches do not take query fuzziness into account. In addition, the impact of the problem's specifics on the solution is discussed.
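One standard way to let a conventional optimizer handle a fuzzy selection with a threshold is to derive an equivalent crisp condition from the alpha-cut of the membership function. The abstract does not state which techniques the talk proposes; the sketch below assumes a hypothetical trapezoidal term "young" and a threshold of 0.7, and all names are illustrative.

```python
from typing import Dict, List, Tuple


def mu_young(age: float) -> float:
    """Hypothetical trapezoidal membership function for the fuzzy term 'young':
    degree 1 up to age 25, falling linearly to 0 at age 35."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10


def young_age_bound(alpha: float) -> float:
    """Crisp condition for the alpha-cut: for 0 < alpha <= 1,
    mu_young(age) >= alpha holds exactly when age <= 35 - 10 * alpha."""
    return 35 - 10 * alpha


def select_young(rows: List[Dict], alpha: float) -> List[Tuple[Dict, float]]:
    bound = young_age_bound(alpha)
    # Crisp range predicate: an ordinary optimizer could answer this with an index.
    candidates = [r for r in rows if r["age"] <= bound]
    # The membership degree is computed only for the surviving rows.
    scored = [(r, mu_young(r["age"])) for r in candidates]
    return [(r, d) for r, d in scored if d >= alpha]  # guard against rounding at the boundary


# Hypothetical rows.
rows = [{"name": "Ann", "age": 22}, {"name": "Bob", "age": 28},
        {"name": "Eve", "age": 30}, {"name": "Max", "age": 40}]
print([(r["name"], d) for r, d in select_young(rows, 0.7)])  # [('Ann', 1.0), ('Bob', 0.7)]
```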

May 7, 2015, 15:30

MAT-MEX V14 line, 29, room 11.

Coding public-key cryptosystems
Elizaveta Vostokova

Abstract not yet provided.

Progress report
Daria Dzendzik

People have always tried to minimize their efforts and to automate the processes around them, and text analysis is no exception. Named entity recognition, event extraction, and named entity disambiguation have been solved relatively successfully by the world scientific community when each is treated as a separate task. This report discusses ways and methods of text processing in situations where processing the text is not the ultimate goal but part of a larger process, the text data are noisy or refer to a specific domain, or the resources available for running the algorithms are limited.

December 23, 2014, 15:30

MAT-MEX V14 line, 29, room 11.

Linked Data Benchmark Council: Benchmarking for RDF and Graph Databases
Andrey Gubichev, TU Munich

Graph-like and schema-last data models are receiving increased attention in database research and practice. Benchmarking has a stellar record of driving rapid performance improvement and maturation, as exemplified by the dramatic success of the TPC benchmarks in the relational domain. LDBC is a European FP7 research project and subsequent industry association with the intent of replicating TPC's success for the emerging graph and RDF data models. LDBC has now been underway for two years and has gathered strong industrial participation for its mission. In this talk I will describe the LDBC Social Network Benchmark (SNB) and present its innovations in database benchmarking: methodology (choke-point driven design), correlated graph data generation, a scalable benchmark driver for a workload with complex dependencies, and parameter selection for benchmark queries.

SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. In the talk I will describe the SNB Interactive Workload in detail and illustrate it with some early results, and outline the goals for the other two workloads.

December 05, 2014, 15:30

Universitetsky prosp. 28, room 4388, Peterhof.

Physical Database Organization Techniques: a Survey
George Chernyshev

November 06, 2014, 15:30

MAT-MEX V14 line, 29, room 11.

Query Skylines for Optimization and Approximate Evaluation
Anna Yarygina, Boris Novikov

The problem of effective and efficient approximate query evaluation in the presence of diverse querying paradigms is addressed. We consider this problem as a special case of multi-objective optimization with two criteria: the resources spent on query evaluation and the quality of its result. The adopted approach is to optimize one of the criteria while ensuring at least a specified level of the other.

We demonstrate that known query optimization techniques do not work well enough in our environment because of the need for an extensible algebra and the constraints on the optimization objectives. We then develop a family of approximate optimization algorithms based on an auxiliary data structure called a skyline, and experimentally evaluate the performance of these algorithms, including a comparison with ordinary optimization.

The proposed optimization and execution model allows quality to be traded for speed interactively and is also suitable for systems with firm real-time constraints.
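Once a two-criteria skyline sorted by cost is available, "optimize one criterion while ensuring at least a specified level of the other" becomes a binary search along it. A minimal sketch, assuming an exact skyline given as (cost, quality) pairs; the function names and values are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

# A skyline entry is (cost, quality). On a two-criteria skyline sorted by
# increasing cost, quality is strictly increasing too, so both columns are sorted.
Entry = Tuple[float, float]


def cheapest_with_quality(sky: List[Entry], min_quality: float) -> Optional[Entry]:
    """Hypothetical helper: minimize cost subject to quality >= min_quality."""
    i = bisect_left([q for _, q in sky], min_quality)
    return sky[i] if i < len(sky) else None


def best_within_budget(sky: List[Entry], budget: float) -> Optional[Entry]:
    """Hypothetical helper: maximize quality subject to cost <= budget."""
    i = bisect_right([c for c, _ in sky], budget)  # number of entries with cost <= budget
    return sky[i - 1] if i > 0 else None


sky = [(10.0, 0.6), (50.0, 0.9), (100.0, 1.0)]  # hypothetical skyline
print(cheapest_with_quality(sky, 0.8))  # (50.0, 0.9)
print(best_within_budget(sky, 60.0))    # (50.0, 0.9)
print(best_within_budget(sky, 5.0))     # None
```

In a real optimizer the sorted columns would be kept alongside the skyline instead of being rebuilt on every call.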

October 16, 2014, 15:30

MAT-MEX V14 line, 29, room 11.

Configuring semi-supervised methods considering uncertainties
erman Sapozhnikov

There is a huge amount of unlabeled data in various fields and a relatively small amount of labeled data. Labeling data requires a lot of effort, so the human labeling work should be minimized. To achieve this, we try to combine ideas from domain adaptation, active learning, and semi-supervised learning, and to help a human quickly adapt a classifier to new data.

This presentation covers changes to the structure of the classifier, aspect extraction in different languages, and interactive classification. I will then present a plan for further work on combining the different parts into a single thesis.
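The abstract does not describe the speaker's selection strategy. As a hedged illustration of how active and semi-supervised learning can share one classifier, the sketch combines two standard heuristics: uncertainty sampling picks the items a human should label, and self-training auto-labels the items the classifier is already sure about. The probabilities would come from any binary classifier; the names, values, and the 0.9 threshold are assumptions.

```python
from typing import Dict, List


def least_confident(probabilities: Dict[str, float], k: int) -> List[str]:
    """Uncertainty sampling: the k unlabeled items whose predicted probability of
    the positive class is closest to 0.5, i.e. most worth asking a human about."""
    return sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))[:k]


def pseudo_labels(probabilities: Dict[str, float], threshold: float = 0.9) -> Dict[str, bool]:
    """Self-training: items the classifier is confident about get an automatic label."""
    return {item: p >= 0.5
            for item, p in probabilities.items()
            if max(p, 1 - p) >= threshold}


# Hypothetical predicted probabilities of the positive class.
predicted = {"doc1": 0.95, "doc2": 0.52, "doc3": 0.05, "doc4": 0.40}
print(least_confident(predicted, 2))  # ['doc2', 'doc4']
print(pseudo_labels(predicted))       # {'doc1': True, 'doc3': False}
```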

September 18, 2014, 15:30

MAT-MEX V14 line, 29, room 11.

Cluster quality estimation
Elena Sivogolovko

Clustering has been a subject of extensive research, since it arises in many application domains. One of the issues of the clustering process is the evaluation of clustering results. Estimating the quality of the obtained cluster structure is the main subject of cluster validity. Over the years, many cluster validity indexes have been presented in the research community. Most of them can be considered as functions of the obtained cluster structure and the given dataset. Using cluster validity indexes can simplify cluster quality estimation and increase clustering efficiency. In this talk, the most widely used cluster validity indexes are described. The applicability of different cluster validity indexes to different classes of clustering algorithms is reviewed. The influence of data quality on clustering outcomes is considered. A semantic approach to the notion of cluster validity is presented, and an estimation method for semantic cluster validity based on the RDF concept is described.
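As one example of a validity index that is a function of the cluster structure and the dataset, the silhouette coefficient compares, for each point, its mean distance a to its own cluster with its mean distance b to the nearest other cluster, s = (b - a) / max(a, b), and averages s over all points. Which indexes the talk covers is not stated; this is a plain Python sketch with Euclidean distance (`math.dist` needs Python 3.8+) on hypothetical toy data.

```python
import math
from typing import List, Sequence


def silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance.
    Values near 1 mean compact, well-separated clusters; negative values mean many
    points are closer to another cluster than to their own."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i: int, cluster: int) -> float:
        # Average distance from point i to the other members of `cluster`.
        ds = [math.dist(points[i], points[j])
              for j in range(len(points)) if labels[j] == cluster and j != i]
        return sum(ds) / len(ds) if ds else 0.0

    total = 0.0
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_dist(i, labels[i])  # cohesion
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two tight groups far apart.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(pts, [0, 0, 1, 1]), 2))  # 0.9 (good partition)
print(silhouette(pts, [0, 1, 0, 1]) < 0)        # True (poor partition)
```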

May 27, 2014, 15:30

MAT-MEX V14 line, 29, room 11.

Adaptively Approximate Techniques in Distributed Architectures
Barbara Catania, University of Genova, Italy

The wealth of information generated by users interacting with the network and its applications is often under-utilized due to complications in accessing heterogeneous and dynamic data and in retrieving relevant information from sources with possibly unknown formats and structures. Processing complex requests over such information sources is thus costly and still does not guarantee user satisfaction.

In such environments, requests are often relaxed, and query processing is forced to be adaptive and approximate, either to cope with limited processing resources (QoS-oriented techniques), possibly at the price of sacrificing result quality, or to cope with limited data knowledge and data heterogeneity (QoD-oriented techniques), with the aim of improving the quality of results. While both kinds of approximation techniques have been proposed, most adaptive solutions are QoS-oriented. Additionally, techniques that apply a QoD-oriented approximation in a QoD-oriented adaptive way (called adaptively approximate techniques), though shown to be potentially useful for reaching the right compromise between precise and approximate computations, have been largely neglected.

In this talk, after presenting and classifying several approximate and/or adaptive query processing approaches proposed for different distributed architectures, we show, with some concrete examples, the benefits of using adaptively approximate techniques. We then present the results of our ongoing research in the context of data stream and geo-social data management.

Feb. 26, 2014, 17:20

MAT-MEX Peterhof, room 2414.

Talking to the Database in a Semantically Rich Way: A New Approach to Resolve Object-Relational Impedance Mismatch
Henrietta Dombrovskaya, Senior Database Architect, Enova, Chicago, USA

Conventional recommendations for object-oriented application design include the concept of Object-Relational Mapping and suggest a clear separation of business logic from interaction with the database. While these requirements seem natural to application developers, they prevent developers from using the full power of the database engine and thereby become the main source of application performance degradation. Acknowledging the widespread use of the above concepts, a new approach developed at Enova provides an algorithm for "splitting" logic between different layers of classes.

We identify the parts of the logic that are essential for data retrieval and therefore belong in the database, and the parts that drive the computation or other data transformations and can reside in the application model. Although the logic-splitting algorithm is not yet implemented in any tool, we consider it an important part of the application design process. This presentation provides examples of redesigned methods as well as before-and-after performance data from the production system.

Oct. 03, 2013, 15:30

MAT-MEX V14 line, 29, room 11.

Approximate algorithms for algebraic operations and cost models for them in distributed scalable environment.
Alisa Pigul

Sep. 24, 2013, 12:50

MAT-MEX, room 405.

Sept. 12, 2013, 15:30

MAT-MEX V14 line, 29, room 11.

Annual reports of PhD students.

Apr 04, 2013, 11:30

MAT-MEX, room 14388.

Dept. chair elections: protocol.

Jan 24, 2013


The sub-department has been created.




Saint-Petersburg University
Mathematics and Mechanics Faculty
Publications of Information Management Research Group
  ISMW-FRUCT Conference