Download Free Efficient Query Processing On Graph Databases Book in PDF and EPUB Free Download. You can read online Efficient Query Processing On Graph Databases and write the review.

Many databases today capture both, structured and unstructured data. Making use of such hybrid data has become an important topic in research and industry. The efficient evaluation of hybrid data queries is the main topic of this thesis. Novel techniques are proposed that improve the whole processing pipeline, from indexes and query optimization to run-time processing. The contributions are evaluated in extensive experiments showing that the proposed techniques improve upon the state of the art.
Graph data are extensively associated with state-of-the-art applications in a variety of domains which include Linked Data and Social Media. This drives the need to have graph databases that can effectively store and manage graph data. Relational query processing has become efficient due to many decades of research in the field of data management and processing, among which translating SQL into relational algebra operations plays a key role in query processing. Based on relational algebra, many graph algebras have been defined that can be used for query processing and optimization in graph databases. We propose a graph algebra which operates on graph databases, for processing queries. We have implemented a graph algebra as a part of ScalaTion and compared it with Neo4j and MySQL with respect to query processing times. Various queries are tested on datasets with a few vertices to a large number of vertices. Graph databases perform well when the database gets larger compared to relational databases. Increase in the number of joins in queries, decreases the performance of relational databases, whereas equivalent queries in graph databases comparatively exhibit good performance. Among graph databases compared in the study, ScalaTion shows better performance.
Graph data modeling and querying arises in many practical application domains such as social and biological networks where the primary focus is on concepts and their relationships and the rich patterns in these complex webs of interconnectivity. In this book, we present a concise unified view on the basic challenges which arise over the complete life cycle of formulating and processing queries on graph databases. To that purpose, we present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model—the pre-dominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook for future developments. Our presentation is self-contained, covering the relevant topics from: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.
This book is an anthology of the results of research and development in database query processing during the past decade. The relational model of data provided tremendous impetus for research into query processing. Since a relational query does not specify access paths to the stored data, the database management system (DBMS) must provide an intelligent query-processing subsystem which will evaluate a number of potentially efficient strategies for processing the query and select the one that optimizes a given performance measure. The degree of sophistication of this subsystem, often called the optimizer, critically affects the performance of the DBMS. Research into query processing thus started has taken off in several directions during the past decade. The emergence of research into distributed databases has enormously complicated the tasks of the optimizer. In a distributed environment, the database may be partitioned into horizontal or vertical fragments of relations. Replicas of the fragments may be stored in different sites of a network and even migrate to other sites. The measure of performance of a query in a distributed system must include the communication cost between sites. To minimize communication costs for-queries involving multiple relations across multiple sites, optimizers may also have to consider semi-join techniques.
Knowledge graphs are increasingly used in scientific and industrial applications. The large number and size of knowledge graphs published as Linked Data in autonomous sources has led to the development of various interfaces to query these knowledge graphs. Therefore, effective query processing approaches that enable efficient information retrieval from these knowledge graphs need to address the capabilities and limitations of different Linked Data Fragment interfaces. This book investigates novel approaches to addressing the challenges that arise in the presence of decentralized, heterogeneous sources of knowledge graphs. The effectiveness of these approaches is empirically evaluated and demonstrated using various real world and synthetic large-scale knowledge graphs throughout. First, a sample-based approach for generating fine-grained performance profiles is proposed, and it is demonstrated how the information from such profiles can be leveraged in cost model-based query planning. In addition, a sample-based data distribution profiling approach is advocated which aims to estimate the statistical profile features of large knowledge graphs and the applicability of these estimations in federated querying processing is demonstrated. The remainder of the book focuses on techniques to devise efficient query processing approaches when heterogeneous interfaces need to be queried but no fine-grained statistics are available. Robust techniques to support efficient query processing in these circumstances are investigated and results are shared to demonstrate the way in which these techniques can outperform state-of-the-art approaches. Finally, the author describes a framework for federated query processing over heterogeneous federations of Linked Data Fragments to exploit the capabilities of different sources by defining interface-aware approaches.
In the last years, Linked Data initiatives have encouraged the publication of large graph-structured datasets using the Resource Description Framework (RDF). Due to the constant growth of RDF data on the web, more flexible data management infrastructures must be able to efficiently and effectively exploit the vast amount of knowledge accessible on the web. This book presents flexible query processing strategies over RDF graphs on the web using the SPARQL query language. In this work, we show how query engines can change plans on-the-fly with adaptive techniques to cope with unpredictable conditions and to reduce execution time. Furthermore, this work investigates the application of crowdsourcing in query processing, where engines are able to contact humans to enhance the quality of query answers. The theoretical and empirical results presented in this book indicate that flexible techniques allow for querying RDF data sources efficiently and effectively.
Graph databases provide a natural way of storing and querying graph data. In contrast to relational databases, queries over graph databases enable to refer directly to the graph structure of such graph data. For example, graph pattern matching can be employed to formulate queries over graph data. However, as for relational databases running complex queries can be very time-consuming and ruin the interactivity with the database. One possible approach to deal with this performance issue is to employ database views that consist of pre-computed answers to common and often stated queries. But to ensure that database views yield consistent query results in comparison with the data from which they are derived, these database views must be updated before queries make use of these database views. Such a maintenance of database views must be performed efficiently, otherwise the effort to create and maintain views may not pay off in comparison to processing the queries directly on the data from which the database views are derived. At the time of writing, graph databases do not support database views and are limited to graph indexes that index nodes and edges of the graph data for fast query evaluation, but do not enable to maintain pre-computed answers of complex queries over graph data. Moreover, the maintenance of database views in graph databases becomes even more challenging when negation and recursion have to be supported as in deductive relational databases. In this technical report, we present an approach for the efficient and scalable incremental graph view maintenance for deductive graph databases. The main concept of our approach is a generalized discrimination network that enables to model nested graph conditions including negative application conditions and recursion, which specify the content of graph views derived from graph data stored by graph databases. The discrimination network enables to automatically derive generic maintenance rules using graph transformations for maintaining graph views in case the graph data from which the graph views are derived change. We evaluate our approach in terms of a case study using multiple data sets derived from open source projects.
Property graph model is a semantically rich model for real-world applications that represent their data as graphs, e.g., communication networks, social networks, financial transaction networks. On-Line Analytical Processing (OLAP) provides an important tool for data analysis by allowing users to perform data aggregation through different combinations of dimensions. For example, given a Q&A forum dataset, in order to study if there is a correlation between a poster's age and his or her post quality, one may ask what is the average age of users grouped by the post score. Another example is that, in the field of music industry, it may be interesting to ask what total sales of records are with respect to different music companies and years so as to conduct a market activity analysis. Surprisingly, current graph databases do not efficiently support OLAP aggregation queries. In most cases, such queries are transformed to a sequence of join operations, and the system computes everything from scratch. For example, Neo4j, a state-of-art graph database system, processes each OLAP query in two steps. First, it expands the nodes and edges that satisfy the given query constraint. Then it performs the aggregation over all the valid substructures returned from the first step. However, in data warehousing workloads, it is common to have repeated queries from time to time. Computing everything from scratch would be highly inefficient. Materialization and view maintenance techniques developed in traditional RDBMS have proved to be efficient for processing OLAP workloads. Following the generic materialization methodology, in this thesis we develop a structure-aware cuboid caching solution to efficiently support OLAP aggregation queries over property graphs. Structure-aware means that our solution takes both heterogeneous attributes and graph topological information into consideration. The essential idea is to precompute and materialize some views based on statistics of history workload, such that future query processing can be accelerated. We implement a prototype system on top of Neo4j. Empirical studies over real-world property graphs show that, with a reasonable space cost constraint, our solution on average achieves 15-30x speedup over native Neo4j in time efficiency.
This book presents a comprehensive overview of fundamental issues and recent advances in graph data management. Its aim is to provide beginning researchers in the area of graph data management, or in fields that require graph data management, an overview of the latest developments in this area, both in applied and in fundamental subdomains. The topics covered range from a general introduction to graph data management, to more specialized topics like graph visualization, flexible queries of graph data, parallel processing, and benchmarking. The book will help researchers put their work in perspective and show them which types of tools, techniques and technologies are available, which ones could best suit their needs, and where there are still open issues and future research directions. The chapters are contributed by leading experts in the relevant areas, presenting a coherent overview of the state of the art in the field. Readers should have a basic knowledge of data management techniques as they are taught in computer science MSc programs.