
Determinantal Point Processes (DPPs) are probability distributions over subsets of a collection of points that tend to generate diverse configurations of points, which makes them suitable as a probabilistic model of diversity. Recently, this idea has been exploited extensively in subset selection problems, where, given a large set of items such as images, documents, or any other form of collected data, the goal is to select a small yet diverse and representative subset. However, with the rapid growth of dataset sizes, utilizing DPPs for real-world tasks requires new primitives and inference algorithms that run efficiently at this scale. This thesis focuses on two inference tasks for DPPs. In the first part, we study sampling algorithms for DPPs and offer efficient MCMC-based algorithms that can be applied in both discrete and continuous domains. In the second part, we consider the problem of determinant maximization, which is equivalent to maximum a posteriori (MAP) inference for DPPs, and present scalable algorithms for a distributed setting in which the input data are arbitrarily split among numerous nodes.
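MCMC samplers for DPPs of the kind studied in the first part are easy to illustrate in the cardinality-constrained (k-DPP) case: propose swapping one selected item for an unselected one and accept with a determinant ratio. The following NumPy sketch shows such a swap chain; the kernel `L`, the subset size `k`, and the helper names are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def logdet(L, S):
    """Log-determinant of the principal submatrix of L indexed by S."""
    sign, val = np.linalg.slogdet(L[np.ix_(S, S)])
    return val if sign > 0 else -np.inf

def kdpp_swap_chain(L, k, n_steps=5000, rng=None):
    """Metropolis-style swap chain targeting a k-DPP with kernel L (sketch)."""
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    S = list(rng.choice(n, size=k, replace=False))            # random initial subset
    cur = logdet(L, S)
    for _ in range(n_steps):
        i = rng.integers(k)                                    # position to swap out
        j = rng.choice([x for x in range(n) if x not in S])    # item to swap in
        proposal = S.copy()
        proposal[i] = j
        new = logdet(L, proposal)
        # accept with probability min(1, det(L_proposal) / det(L_current))
        if np.log(rng.random()) < new - cur:
            S, cur = proposal, new
    return sorted(S)

# toy usage: draw a diverse size-3 subset under a random PSD kernel
X = np.random.default_rng(0).normal(size=(20, 5))
L = X @ X.T + 1e-6 * np.eye(20)
print(kdpp_swap_chain(L, k=3, rng=0))
```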
This monograph provides a comprehensible introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community.
In this thesis we explore a probabilistic model that is well-suited to a variety of subset selection tasks: the determinantal point process (DPP). DPPs were originally developed in the physics community to describe the repulsive interactions of fermions. More recently, they have been applied to machine learning problems such as search diversification and document summarization, which can be cast as subset selection tasks. A challenge, however, is scaling such DPP-based methods to the size of the datasets of interest to this community, and developing approximations for DPP inference tasks whose exact computation is prohibitively expensive. A DPP defines a probability distribution over all subsets of a ground set of items. Consider the inference tasks common to probabilistic models, which include normalizing, marginalizing, conditioning, sampling, estimating the mode, and maximizing likelihood. For DPPs, exactly computing the quantities necessary for the first four of these tasks requires time cubic in the number of items or features of the items. In this thesis, we propose a means of making these four tasks tractable even in the realm where the number of items and the number of features is large. Specifically, we analyze the impact of randomly projecting the features down to a lower-dimensional space and show that the variational distance between the resulting DPP and the original is bounded. In addition to expanding the circumstances in which these first four tasks are tractable, we also tackle the other two tasks, the first of which is known to be NP-hard (with no PTAS) and the second of which is conjectured to be NP-hard. For mode estimation, we build on submodular maximization techniques to develop an algorithm with a multiplicative approximation guarantee. For likelihood maximization, we exploit the generative process associated with DPP sampling to derive an expectation-maximization (EM) algorithm. We experimentally verify the practicality of all the techniques that we develop, testing them on applications such as news and research summarization, political candidate comparison, and product recommendation.
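Mode estimation for a DPP amounts to finding the subset whose kernel submatrix has maximum determinant, and since the log-determinant is submodular, greedy selection is the natural baseline on which such approximation algorithms build. Below is a minimal NumPy sketch of that greedy baseline, not the thesis's algorithm or its guarantee; `L` and the function name are assumptions for illustration.

```python
import numpy as np

def greedy_dpp_mode(L, k):
    """Greedy baseline for DPP mode estimation: at each step add the item
    that most increases log det of the selected kernel submatrix."""
    n = L.shape[0]
    S, current = [], 0.0                    # log det of the empty submatrix is 0
    for _ in range(k):
        best_gain, best_item, best_val = -np.inf, None, None
        for i in range(n):
            if i in S:
                continue
            sign, val = np.linalg.slogdet(L[np.ix_(S + [i], S + [i])])
            val = val if sign > 0 else -np.inf
            if val - current > best_gain:   # marginal gain of adding item i
                best_gain, best_item, best_val = val - current, i, val
        S.append(best_item)
        current = best_val
    return S

# toy usage with a random PSD kernel
X = np.random.default_rng(1).normal(size=(15, 4))
L = X @ X.T + 1e-6 * np.eye(15)
print(greedy_dpp_mode(L, k=4))
```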
Intensity estimation is challenging for modern large-scale multivariate or marked point process data. When the total number of events is large, the mark space is non-trivial, or the covariates are complicated, computationally efficient methods adapted to the unique challenges the dataset poses are required. This work considers three such settings with millions of events and develops scalable intensity estimation methods. In the first setting, we model what we call Point-to-Point processes, where the events observed over time are interactions between two entities in their state spaces. We use the NYC Taxi dataset as an example, modeling taxi trips as interactions between pick-up and drop-off locations, with departure times as the times of occurrence. With fine grid discretization in both time and state spaces, the events are represented as a huge sparse tensor, and the intensity can be represented as a dense tensor of the same dimensions. We choose a special form of non-negative tensor decomposition to compress the intensity tensor, with an algorithm operating directly on the compressed tensors and the sparse entries of the data tensor. The second setting considers a grouped change point detection problem in high-dimensional multivariate point processes. We propose two scalable algorithms: a maximum-likelihood program with a group total-variation constraint on the log-intensities, solved by a Frank-Wolfe algorithm; and a wild binary segmentation algorithm with a CUSUM statistic derived in the Poisson process context. These two methods are compared based on their empirical performance on simulated data with millions of events and hundreds of dimensions, and applied to military security data. The third setting addresses right-censored survival data with complex covariates (e.g. medical images). Applying a deep neural network to model the effects of these covariates under the Cox proportional hazards model is tempting, but stochastic gradient descent is not trivially scalable under the partial-likelihood loss. With different models for the baseline hazard function between mini-batch steps, we present various losses as objectives for mini-batch gradient descent that scale to large datasets with high-dimensional covariates, and we compare the algorithms for these losses on simulated survival data.
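To make the third setting concrete, a common way to obtain a mini-batch objective from the Cox model is to evaluate the negative partial likelihood using only the risk sets inside the batch. The NumPy sketch below shows such a batch-level loss under assumed inputs (risk scores, event times, event indicators); it illustrates the general idea rather than any of the specific losses proposed in this work.

```python
import numpy as np

def batch_cox_loss(scores, times, events):
    """Negative Cox partial likelihood restricted to one mini-batch.

    scores: model outputs f(x_i) for the batch (higher = higher hazard)
    times:  observed times t_i
    events: 1 if the event was observed, 0 if right-censored
    """
    order = np.argsort(-times)                 # sort by time, latest first
    s, e = scores[order], events[order]
    # cumulative log-sum-exp of scores = log of the sum over each in-batch risk set
    log_risk = np.logaddexp.accumulate(s)
    # sum over observed events of (score - log-sum-exp over the at-risk set)
    return -np.sum(e * (s - log_risk)) / max(e.sum(), 1.0)

# toy usage
rng = np.random.default_rng(2)
scores = rng.normal(size=8)
times = rng.exponential(size=8)
events = rng.integers(0, 2, size=8).astype(float)
print(batch_cox_loss(scores, times, events))
```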
Probabilistic modeling, also known as probabilistic machine learning, provides a principled framework for learning from data, with the key advantage of offering rigorous solutions for uncertainty quantification. In the era of big and complex data, there is an urgent need for new inference methods in probabilistic modeling to extract information from data effectively and efficiently. This thesis shows how to perform theoretically guaranteed, scalable, and reliable inference for modern machine learning. Considering both theory and practice, we provide a foundational understanding of scalable and reliable inference methods, practical algorithms for new inference methods, and extensive empirical evaluation on common machine learning and deep learning tasks. Classical inference algorithms, such as Markov chain Monte Carlo, have enabled probabilistic modeling to achieve gold-standard results on many machine learning tasks. However, these algorithms are rarely used in modern machine learning because of the difficulty of scaling them up to large datasets. Existing work suggests that there is an inherent trade-off between scalability and reliability, forcing practitioners to choose between expensive exact methods and biased scalable ones. To overcome this trade-off, we introduce general and theoretically grounded frameworks that enable fast and asymptotically correct inference, with applications to Gibbs sampling, Metropolis-Hastings, and Langevin dynamics. Deep neural networks (DNNs) have achieved impressive success on a variety of learning problems in recent years. However, DNNs have been criticized for being unable to estimate uncertainty accurately. Probabilistic models provide a principled alternative that can mitigate this issue; they account for model uncertainty and achieve automatic complexity control. In this thesis, we analyze the key challenges of probabilistic inference in deep learning and present novel approaches for fast posterior inference over neural network weights.
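As a point of reference for the Langevin-dynamics family mentioned above, stochastic gradient Langevin dynamics (SGLD) replaces the full-data gradient with a reweighted mini-batch gradient and injects Gaussian noise at each step. The NumPy sketch below estimates the posterior over the mean of a Gaussian model; it is a generic illustration under assumed priors and step sizes, not the correction schemes developed in the thesis.

```python
import numpy as np

def sgld_gaussian_mean(x, n_iters=2000, batch_size=32, step=1e-3, rng=None):
    """SGLD for the posterior over the mean theta of N(theta, 1) data,
    with a standard normal prior on theta (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    N = len(x)
    theta, samples = 0.0, []
    for _ in range(n_iters):
        batch = rng.choice(x, size=batch_size, replace=False)
        grad_log_prior = -theta                                   # d/dtheta log N(theta; 0, 1)
        grad_log_lik = (N / batch_size) * np.sum(batch - theta)   # rescaled mini-batch gradient
        noise = rng.normal(scale=np.sqrt(step))                   # injected Gaussian noise
        theta = theta + 0.5 * step * (grad_log_prior + grad_log_lik) + noise
        samples.append(theta)
    return np.array(samples)

# toy usage: data from N(2, 1); the posterior mean should be close to 2
data = np.random.default_rng(3).normal(loc=2.0, size=1000)
print(sgld_gaussian_mean(data, rng=3)[-500:].mean())
```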
In this dissertation, we focus on Markov logic networks (MLNs), an advanced modeling language that combines first-order logic, the cornerstone of traditional Artificial Intelligence (AI), with probabilistic graphical models, the cornerstone of modern AI. MLNs are routinely used in a wide variety of application domains, including natural language processing and computer vision, and are preferred over propositional representations because, unlike the latter, they yield compact, interpretable models that can be easily modified and tuned. Unfortunately, even though the MLN representation is compact and efficient, inference in MLNs is notoriously difficult, and despite great progress, several inference tasks in complex real-world MLNs are beyond the reach of existing technology. In this dissertation, we greatly advance the state of the art in MLN inference, enabling it to solve much harder and larger problems than existing approaches. We develop several domain-independent principles, techniques, and algorithms for fast, scalable, and accurate inference that fully exploit both probabilistic and logical structure. This dissertation makes the following five contributions. First, we propose two approaches that respectively address two fundamental problems with Gibbs sampling, a popular approximate inference algorithm: it does not converge in the presence of determinism, and it exhibits poor accuracy when the MLN contains a large number of strongly correlated variables. Second, we lift sampling-based approximate inference algorithms to the first-order level, enabling them to take full advantage of symmetries and relational structure in MLNs. Third, we develop novel approaches for exploiting approximate symmetries. These approaches help scale up inference to large, complex MLNs, which are not amenable to conventional lifting techniques that exploit only exact symmetries. Fourth, we propose a new, efficient algorithm for solving a major bottleneck in all inference algorithms for MLNs: counting the number of true groundings of each formula. We demonstrate empirically that our new counting approach yields orders-of-magnitude improvements in both the speed and quality of inference. Finally, we demonstrate the power and promise of our approaches on biomedical event extraction, a challenging real-world information extraction task, on which our system achieved state-of-the-art results.
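For readers unfamiliar with the baseline being improved, plain Gibbs sampling over a ground network repeatedly resamples one variable from its conditional given the rest, and it is exactly this scheme that breaks down under determinism and strong correlations. Below is a generic NumPy sketch of Gibbs sampling on a small pairwise binary model; the model and weights are toy assumptions, not an MLN grounding.

```python
import numpy as np

def gibbs_binary_pairwise(W, b, n_sweeps=1000, rng=None):
    """Gibbs sampling for a pairwise binary model
    p(x) proportional to exp(x' W x / 2 + b' x), x in {0, 1}^n (toy sketch),
    assuming W is symmetric with zero diagonal."""
    rng = np.random.default_rng(rng)
    n = len(b)
    x = rng.integers(0, 2, size=n).astype(float)
    samples = []
    for _ in range(n_sweeps):
        for i in range(n):
            # conditional log-odds of x_i = 1 given the other variables
            logit = b[i] + W[i] @ x - W[i, i] * x[i]
            x[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
        samples.append(x.copy())
    return np.array(samples)

# toy usage: two strongly (positively) coupled variables
W = np.array([[0.0, 3.0], [3.0, 0.0]])
b = np.array([-1.5, -1.5])
print(gibbs_binary_pairwise(W, b, rng=4).mean(axis=0))  # estimated marginals
```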
This dissertation focuses on Markov logic networks (MLNs), a knowledge representation tool that elegantly unifies first-order logic (FOL) and probabilistic graphical models (PGMs). FOL enables compact representation, while probability allows the user to model uncertainty in a principled manner. Unfortunately, although the representation is compact, inference in MLNs is quite challenging, as the PGMs generated from MLNs typically have millions of random variables and features. As a result, even linear-time algorithms are computationally infeasible. Recently, there has been burgeoning interest in developing "lifted" algorithms to scale up inference in MLNs. These algorithms exploit symmetries in the PGM associated with an MLN, detecting them in many cases by analyzing the first-order structure without constructing the PGM, and thus have time and space requirements that are sub-linear when symmetries are present and can be detected. However, previous research has focused primarily on lifted marginal inference, while algorithms for optimization tasks such as maximum a posteriori (MAP) inference are far less advanced. This dissertation fills this void by developing next-generation algorithms for MAP inference. It presents several novel, scalable algorithms for MAP inference in MLNs. The new algorithms exploit both exact and approximate symmetries, and experimentally they are orders of magnitude faster than existing algorithms on a wide variety of real-world MLNs. Specifically, this dissertation makes the following contributions. A key issue with existing lifted approaches is that one has to make substantial modifications to highly engineered, well-researched inference algorithms and software developed in the PGM community over the last few decades. We address this problem by developing the "lifting as pre-processing" paradigm, in which lifted inference is reduced to a series of pre-processing operations that compress a large PGM into a much smaller one. Another problem with current lifted algorithms is that they exploit only exact symmetries. In many real-world problems, very few exact symmetries are present, while approximate symmetries are abundant. We address this limitation by developing a general framework for exploiting approximate symmetries that elegantly trades solution quality against time and space complexity. Inference and weight learning algorithms for MLNs need to solve complex combinatorial counting problems. We propose a novel approach for formulating and efficiently solving these problems, and use it to scale up two approximate inference algorithms, Gibbs sampling and MaxWalkSAT, and three weight learning algorithms: contrastive divergence, voted perceptron, and pseudo-log-likelihood learning. Finally, we propose novel approximate inference algorithms for accurate, scalable inference in PGMs that have shared sub-structures but no shared parameters, and we demonstrate both theoretically and experimentally that they outperform state-of-the-art approaches.
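The counting problem referred to above asks, for a first-order clause and a truth assignment to the ground atoms, how many of its groundings are satisfied. The naive computation, which lifted and specialized counting methods aim to avoid, enumerates all groundings explicitly. Here is a small Python sketch for the clause ¬Smokes(x) ∨ ¬Friends(x, y) ∨ Smokes(y) over an assumed toy domain; the predicate names and world are illustrative, not taken from this dissertation.

```python
from itertools import product

def count_true_groundings(domain, smokes, friends):
    """Count satisfied groundings of the clause
    ¬Smokes(x) ∨ ¬Friends(x, y) ∨ Smokes(y)
    by brute-force enumeration over all (x, y) pairs (toy sketch)."""
    satisfied = 0
    for x, y in product(domain, repeat=2):
        clause = (not smokes[x]) or (not friends.get((x, y), False)) or smokes[y]
        satisfied += clause
    return satisfied

# toy world: Alice smokes and is friends with Bob; Bob does not smoke
domain = ["Alice", "Bob", "Carol"]
smokes = {"Alice": True, "Bob": False, "Carol": False}
friends = {("Alice", "Bob"): True, ("Bob", "Alice"): True}
print(count_true_groundings(domain, smokes, friends))  # 8 of 9 groundings satisfied
```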
The three-volume proceedings LNAI 10534 – 10536 constitute the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2017, held in Skopje, Macedonia, in September 2017. The 101 regular papers presented in Parts I and II were carefully reviewed and selected from 364 submissions; a further 47 papers appear in the applied data science, nectar, and demo tracks. The contributions are organized in topical sections as follows. Part I: anomaly detection; computer vision; ensembles and meta learning; feature selection and extraction; kernel methods; learning and optimization; matrix and tensor factorization; networks and graphs; neural networks and deep learning. Part II: pattern and sequence mining; privacy and security; probabilistic models and methods; recommendation; regression; reinforcement learning; subgroup discovery; time series and streams; transfer and multi-task learning; unsupervised and semi-supervised learning. Part III: applied data science track; nectar track; and demo track.