
Edited by: Heni Ben Amor, Georgia Institute of Technology, USA

Reviewed by: Wannes Meert, KU Leuven, Belgium; Neil Thomas Dantam, Georgia Institute of Technology, USA

This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The field of Probabilistic Logic Programming (PLP) has seen significant advances in the last 20 years, with many proposals for languages that combine probability with logic programming. Since the start, the problem of learning probabilistic logic programs has been the focus of much attention. Learning these programs represents a whole subfield of Inductive Logic Programming (ILP). In Probabilistic ILP (PILP), two problems are considered: learning the parameters of a program given the structure (the rules) and learning both the structure and the parameters. Usually, structure learning systems use parameter learning as a subroutine. In this article, we present an overview of PILP and discuss the main results.

Probabilistic Logic Programming (PLP) started in the early 90s with seminal works such as those of Dantsin (1991), Ng and Subrahmanian (1992), and Poole (1993).

Since then, the field has steadily developed and many proposals for the integration of logic programming and probability have appeared, allowing the representation of both complex relations among entities and uncertainty over them. These proposals can be grouped into two classes: those that use a variant of the distribution semantics (Sato, 1995) and those that follow a Knowledge Base Model Construction (KBMC) approach, in which the program is a template for building a graphical model.

The distribution semantics underlies many languages, such as Probabilistic Logic Programs (Dantsin, 1991), PRISM (Sato, 1995), Independent Choice Logic (Poole, 1997), Logic Programs with Annotated Disjunctions (Vennekens et al., 2004), and ProbLog (De Raedt et al., 2007).

The languages following a KBMC approach include Relational Bayesian Networks (Jaeger, 1997), Bayesian Logic Programs (Kersting and De Raedt, 2001), and CLP(BN) (Santos Costa et al., 2003).

Learning probabilistic logic programs has been considered from the start: Sato (1995) already presented an algorithm for learning the parameters of programs under the distribution semantics.

PILP uses declarative probabilistic languages that allow learned models to be easily understood by humans. Moreover, languages based on the distribution semantics are Turing complete, thus representing very expressive target formalisms. Recently, effective PILP systems have been proposed that achieve good results on a variety of domains, including biology, chemistry, medicine, entity resolution, link prediction, and web page classification.

In the following, we present an updated overview of PILP by concentrating on languages under the distribution semantics.

We illustrate the distribution semantics through ProbLog (De Raedt et al., 2007), the language with the simplest syntax. A ProbLog program is composed of a set of rules and a set of probabilistic facts of the form p_i::f_i, where p_i ∈ [0, 1] and f_i is an atom. Each ground instantiation of a probabilistic fact f_i corresponds to a Boolean random variable that is true with probability p_i and false with probability 1 − p_i. The random variables are mutually independent, so a world, obtained by selecting or rejecting each ground probabilistic fact, has a probability equal to the product of the probabilities of the individual choices. The probability of a query is then the sum of the probabilities of the worlds in which the query is true.

Example 1: The following program encodes the fact that a person sneezes if he has the flu and this is the active cause of sneezing, or if he has hay fever and hay fever is the active cause of sneezing:
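In ProbLog syntax, such a program can be sketched as in the comment below (the 0.7 and 0.8 probabilities are illustrative assumptions, not values from the original article), and the distribution semantics can be checked by brute-force enumeration of the worlds:

```python
from itertools import product

# Hypothetical ProbLog rendering of Example 1 (0.7 and 0.8 are assumed
# values, used only for illustration):
#   sneezing(X) :- flu(X), flu_sneezing(X).
#   sneezing(X) :- hay_fever(X), hay_fever_sneezing(X).
#   flu(bob).
#   hay_fever(bob).
#   0.7::flu_sneezing(X).
#   0.8::hay_fever_sneezing(X).
probs = {"flu_sneezing(bob)": 0.7, "hay_fever_sneezing(bob)": 0.8}

def query_prob(query_holds):
    """Sum the probabilities of the worlds in which the query is true."""
    facts = list(probs)
    total = 0.0
    for choices in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, choices))
        weight = 1.0  # a world's probability: product of its fact choices
        for fact, chosen in world.items():
            weight *= probs[fact] if chosen else 1.0 - probs[fact]
        if query_holds(world):
            total += weight
    return total

# sneezing(bob) is derivable when either active cause is selected,
# since flu(bob) and hay_fever(bob) are certain facts.
p = query_prob(lambda w: w["flu_sneezing(bob)"] or w["hay_fever_sneezing(bob)"])
print(round(p, 2))  # 0.94
```

The result coincides with 1 − (1 − 0.7)(1 − 0.8) = 0.94, i.e., the noisy-OR combination of the two causes.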

Note that, when a ground atom can be derived from more than one ground rule, the probabilistic contributions of the rules are combined with a noisy-OR gate. Hommersom and Lucas (2011) generalized this approach by allowing the contributions to be combined with interaction rules other than noisy-OR.

The problem of computing the probability of queries is called inference. Solving it by enumerating all the worlds and identifying those that entail the query is impractical, as the number of worlds is exponential in the number of ground probabilistic facts. Usually, inference is instead performed by resorting to knowledge compilation (Darwiche and Marquis, 2002): the query is compiled into a target language on which the probability can be computed efficiently.

An early method for exact inference in Relational Bayesian Networks (RBNs) was proposed in Chavira et al. (2006), where the RBN is encoded as a weighted propositional theory and compiled into an arithmetic circuit.

The first knowledge compilation approach for performing inference on languages based on the distribution semantics (De Raedt et al., 2007) finds a covering set of explanations for the query, i.e., sets of probabilistic facts whose selection makes the query true. Each ground probabilistic fact is associated with a Boolean variable and each explanation with the conjunction of its variables; the query is then encoded as the disjunction of the explanations. In Example 1, associating X_1 with flu_sneezing(bob) and X_2 with hay_fever_sneezing(bob) yields the DNF formula X_1 ∨ X_2.

Computing the probability of a Boolean DNF formula is an intractable problem (Rauzy et al., 2003). Knowledge compilation tackles it by translating the formula into a Binary Decision Diagram (BDD), from which the probability can be computed in time linear in the size of the diagram; the BDD for X_1 ∨ X_2 is shown in Figure 1.
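The linear-time computation on the BDD follows the Shannon expansion: at a node for variable X with probability p, the probability is p·P(high child) + (1 − p)·P(low child). A minimal sketch, with node encoding and probability values assumed for illustration:

```python
# Minimal sketch of the BDD traversal (node encoding, variable names, and
# probabilities are illustrative). Internal nodes are tuples
# (variable, high_child, low_child); the leaves are the constants 1 and 0.
probs = {"X1": 0.7, "X2": 0.8}

# BDD for X1 v X2 with variable order X1 < X2: if X1 is true the formula
# is true; otherwise it reduces to X2.
bdd = ("X1", 1, ("X2", 1, 0))

def bdd_prob(node):
    """Shannon expansion: P(node) = p * P(high) + (1 - p) * P(low)."""
    if node in (0, 1):
        return float(node)
    var, high, low = node
    return probs[var] * bdd_prob(high) + (1 - probs[var]) * bdd_prob(low)

print(round(bdd_prob(bdd), 2))  # 0.94
```

In a real implementation, nodes are shared and intermediate results are memoized, which is what makes the traversal linear in the size of the diagram.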

Other reasoning systems based on the BDD language are cplint (Riguzzi, 2007) and PITA (Riguzzi and Swift, 2010).

An alternative approach to exact inference compiles the Boolean formula into d-DNNF (deterministic Decomposable Negation Normal Form) rather than into a BDD (Fierens et al., 2015), reducing inference to weighted model counting.

Since the cost of inference may be very high, approximate algorithms have been developed. They either compute subsets of possibly incomplete explanations or use random sampling. In the first approach, a subset of the explanations provides a lower bound and the set of partially expanded explanations provides an upper bound (Kimmig et al., 2011). In the second, the probability is estimated as the fraction of sampled worlds that entail the query.
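Random sampling can be sketched for Example 1 as follows (a minimal illustration assuming the 0.7/0.8 fact probabilities, not the cited systems' code): each sample draws a world and checks whether the query holds in it.

```python
import random

# Monte Carlo estimate of P(sneezing(bob)) for Example 1 (0.7 and 0.8
# are assumed fact probabilities, used only for illustration).
def sample_query_prob(n=100_000, seed=42):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        flu_cause = rng.random() < 0.7   # sample flu_sneezing(bob)
        hay_cause = rng.random() < 0.8   # sample hay_fever_sneezing(bob)
        if flu_cause or hay_cause:       # does the query hold in this world?
            hits += 1
    return hits / n

estimate = sample_query_prob()
print(abs(estimate - 0.94) < 0.01)  # True: close to the exact value
```

The standard error shrinks as 1/sqrt(n), so the estimate can be made as precise as desired at the cost of more samples.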

In Choi and Darwiche (2010), approximate inference is performed by relaxing some of the constraints of the model and then compensating for the relaxation.

In Riguzzi (2013), the MCINTYRE system performs approximate inference by Monte Carlo sampling of the worlds, exploiting the native mechanisms of Prolog for efficiency.

Recently, lifted inference approaches have appeared that perform inference without first grounding the model. In this way, groups of indistinguishable individuals are treated as a whole rather than individually, and exploiting the symmetries of the model can significantly speed up inference. For example, consider the following program:

popular(X) :- friends(X, Y), famous(Y).
0.1::famous(Y).

In this case, P(popular(john)) = 1 − (1 − 0.1)^m, where m is the number of friends of john: the identities of the friends are irrelevant, only their number matters.
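The lifted closed form can be checked against naive grounding; the program sketched in the comment and its 0.1 parameter are assumptions used for illustration:

```python
from itertools import product

# Hypothetical instance of a lifted-inference example (program and the
# 0.1 parameter are assumed for illustration):
#   popular(X) :- friends(X, Y), famous(Y).
#   0.1::famous(Y).
# With m friends of john, lifted inference gives
# P(popular(john)) = 1 - (1 - 0.1)^m, without grounding over individuals.
p, m = 0.1, 4  # m = 4 friends, illustrative

lifted = 1 - (1 - p) ** m

# Naive grounded check: enumerate all 2^m assignments to famous(f_i).
grounded = 0.0
for world in product([True, False], repeat=m):
    weight = 1.0
    for famous in world:
        weight *= p if famous else 1 - p
    if any(world):  # popular(john) holds if at least one friend is famous
        grounded += weight

print(abs(lifted - grounded) < 1e-9)  # True
```

The grounded computation costs 2^m world evaluations, while the lifted one is a single exponentiation: this is the speed-up that symmetry exploitation buys.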

Bellodi et al. (2014) adapted lifted variable elimination to PLP by translating programs into parfactors.

Van den Broeck et al. (2011) proposed lifted inference by first-order knowledge compilation, reducing the computation of query probabilities to weighted first-order model counting.

The problem that PILP aims at solving can be expressed as:

Given

background knowledge as a probabilistic logic program

a set of positive and negative examples E^+ and E^-

a language bias

Find

a probabilistic logic program such that the probability of the positive examples is maximized and the probability of the negative examples is minimized

This problem has two variants: parameter learning and structure learning. In the first, we are given the structure (the rules) of the program and we must learn only the parameters; in the second, we must learn both the rules and the parameters.

Parameter learning for languages following the distribution semantics has been performed by using the Expectation Maximization (EM) algorithm or by gradient descent.

The EM algorithm is used to estimate the parameters of models containing random variables that are not observed in the data. This is the case of PLP under the distribution semantics: the use of combining rules implies the presence of unobserved variables. The EM algorithm repeatedly alternates an Expectation and a Maximization step. In the Expectation step, the distribution of the hidden variables is computed according to the current values of the parameters, while in the Maximization step, the new values of the parameters are computed from those expectations. Examples of approaches that use EM are PRISM (Sato and Kameya, 2001), LFI-ProbLog (Gutmann et al., 2011), and EMBLEM (Bellodi and Riguzzi, 2013).
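As a toy illustration of the E/M cycle (not the cited systems' implementation; the program, data, and values are all assumptions), consider a program with one hidden probabilistic fact a, a fixed fact 0.2::b, and rules q :- a. and q :- b., where only q is observed:

```python
# Toy EM sketch for a single ProbLog parameter (illustrative only).
# Assumed program:  q :- a.   q :- b.   pa::a.   0.2::b.
# Only q is observed; a and b are hidden, and we learn pa by EM.
pb = 0.2
data = [True] * 70 + [False] * 30  # q observed true in 70 of 100 examples

pa = 0.5  # initial guess for the parameter of a
for _ in range(200):
    # E-step: expected truth value of the hidden fact a in each example.
    pq = 1 - (1 - pa) * (1 - pb)  # current P(q = true)
    expected = 0.0
    for q in data:
        # E[a | q=true] = pa / P(q); q=false forces a=false, so E = 0.
        expected += pa / pq if q else 0.0
    # M-step: the new parameter is the normalized expected count.
    pa = expected / len(data)

pq = 1 - (1 - pa) * (1 - pb)
print(round(pa, 3), round(pq, 3))  # 0.625 0.7
```

At the fixed point, the learned model assigns q exactly its empirical frequency (0.7), which is the maximum-likelihood solution for this one-parameter model.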

Gradient descent methods compute the gradient of the target function and iteratively modify the parameters by moving in the direction of the gradient. An example of these methods is LeProbLog (Gutmann et al., 2008), which minimizes the squared error between the computed and the desired probabilities of the queries, using BDDs to compute the gradient.
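A least-squares flavor in this spirit can be sketched on the same toy model used above for EM (the model, names, and values are assumptions, not LeProbLog's code): fit the parameter so that the predicted query probability matches a target.

```python
# Least-squares gradient-descent sketch (illustrative assumptions).
# Toy model:  q :- a.   q :- b.   pa::a (unknown)   0.2::b (fixed),
# so that P(q) = 1 - (1 - pa)(1 - pb); fit pa to a target probability.
pb, target = 0.2, 0.7
pa, lr = 0.5, 0.5  # initial parameter and learning rate

for _ in range(500):
    pq = 1 - (1 - pa) * (1 - pb)
    # Gradient of 0.5 * (pq - target)^2 w.r.t. pa, with dpq/dpa = (1 - pb).
    pa -= lr * (pq - target) * (1 - pb)

print(round(pa, 3))  # 0.625, i.e., P(q) = 0.7
```

For this convex one-parameter objective, gradient descent and EM reach the same solution; in real systems the gradient is computed over the compiled representation (e.g., BDDs) rather than from a closed form.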

One of the first structure learning works is that of Koller and Pfeffer (1997).

De Raedt et al. (2008) proposed theory compression for ProbLog: given a program and sets of positive and negative examples, remove as many clauses as possible while keeping the likelihood of the examples high.

SEM-CP-logic (Meert et al., 2008) learns ground CP-logic programs by leveraging Bayesian network learning techniques, in particular structural EM.

ProbFOIL (De Raedt and Thon, 2011) combines the rule learner FOIL with ProbLog in order to learn probabilistic rules from probabilistic examples.

SLIPCASE (Bellodi and Riguzzi, 2012) performs a beam search in the space of probabilistic logic programs, refining theories with standard ILP operations and scoring them by the log-likelihood of the data.
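A generic beam search over theories, of the kind such structure learners instantiate, can be sketched as follows; the refinement operator, scoring function, and toy "theories" (tuples of literal identifiers) are illustrative stand-ins for clause refinement and log-likelihood scoring:

```python
import heapq

# Skeleton of a beam search over theories (all names illustrative):
# repeatedly refine the best theories, score each refinement, and keep
# the top `beam_size` candidates; return the best theory found overall.
def beam_search(initial, refine, score, beam_size=3, steps=4):
    beam = [(score(initial), initial)]
    best = beam[0]
    for _ in range(steps):
        candidates = []
        for _, theory in beam:
            for t in refine(theory):
                candidates.append((score(t), t))
        if not candidates:
            break
        beam = heapq.nlargest(beam_size, candidates)
        if beam[0][0] > best[0]:
            best = beam[0]
    return best

# Toy instance: refining a theory appends an unused literal id; the score
# (a stand-in for log-likelihood) rewards literals 1 and 2 and penalizes size.
refine = lambda t: [t + (i,) for i in range(4) if i not in t]
score = lambda t: sum(1 for i in t if i in (1, 2)) - 0.1 * len(t)
best_score, best_theory = beam_search((), refine, score)
print(sorted(best_theory))  # [1, 2]
```

In a real system, scoring each candidate requires a full parameter-learning run (e.g., by EM) followed by inference, which is why efficient inference is so important for structure learning.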

PLP can be framed within the broader area of Probabilistic Programming (PP), which is receiving increasing attention, especially in the field of Machine Learning, as testified by the ongoing DARPA project "Probabilistic Programming for Advancing Machine Learning." PLP differs in its use of Logic Programming, which provides a declarative reading of programs. The array of algorithms for performing inference with PLP is constantly expanding, quickly approaching the variety of algorithms available for other PP languages.

PILP is also related to the field of Statistical Relational Learning (SRL) (Getoor and Taskar, 2007), which likewise combines learning, probability, and relational representations; SRL formalisms such as Markov Logic, however, are typically based on graphical models rather than on logic programming.

There are many avenues for future research. Improving the efficiency of inference is very important, since it is a basic component of learning systems. The use of new languages for knowledge compilation, such as Sentential Decision Diagrams (Darwiche, 2011), appears promising in this respect, as does the further development of lifted inference.

Regarding learning systems, parameter learning should be combined with lifted inference to speed up the process. Other forms of parameter optimization can be applied, drawing inspiration from algorithms developed for related formalisms such as Markov Logic. For structure learning, other search strategies can be investigated, such as local and randomized search, and methods that learn the structure and the parameters at the same time can be considered, as Natarajan et al. (2012) do for Relational Dependency Networks with functional gradient boosting.

An earlier version of this paper appeared in the ALP Newsletter.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.