
Edited by: Sriraam Natarajan, Indiana University System, United States

Reviewed by: Elena Bellodi, University of Ferrara, Italy; Nicola Di Mauro, Università degli studi di Bari Aldo Moro, Italy

Specialty section: This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The aim of statistical relational learning is to learn statistical models from relational or graph-structured data. Three main statistical relational learning paradigms include weighted rule learning, random walks on graphs, and tensor factorization. These paradigms have been mostly developed and studied in isolation for many years, with few works attempting to understand the relationships among them or to combine them. In this article, we study the relationship between the path ranking algorithm (PRA), one of the most well-known relational learning methods in the graph random walk paradigm, and relational logistic regression (RLR), one of the recent developments in weighted rule learning. We provide a simple way to normalize relations and prove that relational logistic regression using normalized relations generalizes the path ranking algorithm. This result provides a better understanding of relational learning, especially for the weighted rule learning and graph random walk paradigms. It opens up the possibility of using the more flexible RLR rules within PRA models, and even of generalizing both by including normalized and unnormalized relations in the same model.

Traditional machine learning algorithms learn mappings from a feature vector of categorical and numerical features to an output prediction of some form. Statistical relational learning (Getoor and Taskar, 2007) instead learns from data about entities and the relationships among them.

During the past two decades, three paradigms of statistical relational models have emerged. The first paradigm is weighted rule learning, where first-order rules are learned from data and each rule is assigned a weight indicating its score. The main difference among these models is in the types of rules they allow and their interpretation of the weights. The models in this paradigm include ProbLog (De Raedt et al., 2007), Markov logic networks (Richardson and Domingos, 2006), and relational logistic regression (Kazemi et al., 2014).

The second paradigm is random walk on graphs, where several random walks are performed on a graph, each starting at a random node and probabilistically transitioning to neighboring nodes. The probability of each node being the answer to a query is proportional to the probability of the random walks ending up at that node. The main difference among these models is in the way they walk on the graph and how they interpret the results obtained from the walks. Examples of relational learning algorithms based on random walks on graphs include PageRank (Page et al., 1999) and the path ranking algorithm (Lao and Cohen, 2010).

The third paradigm is tensor factorization, where an embedding is learned for each object and each relation. The probability of two objects participating in a relation is a simple function of the objects' and the relation's embeddings (e.g., the sum of the element-wise product of the three embeddings). The main difference among these models is in the type of embeddings and the function they use. Examples of models in this paradigm include the factorization model of Nickel et al. (2011), which has been applied to knowledge graphs such as YAGO.
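To make the scoring function mentioned above concrete, here is a minimal Python sketch of a score computed as the sum of the element-wise product of the three embeddings; the embedding values and function names are illustrative, not taken from any specific system:

```python
# Sketch of a tensor-factorization score: the score of a triple is the sum of
# the element-wise product of the subject, relation, and object embeddings.
# All names and numbers are illustrative.
import math

def score(subj_emb, rel_emb, obj_emb):
    """Sum of the element-wise product of the three embeddings."""
    return sum(s * r * o for s, r, o in zip(subj_emb, rel_emb, obj_emb))

def probability(subj_emb, rel_emb, obj_emb):
    """Turn the score into a probability with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-score(subj_emb, rel_emb, obj_emb)))

s, r, o = [1.0, 0.5], [0.2, 2.0], [1.0, 1.0]
print(score(s, r, o))  # 1.0*0.2*1.0 + 0.5*2.0*1.0 = 1.2
```

Different factorization models vary exactly in this function: some use bilinear forms, others translations or more complex compositions of the embeddings.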

The models in each paradigm have their own advantages and disadvantages; Kimmig et al. (2015) survey several of these models and discuss their trade-offs.

With several relational paradigms and models developed during the past decade and more, understanding the relationships among them and pruning the ones that either do not work well or are subsets of other models is crucial. In this article, we study the relationship between two relational learning paradigms: graph random walk and weighted rule learning. In particular, we study the relationship between the path ranking algorithm (PRA) (Lao and Cohen, 2010) and relational logistic regression (RLR) (Kazemi et al., 2014).

The relationship between weighted rules and graph random walks has not been established before; for instance, the review of Nickel et al. (2016) treats the two as separate classes of models.

Our result is beneficial for both graph random walk and weighted rule learning paradigms, as well as for researchers working on theory and applications of statistical relational learning. Below is a list of potential benefits that our results provide:

It provides a clearer intuition and understanding of the two relational learning paradigms, thus facilitating further improvements of both.

It opens up the possibility of using the more flexible RLR rules within PRA models.

It opens up the possibility of generalizing both PRA and RLR models by using normalized and unnormalized relations in the same model.

It sheds light on the shortcomings of graph random walk algorithms and points out potential ways to improve them.

One of the claimed advantages of models based on weighted rule learning compared to other relational models is that they can be easily explained to a broad range of people (Nickel et al., 2016). Our result shows that this advantage carries over to PRA models, as they can be interpreted as weighted rule models.

It identifies a subclass of weighted rules that can be evaluated efficiently and that have high modeling power, as they have been successfully applied in several applications. The evaluation of these weighted rules can be further improved using sampling techniques developed within the graph random walk community (e.g., see Fogaras et al. (2005)).

It facilitates leveraging, within each paradigm, insights and techniques developed for the other (e.g., weighted rule models that leverage deep learning techniques (Šourek et al., 2015)).

For those interested in the applications of relational learning, our result facilitates selecting the paradigm or the relational model to be used in their application.

In this section, we first define some basic terminology. Then we introduce a running example, which will be used throughout the article, and describe relational logistic regression and the path ranking algorithm. While semantically identical, our descriptions of these two models may differ slightly from the descriptions in the original articles, as we aim to describe the two algorithms in a way that simplifies our proofs.

Throughout the article, we assume True is represented by 1 and False is represented by 0.

A population is a finite set of objects. A logical variable (logvar) x is typed with a population Δ_x. An atom is of the form R(x_1, …, x_k), where R is a predicate and each x_i is a logvar or an object. A grounding of an atom R(x_1, …, x_k) is obtained by replacing each logvar x_i with an object in Δ_{x_i}. Given a set 𝒜 of atoms, we denote by 𝒢(𝒜) the set of all possible groundings for the atoms in 𝒜. A world is an assignment of a truth value to every grounding in 𝒢(𝒜).

A literal is an atom or its negation. A formula is a literal, a disjunction F_1 ∨ F_2 of formulae, or a conjunction F_1 ∧ F_2 of formulae. Our formulae correspond to open formulae in negation normal form in logic. An instance of a formula is obtained by replacing each of its logvars x_1, …, x_k with objects from their corresponding populations.

A binary predicate R with logvars x and y can be viewed as a function from Δ_x to 2^{Δy}: each object in Δ_x is mapped to the subset of Δ_y it is related to. We define R^{−1} as the inverse of R, a function from Δ_y to 2^{Δx}, such that R^{−1}(o_2, o_1) holds iff R(o_1, o_2) holds. A path relation is of the form x_0 →R_1 x_1 →R_2 … →R_l x_l, where R_1, R_2, …, R_l are (possibly inverted) binary predicates and x_0, …, x_l are logvars such that the domain of each R_i is Δ_{x_{i−1}} and its range is Δ_{x_i}. We define the source of the path relation as x_0 and its target as x_l. Applying a substitution {⟨x_1, …, x_k⟩ → ⟨o_1, …, o_k⟩} to a formula replaces each logvar x_i with the object o_i.

As a running example, we use the reference recommendation problem. Cited(p_1, p_2) shows whether an existing paper p_1 has cited another existing paper p_2. PrevYear(y_1, y_2) indicates that y_2 is the year immediately before y_1. The reference recommendation problem can be viewed as follows: given a query paper q, predict for each existing paper t the probability that q will cite t.

Relational logistic regression (RLR) (Kazemi et al., 2014) is the directed analog of Markov logic networks: it defines the conditional probability of a target atom as the sigmoid of a weighted sum of counts of formulae.

Let Q(x_1, …, x_k) be a target atom whose probability depends on a set 𝒜 of atoms. An RLR model for Q is a set of weighted formulae (WFs) ⟨w_i, F_i⟩, where each w_i is a weight and each F_i is a formula over the atoms in 𝒜. Given a world and an assignment of objects to x_1, …, x_k, the conditional probability of the target atom is the sigmoid of the sum, over all WFs, of w_i times the number of instances of F_i that evaluate to True in the world.

Following Kazemi et al. (2014), we restrict the formulae of WFs to conjunctions of atoms.

Example 1. Consider an RLR model for the WillCite relation with four weighted formulae: a bias WF_0 = ⟨w_0, True⟩ and three further WFs, WF_1, WF_2, and WF_3, described below.

WF_0 is a bias. WF_1 considers existing papers that have been published a year before the query paper. A positive weight for this WF indicates that papers published a year before the query paper are more likely to be cited. WF_2 considers existing papers cited by the other papers published in the same year as the query paper. A positive weight for this WF indicates that as the number of times a paper has been cited by the other papers published in the same year as the query paper grows, the chance of the query paper citing that paper increases. WF_3 considers existing papers that have been cited by other papers that have themselves been cited by other papers. Note that the score of the last WF depends only on the paper being cited, not on the paper citing.

Consider the citations among existing papers in Figure 1 and suppose we want to compute the probability of WillCite(q, p_2) according to the WFs above. Applying the substitution that maps the query and target logvars to q and p_2 to the above four WFs gives the following four WFs, respectively:

Then we evaluate each WF. The first one evaluates to w_0. The second evaluates to 0, as p_2 has also been published in 2017. The third WF evaluates to w_2 ∗ 2, as there are 2 papers published in the same year as the query paper that cite p_2. The last WF evaluates to w_3 ∗ 4, as p_5 and p_6 (which cite p_2) are each cited by two other papers. Therefore, the conditional probability of WillCite(q, p_2) is as follows:
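The computation in this example can be sketched in a few lines of Python; the weights below are illustrative placeholders, while the counts (1 for the bias, then 0, 2, and 4) follow the example:

```python
# Sketch of how RLR combines weighted formulae (WFs): the conditional
# probability is the sigmoid of the sum over WFs of weight * count, where
# count is the number of True instances of the formula.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rlr_probability(weighted_counts):
    """weighted_counts: list of (weight, count) pairs, one per WF."""
    return sigmoid(sum(w * c for w, c in weighted_counts))

# Illustrative weights; the counts 1 (bias), 0, 2, and 4 follow the example.
w0, w1, w2, w3 = -2.0, 1.0, 0.5, 0.3
p = rlr_probability([(w0, 1), (w1, 0), (w2, 2), (w3, 4)])
print(round(p, 3))  # sigmoid(-2.0 + 0.0 + 1.0 + 1.2) = sigmoid(0.2) ≈ 0.55
```

With different (learned) weights, the same counts would of course yield a different probability; only the structure of the computation is fixed by the model.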

A PRA model consists of a set of weighted path relations (WPRs) ⟨w_i, 𝒫ℛ_i⟩. To answer a query, each WPR is evaluated as w_i times the probability that a random walk on 𝒫ℛ_i ends at the target object, and the conditional probability of the target atom is the sigmoid of the sum of these values.

In PRA, each path relation 𝒫ℛ = x_0 →R_1 x_1 →R_2 … →R_l x_l is interpreted as a random walk. First, a uniform distribution P_0 over the objects in Δ_{x_0} is considered, corresponding to the probability of landing at each of these objects if an object is selected randomly. For instance, if there are n objects in Δ_{x_0}, P_0 for all objects is 1/n. Then a distribution P_1 over the objects in Δ_{x_1} is calculated by marginalizing over P_0 and following a random step on R_1. For instance, for an object o_1 ∈ Δ_{x_1}, assume R_1(X_0, o_1) holds only for two objects o_0 and o′_0 in Δ_{x_0}. Also assume o_0 and o′_0 have the R_1 relation with m and m′ objects, respectively. Then the probability of landing at o_1 is P_0(o_0)/m + P_0(o′_0)/m′. The distributions P_2, …, P_l are computed similarly, and the final distribution P_l gives the probability of each object in Δ_{x_l} being the answer to the query.

Algorithm 1 gives the pseudocode for computing the landing distribution of a path relation.

Algorithm 1 is recursive. When the path relation is empty (l = 0), it returns Uniform(Δ_{x_0}), which indicates a uniform probability over the objects in Δ_{x_0}; this is the termination criterion of the recursion. When 𝒫ℛ = x_0 →R_1 x_1 → … →R_l x_l is not empty (l > 0), the algorithm first recursively computes the landing distribution P_{l−1} of the subpath 𝒫ℛ′ = x_0 →R_1 x_1 → … →R_{l−1} x_{l−1}. The probability of landing at any object o_l ∈ Δ_{x_l} is then the sum, over the objects o_{l−1} for which R_l(o_{l−1}, o_l) holds, of P_{l−1}(o_{l−1})/c_{R_l}, where c_{R_l} is a normalization constant indicating the number of possible transitions from o_{l−1} on R_l. P_l stores the probability of landing at each object in Δ_{x_l}.

Algorithm 1: RandomWalk(𝒫ℛ = x_0 →R_1 x_1 → … →R_l x_l)

Output: the probability of landing at each object in Δ_{x_l} when starting randomly at an object in Δ_{x_0} and walking on 𝒫ℛ.

1: if l = 0 then
2:   return Uniform(Δ_{x_0})
3: 𝒫ℛ′ = x_0 →R_1 x_1 → … →R_{l−1} x_{l−1}
4: P_{l−1} = RandomWalk(𝒫ℛ′)
5: for each o_l ∈ Δ_{x_l} do
6:   P_l(o_l) = 0
7: for each o_{l−1} ∈ Δ_{x_{l−1}} do
8:   c_{R_l} = |{o ∈ Δ_{x_l} : R_l(o_{l−1}, o)}|
9:   for each o_l ∈ Δ_{x_l} do
10:    if R_l(o_{l−1}, o_l) then
11:      P_l(o_l) = P_l(o_l) + P_{l−1}(o_{l−1}) / c_{R_l}
12: return P_l
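As a companion to the pseudocode, the following is a minimal Python sketch of this random walk computation, under the assumption (ours, for illustration) that each relation is given as a set of (source, target) pairs and each population as a list of objects:

```python
def random_walk(path, domains):
    """path: list of relations, each a set of (source, target) pairs;
    domains: list of object lists, one per logvar x_0, ..., x_l.
    Returns the landing distribution {object: probability} over domains[-1]."""
    if not path:  # l = 0: uniform distribution over the source population
        n = len(domains[0])
        return {o: 1.0 / n for o in domains[0]}
    prev = random_walk(path[:-1], domains[:-1])  # P_{l-1} on domains[-2]
    dist = {o: 0.0 for o in domains[-1]}
    for o_prev, p in prev.items():
        succ = [o for (a, o) in path[-1] if a == o_prev]  # possible transitions
        for o in succ:
            dist[o] += p / len(succ)  # c_{R_l} = len(succ)
    return dist

# Two source objects a and b, both linked only to c: the walk lands on c.
R = {("a", "c"), ("b", "c")}
print(random_walk([R], [["a", "b"], ["c"]]))  # {'c': 1.0}
```

Note that when an object has no outgoing transitions, its probability mass is simply dropped in this sketch; PRA variants differ in how they handle such dead-end walkers.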

Example 2. Consider a PRA model for the WillCite relation with four weighted path relations (WPRs): a bias WPR_0 and three WPRs, WPR_1, WPR_2, and WPR_3, described below.

WPR_0 is a bias, WPR_1 considers the papers published a year before the query paper, WPR_2 considers papers cited by other papers published in the same year as the query paper, and WPR_3 mimics the PageRank algorithm for finding important papers in terms of citations (cf. Lao and Cohen, 2010). Consider computing the probability of WillCite(q, p_2) according to the PRA model above. Applying the substitution that maps the query and target logvars to q and p_2 to the above WPRs gives the following WPRs, respectively:

WPR_0 evaluates to w_0. WPR_1 evaluates to 0. WPR_2 evaluates to w_2 times the probability of walking from q to the papers published in the same year (p_5 or p_6) and then from p_5 to p_2 or from p_6 to p_2 according to the Cited relation. WPR_3 evaluates to w_3 times the probability of walks such as the one from p_3 to p_5 and then to p_2, and so forth, amounting to w_3 ∗ 0.083. Therefore, the conditional probability of WillCite(q, p_2) is as follows:

To prove that RLR with normalized relations generalizes PRA, we first define relation chains and describe some of their properties.

Definition 1. A conjunction R_1(x_0, x_1) ∗ … ∗ R_m(x_{m−1}, x_m) of binary atoms is a relations chain if, for every pair of consecutive atoms R_i and R_{i+1}, the second logvar of R_i is identical to the first logvar of R_{i+1}, and the logvars x_0, …, x_m are pairwise distinct (x_i ≠ x_j for all i ≠ j).

Example 3. R_1(x, y) ∗ R_2(y, z) ∗ R_3(z, w) is a relations chain. R_1(x, y) ∗ R_2(x, z) is not a relations chain, as the second logvar of R_1 is not the first logvar of R_2.

Definition 2. A formula corresponds to a relations chain if there exists an ordering of its literals that is a relations chain.

Example 4. R_1(x_1, x_2) ∗ R_2(x_3, x_1) corresponds to a relations chain, as the ordering R_2(x_3, x_1), R_1(x_1, x_2) is a relations chain.

It follows from the RLR definition that re-ordering the literals in each of its WFs does not change the distribution. For any WF whose formula corresponds to a relations chain, we assume hereafter that its literals have been re-ordered to match the order of the corresponding relations chain.

Definition 3. A relations-chain RLR (RC-RLR) model for a target atom Q(s, t) is an RLR model in which:

the formulae of WFs correspond to relations chains,

for each WF, the first logvar of the first atom is s, and

for each WF, the second logvar of the last atom is t, with s and t appearing nowhere else in the formula.

For RLR models, to evaluate a formula, one may run nested loops over the logvars of the formula that do not appear in the target atom, or conjoin all literals one by one and then count. WFs of RC-RLR, however, can be evaluated in a special way: starting from the end (or the beginning) of the chain, the effect of each literal can be calculated and the literal then removed from the formula. Algorithm 2 gives the pseudocode for this evaluation.

Algorithm 2: Eval(R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_l(x_{l−1},x_l))

Output: for each object O ∈ Δ_{x_l}, the number of True instances of the input formula in which x_l = O.

1: if l = 0 then
2:   return Ones(|Δ_{x_0}|)
3: F′ = R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_{l−1}(x_{l−2},x_{l−1})
4: eval_{l−1} = Eval(F′)
5: for each O ∈ Δ_{x_l} do
6:   eval_l[O] = 0
7: for each O′ ∈ Δ_{x_{l−1}} do
8:   for each O ∈ Δ_{x_l} do
9:     if R_l(O′, O) then
10:      eval_l[O] = eval_l[O] + eval_{l−1}[O′]
11: return eval_l

When the input formula is empty (l = 0), every object O ∈ Δ_{x_0} trivially has exactly one (empty) True instance; therefore, in this case, the algorithm returns a vector of ones of size |Δ_{x_0}|. Otherwise, the algorithm first evaluates R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_{l−1}(x_{l−2},x_{l−1}) using a recursive call, obtaining a vector eval_{l−1} such that, for an object O ∈ Δ_{x_{l−1}}, eval_{l−1}[O] is the number of True instances of R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_{l−1}(x_{l−2},x_{l−1}) in which x_{l−1} = O. Then, for each object O ∈ Δ_{x_l}, it sums eval_{l−1}[O′] over the objects O′ ∈ Δ_{x_{l−1}} such that R_l(O′, O) holds, storing the result in eval_l[O].
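The evaluation just described can be sketched in Python under the same representation assumptions as before (relations as sets of pairs, populations as lists; both are our illustrative choices):

```python
def eval_chain(chain, domains):
    """chain: list of relations R_1..R_l as sets of pairs; domains: object
    lists for x_0..x_l. Returns, for each O in domains[-1], the number of
    True instances of R_1(x_0,x_1) * ... * R_l(x_{l-1},x_l) with x_l = O."""
    if not chain:  # l = 0: a vector of ones over the first population
        return {o: 1.0 for o in domains[0]}
    prev = eval_chain(chain[:-1], domains[:-1])  # eval_{l-1}
    counts = {o: 0.0 for o in domains[-1]}
    for (o_prev, o) in chain[-1]:
        counts[o] += prev[o_prev]  # every chain reaching o_prev extends to o
    return counts

# R1 = a->b, a->c and R2 = b->d, c->d: two True instances end at d.
R1 = {("a", "b"), ("a", "c")}
R2 = {("b", "d"), ("c", "d")}
print(eval_chain([R1, R2], [["a"], ["b", "c"], ["d"]]))  # {'d': 2.0}
```

This is exactly a sequence of sparse matrix-vector products, which is what makes relations chains cheap to evaluate compared to arbitrary conjunctions.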

Proposition 1. For a formula corresponding to a relations chain, Algorithm 2 computes, for each object O ∈ Δ_{x_l}, the number of True instances of the formula in which x_l = O, in time linear in the number of True groundings of the relations in the chain.

Proof. Consider the formula R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_l(x_{l−1},x_l). Starting from the end, for each X_{l−1} ∈ Δ_{x_{l−1}} we can evaluate eval_{l−1}(X_{l−1}) = Σ_{X_l ∈ Δ_{x_l}} R_l(X_{l−1}, X_l) ∗ eval_l(X_l) separately and replace R_l(x_{l−1}, x_l) ∗ eval_l(x_l) in the formula with eval_{l−1}(x_{l−1}), thus getting R_1(x_0,x_1) ∗ R_2(x_1,x_2) ∗ … ∗ R_{l−1}(x_{l−2},x_{l−1}) ∗ eval_{l−1}(x_{l−1}). The same procedure can compute eval_{l−2}, …, eval_0, at which point the formula has been fully evaluated, with each True grounding of each relation visited once. □

Proposition 2. Every path relation can be converted into a formula that corresponds to a relations chain.

Proof. Let 𝒫ℛ = x_0 →R_1 x_1 → … →R_l x_l be a path relation. For each step on R_i, we create a relation atom R_i(x_{i−1}, x_i), resulting in the relations R_1(x_0, x_1), R_2(x_1, x_2), …, R_l(x_{l−1}, x_l). The conjunction of these atoms is, by construction, a relations chain, as the second logvar of each R_i is the first logvar of R_{i+1}. □

Example 5. A step on an inverted predicate R^{−1} from x_i to x_{i+1} is converted into the atom R^{−1}(x_i, x_{i+1}), which holds iff R(x_{i+1}, x_i) holds.

Having a binary predicate R with logvars x and y, we define the row-wise count (RWC) normalization of R as a continuous relation over Δ_x and Δ_y, such that each True grounding R(o, o′) is assigned the value 1/|{o″ : R(o, o″)}| (i.e., one divided by the number of True groundings of R sharing the first object o) and every other grounding is assigned 0.
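A minimal Python sketch of RWC normalization, assuming (for illustration) that a relation is represented as a set of (first object, second object) pairs:

```python
from collections import Counter

def rwc_normalize(rel):
    """Return {(o, o2): value}: each True grounding gets 1 divided by the
    number of True groundings that share its first object."""
    row_counts = Counter(o for (o, _) in rel)
    return {(o, o2): 1.0 / row_counts[o] for (o, o2) in rel}

# p1 cites two papers, p4 cites one: p1's groundings each get 0.5.
R = {("p1", "p2"), ("p1", "p3"), ("p4", "p2")}
norm = rwc_normalize(R)
print(norm[("p1", "p2")], norm[("p4", "p2")])  # 0.5 1.0
```

By construction, the normalized values sum to 1 per first object, which is what turns a relation into a valid transition distribution for a random step.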

Theorem 1. For every PRA model, there exists an RC-RLR model over RWC-normalized relations that defines the same conditional distribution.

Proof. Let a PRA model consist of the weighted path relations ⟨w_0, 𝒫ℛ_0⟩, …, ⟨w_k, 𝒫ℛ_k⟩. Using Proposition 2, each 𝒫ℛ_i can be converted into a formula F_i, and this formula is by construction guaranteed to correspond to a relations chain. We construct an RC-RLR model whose WFs are ⟨w_0, F_0⟩, …, ⟨w_k, F_k⟩, where the relations in each F_i are RWC normalized. Evaluating F_i by Algorithm 2 over the RWC-normalized relations divides the contribution of each transition on R_l by c_{R_l}, exactly as Algorithm 1 does. The only remaining difference is the starting point: Algorithm 1 starts from a uniform distribution over Δ_{x_0}, while Algorithm 2 starts from a vector of ones of size |Δ_{x_0}|. Dividing each weight w_i by |Δ_{x_0}| removes this difference, making the two models define the same conditional distribution. □
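The equivalence argued in the proof can be checked numerically on a small illustrative relation (all data below is made up): a one-step random walk's landing distribution coincides with the RWC-normalized chain evaluation divided by the size of the source population:

```python
from collections import Counter

R = {("a", "c"), ("a", "d"), ("b", "c")}
src, tgt = ["a", "b"], ["c", "d"]

# PRA side: uniform start over src, one random step on R (Algorithm 1).
walk = {o: 0.0 for o in tgt}
for s in src:
    succ = [o for (x, o) in R if x == s]
    for o in succ:
        walk[o] += (1.0 / len(src)) / len(succ)

# RLR side: chain evaluation over the RWC-normalized relation (Algorithm 2),
# divided by |domain of x_0| as in the proof of Theorem 1.
rows = Counter(x for (x, _) in R)
rlr = {o: sum(1.0 / rows[x] for (x, o2) in R if o2 == o) / len(src) for o in tgt}

print(walk == rlr)  # True: both give {'c': 0.75, 'd': 0.25}
```

Longer chains behave the same way, since each extra step multiplies by the same normalized values on both sides.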

Example 6. Converting the WPRs of Example 2 as in the proof of Theorem 1 gives an RC-RLR model with four WFs over RWC-normalized relations.

Consider computing the probability of WillCite(q, p_2) according to an RC-RLR model with the above WFs, where all existing papers and citations are as in Figure 1. The first WF evaluates to w_0 and the second WF evaluates to 0, as before. The third WF evaluates to the same value as WPR_2, because the Cited relation and its inverse have been RWC normalized, so the instances going through p_5 and p_6 contribute their normalized transition values as in Figure 1. The last WF sums the normalized contributions of the chains Cited(p_3, p_5) ∗ Cited(p_5, p_2), Cited(p_4, p_5) ∗ Cited(p_5, p_2), Cited(p_4, p_6) ∗ Cited(p_6, p_2), and Cited(p_5, p_6) ∗ Cited(p_6, p_2). As can be seen, after creating the equivalent RC-RLR model and normalizing the relations using RWC normalization, all WFs evaluate to the same value as their corresponding WPRs, except the last WF, whose value differs only because of the starting vector of ones. By setting w′_3 = w_3 ∗ 1/|Δ_{x_0}|, where Δ_{x_0} is the population from which WPR_3's random walk starts, the probability of WillCite(q, p_2) according to the RC-RLR model above becomes the same as for the PRA model in Example 2.

The restrictions imposed on the formulae by path relations in PRA reduce the number of possible formulae to be considered in a model compared to RLR models. However, there may still be many possible path relations, and considering all possible path relations for a PRA model may not be practical.

Lao and Cohen (2010) address this issue by restricting the path relations to those with length at most a fixed maximum.

Lao et al. (2011) propose data-driven path finding, which only considers path relations that are supported by a minimum fraction of training queries and that reach a correct answer for at least a minimum number of them.

Both restrictions in data-driven path finding can be easily verified for RC-RLR formulae, and the set of possible formulae can be restricted accordingly. Furthermore, during parameter learning, a Laplacian prior can be imposed on the weights of the weighted formulae. RC-RLR models learned in this way correspond to PRA models learned using data-driven path finding. Therefore, data-driven path finding can also be considered a structure learning algorithm for RC-RLR. By the same reasoning, several other random walk strategies can be considered structure learning algorithms for RC-RLR.

An advantage of PRA models over RLR models is their efficiency: there is a smaller search space for WFs, and all WFs can be evaluated efficiently. Such efficiency makes PRA scale to larger domains where models based on the weighted rule learning such as RLR often have scalability issues. It also allows PRA models to scale to and capture features that require longer chains of relations. However, the efficiency comes at the cost of losing modeling power. In the following subsections, we discuss such costs.

Since PRA models restrict themselves to relations chains of a certain type, they lose the chance to leverage many other WFs. As an example, to predict WillCite(s_1, s_2) for the reference recommendation task, suppose we would like to recommend papers published a year before the target paper that have been cited by the papers published in the same year as the target paper. Such a feature requires a formula in which s_2 (the second logvar of the target atom) appears twice, thus violating the last condition in Definition 3. While restricting the formulae to the ones that correspond to relations chains may speed up learning and reasoning, it reduces the space of features that can be included in a relational learning model, thus potentially decreasing accuracy.

One issue with PRA models is the difficulty of including unary atoms in such models. As an example, suppose in Example 2 we would like to treat conference papers and journal papers differently. For an RLR model, this can be easily done by including a Conference(p) atom in the formulae. In PRA models, unary information can only be simulated through binary relations, e.g., a step on a type relation followed by its inverse (Lao et al., 2011), giving the other papers with the same type as the paper on the left of the arrow. However, this is limiting and does not allow for, e.g., treating conference and journal papers differently.

Atoms with more than two logvars are another issue, as PRA models are restricted to binary atoms. While any relation with more than two arguments can be converted into several binary relations, the random walk strategies used in PRA models (and the probabilities of their random steps) make it unclear how atoms with more than two logvars can be leveraged in PRA models.

Relations need not be Boolean: a continuous relation assigns each pair of objects a degree rather than a truth value. For instance, the value of Similar(s_1, s_2) may be a number between 0 and 1 indicating the degree of similarity between papers s_1 and s_2. To extend PRA models to leverage such continuous atoms, one has to change line 8 in Algorithm 1 so that the probability of each transition on R_l is proportional to its continuous value.
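The modification described above can be sketched as follows; the relation name Similar and all values are hypothetical, chosen only for illustration:

```python
def weighted_step(dist, rel):
    """dist: {object: probability}; rel: {(o, o2): continuous value}.
    Returns the next distribution, with transition probabilities proportional
    to the continuous values instead of 1/(number of successors)."""
    out = {}
    for o, p in dist.items():
        row = {o2: v for (a, o2), v in rel.items() if a == o}
        z = sum(row.values())  # normalizer replacing the count c_{R_l}
        for o2, v in row.items():
            out[o2] = out.get(o2, 0.0) + p * v / z
    return out

# Hypothetical continuous relation: q is 0.9-similar to p1, 0.1-similar to p2.
Similar = {("q", "p1"): 0.9, ("q", "p2"): 0.1}
res = weighted_step({"q": 1.0}, Similar)
print(res)  # p1 receives 0.9 of the probability mass, p2 receives 0.1
```

With all values equal to 1, this reduces to the original uniform random step, so the extension is conservative.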

For many types of continuous atoms, however, it is not straightforward to extend PRA models to leverage them. As an example, suppose we have an atom Temperature(c, t) giving the temperature t of a city c; it is not clear what a random step over such an atom should mean.

Normalizing the relations is often ignored in models based on weighted rule learning. For the most part, this omission may be because several of these models cannot handle continuous atoms. Given that PRA is a special form of weighted rule models such as RLR with RWC normalization, not normalizing the relations may be the reason why, in the experiments of Lao et al. (2011), models based on (unnormalized) weighted rules underperformed PRA.

The type of normalization used in PRA (RWC) may not be the best option in many applications. As an example, suppose for the reference recommendation task we want to find papers similar to the query paper in terms of the words they use, through a walk on a word relation followed by its inverse. RWC normalization gives every word of a paper the same importance, whereas rare, discriminative words should arguably count more than common ones; a different normalization N_2 of the inverse word relation that discounts words appearing in many papers may thus produce better scores. It is straightforward to include the latter score in an RLR model: one only has to multiply the formulae using word information by the N_2-normalized relation instead of the RWC-normalized one. It is not obvious, however, how such a normalization could be incorporated into PRA's random walks.

Evaluating the formulae in models based on weighted rule learning is known to be expensive, especially for relations with lower sparsity and for longer formulae. In practice, approximations are typically used to scale the evaluations. Since formulae in RC-RLR correspond to path relations, these formulae can be approximated efficiently using sampling techniques developed within the graph random walk community, such as fingerprinting (Fogaras et al., 2005).
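A sketch of such a sampling-based approximation on made-up data: instead of exact marginalization, a fingerprinting-style Monte Carlo estimate averages over sampled walks and converges to the exact value:

```python
# Monte Carlo approximation of a one-step landing probability (illustrative
# data; the same idea extends to longer relations chains).
import random

R = {("a", "c"), ("a", "d"), ("b", "c")}
src = ["a", "b"]

def successors(s):
    return sorted(o for (x, o) in R if x == s)

def exact(target):
    # Exact one-step landing probability from a uniform start.
    total = 0.0
    for s in src:
        succ = successors(s)
        total += (1.0 / len(src)) * (succ.count(target) / len(succ))
    return total

def sampled(target, n_walks=20000, seed=0):
    # Monte Carlo estimate: sample a start object, take one random step.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_walks):
        succ = successors(rng.choice(src))
        if rng.choice(succ) == target:
            hits += 1
    return hits / n_walks

print(exact("c"))  # 0.75
```

The sampling error shrinks as 1/sqrt(n_walks), so a few thousand walks already give a close estimate without touching every grounding.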

With the abundance of relational and graph data, statistical relational learning has gained substantial attention. Three main relational learning paradigms have been developed during the past decade and more: weighted rule learning, graph random walk, and tensor factorization. These paradigms have been mostly developed and studied in isolation, with few works aiming at understanding the relationship among them or combining them. In this article, we studied the relationship between two relational learning paradigms: weighted rule learning and graph random walk. In particular, we studied the relationship between relational logistic regression (RLR), one of the recent developments in the weighted rule learning paradigm, and the path ranking algorithm (PRA), one of the most well-known algorithms in the graph random walk paradigm. Our main contribution was to prove that PRA models correspond to a subset of RLR models after row-wise count (RWC) normalization. We discussed the advantages that this proof provides for both paradigms and for the statistical relational AI community in general. Our result sheds light on several issues with both paradigms and possible ways to improve them.

SK did this work under the supervision of DP.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.