
Edited by: Amy Loutfi, Örebro University, Sweden

Reviewed by: Nicos Angelopoulos, Cardiff University, United Kingdom; Leonardo Trujillo, Instituto Tecnológico de Tijuana, Mexico

This article was submitted to Computational Intelligence in Robotics, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

We consider the problem of learning generalized first-order representations of concepts from a small number of examples. We augment an inductive logic programming learner with two novel contributions. First, we define a distance measure between candidate concept representations that improves the efficiency of the search for the target concept and its generalization. Second, we leverage richer human inputs, in the form of advice, to improve the sample efficiency of learning. We prove that the proposed distance measure is semantically valid and use it to derive a PAC bound. Our experiments on diverse learning tasks demonstrate both the effectiveness and the efficiency of our approach.

We study the case of learning from

Our work has two key differences. First, we aim to learn "easily interpretable," "explainable," "decomposable," and "generalizable" concepts as first-order Horn clauses (Horn,

We propose GOCI (Guided One-shot Concept Induction)^{1}

We make the following key contributions:

We derive a new distance-penalized scoring function that computes definitional distances between concepts, henceforth termed "conceptual distance."

We treat human advice as an inductive bias to accelerate learning. Our ILP learner actively solicits richer information from human experts than mere labels.

We present theoretical analyses of GOCI.

We present a PAC analysis of the learning algorithm based on Kolmogorov complexity.

We demonstrate the exponential gains in both sample efficiency and effectiveness of GOCI.

Our approach to

Our problem setting differs from the above in that it requires learning from sparse examples (possibly one). Lake et al. (

Background knowledge in ILP is primarily used as search bias. Although the earliest form of knowledge injection can be found in explanation-based approaches (Shavlik and Towell,

The idea of augmented learning with human guidance/knowledge has also been extensively studied in the context of evolutionary computation. Interactive evolutionary systems (Eiben and Smith,

We are inspired by a teacher (human) and student (machine) setting in which a small number of demonstrations are used to learn generalized concepts (Chick,

The input to G

Consider the following example input to the GOCI framework.

Concept 𝕃 (

We aim to learn the optimally generalized (decomposable) representation of the concept (𝕃 in the context of the aforementioned example) referred to by the one/few instances that were passed to GOCI.


Note that these definitions allow for the reuse of concepts, potentially in a hierarchical fashion. We believe that this is

The generalization must be noted. The last argument of the _{s}, _{s}). However, the number of such dimensional parameters can vary across different concepts. Hence, to maintain generality of the representation format during implementation, we push the dimensional parameters of the learnable concept into the body of the clause.

A specific case of our concept-learning (Horn clause induction) framework is plan induction from sparse demonstrations. This can be achieved by specifying time as the last argument of both the state and action predicates. Following this definition, we can allow for plan induction, as shown in our experiments. Our novel conceptual distance is clearer and more intuitive in the case of plans, as will be seen later.


Decomposability allows an unknown concept to be constructed as a composition of other known concepts, a property that GOCI exploits.


An obvious question that arises here is why _{j}} are implicitly understood and defined as a part of the framework itself. This argument applies to the semantics of the “constraint predicates” (described later) as well.

Finally, before we discuss the details of the learning methodology, let us briefly look into a motivating, and presently relevant, real-world scenario that represents our problem setting.

A motivating real-world scenario for concept induction. The concept learnt by the AI agent is “Divert()”.

The above example is solely meant to motivate the potential impact of our problem setting and the proposed solution. For an explanation of the different components and aspects of GOCI, we refer the reader to the following sections.

ILP systems perform a greedy search through the space of possible theories. This space is typically defined, declaratively, by a set of mode definitions (Muggleton, An inverse substitution θ^{−1} is a mapping from occurrences of ground terms in a clause to variables.
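As a toy illustration of an inverse substitution, lifting ground terms to variables can be sketched in Python. The atom representation (a predicate name paired with an argument tuple) and the function name are our own illustrative choices, not part of the paper's implementation:

```python
def inverse_substitution(ground_atom, term_to_var):
    """Apply an inverse substitution theta^-1: replace occurrences of
    ground terms in an atom with logical variables."""
    predicate, args = ground_atom
    # Ground terms that appear in the mapping are lifted to variables;
    # terms outside the mapping are left untouched.
    return (predicate, tuple(term_to_var.get(t, t) for t in args))

# Lifting on(a, b) with {a -> X, b -> Y} yields on(X, Y).
lifted = inverse_substitution(("on", ("a", "b")), {"a": "X", "b": "Y"})
```

Applying the same mapping across all atoms of a ground clause yields the variablized clause that the ILP search generalizes from.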

The user-provided advice forces the learner to learn a longer theory, hence the

There is only one (or a few) positive training example(s) to learn from, and

Most learners optimize some form of likelihood. For a candidate theory

where T^{*} is the optimal theory, τ is the set of all candidate theories, and π_{X} is the plan corresponding to the original example.
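Based on the surrounding description, the distance-penalized objective plausibly takes a form such as the following (the trade-off weight λ is our notation and may differ from the paper's):

```latex
T^{*} \;=\; \operatorname*{arg\,min}_{T \in \tau}\;
\Big[\, -\log P(X \mid T) \;+\; \lambda\, \mathbb{D}\langle \pi_T, \pi_X \rangle \,\Big]
```

where the first term is the usual likelihood-based score and 𝔻⟨π_T, π_X⟩ is the conceptual distance between the plan induced by a candidate theory T and the plan of the input example X.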

Conceptual distance, _{X},

Our solution is to employ _{i} and the instance given as input,

Given a candidate theory and the input instance, we obtain the corresponding plan strings π_{T} and π_{X}. To obtain NCD, we execute string compression (lossy or lossless) on each of the plans, as well as on the concatenation of the two plans, to recover the compressed strings c_{T}, c_{X}, and c_{T,X}, respectively. NCD between the plans can then be computed as NCD(π_{T}, π_{X}) = (|c_{T,X}| − min(|c_{T}|, |c_{X}|)) / max(|c_{T}|, |c_{X}|).

The conceptual distance between a theory T and the input instance X is then 𝔻⟨π_{T}, π_{X}⟩ = NCD(π_{T}, π_{X}). This entire computation is performed by the
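For illustration, NCD over two plan strings can be computed with any standard lossless compressor; the sketch below uses zlib, which is our choice — the paper does not fix a particular compressor:

```python
import zlib

def ncd(plan_a: bytes, plan_b: bytes) -> float:
    """Normalized Compression Distance between two plan strings:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    c_a = len(zlib.compress(plan_a))
    c_b = len(zlib.compress(plan_b))
    c_ab = len(zlib.compress(plan_a + plan_b))
    return (c_ab - min(c_a, c_b)) / max(c_a, c_b)

# Identical plans compress together almost perfectly, so their NCD is
# near 0; unrelated strings give a distance closer to 1.
d_same = ncd(b"move(a,b) stack(a,c) " * 50, b"move(a,b) stack(a,c) " * 50)
d_diff = ncd(b"move(a,b) stack(a,c) " * 50, bytes(range(256)) * 4)
```

Because real compressors only approximate Kolmogorov complexity, NCD values can slightly exceed 1; in practice the relative ordering of distances is what drives the search.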

High-level overview of our Guided One-shot Concept Induction (GOCI) framework.


The search space in ILP is provably infinite. Typically, language bias (modes) and model assumptions (closed world) are used to prune the search space. However, it is still intractable with one (or a few) examples. So, we employ human expert guidance as constraints that can directly refine an induced theory, acting as a strong inductive bias. Also, we are learning decomposable concepts (see Definition 2). This allows us to exploit another interesting property: constraints can now be applied over the attributes of the known concepts that compose the target concept, or over the relations between them. Thus, GOCI can solicit and apply constraints at the level of the composing concepts.

If the human inputs (constraints) are provided upfront, before learning, they can be wasteful/irrelevant. More importantly, this places an additional burden on the human. To alleviate this, _{b}, _{s}, _{b}, _{s}) _{b}, _{s},

Thus, we are optimizing the constrained form of the same objective as Equation (1), which aims to prune the search space. This is inspired by advice elicitation approaches (Odom et al.,

_{0} by variablizing the “

Guided one-shot concept induction.

[Algorithm listing; recoverable steps: use S_{ℓ−1} as the initial model; generate a candidate theory T_{ℓ} from S_{ℓ−1}; score T_{ℓ}; if score_{ℓ} < score_{ℓ−1}, retain T_{ℓ}; otherwise set T_{ℓ} = T_{ℓ−1}.]
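The recoverable steps of the listing above amount to an iterate-solicit-refine-score loop. A minimal Python sketch, in which every function name (refine, solicit_advice, score) is a hypothetical placeholder standing in for GOCI's actual components:

```python
def guided_induction(initial_theory, refine, solicit_advice, score, max_layers=10):
    """Iteratively refine a candidate theory; a refinement is retained only
    if it improves the distance-penalized score (lower is better)."""
    current = initial_theory
    best_score = score(current)
    for _ in range(max_layers):
        advice = solicit_advice(current)      # expert picks constraint predicates
        candidate = refine(current, advice)   # ILP refinement under the constraints
        candidate_score = score(candidate)
        if candidate_score < best_score:      # retain only improving candidates
            current, best_score = candidate, candidate_score
        # otherwise keep the previous theory (T_l = T_{l-1})
    return current

# Toy instantiation: theories are integers, and the "target" theory is 5,
# so the score is simply the distance to 5.
best = guided_induction(
    initial_theory=0,
    refine=lambda theory, advice: theory + advice,
    solicit_advice=lambda theory: 1,
    score=lambda theory: abs(5 - theory),
)
```

The toy run converges to the target and then rejects every non-improving candidate, mirroring the retain/keep branches of the listing.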

_{X} for a test instance _{X} (^{+} (^{−}, [i.e.,

Difference in evaluation of a concept instance across different learning paradigms.

NCD δ(s_{Y}, s_{Z}): Let T_{Y} and T_{Z} be two theories with the same parameterization (i.e., the same heads). Let T_{Y}/θ and T_{Z}/θ be their groundings with identical parameter values θ. Our learned theories are equivalent to planning tasks. Assuming access to a planner Π() which returns Π(T_{Y}/θ) and Π(T_{Z}/θ), the two plan strings with respect to the instantiations of the concepts are represented by s_{Y} and s_{Z}, respectively.

δ⟨s_{Y}, s_{Z}⟩ = 0 if and only if T_{Y} ⇄ T_{Z}; that is, the distance between the plan strings is zero exactly when the two theories represent the same concept.

Proof Sketch for Proposition 1: Let T_{Y} and T_{Z} be two induced consistent first-order Horn clause theories, which may or may not represent the same concept. Let θ be some substitution, and let T_{Y}/θ and T_{Z}/θ be the grounded theories under the same substitution. This is valid since we are learning Horn clause theories with the same head, which indicates the target concept being learned. As explained in the paper, a theory is equivalent to a planning task. We assume access to a planner Π(), and we get plan strings Π(T_{Y}/θ) and Π(T_{Z}/θ) with respect to the planning tasks T_{Y}/θ and T_{Z}/θ.

Friend et al. (

Let there be a theory T^{*}, which represents the optimal generalization of a concept. Let 𝔻⟨T_{Y}, T_{Z}⟩ = 0; that is, they represent the same concept: T_{Y} ⇄ T_{Z}. Thus, both T_{Y}/θ and T_{Z}/θ will generate the same set of plans as T^{*}, since they will denote the same planning tasks (by structural induction). Thus,

up to equivalence of partial ordering in planning. Let π^{*}() be a minimum-length plan in a set of plans Π().

Given grounded theories T_{Y}/θ and T_{Z}/θ with induced distributions P_{TY} and P_{TZ}, the proposed distance generalizes to the Kolmogorov–Smirnov statistic KS(P_{TY}, P_{TZ}).

Proof Sketch for Proposition 2: This can be proved by considering the connection between NID and the distributions induced by the concept classes we are learning. NID is defined as NID(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}, where K(·) denotes Kolmogorov complexity.

However, if we consider a

where _{j} > 0 are constants and _{j}(

Given two grounded theories T_{Y}/θ and T_{Z}/θ, let P_{TY/θ} and P_{TZ/θ} be the respective distributions when learning probabilistic logic rules. Now let us define the semantics of a distribution P_{T/θ} in our case: P_{T/θ} =

where

Proposition 2, on the other hand, shows that our proposed metric is not limited to our specific scenario. It positions our work in the context of known statistical distance metrics and establishes its credibility as a valid solution, proving that in a nondeterministic setting, that is, a probabilistic logic formulation, our proposed metric generalizes to the Kolmogorov–Smirnov statistic.
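The Kolmogorov–Smirnov statistic referenced here is the maximum gap between two empirical cumulative distribution functions. A small self-contained Python sketch (purely illustrative, not GOCI code):

```python
def empirical_cdf(sample):
    """Return the empirical CDF of a finite sample as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: sum(1 for v in ordered if v <= x) / n

def ks_statistic(sample_a, sample_b):
    """KS statistic: the maximum of |F_a(x) - F_b(x)| over observed points."""
    f_a, f_b = empirical_cdf(sample_a), empirical_cdf(sample_b)
    support = sorted(set(sample_a) | set(sample_b))
    return max(abs(f_a(x) - f_b(x)) for x in support)

# Identical samples have KS distance 0; fully separated samples approach 1.
d_zero = ks_statistic([1, 2, 3], [1, 2, 3])
d_one = ks_statistic([0, 0, 0], [1, 1, 1])
```

Like the conceptual distance, the KS statistic is zero exactly when the two (empirical) distributions coincide, which is the property the proposition relies on.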

PAC analysis of GOCI


Proof Sketch for Proposition 3:

To begin with, we are interested in regret bounds for the hypothesis initially learned by the ILP learner

where ϵ is the regret. Now, our ILP learner induces ^{i} ≤

where distance ^{i} = ^{j}^{i} >>

The maximum number of constraint literals possible at any iteration is 2^{(|𝕌|−1)} × t^{q}.

Proof Sketch for Proposition 4: The proof is straightforward, and hence we present it in brief. In our setting, to show that

(where 2^{(|𝕌|−1)} × t^{q} is the maximum number of constraint literals possible, since 𝕌 is the library of constraint predicates) consider that the number of constraint-predicate subsets that can be picked at any iteration is 2^{(|𝕌|−1)}. To form constraint literals, we need to tie arguments to existing logical variables in the current theory, which yields at most t^{q} bindings and hence 2^{(|𝕌|−1)} × t^{q} constraint literals in total. So if the distribution induced on the constraint literals by human advice _{ℓ} − _{ℓ−1}|.

Observe that if at each layer ℓ ≤ _{ℓ} should at least be the change in conceptual distance.

The proof is quite straightforward and hence we just discuss the brief idea behind it. Our input is sparse (one or few instances). G

We next aim to answer the following questions explicitly:

Our framework extends a Java version of Aleph (Srinivasan,

We compare GOCI against a baseline ILP learner.

Note that human guidance was obtained from a distinct human expert for every run. The expertise levels of all the advice providers were reasonably comparable, since they were chosen from the same pool of candidates, with no prior visibility into or knowledge of our proposed framework. However, we assumed that all human advice providers had a basic level of knowledge of geometry or the fundamentals of logic and reasoning. Additionally, we explained each of the experimental domains to the human participants to create a similar level of awareness about the domains among all of them.

We employ four domains of varying complexity. Note that we have selected the domains below based on multiple considerations. The domain encoding needs to be such that target concepts can be learned in a modular fashion (i.e., are decomposable). Thus, the first two domains are structure-construction domains, either spatial (Minecraft) or chemical/molecular (ChEBI). Spatial structures are implicitly modular (such as the 𝕃-structure in

Instances of spatial concepts in Minecraft.

Results for one-shot concept learning.

Domain      Method   Score    Avg. # queries
Minecraft   GOCI     0.85     5.5 ± 3
            ILP      0.35     –
Assembly    GOCI     0.65     16.5 ± 4
            ILP      0.2      –
ChEBI       GOCI     0.615    13.1 ± 2.13
            ILP      0.45     –
Barman      GOCI     0.7      10.5 ± 5.4
            ILP      0.51     –

Query efficiency is an important consideration in any learning paradigm that leverages human guidance, since controlling the cognitive load on the human expert is critical. So, in general, the observed average query numbers being reasonably low across all domains corroborates our theoretical advice complexity (section 3.4.2).

Learning curves for varying sample sizes, comparing the sample efficiency of Guided One-shot Concept Induction (GOCI).

Results of ablation study on Minecraft domain. Relative contribution of our distance-penalized score vs. human guidance.

The most important conclusion from the experiments is that, when available, the guidance along with the novel score leads to a jump-start, a better slope, and, in some cases, asymptotic improvement with a fraction of the number of instances needed when merely learning from data.

Another important aspect to note here is that our experimental setup did not attempt to ensure, in any way, that the quality of guidance provided by the human participants was optimal. The formulation of the objective function in GOCI itself guards against the effect of suboptimal guidance.

As explained earlier and shown in Equation (3), human advice and conceptual distance deal with two distinct aspects of the search process. Human advice controls the size and nature of the search space, while conceptual distance ensures the quality of the candidates. Advice and distance have a balancing effect on each other, and thus it is our novel conceptual distance that makes GOCI robust to the quality of human advice.

Also, the nature of human advice in our setting is that of choosing the most useful set of "constraint predicates" from among the set of candidate constraints. Now, the candidates are generated by GOCI itself.

Our ablation study in

We developed a human-in-the-loop one-shot concept learning framework in which the agent learns a generalized representation of a concept as FOL rules from a single (or a few) positive example(s). We make two specific contributions: deriving a new distance measure between concepts, and allowing for richer human inputs than mere labels, solicited actively by the agent. Our theoretical and experimental analyses show the promise of GOCI.

All datasets generated for this study are included in the article/supplementary material.

MD and SN contributed equally to the ideation. MD and NR led the empirical evaluation. MD, NR, SN, and JD contributed nearly equally to the manuscript preparation. All authors contributed to the article and approved the submitted version.

MD was employed by the company Samsung R&D Institute India - Bangalore. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors acknowledge the support of members of STARLING lab for the discussions.

This article was released as a preprint to arXiv as part of the StarAI 2020 workshop (Das et al.,

^{1}Our algorithm can learn from one (few) example(s). We specify the number of examples in our evaluations.