^{1}

^{2}

^{1}

^{2}

^{3}

^{*}

^{2}

^{1}

^{2}

^{3}

Edited by: Kristian Kersting, Technische Universitt Darmstadt, Germany

Reviewed by: Nicola Di Mauro, Universitá degli studi di Bari Aldo Moro, Italy; Elena Bellodi, University of Ferrara, Italy; Tarek Richard Besold, City University of London, United Kingdom

This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

This paper shows how methods from statistical relational learning can be used to address problems in grammatical inference using model-theoretic representations of strings. These model-theoretic representations are the basis of representing formal languages logically. Conventional representations include a binary relation for order and unary relations describing mutually exclusive properties of each position in the string. This paper presents experiments on the learning of formal languages, and their stochastic counterparts, with unconventional models, which relax the mutual exclusivity condition. Unconventional models are motivated by domain-specific knowledge. Comparison of conventional and unconventional word models shows that in the domains of phonology and robotic planning and control, Markov Logic Networks With unconventional models achieve better performance and less runtime with smaller networks than Markov Logic Networks With conventional models.

This article shows that statistical relational learning (Getoor and Taskar,

Formal languages are sets of strings or probability distributions over strings (Hopcroft and Ullman,

However, there is an important, unexamined assumption in much of the grammatical inference literature. Formal languages depend on an

In this article, we apply finite model theory (Hodges,

Specifically, this article re-examines the unary relations that make up word models. These are typically assumed to be disjoint: in a string with three positions like

However, in natural languages (and often in robot planning), events in a sequence can share certain properties. For instance, in the word

There is already precedent for the importance of the representations of words for understanding the complexity of subregular formal languages (Thomas,

We thus take advantage of domain-specific knowledge to model strings with carefully chosen sets of unary relations that capture salient properties. We show that doing so concretely simplifies the formal languages that are often learning targets, and makes it possible to reliably infer them with

This article is organized as follows. Section 2 reviews model theory and

The remainder of the article details our experiments and contributions with Markov logic network (

Section 7 presents the first experimental contribution of this paper: an empirical demonstration on a toy problem that a

Our second contribution comes from the domain of phonology. We examine

The third contribution is found in Section 9, where statistical relational learning is applied for the first time on a problem of (deliberate) cooperative interaction between heterogeneous robots. The first objective here is first to demonstrate how the same theory that helps us reason about words and stress, can also apply to engineering problems of planning and decision making in robotics; the second objective is to show how the use of unconventional models can both analytically and computationally facilitate the analysis of cooperative interaction between autonomous agents. The case study featured in Section 9 involves an aerial vehicle, working together and physically interacting with a ground wheeled robot, for the purpose of allowing the latter to overcome obstacles that it cannot by itself. The focus in this case study is not on learning different ways in which the vehicles can interact with each other — not on the planning of the interaction per se; this can be a subject of a follow-up study.

Instances of problems where physical interaction between autonomous agents has to be coordinated and planned to serve certain overarching goals, are also found in the context of (adaptive) robotic-assisted motor rehabilitation, which to a great extent motivates the present study. In this context, humans and robotic devices may interact both physically and socially, in ways that present significant challenges for machine learning when the latter is employed to make the robots customize their behavior to different human subjects and different, or evolving, capability levels for the same subject. One of the most important challenges faced there is that one does not have the luxury of vast amounts of training data. The algorithms need to learn reliably and fast from

The last section 10 concludes.

Model theory studies objects in terms of mathematical logic (Enderton, ^{1}

Model signatures define a collection of _{i}-ary relations _{i}, for 1 ≤ _{i} ∈ ℕ. A structure is therefore denoted _{1}, …, _{m}〉 ∈ ℜ, in other words, structures are specific instantiations of some particular signature. Such instantiations are referred to as _{C} = {_{1}…_{n})∣_{i} ∈

A signature gives rise to a

We adapt the presentation of De Raedt et al. (_{1}, _{2}, …_{n}} taking values in some space _{k}. There is one such potential function ϕ_{k} for every clique in the graph, and the clique associated with potential function ϕ_{k} is denoted {_{{k}}.

If a particular valuation of

Then the joint probability distribution of the network can be factored over the network's cliques in the form

Usually, a log-normal representation for this joint probability distribution is utilized, in the form of an exponential of a weighted sum of real-valued feature functions _{j}(_{j} for each possible valuation of the state _{j} = logϕ_{k}(_{{k}}). In this form, the joint probability distribution is

A _{i}, _{i}), where _{i} is a first order formula and _{i} is a real number. Note that an _{1}, …, _{m}〉 ∈ ℜ with signature 〈𝔇; ℜ〉, there is a node for every possible grounding of an atomic predicate _{i}, and a feature _{j} for each possible grounding of a formula _{i}. In fact, despite being different depending on the choice of _{i}, all of these _{i}, namely _{j}(_{j} in _{Fj}. Thus, the joint distribution of the ground Markov network generated by the

Since each structure _{i}, _{i}), (2) essentially expresses the probability that the

From a learning perspective, natural problems include finding either the weights of given formulas or both the weights and the formulas themselves (Domingos and Lowd,

For any parametric model

In principle, the weights of the formulas of a _{i}, _{i}) in the

Then standard optimization techniques, such as gradient descent, the conjugate gradient, and Newton's method, or variants thereof, can be used to find weights corresponding to the

Strings (words) are familiar: they are sequences of symbols and formal languages are sets of strings. Formal language theory studies the computational nature of stringsets (Hopcroft and Ullman,

In this section, we provide formal background and notation on strings, formal languages, finite-state automata, logic, and model theory. Connections among them are made along the way.

In formal language theory, the set of symbols is fixed, finite and typically denoted with Σ. The free monoid Σ^{*} is the smallest set of strings which contains the unique string of length zero λ (the identity element in the monoid) and which is closed under concatenation with the symbols from Σ. Thus, if ^{*} and σ ∈ Σ then ^{*} where

For all ^{*}, if _{1}σ_{2}…σ_{n} is a _{k}(_{k}(

We sometimes make use of left and right word boundary markers (⋊ and ⋉, respectively), but do not include those in Σ.

Formal languages are subsets of Σ^{*}. For example suppose Σ = {_{a}. _{a} is a subset of Σ^{*}. It is useful to identify every formal language ^{*} with its characteristic function _{a}, but if _{a}. This shift in perspective provides a direct parallel to the study of probability distributions over Σ^{*}. These are expressed as functions whose co-domains are the real-interval [0, 1]. Formally they are ^{*} → [0, 1] such that ^{*}. We use the term ^{*} identified with ^{*} → {0, 1} and the term ^{*} identified with ^{*} → [0, 1]. We use the

One important problem studied addressed in formal language theory is the membership problem, which is the problem of deciding whether an arbitrary string in Σ^{*} belongs to a categorical stringset. A closely related problem is determining the probability of an arbitrary string in a stochastic stringset. In each case, the problem is, for all ^{*}, to compute the output of

These functions ^{*} → {0, 1} this means the evidence only contains words ^{*} → [0, 1] which are probability distributions this usually means the evidence is obtained according to independent and identically distributed (i.i.d.) draws from

Both the membership and learning problems are closely related to the study of formal grammars. It is well-known, for instance, that if the functions

Informally,

DEFINITION 1. A real-weighted deterministic finite-state acceptor (_{0}, δ, ρ, α):

Σ | is a finite alphabet of symbols, |

is a finite set of states, | |

_{0} ∈ |
is the designated start state, |

δ: |
is the transition function, |

ρ: |
is a real-valued weight, and |

α: |
is a function mapping each state to a real-valued weight. |

A (^{*} reading them from left to right, and transitioning from one state to another upon reading each of the symbols in the input string.

Each ^{*} → [0, 1]. The function _{0}, δ, ρ, α), and henceforth denoted _{A}, can be derived as follows. Let a dot (·) denote real number multiplication, a backslash (\) set difference. Define the function “process”

recursively as follows:

In other words, π(_{A} can be defined as

Note that _{A}(

The recursive path of computation given by π indicates how the membership problem for any stringset definable with a

If for each state

then _{A} computes a probability distribution over Σ^{*} (de la Higuera, _{A}(

As an example, let Σ = {

Note the final product above correlates with the probabilities along the “path” taken by

The

Next we turn to _{A} identifies a characteristic function of a regular categorical stringset ^{*}. Strings _{A}(_{A}(_{A}(

As an example, let Σ={H, _{LHOR} shown in Figure

The _{LHOR} is the basis for the case study in section 8.

Witness the following computation of _{LHOR} on input LL.

Thus _{LHOR} rejects this string. On the other hand, _{LHOR} accepts LĹ.

The reader may verify that _{LHOR} also rejects ĹH and accepts L

There are learning results for the general case of learning any regular stringset and results for learning subclasses of regular stringsets. An early result was that regular categorical stringsets cannot be learned exactly from positive evidence only (Gold,

Each

Regular stringsets can also be defined logically. Traditional logic is used for categorical stringsets and weighted logic for stochastic stringsets (Droste and Gastin,

In order to define a stringset with a logical expression, the logical expressions need to be able to refer to aspects and properties of the string. This is where model theory becomes relevant. Model theory makes explicit the representation of objects. Combined with a logic, such as

For example, consider the unweighted logical expression shown below, which is read as “for all

In plain English, this means “Well-formed words do not contain the letter

In general, the interpretation of φ depends on what the atomic predicates are in the

For the sake of this analysis let Σ = {^{*}. Then following Rogers and Pullum (^{⊲}), which is given by the signature 〈𝔇; ⊲, _{a}, _{b}, _{c}〉 where ⊲ is the binary ordering relation _{σ} is a unary relation denoting which elements are labeled σ.

Contrast this with another conventional model for words: the Precedence Word Model (𝔐^{<}). This model (structure) has signature 〈_{a}, _{b}, _{c}〉 where < is the binary ordering relation _{σ} are the same as in 𝔐^{⊲}.

Under both model signatures, each string ^{*} of length _{1}σ_{2}…σ_{k} has domain _{σ} = {_{i} = σ}. The difference between 𝔐^{⊲} and 𝔐^{<} is the ordering relation. Under the successor model 𝔐^{⊲}, the ordering relation is ⊲ ^{<}, the ordering relation is <

Figure _{a} = {2}, _{b} = {3, 4}, and _{c} = {1}. While the unary relations in these models illustrated in Figure

Successor and precedence models for word

It follows that certain conditions must be met for structures to be interpretable as strings. In both theories, for a structure _{σ}]— and at most one label—again, mathematically expressed as

For example, the structure ^{⊲} or 𝔐^{<}. Structure

Depending on the choice of model and logic different classes of stringsets arise (Büchi,

Subregular Hierarchies from a model-theoretic perspective.

We have already defined regular stringsets as those characterized by a

DEFINITION 2 (Locally Threshold Testable Thomas,

A stringset ^{*} and

In other words, membership of a string _{t, k} stringset is determined solely by the number of occurrences of each ^{⊲} are exactly the Locally Threshold-Testable stringsets.

DEFINITION 3 (Non-Counting). A stringset ^{*}, if ^{k + 1}

McNaughton and Papert (^{<} are exactly the Non-Counting stringsets. They also prove languages in the Non-Counting class are exactly those definable with star-free generalized regular expressions and exactly those obtained by closing LT stringsets under concatenation. Hence this class also goes by the names “Star-Free” and “Locally Testable with Order.” The Non-Counting class properly includes the Locally Threshold Testable languages because the successor relation is

Finally, observe that stringsets that are regular but not Non-Counting typically count modulo some

DEFINITION 4 (Locally Testable Rogers and Pullum,

Language _{k}) iff there is some _{k}(⋊_{k}(⋊_{k}.

From a logical perspective, Locally Testable languages are ones given by a propositional calculus whose propositions correspond to factors (Rogers and Pullum, _{1}), which is the Boolean closure of _{k} class equals LTT_{1,k}.

DEFINITION 5 (Piecewise Testable). A language _{k}) iff there is some _{k}(_{k}(_{k}.

Piecewise Testable languages are ones given by a propositional calculus whose propositions correspond to subsequences (Rogers et al., _{1}), which is the set of Boolean closure of

DEFINITION 6 (Strictly Local Rogers and Pullum, _{k}) iff whenever there is a string _{1}, _{1}, _{2} and _{2} such that _{1}_{1}, _{2}_{2} ∈ _{1}_{2} ∈ _{k} for some

From a logical perspective, Strictly _{1}, which is the set of

From an automata perspective, the SL_{k} class of stringsets is represented by a _{k} reduces to a particular functions ρ and α. Such _{k} stringsets and such

DEFINITION 7 (Strictly Piecewise Rogers et al., _{k}) iff _{k}(_{k}(_{k}; equivalently,

From a logical perspective, Strictly Piecewise languages are ones given by a conjunction of negative propositions where propositions correspond to factors (Rogers et al., _{1}, which is the set of

While the subregular classes of stringsets in the above diagram exhibit different properties, the logical characterizations make the parallels between the two sides of the hierarchy clear. The Strictly Local and Strictly Piecewise classes are relevant to the experiments presented later.

For any two relational structures _{1}, the domain of _{2}, the domain of _{1} and for all _{ij} with _{1}, …, _{n}) ∈ _{1j} iff (_{1}), …_{n})) ∈ _{2j}.

For example under 𝔐^{⊲}, ^{<},

The lemma below is not difficult to prove.

LEMMA 1. ^{*},

Not only do these facts help make clear the similarities between substrings and subsequences observed in earlier works (Lothaire,

From a learning perspective, the characterizations place limits on what kinds of stringsets can be learned when learning systems rely on

It is known that for given _{k}, LT_{k}, PT_{k}, and LTT_{t,k} stringsets (García and Ruiz,

In many domains of interest—including natural language processing and robotic planning and control—stringsets are used to characterize aspects of the nature of system. While conventional word models may be sufficiently expressive for problems in these domains, they do not take advantage of

Here is a simple motivating example. If we restrict ourselves to the alphabet {

In the context of learning stringsets, these correspondences can, and should be, exploited. Current learning approaches based on automata are challenging since automata are best understood as processing individual symbols. On the other hand,

In the remainder of this paper we apply

The first case study serves as a sanity check. We expect that _{k} stringsets, and how to express this logic with

This knowledge is put to use in the subsequent case studies. The second case study is about the problem of learning an aspect of one's phonological grammar: how to assign stress (a type of prominence) in words. The stress pattern we describe is amenable to multiple logical descriptions. We offer two: one using a conventional precedence model and one with an unconventional precedence model. We show learning the stress pattern requires less data and less computation time if the unconventional model is used.

The third case study illustrates the application of

We used the software package Alchemy 2 (Domingos and Lowd,

For each experiment, there are two input files for weight learning. One is a

Our case studies are mostly limited to Strictly

In the same

If we were to consider a Strictly 2-Piecewise grammar, then the

The other input file to Alchemy 2 is a training database (

To represent each input string in the training database, each position in each string are indexed with a dummy denotation. We used capitalized letters of the alphabet and their combinations. These positions correspond to elements of the domain in a word model. Having that, we then list the properties of each position, and also the binary relations between positions.

For example, a string in the training dataset of the current example might be ‘^{⊲}.

In an unconventional word model, more than one unary relation may be listed for some node. How the set of strings for each training dataset was generated for each case study is described later.

When Alchemy 2 is run with these input files, it produces an output file which provides the learned weights for each statement in the

Each of our case studies required some specific treatment beyond the overall methods described above, which we discuss as appropriate in the subsequent sections.

This section shows that

The _{A} is one. Learning _{A} comes down to learning the ρ and α functions.

The parameter values of

start | 0.3333 | 0.3333 | 0.3333 | 0 |

a | 0.3000 | 0.2000 | 0.2000 | 0.3000 |

b | 0.2000 | 0.3000 | 0.2000 | 0.3000 |

c | 0.2000 | 0.2000 | 0.3000 | 0.3000 |

To see how well

We fed these training samples to two learning algorithms. One algorithm is the one mentioned in section 4, which uses the structure of

In natural language processing, the first approach is usually implemented in a way that incorporates

The formulas in the _{a}, _{b}, _{c}, _{initial}, _{final}〉. As such, the

The models output by these learning algorithms were compared in two ways: by comparing the probabilities of subsquent symbols directly in the trained models and by calculating the perplexity the models give to a test set.

For the first comparison, we converted the weights obtained in the

For instance, suppose we are given two constants

Let

We want to determine ρ(

Notice that _{aa}, and zero true grounding in all other formulas. Similarly, the world _{ab} and zero true grounding in all other formulas. Consequently, the ratio of the probabilities equals the ratio of the exponential of the weights of corresponding satisfied formulas, namely

Thus ρ(

The second method examined the perplexity of a data set. Perplexity is a measure of model performance utilized in natural language processing (Jurafsky and Martin, _{M}(σ_{i}∣σ_{1}, …, σ_{i−1}) denotes the probability that model

Low perplexity is an indication of model prediction accuracy.

The training data was randomly generated with the ^{*} that

Before training the

Table

Mean parameter values and perplexity obtained by the two learning algorithms on the training sets. Standard deviations are shown in parentheses.

Start | 0.3550 | 0.3150 | 0.3300 | 0 | 0.3433 | 0.2776 | 0.3790 | 9.5e-5 |

a | 0.2532 | 0.2278 | 0.1914 | 0.3276 | 0.3078 | 0.2028 | 0.2335 | 0.2560 |

b | 0.2354 | 0.3146 | 0.1418 | 0.3082 | 0.2643 | 0.3568 | 0.1596 | 0.2193 |

c | 0.1717 | 0.2202 | 0.2664 | 0.3417 | 0.1467 | 0.2360 | 0.3614 | 0.2259 |

Perplexity | 2,012.9 (1184.2) | 1,794.1 (966.9) | ||||||

Start | 0.3220 | 0.3360 | 0.3420 | 0 | 0.3288 | 0.3394 | 0.3318 | 3.0e-5 |

a | 0.2914 | 0.1989 | 0.2101 | 0.2996 | 0.3588 | 0.1950 | 0.1913 | 0.2549 |

b | 0.1949 | 0.2815 | 0.2267 | 0.2970 | 0.1930 | 0.3262 | 0.2250 | 0.2559 |

c | 0.2040 | 0.2200 | 0.3009 | 0.2751 | 0.2097 | 0.2215 | 0.3382 | 0.2307 |

Perplexity | 1,119.4 (272.9) | 1,090.9 (192.8) | ||||||

Start | 0.3190 | 0.3590 | 0.3220 | 0 | 0.3279 | 0.3466 | 0.3255 | 2.3e-5 |

a | 0.2751 | 0.2214 | 0.1918 | 0.3118 | 0.3406 | 0.2120 | 0.1909 | 0.2565 |

b | 0.1961 | 0.3008 | 0.2047 | 0.2985 | 0.1999 | 0.3442 | 0.2101 | 0.2457 |

c | 0.2046 | 0.2057 | 0.2866 | 0.3030 | 0.2151 | 0.1961 | 0.3440 | 0.2449 |

The results of Table

Thus,

This case study compares conventional and unconventional word models in light of the problem of phonological well-formedness. It is widely accepted in phonology that in many languages the syllables of a word have different levels of prominence, evident either from acoustic cues or perceptual judgments (Chomsky and Halle,

The position of stress in a word is predictable in many languages, and a variety of stress patterns have been described (van der Hulst et al.,

Predictable stress patterns can be broadly divided into two categories: bounded and unbounded. In _{k} where

In some languages, an important factor for predicting stress is ^{2}

Hayes (

Let Σ = {L,H,Ĺ,

Some well-formed words in

Ĺ | LĹ | LLĹ | LLLĹ | LL |

L |

The _{LHOR} in Figure

The well-formedness of a word in LHOR can be analyzed in terms of its subsequences of size 2 or smaller. The permissible and forbidden 2-subsequences in LHOR are shown in Table

2-subsequences in LHOR (Strother-Garcia et al.,

Permissible | Forbidden | ||||||
---|---|---|---|---|---|---|---|

LL | HH | LĹ | HL | H |
ĹL | HĹ | ĹĹ |

LH | L |
ĹH | Ĺ |

Heinz (^{k}L and L^{k}Ĺ belong to ^{k}Ĺ does not. LHOR is therefore not SL for any ^{k}Ĺ belongs to _{k} for any

Furthermore, Heinz (^{3}

Strother-Garcia et al. (

Consider the conventional Precedence Word Model 𝔐^{<} (Section 4.5) with Σ = {L, H, Ĺ, ^{<} is thus 〈_{L}, _{H}, _{Ĺ}, _{H′}〉. The LHOR pattern can be defined with formula templates

For example, strings that satisfy _{HH′} contain the 2-subsequence H_{H′} contain the symbol

is true of string _{LHOR} is in 2-conjunctive normal form (

_{LHOR}.

The unconventional word model 𝔐 is similar to 𝔐^{<} with an important caveat: each domain element may belong to more than one unary relation. In other words, each position may bear multiple labels. Let Σ′ = {_{L}, _{H}, _{S} for

For example, if position _{H}(_{S}(

The unconventional model provides a richer array of sub-structures (section 4.7) with which generalizations can be stated. Given 𝔐 and Σ′, Table

Feature geometry for LHOR sub-structures of size 1.

Ĺ | |

H | |

L | |

σ | ∅ |

Strother-Garcia et al. (

The banned sub-structures are also simplified under 𝔐. Recall here that a stressed light is only permissible if it is the final syllable. Thus one of the banned sub-structures in the LHOR pattern is a stressed light followed by any other syllable, given by the formula _{Ĺσ}. Again, this structure is underspecified. It is a sub-structure of four of the forbidden 2-subsequences in Table

In a word with one or more heavy syllables, the stress must fall on the leftmost heavy. Consequently, a heavy syllable may not be followed by any stressed syllable. This is represented by the formula

Thus, LHOR can be described with a 1-CNF formula under 𝔐,

which contrasts with the 2-CNF formula φ_{LHOR} under 𝔐^{<}.

Formula ψ_{LHOR} refers to sub-structures of size 2 or less, which are analogous to 2- and 1-subsequences. The unconventional word model permits a statement of the core linguistic generalizations of LHOR without referring to a seemingly arbitrary list of subsequences.

Strother-Garcia et al. (

Here we describe the two

Conventional and unconventional word models for L

The different word models also determined a different set of formulas in each of the _{ab} =

For the _{ab} with

However, we do not think it is appropriate to include them all. A sentence of the form “_{ab} =

As mentioned, the LHOR pattern also _{a}. Thus for both the conventional and unconventional word models, we also included statements which require sub-structures in strings. Our initial efforts in this regard failed because Alchemy quickly runs out of memory as it converts all existential quantification into a CNF formula over all the constants in the database file. To overcome this hurdle, we instead introduced statements into the database file _{1}, _{2}, …_{n} which represented each string. In the conventional model, predicates

The

We generated data sets in six sizes: 5, 10, 20, 50, 100, and 250 strings. For each size, we generated ten different datasets. We generated training data of different sizes because we were also interested in how well the

To generate a training data set, we first randomly generated strings from length one to five inclusive from the alphabet Σ =

Table

Runtime of learning weights for linguistic statements.

5 strings | 0.29s | 0.27s |

10 strings | 0.51s | 0.51s |

20 strings | 1.94s | 1.89s |

50 strings | 10.28s | 10.66s |

100 strings | 58.76s | 49.43s |

250 strings | 8 min, 36.02s | 7 min, 57.48s |

Two types of evaluations were conducted to address two questions. Did the

First, to evaluate whether the _{LHOR}).

(G1) No syllables follow stressed light syllables.

(G2) No stressed syllable follows a heavy syllable.

(G3) There is at least one stressed syllable.

(G4) There is at most one stressed syllable.

We elaborate on the analysis for G2; the analyses for the rest is similar. For the conventional model, let two constants (positions)

These probabilities are all disjoint and sum to one i.e., _{1} + _{2} + _{3} + _{4} = 1. The probability of P_{1} is given explicitly as

Probabilities P_{2}, P_{3}, and P_{4} are calculated similarly. Let

Worlds _{2}, _{3}, _{4} (with probabilities _{2}, _{3}, _{4} respectively, in the similar way. Given two syllables, it is obvious that _{1} + _{2} is the probability that a stressed syllable comes after _{3} + _{4} is the probability that an unstressed syllable comes after _{i} the weight for formula _{i}, and _{i} in world

The closer this ratio is to zero, the higher the confidence on the statement that the

The analysis of the

Our second evaluation asked how much training is needed for each model to reliably learn the generalizations. Here we tested both models on small training samples. Specifically, we conducted training and analysis on 10 sets of 10 training examples and 10 sets of 5 training examples.

Prior to running the models, we arbitrarily set a threshold of 0.05. If the ratios calculated with the weights of the trained model were under this threshold, we concluded the model acquired the generalizations successfully. Otherwise, we concluded it failed. We then measured the proportion of training sets on which the models succeeded.

Given a training sample with 100 examples, the resultant ratios representing each generalization for the

Summary of ratios from one training sample with 100 examples.

(G1) | 0.0193 | 6e-6 | |

(G2) | 0.1030 | 7e-4 | |

(G3) | 4e-6 | 0.0185 | |

(G4) | 7.6e-6 | 6.4e-10 | |

(G4) | 0.0013 | 5e-10 | |

(G4) | 0.0023 | 5.7e-10 |

Both

With respect to the question of how much data was required to learn the LHOR pattern, we conclude that

Unconventional word models can potentially reduce the planning complexity of cooperative groups of

Consider a heterogeneous robotic system consisting of two vehicles: a ground vehicle (referred to as the

The heterogeneous robotic system considered in this section. The two robots can latch onto each other by means of a powered spool mechanism. By positioning itself on the other side of the fence, the quadrotor can act as an anchor point for the crawler, which will use its powered spool to reel itself up and over the fence to reach the other side.

The primary motivation in using an unconventional word model is enabling the heterogeneous system to autonomously traverse a variety of otherwise insurmountable obstacles (e.g., the fence in Figure

Table

Feature geometry for each state of the heterogeneous multi-robot system of Figure

a | |||

b | |||

c | |||

d | |||

t | |||

A | |||

B | |||

C | |||

D |

The grammar for the cooperative robot behavior is created based on three assumptions: (i) two vehicles cannot move at the same time — one has to stop for the other to start, (ii) the crawler is

The automaton that accepts strings of robot actions, along cooperative plans in which one robot moves at any given time instant.

To illustrate, the similarity and differences between the conventional and unconventional models, Table

Conventional and unconventional word models consistent with the word (plan)

The grammar of the language generated by the

Forbidden 2-factors constituting the strictly 2-Local grammar for the cooperative behavior of the heterogeneous robotic system of Figure

⋊b, ⋊d, ⋊B, ⋊D | |

⋊t | |

⋊A, ⋊B, ⋊C, ⋊D | |

a⋉, c⋉, t⋉, A⋉, C⋉ | |

aA, aB, aC, aD, bA, bB, bC, bD, cA, cB, cC, cD, dA, dB, dC, dD | |

Aa, Ab, Ac, Ad, Ba, Bb, Bc, Bd, Ca, Cb, Cc, Cd, Da, Db, Dc, Dd | |

aa, cc, ac, at, ca, ct, tt, tA, tC, AA, CC, At, AC, Ct, CA | |

bb, dd, bd, db, BB, DD BD, DB | |

tb, td | |

Bt, Dt | |

[ |
ad, AD |

[ |
cb, CB |

The formulas in the

For the unconventional model, the formulas can be categorized into three types as listed below, yielding a total of 39 statements. These statements were selected based on our own knowledge that the tether features do not interact with the vehicle and motion features, but that vehicle and motion features do interact, as only one vehicle was allowed to move at any point. Thus, the statements are (i) 9

The runtime of the weight-learning algorithm for the robot grammar is given in Table

Runtime for learning weights for robotic statements.

20 strings | 1 min, 3.89s | 35.7s |

50 strings | 8 min, 57.5s | 4 min, 16.3s |

100 strings | 3 h, 3.07 min | 28 min, 49.6s |

250 strings | 11 h, 46.8 min | 5 h, 6.57 min |

We generated strings for the training data-set, assuming that all transitions have the same probability. We considered training data sets of 5, 10, 20, 50, 100, and 250 strings. For each of the data sets of size 5 and 10, we generated 10 files. Due to the significantly longer list of statements that Alchemy had to assign weights to (and consequently longer runtime for training), we did not train it on multiple files of sizes 20, 50, 100, and 250.

The learning outcomes under the conventional and unconventional

Conventional model trained on 20 training strings.

⋊ | 0.496927 | 0.000815 | 0.493377 | 0.000592 | 0.001016 | 0.001278 | 0.000736 | 0.001084 | 0.001039 | 0.003137 |

a | 6.54E-05 | 0.999517 | 5.08E-05 | 2.17E-06 | 0.000101 | 0.000104 | 7.17E-06 | 8.87E-05 | 3.41E-05 | 2.96E-05 |

b | 0.094095 | 0.008997 | 0.091477 | 0.007491 | 0.367243 | 0.000349 | 0.007735 | 0.010931 | 0.009811 | 0.401872 |

c | 8.23E-05 | 4.03E-06 | 6.22E-05 | 0.999358 | 0.000145 | 0.000147 | 7.99E-06 | 0.000119 | 4.02E-05 | 3.35E-05 |

d | 0.071082 | 0.006026 | 0.085054 | 0.004911 | 0.522097 | 0.000112 | 0.005017 | 0.00774 | 0.006597 | 0.291364 |

t | 0.000247 | 4.43E-05 | 0.00021 | 3.24E-05 | 0.000324 | 0.000334 | 0.998374 | 0.00029 | 4.29E-05 | 0.000101 |

A | 0.000329 | 9.12E-05 | 0.000283 | 7.63E-05 | 0.00041 | 0.000423 | 0.997791 | 0.000381 | 5.96E-05 | 0.000156 |

B | 0.00242 | 0.00283 | 0.001839 | 0.002265 | 8.07E-05 | 0.15365 | 0.002296 | 0.054035 | 0.00315 | 0.777433 |

C | 8.88E-05 | 4.82E-05 | 7.07E-05 | 3.93E-05 | 0.000125 | 0.000131 | 9.83E-06 | 0.000113 | 0.999327 | 4.67E-05 |

D | 0.008202 | 0.009073 | 0.006643 | 0.00762 | 0.000387 | 0.286747 | 0.00799 | 0.245995 | 0.009906 | 0.417436 |

Unconventional model trained on 20 training strings.

⋉ | 0.503601 | 0.000828 | 0.494177 | 0.000812 | 0.000217 | 0.000181 | 2.97E-07 | 0.000178 | 2.92E-07 | 4.84E-06 |

a | 1.32E-04 | 0.99942 | 2.64E-04 | 2.35E-06 | 0.000107 | 4.22E-09 | 3.20E-05 | 8.45E-09 | 7.52E-11 | 4.30E-05 |

b | 0.29998 | 0.00156 | 0.244344 | 0.003385 | 0.242705 | 9.60E-06 | 4.99E-08 | 7.82E-06 | 1.08E-07 | 0.208008 |

c | 2.65E-04 | 2.57E-06 | 1.34E-04 | 0.999328 | 0.000214 | 8.48E-09 | 8.22E-11 | 4.28E-09 | 3.20E-05 | 2.46E-05 |

d | 0.321375 | 0.002458 | 0.324781 | 0.001342 | 0.260015 | 1.03E-05 | 7.87E-08 | 1.04E-05 | 4.30E-08 | 0.090007 |

t | 3.60E-09 | 2.73E-05 | 7.20E-09 | 6.41E-11 | 7.34E-09 | 0.000132 | 0.999575 | 0.000264 | 2.35E-06 | 1.19E-08 |

A | 2.00E-08 | 1.52E-04 | 4.01E-08 | 3.57E-10 | 6.31E-09 | 0.000132 | 0.999328 | 0.000264 | 2.35E-06 | 0.000122 |

B | 4.01E-05 | 2.09E-07 | 3.27E-05 | 4.53E-07 | 1.26E-05 | 0.263762 | 0.001372 | 0.214843 | 0.002976 | 0.516961 |

C | 4.03E-08 | 3.90E-10 | 2.03E-08 | 1.52E-04 | 1.27E-08 | 0.000265 | 2.57E-06 | 0.000134 | 0.999377 | 6.96E-05 |

D | 5.40E-05 | 4.13E-07 | 5.46E-05 | 2.26E-07 | 1.70E-05 | 0.355314 | 0.002718 | 0.359079 | 0.001484 | 0.281278 |

These results indicate that both models meet this benchmark of success with 20 training strings. Generally, however, the unconventional model provides higher probabilities to licit sequences.

To evaluate how much training is needed for each model to reliably generalize, we tested both models on small training samples. Specifically, we tested both models on ten training sets with 10 strings and ten training sets with 5 strings. On sets with 5 training strings, the trained

The empirical conclusions from this case study are in agreement with those of Section 8:

This article has applied statistical relational learning to the problem of inferring categorical and stochastic formal languages, a problem typically identified with the field of grammatical inference. The rationale for tackling these learning problems with relational learning is that the learning techniques separate issues of representation from issues of inference. In this way, domain-specific knowledge can be incorporated into the representations of strings when appropriate.

Our case studies indicate that not only can

In addition to exploring learning with unconventional models in these domains and others, there are four other important avenues for future research.

While this article considered the learning problem of finding weights given formula, another problem is identifying both the formulas and the weights. In this regard, it would be interesting to compare the learning of stochastic formal languages with statistical relational learning methods where the formulas are not provided a priori to their learning with grammatical inference methods such as ALEGRIA (de la Higuera,

Second, is the problem of scalability. Unless the input files to Alchemy 2 were small, this software required large computational resources in terms of time and memory. Developing better software and algorithms to allow

Third, Section 7.3 introduces a way to translate the weights on the formulas in the

Finally, while the case studies in this article are experimental, we believe that general theoretical results relating relational learning, grammatical inference, unconventional word models, and formal languages are now within reach. We hope that the present paper spurs such research activity.

MV developed the training data and the

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

^{1}Model signatures may also include constants, but we leave out this term for two reasons. First, within model theory, constants are often treated as functions which take zero arguments. Second, the term constant has a different meaning in the statistical relational learning literature. There, constants are understood as domain elements which ground formulas.

^{2}Syllable weights, and in fact stress itself, manifest differently in different languages. We abstract away from this fact.

^{3}The latter stringset equals Σ^{*}ĹΣ^{*}∪Σ^{*}^{*}.