
Edited by: Aleksey Nikolsky, Independent Researcher, Los Angeles, CA, United States

Reviewed by: Juan G. Roederer, American Association of Retired Persons, United States; Susan Elizabeth Rogers, Berklee College of Music, United States

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

In the sixth century BC, Pythagoras discovered the mathematical foundation of musical consonance and dissonance. When auditory frequencies in small-integer ratios are combined, the result is a harmonious perception. In contrast, most frequency combinations result in audible, off-centered by-products labeled “beating” or “roughness;” these are reported by most listeners to sound dissonant. In this paper, we consider second-order beats, a kind of beating recognized as a product of neural processing, and demonstrate that the data-driven approach of Recurrence Quantification Analysis (RQA) allows for the reconstruction of the order in which interval ratios are ranked in music theory and harmony. We take advantage of computer-generated sounds containing all intervals over the span of an octave. To visualize second-order beats, we use a glissando from the unison to the octave. This procedure produces a profile of recurrence values that correspond to subsequent epochs along the original signal. We find that the higher recurrence peaks exactly match the epochs corresponding to just intonation frequency ratios. This result indicates a link between consonance and the dynamical features of the signal. Our findings integrate a new element into the existing theoretical models of consonance, thus providing a computational account of consonance in terms of dynamical systems theory. Finally, as it considers general features of acoustic signals, the present approach demonstrates a universal aspect of consonance and dissonance perception and provides a simple mathematical tool that could serve as a common framework for further neuro-psychological and music theory research.

Beating is the sensation that typically occurs when two sounds with similar frequencies mutually interfere, giving rise to a waveform with a rhythmic oscillation in amplitude. Following the fundamental contribution of Helmholtz’s treatise,

At 80 dB (or higher), with the interval kept close to the octave, a distinct beating can be perceived. This disappears when f_{2} = 2f_{1} (where f_{1} and f_{2} represent the two frequencies) and reappears as soon as the octave becomes mistuned by an amount 𝜀 (i.e., f_{2} = 2f_{1} + 𝜀). The beating frequency turns out to be 𝜀. Whereas the vibration pattern of a perfectly tuned fifth (f_{2} = 3/2 f_{1}) or fourth (f_{2} = 4/3 f_{1}) is static, the mistuned cases f_{2} = 3/2 f_{1} + 𝜀 and f_{2} = 4/3 f_{1} + 𝜀 cause the vibration pattern to change periodically in form, but not in amplitude. From the octave to the fifth and to the fourth, the second-order beats become faster (beating frequency being 𝜀 for the octave, 2𝜀 for the fifth, and 3𝜀 for the fourth) as the vibration pattern grows in complexity (see

Amplitude (y axis) against time (x axis) for

Their neural origin makes second-order beats an excellent phenomenon for investigating the link between the mathematical description of the signals and their neural processing, and consequently allows us to shed light on their perceived “pleasantness.” To achieve a consistent picture of second-order beats, it is fundamental to overcome the frequency–time space representation trade-off and the related problem of non-stationary signal characteristics.

Graphic representations of sound typically plot the course of amplitude over time or report the relative amplitudes of the different frequencies computed by the Fourier Transform. Thus, there is no mention of time in the latter, and no mention of frequency in the former. However, in the actual hearing process, time and frequency are strictly intermingled, because specific frequencies are processed at specific moments. This fact suggests that we should focus on the simultaneous analysis of time/frequency dimensions (

To determine the frequency of an oscillatory phenomenon, we must count the number

It is possible to neglect the explicit consideration of time and visualize tone relationships within the octave by computing the ratio of two simultaneous frequencies and then plotting the interval ratio against the amplitude. This is achieved by forming a linear combination of two pure tone waves, a glissando from the unison (f_{1}) to the octave (2f_{1}) and a steady wave at frequency f_{1}. Similar stimuli were previously adopted by

The original idea of describing non-stationary signals (which are not amenable to classical Fourier analysis) by means of recurrence dates back to the work of Ruelle’s group (

RQA builds upon the computation of a distance matrix between the rows (epochs) of the embedding matrix of the signal of interest, with the lag defined by the method of the first minimum of Mutual Information. A recurrence occurs when a point X_{i} on the trajectory is close to another point X_{j}. The relative closeness between X_{i} and X_{j} is estimated by the Euclidean distance between these two vectors. If the distance falls below a threshold radius (

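As an example, consider the time series A = {7, 8, 10, 15, 6, 7, 9, 11, 10, 8}. Embedding A in three dimensions with a lag of one sample gives the following embedding matrix, in which each row is an epoch:

t0 | t+1 | t+2 | epochs |
---|---|---|---|
7 | 8 | 10 | ep1 |
8 | 10 | 15 | ep2 |
10 | 15 | 6 | ep3 |
15 | 6 | 7 | ep4 |
6 | 7 | 9 | ep5 |
7 | 9 | 11 | ep6 |
9 | 11 | 10 | ep7 |
11 | 10 | 8 | ep8 |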

Thus, the original series has been projected into a three-dimensional space in which the variables (columns) are the time-lagged original series and the statistical units (rows) are the overlapping epochs. The second step is to compute the Euclidean distances between the epochs. This generates the following distance matrix AD:

ep1 | ep2 | ep3 | ep4 | ep5 | ep6 | ep7 | ep8 | |
---|---|---|---|---|---|---|---|---|
0 | | | | | | | | ep1 |
5.477226 | 0 | | | | | | | ep2 |
8.602325 | 10.48809 | 0 | | | | | | ep3 |
8.774964 | 11.35782 | 10.34408 | 0 | | | | | ep4 |
**1.732051** | 7 | 9.433981 | 9.273618 | 0 | | | | ep5 |
**1.414214** | 4.242641 | 8.3666 | 9.433981 | 3 | 0 | | | ep6 |
3.605551 | 5.196152 | 5.744563 | 8.3666 | 5.09902 | 3 | 0 | | ep7 |
4.898979 | 7.615773 | 5.477226 | 5.744563 | 5.91608 | 5.09902 | 3 | 0 | ep8 |

As the AD elements correspond to the Euclidean distances between corresponding epochs, the diagonal values are 0, and the symmetric character of the distances implies the matrix can be written in lower-triangular form.
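The embedding and distance steps of this example can be reproduced with a few lines of NumPy. The following is a minimal sketch, not the software used in the paper; the function names `embed` and `distance_matrix` are illustrative, and the embedding dimension (3) and lag (1) are read off the embedding table above.

```python
import numpy as np

def embed(series, dim, lag):
    """Return the time-delay embedding matrix (one epoch per row)."""
    x = np.asarray(series, dtype=float)
    n_epochs = len(x) - (dim - 1) * lag
    # Column k holds the series delayed by k * lag samples.
    return np.column_stack([x[k * lag : k * lag + n_epochs] for k in range(dim)])

def distance_matrix(epochs):
    """Euclidean distance between every pair of epochs (rows)."""
    diff = epochs[:, None, :] - epochs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

series_a = [7, 8, 10, 15, 6, 7, 9, 11, 10, 8]
AE = embed(series_a, dim=3, lag=1)   # 8 epochs x 3 lagged columns
AD = distance_matrix(AE)             # symmetric 8 x 8 matrix, zero diagonal

# Show only the lower triangle, as in the table.
print(np.round(np.tril(AD), 6))
```

The two smallest off-diagonal entries, ep1–ep5 and ep1–ep6, come out as roughly 1.732 and 1.414, which are the only distances below the 1.74 threshold discussed next.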

We now specify that two epochs are recurrent if their distance falls within the lowest 5% of all between-epoch distances. The average value of the below-diagonal elements of AD is 6.48, and their standard deviation is 2.74. Thus, it is estimated that 95% of distances are greater than 1.74. This implies we have only two recurrences, corresponding to the epoch1–epoch5 and epoch1–epoch6 pairs (bolded in the table).

Therefore, example series A has a recurrence rate of 0.071 (two recurrences out of 28 distinct distances) or, equivalently, a recurrence percentage equal to 7.1. The AD matrix corresponds to an RP with only two dots, at coordinates (1, 5) and (1, 6). Note that the recurrences can be identified without the need for any frequency estimation, thus resembling the hearing process that receives sounds as they occur in time.
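The recurrence count for series A can be checked numerically. The sketch below is illustrative only: the helper name `recurrence_pairs` is hypothetical, and the radius of 1.74 is simply the threshold quoted in the text rather than a value recomputed here.

```python
import numpy as np

def recurrence_pairs(series, dim, lag, radius):
    """Return the recurrent epoch pairs (1-based) and the recurrence rate."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (dim - 1) * lag
    epochs = np.column_stack([x[k * lag : k * lag + n] for k in range(dim)])
    dist = np.linalg.norm(epochs[:, None, :] - epochs[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=1)        # each distinct pair counted once
    hits = dist[i, j] < radius            # True where the pair is recurrent
    pairs = [(int(a) + 1, int(b) + 1) for a, b in zip(i[hits], j[hits])]
    return pairs, float(hits.mean())

pairs, rate = recurrence_pairs(
    [7, 8, 10, 15, 6, 7, 9, 11, 10, 8], dim=3, lag=1, radius=1.74
)
print(pairs, round(100 * rate, 1))   # [(1, 5), (1, 6)] 7.1
```

The pairs returned are exactly the coordinates of the two dots in the recurrence plot, and the rate is 2/28.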

To provide a quantitative measure of the recurrence, numerical RP descriptors were developed (

Recurrence plots (RPs) of waveforms for mistuned unison

Files were generated using the sound editor Cool Edit Pro and saved in ASCII format before being fed to the Visual Recurrence Analysis (VRA) software. For the plots in

Waveform resulting from linearly adding the amplitudes of two sinusoidal signals: a glissando from 360 to 840 Hz (represented by the diagonal line) and a constant frequency of 400 Hz (line parallel to the x axis). The left y axis shows the amplitude of the waveform and the right y axis is the frequency of the diagonal and plain lines. The x axis shows the time for the glissando to go from 360 to 840 Hz, and therefore contains the full collection of intervals between 360/400 and 840/400. The waveform exhibits a rich texture, as the zoomed inset shows, where the intervals of fourth (4/3) and fifth (3/2) are marked. The discrete character of the signal is the cause of the dot-like nature of the graph. The y axis has both negative and positive numbers depending upon the peak/valley alternation of the combination (where anti-phase destructive interactions correspond to 0).

MATLAB programs were obtained from

A non-stationary signal exploring all interval combinations within the octave can be generated by merging the course of two sounds into a single waveform. The first sound is set at constant frequency f_{1} for the full duration of the course, while the second follows an ascending glissando from f_{1} to f_{2} = 2f_{1}.
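A signal of this kind is straightforward to synthesize. The sketch below is one possible construction under stated assumptions, not the authors' Cool Edit Pro procedure: it assumes a linear glissando, unit amplitudes, and arbitrary illustrative values for the base frequency, duration, and sample rate; the function name `glissando_plus_constant` is hypothetical.

```python
import numpy as np

def glissando_plus_constant(f1=400.0, duration=10.0, sample_rate=8000):
    """Sum of a constant tone at f1 and a linear glissando from f1 to 2*f1.

    All default values are illustrative assumptions.
    Returns the time axis, the summed waveform, and the interval ratio f2/f1.
    """
    t = np.arange(int(duration * sample_rate)) / sample_rate
    # Instantaneous glissando frequency rises linearly from f1 to 2*f1.
    f_gliss = f1 * (1.0 + t / duration)
    # Phase is 2*pi times the time integral of the instantaneous frequency.
    phase_gliss = 2 * np.pi * f1 * (t + t ** 2 / (2 * duration))
    signal = np.sin(2 * np.pi * f1 * t) + np.sin(phase_gliss)
    return t, signal, f_gliss / f1

t, signal, ratio = glissando_plus_constant()
print(len(signal), ratio[0], round(float(ratio[-1]), 4))
```

Returning the interval ratio alongside the waveform makes it possible to plot amplitude, or any windowed statistic, directly against f_{2}/f_{1} instead of time.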

The most conspicuous singularity (recurrence peaks, see below) in the graph occurs where the two lines cross, i.e., when f_{2} = f_{1} (unison, interval ratio of 1:1). A second relevant case occurs at the interval ratio of 2:1, which corresponds to the octave. Less evident events occur at 3:2 (fifth) and 4:3 (fourth), as can be seen in the zoomed inset in

Following the numerical solution of Helmholtz’s glissando, we explore the glissando/constant frequency signal through an RQA windowing procedure called Recurrence Quantification of Epochs (RQE). RQE performs a scansion of the whole signal by sequentially selecting small windows—specifically episodes of 480 points—in which the RQA algorithm (with the consequent computation of recurrence rate for each episode) is applied. The subsequent windows are shifted by 48 points and the process is repeated throughout the entire file. For each iteration, we retain both the recurrence value and the interval ratio in which this value occurs, calculated as the mean of the interval ratios in the window.
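The windowing procedure described above can be sketched as follows. This is a simplified illustration, not the VRA or MATLAB code used in the study: the window width (480) and shift (48) come from the text, whereas the embedding dimension, lag, and radius are placeholder assumptions, as are the function names `recurrence_percentage` and `rqe` and the short toy signal.

```python
import numpy as np

def recurrence_percentage(window, dim, lag, radius):
    """Percentage of distinct epoch pairs in `window` closer than `radius`."""
    n = len(window) - (dim - 1) * lag
    epochs = np.column_stack([window[k * lag : k * lag + n] for k in range(dim)])
    dist = np.linalg.norm(epochs[:, None, :] - epochs[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=1)
    return 100.0 * float(np.mean(dist[i, j] < radius))

def rqe(signal, ratio, dim=3, lag=1, radius=0.2, width=480, shift=48):
    """Slide a window over the signal; return mean ratio and recurrence per window.

    dim, lag, and radius are assumed values chosen only for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    starts = range(0, len(signal) - width + 1, shift)
    mean_ratio = np.array([ratio[s : s + width].mean() for s in starts])
    recurrence = np.array(
        [recurrence_percentage(signal[s : s + width], dim, lag, radius) for s in starts]
    )
    return mean_ratio, recurrence

# Toy input: a short constant tone plus linear glissando (assumed parameters).
sample_rate, f1, n = 8000, 400.0, 4800
t = np.arange(n) / sample_rate
duration = n / sample_rate
ratio = 1.0 + t / duration
signal = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f1 * (t + t ** 2 / (2 * duration)))

mean_ratio, recurrence = rqe(signal, ratio)
print(len(recurrence), mean_ratio[:3].round(3), recurrence[:3].round(1))
```

Plotting `recurrence` against `mean_ratio` gives a profile of the kind described in the text; with a toy signal this short, each window spans a wide range of ratios, so the peaks will be far less sharp than in the full analysis.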

Recurrence analysis of the waveform resulting from linearly adding the amplitudes of sinusoidal signals covering the intervals forming the octave. The x axis is the interval ratio and the y axis gives the percentage of recurrence. Each point in this graph is the result of a single recurrence analysis (like those shown in

Emergent features of the glissando are evident in

Rank order of consonances and their degree of recurrence.

Recurrence | Interval ratio | Label | Rational | Name |
---|---|---|---|---|
100.0 | | | | |
89.1 | | | | |
45.2 | | | | |
30.6 | | | | |
29.6 | | | | |
23.4 | | | | |
19.9 | 1.7499 | H7 | 7/4 | Harmonic seventh |
18.5 | | | | |
16.3 | 1.4007 | | 7/5 | Septimal |
15.4 | | | | |
15.1 | | | | |
14.2 | 1.1667 | | 7/6 | Septimal minor third |
11.9 | 1.2855 | | 9/7 | Septimal major third |
11.7 | 1.8339 | | 11/6 | Undecimal neutral seventh |
11.5 | 1.1427 | | 8/7 | Septimal whole tone |
10.1 | 1.4283 | | 10/7 | Euler’s tritone |
9.8 | | | | |
9.7 | 1.7139 | | 12/7 | Septimal major sixth |
9.4 | 1.5711 | | 11/7 | Undecimal augmented fifth |
9.3 | | | | |
9.1 | 1.8567 | | 15/8 | Classic major seventh |
9.1 | 1.2219 | | 11/9 | Undecimal neutral third |
8.7 | 1.1006 | | 11/10 | 4/5 tone |
8.6 | 1.3755 | | 11/8 | Undecimal semi-augmented fourth |
7.8 | | | | |
7.7 | 1.2999 | | 13/10 | Tridecimal semi-diminished fourth |
7.6 | 1.6251 | | 13/8 | Tridecimal neutral sixth |
7.1 | 1.0911 | | 12/11 | 3/4 tone |
6.8 | 1.8891 | | 17/9 | Septendecimal minor third |
6.8 | 1.1823 | | 13/11 | Tridecimal minor third |
6.7 | | | | |

In summary, RQA allows us to establish a natural link between the signal properties and the consonance judgment of the listeners without any

Linear relationship between the degree of recurrence (

Whereas Frova’s index is derived from the energy of the partials forming a complex sound, the percentage recurrence is a purely bottom–up phenomenological descriptor of a pure tone signal, relating recurrence (and consonance) to secondary beating and thus providing a natural (albeit roughly phenomenological) link between the signal properties and neural processing.

Note that the computation of recurrences gives very similar results with respect to models based on primary beating, such as the Plomp and Levelt model reported in

Dissonance curve derived from a synthetic sound with 15 harmonics following a natural series. This graph comes from an algorithm devised by

In this paragraph, we relate the self-similar appearance of the recurrence graph in

In the 17^{th} century, Christian Huygens studied mode-locking and discovered the phenomenon of resonance. He noticed that, after a time, the pendulums of two clocks fixed on the same mounting swung synchronously. The synchronization of two coupled oscillators starting from (slightly) different frequencies is called resonance. A more general case of resonant behavior appears when a specific constant frequency is periodically driven by an external force to oscillate at a different frequency; the so-called Devil’s staircase pattern refers to the behavior of forced

where

In our terms, Ω is the cumulative recurrence and

The above considerations can be summarized in three main points:

A purely empirical, data-driven analysis (RQA) has highlighted a fundamental property of signals (recurrence distribution) that matches the mathematical (number theory) and physical (mode-locking) theoretical background.

The empirical results are consistent with both a theory-driven “simplicity index” (Frova’s index) and with the order that music intervals are ranked in harmony.

The focus on signal properties (second-order beatings) allows us to consider our results as a basis for modeling consonance and dissonance perception by combining data from both computational and cognitive models, e.g., based on artificial neural networks and Hebbian neuroplasticity (

Numerous studies have confirmed the adequacy of concepts from non-linear dynamics for music perception and construction (e.g.,

Taken together, our work and previous results support the idea that the production and perception of sound are intimately linked, the perceived pleasantness of intervals being an intrinsic property of the signal (in terms of the degree of recurrence), and not only a secondary effect of the signal on the listener. In turn, this allows us to speculate on the auditory system. Second-order beats have been attributed to the central auditory nervous system, and neuronal networks are known to support phase-locking, as in the mammalian auditory system, in which neural activity in areas including the cochlear nucleus, inferior colliculus, and primary auditory cortex is phase-locked to the stimulus waveform (

In line with the literature on music perception (

The origins of the distinction between consonance and dissonance have been hotly debated in recent years. As the phenomenon of consonance represents a key element of Western music theory, this has mainly been investigated in terms of Western science (i.e., mathematics, physics, psychoacoustics, and neuroscience). For this reason,

The main contribution of this paper stems from the numerical solution of Helmholtz’s glissando. Though the standard modern theory of consonance is based on first-order beating, we have shown that similar results can be obtained starting from second-order beats. The recent interest in second-order beating has been fruitful for models of pitch recognition or neural circuitry (see

Scholars have started to consider music from the perspective of dynamical systems, both in neurobiological and physical terms, showing that mode-locking models can explain how the nervous system manages sound and is engaged in the ranking of consonances. The resemblance between the formal Devil’s staircase model and the cumulative recurrence distribution strengthens this idea.

From a methodological perspective, the main contribution of this work is to provide neuroscience scholars with an extremely simple and model-free tool (RQA) that approaches the acoustic signal and the listener’s perception system with the same mathematical method. Different RQA applications have been reported in research on otoacoustic emission (see, for example,

Finally, our results support the idea of natural roots of consonance perception, and are thus in line with several studies published in recent years (see, for example,

LT originally conceived the idea of the paper, elaborated the stimuli, provided all the figures, and significantly contributed to the results and discussion. NDS prepared the manuscript, co-authored the introduction and the results with LT, contributed to the discussion, wrote the conclusion, and finally revised the entire draft. AG wrote the section “Recurrence Quantification Analysis,” reviewed the entire manuscript, and suggested useful ideas for the discussion. All authors equally contributed to the revision of the manuscript before agreeing on the final version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

LT is grateful to Universitat Pompeu Fabra (UPF) of Barcelona for access to their library.