
Edited by: Zhan Li, University of Electronic Science and Technology of China, China

Reviewed by: Yangsong Zhang, Southwest University of Science and Technology, China; Wellington Pinheiro dos Santos, Federal University of Pernambuco, Brazil

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Microexpression is usually characterized by short duration and small action range, and existing general expression recognition algorithms do not work well on it. As a feature extraction method, non-negative matrix factorization can decompose the original data into different components and has been successfully applied to facial recognition. In this paper, local non-negative matrix factorization is explored to decompose a microexpression into facial muscle actions and to extract features for recognition based on the apex frame. However, the existing microexpression datasets contain too few samples to train a classifier with good generalization. The macro-to-micro algorithm based on singular value decomposition can augment the number of microexpression samples, but it cannot guarantee the non-negativity of the feature vectors. To address these problems, we propose an improved macro-to-micro algorithm that augments microexpression samples by manipulating macroexpression data based on local non-negative matrix factorization. Finally, several experiments are conducted to verify the effectiveness of the proposed scheme; the results show that it achieves higher recognition accuracy for microexpression than related algorithms on the CK+/CASME2/SAMM datasets.

Expression is one of the important ways for humans to communicate emotion. In the 1970s, the American psychologist Paul Ekman defined six basic human expressions, namely, happiness, anger, surprise, fear, disgust, and sadness. Facial expression recognition extracts specific states from given images or video, then identifies the psychological emotions of the recognized subject and understands their facial expressions. Expression recognition has many applications in psychology, intelligent monitoring, robotics, etc. Moreover, people may sometimes disguise their emotions and expressions for various purposes. However, people cannot completely suppress their emotions under strong external emotional stimuli. There are some subtle and fast facial actions, which were first discovered and named "micro-momentary" expressions by Haggard and Isaacs (

Different from the general expression, the microexpression is only reflected in a few facial action units, and the duration is about

Microexpression is weak, short-lived, and difficult to detect, so traditional expression recognition algorithms do not work well for this task. Generally, microexpression recognition can be divided into detection and classification. The former determines whether there are microexpressions in an image sequence and detects the start/apex/end frames of a microexpression. The latter includes feature extraction and classification, which is similar to general pattern classification tasks. Specifically, feature extraction acquires abstract information from the data, usually vectors obtained by image processing; the extracted features should reflect the microexpression action information and distinguish the various kinds of emotions. Feature classification trains a classifier on the obtained vectors, which directly determines the recognition accuracy, to distinguish the types of microexpression.

The main contributions of this paper are summarized as follows: (i) A local non-negative matrix factorization (LNMF) is developed to extract the features of apex frame on microexpression, which exploits local properties of LNMF to reflect the features of local action on microexpression. (ii) An improved macro-to-micro (MtM) transformation algorithm is proposed to augment the samples of microexpressions from macroexpression data based on LNMF. (iii) The performance of the proposed scheme is verified on CK+, CASME2, and SAMM datasets, which can benefit this work on human–robot interaction.

The rest of the paper is organized as follows. Related works are discussed in section 2. In section 3, the overall scheme, including theoretical derivation on LNMF and MtM algorithm design, is presented. Section 4 provides the experimental process and result analysis. Finally, we conclude this paper in section 5.

Local binary pattern (LBP) is a commonly used method for extracting texture feature of images. LBP from three orthogonal planes (LBP-TOP) is an extension of LBP in video data. Ojala et al. (

Optical flow method aims to quantify facial muscle actions by calculating the motion speed of each pixel in the video. On this basis, the optical strain that reflects the distortion caused by small area motion can be further calculated. If the speed of a pixel in the image is higher than that of the surrounding pixels, its optical strain value will be higher, which can be used to detect the fast and micromovement of muscles in microexpression recognition. Liong et al. (

To determine the facial range of feature extraction, Liong et al. (

Matrix factorization is popular in dimensionality reduction and has good physical interpretability. The original data are expressed as the weighted sum of several bases, so each sample is transformed into a feature vector of weight coefficients, realizing perception of the whole from its local parts. Principal component analysis (PCA) and singular value decomposition (SVD) are the classic matrix factorization methods. However, the bases and coefficients calculated by these algorithms contain negative elements, which makes the decomposition results poorly interpretable. For example, it is not practical to decompose face images into basic sub-images with negative components. To solve this problem, Lee and Seung (

Nowadays, CASME (Yan et al.,

The overall scheme is shown in

The overall block diagram of the proposed scheme.

To determine the RoIs of the eye and mouth regions, we use the open-source machine learning toolkit DLIB (King, ) to locate facial landmarks, and measure the eye width d_{eye} and mouth width d_{mouth}, respectively, to determine the RoIs. The left and right sides of the bounding box of an eye extend d_{eye}/4 beyond the eye, the bottom is d_{eye}/5 below the lowest point of the eye, and the top is located on the highest point of the eyebrow. The left and right sides of the bounding box of the mouth are d_{mouth}/5 away from the mouth corners, and the top and bottom are d_{mouth}/4 and d_{mouth}/7 from the highest and lowest points of the mouth, respectively.
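As an illustration of these ratios, the following is a minimal sketch for one eye RoI, assuming the standard 68-point DLIB landmark indexing (points 36–41 outline the left eye, points 17–21 the left eyebrow); the helper name and the exact margin conventions are ours:

```python
import numpy as np

def eye_roi(pts):
    """Bounding box (left, top, right, bottom) for the left-eye RoI.

    pts: (68, 2) array of facial landmarks in image coordinates
    (x to the right, y downward), following the dlib 68-point scheme.
    """
    eye = pts[36:42]    # left-eye contour points
    brow = pts[17:22]   # left-eyebrow points
    d_eye = eye[:, 0].max() - eye[:, 0].min()   # eye width
    left = eye[:, 0].min() - d_eye / 4          # extend d_eye/4 beyond the eye
    right = eye[:, 0].max() + d_eye / 4
    bottom = eye[:, 1].max() + d_eye / 5        # d_eye/5 below the lowest eye point
    top = brow[:, 1].min()                      # highest point of the eyebrow
    return left, top, right, bottom
```

The mouth RoI follows the same pattern with the d_{mouth}/5, d_{mouth}/4, and d_{mouth}/7 margins.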

The region of interests (RoIs) of facial expression (Yan et al.,

The optical flow of a pixel refers to its displacement between two frames, which includes both the horizontal and vertical displacement. The optical strain is calculated as the difference of optical-flow values between pixels, which reflects the deformation degree of a non-rigid body during the motion. The microexpression is the micro movement of facial muscles, and the distortion caused by the movement is reflected by the higher optical strain value of this region.

Let u_{x} and u_{y} be the optical flow in the horizontal and vertical directions. The optical strain is defined as follows:

ε_{m} = 1/2 [∇u + (∇u)^{T}] = [ ε_{xx} ε_{xy} ; ε_{yx} ε_{yy} ],     (1)

ε = sqrt( ε_{xx}^{2} + ε_{yy}^{2} + ε_{xy}^{2} + ε_{yx}^{2} ),     (2)

where ε_{m} contains the normal strain components ε_{xx} = ∂u_{x}/∂x and ε_{yy} = ∂u_{y}/∂y and the tangential strain components ε_{xy} = ε_{yx} = 1/2(∂u_{x}/∂y + ∂u_{y}/∂x) of the pixel, and ε is the optical strain value of the pixel.
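The strain computation above can be sketched numerically, approximating the flow derivatives with finite differences (the function name is ours; since the strain tensor is symmetric, the two tangential components contribute one squared term counted twice):

```python
import numpy as np

def optical_strain(u_x, u_y):
    """Per-pixel optical strain magnitude from a dense optical-flow field.

    u_x, u_y: 2-D arrays of horizontal/vertical flow (pixels per frame).
    """
    # np.gradient returns derivatives along (rows, cols) = (y, x).
    dux_dy, dux_dx = np.gradient(u_x)
    duy_dy, duy_dx = np.gradient(u_y)
    e_xx = dux_dx                       # normal strain, horizontal
    e_yy = duy_dy                       # normal strain, vertical
    e_xy = 0.5 * (dux_dy + duy_dx)      # tangential (shear) strain
    # Sum of squared tensor components; the off-diagonal appears twice.
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)
```

A rigid translation (uniform flow) yields zero strain everywhere, which matches the intuition that only deforming facial regions light up.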

The pseudocode for the binary search algorithm to detect the apex frame (Liong et al.,

Binary Apex Frame Detection.

Input: {ε_{f} ∣ f = 1, …, N}: optical strain of every frame
Output: index of the apex frame

1: lo ← 1, hi ← N
2: while hi − lo > 1 do
3:   mid ← ⌊(lo + hi)/2⌋
4:   S_{left} ← sum(ε_{lo}, …, ε_{mid})
5:   S_{right} ← sum(ε_{mid}, …, ε_{hi})
6:   if S_{left} ≥ S_{right} then
7:     hi ← mid
8:   else
9:     lo ← mid
10:  end if
11: end while
12: return arg max_{f ∈ {lo, hi}} ε_{f}
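The search above can be sketched in a few lines (a minimal Python version with 0-based indexing; the function name is ours):

```python
import numpy as np

def find_apex(strain):
    """Binary search for the apex frame given per-frame optical strain.

    Repeatedly keeps the half of the sequence whose summed strain is
    larger, until the interval narrows to two adjacent frames.
    """
    lo, hi = 0, len(strain) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Compare the total strain of the two halves (both include mid).
        if strain[lo:mid + 1].sum() >= strain[mid:hi + 1].sum():
            hi = mid
        else:
            lo = mid
    # The apex is the stronger of the two remaining candidates.
    return lo if strain[lo] >= strain[hi] else hi
```

This costs O(log N) interval halvings instead of scanning every frame's strain peak directly.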

The definition of NMF is expressed as Equation (3):

V ≈ WH,     (3)

where V ∈ ℝ_{+}^{m×n} is the data matrix; H ∈ ℝ_{+}^{r×n} is the coefficient matrix, in which each column corresponds to one sample; and W ∈ ℝ_{+}^{m×r} is the base matrix, in which each column is a base. Define r as the number of bases, which is usually chosen such that (m + n)r < mn.

Here, NMF aims to solve the following optimization problem:

min_{W, H} ‖V − WH‖_{F}^{2},  s.t.  W ≥ 0, H ≥ 0.     (4)

In the optimization process, only non-negative constraints are imposed, without any locality constraint on W and H. Define U = W^{T}W and Q = HH^{T}, and measure the reconstruction error by the divergence

D(V‖WH) = Σ_{i,j} ( v_{ij} ln( v_{ij}/(WH)_{ij} ) − v_{ij} + (WH)_{ij} );     (5)

then the optimization function of LNMF is expressed as follows:

L(W, H) = D(V‖WH) + α Σ_{i,j} u_{ij} − β Σ_{i} q_{ii},     (6)

where α and β are constants >0. LNMF iteratively solves Equations (7)–(9):

h_{kj} ← sqrt( h_{kj} Σ_{i} w_{ik} v_{ij}/(WH)_{ij} ),     (7)

w_{ik} ← w_{ik} ( Σ_{j} v_{ij} h_{kj}/(WH)_{ij} ) / Σ_{j} h_{kj},     (8)

w_{ik} ← w_{ik} / Σ_{i} w_{ik},     (9)

where "product" means the Hadamard (element-wise) product and "division" means element-wise matrix division.
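A compact sketch of these multiplicative updates is given below (random initialization and a fixed iteration count; the convergence check and the α/β-dependent variants are omitted, so this is an illustrative version of the commonly cited LNMF update rules, not a tuned implementation):

```python
import numpy as np

def lnmf(V, r, n_iter=200, eps=1e-9):
    """LNMF by multiplicative updates.

    V: non-negative data matrix (m x n, one sample per column).
    r: number of bases. Returns (W, H) with V ~ W @ H.
    """
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        R = V / (W @ H + eps)                 # element-wise ratio V / (WH)
        H = np.sqrt(H * (W.T @ R))            # coefficient update
        R = V / (W @ H + eps)
        W = W * (R @ H.T) / (H.sum(axis=1) + eps)      # base update
        W = W / (W.sum(axis=0, keepdims=True) + eps)   # column-normalize bases
    return W, H
```

The column normalization of W is what forces the learned bases toward localized, parts-like components.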

The base matrix

Base matrix of local non-negative matrix factorization (LNMF) for apex frame.

The small number of samples in the existing microexpression datasets is usually insufficient to train a classifier with good generalization. Jia et al. (

where M_{ref} and M_{probe} are the reference and probe macroexpression feature sets, and the features sharing the index i represent the same type of microexpression emotion. The SVD of the macroexpression and microexpression feature sets yields the base matrices W_{x} and W_{y}, where W_{x}/W_{y} corresponds to macro-/microexpressions. A weighted sum of the column vectors of W_{x} reconstructs a macroexpression feature (a column of M_{ref}), and the sum of the column vectors of W_{y} with the same weights yields the corresponding microexpression feature u_{i}, respectively. That is, if we use W_{x} to get a macroexpression feature, we can also use W_{y} to get a microexpression feature, i.e., U_{new} = W_{y}A, where A collects the representation coefficients of M_{probe} under W_{x}.

Here, U_{new} is the matrix of new microexpression feature samples, and the microexpression emotion of each of its columns is the same as that of the corresponding column of M_{probe}.

Because the new feature samples generated by this algorithm do not have non-negative properties, they cannot be used for feature extraction based on LNMF. The reason is that W_{x}, W_{y}, and the coefficient matrices computed by SVD may contain negative elements. Instead, let M_{x} be the NMF features of macroexpression, and U_{y} be the LNMF features of microexpression.

Macro-to-Micro Transformation.

Input: macroexpression dataset, microexpression dataset
Output: new microexpression feature samples

1: for each emotion class emo shared by both datasets do
2:   collect the macroexpression samples of emo
3:   collect the microexpression samples of emo
4:   M_{emo} ← extract NMF features of emo from the macroexpression samples
5:   U_{emo} ← extract LNMF features of emo from the microexpression samples
6:   split M_{emo} into M_{emo,ref} and M_{emo,probe}
7:   calculate A_{emo} using Equation (16), iteratively
8:   U_{new} ← U_{emo}A_{emo}
9:   output U_{new}
10: end for

Let M_{emo} represent the macroexpression NMF feature set of emotion emo, which is split into M_{emo,ref} and M_{emo,probe}. Let U_{emo} represent the LNMF feature sample set of microexpression, whose number of columns is the same as that of M_{emo,ref}. Then we use M_{emo,ref} to derive the linear representation of M_{emo,probe}:

M_{emo,probe} ≈ M_{emo,ref}A_{emo},     (14)

and the new microexpression samples are then obtained as U_{new} = U_{emo}A_{emo}.

Equation (16) solves A_{emo} from Equation (14) using the NMF multiplicative update formula with the base matrix fixed to M_{emo,ref}, so that A_{emo} remains non-negative throughout the iteration.
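This fixed-base update can be sketched as follows. Note that this is a Frobenius-norm variant of the multiplicative update with the base held fixed, used here as an illustrative stand-in for the paper's Equation (16), not its exact form; the function names are ours:

```python
import numpy as np

def mtm_coefficients(M_ref, M_probe, n_iter=500, eps=1e-9):
    """Solve M_probe ~ M_ref @ A for a non-negative A.

    Multiplicative updates with the base matrix M_ref held fixed;
    A stays non-negative because it starts positive and is only ever
    multiplied by non-negative ratios.
    """
    A = np.full((M_ref.shape[1], M_probe.shape[1]), 0.5)
    for _ in range(n_iter):
        A *= (M_ref.T @ M_probe) / (M_ref.T @ M_ref @ A + eps)
    return A

def macro_to_micro(M_ref, M_probe, U):
    """Generate new microexpression features by applying the macro
    coefficients to the micro feature set U (same column count as M_ref)."""
    return U @ mtm_coefficients(M_ref, M_probe)
```

Because every factor in the update is non-negative, the resulting samples remain valid inputs for LNMF-based feature extraction, which is the point of the improved MtM transformation.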

In this section, we will evaluate the proposed scheme, including experiment overview, SVM classifier selection, dimension optimization on LNMF, experiments on CK+/CASME2/SAMM datasets, and result analysis.

In general, researchers often take the predicted emotion classes of microexpressions as recognition objects (Jia et al.,

Next, we will validate the proposed scheme based on CK+ macroexpression dataset (Kanade et al.,

We adopt the SVM classifier from the Sklearn toolbox based on LIBSVM (Chang and Lin,

where x_{i} and x_{j} are the feature vectors, and γ, α, and d are the kernel parameters.
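For concreteness, the commonly used SVM kernels can be written as below (a sketch; the parameter names γ, α, and the polynomial degree d follow the usual convention and may differ from the paper's exact notation):

```python
import numpy as np

def linear_kernel(xi, xj):
    # Plain inner product.
    return xi @ xj

def poly_kernel(xi, xj, gamma=1.0, alpha=1.0, d=3):
    # (gamma * <xi, xj> + alpha) ** d; alpha is the additive constant.
    return (gamma * (xi @ xj) + alpha) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    # exp(-gamma * ||xi - xj||^2), the Gaussian radial basis function.
    return np.exp(-gamma * np.sum((xi - xj) ** 2))
```

In scikit-learn's SVC these correspond to kernel="linear", "poly" (with gamma, coef0, degree), and "rbf" (with gamma), respectively.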

If the dimension is too small, microexpression features cannot be decomposed into detailed components by LNMF; if the dimension is too large, the features will be too scattered. We determine the optimized value by prior testing with different dimension settings for the eyes and mouth. As shown in

Dimensions and recognition accuracy on local non-negative matrix factorization (LNMF).

60 | 0.694 | 0.701 | 0.692 | 0.720 | |

70 | 0.708 | 0.698 | 0.702 | 0.702 | |

80 | 0.706 | 0.707 | 0.697 | ||

90 | 0.700 | 0.690 | 0.700 | 0.680 | |

100 | 0.719 | 0.696 | 0.707 | 0.706 |

The precondition for MtM transformation is that the macroexpression features are sufficiently distinguishable. To validate this, we first calculate the weight coefficients of macroexpressions, and then use them to extract macroexpression features based on NMF. The image resolution of CK+ is 48 × 48. We use NMF with 200 dimensions to acquire the features directly. The confusion matrix of macroexpression recognition is shown in

Confusion matrix of macroexpression recognition on CK+.

First, the basic test focuses only on apex frame recognition, LNMF feature extraction, and the SVM classifier on CASME2. The RoIs of microexpression are determined according to the distance between the inner eye corners and the mouth corners. It is necessary to normalize the size of the eyes to 80 × 90 and the mouth to 70 × 150. LNMF dimensions of 40/40/80 are applied to the two eye regions and the mouth region of the samples in CASME2. The three types of features are concatenated in series as the features of the CASME2 samples, so the final dimension is 160. Then, the SVM-based classifier is used to test the recognition accuracy by LOSO cross-validation. As shown in

Confusion matrix of microexpression recognition on CASME2 (without new samples).

Second, the optimized test is carried out with the proposed MtM transformation based on the aforementioned basic test. CK+ contains anger, contempt, disgust, fear, happy, sadness, and surprise expressions, whereas CASME2 includes disgust, happy, sadness, surprise, fear, repression, and so on. To compare with Jia et al. ( ), we select the emotion classes shared by the two datasets to construct M_{emo,ref} and M_{emo,probe} for the subsequent MtM transformation. There are only 156 samples in CASME2, so we double them to 312 through mirroring. For one-to-one correspondence between microexpressions in CASME2 and macroexpressions in CK+, we reuse the samples in CASME2 repeatedly to match the macroexpression samples in CK+. In this way, we acquire 312 original samples and 396 new samples (708 in total) for microexpression recognition. After MtM transformation, we obtain more microexpression samples, including the original, mirrored, and newly transformed ones, which contributes to training a better SVM classifier. As shown in

Confusion matrix of microexpression recognition on CK+/CASME2 (with new samples).

However, we select only the original 312 samples in CASME2 for testing, instead of the newly augmented samples (which are used only for training), to avoid distorting the recognition accuracy. Although a larger number of new samples in the test set could raise the final recognition accuracy, the result would not reflect real performance. We double the samples through mirroring only in the training set; when using LOSO cross-validation, it is necessary to exclude the mirrored samples from testing to prevent falsely high accuracy caused by two nearly identical samples.
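The evaluation protocol just described can be sketched as a split generator (illustrative helper names; mirrored and MtM-generated samples are marked as non-original so they never enter a test fold):

```python
import numpy as np

def loso_splits(subjects, is_original):
    """Leave-one-subject-out splits.

    For each held-out subject, test only on that subject's ORIGINAL
    samples; train on every sample (original, mirrored, or synthetic)
    from the remaining subjects.

    subjects: array of per-sample subject ids.
    is_original: boolean mask, True for non-augmented samples.
    """
    for s in np.unique(subjects):
        test = (subjects == s) & is_original
        train = subjects != s
        yield np.where(train)[0], np.where(test)[0]
```

Keeping augmented samples out of the test folds is what prevents the falsely high accuracy mentioned above.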

There are 159 samples in total in the SAMM dataset (Davison et al.,

Confusion matrix of microexpression recognition on SAMM (without new samples).

Compared with

Confusion matrix of microexpression recognition on CK+/SAMM (with new samples).

We evaluate the proposed MtM algorithm by comparing it with the original MtM (Jia et al.,

Recognition accuracy of different algorithms on CK+/CASME2.

0.655 | 0.572 | 0.618 |

As for SAMM, we evaluate the proposed MtM algorithm by comparing it with SA-AT (Zhou et al.,

Recognition accuracy of different algorithms on CK+/SAMM.

0.549 | 0.701 | 0.682 |

A new microexpression recognition scheme is proposed, which includes feature extraction and sample augmentation. We first determine RoIs with optimized dimensions from facial feature points, and then the apex frame is obtained from the microexpression video by the optical flow method. Afterward, LNMF is applied to each RoI, and the results are concatenated in series as the features of the microexpression. Furthermore, the MtM transformation based on LNMF is used, which significantly increases the number of microexpression samples. A classifier based on SVM is trained on the microexpression features and yields better generalization. Finally, the proposed MtM algorithm shows better performance in comparison with other algorithms.

However, the proposed algorithm cannot yet distinguish some expressions with similar motion features. There is obvious recognition confusion among expressions with similar eyebrow-raising motion, such as surprise, disgust, and happiness. Therefore, our future work will focus on a better feature extraction algorithm to address this issue. Moreover, we will also consider deep forest (Ma et al.,

The datasets analyzed for this study can be found at CK+:

Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

JGa, HC, and XZ: conceptualization, methodology, and validation. JGa and HC: software. JGa, HC, XZ, and WL: writing and original draft preparation. JGa, HC, XZ, JGu, and WL: writing–review and editing. All authors have read and agreed to the published version of the manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.