August 17, 2018 feature

A light-weight and accurate deep learning model for audiovisual emotion recognition

by Ingrid Fadelli, Tech Xplore

Researchers at Orange Labs and Normandie University have developed a novel deep neural model for audiovisual emotion recognition that performs well with small training sets. Their study, which was posted as a preprint on arXiv, follows a philosophy of simplicity: it substantially limits the number of parameters the model learns from the target datasets and relies on simple learning techniques.

Neural networks for emotion recognition have a number of useful applications in healthcare, customer analysis, surveillance, and even animation. While state-of-the-art deep learning algorithms have achieved remarkable results, most still fall short of the understanding of emotions that humans attain.

"Our overall objective is to facilitate human-computer interaction by making computers able to perceive various subtle details expressed by humans," Frédéric Jurie, one of the researchers who carried out the study, told TechXplore. "Perceiving emotions contained in images, video, voice and sound falls within this context."

Recent studies have assembled multimodal, temporal datasets containing annotated videos and audiovisual clips. Yet these datasets typically include a relatively small number of annotated samples, whereas most existing deep learning algorithms need much larger datasets to perform well.

The researchers tried to address this issue by developing a new framework for audiovisual emotion recognition that fuses the analysis of visual and audio footage and retains a high level of accuracy even with relatively small training datasets. They trained their neural model on AFEW, a dataset of 773 audiovisual clips extracted from movies and annotated with discrete emotions.

"One can see this model as a black box processing the video and automatically inferring the emotional state of people," Jurie explained. "One big advantage of such deep neural models is that they learn by themselves how to process the video by analyzing examples, and do not require experts to provide specific processing units."

The model devised by the researchers follows the principle of Occam's razor, which holds that when choosing between two approaches or explanations, the simpler one is usually the better choice. Unlike many other deep learning models for emotion recognition, their model is therefore kept relatively simple: the neural network learns a limited number of parameters from the dataset and employs basic learning strategies.
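To make the parameter argument concrete, here is a minimal sketch that counts how many parameters a hypothetical network would have to learn from the target dataset when pretrained layers are frozen (transfer learning, one of the techniques listed in the paper's abstract) and only a small classification head is trained. The layer sizes are invented for illustration and are not the authors' architecture; only the seven-class output reflects the seven emotion categories used in AFEW.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_CLASSES = 7  # AFEW annotates clips with seven discrete emotions


@dataclass
class Dense:
    """A fully connected layer; only its size matters for counting."""
    n_in: int
    n_out: int
    frozen: bool = False  # frozen = pretrained weights reused, not learned

    @property
    def params(self) -> int:
        # One weight per input/output pair plus one bias per output unit.
        return self.n_in * self.n_out + self.n_out


def count_params(layers: List[Dense]) -> Tuple[int, int]:
    """Return (total parameters, parameters learned from the target data)."""
    total = sum(layer.params for layer in layers)
    trainable = sum(layer.params for layer in layers if not layer.frozen)
    return total, trainable


# Hypothetical layer sizes, chosen only to illustrate orders of magnitude.
network = [
    Dense(4096, 4096, frozen=True),  # pretrained feature layer (assumed)
    Dense(4096, 256, frozen=True),   # pretrained low-dimensional embedding (assumed)
    Dense(256, NUM_CLASSES),         # small head trained on the target data
]

if __name__ == "__main__":
    total, trainable = count_params(network)
    print(f"total: {total:,}  trainable: {trainable:,}")
```

In this made-up configuration, freezing the pretrained layers leaves 1,799 of roughly 17.8 million parameters to be fitted on the small dataset, which is the kind of reduction the simplicity principle aims for.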

"The proposed network is made of cascaded processing layers abstracting the information, from the signal to its interpretation," Jurie said. "Audio and video are processed by two different channels of the network and are combined late in the process, almost at the end."
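Combining the two channels "almost at the end" amounts to late fusion: each channel produces its own class scores, and only those scores are merged. The sketch below is a minimal illustration assuming each channel outputs one probability per emotion and that fusion is a simple weighted average; the `audio_weight` parameter, the label ordering, and the example scores are hypothetical, not values from the study.

```python
from typing import List, Sequence

EMOTIONS = ("angry", "disgust", "fear", "happy", "neutral", "sad", "surprise")


def late_fusion(audio_scores: Sequence[float],
                video_scores: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Merge per-class scores from two independent channels (late fusion).

    The weighted average is an assumed fusion rule; audio_weight is hypothetical.
    """
    if len(audio_scores) != len(video_scores):
        raise ValueError("both channels must score the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return [audio_weight * a + video_weight * v
            for a, v in zip(audio_scores, video_scores)]


def predict(scores: Sequence[float]) -> str:
    """Return the emotion label with the highest fused score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]


if __name__ == "__main__":
    # Hypothetical outputs of the two channels for one clip.
    audio = [0.10, 0.00, 0.00, 0.20, 0.60, 0.10, 0.00]
    video = [0.05, 0.00, 0.00, 0.70, 0.15, 0.10, 0.00]
    print(predict(late_fusion(audio, video)))                    # happy
    print(predict(late_fusion(audio, video, audio_weight=0.8)))  # neutral
```

In this sketch the two channels never exchange intermediate features, so each could be trained on its own, and the fusion step adds only a single weight to choose.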

When tested, their lightweight model achieved an emotion recognition accuracy of 60.64 percent on the AFEW test set. It also ranked fourth in the 2018 Emotion Recognition in the Wild (EmotiW) challenge, held at the ACM International Conference on Multimodal Interaction (ICMI) in Colorado.

"Our model is proof that following the Occam's razor principle, i.e., by always choosing the simplest alternatives for designing neural networks, it is possible to limit the size of the models and obtain very compact but state-of-the-art neural networks, which are easier to train," Jurie said. "This contrasts with the research trend of making neural networks bigger and bigger."

The researchers will now continue to explore ways of achieving high accuracy in emotion recognition by simultaneously analyzing visual and auditory data, using the limited annotated training datasets that are currently available.

"We are interested in several research directions, such as how to better fuse the different modalities, how to represent emotion by compact, semantically meaningful descriptors (and not only class labels), or how to make our algorithms able to learn with less, or even without, annotated data," Jurie said.

More information: An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets, arXiv:1808.02668v1 [cs.AI]. arxiv.org/abs/1808.02668

Abstract
This paper presents a light-weight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding allow the dimensionality of the representations to be reduced. ii) The visual temporal information is handled by a simple score-per-frame selection process, averaged across time. iii) A simple frame selection mechanism is also proposed to weight the images of a sequence. iv) The fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64 % on the test set of AFEW, and ranked 4th at the Emotion in the Wild 2018 challenge.
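Steps ii) and iii) of the abstract can be pictured as scoring every frame, weighting or discarding frames, and averaging the surviving scores over time. The sketch below is a minimal illustration, not the authors' mechanism: it assumes per-frame weights are already available (for example, a face-detection confidence) and uses a simple threshold, the hypothetical `min_weight` parameter, as the selection rule.

```python
from typing import List, Sequence


def aggregate_frames(frame_scores: Sequence[Sequence[float]],
                     frame_weights: Sequence[float],
                     min_weight: float = 0.0) -> List[float]:
    """Weighted average over time of per-frame class scores.

    Frames whose weight falls below min_weight (or is not positive) are
    dropped first, a hypothetical selection rule; the remaining frames
    are averaged in proportion to their weights.
    """
    if len(frame_scores) != len(frame_weights):
        raise ValueError("need exactly one weight per frame")
    kept = [(scores, w) for scores, w in zip(frame_scores, frame_weights)
            if w >= min_weight and w > 0]
    if not kept:
        raise ValueError("no frame passed the selection threshold")
    num_classes = len(kept[0][0])
    total_weight = sum(w for _, w in kept)
    return [sum(w * scores[c] for scores, w in kept) / total_weight
            for c in range(num_classes)]


if __name__ == "__main__":
    # Hypothetical per-frame scores over three classes for a three-frame clip.
    frames = [[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.1, 0.8]]
    weights = [0.9, 0.8, 0.1]  # e.g. face-detection confidence (assumed)
    print(aggregate_frames(frames, weights, min_weight=0.5))
```

With uniform weights and no threshold, the function reduces to a plain average of the frame scores across time.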

Journal information: arXiv

Citation: A light-weight and accurate deep learning model for audiovisual emotion recognition (2018, August 17) retrieved 1 May 2024 from https://techxplore.com/news/2018-08-light-weight-accurate-deep-audiovisual-emotion.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
