September 11, 2019 feature

Investigating the self-attention mechanism behind BERT-based architectures

by Ingrid Fadelli, Tech Xplore

BERT, a transformer-based model built around a self-attention mechanism, has proved to be a valid alternative to recurrent neural networks (RNNs) for natural language processing (NLP) tasks. Despite the advantages of BERT-based architectures, few researchers have so far studied them in depth or tried to understand why their self-attention mechanism is so effective.

Aware of this gap in the literature, researchers at the University of Massachusetts Lowell's Text Machine Lab for Natural Language Processing recently carried out a study on the interpretation of self-attention, the most vital component of BERT models. Olga Kovaleva was the lead investigator and Anna Rumshisky the senior author. Their paper, pre-published on arXiv and set to be presented at the EMNLP 2019 conference, suggests that a limited number of attention patterns are repeated across different BERT sub-components, hinting at the model's over-parameterization.

"BERT is a recent model that made a breakthrough in the NLP community, taking over the leaderboards across multiple tasks. Inspired by this recent trend, we were curious to investigate how and why it works," the team of researchers told TechXplore via email. "We hoped to find a correlation between self-attention, the BERT's main underlying mechanism, and linguistically interpretable relations within the given input text."

BERT-based architectures have a layered structure, and each layer consists of so-called "heads." For the model to function, each of these heads is trained to encode a specific type of information, thus contributing to the overall model in its own way. In their study, the researchers analyzed the information encoded by these individual heads, focusing on both its quantity and its quality.

"Our methodology focused on examining individual heads and the patterns of attention they produced," the researchers explained. "Essentially, we were trying to answer the question: 'When BERT encodes a single word of a sentence, does it pay attention to the other words in a way meaningful to humans?'"
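To make the idea of a per-head attention pattern concrete, here is a minimal sketch of how one layer's heads each produce their own attention map. It is not the researchers' code: the toy dimensions, the random vectors standing in for projected queries and keys, and the helper names (`head_attention`, `split_heads`) are all assumptions made for illustration.

```python
import math
import random

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def head_attention(queries, keys):
    """Attention weights for one head: softmax(q . k / sqrt(d)) for each token."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    return weights

def split_heads(vectors, num_heads):
    """Slice each token vector into num_heads equal chunks, one chunk per head."""
    d_head = len(vectors[0]) // num_heads
    return [[v[h * d_head:(h + 1) * d_head] for v in vectors]
            for h in range(num_heads)]

random.seed(0)
seq_len, d_model, num_heads = 4, 8, 2  # toy sizes, far smaller than BERT's

# Hypothetical stand-ins for the projected query and key vectors of one layer.
queries = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
keys = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]

# One seq_len x seq_len attention map per head.
attention_maps = [
    head_attention(q_h, k_h)
    for q_h, k_h in zip(split_heads(queries, num_heads), split_heads(keys, num_heads))
]
```

Each row of a map shows how strongly one token attends to every token in the input, which is the kind of pattern the researchers describe examining head by head.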

The researchers carried out a series of experiments using both basic pretrained and fine-tuned BERT models. This allowed them to gather numerous interesting observations related to the self-attention mechanism that lies at the core of BERT-based architectures. For instance, they observed that a limited set of attention patterns is often repeated across different heads, which suggests that BERT models are over-parameterized.

"We found that BERT tends to be over-parameterized, and there is a lot of redundancy in the information it encodes," the researchers said. "This means that the computational footprint of training such a large model is not well justified."
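One rough way to picture this redundancy is to compare the attention maps of different heads and see how similar they are. The sketch below uses cosine similarity on hand-written toy maps. Both the similarity measure and the example maps are assumptions chosen for illustration, not necessarily the procedure used in the paper.

```python
import math

def flatten(matrix):
    """Flatten a 2D attention map into a single list of weights."""
    return [v for row in matrix for v in row]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical attention maps for three heads (each row sums to 1).
# Two heads attend mostly to the token itself; one attends mostly to the first token.
diagonal_a = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
diagonal_b = [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]
vertical = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.9, 0.05, 0.05]]

sim_same = cosine_similarity(flatten(diagonal_a), flatten(diagonal_b))
sim_diff = cosine_similarity(flatten(diagonal_a), flatten(vertical))
```

In this toy setup the two diagonal heads score much closer to 1 than the diagonal and vertical pair, so a high similarity between many heads would be one sign that they encode overlapping information.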

Another finding by the University of Massachusetts Lowell team is that, depending on the task a BERT model is tackling, randomly switching off some of its heads can lead to an improvement, rather than a decline, in performance. In addition, the researchers did not identify any linguistic patterns that are of particular importance in determining BERT's performance in downstream tasks.
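As a simplified illustration of what switching off a head can mean, the sketch below zeroes a head's output before the heads are combined. This masking approach is an assumption made for the example, and the paper may define disabling a head differently. The function name `combine_heads` and the toy values are hypothetical.

```python
def combine_heads(head_outputs, head_mask):
    """Concatenate per-head output vectors, zeroing any head whose mask is 0."""
    combined = []
    for output, keep in zip(head_outputs, head_mask):
        combined.extend(v * keep for v in output)
    return combined

# Hypothetical output vectors of three heads for a single token.
head_outputs = [[0.5, -1.0], [2.0, 0.25], [1.5, 3.0]]

all_on = combine_heads(head_outputs, [1, 1, 1])      # every head active
second_off = combine_heads(head_outputs, [1, 0, 1])  # second head switched off
```

An ablation experiment along these lines would evaluate the model on a task once per mask and compare the scores, which is how a disabled head could be found to help rather than hurt.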

"Making deep learning interpretable is important for both fundamental and applied research, and we will continue working in this direction," the researchers said. "New BERT-based models have recently been released, and we plan to extend our methodology to investigate them as well."

More information: Revealing the dark secrets of BERT. arXiv:1908.08593 [cs.CL]. arxiv.org/abs/1908.08593

Citation: Investigating the self-attention mechanism behind BERT-based architectures (2019, September 11) retrieved 25 April 2024 from https://techxplore.com/news/2019-09-self-attention-mechanism-bert-based-architectures.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
