March 11, 2019 feature

A new approach to overcome multi-model forgetting in deep neural networks

by Ingrid Fadelli, Tech Xplore

In recent years, researchers have developed deep neural networks that can perform a variety of tasks, including visual recognition and natural language processing (NLP). Although many of these models achieve remarkable results, they typically perform well on only one task because of what is known as "catastrophic forgetting."

Essentially, catastrophic forgetting means that when a model that was initially trained on task A is later trained on task B, its performance on task A will significantly decline. In a paper pre-published on arXiv, researchers at Swisscom and EPFL identified a new kind of forgetting and proposed a new approach that could help to overcome it via a statistically justified weight plasticity loss.

"When we first started working on our project, designing neural architectures automatically was computationally expensive and unfeasible for most companies," Yassine Benyahia and Kaicheng Yu, the study's primary investigators, told Tech Xplore via e-mail. "The original aim of our study was to identify new methods to reduce this expense. When the project started, a paper by Google claimed to have drastically reduced the time and resources required to build neural architectures using a new method called weight-sharing. This made autoML feasible for researchers without huge GPU clusters, encouraging us to study this topic more in depth."

During their research into neural network-based models, Benyahia, Yu and their colleagues noticed a problem with weight sharing. When they trained two models (e.g., A and B) sequentially, model A's performance declined while model B's improved, or vice versa. They showed that this phenomenon, which they called "multi-model forgetting," can hinder the performance of several autoML approaches, including Google's efficient neural architecture search (ENAS).

"We realized that weight-sharing was causing models to impact each other negatively, which was causing the architecture search process to be closer to random," Benyahia and Yu explained. "We also had our reserves on architecture search, where only the final results are shed to light and where there is no good framework to evaluate the quality of the architecture search in a fair way. Our approach could help to fix this forgetting problem, as it is related to a core method that nearly all recent autoML papers rely on, and we consider such impact to be huge to the community."

In their study, the researchers modeled multi-model forgetting mathematically and derived a novel loss, called weight plasticity loss. This loss could reduce multi-model forgetting substantially by regularizing the learning of a model's shared parameters according to their importance for previous models.

"Basically, due to the over-parameterization of neural networks, our loss decreases parameters that are 'less important' to the final loss first, and keeps the more important ones unchanged," Benyahia and Yu said. "Model A's performance is thus unaffected, while model B's performance keeps increasing. On small datasets, our model can reduce forgetting up to 99 percent, and on autoML methods, up to 80 percent in the middle of training."
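The idea the researchers describe can be illustrated with a toy example. The sketch below is not the authors' implementation, and the article does not give the formula for the weight plasticity loss. It assumes a generic importance-weighted quadratic penalty, (alpha / 2) * sum_i F_i * (w_i - w_A_i)^2, where F_i is the diagonal Fisher information of model A (for a linear model with unit-variance Gaussian noise, the mean of the squared inputs). The data, the `train` helper and the value of `alpha` are hypothetical, chosen so that model A relies only on the first shared weight.

```python
import numpy as np

def mse_loss(w, X, y):
    """Half mean squared error of the linear model X @ w."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def mse_grad(w, X, y):
    """Gradient of mse_loss with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def train(w, X, y, steps=500, lr=0.05, anchor=None, importance=None, alpha=0.0):
    """Plain gradient descent; optionally adds an importance-weighted
    quadratic penalty that pulls w back toward `anchor` (assumed form)."""
    w = w.copy()
    for _ in range(steps):
        g = mse_grad(w, X, y)
        if anchor is not None:
            g = g + alpha * importance * (w - anchor)
        w -= lr * g
    return w

# Both models share the same two weights (hypothetical toy data).
# Model A only sees feature 0, so weight 1 is "unimportant" to it.
X_a = np.array([[1.0, 0.0], [2.0, 0.0]])
y_a = np.array([1.0, 2.0])
# Model B can fit its target with any weights that sum to 3.
X_b = np.array([[1.0, 1.0], [1.0, 1.0]])
y_b = np.array([3.0, 3.0])

# Step 1: train model A, then estimate how much each weight matters to it.
w_a = train(np.zeros(2), X_a, y_a)
importance = np.mean(X_a ** 2, axis=0)  # diagonal Fisher for this model

# Step 2: train model B from A's weights, without and with the penalty.
w_plain = train(w_a, X_b, y_b)
w_reg = train(w_a, X_b, y_b, anchor=w_a, importance=importance, alpha=10.0)

# Forgetting = increase in model A's loss after training model B.
forget_plain = mse_loss(w_plain, X_a, y_a) - mse_loss(w_a, X_a, y_a)
forget_reg = mse_loss(w_reg, X_a, y_a) - mse_loss(w_a, X_a, y_a)

print("forgetting without penalty:", forget_plain)
print("forgetting with penalty:   ", forget_reg)
print("model B loss with penalty: ", mse_loss(w_reg, X_b, y_b))
```

In this toy setting, unregularized training moves both shared weights and model A's loss rises, while the penalized run fits model B by changing only the weight that model A does not depend on. Real networks are not this cleanly separable, so the effect of such a penalty would be partial rather than complete.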

In a series of tests, the researchers demonstrated the effectiveness of their approach for decreasing multi-model forgetting, both in instances where two models are trained sequentially and for neural architecture search. Their findings suggest that adding weight plasticity in neural architecture search can significantly improve the performance of multiple models on both NLP and computer vision tasks.

The study carried out by Benyahia, Yu and their colleagues sheds light on the issue of catastrophic forgetting, particularly that which occurs when multiple models are trained sequentially. After modeling this problem mathematically, the researchers introduced a solution that could overcome it, or at least drastically reduce its impact.

"In multi-model forgetting, our guiding principle was to think in formulas and not just by simple intuition or heuristics," Benyahia and Yu said. "We strongly believe that this 'thinking in formulas' can lead researchers to great discoveries. That is why for further research, we aim to apply this approach to other fields of machine learning. In addition, we plan to adapt our loss to recent state-of-the-art autoML methods to demonstrate its effectiveness in solving the weight-sharing problem observed by us."

More information: Overcoming multi-model forgetting. arXiv:1902.08232 [cs.LG]. arxiv.org/abs/1902.08232

Efficient neural architecture search via parameter sharing. arXiv:1802.03268 [cs.LG]. arxiv.org/abs/1802.03268

Citation: A new approach to overcome multi-model forgetting in deep neural networks (2019, March 11) retrieved 25 April 2024 from https://techxplore.com/news/2019-03-approach-multi-model-deep-neural-networks.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
