February 26, 2019

New AI approach bridges the 'slim-data gap' that can stymie deep learning approaches

by Tom Rickey, Pacific Northwest National Laboratory

Scientists have developed a deep neural network that sidesteps a problem that has bedeviled efforts to apply artificial intelligence to tackle complex chemistry—a shortage of precisely labeled chemical data. The new method gives scientists an additional tool to apply deep learning to explore drug discovery, new materials for manufacturing, and a swath of other applications.

Predicting chemical properties and reactions among millions upon millions of compounds is one of the most daunting tasks that scientists face. No complete source of information exists for a deep learning program to draw upon. Usually, such a shortage of abundant, clean data is a show-stopper for a deep learning project.

Scientists at the Department of Energy's Pacific Northwest National Laboratory discovered a way around the problem. They created a pre-training system, a kind of fast-track tutorial in which they give the program some basic information about chemistry, equip it to learn from its experiences, then challenge it with huge datasets.

The work was presented at KDD2018, the Conference on Knowledge Discovery and Data Mining, in London.

Cats, dogs, and clean data

For deep learning networks, abundant and clear data has long been the key to success. In the cat vs. dog dialogue that peppers discussions of AI systems, researchers recognize the importance of "labeled data": a photo of a cat is marked a cat, a dog is marked a dog, and so on. Having many, many photos of cats and dogs, clearly marked as such, is a good example of the type of data that AI scientists like to have. The photos provide clear data points that a neural network can learn from as it begins to differentiate cats from dogs.


But chemistry is more complex than sorting cats from dogs. Hundreds of factors affect a molecule's promiscuity, and thousands of interactions can happen in a fraction of a second. AI researchers in chemistry are often faced with either small but thorough datasets or huge but inconsistent datasets: think 100 clear images of chihuahuas or 10 million images of furry blobs. Neither is ideal or even workable alone.

So the scientists created a way to bridge the gap, combining the best of "slim but good data" with "big but poor data."

The team, led by former PNNL scientist Garrett Goh, employed a technique known as rule-based supervised learning. The scientists pointed the neural network to a vast repository of chemical data known as ChEMBL and generated rule-based labels for each of its many molecules, for example by calculating the mass of the molecule. The neural network crunched through the raw data, learning principles of chemistry that relate each molecule to basic chemical fingerprints. The scientists then took the neural network trained on the rule-based data and presented it with the small but high-quality dataset containing the final properties to be predicted.
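To make the idea of a rule-based label concrete, here is a minimal sketch of how a label such as molecular mass can be computed by rule rather than measured or annotated by hand. It is not the team's code. It assumes molecules arrive as flat molecular formulas (real repositories such as ChEMBL use richer structure formats), and the function names, the small atomic-mass table, and the "heavy atom count" label are all hypothetical choices made for illustration.

```python
import re

# Rounded standard atomic weights (g/mol) for a few common elements.
# Illustrative subset only; a real pipeline would cover the full periodic table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "S": 32.06, "Cl": 35.45}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")   # whole-string sanity check
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")      # one element symbol + optional count


def element_counts(formula):
    """Parse a flat formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
    return counts


def molecular_mass(formula):
    """Sum atomic masses; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_MASS[s] * n for s, n in element_counts(formula).items())


def rule_based_labels(formulas):
    """Attach cheap, computed labels to each molecule (hypothetical label set)."""
    rows = []
    for f in formulas:
        counts = element_counts(f)
        rows.append({
            "formula": f,
            "mass": round(molecular_mass(f), 3),
            "heavy_atoms": sum(n for s, n in counts.items() if s != "H"),
        })
    return rows


if __name__ == "__main__":
    for row in rule_based_labels(["H2O", "C2H6O", "CH3Cl"]):
        print(row)
```

Because labels like these cost almost nothing to compute, they can be generated for every molecule in a large repository, which is what makes them usable as pre-training targets before the network ever sees the small, carefully measured dataset.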

The pre-training paid off. The program, called ChemNet, matched or exceeded the accuracy of the current best deep learning models available when analyzing molecules for their toxicity, their level of biochemical activity related to HIV, and their behavior in a chemical process known as solvation. The program did so with much less labeled data than its counterparts and achieved the results with less computation, which translates to faster performance.

More information: Garrett B. Goh et al. Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction. arXiv:1712.02734 [stat.ML]. arxiv.org/abs/1712.02734

Provided by Pacific Northwest National Laboratory

Citation: New AI approach bridges the 'slim-data gap' that can stymie deep learning approaches (2019, February 26) retrieved 19 April 2024 from https://techxplore.com/news/2019-02-ai-approach-bridges-slim-data-gap.html

