July 30, 2018 feature

A new complex network-based approach to topic modeling

by Ingrid Fadelli, Tech Xplore

Researchers at Northwestern University, the University of Bath, and the University of Sydney have developed a new network approach to topic models, machine learning strategies that can discover abstract topics and semantic structures within text documents.

"One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts," the researchers explained in their study. "Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents."

Topic models are currently being used to identify semantically related texts and classify documents within a number of fields, including sociology, history, linguistics, and psychology. The most commonly used method, latent Dirichlet allocation (LDA), is also used for bibliometrical, psychological and political analysis, as well as for image processing.

Despite its widespread success, LDA has several flaws in how it represents text: it offers no principled method for choosing the number of topics, its assumptions are at odds with the statistical properties of real texts, and its Bayesian prior lacks justification. (In Bayesian statistical inference, the prior is the probability distribution assumed before any evidence is taken into account.)

A large portion of recent research into topic models has focused on creating more sophisticated versions of LDA that perform better or can effectively analyze particular aspects of documents.

The approach developed by this team of researchers stems from network theory, a framework used in physics and other scientific fields that provides techniques for analyzing graphs and the structure of systems with many interacting agents. Their new framework for topic modeling is based on the approach used to find communities in complex networks, which, in the context of network theory, are graphs with features that occur in models of real-life systems.

"I was working on natural language and topic modeling from the perspective of complex systems and complex networks," Martin Gerlach, postdoctoral fellow at Northwestern University, told Tech Xplore. "The problems seemed very similar, yet the communities of computer science (topic modeling) and complex networks seemed to work largely independently. Being trained as a physicist, we wanted to show that two seemingly different problems could be reduced to the same underlying math."

Gerlach and his colleagues devised a new approach to identifying topical structures that relates to the problem of finding communities in complex networks. Their technique represents text corpora as bipartite networks, a class of networks whose nodes are divided into two sets, X and Y, with connections allowed only between nodes in different sets. Here, the two sets are the documents and the words.
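As a rough illustration of the bipartite representation, the sketch below builds a word-document network from a toy corpus. The function name `build_bipartite_edges`, the whitespace tokenization, and the example sentences are assumptions made for this example; they are not taken from the study.

```python
from collections import Counter


def build_bipartite_edges(documents):
    """Return a Counter mapping (doc_index, word) -> number of occurrences.

    Each key is an edge between a document node and a word node. There are
    no document-document or word-word edges, which is what makes the
    network bipartite. (Hypothetical helper, for illustration only.)
    """
    edges = Counter()
    for doc_index, text in enumerate(documents):
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            edges[(doc_index, word)] += 1
    return edges


# Toy corpus, invented for this example.
corpus = [
    "networks have communities",
    "topics group words",
    "communities group nodes in networks",
]

edges = build_bipartite_edges(corpus)
doc_nodes = {doc for doc, _ in edges}    # one side of the network
word_nodes = {word for _, word in edges}  # the other side

for (doc, word), count in sorted(edges.items()):
    print(f"document {doc} -- {word} (weight {count})")
```

Every edge joins a document to a word, so a word that appears in several documents (such as "communities" here) is what links those documents together in the network.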

"We mapped the problem of topic modeling to the problem of community detection in a network consisting of words and documents showing that they are mathematically equivalent," explained Gerlach.

The researchers' approach, which adapts existing community-detection methods, was found to be more versatile and principled than existing topic models: for instance, it detects the number of topics present in the texts and groups both words and documents hierarchically. Their method uses a stochastic block model (SBM), a generative model for graphs that contain communities, that is, subsets of nodes that are connected with one another.
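To make "generative model for graphs" concrete, here is a minimal sketch of the simplest textbook form of a stochastic block model: each node belongs to a block, and an edge between two nodes appears with a probability that depends only on their blocks. This is not the specific variant the researchers used; the function name `sample_sbm`, the block assignment, and the probabilities are all assumptions chosen for illustration.

```python
import random


def sample_sbm(block_of, edge_prob, seed=0):
    """Sample an undirected graph from a basic stochastic block model.

    block_of[i] is the block (community) of node i, and edge_prob[r][s] is
    the probability of an edge between a node in block r and a node in
    block s. (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    n = len(block_of)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two nodes' blocks.
            if rng.random() < edge_prob[block_of[i]][block_of[j]]:
                edges.append((i, j))
    return edges


# Two blocks of three nodes each: dense within a block, sparse between.
block_of = [0, 0, 0, 1, 1, 1]
edge_prob = [
    [0.9, 0.05],
    [0.05, 0.9],
]

edges = sample_sbm(block_of, edge_prob)
print(edges)
```

Community detection with an SBM runs this process in reverse: given an observed graph, it infers the block assignment that most plausibly generated it.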

"We solve some of the intrinsic and known problems of popular topic modeling algorithms such as LDA (e.g. how to determine the number of topics)," said Gerlach. "In addition, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields."

The SBM approach developed by Gerlach and his colleagues could have interesting applications in other areas where machine learning is used, such as the analysis of genetic codes or images. In future, the researchers plan to continue exploring the potential of complex networks both within the context of text analysis and beyond.

"The equivalence between topic modeling and community detection allows to use insights gained in each of the communities and apply to the other domain," said Gerlach. "I hope to use these insights to gain a better understanding of these machine learning algorithms; why they work, and more importantly, under which conditions they do not work."

More information: Martin Gerlach et al. A network approach to topic models, Science Advances (2018). DOI: 10.1126/sciadv.aaq1360

Journal information: Science Advances

Citation: A new complex network-based approach to topic modeling (2018, July 30) retrieved 26 April 2024 from https://techxplore.com/news/2018-07-complex-network-based-approach-topic.html

