Communication-optimal algorithms for contracting distributed tensors

Figure: Iteration Space and Data Space Mapping for Matrix-Matrix Multiplication on a 2D Torus Network

Tensor contractions, generalized matrix multiplications that are time-consuming to calculate, are among the most compute-intensive operations in several ab initio computational quantum chemistry methods. In this work, scientists from Pacific Northwest National Laboratory and The Ohio State University developed a systematic framework that uses three fundamental communication operators (recursive broadcast, rotation, and reduction, or RRR) to derive communication-efficient algorithms for distributed contraction of tensors of arbitrary dimensionality on the IBM Blue Gene/Q Mira supercomputer. The framework automatically models potential space-performance trade-offs to optimize the communication costs incurred in executing tensor contractions on supercomputers. The paper documenting this work, "Communication-optimal Framework for Contracting Distributed Tensors," is an SC14 Best Paper award finalist.
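
To make the setting concrete, here is a minimal single-node sketch, using numpy's einsum, of how a tensor contraction generalizes matrix multiplication. The index names and tensor shapes are illustrative assumptions; the distributed RRR algorithms from the paper are not modeled here.

    import numpy as np

    # Matrix multiplication as a contraction: C[i,j] = sum_k A[i,k] * B[k,j]
    A = np.random.rand(4, 5)
    B = np.random.rand(5, 6)
    C = np.einsum('ik,kj->ij', A, B)

    # A four-index contraction of the kind common in quantum chemistry,
    # e.g. R[a,b,i,j] = sum_{c,d} V[a,b,c,d] * T[c,d,i,j]; the shapes
    # here are illustrative only.
    V = np.random.rand(4, 4, 3, 3)
    T = np.random.rand(3, 3, 2, 2)
    R = np.einsum('abcd,cdij->abij', V, T)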

In computational physics and chemistry, tensor algebra is important because it provides a mathematical framework for formulating and solving problems in areas such as fluid mechanics. By automatically generating communication-optimal algorithms for contracting distributed tensors, the framework avoids redundant data movement and balances the total computation load across processors, reducing overall communication costs. By deconstructing distributed tensor contractions, the work also afforded insights into the fundamental building blocks of these widely studied computations.

The researchers characterized distributed tensor contraction algorithms on torus networks (meshes with wraparound connections in one or more dimensions), defining tensor indices, the iteration space, and their mappings. By mapping the iteration space, they could precisely define where each computation of a tensor contraction occurs, as well as the data that must be present on each processor. For each tensor contraction, the researchers sought an iteration space mapping, a data space mapping, and an algorithm that together minimize the communication cost (per contraction) for a given amount of memory per processor.
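
As a rough illustration of these two mappings (a hypothetical sketch, not the paper's formalism), consider matrix multiplication C[i,j] = sum_k A[i,k] * B[k,j], whose three-dimensional iteration space (i, j, k) is block-mapped onto a P x Q processor grid. The function names and the simple block mapping below are assumptions made for illustration.

    # Hypothetical sketch: block-map the (i, j, k) iteration space of
    # C[i,j] = sum_k A[i,k] * B[k,j] onto a P x Q processor grid.
    def owned_iterations(p, q, N, P, Q):
        """Map index i to grid row p and j to grid column q; k is left
        unmapped, so each processor iterates over all of k. Assumes P
        and Q divide N evenly."""
        i_range = range(p * N // P, (p + 1) * N // P)
        j_range = range(q * N // Q, (q + 1) * N // Q)
        k_range = range(N)
        return i_range, j_range, k_range

    def required_data(p, q, N, P, Q):
        """The data space mapping follows from the iteration mapping:
        processor (p, q) touches exactly the slices of A, B, and C that
        its (i, j, k) block reads or writes."""
        i_range, j_range, k_range = owned_iterations(p, q, N, P, Q)
        return {'A': (i_range, k_range),   # a row panel of A
                'B': (k_range, j_range),   # a column panel of B
                'C': (i_range, j_range)}   # a local tile of C

    # Example: processor (1, 0) on a 2 x 2 grid with N = 8
    print(required_data(1, 0, 8, 2, 2))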

Then, for a given iteration space mapping, their RRR framework identified the fundamental data movement directions required by a distributed algorithm, called "reuse dimensions," which are elemental to the tensor contraction. With these reuse dimensions, the framework can compute compatible input and output tensor distributions and systematically generate a contraction algorithm for them using the communication operators. The researchers also presented a cost model that predicts the communication cost for a given iteration space mapping, compatible input and output distributions, and the generated contraction algorithm. The cost model then was used to identify the iteration and data space mappings that minimize the overall communication cost.
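
The flavor of such a cost model can be conveyed with a toy example (a sketch under simplifying assumptions, not the paper's actual model): estimate the words each processor must communicate under every candidate processor-grid shape and keep the cheapest.

    # Toy cost model: for C = A * B with N x N matrices on a P x Q grid,
    # broadcasting row panels of A and column panels of B moves roughly
    # N*N/P + N*N/Q words per processor. This standard volume estimate
    # stands in for the paper's more general model.
    def comm_cost(P, Q, N):
        return N * N // P + N * N // Q

    def best_grid(num_procs, N):
        """Enumerate all factorizations of num_procs into P x Q and
        return the grid shape minimizing the modeled communication."""
        candidates = [(P, num_procs // P)
                      for P in range(1, num_procs + 1)
                      if num_procs % P == 0]
        return min(candidates, key=lambda pq: comm_cost(pq[0], pq[1], N))

    # For a square problem, the model picks a square grid, as expected:
    print(best_grid(64, 4096))   # (8, 8)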

In their experiments, the researchers showed their framework was scalable up to 16,384 nodes (262,144 cores) on Blue Gene/Q supercomputers. They also demonstrated how their framework improves communication optimality, even outperforming the Cyclops Tensor Framework, the current state of the art.

In addition to their distributed and symmetric nature, tensors also might exhibit various forms of sparsity. The team is working on combining this work with the approach published in "A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning," presented last year at SC13, to dynamically load balance tensor contractions. The outcome would be a hybrid approach that exploits the efficiency of this work while dynamically adapting to load imbalances introduced by sparsity.

More information: Rajbhandari S, A Nikam, P Lai, K Stock, S Krishnamoorthy, and P Sadayappan. 2014. "Communication-optimal framework for contracting distributed tensors." Presented at: International Conference for High Performance Computing, Networking, Storage and Analysis (SC14). November 16-21, 2014, New Orleans, Louisiana (Best Paper Finalist).

Lai P, K Stock, S Rajbhandari, S Krishnamoorthy, and P Sadayappan. 2013. "A framework for load balancing of tensor contraction expressions via dynamic task partitioning." Presented at: International Conference for High Performance Computing, Networking, Storage and Analysis (SC13). November 17-22, 2013, Denver, Colorado.
