June 27, 2019

Drag-and-drop data analytics

by Rob Matheson, Massachusetts Institute of Technology

In the Iron Man movies, Tony Stark uses a holographic computer to project 3-D data into thin air, manipulate them with his hands, and find fixes to his superhero troubles. In the same vein, researchers from MIT and Brown University have now developed a system for interactive data analytics that runs on touchscreens and lets everyone—not just genius, billionaire, playboy philanthropists—tackle real-world issues.

For years, the researchers have been developing an interactive data-science system called Northstar, which runs in the cloud but has an interface that supports any touchscreen device, including smartphones and large interactive whiteboards. Users feed the system datasets, and manipulate, combine, and extract features on a user-friendly interface, using their fingers or a digital pen, to uncover trends and patterns.

In a paper being presented at the ACM SIGMOD conference, the researchers detail a new component of Northstar, called VDS for "virtual data scientist," that instantly generates machine-learning models to run prediction tasks on users' datasets. Doctors, for instance, can use the system to help predict which patients are more likely to have certain diseases, while business owners might want to forecast sales. On an interactive whiteboard, multiple users can also collaborate in real time.

The aim is to democratize data science by making it easy to do complex analytics, quickly and accurately.

"Even a coffee shop owner who doesn't know data science should be able to predict their sales over the next few weeks to figure out how much coffee to buy," says co-author and long-time Northstar project lead Tim Kraska, an associate professor of electrical engineering and computer science at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and founding co-director of the new Data System and AI Lab (DSAIL). "In companies that have data scientists, there's a lot of back and forth between data scientists and nonexperts, so we can also bring them into one room to do analytics together."

VDS is based on an increasingly popular technique in artificial intelligence called automated machine learning (AutoML), which lets people with limited data-science know-how train AI models to make predictions based on their datasets. Currently, the tool leads the DARPA D3M Automatic Machine Learning competition, which selects the best-performing AutoML tool every six months.

Joining Kraska on the paper are first author Zeyuan Shang, a graduate student, and Emanuel Zgraggen, a postdoc and main contributor to Northstar, both of EECS, CSAIL, and DSAIL; Benedetto Buratti, Yeounoh Chung, Philipp Eichmann, and Eli Upfal, all of Brown; and Carsten Binnig, who recently moved from Brown to the Technical University of Darmstadt in Germany.

An "unbounded canvas" for analytics

The new work builds on years of collaboration on Northstar between researchers at MIT and Brown. Over four years, the researchers have published numerous papers detailing components of Northstar, including the interactive interface, operations on multiple platforms, accelerating results, and studies on user behavior.

Northstar starts as a blank, white interface. Users upload datasets into the system, which appear in a "datasets" box on the left. Any data labels will automatically populate a separate "attributes" box below. There's also an "operators" box that contains various algorithms, as well as the new AutoML tool. All data are stored and analyzed in the cloud.

The researchers like to demonstrate the system on a public dataset that contains information on intensive care unit patients. Consider medical researchers who want to examine co-occurrences of certain diseases in certain age groups. They drag a pattern-checking algorithm into the middle of the interface, where it first appears as a blank box. As input, they move disease features labeled, say, "blood," "infectious," and "metabolic" into the box, and the percentages of those diseases in the dataset appear in it. Then they drag the "age" feature into the interface, which displays a bar chart of the patients' age distribution. Drawing a line between the two boxes links them together. When the researchers circle an age range, the algorithm immediately computes the co-occurrence of the three diseases within that range.
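To make the linked-box query concrete, here is a minimal Python sketch of the kind of computation it describes: the share of all patients with each disease, and the share of patients within a selected age range who have all of the chosen diseases. It is not Northstar's code; the `Patient` record, the function names, and the half-open age range are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Patient:
    # Hypothetical record format: an age plus a set of disease-category labels.
    age: int
    diseases: FrozenSet[str]


def prevalence(patients: List[Patient], diseases: Iterable[str]) -> Dict[str, float]:
    """Fraction of all patients with each disease (the percentages shown in the box)."""
    if not patients:
        return {d: 0.0 for d in diseases}
    return {d: sum(d in p.diseases for p in patients) / len(patients) for d in diseases}


def co_occurrence_rate(patients: List[Patient], diseases: Iterable[str],
                       age_lo: int, age_hi: int) -> float:
    """Fraction of patients aged in [age_lo, age_hi) who have every listed disease."""
    wanted = set(diseases)
    selected = [p for p in patients if age_lo <= p.age < age_hi]  # the circled range
    if not selected:
        return 0.0
    return sum(wanted <= p.diseases for p in selected) / len(selected)
```

In this framing, linking the two boxes just means filtering on the age selection before counting, so circling a new range only changes `age_lo` and `age_hi`.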

"It's like a big, unbounded canvas where you can lay out how you want everything," says Zgraggen, who is the key inventor of Northstar's interactive interface. "Then, you can link things together to create more complex questions about your data."

Approximating AutoML

With VDS, users can now also run predictive analytics on that data by getting models custom-fit to their tasks, such as data prediction, image classification, or analyzing complex graph structures.

Using the above example, say the medical researchers want to predict which patients may have blood disease based on all features in the dataset. They drag and drop "AutoML" from the list of algorithms. It first produces a blank box with a "target" tab, under which they drop the "blood" feature. The system then automatically searches for the best-performing machine-learning pipelines, presented as tabs with constantly updated accuracy percentages. Users can stop the process at any time, refine the search, and examine each model's error rates, structure, computations, and other details.
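The article doesn't specify how VDS schedules its search, but the "stop at any time" behavior it describes is commonly built as an anytime loop. The Python sketch below assumes each candidate pipeline is a name paired with a function that trains it and returns a validation accuracy; both the names and that interface are hypothetical. A generator yields the updated leaderboard after every evaluation, so the caller stops the search simply by not requesting the next update.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical interface: a pipeline name plus a function that trains the
# pipeline and returns its validation accuracy in [0, 1].
Candidate = Tuple[str, Callable[[], float]]


def anytime_search(candidates: List[Candidate]) -> Iterator[List[Tuple[str, float]]]:
    """Evaluate candidates one at a time, yielding a best-first leaderboard after each.

    Because this is a generator, the caller can stop the search at any point
    (e.g. when the user presses "stop") just by not asking for the next update.
    """
    scores: Dict[str, float] = {}
    for name, evaluate in candidates:
        scores[name] = evaluate()
        yield sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    demo = [("logistic", lambda: 0.81), ("tree", lambda: 0.77), ("boosting", lambda: 0.86)]
    for board in anytime_search(demo):
        best_name, best_acc = board[0]
        print(f"best so far: {best_name} ({best_acc:.0%})")
```

Making the search lazy keeps the interface responsive: each yielded leaderboard corresponds to one refresh of the accuracy tabs.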

According to the researchers, VDS is the fastest interactive AutoML tool to date, thanks in part to their custom "estimation engine," which sits between the interface and the cloud storage. The engine automatically creates several representative samples of a dataset that can be processed progressively to produce high-quality results in seconds.
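The estimation engine's internals aren't described here. As a generic illustration of progressive sampling, and not VDS's actual engine, the Python sketch below shuffles a column once and then reports a running mean and its standard error over growing prefixes, each of which is a uniform random sample. The function name and batch sizes are assumptions for illustration.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def progressive_mean(values: Sequence[float], batch_sizes: List[int],
                     seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Yield (n, mean, standard_error) over increasingly large random samples.

    The data are shuffled once, so every prefix is a uniform random sample, and
    running sums let each larger estimate reuse the work done for smaller ones.
    """
    order = list(values)
    random.Random(seed).shuffle(order)
    n, total, total_sq = 0, 0.0, 0.0
    for target in sorted(batch_sizes):
        target = min(target, len(order))
        while n < target:  # fold in only the newly sampled rows
            x = order[n]
            total += x
            total_sq += x * x
            n += 1
        if n == 0:
            continue
        mean = total / n
        # Unbiased sample variance; defined as 0 when only one point has been seen.
        var = (total_sq - n * mean * mean) / (n - 1) if n > 1 else 0.0
        yield n, mean, math.sqrt(max(var, 0.0) / n)
        if n == len(order):
            break  # the full dataset has been processed, so this mean is exact
```

Because the standard error shrinks roughly with the square root of the sample size, an estimate from a modest sample is often already close to the full-data answer, which is the intuition behind showing approximate results first.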

"Together with my co-authors, I spent two years designing VDS to mimic how a data scientist thinks," Shang says, meaning it instantly identifies which models and preprocessing steps it should or shouldn't run on certain tasks, based on various encoded rules. It first chooses from a large list of possible machine-learning pipelines and runs simulations on the sample set. In doing so, it remembers results and refines its selection. After delivering fast approximated results, the system refines the results in the back end, but the final numbers are usually very close to the first approximation.
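Here is a rough Python sketch of the two ideas in that description: encoded rules that prune candidate pipelines, and remembered results that guide refinement. The rules, pipeline names, task dictionary, and the `evaluate(pipeline, sample_size)` callback are all hypothetical, and the refinement step is a successive-halving-style scheme chosen for illustration; the article does not say VDS uses it.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Pipeline = Tuple[str, str]  # (preprocessing step, model); names are hypothetical
Task = dict  # hypothetical task description, e.g. {"kind": "classification", "has_text": False}

# Hypothetical encoded rules: each returns False when a pipeline should be skipped.
RULES: List[Callable[[Pipeline, Task], bool]] = [
    # Text vectorization is pointless without text columns.
    lambda p, task: p[0] != "text_vectorize" or task["has_text"],
    # Use regressors only for regression targets, and classifiers otherwise.
    lambda p, task: p[1].endswith("_regressor") == (task["kind"] == "regression"),
]


def candidate_pipelines(preprocessors: List[str], models: List[str], task: Task) -> List[Pipeline]:
    """All (preprocessor, model) pairs that pass every rule."""
    return [p for p in product(preprocessors, models) if all(r(p, task) for r in RULES)]


def refine(pipelines: List[Pipeline], evaluate: Callable[[Pipeline, int], float],
           sample_sizes: List[int], keep: float = 0.5,
           cache: Optional[Dict[Tuple[Pipeline, int], float]] = None):
    """Score on small samples first, keep the best fraction, re-score on larger ones.

    `cache` remembers every (pipeline, sample size) score, so a later call that
    reuses it (e.g. after the user refines the search) skips repeated work.
    """
    cache = {} if cache is None else cache
    survivors = list(pipelines)
    for i, size in enumerate(sample_sizes):
        for p in survivors:
            if (p, size) not in cache:
                cache[(p, size)] = evaluate(p, size)
        survivors.sort(key=lambda p: cache[(p, size)], reverse=True)
        if i < len(sample_sizes) - 1:  # prune between rounds, not after the last
            survivors = survivors[: max(1, int(len(survivors) * keep))]
    return survivors, cache
```

Cheap scores on small samples decide which pipelines earn a more expensive look, which mirrors the "fast approximation first, refine in the back end" pattern described above.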

"For using a predictor, you don't want to wait four hours to get your first results back. You want to already see what's going on and, if you detect a mistake, you can immediately correct it. That's normally not possible in any other system," Kraska says. In fact, the researchers' previous user study found that "the moment you delay giving users results, they start to lose engagement with the system."

The researchers evaluated the tool on 300 real-world datasets. Compared with other state-of-the-art AutoML systems, VDS's approximations were just as accurate but were generated within seconds, whereas other tools take minutes to hours.

Next, the researchers are looking to add a feature that alerts users to potential data bias or errors. For instance, to protect patient privacy, researchers sometimes label patients in medical datasets as aged 0 (if the age is unknown) or 200 (if the patient is over 95 years old). Novices may not recognize such placeholder values, which could completely throw off their analytics.
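The warning feature is described only as planned work. One simple way to flag values like 0 or 200 in an age column is Tukey's fences, a standard outlier rule swapped in here for illustration; the same out-of-range value repeated many times is a hint that it is a placeholder code. The Python sketch below uses only the standard library, and its function names and message wording are assumptions.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, List, Sequence


def suspicious_values(values: Sequence[float], k: float = 1.5) -> Dict[float, int]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR], with their counts."""
    if len(values) < 4:
        return {}  # too few points for meaningful quartiles
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return dict(Counter(v for v in values if v < lo or v > hi))


def outlier_warnings(column: str, values: Sequence[float]) -> List[str]:
    """Human-readable warnings; a value repeated many times may be a placeholder code."""
    return [
        f"{column}: {value!r} appears {count} time(s) far outside the typical range; "
        "it may be a placeholder or an error."
        for value, count in sorted(suspicious_values(values).items())
    ]
```

A rule like this only says a value is unusual, not why, so it fits the kind of gentle "there may be a problem" warning Kraska describes rather than automatic correction.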

"If you're a new user, you may get results and think they're great," Kraska says. "But we can warn people that there, in fact, may be some outliers in the dataset that may indicate a problem."

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Drag-and-drop data analytics (2019, June 27) retrieved 23 April 2024 from https://techxplore.com/news/2019-06-drag-and-drop-analytics.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
