Solving for Subjectivity in Machine Learning

If you want to use computers as intelligent assistants, don’t teach them bad habits. Ghassan AlRegib, a leading expert on computing for the oil and gas industry, said that’s a persistent problem as the industry struggles with concepts like machine learning and artificial intelligence.

On top of that, he warned, the industry needs to get rid of some unhelpful data habits of its own.

AlRegib is a professor in the School of Electrical and Computer Engineering at Georgia Institute of Technology in Atlanta, Ga. He’s also head of Georgia Tech’s Omni Lab for Intelligent Visual Engineering and Science.

He will be a featured speaker in the special session “Intelligent Prospecting: Unlocking Hydrocarbon Potential Using AI and Big Data” at GEO 2020, the 14th Middle East Geosciences Conference and Exhibition in Bahrain this month.

The industry’s computing efforts in the Middle East reflect its challenges in other parts of the world, especially in regard to machine learning, according to AlRegib.

A Third Path

Computers can learn to generate outputs from data through supervised learning or unsupervised learning. In supervised learning, the simpler method, the machine processes labeled training data and proceeds according to specified examples.

Unsupervised learning does not rely on pre-existing labels or directed outputs. A common example is cluster analysis, a statistical method for discovering patterns in data.
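The cluster-analysis idea can be sketched in a few lines. Below is a minimal, purely illustrative k-means on one-dimensional data: the algorithm groups unlabeled points by proximity alone, with no expert-provided labels — the data values and starting centroids are invented for the example.

```python
def kmeans_1d(points, centroids, iters=10):
    """Simple 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(m) / len(m) if m else centroids[c]
                     for c, m in clusters.items()]
    return sorted(centroids)

# Two obvious groups, around 1.0 and 10.0 -- the algorithm finds them
# without ever being told which point belongs where.
data = [0.9, 1.1, 1.0, 9.8, 10.2, 10.0]
print(kmeans_1d(data, centroids=[0.0, 5.0]))  # centroids near 1.0 and 10.0
```

The point of the sketch is the contrast with supervised learning: no labeled training examples are consumed, yet structure emerges from the data itself.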

“What’s happening in the industry right now is fully supervised learning, which I think is not the way to go,” AlRegib said.

One problem stems from supervised learning’s reliance on good and consistent data labeling.

“First, you need access to experts on whom you can rely to label the data. And we’ve tried this,” AlRegib explained.

“The second thing is, and I think this is true in geophysics as much as medicine, if you get two of these experts in the same room, they will disagree,” he said.

For effective learning, computers need accurate labeling and a highly objective view of the world. But human beings – even experts – are subjective in interpretation. Experts make mistakes, conflict with one another and interpret according to their own subjective views.
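That disagreement can be quantified before any training starts. A standard statistic is Cohen's kappa, which measures agreement between two labelers beyond what chance alone would produce. The sketch below uses hypothetical fault/no-fault labels from two interpreters — the labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is agreement expected from random labeling."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two interpreters on ten seismic traces.
expert_1 = ["fault", "fault", "none", "none", "fault",
            "none", "none", "fault", "none", "none"]
expert_2 = ["fault", "none", "none", "none", "fault",
            "none", "fault", "fault", "none", "none"]
print(round(cohens_kappa(expert_1, expert_2), 2))  # 0.58: only moderate agreement
```

Two interpreters who agree on 8 of 10 traces still score well below 1.0 once chance agreement is discounted — exactly the subjectivity problem that makes fully supervised labels unreliable.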

AlRegib said there’s a third path in machine learning: semi-supervised or weakly-supervised learning. That hybrid approach usually incorporates a small amount of labeled data with a larger amount of data available for statistical analysis.

“I think it’s extremely important for this industry to look into weakly supervised learning,” he said.
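One common semi-supervised recipe is self-training: fit a model on the few labeled points, then pseudo-label unlabeled points the model is confident about. The sketch below is a toy nearest-centroid version on one-dimensional data — the "sand"/"shale" classes, feature values, and confidence margin are all invented for illustration.

```python
def centroid_of(points):
    return sum(points) / len(points)

def self_train(labeled, unlabeled, margin=2.0):
    """labeled: dict class -> list of points; unlabeled: list of points.
    Adopt an unlabeled point only when it is at least `margin` times
    closer to one class centroid than to the other."""
    classes = list(labeled)
    for x in unlabeled:
        d = {c: abs(x - centroid_of(labeled[c])) for c in classes}
        near, far = sorted(classes, key=d.get)
        if d[near] == 0 or d[far] / max(d[near], 1e-12) >= margin:
            labeled[near].append(x)  # confident: adopt the pseudo-label
    return labeled

# Two labeled examples per class seed the model; the larger unlabeled
# pool refines the centroids. The ambiguous point 5.2 is left out.
seed = {"sand": [1.0, 1.2], "shale": [9.0, 9.4]}
pool = [0.8, 1.1, 1.4, 5.2, 8.8, 9.1, 9.6, 0.9]
model = self_train(seed, pool)
print(sorted(model["sand"]), sorted(model["shale"]))
```

The key behavior is the rejection of the ambiguous point: a small labeled seed steers the learning, but the model declines to label what it cannot confidently place — a compromise between expensive expert labeling and fully unsupervised clustering.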

Fortunately, AlRegib observed, in recent years unsupervised machine learning has moved well beyond basic approaches like k-means clustering.

“We’ve come a long way since then,” he said.

Integrating a New Paradigm

None of that overlooks just how new machine learning concepts are to the oil industry. AlRegib recalled the topic’s introduction at a European Association of Geoscientists and Engineers meeting a little less than three years ago.

“I can remember the first time we had a special session on machine learning at EAGE in Paris in 2017,” he said.

AlRegib himself has studied geophysical interpretation coupled with machine learning and AI since 2012, when he was named director of Georgia Tech’s Center for Energy and Geo Processing.

“We realized basically that there has been quite a bit of development in visualization over the past 20 years,” he said.

What struck him about visual seismic interpretation is how much the results depend on the individual interpreter.

“Talking to many experts, mainly in Houston, it’s very subjective,” he noted. “It’s a very fascinating combination of Big Data and human visualization.”

Studies found that seismic-interpretation results varied even according to the seniority or executive level of the interpreters, he said.

Other areas of current interest in computing for the oil industry are benchmarking, cloud computing and better data habits, according to AlRegib.

“There is no real benchmarking to let the community know, in terms of the type of data or this dataset, what works best,” he observed.

“Cloud computing is a biggie. I spoke with colleagues who worked at Schlumberger back in the 1970s, the ‘80s, and they had a form of cloud computing back then,” AlRegib said.

Early efforts tended to incorporate synthetic datasets with “nothing like real data, nothing like field data,” he said. Today, cloud computing offers meaningful advantages in storing, moving and manipulating the industry’s huge, captured datasets.

“At the end of the day, the data has to be stored somewhere. I think there is enough room, enough ways, to do migration much cheaper on the cloud,” AlRegib said.

In putting together those huge collections of information, the oil industry seems to have fallen in love with its stores of data, even when they include older legacy data or data accumulated from less-than-reliable sources. That kind of romance can be problematic, he noted.

At this point, “we need to let go of some of the data,” he said.

Over the years, the oil industry developed some unhelpful computing beliefs, such as “sharing data is taboo, benchmarking is not something we do,” AlRegib observed. The industry recently has started to adopt more flexible thinking, he said.

“For the past year and a half we’ve been having a discussion that’s very fruitful. There’s much less resistance than there was five years ago,” he noted.

Geoscientists still have an important role in fostering machine learning and AI for oil and gas prospecting, AlRegib said. He rejects what he called technological “hype” promoting the idea “that machine learning, that AI, can solve any problem. That’s wishful thinking.”

“We cannot rely on the computer to do all the work. That’s a big mistake. The experts can insert their expertise from the physics, from the models, into the learning,” he said.

Because he studies many types of computing and visualization, AlRegib can compare practices across several industries.

“I always draw analogies between medical imaging in health care and geophysics. Both of them came to machine learning a little later than consumer electronics and the like,” he noted.

Both have struggled with the idea of sharing data, and “they have both been in closed boxes with their own experts,” he said. A next step will be opening those closed-off boxes to additional expertise.

“The next phase will be for us training people to work on geophysics, but these people may have a background in machine learning, or data analytics or AI,” instead of geoscience, he predicted.

Newfound Optimism

In the GEO 2020 session, AlRegib will be joined by speakers Malleswar Yenugu, director of data analytics for IHS Markit, and Gabriel Guerra, general manager for Shell.

The conference, coordinated by AAPG, EAGE and the Society of Exploration Geophysicists, will include numerous presentations on computing applications for the oil industry, addressing topics from neural networks to fuzzy logic. That points up the importance of computing and analytics in the industry today.

“I want to talk about benchmarking. I want to talk about the paradigms of machine learning. And then, how all of this works together,” AlRegib said.

The oil industry’s recent shift to a more current and more flexible mentality toward computing applications has given him new hope, though he said the industry still has a long way to go.

“I’m more optimistic now than I would have been if you’d asked me two years ago,” he said.

The GEO 2020 conference and exhibition, previously scheduled for March 16-19 in Bahrain, has been postponed until Sept. 14-17, 2020. Having spoken to the authorities, the organizers are satisfied that the COVID-19 (coronavirus) outbreak has been contained and is under competent control, but they are also mindful that many within the industry currently face difficulties traveling around the world.
