5 big challenges around adopting AI and data analytics

6 Oct 2022


Data analytics can transform how a business operates, but only if used properly. Red Hat’s Steven Huels dives into some of the challenges around adopting analytics.

While there has been a massive uptick in machine learning operations (MLOps) to maximise the value of AI and data analytics models, companies still face many struggles when it comes to using this technology.

Red Hat’s Steven Huels is a leader in the data and AI space with more than 20 years of experience in the industry.

“My first job out of university was as a data warehouse consultant. At the time, organisations were focused on centralising all their data into data warehouses so analysts could easily access data and work with it to generate insights. From there I began getting involved in how to generate value out of data,” he told SiliconRepublic.com.

During his career, he said recommender systems for websites and content recommendations started to take off, which involved a major step-change in the frequency and quality of analytics models.

“This change saw teams have to move large volumes of data at rest to train models, which brought about many technical challenges I was involved in helping to solve.”

In recent years, Huels has been heavily involved in solving problems around deploying data analysis into production. He has also been involved in developing and maintaining OpenDataHub, an open-source meta-project that provides the infrastructure to run AI and data analysis as a service.

“Today, our team is now taking OpenDataHub to market as a commercial offering called Red Hat OpenShift Data Science.”

Huels outlined five main challenges that he sees companies face when it comes to deploying AI and data analytics.

Repeatability

The issue of repeatability and reproducibility is one of the biggest challenges for AI and data analytics in a production environment, according to Huels.

“You don’t have to just safeguard against statistical problems like concept drift, which could hinder repeatable results, but also accommodate changes in the infrastructure running the model in the first place,” he said.

“One of the reasons we developed OpenDataHub is to reduce the complexity and infrastructure work involved in putting a model into production, monitoring it, recycling it, and then running it over time. By reducing the work involved in setting up a reliable architecture and infrastructure for AI and data analysis, we can free up time for data scientists to focus on refining their models and interpreting their results.”
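For readers unfamiliar with concept drift, one simple way to watch for it is to compare a model's recent error rate with the error rate measured when it was first deployed. The sketch below is only an illustration of that idea and is not taken from OpenDataHub or from Huels; the function name, the tolerance and the sample values are all hypothetical.

```python
from statistics import mean


def drift_suspected(reference_errors, recent_errors, tolerance=0.05):
    """Return True when the recent mean error rate exceeds the reference
    mean error rate by more than `tolerance` (a hypothetical threshold)."""
    return mean(recent_errors) - mean(reference_errors) > tolerance


# Hypothetical data: 1 marks a wrong prediction, 0 a correct one.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # errors measured at deployment
recent = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]     # errors measured lately

if drift_suspected(reference, recent):
    print("Error rate has risen; the model may need retraining.")
```

A check like this only flags that something has changed. It does not say whether the cause is the data, the relationship the model learned, or the infrastructure underneath it.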

Speed of implementation

Being able to rapidly deploy new features and fixes can also be challenging. However, Huels said the open-source system Kubernetes can provide the means to speed this up.

“Kubernetes and cloud service architectures give teams more freedom to ramp up and down the resources allocated to an application in line with demand. Say you’re facing peak seasonal demand, such as a retailer’s website during the Christmas shopping season. Cloud and Kubernetes allow you to allocate computing and storage power in an afternoon, whereas on-prem capacity requires a hefty capital investment and weeks (or months) of set-up and integration,” he said.

Another example of this demand is in capital markets, which are currently seeing massive and frequent swings in market conditions. “This demands the frequent retraining of models in a matter of minutes to meet demand, which simply can’t be achieved without the scaling agility provided by cloud services and Kubernetes.”

Using the right data

A common misconception among companies looking to glean insights from analytics is that they should use as much data as possible.

“One of the big pitfalls I’ve seen from organisations is an attitude of ‘If we give a model enough data, it will give us an answer,’” said Huels. “That’s sadly not how it works.”

Instead, he advised starting simple with something that has a specific use case. “AI and data analysis requires specific, targeted questions to produce useful insights and outputs,” he said.

“A productive model requires a well-defined question, which in turn will require careful consideration as to what data you train a model with and what its production environment will look like.”

Deploying models at scale

While organisations have learned how to build models for production, Huels said deploying these models at scale is another big challenge confronting teams working in AI and data analysis.

“To respond to this, we’ve seen the rise of MLOps as a discipline in its own right, which helps tackle issues like explainability, monitoring and continuous deployment for models at scale.”

Managing the volume

Finally, even with a focus on specific data, a large volume of data is still needed to power AI and analysis models in the first place.

This leads to the challenge of how that amount of data is handled. “This is particularly pressing as you ramp up the size of AI models, given that most public cloud providers place substantial charges on customers for data ingress, egress and API calls,” said Huels.

“Optimising your data query and aggregation strategy, as well as your data flows and data governance, can have a significant impact on the value you are able to gain from your AI and machine learning efforts.”
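As a rough illustration of why aggregation strategy matters for cost, the sketch below compares a month of data egress charges for pulling raw records out of a cloud service with the charges for pulling pre-aggregated summaries instead. The price and the data volumes are hypothetical figures chosen for the arithmetic and do not reflect any real provider's rates.

```python
def monthly_egress_cost(gb_per_day, price_cents_per_gb, days=30):
    """Return the egress charge in dollars for one month of transfers."""
    return gb_per_day * price_cents_per_gb * days / 100


# Hypothetical figures, for illustration only.
PRICE_CENTS_PER_GB = 9   # assumed egress price
RAW_GB_PER_DAY = 500     # moving raw records out every day
SUMMARY_GB_PER_DAY = 5   # moving only aggregated results

raw_cost = monthly_egress_cost(RAW_GB_PER_DAY, PRICE_CENTS_PER_GB)
summary_cost = monthly_egress_cost(SUMMARY_GB_PER_DAY, PRICE_CENTS_PER_GB)

print(f"Raw transfer: ${raw_cost:.2f} per month")
print(f"Aggregated transfer: ${summary_cost:.2f} per month")
```

Under these assumed numbers, aggregating the data before it leaves the cloud cuts the transfer bill by a factor of 100. The real saving depends on how much the data can be reduced and on the provider's actual pricing.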


Jenny Darmody is the editor of Silicon Republic

editorial@siliconrepublic.com