How AI can benefit the development of cloud platforms

22 Feb 2024

Owen O'Brien. Image: Huawei

Huawei’s Owen O’Brien discusses the various advantages and challenges of implementing AI in cloud-native environments, from mitigating human error to the difficulties of blending tech.

In the past couple of weeks, Microsoft announced major investments into the AI and cloud infrastructure of two European countries – Germany and Spain – both worth billions.

With the announcement of these investments, as well as recent tech predictions for 2024, it’s clear that the focus on AI and cloud technology is not ceasing any time soon.

Many sectors, such as the telecoms industry, are expected to utilise these disruptive technologies in the near future. Speaking to SiliconRepublic.com last week, IP Telecom CEO and co-founder Shena Brien predicted that AI will “augment” what telecoms companies can provide for customers.

To find out more about the ways that AI and cloud tech are being integrated into telecoms, we spoke to Owen O’Brien, chief cloud architect with the Smart Networks Innovation Lab at the Huawei Ireland Research Centre.

Watch…

The Smart Networks Innovation Lab aims to solve the complex challenges of achieving autonomous driving networks (ADN). O’Brien’s team is doing its part to aid this objective by working to deliver a fully autonomous cloud platform that can detect faults, self-heal and self-optimise to ensure high availability and reliability.

According to O’Brien, the autonomous cloud platform is “critical to achieving highly reliable and available ADN solutions”. His team’s work involves leveraging AI capabilities such as fault detection and prediction, root cause analysis, intelligent tracing, and self-healing.

When it comes to the specific benefits that AI and large language models (LLMs) offer to cloud-native environments, O’Brien lists a few examples, including AI’s helpful management of “overwhelming” information in relation to observability (the ability to measure a system’s condition through its data output).

“AI algorithms can be used for fault detection and prediction, and correlation of multiple data sources can help to avoid false alarms,” he says. “This can further help the development of autonomous capabilities by triggering remediation actions on detection of issues.

“Couple this with LLMs and you now have the ability to interact with systems through natural language by expressing intentions or objectives which will be carried out in a fully automated capacity.”
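To make the idea of correlating multiple data sources concrete, here is a minimal, hypothetical sketch (not Huawei’s implementation) in which an alarm fires only when a majority of independent telemetry signals agree that something is anomalous, suppressing single-metric false positives. The metrics, thresholds and z-score test are all illustrative assumptions.

```python
from statistics import mean, stdev

def is_anomalous(series, value, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations
    away from its historical series (a simple z-score test)."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return False
    return abs(value - mu) / sigma > threshold

def correlated_alarm(signals):
    """Raise an alarm only when a majority of independent data
    sources agree, avoiding single-metric false alarms."""
    votes = sum(is_anomalous(history, latest) for history, latest in signals)
    return votes >= (len(signals) // 2 + 1)

# Hypothetical telemetry: CPU %, p99 latency (ms), error rate (%)
cpu = ([40, 42, 41, 43, 39], 95)             # anomalous spike
latency = ([120, 118, 122, 119, 121], 480)   # anomalous spike
errors = ([0.1, 0.2, 0.1, 0.15, 0.1], 0.12)  # within normal range

print(correlated_alarm([cpu, latency, errors]))  # two of three agree -> True
```

In a real pipeline the detection step would be a trained model rather than a z-score, but the design choice is the same: remediation is triggered by agreement across sources, not by any one noisy metric.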

O’Brien says that AI and LLMs are becoming an integral part of observability, as AI can provide “capabilities to enrich and enhance observability by bringing the most important operational and business KPIs into focus”, with LLMs adding to these capabilities by “enabling operators to interact using natural language to query and analyse data”.

But while there are many advantages to the integration of AI in cloud observability, O’Brien also points out some of the challenges of implementing the tech, such as blending AI with other technologies like extended Berkeley Packet Filter (eBPF), a non-intrusive technology that can benefit observability.

“eBPF is a huge enabler for observability but due to the constraints such as minimal instruction sets and restrictions imposed by the verifier, it makes it extremely difficult to consider applying intelligent capabilities or techniques at the point of data acquisition,” says O’Brien.

“It’s not something that has been solved yet, but there is positive progress in academic research where the implementation of MLP [multilayer perceptron] algorithms are being developed to perform intelligent analysis within eBPF programs to achieve analysis and decision-making at the data source.”
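As a rough illustration of what running an MLP at the data source implies, the sketch below models the sort of constraints an eBPF program faces: integer-only arithmetic (no floating point), fixed small weights and strictly bounded work per sample. It is written in Python for readability; the network shape, weights and “suspicious/normal” labels are invented for the example, and a real in-kernel version would be restricted C compiled to BPF.

```python
SCALE = 256  # fixed-point scale: 1.0 is represented as 256

# Hypothetical 2-2-1 network quantised to integers (illustrative weights)
W1 = [[300, -200], [-200, 300]]
B1 = [-2, -2]
W2 = [200, -200]
B2 = 0

def relu(x):
    return x if x > 0 else 0

def mlp_infer(features):
    """Classify one telemetry sample (e.g. packet size, inter-arrival
    time) with a tiny integer-only MLP: 1 = suspicious, 0 = normal.
    All loops are over fixed-size weight arrays, so the work per
    sample is bounded - a precondition for passing the BPF verifier."""
    hidden = [
        relu(sum(w * f for w, f in zip(row, features)) // SCALE + b)
        for row, b in zip(W1, B1)
    ]
    out = sum(w * h for w, h in zip(W2, hidden)) // SCALE + B2
    return 1 if out > 0 else 0

print(mlp_infer([10, 3]))  # -> 1
print(mlp_infer([2, 9]))   # -> 0
```

The point of the sketch is the shape of the problem, not the model: everything the verifier dislikes (floats, unbounded loops, large state) has to be designed out before inference can happen at the point of data acquisition.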

…and learn

Aside from observability, O’Brien points to other ways AI can benefit the cloud-native environment, such as the mitigation and prevention of service failures.

“[AI and LLMs] can enable the ability for services and the platforms they are deployed on to reach full autonomy – that is the ability to detect, predict, mitigate, root cause and repair issues, thereby maintaining service level agreements,” he says.

“In addition, with the ability to constantly build and reinforce knowledge, continuous service improvement can be achieved. This can lead to real competitive advantages by being able to offer better reliability and availability to customers.”

As well as this, O’Brien says that AI can mitigate human error (which he says is the root cause of the majority of major service outages) through its ability to “allow objectives or intents to be specified”, such as optimisation for energy efficiency, cost management, resource utilisation or “specific customer scenarios which cannot be expressed easily as a set of rules when the parameters are dynamic”.
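A toy example of that distinction between intents and fixed rules: below, a stated objective is mapped to weights over competing metrics, and the same placement logic serves every intent. The intent names, weights and node data are all hypothetical, chosen only to show how one mechanism can pick different answers for different objectives without a hand-written rule per scenario.

```python
# Hypothetical intent-to-weight mapping: negative weights penalise a metric
INTENTS = {
    "energy_efficiency": {"power_w": -1.0, "cost": -0.1, "utilisation": 0.2},
    "cost_management":   {"power_w": -0.1, "cost": -1.0, "utilisation": 0.2},
}

def choose_placement(candidates, intent):
    """Pick the candidate node that best satisfies the stated intent,
    by scoring each node against the intent's metric weights."""
    weights = INTENTS[intent]
    return max(candidates, key=lambda c: sum(weights[k] * c[k] for k in weights))

nodes = [
    {"name": "node-a", "power_w": 300, "cost": 5.0, "utilisation": 0.9},
    {"name": "node-b", "power_w": 180, "cost": 20.0, "utilisation": 0.6},
]

print(choose_placement(nodes, "energy_efficiency")["name"])  # node-b
print(choose_placement(nodes, "cost_management")["name"])    # node-a
```

With LLMs in front of such a mechanism, the weights themselves could be derived from a natural-language objective rather than configured by hand, which is the interaction model O’Brien describes.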

Despite the many possibilities and advantages offered by AI to cloud-native tech, O’Brien says there is a challenge. “If you need an on-premise solution, as is the case with most telco solutions, there is a catch; the footprint and cost required to run LLMs makes it almost prohibitive to use in the context of system operations – it conflicts with the normal operational requirements of low resource consumption and minimal impact on products or services.”

Still, O’Brien looks towards the future. “The next advancement will be a collaborative hierarchy of autonomous agents or specialised compact models at various levels, interacting machine-natively, and at the top of the hierarchy is the human-machine interaction through natural language.”


Colin Ryan is a copywriter/copyeditor at Silicon Republic

editorial@siliconrepublic.com