Robert McCullough. Image: Liberty IT

From limitations to hallucinations: The hardest parts of working in AI

24 Oct 2023

Liberty IT’s Robert McCullough discusses working with generative AI and the difficulties of assessing potential versus hype in the space.

Click to read more stories from Deep Tech Week.

When asked about what sets generative AI apart from other emerging technologies, Robert McCullough points to the accessibility of the tech.

“As programmers, we write code as instructions to create outcomes. In generative AI, anyone who can ask a question can get up and get going with it.

“I was chatting to a colleague the other day whose eight-year-old was using generative AI to create all manner of things and that’s really calling home the intuitiveness of it.”

However, McCullough adds that this is a “double-edged sword” as, just like in real life, “how you ask a question can influence the outcome greatly”.

McCullough is a senior solutions architect at Liberty IT, where he has worked on a variety of projects for the last 15 years.

‘Sometimes we don’t hear about the problems from our customers unless they think they are solvable’

If there is such a thing, can you describe a typical day in the job?

I work in the Incubator, a space whose primary purpose is to explore and exploit emerging technologies to deliver business value, so there's a lot of working in areas with a huge amount of unknowns. I work with several engineering teams and within a larger architecture team. Given the nature of the work, it helps to be comfortable being uncomfortable.

After dropping off the kids, I spend a few minutes glancing at my inbox and Teams messages and prioritising anything that needs immediate attention and setting reminders or follow-ups for the less urgent ones. As generative AI is the current focus, I spend about 30 minutes reading what has changed overnight, as the pace of change is so rapid it seems like every day there is some announcement or advancement in the field.

As an architecture team, we have a daily stand-up meeting which is always really interesting, as there’s a diversity of work and challenges and I find it refreshing to talk about other problems or get insights from others into mine.

I'm a firm believer that architects shouldn't live in some kind of ivory tower making edicts, but should be grounded in reality and spend a lot of time with the people actually implementing and/or being impacted by the changes, so I would check in with some members of the teams and see if I can help them at all. I will say that often I end up learning as much, if not more, from the team than they do from me! Currently we have a small but talented team working on some generative AI-related projects, scaling to meet demand from users across the globe.

The afternoon varies for me, as it's usually dictated by whatever current generative AI challenge is affecting users or our use cases, so it really could be anything. As Liberty IT is a subsidiary of Liberty Mutual, this is also when our collaboration time with our colleagues in the US happens, as making the most of the overlap in time zones for video calls is crucial.

What are the hardest parts of working with generative AI technology, and how do you navigate them?

This is really multifaceted.

A new generative AI model or solution is released on a daily basis, so it’s challenging to keep up with the pace of change. For this, it really comes down to being able to quickly spot potential in something new without spending weeks or months exploring it. Prioritisation is key.

Building a governance process that’s robust and ensures that we use this technology appropriately without being overly burdensome and slow. Our responsible AI process has gone through a number of changes since its inception and we’ve found that having dialogue between everyone impacted really helps ensure that the process works for everyone.

Regional limitations on model availability can make it challenging to scale. We've found great success in working closely with cloud partners to validate our approaches, as well as maintaining a healthy dialogue and an environment where we can get questions answered by the appropriate experts in the space.

Cross-functional requirements such as scalability, cost and latency really come into play here. GPT-4 has a lot of buzz, for instance, but it's also 18 times more expensive than its GPT-3.5 Turbo counterpart, so it comes down to understanding use-case-specific performance versus value.
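To make that trade-off concrete, the cost gap between models can be estimated with simple arithmetic on token counts. The sketch below is illustrative only: the per-1,000-token prices, model names and the `estimate_cost` helper are assumptions for the example, not current vendor pricing.

```python
# Illustrative cost comparison for token-based pricing.
# The rates below are assumed example figures, not real vendor rates.
PRICE_PER_1K_TOKENS = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request for the given model."""
    rates = PRICE_PER_1K_TOKENS[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# Projected monthly cost for 1m requests of 500 input / 500 output tokens each
gpt4_total = estimate_cost("gpt-4", 500, 500) * 1_000_000
gpt35_total = estimate_cost("gpt-3.5-turbo", 500, 500) * 1_000_000
print(f"GPT-4: ${gpt4_total:,.0f}  GPT-3.5 Turbo: ${gpt35_total:,.0f}")
```

Running numbers like these per use case makes it easier to judge whether a more capable model's quality gain actually justifies its price at scale.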

Content filters and guardrails. These are still somewhat emerging, but it seems each cloud vendor has their own implementation of filters that run on prompts and responses and block based on rules that are generally configurable. However, they all work slightly differently, so we're currently researching strategies for dealing with the impact of content filters.

Hallucinations. Generative AI has had some well-publicised snafus (such as Google Bard). Guardrails may go some way towards solving this, but the technology can be very convincing even when it's wrong. Solving this starts with education, but techniques in prompt engineering (how the questions are crafted) and future automation will likely lead to safer outcomes going forward.

‘We need to assess AI value and potential versus hype’

What skills and tools are you using to communicate daily with your colleagues?

In general, I think it comes down to really putting yourself in the shoes of others and not treating everyone as if they were yourself or some archetype. I tend to chat to people, build up an idea of what works for them or what their situation is, and communicate in the way that suits them. There are people who prefer in-person communication, while others are happy with a video call. I'm also keen for people not to be intimidated by grades or job titles if they have questions or opinions, so I tend to solicit opinions when we're making big decisions.

Visuals go a long way and sometimes a complex concept can be easily communicated in a visual rather than a load of prose in an email. I also find it helpful to do a collaborative mindmap when starting something.

With the speed of evolution in the area of generative AI, how has the nature of your work changed and how have you adapted?

When you’ve been around the emerging technology space you realise that in technology, sometimes we don’t hear the problems from our customers unless they think they are solvable. So it’s been really eye-opening to see some things raised which were thought of as unsolvable which could have been solved by existing technologies. After being involved in blockchain in the past, I’ve seen something that works being overhyped and oversold and applied to scenarios where it doesn’t really make sense, so I take those lessons into generative AI.

As a consumer of the technology, I would use it for things like getting starter code for a particular problem and for wordsmithing text. I've had mixed results in both, with some hallucinations, so not blindly trusting anything produced and giving it a critical review afterwards is essential. The need to craft prompts in a specific way to ensure a good answer has really been impressed upon me.

Essentially, change is inevitable and we need to build assessment processes that allow us to rapidly weigh value and potential against hype, and be prepared to pivot to take advantage of a breakthrough.

What advice would you give to someone who wants to work in generative AI?

Read the docs multiple times and ask the experts! I’m seeing vendor docs change multiple times a day at points.

Read up on prompt engineering and the art of asking a succinct prompt.

Be aware that this is still experimental. Leading models today can be overtaken in the future, so thinking about architecture that would allow different models to be replaced is a prudent step.

Models have versions and performance can degrade over time (just like any model) so really take time to think of how you could ensure that adequate monitoring is put in place.

Don't omit due diligence on cross-functional requirements. Don't assume it will meet every use case's scaling, latency and cost demands.
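The advice above about architectures that allow models to be swapped out can be sketched as a thin abstraction layer. Everything here is a hypothetical illustration of the pattern (the `TextModel` protocol, the `EchoModel` stand-in and the `Application` class are invented for the example), not a description of any real implementation:

```python
from typing import Protocol

class TextModel(Protocol):
    """Minimal interface any model provider must satisfy."""
    def generate(self, prompt: str) -> str: ...

class EchoModel:
    """Stand-in provider used for local testing; a real provider
    would call a vendor API behind the same method."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Application:
    """Application code depends only on the TextModel interface,
    so replacing a provider is a one-line change at construction time."""
    def __init__(self, model: TextModel) -> None:
        self.model = model

    def summarise(self, text: str) -> str:
        return self.model.generate(f"Summarise: {text}")

app = Application(EchoModel())
print(app.summarise("generative AI"))  # → echo: Summarise: generative AI
```

Because today's leading model may be overtaken tomorrow, keeping provider-specific code behind one narrow interface like this keeps the cost of switching low.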
