Podcast: How (agentic) AI can help with unstructured data

In this podcast, we talk to Boris Bialek, vice-president and field chief technology officer (CTO) at MongoDB, about how artificial intelligence (AI) can help with discovery and management of unstructured data.
Bialek sets out how AI can help bring together different classes of information that an organisation might hold about customers to make processes much quicker and more efficient.
He also talks about how multiple AI agents can operate together to make these processes work in an agentic fashion.
How can AI help with discovery and management of unstructured data?
The recovery and identification of unstructured data is one of the oldest tasks in IT.
It started with scanning papers and trying to make pictures out of them, and then people actually typed the stuff out. Imagine you get a handwritten document about an accident description and you try to make sense of it. Today, AI can do that for you in zero time.
And beyond that, it can understand and reason about the data. It can lift the intellectual level from “I have a picture” to “I have a text and I can extract sentences which consist of ‘accident’, ‘bicycle’, ‘street’ and ‘the mountain was steeper than I thought’”.
So, this is where AI really can help. It can be pictures, it can be text, it can be sound.
The classic database model, the RDBMS [relational database management system] from the 1970s, is great for structured data. But so-called structured data mostly means text and numbers, anything with a structure that we can put in a spreadsheet. Anything else is considered unstructured, which is a little bit unfair.
What we’re doing now with AI is lifting this data to the next level and being able to interpret it in a sensible way.
What approaches in the use of AI to discover and manage unstructured data exist for customers?
If you ask any startup, they will tell you they’re the only answer for that one.
But when we take a more intelligent view, there are two major ways. One is to look at what kind of data you have and build a solution around it. And most important is the combination of fresh data, where I get unstructured data – video, sound, things like that – and put it into context with other known information.
For example, Boris has an insurance number, and Boris has a contract with Antony’s insurance company. So, those kinds of mashups between, for example, operational data, metadata and reference data, together with what we call “signals”, is the first approach to bring these things together.
But the other option is to ask how we do this more intelligently and break it up into a horses-for-courses approach: the best horse for the best racetrack.
There are solutions here. One is EncoreCloudAI, or PurpleFabricAI from a different vendor.
Those solutions allow us to put the data into an intelligent form, so I don’t need to start from scratch. So, I can get my data, bring it into an operational data store, get my legacy data out, and lift data from there, which could be, for example, documents, physical papers. These could be in legacy document archives or document management systems.
That, in my opinion, is the fastest way to do it.
That said, there are enough good reasons to build your own, in many cases because you have specific needs, such as video information that you need to process in a very specific form. For example, somebody drives through a toll gate on a highway and you want to make sure they pay the toll.
There are specific cases where writing your own code makes a lot of sense. But it’s all about getting the data together from existing data and the new data, the unstructured data.
That’s really what makes intelligence work.
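As a rough illustration of the first approach described above, putting a fresh signal into context with data the organisation already holds, here is a minimal Python sketch. The customer and policy records, the `Signal` class and the `enrich` function are all hypothetical names invented for this example, and the keyword match stands in for whatever extraction an AI model would actually do.

```python
from dataclasses import dataclass

# Hypothetical reference data: what the insurer already knows.
CUSTOMERS = {"C-1001": {"name": "Boris", "policies": ["P-77"]}}
POLICIES = {"P-77": {"covers": "bicycle", "active": True}}


@dataclass
class Signal:
    """A fresh piece of unstructured input, already turned into text."""
    customer_id: str
    text: str


def enrich(signal: Signal) -> dict:
    """Put a fresh signal into context with known operational data."""
    customer = CUSTOMERS.get(signal.customer_id)
    if customer is None:
        return {"customer": None, "matching_policies": [], "text": signal.text}

    # Naive tokenisation; a real system would use a language model here.
    words = set(signal.text.lower().replace(".", " ").replace(",", " ").split())

    # Keep only active policies whose insured item is mentioned in the text.
    matching = [
        pid
        for pid in customer["policies"]
        if POLICIES[pid]["active"] and POLICIES[pid]["covers"] in words
    ]
    return {
        "customer": customer["name"],
        "matching_policies": matching,
        "text": signal.text,
    }


result = enrich(Signal("C-1001", "My bicycle was hit by a tractor."))
print(result)
```

The point of the sketch is only the join: the unstructured text becomes useful once it is linked to the contract and customer records that already exist.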
What are the key benefits of applying these types of techniques to the data?
The key benefits are that I can build a completely different picture of my environment. With a classical relational database, such as the one behind an ERP [enterprise resource planning] system that holds your sales numbers, you know how much you sell.
You might have a CRM [customer relationship management] system and it tells you, “Boris is a great client” and “Boris is on my website right now”. But what does Boris really want? I could do the classical approach of a BI [business intelligence] system and say, “Boris falls into the category of white male, middle-aged person, and maybe he is looking for a new bicycle. Let’s offer him a bicycle.”
But that’s not really what you could potentially know about Boris. Boris may have bought a bicycle from you last week and is maybe now looking for a new helmet.
So, when you bring these things together, you want to drive more intelligence towards your consumers in the retail space, in the positive sense that you want to be relevant and you want to help them. You don’t want them to say, “Why is he showing me this stuff? I’m not interested in this.”
Also, let’s say we have an insurance case: somebody bumped my bicycle while it was parked in front of the house, and now I have a repair case. So, I go to my insurance. If the insurance is able to make sense out of the information I provide very quickly, they can have a very quick turnaround in claims management.
And if they do that, it helps me to be a happy client and not be concerned that my bicycle was damaged, who pays for it, etc. Now I get an answer an hour later: “Yes, the bicycle is insured. We will fix this, don’t worry.”
So, these are the reasoning parts which were not possible before. You could not put so much data into context.
Secondly, there is natural language processing. Boris can talk to the insurance company and say, “My bike got damaged. My bike was parked in front of the door. It got hit by a tractor.”
At that point, the system can already interpret that as “bike, bicycle – he has a bicycle, it’s insured, he’s probably talking about his house door”.
That’s reasoning, so it can assume a lot of stuff and say, “Hey Boris, is this the bicycle you’re talking about? Was it parked in front of your house in this village? You are insured. Can you tell me a little bit more about it?”
This is all about intellectual connectivity and not necessarily about breaking the process. I can always ask to talk to an agent, but this is much faster for me, and there are no wait times. I can get my problem resolved and move on.
So, this automation of routine tasks, tagging things, inputting things, all of those things can be done very nicely by an AI system. And most importantly, it is repeatable, doing the same thing with the same system again.
I know there’s a lot of discussion about hallucinations, but today’s embedding models, such as VoyageAI, are now very good in terms of quality, as are the re-ranking systems that allow answers to be sorted into good, bad and ugly based on my data.
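To make the idea of sorting answers into good, bad and ugly a little more concrete, here is a small Python sketch. It is not how any particular embedding or re-ranking product works: the three-number vectors are made up by hand, the thresholds are arbitrary, and plain cosine similarity stands in for a real re-ranker. In practice the vectors would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_and_bucket(query_vec, docs, good=0.8, bad=0.4):
    """Order documents by similarity to the query and label each one.

    The thresholds are illustrative assumptions, not recommended values.
    """
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in docs.items()),
        reverse=True,
    )

    def bucket(score):
        if score >= good:
            return "good"
        if score >= bad:
            return "bad"
        return "ugly"

    return [(name, bucket(score)) for score, name in scored]


# Hand-made toy vectors standing in for real embeddings.
DOCS = {
    "bicycle policy": [0.9, 0.1, 0.0],
    "home policy": [0.5, 0.5, 0.5],
    "pet policy": [0.0, 0.1, 0.9],
}

ranked = rank_and_bucket([1.0, 0.0, 0.0], DOCS)
print(ranked)
```

Only the candidates in the top bucket would normally be passed on as context for an answer, which is one way such ranking can keep a system grounded in the organisation's own data.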
Is there a role for agentic AI in this and how would that work?
An AI agent is like a player on a soccer field, but to have a really good team, you need 11 players. It’s like the different positions in a football team – agents perform really specific functions.
If we look at the insurance case, one agent checks what contracts Boris has, and another can figure out Boris’s address and where this tractor might have come from. Is this a realistic description of events?
Different agents collaborate as digital experts to create a framework, a soccer team of agents that come together to drive the experience for me as a consumer, as well as for the insurance company.
That system can come up with very good answers to very basic questions and bring them all together, and drive a resolution. So, this is where agentic AI really shines.
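The team of specialised agents described above can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the three agent functions, their hard-coded lookup tables, the toy plausibility rule and the `resolve` coordinator. Real agents would call models and live systems rather than dictionaries.

```python
def contract_agent(claim, state):
    """Checks whether the customer holds a contract for the damaged item."""
    contracts = {"Boris": ["bicycle"]}  # hypothetical contract data
    state["insured"] = claim["item"] in contracts.get(claim["customer"], [])


def address_agent(claim, state):
    """Checks whether the reported location matches the known address."""
    addresses = {"Boris": "village"}  # hypothetical address data
    state["location_matches"] = addresses.get(claim["customer"]) == claim["location"]


def plausibility_agent(claim, state):
    """Toy rule: a tractor is only a plausible cause in a village."""
    implausible = claim["cause"] == "tractor" and claim["location"] != "village"
    state["plausible"] = not implausible


def resolve(claim):
    """Coordinator: runs each agent in turn and combines their findings."""
    state = {}
    for agent in (contract_agent, address_agent, plausibility_agent):
        agent(claim, state)
    # Approve only if every agent's check came back positive.
    state["approved"] = all(state.values())
    return state


claim = {
    "customer": "Boris",
    "item": "bicycle",
    "location": "village",
    "cause": "tractor",
}
decision = resolve(claim)
print(decision)
```

Each function does one narrow job, much like a position on the team, and the coordinator brings the individual answers together into a single resolution.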




