Episode Transcript
[00:00:00] Speaker A: If you've ever wondered what's actually hiding in your shared drives, Trailblazer Insight scans your files locally for PI, hipaa, PCI and other compliance risks. No cloud, no IT ticket, just answers. Search Trailblazer Insight in the Microsoft Store.
[00:00:20] Speaker B: Hello, thank you for joining us. Every organization hides a story and its data contracts. Nobody reads policies, nobody follows, files nobody can find.
This Is what Counts, the podcast that digs into the governance problems people inherit, ignore and discover too late. I'm Lee and in each episode, my co host Maura Dunn and I bring to you the hard earned lessons from the room where the real work happens. In this episode, Maura and I are going to do what we promised last time, which is we're going to walk you through what it actually looks like to reconcile counterparty data at scale, not theory, the real thing.
Maura we left off last time, last time with a teaser saying that. Tune in and you'll see how to reconcile these 300,000 counterparty records.
[00:01:11] Speaker A: So have at was in fact 350,000 even bigger. So start with that.
And it was for sure a real thing. It took, it took a lot of work over over two years, but sort of the first couple of the first 18 months or so was kind of unproductive. It was a lot of work, but it was unproductive. And let me explain that a little bit more.
[00:01:41] Speaker B: So not any fault of ours, it
[00:01:46] Speaker A: wasn't, but, but it wasn't anybody's fault. It was just the nature of this challenge.
So you start with you have a company that's grown quickly and through acquisition. That's a key key factor out of the, through the course of that quick and quick growth through acquisition. They ended up with four active contract management solutions that were each holding some part of the contracts in the company.
Then you have, you had a fifth source, which is the ERP that was holding purchase orders and the counterparties connected to those purchase orders and those purchase orders didn't appear in any of the contract management solutions.
Then you have two other sources which were trading systems.
The trading systems contained some data about some contracts and some counterparties in order to support the trading transactions.
So seven sources is the first thing. Seven active systems currently in use doing something related to counterparties.
So that's, that's complicating factor number one, because they weren't even if you just look at those original four systems that we talked about that came in through acquisition, there was not a clear line between the four of them. There were some lines, but there was also some overlap Second, the ERP contained a record for every counterparty that had some payment relationship with the company. And that could be the company is paying them as a supplier, or they were, the company was being paid by a counterparty as a customer.
And whether you were, you could be both a counterparty, could be both a supplier and a customer, in which case it had two records in the ERP.
Because that is how ERPs work. They have a customer master file, they have a supplier master file.
Same party, same entity, two different identifiers. Because in the world of ERPs and transactional systems, they were parts of different transactions.
You also could have, for those, that one entity that was both a customer and a supplier, multiple contracts across the four systems and the two trading systems. So that one legal entity might appear in all seven places.
And it might appear more than once in any one of the seven. And that's about the second complicating factor, which is time.
So these seven things came about separately and then came together the seven sources. But each of them lived a life, those seven sources.
And over time, standards changed, parties changed, they were purchased by each other, contracts were assumed, or contracts were sold, land was assumed, land was sold, land was inherited and divided.
So the counterparties related to any land agreements changed like that internally. The organization also had its own legal entities that kept combining and splitting and renaming and changing. And those legal entities could also be on either side of some transactions.
There could also be more than one party on either side of any of the transactions, any of the contracts. So you had just a lot of relationships between the entities that are represented by these records in all of the seven sources.
I think that's the second key thing to think about is that the entities, the real life entities in the real world and their history, changing hands, changing names, changing tax IDs, changing addresses, having multiple addresses for different reasons.
That happens all the time. And it's, and it seems easy until you get into it. Just think about us, right? We're a small business. We've been in business for 13 years. We've had four addresses over our time, I think, not including our agents addresses like the registered agents in different states, but just four addresses that we've used as our headquarters address in that time for different reasons.
One of which was the very first one was my house when we started this.
I don't own that house anymore. I haven't owned that house in eight years.
Can we get rid of that address in all of our documents? No. It's still out there in some places, right?
No. Matter how hard we try, we can't get rid of that. So. And every now and then, it pops back up and somebody sends us something that references that address. They're actually probably sending it electronically. So that's good, but. So we get it. But it still talks about that address that we haven't been in for eight years.
And we have been actively working it. And we are one small company trying to keep these things straight. So think about a Fortune 500 company, a Fortune 1000 company, a company with thousands of contracts that's been in business for 15, 20, 30 or more years, trying to keep track of all that so that that lifetime, those changes, every single counterparty in your world of contract management has that history.
You may not need to know all of it, but you need to know some of it.
So how do we do this? How did. How did we do this?
First, let's talk about that. 18 months of unproductive but really hard work.
We started by saying, okay, we feel pretty confident that the supplier and customer data that's in the ERP is correct because we're actively paying or being paid through those records. Right?
Okay, great.
But turns out there were something like, I don't know, 9,000 suppliers out of our 350,000 records.
So they were good, but they really didn't represent the bulk of our problem.
But we started there and we said, all right, what is an identifier that we can look across all the sources and.
And count on. Count on it being right, as in count on. If we find this identifier, it means it is this entity. And we looked at an identifier that was not assigned by the company and that was not assigned by any of the predecessor companies that had been purchased and acquired and merged into the system into the current company.
We looked at tax IDs because. Because tax IDs are assigned by the IRS or.
And assigned by a state. So we started with federal tax IDs and we matched across the records. Well, we tried the 350,000 records coming out of seven different sources.
That's a lot of matching. And it kept breaking all the things we tried to do it in. So we couldn't do 350,000 records from seven sources. Matching tax IDs in a spreadsheet just couldn't do it. Spreadsheet, couldn't process it. So then you're matching one source against another source and trying to find things, and you start looking at names.
Well, each of those systems had different rules about whether they used all caps or just initial Caps, whether they used punctuation or not.
Couple of the systems were old enough that they had character limits that were pretty restrictive on their main fields. And so then they added these sort of flexible fields that were called overflow. So you had an entity name in the name field that got cut off.
I've forgotten the numbers, but let's say 25 characters in. I think it was longer than that. And then you had an overflow field that just picked right up where the one that got cut off went anded. So then you actually had two fields that you had to compare to one or two fields in the other multiple systems.
You know what? There were actually eight systems. Sorry, there were three trading systems. I forgot one.
So.
So that didn't work. But we. But we tried. We tried so hard. And we were asking people, because what we knew was that none of these sources was 100% accurate.
We knew that they were accurate for limited purpose. The ERP system was accurate for getting payments out and getting payments in. The trading systems were accurate for conducting, you know, completing a transaction.
A deal could be done, and the deals were happening every day.
To complicate matters further, there were a couple of upgrades going on in all of these systems, all the source systems, just for fun, but we were keeping track. And there were migrations of counterparties happening with the trading systems as they were being upgraded. And the teams involved were doing manual visual checks, counterparty by counterparty, as things got migrated from one system to another.
That was a painful, painful process that was happening at the same time as this. 18 months of really, we're trying so hard to get to what are our actual counterparty records out of all these multiple duplicate records or overlapping records, the one by one, checking the system by system, just comparing system one to system two to system three to system four, one at a time. And it was just spinning wheels. And every time that we would get a new spreadsheet, we would all sit there and say, how does this spreadsheet relate to the last spreadsheet? Where did this come from?
It would involve a conversation because, in fact, it was different. It was different because they tried a different way to match. And it was different because it had been three months and everything had changed. There were new contracts, there were new counterparties in there.
All right, let me skip to the end, because then we started thinking about, okay, the tax ID is not enough to do the matching because it's just too many records to try and resolve to one variable.
And not all of the systems had a tax id Just. Just to be clear, some of them had multiple, but not all of them had one that we could count on.
So then we thought about, all right, each of these systems, the four main ones, the three trading systems especially, they only dealt with a certain kind of contract.
It wasn't 100% accurate, but it was close.
That system four was land and right of way in Oklahoma system. System three was landing right away in Texas, roughly speaking as an example.
And so there were differences in the contract numbers, the way the contracts numbers were structured from each of the source systems and then they related to the trading systems. And those also had different counterparty IDs and different contract information they that traced back to one of the four primary sources.
So now we were matching not just on tax ID but also on contract type, using the contract number variations to say okay, big bucket, this is contract type category one versus category two. So not a granular contract type, just a big bucket, big bucket. These counterparties all have contracts that relate to land and right of way in Oklahoma. These counterparties all have contracts that relate to some kind of a commercial activity in everywhere but Oklahoma, for instance.
And so we started narrowing it down and we had to go, as I said, Excel couldn't handle it. We tried using Microsoft Access, kept hanging up.
We finally ended up with a couple of different big systems. Snowflake, elasticsearch and a few other things that could do that could handle the multi variable checking today. This was a few years ago today and that took six months going through and matching up contract numbers to counterparty IDs to tax IDs, to names, to supplier IDs coming out of the ERP, et cetera, et cetera. Doing that matching in an iterative way over and over again with the, with Snowflake and the elasticsearch that that allowed for some inferences is really what that helped us with today.
You I haven't tried it, but I believe that there are that the use of an AI tool could accelerate this matching. You could feed the same parameters, the same parameters that we were coding into Snowflake and they were.
And the parameters were about how do counterparties relate to contracts when the contract has these characteristics, which is it came from this source and it had this type of a structure for the contract number and the counterparty had these characteristics where it was associated with a contract that came from a source that had a certain contract number structure and had a tax ID or something else.
We were narrowing down and we were writing these rules and telling the system to do the comparisons based on the Rules. And there was precedence to the rules of what did they look at first and then what did they do to confirm that conclusion and then what did we do next?
I do believe that AI would make that process much faster.
You would still go through the iterations. You would still have to determine those rules which are based on your particular set of systems and the contracts and counterparties that are represented in the transactions in those systems to be able to tell an AI tool how to do the matching.
Ultimately, we ended up from the 350,000 records with 35,000 actual entities. So across the 7, 8, 8 systems, it was a 10X, a duplicate or partial duplicate. Somehow other related, conflicting records that represented actual counterparties with actual contractual relationships with this company, whether paid or not, whether on the customer side or the supplier side or some other legal relationship through the contracts.
So what did you get out of all that?
[00:18:15] Speaker B: Well, one was an excellent story.
Two, it just goes to the.
The experience that we've had in all of our situations.
So this is a great case study. That was the other thing that I was thinking.
And three, that, yeah, I think AI could probably help in some of those instances, as long as you get the prompt right.
[00:18:40] Speaker A: Right. And I think that the prompt is where you would put your knowledge of your collection. Right.
Because what ended up serving us here was not pure data just looking to match tax IDs, because that wasn't enough.
The data wasn't clean enough. The data wasn't consistent enough for that to be the answer, even though that was our theory going in, was every counterparty, every company should have a tax id. Even if you're a person, you have a tax id, it's your Social Security number. And that we should have been able to use that as a unique identifier, and we couldn't.
So it was the knowledge of how did the systems, the contracts and the counterparties all relate to each other that ultimately got us to the answer. And I think if you're going to use an AI or if you're going to use another system where you're doing kind of the manual coding to match all those things up, the first thing is you have to understand how your contracts and counterparties are reflected in your systems.
[00:19:47] Speaker B: Makes sense.
[00:19:49] Speaker A: So I want to say it is. It was a satisfying project when we finally got through it, but. But it was, but it was not easy.
I think one of the things that we left off with last time was having done all that work, how do you hope, how do you set things up so it doesn't happen again.
Wasn't that one of the questions?
[00:20:13] Speaker B: We have no time to go into that today.
[00:20:15] Speaker A: All right, we'll talk about that next time then.
[00:20:17] Speaker B: There you go.
If you have any questions, stay tuned. Please send us an email at inforailblazer.us.com or look us up on the web at www.trailblazer.us.com or check us out at the Trailblazer Learning Academy at trailblazerlearningacademy.com thank you for listening. Please tune into our next episode. If you like this episode, please be a champion and share it with people in your social media network or like or subscribe to this podcast. As always, we appreciate you, the listeners. Special thanks goes to. Jason Blake created our music.
[00:20:53] Speaker A: Thanks everyone.
Come back and hear about how to stop this problem from happening again.