October 07, 2024

00:12:16

Raw Data vs Manipulated Data - E92

What Counts?



Show Notes

2024 Episode 92 – What does it mean to give your employees access to raw data instead of manipulated data in the format they are used to? Which is greater: the potential for better reporting and results, or the potential for disaster and confusion? Join Information Governance Consultants Maura Dunn and Lee Karas as they explain the difference between raw data and manipulated data in the context of large-scale reporting requirements. Each episode contains important information gained through our experience working with companies across various industries, and we talk about how you can apply this experience to your company.

Episode Transcript

[00:00:01] Speaker A: Hello. Thank you for joining us. This is What Counts, a podcast created by Trailblazer Consulting. Here we highlight proven solutions developed through our experience working with companies across various industries, and we talk about how you can apply these solutions to your company. We share our experience solving information management challenges, like creating and implementing a records retention schedule, creating an asset data hierarchy, or helping with email management. This is Lee. In this episode, Maura and I will talk about what it's like to have raw data available to you instead of data that has been manipulated and handed to you in a format that you're used to. Maura, this is an interesting topic. Some people will think having self-serve data, raw data, is a plus, and some will frown upon it. Let's dive into this a little deeper.
[00:00:49] Speaker B: Good morning. Yes, it's funny: when you said raw data versus manipulated data, I was like, wow, both of those sound kind of bad. And that's kind of true, in that in some ways raw data can be unhelpful. The main reason it can be unhelpful is that if you don't understand the context in which it was created, you don't understand the definitions, the granularity, and the units of measurement, depending on the type of data we're talking about. So raw data is very scary. On the other hand, it's very powerful because it's brand new. It's the raw data. It's exactly what happened. A couple of different examples. Think about data coming from some sort of sensor. It's a smoke alarm, and it's measuring the level of smoke in a room, and it's measuring it by the parts per million of particles that are not air. That raw data is helpful to the person who built the smoke alarm, who needs to figure out when you should advise people in the room, or people in the building, that it's time to get out. And it varies based on how big the building is and how far it is to the exit.
So the designer needs to know all that raw data, all that detail. But the person who's sleeping in a hotel just needs to know: get out. So telling them the smoke is now at 0.23%, the smoke is now at 0.75%, doesn't tell them enough to make a decision. So raw data, while it sounds pure and undiluted ("That's what I want. Give me the raw data"), isn't always the right answer. On the other hand, you compared that with manipulated data that is served up, and "manipulated" has such a negative connotation that you're immediately suspicious. Is this for real? Can I trust this data? What is it? And I'm glad you said it that way, because we're in a place in the world where we're still in this transition from everything being neatly edited and served up to having it in a more immediate and less curated sense. So think about a report. You look like you want to jump in. You're just making...
[00:03:22] Speaker A: It's not that I want to jump in. I was shaking my head because before this, you said, "I'm not going to do AI," and that manipulated data fed right into going into AI.
[00:03:32] Speaker B: Oh, that's funny, because I wasn't thinking about AI. I was thinking about people, because you can collectively tell a story. You can tell a story in a certain way. You know the famous TV show where it was "yada, yada, yada," and the meat of the story was yada-yada-yada'd away? The beginning and the end were true. The "yada, yada, yada" was what made all the difference. That's what I was thinking about. I wasn't thinking about AI. We will get to the AI world, but not right away. So, back to manipulated data, which is a term of art in data, in terms of databases. Typically, it means validating that the data you are reporting matches the parameters that you set, and, if there's an outlier, asking: Is it bad data? Was it an off sensor? Or is it a trend? Is it something to be alarmed about?
And understanding that, correcting for it, and then presenting sort of normalized data (again, that's a term of art) to your audience is a form of data manipulation. But the idea behind it isn't to change the story or change an opinion. The real example that I've been thinking about a lot lately is when you go from an older system. It used to happen a lot with mainframe systems and batch processing: a lot of data files would be collected, the process would happen overnight, and then reports would come out. The report would tell the user, with pretty headings and explanations, what happened, what data was processed, and what the result was. It could have been checks clearing. It could have been transactions happening. It could have been measurements taking place. Doesn't matter. But it all happened kind of at a distance from the user, and it was all hidden behind these big mainframe computers that only computer programmers understood. So the computer programmers managed the process. They coded the system to take the raw data and process it correctly through the transactions, and they programmed the system to produce a report to explain it to the end user. But that's 30, 40, maybe 50 years ago when computers typically did that. Even in the past 15 years, you might have seen client-server systems where you had a Crystal Reports tool or, later, a Power BI tool, but they were very static: the data got entered, transactions happened, and then there was a format that spit out the results of the transactions in a Crystal report. But today, in more modern databases, the expectation is that as soon as you enter data into the system, you ought to be able to search it, filter it, use it for calculations, and create your own views into the data.
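As a rough illustration of the distinction discussed above, the sketch below takes raw smoke readings, validates them against a set range, and reduces them to the one message a hotel guest needs. The range, the alarm threshold, and the function names are hypothetical values chosen for the example; they do not come from the episode or from any real alarm design.

```python
from statistics import median

# Hypothetical limits, chosen only for illustration.
VALID_RANGE = (0.0, 5.0)   # percent smoke a working sensor could plausibly report
ALARM_THRESHOLD = 0.5      # percent smoke at which occupants are told to leave


def validate(readings):
    """Split raw readings into in-range values and out-of-range outliers."""
    low, high = VALID_RANGE
    valid = [r for r in readings if low <= r <= high]
    outliers = [r for r in readings if not (low <= r <= high)]
    return valid, outliers


def summarize(readings):
    """Turn raw readings into the single message an occupant needs."""
    valid, _outliers = validate(readings)
    if not valid:
        # Every reading was out of range, so the sensor itself is suspect.
        return "SENSOR FAULT"
    level = median(valid)  # a simple normalized value that ignores outliers
    return "GET OUT" if level >= ALARM_THRESHOLD else "OK"
```

In this sketch the designer would still work with the full list of readings and the outliers, while the occupant only ever sees the output of `summarize`.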
And it's very powerful, and it cuts out a lot of time that feels wasted when you have to go and ask IT to write you a new report because you need to show something different, because somebody's asking you for a different view into the data. And that's true. It is very powerful and it is more immediate. On the other hand, now you have to be a data analyst and somewhat of a database programmer or report writer. You have to do a lot of that yourself, as a power user or even an end user, because now you're looking at basically this raw data and trying to understand it, and you don't have that whole coding world around you that is making that raw data palatable for the end users.
[00:07:37] Speaker A: I agree with you that it's still a lot, but I just wanted to add that a lot of applications today make it easier to drag and drop certain fields to create a report. Instead of having to know SQL to program a report or pull data, they make it easy to just drag and drop into certain areas.
[00:08:01] Speaker B: They do make it easy from a format perspective, but dragging and dropping a field, if you don't know what that field means, does not mean that you understand the report.
[00:08:10] Speaker A: Totally agree. Totally agree. All I was getting at was just that you don't have to be a sophisticated programmer, that's all.
[00:08:17] Speaker B: No, you don't have to. And that's powerful and flexible. But what we lost there was that the programmers understood the structure of the data, because they had to in order to write the code to do the transactions and the reports. A typical end user who can drag and drop fields may or may not understand the structure of the data. They have some understanding, for sure, but do they understand all of it?
[00:08:47] Speaker A: Maybe just their specific department's understanding as well.
[00:08:51] Speaker B: They might understand just their department's piece.
Do they understand the reference data that underlies a lot of things? Because typically one department isn't in charge of all of the reference data in a shared enterprise system. And if they don't understand how that reference data is maintained, or what it represents from outside of the system, or how it's used in other parts of the organization, then you can get yourself in trouble, because you're depending on something and you're doing your own searching, your own filtering, your own views, your own reports, and you don't actually have a full understanding of the data set that you're working with. And that's scary, because if you're working in an enterprise-wide system in a large company, then people are depending on your data to be correct and on the reports that you are giving them to be correct. And it's important. Okay, so that's the scary end of the world of self-serve data. How can we make that less scary? Because we are in the information governance space. The people in our audience, we're together in this, trying to figure out, from an information governance perspective, how do we help the situation? Because we want to make data available to people. We know that there aren't staffs of report-writing clerks out there who are going to continue to be the intermediary between the raw data and the end user. People need to learn how to do these things. Tools are better. You can drag and drop. Report writers are intuitive and formats are easy. Filters are pretty sophisticated. But from an information governance perspective, the foundation here is data definition. It's data structure. It's data relationships. And so, as information governance professionals, all of us, our job now is not just creating those structures. We've had that job for a while. But now our job is helping other people understand it.
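One way to picture the data definitions mentioned above is a small data dictionary that a self-serve report can be checked against. This is only a sketch under assumptions: the field names, departments, and the `review_report_fields` helper are hypothetical and not taken from the episode or from any particular reporting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    meaning: str
    unit: str
    owner: str  # department that maintains the field


# Hypothetical data dictionary entries, for illustration only.
DATA_DICTIONARY = {
    "smoke_level": FieldDefinition(
        "smoke_level", "Smoke concentration measured by the sensor",
        "percent", "Facilities"),
    "site_code": FieldDefinition(
        "site_code", "Reference code identifying a building",
        "none", "Real Estate"),
}


def review_report_fields(fields, user_department):
    """List the fields a report builder should look into before trusting a report."""
    notes = []
    for field in fields:
        definition = DATA_DICTIONARY.get(field)
        if definition is None:
            notes.append(f"{field}: no definition on record")
        elif definition.owner != user_department:
            notes.append(f"{field}: maintained by {definition.owner}")
    return notes
```

For example, a Facilities user who drags `smoke_level`, `site_code`, and an undocumented `status` field into a report would get a note that `site_code` is reference data maintained by another department and that `status` has no definition on record.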
And what are the key things that they need to know, and what parts of that structure are they going to be able to manipulate, so that they need to understand the implications? And I want to dive into that. I want to dive into how information governance professionals can help power users or end users understand that data structure, understand the data definitions, and make the systems work for them. But I feel like if I did that today in this episode, we'd be going on another 45 minutes. So I'm going to stop there with a tease: there's a big role here for information governance professionals in the world of self-serve data, and we'll leave it at that.
[00:11:49] Speaker A: If you have any questions, please send us an email at [email protected] or look us up on the web at www.trailblazer.us.com. Thank you for listening, and please tune in to our next episode. Also, if you like this episode, please be a champion and share it with people in your social media network. As always, we appreciate you, the listeners. Special thanks go to Jason Blake, who created our music.
[00:12:13] Speaker B: Thanks, everyone. See you next time.
