Data | Content: Devils in the Details

[00:00:01] Speaker A: Hello, thank you for joining us. This is What Counts, a podcast created by Trailblazer Consulting. Here we highlight proven solutions developed through our experience working with companies across various industries, and we talk about how you can apply these solutions to your company. We share our experience solving information management challenges like creating and implementing a records retention schedule, creating an asset data hierarchy, or helping with email management. This is Lee. In this episode, Maura and I will continue on the arc of describing the devil in the details in our framework, particularly the piece Data | Content. Maura, this is a big one: data and content. Where are those details here? Where do they hide?
[00:00:46] Speaker B: I mean, it's almost a cliche: data is full of details. But in our context, what we're talking about here is how you address those details when you're thinking about the data or content as part of your information governance program. I think we've talked already about why we separated data and content. Well, we didn't separate them, we put them together. But why we used both words. The main reason, you may recall if you heard our earlier episode, is that when we think of data and when we talk to people about their data, the picture in everybody's mind is rows in a database. So it's structured data, it's got fields, it's defined, and it's not just free text. And then with content, when people think about that, if they think about it, most people actually think about documents. And we have to help them see content and documents as similar, although not synonymous. But the key thing there from an information governance perspective is that you're dealing with an object, a file, a blob, as opposed to structured data. I'm going to weave all the pieces of the framework together, because they all impact each other.
From a policy standpoint, the rules should be the same. You should be protecting your information at the same level of security, whether it is in a structured format or in a file format, a blob format, because it represents the same risk to your company. It's equally valuable. So it's company internal use only, or it's company sensitive, financial sensitive, customer financial sensitive, employee health information sensitive, or something. All of those rules mean the same thing, whether you're talking about a database or a blob. But how you can apply those rules for retention, for security, for access is different depending on the format, depending on the medium where you're storing this.
So if you're storing something in a structured database, it's easier to identify all the PCI information, the personal credit information; all the PHI, the personal health information; the PII, any personally identifiable information; or, if you're in an infrastructure industry, the critical infrastructure information. It's easier to identify it in a database because you can look at a field that is called facility name or customer name, or an identifier such as Social Security number. You can find those things more easily in a database, which means that you know what the universe is that you need to protect, and you can target the security. What's the end date? What's the trigger? From a retention perspective, is this data still active? Has it hit the termination of the contract? Is this employee still here or have they left? Because that's the trigger for managing the retention of that data. It's much easier to find when you've parsed it all out into fields.
When you're looking at content, at a file, at a blob, at a document, whether it's in paper form or electronic form, it's the same problem, which is you only know so much about it.
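As a rough illustration of the structured-data point above, the following Python sketch flags sensitive columns by field name and checks a retention trigger. It is a minimal sketch under stated assumptions, not anything the speakers describe implementing: the keyword map, the labels, the `EmployeeRecord` type, and the seven-year default are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical keyword map: substring of a column name -> sensitivity label.
# Keywords and labels are illustrative assumptions, not a standard.
SENSITIVE_KEYWORDS = {
    "ssn": "PII",
    "social_security": "PII",
    "customer_name": "PII",
    "card_number": "PCI",
    "diagnosis": "PHI",
    "facility_name": "CII",
}


def classify_columns(columns):
    """Return {column: label} for columns whose name contains a known keyword."""
    found = {}
    for col in columns:
        lowered = col.lower()
        for keyword, label in SENSITIVE_KEYWORDS.items():
            if keyword in lowered:
                found[col] = label
                break  # first matching keyword wins
    return found


@dataclass
class EmployeeRecord:
    employee_id: int
    termination_date: Optional[date]  # None means the employee is still here


def retention_expired(record, today, retain_years=7):
    """True once the (assumed) retention period after the trigger has passed.

    The trigger is the termination date; with no trigger, the record is
    still active and is never considered expired.
    """
    if record.termination_date is None:
        return False
    end = record.termination_date
    try:
        expiry = end.replace(year=end.year + retain_years)
    except ValueError:
        # Trigger fell on Feb 29 and the target year is not a leap year.
        expiry = end.replace(year=end.year + retain_years, day=28)
    return today >= expiry
```

Name matching like this only works because the data has already been parsed into labeled fields, which is the contrast being drawn with files and blobs.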
You may know a name, the name of the file. You may know a file type. You may know that it sits in a file structure and that your file structure has a label on it. And maybe that label is attached to a tag that matches your security or your retention requirements, which would be great. Then you just apply those tags to everything that's in the file folder that has that same label. Virtual file folder or physical file folder, you've tagged it with the same information. So that's a different way of applying those security rules or retention rules or access rules to the content side versus the structured side.
So the details part of this: when you're looking across a whole organization and you've written your policies, you know what your information classification requirements are, you know what your retention requirements are, and you've matched your record categories to your business processes. So you kind of know who's working on what across the company. But then you start looking at where you are putting these things, where you are storing them. And you probably have a mix. We have seen this in every client. You have a mix of shared file storage locations. They might be actual file servers, they might be virtual file servers in the cloud, they might be SharePoint locations or some other collaboration tool. It might be an enterprise content management system or an enterprise document management system. You have all those places where people can set up structures and drop things: they can put documents into folders, they can put spreadsheets, pictures, other files into these folders.
Then you might have, and you probably do have, a lot of enterprise-level systems. So you have your financial system, you might have a shipping management system, and, depending on your business, you might have a maintenance management system.
That's if you're in infrastructure or running a factory or something. You might have a requirements management system like Jira or something if you are a software shop and you are building software and you have to check code in and out, because you want to control what changes are made to your product before it goes out to the public. All of those enterprise-level systems have some built-in safeguards for your data, and they require that you provide certain information before you load things in there. You have to say what this is. Whereas those shared file locations, shared file repositories, require far less information for you to store something, which means you have less information to work with when you're trying to figure out what it is.
In addition to the enterprise systems and the file repositories, you also have smaller systems. People might build a little Access database, a little SQL Server database. They might build a little Power BI tool or a Power Apps app or something. People might have a contractor that they're working with, and the contractor's got a tool and it's doing something outside of your environment. Those are the hard ones. That's where the devil in the details comes in. Because with enterprise-level repositories, enterprise-level systems, your IT shop knows that they exist, they know that they're there. You can impose some structure, some requirements on your file repositories through Active Directory groups, through security protocols, by using tags to say this folder has this level of security, this level of retention, this type of retention. You can impose some of that because you have an IT shop that's looking across, and you have policies and they can help implement them. But when it comes to the small stuff that people have the power to do on their own, as an enterprise you have very little visibility into that. And so then you're down to a couple of options.
A lot of training: tell people, this is how you should be doing it, this is where you should be putting it, this is the level of security or the retention requirements, et cetera, that you need to apply to this information. And then check: did they do it? Are they doing it? Give them refresher training, because they probably forgot. There's a lot of relying on individuals to carry things out. When you're talking about something that can be complex, thinking about how much data there is and how much flexibility you have, it requires a lot of follow-up.
In some cases you can lock down access. From an IT perspective, you can prevent anybody from downloading a non-approved piece of software. You can prevent them from sending data out of your company using data leak prevention protocols and software tools. But it's still hard. It's hard to lock down everything completely, and you have that battle between efficiency and getting work done versus security. From a security perspective, being on the Internet is harder. Being completely isolated, every single machine on its own, is more secure, but it doesn't let you get your work done. So you've got to have that balance between the two.
So from an information governance program perspective, start with the big systems, the big repositories. Do everything that you can to build in the requirements, the security requirements, the retention requirements, the organization requirements, so that people can find what they need, and then make them very user friendly. Make those systems and those repositories the place where people want to do their work. Give them the tools that they need to do their work in the places that you want the information to end up. If you can do that, then you're going to minimize the impact of: I have to go do my own thing because the central stuff doesn't work for me. I don't trust that I'll ever find that again on SharePoint.
So I've just kept all the copies in my OneDrive or on my hard drive or something. The equivalent of in my garage. I can see you talking, Lee, but you're on mute.
[00:11:01] Speaker A: Well, yeah, but like Dropbox or something, some external third-party repository. That's going to really cause a problem.
[00:11:09] Speaker B: Yes, yes.
[00:11:10] Speaker A: Not that it's a bad thing to have out there in the marketplace. It's just that, from a control perspective and a where-your-content-is-located perspective, it causes issues for enterprises.
[00:11:23] Speaker B: Right. It's an issue if the enterprise hasn't adopted it and put some controls around it. It's an issue if somebody decides to do it on their own. That's where the issues come in. So that was a very quick introduction to the details that are problematic when you think about the proliferation of either data or content across your organization and how easy it is to do that: how easy it is for any person in your organization to copy something, change it a little bit, make a new version, share it with somebody, whether it's a database or content. Because if it's a database, they can potentially export some data and then manipulate it in Excel or Access or some other system. And you don't know. From an enterprise perspective, it's really hard to know what people have done. So it's a combination: apply all the controls that you can to the enterprise; make the enterprise environment an attractive one for doing work; make it easy to use; listen to the users on what they need to make this happen; and then do the training and the follow-up to try to prevent the rogue systems, the rogue repositories that people will put in place as workarounds if they don't get what they need from the enterprise. This one's a little bit more of a cautionary tale than some of our others, but there's a lot of risk in letting everybody do what they want with your data.
So you really need to think about all the places where it could go wrong.
I realized, as we were talking about the data and content details and the traps that you might fall into there, thinking back to the process discussion, that you mentioned, Lee, how there were so many pieces to the processes and how it's all so complicated and connected together. The thought that just occurred to me is that those two pillars are at the center of our framework, and there's a reason they're at the center. They are holding the whole thing up: the process and the data, the process of how people interact with their information in data or content. So they're in the middle for that reason. But they're also in the middle because they need the protection, the shoring up, of the policy on one side and the applications on the other, governance on top and infrastructure on the bottom. And we're going to get to talking about applications and infrastructure. We haven't done those two yet. But that center, the process and the information, the data and the content, is the heart of information governance. And these pieces on the outside and on the top and the bottom are really there to support the best use of those two things.
[00:14:34] Speaker A: Makes sense to me. If you have any questions, please send us an email at [email protected] or look us up on the web at www.trailblazer.us.com. Thank you for listening and please tune in to our next episode. Also, if you like this episode, please be a champion and share it with people in your social media network. As always, we appreciate you, the listeners. Special thanks go to Jason Blake, who created our music.
[00:15:02] Speaker B: Thanks everybody.
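The folder-label approach Speaker B describes for file content, where everything in a labeled folder picks up that folder's security and retention tags, could be sketched roughly as follows. This is only an illustration under assumptions: the folder labels, tag values, and the `tags_for` helper are hypothetical, and a real repository would apply tags through its own metadata features rather than a lookup like this.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a folder label to governance tags.
FOLDER_TAGS = {
    "contracts": {"security": "company-sensitive", "retention": "contract-end+7y"},
    "hr": {"security": "employee-sensitive", "retention": "termination+7y"},
}

# Fallback for files that sit under no labeled folder (an assumed policy).
UNCLASSIFIED = {"security": "unclassified", "retention": "review-needed"}


def tags_for(path):
    """Return the tags of the nearest labeled ancestor folder, else UNCLASSIFIED."""
    folders = PurePosixPath(path).parts[:-1]  # drop the file name itself
    for folder in reversed(folders):  # nearest folder first
        tags = FOLDER_TAGS.get(folder.lower())
        if tags is not None:
            return dict(tags)  # copy so callers cannot mutate the mapping
    return dict(UNCLASSIFIED)
```

For example, `tags_for("shared/HR/2021/review.docx")` picks up the `hr` tags, while a file in an unlabeled scratch folder falls through to the unclassified default, which is the low-visibility case the episode warns about.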

Data | Content: Devils in the Details - E101
