May 29, 2018

Cleansing Your Content: What a Novel Idea


Below is our second in a series of blog posts written by Carla Mulley, Vice President of Marketing at Concept Searching. Concept Searching and Gimmal are working together to offer more intelligent records management capabilities to organizations of all sizes.

Read this post to learn why it's important to eliminate redundant, obsolete, and trivial (ROT) information.

Migrating, or I should say moving, to the cloud seems like an easy out to many organizations. It’s free, isn’t it? No. Well, it’s cheap. No again. I hate to burst bubbles, but cloud storage certainly isn’t free, and it isn’t cheap. But that’s the prevalent mentality. IBM claims that 85% of your data is unstructured, and IDG indicates it is growing at 62% per year. Look at the numbers. The compound annual growth rate of unstructured data far outpaces any reduction in storage prices.
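To make that arithmetic concrete, here is a rough sketch of how the yearly bill moves when volume compounds faster than unit prices fall. Only the 62% growth figure comes from the numbers above; the starting volume, the starting price, and the 20% yearly price decline are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
# Illustrative arithmetic only. The 62% growth rate is the figure quoted in
# the post; every other input below is an assumption, not a real price.

def projected_annual_cost(start_tb, start_price_per_tb, growth_rate,
                          price_decline, years):
    """Return yearly storage costs for year 0 through `years` inclusive."""
    costs = []
    for year in range(years + 1):
        volume = start_tb * (1 + growth_rate) ** year           # data compounds up
        price = start_price_per_tb * (1 - price_decline) ** year  # unit price falls
        costs.append(volume * price)
    return costs


costs = projected_annual_cost(
    start_tb=100.0,             # assumed starting volume in terabytes
    start_price_per_tb=250.0,   # assumed yearly price per terabyte
    growth_rate=0.62,           # growth figure quoted in the post
    price_decline=0.20,         # assumed yearly drop in unit price
    years=5,
)
for year, cost in enumerate(costs):
    print(f"year {year}: {cost:,.0f}")
```

With these assumed inputs, each year's bill is 1.62 × 0.80 = 1.296 times the previous one, so the cost still climbs by roughly 30% a year even though the unit price keeps dropping.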

In the scenario of moving content to the cloud, or if you have already moved content to the cloud, the prevailing theory is that if you didn’t adversely impact the data being moved or productivity, the migration is judged a success. Wrong again. The majority of unstructured data is neither managed nor tracked. Less than 1% is even analyzed. Which leads us to the key question: what exactly does it contain, and why on earth are you moving it, why did you move it, or why are you keeping it?

Most likely you moved garbage, or ROT. All of it? Well, no. Migrating content using the forklift approach leads to moving, and ultimately paying for the storage of, unknown privacy exposures, undeclared records, the usual assortment of duplicates and stale information, and even valuable content nobody knows about. Whatever you can imagine is most likely being moved. For those organizations that depend on the owner of the content to keep what’s business critical and archive or delete the rest, I think we all know that won’t happen. We humans are possessive of our data and just don’t like to get rid of it, ever, myself included.

Let’s assume that your organization has no qualms about paying to store this data, regardless of where it resides. This approach carries great risk and, ultimately, the costs can far exceed just the costs of storage.

Cleansing your unstructured data before migration means you are finally in control and are proactively alleviating organizational risk. The ability to automatically generate multi-term metadata is a key enabler. Multi-term metadata can consist of up to, say, five or so terms that together represent a subject, topic, or concept. Once content is auto-classified to one or more taxonomies, you have a highly granular inventory of what is being stored in your file shares, SharePoint, Exchange, and ECM systems, basically any repository. Records management professionals and domain experts can then decide whether to keep it, archive it, or just plain old delete it.
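As a rough illustration of what "multi-term" means here, the sketch below counts word sequences of two to five terms in a piece of text. It is a simple n-gram count with a made-up stopword list and sample sentence, not a description of how any particular product generates its metadata; the function name and all inputs are hypothetical.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "or"}


def candidate_phrases(text, max_terms=5):
    """Count word sequences of 2..max_terms words.

    Sequences that start or end with a stopword are skipped. Punctuation is
    ignored, so a sequence may span a sentence boundary in this naive version.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(2, max_terms + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


sample = ("The records retention schedule applies to patient records. "
          "Update the records retention schedule yearly.")
phrases = candidate_phrases(sample)
print(phrases.most_common(5))
```

Phrases that repeat across a document, such as "records retention schedule" in the sample, are the kind of multi-term candidates a taxonomy could then be matched against.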

It’s very straightforward. Let’s look at an example. We will assume you are looking for any privacy or sensitive information vulnerabilities that are unprotected in your corpus of content. A taxonomy is created that contains the types of privacy and sensitive phrases you are looking for. Sensitive information can be whatever the organization defines it to be, so there are no limits on the types of content you can look for, and you can search on phrases as well as single terms. Content is then auto-classified and the exposures are identified. If you would like, you can automatically remove files from access and send them to a secure repository for disposition. This same approach can be used to identify and manage undeclared records. Compliance and governance processes can also be enforced, and defensible deletion can be provided with full audit capability. Of course, this cleansing of unstructured data also identifies duplicates, near duplicates, ROT, stale information – the usual suspects.
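The example above can be sketched in a few lines. This is a deliberately naive version under stated assumptions: the taxonomy, its phrases, the file names, and the helper functions are all hypothetical, matching is plain substring search, and only exact duplicates are detected (by hashing content with SHA-256). Near duplicates, quarantine to a secure repository, and audit logging are not covered.

```python
import hashlib
from collections import defaultdict

# Hypothetical taxonomy: category name -> phrases that signal it.
TAXONOMY = {
    "privacy": ["social security number", "date of birth"],
    "health": ["patient record", "diagnosis"],
}


def classify(text, taxonomy=TAXONOMY):
    """Return the sorted categories whose phrases appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, phrases in taxonomy.items()
        if any(phrase in lowered for phrase in phrases)
    )


def exact_duplicates(docs):
    """Group file names by content hash; return groups with 2+ members."""
    groups = defaultdict(list)
    for name, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up sample corpus: file name -> extracted text.
docs = {
    "hr/form.txt": "Employee date of birth and Social Security Number on file.",
    "clinic/note.txt": "Patient record updated with new diagnosis.",
    "share/menu.txt": "Cafeteria menu for the week.",
    "share/menu_copy.txt": "Cafeteria menu for the week.",
}

# Files that matched at least one category are the candidate exposures.
exposures = {}
for name, text in docs.items():
    categories = classify(text)
    if categories:
        exposures[name] = categories

print(exposures)
print(exact_duplicates(docs))
```

In this sample, the two files flagged as exposures would be the ones to pull from general access for review, and the duplicated menu file is a typical ROT candidate.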

Although this post has focused on migrating content, we recommend running this process a few times a year. One client reduced their server footprint from 56 servers to 4. Waiting for the risk to become a reality doesn’t seem worth it.

At the Concept Searching website, you can read how another client addressed privacy and eliminated one type of data breach, or about a healthcare client that faced strict regulatory guidelines for HIPAA compliance.

Concept Searching and Gimmal have joined forces to add multi-term metadata and classification to an already exceptional product. For more information, contact Concept Searching or Gimmal.
