October 9, 2015

What is Dark Data and How Can It Be Cleaned Up?


I recently came across a great blog post on the term Dark Data and why it matters to clean it up. Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes. This well-written post by Rick Delgado (@ricknotdelgado) covers several aspects of why cleaning up dark data is important. My post summarizes the key points, but I highly recommend reading the original.

Dealing with Dark Data

To deal with dark data, Rick says you’ll first need to identify it.

Often, dark data will sit unused for years, taking up valuable space in a data center while your company continues to collect even more data. What can start off as a small problem can grow rapidly as unused information continues to pile up.
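One simple first pass at identification is to look for files nobody has touched in years. The sketch below is a minimal illustration, not Rick's method or Gimmal's product. It assumes file modification time (`st_mtime`) is a usable signal of "unused". Access time is often unreliable because many filesystems are mounted with `noatime` or `relatime`. The function name `find_stale_files` and the age threshold are hypothetical choices.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return sorted paths under root whose last-modified time is older than max_age_days.

    Assumption: modification time approximates "last used"; this is a heuristic only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, `find_stale_files("/data/archive", max_age_days=3 * 365)` would list candidates untouched for roughly three years. The result is a review list for people to check against retention rules, not a deletion list.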

Why Does Dark Data Stay Around?

He also answers why companies tend to keep this data around.

The truth is many organizations prefer to store all the information they collect to ensure they are in compliance with all laws and regulations. At the same time, businesses are reluctant to just toss out unused data because they never know if they might need it at some time in the future. Big data analytics can yield some promising solutions to problems, and to come to those solutions, organizations need the relevant data. As the usual mindset goes, just because you don’t need it now doesn’t mean it won’t prove valuable in the future.

Cleaning It Up

What about cleaning up this dark data?

It’s true that a thorough cleanup of dark data can be time-consuming, but the results are well worth the effort. The main challenge is to get rid of dark data while still holding onto any necessary data. There are several ways you can do this at your organization. One of the most effective methods is filtering your data. When gathering data generated by machines and the internet, you’ll find a lot of valuable information along with data that is largely useless. By identifying and isolating the data you need, you can keep it separated from all the other noise. This helps prevent unneeded data from piling up in the first place.
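Filtering at ingestion can be sketched as keeping only records that match what the business has decided it needs, so the rest never gets stored. This is a minimal illustration under stated assumptions, not Rick's or Gimmal's approach. It assumes records arrive as dictionaries with a `type` field. The `NEEDED_TYPES` set and the `filter_records` function are hypothetical placeholders for an organization's own rules.

```python
from typing import Iterable, Iterator

# Hypothetical: the record types this organization has decided it actually needs.
NEEDED_TYPES = {"order", "payment", "refund"}


def filter_records(records: Iterable[dict], needed_types=NEEDED_TYPES) -> Iterator[dict]:
    """Yield only records whose 'type' is in needed_types; everything else is dropped."""
    for record in records:
        # Records missing a 'type' field are treated as noise and not kept.
        if record.get("type") in needed_types:
            yield record
```

Using a generator means records can be filtered as they stream in, before they are written to storage. In practice the "needed" rules would also have to account for legal and regulatory retention requirements.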

This is where Gimmal can help. Our technology cleans up dark data for you by classifying both unstructured and structured information and determining what should be kept and what should be properly destroyed.

