Data Mining

Data mining may be the most valuable tool available to organizations that suspect fraud, waste, or abuse. It is my go-to analysis tool because it provides the most “bang for the buck.” Here are a few of my favorite reasons to use it.

  • You aren’t limited by system interfaces. How many organizations have you been in where the employees rave about how great their system is, and say that it has every report and capability they could ever want, in the exact format they want it? Probably none. Systems are necessary, and they serve a great purpose: they standardize data and make it usable to clerks, technicians, and the like. However, their interface screens limit what you can accomplish and how you can look at the information. Data mining cuts all of that out. All systems are built on top of data tables, and those tables are full of the meat and potatoes that make data mining so delicious. I like to use data mining software to import those tables so that I can sort, filter, compare, and analyze the information forward, backward, sideways, and upside down, according to the risks of the data set.

  • You analyze 100% of the data. Anyone who has participated in any sort of audit activity has heard about “sampling.” There are many sampling techniques, statistical and non-statistical, with many ways to select the sample (random, systematic, and my favorite, haphazard). There is certainly a time and place for sampling, and it isn’t going away in audits, but the risk remains that your sample isn’t representative or doesn’t include the outliers. Statisticians make a lot of money ensuring that if you choose your sample correctly (according to a certain formula and methodology, dotting every i and crossing every t), you can be confident that your sample is valid. For me, that isn’t good enough. The truth of the matter is, if there are 1 million transactions and you sample 200 or 500 or 10,000 of them, you just aren’t going to find all of the junk. If there is an easy way to do it, I want to look at all 1 million, please.

  • You spend your time where it matters. This is a follow-on to the previous point about sampling. You’ve imported all of the data, you’ve considered the unique risks of the organization whose data you’re mining so that you can choose the right filters, sorts, and anomaly tests, and you’ve identified the transactions that are high risk. Maybe there are 10, maybe there are 1,000, but every single one of them is high risk. This is very different from sampling, where your sample may or may not pick up high-risk transactions. This way, you’re certain to focus your time on the areas where there is high reward, and you minimize time spent on ordinary transactions where there is nothing to be found.
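
To make the points above concrete, here is a minimal sketch in Python of what testing every transaction can look like. Everything in it is an assumption made for illustration: the field names (id, vendor, invoice, amount), the $5,000 approval limit, the 5% "just under the limit" band, and the two tests themselves (possible duplicate payments and amounts just below an approval threshold). A real data mining program would pick its tests based on the risks of the organization, and dedicated data mining software does this kind of work without any code at all.

```python
from collections import defaultdict

# Hypothetical values chosen for illustration only.
APPROVAL_LIMIT = 5000.00   # amounts at or above this need extra approval
NEAR_LIMIT_BAND = 0.05     # "just under" means within 5% below the limit


def flag_high_risk(transactions, approval_limit=APPROVAL_LIMIT, band=NEAR_LIMIT_BAND):
    """Run every test against every transaction; return {id: [reasons]}."""
    flags = defaultdict(list)

    # Test 1: the same vendor, invoice number, and amount appears more than once.
    groups = defaultdict(list)
    for t in transactions:
        groups[(t["vendor"], t["invoice"], t["amount"])].append(t["id"])
    for ids in groups.values():
        if len(ids) > 1:
            for tid in ids:
                flags[tid].append("possible duplicate payment")

    # Test 2: the amount sits just below the approval limit.
    floor = approval_limit * (1 - band)
    for t in transactions:
        if floor <= t["amount"] < approval_limit:
            flags[t["id"]].append("just under approval limit")

    return dict(flags)


# A tiny made-up data set; in practice this would be the full imported table.
transactions = [
    {"id": 1, "vendor": "Acme", "invoice": "A-100", "amount": 1200.00},
    {"id": 2, "vendor": "Acme", "invoice": "A-100", "amount": 1200.00},
    {"id": 3, "vendor": "Birch", "invoice": "B-7", "amount": 4990.00},
    {"id": 4, "vendor": "Cedar", "invoice": "C-42", "amount": 310.50},
]

for tid, reasons in sorted(flag_high_risk(transactions).items()):
    print(tid, reasons)
```

Because the tests run over the whole list rather than a sample, transactions 1, 2, and 3 are flagged no matter where they sit in the data, and the ordinary transaction 4 never takes up review time.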

If your organization doesn’t have any type of data mining program in place, it’s missing out on an opportunity to identify fraud, waste, and abuse.  If you’re the one to implement it, you’ll look like a genius while your software does the heavy lifting.

Jennifer Hathaway