Data mining – Can you find gold?

Business Intelligence is quite the rage at the moment. Leaving aside the oxymoronic character of the term itself, it is clear that executives are dying of thirst for useful information in a sea of data. Anyone who can promise to synthesize the deluge of information into manageable, bite-sized pieces is the new hero in the executive circle.

A survey reported by Computing Canada on April 7, 2006 found that, of the 385 IT and Finance VPs surveyed, a stunning 82% did not have information of sufficient quality, trustworthiness or usefulness to manage their operations.

This kind of statistic rings true to anyone in an executive position these days. There is a tremendous amount of data, but most of the information used to make decisions comes anecdotally from those around you.

This idea of getting synthesized information isn’t new. When I first got started in the project management software business in the early 1980s, we used to refer to this phenomenon as the ‘elusive 1-page report’. These days we’re more likely to talk about ‘scorecards’ or ‘dashboards’.

The exercise of going from huge volumes of data to a small amount of information is called data mining. One might easily think of a prospector back in the gold rush days looking for that tiny nugget of gold in a river full of gravel.

So how does one go about panning for some project management gold?

Let’s first talk about what we’re looking for. The basic building block of any data mining exercise is the Key Performance Indicator (KPI).

The definition of a Key Performance Indicator is: “A significant measure used on its own, or in combination with other key performance indicators, to monitor how well a business is achieving its quantifiable objectives.”

An effective KPI must have several characteristics. It must be aligned with a strategic purpose. It must be measurable. And, most importantly, it must be actionable. When we go looking to establish our KPIs, one of the first pitfalls is the desire to measure everything. There may be hundreds or even thousands of things that could be measured, but just because they can be measured does not mean that we should measure them. We will often do an exercise with senior management, working backwards from what drives the business to the view, measure or display that would allow them to take action on that business driver. This isn’t much different from a prospector who decides before he goes into the field what he’s looking for. Is it diamonds? Iron? Gold? The metrics sought will determine where you go looking. Just like the prospector, however, we may find something we didn’t expect.
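The screening step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `CandidateMetric` record, its fields and the sample metrics are all hypothetical, and each of the three characteristics is reduced to a simple yes/no check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateMetric:
    name: str
    business_driver: Optional[str]  # strategic purpose it supports, if any
    measurable: bool                # can it be computed from data we actually hold?
    action: Optional[str]           # decision a manager would take when it moves


def is_key_indicator(m: CandidateMetric) -> bool:
    """Keep only metrics that are aligned, measurable and actionable."""
    return bool(m.business_driver) and m.measurable and bool(m.action)


# Hypothetical candidates from a working session with senior management.
candidates = [
    CandidateMetric("Budget vs. actual variance by phase", "Profitability", True,
                    "Re-plan or escalate the phase"),
    CandidateMetric("Files stored on the project share", None, True, None),
    CandidateMetric("Team morale", "Retention", False, "Adjust staffing"),
]

kpis = [m.name for m in candidates if is_key_indicator(m)]
print(kpis)  # ['Budget vs. actual variance by phase']
```

Forcing every candidate to name the business driver and the action it would trigger is what weeds out the "measurable but pointless" metrics.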

Once you’ve decided what you’d like to mine for, the next phase is to determine whether you’ve got the data to mine with. There are several aspects of the data you’ll sift through in your mining expedition that you need to keep in mind as you move forward. First, what is the quality of the data you’re encountering? It’s quite possible to create a scorecard system based entirely on subjective scores entered each week by a clerk. The quality of the data is then completely dependent on the perspective of the person entering the score. A couple of levels up, when management looks at a summary or synthesis of this data, it may not be at all apparent that the source of the resulting score or indicator was a clerk who didn’t “feel good” when choosing a score for some metric.
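One way to keep the clerk’s mood from disappearing into the summary is to carry the provenance of each score up through the roll-up. The sketch below assumes each score is tagged as either `subjective` (hand-entered) or `system` (captured by a tool); the `Score` record, the tags and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Score:
    metric: str
    value: float  # e.g. a 1-5 rating
    source: str   # "subjective" (hand-entered) or "system" (captured by a tool)


def roll_up(scores):
    """Average the scores and report what share of them were hand-entered."""
    if not scores:
        raise ValueError("no scores to summarize")
    subjective = sum(1 for s in scores if s.source == "subjective")
    return {
        "average": sum(s.value for s in scores) / len(scores),
        "subjective_share": subjective / len(scores),
    }


weekly = [
    Score("Schedule health", 4.0, "system"),
    Score("Client satisfaction", 2.0, "subjective"),
    Score("Scope stability", 3.0, "subjective"),
    Score("Budget health", 5.0, "system"),
]
summary = roll_up(weekly)
print(summary)  # {'average': 3.5, 'subjective_share': 0.5}
```

Displaying `subjective_share` next to the average tells the reader, two levels up, that half of this particular 3.5 rests on someone’s judgment.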

One often-ignored aspect of data quality is data completeness. Let’s take the most obvious example to make the point. If you were to design a resource capacity planning indicator showing the workload compared to resource availability over the next 90 days, but you only had access to 80% of the project data, what would the report be worth? At best, nothing; at worst, it would be harmful, because it might lead you to the wrong resource decision.

The completeness of the data is critical and even a metric of data completeness would be a worthwhile indicator in your final display.
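Here is a minimal sketch of the capacity indicator above that reports its own completeness alongside the load figure. The function name, the inputs (planned hours per project for the next 90 days, the full list of active projects, available hours) and the sample numbers are hypothetical.

```python
def capacity_indicator(planned_hours, active_projects, available_hours):
    """Compare planned workload with capacity and report how complete the input is.

    planned_hours: project ID -> planned hours over the window (may be missing projects)
    active_projects: every project that should have a plan
    available_hours: total resource hours available over the same window
    """
    reported = [p for p in active_projects if p in planned_hours]
    workload = sum(planned_hours[p] for p in reported)
    return {
        "load_ratio": workload / available_hours,
        "completeness": len(reported) / len(active_projects),
    }


projects = ["P1", "P2", "P3", "P4", "P5"]
plans = {"P1": 1200.0, "P2": 800.0, "P3": 600.0, "P4": 400.0}  # P5 has no plan yet
result = capacity_indicator(plans, projects, available_hours=4000.0)
print(result)  # {'load_ratio': 0.75, 'completeness': 0.8}
```

A 75% load looks comfortable, but with only 80% of projects reporting, the true figure could be considerably higher; showing both numbers side by side keeps the reader from trusting the first one blindly.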

Another aspect of data quality is the timeliness of the data. In some cases this is not critical, as we’re looking at trends from the past. Still, even if you are looking at older data in your mining efforts, it’s good to know what era the data comes from. Project data in particular may look very different from one period to the next as an organization goes through natural changes over time. If the metric you’re looking for is only relevant to very current data, then an indicator of the age of the data is important. In some cases, we’ve seen a key metric be the use of the project management system itself. The date that project data was last saved, or the frequency with which it is accessed, then becomes a useful bit of data to mine.
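A data-age indicator of the kind described above can be as simple as counting days since each plan was last saved. In this sketch the 14-day staleness threshold, the function name and the dates are hypothetical; in practice the last-saved dates would come from whatever your project management system records.

```python
from datetime import date


def data_age_report(last_saved, as_of, stale_after_days=14):
    """Return the age in days of each project's data and the projects considered stale."""
    ages = {pid: (as_of - saved).days for pid, saved in last_saved.items()}
    stale = sorted(pid for pid, age in ages.items() if age > stale_after_days)
    return ages, stale


last_saved = {
    "P1": date(2023, 4, 3),
    "P2": date(2023, 2, 20),
    "P3": date(2023, 4, 6),
}
ages, stale = data_age_report(last_saved, as_of=date(2023, 4, 7))
print(ages)   # {'P1': 4, 'P2': 46, 'P3': 1}
print(stale)  # ['P2']
```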

OK, so you know the quality and timeliness of the data you’re looking for. How accessible is it? One of the challenges with project data is that it’s often stored in many different locations, even on individual hard disks. In fact, one of the key incentives for enterprise project management (EPM) system deployments is management’s desire to put data into a common format in a central location so that it can be mined later.

While we’re talking about data, one thing that’s common in data mining exercises is the correlation of data from one source to another. Data mining often enables you to look at the data in the project system, the financial system, the HR system, the CRM system and other corporate systems simultaneously. If you can bring data from multiple sources together then you’re able to compare one set with another or combine the data from those sources to see a picture of the overall situation that would have been otherwise impossible.
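Bringing two sources together usually comes down to joining them on a shared key. The sketch below assumes both systems use the same project ID, which is often not a given; the data, the `combine` function and the revenue-per-hour measure are hypothetical. It also reports projects that appear in only one source, since those gaps are themselves a data-completeness signal.

```python
project_hours = {"P1": 1200.0, "P2": 450.0, "P3": 300.0}  # from the project system
invoiced = {"P1": 150000.0, "P2": 45000.0, "P4": 9000.0}  # from the financial system


def combine(hours, revenue):
    """Join two sources on project ID and derive revenue per hour worked."""
    shared = sorted(hours.keys() & revenue.keys())
    per_hour = {pid: revenue[pid] / hours[pid] for pid in shared}
    unmatched = sorted(hours.keys() ^ revenue.keys())  # present in only one source
    return per_hour, unmatched


per_hour, unmatched = combine(project_hours, invoiced)
print(per_hour)   # {'P1': 125.0, 'P2': 100.0}
print(unmatched)  # ['P3', 'P4']
```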

If you’re wondering which project management indicators you could use as KPIs, you can start with the project management classics: budget vs. actual variance by project phase, by cost, or by actual vs. planned effort. I’d also recommend that you let your imagination stretch a little further afield. How about the number of different projects each employee works on each week, or the amount of project time vs. overhead time per employee? An indicator of the amount of time spent travelling between project sites can produce stunning results for organizations where the team is not co-located.
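Both the classic and the less conventional indicators above can be computed from fairly ordinary data. This sketch assumes planned and actual effort by phase, plus timesheet rows of (employee, week, project code, hours) in which overhead is booked to a code called `OVERHEAD`; that convention, the function names and the sample data are hypothetical.

```python
from collections import defaultdict

planned = {"Design": 200.0, "Build": 600.0, "Test": 150.0}  # planned effort, hours
actual = {"Design": 230.0, "Build": 570.0, "Test": 180.0}   # actual effort, hours


def variance_by_phase(planned, actual):
    """Actual minus planned effort as a percentage of plan (positive = overrun)."""
    return {ph: round(100 * (actual[ph] - planned[ph]) / planned[ph], 1) for ph in planned}


# Timesheet rows: (employee, week number, project ID or "OVERHEAD", hours)
timesheet = [
    ("ana", 14, "P1", 20.0),
    ("ana", 14, "P2", 12.0),
    ("ana", 14, "OVERHEAD", 8.0),
    ("ben", 14, "P1", 30.0),
    ("ben", 14, "OVERHEAD", 10.0),
]


def people_indicators(rows):
    """Distinct projects per employee-week, and each employee's share of time on projects."""
    projects = defaultdict(set)
    project_hours = defaultdict(float)
    total_hours = defaultdict(float)
    for emp, week, code, hours in rows:
        week_projects = projects[(emp, week)]  # creates an empty set for overhead-only weeks
        total_hours[emp] += hours
        if code != "OVERHEAD":
            week_projects.add(code)
            project_hours[emp] += hours
    per_week = {key: len(codes) for key, codes in projects.items()}
    project_share = {emp: project_hours[emp] / total_hours[emp] for emp in total_hours}
    return per_week, project_share


print(variance_by_phase(planned, actual))  # {'Design': 15.0, 'Build': -5.0, 'Test': 20.0}
per_week, share = people_indicators(timesheet)
print(per_week)  # {('ana', 14): 2, ('ben', 14): 1}
print(share)     # {'ana': 0.8, 'ben': 0.75}
```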

OK, if this all sounds like something you’d like to implement, here are a few tips on how to get started:

First, start with less. While you may find you’re capable of measuring 500 performance indicators, it’s highly unlikely that all 500 of them are “Key” Performance Indicators. It’s far better to start with a small number of high-quality measures that leave the readers empowered to take action.

Next, make an inventory of the data you believe you can mine. There are potentially many sources, but you can qualify them by thinking about completeness and quality. Lean towards empirical data, that is, data that arrives from a structured source such as an EPM package rather than subjective data such as a weekly hand-entered score.
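A data-source inventory can be kept as a simple list and qualified on the two criteria mentioned above. In this sketch the completeness figures, the 90% threshold and the `DataSource` record are hypothetical; the ranking simply puts empirical sources ahead of hand-entered ones, then orders by completeness.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    completeness: float  # share of expected records actually present, 0..1
    empirical: bool      # captured by a structured system rather than hand-entered scores


def rank_sources(sources, min_completeness=0.9):
    """Keep sources complete enough to trust, preferring empirical ones."""
    usable = [s for s in sources if s.completeness >= min_completeness]
    ordered = sorted(usable, key=lambda s: (not s.empirical, -s.completeness))
    return [s.name for s in ordered]


inventory = [
    DataSource("EPM timesheets", 0.97, True),
    DataSource("Weekly status scores", 0.99, False),
    DataSource("Departmental spreadsheets", 0.60, True),
    DataSource("Finance ledger", 0.95, True),
]
print(rank_sources(inventory))
# ['EPM timesheets', 'Finance ledger', 'Weekly status scores']
```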

Whatever data you end up deciding to mine, remember, data is not a static thing. Create some checks and balances for your data to ensure that over time the quality and completeness of the data remains high.
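Checks and balances can start as a small script, run on a schedule, that flags problems before they reach a dashboard. The specific checks here (duplicate entries, negative hours, projects with no data), the record layout and the function name are hypothetical examples.

```python
def run_data_checks(records, expected_projects):
    """Return human-readable warnings; an empty list means every check passed."""
    issues = []
    seen = set()
    for r in records:
        key = (r["project"], r["week"])
        if key in seen:
            issues.append(f"duplicate entry for {r['project']} week {r['week']}")
        seen.add(key)
        if r["actual_hours"] < 0:
            issues.append(f"negative hours for {r['project']} week {r['week']}")
    reported = {project for project, _ in seen}
    missing = sorted(set(expected_projects) - reported)
    if missing:
        issues.append("no data for: " + ", ".join(missing))
    return issues


records = [
    {"project": "P1", "week": 14, "actual_hours": 120.0},
    {"project": "P1", "week": 14, "actual_hours": 120.0},
    {"project": "P2", "week": 14, "actual_hours": -5.0},
]
issues = run_data_checks(records, expected_projects=["P1", "P2", "P3"])
for issue in issues:
    print(issue)
```

Tracking how many issues each run produces over time is itself a simple indicator of whether data quality is holding up.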

Finally, document, document and document some more. The path from the raw data that you’re mining to the diamond of a dashboard that is your final product will be quickly forgotten. Once you’re gone from the process, how will people remember where this measure came from and what its significance is?

Prospecting is not easy. But the rewards for those who are diligent are nuggets of pure gold. Happy mining!
