Blog

Data Analysis—it’s more than Pivot Tables

Excel pivot tables are incredibly powerful. They are simple and easy to learn, as long as you are working with clean data. Every tutorial, for understandable reasons, uses simple tables. Sometimes you have to create the table yourself, and other times it is provided to you, but in either case there is nothing you need to do to prepare (or “prep”) the data for analysis.

In the real world, how often are you able to extract data and start analyzing them immediately? Anyone who has ever worked with large data sets will tell you that you spend the majority of your time prepping the data before you ever get to use pivot tables (or Minitab or any other program). Once the data are clean, the rest is a fairly straightforward process.

Large data sets often contain missing or corrupt data points and typically require flagging records based on various criteria. You need to address those issues before using the data in your analysis.
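To make that concrete, here is a minimal sketch of what that prep work can look like in Python with pandas. The file name and column names are hypothetical; your extract will look different, but the flagging pattern carries over.

```python
import pandas as pd

# Hypothetical extract; substitute your own file and column names.
df = pd.read_csv("exams_raw.csv")

# Flag records with missing values in key fields.
df["missing_key_field"] = df[["exam_id", "duration_min"]].isna().any(axis=1)

# Flag implausible values (e.g., negative or absurdly long durations).
df["suspect_duration"] = ~df["duration_min"].between(1, 240)

# Set the questionable rows aside for review and analyze the clean subset.
flagged = df[df["missing_key_field"] | df["suspect_duration"]]
clean = df[~(df["missing_key_field"] | df["suspect_duration"])]

print(f"{len(flagged)} of {len(df)} records flagged for review")
```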

For most end users, pivot tables are fine for simple counts, sums, and averages, but when it comes to mining massive data sets for improvement opportunities, you should consider working with someone with data prepping experience.

The Power of Visuals in Storytelling: Minard’s Chart of Napoleon’s Russian Campaign

It is said that a picture is worth a thousand words. Years ago, I was lucky enough to attend a seminar on data visualization. In that talk, we were introduced to the work of Charles Joseph Minard, a French civil engineer renowned for his data visualization maps—the most famous of which depicts Napoleon’s failed Russian campaign.

The map shows how Napoleon’s army started with 422,000 soldiers (in beige) and, by the time it reached Moscow, was down to 100,000. It then retreated (in black), and only 10,000 made it back to where they started.


If you click on the chart and study it for a moment, several details jump out at you. For example, 60,000 soldiers split off at Wilna, but only 33,000 made it to Polotzk. Also, on the return from Moscow, the army suffered heavy losses at the Berezina river: 30,000 soldiers from Polotzk joined 20,000 retreating soldiers, but only 28,000 made it across.

Although no software that I am aware of can create a masterpiece like this one, I believe that, whenever possible, a good graphic should always accompany tabular data. The story is simply easier to tell, and easier to remember.

Find the Nuggets in your River: MRI Utilization

If you are an administrator in a hospital or a diagnostic imaging center, chances are you are already looking at your MRI utilization. MRI exams have a relatively large contribution margin, and being able to “squeeze” in an extra exam here or there can have a huge long-term financial impact.

You cannot aim for 100% utilization—that would be unachievable and certainly inadvisable. There will always be claustrophobic patients, late patients, and “no-shows”, among other legitimate reasons for falling short of 100%. A schedule should also have “gaps” to accommodate emergency add-on cases and allow for exam overruns.

A good starting utilization goal is 85%. In this example (Table 1), I am using a year’s worth of data to look at room utilization, defined as the number of minutes patients are on the table divided by the number of minutes the room is staffed. This particular machine had 65% utilization. Adding one 45-minute exam every weekday would bring the utilization to 71% and generate between $75,000 and $200,000 annually (assuming a contribution margin of $300 to $800 per case). And that is from a single machine reaching just 71% utilization.

Table 1
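The arithmetic behind those figures is simple enough to sketch. In the snippet below, the 12-hour staffed day and the 250 weekdays per year are my assumptions for illustration; only the 65% baseline, the 45-minute exam, and the $300 to $800 contribution margin come from the example above.

```python
# Back-of-the-envelope utilization math.
# Assumptions for illustration: a 12-hour staffed day and 250 weekdays/year.
staffed_min_per_day = 12 * 60
baseline_utilization = 0.65
extra_exam_min = 45
weekdays_per_year = 250

table_min = baseline_utilization * staffed_min_per_day
new_utilization = (table_min + extra_exam_min) / staffed_min_per_day
print(f"New utilization: {new_utilization:.0%}")  # roughly 71%

for margin_per_case in (300, 800):
    annual = margin_per_case * weekdays_per_year
    print(f"Annual impact at ${margin_per_case}/case: ${annual:,}")  # $75,000 to $200,000
```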

The next challenge is knowing where to find those opportunities. One potential area is your first case of the day—is it starting on time? A quick visual (Graph 1) of the average starting time for a whole year reveals that the “begin” time for the first case is all over the place. If the first exam is scheduled for 7 AM, then there may be opportunities for improvement.

Graph 1
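If you have the raw exam records, a quick pass in pandas is one way to quantify that spread. This is only a sketch: the file name, column names, and the 7 AM target below are placeholders to adapt to your own data.

```python
import pandas as pd

# Hypothetical extract with one row per exam and a "begin" timestamp.
df = pd.read_csv("mri_exams.csv", parse_dates=["begin_dt"])
df["exam_date"] = df["begin_dt"].dt.date

# Earliest "begin" time on each day, in minutes relative to a 7:00 AM target.
first_case = df.groupby("exam_date")["begin_dt"].min()
minutes_late = first_case.dt.hour * 60 + first_case.dt.minute - 7 * 60

print(minutes_late.describe())           # spread of first-case start times
share_late = (minutes_late > 10).mean()  # days starting more than 10 minutes late
print(f"{share_late:.0%} of days started more than 10 minutes late")
```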

The example presented here uses one year’s worth of data. It is high level and meant to show potential opportunities. A lot can happen in one year, so a second iteration of the analysis should probably look at the latest quarter.

As I have recommended in a previous blog, it is important to sit with the client and those who know the operation best before coming up with any recommendations. Only they can tell you if the schedule has changed within the year, and exactly at what moment a technologist “begins” an exam (i.e., is it when the patient enters the room or when the patient is on the table?).

This type of detailed analysis is difficult to achieve using canned reports. Those may be quick and easy, but they do not give you the flexibility you need when looking for improvement opportunities.

Raw data, on the other hand, is like a raging river—it can be overwhelming, but it can hide some golden opportunities in its depths. In the hands of someone who can navigate it, those opportunities can be surfaced.

Do you trust your data?

If you are trying to use data for decision making, the first thing you should do is establish their trustworthiness. Many managers feel that if the numbers come out of a computer, then they must be good. Well, in some cases that is true, especially if the numbers (say timestamps) are generated automatically by the system. In other cases, data accuracy relies on staff inputting data correctly.

One way to establish data trustworthiness is to look for unusual patterns. You can run a statistical analysis or simply plot the data. In this example, MRI exam durations (Begin to Complete) were analyzed and the technologists’ results were compared to each other.

The descriptive statistics and dotplot immediately showed that Tech5’s data were unusual and different from the others’. The durations were less scattered and tended to cluster around two values.
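For those curious, the comparison itself takes only a few lines of pandas and matplotlib. The column names here are placeholders for whatever your system exports; the point is simply to put each technologist’s durations side by side.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical extract with one row per exam.
df = pd.read_csv("mri_exams.csv")

# Descriptive statistics per technologist.
print(df.groupby("technologist")["duration_min"].describe())

# A simple dotplot to eyeball how each technologist's durations are spread.
techs = sorted(df["technologist"].unique())
for i, tech in enumerate(techs):
    durations = df.loc[df["technologist"] == tech, "duration_min"]
    plt.plot(durations, [i] * len(durations), "o", alpha=0.3)
plt.yticks(range(len(techs)), techs)
plt.xlabel("Exam duration (Begin to Complete), minutes")
plt.show()
```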

It is important, though, not to jump to conclusions. Data analysis is meant to be iterative, and results from one analysis often lead to more questions. In this example, it would behoove the analyst to dig further: Is the technologist manually changing the timestamps so the exams appear as long as their manager expects? Or does the technologist “specialize” in only certain exams (perhaps due to equipment limitations or comfort level), in which case the numbers may well be accurate?

When analyzing large data sets, it is important to include the client early in the analysis before delving too deeply. They can help you decide if your early observations make sense. After all, it is their operation.

It is rare to find large data sets that are 100% accurate and clean. You will almost always need to remove bogus data points. As long as they are few and random, you can proceed with your in-depth analysis.