If you are trying to use data for decision making, the first thing you should do is establish their trustworthiness. Many managers feel that if the numbers come out of a computer, then they must be good. In some cases that is true, especially if the numbers (say, timestamps) are generated automatically by the system. In other cases, data accuracy relies on staff inputting data correctly.
One way to establish data trustworthiness is to look for unusual patterns. You can run a statistical analysis or simply plot the data. In this example, MRI exam durations (Begin to Complete) were analyzed and the technologists’ results were compared to each other.
The descriptive statistics and dotplot immediately showed that Tech5’s data were unusual and different from the others’. The durations were less scattered and tended to aggregate around two numbers.
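A per-technologist comparison of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the function name `summarize_by_tech`, and the `top2_share` measure (the fraction of a technologist's exams that fall on their two most frequent durations) are hypothetical and are not the data or method from the original analysis.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def summarize_by_tech(records):
    """Summarize exam durations per technologist.

    records: iterable of (tech, duration_minutes) pairs (hypothetical layout).
    """
    by_tech = defaultdict(list)
    for tech, minutes in records:
        by_tech[tech].append(minutes)

    summary = {}
    for tech, durations in by_tech.items():
        # Share of exams landing on the two most frequent durations:
        # a value near 1.0 means the data cluster around two numbers.
        top_two = Counter(durations).most_common(2)
        summary[tech] = {
            "n": len(durations),
            "mean": mean(durations),
            "stdev": stdev(durations) if len(durations) > 1 else 0.0,
            "top2_share": sum(count for _, count in top_two) / len(durations),
        }
    return summary


# Made-up durations in minutes, for illustration only.
sample = [
    ("Tech1", 22), ("Tech1", 35), ("Tech1", 41),
    ("Tech1", 28), ("Tech1", 52), ("Tech1", 31),
    ("Tech5", 30), ("Tech5", 30), ("Tech5", 45),
    ("Tech5", 45), ("Tech5", 30), ("Tech5", 45),
]

summary = summarize_by_tech(sample)
for tech, stats in summary.items():
    print(tech, stats)
```

A technologist whose standard deviation is noticeably lower than their peers', and whose `top2_share` is close to 1, is worth a closer look. It is a prompt for questions, not a verdict.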
It is important, though, not to jump to conclusions. Data analysis is meant to be iterative, and results from one analysis often lead to more questions. In this example, the analyst should ask follow-up questions. Is the technologist manually changing the timestamps to ensure the exams are as long as their manager expects? Or does the technologist “specialize” in only certain exams, perhaps due to equipment limitations or comfort level, in which case the numbers may be accurate?
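One way to probe the “specialization” question is to look at each technologist's exam mix. The sketch below assumes the data set records an exam type for each exam, which the original example does not confirm. The exam names, the records, and the function `exam_mix_by_tech` are all hypothetical.

```python
from collections import Counter, defaultdict


def exam_mix_by_tech(records):
    """Return {tech: {exam_type: share_of_that_tech's_exams}}.

    records: iterable of (tech, exam_type) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for tech, exam_type in records:
        counts[tech][exam_type] += 1

    return {
        tech: {exam: n / sum(c.values()) for exam, n in c.items()}
        for tech, c in counts.items()
    }


# Made-up exam assignments, for illustration only.
sample = [
    ("Tech1", "brain"), ("Tech1", "knee"), ("Tech1", "spine"), ("Tech1", "abdomen"),
    ("Tech5", "brain"), ("Tech5", "knee"), ("Tech5", "brain"), ("Tech5", "knee"),
]

mix = exam_mix_by_tech(sample)
for tech, shares in mix.items():
    print(tech, shares)
```

If one technologist performs only two exam types while the others perform many, tightly clustered durations may simply reflect the work they are assigned.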
When analyzing large data sets, it is important to involve the client early, before delving too deeply into the analysis. They can help you decide if your early observations make sense. After all, it is their operation.
It is rare that you will find large data sets that are 100% accurate and clean. You will almost always need to remove bogus data points. As long as they are few and random, you can proceed with your in-depth analysis.
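A minimal sketch of that cleanup step follows. The plausibility limits (1 to 240 minutes), the records, and the function `flag_bogus` are assumptions chosen for illustration. Appropriate limits depend on the operation and are best agreed on with the client.

```python
from collections import Counter


def flag_bogus(records, min_minutes=1, max_minutes=240):
    """Split records into (clean, bogus) using assumed plausibility limits.

    records: iterable of (tech, duration_minutes) pairs; a duration of None
    is treated as missing and therefore bogus.
    """
    clean, bogus = [], []
    for tech, minutes in records:
        if minutes is None or not (min_minutes <= minutes <= max_minutes):
            bogus.append((tech, minutes))
        else:
            clean.append((tech, minutes))
    return clean, bogus


# Made-up records, for illustration only.
sample = [
    ("Tech1", 32), ("Tech1", -5), ("Tech2", 41), ("Tech2", 28),
    ("Tech3", 900), ("Tech3", 37), ("Tech4", None), ("Tech4", 45),
]

clean, bogus = flag_bogus(sample)

# "Few": what fraction of all records was removed?
bogus_share = len(bogus) / len(sample)

# "Random": are the removed records spread out, or piled on one technologist?
bogus_by_tech = Counter(tech for tech, _ in bogus)

print(f"removed {len(bogus)} of {len(sample)} records ({bogus_share:.0%})")
print(dict(bogus_by_tech))
```

If the removed points are a small share of the total and are not concentrated on one technologist, shift, or machine, dropping them is unlikely to bias the results. If they are concentrated, that pattern is itself a finding to raise with the client.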