## Statistics in Pictures: Illustrations of Why Companies with Statistics Beat Those Without

In this article, I provide clear insight into why data queries, charts, graphs, and trend lines are often misleading and can support incorrect decisions when taken at face value as a window into the workings of an organization, its customers, or its markets. I then show why statistics overcome the shortcomings of these data reporting methods and why companies that use statistics in their operations sport a huge competitive advantage over those that do not. By the end of this article, we will answer the question, “Can companies that use statistics make better decisions than those who do not?”

**FIGURE 1 (above) – Table of Monthly Revenue for Company X**

Each row of the above table shows the monthly revenue for Company X from October 2008 to April 2009. As is often the case, Company X asked its analysts to project what revenue would likely be during the upcoming months of May through August and to report on the general financial health of the company based on this revenue forecast. Meanwhile, the financial manager was quick to comment that revenue was going up and that the company looked strong and was getting stronger. Based on the above table, do you agree with the manager’s comments?

The analysts went off to work. One of the first tasks they performed was to create a scatter plot of the data. This plot of revenue by month appears below.

**FIGURE 2 (above) – Graph of Monthly Revenue for Company X**

Looking at the above plot, do you see a pattern? The analysts did, and they were ecstatic. The organization’s revenue appeared to rise each successive month, and the increase over time looked fairly steep on the plot.

After viewing this plot of the company’s revenue by month, the analysts made a revenue projection by adding a best fit line. The revenue projection graph appears below.
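For readers who want to see what “adding a best fit line” amounts to, here is a minimal Python sketch using NumPy’s `polyfit` (ordinary least squares on a straight line). It assumes the months are indexed 1 to 7 for October 2008 through April 2009, so May through August become 8 to 11; the function names are hypothetical and this is not the tool the analysts used.

```python
import numpy as np

def fit_trend_line(months, revenue):
    """Fit revenue = slope * month + intercept by ordinary least squares."""
    # polyfit with deg=1 returns [slope, intercept] (highest power first).
    slope, intercept = np.polyfit(months, revenue, deg=1)
    return slope, intercept

def project(slope, intercept, future_months):
    """Extrapolate the fitted straight line to future month indices."""
    return slope * np.asarray(future_months, dtype=float) + intercept
```

To reproduce the analysts’ projection, one would pass months `[1, ..., 7]` and the seven revenue values from Figure 1, then call `project` with `[8, 9, 10, 11]`. Note that the line knows nothing except time, which is exactly the weakness the rest of this article explores.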

**FIGURE 3 (above) – Best Fit Revenue Projection for Company X**

After viewing the above graph, the analysts felt pretty good. The best fit line seemed appropriate, as the data points were “snaking around” the line, which the analysts had been taught is a sign that the model is appropriate. It appeared that May’s revenue would be around $7.5M and that by October, monthly revenue would be somewhere around $13M. The analysts used the graph to extend the table shown in Figure 1 with projected revenue for May through August. This updated table appears below.

**FIGURE 4 (above) – Table of Revenue and Revenue Projections for Company X**

After checking the numbers and making sure there were no arithmetic errors, the analysts presented this table of revenue and revenue projections to their manager. The manager was pleased, since he now had numbers to corroborate the magnitude of what he visually saw as an increasing revenue trend. The manager submitted the projection to the CFO, and the company decided to keep doing what it had been doing since October 2008, as the data indicated that revenue was steadily increasing and would continue to increase, reaching a monthly revenue of around $11M by August.

The thought process and steps taken in this example are all too familiar. I have lost count of the organizations I’ve worked with that look at a set of data, visualize a trend, and then use this visualized or plotted trend to determine what lies ahead. If only life were so simple.

As it turns out, Company X’s revenue is being impacted by a variety of influences. Season, competition, suppliers, a changing customer base – all of these could be impacting our revenue. Simply looking at queries, tables, plots, and trends causes us to ignore the dynamic and interconnected world in which our company lives and how various influences are affecting us like waves coming over us in the ocean.

On my most recent vacation, my wife and I went to the shore. When I was a young boy, waves often crashed into me and knocked me over. As I’ve grown, however, my size has enabled me to withstand the force of small waves on a calm, sunny day. But if I stood at the shoreline day after day without getting knocked over, could I expect that trend to continue? If we applied the same thought process that Company X used to project revenue, the answer would be yes: since I was not knocked down by a wave day after day while on vacation, this pattern should continue. Yet if we take a step back, we can easily see that this is likely not the case. Once we take into account hurricane season, moon phase, wind speed, and weather, it is not difficult to imagine me being knocked over and dragged out to sea by a huge wave if I stood at the shoreline during hurricane season with a full moon, high winds, and a tornado nearby. The same applies to business: we need to look at the world around us and assess how many different variables are interacting with each other, and with us, to generate the performance indicators we depend on.

In the case of Company X, three variables are suspected to be influencing monthly revenue. These variables appear in the below table as A, B, and C.

**FIGURE 5 (above) – Table of Suspected Influential Variables and Revenue for Company X**

When A, B, or C changes, do you see a pattern in how revenue changes? To assess whether these variables truly influence revenue, and by how much, SAS business analytics software can be used to run a multiple linear regression (a single regression with A, B, and C all included as predictors), as shown below.
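The idea behind a linear regression of revenue on A, B, and C can be sketched in Python with NumPy’s least-squares solver. This is not SAS’s implementation, only the same ordinary least squares idea: it finds the intercept and coefficients that minimize the squared differences between predicted and actual revenue. The function name is hypothetical.

```python
import numpy as np

def fit_linear_regression(A, B, C, revenue):
    """Ordinary least squares fit of revenue = b0 + bA*A + bB*B + bC*C.

    Returns [b0, bA, bB, bC], analogous to the "parameter estimates"
    column in regression output.
    """
    A, B, C = (np.asarray(v, dtype=float) for v in (A, B, C))
    # Design matrix: a column of ones for the intercept, then one column per variable.
    X = np.column_stack([np.ones_like(A), A, B, C])
    y = np.asarray(revenue, dtype=float)
    # lstsq returns (solution, residuals, rank, singular values); keep the solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Including the column of ones is what gives the model its intercept term; without it, the fitted plane would be forced through zero.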

**FIGURE 6 (above) – Company X data imported into SAS**

**FIGURE 7 (above) – Company X Linear Regression in SAS**

**FIGURE 8 (above) – Company X Linear Regression Results in SAS**

While the results of the analysis may seem alien at first, they are actually easy to interpret. The r-squared value at the top tells us how well our statistical model fits the data: the proportion of the variation in monthly revenue that is explained by a linear combination of variables A, B, and C. An r-squared value of 1 means a perfect fit, whereas an r-squared value of 0 means that the model explains none of the variation in monthly revenue.
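The r-squared value can be computed directly from actual and predicted revenue. A minimal Python sketch (the function name is hypothetical) compares the model’s leftover error with the total variation around the average revenue:

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: share of variation in `actual` explained by `predicted`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # variation the model leaves unexplained
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the average revenue scores exactly 0, and a model that reproduces every observation scores 1. For a least squares fit with an intercept, evaluated on the data it was fitted to, the value always falls between these two.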

In this case, our r-squared is almost 1, meaning that given the values of variables A, B, and C, the model reproduces the observed monthly revenue almost perfectly! Below the r-squared measure is a table with one column listing our variables and another listing their “parameter estimates.” These parameter estimates tell us how each of the variables A, B, and C affects revenue. This output is our regression model. To estimate monthly revenue, we can rewrite this model as follows:

**FIGURE 9 (above) – Company X Linear Regression Model**

Using the output from SAS, we determined that monthly revenue for Company X equals 0.159 (the intercept parameter estimate) plus the parameter estimate of A times the value of A, plus the parameter estimate of B times the value of B, plus the parameter estimate of C times the value of C. We can write this into an MS Excel spreadsheet if we so choose:
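Written as a single equation, the model described above reads as follows. The intercept of 0.159 comes from the SAS output; $\hat{\beta}_A$, $\hat{\beta}_B$, and $\hat{\beta}_C$ are placeholder symbols (my notation) for the parameter estimates of A, B, and C reported in Figure 8:

$$
\widehat{\text{Revenue}} = 0.159 + \hat{\beta}_A \, A + \hat{\beta}_B \, B + \hat{\beta}_C \, C
$$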

**FIGURE 10 (above) – Company X Linear Regression Model Pasted into Excel**

If we fill the formula down in Excel, we arrive at revenue estimates not just for our previous months but also for the upcoming months of May through August. We can then compare the SAS model’s revenue estimates for May through August with those of our best fit line. These results appear below:
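Filling the formula down is simply applying the fitted equation to each new row. The Python sketch below does the same, assuming the values of A, B, and C for May through August are available (as filling the formula down in Excel implies), and then lines the model’s forecast up against the trend-line forecast month by month. Both function names are hypothetical.

```python
import numpy as np

def predict_revenue(coef, A, B, C):
    """Apply fitted estimates [b0, bA, bB, bC] to values of A, B and C.

    Equivalent to filling the regression formula down new spreadsheet rows.
    """
    b0, bA, bB, bC = coef
    A, B, C = (np.asarray(v, dtype=float) for v in (A, B, C))
    return b0 + bA * A + bB * B + bC * C

def compare_forecasts(trend_forecast, model_forecast, labels):
    """Return (label, trend, model, model - trend) for each month."""
    return [
        (label, t, m, m - t)
        for label, t, m in zip(labels, trend_forecast, model_forecast)
    ]
```

A consistently large negative gap in the last column is the numeric version of the disagreement the plots below show: the regression expects revenue to fall while the trend line expects it to keep climbing.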

**FIGURE 11 (above) – Comparison of Best Fit Line to SAS Prediction**

The results of our SAS model tell us something completely different from our best fit line. In fact, the results oppose one another! Based on the results of our best fit line, we had determined that revenue would increase in the months of May through August. However, our SAS prediction says that our revenue is about to plummet in the months of May through August. Looking at an overlay plot of the two methods better illustrates how different the results are:

**FIGURE 12 (above) – Graphical Comparison of Best Fit Line to SAS Prediction**

In the above plot, the blue line shows the results of the best fit trend and the pink line shows the results of the SAS model. One suggests that everything is going well for Company X and that it should keep doing the same; the other suggests that Company X may be on the verge of crashing. Which is correct?

As it turns out, variables A, B, and C are fictitious and were created to be directly related to revenue for illustrative purposes. The relationship is: Monthly Revenue = A divided by B times C. If we look back at the values, it will become apparent that this is the relationship.

**FIGURE 13 (above) – Relationship between Monthly Revenue and A, B, and C**

In the real world, we rarely know the true relationship between variables, but we can use statistical methods to estimate it. How accurate are statistical methods at uncovering these unknown patterns? If we go back to our plot from Figure 12 and add the actual revenue, the power of statistics quickly becomes evident:

**FIGURE 14 (above) – Graphical Comparison of Best Fit Line, SAS Prediction, and Actual**

Unfortunately, relying on classic BI reporting methods, the CFO made the wrong decision, one that could have cost him his job and the company its solvency. The queries, tables, and plots all showed that revenue was steadily increasing. Even if one had carefully queried, charted, and plotted variables A, B, and C, their relationship with revenue would likely have been missed.

Using regression, and without knowing the true relationship between our variables, SAS was able to decisively detect the relationship operating behind the scenes and use it to predict that Company X was on the verge of catastrophe! Had Company X used statistics to augment its reports, the CFO might have decided to change which marketing strategy was being used, which supplier he was purchasing from, or perhaps which customer demographic he was targeting: all variables that A, B, and C could represent.

While we worked with only three variables in this case (A, B, and C), businesses tend to be influenced by many more variables, often related in far more complex ways than our simple formula. Yet even in this simple example, it would have been difficult to detect the underlying pattern visually. While linear regression worked well in this example, more powerful methods are often necessary in business to accurately detect the patterns driving key performance metrics. These methods include time series analysis, neural networks, decision trees, cluster analysis, and the like. Luckily, technologies such as SAS sport an environment that makes these methods available through a clean, graphical interface designed with business applications in mind.

The use of statistics, whether in marketing, healthcare, insurance, energy, or elsewhere, gives the user a powerful competitive advantage over organizations that rely on BI reporting alone to drive decisions. Furthermore, adding a statistical capability to an organization tends not to be all that challenging with the appropriate guidance. For organizations that would like to become more profitable using methods such as those described in this article, my colleagues and I are available to help guide the building of a robust analytical capability that will leave competitors in your organizational dust!

Timothy D’Auria serves as SAS Analytics Practice Manager for Creative Computing, awarded as a top data consulting firm in the Northeast US.

For more information, feel free to contact Timothy D’Auria at 401-727-2400.