I recently saw a post on the Visual Business Intelligence blog about representing 360 data points. The discussion in the forum was interesting as people demonstrated ways to represent these data points on a single image.
I wanted to see what I could come up with. Representing information in effective ways is a good skill for the forensic analyst. I also wanted to try out the conditional formatting feature of Excel 2007 to see what it was capable of.
First step was to download the data. Here is what the spreadsheet looks like:
Looking at some of the other submissions, I decided to order the columns similarly. Housing was the biggest expenditure down through education. This would make the formatting easier to see later. You also need to do this step first otherwise your formatting may get "confused" when you start moving columns.
On the Home tab on Excel 2007, there is a conditional formatting button. Selecting this button gives the following choices:
There are some powerful options under this area and I encourage you to experiment with them. Today I am going to look specifically at Data Bars and Color Scales.
Data Bars
If I select all of the data and choose the blue Data Bar, I get the following:
Notice how the bars are relative to the biggest data item in the selection. If I was to instead select each column individually and assign the blue data bar we would get the following:
Which style you want comes down to what you are trying to show with your data. For our purposes, we would like to see the data bars relative to the largest value, so we will use the first method.
There are some issues with the data bars:
First, you will notice that there is a gradient on the data bar. This can make the end of the data bar hard to see. Unfortunately, this is not something that can be changed.
Second, you may also notice that the smaller values still have a considerable size data bar. Should the data bar for 0.3 really be that big? Turns out that the data bar has a minimum size. Fortunately this can be changed.
The change can be made by going into the Visual Basic editor (ALT-F11), bringing up the immediate window (CTRL-G) and typing the following:
Range("B2:M31").FormatConditions(1).PercentMin = 1
After pressing the enter key, this will change the minimum size of the data bar to its smallest value for the range specified:
One thing that would be nice is to have the data bars be different colors depending on their value. Through the UI there really is no way to do this. Thanks to the Excel 2007 Blog, there is a way to achieve this. This "trick" uses a little VBA code to achieve its goal.
With Excel 2007, you can apply multiple data bar conditional formats to the same data, one over the top of the other. Then, once you have created your multiple data bar layers, you can selectively show the ones you want. There is a "trick" to the "trick" though. Each successive data bar format needs to be on a different selection. This was not explained in the blog entry, but I couldn't find a way to overlay formats on the same data selection.
So, starting with the original Excel spreadsheet, select all of the data and apply the blue data bar format and then change the PercentMin to 1:
Now, select all of the data AND an extra column and apply the orange data bar format:
The blue data bar format will be underneath the orange. Now, select all of the data AND 2 extra columns and apply the red data bar format:
Now we can selectively display the data bars. Select the data from B3 through M31 – This selects all data except for the EU25 data. Now bring up the Visual Basic editor and type the following in the immediate window:
selection.FormatConditions(1).formula = "=if(b3<b$2, true, false)"
selection.FormatConditions(2).formula = "=if(b3=b$2, true, false)"
selection.FormatConditions(3).formula = "=if(b3>b$2, true, false)"
Now, each line of VBA code will only be executed when you press enter on that line. You will therefore need to press enter 3 times to execute each line of VBA code.
The first line specifies the visibility of the red data bar (the last format added is at index 1). We specify that the red data bars indicate values less than the EU25 average. The second line indicates in orange the values equal to the EU25 average. The third line of code indicates in blue the values greater than the EU25 average.
You should end up with the following:
Looking at this you can clearly see that the countries are listed in order of percentage spent on food. To change it to housing we can sort the data. To do that, select all of the data except for EU25 (as that cell is used in our formula). Now click on the Z-A sort button on the Data tab:
There appears to be an artifact in the housing column where some red shows up. We can remove that by executing the 3 lines of VBA code again.
It is important to note that above I used data bars on the whole data set. If you wanted the data bars to be specific to an expenditure, you would have to repeat the procedure above for each column. This is shown in the final chart in the blog.
Now, let's turn our attention over to the Color Scales.
Color Scales
Starting out with the original data, we can select all of the data and apply the Red, Yellow, Blue color scale. The result will be something like this:
You can clearly see that there is more expenditure on housing and less on education. You can also see that restaurants have quite a range of values and Romania spends a lot more than the other countries on food.
You can customize the colors using "More Rules…"
There's not really much more to say about color scales, so we will move on to combining the data bar and color scales in the finished spreadsheet.
Finished Spreadsheet
Here is the finished spreadsheet combining the 2 conditional formatting methods discussed in this post:
I chose to use shades of grey for the data bars to prevent clashing with the color scales. The color scales represent values across the whole data set, whereas the data bars represent values for the particular expenditure.
In addition, I was able to remove the text for the values. I did this by making the data values font very big (409pt) and sizing the cells as before. You then need to align the countries text to the top of the cell.
The task was to create a single image that had all of the data set represented. That proves to be quite a challenge. Typically you would highlight the data you wanted to showcase.
I have posted a better image of the finished spreadsheet to the original discussion.