Thursday, February 20, 2014

Making visualization self-explanatory by embedding insights

1.    Background

In recent years, data visualization has become a popular choice for analysis and planning. Numerous new visualization types present data in different ways to make fact finding easier.
These benefits come with a side effect: visualizations are getting more and more complex, and they generally need further information (in textual, audio or visual format) to explain the facts so that the user can deduce the proper insights.
Dynamic visualizations, where transitions are used to show changes in the data over a period of time, aggravate the problem even further. As the underlying visualization changes continuously, we either need to ‘pause’ the transition or again provide audio or separate text to support the data presented. A few examples of complex visualizations are given below: (Ref: http://www.visualcomplexity.com/vc/)


Use Case
Consider the chart shown below; at a high level it shows the per-capita income in various countries at different points in time. As you will see, over time the data points move towards the top right corner.




Though knowledgeable users will get some insights from the axis labels (the static information) in the chart, further insights about a particular instant of the dynamic visualization will be difficult to obtain, or many times totally lost.
For example:
Obvious insights:
  • During the 1970s and 80s, all but a few countries had low per capita income
  • Over the last few decades, more countries have reached a better per capita income
Further insights:
  • Even though the already developed countries are still growing, their growth rate is much slower than that of the developing nations
  • The difference between the highest earning and lowest earning nations is still almost the same
  • Nation X has made the most progress,  etc …

With a bit more complexity in the data or the visualization, it can be almost impossible to convey the underlying message using visual cues alone.

Current Solutions and Associated Problems
The current solutions partially solve the problem in one of the following ways:
  • Providing separate text outside the visualization:
  1. It is not user friendly.
  2. When sharing the visualization, we need to share both the visualization and the accompanying information.
  3. It is not real time. If the data changes, we need to change the text accordingly.
  • Providing text/annotation on the visualization itself; in other words, adding text on top of a snapshot of a particular visualization. For example, on any chart I can add a text box and put an insight in it.
  1. For static visualizations: though this solves the problem of keeping the insight with the visualization, the insight is still static. If the data changes, we need to go back and put in new insights. If we want to reuse a similar insight for the same event in some other data, it can’t be done.
  2. For dynamic visualizations: currently there is no feasible way to add an annotation at a particular instant of the visualization.
  • Using third party tools to embed text/audio commentary on the visualization: for data based visualizations, this means recording the screen and adding the labels on top.
  1. Because we have recorded the screen (or taken a snapshot), the visualization is no longer attached to the data itself. Hence the insight can’t be reused in some other chart or with other data. For any new data we need to create the visualization again and reuse the tool to add the insight.
  2. Extra time and effort are required to do the recording and embed the text/audio.
  3. Third party tools are needed to achieve this.

2.     Proposed Solution to the Problem

In recent times, many grammar based visualization engines have been introduced in the data visualization field (Vega (http://trifacta.github.io/vega/editor/), IBM Rave etc.). These engines allow the user to define the visualization using a predefined template.
The proposed solution embeds insights in the visualization by attaching trigger points and their related insights to the template used to create the visualization.
Based on the trigger, we can show more information about the underlying data in the form of text, audio, graphics or other means. The triggers can be:
  • ‘Event’ based:
    • “when the current visualization is showing the data for 2001” or
    • “when the data for X axis goes beyond the value 1000” show “…..”
  • Time based:
    • “when we are into the 3rd minute of the transition”
Due to this dynamic nature of the attached information we can:
  • change the underlying data while keeping the same insight (text or audio) and the same visualization
  • change the insight for the same or different underlying data.
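
To make the trigger idea concrete, here is a minimal sketch in Python. It is not tied to Vega or IBM Rave; the names `InsightTrigger`, `active_insights` and the frame keys (`year`, `x_max`, `elapsed_seconds`) are assumptions made up for illustration, and the insight texts are placeholders. Each trigger pairs a condition on the current state of the visualization with the insight to show.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a "frame" describes the visualization at one instant.
Frame = Dict[str, float]


@dataclass
class InsightTrigger:
    """A condition on the current frame plus the insight to show when it holds."""
    condition: Callable[[Frame], bool]
    insight: str


def active_insights(triggers: List[InsightTrigger], frame: Frame) -> List[str]:
    """Return the insights of all triggers whose condition holds for this frame."""
    return [t.insight for t in triggers if t.condition(frame)]


# Placeholder insight texts; real ones would be written by the chart author.
triggers = [
    # Event based: the visualization is showing the data for 2001.
    InsightTrigger(lambda f: f["year"] == 2001, "Insight for the year 2001"),
    # Event based: the data for the X axis goes beyond the value 1000.
    InsightTrigger(lambda f: f["x_max"] > 1000, "Insight for X beyond 1000"),
    # Time based: we are in the 3rd minute of the transition (120s to 180s).
    InsightTrigger(lambda f: 120 <= f["elapsed_seconds"] < 180,
                   "Insight for the 3rd minute"),
]

frame = {"year": 2001, "x_max": 500, "elapsed_seconds": 130}
print(active_insights(triggers, frame))
```

Because the triggers look only at the frame and not at a recorded image, the same list can be evaluated against a different data set, or swapped for another list over the same data.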

3.     Benefits

The proposed solution will allow business users to:
  1. gain more insights from a visualization.
  2. embed more insights for end users.
  3. increase the complexity of visualizations without fear of overwhelming the end user, because the insights are embedded.
  4. attach insights to the data as well as embed them in the visualization, which adds the capability to:
    • show different insights based on the same data
    • show similar insights for different sets of data

4.     Sample Implementation and Flow

This type of system can be built on the popular Grammar of Graphics or a similar system. A sample flow:
Sample Visualization Template:





Execution Example:
Condition: Difference between max and min value > 200% of average
Insight: There is a huge difference between the best performing and worst performing nations.


Note: The insight above is shown only if the condition is met. For any other year’s data, if the difference is not as large, the text will not be shown.
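
The condition in the execution example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `spread_insight` and the sample income figures are made up, and "200% of average" is read as twice the arithmetic mean of the values.

```python
from statistics import mean
from typing import Optional, Sequence

INSIGHT = ("There is a huge difference between the best performing "
           "and worst performing nations.")


def spread_insight(values: Sequence[float]) -> Optional[str]:
    """Return the insight if (max - min) exceeds 200% of the average, else None."""
    average = mean(values)
    if max(values) - min(values) > 2.0 * average:
        return INSIGHT
    return None


# Made-up per capita income figures for two different years.
print(spread_insight([100, 200, 1500]))   # spread 1400 > 2 * 600, insight shown
print(spread_insight([900, 1000, 1100]))  # spread 200 <= 2 * 1000, prints None
```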

5.     Going one step ahead

Once the system is in place, it will be easy to attach several insight triggers to the templates based on rules, and over time these triggers will build up. It’s quite possible that, on producing a visualization for some set of data, the user will be overwhelmed by a multitude of insights, since several triggers may fire at once. To help the end user, the system can accept a ‘filter’ input that restricts the insights shown. In the example above, if the user is interested only in the progress of third world countries, the respective inputs can be provided, and the system will filter the insights using a simple text based comparison.
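
A simple text based filter of this kind might look like the sketch below. The function name `filter_insights` and the sample insight strings are hypothetical, and a case-insensitive substring match is just one possible reading of "simple text based comparison".

```python
from typing import List


def filter_insights(insights: List[str], keyword: str) -> List[str]:
    """Keep only the insights that contain the keyword, ignoring case."""
    needle = keyword.casefold()
    return [text for text in insights if needle in text.casefold()]


# Made-up insights, as if several triggers had fired at once.
fired = [
    "Developing nations are growing faster than developed ones.",
    "The gap between highest and lowest earning nations is almost the same.",
    "Most developing nations crossed the low income band.",
]

print(filter_insights(fired, "developing"))
```

A plain substring match is easy to explain to end users, though it will miss insights that express the same idea with different words.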