Big Data Visualization

An explanation of data visualization

So what is data visualization? Simply put, take the two words separately: data means information or numbers, and visualization means picturing. Data visualization is therefore picturing the information, as shown in the following figure:

Perhaps a simplistic example that can be used to define data visualization is the practice of striking lines between stars in the night sky to create an image.

Imagine certain stars as the data points you are interested in (among the billions of other stars that are visible in the sky), and connect them in a certain order to create a picture that helps one visualize the constellation.

Voilà! Data visualization!

Nowadays, it is reported within the industry that data visualization is regarded by many disciplines as the modern equivalent of visual communication.

Okay, so then what is the point, or chief objective, of visual communication or of visualizing your data?

The main point (although there are other goals and objectives) when leveraging data visualization is to make something complex appear simple (or in our star example earlier, perhaps to make a data pattern more visible to a somewhat untrained eye).

Communicating a particular point or simplifying the complexities of mountains of data does not require the use of data visualization, but in some ways today's world demands it. That is, the majority of the readers of this book would most likely agree that scanning numerous worksheets, spreadsheets, or reports is mundane and tedious at best, while looking at charts and graphs is typically much easier on the eyes. Additionally, we humans are able to process even very large amounts of data much more quickly when the data is presented graphically. Therefore, data visualization is a way to convey concepts in a universal manner, allowing your audience or target to quickly get your point.

Other motives for using data visualization include:

  • To explain the data or put the data in context (for example, highlighting demographic statistics)
  • To solve a specific problem (for example, identifying problem areas within a particular business model)
  • To explore the data to reach a better understanding or add clarity (for example, what periods of time does this data span?)
  • To highlight or illustrate otherwise invisible data (such as isolating outliers residing in the data)
  • To predict, for example, potential sales volumes (perhaps based upon seasonality sales statistics)
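As one illustration of the "isolating outliers" motive in the list above, here is a minimal Python sketch. It assumes the common interquartile range (IQR) rule of thumb, which this book does not prescribe. The function name `iqr_outliers`, the multiplier `k`, and the sample `monthly_sales` figures are all hypothetical and chosen only for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The IQR rule is an assumed convention here, not the only way
    to define an outlier.
    """
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical monthly sales figures with one unusual month.
monthly_sales = [102, 98, 110, 105, 99, 101, 350, 97, 104, 100, 103, 96]
print(iqr_outliers(monthly_sales))  # prints [350]
```

In a visualization, points flagged this way might be drawn in a contrasting color or labeled so that they stand out from the rest of the data.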

With computers, technology, and the corporate business landscape changing so rapidly today (and all indications are that it will continue to change at an even faster pace in the future), what can be considered the future of the art of data visualization?

As per Data Visualization: The future of data visualization, Towler, 2015:

"Data visualization is entering a new era. Emerging sources of intelligence, theoretical developments, and advances in multidimensional imaging are reshaping the potential value that analytics and insights can provide, with visualization playing a key role."

With big data getting bigger (and bigger!), it is safe to assume that the use of data visualization will only continue to grow, to evolve, and to be of outstanding value. In addition, how one approaches the process and practice of data visualization will need to grow and evolve as well.

Conventional data visualization concepts

Let's start out this section by clarifying what we mean when we say conventional.

In the context of this book, when I say conventional, I am referring to the data visualization ideas and methods that have been used with some level of success within the industry over time.

Although it seems that new technologies and practices are discovered, developed, and deployed every day, providing new and different options for ever more ingenious real-time (or near-real-time) data visualization, understanding the basic concepts for visualizing data is still essential.

To that point, gaining an understanding of just how to go about choosing the correct or most effective visualization method is essential.

To make that choice, one typically needs to establish:

  • The size and volume of the data to be visualized.
  • The data's cardinality and context.
  • What is the point that you want to communicate?
  • Who is your audience? Who will consume this information?
  • What kind or type of visual might best convey your message to your audience?
Note

We also have to be realistic: sometimes the approach taken or the method used is based solely upon your time and budget.
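The first two items in the list above, size and cardinality, can be measured directly before any chart is chosen. The following is a minimal Python sketch, assuming the data is available as a list of dictionaries (one per row). The function name `profile` and the sample `orders` records are hypothetical and used only for illustration.

```python
from collections import defaultdict


def profile(rows):
    """Return (row_count, {column: number_of_distinct_values})."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    cardinality = {column: len(values) for column, values in distinct.items()}
    return len(rows), cardinality


# Hypothetical order records.
orders = [
    {"region": "East", "product": "A", "amount": 120},
    {"region": "West", "product": "B", "amount": 80},
    {"region": "East", "product": "C", "amount": 120},
    {"region": "East", "product": "A", "amount": 45},
]

row_count, cardinality = profile(orders)
print(row_count, cardinality)
# prints 4 {'region': 2, 'product': 3, 'amount': 3}
```

A column with only a few distinct values (such as region here) is often a natural candidate for the categories of a bar or pie chart, while a column with many distinct values may need grouping or binning first.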

Based on the preceding considerations (and perhaps other particulars), the most common visualization methods/types, which you are most likely already familiar with, include:

  • Table
  • Histogram
  • Scatter plot
  • Line, bar, pie, area, flow, and bubble charts
  • Data series or a combination of charts
  • Timeline
  • Venn diagrams, data flow diagrams, and entity-relationship (ER) diagrams

As mentioned earlier, as needs arise, newer or lesser-known options are becoming more mainstream.

These include the following:

  • Word/Text/Tag clouds
  • Network diagrams
  • Parallel coordinates
  • Tree mapping
  • Cone trees
  • Semantic networks

Each of the data visualization types/methods mentioned earlier speaks to a particular scenario or target audience better than others; it all depends. Learning to make the appropriate choice comes from experience as well as (sometimes) a bit of trial and error.

Training options

Due to the popularity of data visualization, there are many formal training options (classroom and online), and new and unique training curricula are becoming available every day.

Coursework includes topics such as:

  • Channeling an audience
  • Understanding data
  • Determining informational hierarchies
  • Sketching and wireframing
  • Defining a narrative