Learning Tableau 10(Second Edition)
上QQ阅读APP看书,第一时间看更新

The Tableau paradigm

Tableau connects directly to native data engines and also includes the option to extract data locally. The unique experience of working with data in Tableau is a result of Visual Query Language (VizQL).

VizQL was developed as a Stanford Research Project focusing on the natural ways by which humans visually perceive the world and how those ways can be applied to data visualization. We naturally perceive differences in size, shape, spatial location, and color. VizQL allows Tableau to translate your actions, as you drag and drop fields of data in a visual environment, into a query language that defines how the data encodes those visual elements. You will never need to read, write, or debug VizQL.

Tableau will generate the Visual Query Language behind the scenes as you drag and drop fields onto various shelves defining size, color, shape, and spatial location. This allows you to focus on visualizing data and not on writing code!

One of the benefits of VizQL is that it provides a common method to describe how the arrangement of various fields in a view defines a query of the data. This common baseline can then be translated into numerous flavors of Structured Query Language (SQL), Multidimensional Expressions (MDX), and Tableau Query Language (TQL),(used for extracted data). Tableau will automatically perform the translation of VizQL into a native query to be run natively by the source data engine.

At its simplest, the Tableau paradigm of working with data looks similar to the following diagram:

A simple example

Go ahead and open the Chapter 02 Starter.twbx workbook located in the \Learning Tableau\Chapter 02\ directory and navigate to the Tableau Paradigm sheet. Take a look at the view that was created by dropping the dimension Region on Columns and the measure Sales on Rows:

The Region field is used as a discrete (blue) field in the view and it defines the Columns headers. As a dimension, it defines the level of detail in the view and slices the measure. The Sales field is a measure that is being aggregated by summing up each sale within each market. As a continuous (green) field, Sales defines an axis.

Tip

You do not need to understand SQL or any kind of scripting to use Tableau. As you design data visualizations, you will most likely not be concerned with what scripts Tableau generates. The following example is simply to demonstrate what Tableau does behind the scenes, but you do not have to understand the script before moving on.

For this example, let's say that you were connected live to an SQL Server database with the Superstore data stored in a table called SuperstoreData. When you first create this view, Tableau generates a VizQL script which is translated into an SQL script and sent to the SQL Server. The SQL query will look similar to the following example:

SELECT [SuperstoreData].[Region] AS [none:Region:nk], 
 SUM([SuperstoreData].[Sales]) AS [sum:Sales:ok] 
FROM [dbo].[ SuperstoreData] [SuperstoreData] 
GROUP BY [SuperstoreData].[Region] 

This script selects the market and the sum of sales from the table and groups it according to the region. The script aliases the field names using naming conventions that are used by Tableau's engine when the data is returned.

Tip

On certain occasions, a database administrator may want to understand what scripts are running against a certain database to debug performance issues or determine a more efficient indexing or data structure. Many databases supply profiling utilities or log execution of queries. In addition to this, you can find SQL or MDX generated by Tableau in the logs located in the \My Tableau Repository\Logs directory.

You can also use Tableau's built-in Performance Recorder. From the top menu, navigate to Help | Settings and Performance | Start Performance Recording, then interact with a view and finally stop the recording from the menu. Tableau will open a dashboard that will allow you to see tasks, performance, and queries that were executed during the recording session.

There may have been hundreds, thousands, or even millions of rows of sales data in SQL Server. However, it returns aggregate results when SQL Server processes the query. In this case, SQL Server returns only 4 aggregate rows of data to Tableau-one row for each region, as shown in the following screenshot. To see the aggregate data Tableau used to draw the view, press Ctrl + A to select all the bars, then right-click on one of them and select View Data. It is shown in the following figure:

The Summary tab displays the aggregate level data that makes up the view. The Sales here is the sum of sales for each region. When you click on the Underlying tab, Tableau will query the data source to retrieve all the records that make up the aggregate records. In this case, there are 9,426 underlying records, as indicated on the status bar in the lower-right corner:

Tableau did not need 9,426 records to draw the view and did not request them from the data source until the Underlying data tab was clicked.

Database engines are optimized to perform aggregations on data. Typically, these database engines are also located on powerful servers. Tableau leverages the optimization and power of the underlying data source. In this way, Tableau can visualize massive data sets with relatively little local processing of the data.

Additionally, Tableau will only query the data source when you make changes requiring a new query or refresh a view. Otherwise, it will use the aggregate results stored in a local cache, as illustrated here:

In the preceding example, the query based on the fields in the view (Region as a dimension and the sum of Sales as a measure) will only be issued once to the data source. When the four rows of aggregate results are returned, they are stored in the cache. Then, if you were to move Region to another visual encoding shelf, such as Color, or Sales to a different visual encoding shelf, such as Size, then Tableau will retrieve the aggregate rows from the cache and simply re-render the view.

Tip

You can force Tableau to bypass the cache and refresh the data from the data source by pressing F5 or selecting your data source from the Data menu and selecting Refresh. Do this any time you want a view to reflect the most recent changes in a live data source.

Of course, if you were to introduce new fields into the view, which did not have cached results, then Tableau would send a new query to the data source, retrieve the aggregate results, and add those results to the cache.