Tables and graphs
Most datasets are still maintained in tabular form, as in Figure 2-12, but tables with thousands of rows and many columns are far more common than that simple example. Even when many of the data fields are text or Boolean, a graphical summary can be much easier to comprehend.
There are several different ways to represent data graphically. Setting aside more imaginative displays, such as Minard's map (Figure 3-1), we review the more standard methods here.
Scatter plots
A scatter plot, also called a scatter chart, is simply a plot of a dataset whose signature is two numeric values. If we label the two fields x and y, then the graph is simply a two-dimensional plot of those (x, y) points.
Scatter plots are easy to create in Excel. Enter the numeric data in two columns and then select Insert | All Charts | X Y (Scatter). For a simple example, the data is shown in Figure 3-2 and the corresponding scatter plot is in Figure 3-3.
The scales on either axis need not be linear. The microprocessor example in Figure 2-11 is a scatter plot using a logarithmic scale on the vertical axis.
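A logarithmic axis amounts to plotting log10(y) in place of y. The following is a minimal sketch of that mapping; the class name, method name, and sample values are illustrative assumptions, not code from this book.

```java
// Hypothetical helper: maps a positive value onto a vertical pixel
// coordinate using a base-10 logarithmic scale.
class LogScaleSketch {
    /**
     * Returns the pixel row for value y, where yMin maps to the bottom
     * of the plot (row == height) and yMax maps to the top (row == 0).
     * Assumes y, yMin, and yMax are all positive.
     */
    static int toPixelY(double y, double yMin, double yMax, int height) {
        double fraction = (Math.log10(y) - Math.log10(yMin))
                / (Math.log10(yMax) - Math.log10(yMin));
        // Screen rows grow downward, so invert the fraction.
        return (int) Math.round(height * (1.0 - fraction));
    }

    public static void main(String[] args) {
        // Each tenfold increase moves the point up by the same distance.
        for (double y : new double[] {1, 10, 100, 1000}) {
            System.out.println(y + " -> row " + toPixelY(y, 1, 1000, 300));
        }
    }
}
```

On such a scale, equal vertical distances represent equal ratios rather than equal differences, which is why exponential growth appears as a straight line.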
Figure 3-4 shows a scatter plot of data that relates the time interval between eruptions of the Old Faithful Geyser to the duration of eruption. This image was produced by a Java program that you can download from the Packt website for this book.
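As a minimal sketch of how a scatter plot can be drawn in Java (this is not the program posted on the Packt website), the following renders points into an off-screen image with fillOval(). The class name, image size, and sample data are assumptions made for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Sketch only: renders (x, y) points into an off-screen image.
class ScatterPlotSketch {
    static final int SIZE = 200;   // plot width and height in pixels
    static final int RADIUS = 3;   // radius of each plotted dot

    /** Linearly maps v from [min, max] onto [0, SIZE]. */
    static int scale(double v, double min, double max) {
        return (int) Math.round(SIZE * (v - min) / (max - min));
    }

    /** Plots the points, using the same [min, max] range on both axes. */
    static BufferedImage plot(double[] xs, double[] ys,
                              double min, double max) {
        BufferedImage image = new BufferedImage(
                SIZE + 1, SIZE + 1, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, SIZE + 1, SIZE + 1);
        g.setColor(Color.BLACK);
        for (int i = 0; i < xs.length; i++) {
            int px = scale(xs[i], min, max);
            int py = SIZE - scale(ys[i], min, max);  // flip: row 0 is the top
            // fillOval takes the top-left corner of the bounding box.
            g.fillOval(px - RADIUS, py - RADIUS, 2 * RADIUS, 2 * RADIUS);
        }
        g.dispose();
        return image;
    }

    public static void main(String[] args) throws IOException {
        double[] xs = {1, 2, 3, 4, 5};   // made-up sample data
        double[] ys = {2, 3, 5, 4, 6};
        ImageIO.write(plot(xs, ys, 0, 8), "png", new File("scatter.png"));
    }
}
```

Drawing into a BufferedImage keeps the sketch independent of any window toolkit; the same drawing calls could instead be placed in the paintComponent() method of a Swing component.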
Line graphs
A line graph is like a scatter plot with these two differences:
- The values in the first column are unique and in increasing order
- Adjacent points are connected by line segments
To create a line graph in Excel, click on the Insert tab and select Recommended Charts; then click on the option: Scatter with Straight Lines and Markers. Figure 3-5 shows the result for the previous seven-point test dataset.
The code for generating a line graph in Java is like that for a scatter plot. Use the fillOval() method of the Graphics2D class to draw the points and then use the drawLine() method to connect adjacent points. Figure 3-6 shows the output of the DrawLineGraph program that is posted on the Packt website.
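The following is a rough sketch of that approach, not the DrawLineGraph program itself. It checks that the x values are strictly increasing, draws each point with fillOval(), and joins adjacent points with drawLine(). The class name, image size, and sample data are assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Sketch only: points joined by straight line segments.
class LineGraphSketch {
    static final int SIZE = 200;   // plot width and height in pixels
    static final int RADIUS = 3;   // radius of each plotted dot

    /** A line graph requires unique x values in increasing order. */
    static boolean isStrictlyIncreasing(double[] xs) {
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] <= xs[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Linearly maps v from [min, max] onto [0, SIZE]. */
    static int scale(double v, double min, double max) {
        return (int) Math.round(SIZE * (v - min) / (max - min));
    }

    static BufferedImage plot(double[] xs, double[] ys,
                              double min, double max) {
        if (!isStrictlyIncreasing(xs)) {
            throw new IllegalArgumentException("x values must increase");
        }
        BufferedImage image = new BufferedImage(
                SIZE + 1, SIZE + 1, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, SIZE + 1, SIZE + 1);
        g.setColor(Color.BLACK);
        int prevX = 0;
        int prevY = 0;
        for (int i = 0; i < xs.length; i++) {
            int px = scale(xs[i], min, max);
            int py = SIZE - scale(ys[i], min, max);  // flip: row 0 is the top
            g.fillOval(px - RADIUS, py - RADIUS, 2 * RADIUS, 2 * RADIUS);
            if (i > 0) {
                g.drawLine(prevX, prevY, px, py);    // connect to previous point
            }
            prevX = px;
            prevY = py;
        }
        g.dispose();
        return image;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4, 5};   // made-up sample data
        double[] ys = {2, 3, 5, 4, 6};
        BufferedImage image = plot(xs, ys, 0, 8);
        System.out.println("Rendered " + image.getWidth() + "x"
                + image.getHeight() + " image");
    }
}
```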
Bar charts
A bar chart is another common graphical method used to summarize small numeric datasets. It shows a separate bar (a colored rectangle) for each numeric data value.
To generate a bar chart in Java, use the fillRect() method of the Graphics2D class to draw the bars. Figure 3-7 shows a bar chart produced by a Java program.
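A minimal sketch of the idea follows; it is not the program that produced Figure 3-7. Each bar's height is scaled relative to the largest value and drawn with fillRect(). The class name, bar dimensions, and sample values are assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Sketch only: one vertical bar per non-negative data value.
class BarChartSketch {
    static final int BAR_WIDTH = 30;   // width of each bar in pixels
    static final int GAP = 10;         // space between bars in pixels

    /** Scales each value so that the largest one gets maxHeight pixels. */
    static int[] barHeights(double[] values, int maxHeight) {
        double max = 0;
        for (double v : values) {
            max = Math.max(max, v);
        }
        int[] heights = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            heights[i] = (int) Math.round(maxHeight * values[i] / max);
        }
        return heights;
    }

    static BufferedImage draw(double[] values, int height) {
        int width = GAP + values.length * (BAR_WIDTH + GAP);
        BufferedImage image = new BufferedImage(
                width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLUE);
        int[] heights = barHeights(values, height);
        for (int i = 0; i < values.length; i++) {
            int x = GAP + i * (BAR_WIDTH + GAP);
            // fillRect takes the top-left corner, so a bar that stands on
            // the bottom edge starts at row (height - barHeight).
            g.fillRect(x, height - heights[i], BAR_WIDTH, heights[i]);
        }
        g.dispose();
        return image;
    }

    public static void main(String[] args) {
        double[] values = {10, 20, 40};   // made-up sample data
        BufferedImage image = draw(values, 100);
        System.out.println("Rendered " + image.getWidth() + "x"
                + image.getHeight() + " image");
    }
}
```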
Bar charts are also easy to generate in Excel:
- Select a column of labels (text names) and a corresponding column of numeric values.
- On the Insert tab, click on Recommended Charts, and then either Clustered Column or Clustered Bar.
Figure 3-8 shows an Excel bar chart for the population column of our AfricanCountries dataset:
Histograms
A histogram is like a bar chart, except that the numeric data represent frequencies and so are usually actual counts or percentages. Histograms are commonly used for displaying polling results.
If your numbers add up to 100%, you can create a histogram in Excel by using a vertical bar chart, as in Figure 3-9.
Excel also has an add-in, the Analysis ToolPak, whose Data Analysis command can create histograms, but it requires numeric values for the category labels (called bins).
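The counting step behind a histogram with numeric bins can be sketched in Java as follows. This illustrates equal-width binning under stated assumptions and does not reproduce what Excel does; the class name, method names, and sample data are hypothetical.

```java
import java.util.Arrays;

// Sketch only: counts values into equal-width numeric bins.
class HistogramSketch {
    /**
     * Returns the number of values falling into each of `bins` intervals
     * of width binWidth, starting at min. A value equal to the upper
     * limit is counted in the last bin; values outside the range are
     * ignored.
     */
    static int[] counts(double[] data, double min, double binWidth, int bins) {
        int[] result = new int[bins];
        double max = min + bins * binWidth;
        for (double v : data) {
            if (v < min || v > max) {
                continue;                      // outside the binned range
            }
            int index = (int) Math.floor((v - min) / binWidth);
            if (index == bins) {
                index = bins - 1;              // v == max goes in the last bin
            }
            result[index]++;
        }
        return result;
    }

    /** Converts counts to percentages of the total count. */
    static double[] percentages(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double[] result = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            result[i] = 100.0 * counts[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 2.5, 7, 9.9, 10};   // made-up sample data
        int[] c = counts(data, 0, 5, 2);
        System.out.println(Arrays.toString(c));
        System.out.println(Arrays.toString(percentages(c)));
    }
}
```

The resulting counts or percentages can then be drawn as bars in the same way as any other bar chart.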