Why Visualize?

conceptual
visualization

Author: Josef Fruehwald

Published: September 9, 2024

Different Data, Same Stats

Data visualization is fundamental to statistical analysis: without plotting your data, you can’t really understand it. A classic example is Anscombe’s Quartet (Anscombe 1973). When plotted, its four data series look very distinct. If we were to theorize about what kind of process gave rise to each of these data sets, our theories would necessarily be very different.

Figure 1: Anscombe’s Quartet

But if we only looked at statistical summaries of the data, or at statistical models fit to the data, the four series would look nearly identical.

series   mean x   mean y   sd x   sd y
1        9.00     7.50     3.32   2.03
2        9.00     7.50     3.32   2.03
3        9.00     7.50     3.32   2.03
4        9.00     7.50     3.32   2.03
Table 1: Mean and Standard Deviations of x and y for each series.

term        series   estimate   std.error   statistic   p.value
intercept   1        3.00       1.12        2.67        0.03
intercept   2        3.00       1.13        2.67        0.03
intercept   3        3.00       1.12        2.67        0.03
intercept   4        3.00       1.12        2.67        0.03
slope       1        0.50       0.12        4.24        0.00
slope       2        0.50       0.12        4.24        0.00
slope       3        0.50       0.12        4.24        0.00
slope       4        0.50       0.12        4.24        0.00
Table 2: Linear model coefficients for each series
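
The numbers in Tables 1 and 2 can be roughly checked by recomputing them from the published quartet values. The sketch below uses Python's standard library only, which is an assumption of this example and not necessarily the tooling behind the tables above. The names `quartet` and `summarize` are made up for illustration, and only the means, standard deviations, intercepts, and slopes are computed (not the standard errors or p-values).

```python
import statistics

# Anscombe's quartet (Anscombe 1973); series 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample sds, and least-squares intercept and slope."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # Ordinary least squares for y = intercept + slope * x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": statistics.stdev(x),
        "sd_y": statistics.stdev(y),
        "intercept": intercept,
        "slope": slope,
    }


summaries = {series: summarize(x, y) for series, (x, y) in quartet.items()}
for series, s in summaries.items():
    print(series, {name: round(value, 2) for name, value in s.items()})
```

Rounded to two decimal places, all four series print the same six numbers, even though their scatterplots look nothing alike.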

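This idea was taken to an extreme with the Datasaurus Dozen (Matejka and Fitzmaurice 2017), a collection of thirteen datasets, one of them shaped like a dinosaur, whose summary statistics are nearly identical.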

Table 3: Summary stats for the datasaurus dozen
dataset      mean x   mean y   sd x    sd y
away         54.27    47.83    16.77   26.94
bullseye     54.27    47.83    16.77   26.94
circle       54.27    47.84    16.76   26.93
dino         54.26    47.83    16.77   26.94
dots         54.26    47.84    16.77   26.93
h_lines      54.26    47.83    16.77   26.94
high_lines   54.27    47.84    16.77   26.94
slant_down   54.27    47.84    16.77   26.94
slant_up     54.27    47.83    16.77   26.94
star         54.27    47.84    16.77   26.93
v_lines      54.27    47.84    16.77   26.94
wide_lines   54.27    47.83    16.77   26.94
x_shape      54.26    47.84    16.77   26.93

Being cautious about our plots

We also need to be careful about our plots. Here’s another set of plots from the same Matejka and Fitzmaurice (2017) paper. This time, I’ve chosen three different (and somewhat common) methods for visualizing the same dataset. Depending on the visualization method, the same data winds up looking really different!
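
The three plotting methods are not named in the text, so the following is only an illustrative sketch under stated assumptions: a made-up bimodal sample (not data from the paper), summarized once the way a box plot would summarize it (quartiles) and once the way a histogram would (counts per bin). The bin width of 3 is also an arbitrary choice for the example.

```python
import statistics

# Hypothetical bimodal sample (not from the source): two tight clusters.
data = [1, 1, 2, 2, 3, 3, 9, 9, 10, 10, 11, 11]

# What a box plot draws: the quartiles, with the median in the middle.
q1, median, q3 = statistics.quantiles(data, n=4)

# What a histogram draws: counts per bin (bins of width 3, starting at 0).
bin_width = 3
counts = [0] * 4
for value in data:
    counts[value // bin_width] += 1

print(f"box plot summary: Q1={q1}, median={median}, Q3={q3}")
for i, count in enumerate(counts):
    low = i * bin_width
    print(f"[{low:2d}, {low + bin_width:2d}) {'#' * count}")
```

The box plot summary places the median at 6.0, the middle of its box, while the histogram shows that the bin containing 6 is empty. The same values suggest a "typical" central value in one view and a gap between two clusters in the other.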

References

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21. https://doi.org/10.1080/00031305.1973.10478966.
Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–94. ACM. https://doi.org/10.1145/3025453.3025912.

Reuse

CC-BY 4.0

Citation

BibTeX citation:
@online{fruehwald2024,
  author = {Fruehwald, Josef},
  title = {Why {Visualize?}},
  date = {2024-09-09},
  url = {https://lin611-2024.github.io/notes/meetings/2024-09-09_data-viz.html},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. 2024. “Why Visualize?” September 9, 2024. https://lin611-2024.github.io/notes/meetings/2024-09-09_data-viz.html.