class: center, middle, inverse, title-slide

# Lec 04: Graphical Integrity
## SDS 192: Introduction to Data Science
### Shiya Cao, Statistical & Data Sciences, Smith College
### Fall 2024

---

# Today's Learning Goals

* Understand Tufte's principles of graphical integrity.

---

# "Graphical integrity refers to how accurately visual elements represent data. Information can vary widely, even for related data, so there's a desire and tendency to scale the data disproportionately in order to make it fit in the space allowed."

> Framework drawn from: Tufte, Edward R. 2001. *The Visual Display of Quantitative Information*. 2nd edition. Graphics Press.

---

# Six Principles of Graphical Integrity

* Representations of numbers should match their true proportions.
* Labeling should be clear and detailed.
* Designs should not vary from some ulterior motive, but show only data variations.
* In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
* The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
* Graphics must not quote data out of context.

---

# Principles 1-3

.pull-left[

* Republicans from the U.S. Congress questioned Cecile Richards, the president of Planned Parenthood, regarding the misappropriation of $500 million in annual federal funding. This graph was presented as a point of emphasis.
* Representative Jason Chaffetz of Utah explained: “In pink, that’s the reduction in the breast exams, and the red is the increase in the abortions. That’s what’s going on in your organization.”

]

.pull-right[

]

---

* Say the following pie charts represent results of an election poll at three time points: A = September, B = October, and C = November. At each time point we present the proportion of the poll respondents who say they will support one of 5 candidates: 1 through 5.
* Based on these 3 pie charts, answer the following questions:
* At time point A, is candidate 5 doing better than candidate 4?
* Did candidate 3 do better at time point B or time point C?
* Who gained more support between time point A and time point B, candidate 2 or candidate 4?

---

# Principle 5

* The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.

.pull-left[

<img src="img./Lec3_3D_confusing_in_bar_graph.svg.png" width="350" height="300" />

By <a href="//commons.wikimedia.org/wiki/User:Smallman12q" title="User:Smallman12q">Smallman12q</a> - <span class="int-own-work" lang="en">Own work</span>, CC0

]

---

# Recap: Principles of Graphical Integrity

* Show ***data variation***, not design variation.
* Use clear, detailed, and thorough ***labeling*** and ***appropriate scales***.
* The size of the ***graphic effect*** should be ***directly proportional to the numerical quantities***.
* Be mindful of ***context*** when designing data graphics.
* ***Don't*** use pie charts or 3D charts.

---

# Data-to-ink Ratio

* Tufte argues that we can pursue "graphical integrity" by keeping the ***data-to-ink*** ratio of a plot as close to 1 as possible.
* This means that nearly all of the ink on a plot should be used to display data, with decoration kept to a minimum.
* The goal is to remove distractions so that our attention stays on the data.

---

# Ethical Implications

* It is important to note that clarity and legibility are not always the goals of data visualization.
* Feminist data colleagues have shown that we sometimes design visualizations to elicit emotion or provoke contemplation, and that decoration can help animate those responses.

---

# Ethical Implications

* There are critical tradeoffs in maximizing the ***data-to-ink*** ratio of a plot.
* Researchers have shown how these conventions can send the signal that data emerged from nowhere in particular, rather than from people with certain standpoints, ideas, and biases.
* They have shown that minimizing decoration doesn't necessarily remove human judgment from the data (because human judgment is always in our data) but instead serves the rhetorical purpose of convincing us to trust the data.
* As we think about how to design our data visualizations, we might contemplate our goals for display and how the visualization conventions we choose to implement advance them.
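
---

# Sketch: Checking Proportionality

* The recap point that the graphic effect should be proportional to the numerical quantities can be checked with a little arithmetic. Tufte calls the ratio of the effect shown in the graphic to the effect in the data the "Lie Factor"; a value near 1 suggests an honest display.
* The sketch below is only an illustration, not part of the lecture materials. The function names and the poll-style numbers (50 and 55, drawn on an axis starting at 0 versus 45) are hypothetical and chosen for this example.

```python
def relative_change(first, second):
    """Size of effect: change from first to second, as a fraction of first."""
    if first == 0:
        raise ValueError("relative change is undefined when the first value is 0")
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    data_effect = relative_change(data_first, data_second)
    if data_effect == 0:
        raise ValueError("lie factor is undefined when the data do not change")
    return relative_change(drawn_first, drawn_second) / data_effect


def bar_height(value, axis_start):
    """Drawn height of a bar when the axis begins at axis_start."""
    return value - axis_start


# Hypothetical data: a value that rises from 50 to 55 (a 10% increase).
before, after = 50, 55

# Bars drawn on an axis that starts at 0.
honest = lie_factor(before, after,
                    bar_height(before, 0), bar_height(after, 0))

# The same bars drawn on an axis truncated to start at 45.
truncated = lie_factor(before, after,
                       bar_height(before, 45), bar_height(after, 45))

print(f"axis from 0:  lie factor = {honest:.1f}")
print(f"axis from 45: lie factor = {truncated:.1f}")
```

* With these assumed numbers, the zero-based axis draws the change at its true size, while the truncated axis makes the second bar twice as tall as the first even though the data rose by only 10%.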