class: center, middle, inverse, title-slide

# Lec 03: Grammar of Graphics

## SDS 192: Introduction to Data Science

### Shiya Cao

Statistical & Data Sciences, Smith College

### Fall 2024

---

# Today's Learning Goals

* Understand the basic concepts of the grammar of graphics.
* Understand the basic functions of `ggplot2`.

---

# Elements of Data Graphics

* Data
    * `data`
* Geometric objects (What do we literally draw?)
    * `geom_*()`
* Aesthetic mappings (Visual cues: Position, length, area, etc.; Coordinate system: How are the data points organized?)
    * `aes()`
* Scale (How does distance translate into meaning?)
    * `scale_*()`
* Context (In relation to what?)
    * `labs()`
* Small multiples and layers (How is multivariate information incorporated into a two-dimensional data graphic?)
    * `facet_wrap()`

---

![](img./Lec2_Elements%20of%20Data%20Graphics1.png)

---

![](img./Lec2_Elements%20of%20Data%20Graphics2.png)

---

![](img./Lec2_Elements%20of%20Data%20Graphics3.png)

---

# Context

* Titles
    * A descriptive title is used to introduce the graph.
* Labels
    * Axes and points are labeled to indicate what data is represented on the graph.
* Legends
    * The meaning of varying colors, sizes, and shapes is explained in a legend.
* Captions
    * Further detail about the plot is provided in explanatory text.

---

![](img./Lec2_Context1.png)

---

![](img./Lec2_Context2.png)

---

![](img./Lec2_Context3.png)

---

# Grammar of Graphics

* A statistical graphic is a mapping of ***data*** variables to ***aes***thetic attributes of ***geom***etric objects.
* Implemented in R as `ggplot2`.
* `ggplot2` is one of the core packages loaded by `library(tidyverse)`.

---

# [Tidyverse](https://www.tidyverse.org/)

.pull-left[
<img src="img./Lec3_tidyverse.jpeg" width="400" />
]

.pull-right[
* Image source: [Posit PBC on X](https://x.com/posit_pbc/status/1145592633823244289)
]

---

# Basic Formula for `ggplot()` Functions

* `data`: the dataset containing the variables of interest.
* `aes()`: aesthetic mappings, which link visual properties of the plot to variables in the dataset. For example, x/y position, color, shape, and size.
    * In a Cartesian plot, we must supply the variables that will appear on the axes (via `x = ` and `y = `).

```r
# DATA, MAPPINGS, and FUNCTION are placeholders to fill in, not real names
ggplot(data = DATA, aes(MAPPINGS)) +
  geom_FUNCTION()
```

---

```r
ggplot(data = pioneer_valley_2013, aes(x = CEN_MEDRENT, y = CEN_MEDOWNVAL))
```

<img src="img./Lec2_ggplot_new_1.png" width="720" />

---

# Where is the Data?

* `geom_*()`: geometric objects (What do we literally draw?). For example, the five named graphs (5NG):
    * Scatterplot: `geom_point()`
    * Linegraph: `geom_line()`
    * Histogram: `geom_histogram()`
    * Boxplot: `geom_boxplot()`
    * Barplot: `geom_bar()`, `geom_col()`
* We append this function, along with additional functions for styling the plot, using a `+` sign.

---

```r
ggplot(data = pioneer_valley_2013, aes(x = CEN_MEDRENT, y = CEN_MEDOWNVAL)) +
  geom_point()
```

<img src="img./Lec2_ggplot_new_2.png" width="720" />

---

# Adding Context to Plots

* What context should *always* be included on a plot?
    * Unit of Observation
    * Variables Represented
    * Filters
    * Geographic Scope
    * Temporal Scope
* We can add this context via titles and labels, using the `labs()` function.

---

```r
ggplot(data = pioneer_valley_2013, aes(x = CEN_MEDRENT, y = CEN_MEDOWNVAL)) +
  geom_point() +
  labs(title = "Housing Characteristics of Pioneer Valley Municipalities, 2013",
       x = "Median Gross Rent",
       y = "Median Value-Owner-Occupied Housing")
```

<img src="img./Lec2_ggplot_new_3.png" width="500" />

---

# Adjusting the Scale

* `scale_*()`: range of values, colors, etc.
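
To make the idea of a scale concrete, here is a minimal, self-contained sketch. It assumes a small made-up data frame (`toy_housing` and its columns are hypothetical, not the `pioneer_valley_2013` data) and shows one positional scale and one color scale; other `scale_*()` functions follow the same pattern.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy_housing <- data.frame(
  median_rent  = c(600, 800, 1000, 1500, 2400),
  median_value = c(150000, 180000, 240000, 310000, 520000),
  region       = c("A", "A", "B", "B", "B")
)

p <- ggplot(data = toy_housing,
            aes(x = median_rent, y = median_value, color = region)) +
  geom_point() +
  scale_x_log10() +                      # positional scale: log10 spacing on the x-axis
  scale_color_brewer(palette = "Dark2")  # color scale: a discrete ColorBrewer palette

p
```

The data and the geom stay the same; only the translation from data values to positions and colors changes.
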
---

```r
# Adjusting the Scale
ggplot(data = pioneer_valley_2013, aes(x = CEN_MEDRENT, y = CEN_MEDOWNVAL)) +
  geom_point() +
  scale_x_log10() +
  labs(title = "Housing Characteristics of Pioneer Valley Municipalities, 2013",
       x = "Median Gross Rent",
       y = "Median Value-Owner-Occupied Housing")
```

<img src="img./Lec2_ggplot_new_4.png" width="450" />

---

# Facets (Small Multiples and Layers)

* `facet_wrap()`

---

```r
ggplot(data = pioneer_valley_2013, aes(x = CEN_MEDRENT, y = CEN_MEDOWNVAL)) +
  geom_point() +
  facet_wrap(~COUNTY) + # Faceting
  scale_x_log10() +
  labs(title = "Housing Characteristics of Pioneer Valley Municipalities, 2013",
       x = "Median Gross Rent",
       y = "Median Value-Owner-Occupied Housing")
```

<img src="img./Lec2_ggplot_new_5.png" width="450" />
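
---

The faceting step can also be tried without the course dataset. The sketch below assumes a made-up data frame (`toy_counties` and its columns are hypothetical and invented for illustration); it draws one panel for each distinct value of the faceting variable.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy_counties <- data.frame(
  median_rent  = c(600, 800, 1000, 1500, 2400, 900),
  median_value = c(150000, 180000, 240000, 310000, 520000, 200000),
  county       = c("North", "North", "South", "South", "East", "East")
)

p_facets <- ggplot(data = toy_counties,
                   aes(x = median_rent, y = median_value)) +
  geom_point() +
  facet_wrap(~county) +  # one panel per distinct value of county
  labs(title = "Toy Housing Data by County (Hypothetical)",
       x = "Median Rent",
       y = "Median Home Value")

p_facets
```

By default every panel shares the same x and y axes, which keeps the small multiples directly comparable.
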