class: center, middle, inverse, title-slide

# Lec 06: Frequency Plots and Facets
## SDS 192: Introduction to Data Science
### Shiya Cao, Statistical & Data Sciences, Smith College
### Fall 2024

---

# Today's Learning Goals

* Create histograms using `ggplot2`.
* Create barplots using `ggplot2`.
* Create facets using `ggplot2`.

---

class: center, middle

# The most important take-away from today is that frequency plots (histograms and barplots) involve *counting* the values in a variable.

---

# Types of Variables

* Categorical Variables (Qualitative):
    * Nominal Variables: Named or classified labels (e.g., names, zip codes, hair color).
    * Ordinal Variables: Ordered labels (e.g., letter grades, pollution levels).
* Numerical Variables (Quantitative):
    * Discrete Variables: Countable variables (e.g., number of students in this class).
    * Continuous Variables: Measured variables (e.g., temperature, height).

---

# Types of Variables in `ggplot2`

* `ggplot2` treats `character` and `factor` as discrete.
* `ggplot2` treats `integer` and `double` as continuous.

---

# Histograms

.pull-left[
* Visualizes the *distribution* of a ***numerical*** variable
    * What are the maximum and minimum values?
    * How spread out are the values?
    * What is the "center" or "most typical" value?
    * What are the frequent and infrequent values?
]

.pull-right[
<!-- --><!-- -->
]

---

# Histograms

.pull-left[
1. Create bins for the numbers, each covering the same range of values (e.g., 10-20, 20-30, 30-40, and so on).
2. Count the numbers that fall in each bin.
3. Set the height of the bar for each bin to its count.
]

.pull-right[
<!-- -->
]

---

# Barplots

.pull-left[
* Visualizes *counts* of a ***categorical*** variable
    * Which value appears the most?
    * Which value appears the least?
    * How evenly distributed are the counts?
]

.pull-right[
<!-- -->
]

---

# Barplots

.pull-left[
1. Determine the unique values and place them on the x-axis.
2. Count the number of times each value appears.
3. Set the height of the bar for each category to its count.
]

.pull-right[
<!-- -->
]

---

# Barplots

.pull-left[
* If the variable is *not* pre-counted in your data frame, use `geom_bar()`.
* If the variable is pre-counted in your data frame, use `geom_col()` with the y-position aesthetic mapped to the variable that holds the counts.
]

.pull-right[
<!-- --><!-- -->
]

---

# Facets (Small Multiples and Layers)

.pull-left[
* Split a visualization by the values of another ***categorical*** variable.
* `facet_wrap(~ VARIABLE_NAME)` or `facet_wrap(vars(VARIABLE_NAME))`
]

.pull-right[
<!-- -->
]
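
---

# Sketch: Frequency Plots and Facets in Code

The histogram, barplot, and facet steps from the earlier slides can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes the `mpg` data set that ships with `ggplot2` and uses `dplyr::count()` to build a pre-counted table. The variable choices (`hwy`, `class`, `drv`), the bin width, and all object names are assumptions made for the example.

```r
library(ggplot2)
library(dplyr)

# Histogram: bin a numerical variable (highway mpg) and count the rows per bin.
# binwidth = 5 is an arbitrary choice for illustration.
hist_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5)

# Barplot, data NOT pre-counted: geom_bar() counts the rows in each category.
bar_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Barplot, data pre-counted: count() returns one row per class with the
# count stored in a column named n, which is mapped to the y aesthetic.
class_counts <- count(mpg, class)
col_plot <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col()

# Facets: split the histogram into one panel per value of a categorical
# variable (drive train). facet_wrap(~ drv) is the equivalent formula form.
facet_plot <- hist_plot +
  facet_wrap(vars(drv))
```

Printing any of these objects (for example, `bar_plot`) draws the plot. `bar_plot` and `col_plot` should show the same bars, because `geom_bar()` does the counting that `count()` did ahead of time for `geom_col()`.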