class: center, middle, inverse, title-slide

# Lec 24: Iteration
## SDS 192: Introduction to Data Science
### Shiya Cao, Statistical & Data Sciences, Smith College
### Fall 2024

---

# Today's Learning Goals

* Understand `across()` and `map()`

---

# What is Iteration?

* Perform the same action for multiple “things”
* `facet_wrap()` draws a plot for each subset
* `group_by()` plus `summarize()` computes summary statistics for each subset

---

# For Loops

```r
for (x in 1:5) {
  print(x)
}
```

```
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
```

---

# For Loops

```r
library(tidyverse)

df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)

for (i in df$a) {
  print(i + 1)
}
```

```
## [1] 3
## [1] 4
## [1] 5
## [1] 6
```

---

# For Loops

```r
library(tidyverse)

df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)

for (i in df |> select(a:c)) {
  print(sum(i))
}
```

```
## [1] 14
## [1] 14
## [1] 18
```

---

# `across()`

```r
df |>
  summarize(across(a:c, function(x) sum(x, na.rm = TRUE)))
```

```r
df |>
  group_by(name) |>
  summarize(across(everything(), function(x) sum(x, na.rm = TRUE)))
```

---

# `purrr`

* Package for working with functions and vectors
* Based on the functional programming paradigm
* Provides a family of `map()` functions

---

# `map()`

<img src="img./Lec21_map.png" width="600" />

---

# Iterate the Function over Those Values

.pull-left[
```r
df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)
```
]

.pull-right[
```r
sum_x_in_df <- function(x) {
  sum(x, na.rm = TRUE)
}
```

```r
# `map()` creates a list
map(df |> select(a:c), sum_x_in_df)
```

```
## $a
## [1] 14
## 
## $b
## [1] 14
## 
## $c
## [1] 18
```
]

---

# Iterate the Function over Those Values

.pull-left[
```r
df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)
```
]

.pull-right[
```r
sum_x_in_df <- function(x) {
  sum(x, na.rm = TRUE)
}
```

```r
# `map_dfc()` creates a tibble, (c for columns) stacks them side-by-side
map_dfc(df |>
  select(a:c), sum_x_in_df)
```

```
## # A tibble: 1 × 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1    14    14    18
```
]

---

# Iterate the Function over Those Values

.pull-left[
```r
df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)
```
]

.pull-right[
```r
# Return the rows of `df` whose `name` matches `obs`
print_rows <- function(obs = "obs1") {
  df |>
    filter(name == obs)
}
```

```r
# `map_dfr()` creates a data frame; (r for rows) stacks the smaller data frames on top of each other
map_dfr(c("obs1", "obs2"), print_rows)
```

```
##   name a b c
## 1 obs1 2 4 4
## 2 obs1 5 1 2
## 3 obs2 3 7 9
```
]
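
---

# Recap: One Task, Three Tools

The sketch below is not part of the original lecture code. It is a minimal illustration, assuming `dplyr` and `purrr` are installed, of how the column sums from the earlier slides can be computed with a `for` loop, `across()`, and `map()`. The names `cols`, `loop_sums`, `across_sums`, and `map_sums` are hypothetical and chosen only for this example.

```r
library(dplyr)
library(purrr)

# Same example data frame used throughout the slides
df <- data.frame(
  name = c("obs1", "obs2", "obs3", "obs1"),
  a = c(2, 3, 4, 5),
  b = c(4, 7, 2, 1),
  c = c(4, 9, 3, 2)
)

# 1. for loop: fill a pre-allocated, named numeric vector
cols <- c("a", "b", "c")
loop_sums <- setNames(numeric(length(cols)), cols)
for (col in cols) {
  loop_sums[col] <- sum(df[[col]], na.rm = TRUE)
}

# 2. across(): one-row data frame of column sums
across_sums <- df |>
  summarize(across(a:c, function(x) sum(x, na.rm = TRUE)))

# 3. map(): named list of column sums
map_sums <- map(df |> select(a:c), function(x) sum(x, na.rm = TRUE))

# Compare the values once the containers are flattened to plain vectors
all.equal(unname(loop_sums), unlist(across_sums, use.names = FALSE))
all.equal(unname(loop_sums), unlist(map_sums, use.names = FALSE))
```

Each approach holds the same numbers in a different container: a named numeric vector, a one-row data frame, and a named list.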