class: center, middle, inverse, title-slide

# Lec 18: Writing Functions
## SDS 192: Introduction to Data Science
### Shiya Cao, Statistical & Data Sciences, Smith College
### Fall 2024

---

# Today's Learning Goals

* Be able to write basic functions.

---

# Why Write Functions?

* Sometimes we find ourselves writing very similar lines of code over and over again for different data frames or variables.

---

# Example

```r
library(tidyverse)

diamonds |>
  filter(cut == "Premium") |>
  head(1)
```

```r
diamonds |>
  filter(cut == "Very Good") |>
  head(3)
```

```r
diamonds |>
  filter(cut == "Good") |>
  head(6)
```

---

# User-defined Functions

Basic format:

```r
function_name <- function(arguments) {
  # code that computes x from the arguments
  return(x)
}
```

> By default, a function returns the value of the last expression it evaluates, so an explicit `return()` is optional.

---

# Writing Functions

```r
diamonds |>
  filter(cut == █) |>
  head(█)
```

---

# Writing Functions

```r
my_diamonds <- function(rank, n) {
  diamonds |>
    filter(cut == rank) |>
    head(n)
}
```

```r
my_diamonds("Good", 6)
```

---

# Making Arguments Optional

```r
my_diamonds <- function(rank = "Good", n = 6) {
  diamonds |>
    filter(cut == rank) |>
    head(n)
}
```

```r
my_diamonds()
```

---

# Overriding Defaults

```r
my_diamonds <- function(rank = "Good", n = 6) {
  diamonds |>
    filter(cut == rank) |>
    head(n)
}
```

```r
my_diamonds("Premium", 1)
```

> `"Premium"` overrides the default value of `"Good"` for argument `rank`; `1` overrides the default value of `6` for argument `n`.

---

# Naming Arguments Optional

```r
my_diamonds <- function(rank = "Good", n = 6) {
  diamonds |>
    filter(cut == rank) |>
    head(n)
}
```

```r
my_diamonds(rank = "Premium", n = 1)
```

```r
my_diamonds(n = 1, rank = "Premium")
```

```r
my_diamonds("Premium", 1)
```

> Order matters if arguments are not named! Unnamed arguments are matched by position.

---

# Generalizing Functions for Data Frames

```r
diamonds |>
  group_by(cut) |>
  summarize(mean = mean(carat))
```

```r
diamonds |>
  group_by(clarity) |>
  summarize(mean = mean(depth))
```

```r
diamonds |>
  group_by(color) |>
  summarize(mean = mean(price))
```

---

# Generalizing Functions for Data Frames

```r
█ |>
  group_by(█) |>
  summarize(mean = mean(█))
```

---

# Embracing

```r
function_name <- function(data, group_var, summary_var) {
  data |>
    group_by({{ group_var }}) |>
    summarize(mean = mean({{ summary_var }}))
}
```

> Wrap an argument in `{{ ... }}` (embracing) to pass a column name through to a tidyverse function such as `group_by()` or `summarize()`.

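---

# Embracing: With and Without

To see what embracing changes, here is a minimal sketch that compares the same function written with and without `{{ }}`. The data frame `toy` and the function names `no_embrace()` and `with_embrace()` are hypothetical and exist only for this illustration; it assumes the `dplyr` package is installed.

```r
library(dplyr)

# Hypothetical toy data frame, used only for this illustration.
toy <- data.frame(
  group = c("a", "a", "b"),
  value = c(1, 3, 10)
)

# Without embracing, group_by() looks for a column literally named `group_var`.
no_embrace <- function(data, group_var, summary_var) {
  data |>
    group_by(group_var) |>
    summarize(mean = mean(summary_var))
}

# With embracing, the column names supplied by the caller are passed through.
with_embrace <- function(data, group_var, summary_var) {
  data |>
    group_by({{ group_var }}) |>
    summarize(mean = mean({{ summary_var }}))
}

# try() captures the expected error so the script keeps running.
result_without <- try(no_embrace(toy, group, value), silent = TRUE)

# Group "a" has mean 2 and group "b" has mean 10.
result_with <- with_embrace(toy, group, value)
result_with
```

The unembraced version fails because `toy` has no column called `group_var`; the embraced version groups by `group` as intended.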
---

# Embracing

```r
grouped_mean <- function(data, group_var, summary_var) {
  data |>
    group_by({{ group_var }}) |>
    summarize(mean = mean({{ summary_var }}))
}
```

```r
grouped_mean(data = diamonds, group_var = cut, summary_var = carat)
```

---

# Writing Functions to Create Plots

```r
diamonds |>
  ggplot(aes(x = carat)) +
  geom_histogram(binwidth = 0.1)
```

```r
diamonds |>
  ggplot(aes(x = depth)) +
  geom_histogram(binwidth = 1)
```

```r
diamonds |>
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 1000)
```

---

# Writing Functions to Create Plots

```r
█ |>
  ggplot(aes(x = █)) +
  geom_histogram(binwidth = █)
```

---

# Writing Functions to Create Plots

```r
histogram <- function(data, var, binwidth = NULL) {
  data |>
    ggplot(aes(x = {{ var }})) +
    geom_histogram(binwidth = binwidth)
}
```

```r
histogram(data = diamonds, var = carat, binwidth = 0.1) +
  labs(x = "Size (in carats)", y = "Number of diamonds")
```
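
---

# Reusing the Plot Function

As a rough sketch of the payoff, the three histograms from the earlier slide can each be produced with one call to `histogram()`. The object names `p_carat`, `p_depth`, and `p_price` are made up for this example, and the price axis label assumes prices are in US dollars, as described in the `ggplot2` documentation for `diamonds`.

```r
library(ggplot2)

# Same function as on the previous slide, repeated so this block runs on its own.
histogram <- function(data, var, binwidth = NULL) {
  data |>
    ggplot(aes(x = {{ var }})) +
    geom_histogram(binwidth = binwidth)
}

# One call per variable replaces three copies of the plotting code.
p_carat <- histogram(diamonds, carat, binwidth = 0.1)
p_depth <- histogram(diamonds, depth, binwidth = 1)
p_price <- histogram(diamonds, price, binwidth = 1000)

# Each call returns a ggplot object, so more layers can be added with `+`.
p_price +
  labs(x = "Price (in US dollars)", y = "Number of diamonds")
```

Because the function returns the plot rather than only drawing it, labels, themes, and other layers can be added afterwards without changing the function itself.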