class: center, middle, inverse, title-slide

# Brief Summary: Data Wrangling Learning Reflections
## SDS 192: Introduction to Data Science
### Shiya Cao, Statistical & Data Sciences, Smith College
### Fall 2024
---

# Single Table Analysis

* "I have learned how to use select, mutate, summarize, filter, etc to clean up data and make it easier to work with."
* "the mutate function creates a new column"
* "I have learned how the data wrangling functions work together. For example, most of the time, "group_by()" and "summarize()" need to be used together to work and be efficient. Furthermore, I learned that within the "summarize()," you can adjust data within a column using division or multiplication with functions like 'sum()' or 'na.rm = TRUE.'"
* "I learned that you can use select() to create subsets of data with specific columns, filter() to create subsets with specific rows..."
* "You need to know how to use each of the wrangling verbs because they all have different uses."
* "Even though some functions are easy to get mixed up, the more you practice, the better you use them."

---

# Joining Data

* "how to join tables together (and what the different join functions do)"
* "I have learned the difference between join functions: inner join, left join, and right join"
* "you need an overlapping, common column when joining two data sets"
* "All the different types of joins how they are effective for different types of situations."
* "I feel like I understand how to join data frames together and why the direction in which you join them has an effect on the resulting data frame."
* "To join tables, you have to be aware of which data set you want to be favored and act accordingly."

---

# Tidy Data

* "The differences between pivot_wider and pivot_longer and how to use them"
* "The tidy data concepts are especially good to know."
* "Pivot wider takes data from a long format to a wide format, while pivot longer takes data from a wide format to a long format"

---

# Other Data Wrangling Functions

* "how to split columns into multiple columns"
* "I learned how to use case_when to assign new values to a cell when a certain condition is met"
* "Pay attention to each code the professor showed in the class. There must be something you don't know before."

---

# Functions and Iteration

* "write functions to avoid coding redundancy"
* "I learned how to write functions in R in order to do repetitive coding much quicker."
* "the syntax for writing functions"
* "I really find data iteration is very useful because it can help users save a lot of time of using and repetitive work."

---

# Coding in General

* "I learned how to use pipes in R."
* "Reading the context, figuring out the relationships between each variable before writing the code, and reflecting the meaning of R after coding."
* "There can be multiple ways to write a function for the same results, you should use whichever makes the most sense to you."
* "start simple - it's good to run a minimum viable product before trying something complicated that could contain many errors."
* "If you are confused on a function you can type it into the console with a question mark and it will give you examples on how it is supposed to be used."

---

# Data Ethics

* "Further discuss ethical issues regarding the dataset, reminding me to think critically about the results."