Basic Information

  • Course title: SDS 192 Introduction to Data Science
  • Instructor: Shiya Cao (she/her/hers) - Assistant Professor of Statistical & Data Sciences
  • Office location: McConnell 211
  • Slack: Click the hashtag icon in the navbar for the browser interface, or use the desktop/mobile app.
  • Meeting times/location:
    • Section 01: MWF 9:25 AM - 10:40 AM / Seelye 311
    • Section 02: MWF 10:55 AM - 12:10 PM / Seelye 206
  • Getting help:
    • For questions that are not personal or sensitive, ask on the #questions channel on Slack. For personal or sensitive matters, Slack DM me.
    • The Spinelli Center supports students doing quantitative work across the curriculum. In particular, they employ:
    • Student hours: WF 3:45 PM - 4:45 PM, TH 9:00 AM - 10:00 AM / McConnell 214
    • Your fellow students are also an excellent source for explanations, tips, etc.

Instructor work-life balance

  • I will respond to Slack messages sent during the week within 24h. I will respond to Slack messages sent during the weekend at my own discretion. It is important that you plan when you start your assignments accordingly.
  • If possible, please only Slack me with brief, administrative questions; I prefer having more substantive conversations in person, as it takes me less energy to understand where you are.
  • I will do my best to return all grading as promptly as possible.

How Can I Succeed in This Class?

Listen to the strategies recommended by your peers who have taken this course:

  • "If you study regularly, attend lectures, and do lab/practice, you will do awesome in the course."
  • "The in–class exercises are an important way of making sure you understand the material before you try to do a lab or a mini–project."
  • "...remember that you can do it even when things get challenging!"
  • "If a student is considering taking this course in the future, I'd say to actually type out the code – don't copy and paste. That will get your muscle memory and just your regular memory up to date faster, and will help you code better in the future."
  • "I recommend that students ask questions in class and share what they think the answers are for in class exercises because although I would blurt out the wrong answer sometimes it is what ultimately helped me learn..."
  • "The readings can be very useful for having a strengthened knowledge of challenging concepts, so I recommend utilizing the textbooks."
  • "Always do the reading and reviewing the slide. Try to practice coding as much as possible and ask instruction or peers for help."
  • "Practice a LOT and be patient! The more you practice, the more you learn. Lastly, Prof. Cao will say this in a class, and it is very important: Error Codes are our friends! The mistakes we make teach us what works and what doesn't."
  • "...study for quizes, and try some code in the scratchwork qmd you'll be using for the quiz beforehand to make sure everything is working well."
  • "Office hours are incredibly helpful and encouraging if at any point you are struggling or feel discouraged."
  • "Take advantage of the spinelli center and office hours!"
  • "I came into this class having never coded before, so R was the first language I learned... Read the textbooks provided and go over/reference assignments if you are confused. I would also recommend messaging prof Cao on Slack if you have any questions or, even better, going to her office hours. She is very helpful, very available, and wants to help you learn and succeed in her class. She will work with you! I am leaving SDS 192 confident in my ability to code in R and, very honestly, with an excitement for data science which I didn't think I would have before taking this class."

Course Description & Objectives

This introduction to data science covers a variety of data science aspects using R. You will learn how to visualize multidimensional data; design accurate, clear, and appropriate data graphics; manipulate data in a variety of formats; create data maps and perform basic spatial analysis; conduct exploratory data analysis; implement reproducible data science workflows using RStudio and GitHub as well as project workflows such as “minimally viable product”; and understand common issues related to data ethics. SDS 100 is required for students who have not previously completed other SDS courses.

Upon completion of this course, you are expected to:

  1. Gain a working understanding of how to create effective data graphics, transform datasets into needed formats, and create basic maps using R.
  2. Gain proficiency in R programming, specifically in data visualization, data wrangling, and mapping, as well as in interpreting output and authoring data-driven articles in a Markdown document.
  3. Gain competence in performing reproducible data science workflows using RStudio and GitHub.
  4. Gain experience conducting exploratory data analysis and applying data science skills in the disability inclusion context.
  5. Gain a working understanding of how to think more critically about datasets and how best to work ethically with them.

Lecture Schedule

The lecture schedule and associated readings can be found on the main page of this course webpage. The ModernDive textbook and the MDSR textbook are accessible from the navigation bar of this webpage.

Class norms

  • You are expected to attend classes in person; per department policy, there is no remote attendance option this semester:

    In keeping with Smith’s core identity and mission as an in-person, residential college, SDS affirms College policy (as per the Provost and Dean of the College) that students will attend class in person. SDS courses will not provide options for remote attendance. Students who have been determined to require a remote attendance accommodation by the Office of Disability Services will be the only exceptions to this policy. As with any other kind of ADA accommodations, please notify your instructor during the first week of classes to discuss how we can meet your accommodations.

  • I expect you to attend classes because the knowledge acquired in this course is cumulative; it will be challenging for you and me to help you succeed in this course if you frequently miss classes. Consistent with college policy, if you miss more than four weeks of class meetings, you are in danger of failing to earn credit for this course (receiving a grade of E). I understand that things happen, so occasional absences are excused. If you can’t make it to class, follow along with the class slides, which will always be posted to our course webpage. If you must miss a class entirely, you are responsible for asking your peers what you missed. In particular, makeup lectures will not be held during student hours.
  • Bring your laptop, a set of headphones, and pens/pencils with a paper notebook (or a tablet with a stylus) to every lecture.
  • You are expected to show up to each class period on time and stay until the end of lecture. If you need to leave early, please confirm with me at the beginning of lecture and sit somewhere where your departure will be minimally disruptive.
  • Lecture will be held as usual on Monday 11/25 (before Thanksgiving).
  • I will set aside some class time (approximately 15 minutes) during the last week of classes for you to complete the course feedback questionnaire.
  • As for whether you should be on Smith campus during exam week, this is for you and your groupmates to decide. You do not need to consult me.

Assessment

All due dates can be found on the main page of this course webpage. This course will employ standards-based assessment.

Policies

  1. Lab resubmission: Because labs are graded for completion, you will have a chance to revise a lab to earn full credit within two weeks after the lab's grade is posted. Lab resubmissions will not be accepted outside this two-week time frame.
  2. Late assignments: I understand that you will sometimes need to prioritize other things over meeting assignment deadlines (e.g., your health, wellness, families, communities, jobs, and other coursework). My late policy attempts to balance flexibility with accountability. There is a 24-hour grace period on all labs and mini-project assignments except Mini-project 3 (due the last day of exams). There is no penalty for submitting labs and mini-projects within this 24-hour period, and you do not need to inform me that you intend to take the extra time. You can also request an extension of up to 72 hours on any mini-project assignment by DMing me on Slack, as long as you make the request at least 48 hours before the original due date. Because the mini-project assignments are collaborative, please discuss any intended late submission with your group first. Extensions will not be granted for quizzes.
  3. Academic honesty: All your work must follow the Smith College Academic Honor Code Statement. Any cases of dishonesty or plagiarism will be reported to the Academic Honor Board. Examples of dishonesty or plagiarism include:
    • Submitting work completed by another student as your own.
    • Copying and pasting words from sources without quoting and citing the author.
    • Paraphrasing material from another source without citing the author.
    • Submitting AI-generated content without proper attribution -- Whenever you get help with your programming tasks, it is crucial to provide clear and transparent attribution. Include a comment or annotation in your code specifying that certain sections were generated with the help of an AI code-completion tool.
    • Failing to cite your sources correctly.
    • Falsifying or misrepresenting information in submitted work.
    • Paying another student or service to complete assignments for you.
  4. Grading: I reserve the right to not discuss any grading issues in class and instead direct you to student hours.
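One way to follow the AI attribution guidance in the academic honesty policy above is to bracket the assisted lines with comments. The sketch below is only an illustration: the marker wording and the analysis itself (which uses R's built-in mtcars dataset) are assumptions, not a format required by this course, so ask the instructor if you are unsure how much detail is expected.

```r
# Hypothetical example of attributing AI help in an R script or .qmd code chunk.
# The marker wording below is an assumption, not an official course format.

# Written by me: keep only the columns needed for the summary.
cars_small <- mtcars[, c("mpg", "cyl")]

# --- BEGIN AI-assisted section ---
# Drafted with the help of an AI code-completion tool; I reviewed and ran it.
# Computes the mean miles per gallon for each cylinder count.
avg_mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars_small, FUN = mean)
# --- END AI-assisted section ---

print(avg_mpg_by_cyl)
```

Marking where the assisted section begins and ends makes it clear to a reader which lines you wrote yourself and which you received help with.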

Accommodations

It is my goal for everyone to succeed in this course. If you have personal circumstances that may impact your experience of our classroom, I encourage you to contact the Accessibility Resource Center (ARC) by phone at 413-585-2071, by email at arc@smith.edu, or in person in College Hall 104. The Center will generate a letter that indicates to me what kind of support you need and how I can make your classroom experience more accommodating. Once you have this letter, I would like you to DM me on Slack and schedule an appointment here: https://calendar.app.google/z77ksBdRCP3hXfJT7 to discuss ideas about how we can tailor the course accordingly. While you can request accommodations at any time, the sooner we start this conversation, the better. At no point will I ask you to divulge details about your personal circumstances to me.


Jacobson Center for Writing, Teaching & Learning

The Statistical & Data Sciences Program is committed to ensuring that our students learn the skills necessary to become exemplary writers within our field. To that end, we have adopted a curricular model called Writing Enriched Curriculum, which has enabled us to articulate the writing skills we hope our graduates will acquire. These include:

  • the ability to adapt voice and expectations to the requirements of different genres (e.g. blog posts vs. analytical essays);
  • the ability to follow a writing process that includes brainstorming, outlining, initial drafting, peer review, editing, and revising;
  • the ability to write with clarity and precision, even about issues of ambiguity and uncertainty;
  • the ability to prioritize the important parts of a process and/or project to communicate;
  • the ability to communicate a research question and how analysis will support that question;
  • the ability to create impactful figures and tables; and
  • the ability to code with documentation and comments that are correctly indented, use a consistent style, and are human-readable.

Much of what you do in this course will support your understanding and development of these skills. If you have any questions about them, or would like more help in this work, please contact Sara Eddy and/or make an appointment on the Jacobson Center's website to bring your work to the Center.


Student Well-being

College life is stressful, and life outside of college can be overwhelming. It is my position that attending to your physical and mental health and well-being should be a top priority. I will remind you of this often throughout the semester. I encourage you to schedule a time to talk with me if you are struggling with this course. If you, or anyone you know, is experiencing distress, there are numerous campus resources that can provide support via the Schacht Center. I can point you to these resources at any time throughout the semester.

If you need on-campus support, I encourage you to make an appointment by calling 413-585-2800, emailing counselingservices@smith.edu, or visiting the Schacht Center in person between 9:00 AM and 4:30 PM. You can access after-hours and weekend support by calling 413-585-2800 or using the TELUS Health app.

Getting help is a smart and courageous thing to do -- for yourself and for those who care about you.


Trigger Warnings

A trigger is a topic or image that can precipitate an intense emotional response. In this course, I am going to assign disability inclusion datasets for three mini-projects and some disability inclusion background readings. The goals of integrating disability inclusion components into introductory data science pedagogy include making connections between STEM fields and disability studies so that STEM subjects become more appealing to traditionally underrepresented groups, encouraging diversity of thought in approaching data science problems, and promoting disability awareness in the data science and statistics community. We take a social model perspective and critical disability lens in this course, which means that we view disability as human diversity and engage more people to address societal issues. However, I recognize that people may have different perspectives, so I provide a trigger warning before you engage in our disability inclusion context. I also offer an opportunity for discussion by setting up an anonymous form where you can provide feedback on disability inclusion related topics covered in this course.


Group Dynamics

Working in a group can be challenging at times. I hope that we can foster a collaborative and caring environment in this classroom.

Care for each other

  • If your groupmate (and any other classmate) helps you out on an assignment, says something brilliant that solidifies the material for you, or just listens when you've had a bad day, give them a shout-out on our #appreciations Slack channel.
  • Check in with colleagues before starting collaborative work. “What three words describe how you're feeling?” “Name one challenge and one success from this week.” “What are you doing for self-care right now?” Thank each other for sharing where they're at.
  • Cheer on colleagues as they give presentations or try something out for the first time.
  • Ask questions often in our #questions channel. Even better, help each other out by answering questions when you can.

I also suggest that your group define roles. Your group will function better when everyone has a clear understanding of their roles and responsibilities.

Define roles

  • No one should have a role that doesn’t involve writing R code. Learning how to write code is an essential part of this course, so if you aren’t doing that, you won’t be able to achieve the learning goals of the course.
  • Groups should be non-hierarchical. Leaders may emerge naturally and that is great, but no one in the group is the boss. Decisions should be made as a group.
  • Leaving someone out—regardless of the reasons—exemplifies poor leadership. If a group member is struggling to keep up, help them. If a group member is not carrying their weight, talk to them. Ignoring interpersonal conflicts will not make them go away. The goal of working in groups in this class is to equip you with skills to collaborate effectively and equitably in groups throughout your careers.
  • When issues arise, let me know as early as possible. However, I will ask you to take responsibility to speak with your partner first if you haven't done so and then we can come together or meet one-on-one to come up with a solution.

Code of Conduct

As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor and assistants, we have the right and responsibility to point out and stop behavior that is not aligned with this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

All students, the instructor, and all data assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, student hours, tutoring hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0.