class: center, middle, inverse, title-slide

# Lec 21: Ethical Issues in Data Science

## SDS 192: Introduction to Data Science

### Shiya Cao
Statistical & Data Sciences, Smith College

### Fall 2024
---

# Today's Agenda

* Data ethics framework for this course
* Algorithmic bias
* Ethical lenses

---

# Data Ethics in this Course

1. Data is not "out there" or "given" but generated by individuals and institutions that hold particular assumptions and commitments.
2. Because data science has historically been exclusionary, narrow worldviews have been at the helm in deciding what goes into data collection and analysis. This means that other assumptions and worldviews are often not considered in data.
3. There are both benefits and harms to data collection, and these are often not equitably distributed amongst diverse social groups.

With this in mind, we form our ethics framework for this course:

1. What assumptions and commitments informed the design of this dataset?
2. Who has had a say in data collection and analysis regarding this dataset? Who has been excluded?
3. What are the benefits and harms of this dataset, and how are they distributed amongst diverse social groups?

---

# Examples of Ethical Concerns of the Mini-project 1 Dataset

* What assumptions and commitments informed the design of this dataset?

* "One ethical concern is how this survey quantifies disability. Within this study, travel disabilities are categorized by length of time, ranging from less than six months to being lifelong. However, it's unclear whether this length of time refers to when a person received an official disability diagnosis or when they believe they first started showing symptoms. Women and people of color are less likely to receive accurate diagnoses for medical conditions^[Szabo, L. (2024, January 18). Women and minorities bear the brunt of medical misdiagnosis. KFF Health News.], and therefore may be misclassified within this survey as having no travel disability even though that may not reflect their lived experiences."

---

# Examples of Ethical Concerns of the Mini-project 1 Dataset

* Who has had a say in data collection and analysis regarding this dataset?
Who has been excluded?

* "The Federal Highway Administration compiled this data, so they had the final say in terms of methodology, data collection, and analysis. This is important to consider because data collection and analysis are inherently shaped by the internal objectives of the organization that paid for the study. This data only provides insight into people who decided to fill out the survey; therefore, it is not representative of the whole US population and only provides insight into the lives of the survey respondents."

---

# Examples of Ethical Concerns of the Mini-project 1 Dataset

* What are the benefits and harms of this dataset, and how are they distributed amongst diverse social groups?

* "One benefit of this dataset is that it provides insight into the travel patterns of people with disabilities living in traditional housing. A harmful aspect of the dataset is that it fails to include a large portion of abled and disabled people living in nontraditional housing, which could lead us to overlook potential accessibility barriers for those people."

---

# Algorithmic bias

* Algorithms reflect the biases of their creators (both people and data).

---

# Data Ethics Discussion 1

> Deep neural networks are more accurate than humans at detecting sexual orientation from facial images

1. What assumptions and commitments informed the design of this dataset?
2. Who has had a say in data collection and analysis regarding this dataset? Who has been excluded?
3. What are the benefits and harms of this dataset, and how are they distributed amongst diverse social groups?
4. Could this algorithm be used to discriminate against people based on their sexual orientation?
5. Have you seen similar examples?

---

# Data Ethics Discussion 2

> What does it mean to ‘solve’ the problem of discrimination in hiring? Social, technical and legal perspectives from the UK on automated hiring systems

1. What practices of the three automated hiring systems (AHSs) mentioned in the article might mitigate bias and discrimination in hiring?
2. What practices of the three AHSs might reinforce existing bias and discrimination in hiring?
3. Have you seen similar examples?

---

# Ethical Lenses

* **Deontological ethics** focuses on rights, principles, and duties.
  * Rights Approach: Which option best respects the rights of all who have a stake?
  * Justice Approach: Which option treats people equally or proportionately?

---

# Ethical Lenses

* **Consequentialist ethics**: to know whether an action is ethical, this framework tells us, look at its consequences.
  * Utilitarian Approach: Which option will produce the most good and do the least harm?
  * Common Good Approach: Which option best serves the community as a whole, not just some members?

---

# Ethical Lenses

* **Virtue ethics** emphasizes moral character: ethical action flows from well-habituated virtues and well-cultivated, practically wise moral judgement.
  * Virtue Approach: Which option leads me to act as the sort of person I want to be?

---

# There Are Other Ethical Lenses

* Ethical lenses from different cultures.

---

# Data Ethics Resources

* [Federal Data Strategy](https://resources.data.gov/assets/documents/fds-data-ethics-framework.pdf)
* [Algorithmic Justice League Equitable AI](https://assets.website-files.com/5e027ca188c99e3515b404b7/5e332b739c247f30b4888385_AJL%20101%20Final%20_1.22.20.pdf)
* [Data Values and Principles Manifesto](https://datapractices.org/manifesto/)
* [Design Justice Principles](https://designjustice.org/read-the-principles)
* [Deon Data Ethics Checklist](https://deon.drivendata.org/)
* [Ethics in Technology Practice](https://www.scu.edu/ethics-in-technology-practice/)
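---

# Algorithmic Bias: A Toy Sketch

The earlier point that algorithms reflect the biases of their creators (both people and data) can be sketched in a few lines of Python. Everything here is invented for illustration: the `history` records and hiring rates are made-up numbers, and the "model" is just a per-group majority vote, not any real hiring system. The point is only that a model fit to biased historical decisions reproduces that bias.

```python
# Hypothetical historical records: (qualified, group, hired).
# Equally qualified group "B" applicants were hired less often than group "A".
history = (
    [(True, "A", True)] * 80 + [(True, "A", False)] * 20 +
    [(True, "B", True)] * 40 + [(True, "B", False)] * 60
)

def train(records):
    """'Learn' the historical hiring rate per group among qualified applicants."""
    rates = {}
    for group in ("A", "B"):
        hires = [hired for q, g, hired in records if q and g == group]
        rates[group] = sum(hires) / len(hires)
    # Predict 'hire' if a majority of that group was hired historically.
    return lambda group: rates[group] >= 0.5

model = train(history)
print(model("A"))  # True:  qualified group A applicant predicted 'hire'
print(model("B"))  # False: equally qualified group B applicant predicted 'no hire'
```

No one told this model to discriminate; it simply learned the disparity already present in the data it was given.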