
Linear Digressions

Ben Jaffe and Katie Malone


About Us

Explorations in Machine Learning and Data Science

Latest Episodes

A Data Science Take on Open Policing Data

A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying how those topics can be studied with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, a Bay Area data scientist and volunteer who reached out to tell us about his studies with the Stanford Open Policing dataset.

23 min · 2 days ago

Procella: YouTube's super-system for analytics data storage

This is a re-release of an episode that originally ran in October 2019. If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider building a custom solution for each use case, optimized for that use case’s needs and constraints. You also wouldn’t be YouTube, which found itself with exactly this problem (gigantic data needs and several very different things it needed to do with that data) and went a different way: it built one analytics data system to serve them all. Procella, the system it built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity YouTube had to introduce to serve all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”

29 min · 1 week ago

The Data Science Open Source Ecosystem

Open source software is ubiquitous throughout data science and enables the work of nearly every data scientist in one way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on support for open source projects at both the individual and the institutional level. https://hdsr.mitpress.mit.edu/pub/xsrt4zs2/release/2

23 min · 2 weeks ago

Rock the ROC Curve

This is a re-release of an episode that first ran on January 29, 2017. This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars; it's a fantastic go-to metric for all your classifier quality needs.

15 min · 3 weeks ago
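
For anyone who wants to see the metric in action, here is a minimal plain-Python sketch of how an ROC curve is traced out: sort examples by classifier score, sweep the decision threshold from high to low, and record the (false positive rate, true positive rate) pair at each step, then compute the area under the curve (AUC) with the trapezoidal rule. The function names and the toy labels and scores are our own illustrative assumptions, not anything from the episode.

```python
from typing import List, Tuple


def roc_curve(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from the highest score down."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 1 = positive class.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_curve(labels, scores)
print(auc(points))  # 0.75
```

The 0.75 matches the probabilistic reading of AUC: of the four (positive, negative) pairs in the toy data, three have the positive example scored higher than the negative one.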

Criminology and Data Science

This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a wide range of topics: common misconceptions around race and crime statistics, how methodologically driven criminology scholars think about building crime prediction models, and how to think about policy changes when we don’t have a complete understanding of cause and effect in criminology. For the many of us currently rethinking race and criminal justice but wanting to be data-driven about it, this conversation with Zach is a must-listen.

30 min · Jun 15

Racism, the criminal justice system, and data science

As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to predict how likely an offender is to re-offend if released, based on the attributes of the individual, and guess what: it shows disparities in its predictions for black and white offenders that would nudge judges toward giving harsher sentences to black individuals. We dig into this algorithm a little more deeply, unpacking how different metrics give different pictures of the “fairness” of the predictions and what is causing its racially disparate output (to wit: race is explicitly not an input to the algorithm, and yet the algorithm gives outputs that correlate with race; what gives?). Unfortunately it’s not an open-and-shut case of a tuning parameter being off, or the ...

31 min · Jun 8
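
To make the point about competing fairness metrics concrete, here is a small sketch that computes a few standard group-level metrics from a confusion matrix: base rate, positive predictive value (PPV: of those flagged high-risk, the share who re-offended), false positive rate (FPR: of those who did not re-offend, the share who were flagged), and false negative rate. The two groups and all of their labels are made up for illustration and are not COMPAS data. Because the toy groups have different base rates, the same predictor ends up with equal PPV in both groups but very different false positive rates, so it looks “fair” by one metric and unfair by another.

```python
from typing import Dict, List


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a group has no cases in the denominator.
    return num / den if den else float("nan")


def group_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for one group (1 = re-offended / flagged high-risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "base_rate": _ratio(tp + fn, len(y_true)),
        "ppv": _ratio(tp, tp + fp),  # precision among those flagged
        "fpr": _ratio(fp, fp + tn),  # non-re-offenders wrongly flagged
        "fnr": _ratio(fn, fn + tp),  # re-offenders not flagged
    }


# Hypothetical toy groups with different base rates (6/10 vs 2/10).
group_a = group_metrics([1] * 6 + [0] * 4, [1, 1, 1, 1, 0, 0] + [1, 1, 0, 0])
group_b = group_metrics([1] * 2 + [0] * 8, [1, 1] + [1, 0, 0, 0, 0, 0, 0, 0])

for name, metrics in (("A", group_a), ("B", group_b)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
# Both groups have PPV 2/3, but group A's FPR is 0.5 and group B's is 0.125.
```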

An interstitial word from Ben

A message from Ben about algorithmic bias, and how our models are sometimes reflections of ourselves.

5 min · Jun 5

Convolutional Neural Networks

This is a re-release of an episode that originally aired on April 1, 2018. If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.

21 min · Jun 1
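
The building blocks of a convolutional layer can be sketched in a few lines: a small kernel slides across the image and takes a weighted sum at each position, a ReLU nonlinearity zeroes out negative responses, and 2×2 max pooling downsamples the result. This is a minimal plain-Python sketch assuming a single channel, stride 1, and no padding; like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped). The vertical-edge kernel is a hand-picked illustration; in a trained network the kernel weights are learned.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation with stride 1 on a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Elementwise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# A 4x4 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)  # strongest response in the middle column
print(feature_map)
print(max_pool_2x2(relu(feature_map)))
```

Because the same small kernel is reused at every position, the layer detects the edge wherever it appears, which is a large part of why these networks work so well on images.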

Stein's Paradox

This is a re-release of an episode that was originally released on February 26, 2017. When you're estimating something about an object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual alone, or borrow some extra information from the group? The James-Stein estimator tells you how to combine individual and group information to make predictions that, taken over the whole group, are more accurate than if you treated each individual, well, individually.

27 min · May 25
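
Here is a minimal sketch of the idea, using the common variant that shrinks each observation toward the group's grand mean (the form popularized by Efron and Morris's batting-average example). Each estimate is the grand mean plus a shrunken deviation, where the shrinkage factor is c = max(0, 1 - (p - 3)·σ² / Σ(x_j - mean)²) for p observations. We assume every observation has the same known noise variance `sigma2` and clip the factor at zero (the "positive-part" variant); the original James-Stein estimator instead shrinks toward zero and uses p - 2. The averages and the variance below are made-up numbers.

```python
from typing import List


def james_stein(observations: List[float], sigma2: float) -> List[float]:
    """Positive-part James-Stein estimates, shrinking toward the grand mean.

    Assumes each x_i ~ Normal(theta_i, sigma2) with a known, shared sigma2.
    """
    p = len(observations)
    if p < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 observations")
    mean = sum(observations) / p
    spread = sum((x - mean) ** 2 for x in observations)
    if spread == 0.0:
        return [mean] * p  # all observations identical: nothing to shrink
    # Noisy data relative to the spread means more shrinkage (smaller c).
    c = max(0.0, 1.0 - (p - 3) * sigma2 / spread)
    return [mean + c * (x - mean) for x in observations]


# Hypothetical early-season batting averages and an assumed noise variance.
averages = [0.400, 0.300, 0.200, 0.100]
estimates = james_stein(averages, sigma2=0.01)
# Group mean is 0.25 and c = 1 - 0.01 / 0.05 = 0.8, so each estimate moves
# 20% of the way toward 0.25: roughly [0.37, 0.29, 0.21, 0.13].
print(estimates)
```

Any single player's estimate can end up worse than their raw average; the guarantee is about total squared error summed over the whole group.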

Protecting Individual-Level Census Data with Differential Privacy

The power of fine-grained, individual-level data comes with a drawback: it can compromise the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping personal information private when datasets are released or used for study. Differential privacy is getting a big boost this year, as it’s being implemented across the 2020 US Census as a way of protecting the privacy of census respondents while still opening up the dataset for research and policy use. When two important topics come together like this, we can’t help but sit up and pay attention.

21 min · May 18
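
To give a feel for how a differentially private release works, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. Adding or removing one person changes a count by at most 1 (its "sensitivity"), so adding Laplace noise with scale 1/ε to the true count makes the released value ε-differentially private; smaller ε means more noise and stronger privacy. This is our own illustration, not the Census Bureau's method: the 2020 Census disclosure-avoidance system (the TopDown Algorithm) is considerably more elaborate. The toy records and predicate are hypothetical, and naive floating-point noise like this has known subtle weaknesses, so it should not be used for real releases.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exponential(rate = 1/scale) draws
    # is Laplace-distributed with mean 0 and the given scale.
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A count has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy "census" of ages; count respondents aged 65 or older (true count: 5).
ages = [23, 67, 45, 71, 34, 80, 19, 66, 52, 90]
rng = random.Random(2020)
print(private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng))
```

Each individual release is noisy, but the noise is unbiased, so aggregates over many noisy counts still land close to the truth; the privacy cost adds up with every query answered, which is why real systems budget ε carefully.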
