Linear Digressions

Author: Various
Narrator: Various
Publisher: Podcast
Duration: 96:08:51

Synopsis

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.

Episodes

Scikit + Optimization = Scikit-Optimize

12/09/2016 Duration: 15min

We're excited to welcome a guest, Tim Head, who is one of the maintainers of the scikit-optimize package. With all the talk about optimization lately, it felt appropriate to get in a few words with someone who's out there making it happen for Python. Relevant links: https://scikit-optimize.github.io/ http://www.wildtreetech.com/

Two Cultures: Machine Learning and Statistics

05/09/2016 Duration: 17min

It's a funny thing to realize, but data science modeling is usually about either explainability, interpretation and understanding, or it's about predictive accuracy. But usually not both--optimizing for one tends to compromise the other. Leo Breiman was one of the titans of both kinds of modeling, a statistician who helped bring machine learning into statistics and vice versa. In this episode, we unpack one of his seminal papers from 2001, when machine learning was just beginning to take root, and talk about how he made clear what machine learning could do for statistics and why it's so important. Relevant links: http://www.math.snu.ac.kr/~hichoi/machinelearning/(Breiman)%20Statistical%20Modeling--The%20Two%20Cultures.pdf

Optimization Solutions

29/08/2016 Duration: 20min

You've got an optimization problem to solve, and a less-than-forever amount of time in which to solve it. What do? Use a heuristic optimization algorithm, like a hill climber or simulated annealing--we cover both in this episode! Relevant link: http://www.lizsander.com/programming/2015/08/04/Heuristic-Search-Algorithms.html
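As a rough illustration of the two heuristics named above, here is a minimal Python sketch. The function names, the geometric cooling schedule, and all default parameters are assumptions made for this example; they are not taken from the episode or the linked article.

```python
import math
import random


def hill_climb(cost, start, neighbor, steps=1000, rng=None):
    """Greedy search: accept a neighbor only if it strictly lowers the cost."""
    rng = rng or random.Random(0)
    current, current_cost = start, cost(start)
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


def simulated_annealing(cost, start, neighbor, steps=5000,
                        t_start=1.0, t_end=1e-3, rng=None):
    """Like hill climbing, but sometimes accepts worse moves while 'hot'."""
    rng = rng or random.Random(0)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(steps):
        # Assumed schedule: temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost
```

Both functions take a cost function, a starting point, and a neighbor function that proposes a nearby candidate. The hill climber can get stuck in a local minimum; the annealer's occasional uphill moves are what give it a chance to escape.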

Optimization Problems

22/08/2016 Duration: 17min

If modeling is about predicting the unknown, optimization tries to answer the question of what to do, what decision to make, to get the best results out of a given situation. Sometimes that's straightforward, but sometimes... not so much. What makes an optimization problem easy or hard, and what are some of the methods for finding optimal solutions to problems? Glad you asked! May we recommend our latest podcast episode to you?

Multi-level modeling for understanding DEADLY RADIOACTIVE GAS

15/08/2016 Duration: 23min

Ok, this episode is only sort of about DEADLY RADIOACTIVE GAS. It's mostly about multilevel modeling, which is a way of building models with data that has distinct, related subgroups within it. What are multilevel models used for? Elections (we can't get enough of 'em these days), understanding the effect that a good teacher can have on their students, and DEADLY RADIOACTIVE GAS. Relevant links: http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf

How Polls Got Brexit "Wrong"

08/08/2016 Duration: 15min

Continuing the discussion of how polls do (and sometimes don't) tell us what to expect in upcoming elections--let's take a concrete example from the recent past, shall we? The Brexit referendum was, by and large, expected to shake out for "remain", but when the votes were counted, "leave" came out ahead. Everyone was shocked (SHOCKED!) but maybe the polls weren't as wrong as the pundits like to claim. Relevant links: http://www.slate.com/articles/news_and_politics/moneybox/2016/07/why_political_betting_markets_are_failing.html http://andrewgelman.com/2016/06/24/brexit-polling-what-went-wrong/

Election Forecasting

01/08/2016 Duration: 28min

Not sure if you heard, but there's an election going on right now. Polls, surveys, and projections abound, as far as the eye can see. How to make sense of it all? How are the projections made? Which are some good ones to follow? We'll be your trusty guides through a crash course in election forecasting. Relevant links: http://www.wired.com/2016/06/civis-election-polling-clinton-sanders-trump/ http://election.princeton.edu/ http://projects.fivethirtyeight.com/2016-election-forecast/ http://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html?rref=collection%2Fsectioncollection%2Fupshot&action=click&contentCollection=upshot&region=rank&module=package&version=highlights&contentPlacement=5&pgtype=sectionfront

Machine Learning for Genomics

25/07/2016 Duration: 20min

Genomics data is some of the biggest #bigdata, and doing machine learning on it is unlocking new ways of thinking about evolution, genomic diseases like cancer, and what really makes each of us different from everyone else. This episode touches on some of the things that make machine learning on genomics data so challenging, and the algorithms designed to do it anyway.

Climate Modeling

18/07/2016 Duration: 19min

Hot enough for you? Climate models suggest that it's only going to get warmer in the coming years. This episode unpacks those models, so you understand how they work. A lot of the episodes we do are about fun studies we hear about, like "if you're interested, this is kinda cool"--this episode is much more important than that. Understanding these models, and taking action on them where appropriate, will have huge implications in the years to come. Relevant links: https://climatesight.org/

Reinforcement Learning Gone Wrong

11/07/2016 Duration: 28min

Last week’s episode on artificial intelligence gets a huge payoff this week—we’ll explore a wonderful couple of papers about all the ways that artificial intelligence can go wrong. Malevolent actors? You bet. Collateral damage? Of course. Reward hacking? Naturally! It’s fun to think about, and the discussion starting now will have reverberations for decades to come. https://www.technologyreview.com/s/601519/how-to-create-a-malevolent-artificial-intelligence/ http://arxiv.org/abs/1605.02817 https://arxiv.org/abs/1606.06565

Reinforcement Learning for Artificial Intelligence

03/07/2016 Duration: 18min

There’s a ton of excitement about reinforcement learning, a form of semi-supervised machine learning that underpins a lot of today’s cutting-edge artificial intelligence algorithms. Here’s a crash course in the algorithmic machinery behind AlphaGo, and self-driving cars, and major logistical optimization projects—and the robots that, tomorrow, will clean our houses and (hopefully) not take over the world…

Differential Privacy: how to study people without being weird and gross

27/06/2016 Duration: 18min

Apple wants to study iPhone users' activities and use that data to improve performance. Google collects data on what people are doing online to try to improve their Chrome browser. Do you like the idea of this data being collected? Maybe not, if it's being collected on you--but you probably also realize that there is some benefit to be had from the improved iPhones and web browsers. Differential privacy is a set of policies that walks the line between individual privacy and better data, including even some old-school tricks that scientists use to get people to answer embarrassing questions honestly. Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf
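One classic survey technique of this kind is randomized response, where each person flips coins in private before answering. The sketch below is a minimal Python illustration under that assumption; the description does not say which trick the episode covers, and the function names and the fair-coin protocol are choices made for this example.

```python
import random


def randomized_response(truth, rng):
    """Answer a yes/no question with plausible deniability.

    First coin flip: heads -> answer truthfully.
    Tails -> flip again and report that second coin instead.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(responses):
    """Recover the population rate from the noisy answers.

    P(yes) = 0.5 * p + 0.25, so p = 2 * (P(yes) - 0.25).
    """
    observed = sum(responses) / len(responses)
    return 2 * (observed - 0.25)
```

No single answer reveals much about the person who gave it, because any "yes" may have come from the second coin, yet the aggregate rate can still be estimated from a large enough sample.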

How the sausage gets made

20/06/2016 Duration: 29min

Something a little different in this episode--we'll be talking about the technical plumbing that gets our podcast from our brains to your ears. As it turns out, it's a multi-step bucket brigade process of RSS feeds, links to downloads, and lots of hand-waving when it comes to trying to figure out how many of you (listeners) are out there.

SMOTE: makin' yourself some fake minority data

13/06/2016 Duration: 14min

Machine learning on imbalanced classes: surprisingly tricky. Many (most?) algorithms tend to just assign the majority class label to all the data and call it a day. SMOTE is an algorithm for manufacturing new minority class examples for yourself, to help your algorithm better identify them in the wild. Relevant links: https://www.jair.org/media/953/live-953-2037-jair.pdf
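The core idea can be sketched in a few lines of plain Python: pick a minority example, pick one of its nearest minority neighbors, and create a new point somewhere on the line segment between them. This is a simplified sketch, not the reference implementation from the linked paper; the function name, the random choice of base point, and the default `k` are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point a random fraction of the way from base to neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic
```

For example, `smote([(0, 0), (1, 0), (0, 1), (1, 1)], 10)` returns ten new points that all lie on segments between the four originals. In practice, a maintained library implementation is usually a better choice than a hand-rolled one.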

Conjoint Analysis: like AB testing, but on steroids

06/06/2016 Duration: 18min

Conjoint analysis is like AB testing, but more bigger more better: instead of testing one or two things, you can test potentially dozens of options. Where might you use something like this? Well, if you wanted to design an entire hotel chain completely from scratch, and to do it in a data-driven way. You'll never look at Courtyard by Marriott the same way again. Relevant link: https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&fileID=466

Traffic Metering Algorithms

30/05/2016 Duration: 17min

This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems. These systems slow down the flow of traffic onto highways so that the highways don't get overloaded with cars and clog up. If you're someone who listens to podcasts while commuting, and especially if your area has on-ramp metering, you'll never look at highway access control the same way again (yeah, we know this is super nerdy; it's also super awesome). Relevant links: http://its.berkeley.edu/sites/default/files/publications/UCB/99/PWP/UCB-ITS-PWP-99-19.pdf http://www.its.uci.edu/~lchu/ramp/Final_report_mou3013.pdf

Um Detector 2: The Dynamic Time Warp

23/05/2016 Duration: 14min

One tricky thing about working with time series data, like the audio data in our "um" detector (remember that? because we barely do...), is that sometimes events look really similar but one is a little bit stretched and squeezed relative to the other. Besides having an amazing name, the dynamic time warp is a handy algorithm for aligning two time series sequences that are close in shape, but don't quite line up out of the box. Relevant link: http://www.aaai.org/Papers/Workshops/1994/WS-94-03/WS94-03-031.pdf
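The alignment described above is usually computed with dynamic programming. Here is a minimal Python sketch of the textbook version, assuming one-dimensional sequences and absolute difference as the local distance; the function name is hypothetical, and practical implementations often add a window constraint that is omitted here.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A sequence and a stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is exactly the "close in shape but not lined up" case the description mentions.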

Inside a Data Analysis: Fraud Hunting at Enron

16/05/2016 Duration: 30min

It's storytime this week--the story, from beginning to end, of how Katie designed and built the main project for Udacity's Intro to Machine Learning class, when she was developing the course. The project was to use email and financial data to hunt for signatures of fraud at Enron, one of the biggest cases of corporate fraud in history; that description makes the project sound pretty clean but getting the data into the right shape, and even doing some dataset merging (that hadn't ever been done before), made this project much more interesting to design than it might appear. Here's the story of what a data analysis like this looks like...from the inside.

What's the biggest #bigdata?

09/05/2016 Duration: 25min

Data science is often mentioned in the same breath as big data. But how big is big data? And who has the biggest big data? CERN? YouTube? ... Something (or someone) else? Relevant link: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

Data Contamination

02/05/2016 Duration: 20min

Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking. Turns out this can be easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way. Relevant links: https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf
