Synopsis
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
-
Sold! Auctions (Part 2)
25/01/2016 · Duration: 17 min
The Google ads auction is a special kind of auction, one you might not know as well as the famous English auction (which we talked about in the last episode). But if it's what Google uses to sell billions of dollars of ad space in real time, you know it must be pretty cool.
Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf
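The episode stays away from code, but the core second-price rule is easy to sketch. Here is a minimal generalized second-price (GSP) allocation in Python, with invented bidders and bids; real ad auctions also weight each bid by a quality score, which this toy version omits:

```python
# A minimal generalized second-price (GSP) allocation. Bidder names and
# bids are invented; real ad auctions also weight bids by a quality score.

def gsp_allocate(bids, n_slots):
    """Rank bidders by bid; the winner of each slot pays the next bid down."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(n_slots, len(ranked))):
        bidder = ranked[i][0]
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

print(gsp_allocate({"alice": 4.0, "bob": 2.5, "carol": 2.0, "dan": 1.0}, n_slots=2))
# [('alice', 2.5), ('bob', 2.0)] -- each winner pays the bid one rank below
```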
-
Going Once, Going Twice: Auctions (Part 1)
22/01/2016 · Duration: 12 min
The Google AdWords algorithm is (famously) an auction system for allocating a massive amount of online ad space in real time--with that fascinating use case in mind, this episode is part one in a two-part series all about auctions. We dive into the theory of auctions, and what makes a "good" auction.
Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf
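For contrast with the GSP sketch above, here is a toy simulation of the English auction this episode covers, with invented private valuations; it shows why the winner ends up paying just above the runner-up's valuation:

```python
# A toy ascending (English) auction with invented private valuations.
# The price ticks upward and bidders drop out as it passes their valuation.

def english_auction(valuations, increment=0.01):
    price = 0.0
    active = dict(valuations)
    while len(active) > 1:
        price += increment
        still_in = {b: v for b, v in active.items() if v >= price}
        if not still_in:        # everyone dropped at once: last price stands
            break
        active = still_in
    winner = max(active, key=active.get)
    return winner, round(price, 2)

print(english_auction({"alice": 10.0, "bob": 7.0, "carol": 4.0}))
# ('alice', 7.01) -- the winner pays just above the second-highest valuation
```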
-
Chernoff Faces and Minard Maps
18/01/2016 · Duration: 15 min
A data visualization extravaganza in this episode, as we discuss Chernoff faces (you: "faces? huh?" us: "oh just you wait") and the greatest data visualization of all time, or at least the Napoleonic era.
Relevant links:
http://lya.fciencias.unam.mx/rfuentes/faces-chernoff.pdf
https://en.wikipedia.org/wiki/Charles_Joseph_Minard
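If you've never seen one, the Chernoff trick is simple to prototype: map each data dimension onto a facial feature. A bare-bones matplotlib sketch, with an invented three-feature mapping (real Chernoff faces encode a dozen or more dimensions):

```python
# A bare-bones Chernoff face: three data dimensions (scaled to [0, 1])
# mapped to head width, eye size, and mouth curvature.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse

def chernoff_face(ax, head, eyes, smile):
    ax.add_patch(Ellipse((0, 0), width=1 + head, height=1.6, fill=False))
    for side in (-0.3, 0.3):
        ax.add_patch(Circle((side, 0.3), radius=0.05 + 0.1 * eyes, fill=False))
    t = np.linspace(-0.3, 0.3, 50)
    ax.plot(t, -0.4 + 4 * (smile - 0.5) * t ** 2, color="k")  # smile or frown
    ax.set_xlim(-1.2, 1.2)
    ax.set_ylim(-1.2, 1.2)
    ax.set_aspect("equal")
    ax.axis("off")

fig, axes = plt.subplots(1, 3)
for ax, row in zip(axes, [(0.1, 0.2, 0.9), (0.5, 0.5, 0.5), (0.9, 0.9, 0.1)]):
    chernoff_face(ax, *row)  # one face per data point
plt.show()
```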
-
t-SNE: Reduce Your Dimensions, Keep Your Clusters
15/01/2016 · Duration: 16 min
Ever tried to visualize a cluster of data points in 40 dimensions? Or even 4, for that matter? We prefer to stick to 2, or maybe 3 if we're feeling well-caffeinated. The t-SNE algorithm is one of the best tools on the market for doing dimensionality reduction when you have clustering in mind.
Relevant links:
https://www.youtube.com/watch?v=RJVL80Gg3lA
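If you want to try it yourself, scikit-learn ships an implementation. A minimal run, assuming scikit-learn and matplotlib are installed, on the 64-dimensional handwritten digits dataset:

```python
# Project 64-dimensional digit images down to 2-D with t-SNE; points from
# the same digit class should land in the same visual cluster.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of handwritten digits")
plt.show()
```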
-
The [Expletive Deleted] Problem
11/01/2016 · Duration: 9 min
The town of [expletive deleted], England, is responsible for the clbuttic [expletive deleted] problem. This week on Linear Digressions: we try really hard not to swear too much.
Relevant links:
https://en.wikipedia.org/wiki/Scunthorpe_problem
https://www.washingtonpost.com/news/worldviews/wp/2016/01/05/where-is-russia-actually-mordor-in-the-world-of-google-translate/
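The failure mode is easy to reproduce. A naive substring censor with a one-word ban list (standing in for a real filter), plus the word-boundary fix next to it:

```python
# A naive substring censor mangles innocent words -- the clbuttic problem.
import re

def naive_censor(text):
    return text.replace("ass", "butt")        # blind substring replacement

def better_censor(text):
    return re.sub(r"\bass\b", "butt", text)   # respect word boundaries

print(naive_censor("a classic paper"))   # 'a clbuttic paper' -- oops
print(better_censor("a classic paper"))  # 'a classic paper' -- unharmed
```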
-
Unlabeled Supervised Learning--whaaa?
08/01/2016 · Duration: 12 min
In order to do supervised learning, you need a labeled training dataset. Or do you...?
Relevant links:
http://www.cs.columbia.edu/~dplewis/candidacy/goldman00enhancing.pdf
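One family of answers is self-training: fit on the few labels you have, pseudo-label the unlabeled points the model is most confident about, and refit. A sketch assuming scikit-learn (a simpler cousin of the co-training approach in the linked paper, not a reimplementation of it):

```python
# Self-training: bootstrap from 20 labeled points by promoting confident
# predictions on the unlabeled pool to pseudo-labels.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:20] = True                   # pretend only 20 points came labeled
y_train = np.where(labeled, y, -1)    # -1 marks "no label yet"

model = LogisticRegression()
for _ in range(10):
    model.fit(X[labeled], y_train[labeled])
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95      # trust only sure predictions
    if not confident.any():
        break
    idx = np.flatnonzero(~labeled)[confident]
    y_train[idx] = model.classes_[proba[confident].argmax(axis=1)]
    labeled[idx] = True               # promote pseudo-labels to "labeled"

print(f"points labeled after self-training: {labeled.sum()} / {len(y)}")
```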
-
Hacking Neural Nets
05/01/2016 · Duration: 15 min
Machine learning: it can be fooled, just like you or me. Here's one of our favorite examples, a study into hacking neural networks.
Relevant links:
http://arxiv.org/pdf/1412.1897v4.pdf
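The linked paper evolves unrecognizable images that a trained net classifies with high confidence; a related and even simpler trick is to nudge an input along the sign of the gradient. A toy numpy version on a linear classifier, with random weights standing in for a trained model:

```python
# The flavor of the attack in a toy setting: move the input a small,
# structured step against the model's current prediction.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)               # "trained" weights (invented)
x = rng.normal(size=20)               # an ordinary input

def confidence(v):                    # P(class = 1) under the model
    return 1 / (1 + np.exp(-(w @ v)))

# Step against whichever class the model currently predicts.
sign = -1.0 if confidence(x) > 0.5 else 1.0
x_adv = x + 0.5 * sign * np.sign(w)

print("before:", round(confidence(x), 3))
print("after: ", round(confidence(x_adv), 3))  # confidence swings sharply
```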
-
Zipf's Law
31/12/2015 · Duration: 11 min
Zipf's law describes the statistics of how word usage is distributed. As it turns out, the same pattern is strikingly reminiscent of how income is distributed, and populations of cities, and bug reports in software, as well as tons of other phenomena that we all interact with every day.
Relevant links:
http://economix.blogs.nytimes.com/2010/04/20/a-tale-of-many-cities/
http://arxiv.org/pdf/cond-mat/0412004.pdf
https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/
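Zipf's claim is concrete: the frequency of the r-th most common word falls off roughly as 1/r, so rank times count stays roughly constant. A quick check in Python (the corpus filename is a placeholder for any large text you have handy):

```python
# A quick Zipf check: under the law, rank * count is roughly constant.
from collections import Counter

# Placeholder path -- point this at any large text file you have handy.
words = open("corpus.txt", encoding="utf-8").read().lower().split()
counts = Counter(words).most_common(10)

for rank, (word, count) in enumerate(counts, start=1):
    print(f"{rank:>2}  {word:<15} count={count:<7} rank*count={rank * count}")
```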
-
Indie Announcement
30/12/2015 · Duration: 1 min
We've gone indie! Which shouldn't change anything about the podcast that you know and love, but we're super excited to keep bringing you Linear Digressions as a fully independent podcast. Some links mentioned in the show:
https://twitter.com/lindigressions
https://twitter.com/benjaffe
https://twitter.com/multiarmbandit
https://soundcloud.com/linear-digressions
http://lineardigressions.com/
-
Portrait Beauty
27/12/2015 · Duration: 11 min
It's Da Vinci meets Skynet: what makes a portrait beautiful, according to a machine learning algorithm. Snap a selfie and give us a listen.
-
The Cocktail Party Problem
18/12/2015 · Duration: 12 min
Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!
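The standard algorithmic attack on the cocktail party problem is independent component analysis. A minimal blind-source-separation demo, assuming scikit-learn, with two synthetic signals standing in for voices:

```python
# Blind source separation with FastICA: two synthetic "voices" are mixed
# as if recorded by two microphones, then pulled back apart.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # speaker 1, speaker 2
mixing = np.array([[1.0, 0.5],
                   [0.4, 1.2]])                          # microphone gains
mixed = sources @ mixing.T                               # what the mics hear

recovered = FastICA(n_components=2, random_state=0).fit_transform(mixed)
# `recovered` matches the sources only up to reordering and scaling: ICA
# can't know which speaker came "first" or how loud each one really was.
```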
-
A Criminally Short Introduction to Semi Supervised Learning
04/12/2015 · Duration: 9 min
Because there are more interesting problems than there are labeled datasets, semi-supervised learning provides a framework for getting feedback from the environment as a proxy for labels of what's "correct." Of all the machine learning methodologies, it might also be the closest to how humans usually learn--we go through the world, getting (noisy) feedback on the choices we make and learning from the outcomes of our actions.
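For a concrete starting point, scikit-learn ships graph-based semi-supervised learners out of the box. A sketch using LabelSpreading with most labels hidden (this illustrates the general framework, not any specific example from the episode):

```python
# Hide 90% of the iris labels (marked -1) and let LabelSpreading infer
# them from the geometry of the data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1   # -1 means "unlabeled"

model = LabelSpreading().fit(X, y_partial)
hidden = y_partial == -1
print("accuracy on hidden labels:",
      (model.transduction_[hidden] == y[hidden]).mean())
```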
-
Thresholdout: Down with Overfitting
27/11/2015 · Duration: 15 min
Overfitting to your training data can be avoided by evaluating your machine learning algorithm on a holdout test dataset, but what about overfitting to the test data? It turns out that can happen too, easily, and you have to be careful to avoid it. But an algorithm from the field of privacy research shows promise for keeping your test data safe from accidental overfitting.
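The algorithm in question is Thresholdout (Dwork et al.). The idea in miniature: only consult the holdout when the training estimate has drifted away from it, and add noise when you do, so repeated adaptive queries leak very little about the holdout. A sketch with illustrative threshold and noise-scale values:

```python
# A sketch of the Thresholdout idea: answer queries from the training set
# while it agrees with the holdout; otherwise answer from the holdout,
# with Laplace noise so the analyst can't reconstruct it exactly.
import numpy as np

rng = np.random.default_rng(0)

def thresholdout(train_value, holdout_value, threshold=0.04, sigma=0.01):
    """Return a reusable-holdout estimate of one queried statistic."""
    if abs(train_value - holdout_value) > threshold + rng.laplace(scale=sigma):
        # Disagreement: the training estimate has started to overfit, so
        # answer from the holdout, noised.
        return holdout_value + rng.laplace(scale=sigma)
    # Agreement: answering from the training set leaks nothing new.
    return train_value
```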
-
The State of Data Science
10/11/2015 · Duration: 15 min
How many data scientists are there, where do they live, where do they work, what kind of tools do they use, and how do they describe themselves? RJMetrics wanted to know the answers to these questions, so they decided to find out and share their analysis with the world. In this very special interview episode, we welcome Tristan Handy, VP of Marketing at RJMetrics, who will talk about "The State of Data Science Report."
-
Data Science for Making the World a Better Place
06/11/2015 · Duration: 9 min
There's a good chance that great data science is going on close to you, and that it's going toward making your city, state, country, and planet a better place. Not all the data science questions being tackled out there are about finding the sleekest new algorithm or billion-dollar company idea--there's a whole world of social data science that just wants to make the world a better place to live in.
-
Kalman Runners
29/10/2015 · Duration: 14 min
The Kalman filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation. If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already. By the way, we neglected to mention in the episode: Katie's marathon time was 3:54:27!
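For the one-dimensional case, the whole filter fits in a few lines: predict where the runner should be, then blend in each noisy measurement according to the Kalman gain. A sketch with invented pace and noise figures:

```python
# A 1-D Kalman filter for a runner at roughly constant pace.
import numpy as np

rng = np.random.default_rng(0)
pace, dt = 3.0, 1.0                                    # m/s, seconds per tick
truth = pace * dt * np.arange(1, 51)                   # true positions
measurements = truth + rng.normal(scale=5.0, size=50)  # noisy GPS-style fixes

x, P = 0.0, 10.0       # position estimate and its variance
Q, R = 0.5, 25.0       # process noise and measurement noise variances
estimates = []
for z in measurements:
    x, P = x + pace * dt, P + Q          # predict: coast, uncertainty grows
    K = P / (P + R)                      # Kalman gain: how much to trust z
    x, P = x + K * (z - x), (1 - K) * P  # update: blend prediction and data
    estimates.append(x)

rms = lambda e: np.sqrt(np.mean(np.square(e)))
print("RMS error, raw measurements:", rms(measurements - truth))
print("RMS error, Kalman filtered: ", rms(np.array(estimates) - truth))
```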
-
Neural Net Inception
23/10/2015 · Duration: 15 min
When you sleep, the neural pathways in your brain take the "white noise" of your resting brain, mix in your experiences and imagination, and the result is dreams (that is a highly unscientific explanation, but you get the idea). What happens when neural nets are put through the same process? Train a neural net to recognize pictures, and then send through an image of white noise, and it will start to see some weird (but cool!) stuff.
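In miniature, the trick is gradient ascent on the image instead of the weights. A toy numpy version in which a single random linear "feature detector" stands in for a trained network, so the result is abstract rather than trippy:

```python
# "Dreaming" in miniature: freeze the model, adjust the image to excite a
# chosen feature.
import numpy as np

rng = np.random.default_rng(0)
feature = rng.normal(size=(8, 8))            # pretend this filter was learned
image = rng.normal(scale=0.1, size=(8, 8))   # start from near-white-noise

for step in range(100):
    # d(activation)/d(image) for activation = sum(feature * image) is just
    # `feature`, so each step pushes the image toward the filter's pattern.
    image += 0.1 * feature

print("activation:", np.sum(feature * image))  # grows as the image "dreams"
```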
-
Benford's Law
16/10/2015 · Duration: 17 min
Sometimes numbers are... weird. Benford's Law is a favorite example of this for us--it's a law that governs the distribution of the first digit in certain types of numbers. As it turns out, if you're looking up the length of a river, the population of a country, the price of a stock... not all first digits are created equal.
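The law itself says P(first digit = d) = log10(1 + 1/d). A quick check in Python, using powers of 2 as a self-contained stand-in dataset that famously obeys the law:

```python
# Compare observed first-digit frequencies against Benford's prediction.
import math
from collections import Counter

numbers = [2 ** k for k in range(1, 1001)]
first_digits = Counter(int(str(n)[0]) for n in numbers)

for d in range(1, 10):
    observed = first_digits[d] / len(numbers)
    print(f"digit {d}: observed {observed:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
```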
-
PFun with P Values
02/09/2015 · Duration: 17 min
Doing some science, and want to know if you might have found something? Or maybe you've just accomplished the scientific equivalent of going fishing and reeling in an old boot? Frequentist p-values can help you distinguish between "eh" and "oooh interesting". Also, there's a lot of physics in this episode, nerds.
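In code, a p-value is just a tail probability under the null hypothesis. A minimal example assuming scipy is installed, testing whether a coin that came up heads 62 times in 100 flips is plausibly fair:

```python
# How surprising is 62 heads in 100 flips if the coin is actually fair?
from scipy import stats

result = stats.binomtest(62, n=100, p=0.5)
print(f"p = {result.pvalue:.4f}")   # ~0.02: unlikely under a fair coin
# Note what this is NOT: it is not the probability that the coin is fair.
```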