ABOUT WRANGLE 2016 ◆◆◆

Wrangle is a single-day, single-track industry event about the principles, practice, and application of Data Science, across multiple data-rich industries. It includes talks from practitioners at innovative companies about the hardest problems they've faced, and the solutions they found for them.

If you're also a practicing Data Scientist, or aspire to be one, Wrangle is for you! For a taste of Wrangle, view session video from 2015.

LOCATION

Broadway Studios
435 Broadway Street @ Montgomery
San Francisco, California 94133

PROGRAM COMMITTEE

Drew Conway (Alluvium)
Clare Corthell (Luminant Data)
Beau Cronin (21 Inc.)
Juliet Hougland (Cloudera)
Wes McKinney (Cloudera)
Sean Owen (Cloudera)
Monica Rogati (Data Collective)
Pete Skomoroch
Sean Taylor (Facebook)
Josh Wills (Slack)

SPEAKERS ◆◆◆

  • Kirstin Aschbacher

    Jawbone

    Kirstin is a data scientist working on the Smart Coach team at Jawbone, and an Adjunct Assistant Professor in the School of Medicine at University of California, San Francisco. She is a licensed clinical psychologist, with specialized training in helping patients with physical and mental conditions better manage their health. She has coauthored 40 peer-reviewed publications which can be found on Google Scholar, as well as several Data Science blogposts. Kirstin is also a mother of a three-year-old who loves to be in movement and learning, preferably at the same time.

  • Michael Bentley

    Lookout

    Michael Bentley is a researcher with many facets of security issues under his belt. He is a veteran of the US Navy, has trained and led network security teams, and has navigated incident response nightmares. Currently Michael leads Research and Response at Lookout. Tools he’s written include those for malware similarity, metadata correlation, automatic heuristic creation, and making it all into animated designs for people so they go ‘Ahhhhhh, big data’.

  • Jon Bruner

    O'Reilly

    Jon oversees O'Reilly's publications and conferences on bots, hardware, the Internet of Things, manufacturing, and electronics, as well as any other emerging technologies that need investigating. Before coming to O'Reilly, he was data editor at Forbes Magazine, where he combined writing and programming to approach a broad variety of subjects, from the operation of the Columbia River's dams to migration within the United States. He studied mathematics and economics at the University of Chicago and lives in San Francisco, where he can occasionally be found at the console of a pipe organ.

  • Michelle Casbon

    Qordoba

    Michelle Casbon is director of data science at Qordoba, where she is using machine learning to create a better experience for users and make localization less painful for engineers. Previously, she was a senior data science engineer at Idibon, where she built language-independent tools for generating predictions on textual datasets. Her development experience spans more than a decade across various industries, including media, investment banking, healthcare, retail, and geospatial services. Michelle completed a Masters at the University of Cambridge, focusing on natural language processing, speech recognition, speech synthesis, and machine translation. She loves working with and contributing to open source projects.

  • Chris Diehl

    The Data Guild

    Chris Diehl is a principal and co-founder of The Data Guild, a mission-driven data product studio aiming to address the toughest societal challenges. Chris has extensive experience defining and developing analytics for a variety of sense-making and prediction tasks. As the principal data scientist at Jive Software, he focused on designing and developing advanced analytics for enterprise social search and online community health assessment. Prior to Jive, Chris spent over ten years as a senior research scientist at The Johns Hopkins University Applied Physics Laboratory and Lawrence Livermore National Laboratory. There he defined and developed machine learning approaches to address a variety of inference challenges across the Department of Defense and intelligence community. He holds a Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.

  • Abe Gong

    Aspire Health

    Abe is the Chief Data Officer at Aspire Health, a health tech startup that provides in-home nursing services to the sickest patients in the U.S. Prior to Aspire, Abe was the founding member of the data science team at Jawbone, and lead data scientist at Massive Health. He earned his PhD at the University of Michigan in Public Policy, Political Science, and Complex Systems. All told, Abe has been leading teams using data and technology to solve problems in education, health, and public policy for over a decade.

  • Joel Grus

    AI2

    Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence and the author of the bestselling O'Reilly book, Data Science from Scratch: First Principles with Python. Previously he was a software engineer at Google and a data scientist at a variety of startups. He lives in Seattle, where he organizes various Data Science Happy Hours.

  • Juliet Hougland

    Cloudera

    Juliet is a data scientist at Cloudera. Her commercial applications of data science include developing predictive maintenance models for oil & gas pipelines at Deep Signal, and designing/building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an MS in Applied Mathematics from University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in Math-Physics.

  • Sanny Liao

    IFTTT

    Sanny Liao is a Senior Data Scientist at IFTTT (If This Then That), a service for users to automate various services in their lives, from social apps to the internet of things.

  • Leah McGuire

    Salesforce

    Leah McGuire is a Lead Member of Technical Staff at Salesforce, implementing data-driven features and recommendations in Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.

  • Xiangrui Meng

    Databricks

    Xiangrui Meng is a technical lead of machine learning and data science at Databricks. His main interests center around building simple and scalable solutions for advanced analytics. He is also a PMC member and committer of Apache Spark, primarily contributing to MLlib, PySpark, and SparkR. Before Databricks, he worked as an applied research engineer at LinkedIn, where he was the main developer of an offline machine learning framework in Hadoop MapReduce. His Ph.D. work at Stanford is on randomized algorithms for large-scale linear regression problems.

  • Wes McKinney

    Cloudera

    Wes is a Software Engineer at Cloudera focusing on data science tools and platforms, the creator of Python’s pandas library and the Ibis project, and a committer to Apache Arrow and Apache Parquet. Previously, Wes was co-founder of DataPad, and CTO and co-founder of Lambda Foundry, Inc. He graduated from MIT with an S.B. in Mathematics. Wes is the author of the O'Reilly book, "Python for Data Analysis."

  • Sean Owen

    Cloudera

    Sean is Director of Data Science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Apache Hadoop. He is an Apache Spark committer and a co-author of O’Reilly Media’s Advanced Analytics with Spark. He was a committer and VP for Apache Mahout, and co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds an MBA from London Business School and a BA from Harvard University.

  • Sandy Ryza

    Clover Health

    Sandy is a senior data scientist at Clover Health. He was previously at Cloudera doing engineering and data science. He is an author of O'Reilly's Advanced Analytics with Spark, as well as an Apache Spark committer and member of the Apache Hadoop PMC. He graduated Phi Beta Kappa from Brown University.

  • Mohammad Saffar

    Arimo

    Mohammad Saffar is a deep learning software engineer at Arimo, a startup focussed on Intelligence Augmentation for the Enterprise. He is working on designing innovative techniques to use deep learning for time-series problems. Prior to joining Arimo, he was a researcher at University of Nevada-Reno working on computer vision and activity understanding. The results of his work were published in several scientific journals and conferences.

  • Pete Skomoroch

    Stealth

    Pete is a data scientist focused on building intelligent systems to collect information and enable better decisions. He specializes in solving hard algorithmic problems, leading cross-functional teams, and developing engaging products powered by data and machine learning. He's currently working on a new startup based in San Francisco.

  • Jeremy Stanley

    Instacart

    Jeremy is currently the VP of data science at Instacart, where he works closely with data scientists who are integrated into product teams to drive growth and profitability through logistics, catalog, search, consumer, shopper, and partner applications. Previously, Jeremy was Chief Data Scientist and EVP of engineering at Sailthru, which builds data-driven solutions for marketers to drive long-term customer engagement and optimize revenue opportunities. Earlier in his career, Jeremy was the CTO of Collective, where he led a team of product managers, engineers, and data scientists in creating technology platforms that used machine learning and big data to address challenging multiscreen advertising problems. He also founded and led the Global Markets Analytics Group at Ernst & Young (EY), which analyzed the firm’s markets, financial, and personnel data to inform executive decision making.

  • Moritz Sudhof

    Kanjoya

    Moritz Sudhof loves language, math, and machines, and he especially loves baking all three into products organizations use to better themselves. Moritz is Chief Data Scientist at Kanjoya, and he earned his BS and MS in Computer Science from Stanford University, where he served as a Mayfield Fellow.

  • Anu Tewary

    Intuit

    Anuranjita Tewary is Director of Product Management at Intuit. She was a founder at Level Up Analytics, a data startup, which was acquired by Intuit. Level Up Analytics had a team of 14 data scientists, engineers, and product managers. Her previous roles include data scientist at LinkedIn, and product management at AdMob and Microsoft. Anu is the founder of The Technovation Challenge, a global programming and entrepreneurship program for girls, which is in its sixth year, with over 4,000 participants. Anu holds a PhD in Applied Physics from Stanford and BS degrees in Physics and Math with CS from MIT.

  • Josh Wills

    Slack

    Josh Wills is the Director of Data Engineering at Slack. Prior to joining Slack, he built and led the data science team at Cloudera, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to Cloudera, Josh worked at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. Josh is the founder of the Apache Crunch project, and co-authored the O'Reilly Media book, "Advanced Analytics with Spark."

SCHEDULE ◆◆◆

Time Session Speakers Description
9:00AM-9:15AM Welcome Sean Owen, Cloudera
Juliet Hougland, Cloudera
 
9:15AM-9:50AM (Panel) When Good Algorithms Go Bad Peter Skomoroch, Stealth (moderator)
Josh Wills, Slack
Jon Bruner, O'Reilly Media
Anu Tewary, Intuit

 

9:50AM-10:15AM Data Science in the Age of the On-Demand Economy Jeremy Stanley, Instacart Fifteen years ago, Webvan spectacularly failed to bring grocery delivery online. Speculation has been high that the current wave of on-demand grocery delivery startups - and other companies in the on-demand economy - will meet similar fates. Jeremy explains why this time the story will be different—data science is the key. Innovations in mobile applications have paved the way, but significant investments in algorithms to optimize efficiency will drive profitable growth.
10:15AM-10:40AM Data Science for HR Moritz Sudhof, Kanjoya Data science is revolutionizing HR. With NLP and machine learning, organizations can now understand their employees automatically and in real time. They no longer need armies of consultants and months of labor to determine what their employees want and need, to identify training initiatives that will maximize performance and leadership, and to surface and address critical issues before high performers leave. In this talk, Moritz explores how data science is used in organizations today and possibilities for the space in the future.
10:40AM-11:10AM BREAK    
11:10AM-11:35AM Growth of IOT, in Numbers Sanny Liao, IFTTT As the Internet of Things (IOT) becomes more mainstream, we are seeing a similar shift in the usage of IOT products by our users at IFTTT. This talk will focus on the growth of IOT, the emotional responses to IOT, and interesting adoption trends that we see among different groups of people.
11:30AM-12:00PM (Panel) Metrics Before Models: Approaching Data Science Like an Engineer Sean Owen, Cloudera (moderator)
Michelle Casbon, Qordoba
Leah McGuire, Salesforce
Xiangrui Meng, Databricks
 
12:00PM-12:35PM Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Time-Series Data with Deep Networks Mohammad Saffar, Arimo Time-series (longitudinal) data occurs in nearly every aspect of our lives, including customer activity on a website, financial transactions, and sensor/IoT data. Just like in written text, specific events in a sequence of events are affected by the past and affect events in the future, and this can reveal a lot of hidden structure in the source of the events. Yet, today's predictive techniques largely rely on demographic (cross-sectional) data and do not take into account the sequences of events as they occur. In this session, Mohammad will discuss techniques for taking time-series data from a variety of domains and sources and grouping entities based on temporal behavior, using RNNs. These clusters of time-series sequences can be visualized, used for campaign targeting in the case of user clickstream behavior, or used to understand which stock symbols behave similarly based on their trading behavior.
12:35PM-1:35PM LUNCH    
1:35PM-2:00PM (Lightning Talk) FizzBuzz in TensorFlow Joel Grus, AI2 FizzBuzz is a ubiquitous, nearly trivial problem used to weed out developer job applicants. Recently, Joel wrote a joking-not-joking blog post about a fictional interviewee who solves it using neural networks. After the blog post went viral, he spent a lot of time thinking about FizzBuzz as a machine-learning problem. It turns out, it's surprisingly interesting and subtle! Here, Joel talks about how and why.
2:00PM-2:25PM Malware Tracking at Scale Michael Bentley, Lookout Historically, mobile-device malware detection has required security researchers to write a heuristic, then scan binaries for a match. Rinse, recycle, and repeat until the entire malware family can be detected. This approach has been effective, but it does not scale to Lookout’s challenge of analyzing more than 30 million applications. In this session, Michael explains how Lookout took an entirely different approach: using graph data modeling techniques. One significant outcome of this approach is a new data model that has the powerful ability to track variants of malware that are under active development. This model also allows Lookout to extract more metadata about malware families through the discovery of relationships that were previously unknown.
2:25PM-2:50PM Digital Vulnerability: Characterizing Risks and Contemplating Responses Chris Diehl, The Data Guild Our modern world represents an inextricable blend of the cyber and physical domains. Networked devices with sensing and computational capabilities continue to expand the extent of the cyber-physical interface, increasing risks across scales from the individual to the societal level. No longer do we fully understand the extent of what is being collected and assimilated about us. For marginalized populations globally, the consequences of this can be severe. What is the nature of the risk at the level of the individual? How can the data science community respond to improve the current reality? Chris will present some initial thoughts to frame the concerns and outline a way forward.
2:50PM-3:20PM BREAK
3:20PM-3:55PM The Future of Wearables: Data Science Meets Behavior Change Kirstin Aschbacher, Jawbone The future of wearables is moving beyond a fashion gadget to a must-have device for managing chronic health conditions. Value in this space is not only about chic products and quantifying the self; it’s about helping users change the behaviors that shape their long-term health. Fundamentally, this is not only an algorithmic challenge but a psychological one. Many of us already know how to be healthier, but what motivates us to actually change? This talk will illustrate how the integration of data science and health psychology can drive personalized interventions that make wearables more effective. We test our hypotheses in a pilot randomized, controlled trial targeting self-monitoring profiles to help users lose weight.
3:55PM-4:20PM Driving Healthcare Operations with Small Data Sandy Ryza, Clover Health How do you get people with chronic heart conditions to take their medication? Or diagnose complications as early as possible? Healthcare operations--the set of actions that organizations like insurers take to interact with their members--sit in some sort of nebulous shadow realm between social science, medicine, and corporate bureaucracy. In this talk, Sandy will throw some additional nouns that seem more at home in the modern web era, like "machine learning" and "A/B testing," into the mix. He'll also walk attendees through an example of how Clover Health builds and tests models for predicting which diabetic members are likely to develop complications.
4:20PM-4:45PM Staying Hippocratic with High Stakes Data Abe Gong, Aspire Health Abe has spent the last year building data systems to forecast personal medical calamities: hospitalization, debilitation, and death. This talk will share perspective from this experience, with two main goals:
  • Demystify the process of working with highly regulated medical data and legacy healthcare IT
  • Continue last year’s conversation about ethical algorithms and the potential harms of data work
Ultimately, all data is high-stakes data. Abe's hope is that discussing data science in a life-and-death medical context can further a community conversation about how to do no harm, and more good, with data.
4:45PM-5:00PM What We All Learned Sean Owen, Cloudera
Juliet Hougland, Cloudera
 
5:00PM-7:00PM Cocktails & Grub in the Saloon!    

ARCHIVE ◆◆◆

Interested in reviewing what we covered in a previous event? Check it out here!

Videos     Photos     Slides

CONDUCT ◆◆◆

Wrangle is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of conference participants in any form. Conference participants violating these rules may be sanctioned or expelled from the conference without a refund at the discretion of the conference organizers.

Harassment includes offensive verbal comments related to gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, or religion; sexual images in public spaces; deliberate intimidation; stalking; following; harassing photography or recording; sustained disruption of talks or other events; inappropriate physical contact; and unwelcome sexual attention. Participants asked to stop any harassing behavior are expected to comply immediately.

If a participant engages in harassing behavior, the conference organizers may take any action they deem appropriate, including warning the offender or expulsion from the conference with no refund. If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately. Conference staff can be identified by special badges.

Conference staff will be happy to help participants contact hotel/venue security or local law enforcement, provide escorts, or otherwise assist those experiencing harassment to feel safe for the duration of the conference. We value your attendance.

We expect participants to follow these rules at all conference venues and conference-related social events.