It’s on—Wrangle, July 20

What is it like to have a real job in data science? How can we get a reasonable data forecast on messy data with no manual effort? Can ethics be guided with algorithms? Can governments become more collaborative through open data? How can we improve data-driven financial institutions? What are the latest data science tools and experiments coming out of Facebook, Airbnb, UCSF, Capital One, and Salesforce?

Wrangle is a one-day, single-track community event that hosts the best and brightest in the Bay Area talking about the principles, practice, and application of data science across multiple data-rich industries. Join Cloudera to discuss future trends, how they can be predicted, and, most importantly, how they can be anticipated.

“#WrangleConf was fantastic! By far the most practical/thought-provoking data science conference I’ve ever attended. Big thanks to @cloudera!” —@dynamicwebpaige


8:30am - 9:30am - Breakfast and Networking

9:30am - 10:00am - What Would a CIA Data Scientist Do?

Drew Conway, CEO, Alluvium

Earlier this year I had the opportunity to speak to data scientists at the Central Intelligence Agency about the discipline of data science in 2017. That talk was equal parts technical and philosophical, but here I will focus on the latter. The discipline of data science has a very different feel today than it did a year ago. What was once the realm of the "unicorns" and "rock stars" is now a place of scorn and ridicule. This is particularly sensitive for data scientists in the intelligence community. I will discuss how the last year has changed our professional posture and its effect on those in the intelligence community, and ask the audience to consider their own work through this prism.

10:00am - 10:30am - Digital Government: Data + Government Isn't Enough

Trey Causey, Product Manager, Socrata

Government agencies are collecting and producing data at an accelerating rate, and constituents want access to this data with decreasing latency. Meeting a digitally savvy polity's desire for data while ensuring that the data is open, accessible, and interpretable by all comes with unique challenges. I'll share some of these while walking through how governments are using open data to build their own data products and to empower civic hackers. I'll also explain why data science at the government level is fundamentally different from data science in the private sector.

10:30am - 11:00am - A Leap of Faith

Kathryn Hume, VP Product and Strategy,

One key ethical consideration in building data products is interpretability, or the ability to explain why a model produced an output or prediction. Organizations applying AI are often forced to make a trade-off between accuracy and interpretability to satisfy regulatory or compliance requirements, particularly with complex, non-linear models. But is the right to transparency the right way to address the problem? Or should we shift our explanatory paradigm, taking a leap of faith that embraces model complexity but seeks a different means to evaluate whether results are appropriate?

11:00am - 11:15am - Break

11:15am - 11:45am - Unlocking the Power of Causal Inference: Recent Innovations @ Netflix

Kelly Uphoff, Director of Growth Data Science, Netflix

Causal inference is key to how Netflix uses data to make decisions. Over the last several years, we've expanded our application of causal inference beyond A/B testing in our product. Netflix has developed new methods that use cities and countries as test and control groups, exploit natural variation in the data to draw causal inferences, and merge the worlds of causal inference and machine learning. This talk will cover these innovations and how they've unlocked new sources of evidence used by top decision-makers throughout the company.

11:45am - 12:15pm - Talk TBA

12:15pm - 12:45pm - The 'Joy' and Surprise of Healthcare Data

Jasmine Tsai, Data Platform Engineering Manager, Clover Health

Healthcare, like other industries with legacy systems, is full of data in particularly archaic and mysterious formats. It is also highly hierarchical and networked, because of the nature of its systems (just think of what a hospital entails) and the complexity of the human body (this is not a joke). In this talk, we will walk through the landscape and salient features of healthcare data, the particular challenges and rewards of transforming it for use, and how a modern data system might approach it differently from its older counterparts.

12:45pm - 2:00pm - Lunch

2:00pm - 2:30pm - Building Robust Pipelines with Airflow

Erin Shellman, Senior Data Scientist, Zymergen

The data science team at Zymergen is applying machine learning techniques to identify genetic targets, work that is supported by extensive analytical automation that systematically identifies outliers, removes process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that allow us to produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput biological data and provide an overview of how we’re using Apache Airflow to meet those challenges.

2:30pm - 3:00pm - Measurement with Intention

Sean Taylor, Research Scientist, Facebook

What we choose to measure has a profound impact on every decision we make, from our day-to-day personal habits to strategies for major corporations and governments. Metrics create a shared understanding of a problem, suggest paths toward solutions, and create or destroy incentives. With the proliferation of measurement technologies and data-driven decision making in the digital age, choosing the right concepts to measure and pay attention to may ultimately be the most important decision we make. I'll discuss what qualities good metrics have, how people decide what to measure in practice, and how innovations in measurement technologies have had dramatic impacts across a variety of domains.

3:00pm - 3:30pm - Talk TBA

3:30pm - 3:45pm - Break

3:45pm - 4:15pm - Talk TBA

4:15pm - 4:45pm - Talk TBA

4:45pm - 5:15pm - Talk TBA

5:15pm - 6:15pm - BBQ Dinner and Beer Garden


“@MarinaSirota presentation on genome research shows that data science can be applicable to small data #wrangleconf”—@TonyBaer

Wrangle is being held at the Chapel in San Francisco

The Chapel occupies a historic 1914 building with a 40-foot arched ceiling and was beautifully remodeled to create a stunning venue. The Chapel is located in the heart of SF’s dynamic Valencia corridor at 777 Valencia in San Francisco, California.


Wrangle passes are $400 each

Academic Pass Application
If you are an academic affiliated with a university, please send an email from your school email address to receive a free pass. Only 20 are available, so don’t wait.

Program Committee


If you have media questions, or would like to find out about sponsoring Wrangle, please contact