fastai Explainability with SHAP

I’ve been working my way through Practical Deep Learning for Coders, a fantastic resource from the authors of fastai. But while I enjoy the deliberate way the authors are slowly peeling back the layers to uncover what makes a neural network tick, I wanted to just rip the lid off.

So I built a classifier using the techniques provided by fastai but applied the explainability features of SHAP to understand how the deep learning model arrives at its decision.

I’ll walk you through the steps I took to create a neural network that can classify architectural styles and show you how to apply SHAP to your own fastai model. You’ll learn how to train and explain a highly accurate neural net with just a few lines of code!

Gather the data

I followed this great guide on image scraping from Google to gather images for my training set.

Due to the limited availability of images, I settled on seven architectural styles:

Gothic

Victorian

Craftsman

Classical

Modern

Tudor

Cape Cod

I do have some class imbalance:

    • Cape Cod: 94

    • Craftsman: 94

    • Tudor: 49

    • Victorian: 73

    • Classical: 148

    • Modern: 75

This could be concerning, especially given the large spread between the number of images available for Classical and Tudor architectural styles. However, the main point of this guide is to show how you can apply SHAP to a fastai model so we won’t worry too much about class imbalance here.

Set up your environment

I’ve been using Paperspace to train deep learning models for my personal use. It’s a clean, intuitive platform, and there’s a free GPU option, although those instances are first-come, first-served, so they’re often unavailable.

Instead, I opted to pay $8 a month for their Developer plan to gain access to the upgraded P4000 GPU at $0.51 per hour. Completely worth it, IMO, for the speed and near-guaranteed access.

BEWARE! Paperspace does not autosave your notebooks. I’ve been burned by this too many times. Don’t forget to hit save!

Ensure you have the following packages imported into your workspace:

import fastbook
from fastbook import *
from fastai.vision.all import *
fastbook.setup_book()

import tensorflow
import shap

import matplotlib.pyplot as pl
from shap.plots import colors

Create a DataLoaders object

fastai’s DataBlock API builds a DataLoaders object that reads in your data, assigns the correct data types, resizes, and performs data augmentation—all in one!

dblock = DataBlock(
    # define X as images and Y as categorical
    blocks=(ImageBlock(), CategoryBlock()), 

    # retrieve images from a given path
    get_items=get_image_files, 

    # set the directory name as the image classification
    get_y=parent_label, 

    # resize the images to squares of 460 pixels
    item_tfms=Resize(460), 

    # see explanation below for batch_tfms
    batch_tfms=[
        *aug_transforms(size=224, min_scale=0.75),
        Normalize
    ]
)

The batch_tfms argument performs the following transformations on each batch:

    • Resizes to squares of 224 pixels

    • Ensures that cropped images are no less than 0.75 of the original image

    • By default, flips horizontally but not vertically (desired behavior for images of buildings)

    • By default, applies a random rotation of 10 degrees

    • By default, adjusts brightness and contrast by 0.2

    • The Normalize transform standardizes your pixel values to have a mean of 0 and a standard deviation of 1.

These batch transformations are performed on the GPU after the resizing specified in item_tfms, which takes place on the CPU. This order of operations ensures that all our images share a uniform size before we hand them off to the GPU for the more intensive transformations.
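If you want to sanity-check the whole pipeline before training, fastai’s DataBlock.summary method will step one sample through each transform and report exactly where anything fails:

# step a sample through every transform and print what happens at each stage
dblock.summary('images/')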

Let’s look at one batch.

dls = dblock.dataloaders('images/', bs=32)
dls.show_batch()

We see that some slight transformations have been performed to augment our dataset but that the integrity of the architecture has been maintained (i.e. straight lines are still straight, buildings are still the correct side up).

Train a neural net

We’ll use a technique described in Chapter 7 of Practical Deep Learning for Coders to train our neural network: progressive resizing.

The early layers of a neural network pick up on basic image characteristics, like edges and gradients, while the later layers learn to discern finer features, like windows and cornices.

We can save time by training the neural network initially on smaller images so that the model begins to build those early layers on basic features. Then we hone our accuracy by training the model further on larger images that show more of the details.

Let’s first train on images that are 128 pixels square.

def get_dls(bs, size):
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        get_y=parent_label,
        item_tfms=Resize(460),
        batch_tfms=[
            *aug_transforms(size=size, min_scale=0.75),
            Normalize
        ]
    )
    return dblock.dataloaders('images/', bs=bs)

dls = get_dls(128, 128)
learn = Learner(
    dls, 
    xresnet50(n_out=dls.c),
    loss_func=LabelSmoothingCrossEntropy(), 
    metrics=accuracy
)
learn.fit_one_cycle(8, 3e-3)

epoch  train_loss  valid_loss  accuracy  time
0      2.079040    1.992693    0.168000  00:12
1      2.007177    3.295360    0.160000  00:10
2      1.963840    2.741498    0.184000  00:09
3      1.892974    3.044773    0.192000  00:09
4      1.820223    2.232864    0.344000  00:10
5      1.717004    2.542991    0.336000  00:09
6      1.640253    2.123204    0.344000  00:09
7      1.581773    1.813185    0.424000  00:10

Now we increase our image size to 224 pixels square.

learn.dls = get_dls(32, 224)
learn.fine_tune(9, 1e-3)

epoch  train_loss  valid_loss  accuracy  time
0      1.189884    1.130392    0.648000  00:12
1      1.167626    1.170792    0.680000  00:12
2      1.188928    1.349947    0.552000  00:12
3      1.158677    1.123495    0.680000  00:12
4      1.119656    1.112812    0.696000  00:11
5      1.067710    1.094795    0.720000  00:12
6      1.011549    1.011965    0.792000  00:11
7      0.962006    0.975919    0.800000  00:12
8      0.933495    0.963801    0.808000  00:11

Not bad! Roughly 80% accuracy after barely any code and just a few minutes of training time.

Evaluate the model

We do have some significant class imbalance so the accuracy shown above isn’t telling us the full story. Let’s look at class-based accuracy to see how the model performs on each architectural style.

preds = learn.get_preds()
pred_class = preds[0].max(1).indices
tgts = preds[1]

for i, name in enumerate(dls.train.vocab):
    idx = torch.nonzero(tgts==i)
    subset = (tgts == pred_class)[idx]
    acc = subset.squeeze().float().mean()
    print(f'{name}: {acc:.1%}')
cape_cod: 82.4%
classical: 88.9%
craftsman: 75.0%
gothic: 94.4%
modern: 92.3%
tudor: 62.5%
victorian: 63.6%

We have decent accuracy across the classes. Let’s create a confusion matrix to see which architectural styles the model mistakes for another.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

We see that Tudor can sometimes be misclassified as Craftsman. Perhaps this is because both styles rely on exposed beams. Similarly, we could surmise that Gothic is most often confused with Victorian because both contain ornate decorations or that Classical can be mistaken for Modern due to a prevalence of clean lines.

How can we test these hypotheses? This is where SHAP comes into play.

Explain with SHAP

SHAP explains feature importances through Shapley values, a concept borrowed from game theory. If you’re interested in learning more, I suggest checking out the SHAP documentation.

Let’s apply SHAP to the model we trained above. First, we determine a background distribution that defines the conditional expectation function. Then we sample against this background distribution to create expected gradients, allowing us to approximate Shapley values.

# pull a sample of our data (one batch of 128 images; dls still points to the
# bs=128 DataLoaders created for the first stage of training)
batch = dls.one_batch()

# specify how many images to use when creating the background distribution
num_samples = 100
explainer = shap.GradientExplainer(
    learn.model, batch[0][:num_samples]
)

# calculate Shapley values
shap_values = explainer.shap_values(
    batch[0][num_samples:]
)

Now we can overlay the Shapley values on the images to see which features the model focuses on to make a classification.

In the images below, positive Shapley values in red indicate those areas of the image that contributed to the final prediction whereas negative Shapley values in blue show areas that detracted from that prediction.

import matplotlib.pyplot as pl
from shap.plots import colors

for idx, x in enumerate(batch[0][num_samples:]):
    x = x.cpu() # move image to CPU
    label = dls.train.vocab[batch[1][num_samples:]][idx]
    sv_idx = list(dls.train.vocab).index(label)

    # plot our explanations
    fig, axes = pl.subplots(figsize=(7, 7))

    # make sure we have a 2D array for grayscale
    if len(x.shape) == 3 and x.shape[2] == 1:
        x = x.reshape(x.shape[:2])
    if x.max() > 1:
        x /= 255.

    # get a grayscale version of the image
    x_curr_gray = (
        0.2989 * x[0,:,:] +
        0.5870 * x[1,:,:] +
        0.1140 * x[2,:,:]
    )
    x_curr_disp = x

    abs_vals = np.stack(
        [np.abs(shap_values[sv_idx][idx].sum(0))], 0
    ).flatten()
    max_val = np.nanpercentile(abs_vals, 99.9)

    label_kwargs = {'fontsize': 12}
    axes.set_title(label, **label_kwargs)

    sv = shap_values[sv_idx][idx].sum(0)
    axes.imshow(
        x_curr_gray,
        cmap=pl.get_cmap('gray'),
        alpha=0.3,
        extent=(-1, sv.shape[1], sv.shape[0], -1)
    )
    im = axes.imshow(
        sv,
        cmap=colors.red_transparent_blue, 
        vmin=-max_val, 
        vmax=max_val
    )
    axes.axis('off')

    fig.tight_layout()

    cb = fig.colorbar(
        im, 
        ax=np.ravel(axes).tolist(),
        label="SHAP value",
        orientation="horizontal"
    )
    cb.outline.set_visible(False)
    pl.show()

Excellent! This aligns with our intuition. We see that the model relies on the beam latticework to predict Tudor, the roofline to predict Craftsman, and flying buttresses to predict Gothic. It seems to consider window and door trim important to Victorian architecture and narrow windows a key feature of Classical design.

Conclusion

It’s easier than ever to apply deep learning techniques to any project. But with great power comes great responsibility! Understanding how a model arrives at its conclusions is essential to building trust with stakeholders and debugging your model.

Our architecture classifier could also be a visual teaching tool to explain what makes each architectural design distinct and could be incorporated into some kind of flashcard system to help others learn the differences. Sometimes, the explanations themselves can be the goal!

Applying DAGs to Causal Models

I’ve been reading “The Book of Why” by Judea Pearl over the past few weeks, which has really helped formalize my intuition of causation. However, the book would be much better if Pearl left out any sentences written in the first-person as he has an annoying tendency to style himself as a messiah proclaiming the enlightened concepts of Causation to all the lowly statisticians still stuck on Correlation.

If we can look past his self-aggrandizing remarks, “The Book of Why” applies causal models to examples ranging from the surgeon general’s committee on smoking in the 1960s to the Monty Hall paradox. By reducing these multi-faceted problems down to a causal representation, we can finally put our finger on contributing factors or “causes” and control for them (if possible) to isolate the effect we are attempting to discover.

Perhaps the biggest takeaway for me from this book is the need to understand the data generation process when working with a dataset. This might sound like a no-brainer, but too often data scientists are so eager to jump into the big shiny ball pit of a new dataset that they don’t stop to think about what this data actually represents.

Data scientists with a new dataset

By including the process by which the data was generated in these causal models, we can augment our own mental model and unlock the true relationships behind the variables of interest.

So what’s a DAG?

Directed acyclic graphs (DAGs) are a visual representation of a causal model. Here’s a simple one:

You were late for work because you had to change your car’s tire because it was flat. Of course, we could add on much more than this (why was it flat?) but you get the idea.

Junction Types

Let’s explore what we can do with DAGs through different junction types.

Chain

This is the simplest DAG and is represented in the example above. A generalized representation below shows that A is a cause of B, which is itself a cause of C.

Collider

Now we have two causes for C. Both A and B affect the outcome C.

Conditioning on C will reveal a non-causal, negative correlation between A & B. This correlation is called collider bias.

We can understand this effect in crude mathematical terms. If A + B = C and we hold C constant, then we must increase A by the same amount we decrease B.

Additionally, this phenomenon is sometimes also called the “explain-away effect” because C “explains away” the correlation between A and B.

Note that the collider bias may be positive in cases when contributions from both A and B are necessary to affect C.

An example of a collider relationship would be the age-old nature vs. nurture question. Someone’s personality (C) is a product of both their upbringing (A) and their genes (B).
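A quick simulation makes collider bias concrete. This is just a sketch with numpy (the coefficients and the cutoff are arbitrary): A and B are generated independently, yet once we condition on C by restricting it to a narrow band, a strong negative correlation between A and B appears.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)                      # independent cause 1
B = rng.normal(size=n)                      # independent cause 2
C = A + B + rng.normal(scale=0.1, size=n)   # collider: A -> C <- B

print(np.corrcoef(A, B)[0, 1])              # ~0: no correlation overall

# "condition" on C by keeping only observations where C falls in a narrow band
mask = np.abs(C) < 0.1
print(np.corrcoef(A[mask], B[mask])[0, 1])  # strongly negative: collider bias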

Fork

In the case of a fork, A affects both B and C.

Without conditioning on A, there exists a spurious (non-causal) correlation between B & C. A classic example of a spurious correlation is the relationship between crime (B) and ice cream sales (C). When you plot these two values over time, they appear to increase and decrease together, suggesting some kind of causality. Does ice cream cause people to commit crime?

Of course, this relationship can be explained by adding in temperature (A). Warmer weather causes people to leave their homes more often, leading to more crime (B). People also crave ice cream cones (C) on hot days.
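Here’s a similar numpy sketch of the fork (the coefficients are made up): crime and ice cream sales come out strongly correlated, but the correlation disappears once we control for temperature by regressing each variable on it and keeping only the residuals.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
temp = rng.normal(size=n)                    # A: temperature
crime = 2 * temp + rng.normal(size=n)        # B: crime, driven by temperature
ice_cream = 3 * temp + rng.normal(size=n)    # C: ice cream sales, driven by temperature

# spurious correlation between B and C
print(np.corrcoef(crime, ice_cream)[0, 1])   # strongly positive

# control for A: regress each variable on temperature and keep the residuals
crime_resid = crime - np.polyval(np.polyfit(temp, crime, 1), temp)
ice_resid = ice_cream - np.polyval(np.polyfit(temp, ice_cream, 1), temp)
print(np.corrcoef(crime_resid, ice_resid)[0, 1])  # ~0: the correlation vanishes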

Node Types

Mediators

A mediator is the node that “mediates” or transmits a causal effect from one node to another.

Again using the example below, B mediates the causal effect of A onto C.

Confounders

Harking back to the crime-and-ice-cream example, temperature is the confounder node as it “confounds” the relationship between ice cream sales and crime.

If we control for the confounder (A), we can isolate the relationship between C and B, if one exists. This is a key concept for experimental design.

Correcting for Confounding

Let’s spend some more time on this subject. Pearl’s assertion is that if we control for all confounders, we should be able to isolate the relationship between the variables of interest and therefore prove causation, instead of mere correlation.

Pearl defines confounding more broadly as any relationship that leads to P(Y|do(X)) ≠ P(Y|X), where the do operator implies an action. In other words, if there is a difference between the probability of an outcome Y given X and the probability of Y given X in a perfect world in which we were able to change X and only X, then confounding is afoot.

Four Rules of Information Flow

Pearl has 4 rules for controlling the flow of information through a DAG.

    1. In a chain (A → B → C), B carries information from A to C. Therefore, controlling for B prevents information about A from reaching C and vice versa.

    2. In a fork (A ← B → C), B is the only known common source of information between both A and C. Therefore, controlling for B prevents information about A from reaching C and vice versa.

    3. In a collider (A → B ← C), controlling for B “opens up” the pipe between A and C due to the explain-away effect.

    4. Controlling for descendants of a variable will partially control for the variable itself. Therefore, controlling the descendant of a mediator partially closes the pipe, and controlling for the descendant of a collider partially opens the pipe.

Back-door criterion

We can use these causal models as represented by DAGs to determine how exactly we should remove this confounding from our study.

If we are interested in understanding the relationship between only X and Y, we must identify and dispatch any confounding back-door paths, where a back-door path is any path from X to Y that starts with an arrow into X.

Pearl’s Games

Pearl devises a series of games that involve increasingly complicated DAGs where the objective is to “deconfound” the path from X to Y. This is achieved by blocking every non-causal path while leaving all causal paths intact.

In other words, we need to identify and block all back-door paths while ensuring that any variable Z on a back-door path is not a descendant of X via a causal path to Y.

Let’s go through some examples, using the numbered games from the book.

Game 2

We need to determine which variables (if any) of A, B, C, D, or E need to be controlled in order to deconfound the path from X to Y.

There is one back-door path: X ← A → B ← D → E → Y. This path is blocked by the collider at B from the third rule of information flow.

Therefore, there is no need to control any of these variables!

Game 5

This one’s a bit more interesting. We have two back-door paths:

    1. X ← A → B ← C → Y

    2. X ← B ← C → Y

The first back-door path is blocked by a collider at B so there is no need to control any variables due to this relationship.

The second path, however, represents a non-causal path between X and Y. We need to control for either B or C.

But watch out! If we control for B, we fall into the condition outlined by Pearl’s third rule above, where we’ve controlled for a collider and thus opened up the first back-door path in this diagram.

Therefore, if we control for B, we will then have to control for A or C as well. However, we can also control for only C initially and avoid the collider bias altogether.
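If you’d rather not trace these paths by hand, networkx can check d-separation for us. The sketch below encodes the Game 5 DAG, removes the arrows leaving X (the back-door criterion only cares about paths into X), and then tests which adjustment sets block every back-door path between X and Y. Note the helper is called is_d_separator in recent networkx releases and d_separated in older ones.

import networkx as nx

# Game 5 DAG: X -> Y is the causal path we want to keep intact
G = nx.DiGraph([
    ("A", "X"), ("A", "B"),   # back-door path 1: X <- A -> B <- C -> Y
    ("C", "B"), ("C", "Y"),
    ("B", "X"),               # back-door path 2: X <- B <- C -> Y
    ("X", "Y"),               # the causal path of interest
])

# delete arrows out of X, then ask whether each adjustment set
# d-separates X and Y in what remains
G_backdoor = G.copy()
G_backdoor.remove_edges_from(list(G.out_edges("X")))

d_sep = getattr(nx, "is_d_separator", None) or nx.d_separated

for z in [set(), {"B"}, {"C"}, {"B", "C"}, {"A", "B"}]:
    print(z or "{}", d_sep(G_backdoor, {"X"}, {"Y"}, z))

# expected: {} False, {'B'} False (controlling the collider opens path 1),
# {'C'} True, {'B', 'C'} True, {'A', 'B'} True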

Conclusion

DAGs can be an informative way to organize our mental models around causal relationships. Keeping in mind Pearl’s Four Rules of Information Flow, we can identify confounding variables that cloud the true relationship between the variables under study.

Bringing this home for data scientists, when we include the data generation process as a variable in a DAG, we remove much of the mystery surrounding such pitfalls as Simpson’s Paradox. We’re able to think more like informed humans and less like data-crunching machines—an ability we should all be striving for in our increasingly AI-driven world.

How to Scrape LinkedIn Sales Navigator with Python

This guide will show you how to use Python to:

    1. Log into LinkedIn Sales Navigator

    2. Search by company name

    3. Filter search results

    4. Scrape returned data

My code can be found on GitHub, but I’ll explain how each section works in case you’d like to customize it for your own project.

What is LinkedIn Sales Navigator?

LinkedIn Sales Navigator is LinkedIn’s paid sales toolset. It mines all that data you and I have freely handed to LI over the years and gives sales organizations the power to create leads and manage their pipeline. It can integrate with your CRM to personalize your results and show additional information.

LI Sales Navigator markets itself to sellers but its data aggregations are also a gold mine for creating insights. For example, I wanted data on employees’ tenure with their company. This would be very difficult using vanilla LinkedIn–I’d have to click into each employee’s profile individually.

LinkedIn Sales Navigator brings that data directly to you and allows you to filter results by geography or years of experience. The screenshot below gives you an idea of the kind of data returned for employees (known as leads in LinkedIn parlance) but similar aggregations are performed on companies (known as accounts).

Example of LinkedIn Sales Navigator search results

Is scraping legal?

Scraping publicly accessible online data is legal, as ruled by a U.S. appeals court.

But just because scraping is legal doesn’t mean LinkedIn is going to make this easy for us. The code I’ll walk through will show some of the challenges you might run into when attempting to scrape LI but beware! Scraping is software’s equivalent of the Castaway raft—hacked-together and good for one ride only. LI changes its website frequently and what worked one day may not work the next.

Image credit to Spirituality and Practice

How to scrape

Setting up

I used the Python library selenium to scrape data from LinkedIn Sales Navigator within the Chrome browser. You’ll need to run the following from a terminal window to install the required libraries:

pip install selenium
pip install webdriver_manager

It’s best practice to set up a dedicated virtual environment for this task. I recommend conda, though other options include virtualenv or poetry. Check out this helpful resource to access your new conda environment from a Jupyter Notebook.

Logging in

I chose to wrap my scraping functions within a class named LIScraper. An object-oriented approach allowed me to create new Chrome sessions with each instance of LIScraper, simplifying the debugging process.

The only input when instantiating the class is path_to_li_creds. You’ll need to store your LinkedIn username and password within a text file at that destination. We can then instantiate our scraper as shown below.

scraper = LIScraper(path_to_li_creds='li_creds.txt')
scraper.log_in_to_li_sales_nav()

This code will open up a new Chrome window, navigate to the LinkedIn Sales Navigator home page, and log in using the provided credentials.
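The full class lives on GitHub, but a minimal sketch of the login flow might look like the following. The class and method names come from this post; the login URL and element locators are assumptions on my part and will need adjusting whenever LinkedIn changes its pages.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

class LIScraper:
    def __init__(self, path_to_li_creds):
        # credentials file: username on the first line, password on the second
        with open(path_to_li_creds) as f:
            self.username, self.password = f.read().splitlines()[:2]
        # each instance gets its own Chrome session
        self.driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install())
        )

    def log_in_to_li_sales_nav(self):
        # the URL and element IDs here are assumptions -- inspect the live page
        self.driver.get('https://www.linkedin.com/sales/login')
        time.sleep(4)  # give the login form time to render
        self.driver.find_element(By.ID, 'username').send_keys(self.username)
        self.driver.find_element(By.ID, 'password').send_keys(self.password)
        self.driver.find_element(By.XPATH, '//button[@type="submit"]').click()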

Start with the goal

Before we go any further, let’s take a quick look at the master function gather_all_data_for_company that accomplishes my specific scraping task.

I wanted to search for a given company, find all current employees with the keyword “data” in their job title, and then scrape their job title and company tenure from the website.

Let’s break this down sequentially.

1. Search by company

I needed to scrape results for 300 companies. I didn’t have the time or patience to manually review LI’s search results for each company name. So I programmatically entered in each company name from my list and assumed that the first search result would be the correct one.

This was a faulty assumption.

I then tried to guide the search algorithm by restricting results to companies within my CRM (crm_only=True) but this still did not guarantee that the first search result was the right one.

As a safeguard, I logged the name of the company whose data I was collecting and then manually reviewed all 300 matches after my scraping job finished to find those that did not match my expectations. For any mismatches, I manually triggered a scraping job after selecting the correct company from LI’s search results.

2. Search for employees

I then wanted to find all job titles containing a specific keyword.

You might notice several layers of nested try-except clauses in this function. I could not understand why the code would run successfully one minute but would then fail when I tried to execute it again immediately after. Alas, the problem was not in how I selected the element on the page but in when I attempted to select it.

I just needed to add more time (e.g. time.sleep(4)) before executing my next step. Webpages can take a long time to load all their elements, and this loading time can vary wildly between sessions.

Helpful hint: If your scraping code does not execute successfully in a deterministic manner, add more time between steps.
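A fixed time.sleep works, but selenium’s explicit waits are a slightly more robust alternative: they poll until the element actually exists (up to a timeout) instead of hoping the page has finished loading. The XPath below is just a placeholder for whatever element you’re waiting on.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the search results container to appear
results = WebDriverWait(scraper.driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//div[contains(@class, 'search-results')]")
    )
)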

3. Gather the data

We’re now ready to scrape some data!

First, we scroll to the bottom of the page to allow all the results to load. Then we assess how many results LI returned.

CAUTION: The number of results actually returned by LinkedIn does not always match the number LinkedIn claims to have returned.

I had initially tried to scrape 25 results per page or the remainder if the number of results returned was not an even multiple of 25. For example, if the number of results LI claimed to have returned was 84, I’d scrape three pages of 25 results each and then scrape the remaining 9 results on the last page.

But my job would throw an error when this last page contained just 8 results. Why would LI claim to have found 84 results when in reality, it only had 83? That remains one of the great mysteries of the internet.

To get around this issue, I counted the number of headshots on the page to indicate how many results I’d need to scrape.
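Counting headshots might look something like this; the class name in the XPath is an assumption you’d confirm by inspecting the page.

from selenium.webdriver.common.by import By

# count the headshot thumbnails actually rendered on the page
# and scrape that many rows
headshots = scraper.driver.find_elements(
    By.XPATH, "//img[contains(@class, 'presence-entity__image')]"
)
n_results_on_page = len(headshots)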

Scraping the data itself is relatively trivial once you understand the structure of the search results page. My strategy to find the path to the element I wanted was to right click and select “Inspect”. Then right-click on the highlighted HTML on the far-right (check out the orange arrow below) and go to Copy → Copy full XPath.

I stored the page’s search results in a pandas dataframe and concatenated the data from each new page onto the previous ones.
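Putting it together, the per-page scrape is just a loop over the counted results, pulling each field by its copied XPath and appending to a running DataFrame. The XPaths below are placeholders standing in for whatever full paths you copy from the page.

import pandas as pd
from selenium.webdriver.common.by import By

all_results = pd.DataFrame()   # initialize once, before looping over pages

rows = []
for i in range(n_results_on_page):
    # placeholder XPaths -- replace with the full XPaths copied via Inspect
    title = scraper.driver.find_element(
        By.XPATH, f"(//span[@data-anonymize='title'])[{i + 1}]").text
    tenure = scraper.driver.find_element(
        By.XPATH, f"(//span[@data-anonymize='tenure'])[{i + 1}]").text
    rows.append({'title': title, 'tenure': tenure})

# concatenate this page's results onto the previous ones
all_results = pd.concat([all_results, pd.DataFrame(rows)], ignore_index=True)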

One last warning

Remember, LI isn’t a big fan of scrapers. Your Chrome window will flash the dreaded 429 error if you hit their webpage too frequently. This error occurs when you exceed LI’s rate-limiting threshold. I am not sure what that threshold is or exactly how long you must wait before they reset your allowance.

I needed to scrape data from 300 companies whose returned search results ranged from 10 to 1000. My final dataset contained nearly 32,000 job titles. I only ran into the 429 error twice. Each time, I simply paused my work for a couple hours before restarting.

Notes on Humankind: A Hopeful History

Sometimes I wonder when it happened.

At what point did we take a wrong turn in the space-time continuum, away from the path of sanity and reason, to end up here?

The past few years read like a mash-up of tired Hollywood storylines. A deadly pandemic sweeping the globe, wildfires engulfing the American West, hoodlums storming the Capitol, war in Ukraine, elementary school students gunned down in classrooms.

Even the most mentally resilient among us are numb and drained. The world seems like an increasingly unfriendly place, filled with gun-toting crackpots and rabidly partisan politics.

Rutger Bregman argues otherwise in “Humankind: A Hopeful History”. He believes that certain influences—cognitive biases, excessive government, unscrupulous sociologists—have clouded our judgment and caused us to lose sight of the good inherent in humanity.

Could this be true? Or is Bregman a victim of wishful thinking, a Pollyanna living in an alternative reality?

The power of suggestion

A major theme in Bregman’s book is the idea of nocebos. Nocebos are intrinsically innocuous substances that actively cause harm when patients are led to believe that the substance is dangerous. A medical example might be a patient experiencing side effects from a medication when those side effects have been publicized but suffering no complaints when side effects were not previously listed.

Famous instances of nocebos include the 1939 Monster Study, when researchers from the University of Iowa turned several orphans into lifelong stutterers after a round of “negative speech therapy” that criticized any speech imperfections.

Bregman considers our natural human tendency toward negativity bias to be acting as a nocebo within society. Our attraction to negative information served humans well in the distant past when we needed to stay on guard against lions and other imminent physical dangers. But the modern era has twisted this cognitive bias into a threat to today’s fast-paced and interconnected society.

Our addiction to negative stories has been amplified by social media and the 24-hour news cycle. Now the world is regularly depicted to us as a dark and dangerous place. In many cases, these stories are grossly exaggerated. Reports of widespread murder and rape after Hurricane Katrina were pure fiction. And the infamous tale of Kitty Genovese, whose dying screams were ignored by indifferent neighbors? Sensationalistic reporters chose not to include the facts that multiple calls were placed to the police during the murder and that a neighbor rushed to Kitty’s aid and held her as she took her last breath.

Never let the truth get in the way of a good story.

Add the nocebo effect into the mix, and we’ve turned perception into reality. Beliefs lead to actions. And in this case, our belief about the world’s unfolding disasters spawns apathy. Rolf Dobelli in his book “Stop Reading the News: A Manifesto for a Happier, Calmer and Wiser Life” makes the case that all this negative news teaches us “learned helplessness”. We don’t become more engaged citizens through voracious news consumption. Instead, we lose hope that such large intractable problems could ever be solved, and we simply accept our fate.

Belief is destiny

As someone who believes in an objective, physical truth, I found the idea that we can will a world into existence simply mind-blowing. But think of all the myths that people believe—the Catholic Church, the United States of America, the New York Stock Exchange. We’ve collectively agreed that these institutions exist although prior to their inception, there was no physical evidence to suggest that was the case. We willed them into being.

As Yuval Noah Harari detailed in “Sapiens: A Brief History of Humankind”, the human brain can only juggle about 150 relationships at a time. This phenomenon is called Dunbar’s Number and was first identified when researchers discovered a ratio between a primate’s brain size and the number of individuals in its social group. Applying that ratio to the size of a human brain gives us 150.

This magic number appears time and again throughout human society. Mennonite community sizes, factory employee counts, Christmas card lists—all hover optimally around the 150-person threshold.

If we aren’t able to form meaningful relationships with more than 150 people, how can we build a society across millions of individuals? This is where the concepts of “belief” and “myths” enter the picture. Belief is the glue that holds us together. Simply put, belief scales.

The lack of trust in our society today is so alarming because trust is a form of belief in one another. And without it, the elaborate societal infrastructure we’ve built around us will start to crumble.

Misleading evidence

But should we trust each other? Aren’t humans innately cruel and selfish creatures?

Bregman devoted much of the book to debunking several long-standing pieces of evidence that supposedly showed “humanity’s true face”. Let’s cover a few.

Stanford Prison Experiment

In this famous 1971 experiment run by Dr. Philip Zimbardo, undergraduate students were assigned to act as either prisoners or guards within a newly constructed “jail” in the basement of Stanford University. Over the space of just a few days, the guards meted out increasingly harsh and humiliating punishments to the prisoners. Zimbardo finally pulled the plug on the sixth day due to the rapidly deteriorating conditions. The conclusion from the aborted experiment: we quickly lose our humanity when put in a position of absolute power over others.

But this is misleading. The original intention of the study was to show how prisoners behave under duress. To aid that effort, the guards were briefed before the experiment began on how to behave (i.e. referring to the prisoners by number only and generally disrespecting them) and were reminded throughout the study to be “tough”.

The BBC partially replicated this sensational study in a made-for-television special but this time, the guards hadn’t been given any prior instructions. And the result must have greatly disappointed the show’s producers. No major drama, no dehumanization, no revelation of humanity’s evil. Instead, the prisoners and guards got along swimmingly.

Zimbardo’s botched experimental design leads me to agree with Bregman. The Stanford Prison Experiment is no proof of inherent human brutality.

Milgram’s Shock Experiments

In 1961, Yale University’s Dr. Stanley Milgram assigned study participants to a “teacher” role and informed them that a student sat in another room wired to an electrical shock machine. The teacher was to gradually increase the level of shocks applied to the student even as they heard cries of pain coming from the other room. Incredibly, 65% of the study participants completed the full treatment of shocks to the student.

Coming on the heels of WWII and the Holocaust, the study supposedly demonstrated the lengths people would go to bow to authority. This study has been replicated with similar results (unlike the Stanford Prison Experiment).

Bregman tries to view the experiment’s results in a positive light. His hypothesis is that the study participants simply wanted to be helpful to the researchers. In social research, demand characteristics refer to the idea that subjects alter their behavior to demonstrate what they expect the experimenter wants to see. In Bregman’s eyes, these participants are innately good people because they wanted to aid the experimenter, not harm the student.

While Bregman’s idea may have some validity, I question what he considers “good” and “evil” in this case. If an action is “good” only because someone else wants us to do it, and we are therefore helping them, is that not a kind of moral relativism? I hope we can all agree that causing unprovoked harm to another person is an absolute wrong, and the mental gymnastics involved in calling this behavior “good” reeks of retroactive justification.

While I normally try not to paint the world in black and white, Bregman’s central thesis forces us to consider where we draw the line between good and evil. Bregman never bothers to define what he considers to be “good” but I’ll offer my own definition: choosing the path that minimizes total harm inflicted. When we apply this definition to Milgram’s study, we see that the total harm to the student outweighs the total harm (inconvenience, really) done to the experimenter by refusing to continue with the study. Therefore, any rational person attempting to do “good” should bow out of the experiment, and I cannot subscribe to Bregman’s argument that this study proves humanity’s inherent good through “helpfulness”.

Nazism

Bregman then tries to downplay Nazism. The trial of Holocaust organizer Adolf Eichmann inspired philosopher Hannah Arendt’s idea of the “banality of evil”, which maintains that Eichmann was motivated not by fanaticism but by complacency.

Bregman devoted an early chapter to explaining how Homo sapiens evolved to rule the earth through superior sociability and learning skills. It is precisely this tendency to “go with the flow” and adopt the mentality of others that makes us human. But that doesn’t mean we should excuse complacency in the face of evil. Applying our definition above of “good” shows that total harm inflicted by complacency is much greater than that caused by standing up for what is right.

Bregman tries to reframe humanity’s sheeplike tendencies as either proof of intrinsic good or as neutral behavior that cannot be used as evidence of our innate evil. Yet I remain convinced that inaction in the face of evil is an evil itself.

Recognizing our humanity

This is not to say that Bregman’s arguments are worthless. I agree that active evil, evil independently conceived, is probably rare in our society. And that most people want to do good, although their definition of good may differ from my own.

Bregman also makes a compelling case against the idea that humans are innately bloodthirsty and war-mongering. He cited several examples from historical battles where soldiers aimed over each other’s heads or loaded their muskets several times over just to avoid having to shoot. Only 15% of American soldiers in WWII ever fired a weapon in action, even when ordered to do so.

That percentage no longer holds true in today’s military. Infantry training now involves a desensitization process where soldiers lose their reluctance to shoot at another human. Though the need for them to shoot at all is becoming less likely with the arrival of armed drones. The more distance we can put between ourselves and those we wish to harm, the more willing we are to commit acts of violence.

People find killing another human cognitively difficult when they recognize them as another human being. A classic example comes from George Orwell who fought in the Spanish Civil War:

“A man presumably carrying a message to an officer, jumped out of the trench and ran along the top of the parapet in full view. He was half-dressed and was holding up his trousers with both hands as he ran. I refrained from shooting at him…I did not shoot partly because of that detail about the trousers. I had come here to shoot at ‘Fascists’; but a man who is holding up his trousers isn’t a ‘Fascist’, he is visibly a fellow-creature, similar to yourself, and you don’t feel like shooting at him.”

Bregman maintains that recognizing the humanity in each other is one of the most direct ways to combat the rising tide of distrust in society. He cites Norwegian prisons as a shining example of this idea in action.

Instead of the punitive approach favored by the American prison system, Norwegian prisons are designed to be rehabilitative. Many Americans might mistake one of these prisons for a resort complete with yoga classes, cross-country ski trails, and woodworking shops (fully equipped with potentially lethal tools). Guards are encouraged to form relationships with prisoners and to always treat them as fellow human beings.

The results are startling. Norway’s recidivism rate is only 20%. Compare that figure to the U.S. where 76% of criminals are repeat offenders and you can see that these relatively cushy prisons pay for themselves.

Recognizing the humanity in others requires us to bridge the divide between groups. Too often our innate tribalism causes us to objectify members of out-groups and view them with suspicion. We see this perspective in action when tracking support for Trump’s proposed wall along the border with Mexico. Americans who live closer to the border are less likely to approve of a wall against our neighbors.

Our innate xenophobia is a classic case of System 1 thinking, a decision-making process that is instinctual and low-effort. Thinking is hard work! We can’t deeply analyze every decision in our everyday lives so we default to System 1 thinking unless the situation calls for more brainpower (like calculating the tip on a restaurant bill). Anything requiring concentration will kickstart System 2 thinking.

Deciding whether or not to trust someone is a cognitive burden. To compensate, we fall back on heuristics like in-groups and out-groups to shortcut those decisions. We must make an effort to overcome this tendency, trigger our System 2 thinking, and actively evaluate strangers as fully-dimensional human beings, instead of relying on shallow stereotypes or an us-vs-them mentality.

Rules to live by

Bregman ends his book with ten “rules to live by”. While many of these rules seemed rather obvious (“Think in win-win scenarios”), a few were worth incorporating into daily life.

When in doubt, assume the best

Deciding to not trust another person results in asymmetrical feedback: if I trust someone to watch my laptop while I take a bathroom break, I’ll receive either a confirmation of that trust if my laptop is still there upon my return or a contradiction if I return to find both the stranger and my laptop missing. But if I didn’t take a chance on trusting that person, I’d never know if I was right. I could never update my opinions and way of thinking.

Temper your empathy, train your compassion

Empathy is exhausting. Our sympathetic reaction to negative news is why we feel so drained and pessimistic after a daily barrage of bad press.

Instead, Bregman suggests we should feel for others and send out warm feelings of care and concern. He classifies this attitude as “compassion”—a more active attitude that leaves us feeling energized instead of drained.

Love your own as others love theirs

Recognize the humanity in everyone. Extend your circle of love and compassion outside your family and friends, and remember that every stranger you meet is someone’s son, daughter, mother, father, husband, or wife.

This mentality is more cognitively demanding and will require System 2 thinking. But with practice, we can train our brain and form a habit of embracing the full spectrum of a person’s humanity.

Don’t be ashamed to do good

Every spring, I trek out to my neighborhood park to pick up the litter revealed by the melting snow. And every year, I feel self-conscious wandering around the park, garbage bag in tow, grabbing at old chip bags with my trash picker.

I don’t perform this annual rite of spring cleaning to spur anyone else into action. On the contrary, I’d much prefer if there were no witnesses at all. But Bregman reminds me that doing good can be contagious and that my act of service can have ripple effects beyond the park.

After all, how can we expect to live in a culture of selflessness if we aren’t leading by example?

What’s next

While I found some of Bregman’s arguments flawed, his book still reframed my view of human nature and convinced me that suspicion and cynicism are not default human behaviors. Intervention is still possible to rebuild trust by buying into a belief of goodwill and positivity.

Beliefs are the fabric of society, the only threads tying us together. We must remember that hope is a type of belief. It is only through collective hope that anything of value has been achieved in society. Dismayed we are divided. Hopeful we are united. We must all believe in a positive future to make it so.

The dark side of data storytelling

Data storytelling is arguably the most important skill an analyst or data scientist can possess. We can consider it the “last mile” of analytics—transforming a jumble of numbers and statistics into a memorable narrative.

This step is crucial because humans crave stories. We learn best from lessons packaged in narratives.

But forming our statistics into a compelling story requires making a series of choices that—in the wrong hands—can twist our results into a misleading conclusion.

To illustrate this potential danger, let me tell you a story.

The path not taken

Doug is a data analyst at Doober, a ride-sharing platform for dogs, and the bigwigs upstairs are noticing that more users than usual are defecting from the app after their first trip. Doug has been tasked with investigating why.

Doug digs into the data. This is his chance to shine in front of the execs! He analyzes anything he can think of that could be related to the root cause of these defections–user demographics, ride length, breed of the dog passengers.

Nothing seems to lead to any conclusions.

Finally, he checks driver tenure and discovers that nearly 90% of defections occur after a ride with a driver who had logged fewer than 5 trips.

“Aha!”, Doug thinks. “Perhaps inexperienced drivers don’t know how to handle these dogs just yet. We should pair new users with more experienced drivers to gain their trust in our platform.”

Satisfied with this insight and already imagining the praise he’ll receive from his manager, Doug starts to put together a presentation. But then, on a whim, he runs the numbers on driver experience across the entire population.

Doug finds that inexperienced drivers account for 85% of first-time trips for new users. This makes sense because Doober is a rather new platform, and most drivers haven’t had time to rack up many trips yet.

Is the difference in the fraction of trips logged by inexperienced drivers statistically significant between the defections and the wider population?

Doug could run a t-test against the trips that didn’t result in a defection to find out…or he could ignore this insight. After all, it doesn’t fit in his narrative, and Doug needs a narrative to present to the execs. Data storytelling is important, right?

Is Doug’s insight that most defections occur after a trip with an inexperienced driver wrong? No. But Doug has invited flawed conclusions in favor of telling a slick story.
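For what it’s worth, the check Doug skipped is only a few lines. A two-proportion z-test (a close cousin of the t-test he had in mind) would do it; the counts below are invented purely for illustration.

from statsmodels.stats.proportion import proportions_ztest

# hypothetical counts: 90% of 200 defections vs. 85% of 5,000 first trips
# overall involved an inexperienced driver
count = [180, 4250]   # trips with an inexperienced driver in each group
nobs = [200, 5000]    # total first trips in each group

stat, p_value = proportions_ztest(count, nobs)
print(f"z = {stat:.2f}, p = {p_value:.3f}")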

Cherry-picking

The story of Doug the analyst is an especially egregious example. Data contortions committed in the name of building a “narrative” are usually more subtle and unconsciously done.

An analyst may simply stop looking once they believe they’ve found an answer and have the data to back it up. After all, time is money.

Now I don’t mean to imply that all analysts are willfully distorting the statistics, like our friend at Doober. But with such a strong emphasis on data storytelling as a core component of the job, junior analysts may start to prioritize narrative over completeness.

Anyone who has read “How to Lie with Statistics” is familiar with the fact that a single dataset can produce diverging storylines, depending on how you slice and parse the data. Telling the story that best represents the data relies on an analyst’s professional integrity and judgment.

How to tell a data story with integrity

In an ideal world, analysts would complete their analysis in a vacuum of rationality before beginning to form a narrative. But that’s never how it really works.

Analysts must form narratives in their mind while analyzing data. Putting results in the context of a narrative allows them to ask the next logical question: if I’ve discovered x, what does that mean for y?

So how can we preserve our analytical integrity while exercising our storytelling creativity? The recommendations below are general best practices for any analysis, but they’re especially important before committing yourself to a final data narrative.

    1. Ensure your data is sampled appropriately. Could your data collection method have been biased in any way? Do you have enough data points to draw a reasonably confident conclusion? Consider including confidence intervals on any supporting plots or in footnotes.

    2. Carefully consider your data’s distribution. Will your chosen statistical method best describe your data? For example, if you have a long-tailed population, it may be sensational to report a mean (“Average salary at our company is $500,000!”) when a median would better represent the answer to the question you are being asked (“A typical employee at our company makes $75,000 a year”).

    3. Be extremely explicit about correlation vs. causation. This was one of the major blunders Doug made. Just because defections appeared to be correlated with driver inexperience did not mean that driver inexperience caused user defections. Even if Doug presented his findings without any causal language, the execs would infer causation from the context in which he’s presenting.

      Clearly differentiate between causation and correlation. Use bold font, stick it in a big yellow box on your slide, scream it from the rooftops.

    4. Use footnotes for any messy details. All stories require omission. Every author must constrain themselves to the most relevant details while continuing to engage their audience.

      If you have additional facts and figures that support your story but don’t add to the overall narrative, include them as footnotes or within an addendum. Most execs just want the headline but you never know when you might be presenting to a particularly engaged manager who wants to dig into the details.

    5. Don’t be afraid to say “we don’t know”. The pressure to craft a data narrative can be strong, and confessing you don’t have a story seems tantamount to admitting failure.
      But sometimes you just might not have the necessary data. The business might not have been tracking the right metric over time or the data could be so riddled with quality issues as to be unusable.
      Learn from this experience, and implement a plan to start tracking new relevant metrics or fix the cause of the data quality issues. Look for other angles to attack the problem—perhaps by generating new data from a survey sent to defected users. It’s always better to admit uncertainty than to send the business on a wild goose chase.

FAQ: Data Science Bootcamps

For the past two years, I’ve been mentoring aspiring data scientists through Springboard’s Data Science Career Track Prep course. I myself graduated from Springboard’s Career Track program back in 2017, so I know firsthand how intimidating it can be to try and rebrand yourself as a data scientist without a traditional degree.

I’ve noticed my mentees often have the same questions as their peers concerning the data science industry and the bootcamp track they’ve chosen. I’m compiling a list of these FAQs for all aspiring data scientists who are considering a bootcamp or just want my take on breaking into this field.

What advice do you have about searching for my first job?

I highly recommend that your first job as a data scientist (or data analyst) is at an organization large enough to already have a data team in place. You’ll need a mentor—either informally attached or formally assigned—in those first few years, and that sexy itty-bitty start-up you found on AngelList isn’t going to provide that to you.

I also advise that you read the job descriptions very closely. The title of “data scientist” can carry some prestige so companies will slap it on roles that aren’t actually responsible for any real data science.

Even if you think the job description outlines the kind of role you’re interested in, ask lots of probing questions in the interview around the day-to-day responsibilities of the job and the structure of the team. You need to understand what you’re agreeing to.

Some suggested questions:

    • What do you expect someone in this role to achieve in the first 30 days? The first 90?

    • What attributes do you think are necessary for someone to be successful in this role?

    • How do you measure this role’s impact on the organization?

    • How large is the team and what is the seniority level of the team members?

    • Is your codebase in R or Python?

Don’t be afraid to really dig deep in these conversations. Companies appreciate candidates who ask thoughtful questions about the role, and demonstrating that you’re a results-oriented go-getter will separate you from the pack.

What’s the work/life balance like for data scientists?

I think I can speak for most data scientists here when I say work/life balance is pretty good. Of course, situations can vary.

But I believe data scientists occupy a sweet spot in the tech industry for two reasons:

    1. We don’t usually own production code. This means that, unlike software engineers, we aren’t on call to fix an issue at 2 am.

    2. Our projects tend to be long-term and more research-oriented so stakeholders typically don’t expect results within a few days. There are definitely exceptions to this but at the very least, working on a tight deadline is not the norm.

That being said, occasionally I do work on the weekends due to a long-running data munging job or model training. But those decisions are entirely my own, and overall I think the data scientist role is pretty cushy.

Besides Python, what skills should I focus on acquiring?

Contrary to what a lot of the internet would have you believe, I really don’t think there’s a standard skillset for data scientists outside of basic programming and statistics.

In reality, data scientists exist along a spectrum.

On one end, you’ve got the Builders. These kinds of data scientists have strong coding skills and add value to an organization by creating or standardizing data pipelines, building in-house analytical tools, and/or ensuring reproducibility of projects through good software engineering principles. I identify with this end of the spectrum.

On the other end, we have the Analysts. These data scientists have a firm grasp of advanced statistical methods or other mathematical subjects. They spend most of their time exploring complex feature creation techniques such as signal processing, analyzing the results of A/B experiments, and applying the latest cutting-edge algorithms to a company’s data. They usually have an advanced degree.

It is very rare to find someone who truly excels at both ends of the spectrum. Early on in your career, you might be a generalist without especially strong skills on either end. But as you progress, I’d recommend specializing on one end or the other. This kind of differentiating factor is important to build your personal brand as a data scientist.

Don’t I need a PhD to become a data scientist?

It depends.

If your dream is to devise algorithms that guide self-driving cars, then yes, you’ll need a PhD. On the other hand, if you’re just excited about data in general and impacting organizations through predictive analytics, then an advanced degree is not necessary.

Sure, a quick scroll on LinkedIn will show a lot of data scientist job postings that claim to require at least a master’s degree. But there are still plenty of companies that are open to candidates with only a bachelor’s degree. Just keep searching. Or better yet, mine your network for a referral. This is hands-down the best way to land a job.

Another route is to get your foot in the door through a data analyst position. These roles rarely require an advanced degree but you’ll often work closely with data scientists and gain valuable experience on a data team.

Many companies will also assign smaller data science tasks to analysts as a way to free up data scientists’ time to work on long-term projects. Leverage your time as an analyst to then apply for a data scientist role.

I’m considering applying for a master’s program instead of going through the bootcamp program. What do you suggest?

I’m hesitant about data science master’s programs.

When you commit to a master’s degree, you’re delaying your entry into the job market by two years. The field of data science is changing so rapidly that two years is a long time. The skills hiring managers are looking for and the tools data teams use may have shifted significantly in that time.

My personal opinion is that the best way to optimize your learning curve is to gather as much real-world experience as possible. That means getting a job in the data field ASAP.

Additionally, the democratization of education over the past decade now means that high-quality classes are available online for a tiny fraction of the price traditional universities charge their students. I’m a huge fan of platforms like Coursera and MIT OCW. My suggested courses from these sites can be found here.

At the end of the day, companies are just relying on these fancy degrees as a proxy for your competence to do the job. If you can show that competence in other ways, through job experience or a project portfolio, shelling out tens of thousands of dollars and multiple years of your life is not necessary.

I’m planning on working full-time while enrolled in the bootcamp program. Do you think that’s doable?

Yes.

But it will require discipline and long unbroken stretches of time. You won’t be able to make any meaningful progress if you’re only carving out time to work on the bootcamp from 5-6 pm every weeknight. Your time will be much better spent if you can set aside an entire afternoon (or better yet, an entire weekend) to truly engage with the material.

Committing to the bootcamp requires a re-prioritization of your life. As Henry Ford said, “if you always do what you’ve always done, you’ll always get what you’ve always got.”

What advice do you have about the capstone projects?

Find data first.

I recommend Google’s dataset search engine or AWS’s open data registry. Municipal governments also do a surprisingly good job of uploading and updating data on topics ranging from employee salaries to pothole complaints.

Once you’ve found an interesting dataset, then you can start to formulate a project idea. For example, if you have salary data for municipal employees, you could analyze how those salaries vary by a city’s political affiliation. Are employees in Democratic-controlled areas paid more than those in Republican-controlled areas, or vice versa? What other variables could affect this relationship?

Once you understand those interactions, you could train a machine learning model to predict salary based on a variety of input, from years of experience to population density.

It is much harder to start with an idea and then scour the internet trying to find the perfect publicly available dataset. Make your life easier, and find the data first.

Closing

All that being said, the most important piece of advice I can give is to have fun.

You’re making this career change for a reason, and if you’re not enjoying the learning process, then you might be on the wrong path. Becoming a data scientist is not about the money or the prestige—it’s about the delight of solving puzzles, the joy of discovering patterns, the gratification of making a measurable impact. I sincerely hope that you find your career in data as satisfying as I’ve found mine so far.

So buckle up and enjoy the ride!

A guide for data science self-study

My path to becoming a data scientist has been untraditional.

After receiving a B.S. in chemical engineering, I worked as a process engineer for a wide range of industries, designing manufacturing facilities for products as varied as polyurethanes, pesticides, and Grey Poupon mustard.

Tired of long days on my feet starting up production lines and longing for an intellectual challenge, I discovered data science in 2017 and decided to pivot my career.

I participated in Springboard’s part-time online bootcamp and managed to land a job as a junior data scientist shortly thereafter. But my journey to learning data science was really only just beginning.

Besides the bootcamp, all my data science skills are self-taught. Fortunately, today’s era of education democratization has made that kind of path possible. For those that are interested in pursuing their own course of self-study, I’m including my recommended classes/resources below.

Python

MIT OCW’s Introduction to Computer Science and Programming

I enjoy lecture-style classes with corresponding problem sets, and I thought this class catapulted my Python skills farther and faster than a lot of the online interactive courses like DataCamp.

Additionally, this course covers more advanced programming topics like recursion that—at the time—I hadn’t thought were necessary for data scientists. Fast-forward six months when I was asked a question on recursion during the interview for my first data science job! I was so grateful to this course for providing a really solid education in Python and general coding practices.

Algorithms and Data Structures

Coursera’s Data Structures and Algorithms Specialization

I only completed the first two courses (Algorithmic Toolbox and Data Structures) of the specialization but I don’t believe the more advanced topics are necessary for your average data scientist.

I can’t recommend these courses highly enough. I originally had enrolled just hoping to become more conversant with common algorithms like breadth-first search but I found myself using these concepts and ways of thinking at my job.

The professors who designed these online classes have done a fantastic job of incorporating games to improve your intuition about a strategy and designing problem sets that force you to truly understand the material. There’s no fill-in-the-blank here—you’re given a problem and you must code up a solution.

I also recommend starting with the Introduction to Discrete Mathematics for Computer Science specialization, even if you already have a technical background. You’ll want to make sure you have a solid foundation in those concepts before undertaking the DS&A specialization.

Linear Algebra

MIT OCW’s Linear Algebra

I needed a refresher on linear algebra after barely touching a matrix in the ten years since my university days. MIT's videotaped lectures from 2010, with associated homework and quizzes, were a great way to cover the basics.

The quality of this course is entirely thanks to Professor Gilbert Strang, who is passionate about linear algebra and passionate about teaching (a rare combination). He covers this subject at an approachable level that doesn’t require much complicated math.

I did supplement this course with 3Blue1Brown‘s YouTube series on Linear Algebra. These short videos can really help visualize some of these concepts and build intuition.

Machine Learning

Andrew Ng’s Machine Learning

Taking this course is almost a rite of passage for anyone choosing to learn data science on their own. Professor Andrew Ng manages to convey the mathematics behind the most common machine learning algorithms without intimidating his audience. It’s a wonderful introduction to the ML toolbox.

My one gripe with this course is that I didn’t feel like the homework really added to my understanding of the algorithms. Most of the assignments required me to fill in small pieces of code, which I was able to do without fully comprehending the big picture. I took the course in 2017, however, so it’s possible this aspect of the class has improved.

Deep Learning

fastai’s Practical Deep Learning for Coders

Andrew Ng’s Deep Learning course is just about as popular as the Machine Learning course I recommended above. But after completing his DL class, I only had a vague understanding of how neural networks are constructed without any idea of how to train one myself.

The folks at fastai take the opposite approach. They give you all the tools to build a neural network in the first few lessons and then spend the remaining chapters opening the black box and discussing how to improve performance. This is a much more natural way of learning and leads to better retention upon course completion.

There are videotaped lectures discussing these concepts but I would recommend just reading the book because the lectures don’t add any new material. The book is actually a series of Jupyter notebooks, allowing you to run and edit the code.

Closing

I will warn that this path is not for everyone. There were many times when I wished I could work through a problem with a classmate or dig deeper into a concept with the professor. Online discussion forums for these kinds of classes are not the same as real-time feedback. Perseverance, self-reliance, and a lot of Googling are all necessary to get the most out of a self-study program.

The variety of skills and knowledge data scientists are supposed to have can be overwhelming to newcomers in this field. But just remember—no one knows it all! Simply embrace your identity as a lifetime learner, and enjoy the journey.

Tackling climate change with data viz

Do you feel overwhelmed by the seemingly impossible task of averting the approaching climate crisis?

I usually do. Modern humans (or at least Americans) are addicted to their F-150’s, filet mignon, and flights abroad. Relying on individual restraint will not solve global warming.

Our best chance is for governments to step in and steer us toward a carbon-free future. But where to start?

This is where an intuitive climate simulator called En-ROADS comes into play. Created by Climate Interactive and MIT Sloan’s Sustainability Initiative, this tool allows a user to effectively create their own policy solution to climate change.

Where are these numbers coming from?

Of course, it’s not that simple. Under the hood, the simulator is running nearly 14,000 equations over 110 years from 1990 to 2100 in just 60 milliseconds. These equations rely on factors such as delay times, progress ratios, price sensitivities, historic growth of energy sources, and energy efficiency potential culled from the literature.

If you’re interested in more of the science and math behind the simulation, the En-ROADS team has documented all their assumptions, parameters, and equations in a reference guide that runs nearly 400 pages long. Climate Interactive’s docs offer a more digestible read that also includes a “Big Message” takeaway for each of the levers.

Let’s start building our climate-friendly world

The interface looks like the screenshot below, which shows the starting scenario. This is "business-as-usual", leading us to an increase of 3.6°C by the year 2100. The colorful plot on the left shows that the model already predicts a rise in renewable energy by that time; however, the additional renewables appear to go directly toward powering a more energy-intensive society, as the exajoules expected from other energy sources stay roughly constant.

Let’s try to avert this disaster. Coal seems like an easy place to start. We’ll tax it to the max ($110 per ton).

Temperature increase is now at 3.4°C. Not exactly the big boost we were hoping for.

I spent some time playing around with the simulator to limit our warming to 2°C, which is often cited as the threshold before catastrophe. I tried to implement policies that I thought might be politically feasible in the United States: ones that either retooled existing jobs (electric cars vs. conventional ones) or created new business without disrupting existing ones (increased energy efficiency in buildings).

My main takeaways from the simulation:

Carbon price of $70 per ton → 0.5°C temperature reduction

Implementing a carbon price resulted in the biggest bang for our buck.

The carbon impact of an economy flight from NYC to LA is a half ton of CO2, which I used as a quick benchmark to set a carbon price that didn’t send me into sticker shock. I considered an additional charge of $35 for that flight to be a fee I could swallow, a price En-ROADS labeled “high”.
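As a quick back-of-the-envelope check on that benchmark (my own rough numbers, not pulled from En-ROADS):

carbon_price_per_ton = 70      # $ per ton of CO2
flight_emissions_tons = 0.5    # economy NYC -> LA, approximate
surcharge = carbon_price_per_ton * flight_emissions_tons
print(f"Added cost of the flight: ${surcharge:.0f}")  # -> $35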

Note that the simulator also allows you to choose the timeline for this carbon tax to phase in. The default was 10 years to reach the final price, which I did not change.

Population of 9.1 billion in 2100 → 0.1°C temperature reduction

I set population all the way to the left, which corresponds to the lower bound of the UN's 95% probability range. Considering that not having children is the most environmentally impactful choice an individual can make, I expected a bigger boost from fewer people on the planet.

I suspect the reason behind such a small decrease in warming is that the UN model assumes the reduced population growth would come from women's empowerment campaigns in developing countries, which do not account for the lion's share of emissions.

Growth of 0.5% GDP/year → 0.1°C temperature reduction

Given the emphasis on moving to a circular economy and away from a growth mindset, I figured limiting our economic growth would result in a sizable reduction in warming.

Wrong.

Granted, the model allows 75 years to achieve this lower GDP growth rate from the current rate of 2.5% GDP growth but a sacrifice of 1% GDP growth in exchange for just 0.1°C in temperature reduction seems like a waste of political capital.

Methane reduction of 75% → 0.3°C temperature reduction

Methane is low-hanging fruit. While the simulator allows us to also limit emissions from certain industries as shown below (agriculture, mining, etc.), I set my reduction as 75% across the board. This would require a more plant-based diet for all, as well as increased accountability within heavy industry to reduce methane emissions.

Other actions taken to limit warming to 2°C included a reduction in deforestation with an increase in afforestation, as well as a modest subsidy for renewable energy. Notably, I did not pull the lever on technological carbon removal, although it’s an easy win to reduce warming. Those technologies have not been proven out, and I don’t believe we can count on them to swoop in and save the day.

My final world is one in which we’ve electrified our homes and transit, phased out coal, aggressively plugged methane emissions, and protected and planted forests. Of course, these initiatives are not simple—electrifying and retrofitting every home in America sounds daunting at best.

But it's not impossible. From 1968 to 1976, the UK converted every single gas appliance in the country from town gas to natural gas. Just eight years to accomplish what some called "the greatest peacetime operation in the nation's history".

Where there’s a will, there’s a way.

Data visualization for social change

But perhaps a more immediate takeaway from the En-ROADS simulation is the experience of using the simulator itself. By distilling the giant thorny problem of climate change into a tangible set of levers, the tool allows stakeholders (humans like us) to grasp the problem and its potential solutions. It offers the ability to drill into a section if we want to understand the technical nitty-gritty but doesn’t overwhelm the user with detail at first glance.

It’s a powerful example of how investing in intuitive data visualization and ceding power to data consumers multiplies the impact of your work.

Of course, the tool isn’t perfect. Specifically, the UI after drilling into a lever doesn’t fill the screen and leads to an awkward user experience. But democratizing access to this kind of scientific literature in a way that a non-technical audience can appreciate is perhaps one of the most important ways to tackle misinformation and overcome apathy.

Tackling climate change is possible. We just need to know where to start.

Classifying Labradors with fastai

Although I’ve been a practicing data scientist for more than three years, deep learning remained an enigma to me. I completed Andrew Ng’s deep learning specialization on Coursera last year but while I came away with a deeper understanding of the mathematical underpinnings of neural networks, I could not for the life of me build one myself.

Enter fastai. With a mission to “make neural nets uncool again”, fastai hands you all the tools to build a deep learning model today. I’ve been working my way through their MOOC, Practical Deep Learning for Coders, one week at a time and reading the corresponding chapters in the book.

I really appreciate how the authors jump right in and ask you to get your hands dirty by building a model using their highly abstracted library (also called fastai). An education in the technical sciences too often starts with the nitty-gritty fundamentals and abstruse theories and then works its way up to real-life applications. By that point, of course, most of the students have thrown their hands up in despair and dropped out of the program. fastai's approach is to empower students right off the bat with the ability to create a working model and then ask them to look under the hood to understand how it operates and how to troubleshoot or fine-tune its performance.

A brief introduction to Labradors

To follow along with the course, I decided to create a labrador retriever classifier. I have an American lab named Sydney, and I thought the differences between English and American labs might pose a bit of a challenge to a convolutional neural net since the physical variation between the two types of dog can often be subtle.

Some history

At the beginning of the 20th century, all labs looked similar to American labs. They were working dogs and needed to be agile and athletic. Around the 1940s, dog shows became popular, and breeders began selecting labrador retrievers based on appearance, eventually resulting in what we call the English lab. In England, English labs are actually called "show" or "bench" labs, while American-type labs are referred to as working Labradors.

Nowadays, English labs are more commonly kept as pets while American labs are still popular with hunters and outdoorsmen.

Physical differences

English labs tend to be shorter in height and wider in girth. They have shorter snouts and thicker coats. American labs by contrast are taller and thinner with a longer snout.

American labrador
English labrador

These differences may not be stark, as both types are still Labrador Retrievers and aren't bred to separate standards.

Gathering data

First we need images of both American and English labs on which to train our model. The fastai course leverages the Bing Image Search API through MS Azure. The code below shows how I downloaded 150 images each of English and American labrador retrievers and stored them in respective directories.

path = Path('/storage/dogs')

subscription_key = "" # key obtained through MS Azure
search_url = "https://api.bing.microsoft.com/v7.0/images/search"
headers = {"Ocp-Apim-Subscription-Key" : subscription_key}

names = ['english', 'american']

if not path.exists():
    path.mkdir()
for o in names:
    dest = (path/o)
    dest.mkdir(exist_ok=True)

    params  = {
        "q": '{} labrador retriever'.format(o),
        "license": "public",
        "imageType": "photo",
        "count":"150"
    }

    response = requests.get(search_url, headers=headers, params=params)
    response.raise_for_status()
    search_results = response.json()

    img_urls = [img['contentUrl'] for img in search_results["value"]]

    download_images(dest, urls=img_urls)

Let’s check if any of these files are corrupt.

fns = get_image_files(path)
failed = verify_images(fns)
failed
(#1) [Path('/storage/dogs/english/00000133.svg')]

We’ll remove that corrupt file from our images.

failed.map(Path.unlink);

First model attempt

I create a function to process the data using a fastai class called DataBlock, which does the following:

    • Defines the independent data as an ImageBlock and the dependent data as a CategoryBlock

    • Retrieves the data using a fastai function get_image_files from a given path

    • Splits the data randomly into a 20% validation set and 80% training set

    • Attaches the directory name (“english”, “american”) as the image labels

    • Crops the images to a uniform 224 pixels by randomly selecting certain 224 pixel areas of each image, ensuring a minimum of 50% of the image is included in the crop. This random cropping repeats for each epoch to capture different pieces of the image.

def process_dog_data(path):
    dogs = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        splitter=RandomSplitter(valid_pct=0.2, seed=44),
        get_y=parent_label,
        item_tfms=RandomResizedCrop(224, min_scale=0.5)
    )

    # build the DataLoaders from the image files under `path`
    return dogs.dataloaders(path)

The item transformation (RandomResizedCrop) is an important design consideration. We want to use as much of the image as possible while ensuring a uniform size for processing. But in the process of naive cropping, we may be omitting pieces of the image that are important for classification (ex. the dog’s snout). Padding the image may help but wastes computation for the model and decreases resolution on the useful parts of the image.

The other approach, resizing (squishing) the image instead of cropping, introduces distortions, which is especially problematic for our use case since the main differences between English and American labs lie in their proportions. Therefore, we settle on the random cropping approach as a compromise. This strategy also acts as a data augmentation technique by providing different "views" of the same dog to the model.
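For reference, here is a minimal sketch of the three resizing strategies discussed above (squish, pad, and random crop); only the last one is used in process_dog_data:

from fastai.vision.all import *

# Three ways to get uniform 224x224 inputs from variable-sized photos
squish_tfm = Resize(224, method='squish')                 # distorts body proportions
pad_tfm    = Resize(224, method='pad', pad_mode='zeros')  # keeps proportions but wastes pixels
crop_tfm   = RandomResizedCrop(224, min_scale=0.5)        # fresh crop each epoch, acts as light augmentation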

Now we “fine-tune” ResNet-18, which replaces the last layer of the original ResNet-18 with a new random head and uses one epoch to fit this new model on our data. Then we fit this new model for the number of epochs requested (in our case, 4), updating the weights of the later layers faster than the earlier ones.
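Roughly speaking, fine_tune is a shortcut for the freeze/fit/unfreeze pattern sketched below (a simplification; the actual fastai defaults differ slightly):

def fine_tune_sketch(learn, epochs, base_lr=2e-3):
    # 1. Fit only the new random head while the pretrained body stays frozen
    learn.freeze()
    learn.fit_one_cycle(1, slice(base_lr))
    # 2. Unfreeze and train the whole network with discriminative learning rates,
    #    so earlier layers receive smaller updates than later ones
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / 100, base_lr / 2))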

dls = process_dog_data(path)

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch  train_loss  valid_loss  error_rate  time
0      1.392489    0.944025    0.389831    00:15

epoch  train_loss  valid_loss  error_rate  time
0      1.134894    0.818585    0.305085    00:15
1      1.009688    0.807327    0.322034    00:15
2      0.898921    0.833640    0.338983    00:15
3      0.781876    0.854603    0.372881    00:15

These numbers are not exactly ideal. The training loss decreases steadily, but the validation loss and the error rate both trend upward over the later epochs.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(5,5))

The confusion matrix shows poor performance, especially on American labs. We can take a closer look at our data using fastai’s ImageClassifierCleaner tool, which displays the images with the highest loss for both training and validation sets. We can then decide whether to delete these images or move them between classes.

cleaner = ImageClassifierCleaner(learn)
cleaner

English

We definitely have a data quality problem here as we can see that the fifth photo from the left is a German shepherd and the fourth photo (and possibly the second) is a golden retriever.

We can tag these kinds of images for removal and retrain our model.
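The widget only records our selections; the snippet below (the pattern suggested in the fastai book) actually deletes the flagged files and moves any re-labeled images into the correct class directory:

import shutil

# Delete images flagged for removal in the cleaner widget
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()

# Move images that were re-assigned to the other class
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)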

After data cleaning

Now I've gone through and removed 49 of the original 300 images that were not correctly labeled as American or English labs. Let's see how this culling has affected performance.

dls = process_dog_data(path)

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch  train_loss  valid_loss  error_rate  time
0      1.255060    0.726968    0.380000    00:14

epoch  train_loss  valid_loss  error_rate  time
0      0.826457    0.670593    0.380000    00:14
1      0.797378    0.744757    0.320000    00:15
2      0.723976    0.809631    0.260000    00:15
3      0.660038    0.849696    0.280000    00:13

Already we see improvement: the error rate now trends downward across epochs, although the validation loss still increases.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(5,5))

This confusion matrix shows much better classification for both American and English labs.

Now let’s see how this model performs on a photo of my own dog.

Using the Model for Inference

I’ll upload a photo of my dog Sydney.

btn_upload = widgets.FileUpload()
btn_upload
img = PILImage.create(btn_upload.data[-1])
out_pl = widgets.Output()
out_pl.clear_output()
with out_pl: display(img.rotate(270).to_thumb(128,128))
out_pl


This picture shows her elongated snout and sleeker body, trademarks of an American lab.

pred,pred_idx,probs = learn.predict(img)
lbl_pred = widgets.Label()
lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
lbl_pred
Label(value='Prediction: american; Probability: 0.9284')

The model got it right!

Take-aways

Data quality

If I were serious about improving this model, I'd manually look through all these images to confirm that they contain either English or American labs. Based on the images surfaced by the cleaner tool, the Bing Image Search API returns quite a few irrelevant results and needs to be supervised closely.

Data quantity

I was definitely surprised to achieve such decent performance on so few images. I had always been under the impression that neural networks required a lot of data to avoid overfitting. Granted, this may still be the case here based on the growing validation loss but I’m looking forward to learning more about this aspect later in the course.

fastai library

While I appreciate that the fastai library is easy to use and ideal for deep learning beginners, I found some of the functionality too abstracted at times and difficult to modify or troubleshoot. I suspect that subsequent chapters will help me become more familiar with the library and more comfortable making adjustments, but for someone used to working with the nuts and bolts in Python, this kind of development felt like a loss of control.

Model explainability

I'm extremely interested to understand how the model arrives at its classifications. Is the model picking up on the same attributes that humans use to classify these dogs (i.e. snouts, body shapes)? While I'm familiar with the SHAP library and its ability to highlight CNN feature importances within images, Chapter 18 of the book introduces "class activation maps", or CAMs, to accomplish the same goal. I'll revisit this model once I've made further progress in the course to apply some of these explainability techniques to our Labrador classifier and understand what makes it tick.
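For the curious, the core idea behind a CAM is short enough to sketch (roughly the approach chapter 18 of the fastai book takes; untested on this model, and it assumes cnn_learner's usual (body, head) structure):

# Capture the final convolutional feature maps for one image, then weight them
# by the last linear layer to get a per-class heatmap over the image.
x, = first(dls.test_dl([img]))
with hook_output(learn.model[0]) as hook:
    with torch.no_grad():
        learn.model.eval()(x)
    act = hook.stored[0]                                           # (channels, h, w)
cam = torch.einsum('ck,kij->cij', learn.model[1][-1].weight, act)  # (n_classes, h, w)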

Being Radically Candid amidst the Chaos of 2020

It’d be an understatement to say that so far, 2020 has been a tough year for nearly everyone. From a global pandemic sickening millions of people to civil unrest rocking the United States and beyond, the world seems to have turned upside down.

If you’ve found yourself in a management position during this chaos, you may be wondering how best to navigate the shift of your company to remote work, the mental health of your team, and the need to address systemic racism in your organization.

Applying the concepts behind “Radical Candor” can help you tackle these issues head-on. And anyone—from CEOs to individual team members—can start using these lessons today to begin effecting change.

What is Radical Candor?

Radical Candor is a 2017 book and management philosophy from Kim Scott, a former manager at Apple and Google. We can sum it up with the following:

Managers should care personally and challenge directly.

Fleshing this out a bit more, Scott created a matrix to show how managers might fall short on either of these goals.

Credit to https://www.radicalcandor.com/our-approach/
  • Obnoxious Aggression: A manager who isn’t afraid to challenge his/her employees but has made no effort to show that he/she cares about them as people or is invested in their success.
  • Ruinous Empathy: A manager who shies away from providing “uncomfortable” criticism out of fear he/she may hurt their employee’s feelings. The vast majority of managers fall into this category, and this is definitely where I naturally land.
  • Manipulative Insincerity: A manager who doesn’t bother to give any direct feedback or show interest in his/her employees’ careers. In other words, the worst kind of manager you could get.
  • Radical Candor: A manager who recognizes that giving direct feedback in a respectful manner is the best way to help his/her employees succeed and who takes the time to demonstrate personal and professional investment in them.

To sum up, promoting a trusting environment where team members aren’t afraid to challenge each other or their manager is the quickest way to organizational success. I can’t imagine how much time and productivity we lose by withholding from someone the feedback they desperately need if they want to improve their performance at work (or anywhere!).

Compound Interest of Continuous Feedback

Scott also emphasizes the importance of real-time feedback. You might typically bottle up all your feedback for Employee Eric throughout the week and then unleash it on him during your regularly scheduled one-on-one. This can backfire for two reasons.

  • Sense of Whiplash: The situation for which you’re giving Eric this feedback—maybe he presented a sloppy demo to marketing on Monday and couldn’t answer any follow-up questions from the team—is now far in the past. Eric might have thought he knocked that presentation out of the park, and he internalized that view for several days before you dashed cold water all over it.
  • Repeat Offender: Even worse, Eric might have already given another presentation in the meantime with the same poor quality and lackluster results.

While these two possibilities should be reason enough to give feedback immediately whenever possible, I like to think about real-time feedback as analogous to compound interest. Any armchair investor knows that continuously compounded interest grows at a much faster rate than annual or “simple” interest.

Credit to https://www.fool.com/knowledge-center/compound-interest.aspx

Feedback works the same way. If we frequently give small amounts of both positive and negative feedback, the recipient will compound their growth accordingly.

Radical Candor Today

Ok, this all sounds like a great way to run a company in normal times. But these are not normal times. What lessons can we learn for today?

Caring Personally

In typical circumstances, showing that you care personally about your employees or team members might not be that simple. Some people don’t like to discuss their personal lives at work or make small talk, especially introverts. And building trust organically takes time.

But today, checking in on the personal lives of your co-workers and especially your direct reports isn’t just sanctioned—it’s expected.

When we first moved to WFH, we needed to understand how this sudden shift was affecting those around us in the new virtual workplace.

  • Are they feeling isolated/burnt-out/unmotivated?
  • Do they have kids home from school, which affects what hours they can be online?
  • Are they caring for elderly relatives or neighbors that might add to whatever stress they’re already feeling?

These conversations started to crack open the door to discuss feelings and invite vulnerability as the line between personal and professional started to blur.

The death of George Floyd, and the unrest it incited across the country, also warranted checking in with our colleagues.

  • How are they coping with the sense of unrest roiling our country?
  • How are they feeling in general given current events?
  • Has their neighborhood been looted or burned?
  • Are they safe?

These last questions, especially, are not ones I ever expected to ask in my role as a manager. And while I hope these circumstances will never be repeated, I am grateful that this situation has destigmatized discussing our personal emotions at work and has given me the opportunity to show that I care personally about my team as human beings.

Confronting Racism

Our current crisis also necessitates that we act along the other axis above—challenging directly.

I admit that I fell into the contingent of white people who put our heads in the sand by believing that by simply being “not racist”, we had overcome ingrained biases and systemic prejudice in this country.

The death of George Floyd and the protests sweeping the country were a long overdue wake-up call, and like many of my peers, I took the time to try and educate myself. I’ve been reading “How to Be an Anti-Racist” by Ibram X. Kendi, which has been eye-opening. I’m ashamed that I wasn’t aware of much of the historical context around concepts of race and hadn’t realized how claiming to be “colorblind” actually hurt communities of color by turning a blind eye to racist policies.

Kendi’s proposed antidote is that we become actively anti-racist. We must constantly evaluate our beliefs, actions, and words for unconscious bias. Yes, that sounds exhausting, and it is. But people of color have been exhausted for centuries—from slavery, from blatant discrimination, from the possibility of being shot by the police—so much so that it has taken a toll on their physical and mental health.

And we must adopt this anti-racist attitude in the workplace, as well, by challenging directly. Kim Scott herself provided an example of unconscious racism embedded in the first edition of her book in a recent blog post. The book suggested using a stuffed monkey called “Whoops the Monkey” as a prop in the office to encourage team members to discuss their mistakes. As a white woman, Scott did not realize that being called a monkey is a common denigration targeted at black people, and bringing this symbol into the office, especially as a representation of mistakes, was inappropriate.

Likely she would never have known had someone not spoken up. Only by directly challenging those we see engaging in acts of racism, even unconsciously, can we start to effect real change in mindsets, language, and policies.

Conclusion

Radical Candor offers an actionable framework for managers (or anyone!) to create an open environment where direct feedback is delivered promptly and where empathy establishes a relationship of trust. These aspirations also have immediate applications to the world of 2020—by engaging with our co-workers’ personal needs amidst continuing stress and by directly confronting racial attitudes and policies in the workplace.