
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

We live in the age of the algorithm. Increasingly, the decisions that affect our lives--where we go to school, whether we can get a job or a loan, how much we pay for health insurance--are being made not by humans, but by machines. In theory, this should lead to greater fairness: Everyone is judged according to the same rules.
But as mathematician and data scientist Cathy O'Neil reveals, the mathematical models being used today are unregulated and uncontestable, even when they're wrong. Most troubling, they reinforce discrimination--propping up the lucky, punishing the downtrodden, and undermining our democracy in the process.

259 pages, Hardcover

First published September 6, 2016

4,089 people are currently reading
62.5k people want to read

About the author

Cathy O'Neil

9 books · 515 followers
Cathy O’Neil is the author of the bestselling Weapons of Math Destruction, which won the Euler Book Prize and was longlisted for the National Book Award. She received her PhD in mathematics from Harvard and has worked in finance, tech, and academia. She launched the Lede Program for data journalism at Columbia University and recently founded ORCAA, an algorithmic auditing company. O’Neil is a regular contributor to Bloomberg View.

Ratings & Reviews

Community Reviews

5 stars: 7,400 (25%)
4 stars: 12,559 (43%)
3 stars: 7,169 (24%)
2 stars: 1,446 (5%)
1 star: 316 (1%)
28 reviews · 24 followers
February 10, 2017
This was such a Malcolm Gladwell take on data science. I think this book touches on an important subject, and people should be aware of the issues O'Neil discusses. But instead of doing a deep dive into the subject, it just felt like a list of bad algorithms with instances of the people they hurt. It didn't contain many examples of "WMDs" that I had not already heard of, and that might be because she cited the New York Times for *like* all her case studies.

As someone who works in the field, I don't think this book was geared towards me. It wasn't technical or specific enough. Knowing her background, I really wanted her to get into the nitty gritty of some of the mathematics behind the algorithms. I know she is capable of doing this, but I think she instead chose to appeal to a wider audience. That's cool... except I don't think she did a great job of that either because this book lacked the context necessary to give people unfamiliar with the field a view of how machine learning and analytics typically work.

Ultimately, I thought the book felt unfocused, and it showed in the conclusion, where she proposed a series of pretty ridiculous recommendations within the span of about 15 pages. She has strong opinions on the topics she covered in the book. Although I agree with almost all of them, that isn't the point. The book is supposed to be about the algorithms, and instead she takes us on a tour of a collection of business and public policy malpractice, stating that the solution is to "encode values into our algorithms." Wut. Leaving the logistics of this aside... if the people doing the coding don't share your values (and they clearly don't), why would they do that? O'Neil herself noted a misalignment of incentives in many cases, particularly where the data work has been contracted out to other parties. Telling them to have values and trash their contracts is obviously not gonna fly! You can't fear-monger the whole book and then think everything will be happy sunshine rainbows once we take the data science Hippocratic oath.

I don't think there are easy answers, and I think it is ok to admit that. The takeaway should have been that data science isn't better than humans... because it is humans. We made the algorithms, so they run the gamut of use cases and demonstrate all the shades of gray we exhibit ourselves (yes, all fifty of them.)

TL;DR - Boo. I expected more from you, O'Neil.
Yun
603 reviews · 32.9k followers
February 28, 2022
I'm a data person. I pride myself on being logical and looking at the numbers before making decisions. And for quite a few years, I worked at a data visualization company and was a self-professed data geek. But can more data actually lead to worse results? That is what Weapons of Math Destruction tries to understand.

Insightful and timely, this book provides a detailed look at how algorithms based on big data don't always tell the truth or lead to a more fair world as they are purported to do. Rather, they contribute to a system that is opaque and hard to challenge, increasing the divide between the privileged and everyone else.

Each chapter provides a thoughtful exploration of an area where big data is supposed to be helping, such as college rankings, recidivism of convicts, applying for jobs, and getting loans. Algorithms in each area help define which are the best colleges, which convicts are most likely to reoffend, what personality types are best suited for a job, and who should get the best interest rates. That sounds useful, right?

But unfortunately, ideal data is not always available, so bad or irrelevant data is often used instead. And the resulting predictions are treated as gospel, increasing the efficiency of the system but harming those caught on the wrong side. It hurts a segment of the population while providing the rest of us with the false belief that fairness and justice have been done. In many cases, the algorithms' predictions create a self-reinforcing feedback loop, directly influencing the very outcome they were supposed to be objectively determining.

I found this book to be interesting and relevant. It really goes to show that your predictions are only as good as the data you've got. Whether you're a data geek like me or you just want to learn a little more about big data's potentially harmful effects, this is a worthwhile book to check out.
Trish
1,413 reviews · 2,683 followers
April 16, 2017
O’Neil deserves some credit right off the bat for not waiting until her retirement from the hedge fund where she worked to tell us the secrets of how corporations use big data (our data). Underlying the collection and use of big data is an attempt to exploit efficiencies in the marketplace for goods, money, and talent. Big data ostensibly can also “set us free” from time constraints and uneven knowledge dispersal. Often, though, the opposite is true. We are at the mercy of how our own data is shredded and packaged, and errors in the model can mean mutually assured destruction—for the school, corporation, family.

The book starts with an example any reader who actually picked up this book might recognize: the chances of getting into a major university. O’Neil doesn’t go into the actual algorithms but just explains the variables chosen to populate them. Just when I was wondering who this book is targeted at, since after all, we kind of know how to get into university already, she comes up with examples of big data messing with aspirations that are still (hopefully not) in our futures.

She addresses the real pain-in-the-ass nature of minimum wage jobs where the inadequate part-time hours are constantly changing to maximize profits for owners and to screw with employees’ ability to plan their lives, their children’s lives, and the children’s caretakers’ lives. O’Neil also addresses the situation in 2009 when Amex decided to reduce the risk of credit card nonpayment by reducing the credit ceilings on users who shopped at certain stores, like Walmart. She shows us the way micro-targeting ends up using data to perpetuate inequities in opportunity and “social capital.”

The hardest part of reading this book (there is no actual math) was keeping my mind on what O’Neil was saying. Every time she’d mention another example of the ways big data was screwing us over, my mind would wander to experiences of my own, or ones I’d heard from friends, family, or others. This is real stuff, and just when I thought that it would be an excellent book for those with skills and interest in social justice to take to an interview with Google, Amazon, or a big bank, in she comes with another example of how the “fixes” are almost worse than the disease (Facebook’s method of who your friends are determining your credit risk).

But O’Neil reminds us big data, mathematics, algorithms, etc. aren’t going to go away.
"Data is not going away. Nor are computers—much less mathematics. Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives. But as I’ve tried to show throughout this book, these models are constructed not just from data but from the choices we make about which data to pay attention to—and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral."
Exactly. We still have to use our brains, not just our computers. It is critical that we inject morality into the process or it will always be fundamentally unfair in some way or another, especially if the intent is to increase profits for one entity at the expense of another. One simply can’t include enough variables or specifics. Some universities have begun to audit the algorithms—like Princeton’s Transparency and Accountability Project—by masquerading as people of differing backgrounds and seeing what kind of treatment these individuals receive from online marketers.

O’Neil suggests that sometimes data might be used to good effect by targeting frustrated online commenters with solutions to their issues (e.g., affordable housing info), or by searching out possible areas of workplace or child abuse and targeting those areas with resources. She wades into national election data and notes that only swing states get candidates’ attention, suggesting, by the way, that the electoral college has outlived its usefulness to the citizenry. Algorithms are not going to administer justice or democracy unless we find a way to use them as a tool to root out inequities and to deliver needed services where they are deficient.

When I look at the totality of what O’Neil has discussed, I am inclined to think this book is best targeted to thoughtful high schoolers and college-aged students who are thinking about planning their careers, who have a penchant for mathematical and computer modeling, and who think their dream job might be with an online giant. I’d be happy to be disabused of this notion if someone wants to challenge my thought that much of this information is known to many of us who have been out of school for a while and who have been paying attention to our online experiences and junk mail solicitations. But it is always interesting to read someone as coherent and on the side of social justice as Ms. O’Neil.

It might be noted that she also talks about the use of big data to steer our thinking and makes a preliminary suggestion that individuals should be paid for their data—for data that is collected about them, for profit. It is an interesting discussion as well. Love these intersections of technology and humanity.
Leo Walsh
Author · 3 books · 126 followers
April 5, 2017
Captivating. Insightful. And important. A 50,000-foot view of how automated big data is a great tool for understanding human nature. How it has great promise to make our lives easy. And yet, a very real takedown of how systems engineers -- and corrupt power-seekers, like corporate executives and for-profit universities -- misuse this powerful tool. And the even worse cases where people start with good intentions, like ridding school systems of bad teachers, only to toss out "false negatives."

I found Cathy O'Neil's "Weapons of Math Destruction" a very important book that highlights a lot of what's been going on in America over the past 30 or 40 years. For instance, I just finished "The New Jim Crow," and wondered how the Supreme Court could continue to rule in favor of policing tactics that 1) target poor urban areas, populated mostly by black men, and 2) allow the police to stop people willy-nilly whom they find "suspicious," despite the fact that most of the minorities caught in stop-and-frisk situations are innocently going about their lives. Problem is, many of these people go to jail for nuisance crimes -- possessing small amounts of marijuana, open containers, driving on expired tags, etc. Things that seem the exact definition of "systemic racism."

But when O'Neil lays out how systems engineers have written algorithms that send police to "hot spots," those rulings make sense. For instance, major crimes that we all want to stop -- car thefts, burglaries, rapes, assaults and murders -- are rare, while petty and nuisance crimes -- jaywalking, possessing weed, noise violations, vandalism, etc. -- are common. Based on the law of large numbers, a program trying to optimize "broken windows" policing would send more officers to a district with a higher concentration of people, typically ghettos. Which leads to more petty arrests in those areas for crap that doesn't matter much. Which sends even more police into these areas, making more arrests for petty offenses... and so on and so on.

In short, these ghettos are caught in a self-reinforcing feedback loop, and the residents are more likely to end up in jail for something stupid. Like smoking dope, which, based on most evidence, is just as common among whites as among blacks. So a white 19-year-old frat boy at Ohio State can smoke up at will, with little possibility of being caught, while a nearby ghetto-dweller, who works maintenance or in the office at the university, or attends the university while living at home, will have a greater likelihood of being arrested. Just based on where they live.
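To make that loop concrete, here is a minimal simulation. Everything in it is invented for illustration (the district names, the rates, the patrol-reallocation rule); it is a sketch of the mechanism the review describes, not the book's model or any real department's:

```python
# Toy model: two districts with identical petty-crime behaviour.
# Arrests track where officers are sent; officers are sent where
# arrests were recorded. All numbers are made up.
PETTY_CRIME_RATE = 0.05  # same true rate in both districts, by assumption

patrols = {"suburb": 10, "dense_district": 20}  # initial skew: more officers where more people live

for year in range(1, 6):
    # Recorded arrests scale with patrol presence, not with behaviour.
    arrests = {d: round(n * PETTY_CRIME_RATE * 100) for d, n in patrols.items()}
    # The "predictive" model shifts patrols toward the worse-looking record.
    gap = arrests["dense_district"] - arrests["suburb"]
    shift = min(patrols["suburb"], max(gap // 10, 0))
    patrols["suburb"] -= shift
    patrols["dense_district"] += shift
    print(f"year {year}: arrests={arrests} -> patrols={patrols}")

# Identical behaviour, diverging records: the district that started with
# more officers generates more arrest data, which justifies sending more.
```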

Same crime -- use of narcotics. Two different outcomes.

That is what O'Neil terms a "weapon of math destruction," since it is pervasive, destructive, and opaque.

And things get worse if the two hypothetical dope-smokers get arrested. Odds are, the white frat boy's parents live in a safe suburban neighborhood, and he thus knows zero convicted felons. So his court-appointed "recidivism" score -- attained by another destructive math weapon -- will be lower than the ghetto dweller's, who lives near many felons. Thus, the courts may lessen the frat boy's charge to a misdemeanor while charging the ghetto dweller with a felony. Due to that recidivism score.

Talk about kicking someone when they're down.

And when both are released on parole and ordered to "steer clear of felons," the frat boy will have no problem following this, while the ghetto kid, thanks in part to the policing software noted above, will be surrounded by them. Which, of course, adds to the already disadvantaged the further risk of being pegged as a parole violator.

Smack.

O'Neil lays out other ways that systems engineers perpetuate injustice. She takes down for-profit universities -- like the University of Phoenix -- which actively target poor people with aspirations. Not to help, since a University of Phoenix degree costs tens of times more than a community college degree while adding less salary. Instead, the University of Phoenix and its ilk exist to cash in on the student loans guaranteed by the government.

Yep. Poor people, who don't know any better, are targeted by companies that give them less while charging them more. Using data-driven web advertising, another Weapon of Math Destruction, all for a quick buck.

These are just a few of the WMDs that O'Neil examines. She looks at others: credit scoring, e-scoring (a sort of electronic "credit score" derived for you based on your social media friends and activities), and the U.S. News & World Report college rankings, which have led to spiraling tuition costs while providing questionable value.

All of these WMDs lead to increasing social stratification. And, in many ways, drive the "winner take all" nonsense that gives big money to a handful of developers who program an app.

But the nice thing is that O'Neil ends by providing valuable insight into how, when properly used, deep-dive statistics can actually help people. For instance, O'Neil was part of a task force that examined New York City's homeless. They uncovered the single unequivocal variable that would keep people off the streets -- access to Section 8 housing. And once these people were housed, they'd move on to get jobs -- since having a stable residence makes it easier to gain employment -- and thus were less likely to end up back on the streets. And all this research came at a time when Mayor Bloomberg was contemplating reducing Section 8 housing. So it proved important.

She also points to other positive uses of algorithms -- all of which involve putting our compassionate, human-based morality ahead of the appearance of objective, measurable efficiency -- with "appearance" being the operative word, since O'Neil makes the cost of blindly following these algorithms clear.

All in all, "Weapons of Math Destruction" is the best science book I read in 2016, since it focuses not only on what we can do -- the science and how it makes things more efficient -- but also forces us to focus on the ethics: the "why" we may choose a less efficient alternative because it may be more just. Especially since blindly accepting a model often degenerates into "pseudoscience." And that anti-scientific narrative can be amplified if the person wielding the WMD is either greedy or malicious.

5-stars.

That said... YAY!!! This is my 80th book of the year. So I've just completed my goal.
Clif Hostetler
1,230 reviews · 947 followers
January 18, 2019
"Welcome to the dark side of big data." Thus the author concludes the Introduction section of this book. Computers and the internet have enabled us to advance into the new world of algorithms and big data with ramifications that most people are unaware of.

Surfing the web, clicking "like" on Facebook, Googling (i.e., searching online), and making online purchases are common examples where big data is tracking and potentially impacting our lives. Some of these uses are benign and can be helpful. But this book focuses sharply in the other direction, on the "harmful examples that affect people at critical life moments: going to college, borrowing money, getting sentenced to prison, finding and holding a job. All these life domains are increasingly controlled by secret models wielding arbitrary punishments."

One of the most shocking pieces of information provided by this book is the set of examples of how big data contributes to economic inequality. It is devastating how efficiently for-profit colleges and payday loan companies target the economically stressed portion of the population. It's also astounding how frequently the limitations and flaws of big data are ignored and its outputs substituted for the truth. And it's almost comical how big data systems can be manipulated (a.k.a. gaming the system) by clever institutions and companies.

Whether we like it or not, this is the new world in which we all live. Citizens of this new environment need to become knowledgeable about how big data works. Otherwise we will all be its clueless victims.

The following are some quotations from the book.

The following are the closing sentences of the Introduction:
Big data has plenty of evangelists, but I'm not one of them. This book will focus sharply in the other direction, on the damage inflicted by WMDs and the injustice they perpetuate. We will explore harmful examples that affect people at critical life moments: going to college, borrowing money, getting sentenced to prison, finding and holding a job. All these life domains are increasingly controlled by secret models wielding arbitrary punishments. Welcome to the dark side of big data.
In the following the author defines some of the shortcomings she observed in how big data was being used.
More and more, I was worried about the separation between technical models and real people and about the moral repercussions of that separation. In fact, I saw the same pattern emerging that I had witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.
The author worked at a hedge fund during the 2008 financial collapse. Thus when she moved into the field of consumer data modeling she looked for flaws in the use of data that were similar to what led to the credit crisis.
I wondered what the analog to the credit crisis might be in big data. Instead of a bust I saw a growing dystopia with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain evermore control over the data economy raking in outrageous fortunes and convincing themselves all the while that they deserved it.
The following is the author's summary near the end of the book.
In this march through a virtual lifetime we've visited school and college, courts and the work place, even the voting booth. Along the way we have witnessed the destruction caused by WMDs. Promising efficiency and fairness, they distort higher education, drive up debt, spur mass incarceration, pummel the poor at nearly every juncture, and undermine democracy.
The author, Cathy O'Neil, is highly qualified (she got her Ph.D. in math at Harvard), has work experience from inside the system (as a "quant" at D.E. Shaw, a major hedge fund), and has evolved into an "Occupy Wall Street" activist. Thus she has the experience and qualifications needed to explain how big data affects our lives. It's needed information.

Cathy O'Neil was the winner of the 2019 MAA Euler Book Prize.
Nilesh Jasani
1,151 reviews · 220 followers
November 27, 2016
For the most part, WMD is a rant with only impractical statements as solutions.

The author is onto something critically important when one reads the title of the book and goes through the first few pages. However, it is a tragedy to see the author fall in love with her own phrase, WMD, and completely lose the plot. The examples are good in the beginning but soon turn ridiculous (they would be laughable if not so lamentable for the people involved). In the process, the author loses her credibility as a champion of the topic.

Consider the following example: a person is unable to find a job because, he claims, he fails all pre-interview personality tests. The author debunks the objectivity of the tests in favor of the subjectivity of interviewers, with the claim that in another world one or another interviewer would have given this person the job. The author never considers the interviewers' non-transparent and harder-to-fathom biases wreaking greater damage in that world than in the current one, where the biases are at worst in models that are far easier to scrutinize and amend. The author fails to realize that every world with fewer desirable jobs than applicants will leave some unhappy; fairness is perhaps far better served where the selection process is the same for all.

In a way, the author wishes "we" would stand up and help reduce the use of math. The author is clear about the "reduction" goal without ever explaining how "we" would decide to what extent the reduction should take place, or whether the constituents of "we" could ever agree on the replacement. The repeated call for subjective judgment assumes that biases come in only one form and are acceptable to all. The absurdities peak when the author declares that subjective biases at least evolve over time and hence should be preferred, while mathematical models stay static!

The author's hatred for anything involving models not only creates a highly one-sided discourse but also leads to inadequate treatment of the important points within. The only mathematical concept discussed at any length is the circularity caused by self-reinforcing feedback in multi-equilibrium systems, leading to solutions that may be locally optimal but not globally (all my words). Otherwise, the book almost exclusively focuses on the unsuitability of results arrived at through statistical sampling for any individual case.

To summarize, the author seems to yearn for insurance companies to abandon actuarial methods, schools to move away from test scores for admissions, and financial institutions, tech companies, and governments to reduce their use of numbers. The author wants humans to make the decisions these models might be making, using their value judgments. Or the world to simply not make those decisions at all.

There is so much sensible that needs to be discussed on the damage created by the excessive use of math in the modern world but this book is not the best place to explore it.
Trevor
1,463 reviews · 24k followers
August 26, 2018
We like to think of mathematics as basically pure and free from the nastier side-effects of human nature. And this purity rubs off, so that the closer a science is to being describable in numbers, the more highly we regard it: physics is seen as somehow higher than biology, and economics than anthropology. That is, if you can predict behaviour on the basis of an algorithm, whether that be the behaviour of a billiard ball or a homeless person, then this is proper science, and it has a claim to a kind of objective truth that sets it beyond challenge.

And that is the point of this book: it seeks to help shatter this illusion, particularly in relation to the human sciences, but also, and perhaps more importantly, in relation to marketing, insurance, policing, education and other social activities that are increasingly being modelled and even normalised by algorithms. She refers to these algorithms as the weapons of math destruction in the title (the title is better in English, of course, where we say maths rather than math, but her point stands).

The destruction such algorithms can cause became clear to her while she was working in finance just before the 2008 crash. It certainly isn’t that she sees maths itself as being the problem; she refers to herself almost immediately as a kind of math nerd and is proud of that designation. Her point is that these algorithms are dangerous because we tend to think of them as purely objective and therefore treat the results they provide as beyond question. And this state is helped along by more than just the fact that we hold mathematics in such high regard. It is furthered by the fact that the algorithms that spit out these assessments of us are often obscure and unfair, and grow exponentially. These are the three conditions that the author believes make an algorithm a likely WMD.

So, looking at these in turn: is the algorithm obscure? Most of them tend to be; as she says at one point, they are the ‘special sauce’ of many companies and so need to remain hidden from the competition. For instance, if you have an algorithm that allows you to predict who is going to make an ideal partner for someone else, then your dating site is going to make you lots of money. You are hardly going to want to let your competition know your secret. But the problem is that by keeping your algorithm obscure and hidden from outside analysis, you can say nearly anything you like about its effectiveness if no one is able to check. You know, this is a version of the ‘they can vote any way they like, as long as I get to count’ story.

But it isn’t just outright fraud that is the problem here (although that doesn’t mean fraud isn’t a problem). Just as bad is the way these models bury their false negatives. That is, if the algorithm says ‘never employ anyone with green eyes, they are under-performers’, the company that follows this advice is unlikely to ever find out whether this is true. That’s because they won’t have employed anyone with green eyes to test the model, so the model is confirmed by default. The problem is that many of the algorithms used, say, in policing can encourage over-policing of certain populations (this is set in the US, so let’s just call those certain black, Hispanic and Muslim populations to save time), and this over-policing, by defining certain populations in these ways, is also likely to create the monster it was supposed to be eliminating. A nice example is an algorithm that denies jobs to people according to their credit scores, which then means these people are less able to pay off their debts, which gives them a worse credit score, which then confirms the risk. Vicious cycle, anyone?
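A toy sketch shows how a rule like the green-eyes one confirms itself by default. All names and numbers below are invented for illustration; performance is generated independently of eye colour, yet the data the company collects can never contradict its own rule:

```python
# Toy model: a hiring rule that screens out "green-eyed" applicants can
# never be refuted by the outcomes it produces. All numbers are made up.
import random

random.seed(0)

applicants = [
    {"green_eyes": random.random() < 0.3,
     "would_perform_well": random.random() < 0.7}  # independent of eye colour
    for _ in range(10_000)
]

hired = [a for a in applicants if not a["green_eyes"]]
rejected = [a for a in applicants if a["green_eyes"]]

# The company only ever observes outcomes for the people it hired...
observed = sum(a["would_perform_well"] for a in hired) / len(hired)
# ...while the good performers it screened out remain invisible.
unobserved = sum(a["would_perform_well"] for a in rejected) / len(rejected)

print(f"good performers among hires:    {observed:.0%}")
print(f"good performers among rejected: {unobserved:.0%} (never measured)")
```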

Other examples focus on the use of psychometric tests for all manner of things, but increasingly as a pre-employment screen. You might think these ought to be relatively easy to game: you know, how stupid would you have to be to answer ‘very true’ to the prompt ‘I sometimes fly off the handle for no real reason’? Unless you are going for a job as a wrestler, your employer is probably not going to be overly impressed with that answer. But as she points out, applicants, for example, for a job at McDonald’s, are sometimes asked to choose between one of two alternatives: “It is difficult to be cheerful when there are many problems to take care of” or “Sometimes, I need a push to get started on my work”. I have no idea what the ‘right’ answer to that choice is. I’m not even sure which of the two really applies to me, although there have been times in my life when both of them have, and in your life too, I suspect. That is, it isn’t at all clear what this is seeking to achieve, but it is clear that there is likely to be a ‘wrong’ answer, in the sense that making that ‘choice’ might leave you without a job.

But even this level of obscurity isn’t her main worry (at least you know you’ve been asked a question here that might be used against you). It is too often easy to correlate factors to ensure that certain people are excluded due to their sex, sexuality, race, social class and so on, merely by the underlying assumptions of those programming these algorithms; the algorithms aren’t ‘objective’ examples of the purity of mathematics, but rather socially produced ephemera that are likely to have been shaped by the social stereotypes of the society and the people who produce them.

The author mentions the curious fact that orchestras now appoint five times as many female players since auditions began being held with the player behind a curtain. Who’d have thought women would play so much better when they were hidden from sight? Such shy and retiring little things, bless them…

The solution is to ensure that algorithms used to judge us are open and available for anyone to check, a condition that will remain unlikely as long as these algorithms are proprietary.

The question of fairness isn’t at all easy to address, for many of the reasons already mentioned. One of the examples given is teacher scores being used to determine who should get pay increases and who should be removed from the profession. Basically, the idea is to compare student outcomes so that the teachers who do not increase student scores enough can be shown the door. But, as is pointed out here, student scores aren’t only affected by teacher performance. And worse, if you are teaching students who are either very far behind or very far ahead, you are unlikely to effect as big a movement in their scores as you might teaching kids in the middle. If you are going to be assessed on the basis of a score, that score ought to be related to something you have control over, rather than merely something that is relatively easy to measure. The author points out that far too often the kinds of assessments made of teachers based on student attainment produce virtually random results year on year. Since it is very unlikely that an exceptional teacher will become a very poor teacher from one year to the next, any assessment that produces such a result ought to be suspect. And this is also true of loan application algorithms that judge you by the people you associate with. For instance, I’ve read a few times lately that your credit rating can be impacted by your ‘friends’ on Facebook, but even if this is not literally true, algorithms are shown here to judge you by many factors that assume associations between you and other people more or less likely to be a credit risk. For instance, American Express admitted to restricting access to credit for people who frequented particular stores.

The last condition of a WMD is that the algorithm can be scaled up to cover large populations. An algorithm that works well in one set of circumstances might be both transparent and fair within its limited application, but because it is then used across a larger population it might suddenly stop being fair or reasonable. This is because it creates a new norm in the population at large, and that might well have seriously disadvantageous impacts on populations beyond the one it was originally intended for. Again, the examples I jump to tend to be associated with education, where large-scale testing programs have a disproportionate impact on poor communities, which are defined as failing, and then ‘success’ becomes defined as ‘doing well on the test’, so that the tail starts wagging the dog. And this then has impacts on how kids get taught: if you are only going to be measured by test results, then we should just prime you for taking tests. Which then makes education as boring as it is possible to make it for kids who were already struggling to see the point of education in the first place. But the communities that do well on these tests are normally the already advantaged, and they suffer none of these negative impacts because, well, they do well on these tests, so no point priming them on more of them… Let them do art.

I really would recommend this book: she gives lots and lots of examples, and it is vitally important that we understand that this is the world we are increasingly moving towards. More and more of our lives are going to be influenced by algorithms and big data, and yet too many of us are so terrified of mathematics that we will blame ourselves when these algorithms punish us. It isn’t at all clear to me how we might go about making these algorithms transparent, fair or limited to a scale that keeps them safe, but these are questions we really ought to think about and act upon.
Mario the lone bookwolf
805 reviews · 5,298 followers
December 11, 2018
Is it legitimate to reduce people to the data that can be extracted from them?

The predictions made possible by big data are an especially frightening aspect: behavior and personal development can be forecast with ever-greater probability. And, like any artificial intelligence, the algorithms and programs become more and more efficient, both as the amount of data about the individual grows and as their functioning is optimized.
Whether habits, buying behavior, Facebook likes, movement patterns, writing style, illnesses, search-engine inputs, political activities, professional details or social contacts: everything flows together into one large store.

Particularly explosive is the ability to accurately predict the electoral behavior of specific groups of the population and thereby target election advertising even more precisely. It also opens up the option of manipulating groups more likely to belong to the rival camp, using disinformation and misinformation to shape their awareness and decision-making. Even outside of elections, these data volumes offer promising prospects for influencing opinions, facts, trends and mood barometers.

The subtle digital prompters are hardly recognized as such. Information is automatically trimmed in the form of auto-completion and suggestion functions, individually compiled search results, and purchase recommendations based on previous buying behavior, all steered toward individual interests or toward the opinion to be shaped. Even what you type and then delete, or revise and change, is saved.

Optimistically, this can be seen as practical, time-saving help at work. Alternatively, it is an indirect kind of influence through which compulsory consumption is promoted as efficiently as possible. Beyond offering tailor-made products, there is the option of deliberately bringing only certain information and news into focus, so that unpopular and annoying reports, opinions and activists simply never appear in the higher result rankings: skipping past self-censorship and defamation and proceeding directly to censorship.

Perfidious is the playful way in which free offers lure the consumer. For example, a health application for smartphones with which one competes in real time with friends, and in which insurers and health funds should also be interested. These handy free programs give businesses a playful, competition-driven way to get at users' location, interest, and behavioral data.

The book is an excellent appetizer, with the benefit of awakening interest in a deeper study of the topic through further, more detailed literature.
Given the overall situation, a worthwhile endeavor. That things could get this far is owed to the apathy and disinterest of large parts of the population; without the long-overdue outcry across all layers of civil society, it could go much further still.

In Western countries, social networks, search engines, and Internet merchants are so far only interested in manipulating buying behavior and collecting data to monetize it. But the direction China is taking with the Citizen Score is likely to find imitators. Then the data we carelessly and negligently disclose would no longer decide merely the price of a product or the next banner ad, but our future, and perhaps our survival or death. If, for example, we criticize the government.

Suzanne
1,773 reviews
September 7, 2016
This book did a nice job describing large-scale data modeling and its pitfalls in a very accessible manner. It is so easy to think of computer algorithms as unbiased; however, the author demonstrates how they really do discriminate. Next time I teach a class involving statistics, I may use this book to show students how it is dangerous to blindly believe the numbers.
David Rubenstein
847 reviews · 2,745 followers
December 10, 2017
The subtitle of this book, How Big Data Increases Inequality and Threatens Democracy, really says it all. Big data has come into our lives in numerous ways, and many of them are a scourge. Big data, in and of itself, is not to blame, but the uses to which it is put are often outrageous. Take the case of automated teacher evaluations. These are often based on the improvement of students' scores. It seems like a no-brainer, and since the scores take into account improvement rather than absolute scores, they seem to be very fair. However, one New York teacher received an abysmal score of 6 (out of 100) one year, and the following year received a wonderful score of 96. Obviously the teacher did not suddenly improve his teaching methods.

If a mathematician were to analyze the scores from such evaluations, their random scatter would be instantly recognized as meaningless. Yet these automated evaluations are used for hiring and firing decisions, as well as compensation decisions. The worst aspect is that there is no attempt to improve the algorithms: they are opaque, and no feedback is applied to tweak the scores to make them more accurate and fair.
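That "random scatter" check is easy to illustrate. The sketch below uses assumed numbers (they are not the book's data): if classroom noise swamps true teaching skill, a teacher's score in one year barely predicts the next year's, which is exactly the pattern described above.

```python
# Toy model: value-added scores as (stable skill) + (large classroom noise).
# If noise dominates, year-over-year correlation collapses toward zero.
import random
from statistics import correlation  # Python 3.10+

random.seed(1)
n_teachers = 500

skill = [random.gauss(50, 5) for _ in range(n_teachers)]   # stable component
year1 = [s + random.gauss(0, 25) for s in skill]           # noisy measurement
year2 = [s + random.gauss(0, 25) for s in skill]

print(f"year-over-year correlation: {correlation(year1, year2):.2f}")
# Near zero: a 6 one year and a 96 the next is unsurprising, and firing
# on such a score is close to firing at random.
```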

Such algorithms are used in many avenues of life. Credit scores are used by loan companies--and even by auto insurance companies! Police departments use algorithms to plan targeting of police activity, while courts use algorithms to predict recidivism. Colleges, especially for-profit colleges, use algorithms to target potential students.

While this book comes across as very preachy, and very liberal, it does make some very good points. Big data is often used--intentionally or not--to punish the down-trodden and to increase inequality. Profitability is usually the goal, and while fairness is often the purported goal, these algorithms rarely turn out to have the effect of being fair.

I recommend this book to people who are curious about the effects of big data on their lives and on democracy.
Emily
754 reviews · 2,508 followers
February 7, 2017
Big Data is opaque, complicated, managed by profit-seeking corporations, and is more and more dictating certain societal conditions: from getting a job to applying to college to receiving healthcare. "Data," on its own, seems amoral, a way to implement systems that are more fair. But O'Neil's point in this book is that all algorithms include basic assumptions, and sometimes those basic assumptions are full of bias and not grounded in fact. If the algorithms aren't regularly inspected, they create their own feedback loops. As O'Neil says, "In each case, we must ask not only who designed the model but also what that person or company is trying to accomplish."

This book touches lightly on a number of topics to give an overview of the algorithms that O'Neil finds the most objectionable. Each of the different sections is interesting, but they don't delve particularly deeply into the topic; for example, if you're interested in how poor feedback loops on policing and prisoner recidivism work together to create our prison industrial complex, you're better off reading The New Jim Crow. But if you're interested in an introduction to the ways in which data runs our lives, this is a good place to start.

I was particularly interested in the section on college admissions, because O'Neil ties skyrocketing college tuition prices to the U.S. News & World Report ranking (the first ranking of universities in the US). In 1988, journalists used a number of proxies to create a way to rank colleges against each other. This example hits on a few of the points that O'Neil brings up again and again: the journalists wanted proxies that would reaffirm their own biases, so Stanford and Harvard had to be at the top of the list. But they weren't able to fully factor in the things that mattered, like the quality of education, so they relied on proxies like admissions rate and freshman retention rate. Cost was conspicuously excluded, and thus - or so O'Neil says - college tuition increased 500 percent between 1985 and 2013. Colleges are gaming the system to get themselves to the top of the list, but then are passing those costs directly on to the students who attend.

While this book is nothing groundbreaking, it's another reminder that tech HAS to do better if we are going to put more of our personal data - our lives - into its hands. I personally believe we're all being swindled by the "sharing economy" (heavy quotes there), and the example about Lending Club hammers that home. O'Neil uses Lending Club as an example of a service that was supposed to "democratize" loans, but it's quickly devolved into 80% institutional money: that is, money from big banks. Why? Banking has regulations, and banks aren't legally allowed to discriminate against consumers on the basis of e-scores and other algorithmically created scoring systems that take into account factors like zip code, punctuation on applications, and social networks. These institutions have found a way to get outside of the law, apply their sketchily drawn "data" models (that have no statistical basis), and reap the rewards, while further exacerbating class inequality.
Scott
34 reviews · 4 followers
October 25, 2016
Book reviews are all about expectations, and honestly I, as someone doing data science and grappling with these issues, expected more. With a data scientist writing a full-length book indicting data science, one expects a deep dive revealing real points. Instead it ends up being a very surface-level essay without the deeper exploration and meaning one expects from a full-length work. Perhaps more worrisome, the definition of a WMD it introduces is often worked around to bring in arguments she wants to make.

I do agree with her main argument/definition of a Weapon of Math Destruction (WMD), though the name is far too punny for real use. More or less, the author argues that bad algorithms are secret and expand rapidly without checks on actual results. This is hardly a novel problem statement, but it is certainly a real problem. However, several times she substitutes assumed results for actual results in evaluating presumed WMDs, which is the very same problem she is complaining about. She also seems to have added an unspoken fourth criterion: anything that offends her liberal (and she is very liberal, and interventionist) ideals. This fourth criterion is apparently enough to give a pass to what are clearly bad algorithms that fail her own tests. This is, needless to say, not a great way to prevent issues with unethical use of algorithms, unless you happen to have the same ideals and goals as Cathy O'Neil.

Somewhat surprisingly, this is a data science book without any data. It doesn't even have in-text footnotes (at least in my Kindle edition, though they do sometimes update that), and the notes section at the end shows the citations to be really quite poor and sparse. We are talking college-freshman "The professor said 10 citations so I better cite a bunch of random webpages" level of citations. To be fair, her arguments are only weakly data-driven, but I still expected something with a deeper academic backbone.

This is also a data science book that won't teach you anything about data science. If you are looking for that, you should look elsewhere. The author has co-written another book, and it shows her quite capable of handling the topic. I do wish she had brought more of that into this book, which will be many people's first book on data science.

The structure of the book plays out more as a series of related blog posts than a book that builds on themes and has progression. Partly this may be an unfortunate result of the author being a blogger, but it leaves one wanting more.

Maru Kun
221 reviews · 550 followers
August 22, 2017
Forget those cute pastel illustrations from the fifties with their flying cars, robot servants and dreams of unlimited leisure. Our future has finally arrived, and for most of us, especially for the less rich and less privileged who won’t qualify for individualized attention, the computer says ‘No’.

‘Weapons of Math Destruction’ is a timely book about the increasing influence of the algorithms that control the news we see, the jobs we can get and the politicians we vote for; algorithms working tirelessly on someone’s behalf (not yours), unseen and unaccountable.

The book explains statistical and methodological problems with these algorithms and illustrates how these same problems manifest themselves when they are applied to real world situations.

An example gives a flavor of the issues that recur throughout the book:

An algorithm comparing changes in student performance year-on-year was used to decide which teachers were poor performers who would be let go. In one terminated teacher’s case, subsequent evidence suggested that students’ scores had been tampered with the previous year; as a result she had inherited a class whose performance had been overstated and was bound to deteriorate, with the algorithm marking her out as a poor teacher.

The teacher was not told how the algorithm was applied, nor was she allowed to appeal the decision; the algorithm was a black box that produced the result the school system wanted: terminations. And, notwithstanding the lack of transparency in the decision, a sample of scores from one class alone, perhaps twenty to thirty students, was not sufficient to produce a statistically valid result in any event. One more reason why being an American public school teacher must be one of the worst teaching jobs in the developed world.
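For the statistically curious, the sample-size point can be sketched in a few lines (assumed numbers, not the book’s): with only 25 students, the class-average score change a teacher is judged on swings by several points from pure chance alone.

```python
# Toy model: how much a class average of 25 students moves by chance.
import random
from statistics import mean, stdev

random.seed(2)

# Each student's year-over-year score change is dominated by factors the
# teacher does not control (assumed std dev of 15 points, zero true effect).
class_averages = [mean(random.gauss(0, 15) for _ in range(25))
                  for _ in range(1000)]

print(f"chance spread of the class average: ±{stdev(class_averages):.1f} points")
# Standard error = 15 / sqrt(25) = 3 points; ranking or firing teachers on
# differences of that size is ranking them on noise.
```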

Please read the book for more egregious examples of “algorithm abuse”. Below is my summary of some of the key problems the book outlines. It is helpful to think about them in two categories: problems that come from poor application of the technical aspects of the algorithms (bad programming, misapplication of statistical or machine learning methods), and problems that come from how the algorithms are developed and used: how they incorporate hidden biases or value judgments, poorly thought-out objectives and other ‘human factors’ beyond just poor maths.

Some of the common issues were:

• Lack of transparency in how the models operate and make decisions, often leaving no means of appeal against a clearly unjust outcome. Related problems include over-reliance on models in the face of contradictory data or, where people do understand the models but are benefitting from them, a lack of integrity in applying them. Models used to price mortgage-backed securities during the financial crisis are a leading example.

• The use of “proxy data” for model input or output because the real data desired is unavailable, too expensive to obtain or cannot be objectively measured. Does sending out more e-mails with “creative phrases” mean that you are really a more creative and innovative person? FICO scores are a relatively good model for predicting credit risk, but not as a proxy for a whole host of other unrelated things, such as predicting future job performance in the hiring process.

• Feedback loops, whereby a model increasingly encourages non-optimal outcomes by rewarding certain behaviors at the cost of the intended benefits. An example given is US college rankings, which increasingly reward “user experience” and “research citations” rather than actual educational outcomes for students.

• Algorithms that, mainly for efficiency purposes, use data from arbitrarily selected groups even when individual data is available. Why should an algorithm price insurance for a driver based on the experience of other drivers who live near him or are in a similar economic position, instead of on his own individual driving record?

• Optimization that is good for an algorithm’s owners but not for society as a whole. Monetary return is the most common priority for an algorithm used in the private sector, but is this what society wants when it results in micro-targeting the poor with the marketing of for-profit colleges that provide a below-average education at great cost?

• Hidden biases and unfairness when assumptions are built into an algorithm that are reflective of social factors rather than individual experience. Algorithms used by the police to forecast crime are first used to predict easy-to-identify nuisance crimes which, unsurprisingly, occur mainly in deprived neighborhoods. The police have yet to develop an algorithm that forecasts where white-collar crime takes place, although if they did, Wall Street would surely light up red.

• Incorrect use of statistics, and a failure to feed model results back into predictions. Baseball is a good field for prediction because outcomes (home runs, strikes, batting averages) can be objectively measured and predictions can be fed back into the model to improve it. Using FICO scores in recruitment is not what the developers of FICO intended; there is no study suggesting a statistical correlation between FICO scores and subsequent good job performance, and good or bad job performance is never used to assess HR models or improve their reliability.

The last chapter of the book looks at the micro-targeting of voters with political messages through Facebook and similar social media sites. This was written in September 2016, a respectable amount of time before the world had a glimpse into the dizzying vortex of fake news, Russian hacking and tweeting Presidents. This is an excellent chapter and may well be seen as prophetic once we look back over our current period of political chaos, if it ever ends.



I resolved to write notes on the books I read as I get so much more out of them if I do. This resolution will last about one and a half books I’m sure, but if anyone is interested here they are:

Introduction: Mathematical models are being increasingly used to make decisions that have real world impacts. However the algorithms they use are opaque except to a limited number of mathematicians or computer scientists and may, sometimes unknowingly, encode human prejudices, misunderstandings and biases.

Assumptions may be camouflaged by the maths, go untested, and cannot be questioned by the people to whom they are applied. These algorithms are often used in inappropriate contexts where there is insufficient objective data to properly apply the underlying statistical theory; an algorithm may work for baseball, with many tens if not hundreds of thousands of objective, constantly updated data points, but not for evaluating a teacher with a class of thirty.
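
A quick hypothetical sketch of that sample-size point (my own illustration; the effect size and noise level are invented): the same underlying "teacher effect" measured over a class of 30 bounces around wildly from year to year, while a baseball-scale sample pins it down.

    import random
    import statistics

    random.seed(1)

    def average_gain(n_students):
        # Each student's measured gain = true teacher effect (5 points) + noise (sd 15).
        return statistics.mean(random.gauss(5, 15) for _ in range(n_students))

    # The same teacher, "measured" over five years with a class of 30...
    print([round(average_gain(30), 1) for _ in range(5)])
    # ...versus a baseball-scale sample of 10,000 observations.
    print([round(average_gain(10_000), 1) for _ in range(5)])

The standard error shrinks with the square root of the sample size, so the class-of-30 scores can easily swing from bottom of the rankings to top with no change in the teacher at all.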

The algorithms produce “feedback loops” whereby their output eliminates particular classes of result based on false assumptions, which in turn reinforces the weighting given to those assumptions in the model. Injustice is compounded as these algorithms are applied to ever larger numbers of people, although the rich and privileged may still be assessed on an individual basis rather than as one of the masses assessed by machine. People unfairly denied opportunities as a result of this software are “collateral damage”.

The algorithms optimize a payoff designed by the developers of the model - in the case of many private sector models, monetary profit - but who is to say that this is the optimum payoff for society as a whole?

Chapter 1, Bomb Parts - What is a Model? The author explains how her management of her family’s cooking is a form of “data model”: the inputs are family preferences, appetite that day, available food, and special cases such as birthdays; the output is “family satisfaction”; and the model determines the menu for the day. The model could be pre-programmed with a set of rules to determine its menu or could be trained by observing many examples. In either case mistakes may be made, perhaps through forgetting a rule or omitting a rare case from the training data. The key point is that the model can incorporate personal biases that are not visible - in the menu model, towards healthy food and away from ice cream.

Models don’t have to be complicated to be effective; a smoke alarm is a model intended to identify fire that operates on only a single input, the concentration of smoke particles.

Three questions are posited to evaluate models. First, is the model understandable by - or even visible to - the people to whom it is applied? Second, does the model work in the subjects’ interests? Is it fair, or can it cause unjust damage? Third, does the model scale - can it be applied in ever wider circumstances, reinforcing its hidden biases as it spreads?

An example is the model used to help determine sentence length in US courts, based on the predicted risk of recidivism. The model takes into account factors, such as previous criminal record or age of first contact with the police, which would not be admissible as evidence in court and which may unfairly discriminate against certain sections of the population.

Chapter 2 covers the role of algorithms in the financial crisis, noting two key issues that led to the collapse of the mortgage-backed securities market. First, the assumption that the models had been subject to proper mathematical vetting; in reality few people understood the mathematical and statistical issues, and many who did lacked the integrity to speak up, especially as initial success had created its own feedback loop encouraging growth in the market. Second, modern computing power had allowed a massive secondary infrastructure to grow around the market - credit default swaps, CDOs and the like - that instead of diversifying the risk masked, magnified and concentrated it.

A very interesting point is made from the author’s experience at a risk assessment firm: many traders are remunerated based on their Sharpe ratios - roughly, returns relative to the risk taken - and are accordingly motivated to “...actively seek to underestimate...” risk in order to manipulate the ratio and hence their bonuses. The contrast is with hedge funds, which genuinely care about risk because their own money is at stake, whereas traders at large financial institutions have no capital of their own on the line.
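
For reference, the Sharpe ratio is conventionally defined as excess return divided by the volatility of returns, which makes the manipulation easy to see in a toy calculation (the numbers below are invented):

    def sharpe_ratio(mean_return, risk_free_rate, volatility):
        # Standard definition: excess return per unit of risk taken.
        return (mean_return - risk_free_rate) / volatility

    # A desk earning 8% over a 2% risk-free rate, with honestly measured 20% volatility:
    print(sharpe_ratio(0.08, 0.02, 0.20))   # ~0.30
    # The same desk after understating its risk as 10% volatility:
    print(sharpe_ratio(0.08, 0.02, 0.10))   # ~0.60 - same returns, double the apparent skill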

Chapter 3 looks at the impact of the algorithm developed by US News to rank US colleges. 75% of the ranking was based on proxy items intended to measure college success and 25% on subjective evaluation. The proxies selected, such as admission ratios and SAT scores, invited gaming that distorted applications: colleges would invest in sports to attract applications that could then be rejected; in an extreme case, a new Saudi college required part-time professors with large citation counts to list the college as their affiliation in order to move it up the rankings. The system does not measure the key outcome of education - what the students actually learnt at each school. The ranking generates its own feedback loop: colleges that rank high attract more applicants, generating more rejections and thus moving them further up the rankings. Wealthy applicants pay consultants to game the system.

Crucially, the original algorithm did not take college tuition costs into account. This guaranteed that the early rankings lined up with existing “common sense”, with Yale, Harvard and other wealthy colleges ranking high, but it excluded an issue critical to students - value for money from an education - while encouraging colleges to spend excessively to improve student experience and hence their ranking, without regard to cost, ultimately driving up student debt.

The US government has since made data on colleges publicly available, allowing students to check directly.

Chapter 4 looks at targeted online advertising, taking for-profit colleges that target poor and vulnerable people as a particularly nefarious case. The internet gives instant feedback on targeted marketing campaigns - through Facebook and Google in particular - where successful and failing ads can be evaluated in real time. Bayesian analysis is used to evaluate success.
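
The book doesn’t spell out the method, but one common Bayesian approach to evaluating competing ads in real time looks something like the sketch below (Thompson sampling over Beta posteriors; the ad names and click-through rates are made up):

    import random

    random.seed(2)

    ads = {"ad_a": [1, 1], "ad_b": [1, 1]}          # Beta(1, 1) priors: [clicks+1, misses+1]
    true_ctr = {"ad_a": 0.030, "ad_b": 0.036}       # hypothetical true click-through rates

    for _ in range(10_000):
        # Thompson sampling: show the ad whose sampled CTR is highest.
        choice = max(ads, key=lambda ad: random.betavariate(*ads[ad]))
        clicked = random.random() < true_ctr[choice]
        ads[choice][0 if clicked else 1] += 1       # update the chosen ad's posterior

    for ad, (a, b) in ads.items():
        print(ad, "impressions:", a + b - 2, "posterior mean CTR:", round(a / (a + b), 4))

The better ad quickly accumulates most of the impressions - exactly the real-time winnowing of successful and failing ads the chapter describes.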

20-30% of a for-profit college’s budget may go on lead generation, with more spent on recruitment than on the education itself. Specialist lead-generation firms exist that target particular communities, posting fake job ads or promising health coverage.

Chapter 5 looks at “PredPol”, an algorithm that predicts crime using the geographic location of past crimes as a key input while excluding data on race. The algorithm is successful in forecasting where crimes may occur, but mainly because it includes low-level crimes - public drunkenness, jaywalking - which are better correlated with geographic location. These crimes are also associated with poverty and, indirectly, with race; more serious crimes are harder to predict, and the system ignores some crimes altogether, such as white-collar fraud. On the surface the algorithm is objective, but under the hood it reflects value judgments about where the police direct their attention.

Issues of probable cause are also raised by algorithms that predict whether a person will commit a crime based on proxy data such as place of residence or employment status. These algorithms raise the question of how far the public is prepared to trade fairness for efficiency in police work - a trade-off currently being made without any public debate.

Chapter 6 looks at the use of algorithms in hiring decisions. Personality tests are often used to screen job applicants, but these may be “run-arounds” of laws preventing discrimination against people with disabilities and - a key consideration for the correct use of algorithms - are rarely updated by monitoring how well they actually predict job performance. This may not be a problem for a system evaluating baseball stars, whose performance can be objectively measured and who may be paid millions of dollars; rather, the burden falls on lower-paid staff, the collateral damage being individuals who are screened out by the algorithms despite being capable of doing the job.

These algorithms include negative feedback - those discriminated against fail to get good jobs, which then appears to justify the discrimination - and may be based on data that reflects historically discriminatory hiring practices. An example is given of a screening system used in a British hospital that, after many years of use, was held to discriminate against women and immigrants, repeating the discrimination embedded in the original data on which it was trained.

Chapter 7 examines the impact of algorithms in the workplace.

Job-scheduling software uses data to match staffing to peaks and troughs of demand, dehumanizing employees in the process: they may be unsure of their work schedule until only a day or two before being called on.

Another algorithm - “Cataphora” - attempted to identify the most creative and innovative employees by tracking the flow of e-mails containing certain key phrases through the e-mail system. There is little evidence that this approach worked, but employees not identified among the most creative may be first in line for termination. The algorithm suffers from the two classic problems identified earlier in the book: the difficulty of finding measurable proxy data for the things you actually want to measure (soft skills such as creativity), and the lack of any feedback on the success or failure of those measured that would help the model learn.

The chapter also explains an egregious error in the evaluation of teacher performance through SAT scores. The “Nation at Risk” report issued under the Reagan administration was intended to address a decline in teaching standards, as measured by a gradual decline in the SAT scores of graduating students over the years. In fact the data illustrated “Simpson’s Paradox”, in which the body of data as a whole shows one trend but each segment, examined separately, shows the opposite. In this case the pool of people taking the SAT had expanded over time to include enough people at the lower end of the scoring range to pull down the overall average (elite students had been sitting the test for many more years, so there was little room for the pool to grow at the higher end). When the data was segmented into narrow ranges, SAT scores had increased in every segment over time. The whole premise of the report was false.
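
A worked example of Simpson’s Paradox, with invented numbers shaped like the SAT case: both groups improve, yet the overall average falls because the mix of test takers shifts.

    # (test takers, average score) per segment - numbers invented for illustration
    year_1970 = {"elite": (900, 1100), "broader": (100, 800)}
    year_1990 = {"elite": (1000, 1120), "broader": (1000, 830)}

    def overall(year):
        takers = sum(n for n, _ in year.values())
        return sum(n * avg for n, avg in year.values()) / takers

    print(overall(year_1970))   # 1070.0
    print(overall(year_1990))   # 975.0 - lower overall, though BOTH groups improved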

Chapter 8 looks at FICO scores. FICO scores themselves have some good features as data: they are based on an individual’s own history (in contrast to aggregate data inferred from people who resemble the individual), default on a loan is relatively objectively measured, and feedback is used to improve the scoring.

Problems arise when FICO scores are used as proxies for things not so easily or objectively measured - future job success in hiring, for example - or when combined with other data into the proprietary “e-scores” firms develop for purposes such as marketing. These metrics are not transparent to consumers and risk becoming a backdoor for discrimination, because they may use factors indirectly linked to poverty or race, such as place of residence.

Chapter 9 considers how the whole insurance business model may be undermined by data algorithms. Insurance relies on pooling a wide range of risks, good and bad, so that risk can be priced in aggregate. Algorithms allow those risks to be segmented in a non-transparent way which, in addition to undermining the insurance model itself by pricing out high risks, permits price gouging of other segments. The use of non-risk-related indicators such as credit scores or place of residence in pricing motor insurance also allows price gouging. Why should a drop in a driver’s credit rating raise his premium when the risk that he has an accident is unchanged? Drivers are being assessed on the consumption patterns of their friends and neighbors rather than on their accident record.

Chapter 10 looks at the application of algorithms to civic life, in particular the micro-targeting of political messages through Facebook. Facebook has experimented with manipulating users’ emotional responses by tweaking their news feeds.

Facebook’s approach to disseminating news is contrasted with a conventional newspaper: an editor decides what to put on the front page, but that decision is seen by everyone and open to public debate, whereas the criteria by which Facebook selects articles for a news feed are opaque and unique to each viewer. Facebook micro-targeting may be one reason many Republican voters still believe the Obama “birther” and other conspiracies.

Micro-targeting of voters risks disenfranchising everyone to the benefit of those paying for it. Voters in swing states are subjected to more focused and intensive campaigns, relevant only to themselves, to the detriment of the democratic process for all.
Profile Image for Amin.
408 reviews424 followers
January 2, 2024
The title might be rendered in Persian as "سلاحهای کشتار جبری" ("Weapons of Algebraic Destruction") to better capture the author's meaning - with the caveat that "jabr" is meant here in its mathematical sense (algebra), not in the sense of determinism versus free will. Given the book's worldwide popularity and the growing role of data processing in our country, I expect we will see a translation sooner or later.

It has often happened to me, in circles full of students and researchers in data and engineering, that a discussion arises about where the technical boundaries of working with data end and the non-technical ones begin, and to what extent the answers to all problems emerge from the data on their own. Someone once told me that the human factor is no longer needed, because AI finds and delivers the solutions itself. But we know that in big data, for example, statistically significant relationships can be found between any number of arbitrary phenomena, and it is the researcher's orientation that determines which of them are given meaning. In short, there is first the problem of whether the results are meaningful at all, and then the particular use and interpretation made of them; at both stages the role of the human factor is central.

This book, however, goes a step further: it looks at algorithms whose analytical results are both meaningful and have clear applications, yet which sometimes deliberately and sometimes inadvertently create problems, mostly in the direction of undermining democracy and widening the class divide. Although the examples are drawn from American society, that setting may be the best place to observe a more radical version of using vast quantities of data in social and political decision-making, centered on economic interests and the role of corporations. One thing all these algorithms share is that their workings are opaque to outside observers and there is no feedback to correct them, which is why the author labels each of them a weapon of math destruction.

The examples run from the teacher-rating system, which sometimes works against the more deserving teachers and pushes the whole system toward score-chasing for survival rather than improving education; to insurance companies' pursuit of profitability, which ultimately harms the vulnerable; to algorithms, built on the same logic, that help police identify potential offenders, with the result that prisons fill with people drawn largely from vulnerable neighborhoods and Black communities. Moving from these social applications toward targeted marketing, for-profit universities and advertising enter the picture - in the author's words, the well-off are too informed and too savvy to be fooled by such systems. Finally we reach these algorithms' influence on social networks, where the information each person receives is shaped so as to shift the political orientation of large numbers of undecided voters before election day - in effect an obstacle to democracy.

To my mind the chapters vary in how engaging they are. One can sometimes disagree with the author, or feel that she overstates her conclusions, but on the whole I found this a readable book, made worth attending to by the author's own expertise in precisely this field.
Profile Image for Andy.
1,911 reviews576 followers
May 7, 2022
It's good to critique what the author calls WMDs, but for me this book missed the mark.

For example, the entire example of teacher-rating in DC is a red herring. The real question is: What is proven to deliver success in an urban US school system? (Answer: ) So it's valid to point out the silliness of a silly GIGO teacher-rating algorithm, but it's a distraction to fall down the vortex of dissecting that silliness in detail. We need to focus instead on facts about relevant outcomes. This is sorta kinda vaguely the theme the author seems to be developing but never fully formulates: the afterword delivers a pretty wishy-washy conclusion about the nature of objective truth.

Trying to say something nice... The flip side of the superficiality of this book is that it alights on numerous interesting factoids as it skims along. Also, it is clearly written.

Alternatives:





Profile Image for Monica.
735 reviews674 followers
October 12, 2024
I thought the book was very good, but the unintended effect of reading it 8 years after it was first published was that, for me, there was nothing new or revelatory here. But I do think it would have blown my mind in 2016 🤯🤓

4 Stars

Listened to the audio book narrated by the author. She was very good.
Profile Image for Chad Kohalyk.
296 reviews33 followers
October 3, 2016
Solid overview of the various mathematical models that govern our education, labour, wealth, and commerce. O'Neil packs in many examples and unpacks how simplistic, unfair and damaging to already-disadvantaged people these models can be. As someone who worked on the front lines of developing models for predictive internet shopping, I was familiar with many of the tactics mentioned in this book, and their ethical shortcomings (which finally led to me leaving the business). What she says is entirely true, and it makes a small number of people a lot of money. More people should be outraged. This book could help.

There are two shortcomings, though, both possibly due to its short length. First, there is no discussion at all of government abuses of all the data collection detailed within its pages. Maybe in the age of Snowden this is just assumed, but I think it is an important outcome of the big data revolution that should at least be addressed, even in passing. Secondly, I was hoping for more imaginative solutions. The is a nice idea... from twenty years and two big economic recessions ago. Advocating for stronger regulation is certainly prudent. However, I was hoping for something new.

Weapons of Math Destruction is a well documented tour of the standard examples of the misuse of math and big data, and concludes with the standard solutions. A good book to recommend to friends who need a primer on these issues that we have been facing for the last decade or so.
Profile Image for Atila Iamarino.
411 reviews4,474 followers
February 7, 2017
A very good and well-balanced discussion of what happens when we start using algorithms to evaluate people. The author goes through several automated evaluation systems and the kind of bias they introduce - who gets left out and who gets rated badly. From credit systems to insurance, health care, job openings and criminal records. A very important view of what is coming with the rise of dataism (see ).

Not that this is a new problem. Prejudice, wrong assumptions and a whole series of other failures have always disadvantaged many people. But now that we can automate the evaluation, what used to be many humans erring in many directions becomes one algorithm erring consistently in the same way. That can increase the damage done, but it also leaves room for much more efficient improvement.

The book's most important message, it seems to me, is that since the use of our data for everything is inevitable, at least knowing what is taken into account when we are evaluated goes a long way toward preventing errors with serious consequences.
Profile Image for Jack Teng.
Author8 books7 followers
October 7, 2016
I can't stress how important of a book this is. I don't think people really know how the obsession with Big Data and algorithms is about to control/influence our lives. I suppose I sound paranoid, but I really don't think I am. There are too many tinkerers out there who have some degree of competence at math and who think they'll solve all the world's problems with the next greatest optimization formula, and yet they lack even the most basic experience in asking proper research questions and understanding the major limitations of data. My own experience in theoretical ecology exposed me to empiricists wanting to get in on the modeling action and just produced garbage papers. Beware, people. Beware.
Profile Image for C.P. Cabaniss.
Author9 books123 followers
February 20, 2017
*I received a copy of this book through Netgalley. All thoughts are my own.*

This book did not turn out to be what I was expecting. I expected O'Neil to go more in depth about the math behind data she was discussing, to explore the algorithms in greater detail. That was not what I got, however. This book turned out to be more of a superficial look at some of the ways big data can/has impacted society in the author's opinion. And while I found some of the material presented interesting and informative, I was overall not impressed by the arguments the author made. She didn't seem to have a clear idea for a resolution.

More thoughts will be up on my blog:
Profile Image for Murtaza.
700 reviews3,388 followers
January 21, 2018
A short and concise overview of the problems being wrought by the algorithms that are now quietly governing our society. While the overview is useful, I was a bit surprised at how little new information there was in here. The issues that she raises should be familiar to anyone who has been following the impact of Big Data in even a cursory way. Facebook news bubbles, insurance profiles and other common phenomena are the main examples she uses. I did appreciate the social justice focus of the book, which makes sense given that the author is a former hedge fund analyst disenchanted by the malign behavior that she witnessed on Wall Street.

One key takeaway is that there is a drive for poor and middle-class people to have their lives processed by algorithms, while the rich will continue to depend on real-life personal connections to get by. In effect the masses are being turned into raw numbers with scores attached, while a small elite continue to be authentic "personalities" who live in a more nuanced way.

This book lays out some of the ways to fight back against opaque algorithmic power and make these algorithms more fair. A solid overview but nothing groundbreaking.
Profile Image for Graeme Roberts.
539 reviews36 followers
November 3, 2021

O'Neil writes well and provides some good information, but I found the fundamental thesis of Weapons of Math Destruction to be foolish and irresponsible. The title, condensed inevitably to WMD, is used promiscuously to describe any application of data science, statistics, or even information technology that she considers unfair or discriminatory. In some cases, she may be right; some applications are used by the greedy and the uncaring with insufficient thought for the consequences, but she rarely considers the positive attributes of the applications. Once branded as a WMD, a term indelibly appended to the evil trifecta of George W. Bush, Saddam Hussein, and the Iraq War, how are readers meant to consider their many positive aspects and what can be done to make them better?

At first, I thought that Ms. O'Neil had been the victim of a foolish PR person in her publishing company who had come up with this catchy name, but her ardent use of the term convinced me that she had invented it.

Ms. O'Neil's own career seemed to follow a downward trajectory from her Harvard PhD in mathematics and a teaching position at Barnard to a leading hedge fund and a series of other ethically questionable environments that left her convinced that data science, and even mathematics, are roots of much evil. I don't buy it. They are tools that can be used or misused, and her diatribe does nothing to help.

She asserts, on page 210:

If we find (as studies have already shown) that the recidivism models codify prejudice and penalize the poor, then it's time to have a look at the inputs. In this case, they include loads of birds-of-a-feather connections. They predict an individual's behavior on the basis of the people he knows, his job, and his credit rating—details that would be inadmissible in court. The fairness fix is to throw out that data.

But wait, many would say. Are we going to sacrifice the accuracy of the model for fairness? Do we have to dumb down our algorithms?

In some cases, yes. If we're going to be equal before the law, or be treated equally as voters, we cannot stand for systems that drop us into different castes and treat us differently.

I disagree. Some data is better than none (provided it is accurate and not intentionally manipulative), though we should be looking to constantly improve our data science systems. I would like to know that the recently paroled criminal moving in next door had been released by a judge in full possession of whatever facts are available. So, I suspect, would Ms. O'Neil.
Profile Image for Lauma.
232 reviews1 follower
January 29, 2021
Required reading for any data scientist.

An interesting look at various examples of mathematical models that, through human error or greed, give the world more bad than good. Any model is only as objective as its creator.

Machines can be taught discrimination.
People confuse correlation with causation.
People like to profit at other people's expense.

The author points out that dangerous mathematical models are all around us and control our lives far more than we imagine. Sky-high tuition fees at US universities? Blame the college rankings, which leave the cost of study out of the formula. Mathematical models that predict crime before it happens? A great idea with flawed execution, because police officers tend to focus on minor offenses, which occur most often in poor neighborhoods largely inhabited by people with darker skin. The more attention paid to those offenses, the more the model concludes that the most active crime is there, while domestic violence in wealthy, 'respectable' neighborhoods goes unrecorded for years. Democracy? Personalized ads for the win - different people receive different promises from one and the same politician. Facebook, too, can influence people's emotions enough to swing the outcome on election day. Election results don't need to be falsified; manipulating the right people online is enough, and mathematical models can deliver that.

There are many examples, but all the bad models share three key traits: they are opaque, they affect large groups of people, and their impact is powerful - in not the best sense. It is not that mathematical models are bad in themselves; they are just a tool in human hands, and any tool is only as good as its user. By paying enough attention to their work, any data scientist can make sure their model does not end up on this list of examples and does not harm humanity. I too have on occasion sacrificed a model's accuracy by removing human-made discrimination from the data. Admittedly, that is not easy to do if you have been trained to chase the most accurate possible outcome, but you have to understand that the highest percentage in the numbers does not always correspond to reality.

Sometimes the job of a data scientist is to know when you don't know enough.

I recommend it even to those who are not on particularly friendly terms with mathematics. The book is written in reasonably plain language and digs more into how the world we live in works than into the technical details of the mathematical models.
Profile Image for Nick Klagge.
826 reviews70 followers
February 5, 2017
Cathy O'Neil was my professor for number theory in college, I think in 2006. I thought she was a great teacher, but didn't keep in touch at all after the class. I was somewhat aware that she was involved with Occupy Wall Street's financial policy arm, and after I heard about this book, also learned that she had been co-hosting a podcast on Slate (which she is now about to leave!--but I still have a lot of back-episodes to listen to).

I'm broadly in agreement with the thrust of her argument in this book, and only have quibbles with regard to ways I think it necessarily simplifies things to be a popular-consumption nonfiction book. The main thrust of the book is that people frequently conflate "algorithmic decision-making" with "neutral decision-making", and that this is a fallacy (one that algorithm-purveyors are happy to perpetuate, for the most part). As "big data" and quantitative models get applied to more and more aspects of everyday life, it's incumbent on us to understand this, and to consider ways in which algorithmic decision-making can be problematic, biased, or dangerous.

O'Neil describes three features that characterize a "weapon of math destruction": scale, secrecy, and destructiveness. We mostly only need to focus on algorithms that have all three of these features, at least to some degree--a model that is in limited use, transparent, or harmless is not much of a cause for concern. She gives examples of algorithms in many fields that she sees as meeting all three of these criteria. One example is recidivism risk modeling, which is now used in many states to determine, at least in part, criminal sentencing. Errors or bias in these algorithms may result in additional years behind bars for the individuals they apply to, and they are both widespread and not publicly disclosed. There are many other interesting (/troubling) examples in the book, such as teacher value-added models.

An emergent property of many such algorithms is that they may engender undesirable feedback loops. For example, a recidivism risk model will be biased against black people if it is trained on historical data that cover an environment characterized by bias against black people. (If black people are more heavily prosecuted in general, they are likely to appear as higher recidivism risks, and even if the algorithm doesn't use race directly, it will pick up on correlated factors and amount to the same thing.) This is largely a function of the opacity characteristic--if an algorithm is publicly disclosed, people can bring scrutiny to it and highlight flaws.
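
A toy sketch of that mechanism (my own illustration with invented numbers, not data from the book): even when the sensitive attribute is never given to the model, training on arrest records produced by uneven policing reproduces the bias.

    import random

    random.seed(3)

    # Two groups with the SAME true offense rate; one is policed far more heavily,
    # so its offenses are far more likely to turn into recorded arrests.
    rows = []
    for _ in range(100_000):
        group = random.choice(["heavily_policed", "lightly_policed"])
        offended = random.random() < 0.2
        caught_prob = 0.9 if group == "heavily_policed" else 0.3
        arrested = offended and random.random() < caught_prob
        rows.append((group, arrested))

    # Any model trained on arrest history alone will "learn" that one group is
    # three times riskier, even though group membership was never an input.
    for g in ("heavily_policed", "lightly_policed"):
        arrests = [a for group, a in rows if group == g]
        print(g, "recorded arrest rate:", round(sum(arrests) / len(arrests), 3))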

The issue of model opacity is an especially interesting one to me, as someone who works on regulatory financial models that are intentionally not disclosed. There are strong reasons for not disclosing any models used for high-stakes decisions (in my case, setting minimum capital levels for banks). A primary concern is that a transparent algorithm will be "gamed", in the sense that those subject to it will figure out ways to make themselves "look good" to the algorithm that are driven more by the details and limitations of the model rather than by the underlying substance. A second concern is that a transparent regulatory model can encourage a "monoculture" in which those subject to it will simply adopt the model for themselves, rather than developing their own models that, while still flawed, will have different flaws than the regulatory model.

I don't think there is an obvious solution to this transparency problem. One solution that I definitely don't agree with is to eschew quantitative decision-making altogether. As O'Neil clearly states in the book, we shouldn't assume that pre-algorithmic decision-making was unbiased either--it seems quite apparent that, for example, there is bias in judgmental sentencing, perhaps more than in algorithmic sentencing. O'Neil herself has one proposed solution to this (which I don't think she really discusses in the book)--she has started a consulting company whose intent is to audit existing algorithms for potential biases or other damaging impacts. This would allow some degree of independent assessment while not disclosing the model generally. I think this is an interesting idea and I hope it takes off, but there are limitations. Especially in the private sector, it's not clear why a company would voluntarily request such an audit, especially if an algorithm is making them lots of money. Fear of regulatory penalties could be one motivation, but we're clearly entering an era of deregulation. Regulators themselves might force audits, but again that requires a strong regulator (also, who watches the watchers?).

One approach that I think could make sense in at least some contexts is a hybrid algorithmic-judgmental process. (At least right now, hybrid processes can be most effective in many fields--for example, the best chess player is neither a computer nor a human, but a human assisted by a computer.) To take the example of recidivism risk, we might have an algorithm that is publicly disclosed that produces a publicly-disclosed outcome to the judge (perhaps a recommended range). The judge may then choose to depart from the recommendation, but needs to give a written description of her reasoning for doing so. In this way, the algorithm can be audited by any outside party for potential biases, but the final judgmental step serves as insurance against flagrant cases of gaming the system, or cases with significant factors that are not considered by the algorithm.
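
A minimal sketch of what such a hybrid process could look like (entirely my own construction; the model, its inputs and the numbers are placeholders): a disclosed algorithm produces a recommended range, and the judge may depart from it only with a written, auditable reason.

    from typing import Optional, Tuple

    def recommended_range(prior_convictions: int) -> Tuple[int, int]:
        # Placeholder for a publicly disclosed model; a real one would use more inputs.
        base = 12 + 6 * prior_convictions
        return base, base + 12

    def record_sentence(prior_convictions: int, judge_months: int,
                        written_reason: Optional[str] = None):
        low, high = recommended_range(prior_convictions)
        departs = not (low <= judge_months <= high)
        if departs and not written_reason:
            raise ValueError("departing from the recommended range requires a written reason")
        # Both the recommendation and any departure reason go on the public record.
        return {"recommended": (low, high), "sentence": judge_months,
                "departure_reason": written_reason}

    print(record_sentence(1, judge_months=20))   # within range, no reason needed
    print(record_sentence(1, judge_months=36,
                          written_reason="aggravating factors not captured by the model"))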

As I said earlier, my only real quibbles with the book are around simplifications that I think are a reality of publishing a non-fiction book for popular consumption. For example, the terminology of "WMD" encourages us to think in a binary way (is it or isn't it one?), rather than seeing a continuum from OK to troubling, which I think is a better reflection of reality.

Finally, I'll add that this book, published in September 2016, proved to be quite prescient. The public controversy about "fake news" and the Facebook newsfeed algorithm arose shortly after its publication. Interestingly, I think before this happened, few would have identified the newsfeed algorithm as a potential WMD, because the vector for "destructiveness" was non-obvious. O'Neil has written some articles on this topic since the publication of the book, which are worth looking up.
Profile Image for Jill.
2,243 reviews95 followers
September 4, 2016
This book shows the hidden ways in which the use of "Big Data" is much more far-reaching and harmful than expected. Big data refers to the massive amount of information now available because of computers that is collected and analyzed and sold to third parties.

In particular, as the author demonstrates convincingly, applications of Big Data “punish the poor and the oppressed in our society, while making the rich richer.” She paints a sobering picture.

The author calls the mathematical models employing Big Data and used to such harmful effect “Weapons of Math Destruction” or WMDs.

In WMDs, she explains, “poisonous assumptions . . . camouflaged by math go largely untested and unquestioned.” They create their own toxic feedback loops, and, to an extent which shocked me, guide decisions in a large variety of areas ranging from advertising to prisons to healthcare to hiring and firing decisions. Most importantly, because they rely on esoteric mathematical models, no matter that they are many times based on biased and/or erroneous premises:

“They’re opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target, or ‘optimize’ millions of people.”

The goal is always profit, but what is lost is fairness, the recognition of individual exceptions, and simple compassion and humanity, adding to the inequality gap, not to mention downward spirals for some unfortunate victims from which it is almost impossible to escape.

I am not at all well-versed in math, but the author manages to explain how all this works without requiring that one understand specific algorithms. She provides examples from the worlds of teacher evaluations, hiring decisions generally, advertising, insurance, police programs, college admissions, lending and credit evaluation, and political targeting.

One of the saddest chapters (and they are all sad, unfortunately) is about the many for-profit universities (Trump University comes to mind) that specifically target people in great need, selling them overpriced promises of success. Her quotes from the marketing materials of these places are horrifying. They look for individuals who are “isolated,” with “low self esteem” who have “few people in their lives who care about them” and who feel “stuck.” She shows how they use google searches, residential data, and Facebook posts, inter alia, to find “the most desperate among us at enormous scale”:

“In education, they promise what’s usually a false road to prosperity, while also calculating how to maximize the dollars they draw from each prospect. Their operations cause immense and nefarious feedback loops and leave their customers buried under mountains of debt.”

The chapter on the way “stop and frisk” policing operates is also very depressing; and in truth we have seen the tragic results in city after city.

The fact is, the whole book is rather a downer, albeit an important one. Although O’Neil cites a few programs that have used Big Data to help people rather than to enrich a few and oppress the rest, can one really think that “moral imagination” can take precedence over prejudice and greed? Personally, I’m not so sure. The author provides ideas about how to change (and importantly, regulate) uses of Big Data, but she is more optimistic than I am, ending on a positive note:

“We must come together to police these WMDs, to tame and disarm them. My hope is that they’ll be remembered, like the deadly coal mines of a century ago, as relics of the early days of this new revolution, before we learned how to bring fairness and accountability to the age of data. Math deserves much better than WMDs, and democracy does too.”

Evaluation: I hope this important book gets a lot of attention. My husband always makes the argument about privacy concerns that what do we care if we’ve done nothing wrong? This book shows how, astoundingly, that isn’t enough to stop Big Data from hurting us in many aspects of our lives. It is a critical lesson for today’s world, and the world of our children.
Profile Image for Kurt Pankau.
Author12 books21 followers
November 1, 2016
My father, in one of his grouchier get-off-my-lawn moments, complained to me about computers in the workplace. Specifically, it bothered him that people were so trusting in the software that they were unwilling to gut-check outputs. This conversation happened maybe twenty years ago, but it stuck with me, and part of the reason is because he was absolutely right. We've all heard the phrase "garbage in, garbage out" to describe this phenomenon.

O'Neil takes the idea of "garbage in, garbage out" and compounds it, and this book is an exploration of the pitfalls, both real and potential, involved in letting algorithms dictate our lives, in everything from teacher retention to prison recidivism to workplace wellness programs. I'm a huge fan of O'Neil's contributions to the Slate Money podcast, which is how I learned of this book in the first place. And if you're at all interested in the Big Data economy, you should go read it right now. Even if you're afraid of math--in fact, especially if you're afraid of math. O'Neil's writing is not overly technical. This book is a very easy read and it is dense with ideas.

Many of these ideas resonated the instant they hit my brain, making me recognize things that had been around me for months or years. To give an example, O'Neil has a chapter on insurance and describes how more insurers are using e-scores based on "birds-of-a-feather" logic and demographic information. She points out the extent to which these things can become ridiculous, where having a drunk driving conviction can result in LOWER car insurance premiums than having a bad credit rating. While reading this, I remembered the "Sorta Marge" commercial Esurance put out and thought "Oh, that's what that was about."

There's a big social justice component to this book that hinges on the fact that algorithms sacrifice individuals in favor of trends, and in doing so they create feedback effects. Another example: some states include the defendant's zip code in sentencing models. This would be inadmissible in a court case, because it's a proxy for race and/or income and is irrelevant to the individual's guilt or innocence. But sentencing modelers argue that it accurately predicts how likely he or she is to be a repeat offender. So a person from the "wrong" zip code gets a longer sentence; then it's harder for him/her to find a place in society after release, and they end up going back to jail. The model doesn't just predict, it reinforces, and this is a problem. Because inherent in these models is an idea that O'Neil circles back on again and again: since we use these models to help shape the future, they have to reflect the kind of future we want to live in rather than just codifying the past.

It's another book I desperately want everyone I know to go out and read so we can have lengthy discussions about it in bars. Here's my only real gripe: I hate the phrase "Weapons of Math Destruction." It's an excellent title and a key concept in the book, but O'Neil uses "WMD" as shorthand throughout. Every time I saw it, I had to remind myself that it meant "Math" destruction instead of "Mass" destruction, and by about Chapter 5 the joke had stopped being funny to me. End rant.

I loved this book and I will no doubt be reading it again. I could talk about it for a long time, so I'm just going to stop. But, as O'Neil points out, we're in the early days of Big Data. These mathematical models are affecting your life whether you realize it or not, and they don't have to be pernicious. But now is the time to start informing yourself if you want to help shape the conversation tomorrow.
Profile Image for Wick Welker.
Author8 books616 followers
December 9, 2020
Big data codifies prejudice and penalizes the poor.

A decent take on the "garbage in-garbage out" theory of data utilization. Cathy O'Neil appears aptly qualified to comment on the phenomenon of data models becoming an amplified version of society's biggest problems. O'Neil has coined her own term, Weapons of Math Destruction (WMDs), to lift the veil on a hidden but very active phenomenon that is perpetuating the racial and class biases that are already prevalent.

Per O'Neil, a WMD is any data algorithm used by a governmental or private organization to increase output, decrease waste and offload critical thinking from humans to computers. The problem with WMDs is that they are not dynamic models that adapt and change based on a feedback loop. Instead, old data is fed into these machines, which crunch the numbers and output a result considered sacrosanct by whoever is using the algorithm. The WMD almost becomes like a mystic--you cannot question the WMD; it is beyond repudiation. Yet the output is opaque and the result compounds existing problems.

Examples abound. School districts use WMDs to determine who the "bad teachers" are and then get rid of them. Yet these programs will score a teacher at the bottom one year and at the top the next in an arbitrary way. The issue is that a teacher has only a small number of students, not, say, 10,000. Such low numbers make the data amplify noise, not signal. Other models, like baseball statistics, can be effective because they are wholly transparent, whereas WMDs use proprietary algorithms. Bad models are opaque, blind to pertinent information and not dynamic.

Data models become a black box beyond scrutiny, with a positive feedback loop that serves as self-fulfilling prophecy. The US News & World Report ranking offers a great example of becoming the standard to which college data models are fit. The metrics become the target, completely invalidating the list to begin with. The cost of education isn't put into the formula, which creates a "gilded checkbook" for colleges; tuition has sky-rocketed as a result. Predatory for-profit colleges use WMDs on social media to take advantage of the vulnerable.

Geography used in crime WMDs is a surrogate for poverty and race, because cities are highly racially segregated to begin with. Data models for lending, insurance and employment based on zip code amplify racial wealth disparities. Prison recidivism scores from WMDs have exactly the same problem. WMD-driven scheduling practices create unstable work-life balances for the impoverished. Credit scores used as proxies in hiring compound the feedback loop of poverty and debt: credit is a marker of wealth, and wealth is a marker of race. All of this data goes into WMDs and amplifies the problems once again. There is also asymmetry in political micro-targeting thanks to WMDs.

Overall an informative, if a tad outdated (written in 2016), read about the misuse of bad data. I recommend it.
Profile Image for Tanja Berg.
2,181 reviews529 followers
February 25, 2018
Much of our lives is now influenced by algorithms. These are deemed neutral, but in fact many are based on models that are glaringly discriminatory. Zip codes stand in for race, for example. The poor are being particularly targeted for pay-day loans. Teachers are measured on opaque proxies that do not reflect their skills at all. Your credit score is as much a result of the group of people you are thrown in with as of your own behavior. These algorithms are what the author calls "weapons of math destruction".

Every online decision you take, every click you make, is being monitored. Your ads are tailored, and so is your Facebook news feed. "Big brother sees you" has taken on a whole new meaning since "1984" was written nearly seventy years ago. We willingly leave a trail of information across a range of sites. We take stupid tests on FB that sell information about ourselves and our friends to the highest bidder. Sometimes it seems that you only need to consider making an investment for the ads to start trailing your feed and your online existence.

Turns out that in making hiring decisions, intelligence tests are prohibited - that is why personality tests flourish. Personality is being used as a proxy for intelligence. This despite the fact that there is little correlation between personality and job performance and that only one trait - conscientiousness - bears any correlation to work performance at all. Job seekers aren't told when their personality test red-flags them even for minimum wage jobs.

This book reinforces my notion that we're all screwed. Still, awareness that the internet bubble we all live in does not represent the ultimate truth would be beneficial.

Profile Image for Aaron.
203 reviews43 followers
March 3, 2017
We've all seen the Big Data books: the future is now! A/B testing forever! AlphaGo crushed it! OkCupid says you shouldn't have a shirtless fish pic, you adorably dull redneck!

But Big Data has a darkside, and O'Neil goes through each segment of our life to show how these "models" can be used against us, to extract goods from us, and to keep us poor. Unfortunately, she also loses her argumentative power that could come with nuance, and she has to disregard nuance in order to make it understandable to the layperson (in other words, I don't think she's very charitable to the layperson).

The first big issue she brings up, and has lots of evidence for, is transparency. Lots of the Big Data models we use (and are fed into) on a daily basis are not transparent. They're opaque equations sitting on a server farm. A teacher being graded by a value-added model doesn't know where her score comes from. A potential hire doesn't know why he failed his psych evaluation. A criminal standing before a judge doesn't know why the score said he'd be more likely to return to jail.

This is honestly one of my biggest takeaways, and it hits close to home. When you apply for a credit card, they tell you why you failed to get one. You can access your "data point" and know why you have the FICO score you do. Google and Facebook tell you in their settings which metrics they use to serve you ads. I know that Amazon is trying to get me to buy another wallet even though I just bought a wallet, because they showed me five hundred ads for another wallet. But what about those ads that Forbes tries to serve me? The cookies sitting on my computer watching me? I have no idea what they're doing and no ready way to find out. What about the non-regulated credit systems out there in Web 2.0 land? Bank of America controls for race and tries to stop redlining when it makes a new policy, but will Peter Thiel try to do that when he invests in E-Corp? Probably not.

The second big problem with some Big Data systems is that they create feedback loops that increase inequality. Here, O'Neil is super weak except when she brings up the criminal justice example: we could be using big data to help keep people out of prison and build programs that lower recidivism, but instead we're using it as a way to keep white people out of prison... but am I really supposed to believe that ads help make poor people poorer? She brings up for-profit schools using targeted ads to lure immigrants and poor people into massive student debt to make a profit and, while I admit that's super shady, it's not the targeted ads' fault, is it?

The third problem with these "Weapons of Math Destruction" is that they often rest on skewed data. This is the old line "garbage in, garbage out", except now it's "racist/sexist garbage in, racist/sexist garbage out." For example, if you build an employment system that filters resumes, then train it on a bunch of older resumes, you're inputting the bias of those older resumes. So if the guy who read those resumes was racist, you might be teaching the model to be racist.

Technically O'Neil has two other "bad points" about "WMDs" but they're just about scale.

Now, there are a lot of problems in this book, and O'Neil goes off on tangents. For one thing, the WMDs she brings up are less "weapons" than symptoms of bigger societal problems. Take "democracy": our current political system allows a few people - those that live in Orlando, Florida and Pennsylvania, basically - to choose who will be the President. This is messed up, but it means that the Democratic Party could build a powerful machine learning system that most efficiently spent money in locations to help change hearts and minds and win. She really dislikes this Big Data system, and says it's a threat to democracy...

... but the electoral system itself is giant problem and threat to democracy (see: election of 2000)! Big Data has nothing to do with it!

The book ought to have been longer, and it ought to have included more counterexamples of positive data models (I can recall only two: FICO and some housing model). I think she should have, if not included handwritten equations or step-by-step instructions, at least given some background on actual data science. The way it is written makes it seem like she's a magician-mathematician who wandered down from the ivory tower, realized that bankers were using magic for evil, and now wants to raise hell.

But I guess if I wanted authors to stop writing popular non-fiction books that they A/B tested on their blogs and turned into TED talks, I should stop reading popular non-fiction.
Profile Image for Krysten.
531 reviews22 followers
September 29, 2016
"Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that's something only humans can provide."

So, for work, I read a lot about big data. A lot. And it's all basically *jackoff motion* uhhnnnggg big dataaaa unnnnnnngggghhh yeah. And that makes me want to die.

This book is refreshingly critical of big data and algorithms, from a blessedly human approach. You might expect a lot of statistics and dryness, but it's mostly real-life stories illustrating O'Neil's arguments, which makes it easy to read and care about. It's also a fairly quick read and doesn't waste a lot of time defining concepts. O'Neil gets right to it, and her arguments are powerful. If I ever wonder why I can't seem to get my credit card APR down to a reasonable level even though my credit score is excellent, I have to think - maybe it's my zip code! So many pieces of data are taken into account for so many things that end up affecting poor people and people of color in wildly disproportionate ways. You might pay 5x more for auto insurance because you drive through "bad" zip codes at certain times of the day.

This book is pretty amazing and I'm so, so glad I read it.