We are Data explores what identity means in an algorithmic age: how it works, how our lives are controlled by it, and how we can resist it.
Algorithms are everywhere, organizing the near limitless data that exists in our world. Derived from our every search, like, click, and purchase, algorithms determine the news we get, the ads we see, the information accessible to us and even who our friends are. These complex configurations not only form knowledge and social relationships in the digital and physical world, but also determine who we are and who we can be, both on and offline.
Algorithms create and recreate us, using our data to assign and reassign our gender, race, sexuality, and citizenship status. They can recognize us as celebrities or mark us as terrorists. In this era of ubiquitous surveillance, contemporary data collection entails more than gathering information about us. Entities like Google, Facebook, and the NSA also decide what that information means, constructing our worlds and the identities we inhabit in the process.
We have little control over who we algorithmically are. Our identities are made useful not for us but for someone else. Through a series of entertaining and engaging examples, John Cheney-Lippold draws on the social constructions of identity to advance a new understanding of our algorithmic identities.
Read this: "More comprehensively, an algorithmic gender's corrupt univocality substitutes for the reflexive interplay implicit in the gender's constructionism."
This is what you will suffer through. When people write like this, I think that either they are so far up in their academic tower that they have no idea how to explain an idea to another mammal, or they are obfuscating simply to prove to anyone who will listen that they belong in that tower. Either way, if you can't dumb it down enough to get your point across, you're not communicating. If you're not communicating, you're just background noise.
Or maybe me dumb. Either way me have gum for chewing! Oops, me tripped. Maybe if I no chewed gum at same time I read book, I would make it past 35 page (seriously?) introduction! Idea of throwing away gum make me sad. Idea of reading this book again make me sadder. Maybe I put gum in book so no one else be sad! Yay gum! Oops, me tripped again.
If suffering is the key to enlightenment, then grab a seat under the Bodhi tree, because this might just be your ticket to Nirvana. And it's 300 pages, so you might be able to take a few of your friends along. Bring gum. It's a long ride.
We live in a world of ubiquitous networked communication, almost entirely dependent on internet resources that are profoundly woven into the stuff of our daily lives, and the persistence of these resources seems inevitable.
Today, Google records data from more than a billion Google users, more than three billion search queries a day, more than 425 million Gmail accounts, and traffic from an estimated one million websites, including almost half of the ten thousand most visited. I wanted to know the basic elements underlying these resourceful providers, who offer us services for our social and emotional fulfilment.
The book describes ways in which algorithms work on the data and metadata that we produce through our actions online. I don't have any expertise in data-related fields, but that didn't matter when it came to reading the book. The author explains the technical situations with clear examples from time to time.
The algorithmic patterns of internet giants like Google and Facebook are presented through cases such as the algorithms that decide whether someone is a celebrity, language translation (Google), the individual and dividual aspects of our data surveillance, and the shifting odds of my being an American citizen for a while and becoming a foreigner occasionally, along with some that disillusion the reader by briefly explaining the problems and beliefs of online privacy with a minute of silence.
It reminds me that we also live in a world of ubiquitous surveillance, a world where these same technologies have helped spawn an impressive network of governmental, commercial, and unaffiliated infrastructures of mass observation and control, and of what our ultimate fate may be on the digital front representing the 'Us'. The book will be available by May 2, 2017 from NYU Publications.
Some evocative excerpts:
"When we are made of data, we are not ourselves in terms of atoms. Rather, we are who we are in terms of data. this book makes are not claims about algorithms at large. Rather, they are attempts to show how data about us is used to produce new versions of the world - versions that might differ greatly from their non-algorithmic counterparts"
"We are not just categorized. We are also manipulated and controlled, made homogeneous as entries in lines of a database or records list."
"If you use Facebook, short of making a new Facebook, you will be under the terms of service and thus the terms of Facebook - same with Netflix, Twitter, and any other (capitalist) web presence. We are always on unequal footing. Everything from plane prices to friends to news content to even whom we might date is determined for us on the basis of how our data is made useful. Rarely in life are we so unknowing of the knowledge that makes us, and our world, in such rarefied ways."
Last year, I read a book on why we want social sites. Well, this book helps us comprehend why social sites want us.
About the Author
John Cheney-Lippold is a Professor at the University of Michigan. His research and teaching explore the relationship between new media, technology, identity, and the concept of privacy.
I'm thankful to NetGalley and New York University Publishing for the opportunity.
The author looks at the present and future of identity online. Every net search we make, stored and classified, tells some databank something about us, whether characteristics are correctly assumed or not. Referencing Frank Pasquale's term Black Box Society, a book I can recommend, the author describes the complex algorithms and various purposes that store and classify data about people, as individuals or groups.
Cheney-Lippold mentions that these judgements are used to show us specifically targeted ads. Ads, some of them infected with spyware and malevolent bugs, are the reason to use ad blockers; he mentions neither. I seldom see an ad on my own computers. Privacy law in Europe is a separate issue as search engines have to remove outdated results if a customer complains. And he explains that Google, say, may assign us to categories like male / female, but does not care if we are, if we search and buy like that category. Layers of identity build up for race, age, country of residence etc. Unlike traditional role boxes however, Google's are more dynamic, shifting with trends and new data input.
Critical scholars, philosophers (one discussing Civilization III), digital media commentators and industry experts are all quoted. Concrete examples are also shown, such as a white and a black store assistant who found that HP software could only face-track the white one. I am sure the surveillance techs are working hard on this as we write. We're told that in 2012 the Department of Labor's statistics showed the top ten Silicon Valley companies employed 6% Hispanic and 4% black workers. At executive and higher level this was 3% and 1%. (I'm wondering how many women they employed.) And crime data shows some people, whose associates experience the criminal justice system, are more at risk of dealing with crime themselves. We don't really need to be told this to understand it, but some police are already using generated patterns to knock on doors of 'at risk' citizens.
A chapter on using data is rather scarily about using what looks like terrorist-involved web or phone activity to get someone labelled as worth a drone strike. Who they are doesn't matter, it's what they are thought to be doing. And sometimes that is a false assumption. We see a little of data mining for text associations. And an amusing anecdote is that Google thinks a neuroscientist researcher who is a young woman, is actually an older man, because she spends all her time reading science articles written by older men. So I imagine they won't be advertising high heels to her. Did you know that 'Angry Birds' was in some way profiling your sexual practices? And leaking its conclusions through bad security?
The second chapter is about control; from computer games to enabling some people to access buildings and not others. Health programmes exist at the government and personal level, including self-tracking with IoT devices. I don't see the term Internet of Things used.
Subjectivity is the next topic, comparing the NSA to Google. Leakers are discussed, Assange and Snowden. We're warned about receiving mails from someone using Tor. The gender question returns. And Facebook knows or assumes a lot about you, whether you use it or not, from what others post about you. An airline or hotel knows not only if you are a returning customer, but what kind of computer you are using, and may adjust its price accordingly.
Privacy begins with the chilling case of a man whose agonised phone call to ask for an ambulance was met by an operator running through a list of possible symptoms he didn't have; the man later died. Personal privacy, we're told, is something we don't really have any more. We have patient and social security records, or use a store loyalty card. Some gay people are identified as such by big data; others may be erroneously identified that way. The author suggests using a program that throws random search terms into the data stream constantly, obfuscating the real searches. (Methinks some of those fakes could get you into trouble, and can you prove they weren't typed by you?) And the Tor browser is described, with some drawbacks specified.
This author is Assistant Professor of American Culture at the University of Michigan. I found the book densely written in places, suited to a university text rather than a general readership, which is why I am giving four stars, though it may be an excellent scholarly work. American-centric, discussing the abstract and experience of big data. No mention of Python, a language used to classify and interpret words from text, nor of the physical complexities and expansion of the IoT and server or storage banks. Terms like material temporality, epistemological gaps, antiessentialism, an infinitely material posthuman assemblage.
Graphs, digitally accented photos and still frames are included to demonstrate points. Notes P269 - 303. I counted 110 names that I could be sure were female, including George Eliot. Women were quoted more on personal identity and men more on counter terrorism. I downloaded an ARC from NetGalley. This is an unbiased review.
Disclosure: *Was given a copy of this book by the publisher for an honest opinion*
Wow... how does one describe this book? The book synopsis definitely interested me but at the same time I wondered if it would be dry and bore me to death. The opposite was true! The author did a wonderful job at laying out an informational book on this subject. John Cheney-Lippold uses examples to show the reader how our advancement in technology and the masses of our personal data (perhaps collected through surveillance) change the human experience and our identity. For me, the book was very thought provoking. Although I was aware of how data collected on each of us is already affecting our lives, this book put this topic at the forefront of my mind. I have observed that many people either forget or don't want to know that everything they do is adding digital records to their 'life' folder. It's something I feel everyone should remember and keep in mind always. Even now, some data is being used in negative ways towards us. As each year passes, our data files grow. What will it be like in 10 years? The author uses humorous examples such as not getting a job as Santa because our digital files indicate we don't like red. This is a trivial example but the meaning behind it is huge. I definitely recommend this book to all who want to take one step closer to understanding more of how 'our data' affects each of us in our lives.
We Are Data delves into our surveillance society: on the internet, and increasingly in person, algorithms assign meanings to our behaviours to attempt to determine our identity. Your browsing history, purchase history, location history, expressed opinions, and interactions make "you" to algorithms, and the outcomes feed into algorithms, often with real decision-making or decision-influencing power. This process raises many issues: Through these assessments, algorithms get to decide the meaning of these labels ('race', 'gender', etc) through processes accountable to no-one. Your 'individual privacy' is reduced to the point of oblivion. You cease being a person, with knowledge, needs and wants, and become a set of ever-shifting data characteristics with no consistency or transparency to you, yet which have consequences ranging from advertising preferences to life and death decisions in the case of automated triage systems. This state of affairs kills people.
While the book itself examines very interesting phenomena, it can be quite difficult to parse at times - references to Foucault, while eventually helpful, inevitably lead to confusion. I actually had to pick it up and try several times to get through the whole thing. This is one of those "great if you're interested, terrible if you're not" tomes. If I was to re-write it, I'd probably cut about a third of it, but then I would also admit I (likely deliberately) let some of it pass me by - I still can't explain what a "dividual" is concisely after 265 pages.
Side Note: Reading this while also playing Watch Dogs: Legion made for a more thought provoking experience, and provided some colour to the web Cheney-Lippold unthreads. It also exposed quite how surface-level and un-clever the writing of WD:L is on its own, despite the potential of the themes and the setting.
This book provides a lot of great insight into the datafied world we live in and creates a new language to help us navigate it. Its central topic is the dividual self, which is generated algorithmically, and somewhat surreptitiously, via our tracked moves on electronic platforms and with electronic devices.
I don't think there's a whole lot of new information about data being collected about us, here, but the book closely examines how the interpretations of our collected data end up inadvertently creating alternative versions of our selves. Most of us nowadays are largely aware that everything we type into Google search and post on Facebook is collected - this is why we get to do all of these things for free - but what does all of this collected data mean? The author uses a lot of philosophy and social theory to help lend some meaning to the process and what it potentially means for us. Here's a paragraph from page 166 that exemplifies this:
"I propose that we see our position within this assemblaged algorithmic identity as unsuited for fixity and thus akin to gossip. Scholar San Jeong Cho writes in her work on women's subjectivity in nineteenth-century English literature that "gossip is a vehicle of making what is considered 'trifling and silly' personal matters in the discourse of official histories meaningful and significant, because gossip secularizes the universalized notion of life by representing the specific ordeal of life and immediate human frailty.""
"We Are Data" portrays all of the collected data as an alternative version of our selves - a self which is generated using 1s and 0s - which has ultimately been created outside of ourselves and we don't know exactly how. Proprietary algorithms aren't shared or reviewed with us nor do we get a chance to challenge the assumptions they make. The author treats this as a potential threat to privacy. On page 235 he quotes Richard W. Severson: "we must learn to think of personal data as an extension of the self and treat it with the same respect we would a living individual. To do otherwise runs the risk of undermining the privacy that makes self-determination possible."
The greatest takeaway at the end of the book for me was that our dividual selves are more and more defining us as algorithms gain an avowed reputation, and we grow more dependent on them, in our increasingly automated and connected world. If anything, this raises valid concerns about whether we can review and validate what's being assumed about us, and whether we ultimately have any say in it. "You, as an individual, might not want to be profiled and manipulated without your knowledge, but your fate is tied to the rest of the population whose dividual lives become the patterns by which you are recognized." P 238
This is not one of those fashionable, fun books about big data. This is an academic text dealing with the implications of our datafied age on categorization, control, subjectivity, and privacy (the section titles) written from a strongly continental stand, adopting the terminology of Foucault, Butler, Deleuze, etc. In this respect it was an outside-the-box experience for me because I usually prefer to keep my distance from the continental school, and reading how they confront the issue of datafied selves - a topic I'm very much interested in - gave me some usable insights along with new surprises about how radically my thinking differs from them.
Although I was able to tolerate and even benefit from the excessively continental content (ending, to my surprise, with a reappropriation of Ryle's "ghost in the machine"), I have two structural problems with the book:
1- The amount of quotations. After I noticed this problem, I datafied it to get my point across: as a rough measure of the quoted terms/phrases/sentences from other authors, I got a count for the left double quotation marks in InDesign. There are 1135 of them (the 'Notes' section excluded). This means an average of 4.5 quoted terms/phrases/sentences per page. Yes, there are even many single paragraphs crammed with 3 quoted sentences from different authors. It feels like reading an MA thesis and gets distracting and tiring after a while: a constant struggle to adapt in and out of the context and the jargon of 4 different authors every page. Although a good practice in normal circumstances, the fact that every quotation starts/ends with the full name and occupation of its author contributes to the distraction. "Okay, I get it, lots of people wrote about this, but what were we talking about again?"
2- The amount of repetitions. The author has a number of insights to offer, but they are repeated to the point of exhaustion. There are many 10-page blocks where you learn or think of nothing new because you just read the same idea formulated over and over and over again with different sentences. "Okay, I got this 25 pages ago, can we please move on?"
So this book doesn't lack research or ideas, it just lacks editing. Cutting out many quotations and repetitions without losing much, it could be a 180-page book with much better readability. (Speaking of readability, I'm also on the fence about setting a whole book with a slab-serif typeface. Was it tiring? Was it refreshing? I'm not sure.)
In short: If you love Foucault, go for it. If you can at least tolerate armies of quotations, repetitions, and the idea that everything is power relations, you may give it a shot.
While I did learn a few things about the processes involved in big data, I ultimately had to put it down without taking away what I had assumed I would gain from the read. Other reviews have mentioned some pretty heavily constructed sentences. I fully understand them and would tend to agree.
The book was on loan from a dear friend and unfortunately, I wasn't able to complete it before it was due for return.
i had to force myself to finish this book. it felt like the author was just trying to reach a word/page count. what this book says about data: we build the algorithm, then the algorithm goes off & labels us, our data is sold to keep training the algorithm that labels us. there’s no such thing as privacy & capitalism rules us. there really wasn’t anything i learned from this book that i hadn’t already learned from a Medium article or a digital ethics class i took in college.
I almost want to give this book 2 reviews simultaneously. I feel as though this book does something particularly well that works for an audience it didn't intend. And yet, it works terribly for the audiences it does seem to approach. Because of this, I averaged my review of these two things. I will begin with the good:
This book should not be read as a "theory" of data science, formal "critique", "schema" of digital selves, "surveillance" textbook, or a book on digital "identity" theory. What it does is offer a buffet, a montage, a kaleidoscope, and/or a faceted view of how other theorists think about the "digital self" in a socio-political way. It does this by generating a series of narratives that exemplify how the digital self is built, moving seamlessly from one theoretical lens to another, compounding the potentials of how such a narrative could approach these problems. However, it never really resolves any of these problems despite occasionally proposing ethical arguments. This book is an unorthodox textbook that arranges theories comparatively for someone who might be somewhat aware of many of these theorists. This is a book that needs to be reorganized in a piecemeal fashion. To use it, someone must be well read enough to know what to borrow and why. Anyone trying to put Deleuze next to Giddens in theory of identity isn't really looking for theoretical consistency so much as critical comparison.
This brings me to the bad. If you don't have an awareness of continental traditions of philosophy or a Science and Technology Studies background, a lot of these words can easily be abused or misinterpreted by an undergraduate. This book uses lots of terminology that has been adopted into critical theory and theories of the digital; however, many of these terms are simplified here to an extreme. This is to say, this book is quickly consumable, but not theoretically pedagogical. It does not teach theory, but at times it pretends to. It does not teach criticality, but sometimes it pretends to. It does not teach data science, STS, or media theory, but at times it pretends to. What it does is introduce fear and open the door to theoretical appropriation for someone who has not read far into the citations that it represents. If you give this book to an undergraduate, be prepared to tell them that the book does not do enough to teach them anything beyond "The Internet will steal you and your friends' souls and criminalize you if you don't look out" and from the people I've spoken to who've read this book, that's essentially the dominant interpretation. But with the theories that it references, that sort of fearmongering is less weighty.
There still exists a theoretical space where the Internet does not use narrative montages reminiscent of the tones of conspiracy theorists. Don't misunderstand me, this book isn't a conspiracy theory book, but you'll walk away thinking that there is a digital boogeyman everywhere that's plotting against you. It can turn unassuming readers into conspiracy theorists of data if they don't have proper guidance. I am assuming that is not the intended goal of the author, and for people who have read a little Deleuze, a little Donna Haraway, or a little STS theory, this book is rather obvious and not as scary as it might seem.
This is a timely book on a subject that is relevant to pretty much everybody in today’s digital world. The fact that everything we do online is reduced to a set of data points that may not reflect our real selves and may change on a daily basis and be widely divergent according to each user’s specific objectives is important for people to understand. The fact that one user may categorize me as a young single female with a propensity for fashion while another user may see me as a middle aged married man with anti-government tendencies is irrelevant. The data provides the "truth" to enable the user (a marketer, a government entity etc.) to take an action relevant to its objectives.
It is therefore unfortunate that the author makes the book less accessible (or understandable) to many of the people who would benefit from a greater understanding of the subject. For example, he writes: “More comprehensively, an algorithmic gender’s corrupt univocality substitutes for the reflexive interplay implicit in gender’s social constructionism.” That sentence may be true - but it is hardly a page-turner!
Of course, like sausage making, the data aggregation and analysis process is messy and ugly, but if the sausage tastes good and the algorithm results in a welcome product offering or mitigates a potential loss then what is the harm? People willingly hand over their personal information and while this information may be anonymized, aggregated and analyzed, most people do not give a second thought as to what happens afterwards. Unless, that is, there is a data breach at a credit bureau or other data aggregator or some other adverse event occurs. Perhaps worse, the data may be used to make potentially damaging predictions about behavior (predictive policing).
Perhaps the message of this book is that there needs to be a Dummies Guide to Data Usage in the Connected World (maybe it already exists). This book may serve academia but there needs to be a text for the rest of us.
We Are Data is a very readable (for the field) sociological text about the proscriptive powers of data-fied understandings of humanity. The author's work can be viewed as an extension of Foucault's work on biopolitics.
The book ably describes the way in which 'traits' (distinct from actual traits in that they are probabilistic and impossible to pin down) are assembled and related to us. We all have a 'gender,' which is to say there is a statistical definition of 'man' and 'woman' (and perhaps others), but none of these quoted categories have a real world meaning. Not all men are 'men' - some are 'women', some aren't either. The nature of these categories does not support connecting them directly to the real world, because they do not depend on real-world definitions.
However, once getting that far, Cheney-Lippold loses his way. What are we to think about this? It's bad, of course, as these labels are applied to us and we can't access them or understand them. But...it's not clear how it's worse (or even that different) than the traditional management techniques of Biopolitics. Both use categories that are more based on the needs of the managers than lived experience. The question is how does this manifest in the real world?
Late in the book, he gives an account of how a British man died because he didn't appear like he was dying to the health service. This is probably death-by-algorithm, but it's not death by data, any more than policies of minimal treatment in the US are death by data. Cheney-Lippold can't seem to connect these unfairly-manipulated bits of data "out there" to our subjective worlds. His examples of harm are from good old-fashioned intentional refusal to see. It's not clear that the fact that the powers that be are now running around making definitions and manipulating those definitions to their hearts' content changes facts on the ground.
The book seems designed to raise our awareness about how data collected about us gives us an identity that can be used by corporations and that may not correspond to our offline identity, but in fact the two cannot be very well separated. In turn our data and traffic help shape the identity categories online. I wasn't sure whether the author was nostalgic or idealistic about other more traditional forms of identity, or simply concerned that these could be used for political action whereas online data-driven identities constructed by corporations can be used primarily for commerce and surveillance (and maybe drone strikes?), and not by us. I think the book needed a clearer sense of stakes and more of a sense of what collective, as opposed to solely individual, action might be taken. It needed fewer quotes from scholars introduced by epithets with Homeric regularity and more concrete examples. However, as someone brought up in our reading group, it is hard to write about something you have no access to (here, proprietary algorithms).
The fact that this book is super heavy on the reading part does not diminish the value of the brilliant research that birthed it. I didn’t exactly learn anything new as my grad school training had a great overview of Cheney-Lippold’s work, but it is always challenging and exciting to follow his well formed ideas popping up and tying off loose ends in later chapters well after an idea was introduced. I guess I empathise with the frustration of many reviews that fail to see the brilliance due to the academic writing filter. Nevertheless, it is an acquired taste and I, for one, enjoyed the struggle.
Not terrible, but disappointing. Privacy and gender, as concepts, do a lot of heavy lifting they maybe shouldn’t, and the chapters aren’t clearly differentiated enough. Jumps between clear writing and heavy jargon use, and often appears to be a survey more than an argument. Mostly (but not entirely) ignores other writers working on these issues now in favour of philosophers who wrote about adjacent issues decades ago.
Finally I finished this book and I am so glaaaad. Mindblowing book and I think I should read more books from John Cheney-Lippold. This will be a quick review. I mean, it is indeed really really quick. There are three main key words about this book: mass surveillance, data, and dividual privacy. And you need to highlight what dividual privacy means.
Recommended for those who are interested in learning about digital twins and the measurable type to explore how data is instrumentalized as power. This book should be on every reading list focused on critical data studies.
A bit confused who the intended audience is. Suffused with convoluted academic jargon at times and really basic observations at others. Read most of it for comps!
I’d like to preface this by saying my grandpa told me to read this book. For the actual review: the concepts presented are marvelous. John Cheney-Lippold shares fascinating tidbits into what truly makes up our identity in this digital age. However, and I find myself unsurprised saying this (should’ve known from the cover!), this book is unbearable. Cheney-Lippold went to thesaurus.com and searched for the longest, most perplexing synonyms for each and every descriptive word in this three-hundred-page ardor. This is certainly an academic text, in every sense of the word. In fact, it comes across as more of a dissertation than a novel. I have no doubt that this is a fantastic read for a computer science PhD student or a software engineer, but as for the average reader (or even moderately intelligent reader), this is too much. Superscripts appear everywhere, and far too many quotations, endless introductions, and painful mansplaining plague what should be an interesting novel. It’s a little delusional, and impossible to get through. A thirty-page introduction, seriously? A great idea with pathetic (though clearly painstaking) execution - not for me. I was disappointed. Compsci fans and research facilitators, however, I hope you enjoy!
Still intrigued? Let me give you an example. “Drawing further, the ‘flu’ is this cybernetic mechanism in epistemic form, a discursive vessel whose defining composition changes over time on the basis of new inputs.” And that was one of the more simple sentences. Try your luck if you must, but unless you plan on citing this as a source, I don’t recommend.