Name: Mining of Massive Datasets
Rating: 4.35 (19 reviews)
ISBN: 9781107015357

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

Genres: Computer Science, Programming, Technical, Science, Textbooks, Academic, Artificial Intelligence

326 pages, Hardcover

First published October 27, 2011

91 people are currently reading

1,048 people want to read

About the author

Jure Leskovec

1 book, 2 followers

Community Reviews

5 stars: 120 (48%)
4 stars: 97 (39%)
3 stars: 22 (8%)
2 stars: 5 (2%)
1 star: 1 (<1%)

Displaying 1 - 19 of 19 reviews

Ben Haley

58 reviews, 16 followers

October 31, 2011

Mining of Massive Datasets is a clear, practical, and studied exploration of how to extract meaning from huge datasets (Terabytes, Exabytes, Petabytes, oh my). I recommend the .

The book uses practical examples, including spam email, Google's PageRank, and Netflix's recommendation service, to explore the algorithms needed to process huge data on infrastructures like MapReduce.

The authors have the necessary experience to define the field. Ullman is the powerhouse behind several venerable CS textbooks, the '' (Compilers: Principles, Techniques, and Tools), and Database Systems: The Complete Book. He also advised Sergey Brin before that student went on to co-found his own small startup. The first author, Anand Rajaraman, is another one of Ullman's students who's had his own helping Amazon get its wings.

The book is not a statistical exploration, but a true computer science book. The statistics they do employ are simple. They avoid the mathematical rigour of validating their statistical approaches and take a more intuitive approach, employing the simplest statistical models they can. But calculating even simple statistics can be complex when data is distributed across hundreds of computers.

For example, Google's PageRank algorithm has to multiply together matrices that represent the entire web's link structure. If you'd like to know how that kind of work is done, this book is for you. However, the text is not a how-to focused on implementation details. This means it will age gracefully, but it will require supplementary reading before you can analyze huge data on your own.
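To make the PageRank remark concrete, here is a minimal single-machine sketch of the repeated matrix-vector multiplication, written as a loop over a sparse link table. It is an illustration, not the distributed version the book describes: the function name `pagerank`, the damping value `beta=0.85`, the iteration count, and the three-page `toy_web` graph are all assumptions chosen for the example, and every link target is assumed to appear as a key in the table.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to.

    Assumes every link target also appears as a key in `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a base share from random teleportation.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its out-links.
                share = beta * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A dead end spreads its damped rank evenly over all pages.
                share = beta * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

Each pass of the outer loop is one multiplication of the rank vector by the link matrix. At web scale the matrix no longer fits on one machine, which is the problem the review is pointing at.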

The most profound moment of this book came to me while reading about the Bloom filter. Without going into detail, the Bloom filter is capable of filtering a huge list of incoming items and answering the simple question 'have I seen this before'. But the Bloom filter is optimized so that it doesn't remember what it has seen, just that it has seen it. Therefore, the Bloom filter can recognize that an object is familiar without the ability to pinpoint when it first saw it.

This algorithm produces a remarkably human result. I know the experience of recognizing that an item is familiar without being able to figure out where I've seen it before. It seems our minds have developed something very similar to the Bloom filter to mine the data of our experience.
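The Bloom filter described in this review can be sketched in a few lines. This is a toy version under stated assumptions, not the book's presentation: the class name `BloomFilter`, the sizes `num_bits=1024` and `num_hashes=3`, the use of salted SHA-256 as the hash family, and the example addresses are all choices made for illustration. The filter stores only bits, never the items themselves, so it can report "probably seen" or "definitely not seen" but cannot say what it saw or when.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: set membership without storing the items."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Setting bits is the only record kept of the item.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for address in ["alice@example.com", "bob@example.com"]:
    seen.add(address)

print("alice@example.com" in seen)  # an added item is always reported as seen
print("carol@example.com" in seen)  # usually False; a false positive is possible
```

The trade-off is that an item never added may occasionally be reported as seen when other items happen to have set all of its bits. An item that was added is never reported as unseen.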

☘Misericordia ⚡ϟ⚡⛈⚡ ❇️❤❣

2,519 reviews, 19.2k followers

September 29, 2020

Lots of insight into assorted subjects: networks, algorithms, matrix operations for CS, machine learning and advertising (of all things). Some history on data dredging and its development. Quite a lot of data modeling made simple and easy to understand.
Takeaways:
- PageRank
- Apriori
- MapReduce
- hashing
- graphs
- SimRank
- CineMatch
- CUR decomposition
Easy material delivery and quite a lot of breadth in topic selection and coverage. The only minus I see is that there are not as many practical tasks as I would like. That's easily outweighed by the easy explanations of all kinds of unwieldy theoretical concepts.

Natalia Shakhalova

5 reviews

November 30, 2014

This is the textbook for the Mining of Massive Datasets course at Stanford. It was very helpful when taking this course on Coursera. It describes different aspects of the domain and the theory behind existing solutions (search engines, network analysis, recommender systems, online algorithms). It keeps a good balance between strict mathematical theory, with all the proofs, and references to its practical applications in modern systems. There is a wide variety of algorithms and ideas for applications in different domains. Not boring at all, I recommend it.

Mehrnaz

180 reviews, 90 followers

January 16, 2020

Great book.

nonfiction

Jami

195 reviews, 35 followers

Want to read

March 28, 2013

free own

Wojtekwalczak

16 reviews, 2 followers

May 28, 2017

Sooner or later you're going to discover problems too big to solve with most traditional approaches. The authors show a wide range of problems where size can get out of control: finding similar items, working with data streams and graphs, and many more. I loved the Coursera course, but I found the book a bit too dry. Many of these problems could be well illustrated, but the book is lacking in visuals. Still, it is a great resource written in a rather accessible manner. It could have been more entertaining, though.

machine-learning

Shane

97 reviews, 2 followers

August 6, 2013

This is more what I was looking for with the other "Big Data" book I read.
Although, this is quite a bit over my head, and more positioned at college study.

I expect this is something I will reference back to later. I read the "free" pdf version, but I'd like to have a copy of the updated version when it becomes available.

business

Yasiru

197 reviews, 138 followers

April 10, 2015

There's an up-to-date free version online and a full-fledged Stanford MOOC on Coursera. I took the course initially without much reference to the text, but while the lectures were excellent (and the whole course one of the best I've taken on the platform), I wish I'd had more time to go through the book first.

Kjn

19 reviews, 4 followers

December 20, 2014

I got a lot from this, and that was surprising as I had already read some books here. The nicest part is the Locality-Sensitive Hashing.

This is just very good quality. Quite a few ideas you can use in your practice.

Victor

72 reviews, 9 followers

September 14, 2015

I skimmed this book to decide whether to enroll in the Stanford course of the same name. I will definitely enroll in the next available session. Very interesting material about squeezing information out of big data sets.

a-i computer-science

Akash Goel

164 reviews, 13 followers

November 18, 2015

This book is definitely a great companion of the Coursera MMDS class. But it lacks a few things, such as proper introductions and a natural information flow. Good for quick reference and examples, not too great to study or understand in depth.

Bryan

670 reviews, 24 followers

April 27, 2015

It is a textbook. It is a good book for the topic. Not much to say here.

Delhi Irc

992 reviews, 24 followers

Read

October 14, 2016

Location: GG5 IRC
Accession No: DL028520

new-arrival-14-oct-2016

Nick Greenquist

121 reviews, 3 followers

June 18, 2018

One of the best books you can read in the realm of data mining, machine learning, and generally doing really cool things with piles of data.

Stein Karlsen

67 reviews, 1 follower

July 7, 2018

Contains lots of algorithms and formulas explaining the various algorithms

machine-learning

Mickaël A

137 reviews, 7 followers

December 12, 2018

A great way to learn about many of the roots of modern software engineering. I sometimes missed having more figures and visual explanations, but it's an excellent book overall.

science

Cem

48 reviews

December 5, 2019

A really good textbook for the "Foundations and Applications of Data Mining" INF 553 class.

Yk Chia

75 reviews, 1 follower

February 6, 2021

Read the part on recommendations and combined it with the online lecture. Good for theory, but not suitable if you urgently need to implement the code.

Varun Reddy

17 reviews

April 4, 2019

Brilliant introduction to data mining and its real-world uses and scope. The chapters on frequent itemsets and mining social graphs are explained brilliantly.

Highly recommend this book for anybody interested in Data Mining.

academic
