Name: Fundamentals of Data Engineering: Plan and Build Robust Data Systems
Rating: 4.2 (81 reviews)
ISBN: 9781098108304


Fundamentals of Data Engineering: Plan and Build Robust Data Systems

Joe Reis, Matt Housley


Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available in the framework of the data engineering lifecycle.

Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, governance, and deployment that are critical in any data environment regardless of the underlying technology.

This book will help you:

- Assess data engineering problems using an end-to-end data framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle

Genres: Programming, Technology, Technical, Computer Science, Nonfiction, Software, Engineering

426 pages, Paperback

First published June 25, 2022

756 people are currently reading

2,073 people want to read

About the author

Joe Reis

7 books, 12 followers


Community Reviews

5 stars

340 (44%)

4 stars

268 (35%)

3 stars

128 (16%)

2 stars

18 (2%)

1 star

8 (1%)

Displaying 1 - 30 of 81 reviews

Francois D’Agostini

61 reviews, 11 followers

July 21, 2022

I have mixed feelings about this book. In places, it gives structure to the subject of data engineering and provides high-level information. But at the same time, I felt it repeated itself a lot and never really got into more detailed architectures or information. I get that covering implementation would yield a much bigger book. But then again, keeping a high-level stance does not require such a long book.

As I read on, I felt less engaged, as much of the advice was not actionable enough.

Maybe that is just me, but I feel my learning was limited with this book.
But good writing and good advice overall.

Leo

300 reviews, 25 followers

March 3, 2023

It’s surprisingly shallow for a book of that size, so I definitely can’t recommend it to engineers that already have some experience of working with data pipelines. It might be somewhat useful for data scientists though, but I can’t speak about that audience.

Sometimes people read “tech” books to learn something new, sometimes to better structure prior knowledge, especially if they have some experience but lack the wider perspective. While reading this one I was switching between “I already know that” and “I have no idea what that means”, actually missing the educational aspect.

The book stays at such a high level of abstraction that it barely says anything at all. I see that in the rapidly evolving world of data engineering, showing examples with specific tools or frameworks would date a book very quickly, but the absence of examples makes any advice very hard to apply.

The second part is a huge improvement over the first; the chapters on storage and data structure were quite useful. I think this book would benefit from significantly cutting part 1 and expanding and deepening part 2.

Walter Ullon

318 reviews, 153 followers

November 28, 2022

As a data scientist who is increasingly having to dip his toes into the Data Engineering waters, this book has been an invaluable resource. It is very clearly laid out following the prototypical "Data Engineering Lifecycle" and its undercurrents.

Ingestion, storage, transformation, serving, cloud economics, data models, schema, security, etc... are all given due attention. Best practices and potential pitfalls are discussed in every chapter.

I was deeply unaware of the many intricacies of DE, and though this book didn't make me an expert on the subject, it at least gave me enough exposure to be able to adequately communicate with stakeholders and understand technical bottlenecks.

Highly recommended!

data-science

Ali

369 reviews

September 10, 2024

Great survey of data engineering concepts and broad design considerations. “Fundamentals” in the title is the key word for this text: after four chapters of foundations and building blocks, the in-depth engineering sections still don’t go too deep but keep exploring the current landscape for each stage of the data lifecycle. Similarly, it doesn’t get into modeling or algorithms, or specific tech stacks in the data pipeline. For me it was a good complement to Data Science for Business by Provost. For beginners this may be very verbose, so maybe read only the parts you need. For seasoned engineers, Kleppmann’s Designing Data-Intensive Applications is a much better reference.

Lorenzo Boccalon

98 reviews, 1 follower

August 11, 2022

This book covers data engineering from a high level: its lifecycle, undercurrents, technologies, and best practices.
With some effort, I've read the book front-to-back in the shortest time possible. As a newbie DE coming from DS, I empathised with the authors, who had a similar background. The book is not perfect, but perfect is the enemy of the good, and this book is "just good enough." The authors' premise was to survey the DE world with an agnostic approach, which was indeed respected. They present three phases of the data stack; past, present, and (opinionated) future:
- Old data stack: in-house datacenters with on-premises technologies. Characterised by high upfront investment and relegated to big companies.
- Modern data stack: democratised, cloud-based, and favouring open source software (possibly managed); discrete view of the data, batch-based.
- Live data stack: the natural evolution of MDS, focused on streaming analytics and near-real-time response—continuous view of the data. The authors also forecast a more "enterprisey" role for data engineers, with more diversification of skills (ML, Ops, etc.)

Ricardo Castro

8 reviews

November 27, 2024

This is an interesting book for engineers who have little or no knowledge of the topic. It covers the fundamentals and gives you a good base to explore the field further.

That said, the book is a bit too repetitive. Some sections could be simplified and page count trimmed. Despite understanding that some topics are revisited under different contexts, there's a lot of repetition in the book.

Erika RS

829 reviews254 followers

October 10, 2022

This is a dense book. Even the detailed table of contents in my personal notes is multiple pages long. However, that density is well structured. This is a technical reference book, so while you can read it end-to-end (like I did), you can also dive in and out, following your interests. It does feel a bit repetitive reading it end-to-end, but given the nature of the book, that's a reasonable trade-off. I really enjoyed this book, although sometimes I found that it spent more time explaining software engineering concepts (which I am familiar with from my career) and not quite enough explaining some concepts from the analytical side (which I am less familiar with).

In terms of the content, this book provides a deep dive into the key concepts of Data Engineering. It aims to be technology neutral so that it stays useful even as tools change. While I am sure that it will eventually become obsolete as paradigms change, I suspect that this book has many years ahead of it. When specific technologies are mentioned, they are generally used as illustrative examples rather than being the key focus of the discussion.

Part I of this book is where you learn about the basic building blocks of Data Engineering. Chapter 1 defines data engineering and describes the different types of data engineering roles.

Chapter 2 discusses the data engineering lifecycle and some key undercurrents. The concepts in this chapter provide the structure for the rest of the book. The data engineering lifecycle goes through generation from source systems, storage, ingestion, transformation, and serving data. Undercurrents include security, data management, DataOps, data architecture, orchestration, and software engineering.

Chapter 3 discusses key concepts in defining data architectures including principles and major architectural concepts. The principles matched well to those I have brought to software development generally including plan for failure, architect for scalability, build loosely coupled systems, make reversible decisions.

Chapter 4 discusses areas to consider when choosing technologies. The authors covered technical concerns (e.g., serverless vs servers), human concerns (e.g., team size and capabilities), and business concerns (e.g., cost, time to market).

After setting this detailed foundation, the rest of the book dives into depth on these topics, structuring the discussion around the data engineering lifecycle stages. These chapters had a lot of information, and I know that I won't retain it all. Instead, I look at my initial read as a way to prepopulate an index in my mind for when I need to use the concepts later. In fact, I've already referenced this book a couple of times since finishing.

Overall, this book is an excellent, albeit dense, guide to data engineering suitable for experienced software engineers, data scientists, or analysts with a technical background who want to understand data engineering and for practicing data engineers who want to bring more structure to their knowledge.

oreilly technology

Tu Pham

22 reviews, 2 followers

December 15, 2022

Highly recommended.
This book gives a high level (sometimes very detailed) overview of how every component of data engineering works: from data sources, storage to ingestion, transformation, and data serving.
It helped me a lot in structuring all my hands-on, incomplete and messy knowledge.

Kiryll Kadyrka

107 reviews, 7 followers

August 28, 2023

The book is perfect for someone who needs to become more familiar with the concept. What's important is that it barely mentions any tools and focuses on the foundational parts, the logic behind data engineering pipelines, its use cases, and applications. I just missed a deeper dive and more advanced scenarios. But as a start or a resource to bring your knowledge in order – it's unbelievably good.

eng-data

Mark Torres

16 reviews

December 24, 2024

Great introduction to data engineering, giving a practical overview of not only high-level data engineering concepts, but also how they relate to each other and to the lived day-to-day work of a data engineer. Also does a great job of embedding the role of a data engineer within enterprise and reviewing how the concepts that data engineers care about relate to the needs of stakeholders. That was probably the best part of this book, as most technical books treat technicians in a silo, whereas this book reviews how the work of a data engineer fits into wider organizational goals. Overall a great book that every data engineer should read.

Bojan Vujanic

8 reviews

December 4, 2022

Though the book is about fundamentals and therefore does not go in depth on any of the topics, I still feel it should have gone further in certain areas. There are places where a reader could think he's grasping something new when in fact the surface has barely been scratched. That being said, the book explains modern "high level", "enterprisey" data engineering really well.

Rafael Manzoni

33 reviews, 3 followers

October 7, 2023

This book handles, in a broad way, all aspects of the data lifecycle: creating, storing, ingesting, transforming, and serving.

But it fails to go deeper into each topic and address trade-offs.

For those who want to understand how to handle data in complex systems, I recommend the book Designing Data-Intensive Applications.

Thang

98 reviews, 13 followers

January 21, 2024

This book covers basic Data Engineer concepts, from ingestion, and transformation to data serving.
- Touches on different techniques for data modeling, including Inmon, Kimball, and Data Vault.
- Addresses the data pipeline's future, from batch to real-time live data stack.

Martina Sušilović

13 reviews

May 28, 2024

Gives a really high overview of the field. Seems unnecessarily long and very repetitive, no deep dives into technology. Still, a good overview for someone just getting into the field, especially from data science.

Andreas Beger

17 reviews, 1 follower

October 4, 2023

A high-level, conceptual overview of data engineering. Coming from another data profession, this book was useful for finding specific areas and topics to dig further into.

Giulio Ciacchini

344 reviews, 10 followers

March 3, 2023

As the title suggests, this textbook is a comprehensive introduction to the realm of data engineering.
It is codeless, or better tool-agnostic, which can be a pro or a con depending on the reader's background.
The authors are well aware of the fast pace at which the tech world is evolving and by choosing to give a high overview of the subject, they're trying to avoid obsolescence.
They explain the core concepts more theoretically than practically.

The text is at times a bit redundant and verbose: some concepts are explained multiple times.
I've particularly appreciated the low-level storage explanation and the architecture section.

NOTES:
Business responsibilities: know how to communicate with the non-technical and technical people; understand how to scope and gather business and product requirements; understand the cultural foundation of agile, DevOps and DataOps; Control costs.

Data engineering life-cycle: generation, ingestion, transformation, serving, analytics, machine learning.

Major undercurrents: security; data management; data governance; data architecture; orchestration; software engineering.

Principles of good data architecture: choose the common components wisely; plan for failure; architect for scalability, elasticity, availability, reliability; build loosely coupled systems; prioritize security.

A domain is the real-world subject area for which you are architecting.
A service is a set of functionalities, whose goal is to accomplish a task.
A domain can contain multiple services.

Architecture tiers
- Single tier the DB and application are tightly coupled residing on a single server (cloud or even your laptop). If a component fails, the architecture fails. Not advisable to run analytics queries against the PRD application DB, because there is a risk of overwhelming the DB and causing the application to become unavailable.
- Multitier is composed of separate layers for data, application, and business logic. The layers are bottom-up and hierarchical. Three tiers: data, application logic, presentation.
- Monolith includes as much as possible under one roof with both technical and domain coupling. This implies a lack of modularity of its components, reusing components across the architecture is very difficult.
- MicroServices comprise separate, decentralized and loosely coupled services. Each service has a specific function and is decoupled from other services operating within its domain.

Data Warehouse: organizational or technical. Traditionally it pulls data from application systems using ETL. Data marts are more refined subsets of a DWH designed to serve analytics and reporting.

Data Lake: instead of imposing tight structural limitations on data, simply dump the data, both structured and unstructured into a central location, where storage and compute power are uncoupled. Downside: very complex to manage Hadoop.

Server VS Serverless: run your own server when the usage and cost exceed the ongoing cost of running and maintaining a server.

Data Generation in Source Systems
Data is an unorganized, context-less collection of facts and figures. It exists in many places: files, APIs (REST, GraphQL), application DBs (OLTP), OLAP, logs, CRUD (create, read, update, delete), messages/streams (queues, delivery frequency, scalability, topics), NoSQL (key-value, document, wide-column, graph DBs, search, time series).

Storage
Data storage and query systems generally follow complex recipes involving distributed systems, numerous services, and multiple hardware storage layers. These systems require the right raw ingredients to function correctly.

Raw Ingredients:
- Magnetic Disk Drives use spinning platters coated with a ferromagnetic film, which a read/write head magnetizes during write operations to physically encode binary data. During read operations, the head detects the magnetic field and outputs a bitstream. The rate at which data can be read and written does not scale in proportion to disk capacity, but the drives are cheap.
- Solid-State Drives (SSDs) store data as charges in flash memory cells, eliminating the mechanical components of magnetic drives. They are quick to search, but expensive.
- Random Access Memory (RAM) is attached to the CPU, and is volatile and expensive.
- Serialization is the process of flattening and packing data into a standard format that a reader will be able to decode, suitable for storage on disk or transmission over a network.
- Compression makes data smaller and improves network performance.
- Caching stores frequently or recently accessed data in a fast access layer, we can think of archival storage as a reverse cache.
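
The serialization and compression notes above can be illustrated with a minimal Python sketch. This is an illustration only, not taken from the book: the record fields are hypothetical, and JSON plus gzip are simply convenient standard-library stand-ins for "a standard format" and "a compression codec".

```python
import gzip
import json

# Hypothetical records; the field names are made up for illustration.
records = [{"id": i, "status": "active", "country": "IT"} for i in range(1000)]

# Serialization: flatten in-memory objects into bytes that a reader can decode.
serialized = json.dumps(records).encode("utf-8")

# Compression: repetitive data shrinks, so less has to cross the network.
compressed = gzip.compress(serialized)

# Round trip: decompress, then deserialize, to recover the original objects.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(len(serialized), len(compressed), restored == records)
```

The compressed payload is smaller here because the records are highly repetitive; data with little redundancy would likely compress far less.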

Storage Systems:
- Distributed Storage coordinates the activities of multiple servers to store, retrieve, and process data faster and at a larger scale, all while providing redundancy in case a server becomes unavailable.
- Eventual consistency allows you to retrieve data quickly without verifying that you have the latest version across all the nodes; strong consistency ensures that writes to any node are first distributed with a consensus, and that any reads against the database return consistent values. For instance, DynamoDB supports both eventually consistent and strongly consistent reads.
- HDFS is based on the Google File System and was initially engineered to process data with the MapReduce programming model. It combines compute and storage on the same nodes.
- Indexes, partitions, clustering.
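
The difference between eventually consistent and strongly consistent reads can be sketched with a toy model. This is an assumption-laden illustration, not how DynamoDB or any real database is implemented: `ToyReplicatedStore`, its single leader, and the manual `replicate()` step are all hypothetical.

```python
class ToyReplicatedStore:
    """Toy model: one leader plus followers that lag until replicate() runs."""

    def __init__(self, n_followers=2):
        self.leader = {}
        self.followers = [{} for _ in range(n_followers)]

    def write(self, key, value):
        # Writes land on the leader first; followers catch up later.
        self.leader[key] = value

    def replicate(self):
        # Copy the leader's state to every follower.
        for follower in self.followers:
            follower.update(self.leader)

    def read(self, key, consistent=False):
        # A consistent read goes to the leader; otherwise a follower
        # answers, and it may not have seen the latest write yet.
        source = self.leader if consistent else self.followers[0]
        return source.get(key)


store = ToyReplicatedStore()
store.write("user:1", "v1")
stale = store.read("user:1")                   # follower has not caught up
fresh = store.read("user:1", consistent=True)  # leader has the write
store.replicate()
caught_up = store.read("user:1")               # follower now agrees
```

In DynamoDB itself the choice is made per read request through the `ConsistentRead` parameter; the toy class above only mimics the observable behaviour.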

Storage Abstractions:
- DWH is a standard OLAP data architecture.
- Data Lake conceived as a massive store where data was retained in a raw, unprocessed form; recently separation of compute and storage.
- Data Lakehouse adds features designed to streamline data management, such as schema support.
- Data platforms are ecosystems of interoperable tools with tight integration.
- Data catalog is a centralized metadata store to provide a central place to look for data.
- Colocation of compute and storage allows fast, low latency disk reads and high bandwidth.
- Cloud object storage allows ephemerality, scalability, durability and availability.

Storage Lifecycle:
- Hot data has instant or frequent access requirements, SSD is suited for fast access and reads.
- Warm data is accessed regularly, like monthly.
- Cold storage is popular for archiving data.
- Data retention based on value, time, compliance, cost.

Row-based Serialization: CSV (error-prone, poor performance); XML (slow to serialize and deserialize); JSON (standard for exchanging data over APIs) and JSONL.

Columnar Serialization: stores each column in its own set of files, which allows reading data from only a subset of fields rather than having to read full rows at once. On the other hand, individual records cannot be accessed easily, and record updates are challenging. Parquet is perfect for data lakes.
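
The trade-off between row and columnar layouts can be seen in a small in-memory sketch. It is only an illustration: the rows are made-up data, and plain Python lists stand in for column files; real formats such as Parquet add encoding, compression, and metadata on top.

```python
# Hypothetical row-oriented data: one dict per record.
rows = [
    {"id": 1, "country": "IT", "amount": 10.0},
    {"id": 2, "country": "FR", "amount": 25.5},
    {"id": 3, "country": "IT", "amount": 7.25},
]

# Columnar layout: one list per field, values kept in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query touches only the column it needs.
total_amount = sum(columns["amount"])


def get_record(cols, index):
    # Rebuilding a single record means visiting every column.
    return {name: values[index] for name, values in cols.items()}


second = get_record(columns, 1)
```

Summing `amount` reads a single list, while fetching one record has to touch all three, which mirrors why columnar storage suits analytics better than record-level access.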

Ingestion
Data Ingestion is the process of moving data from one place to another, namely from source systems into storage.

Bounded (bucketing data) VS Unbounded (ongoing and flowing data).
Frequency: batch, micro batch, real-time.
Synchronous VS Asynchronous ingestion.
Push (source system sending data to a target) VS Pull (target reading data from a source) VS Poll patterns (periodically checking a data source for any changes).
Batch Ingestion (snapshot, delta, ETL, ELT, Migration) VS Message and Stream Ingestion (schema, replay, time to live, message size, error handling).

Direct DB connection: with ODBC, a driver hosted by the client accessing the database translates commands issued to the standard ODBC API into commands issued to the database. With JDBC, a Java driver connects to a remote database and serves as a translation layer between the standard JDBC API and the native network interface of the target database; it is popular because the JVM is standard and portable across hardware architectures. Both connections struggle with nested data, and they send data as rows.

Change Data Capture is the process of ingesting changes from a source database system.
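
One simple way to picture change data capture is to compare two snapshots of a table keyed by primary key. This is a hedged sketch under that assumption: `capture_changes` and the sample rows are hypothetical, and production CDC commonly reads the database's change log instead of diffing snapshots.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and list change events."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


# Hypothetical snapshots of a source table at two points in time.
before = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
after = {1: {"name": "Ann"}, 2: {"name": "Robert"}, 3: {"name": "Cy"}}

changes = capture_changes(before, after)
```

Only the changed rows are emitted, so the target system ingests two events here instead of reloading the whole table.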

Message Queues and Event-Streaming Platforms

Secure Shell Protocol, SFTP, SCP, Webhooks

Transformation
A query allows you to retrieve and act on data: the SQL is parsed and converted into bytecode.

Strict modeling is less popular now because the techniques were developed when data warehouses were very expensive, on-premises, and heavily resource-constrained, with tightly coupled compute and storage.
The popularity of the cloud means that storage is very cheap, cheaper than the effort of finding the optimal way to represent the data in storage. The popularity of nested data, like JSON, means that schemas are flexible in sources and analytical systems.
A wide table is a highly denormalised collection of many fields, typically created in a columnar database; facts and dimensions are represented in the same table. Wide tables are usually sparse. This is extremely expensive in a traditional relational database because it allocates a fixed amount of space for each field entry, but nulls take up virtually no space in a columnar DB. Analytics queries on wide tables often run faster than equivalent queries on highly normalized data requiring many joins.

Batch Transformations:
- ETL shortcomings: extract phase was a bottleneck and the transformations were handled in a dedicated system because the target system was extremely resource-constrained in both storage and CPU capacity.
- ELT does not transform data at the time it is loaded, the transformations happen later.
In the data lake the distinction of the two has become blurry. [Reverse ETL]
- Update, insert only, delete, upsert/merge, data wrangling.
- A MapReduce job is a collection of map tasks that read individual data blocks scattered across the nodes, followed by a shuffle that redistributes the results across the cluster by key, so that each key ends up on one and only one node, and a reduce step that aggregates the data on each node to gather results from the full cluster.
- Views are DB objects that reference other tables.
- Materialized views do some or all of the view computation in advance.
- Data virtualization entails a data processing and query system that doesn’t store data internally.
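
The map, shuffle, and reduce steps from the MapReduce note above can be sketched as a single-process word count. This is only an illustration: the three strings stand in for data blocks on different nodes, and the function names are hypothetical rather than any framework's API.

```python
from collections import defaultdict


def map_task(block):
    # Emit a (key, 1) pair for each word in one data block.
    return [(word, 1) for word in block.split()]


def shuffle(mapped_blocks):
    # Redistribute pairs by key so that each key ends up in exactly one group.
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(groups):
    # Aggregate the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical data blocks, as if scattered across three nodes.
blocks = ["a b a", "b c", "a"]
counts = reduce_task(shuffle(map_task(block) for block in blocks))
```

In a real cluster the map tasks run in parallel next to the data and the shuffle moves pairs over the network; here everything runs in one process to keep the flow visible.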

coding non-fiction

Henry Gravity

1 review

December 11, 2023

Very good introduction to the topic for a relative beginner!

Paul Hallaste

1 review

August 6, 2022

Great book. It contains a lot of practical information about data engineering best practices as well as detailed explanations of core concepts in data engineering. The explanations are approachable, building up concepts from first principles and giving a thorough overview of each part of the data engineering lifecycle. With finesse, the writers alternate between explaining how things work and how they fit into the larger picture.

Especially enjoyed the last chapter, where the newest trends on the different parts of the data lifecycle get synthesized into predictions about how the industry as a whole is moving. Will be interesting to look back on this five years from now and onward, especially with regard to whether more wild prognostications like the live data stack will become mainstream.

Samuel

49 reviews, 6 followers

January 4, 2023

Chapter 8 on Queries, Modeling and Transformation, was for me the most interesting chapter (so far), and is kind of the meat of the book.

In fact, it is quite much the missing chapter of Kleppmann's "Designing Data-Intensive Applications", that provides deep and useful info on how to handle the actual structure of the data. An area where I found Kleppmann's book was sorely lacking.

Philip Anderson

38 reviews, 21 followers

June 13, 2023

This was a bit disappointing for me. It’s a good very high-level survey of data engineering topics, and would be useful if you know next to nothing about the field. It is very shallow though, and doesn’t go into any technical depth at all. There was also a fair amount of repetition, as if chapters were written somewhat independently of each other.

36 reviews, 6 followers

August 22, 2023

GOOD:
Book is a good intro and high-level overview of the discipline.
It has great collection of links at the end of each chapter.

BAD:
What I missed is in-depth info on the data systems design and architecture (i.e. detailed description of Data Engineer's knowledge - how does he design?). There is just a high-level advice.
Links in chapters may compensate that - to some extent...

software-design-architecture

Kris

51 reviews, 2 followers

March 25, 2023

This book seems to have a very good and clear structure but lacks good storytelling and practical examples. It goes deep on the unnecessary parts and not on the useful ones.

data self-help

Max Fudim

11 reviews, 1 follower

January 1, 2025

For folks who have experience with data engineering. Can also be used as a good intro for someone who is new to the domain.

Vincent V

290 reviews

September 1, 2024

Great overview of this field: a field that is important, but that can be reduced to a smaller set of tools to master because it changes constantly. This in turn warrants going back to the basics, i.e. Git, SQL databases, NoSQL databases, probability, and statistics, and getting grounding knowledge from these. Otherwise one is doomed to see their favourite tool (Snowflake, Azure Synapse, Google BigQuery) replaced within one or two years... This approach also opens doors to other fields at the same time, increasing optionality for topics not covered under the umbrella term "data engineer". I for one would never want to master this field exclusively, only to know the couple of tools that matter. Mastery of this field will not create many opportunities, but the ability to train yourself quickly in a topic will give endless opportunities. Thus a more grounded approach, built on the topics mentioned earlier, is expedient.

A question for filtering out bad topics to learn: "Is there a marketplace for this knowledge? Are there evolving systems for this knowledge?" If it's monopoly knowledge, then learn it; otherwise get ready to have your knowledge replaced by the next system. Monopoly knowledge holds its value, i.e. it stays put. SQL => 40 years old, Git => industry standard... Master these; everything else is less salient. Azure Synapse => BAM, gone in 2 years (maybe not, as MS is a strong system builder that can compete against other system makers, thus retaining the value of your knowledge over time)...

Aditya Gupta

13 reviews

March 3, 2024

The book sets the expectation very clearly with its name: "Fundamentals" of Data Engineering.

It gives a comprehensive understanding of the data engineering lifecycle, discussing each stage with great detail. From data generation to ingestion, transformation, and serving, we get to know about each step along with key considerations, important concepts, stakeholders, and the role of different data undercurrents (like Data architecture, DataOps, Orchestration, Security, etc) in each of the stages.

If you are someone who has started working in an entry-level data engineering job, this book can give you a complete perspective on the data engineering domain and expand your understanding.

If you are an experienced data engineer, you would probably know about most of the concepts in this book. But this book can help you organize your understanding of the concepts, and maybe you find something new.

If you are someone preparing for your first data engineering job, I won't recommend this book. (Certainly any book for that matter). You will be good with making projects and "learn by doing".

tech

Ahmad hosseini

313 reviews, 69 followers

November 23, 2024

Who Should Read This Book
Audience for this book consists of technical practitioners, mid to senior-level software engineers, data scientists, or analysts interested in moving into data engineering; or data engineers working in the guts of specific technologies, but wanting to develop a more comprehensive perspective.
Review
This book provides a great foundational overview of various architectures, approaches, methodologies, and patterns that anyone working with data needs to be aware of. But what is even more valuable is that this book is full of golden nuggets of wisdom, best-practice advice, and things to consider when making decisions related to data engineering. This book isn’t about data engineering using a particular tool, technology, or platform. Instead, the authors focus on the fundamental concepts behind data engineering.
You’ll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.

data programming software-engineering

Tobias Witter

3 reviews

December 17, 2023

The book is an introduction to data engineering emphasizing the data engineering lifecycle. I like the general notion of putting structure on data engineering. The authors provide anecdotes and examples on each topic, which are interesting to read but oftentimes highly specific, adding little value to the reader's learning. While the book is an interesting read, it is not precise enough for me on the topics it covers on the 'data engineering lifecycle'. I would wish for more structure within each chapter. Some parts read as if they are a collection of rather general facts on a topic and anecdotes of the authors. I think there is room for improvement. If the book were to come with more domain knowledge providing a quick start on each topic of the lifecycle, it could make this book a go-to guide for data engineering. In the current version, it is rather a nice read as a refresher or light intro to the topic.

Nader Kahwaji

4 reviews

January 3, 2025

The book serves as a nice introduction to data engineering, offering a unique perspective on the stages of the lifecycle while providing historical context and insights into the underlying undercurrents such as software engineering, DataOps, security, and orchestration.

Readers should not expect extensive elaboration on architectural or implementation details, as the primary focus of the book is to establish foundational axioms of data engineering, which can serve as a basis for exploring more advanced concepts.

One area for improvement in future editions would be the inclusion of more concrete business use cases and examples. Some concepts may appear overly abstract without such context, and comprehensive examples spanning all stages of the data lifecycle would greatly enhance the book’s practical applicability.

tech

Timothy

79 reviews

April 8, 2023

The authors outstandingly accomplish their goal here.

Data engineering is covered in all its aspects: history, philosophy, and current/previous best practices.

My favorite part of this book is that everything is framed through the lens of business. Data engineering exists to provide business value and the authors here keep reminding of this and urging not to under or over engineer data products.

The pragmatic tone really serves to assist in critical thinking about data projects and identifying the best ways to approach them. Perfection, or even creating a good data product can be hard but this book gives you the tools to approach projects with the right mindset.

Brett Vogelsang

8 reviews

April 29, 2023

If you are looking for a low-level Data Engineering (DE) book that explains the nitty-gritty details of every DE tool out there, you’ve come to the wrong place. This book is a high-level overview of the major aspects of DE. It truly is the “Fundamentals” of DE, no more, no less.
I applaud the authors for undertaking such a large project during a time when it feels like there are new DE tools releasing every day. They traversed through the muddy DE waters and found the fundamental aspects of the job that will *hopefully* be around for at least a few more years. Overall, this is a great book for anyone looking to solidify their knowledge in DE. Highly recommend picking up a copy.
