The Emerging Architectures for Modern Data Infrastructure

Metadata

Author: a16z.com
Full Title: The Emerging Architectures for Modern Data Infrastructure
Category: #articles
URL: https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/

Highlights

Many of today’s fastest-growing infrastructure startups build products to manage data. These systems enable data-driven decision making (analytic systems) and drive data-powered products, including with machine learning (operational systems).
Most importantly, data (and data systems) are contributing directly to business results – not only in Silicon Valley tech companies but also in traditional industry.
- Note: People are able to get value out of data. What makes a “business result”?
The race towards data is also reflected in the job market. Data analysts, data engineers, and machine learning engineers topped LinkedIn’s list of fastest-growing roles in 2019.
- Note: Job rush: training needed.
Data infrastructure serves two purposes at a high level: to help business leaders make better decisions through the use of data (analytic use cases) and to build data intelligence into customer-facing applications, including via machine learning (operational use cases).
The data warehouse forms the foundation of the analytics ecosystem. Most data warehouses store data in a structured format and are designed to quickly and easily generate insights from core business metrics, usually with SQL (although Python is growing in popularity).
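A minimal sketch of the warehouse pattern described in this highlight: structured rows with a fixed schema, and a core business metric computed with a single SQL query. Python's built-in sqlite3 is used here only as a local stand-in for a warehouse; the `orders` table, its columns, and its values are hypothetical.

```python
import sqlite3

# Hypothetical structured "orders" table; sqlite3 is only a local stand-in
# for a data warehouse, which would expose a similar SQL interface at scale.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 120.0), ("US", 80.0), ("EU", 30.0), ("US", 200.0)],
)

# A core business metric (revenue per region) expressed as one SQL query.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) AS revenue FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(revenue_by_region)  # [('EU', 150.0), ('US', 280.0)]
```

Because the schema is declared up front, the question can be answered declaratively without any custom parsing code.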
The data lake is the backbone of the operational ecosystem. By storing data in raw form, it delivers the flexibility, scale, and performance required for bespoke applications and more advanced data processing needs. Data lakes operate on a wide range of languages including Java/Scala, Python, R, and SQL.
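To contrast with the warehouse, here is a rough sketch of "storing data in raw form": events are kept exactly as they arrived, and structure is applied only when a job reads them (often called schema-on-read). The event records and field names are hypothetical, and an in-memory list stands in for files on lake storage.

```python
import json
from collections import Counter

# Hypothetical raw event log, kept exactly as it arrived (one JSON document per line).
# In an actual data lake these lines would live in files on commodity storage.
raw_events = [
    '{"user": "a", "event": "click", "page": "/home"}',
    '{"user": "b", "event": "purchase", "amount": 42.5}',
    '{"user": "a", "event": "click", "page": "/pricing", "experiment": "v2"}',
]

# Schema-on-read: structure is imposed only when a job reads the data,
# so each consumer picks the fields it needs and ignores the rest.
parsed = [json.loads(line) for line in raw_events]
clicks_per_user = Counter(e["user"] for e in parsed if e["event"] == "click")
purchase_total = sum(e.get("amount", 0.0) for e in parsed if e["event"] == "purchase")

print(clicks_per_user)  # Counter({'a': 2})
print(purchase_total)   # 42.5
```

Records with extra or missing fields (such as `experiment`) are stored without a schema change, which is the flexibility the highlight attributes to lakes; the trade-off is that every reader must handle that variability itself.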
But what’s really interesting is that modern data warehouses and data lakes are starting to resemble one another – both offering commodity storage, native horizontal scaling, semi-structured data types, ACID transactions, interactive SQL queries, and so on.
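One of the shared features listed here, ACID transactions, can be illustrated on a small scale: a multi-step update either fully applies or fully rolls back. The sketch below again uses sqlite3 as a stand-in (the systems the article discusses implement transactions very differently at scale), and the `balances` table and `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (account TEXT PRIMARY KEY, "
    "amount REAL NOT NULL CHECK (amount >= 0))"
)
with conn:  # commits the initial rows
    conn.executemany("INSERT INTO balances VALUES (?, ?)", [("ops", 100.0), ("ml", 0.0)])

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst atomically."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so either both updates land or neither does (atomicity).
    with conn:
        conn.execute("UPDATE balances SET amount = amount + ? WHERE account = ?", (amount, dst))
        conn.execute("UPDATE balances SET amount = amount - ? WHERE account = ?", (amount, src))

transfer(conn, "ops", "ml", 60.0)  # succeeds: ops 40.0, ml 60.0

try:
    # The debit would make "ops" negative, violating the CHECK constraint,
    # so the credit already applied to "ml" is rolled back as well.
    transfer(conn, "ops", "ml", 60.0)
except sqlite3.IntegrityError:
    pass

final = dict(conn.execute("SELECT account, amount FROM balances"))
print(final)
```

After the failed second transfer, `final` still holds the balances from the first transfer, showing that the partial update was never made visible.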