Hadoop Data Lake Business Architecture

Figure: Making the Hadoop Data Lake More Consumable. Enabling data science and machine learning at scale.

Diagram elements: SQL App, Business Analysis, Data Scientists, Data Lake (Hive, HBase, etc.)

Callouts:
1) Important people and tools are cut off because of SQL completeness or performance.
2) Data scientists still have to resort to sampling if they can't run analytics in-database at scale.
3) There are multiple data sets and formats within Hadoop.

publish time: 2021-07-16
Kiraaaa

A data scientist can use EdrawMax or EdrawMax Online to create a Hadoop data lake diagram. A Hadoop data lake is a data management platform comprising one or more Hadoop clusters. As the architecture diagram shows, it is used principally to process and store non-relational data, such as log files, internet clickstream records, sensor data, JSON objects, images, and social media posts. The data lake concept can be applied more broadly to other types of systems, but it most frequently involves storing data in the Hadoop Distributed File System (HDFS) across a set of clustered compute nodes built on commodity server hardware. As the diagram also suggests, a Hadoop enterprise data lake can complement an enterprise data warehouse rather than replace it.
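To make the "multiple data sets and formats" point concrete, here is a minimal Python sketch of reading mixed raw formats at query time. It is an illustration only: the file names, sample records, and the `read_records` and `load_lake` helpers are hypothetical, and plain in-memory strings stand in for files that would normally sit in HDFS.

```python
import csv
import io
import json

# Hypothetical raw files, kept in their original formats as a data lake would store them.
RAW_FILES = {
    "clicks.json": '{"user": "u1", "page": "/home"}\n{"user": "u2", "page": "/cart"}\n',
    "sensors.csv": "device,reading\nd1,21.5\nd2,19.0\n",
    "app.log": "2021-07-16 ERROR disk full\n2021-07-16 INFO started\n",
}


def read_records(name, text):
    """Parse one raw file into a list of dicts, picking a parser by file extension."""
    if name.endswith(".json"):
        # One JSON object per line (an assumed layout for clickstream data).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if name.endswith(".csv"):
        # The header row supplies the field names; values stay as strings.
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".log"):
        # Assumed log layout: "<date> <level> <message>".
        records = []
        for line in text.splitlines():
            date, level, message = line.split(" ", 2)
            records.append({"date": date, "level": level, "message": message})
        return records
    raise ValueError(f"unsupported format: {name}")


def load_lake(files):
    """Apply a schema only when reading, leaving the stored data untouched."""
    return {name: read_records(name, text) for name, text in files.items()}


lake = load_lake(RAW_FILES)
errors = [r for r in lake["app.log"] if r["level"] == "ERROR"]
```

Each format needs its own parser before the records can be analyzed together, which is the friction the third callout in the diagram describes.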
