A Brief History of the Hadoop Ecosystem

In 2002, internet researchers wanted a better search engine, preferably an open source one. Doug Cutting and Mike Cafarella decided to build it, and they called their project “Nutch.” Hadoop was originally developed as part of the Nutch infrastructure and was introduced in 2005. The […]

Data Lakes: What They are and How to Use Them

By Jaya Shankar Byrraju. For most companies, having data means having access to wealth. And the key to fully leveraging the wealth that data represents lies in how effectively companies harness, manage, parse, and interpret it. But first, the data must exist somewhere. Enter data lakes. These are central repositories […]

Unifying Big Data Workloads

Try querying Big Data sets and computing results across multiple independent storage systems handling high volume and variety – you’ll find a Tower of Babel, a tangled web of platforms communicating in different languages. Then ask for speedy manipulation of that data and it seems almost impossible. This describes the challenge faced by […]

Benchmarking Hadoop Performance: On-Premises S3-Compatible Storage Keeps Pace with HDFS

By Gary Ogasawara and Tatsuya Kawano. When deploying Hadoop, scaling storage can be difficult and costly because storage and compute are co-located on the same hardware nodes. By implementing the storage layer with S3-compatible storage software and using an S3 connector instead of HDFS, it’s possible to separate storage […]
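The separation the authors describe relies on Hadoop’s S3A connector, which can be pointed at any S3-compatible endpoint rather than AWS itself. As a minimal sketch – the endpoint URL, bucket, and credentials below are illustrative placeholders, not values from the article – a `core-site.xml` fragment might look like:

```xml
<!-- core-site.xml: route Hadoop I/O to an on-premises S3-compatible store.
     Endpoint and credentials are placeholders for illustration only. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://object-store.internal:8080</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>EXAMPLE_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>EXAMPLE_SECRET_KEY</value>
  </property>
  <property>
    <!-- Many on-prem stores require path-style rather than
         virtual-hosted-style bucket addressing. -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

Jobs can then address data as `s3a://bucket/path` instead of `hdfs://…`, which is what lets the storage tier scale independently of the compute nodes.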

Essential Open Source Big Data Tools

By Paul Bates. The analysis of Big Data is a phenomenon that has gained considerable momentum in the past decade. The transition into the information age has made the analysis and visualization of Big Data vital to the success of any business. Data visualization tools enable researchers to gain insight into […]

Changing the Dynamics of Big Data Analytics

The tail end of 2017 saw the official delivery of Hadoop 3.0 from the Apache Software Foundation. With it came HDFS erasure coding, which enables a significant reduction in storage overhead and its costs. Opting to use erasure coding instead of three-way replication – in which three copies of each block of data must be […]
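The storage savings behind that choice are easy to quantify. As a rough sketch – the `storage_overhead` helper is illustrative, and RS(6,3) is one of the Reed-Solomon erasure-coding layouts Hadoop 3 supports – the comparison works out as follows:

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Extra storage consumed, as a fraction of the raw data size."""
    return redundancy_units / data_units

# Three-way replication: 1 data copy plus 2 extra full copies.
replication = storage_overhead(1, 2)  # 2.0, i.e. 200% overhead

# Reed-Solomon RS(6,3): 6 data cells protected by 3 parity cells.
erasure = storage_overhead(6, 3)      # 0.5, i.e. 50% overhead

print(f"replication overhead: {replication:.0%}")
print(f"erasure-coding overhead: {erasure:.0%}")
```

Under these assumptions, a petabyte of raw data costs 3 PB of disk with replication but only 1.5 PB with RS(6,3), while still tolerating the loss of any three cells in a stripe.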