A Brief History of the Hadoop Ecosystem

In 2002, internet researchers wanted a better search engine, preferably an open source one. Doug Cutting and Mike Cafarella decided to build it, and they called their project “Nutch.” Hadoop was originally developed as part of the Nutch infrastructure and was introduced in 2005. The […]

Data Lakes: What They are and How to Use Them

By Jaya Shankar Byrraju. For most companies, having data means having access to wealth. And the key to fully leveraging the wealth that data represents lies in how effectively companies harness, manage, parse, and interpret it. But first, the data must exist somewhere. Enter data lakes. These are central repositories […]

Unifying Big Data Workloads

Try querying Big Data sets and computing results across multiple independent storage systems handling high volume and variety – you’ll find a Tower of Babel, a tangled web of platforms communicating in different languages. Then ask for speedy manipulation of that data and it seems almost impossible. This describes the challenge faced by […]

Benchmarking Hadoop Performance: On-Premises S3-Compatible Storage Keeps Pace with HDFS

By Gary Ogasawara and Tatsuya Kawano. When deploying Hadoop, scaling storage can be difficult and costly because storage and compute are co-located on the same hardware nodes. By implementing the storage layer with S3-compatible storage software and using an S3 connector instead of HDFS, it’s possible to separate storage […]
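The separation the authors describe relies on Hadoop’s S3A connector, which can be pointed at any S3-compatible endpoint rather than AWS itself. As a minimal sketch – the endpoint URL, bucket, and credentials below are illustrative placeholders, not values from the article – a `core-site.xml` fragment might look like:

```xml
<!-- core-site.xml: route Hadoop I/O to an on-premises S3-compatible store.
     Endpoint and credentials are placeholders for illustration only. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://object-store.internal:8080</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>EXAMPLE_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>EXAMPLE_SECRET_KEY</value>
  </property>
  <property>
    <!-- Many on-prem stores require path-style rather than
         virtual-hosted-style bucket addressing. -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

Jobs can then address data as `s3a://bucket/path` instead of `hdfs://…`, which is what lets the storage tier scale independently of the compute nodes.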

Essential Open Source Big Data Tools

By Paul Bates. The analysis of Big Data is a phenomenon that has gained considerable momentum in the past decade. The transition into the information age has made the analysis and visualization of Big Data vital to the success of any business. Data visualization tools enable researchers to gain insight into […]

Changing the Dynamics of Big Data Analytics

The tail end of 2017 saw the official delivery of Hadoop 3.0 from the Apache Software Foundation. With it came HDFS erasure coding, which enables a significant reduction in storage overhead and its costs. Opting to use erasure coding instead of three-way replication – in which three copies of each block of data must be […]
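The storage savings behind that choice are easy to quantify. As a rough sketch – the `storage_overhead` helper is illustrative, and RS(6,3) is one of the Reed-Solomon erasure-coding layouts Hadoop 3 supports – the comparison works out as follows:

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Extra storage consumed, as a fraction of the raw data size."""
    return redundancy_units / data_units

# Three-way replication: 1 data copy plus 2 extra full copies.
replication = storage_overhead(1, 2)  # 2.0, i.e. 200% overhead

# Reed-Solomon RS(6,3): 6 data cells protected by 3 parity cells.
erasure = storage_overhead(6, 3)      # 0.5, i.e. 50% overhead

print(f"replication overhead: {replication:.0%}")
print(f"erasure-coding overhead: {erasure:.0%}")
```

Under these assumptions, a petabyte of raw data costs 3 PB of disk with replication but only 1.5 PB with RS(6,3), while still tolerating the loss of any three cells in a stripe.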