Channel: HDFS – Cloudera Engineering Blog
Browsing all 95 articles

Meet the Engineer: Aaron T. Myers

As I mentioned in my inaugural post last week, it’s important to shine a spotlight on the Cloudera engineers who have a hand in making the Hadoop projects run. It’s an obvious point, and yet an...

View Article



What Do Real-Life Apache Hadoop Workloads Look Like?

Organizations in diverse industries have adopted Apache Hadoop-based systems for large-scale data processing. As a leading force in Hadoop development with customers in half of the Fortune 50...

View Article

Exploring Compression for Hadoop: One DBA’s Story

This guest post comes to us courtesy of Gwen Shapira (@gwenshap), a database consultant for The Pythian Group (and an Oracle ACE Director). Most western countries use street names and numbers to...

View Article

Meet the Engineer: Jon Natkins

In this installment of “Meet the Engineers”, meet Jonathan Natkins, also known as “Natty” by his friends and colleagues. What do you do at Cloudera, and in which Apache project are you involved? For...

View Article

Schedule This! Strata + Hadoop World Speakers from Cloudera

We’re getting really close to Strata Conference + Hadoop World 2012 (just over a month away), schedule planning-wise. So you may want to consider adding the tutorials, sessions, and keynotes below to...

View Article


CDH4.1 Now Released!

Update time! As a reminder, Cloudera releases major versions of CDH, our 100% open source distribution of Apache Hadoop and related projects, annually and then updates to CDH every three months....

View Article

Quorum-based Journaling in CDH4.1

A few weeks back, Cloudera announced CDH 4.1, the latest update release to Cloudera’s Distribution including Apache Hadoop. This is the first release to introduce truly standalone High Availability for...

View Article

Secrets of Cloudera Support: The Champagne Strategy

At Cloudera, we take great pride in drinking our own champagne. That pride extends to our support team, in particular. Cloudera Manager, our end-to-end management platform for CDH (Cloudera’s...

View Article


Apache Hadoop in 2013: The State of the Platform

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts. In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work...

View Article


Apache Hadoop 2.0.3-alpha Released

Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. This release fixes over 500 issues (covering the Common, HDFS, MapReduce and YARN...

View Article

Demo: HDFS File Operations Made Easy with Hue

Managing and viewing data in HDFS is an important part of Big Data analytics. Hue, the open source web-based interface that makes Apache Hadoop easier to use, helps you do that through a GUI in your...

View Article

How Improved Short-Circuit Local Reads Bring Better Performance and Security...

One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data — we prefer to move the computation to the data whenever possible, rather than the other...

View Article

Apache Hadoop 2 is Here and Will Transform the Ecosystem

The release of Apache Hadoop 2, as announced today by the Apache Software Foundation, is an exciting one for the entire Hadoop ecosystem. Cloudera engineers have been working hard for many months with...

View Article


Apache Hadoop 2.3.0 is Released (HDFS Caching FTW!)

Hadoop 2.3.0 includes hundreds of new fixes and features, but none more important than HDFS caching. The Apache Hadoop community has voted to release Hadoop 2.3.0, which includes (among many other...

View Article

A Guide to Checkpointing in Hadoop

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one. Checkpointing is an essential part of maintaining and persisting filesystem metadata in...

View Article


How-to: Use Kite SDK to Easily Store and Configure Data in Apache Hadoop

Organizing your data inside Hadoop doesn’t have to be hard — Kite SDK helps you try out new data configurations quickly in either HDFS or HBase. Kite SDK is a Cloudera-sponsored open source project...

View Article

Project Rhino Goal: At-Rest Encryption for Apache Hadoop

An update on community efforts to bring at-rest encryption to HDFS — a major theme of Project Rhino. Encryption is a key requirement for many privacy and security-sensitive industries, including...

View Article


Why Extended Attributes are Coming to HDFS

Extended attributes in HDFS will facilitate at-rest encryption for Project Rhino, but they have many other uses, too. Many mainstream Linux filesystems implement extended attributes, which let you...

View Article

New in CDH 5.1: HDFS Read Caching

Applications using HDFS, such as Impala, will be able to read data up to 59x faster thanks to this new feature. Server memory capacity and bandwidth have increased dramatically over the last few years....

View Article

New in CDH 5.3: Transparent Encryption in HDFS

Support for transparent, end-to-end encryption in HDFS is now available and production-ready (and shipping inside CDH 5.3 and later). Here’s how it works. Apache Hadoop 2.6 adds support for transparent...

View Article