Meet the Engineer: Aaron T. Myers
As I mentioned in my inaugural post last week, it’s important to shine a spotlight on the Cloudera engineers who have a hand in making the Hadoop projects run. It’s an obvious point, and yet an...
What Do Real-Life Apache Hadoop Workloads Look Like?
Organizations in diverse industries have adopted Apache Hadoop-based systems for large-scale data processing. As a leading force in Hadoop development with customers in half of the Fortune 50...
Exploring Compression for Hadoop: One DBA’s Story
This guest post comes to us courtesy of Gwen Shapira (@gwenshap), a database consultant for The Pythian Group (and an Oracle ACE Director). Most western countries use street names and numbers to...
Meet the Engineer: Jon Natkins
In this installment of “Meet the Engineers”, meet Jonathan Natkins, also known as “Natty” by his friends and colleagues. What do you do at Cloudera, and in which Apache project are you involved? For...
Schedule This! Strata + Hadoop World Speakers from Cloudera
We’re getting really close to Strata Conference + Hadoop World 2012 (just over a month away), schedule planning-wise. So you may want to consider adding the tutorials, sessions, and keynotes below to...
CDH4.1 Now Released!
Update time! As a reminder, Cloudera releases major versions of CDH, our 100% open source distribution of Apache Hadoop and related projects, annually and then updates to CDH every three months....
Quorum-based Journaling in CDH4.1
A few weeks back, Cloudera announced CDH 4.1, the latest update release to Cloudera’s Distribution including Apache Hadoop. This is the first release to introduce truly standalone High Availability for...
Secrets of Cloudera Support: The Champagne Strategy
At Cloudera, we put great pride into drinking our own champagne. That pride extends to our support team, in particular. Cloudera Manager, our end-to-end management platform for CDH (Cloudera’s...
Apache Hadoop in 2013: The State of the Platform
For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts. In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work...
Apache Hadoop 2.0.3-alpha Released
Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. This release fixes over 500 issues (covering the Common, HDFS, MapReduce and YARN...
Demo: HDFS File Operations Made Easy with Hue
Managing and viewing data in HDFS is an important part of Big Data analytics. Hue, the open source web-based interface that makes Apache Hadoop easier to use, helps you do that through a GUI in your...
How Improved Short-Circuit Local Reads Bring Better Performance and Security...
One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data — we prefer to move the computation to the data whenever possible, rather than the other...
Apache Hadoop 2 is Here and Will Transform the Ecosystem
The release of Apache Hadoop 2, as announced today by the Apache Software Foundation, is an exciting one for the entire Hadoop ecosystem. Cloudera engineers have been working hard for many months with...
Apache Hadoop 2.3.0 is Released (HDFS Caching FTW!)
Hadoop 2.3.0 includes hundreds of new fixes and features, but none more important than HDFS caching. The Apache Hadoop community has voted to release Hadoop 2.3.0, which includes (among many other...
A Guide to Checkpointing in Hadoop
Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one. Checkpointing is an essential part of maintaining and persisting filesystem metadata in...
How-to: Use Kite SDK to Easily Store and Configure Data in Apache Hadoop
Organizing your data inside Hadoop doesn’t have to be hard — Kite SDK helps you try out new data configurations quickly in either HDFS or HBase. Kite SDK is a Cloudera-sponsored open source project...
Project Rhino Goal: At-Rest Encryption for Apache Hadoop
An update on community efforts to bring at-rest encryption to HDFS — a major theme of Project Rhino. Encryption is a key requirement for many privacy and security-sensitive industries, including...
Why Extended Attributes are Coming to HDFS
Extended attributes in HDFS will facilitate at-rest encryption for Project Rhino, but they have many other uses, too. Many mainstream Linux filesystems implement extended attributes, which let you...
New in CDH 5.1: HDFS Read Caching
Applications using HDFS, such as Impala, will be able to read data up to 59x faster thanks to this new feature. Server memory capacity and bandwidth have increased dramatically over the last few years....
New in CDH 5.3: Transparent Encryption in HDFS
Support for transparent, end-to-end encryption in HDFS is now available and production-ready (and shipping inside CDH 5.3 and later). Here’s how it works. Apache Hadoop 2.6 adds support for transparent...