Application Performance Analysis of Distributed File Systems under Cloud Computing Environment

The processing efficiency of data-intensive applications on Hadoop, when a general-purpose distributed file system such as Lustre serves as the backend file system, is not well understood. This paper examines the similarities and differences between Lustre and HDFS (Hadoop Distributed File System).

We propose a Hadoop-Lustre platform and evaluate the performance differences between Lustre and HDFS using a set of data-intensive computing benchmarks. Experimental results indicate that Lustre can reach parity with HDFS, and can even outperform it when a much faster network interconnect is available. It is necessary to study non-HDFS distributed file systems to compensate for the performance shortfalls of HDFS in some MapReduce-based application scenarios.