hdfs-native is an HDFS client written natively in Rust. It supports nearly all major features of an HDFS client, as well as several key client configuration options listed below.
Here is a list of currently supported features, along with unsupported features that may be added in the future.
- Listing
- Reading
- Writing
- Rename
- Delete
- Basic Permissions and ownership
- ACLs
- Content summary
- Set replication
- Set timestamps
- Name Services
- Observer reads
- ViewFS
- Router based federation
- Erasure coded reads and writes
  - RS schema only; no support for RS-Legacy or XOR
- Kerberos authentication (GSSAPI SASL support) (requires libgssapi_krb5, see below)
- Token authentication (DIGEST-MD5 SASL support)
- NameNode SASL connection
- DataNode SASL connection
- DataNode data transfer encryption
- Encryption at rest (KMS support)
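To make the "RS schema only" restriction concrete, here is a minimal, std-only Rust sketch that parses HDFS erasure coding policy names such as RS-6-3-1024k and rejects the RS-LEGACY and XOR codecs. The RsPolicy type and parse_rs_policy function are hypothetical illustrations, not part of the hdfs-native API, and the sketch assumes the CODEC-DATA-PARITY-CELLSIZE naming used by Hadoop's built-in policies.

```rust
/// Hypothetical type: the parameters of a plain Reed-Solomon (RS) policy.
#[derive(Debug, PartialEq)]
struct RsPolicy {
    data_units: usize,
    parity_units: usize,
    cell_size: usize, // in bytes
}

/// Hypothetical helper: parse a policy name like "RS-6-3-1024k".
/// Only the plain RS codec is accepted, mirroring the "RS schema only" note.
fn parse_rs_policy(name: &str) -> Result<RsPolicy, String> {
    let parts: Vec<&str> = name.split('-').collect();
    // Plain RS names have exactly four parts: codec, data, parity, cell size.
    // "RS-LEGACY-6-3-1024k" has five parts, and "XOR-2-1-1024k" uses another codec.
    if parts.len() != 4 || parts[0] != "RS" {
        return Err(format!("unsupported erasure coding policy: {name}"));
    }
    let data_units = parts[1].parse::<usize>().map_err(|e| e.to_string())?;
    let parity_units = parts[2].parse::<usize>().map_err(|e| e.to_string())?;
    // Cell sizes are written in KiB with a trailing 'k', e.g. "1024k".
    let cell_kib = parts[3]
        .strip_suffix('k')
        .ok_or_else(|| format!("bad cell size: {}", parts[3]))?
        .parse::<usize>()
        .map_err(|e| e.to_string())?;
    Ok(RsPolicy {
        data_units,
        parity_units,
        cell_size: cell_kib * 1024,
    })
}

fn main() {
    for name in ["RS-6-3-1024k", "RS-LEGACY-6-3-1024k", "XOR-2-1-1024k"] {
        match parse_rs_policy(name) {
            Ok(policy) => println!("{name}: {policy:?}"),
            Err(err) => println!("{name}: {err}"),
        }
    }
}
```

An RS-6-3 policy stripes each block group across 6 data and 3 parity cells, so any 6 of the 9 cells are enough to reconstruct the data.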
The Kerberos (SASL GSSAPI) mechanism is supported through a runtime dynamic link to libgssapi_krb5. This library must be installed separately, but it is likely already installed on your system. If not, you can install it as follows:
- Debian-based systems: apt-get install libgssapi-krb5-2
- RHEL-based systems: yum install krb5-libs
- macOS: brew install krb5
- Windows: download and install the MIT Kerberos package from https://web.mit.edu/kerberos/dist/, then copy the <INSTALL FOLDER>\MIT\Kerberos\bin\gssapi64.dll file to a folder in %PATH% and rename it to gssapi_krb5.dll
The client will attempt to read the Hadoop configs core-site.xml and hdfs-site.xml from the directory $HADOOP_CONF_DIR or, if that doesn't exist, from $HADOOP_HOME/etc/hadoop. The configs currently supported are:
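The lookup order described above can be sketched as a small, std-only Rust function. This is a hypothetical illustration (resolve_conf_dir and config_files are not hdfs-native APIs), and it assumes "doesn't exist" means the HADOOP_CONF_DIR variable is unset; the real client may also fall back when the variable points to a missing directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: prefer HADOOP_CONF_DIR, otherwise fall back to
/// HADOOP_HOME/etc/hadoop. Assumption: "doesn't exist" means "unset".
/// Values are passed in explicitly so the logic can be tested without
/// touching the real process environment.
fn resolve_conf_dir(conf_dir: Option<&str>, hadoop_home: Option<&str>) -> Option<PathBuf> {
    match (conf_dir, hadoop_home) {
        (Some(dir), _) => Some(PathBuf::from(dir)),
        (None, Some(home)) => Some(PathBuf::from(home).join("etc").join("hadoop")),
        (None, None) => None,
    }
}

/// The two config files read from the resolved directory.
fn config_files(dir: &Path) -> Vec<PathBuf> {
    ["core-site.xml", "hdfs-site.xml"]
        .into_iter()
        .map(|file| dir.join(file))
        .collect()
}

fn main() {
    let conf_dir = std::env::var("HADOOP_CONF_DIR").ok();
    let hadoop_home = std::env::var("HADOOP_HOME").ok();
    match resolve_conf_dir(conf_dir.as_deref(), hadoop_home.as_deref()) {
        Some(dir) => {
            for file in config_files(&dir) {
                println!("{} (exists: {})", file.display(), file.exists());
            }
        }
        None => println!("neither HADOOP_CONF_DIR nor HADOOP_HOME is set"),
    }
}
```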
- fs.defaultFS - Client::default() support
- dfs.ha.namenodes - name service support
- dfs.namenode.rpc-address.* - name service support
- dfs.client.failover.resolve-needed.* - DNS based NameNode discovery
- dfs.client.failover.resolver.useFQDN.* - DNS based NameNode discovery
- dfs.client.failover.random.order.* - Randomize order of NameNodes to try
- dfs.client.failover.proxy.provider.* - Supports the behavior of the following proxy providers. Any other values will default back to the ConfiguredFailoverProxyProvider behavior:
  - org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
  - org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
  - org.apache.hadoop.hdfs.server.namenode.ha.RouterObserverReadConfiguredFailoverProxyProvider
- dfs.client.block.write.replace-datanode-on-failure.enable
- dfs.client.block.write.replace-datanode-on-failure.policy
- dfs.client.block.write.replace-datanode-on-failure.best-effort
- fs.viewfs.mounttable.*.link.* - ViewFS links
- fs.viewfs.mounttable.*.linkFallback - ViewFS link fallback
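The name service keys follow the standard Hadoop HA convention: dfs.ha.namenodes.<nameservice> lists NameNode IDs, and dfs.namenode.rpc-address.<nameservice>.<id> gives each NameNode's RPC address. The std-only Rust sketch below resolves a name service from a flat key/value map and applies the proxy provider fallback rule from the list. The function and enum names are hypothetical, and the sketch skips DNS resolution and random ordering; it is not hdfs-native's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical helper: return the RPC addresses of the NameNodes in
/// `nameservice`, in the order their IDs appear in dfs.ha.namenodes.<ns>.
fn namenode_addresses(conf: &HashMap<String, String>, nameservice: &str) -> Vec<String> {
    let ids = match conf.get(&format!("dfs.ha.namenodes.{nameservice}")) {
        Some(ids) => ids,
        None => return Vec::new(),
    };
    ids.split(',')
        .map(str::trim)
        .filter(|id| !id.is_empty())
        // IDs without a matching rpc-address entry are skipped.
        .filter_map(|id| conf.get(&format!("dfs.namenode.rpc-address.{nameservice}.{id}")))
        .cloned()
        .collect()
}

/// Hypothetical enum for the supported proxy provider behaviors.
#[derive(Debug, PartialEq)]
enum ProxyBehavior {
    ConfiguredFailover,
    ObserverRead,
    RouterObserverRead,
}

fn proxy_behavior(provider_class: Option<&str>) -> ProxyBehavior {
    match provider_class {
        Some("org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider") => {
            ProxyBehavior::ObserverRead
        }
        Some("org.apache.hadoop.hdfs.server.namenode.ha.RouterObserverReadConfiguredFailoverProxyProvider") => {
            ProxyBehavior::RouterObserverRead
        }
        // ConfiguredFailoverProxyProvider, any other class, or no setting at all.
        _ => ProxyBehavior::ConfiguredFailover,
    }
}

fn main() {
    let mut conf = HashMap::new();
    conf.insert("dfs.ha.namenodes.ns1".to_string(), "nn1, nn2".to_string());
    conf.insert("dfs.namenode.rpc-address.ns1.nn1".to_string(), "host1:8020".to_string());
    conf.insert("dfs.namenode.rpc-address.ns1.nn2".to_string(), "host2:8020".to_string());
    conf.insert(
        "dfs.client.failover.proxy.provider.ns1".to_string(),
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider".to_string(),
    );

    let addrs = namenode_addresses(&conf, "ns1");
    let provider = conf.get("dfs.client.failover.proxy.provider.ns1").map(String::as_str);
    println!("{addrs:?} via {:?}", proxy_behavior(provider));
}
```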
All other settings are currently assumed to have their default values. For instance, security is assumed to be enabled and SASL negotiation is always performed; on insecure clusters this simply results in SIMPLE authentication. Setups that require other customized Hadoop client configs may not work correctly.
To build, run:
cargo build
An object_store implementation for HDFS is provided in the hdfs-native-object-store crate.
The tests are mostly integration tests that use a small Java application in rust/minidfs/ that runs a custom MiniDFSCluster. To run the tests, you need Java, Maven, Hadoop binaries, and Kerberos tools installed and on your PATH. Any Java version between 8 and 17 should work.
cargo test -p hdfs-native --features integration-test
For the Python tests, see the Python README.
Some of the benchmarks compare performance to the JVM based client through libhdfs via the fs-hdfs3 crate. Because of that, some extra setup is required to run the benchmarks:
export HADOOP_CONF_DIR=$(pwd)/rust/target/test
export CLASSPATH=$(hadoop classpath)
Then you can run the benchmarks with:
cargo bench -p hdfs-native --features benchmark
The benchmark feature is required to expose minidfs and the internal erasure coding functions to benchmark.
The examples use the minidfs module to create a simple HDFS cluster to run against. This requires enabling the integration-test feature, which provides the minidfs module. Alternatively, to run an example against an existing HDFS cluster, you can omit the integration-test feature and make sure HADOOP_CONF_DIR points to a directory containing the HDFS configs for your cluster.
cargo run --example simple --features integration-test