Question: How do you ingress and egress data from Hadoop?
Ans: Here are several ways:
1. Command line
$ echo "jack and jill went up the hill" | hadoop fs -put - /rhyme.txt
$ hadoop fs -cat /rhyme.txt
jack and jill went up the hill
Note: the moveFromLocal and moveToLocal options are useful for ingress/egress operations
in which you want to move the file rather than copy it.
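For example, a move into HDFS can be sketched like this; it assumes a running HDFS and the hadoop CLI on the PATH, and the file names are illustrative:

```
$ echo "jack and jill went up the hill" > rhyme-local.txt
$ hadoop fs -moveFromLocal rhyme-local.txt /rhyme2.txt   # local copy is deleted after the move
$ hadoop fs -cat /rhyme2.txt
jack and jill went up the hill
```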
2. Java API
Hadoop has an org.apache.hadoop.fs package that contains the filesystem classes.
The FileSystem class is the abstract class that has several implementations including
DistributedFileSystem for HDFS. It exposes basic filesystem operations such as
create, open, and delete, among others.
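A minimal sketch of the API follows; it is not runnable without the Hadoop client jars on the classpath and a reachable cluster, and the path and contents are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml; with fs.defaultFS set to an
        // hdfs:// URI, FileSystem.get returns a DistributedFileSystem.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/rhyme.txt");
        try (FSDataOutputStream out = fs.create(path, true)) { // create (overwrite if present)
            out.writeBytes("jack and jill went up the hill\n");
        }
        fs.delete(path, false); // delete (non-recursive)
        fs.close();
    }
}
```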
3. Apache Thrift
Apache Thrift is an open source, cross-language RPC framework. Hadoop has a contrib
project that contains a Thrift server and bindings for various client languages,
including Python, Ruby, and Perl.
4. Hadoop FUSE
Hadoop comes with a component called FuseDFS, which allows HDFS to be
mounted as a Linux volume via Filesystem in Userspace (FUSE).
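A hedged sketch of a mount; the exact wrapper name varies by Hadoop version and packaging (for example fuse_dfs, fuse_dfs_wrapper.sh, or hadoop-fuse-dfs), and the NameNode host and port are placeholders:

```
$ mkdir -p /mnt/hdfs
$ hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs
$ ls /mnt/hdfs                 # ordinary POSIX tools now see HDFS paths
$ cp /mnt/hdfs/rhyme.txt /tmp/
```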
5. HTTP
The advantage of using HTTP to access HDFS is that it relieves the burden of having to
have the HDFS client code installed on any host that requires access. Further, HTTP is
ubiquitous and many tools and most programming languages have support for HTTP,
which makes HDFS that much more accessible.
The NameNode has an embedded Jetty HTTP/HTTPS web server, which the
SecondaryNameNode uses to read filesystem images and merge them. The same server
also backs the HFTP filesystem, which utilities such as DistCp use to enable
cross-cluster copies when Hadoop versions differ. It supports only a handful of
operations, all of them reads (HDFS writes aren't supported).
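For example, a cross-version copy might read from the source cluster over HFTP and write natively on the destination; the host names here are placeholders, and HFTP's default port is the source NameNode's HTTP port:

```
$ hadoop distcp hftp://source-nn:50070/rhymes hdfs://dest-nn:8020/rhymes
```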
6. HDFS proxy
The HDFS proxy is a component in the Hadoop contrib that provides a web app proxy
frontend to HDFS. Its advantages over the embedded HTTP server are an
access-control layer and support for multiple Hadoop versions.
Because the HDFS proxy leverages the embedded Jetty HTTP server in the
NameNode, it has the same limitation: it supports only file reads.
7. Hoop
Hoop is a REST, JSON-based HTTP/HTTPS server that provides access to HDFS.
Its advantage over the current Hadoop HTTP interface is that it supports writes
as well as reads.
8. WebHDFS
WebHDFS, included in Hadoop versions 1.x and 2.x, is a newer API that provides
REST/HTTP read and write access to HDFS. It coexists alongside the existing
HDFS HTTP services.
You can use WebHDFS to create a directory, write a file to that directory,
and remove the file.
WebHDFS may be turned off by default; to enable it you may have to set
dfs.webhdfs.enabled to true in hdfs-site.xml and restart HDFS.
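The REST URL scheme is simple enough to sketch; webhdfs_url below is our own illustrative helper (not part of Hadoop), and the host and port are placeholders for a real NameNode:

```python
def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS v1 URL: http://host:port/webhdfs/v1<path>?op=OP&k=v..."""
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in sorted(params.items())])
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Directory creation, file creation, and deletion map onto HTTP verbs
# (PUT, PUT, and DELETE respectively); a CREATE request first returns an
# HTTP 307 redirect to a DataNode, where the file data is then PUT.
print(webhdfs_url("namenode", 50070, "/tmp/dir", "MKDIRS"))
print(webhdfs_url("namenode", 50070, "/tmp/dir/rhyme.txt", "CREATE", overwrite="true"))
print(webhdfs_url("namenode", 50070, "/tmp/dir/rhyme.txt", "DELETE"))
```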