Hadoop Ingress and Egress

Question : How do you ingress data into and egress data out of Hadoop?
  Ans : Here are some of the different ways:
        1. Command line
           $ echo "jack and jill went up the hill" | hadoop fs -put - /rhyme.txt
           $ hadoop fs -cat /rhyme.txt
           jack and jill went up the hill
           Note: the moveFromLocal and moveToLocal options are useful for ingress/egress
           operations in which you want to move a file rather than copy it.

        2. Java API
           Hadoop has an org.apache.hadoop.fs package that contains the filesystem
           classes. FileSystem is an abstract class with several implementations,
           including DistributedFileSystem for HDFS. It exposes basic filesystem
           operations such as create, open, and delete, among others.
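A minimal round trip through that API might look like the sketch below. It assumes hadoop-common and hadoop-hdfs on the classpath and a reachable cluster; the NameNode address hdfs://namenode:8020 and the class name HdfsRoundTrip are illustrative, not from the original text.

```java
// Sketch only: requires Hadoop client jars and a live cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // illustrative address
        FileSystem fs = FileSystem.get(conf);             // DistributedFileSystem here

        Path file = new Path("/rhyme.txt");

        // Ingress: create() returns an output stream backed by HDFS.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("jack and jill went up the hill");
        }

        // Egress: open() returns an input stream over the same file.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.delete(file, false); // non-recursive delete
    }
}
```

With a default (empty) Configuration the same code runs against the local filesystem instead, which is handy for trying the API without a cluster.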

        3. Apache Thrift
           Apache Thrift is an open source, cross-language client-server RPC framework.
           Hadoop has a contrib project that contains a Thrift server and bindings for
           various client languages, including Python, Ruby, and Perl.

        4. Hadoop FUSE
           Hadoop comes with a component called FuseDFS, which allows HDFS to be
           mounted as a Linux volume via Filesystem in Userspace (FUSE).
        5. HTTP
           The advantage of using HTTP to access HDFS is that it removes the need to
           install the HDFS client code on every host that requires access. Further,
           HTTP is ubiquitous: many tools and most programming languages support it,
           which makes HDFS that much more accessible.

           The NameNode has an embedded Jetty HTTP/HTTPS web server, which the
           SecondaryNameNode uses to fetch the filesystem image and edit log, merge
           them, and upload the result back. The web server also backs the HFTP
           filesystem, which utilities such as DistCp use to enable cross-cluster
           copies when Hadoop versions differ. It supports only a handful of
           operations, all of them reads (HDFS writes aren't supported).
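As a sketch of what those embedded-HTTP reads look like, the snippet below builds the HFTP URI that tools like DistCp consume, plus the underlying data servlet URL the HFTP client hits. The host, user name, and port (50070 was the default NameNode HTTP port in Hadoop 1.x) are assumptions, and the /data servlet path with its ugi parameter reflects the HFTP implementation, so treat the exact details as an approximation.

```java
// Sketch: URLs for the NameNode's read-only HTTP interface (assumed layout).
import java.net.URI;

public class HftpUrls {
    // HFTP filesystem URI, e.g. as a DistCp source for cross-version copies.
    static URI hftpUri(String host, int port, String path) {
        return URI.create("hftp://" + host + ":" + port + path);
    }

    // The servlet the HFTP client hits to stream file data (assumption).
    static URI dataServlet(String host, int port, String path, String user) {
        return URI.create("http://" + host + ":" + port + "/data" + path + "?ugi=" + user);
    }

    public static void main(String[] args) {
        System.out.println(hftpUri("namenode", 50070, "/rhyme.txt"));
        // -> hftp://namenode:50070/rhyme.txt
        System.out.println(dataServlet("namenode", 50070, "/rhyme.txt", "hadoop"));
        // -> http://namenode:50070/data/rhyme.txt?ugi=hadoop
    }
}
```

A cross-version copy would then pair such a source with an hdfs:// destination, e.g. hadoop distcp hftp://namenode:50070/src hdfs://other-nn:8020/dst.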

        6. HDFS proxy
           The HDFS proxy is a component in the Hadoop contrib that provides a web app
           proxy frontend to HDFS. Its advantages over the embedded HTTP server are an
           access-control layer and support for multiple Hadoop versions.
           Because the HDFS proxy leverages the embedded Jetty HTTP server in the
           NameNode, it has the same limitation: it supports only file reads.

        7. Hoop
           Hoop is a REST, JSON-based HTTP/HTTPS server that provides access to HDFS.
           Its advantage over the current Hadoop HTTP interface is that it supports
           writes as well as reads.
 
        8. WebHDFS
           WebHDFS, included in Hadoop versions 1.x and 2.x, is a whole new API in
           Hadoop providing REST/HTTP read/write access to HDFS. It coexists alongside
           the existing HDFS HTTP services.
           You can use WebHDFS to create a directory, write a file to that directory,
           and remove the file.
           WebHDFS may be turned off by default; to enable it, you may have to set
           dfs.webhdfs.enabled to true in hdfs-site.xml and restart HDFS.
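The create-directory / write-file / remove-file flow above maps onto WebHDFS REST calls. The sketch below only constructs the URLs involved; the host and port are assumptions (50070 was the 1.x NameNode HTTP default), while the /webhdfs/v1 prefix and the op parameter names come from the WebHDFS API.

```java
// Sketch: WebHDFS REST URLs for a mkdir/write/read/delete sequence.
import java.net.URI;

public class WebHdfsUrls {
    static URI op(String host, int port, String path, String op) {
        return URI.create("http://" + host + ":" + port + "/webhdfs/v1" + path + "?op=" + op);
    }

    public static void main(String[] args) {
        String host = "namenode"; // assumed NameNode hostname
        int port = 50070;         // assumed NameNode HTTP port

        // PUT  ...?op=MKDIRS  -- create a directory
        System.out.println(op(host, port, "/tmp/dir", "MKDIRS"));
        // PUT  ...?op=CREATE  -- the NameNode answers with a 307 redirect to a
        // DataNode; the file body is then PUT to that redirected location
        System.out.println(op(host, port, "/tmp/dir/rhyme.txt", "CREATE"));
        // GET  ...?op=OPEN    -- read the file back
        System.out.println(op(host, port, "/tmp/dir/rhyme.txt", "OPEN"));
        // DELETE ...?op=DELETE -- remove the file
        System.out.println(op(host, port, "/tmp/dir/rhyme.txt", "DELETE"));
    }
}
```

The same flow can be driven from the shell, e.g. curl -i -X PUT "http://namenode:50070/webhdfs/v1/tmp/dir?op=MKDIRS".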
