Hadoop Oozie basics

Here are some basic concepts and common questions about Oozie.

Q) What are the types of jobs in Oozie?
     The following three types of jobs are common in Oozie −

  • Oozie Workflow Jobs − These are represented as Directed Acyclic Graphs (DAGs) to specify a sequence of actions to be executed.
  • Oozie Coordinator Jobs − These consist of workflow jobs triggered by time and data availability.
  • Oozie Bundle − This is a package of multiple coordinator jobs (each of which triggers workflow jobs) that are managed together.
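
To make the packaging idea concrete, here is a minimal sketch of a bundle definition. The bundle name, the coordinator names, and the ${ingestCoordPath} and ${reportCoordPath} parameters are hypothetical placeholders; each path is assumed to point at an HDFS directory that holds a coordinator definition.

```xml
<!-- Hypothetical bundle: all names and ${...} parameters are placeholders -->
<bundle-app name="sample-bundle" xmlns="uri:oozie:bundle:0.1">
    <!-- Each coordinator entry points at the HDFS location of a coordinator app -->
    <coordinator name="coord-ingest">
        <app-path>${ingestCoordPath}</app-path>
    </coordinator>
    <coordinator name="coord-report">
        <app-path>${reportCoordPath}</app-path>
    </coordinator>
</bundle-app>
```

Starting, suspending, or killing the bundle then applies to every coordinator listed in it.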

Q) How does Oozie work internally?

     Ans: Oozie detects completion of tasks through callback and polling.
            When Oozie starts a task, it provides a unique callback HTTP URL to the task,
            and the task notifies that URL when it is complete. If the task fails to invoke
            the callback URL, Oozie can poll the task for completion.

Q) What editors are available for creating and updating Oozie jobs?
       Ans: Oozie editors

  • Any plain-text editor (for example, Notepad)
  • Hue Editor for Oozie
  • Oozie Eclipse Plugin (OEP)


Q) How do you submit an Oozie workflow (wf)?
     Ex:
      oozie job -oozie http://host_name:8080/oozie -D oozie.wf.application.path=hdfs://namenodepath/pathof_workflow_xml/workflow.xml -run
  
  With a config file (property file) that holds the name node, job tracker, and other details:
  
   oozie job -oozie http://host_name:8080/oozie \
    -config edgenode_path/job1.properties \
    -D oozie.wf.application.path=hdfs://Namenodepath/pathof_workflow_xml/workflow.xml \
    -run
  
     Note − The property file should be on the edge node (not in HDFS), whereas the workflow and hive scripts will be in HDFS.
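
The -config option of the Oozie CLI accepts either a .properties file or a Hadoop-style .xml configuration file. As a rough sketch of the XML form, the file below defines the parameters that the sample WordCount workflow at the end of this post expects (${nameNode}, ${jobTracker}, ${inputDir}, ${outputDir}). The file name job1.xml and every host, port, and path are hypothetical and must be replaced with the values of your own cluster.

```xml
<!-- job1.xml: hypothetical values, kept on the edge node like the property file -->
<configuration>
    <property>
        <name>nameNode</name>
        <value>hdfs://namenode.example.com:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker.example.com:8021</value>
    </property>
    <!-- HDFS location of the workflow definition -->
    <property>
        <name>oozie.wf.application.path</name>
        <value>hdfs://namenode.example.com:8020/pathof_workflow_xml</value>
    </property>
    <!-- Parameters referenced by the workflow as ${inputDir} and ${outputDir} -->
    <property>
        <name>inputDir</name>
        <value>/user/example/wordcount/input</value>
    </property>
    <property>
        <name>outputDir</name>
        <value>/user/example/wordcount/output</value>
    </property>
</configuration>
```

With the application path set in the file, the -D override on the command line is no longer needed.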
 
Q) How do you set the frequency of an Oozie job?

   You can set a frequency only for coordinators. It accepts a cron-like expression; for example, to run every 5 minutes:
   
    <coordinator-app xmlns = "uri:oozie:coordinator:0.2" name =
   "coord_copydata_from_external_orc" frequency = "*/5 * * * *" start =
   "2016-01-18T01:00Z" end = "2025-12-31T00:00Z" timezone = "America/Los_Angeles">
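
The opening tag above is only a fragment. A complete coordinator also needs an action that names the workflow to run. The sketch below shows one possible shape, using the coord:minutes EL function as an alternative to a cron-like expression. The ${startTime}, ${endTime}, and ${workflowAppPath} parameters are hypothetical and would be supplied at submission time.

```xml
<!-- Hypothetical complete coordinator: the ${...} parameters are placeholders -->
<coordinator-app name="coord_copydata_from_external_orc"
                 frequency="${coord:minutes(5)}"
                 start="${startTime}" end="${endTime}"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <!-- The workflow application to trigger on every run -->
        <workflow>
            <app-path>${workflowAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```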
  
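Q) How do you verify or kill a job?
     Using the command line:
       oozie jobs -oozie http://host_name:8080/oozie                  (list jobs and their status)
       oozie job -oozie http://host_name:8080/oozie -info <job-id>    (show the details of one job)
       oozie job -oozie http://host_name:8080/oozie -kill <job-id>    (kill a job)
     Using the Oozie web console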


Sample hPDL workflow for the WordCount program.
------------
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>

