Here are some basic questions and answers on Oozie.
Q) What are the types of jobs in Oozie?
The following three types of jobs are common in Oozie:
- Oozie Workflow Jobs − These are represented as Directed Acyclic Graphs (DAGs) to specify a sequence of actions to be executed.
- Oozie Coordinator Jobs − These consist of workflow jobs triggered by time and data availability.
- Oozie Bundle − These can be referred to as a package of multiple coordinator and workflow jobs.
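To make the relationship between the job types a little more concrete, here is a minimal sketch of a bundle definition that packages two coordinators, each of which would in turn trigger a workflow. The bundle name, the coordinator names and the ${...} path parameters are hypothetical placeholders, not taken from a real deployment.

```xml
<!-- Hypothetical bundle: groups two coordinator jobs so they can be
     started, suspended and killed together. -->
<bundle-app name="sample-bundle" xmlns="uri:oozie:bundle:0.1">
    <coordinator name="coord-ingest">
        <!-- Assumed parameter: HDFS path of this coordinator's definition -->
        <app-path>${ingestCoordPath}</app-path>
    </coordinator>
    <coordinator name="coord-report">
        <!-- Assumed parameter: HDFS path of the second coordinator -->
        <app-path>${reportCoordPath}</app-path>
    </coordinator>
</bundle-app>
```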
Q) How does Oozie work internally?
Ans: Oozie detects completion of tasks through callback and polling.
When Oozie starts a task, it provides a unique callback HTTP URL to the task,
and the task notifies that URL when it completes. If the task fails to invoke the callback
URL, Oozie can poll the task for completion.
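The polling side of this mechanism can be tuned on the Oozie server. The fragment below is only a sketch of an oozie-site.xml override: the property names are the ones found in oozie-default.xml to the best of my knowledge, and the values shown are believed to be the defaults, so treat both as assumptions and check the oozie-default.xml shipped with your Oozie version.

```xml
<!-- Sketch of an oozie-site.xml override (assumed names and values;
     verify against oozie-default.xml for your version). -->
<configuration>
    <property>
        <!-- How often, in seconds, the action checker service runs -->
        <name>oozie.service.ActionCheckerService.action.check.interval</name>
        <value>60</value>
    </property>
    <property>
        <!-- Minimum time, in seconds, between two checks of the same action -->
        <name>oozie.service.ActionCheckerService.action.check.delay</name>
        <value>600</value>
    </property>
</configuration>
```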
Q) What editors are available for creating and updating Oozie jobs?
Ans: Oozie editors:
- Notepad (or any plain text editor)
- Hue Editor for Oozie
- Oozie Eclipse Plugin (OEP)
Q) How do you submit an Oozie wf (workflow)?
Ex:
oozie job -oozie http://host_name:8080/oozie -D oozie.wf.application.path=hdfs://namenodepath/pathof_workflow_xml/workflow.xml -run
With a config file (properties file) that holds the name node, job tracker and other details:
oozie job -oozie http://host_name:8080/oozie \
  -config edgenode_path/job1.properties \
  -D oozie.wf.application.path=hdfs://Namenodepath/pathof_workflow_xml/workflow.xml \
  -run
Note: The properties file should be on the edge node (not in HDFS), whereas the workflow and Hive scripts will be in HDFS.
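As a sketch of what such a config file might contain: the -config option of the Oozie CLI accepts either a .properties file or an equivalent Hadoop-style .xml file. The XML form below is hypothetical. The file name job1.xml, the host names and the ports are placeholders, and nameNode and jobTracker are simply the variables that the sample workflow in this post refers to.

```xml
<!-- job1.xml (hypothetical): kept on the edge node and passed via -config -->
<configuration>
    <property>
        <name>nameNode</name>
        <value>hdfs://namenode_host:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker_host:8021</value>
    </property>
    <property>
        <!-- HDFS location of the workflow definition -->
        <name>oozie.wf.application.path</name>
        <value>hdfs://namenode_host:8020/pathof_workflow_xml/workflow.xml</value>
    </property>
</configuration>
```

Any other variable used by the workflow (for example inputDir and outputDir) would be added as further property entries in the same way.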
Q) How do you set the frequency of an Oozie job?
Frequency can be set only for coordinators, using a cron-like expression (here, every 5 minutes):
<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="coord_copydata_from_external_orc"
    frequency="*/5 * * * *" start="2016-01-18T01:00Z" end="2025-12-31T00:00Z"
    timezone="America/Los_Angeles">
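Q) How do you verify or kill a job?
- Using the command line (with -oozie <url>, or with OOZIE_URL set): oozie jobs lists jobs, oozie job -info <job-id> shows the status of one job, and oozie job -kill <job-id> kills it
- Using the Oozie web console
Sample hPDL workflow for a WordCount program.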
------------
<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Mapper and reducer classes of the WordCount job -->
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <!-- Input and output directories, supplied at submission time -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <!-- On failure, go to the kill node so the error is reported -->
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>
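To run a workflow like the one above on a schedule, a coordinator needs an action that points at the workflow's HDFS path, in addition to its frequency attribute. The following is a minimal sketch under some assumptions: the coordinator name, the dates and the ${workflowAppPath} parameter are hypothetical, and the 0.4 schema version is a guess at what a cluster supports (cron-style frequencies arrived in later Oozie releases, so check your version). In the minute field, */5 means every fifth minute, whereas a bare 5 would mean minute 5 of every hour.

```xml
<!-- Hypothetical coordinator: triggers the wordcount workflow every 5 minutes -->
<coordinator-app name="coord_wordcount_every_5_min"
                 frequency="*/5 * * * *"
                 start="2016-01-18T01:00Z" end="2025-12-31T00:00Z"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- Assumed parameter: HDFS directory containing workflow.xml -->
            <app-path>${workflowAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```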