Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
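In short, Luigi is a Python framework for complex batch jobs: it manages job dependencies and workflows, provides visualization and error reporting, and can be run from the command line.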
It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.
It also supports scheduling Spark, Hive, and similar jobs.
1. Installing Luigi
Luigi can be installed with pip (installing pip itself is not covered here):
pip install luigi
2. Luigi command-line parameter reference
luigi AllTask --module all_task
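e.g.: luigi LuigiJobTask --module Luigi_info_Job --date `date +"%Y-%m-%d"`
-- In the command below, LuigiJobTask is the task class name, luigi_job_task is the module (the Python file name without .py), and --date is the parameter passed to the task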
luigi LuigiJobTask --module luigi_job_task --date `date +"%Y-%m-%d"`
master = 'spark://127.0.0.1:7077'
deploy_mode = 'cluster'
driver_memory = '3g'
executor_memory = '3g'
executor_cores = 4
num_executors = 6
Highly recommended reference: https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/