-
Notifications
You must be signed in to change notification settings - Fork 1
Source code analysis for hadoop-1.2.1
License
BUPTAnderson/hadoop-1.2.1
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
这是hadoop-1.2.1的源码,我在学习阅读该源码的时候加入了自己的注释,方便自己查询,
任何人都可以查看该注释后的源码,水平有限,如有不对,欢迎指正。
我们基于新的mapreduce API, 正常我们写好一个hadoop的mapreduce程序, 将程序编译
好的jar包上传到一台可以运行hadoop mapreduce程序的节点上, 执行命令:
hadoop jar xxx.jar -files=blacklist.txt,whitelist.txt -libjars=third-party.jar
-archives=directionary.zip -input /test/input -output /test/output
一个word count的mapreduce程序示例:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
在我们的mapreduce程序, 会有job实例, 调用job.waitForCompletion(true)? 0:1
这里的waitForCompletion()就是程序的入口了, 源码的分析从调用这个方法开始.
PS: GenericOptionsParser是一个工具类, 对我们执行的命令中的各种参数进行解析, 像:
-files=blacklist.txt,whitelist.txt, 会在conf中设置
<tmpfiles, blacklist.txt,whitelist.txt>, 而new Job(conf, "word count")
Job实例内部有jobconf, jobconf = new org.apache.hadoop.mapred.JobConf(conf)
conf里面的配置信息都会copy到jobconf中, 所以对命令解析后的参数信息都会保存在jobconf
里面。
About
Source code analysis for hadoop-1.2.1
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published