Thinking about it isn't enough; you have to actually try it:
## Mapper
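```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (page, 1) for every non-blank input line.
public class PageMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // TextInputFormat passes one line per call, so there is nothing to split on "\n".
        String line = value.toString().trim();
        if (!line.isEmpty()) {
            page.set(line);
            context.write(page, ONE);
        }
    }
}
```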
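With Hadoop's default `TextInputFormat`, `map()` is called once per line and the `Text` value does not include the line terminator, so splitting it on `"\n"` always yields a single element. The plain-Java sketch below (no Hadoop needed) illustrates this. `recordsOf` is a hypothetical stand-in for the input format, not a Hadoop API, and `String.lines()` requires Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitDemo {
    // Hypothetical stand-in for a line-oriented input format:
    // one record per line, with the line terminator stripped.
    static List<String> recordsOf(String fileContents) {
        List<String> records = new ArrayList<>();
        fileContents.lines().forEach(records::add);
        return records;
    }

    public static void main(String[] args) {
        String file = "index.html\nabout.html\r\nindex.html\n";
        for (String record : recordsOf(file)) {
            // Each record is already a single line, so split("\n") returns one part.
            String[] parts = record.split("\n");
            System.out.println(record + " -> " + parts.length + " part(s)");
        }
    }
}
```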
## Reducer
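```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted by PageMapper to get each page's total count.
public class PageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        context.write(key, new IntWritable(total));
    }
}
```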
## Main
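```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text; // must be Hadoop's Text, not javax.xml.soap.Text
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageMain {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // args[0]: input directory; args[1]: output directory (must not exist yet)
        Job job = Job.getInstance();
        job.setJarByClass(PageMain.class);

        job.setMapperClass(PageMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(PageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```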
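To see what the job computes without a cluster, here is a plain-Java sketch (JDK only, no Hadoop) of the map → shuffle → reduce flow that PageMain wires together. The class name `LocalPageCount` and the sample input are hypothetical. It assumes one page name per input line and skips blank lines. A `TreeMap` stands in for Hadoop's sort-and-group step; for ASCII keys its ordering matches `Text`'s byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalPageCount {
    // Map step: one (page, 1) pair per non-blank input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            String page = line.trim();
            if (!page.isEmpty()) {
                pairs.add(Map.entry(page, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group values by key; TreeMap keeps keys sorted, as Hadoop sorts map output keys.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce step: sum the values for each page, like PageReducer.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int total = 0;
            for (int value : group.getValue()) {
                total += value;
            }
            totals.put(group.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical input: one page name per line.
        List<String> lines = List.of("index.html", "about.html", "index.html", "", "index.html");
        TreeMap<String, Integer> totals = reduce(shuffle(map(lines)));
        // Tab-separated, similar in shape to what TextOutputFormat writes to part-r-00000.
        totals.forEach((page, count) -> System.out.println(page + "\t" + count));
    }
}
```

Run as-is, it prints `about.html` with a count of 1 followed by `index.html` with a count of 3.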
## Command line
```shell
hadoop jar page-1.0-SNAPSHOT.jar PageMain /input/page /output5
```
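PS: The first run failed (you never know until you try).

Error: `Initialization of all the collectors failed. Error in last collector was :interface javax.xml.soap.`

Cause: when writing Main, I imported the wrong `Text`. Out of habit I just accepted the IDE's first import suggestion, so the map output key class pointed at an interface from `javax.xml.soap` instead of Hadoop's writable `Text`, and the map output collector could not be initialized. The correct import is `import org.apache.hadoop.io.Text;`.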
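One way to catch this kind of mix-up before submitting the job is to check which class a simple name actually resolved to. The JDK-only sketch below uses a hypothetical helper, `checkKeyClass` (not part of Hadoop), that fails fast when the key class is an interface (which is what the error above reports) or has an unexpected fully qualified name. `CharSequence` stands in for the wrongly imported interface, since `javax.xml.soap` is no longer bundled with the JDK as of Java 11.

```java
public class KeyClassCheck {
    // Hypothetical helper: fail fast if a map output key class is not the expected concrete class.
    static void checkKeyClass(Class<?> keyClass, String expectedName) {
        if (keyClass.isInterface()) {
            throw new IllegalArgumentException("key class is an interface: " + keyClass.getName());
        }
        if (!keyClass.getName().equals(expectedName)) {
            throw new IllegalArgumentException("expected " + expectedName + " but got " + keyClass.getName());
        }
    }

    public static void main(String[] args) {
        // A concrete class with the expected name passes.
        checkKeyClass(String.class, "java.lang.String");
        System.out.println("String.class: ok");
        // An interface (standing in for the wrongly imported Text) is rejected.
        try {
            checkKeyClass(CharSequence.class, "java.lang.String");
        } catch (IllegalArgumentException e) {
            System.out.println("CharSequence.class: " + e.getMessage());
        }
    }
}
```

In PageMain, the analogous call would be something like `checkKeyClass(Text.class, "org.apache.hadoop.io.Text")` before `job.waitForCompletion(true)`; printing `Text.class.getName()` once is an even quicker check.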