We have a requirement to design our MapReduce architecture so that the MapReduce code won't depend on the input pattern. The idea is that the MapReduce code stays constant, and any change to the input pattern is handled by custom, configurable logic only. Can we do this using a custom annotation, or are there better approaches?
Any suggestion would be of great help. Many thanks.
This is already a feature of MapReduce, thanks to the RecordReader. I can't give a much better example here than https://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/, but essentially these classes aren't involved in the core map() or reduce() logic. The FileInputFormat is responsible for splitting the input data into InputSplits and creating a RecordReader for each split; the RecordReader then parses the raw input and provides single key-value pairs to the mapper.
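A minimal, self-contained sketch of that separation (plain Java, not the real Hadoop API; every class and method name here is illustrative): the parsing of raw input lines lives behind a RecordReader-like interface, so the mapper-style logic never changes when the input pattern does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical interface standing in for Hadoop's RecordReader:
// it turns one raw input line into a (key, value) pair.
interface LineRecordReader {
    Map.Entry<String, String> read(String rawLine);
}

// One implementation per input pattern; choosing one is configuration,
// not a change to the mapper logic.
class CsvRecordReader implements LineRecordReader {
    public Map.Entry<String, String> read(String rawLine) {
        String[] parts = rawLine.split(",", 2);
        return new SimpleEntry<>(parts[0], parts[1]);
    }
}

class TsvRecordReader implements LineRecordReader {
    public Map.Entry<String, String> read(String rawLine) {
        String[] parts = rawLine.split("\t", 2);
        return new SimpleEntry<>(parts[0], parts[1]);
    }
}

// "Mapper" logic that stays constant regardless of the input pattern.
class ConstantMapper {
    static List<String> run(List<String> lines, LineRecordReader reader) {
        return lines.stream()
                .map(reader::read)
                .map(kv -> kv.getKey().toUpperCase() + "=" + kv.getValue())
                .collect(Collectors.toList());
    }
}
```

In real Hadoop you would achieve the same decoupling by subclassing RecordReader and FileInputFormat and selecting the input format in the job driver, leaving the Mapper untouched.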
So the mapper doesn't really have any idea where its key-value pair has come from or how it got there (not entirely true, because of context.getInputSplit()). This means you can mix and match input types within the same job: each mapper can only have one input format, but you can use multiple different mappers (for example, via MultipleInputs) that all emit the same POJO beneath them.
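To illustrate that last point, here is a hypothetical sketch (plain Java, all names invented for illustration): two mappers for two different input layouts both emit the same POJO, so the downstream reduce logic is shared and never sees the input patterns.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// The shared intermediate POJO that every mapper emits.
class Count {
    final String key;
    final int value;
    Count(String key, int value) { this.key = key; this.value = value; }
}

// Mapper for comma-separated input such as "hello,3".
class CsvCountMapper {
    static Count map(String line) {
        String[] p = line.split(",", 2);
        return new Count(p[0], Integer.parseInt(p[1]));
    }
}

// Mapper for a different layout, e.g. "3 hello" (count first, space-separated).
class SpaceCountMapper {
    static Count map(String line) {
        String[] p = line.split(" ", 2);
        return new Count(p[1], Integer.parseInt(p[0]));
    }
}

// Reducer that only knows about Count, never about the input layouts.
class SharedReducer {
    static Map<String, Integer> reduce(List<Count> records) {
        return records.stream()
                .collect(Collectors.toMap(r -> r.key, r -> r.value, Integer::sum));
    }
}
```

In real Hadoop, the analogous per-path wiring is done in the job driver, e.g. with MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass), so each input path gets its own InputFormat and Mapper while the reducer stays the same.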