Currently we run a Nutch crawler on a Hadoop cluster consisting of one machine, with the master and the slave both running on that same machine.
We need to get a full Hadoop cluster up and running and be able to run Nutch on it, i.e. with the master and slaves running on separate machines.
Note that we are running a slightly modified version of Nutch.
The result of this project should be the following:
1. A configured Hadoop master instance running
2. A slave running
3. Documentation that describes how the instances should be configured in the future
We run our servers on Amazon EC2.
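
To illustrate the target layout (master and slaves on separate machines), here is a minimal sketch of a helper that could go into the documentation. It assumes a classic Hadoop setup where worker hosts are listed one per line in a slaves file; the function name and the hostnames are hypothetical, and the exact file location depends on the Hadoop version in use.

```python
def render_slaves_file(master, slaves):
    """Return the text of a slaves file: one worker hostname per line.

    Hypothetical helper. It rejects layouts where the master host is
    also listed as a slave, since the goal is separate machines.
    """
    cleaned = [host.strip() for host in slaves if host.strip()]
    if not cleaned:
        raise ValueError("at least one slave host is required")
    if master.strip() in cleaned:
        raise ValueError("master and slaves must run on separate machines")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate slave host")
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Placeholder hostnames; on EC2 these would be the instances' private DNS names.
    text = render_slaves_file(
        "master.example.internal",
        ["slave1.example.internal"],
    )
    print(text, end="")
```

The check against the master hostname is a design choice for this project only; Hadoop itself allows the master to also act as a slave, which is how the current single-machine setup works.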