The database to be used is the Wikipedia database, and what needs to be done is:
--> Extracting the links database (the first thing you need to do is extract the wikilinks from the XML file).
--> Writing a Pig/Hive program to compute the PageRank.
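To make the two steps concrete, here is a rough single-machine sketch in Python, not the Pig/Hive program itself. It assumes the wikitext of each page has already been pulled out of the XML dump, and it assumes the standard damping factor of 0.85 with dangling-page rank spread evenly. The names extract_links and pagerank and the three toy pages are made up for illustration. A real run would stream the dump and distribute the iteration.

```python
import re

# Matches [[Target]] and [[Target|label]]; any #section anchor is dropped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(wikitext):
    """Return the link targets found in one page's wikitext."""
    return [m.group(1).strip() for m in WIKILINK.finditer(wikitext)]


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [outgoing link targets]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy input standing in for page text taken from the XML dump.
pages = {
    "A": "See [[B]] and [[C|the C page]].",
    "B": "Back to [[A#History]].",
    "C": "Links to [[A]] and [[B]].",
}
graph = {title: extract_links(text) for title, text in pages.items()}
ranks = pagerank(graph)
```

In Pig or Hive the same logic would typically become a links table of (source, target) pairs, with each iteration expressed as a join of the links against the current ranks followed by a group-by on the target.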
Or, as another option, you can choose your own project topic and the dataset you want to work with, and write a project report on it. The dataset that you choose should be at least several GB in size (minimum 5 GB), and you can use any Hadoop-related project (including MapReduce, Pig, Hive, Mahout, etc.) to process your data. You need to describe the problem considered for your project and propose a solution approach using one of the tools/methodologies such as recommender systems, clustering, classification, etc.
The dataset could be the data you are using at work, some data that is involved in your everyday life, or any publicly available dataset.
8 freelancers are bidding on average $248 for this job
6+ years of experience in machine learning and a master's degree in computer science. Expertise in R, WEKA, Java, Python, Hadoop, MapReduce, Pig, and Hive. Worked on many machine learning projects.
I have already worked in this domain. I am working on my PhD on graph-based services (including PageRank, etc.) and am an expert on recommender systems, so you just need to find your dataset and state your needs.