I need to build something like a search engine. I have a number of documents containing Chinese news articles, and I need to do text processing on them. By text processing I mean tokenization, normalization, and building a matrix of the terms and the documents in which they occur. I also have some queries: when I enter a query, I need to get back the query id and the article ids sorted by relevance. For example:
1 (query id)
12 1923 182 7192 19 2988 3999 (news ids, separated by spaces, sorted by relevance)
The system should retrieve at most 100 relevant news articles for each query. I have attached one document for you to look at. I look forward to hearing from you.
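To make the request concrete, the pipeline described above could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not part of the posting: Chinese text is tokenized into overlapping character bigrams (a common baseline when no word segmenter is available; a dictionary-based segmenter could be swapped in), normalization is Unicode NFKC plus lowercasing, and relevance is a plain TF-IDF sum without document-length normalization. The names `normalize`, `tokenize`, `build_index`, `search`, and `format_result` are hypothetical, as are the sample documents.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

MAX_RESULTS = 100  # the posting asks for at most 100 articles per query


def normalize(text):
    """Fold full-width forms to half-width (NFKC) and lowercase Latin letters."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text):
    """Return terms: bigrams for runs of CJK characters, whole words for Latin/digit runs."""
    terms = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[a-z0-9]+", normalize(text)):
        if "\u4e00" <= run[0] <= "\u9fff":
            if len(run) == 1:
                terms.append(run)  # a lone character is kept as its own term
            else:
                terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            terms.append(run)
    return terms


def build_index(docs):
    """Build a sparse term-document matrix: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query, limit=MAX_RESULTS):
    """Rank documents by summed TF-IDF over the query terms, best first."""
    scores = defaultdict(float)
    for term, qtf in Counter(tokenize(query)).items():
        postings = index.get(term)
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += qtf * (1 + math.log(tf)) * idf
    # Sort by descending score; ties are broken by ascending document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:limit]


def format_result(query_id, doc_ids):
    """Query id on one line, then space-separated news ids sorted by relevance."""
    return f"{query_id}\n{' '.join(str(d) for d in doc_ids)}"


# Hypothetical sample data, not taken from the attached document.
docs = {12: "北京今天下雨", 19: "上海股市上涨", 182: "北京股市下跌"}
index = build_index(docs)
print(format_result(1, search(index, len(docs), "北京股市")))
```

Documents that share no term with the query are never scored, so a query can return fewer than 100 ids.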
Please check [url removed, login to view]~supporttest/[url removed, login to view], where you can see a sample of the work, and let me know if you need the same thing. Thanks, Subhasish