We're looking for a generic implementation of the Bayesian text classification algorithm described at: [url removed, login to view]. We want to use it for more than just spam filtering, so we need a couple of generic functions. The first would record a string as belonging to a category, with an API like classify($string, $category). The second would return the predicted category for a string, like fetchcategory($string). The functions should be written so that it doesn't matter how many categories there are or what they represent. The classification data should be stored in MySQL, with all table names prefixed by "bayesian_".
As an example, we should be able to drop these functions into a web-based proxy server so that, when users mark sites as "interesting" or "boring", the system can predict a user's interest level from a site's content once enough sites have been tagged. We should then be able to drop the same functions into an email system for spam filtering, or into another script for some other type of text classification. Performance matters: a single classification obviously shouldn't take 20 minutes.
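To make the intended API shape concrete, here is a rough sketch of the two functions. It is written in Python purely for brevity; the deliverable itself must be PHP/MySQL. Everything beyond the two function names is an assumption for illustration only: the class name BayesianClassifier, the tokenizer, the in-memory dictionaries standing in for the "bayesian_" tables, and the use of a plain multinomial naive Bayes with add-one smoothing, which may differ in detail from the algorithm in the linked article.

```python
import math
import re
from collections import defaultdict


class BayesianClassifier:
    """Multinomial naive Bayes over word counts, for any number of categories."""

    def __init__(self):
        # Assumption: in the real deliverable these counts would live in MySQL
        # tables prefixed with "bayesian_"; plain dicts stand in for them here.
        self.word_counts = defaultdict(lambda: defaultdict(int))  # category -> word -> count
        self.doc_counts = defaultdict(int)  # category -> number of tagged strings
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def classify(self, text, category):
        """Record text as an example of category, like classify($string, $category)."""
        self.doc_counts[category] += 1
        for word in self.tokenize(text):
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def fetchcategory(self, text):
        """Return the most probable category, or None if nothing has been tagged yet."""
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            return None
        words = self.tokenize(text)
        # The extra 1 reserves probability mass for words never seen in training.
        vocab_size = len(self.vocabulary) + 1
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Sum log-probabilities to avoid underflow on long texts.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts.get(word, 0) + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


clf = BayesianClassifier()
clf.classify("new telescope images of distant galaxies", "interesting")
clf.classify("astronomers discover a new planet", "interesting")
clf.classify("celebrity gossip and fashion tips", "boring")
clf.classify("fashion week celebrity photos", "boring")
print(clf.fetchcategory("astronomers release new galaxies images"))  # prints "interesting"
```

Nothing in the sketch is specific to "interesting"/"boring" or to spam: categories are just strings, so the same two calls would serve any tagging scheme. Per-word counts keyed by category are also the kind of data that maps naturally onto indexed MySQL rows, which should keep a single classification fast.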
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.
The code needs to be cross-platform PHP/MySQL, able to run under Apache on both Red Hat Linux and Windows XP.