PHP script to build database of most common words on internet

Implement a PHP class that takes a single web page address (URL) as input. When called, it should download the page, extract its text content (i.e. remove all HTML tags, JavaScript, etc.), and update a database table with information about the most common words found, in relation to the web page's TLD. For example, if the system is called with the input URL "[login to view URL]", the TLD would be ".com"; if it is called with "[login to view URL]", the TLD would be ".[login to view URL]". The list of possible TLDs can be found e.g. at: [login to view URL]
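
The TLD lookup described above could be sketched roughly as follows. This is only an illustration: the function name `extractTld`, the `$knownSuffixes` parameter, and the example.com addresses are assumptions and not part of the brief. It assumes the TLD list has already been loaded into an array and picks the longest listed suffix that the host name ends with, so multi-label entries are preferred over shorter ones when both are listed.

```php
<?php

/**
 * Returns the longest known suffix (with a leading dot) that the URL's host
 * ends with, or null if the URL has no host or no listed suffix matches.
 * $knownSuffixes is a hypothetical, pre-loaded list such as ['com', 'uk'].
 */
function extractTld(string $url, array $knownSuffixes): ?string
{
    // parse_url() only finds a host when the URL includes a scheme.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower(rtrim($host, '.'));

    $best = null;
    foreach ($knownSuffixes as $suffix) {
        // Normalise the entry and match only on a label boundary (".suffix").
        $needle = '.' . strtolower(ltrim($suffix, '.'));
        if (substr($host, -strlen($needle)) === $needle
            && ($best === null || strlen($needle) > strlen($best))) {
            $best = $needle;
        }
    }

    return $best;
}

// Illustrative call with a tiny, made-up suffix list.
var_dump(extractTld('http://www.example.com/index.html', ['com', 'uk', 'co.uk']));
```

Matching against a list, rather than simply taking the last label of the host, is one way to handle TLDs that consist of more than one label.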

In other words, the idea is that the system will generate a huge database table that contains information about the most common words found on web pages under different TLDs. Word matching is, of course, not case sensitive. A word is defined as a string of 2-100 characters consisting only of the letters A to Z.
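
A minimal sketch of the word definition above, assuming the page's HTML has already been downloaded into a string. The function name `extractUniqueWords` is hypothetical. The sketch also makes two assumptions the brief does not settle: any non-letter character (digits, apostrophes, accented letters) ends a word, and letter runs longer than 100 characters are skipped rather than cut.

```php
<?php

/**
 * Returns the distinct words of an HTML document, lowercased.
 * A word is a run of 2-100 ASCII letters with no letter on either side.
 */
function extractUniqueWords(string $html): array
{
    // Drop <script> and <style> elements together with their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    if ($text === null) {
        $text = $html; // regex engine failure: fall back to the raw input
    }

    // Add a space before each tag so neighbouring words do not merge
    // once the tags are stripped.
    $text = strip_tags(str_replace('<', ' <', $text));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collect runs of 2-100 letters; longer runs do not match at all.
    preg_match_all('/(?<![A-Za-z])[A-Za-z]{2,100}(?![A-Za-z])/', $text, $matches);

    // Case-insensitive matching: lowercase first, then de-duplicate.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}
```

Returning each word only once per page keeps a repeated word from being counted more than once for that page.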

The table must record how many times each word has been found and the date on which it was last found.

So, the fields of the table could be:

id - integer - auto increment

word - varchar(100)

hits - integer (number of times this word has been found in different web pages)

last_hit - date (the date when this word was last found)

country - varchar(2) (the two-letter ISO code of the country in which the web page containing this word was found)

The hits value is increased by one each time the same word is found on a different page. In other words, if a web page contains the word "foobar" 10 times, it is still added to the table with a hit count of 1. When the word "foobar" is found on some other web page of the same country, the hits counter is increased by one.

I will use the hits and last_hit data to prune the database table so it does not grow too big. I want to build a table of the most common words found online, not all words. The job must be implemented with an object-oriented design using PHP classes. For the database, use MySQL. You must develop the script on your own server; you will not be given any access to the production server.

There is an error in the database table fields: the last field should be "tld - varchar(20)", not "country". Sorry!
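
One possible shape for the table and the hit-counting update is sketched below, using PDO with MySQL. The class name `WordHitStore`, the table name `word_hits`, and the unique key on (word, tld) are assumptions made for illustration; the brief only lists the five fields. The sketch expects the caller to pass each word at most once per page.

```php
<?php

/**
 * Sketch of a storage class for per-TLD word hits (all names hypothetical).
 */
class WordHitStore
{
    // Fields from the posting, plus an assumed unique key on (word, tld).
    const CREATE_TABLE_SQL = <<<'SQL'
CREATE TABLE IF NOT EXISTS word_hits (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    word VARCHAR(100) NOT NULL,
    hits INT UNSIGNED NOT NULL DEFAULT 0,
    last_hit DATE NOT NULL,
    tld VARCHAR(20) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uniq_word_tld (word, tld)
)
SQL;

    // Insert a new row with one hit, or bump the existing row for this word and TLD.
    const UPSERT_SQL =
        'INSERT INTO word_hits (word, tld, hits, last_hit)
         VALUES (:word, :tld, 1, CURRENT_DATE)
         ON DUPLICATE KEY UPDATE hits = hits + 1, last_hit = CURRENT_DATE';

    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function createTable(): void
    {
        $this->pdo->exec(self::CREATE_TABLE_SQL);
    }

    /** Records one hit per word for a single page; $uniqueWords must be de-duplicated. */
    public function recordPage(array $uniqueWords, string $tld): void
    {
        $stmt = $this->pdo->prepare(self::UPSERT_SQL);
        $this->pdo->beginTransaction();
        try {
            foreach ($uniqueWords as $word) {
                $stmt->execute([':word' => $word, ':tld' => $tld]);
            }
            $this->pdo->commit();
        } catch (Exception $e) {
            // Undo the partial page so no word is counted twice on a retry.
            $this->pdo->rollBack();
            throw $e;
        }
    }
}
```

With a unique key on (word, tld), MySQL's `ON DUPLICATE KEY UPDATE` handles both the first sighting of a word and later increments in one statement, which avoids a separate SELECT per word.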

Skills: PHP


About the Employer:
(620 reviews) Turku, Thailand

Project ID: #5071936

Awarded to:


Hello Jouni, I'm interested in building this script. My php guy would love to do it :) Thanks, Looking forward ;) Adnan

$247 USD in 3 days
(519 reviews)

6 freelancers are bidding an average of $245 on this job


Hi, I understood what you want and I have one question: "build a table of all the most common words found online, not all words", so do you have a list of words? Please talk to me. I'm fast and expert. Thank you!

$236 USD in 5 days
(517 reviews)

Hello, I have experience developing similar scripts. As far as I can see, you will need a module to manage the "words", right? I am ready to start working on it right now. Thanks for reading my proposal, Bes...

$205 USD in 3 days
(114 reviews)

Hi, how exactly do you know which infrequently used words to prune? Maybe one will increase significantly later, but you have already removed it from the DB, so it can no longer be counted; that way you just decrease...

$250 USD in 8 days
(41 reviews)

I can do this fast and I also have a server to try the code. I will also provide you with an admin to see all the data you requested in my server.

$333 USD in 2 days
(0 reviews)

Hi jv, so you need to start on one page, get all valid TLDs ordered by relevance and save them to the DB, then follow these URLs and repeat the same step, right? It's easy. (I can do it in PHP + MySQL + Bootstrap + AJAX.) I'm perfect for the jo...

$200 USD in 7 days
(0 reviews)