Completed

Extract web content via RSS feeds

Dear all,

I have a list of several RSS feeds. These feeds show the titles of articles and links to them, but NOT the articles themselves. I want the full articles in the RSS feed instead.

So what needs to be done is this: starting from the RSS feed, crawl the links it contains and parse the articles found there into a new RSS feed.

Let me give you an example:

- I have the feed [url removed, login to view] (see the attachment for a cached file); it is from the Swiss police.
- I want to crawl and parse all the articles in this feed.
- The first article in the feed is this one: [url removed, login to view]. Content should be scraped starting from the HTML tag "" and ending at the HTML tags

This should be done for all articles in the RSS feed, thus creating a new RSS feed that also contains the complete articles.

Deliverables:

- An installed PHP script with a MySQL database, which works as follows:

1) I enter the address of an RSS feed.

2) I specify the beginning and end HTML between which the data should be crawled.

3) A new RSS feed is created, which is updated every 30 minutes via a cron job.

I can make as many RSS feeds as I want, and I should be able to manage them from an admin interface (create a new feed, alter an existing feed, or delete an existing feed).

NEW: I just learned that one of the RSS feeds I want to use points not to regular HTML web pages but to PDF pages. Let me know if this is doable too.

Reminder: You may not start working on this or any project before your bid is accepted. Any user who violates this policy may have their account permanently suspended.
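The core of the request (read a feed, follow each article link, cut out the text between a start marker and an end marker, and write it back into a new feed) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the awarded solution: the brief asks for PHP with MySQL, while Python is used here purely to show the logic. The function names `extract_between`, `fetch_url`, and `build_full_feed` are hypothetical, the feed is assumed to be plain RSS 2.0, pages are assumed to be UTF-8 HTML, and storage, the admin interface, and PDF handling are left out.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_between(html, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker, or None."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end].strip()


def fetch_url(url):
    """Download a page and decode it as text (UTF-8 assumed, bad bytes replaced)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def build_full_feed(rss_xml, start_marker, end_marker, fetch=fetch_url):
    """Return the RSS 2.0 feed with each <description> replaced by the scraped article."""
    root = ET.fromstring(rss_xml)
    # Copy the item list first so the tree can be modified safely while looping.
    for item in list(root.iter("item")):
        link = item.findtext("link")
        if not link:
            continue
        body = extract_between(fetch(link.strip()), start_marker, end_marker)
        if body is None:
            continue  # markers not found: keep the original item untouched
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = body  # ElementTree escapes the HTML when serialising
    return ET.tostring(root, encoding="unicode")
```

A scheduled job could call `build_full_feed` for each stored feed and write the result to a file served as the new feed; a crontab schedule of `*/30 * * * *` would match the 30-minute update interval in the brief. Items whose pages lack the markers are passed through unchanged, so a layout change on the source site degrades the feed rather than breaking it.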

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

php, mysql

Skills: Engineering, Linux, MySQL, PHP, Software Architecture, Software Testing


About the Employer:
(2 reviews) Germany

Project ID: #3973146

Awarded to:

ensparcvw

See private message.

$95 USD in 10 days
(38 reviews)
4.7