Write some Software

Completed. Posted Feb 17, 2016. Paid on delivery.

I am looking to get the same info from two different types of pages:

Type one: a generic search that uses Amazon filters to produce URLs:

http://www.amazon.co.uk/s/ref=sr_nr_p_n_availability_1?rh=n%3A340831031%2Cn%3A428654031%2Cn%3A430439031%2Cn%3A6411151031%2Cn%3A430484031%2Ck%3Acomputer+part+supply%2Cp_6%3AAUCLXU4CMAQOY%2Cp_n_availability%3A419162031&keywords=computer+part+supply&ie=UTF8&qid=1455497320
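A search URL like the one above packs its filters into the `rh` query parameter as comma-separated `key:value` pairs. As a minimal sketch (the function name `split_search_filters` is hypothetical, and what each key means is not stated in this posting), the pairs can be unpacked with the standard library:

```python
from urllib.parse import urlsplit, parse_qs

# The example search URL from the posting, split across lines for readability.
SEARCH_URL = (
    "http://www.amazon.co.uk/s/ref=sr_nr_p_n_availability_1"
    "?rh=n%3A340831031%2Cn%3A428654031%2Cn%3A430439031%2Cn%3A6411151031"
    "%2Cn%3A430484031%2Ck%3Acomputer+part+supply%2Cp_6%3AAUCLXU4CMAQOY"
    "%2Cp_n_availability%3A419162031"
    "&keywords=computer+part+supply&ie=UTF8&qid=1455497320"
)


def split_search_filters(url):
    """Return the (key, value) pairs packed into the `rh` query parameter."""
    query = parse_qs(urlsplit(url).query)  # decodes %3A, %2C and '+'
    rh = query.get("rh", [""])[0]
    pairs = []
    for item in rh.split(","):
        if not item:
            continue
        key, _, value = item.partition(":")
        pairs.append((key, value))
    return pairs


filters = split_search_filters(SEARCH_URL)
for key, value in filters:
    print(key, value)
```

Storing the decoded pairs alongside each crawled row would make it easier to trace a result back to the search that produced it.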

Type two: a specific product:

This can be either of these two URLs, as both are generated from the ASIN, and I will have the ASINs of the products I am interested in:

[login to view URL]

[login to view URL]

The info I would like is:

ASIN: Amazon unique identifier

Title: Name

Fulfillment Channel: Merchant or FBA

Availability: In stock or not

Price: I only care about the lowest FBA price

Review Count: Number of Reviews (not a priority)

Category: (not a priority)

Brand: (not a priority)
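The fields above could be carried through the crawler as one record per product. The following is only a sketch: the class names, the field names, and the `Offer` shape are assumptions for illustration, not part of the posting. It also shows one way to apply the "lowest price that is FBA" rule to a list of offers.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Optional


@dataclass
class Offer:
    """One seller offer for a product (hypothetical shape)."""
    price: Decimal
    channel: str  # "FBA" or "Merchant"


@dataclass
class ProductRecord:
    """One output row; the non-priority fields default to None."""
    asin: str
    title: str
    fulfillment_channel: Optional[str]
    in_stock: bool
    price: Optional[Decimal]
    review_count: Optional[int] = None
    category: Optional[str] = None
    brand: Optional[str] = None


def lowest_fba_price(offers: Iterable[Offer]) -> Optional[Decimal]:
    """Return the lowest price among FBA offers, or None if there are none."""
    fba_prices = [offer.price for offer in offers if offer.channel == "FBA"]
    return min(fba_prices) if fba_prices else None


# Made-up example values, for illustration only.
offers = [
    Offer(Decimal("12.99"), "Merchant"),
    Offer(Decimal("14.50"), "FBA"),
    Offer(Decimal("13.75"), "FBA"),
]
best = lowest_fba_price(offers)
```

Note that the cheaper Merchant offer is ignored, since only FBA prices are of interest.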

The web crawler should be written in Python and be executable from the command line on AWS servers. If you choose to use a VPN, provide details for setting up and maintaining it. The code should be accompanied by documentation (a README or comments) explaining the structure of the system and any nuances that need to be understood. We will provide credentials for writing to a table in our PostgreSQL database, and the output of your software should be written to this table.
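A command-line entry point that writes to PostgreSQL might be laid out roughly as follows. This is a sketch under several assumptions: the option names, the default table name `products`, and the column list are invented for illustration, and `psycopg2` is assumed as the database driver (any DB-API driver that uses `%s` placeholders would fit the same shape).

```python
import argparse
import re

# Hypothetical column list; the real table layout is not given in the posting.
COLUMNS = ("asin", "title", "fulfillment_channel", "in_stock", "price")


def build_parser():
    parser = argparse.ArgumentParser(
        description="Crawl product pages and store the results.")
    parser.add_argument("--input", required=True,
                        help="text file with one URL or ASIN per line")
    parser.add_argument("--table", default="products",
                        help="target PostgreSQL table (assumed name)")
    parser.add_argument("--dsn", default=None,
                        help="PostgreSQL connection string")
    return parser


def build_insert_sql(table, columns=COLUMNS):
    """Build a parametrized INSERT; the table name is checked, values are not inlined."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name: %r" % table)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders)


def write_rows(conn, table, rows):
    """Insert row tuples through a DB-API connection and commit."""
    cur = conn.cursor()
    try:
        cur.executemany(build_insert_sql(table), rows)
    finally:
        cur.close()
    conn.commit()


def main(argv=None):
    args = build_parser().parse_args(argv)
    import psycopg2  # assumed driver, imported here so the helpers work without it
    conn = psycopg2.connect(args.dsn)
    try:
        rows = []  # placeholder: the crawler would produce one tuple per product
        write_rows(conn, args.table, rows)
    finally:
        conn.close()
```

Adding the usual `if __name__ == "__main__": main()` guard would make the file runnable as, for example, `python crawler.py --input urls.txt --dsn "..."`, where the file name is also hypothetical.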

I would like a plan for scalability, as Amazon doesn't like its website being crawled. Will you use a rotating VPN? How many AWS instances will you run simultaneously? And so on.

I would like to process a minimum of 500K URLs a day.
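To put the 500K target in perspective, it works out to just under six requests per second sustained around the clock. A rough sizing sketch follows; the figure of 2 seconds per request cycle (fetch plus any politeness delay) is purely an assumption, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(urls_per_day):
    """Average requests per second needed to finish within one day."""
    return urls_per_day / SECONDS_PER_DAY


def workers_needed(urls_per_day, seconds_per_request):
    """Concurrent workers needed if each request cycle takes the given time."""
    return math.ceil(required_rate(urls_per_day) * seconds_per_request)


rate = required_rate(500_000)            # about 5.79 requests per second
workers = workers_needed(500_000, 2.0)   # 12 workers at an assumed 2 s per cycle
```

The same arithmetic can be rerun with measured latencies to decide how many processes or instances to run in parallel.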

Amazon Web Services, Python, Software Architecture

Project ID: #9688818

About the project

11 proposals. Remote project. Active on Feb 19, 2016.

Awarded to:

vojd11

I have experience using Scrapy and Ghost to scrape web pages. To deal with Amazon's anti-scraping policy, I want to use a bunch of proxy servers. If the web crawler handles fewer than 500K URLs per day, I will add asyncio or multiprocessing. Number of proxy ...

$780 USD in 12 days
(9 reviews)
4.7

11 freelancers are bidding an average of $1257 on this job

waema

For 500K URLs per day, you need proxies in order to complete the job without being banned. I have done this before for several sites and several other bots. Please let me know once you are back so that we can talk mo ...

$1000 USD in 7 days
(101 reviews)
6.9
responsiveweb15

Hello, greetings for the day! We have gone through your job post and are very excited about bringing your project on board. We are a design and development company providing outsourcing services. We will deliver y ...

$750 USD in 24 days
(4 reviews)
4.8
Rohannaikmit

A proposal has not yet been provided

$1000 USD in 6 days
(0 reviews)
0.0
ziplokk

I have already built software to scrape Amazon that takes a target seller ID from a database as input and gathers the ASIN, the product name, and whether the item is Prime eligible. The project is built in Python using ...

$1944 USD in 7 days
(0 reviews)
0.0