Scrapy jobs
I need a web scraping specialist to collect specific information by querying CPF numbers on a website. Required fields: - Full name - Date of birth - Address - Emails - Phone numbers - Vehicle (make/model) - Year of manufacture - Occupation - Salary range - Likely employer Ideal Skills and Experience: - Proven web scraping experience - Proficiency with tools such as Python, Beautiful Soup, Scrapy, or similar - Ability to work with complex data structures - Attention to detail and accuracy in data extraction - Familiarity with the legal and ethical issues of ...
...automated script that visits the page daily and extracts every detail of each property: price, full location, area, status, auction number, individual link, and any other fields shown on the listing sheet. The job is successfully delivered when: • The code runs on Linux or Windows, preferably in Python (requests + BeautifulSoup, Scrapy, or Playwright; I am open to other stacks if they bring advantages). • The routine handles the site's blocks and limits on its own: user-agent rotation, rate control, intelligent back-off and, if needed, proxies or a headless browser. • The data is saved to a single CSV file, organized into columns ...
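A minimal sketch of the anti-blocking behaviour this posting asks for: user-agent rotation plus exponential back-off. The LISTING_URL and the tiny user-agent pool are placeholders; a real spider would fold the same idea into Scrapy middleware or a Playwright context.

```python
# Sketch only: rotating user-agents + exponential back-off on 429/5xx.
import random
import time
import requests

LISTING_URL = "https://example.com/leiloes"  # hypothetical target
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def fetch(url, max_retries=5):
    """GET a page, backing off exponentially when the site pushes back."""
    for attempt in range(max_retries):
        resp = requests.get(url,
                            headers={"User-Agent": random.choice(USER_AGENTS)},
                            timeout=30)
        if resp.status_code == 200:
            return resp.text
        # back off 1s, 2s, 4s, 8s ... plus jitter so retries don't align
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts")

html = fetch(LISTING_URL)
```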
I need to turn three tasks I currently perform manually into a single, fully automatic flow. 1. Scraping The first step is to collect contact information only (name, email, phone and, if possible, job title) from a site I will indicate. The solution can be built in Python (BeautifulSoup, Selenium, Scrapy) or another stack of your choice, as long as it runs on my own server or a cloud server and allows scheduled runs. 2. Enrichment As soon as the scraping finishes, I want the data sent automatically to Targetsmart or Targetdata. If it is simpler to connect via Clearbit or ZoomInfo, we can talk, but my preference is...
ENGLISH / PORTUGUÊS Hello! I'm looking for a freelancer with proven experience in web scraping public data to extract information from the CREMESP website (Regional Medical Council of the State of São Paulo). The data will be used for a non-commercial, social and informational project. All information is publicly available — no login or authentication is required. Project Overview: Visit the website: Apply the filters: Status: Active Specialty: Gynecology and Obstetrics and/or Obstetrics Scrape all 10,017 records, opening each individual profile to access details. Extract the following fields (when available): Full name City Phone number Email CRM number (medical license) Year of graduation If a record displays “Divulgação não autorizada”...
...presented to users in an intuitive, friendly way. Requirements: Proven experience in web scraping and collecting data from different sources. Ability to work efficiently and accurately to ensure data is collected correctly and kept up to date. Knowledge of programming languages and tools used for web scraping, such as Python, Beautiful Soup, Selenium, Scrapy, among others. Experience integrating APIs and data into existing websites. Commitment to deadlines and effective communication skills to report project progress. Project Details: Project duration: The project must be completed within [INSERIR P...
Hello friend, I would like a piece of code that brings data from the web straight into a database, preferably in Python with jsoup or Scrapy, from a single website. Regards
Python Project Using Scrapy Skills required: Python, HTML The script will run every 10 minutes throughout the day, with the goal of collecting solar power generation data. Routine: Access my inverter manufacturer's website; Grab the data, which is produced in JSON format
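A minimal sketch of the routine above, assuming the manufacturer's portal exposes a plain JSON endpoint; the URL and the field names are hypothetical and would need to match the real inverter portal.

```python
# Sketch only: fetch generation data as JSON and append a timestamped row.
import csv
import datetime
import requests

ENDPOINT = "https://portal.example-inverter.com/api/generation"  # placeholder

resp = requests.get(ENDPOINT, timeout=30)
resp.raise_for_status()
data = resp.json()  # assumed shape, e.g. {"power_kw": 3.2, "energy_today_kwh": 14.7}

# One row per run, so the 10-minute schedule builds a history over the day.
with open("generation.csv", "a", newline="") as f:
    csv.writer(f).writerow([datetime.datetime.now().isoformat(),
                            data.get("power_kw"),
                            data.get("energy_today_kwh")])
```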
We currently run a betting platform, but after some major updates we lost our results-and-odds scraper. We would like someone to supply that odds feed, or even build a new one. The API must pull results, leagues, teams, matches, and all the odds. I am looking for someone who already has experience with this.
Python Project Using Scrapy Skills required: Python, HTML, MongoDB The script will run once a day, with the goal of identifying new products and detecting whether a product has become unavailable or come back in stock. Routine: Access a supplier's website Visit every product category Collect the links of all available products, recursively across every page of the pagination If a product was already captured, update only its price and availability; if it does not exist in the database, insert the full record for the new product Visit the pages of all captured products and grab the required data: URL, Title, Price, Images, Description, Attribute...
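One natural place for the update-or-insert rule above is a Scrapy item pipeline backed by MongoDB. A minimal sketch, with the database, collection, and field names as assumptions:

```python
# Sketch only: Scrapy pipeline that refreshes known products and inserts new ones.
import pymongo

class ProductUpsertPipeline:
    def open_spider(self, spider):
        self.client = pymongo.MongoClient("mongodb://localhost:27017")
        self.col = self.client["catalog"]["products"]  # assumed names

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        existing = self.col.find_one({"url": item["url"]})
        if existing:
            # Already captured: refresh only price and availability.
            self.col.update_one({"url": item["url"]},
                                {"$set": {"price": item["price"],
                                          "available": item["available"]}})
        else:
            # New product: store the full record.
            self.col.insert_one(dict(item))
        return item
```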
Hello SUMMARY I would like to develop a bot that performs the following actions daily: - Access a repo... [garbled sample of an XML feed; the only recoverable values are “BMW” and “CARRO”] - The script must run locally, on Windows, or on Scrapy Cloud - The emails must be sent using an account ...
Hi suellenol, I am thinking of scraping a few sites and building a database in Excel. I wanted to check with you how difficult the projects would be, show you the sites, and go over the risks. [Removed for encouraging offsite communication which is against our Terms and Conditions -Section 13:Communication With Other Users]
I need a programmer to develop data mining related to real estate. The work involves scraping the portals and displaying the results. I am not posting much information here, so please contact me and I will explain further. Note: it would help if you could share your hourly rate and some portfolio items or past work.
We need to build a bot and/or web crawler/scraper to access a web system, perform searches through the system's forms according to the parameters supplied, and retrieve the data. The bot must also be able to insert information into the system through forms and return the status of the operation. The bot will run on an Ubuntu Server Linux machine and must work as a script, executed from the shell command line, accepting input parameters that define its behavior (query or insertion). Example: ~# ./programa parametro1 parametro2 parametro3 parametroN Notes: ...
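A minimal sketch of the command-line contract described above, using argparse; the subcommand and argument names are illustrative only, not part of the posting.

```python
#!/usr/bin/env python3
# Sketch only: first argument selects query vs. insert mode,
# remaining arguments parametrize the chosen operation.
import argparse
import sys

parser = argparse.ArgumentParser(description="form-driven web robot")
sub = parser.add_subparsers(dest="mode", required=True)

q = sub.add_parser("query", help="search the system and print results")
q.add_argument("search_term")

i = sub.add_parser("insert", help="submit a form and report the status")
i.add_argument("field_values", nargs="+")

args = parser.parse_args()
if args.mode == "query":
    print(f"would query the system for: {args.search_term}")
else:
    print(f"would insert: {args.field_values}")
sys.exit(0)
```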
I need the full membership roster from a single website extracted and handed back as a clean CSV file. My focus is purely on Name and contact details—anything that will let me reach each member directly (e-mail, phone, postal address if shown). This is a one-off job; no scheduling or repeat runs are required. You may use the scraping stack you prefer—Python with BeautifulSoup or Scrapy, Node with Puppeteer, a headless browser, etc.—as long as the result is complete and well-structured. I will provide the target URL and any login credentials immediately after we start. Deliverables • CSV file containing one row per member with separate columns for each contact field found • The scraper script or notebook so I can rerun it later if the site layout sta...
...waiting for a complete price-tracking toolkit. Your job is to install and wire together the following open-source applications so they work as one seamless stack: • Price Buddy – follow the official quick-start guide and then swap in my supplied logo so the interface is fully rebranded. • SearXNG – latest stable build from the GitHub repo. • Selenium + the seleniumbase-scrapper module. • Scrapy for optional custom spiders I will run later. Integration requirement Price Buddy must talk to both SearXNG and Selenium through their APIs (my chosen method) so that search results and browser-based scraping feed straight into the Price Buddy dashboard. When everything is connected, I should be able to trigger a query in Price Buddy and have it au...
I have two e-commerce sites that I need to monitor and capture complete product information from. The goal is to pull each item’s title, price, description, images, SKU or item code—everything publicly shown on the product pages—and turn it into a clean, well-structured dataset I can reuse. You’re free to use Python (BeautifulSoup, Scrapy, Selenium, or similar), Node, or any reliable scraping stack as long as it runs unattended and handles basic anti-bot measures those sites might employ. I’ll provide the URLs once we start; none of them requires a login. Deliverables • Scraping script(s) or workflow ready to run on my machine or a small cloud instance • One up-to-date CSV (or JSON) export per site demonstrating the captured data • ...
I have a tourism website’s home page that links out to 3,435 individual attraction pages. From every one of those pages I need both the city information and the tourist-destination details extracted and delivered to me in a clean Excel/CSV file. You may use the scraping stack you are most comfortable with—Python + BeautifulSoup/Scrapy, Node + Cheerio, or another proven solution—as long as the final dataset is accurate and neatly structured in columns that clearly separate the city fields from the attraction fields. Please make sure the script you build can be rerun later (the site updates periodically) and include brief instructions on how to do so. I will consider the job complete when I receive: • A single .xlsx or .csv file containing one row per at...
...and a snapshot of the most recent user reviews. To keep the file actionable, I’d like each row to represent one product and each data group in its own column (for example: “Site”, “Product name”, “Specs”, “Price”, “Review rating”, “Review text”, “Timestamp”). The workbook should be ready for pivot-table analysis the moment I open it. You are free to choose the underlying stack—Python with Scrapy, Selenium, Playwright, or a headless browser combo is fine—as long as it runs reliably on a standard Windows machine and can be scheduled to refresh on demand. Graceful handling of site-specific anti-bot measures and captchas is essential. Acceptance criteria • Agent fetch...
I’m looking to replace a tedious manual copy-paste process with a reliable workflow that pulls data directly from a set of public websites and feeds it into my spreadsheet every day. I already know the exact pages and the fields I need; what I need from you is an automated solution—ideally a clean, well-commented script in Python using libraries you feel are best suited (Scrapy, BeautifulSoup, Selenium, or a combination). The final output should be a structured CSV (or Google Sheet via API) that mirrors my current column layout so I can slot it straight into existing reports without any reformatting. The script must run unattended on Windows, handle occasional site layout changes gracefully, and log any rows it can’t capture so I can review them later. A quick REA...
...as” approach will take far too long. You will receive a text file that contains the exact filenames for every document I need. Those filenames appear in the HTML once the record is loaded, so they can be used as reliable anchors for the scrape. The order in which the files arrive does not matter; accuracy and completeness do. I expect an automated approach—Python with Selenium, Playwright, Scrapy, or any comparable tool is fine—as long as it can work around the site’s fragile structure and occasional timeouts. If headless browsing or rate-limiting tricks are required, please build them in. Deliverables: • A zipped archive (or split archives) containing every requested PDF. • The runnable script with clear, inline comments so I can repeat...
...receive: Images from Singapore card vendors showing buyback prices Use AI tools to: Identify the card from the image Search eBay: For recent SOLD listings of the same PSA card Extract: Actual accepted Best Offer prices Calculate: Total landed cost in SGD Compare: Vendor Buyback Price vs Total Cost Flag: Arbitrage opportunity if profit ≥ 15% You’re free to use Python (BeautifulSoup, Scrapy, Selenium, or the official eBay API) or any stack you prefer, so long as the process runs unattended on my VPS and avoids eBay’s anti-bot triggers. Deliverables: 1. Well-documented source code with setup instructions 2. Automated scheduler (cron, Windows Task, or similar) set to run daily 3. An Excel file generated on each run, overwriting or appending as we...
...system (Python 3.11+, aiohttp/httpx/Scrapy/Playwright). • Build deduplication & change-detection logic using hash comparison and timestamps. • Design and connect central database (PostgreSQL + SQLite) to store unique company records. • Integrate proxy rotation and throttling (BrightData/Luminati or similar). • Implement data normalization using ftfy, unidecode, python-phonenumbers, regex, and pandas. • Crawl Impressum pages to auto-fill missing fields (phone, fax, website). • Automate daily/weekly export to Excel / CSV using openpyxl. • Add basic monitoring dashboard (Streamlit) showing live progress, proxy health, and logs. • Deliver well-structured, documented, production-ready code. ⸻ Required Skills • Expert in Pytho...
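A minimal sketch of the hash-comparison change detection named in the bullets above, using SQLite as a stand-in for the central database; the table schema and field names are assumptions.

```python
# Sketch only: remember each record's content hash to skip unchanged rows.
import hashlib
import sqlite3
import time

db = sqlite3.connect("records.db")
db.execute("""CREATE TABLE IF NOT EXISTS seen
              (company_id TEXT PRIMARY KEY, content_hash TEXT, updated_at REAL)""")

def record_hash(record: dict) -> str:
    # Normalize field order so logically identical records hash identically.
    canonical = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def upsert_if_changed(company_id: str, record: dict) -> bool:
    """Return True if the record is new or changed, False if it's a duplicate."""
    h = record_hash(record)
    row = db.execute("SELECT content_hash FROM seen WHERE company_id=?",
                     (company_id,)).fetchone()
    if row and row[0] == h:
        return False  # unchanged, nothing to do
    db.execute("INSERT OR REPLACE INTO seen VALUES (?,?,?)",
               (company_id, h, time.time()))
    db.commit()
    return True
```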
...harvested content in a clean CSV or Excel file with clear column headings; if you prefer a database export, let me know and we can adjust. • Include the finished script or notebook so I can rerun the extraction later. Accuracy and formatting matter more to me than sheer speed, so please allow time for basic validation before handing over the files. If you normally work with Python (BeautifulSoup, Scrapy, Selenium) or similar tooling, that’s perfect, but I’m open to alternative stacks as long as the output meets the same standard. When you reply, briefly outline: 1. The scraping approach and libraries you’d use 2. Any anti-blocking measures you apply for public sites 3. A realistic timeframe to capture, clean, and hand back the data I’m ready ...
...data pulled from publicly available sites. The focus is simple yet crucial: for every company you find, capture the homepage URL and a working email address. (ask for details in the sheet) A completely ethical approach is non-negotiable—no gated content, no third-party lists, and no automated harvesting that violates site terms. I’m happy for you to use tools you’re comfortable with (Python, Scrapy, BeautifulSoup, Selenium, Google Apps Script, etc.) as long as you respect rate limits. Email addresses must appear in plain text within the sheet; please avoid hyperlinks or HTML encoding. Deliverables • A Google Sheet populated with data • A short note on your collection method (manual, scripted, hybrid) so I can replicate or update the data ...
...boats. To get there, I need an automated workflow relying exclusively on Google Maps (the chosen source) that can collect, deduplicate, and then clean the data before formatting it into a CSV directly importable into the Wix CMS. Coordinates must be provided in decimal degrees. Expected deliverables • A reusable script (Python + Selenium, Scrapy, or equivalent) that queries Google Maps, handles rate limiting, and documents each processing step. • The final CSV file containing, for each boating base, the following fields: Nom, Adresse complète, Ville, Département, Région, Latitude,...
Project Title Custom Lead Generation & Email Scraper Tool (Google, Yellow Pages, & B2B Directories) Project Description I am looking for an experienced developer to build a robust, high-perfo...Address (Must have a validation check to avoid "dead" emails) Phone Number (Optional but preferred) LinkedIn Profile URL (Optional but preferred) Export Functionality: Capability to export data into CSV or Excel format. Anti-Blocking Measures: Use of rotating proxies or delays to ensure the scraper isn't blocked by Google or directories. Technical Requirements: Preference for Python (Selenium, Scrapy, or BeautifulSoup) or a dedicated desktop application. User-friendly interface (even a simple CLI is fine, but a GUI is a plus). Fast processing speed with the abil...
...downward with every run; newest rows appear at the bottom. Each run should happen on a schedule I can adjust (cron, task scheduler, or similar). • Resilience: graceful handling of captchas or temporary blocks (rotating proxies or headless browsing accepted), clear logging of any skipped items, and an alert if a site layout changes. • Maintainable code: well-commented Python (BeautifulSoup, Scrapy, or Playwright are all fine) or an equivalent language you prefer, plus a short README explaining setup and how to add new sites later. Once delivered I will validate that: 1. Data from all ten sites lands in the workbook with a proper timestamp. 2. Prices from consecutive runs appear on separate rows, preserving history. 3. The script can be launched by a single com...
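A minimal sketch of the append-only workbook behaviour described above with openpyxl: each run appends timestamped rows at the bottom, preserving history across runs. The file name, column layout, and the stand-in scraped data are assumptions.

```python
# Sketch only: append timestamped price rows to an .xlsx, newest at the bottom.
import datetime
from openpyxl import Workbook, load_workbook

PATH = "prices.xlsx"  # placeholder

try:
    wb = load_workbook(PATH)
    ws = wb.active
except FileNotFoundError:
    wb = Workbook()
    ws = wb.active
    ws.append(["timestamp", "site", "product", "price"])  # assumed columns

scraped = [("example-site", "Widget", 9.99)]  # stand-in for real scrape output
now = datetime.datetime.now().isoformat(timespec="seconds")
for site, product, price in scraped:
    ws.append([now, site, product, price])
wb.save(PATH)
```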
...and scrape the relevant pages in real time or on a frequent schedule. • Apply NLP or other classification techniques to decide whether a posting is truly AI-related, then tag it by sub-domain (e.g. vision, NLP, MLOps, prompt-engineering). • Deliver concise, deduplicated listings to me through an in-app notification feed—no email or SMS required. For the deployment side I’m open to Python (Scrapy, BeautifulSoup, Selenium), Node, or any stack you are comfortable with so long as it is containerised and can run unattended on a small cloud instance. A lightweight web interface or Electron desktop app for the notification feed is ideal; you can suggest an alternative if it achieves the same user experience. Acceptance criteria 1. Agent successfully scrapes...
...phase—so the job is focused on clean data capture and a flawless import workflow. Descriptions must remain in plain text; no extra HTML markup. Images should arrive attached to the right variation, including separate gallery shots where available, and the colour options need to show as clickable swatches in WooCommerce, not just text labels. I’m comfortable if you use Python (BeautifulSoup, Scrapy) or another scraper, and either the WooCommerce REST API or a CSV/XML tool like WP All Import for the upload, as long as the end result feels native inside my store. Deliverables: • Complete product dataset (titles, plain-text descriptions, all images). • Variations set up so size and colour swatches behave exactly like on Furnx. • Products import...
...and email—nothing more. The final deliverable is a clean, well-structured Excel file ready for me to review. Speed is the priority here: please be able to start right away and turn the file around as fast as possible while still double-checking that every row is accurate and complete. If this timeline works for you and you have solid scraping experience with tools like Python, BeautifulSoup, or Scrapy, let’s move forward now. The budget is small since this is a simple task, so low-budget bidders get first priority. But start now. Simple task. Start your bid with "Urgent". Thanks.
I have two source spreadsheets that I need merged and enriched through automated scraping: • “File 1” – 170 k Spanish local businesses with emails • “File 2” – 65 k additional businesses with websites only Phase 1 – Email extraction Using a Python script and well-known libraries (requests, BeautifulSoup, Scrapy or similar), scan every site listed in File 2, capture all working email addresses you can locate, then append them to the corresponding rows so I can produce a unified “File 3”. Phase 2 – Offer harvesting Next, visit each live site in File 3. Where an offer, deal or promotion is publicly displayed, record the details in a fresh Excel sheet with these exact columns: Business ID | Business Name ...
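For Phase 1 above, a minimal sketch of plain-text email extraction from a homepage; the regex is a pragmatic approximation, not a full RFC 5322 validator, and a real run would also follow contact/about pages.

```python
# Sketch only: pull visible email addresses from one site's homepage.
import re
import requests

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_from_site(url: str) -> set:
    try:
        html = requests.get(url, timeout=15).text
    except requests.RequestException:
        return set()  # dead site: leave the row empty for manual review
    return set(EMAIL_RE.findall(html))

print(emails_from_site("https://example.com"))  # placeholder URL
```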
I have a public-facing website that I need scraped end-to-end. The site is open (no login), but the content is split across multiple pages, so your script will have to detect and follow pagination automatically. Here is exactly what I expect: • A clean, well-commented Python script (requests/BeautifulSoup, Scrapy, or Selenium—your choice) that visits every page, captures the required fields, and writes them to a neatly structured CSV. • The final CSV containing all rows pulled from the site. • A short README that tells me how to run the script and change the target URL or output path if needed. Code quality matters to me: no hard-coded absolute paths, clear variable names, and graceful error handling so the run doesn’t stop if a single page fa...
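A minimal sketch of the pagination handling requested above: follow the "next" link until it disappears, writing one CSV row per item. The CSS selectors are placeholders for the real site's markup.

```python
# Sketch only: follow pagination via the rel="next" link, one CSV row per item.
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = "https://example.com/listing"  # placeholder

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "detail_url"])
    url = START_URL
    while url:
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        for item in soup.select(".item"):            # placeholder selector
            link = item.select_one("a")
            if link:
                writer.writerow([link.get_text(strip=True),
                                 urljoin(url, link["href"])])
        nxt = soup.select_one("a[rel=next]")         # placeholder selector
        url = urljoin(url, nxt["href"]) if nxt else None
```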
...just needs to collect every page’s copy accurately and store each page URL, headline, sub-headline, paragraph body, and any inline text in separate columns. Please make the scraper resilient to common roadblocks such as pagination, lazy-loaded sections, and basic anti-bot measures, and keep the code modular so I can rerun it myself if the site layout changes slightly. Python with BeautifulSoup, Scrapy, or Playwright is fine as long as the final CSV is UTF-8 encoded and free of HTML tags. Quantities: - we expect somewhere between 10,000 and 70,000 records - we want to pay in milestones per 5,000 records - the first milestone covers the research work plus the first 5,000 records; the remaining amounts follow in later milestones (in case you get blocked or problems arise) Deliverables • Scrap...
Project Description: Find school districts and charter schools who use a specific vendo...Vendor Found"`. - If no website could be loaded, the script should log any failed connections or timeouts. Output Format (CSV) The final deliverable file should be structured with the same columns as the ones provided with the additional column to include your results. Skills Required - Expert proficiency in Python. - Deep experience with web scraping libraries (e.g., Requests, BeautifulSoup, Scrapy, and especially Selenium/Puppeteer for dynamic content). - Experience handling common web scraping challenges (redirects, user-agents, proxy usage (if necessary)). To bid, please confirm your familiarity with scraping dynamic content and provide a brief description of the scraping approach...
...comment text, number of comments, likes, reposts/shares, the post date and any other readily available metadata (author handle, follower count, post URL, media links, etc.). Accuracy is critical because the data will feed a trend-analysis dashboard later. Please build the workflow in a way that respects rate limits and login requirements: if you intend to use official APIs, private APIs, Selenium, Scrapy, Playwright, or headless browsers, spell that out so I know how sustainable the solution will be. The final hand-off should include: • A clean, well-commented reusable script (Python preferred) • A short README explaining environment setup, keyword input format and how to extend to new regions • The full export in CSV so I can validate before sign-off If an...
...reliable associated sources. Specific sources: Euromillones: (since Feb 13, 2004) La Primitiva: (since Oct 17, 1985 – modern version) El Gordo de la Primitiva: (since Oct 31, 1993) Updates run automatically at exactly 00:02 the day after each draw, using ethical scraping (BeautifulSoup/Scrapy) with proper user-agent headers to mimic human behavior. Store data in PostgreSQL (structured) or MongoDB (flexible), including all prize categories to enable ROI calculations and backtesting. 2.2. Number Prediction Generate predictions for Euromillones, La Primitiva and/or El Gordo simultaneously using explicit advanced AI models: Machine Learning ensembles (Random Forests) for frequency/statistical
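A minimal sketch of the PostgreSQL storage step above with psycopg2; the table layout is an assumption, and the 00:02 schedule would come from cron (e.g. a crontab entry of 2 0 * * *). The ON CONFLICT clause keeps the nightly job idempotent if a draw was already stored.

```python
# Sketch only: idempotent insert of one draw into an assumed table layout.
import psycopg2

conn = psycopg2.connect("dbname=lottery user=loader")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""CREATE TABLE IF NOT EXISTS draws
                   (game TEXT, draw_date DATE, numbers INTEGER[],
                    PRIMARY KEY (game, draw_date))""")
    cur.execute("""INSERT INTO draws (game, draw_date, numbers)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (game, draw_date) DO NOTHING""",
                ("Euromillones", "2024-02-13", [1, 5, 17, 32, 44]))  # sample row
conn.close()
```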
...scraped, the information should be organised into a clean CSV file—one row per page—with columns for page URL, full body text, image file names, and link destinations. Please download the images themselves as well and bundle them in a separate folder (a simple ZIP is fine); the CSV should reference the exact filenames so everything lines up. I’m happy for you to use Python with BeautifulSoup, Scrapy, Selenium or whichever stack you prefer, as long as the final output meets these acceptance criteria: • Complete CSV containing text, image names, and link URLs for each page • All images successfully downloaded and accessible via the filenames listed in the CSV • No duplicates or missing pages from the target site * Images need to be sort...
I have a data-analysis pipeline that relies on a ...after award). • Payload: high-resolution image files plus a CSV/JSON map linking each file to product ID, title, price, and category text that you extract during the same run. • Scale: thousands of products per crawl; a resumable approach is essential so partial failures don’t force a full restart. • Frequency: I’ll trigger the crawl weekly, so reusable code is a must. I’m happy with Python—Scrapy, Selenium, Playwright, or a headless solution of your choice—as long as it respects the site’s anti-bot measures and keeps requests polite. Please include a brief outline of how you’ll handle pagination, lazy-loaded images, and rate limiting. Let me know your proposed stac...
...Excel workbook. Please crawl the entire site, not just a few sections, and return each number alongside the key profile details that make the data usable at a glance—name, profile URL, and any other easily captured identifiers shown next to the number. A clean .xlsx with one row per profile, no duplicates, and clearly labelled columns is the only deliverable I’m expecting. If you prefer Python, Scrapy, Selenium, Beautiful Soup or a comparable stack, go ahead; I’m interested in results, not the specific toolset, as long as the script can be rerun later should the site content change. Before delivery, double-check that: • every row contains a valid phone number and URL • no pages on the site were skipped • the sheet opens flawlessly in the late...
...issue and validate JWT tokens for every request beyond the public health-check route. Token refresh, revocation, and a simple role model (“user” vs. “admin”) should be built in from the start. Flight data extraction I do not have official Iberia developer access, so we will need to pull the data ourselves. I’m open to whichever tooling you are most comfortable with — BeautifulSoup, Selenium, Scrapy, or a hybrid approach — as long as the final solution is headless, resilient to minor layout changes, and respectful of Iberia’s rate limits. Only flights that are bookable with Avios need to be captured; no hotel or car-rental data is required. Deliverables • Clean, modular Python code (FastAPI or Flask preferred, but I’...
I need a senior-level specialist to harvest product data from several e-commerce sites and deliver it in a single, well-structured CSV file. The task demands production-ready techniques—think Scrapy spiders hardened with rotating proxies, Selenium or Playwright for dynamic content, and solid anti-bot countermeasures. The information I’m after is very specific: product names, prices, pictures, and SKU. Nothing less, nothing more. Your solution must run reliably at scale, cope with frequent layout changes, and leave no trace that could trigger blocks. Python is the preferred stack, but if you have a proven alternative that meets the same bar, I’m open to hearing it. To be considered, include in your proposal: • At least one example of a comparable e-commerce...
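A minimal sketch of the rotating-proxy hardening named above, expressed as a Scrapy downloader middleware; the proxy list is a placeholder, and the class would be enabled via DOWNLOADER_MIDDLEWARES in settings.py.

```python
# Sketch only: assign a random proxy to every outgoing Scrapy request.
import random

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",  # placeholders
    "http://user:pass@proxy2.example.com:8000",
]

class RotatingProxyMiddleware:
    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)
        return None  # let Scrapy continue normal download processing
```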
I’m expanding our Florida outreach list and need a reliable web-scraped data set of school, college, and university administrators who oversee Nursing or other Healthc...address • State (always Florida) Format & delivery – Send the file in Excel (.xlsx). – First progress drop: within 5 days so I can spot-check. – Final, fully cleaned file: no later than 10 calendar days from project start. Quality matters because this list feeds straight into our marketing campaigns. I’ll spot-verify a sample for accuracy. Feel free to leverage Python, BeautifulSoup, Scrapy, or similar tooling—whatever lets you move quickly while respecting each site’s robots.txt. Let me know if anything needs clarifying before you begin, otherwise I’...
...need a seasoned Python developer to build a robust scraper that collects the required data and writes it straight to JSON—no additional cleaning or processing necessary. Once we begin I’ll provide the target URL(s) and any access details; for now, assume a standard public site with pagination and occasional anti-bot checks. Core expectations • Written in Python 3 using requests/BeautifulSoup or Scrapy; resort to Selenium only if there’s no lighter workaround. • Handles pagination, retries, and polite delays gracefully so the run can complete unattended. • Config file or clear constants for headers, cookies, and start URLs, letting me tweak targets without editing core logic. • Produces a single JSON file (or one file per page if that...
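A minimal sketch of the polite, unattended behaviour requested above: retries with back-off on transient errors, a fixed delay between pages, and all tunables grouped as constants. The URL pattern and the JSON layout are placeholders.

```python
# Sketch only: paginated fetch with retry/back-off and polite delays.
import json
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

START_URL = "https://example.com/api/items?page={page}"  # placeholder
HEADERS = {"User-Agent": "Mozilla/5.0 (data-collection script)"}
DELAY_SECONDS = 2
MAX_PAGES = 50

session = requests.Session()
retry = Retry(total=3, backoff_factor=1,
              status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))

rows = []
for page in range(1, MAX_PAGES + 1):
    resp = session.get(START_URL.format(page=page), headers=HEADERS, timeout=30)
    if resp.status_code == 404:
        break  # ran past the last page
    resp.raise_for_status()
    rows.extend(resp.json())  # assumes the endpoint returns a JSON list
    time.sleep(DELAY_SECONDS)  # polite delay between requests

with open("output.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=2)
```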
...build a reliable, well-structured lead list and I already know exactly what it should contain. The task is to extract contact information—email addresses, phone numbers and full mailing addresses—from three sources: company and organisation websites, their public social-media profiles, and well-known online directories. I expect the data to be gathered with a solid scraping workflow (Python, Scrapy, BeautifulSoup, Selenium or an equivalent stack is fine) and then verified so that bounced emails and dead numbers are kept to an absolute minimum. Deliverables • One CSV or Excel file with separate columns for name, company, job title, email, phone, street address, city, state, ZIP/postcode, country, source URL and date collected. • No duplicates; every...