
Closed
Published
Pay on delivery
We need a Python scraper that collects European household conversations from multiple public sources and stores them as rows in a Supabase PostgreSQL database. Sources include Reddit (via the PRAW library, 30+ subreddits across 10 countries), parenting and family forums via a Playwright headless browser (Mumsnet in the UK, [login to view URL] in Germany, aufeminin in France, ForoCoches in Spain, GravidanzaOnline in Italy, and others), YouTube and TikTok comment sections from family and household videos, news article comment sections, and official government websites. The collected data is text-based: forum posts and replies, social media comments, video transcripts, and household advice articles, in 7 European languages: German, English, French, Italian, Dutch, Spanish, and Polish. Each document is stored with its country, language, source name, and original URL. A full technical specification is provided.
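The row shape implied by the brief can be sketched as a small dataclass; the field names below are assumptions for illustration, not the client's actual schema, and the example URL is a placeholder:

```python
from dataclasses import dataclass, asdict

@dataclass
class Document:
    """One scraped conversation item, matching the fields in the brief."""
    text: str      # forum post, reply, comment, or transcript
    country: str   # e.g. "DE" (ISO 3166-1 alpha-2)
    language: str  # e.g. "de" (ISO 639-1)
    source: str    # source name, e.g. "reddit" or "mumsnet"
    url: str       # original URL of the item

def to_row(doc: Document) -> dict:
    # supabase-py inserts plain dicts, so a row is just the dataclass as a dict
    return asdict(doc)

row = to_row(Document(
    text="Wie organisiert ihr euren Haushalt?",
    country="DE", language="de", source="reddit",
    url="https://example.org/placeholder",
))
```

Keeping one flat record type per scraped item makes every source connector converge on the same insert path, whatever the upstream format.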
Project ID: 40345404
166 proposals
Remote project
Active 4 days ago
166 freelancers are bidding an average of $469 USD for this job

⭐⭐⭐⭐⭐ At CnELIndia, we view this project as an incredible opportunity to apply our expertise in Python, data scraping, and PostgreSQL. Having worked with numerous technologies including PHP, WordPress, and specific data mining tools while collecting vast amounts of information from the web, we are no strangers to the challenges of gathering data from a range of sources. Our capabilities extend beyond basic web scraping: we understand that your needs go beyond acquiring the data to properly storing and organizing it. With regard to linguistics, one of our major advantages is being a multilingual team, which means we fully comprehend the intricacies of handling text-based data in multiple European languages. We are accustomed to leveraging appropriate tools and techniques, such as natural language processing libraries, to ensure accurate extraction and categorization of content. Our past projects demonstrate not only our proficiency in meeting technical specifications but also our dedication to maintaining stringent timelines without compromising on quality. Let's collaborate on this exciting venture and build a comprehensive database that can truly revolutionize the understanding of European household dynamics.
$500 USD in 7 days
9.0

⭐⭐⭐⭐⭐ Create a Python Scraper for European Household Conversations ❇️ Hi My Friend, I hope you're doing well. I've reviewed your project requirements and see you're looking for a Python scraper to gather household conversations. You don't need to look any further; Zohaib is here to help you! My team has completed 50+ similar projects for data scraping. I plan to collect data from various sources like Reddit, forums, and social media, ensuring it is stored efficiently in a Supabase PostgreSQL database. ➡️ Why Me? I can easily handle your scraping project as I have 5 years of experience in Python and web scraping. My expertise includes data extraction, database management, and multilingual support. Additionally, I have a strong grip on libraries like PRAW and Playwright, ensuring a thorough approach to your project. ➡️ Let's have a quick chat to discuss your project in detail and let me show you samples of my previous work. I'm looking forward to discussing this with you! ➡️ Skills & Experience: ✅ Python Programming ✅ Web Scraping ✅ Data Storage ✅ Database Management ✅ API Integration ✅ PRAW Library ✅ Playwright ✅ Multilingual Support ✅ Data Cleaning ✅ Error Handling ✅ Data Analysis ✅ Technical Documentation Waiting for your response! Best Regards, Zohaib
$350 USD in 2 days
8.1

Hello there, I am experienced in web scraping and in building scripts or Windows desktop applications with Python. I am also experienced in large-scale data scraping from a given website, bypassing IP blocks, captchas, and anti-bot or Cloudflare protection. Please message me to discuss this project in detail. Best Regards, Enamul
$250 USD in 3 days
8.2

I've built multi-source scrapers using PRAW for Reddit and Playwright for forum automation. Your project aligns perfectly with my expertise: I'll design a modular Python architecture with rate-limiting, retry logic, and proper error handling for each source type. The Supabase schema will include indexed fields for country/language/source with efficient querying. I'll implement concurrent scraping with Redis for queue management and deliver clean, documented code with a deployment guide. Happy to discuss the technical spec further!
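Retry logic with exponential backoff, as this bid mentions, typically looks something like the following sketch (the function and parameter names are illustrative, not from the spec):

```python
import random
import time

def with_retries(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter), then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # jitter avoids many workers retrying in lockstep
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

In use, each per-source fetch would be wrapped, e.g. `with_retries(lambda: fetch_page(url))`, where `fetch_page` is whatever that connector's request function happens to be.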
$500 USD in 7 days
8.1

Hello, I understand you need a Python scraper to gather household conversation data from various European sources such as Reddit, parenting forums, social media comments, news articles, and official websites. The scraper will collect multilingual text data in seven languages and store it in a Supabase PostgreSQL database, with Redis for deduplication. I will focus on implementing your detailed specification as written, covering 30+ subreddits, multiple forums via the Playwright browser, and comments from platforms like YouTube, TikTok, and Instagram, without making design choices on my end. My approach is to build reliable Python scripts using PRAW for Reddit, Playwright for web forums, and appropriate APIs or scraping methods for social media and news comments. I'll ensure all data fields (country, language, source, and URL) are captured correctly and integrate Redis for efficient duplicate filtering before storing the final dataset in Supabase. I am ready to start immediately and deliver a stable, maintainable scraper according to your full specs. Are there any rate limits or access restrictions I should be aware of for the various sources, especially social media APIs or forums requiring login credentials? Do you have specific guidelines on handling language detection and text normalization across the seven languages? Would you prefer incremental scraping with timestamp checks or full data refreshes each run? Is there a preferred format or schema for storing the data?
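The incremental-versus-full question raised above can be sketched as a high-water-mark filter: remember the newest timestamp seen and only process items after it. Field and function names below are illustrative, not from the spec:

```python
def newer_than(items, last_run_ts):
    """Keep only items created after the previous run's high-water mark."""
    return [it for it in items if it["created_utc"] > last_run_ts]

def run_incremental(items, state):
    """One incremental pass: filter fresh items, then advance the watermark."""
    fresh = newer_than(items, state.get("last_run_ts", 0))
    if fresh:
        state["last_run_ts"] = max(it["created_utc"] for it in fresh)
    return fresh, state
```

Persisting `state` (in Supabase or Redis) between runs is what distinguishes a scheduled incremental crawl from a full refresh.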
$750 USD in 28 days
7.4

Hi, we can build a robust Python-based scraping pipeline to collect and structure multilingual household conversation data from your listed sources into Supabase PostgreSQL.
Proposed Approach: We'll use PRAW (Reddit), Playwright (forums/news sites), and APIs where available (YouTube/TikTok), with fallback scraping where compliant. The pipeline will normalize text, detect language, tag country/source, and store clean records (text, metadata, URL) in Supabase.
Key Features:
• Multi-source scraping (Reddit, forums, social, news, gov)
• 7-language handling with auto-detection/validation
• Deduplication, rate-limiting, and error handling
• Scalable, modular architecture (cron/queue-based)
• Clean DB schema with structured metadata
Timeline: 3–4 weeks
Budget: $1,200 – $2,000
We'll deliver production-ready code, deployment scripts, and documentation. Ready to review your full spec and start immediately. Thanks.
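The language auto-detection mentioned in this bid would normally be delegated to a library such as langdetect or fastText; purely as an illustration of the idea, a toy stopword-vote heuristic over the seven target languages could look like this (the stopword sets are my own, not part of any spec):

```python
# Tiny stopword-vote heuristic; a real pipeline would use a proper detector.
STOPWORDS = {
    "de": {"und", "nicht", "das", "ich", "die"},
    "en": {"and", "the", "not", "you", "this"},
    "fr": {"et", "les", "pas", "je", "une"},
    "it": {"che", "non", "per", "una", "sono"},
    "nl": {"het", "een", "niet", "van", "ik"},
    "es": {"que", "los", "una", "para", "pero"},
    "pl": {"nie", "się", "jest", "ale", "tak"},
}

def guess_language(text: str) -> str:
    """Return the language whose stopwords overlap the text most, or 'unknown'."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

The validation step a real pipeline needs is the `"unknown"` path: documents that no detector can classify should be flagged rather than silently tagged.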
$1,200 USD in 21 days
7.5

Hi, I can do this task perfectly. I have done many Python-based scraping projects, so I assure you I can do this job within the required time and at a reasonable budget. Message me here. I am looking forward to an early and positive response. Regards, Shalu
$260 USD in 7 days
7.0

Hello, I can build a Python scraper tailored to collect household conversations from Reddit, parenting forums, social media comments, and official sites, ensuring the multilingual European data is stored neatly in your Supabase PostgreSQL setup. Using PRAW for Reddit and Playwright for dynamic forum scraping, I will handle 7 languages and multiple countries with clear tagging of each record by source, language, and URL. This solution will efficiently capture diverse text data like posts, replies, and transcripts across your specified platforms. Thanks, Teo
$500 USD in 5 days
6.5

I've done something very similar recently, building multilingual scrapers with PRAW, Playwright, and Supabase pipelines across forums and social sources. Do you need strict deduplication across sources and languages, and how should language detection be handled? What is your expected crawl frequency and volume per source? I suggest a queue-based crawler with per-source rate limiting, which avoids bans and keeps ingestion stable. I also suggest normalizing text with language detection and hashing, which improves deduplication and downstream analysis. I will first set up source connectors for Reddit, forums, and social platforms with Playwright. Then I will build parsing, metadata tagging, and Supabase ingestion. Finally, I will add scheduling and logging, and deliver a clean, scalable scraper with configs. Best, Dev S.
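The normalize-then-hash deduplication this bid suggests can be sketched roughly as follows; the Redis SET is stood in for by an in-memory set, and the names are illustrative:

```python
import hashlib
import unicodedata

def content_key(text: str, language: str) -> str:
    """Normalize then hash, so trivially different copies collapse to one key."""
    norm = unicodedata.normalize("NFKC", text).casefold()
    norm = " ".join(norm.split())  # collapse runs of whitespace
    return hashlib.sha256(f"{language}:{norm}".encode()).hexdigest()

class Deduper:
    """In-memory stand-in for a Redis SET of seen content hashes."""
    def __init__(self):
        self.seen = set()

    def is_new(self, text: str, language: str) -> bool:
        key = content_key(text, language)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Keying on language plus normalized text (rather than raw text) means the same sentence scraped from two forums dedupes, while a translation of it into another language does not.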
$1,000 USD in 15 days
6.4

With more than a decade of experience in web development, I have honed my skills to meet the specific needs of your project. As a full-stack developer fluent in Python and PHP, I've created various web applications, including robust web scrapers similar to the one you require. With great expertise in the Playwright and PRAW libraries, I am confident I can efficiently fetch data from various platforms including Reddit, parenting forums, and YouTube/TikTok/Instagram comment sections, all while maintaining context and ensuring data integrity. My knowledge of multiple European languages (German, English, French, Italian, Dutch, Spanish, Polish) will come in especially handy, as it will enable me not only to collect conversations but also to accurately document them with the relevant details: country, language, source name, and original URL, as per your requirement. With my skills in text data classification and processing, which are especially relevant to collecting household advice articles amenable to sorting and querying later on, I am confident I can exceed your expectations. Above all else, my clients' satisfaction has always been my priority. Choose me for a professional and efficient service that continues beyond the completion date.
$250 USD in 3 days
5.7

I can build a Python scraper that pulls text data from Reddit using PRAW, parenting forums with Playwright, plus YouTube, TikTok, and news comments, exactly as you need. I’ve done similar multi-source scrapes before, combining forum content and social media data in several languages, storing everything cleanly in a PostgreSQL database. A quick check: do you want real-time incremental scraping or scheduled batch runs? For YouTube/TikTok comments, do you prefer using official APIs or scraping page loads with Playwright? Handling 7 languages means careful encoding and normalization—would you like help setting up basic text cleaning pipelines too? Based on your spec, I’ll organize posts with metadata (country, language, source, URL) directly into Supabase PostgreSQL for easy queries. I can start by setting up Reddit and a couple of forums to confirm structure, then scale to other sources. Ready to start building this multi-source scraper and get your data flowing into Supabase quickly.
$500 USD in 7 days
5.9

Your scraper will hit rate limits on Reddit after 60 requests per minute without OAuth rotation, and Playwright sessions will get blocked by Cloudflare on Mumsnet and ForoCoches if you're not rotating residential proxies. This will cause data gaps that make your dataset unreliable for analysis. Before architecting the solution, I need clarity on two things: what's your target volume per day (10K posts or 100K)? And do you already have proxy infrastructure, or should I build retry logic with exponential backoff to handle bot detection gracefully?
Here's the architectural approach:
- PRAW + OAuth: Implement credential rotation across 5+ Reddit apps to stay under rate limits while collecting from 30+ subreddits without triggering API bans.
- Playwright + stealth: Use playwright-stealth with randomized user agents and headless=False mode to bypass Cloudflare on European forums, then extract structured data via CSS selectors.
- PostgreSQL + JSONB: Store multilingual text in JSONB columns with GIN indexes on country/language fields so you can query 100K+ rows in under 200ms.
- Redis queue: Build a task queue that distributes scraping jobs across workers and implements deduplication to prevent storing the same post twice when forums paginate inconsistently.
- Error handling: Log failed requests with source URLs to a separate table so you can replay them without losing data when sites go offline or change their HTML structure.
I've built 4 large-scale scrapers that collected 2M+ documents from multilingual sources without getting IP-banned. Two questions before we start: do you need real-time ingestion, or can this run as a daily batch job? And what's your budget for proxy services if residential IPs are required? Let's schedule a 15-minute call to walk through edge cases like GDPR compliance for public forum data and handling paywalled content on news sites.
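The credential-rotation idea in this bid reduces, at its core, to round-robining over several app credential sets; a minimal sketch, where the IDs are placeholders and in real use each dict would be handed to `praw.Reddit(...)`:

```python
from itertools import cycle

class CredentialPool:
    """Round-robin over several Reddit app credentials to spread request load."""
    def __init__(self, credentials):
        self._cycle = cycle(credentials)  # endless iterator over the list

    def next(self):
        return next(self._cycle)

pool = CredentialPool([
    {"client_id": "app1", "client_secret": "s1"},
    {"client_id": "app2", "client_secret": "s2"},
])
```

Each worker (or each batch of subreddit fetches) takes the next credential set, so no single app identity absorbs the full request rate.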
$450 USD in 10 days
6.2

Hello there, ✸✸✸Python Expert is Here✸✸✸ I've checked your project, “Build a Python data scraper collecting European household conversations”, and read the description carefully. As a professional Python developer, I'm confident I can create a Python script that collects European household conversations from multiple public sources and stores them as rows in a Supabase PostgreSQL database, as you require. I've completed many Python projects using ✔Django, ✔Pandas, ✔Flask, ✔FastAPI, ✔Jupyter Notebook, ✔Automation, ✔Selenium, and other libraries on various platforms. Here are some of my recently completed Python projects: ✔️ https://www.freelancer.com/projects/api-developmet/Python-IBKR-Trading-Template/details ✔️ https://www.freelancer.com/projects/python/Python-Programmer-for-Mathematical/details ✔️ https://www.freelancer.com/projects/python/Looking-for-Python-expert-code/details ✔️ https://www.freelancer.com/projects/python/Python-Backgammon-Game-Debugging-37926848/details You can also visit my profile and check the reviews of all my previous Python projects to get an idea of my knowledge and skills. I'm ready to be hired or awarded, as I can start this task right now. I'm waiting for your response in the chat box. Best Regards! Eng. Bablu Mondol
$300 USD in 2 days
5.9

Hi, I understand you need a Python-based scraper that aggregates multilingual household conversations from Reddit, forums, social platforms, and public sites, then stores structured data in Supabase PostgreSQL with proper metadata (country, language, source, URL). I've built similar pipelines using PRAW, Playwright, and API integrations, handling multi-language text, rate limits, deduplication, and clean schema design for scalable storage. I can deliver a robust, compliant scraping system with reliable ingestion, structured storage, and clear documentation aligned with your technical spec. Looking forward to your positive response in the chat box. Best Regards, Arbaz N
$400 USD in 7 days
6.4

Hello, I noticed your specification already defines the scraping scope across Reddit, Playwright-driven forums, and social platforms, which tells me you're looking for clean execution rather than design debate. I've delivered similar multilingual crawlers before, including a Supabase-backed collection pipeline for EU public-sector datasets and a Redis-layered deduplication system that reduced duplication by 94%. The real complexity here lies in normalizing heterogeneous sources, especially Playwright-rendered forums and mixed comment structures, into a consistent schema without overloading the target platforms or triggering anti-bot heuristics. Getting this right requires careful throttling, robust retry logic, and language-aware extraction. I'll implement modular scrapers per source, add Redis hashing for content dedupe, and push structured documents into Supabase with country/language/source tagging. I'll also create a shared utilities layer for parsing, retries, and Playwright session management. Before starting, I need clarification on authentication requirements for YouTube, TikTok, and any government portals, plus your preferred scheduling method for recurring runs. I can begin immediately and deliver in phases with clean, documented modules. Thanks, John Allen.
$500 USD in 7 days
5.5

As an experienced Full Stack Developer for the past 6 years, I am no stranger to intricate projects involving data mining, web scraping, and working with PostgreSQL in Python. My background affords me the unique ability to navigate between various programming languages efficiently – a skill that is critical for this project's multilingualism requirements. What sets me apart from others in my field is my commitment to providing quality solutions within a reasonable budget, without compromising on the deliverables or timeliness. Your project's aim to obtain text-based data across multiple European platforms aligns well with my extensive experience in data extraction from diverse sources including forums, social media channels, and news articles. MySQL and SQL databases are my speciality, which complements the Supabase PostgreSQL database you utilize. Moreover, being well-versed in multiple Core European languages (German, English, French, Italian, Dutch, Spanish & Polish) equips me to handle large-scale linguistic scraping tasks efficiently. If you choose me for your project I will guarantee you precise results and a comprehensive dataset stored with all the key identifiers needed for further analysis; country, language, source name and url. Let us turn your vision into a reality together. Feel free to reach out so we can discuss your project details further.
$251 USD in 4 days
5.8

⭐Hi, I’m ready to assist you right away!⭐ I believe I’d be a great fit for your project since I have extensive experience in data scraping and web scraping using Python. With a keen eye on timelines and budget, I can ensure efficient and accurate data collection from diverse sources like Reddit, parenting forums, and social media platforms.
$555 USD in 4 days
5.5

Hi, I can develop a high-performance Python-based scraping pipeline to collect and structure multilingual household conversation data across all specified European sources. I’ll use PRAW for Reddit, Playwright for dynamic forums (handling anti-bot protections), and custom scrapers/APIs for YouTube, TikTok, news comments, and government sites, ensuring reliable extraction of text, metadata, and source URLs. The system will normalize and tag each record (country, language, source) and efficiently store it in Supabase PostgreSQL with a clean, scalable schema. I’ll also implement proxy rotation, retry logic, and rate-limit handling to maintain stability at scale. The architecture will be modular, well-documented, and production-ready, with optional scheduling and automation support. I follow a highly professional approach focused on accuracy, performance, and long-term maintainability. Could you share the expected data volume and preferred scraping frequency?
$350 USD in 2 days
5.3

Hi,
As per my understanding: you need a scalable Python scraping pipeline to collect multilingual (7 EU languages) household conversations from multiple public sources (Reddit, forums, YouTube, TikTok, news, govt sites) and store structured data (text + metadata) in a Supabase PostgreSQL database.
Implementation approach: I will build a modular scraper using Python with PRAW for Reddit, Playwright for dynamic forums, and APIs where available (YouTube). For restricted platforms, I'll implement compliant scraping strategies. The pipeline will normalize data (text, language, country, source, URL) and push it into Supabase via secure connectors. I'll include scheduling (cron/Airflow), retry logic, rate-limit handling, and logging. Language tagging and basic cleaning will be applied. The system will be scalable, maintainable, and aligned with platform policies.
A few quick questions:
1. Do you have API keys for Reddit/YouTube already?
2. Frequency of scraping (real-time, daily, batch)?
3. Expected data volume per day?
4. Any deduplication or NLP processing needed?
5. Hosting preference (local, cloud, Supabase edge)?
$250 USD in 7 days
5.6

Hello I can build a robust Python-based scraping pipeline using PRAW, Playwright, and relevant APIs to collect multilingual household conversations across the specified European sources, normalize and structure the data, and store it cleanly in a Supabase PostgreSQL database with metadata like country, language, source, and URL; I’ll ensure scalable architecture, proper handling of dynamic content, and compliance with platform policies. Regards Muhammad
$300 USD in 1 day
5.4

Pontresina, Switzerland
Payment method verified
Member since Oct 8, 2023