
Closed
Posted
Paid on delivery
I need a clean, research-ready dataset that tracks daily air-quality conditions for India, Indonesia and the Philippines from 1 Jan 2019 through 31 Dec 2025. The focus is PM2.5-dominant AQI, but I also want the underlying concentrations of PM2.5, PM10, NO₂ and SO₂ together with basic meteorological variables (temperature, humidity, wind speed and rainfall).

Only figures released by the relevant national or provincial authorities will be acceptable; please work directly from government portals and monitoring stations rather than third-party aggregators. Where multiple stations report for the same city, keep station-level detail and include latitude-longitude so I can later map or group as needed.

Deliverables
• One well-structured CSV per country (or a single consolidated CSV with a country field) containing, at minimum: date, station ID/name, latitude, longitude, PM2.5, PM10, NO₂, SO₂, temperature, humidity, wind speed, rainfall and computed PM2.5-based AQI.
• A short README that lists every source URL used, the retrieval method, units, any conversion factors applied and how missing values are flagged.
• (Optional but appreciated) the Python/R scripts you use for scraping and cleaning so I can reproduce or extend the workflow.

Acceptance criteria
1. Daily continuity: ≥90 % of days present for each variable per station over the full time span.
2. Values expressed in consistent units (µg/m³ for pollutants, °C, %, m/s, mm).
3. No duplicate rows; date and station must form a unique key.
4. Source references verifiable and traceable back to the original government site.

If anything about the timeline, variables or format is unclear, let me know before you start scraping so we can lock down the spec.
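Acceptance criteria 1 and 3 lend themselves to an automated check. The following is a minimal sketch, not part of the brief: it assumes each record has already been parsed into a dict with a `date` (`datetime.date`), a `station_id`, and one key per variable, with `None` marking a missing value. The function name `check_acceptance` and the column names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical column names; adjust to the delivered CSV header.
VARIABLES = ["pm25", "pm10", "no2", "so2",
             "temperature", "humidity", "wind_speed", "rainfall"]


def check_acceptance(rows, start, end, variables=VARIABLES, min_coverage=0.90):
    """Check key uniqueness and daily continuity.

    rows: list of dicts with 'date', 'station_id' and one key per variable.
    Returns (duplicates, failures):
      duplicates - sorted list of (station_id, date) keys that occur more than once
      failures   - {(station_id, variable): coverage} for coverage < min_coverage
    """
    total_days = (end - start).days + 1

    # Criterion 3: (station, date) must be a unique key.
    key_counts = Counter((r["station_id"], r["date"]) for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)

    # Criterion 1: count distinct days with a non-missing value.
    days_present = defaultdict(set)
    for r in rows:
        if not (start <= r["date"] <= end):
            continue
        for v in variables:
            if r.get(v) is not None:
                days_present[(r["station_id"], v)].add(r["date"])

    failures = {}
    for station in sorted({r["station_id"] for r in rows}):
        for v in variables:
            coverage = len(days_present[(station, v)]) / total_days
            if coverage < min_coverage:
                failures[(station, v)] = coverage
    return duplicates, failures
```

For the full span in the brief, `start` would be `date(2019, 1, 1)` and `end` would be `date(2025, 12, 31)`. Counting distinct days rather than rows keeps duplicate records from inflating the coverage figure.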
Project ID: 40424855
9 proposals
Remote project
Active 9 days ago
9 freelancers are bidding on average ₹8,189 INR for this job

Hi,

A clean, research-ready daily AQI dataset covering India and Indonesia means reconciling at least three fragmented source ecosystems (CPCB for India, BMKG and regional APIs for Indonesia) where station coverage, pollutant reporting (PM2.5 vs PM10 vs AQI composite), and update cadence vary significantly by region.

My approach: pull from each country's official monitoring APIs plus OpenAQ as a cross-validation layer, normalize all readings to a unified schema (date, station ID, lat/lon, pollutant type, raw value, AQI index, data source), and flag gaps or outlier readings rather than silently interpolating them. I'd store the pipeline in Python using `requests` + `pandas`, with a SQLite or Parquet output depending on your downstream tooling; Parquet is the better default for research workflows given columnar query performance.

One clarifying question before I start: is this a one-time historical pull (e.g., last 2–5 years) or are you also expecting a recurring daily update pipeline? That distinction shapes the delivery significantly. Either way, within 24 hours I can share a sample schema and a data coverage map showing which stations and date ranges are reliably available for both countries.

Best regards,
Val
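The idea of flagging gaps and outliers instead of interpolating them could look roughly like the sketch below. It is an illustration only, not the bidder's code: the function name `flag_gaps`, the flag labels and the default `valid_range` are assumptions, and the plausible range would need to be set per variable.

```python
from datetime import date, timedelta


def flag_gaps(observed, start, end, valid_range=(0.0, 1000.0)):
    """Expand one station/variable series to a full daily calendar.

    observed: dict mapping datetime.date -> value (absent or None = no reading).
    Returns a list of (date, value, flag) with exactly one entry per day.
    Values are never filled in or altered; only the flag changes.
    """
    low, high = valid_range  # assumed plausibility bounds, not an official limit
    out = []
    day = start
    while day <= end:
        value = observed.get(day)
        if value is None:
            flag = "missing"
        elif not (low <= value <= high):
            flag = "out_of_range"
        else:
            flag = "ok"
        out.append((day, value, flag))
        day += timedelta(days=1)
    return out
```

Keeping the raw value next to its flag leaves any later decision about exclusion or imputation to the researcher.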
₹1,500 INR in 7 days
1.8

Hello,

Yes, I can definitely help with this project. I have experience with Python and can work with data scraping, cleaning, preprocessing, and CSV generation workflows. I understand that you need a research-ready air-quality dataset for India, Indonesia, and the Philippines from 2019–2025 using official government monitoring sources only, including station-level pollutant and meteorological variables with computed PM2.5-based AQI values.

I can assist with:
• Collecting data directly from official government portals and monitoring stations
• Building Python scripts for scraping and cleaning the data
• Standardizing all variables into consistent units
• Maintaining station-level details with latitude and longitude
• Removing duplicates and validating daily continuity
• Generating structured CSV datasets and a clear README with source references and methodology

I also understand the importance of reproducibility, traceable source documentation, and maintaining clean research-quality datasets. Before starting, I would confirm the preferred AQI calculation standard and finalize the exact source scope to ensure the output fully matches your research requirements.
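Standardizing units might involve conversions such as the ones sketched here. Whether any source actually reports gases in ppb or wind in km/h is an assumption for illustration; the helper names are hypothetical. The gas conversion uses the ideal-gas molar volume of 24.45 L/mol at 25 °C and 1 atm, and agencies that reference a different temperature would need a different value, which is why it is a parameter.

```python
# Molar volume of an ideal gas at 25 °C and 1 atm, in litres per mole.
# Assumption: the reporting agency uses this reference condition.
MOLAR_VOLUME_L = 24.45

# Molar masses in g/mol.
MOLAR_MASS = {"no2": 46.0055, "so2": 64.066}


def ppb_to_ugm3(ppb, gas, molar_volume=MOLAR_VOLUME_L):
    """Convert a gas mixing ratio in ppb to a mass concentration in µg/m³."""
    return ppb * MOLAR_MASS[gas] / molar_volume


def kmh_to_ms(kmh):
    """Convert wind speed from km/h to m/s."""
    return kmh / 3.6
```

Any factor applied this way belongs in the README, since the brief asks for conversion factors to be listed.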
₹6,000 INR in 2 days
0.9

Hi, I can build this as a reproducible Python data pipeline rather than a one-off spreadsheet. I would pull daily AQI / PM2.5 / PM10 records from documented public sources, normalize India/Indonesia/Philippines into one schema, flag gaps/outliers, and deliver both the clean Excel/CSV dataset and the scripts used to rebuild it. I will include source URLs, date coverage notes, and a short QA summary so the dataset is research-ready and repeatable.
₹22,000 INR in 5 days
0.0

Hello,

I can help build a clean, research-ready air quality dataset for India, Indonesia, and the Philippines using official government monitoring sources only. I have experience with large-scale data collection, Python-based scraping/automation, data cleaning, validation, and CSV dataset preparation for research and analytics projects.

The deliverables will include:
• Structured CSV dataset with station-level daily records
• PM2.5-based AQI computation with documented methodology
• Pollutant + meteorological variables in standardized units
• Latitude/longitude and station metadata
• Duplicate-free validated data with missing-value handling
• README containing all official source URLs, retrieval methods, units, and conversions
• Optional reproducible Python scripts for scraping and preprocessing

I will ensure:
• Traceable government-source references
• Consistent formatting across countries
• Proper station/date uniqueness validation
• Clean and research-friendly output structure

I can start immediately and discuss final station coverage, preferred AQI formula, and dataset structure before beginning the extraction workflow.
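Since the AQI formula is still to be agreed, a PM2.5 sub-index computation is best written so that the breakpoint table is a parameter. The sketch below uses the usual piecewise-linear interpolation between breakpoints. The default table is an assumption: it follows the India CPCB 24-hour PM2.5 bands as commonly cited, made continuous at the band edges, and should be checked against the official publication before use. Indonesia and the Philippines define their own indices, so a different table may apply there. The function name `pm25_sub_index` is hypothetical.

```python
# (conc_low, conc_high, index_low, index_high), PM2.5 in µg/m³, 24-hour mean.
# ASSUMPTION: values follow the commonly cited India CPCB bands; verify
# against the official table before relying on them.
ASSUMED_PM25_BANDS = [
    (0.0, 30.0, 0, 50),
    (30.0, 60.0, 50, 100),
    (60.0, 90.0, 100, 200),
    (90.0, 120.0, 200, 300),
    (120.0, 250.0, 300, 400),
]


def pm25_sub_index(conc, bands=ASSUMED_PM25_BANDS):
    """Piecewise-linear PM2.5 sub-index.

    Returns None for missing or negative input, and for concentrations
    above the last band, whose treatment depends on the chosen standard.
    """
    if conc is None or conc < 0:
        return None
    for c_lo, c_hi, i_lo, i_hi in bands:
        if c_lo <= conc <= c_hi:
            # Linear interpolation within the band.
            return round(i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo))
    return None
```

Passing the table in as `bands` lets the same function serve whichever standard is finally agreed, and the table itself can be reproduced in the README as the documented methodology.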
₹7,000 INR in 7 days
0.0

Badarpur, India
Member since Mar 24, 2026