
Completed
Posted
Paid on delivery
I need an end-to-end ML experiment to predict duplicate customer records in a financial dataset from Kaggle. The goal is to build a proactive classification model that flags likely duplicates before they reach reporting, analytics, or risk pipelines. The workflow should include data loading, EDA, synthetic duplicate labelling (since labels won’t exist), feature engineering, model training, and evaluation. Duplicate pairs will be created using techniques like exact duplication, small perturbations, and formatting inconsistencies. Features should include exact matches, numeric differences (age, income, spending), and similarity measures. Models to test include Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar, but deliver one final tuned model. Evaluation should focus on F1-score (target ≥0.85), with a balance between precision and recall. Deliverables: reproducible notebook, clean code, short report, and README.
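The synthetic labelling step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (`name`, `age`, `income`, `spending`), the sample records, and the helper functions are hypothetical, since the brief does not name the Kaggle dataset or its schema. Each record gets a few positive pairs (exact copy, small numeric perturbation, or formatting change) and the same number of negative pairs drawn from other records.

```python
import random

# Hypothetical records; the real Kaggle columns are not given in the brief.
RECORDS = [
    {"name": "Alice Johnson", "age": 34, "income": 72000.0, "spending": 1800.0},
    {"name": "Bob Martins", "age": 51, "income": 54000.0, "spending": 950.0},
    {"name": "Carla Nunes", "age": 27, "income": 39000.0, "spending": 1200.0},
]

def exact_copy(rec, rng):
    """Exact duplication: an unchanged copy of the record."""
    return dict(rec)

def perturb(rec, rng):
    """Small perturbation: income/spending drift by up to 2%, age by at most 1."""
    out = dict(rec)
    out["income"] = round(rec["income"] * (1 + rng.uniform(-0.02, 0.02)), 2)
    out["spending"] = round(rec["spending"] * (1 + rng.uniform(-0.02, 0.02)), 2)
    out["age"] = rec["age"] + rng.choice([-1, 0, 1])
    return out

def reformat(rec, rng):
    """Formatting inconsistency: change case, name order, or whitespace."""
    out = dict(rec)
    first, _, last = rec["name"].partition(" ")
    out["name"] = rng.choice(
        [rec["name"].upper(), f"{last}, {first}", f"  {rec['name'].lower()} "]
    )
    return out

def make_labelled_pairs(records, n_per_record=3, seed=0):
    """Return (record_a, record_b, label) triples; label 1 marks a duplicate."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    makers = [exact_copy, perturb, reformat]
    pairs = []
    for i, rec in enumerate(records):
        others = [r for j, r in enumerate(records) if j != i]
        for _ in range(n_per_record):
            pairs.append((rec, rng.choice(makers)(rec, rng), 1))  # positive
            pairs.append((rec, rng.choice(others), 0))            # negative
    return pairs

pairs = make_labelled_pairs(RECORDS)
```

Balancing positives and negatives one-to-one is a simplification made here; in a real customer table true duplicates would be far rarer, which matters when interpreting precision and recall.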
Project ID: 40330729
5 proposals
Remote project
Active 24 days ago

Hey, I have over 6 years of experience in the fintech domain as an Applied ML Engineer and Data Scientist. I can complete your task and provide a report in less than 1 day.
$70 USD in 1 day
2.9
5 freelancers are bidding an average of $44 USD for this job

Hello, With over 7 years of experience in Excel, Data Science, Data Visualization, Statistical Analysis, and Statistics, I have the expertise to handle your project efficiently. I have carefully reviewed the requirements for the project. To address the predictive data quality modeling for financial customer data using machine learning, I will begin by loading the dataset from Kaggle and performing exploratory data analysis (EDA). Synthetic duplicate labeling will be implemented due to the absence of labels. Feature engineering will involve creating features based on exact matches, numeric differences, and similarity measures. The workflow will include model training and evaluation using techniques like Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar algorithms to develop a tuned model. Evaluation will focus on achieving an F1-score of ≥0.85, balancing precision and recall. The deliverables will include a reproducible notebook, clean code, a concise report, and a README file detailing the project setup. I would like to discuss this project further with you. Please connect with me via chat for a detailed conversation. You can visit my profile at https://www.freelancer.com/u/HiraMahmood4072 Thank you.
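For reference, the F1 target mentioned above can be checked with a few lines of plain Python. This is a minimal sketch; the labels and predictions below are made-up illustration data, not results from any model.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = duplicate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
meets_target = f1 >= 0.85  # the brief asks for F1 >= 0.85
```

Because F1 is the harmonic mean of precision and recall, a model cannot reach 0.85 by maximising one of the two while neglecting the other.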
$36 USD in 2 days
6.4

Hi there, I am A.R.M. MASUD, with a strong Data Science background. As a Python developer, I have extensive experience building robust, scalable, and efficient solutions that address various business needs. I understand the importance of delivering high-quality, well-architected code, and I am committed to working closely with you to ensure the success of this project. I implement core functionality using Python, utilizing relevant libraries and frameworks such as Pandas, NumPy, GUI, SciPy, Matplotlib, Seaborn, Plotly, Scikit-learn, TensorFlow, Keras, PyTorch, spaCy, Flask, Django, FastAPI, OpenCV, and Jupyter. I am a professional responsible for extracting actionable insights and knowledge from large volumes of data through Machine Learning models, including CNNs, RNNs, LSTMs, GANs, Transformers, FNNs, ANNs, and DNNs. I conduct comprehensive unit, integration, and performance testing to ensure the solution is error-free and optimized. https://www.freelancer.com/u/MZITSERVICES I appreciate the opportunity to submit this proposal and am excited about the possibility of working with you to bring your project to life. Thanks A.R.M MASUD
$40 USD in 7 days
4.7

Your duplicate detection challenge needs synthetic labeling since real financial datasets won't have duplicate flags. I'd start by loading your Kaggle dataset, creating controlled duplicates through exact matches and small perturbations (typos, formatting changes), then engineer similarity features like Levenshtein distance, numeric differences, and exact match indicators. XGBoost typically performs well for this type of classification with proper hyperparameter tuning. I built a price aggregation engine that tracks 800+ products across multiple stores, handling fuzzy matching and duplicate detection for similar products with slight naming variations. The pattern recognition work translates directly to customer record deduplication. You can see my automation projects at ffulb.com. Can deliver the complete notebook, tuned model hitting your F1≥0.85 target, and documentation within a week. Ready to start immediately.
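The pairwise features named in this proposal (Levenshtein distance, numeric differences, exact match indicators) might look something like the sketch below. It is an illustration only: the field names `name`, `age`, and `income` and the `pair_features` helper are assumptions, not taken from the bidder's code or the dataset.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so formatting noise is ignored."""
    return " ".join(text.lower().split())

def pair_features(a: dict, b: dict) -> dict:
    """Features for one candidate pair of customer records."""
    na, nb = normalise(a["name"]), normalise(b["name"])
    longest = max(len(na), len(nb)) or 1
    scale = max(abs(a["income"]), abs(b["income"]), 1.0)
    return {
        "name_exact": int(a["name"] == b["name"]),       # exact match flag
        "name_sim": 1 - levenshtein(na, nb) / longest,   # 1.0 = identical
        "age_diff": abs(a["age"] - b["age"]),
        "income_rel_diff": abs(a["income"] - b["income"]) / scale,
    }

# Hypothetical pair: same customer with formatting and small numeric drift.
rec_a = {"name": "Alice Johnson", "age": 34, "income": 72000.0}
rec_b = {"name": "  ALICE  johnson ", "age": 35, "income": 72720.0}
features = pair_features(rec_a, rec_b)
```

Keeping both the raw exact-match flag and the normalised similarity lets a classifier distinguish byte-identical copies from records that differ only in formatting.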
$28 USD in 2 days
1.5

South Africa
Payment method verified
Member since Mar 20, 2026