
Closed
Published
Need a strong streaming-experience engineer to develop, design, and deploy a PySpark publishing and upserting job on EMR with Spark, the MongoDB (DocumentDB) connector, AWS EMR step functions, CloudWatch, Docker, a Kafka cluster architecture, Airflow DAGs, GitLab, PyCharm, the Cursor AI IDE, and related environment experience.
Project ID: 40174710
40 proposals
Remote project
Active 7 days ago
40 freelancers are bidding on average $11 USD/hour for this job

⭐Hi, I’m ready to assist you right away!⭐ I believe I’d be a great fit for your project since I have extensive experience in Python, Apache Kafka, PySpark, MongoDB, and Docker. Additionally, I have worked on data integration and processing projects using GitLab and AWS services. My technical expertise includes setting up streaming jobs with PySpark, MongoDB connectors, and Kafka cluster architecture. I am proficient in utilizing AWS EMR step functions, CloudWatch, and Airflow for seamless deployment and monitoring of data streaming processes. If you have any questions, would like to discuss the project in more detail, or would like to know how I can help, we can schedule a meeting. Thank you. Maxim
$20 USD in 15 days
5.4

Hi there! The goal of the project: develop, design, and deploy a PySpark streaming solution for upserting and publishing data from DocumentDB to Kafka topics on AWS EMR. I have carefully read your project description and understand that the role requires strong experience with PySpark, MongoDB (DocumentDB) connectors, EMR step functions, Kafka architecture, Docker, Airflow DAGs, GitLab, and monitoring via CloudWatch in a scalable streaming environment. I am a strong fit because I have hands-on experience designing and deploying end-to-end data streaming pipelines with PySpark and Kafka in AWS environments:
1. Building PySpark jobs to upsert and publish data to Kafka topics
2. Implementing AWS EMR step functions with monitoring through CloudWatch
3. Managing code and workflow automation using GitLab, Docker, and Airflow DAGs
I provide database management, testing, UI/monitoring dashboards, and full source-code delivery at project completion. I have 9+ years of experience as a full-stack developer and have completed multiple streaming and ETL projects with PySpark, Kafka, and MongoDB. Looking forward to chatting with you to make a deal. Best regards, Elisha Mariam
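As editorial context for the upsert-and-publish step this bid describes: the job is only safe to retry if applying the same record twice leaves the target in the same state. A minimal plain-Python sketch of that idempotent merge semantics (the function name `apply_upsert` is illustrative, not from the project):

```python
def apply_upsert(store, record, key="_id"):
    """Merge a record into a dict keyed by record[key].

    Replaying the same record is a no-op, which is what makes
    the pipeline safe to retry after a failed batch.
    """
    existing = store.get(record[key], {})
    merged = {**existing, **record}
    store[record[key]] = merged
    return merged

store = {}
apply_upsert(store, {"_id": 1, "name": "a"})
apply_upsert(store, {"_id": 1, "city": "x"})  # updates, does not duplicate
apply_upsert(store, {"_id": 1, "city": "x"})  # replay: no change
```

The same merge-by-key behavior is what a DocumentDB upsert provides natively; the sketch only makes the retry-safety argument concrete.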
$6 USD in 40 days
4.6

Hi Ninja B., I am a data engineer with strong streaming experience in PySpark and AWS EMR. I have built publish and upsert pipelines using the MongoDB/DocumentDB Spark connector, Kafka clusters, Airflow DAGs and EMR step functions. I also work with Docker, CloudWatch, GitLab and PyCharm. I can design and deploy a PySpark job to publish and upsert records to DocumentDB and integrate it into your Kafka architecture with monitoring and CI/CD. Please check my profile and portfolio for similar projects. Could you share the expected event rate and whether you use a schema registry or a specific message format? Best regards, Saad J.
$8 USD in 40 days
3.0

Hi there! Project summary: design and deploy a PySpark streaming solution on AWS EMR to publish and upsert data into DocumentDB; integrate with Kafka for data streaming and automate workflows using Airflow and EMR step functions; ensure monitoring with CloudWatch and manage the environment with Docker, GitLab, and IDE tooling. I have built and deployed streaming pipelines using PySpark on EMR with Kafka and MongoDB connectors, including Airflow DAG automation and CloudWatch monitoring, and I can share similar work in chat. I will develop a robust streaming job that reads from Kafka, processes data, and upserts into DocumentDB using the MongoDB connector. I will set up EMR step functions, Airflow DAGs, and CloudWatch alerts while ensuring the architecture is scalable and maintainable with Docker and GitLab CI. Let's chat so I can understand your data schema and streaming requirements in detail. Warm regards, Farhin B.
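On the "reads from Kafka, processes data, and upserts" step described in this bid: inside a streaming micro-batch handler one usually reduces the batch to the newest event per key before writing, so a replayed batch produces the same final state. A dependency-free sketch of that reduction (the name `latest_per_key` and the `ts` field are illustrative, not from the project):

```python
def latest_per_key(events, key="_id", version="ts"):
    """Keep only the newest event per key from a micro-batch.

    Writing the reduced batch as upserts means a retried batch
    cannot reorder or duplicate writes for the same document.
    """
    newest = {}
    for event in events:
        k = event[key]
        if k not in newest or event[version] > newest[k][version]:
            newest[k] = event
    return list(newest.values())

batch = [
    {"_id": "a", "ts": 1, "v": "old"},
    {"_id": "a", "ts": 3, "v": "new"},
    {"_id": "b", "ts": 2, "v": "only"},
]
reduced = latest_per_key(batch)
```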
$5 USD in 40 days
3.0

Greetings, I see you're looking for someone with strong streaming experience to work on PySpark jobs in an AWS EMR environment. Your project involves publishing and upserting data using the Spark MongoDB connector, along with setting up a Kafka architecture and managing workflows through Airflow. It’s clear that you need someone who can seamlessly integrate these technologies to ensure efficient data streaming. With my background in Python and data processing, combined with hands-on experience in AWS, Docker, and Kafka, I can design and deploy a robust solution that meets your needs. I have worked extensively with MongoDB and PySpark, ensuring smooth data integration and real-time processing. My familiarity with tools like GitLab and CloudWatch will also help in monitoring and maintaining the system effectively. Looking forward to collaborating on this exciting project. Best regards, Saba Ehsan
$5 USD in 40 days
3.1

Hi there, I’m excited about your project involving PySpark and Kafka! With over 9 years of solid experience in data streaming and processing, I have successfully developed and deployed similar PySpark applications on AWS EMR, leveraging MongoDB for data integration. I’m adept at utilizing Kafka for real-time data streaming and can help you design a robust architecture that meets your requirements. My strong background includes working with Docker, GitLab, and orchestrating workflows using Airflow, ensuring smooth deployment and scalability. I can start immediately and am confident in delivering a solution that aligns with your vision. Let’s discuss how we can proceed with this project!
$8 USD in 20 days
2.5

Hi, greetings of the day! With my extensive programming background in Python and my deep understanding of data analysis and ML models, I am confident in my ability to excel at your project. As a graduate engineer I have solved problems primarily in Python, as well as in other languages such as R and MATLAB, which broadens my stack knowledge. I have solid experience working with streaming data through frameworks like Kafka, in addition to AWS EMR step functions. I am proficient with PySpark, Spark, and MongoDB (DocumentDB), and I have expert-level knowledge of using these technologies in tandem for efficient data processing. My familiarity with Docker containerization will ensure smooth deployment within your preferred environment. Having used GitLab as part of my daily development workflow means my collaboration skills are solid, and I always deliver clean, efficient code. My proficiency with Airflow DAGs also enables me to create high-quality data workflows that fit seamlessly into complex architectures. Let me demonstrate how I can turn challenges into successful outcomes for you and your team!
$10 USD in 40 days
1.8

Dear Ninja, I am writing to express my keen interest in your project involving PySpark streaming with DocumentDB updates to Kafka topics on AWS EMR. With extensive experience in data processing, big data frameworks, and cloud-native architectures, I am confident in delivering a robust, scalable solution tailored to your requirements. My expertise includes designing and deploying PySpark jobs within AWS EMR environments, leveraging Spark-MongoDB connectors for efficient upserts and publishing, as well as implementing complex Kafka cluster architectures to enable high-throughput, low-latency streaming pipelines. Additionally, I have solid hands-on experience with AWS Step Functions, CloudWatch monitoring for operational observability, Docker containerization for consistent deployment, and the orchestration of workflows using Airflow DAGs. My proficiency with GitLab ensures version control and CI/CD best practices, while familiarity with PyCharm and modern IDEs enhances productivity. I am well-versed in building production-grade data integration workflows with fault tolerance and scalability, aligning perfectly with your technology stack. I look forward to contributing to your project with a focus on clean design, maintainability, and optimized performance. Thank you for considering my bid. I am ready to engage immediately and deliver results with precision and professionalism. Best regards, Yael
$3 USD in 40 days
1.4

This looks like a streaming platform and infrastructure engineering role, not just Spark job writing — and that’s exactly what I focus on. I’ve worked with EMR-based Spark pipelines, Kafka-driven architectures, and cloud-native orchestration where reliability, idempotent upserts, monitoring, and reproducible deployments matter as much as the code itself. I can help design and deploy:
- PySpark publishing/upserting jobs on EMR
- Kafka → Spark → MongoDB/DocumentDB pipelines
- Step Functions and/or Airflow DAGs for orchestration
- Containerized execution where appropriate
- Monitoring and alerting via CloudWatch
- Clean GitLab-based CI/CD workflows
My focus is production safety, scalability, and observability, not ad-hoc batch jobs. Happy to discuss data volume, SLA expectations, and failure/rollback handling before implementation.
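For orchestration of the kind this bid lists, a Step Functions state machine that submits an EMR step synchronously and alerts on failure could look roughly like the sketch below. This is an illustrative config fragment only: the cluster ID, S3 path, and ARNs are placeholders, not details from this project.

```json
{
  "StartAt": "RunPySparkUpsertJob",
  "States": {
    "RunPySparkUpsertJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-EXAMPLE",
        "Step": {
          "Name": "pyspark-upsert",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://example-bucket/jobs/upsert_job.py"]
          }
        }
      },
      "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
      "End": true
    },
    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:alerts",
        "Message": "PySpark upsert job failed"
      },
      "End": true
    }
  }
}
```

The `.sync` integration makes the state machine wait for the EMR step to finish, which is what allows the failure branch to fire reliably.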
$5 USD in 40 days
1.2

I can create a seamless, automated PySpark job integrating Spark and MongoDB on AWS EMR, ensuring a strong streaming experience. Your need for a professional, user-friendly solution with AWS EMR step functions, Kafka architecture, and Airflow DAGs stands out. I bring expertise in PySpark, Docker, Airflow, and cloud-based data pipelines using GitLab and modern IDEs like PyCharm and Cursor AI. Does this approach sound like what you’re looking for? Regards, Alicia
$5 USD in 14 days
1.1

Hi, this is Jagrati. I have strong experience building end-to-end streaming and batch data pipelines using PySpark on AWS EMR, integrating with MongoDB (DocumentDB) and Kafka clusters, and orchestrating jobs via Airflow DAGs. I can design, develop, deploy, and monitor publishing and upserting jobs efficiently. My expertise includes:
- PySpark jobs for streaming & batch processing with EMR step functions
- MongoDB connector for upserts and efficient data handling
- CloudWatch monitoring for logs, metrics, and alerting
- Dockerized environments, GitLab CI/CD integration, and PyCharm / Cursor AI IDE setups
- Designing scalable Kafka architecture for real-time data streaming
- Building robust Airflow DAGs for automated orchestration and scheduling
I follow best practices for performance, reliability, and maintainability, ensuring that jobs run efficiently at scale. I can set up the environment end-to-end and provide clean, production-ready code. I’m ready to start immediately and deliver a fully functional, scalable streaming and upserting pipeline tailored to your requirements. Regards, Jagrati.
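As an illustration of the Airflow piece this bid mentions, a minimal DAG that submits the EMR step and waits for it might be sketched as below. This is a configuration sketch only, assuming Airflow 2.x with the Amazon provider installed; the DAG ID, cluster ID, and S3 path are placeholders, not details from this project.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder EMR step; the spark-submit args mirror a typical PySpark job.
SPARK_STEP = [{
    "Name": "pyspark-upsert",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "--deploy-mode", "cluster",
                 "s3://example-bucket/jobs/upsert_job.py"],
    },
}]

with DAG("documentdb_upsert", start_date=datetime(2024, 1, 1),
         schedule="@hourly", catchup=False) as dag:
    submit = EmrAddStepsOperator(
        task_id="submit_job", job_flow_id="j-EXAMPLE", steps=SPARK_STEP)
    # The sensor polls the step until it succeeds or fails.
    wait = EmrStepSensor(
        task_id="wait_job", job_flow_id="j-EXAMPLE",
        step_id="{{ ti.xcom_pull(task_ids='submit_job')[0] }}")
    submit >> wait
```

Keeping the submit and wait tasks separate lets Airflow retry the sensor without resubmitting the EMR step.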
$5 USD in 40 days
0.4

Hello, A potential challenge lies in ensuring seamless integration between PySpark jobs and the MongoDB DocumentDB while maintaining data consistency in the Kafka topics. To overcome this, I suggest implementing robust error handling and monitoring solutions through AWS CloudWatch and Airflow to track job performance and failures. Additionally, designing the architecture with scalability in mind will facilitate future enhancements as data volumes grow. I'm eager to leverage my skills to create a reliable and efficient data streaming solution tailored to your needs.
$10 USD in 40 days
0.3

Greetings, having carefully reviewed your project description, I am confident in my ability to execute this project to a high standard. I possess a broad spectrum of skills, knowledge, and experience in this specific field, making me a strong candidate to handle your project. My proficiency includes Data Integration, Apache Kafka, MongoDB, Big Data, Python, Docker, Data Processing, PySpark, Amazon Web Services, and GitLab, which positions me well for the successful completion of your project. While I am well prepared to begin, I have a few clarifying questions. Kindly drop me a message in the chat so that we can discuss the project's budget and deadline. Thank you, and I look forward to the opportunity to collaborate on your project.
$10 USD in 15 days
2.3

I am ready to make your project a complete success! I’ve analyzed your requirements, and this is a production-grade streaming and data engineering task centered on building, deploying, and operating PySpark jobs on AWS EMR with reliable publishing and upserting at scale. The key challenge is not just writing Spark code, but designing a resilient streaming architecture that integrates Kafka ingestion, MongoDB/DocumentDB upserts, orchestration, observability, and CI/CD without data loss or operational surprises. My approach is to design and implement PySpark streaming and batch upsert jobs on EMR using the MongoDB (DocumentDB) Spark connector, with schema-aware writes and idempotent upsert logic. EMR steps and/or Step Functions will manage execution, Airflow DAGs will orchestrate dependencies, and CloudWatch will provide logging and alerting. Kafka will be used for reliable ingestion, Docker for reproducible environments, and GitLab for version control and CI. Development will be done in PyCharm/Cursor with clean, well-documented code that’s easy to extend and operate. Looking forward to working with you on your project. Thank you!
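To make the "idempotent upsert logic" this bid mentions concrete: each row is typically turned into a filter-plus-`$set` operation so that a replay overwrites rather than duplicates. A dependency-free sketch whose dict shape mirrors MongoDB-style bulk-write upserts (the helper name `to_upsert_ops` is hypothetical):

```python
def to_upsert_ops(rows, key="_id"):
    """Turn rows into MongoDB-style upsert operation dicts.

    Each op matches on the key and $set-s the remaining fields,
    so applying the same op twice yields the same document.
    """
    ops = []
    for row in rows:
        fields = {k: v for k, v in row.items() if k != key}
        ops.append({
            "filter": {key: row[key]},
            "update": {"$set": fields},
            "upsert": True,
        })
    return ops

ops = to_upsert_ops([{"_id": 7, "status": "active"}])
```

In a real job these dicts would become the driver's bulk-write operations; the sketch only shows the filter/update/upsert shape.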
$10 USD in 40 days
0.0

Hello, with my strong background in full-stack development and a particular focus on AI technologies, I am confident I can bring immense value to your project. Specifically, I have extensive experience with Python and all the components you listed. I have broad hands-on experience with PyCharm for development tasks, along with GitLab for version control throughout the staging process. My expertise with AWS EMR and DocumentDB aligns with your requirements for developing, designing, and deploying a PySpark publishing and upserting job. My skills with PySpark and the MongoDB connector will enhance the quality of your streaming as needed. Additionally, working proficiently with Kafka clusters has been part of every major project of mine in the past two years. In terms of management, I have optimized workflows by leveraging Airflow DAGs. Efficient project management, continuous deployment, and monitoring via CloudWatch are essential components of every project, and I have the skills to handle them efficiently as well. Overall, my skill set, combined with my user-centric approach and data-driven mindset, positions me as an ideal fit for your project. Let's connect to discuss how we can make this project a success together! Thanks!
$19 USD in 34 days
0.0

Hello, I hope you are doing well. I am a streaming-focused data engineer with hands-on experience in building end-to-end PySpark pipelines on AWS EMR, integrating MongoDB via the MongoDB Spark Connector, and publishing/upserting data to Kafka. I’ve delivered scalable streaming solutions with robust monitoring and CI/CD using Docker, Airflow, and GitLab. I will architect and deploy your PySpark publishing/upserting job on EMR, wire in MongoDB and Kafka, and orchestrate with Airflow DAGs and Step Functions. I’ll ensure CloudWatch monitoring, Dockerized environments, and GitLab CI/CD for repeatable deployments. Please feel free to contact me so we can discuss more details. I am looking forward to the chance of working together. Best regards, Billy Bryan
$20 USD in 32 days
0.0

Hi there! Are you planning to handle schema evolution or data partitioning strategies while upserting to MongoDB using the Spark connector? Regardless, this is definitely something I feel confident delivering, given my past experience. I would love to discuss your project further! Looking forward to hearing from you. Kind regards, Corné
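The schema-evolution question this bid raises often comes down to aligning old and new records to a common field set before the upsert, so writes stay uniform. A tiny stdlib sketch of that alignment (the function name `align_schema` is illustrative, not from the project):

```python
def align_schema(records, default=None):
    """Project every record onto the union of all observed fields,
    filling missing fields with a default so writes stay uniform."""
    fields = sorted({f for r in records for f in r})
    return [{f: r.get(f, default) for f in fields} for r in records]

# Two records from different schema versions align to one shape.
aligned = align_schema([{"_id": 1, "a": 10}, {"_id": 2, "b": 20}])
```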
$2 USD in 14 days
0.0

Hi there, I understand you need a strong streaming engineer to design and deploy a PySpark publishing and upserting job on EMR, using MongoDB DocumentDB, Kafka, Airflow, Docker, and AWS tools. I can handle it: I have hands-on experience building end-to-end streaming pipelines with PySpark, MongoDB connectors, Kafka, EMR steps, and CloudWatch in production. How I’d implement it:
- Define a robust upsert flow between DocumentDB and PySpark streaming
- Implement a PySpark job reading/writing to MongoDB and Kafka
- Orchestrate with Airflow DAGs, EMR steps, and Step Functions
- Dockerize, configure CloudWatch alerts, and set up GitLab CI
Availability: I can start now and deliver a working version within about 5 days. Do you have an existing EMR cluster and Kafka cluster, or should I provision and test end-to-end in a new environment? Thanks,
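One detail worth agreeing on for an upsert flow like the one this bid outlines is how transient DocumentDB write failures are retried. A simple exponential-backoff wrapper, sketched in plain Python (the names and delays are illustrative, not from the project):

```python
import time

def with_retries(write, batch, attempts=4, base_delay=0.01):
    """Retry a write callable with exponential backoff.

    Safe only because the write itself is an idempotent upsert:
    a retry after a partial failure cannot duplicate documents.
    """
    for attempt in range(attempts):
        try:
            return write(batch)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = []
def flaky_write(batch):
    """Simulated writer: fails twice, then succeeds."""
    calls.append(batch)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return len(batch)

written = with_retries(flaky_write, [{"_id": 1}])
```

In production the backoff ceiling and retryable exception types would be tuned to the DocumentDB driver's actual error classes.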
$20 USD in 15 days
0.0

I'm very skilled in Python, and I'm always coding even when I'm not working, so I can guarantee I will not let you down.
$5 USD in 40 days
0.0

I have the senior-level experience in Python and AWS needed to design and deploy your PySpark jobs successfully. I command the full required stack, from the Kafka architecture to orchestration in Airflow. My technical proposal for your infrastructure:
- Streaming & upserts: I will develop continuous ingestion processes from Kafka into DocumentDB, using the official MongoDB connector to guarantee efficient, consistent upsert operations.
- AWS orchestration: I will configure AWS EMR step functions and Airflow DAGs to automate the job lifecycle, ensuring resilient execution monitored through CloudWatch.
- CI/CD & environment: I will implement the deployment flow with Docker, GitLab, and optimized environments in Cursor AI/PyCharm to guarantee parity between development and production.
- Processing efficiency: I will apply advanced Spark optimization techniques to reduce resource consumption on the EMR cluster.
My approach is based on stability and automation, principles I apply in my own autonomous management system, Janus V2.1.
$30 USD in 40 days
0.0

Malvern, United States
Payment method verified
Member since May 29, 2019