In Progress

MongoDB: Optimize Background Data Aggregation

We have a MongoDB database that contains time series data. Every couple of seconds, data is added to the time series collection. Our current framework for aggregating the raw time series data to quarter-hourly, hourly, etc. values is very inefficient. The objective of the project is to replace or optimize the current implementation.

Time series data stored in a MongoDB collection (irregular time intervals) should be aggregated to

- 15min (xx:00-xx:15, xx:15-xx:30, xx:30-xx:45, xx:45-xx:60),

- 1h (xx:00-xx:60),

- 24h (00:00-24:00) and

- 7d (Mon-Sun)

buckets.
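
The bucket boundaries listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels `"15min"`, `"1h"`, `"24h"` and `"7d"` and the function names are hypothetical, buckets are treated as half-open intervals (start inclusive, end exclusive), and all timestamps are assumed to be in one consistent time zone.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the four bucket sizes named in the posting.
BUCKET_LENGTH = {
    "15min": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "24h": timedelta(days=1),
    "7d": timedelta(days=7),
}


def bucket_start(ts: datetime, bucket: str) -> datetime:
    """Return the inclusive start of the bucket that contains ts."""
    if bucket == "15min":
        # Round the minute down to 0, 15, 30 or 45.
        return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    if bucket == "1h":
        return ts.replace(minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if bucket == "24h":
        return midnight
    if bucket == "7d":
        # weekday() is 0 for Monday, so this lands on Monday 00:00.
        return midnight - timedelta(days=ts.weekday())
    raise ValueError(f"unknown bucket: {bucket}")


def bucket_range(ts: datetime, bucket: str) -> tuple[datetime, datetime]:
    """Return (start, end) of the bucket containing ts; end is exclusive."""
    start = bucket_start(ts, bucket)
    return start, start + BUCKET_LENGTH[bucket]
```

For a sample taken on 1 March 2013 at 02:19:20, `bucket_range(ts, "15min")` would span 02:15 to 02:30 on the same day, and the 7d bucket would start on Monday, 25 February 2013.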

Aggregation functions to be considered are:

- Average

- Sum

- Variance

- Minimum

- Maximum

- Count
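
All six functions can be derived from a small set of partial results that can be merged, which would allow coarser buckets (for example 1h) to be built from finer ones (four 15min buckets) instead of rescanning the raw data. The following is a minimal sketch of that idea, not the current implementation: the `Stats` class and its field names are hypothetical, and it assumes the sample variance (divisor n - 1), which is undefined and reported as NaN for fewer than two values.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Mergeable partial aggregate for one value series in one bucket."""
    count: int = 0
    total: float = 0.0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        """Fold one raw value into the aggregate (Welford's update)."""
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other: "Stats") -> "Stats":
        """Combine two partial aggregates without revisiting raw values."""
        n = self.count + other.count
        if n == 0:
            return Stats()
        delta = other.mean - self.mean
        return Stats(
            count=n,
            total=self.total + other.total,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        """Sample variance; NaN when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan
```

Average is `mean`, sum is `total`, and minimum, maximum and count are stored directly. Tracking the running mean and squared deviations rather than a plain sum of squares is a design choice to limit floating-point cancellation on large values.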

Important aspects to take into consideration:

- Every document in the time series database belongs to an object.

- It is likely that historic data changes and re-computation of previous aggregations is required.

- Currently, a lot of work is done by a Python script and only a little by the MongoDB server. This is inefficient.

- The Python script is invoked every 30 minutes. We would appreciate a solution that aggregates continuously.
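
One possible way to move the work to the MongoDB server and to recompute only the affected time window when historic data changes is an aggregation pipeline that ends in a `$merge` stage. The sketch below only builds the pipeline as plain Python data. It is an assumption-laden illustration, not the existing implementation: it assumes MongoDB 5.0 or later (for `$dateTrunc`), UTC bucket boundaries, and a hypothetical target collection name. It also writes one output document per owner, bucket and value key, which differs from the nested layout of the example documents in this posting.

```python
from datetime import datetime

# $dateTrunc arguments per bucket size (the labels are hypothetical).
TRUNC = {
    "15min": {"unit": "minute", "binSize": 15},
    "1h": {"unit": "hour"},
    "24h": {"unit": "day"},
    "7d": {"unit": "week", "startOfWeek": "monday"},
}


def build_pipeline(bucket: str, start: datetime, end: datetime, target: str) -> list:
    """Build a pipeline that (re)aggregates raw documents in [start, end).

    start and end should be aligned to bucket boundaries; otherwise a
    partially covered bucket would overwrite a complete earlier result.
    """
    return [
        # Restrict the work to the window that is new or has changed.
        {"$match": {"dt.timestamp": {"$gte": start, "$lt": end}}},
        # Turn the dynamic keys of "values" into an array of {k, v} pairs.
        {"$project": {
            "owner": "$owner.id",
            "ts": "$dt.timestamp",
            "kv": {"$objectToArray": "$values"},
        }},
        {"$unwind": "$kv"},
        {"$group": {
            "_id": {
                "owner": "$owner",
                "bucket": {"$dateTrunc": {"date": "$ts", **TRUNC[bucket]}},
                "key": "$kv.k",
            },
            "count": {"$sum": 1},
            "sum": {"$sum": "$kv.v"},
            "avg": {"$avg": "$kv.v"},
            "min": {"$min": "$kv.v"},
            "max": {"$max": "$kv.v"},
            "std": {"$stdDevSamp": "$kv.v"},
        }},
        # Variance is the squared sample standard deviation; it stays null
        # when a group holds fewer than two values.
        {"$addFields": {"var": {"$pow": ["$std", 2]}}},
        # Upsert results so that a re-run replaces earlier aggregates.
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]
```

With a driver such as PyMongo, the returned list could be passed to the `aggregate` method of the raw collection, for example for the 15min buckets of a single day that received corrected data. The collection names and the scheduling of such runs are left open here.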

You will get the source code of the current implementation and a database dump.

When bidding on this project, please provide references of previous projects that you successfully completed which are comparable to this project.

-----

Example of a document in the raw time series store:

{
    "_id" : ObjectId("514ab3f03ea70484382c8769"),
    "annotations" : { },
    "dt" : {
        "week_of_year" : 9,
        "day_of_week" : 4,
        "hour" : 1,
        "year" : 2013,
        "timestamp" : new Date("[url removed, login to view] 02:19:20"),
        "day_of_year" : 60,
        "day" : 1,
        "minute" : 19,
        "month" : 3
    },
    "dtype" : "real",
    "owner" : {
        "id" : ObjectId("5141c06091e89e7f043cd0dd")
    },
    "values" : {
        "rg3" : 13770.924760095495,
        "h3" : 0.79999999999999927,
    }
}

--------

Example of a document for 1h aggregation:

{
    "_id" : "2012-01-04-00-30-00_5141bc7c91e89e7cbbda3b3d",
    "count" : {
        "rg3" : 1.0,
        "h3" : 2.0,
    },
    "min" : {
        "rg3" : 13770.924760095495,
        "h3" : 0.79999999999999927,
    },
    "dtype" : "real",
    "sum" : {
        "rg3" : 13770.924760095495,
        "h3" : 0.79999999999999927,
    },
    "var" : {
        "rg3" : NaN,
        "h3" : NaN,
    },
    "max" : {
        "rg3" : 13770.924760095495,
        "h3" : 0.79999999999999927,
    },
    "owner" : {
        "id" : ObjectId("5141bc7c91e89e7cbbda3b3d")
    },
    "dt" : {
        "week_of_year" : 1,
        "day_of_week" : 2,
        "hour" : 0,
        "year" : 2012,
        "timestamp" : new Date("[url removed, login to view] 01:30:00"),
        "timespan" : {
            "from_date" : new Date("[url removed, login to view] 01:00:00"),
            "to_date" : new Date("[url removed, login to view] 02:00:00")
        },
        "day_of_year" : 4,
        "day" : 4,
        "minute" : 30,
        "month" : 1
    },
    "avg" : {
        "rg3" : 13770.924760095495,
        "h3" : 0.79999999999999927,
    }
}

Skills: Algorithm, Big Data, Map Reduce, NoSQL Couch & Mongo, Python

About the Employer:
(40 reviews) Baar, Switzerland

Project ID: #6478191

Awarded to:

manoj97738

I have gone through the problem statement and I feel why not use node.js which goes well with mongodb. The other solution is to use elastic search and index the Document which would be searched and i hope it would solv...

$166 USD in 10 days
(1 Review)
2.6
pinakmishra

Dear Employer, I have been working with big data space with (cloudera certification) with a product based MNC 3yrs by now with different NoSQL dbs and distributed technologies like hadoop. It would be a great pleas...

$155 USD in 3 days
(0 Reviews)
0.0

8 freelancers are bidding on average $379 for this job

NTechcorporate

Dear Client Our Python Case Studies: [url removed, login to view] [url removed, login to view] We are also open for technical interview. We are ready to start the project immediatel...

$412 USD in 6 days
(12 Reviews)
5.6
aamaia

Hi, can you tell if the current python script is incremental or re-calculates from scratch every time? Inside mongodb, there are two main ways to aggregate: using the aggregation pipeline or using mapreduce. The first...

$526 USD in 3 days
(2 Reviews)
3.7
anandgeor

Hi, Have been working on MongoDB for the past 3 years. Have completed two courses from MongoDB University 1. MongoDB for Node.js developers - Final score 100%. 2. MongoDB for DBA - Final score 94%. A write-up...

$277 USD in 3 days
(1 Review)
3.0
fattahaabdul

I have 8+ years of experience. Can we discuss the project. Please initiate a chat with me so that we can discuss the project at a broader level. Why you should hire me- 1. I have a very good communication skills...

$1052 USD in 3 days
(1 Review)
1.2
vw7266026vw

I have more than 5 years of experience in Python. I think this can be done. You can pay once you are happy with my work.

$111 USD in 3 days
(1 Review)
0.8
sradhakrishna

My Expertise/ Experience: 1. 4 years into architecting and developing analytics frameworks for scientific analysis; 2. 1 year into migrating legacy analytics applications into Hadoop ecosystem. 3. Experience in...

$333 USD in 5 days
(0 Reviews)
0.0