Data Analysis

  • Status: Closed
  • Prêmio: $100
  • Inscrições Recebidas: 25
  • Vencedor: pshyvw

Síntese do concurso

Taking the attached CSV file (or any other CSV in similar format) can you provide a solution that accepts a percentage parameter and have it available in executable fashion so that I can run it in the future.

The solution needs to do the following:

1. Delete all rows where COUNTRY = ANTEPOST

2. Delete all rows where FULL_DESCRIPTION contains 'Place Market' or 'Odds' or 'Forecast' or 'Specials' or 'Match' or 'Distance' or 'Without'

3. Sort by COUNTRY (column D), EVENT_ID, IN_PLAY DESC, LATEST_TAKEN

4. For each EVENT_ID do the following:

a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.
c. Delete all other rows where IN_PLAY = PE
d. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428)
e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b and 4d, then delete all other rows apart from 4b and 4d and exit (i.e. move onto next EVENT_ID)
f. Delete all rows for the EVENT_ID where SELECTION FAVOURITE
g. Find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
h. Delete all other rows where IN_PLAY = IP

5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 2 rows - one with a PE and one with an IP or two with PE if we exited at 4e above).

6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.


03.07.2014 14:55:00 (UK)____AMENDED INSTRUCTIONS AS FOLLOWS (after swapping 4c and 4d points above and making further small tweaks)

4. For each EVENT_ID do the following:

a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.
c. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428). This will be called the 2ND_FAVOURITE.
d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)
e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b (1.7) and 4c (3.8), then delete all other rows apart from 4b and 4c and exit (i.e. move onto next EVENT_ID)
f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined
g. If COUNTRY is NOT equal to 'GB' or NOT equal to 'IRE', sort by COUNTRY (column D), EVENT_ID, ODDS and find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
h. If COUNTRY is NOT equal to 'GB' or NOT equal to 'IRE', delete all other rows where IN_PLAY = IP (i.e. where EVENT_ID = 113511428)

5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 3 rows - two with a PE and one with an IP or two with PE if we exited at 4e above or COUNTRY is NOT equal to 'GB' or 'IRE').

6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.

Habilidades Recomendadas

Feedback do Empregador

“Delighted with the end result. As the rating suggests, the quality, communication, expertise and professionalism was top notch. I\'m already thinking of the next task to give to this freelancer.”

Foto do perfil Collie33, Ireland.

Painel de Comentários

  • pshyvw
    pshyvw
    • 8 anos atrás

    FYI. To see if there is a real need to sort based on multiple fields(could be time consuming for large datasets), I ran some checks. For a given EVENT_ID, every SELECTION belongs to only one COUNTRY. Multi-field sorting at 4f & 4g is not needed in my opinion(unnecessary performance hit), if we can get the records based on the other filters to arrive at that step.

    • 8 anos atrás
    1. FixTheTask
      FixTheTask
      • 8 anos atrás

      And the part appears in my proposed image as well. it shows how i made it. it does not take more than some seconds to sort them.

      • 8 anos atrás
    2. tosha5252
      tosha5252
      • 8 anos atrás

      If it does not take more than some seconds, you haven't done it correctly

      • 8 anos atrás
  • FixTheTask
    FixTheTask
    • 8 anos atrás

    I ran some checks. For a given EVENT_ID, every SELECTION belongs to only one COUNTRY.

    • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    Incorporated all changes. New findings are that for each EVENT_ID, output csv file has: (a) 3 records with the third rec for IN_PLAY=IP only when COUNTRY is "GB" or "IRE". OR (b) 2 records with IN_PLAY= PE or NI (c) Interestingly, has 161 entries with only one record(FAVORITE). Found that these events have very few records 'during the last minute' (sometimes only 1 record during the last minute as in say EVENT_ID=113530124) and even if more than 1 record in last minute they have only one SELECTION. For this third category, we get only FAVORITE; there is no SECOND_FAVORITE as per modified rule 4c

    • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    Oops: Message got posted when I hit Enter. Please ignore previous message

    • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    Incorporated all changes. New findings:

    • 8 anos atrás
  • haankrishan
    haankrishan
    • 8 anos atrás

    sir had u selected the winner

    • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    Folks, just so you know you're on the right track I've saved a CSV/XLSX file for most of the steps in the instructions (for EVENT_ID = 113511428).

    See files available from https://www.dropbox.com/sh/hh4dj7dtuhlz2r1/AAAEUCYMsV6L5msf8wLt8L36a.

    • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    Folks, I've just realised that only UK and Ireland have in play markets so 4d now becomes:

    d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)

    g and h will not be applicable when COUNTRY is NOT equal to 'GB' or NOT egual to 'IRE'

    I'm updating the brief to reflect this and small typo in point e which I'm also updating.

    • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    4g. I believe the objective is to get the 'lowest ODDS' for a given EVENT_ID and IN_PLAY=IP? Was wondering what the sorting by COUNTRY, EVENT_ID and ODDS would result in. In my opinion, EVENT_ID is redundant at 4g as it is the same for all records for consideration at this stage. Further, we have deleted all records other than for FAVORITE SELECTION. A SELECTION belongs to only one COUNTRY and as such COUNTRY too would be same for all records selected for this step. Can we simply sort based on ODDS for the relevant 'subsetted' dataset at that step? Data 'subsetting' is based on EVENT_ID & IN_PLAY=IP. While this may sound like an implementation issue, need confirmation whether sorting by COUNTRY can lead to a different record being selected (not necessarily the lowest ODDS for the dataset filtered only by EVENT_ID & IN_PLAY=IP).

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Yes, the objective is to find the lowest ODDS for the FAVOURITE. There is no need to sort if you don't think it is necessary provided you retain the correct record/row. However, a SELECTION does exist for multiple EVENT_IDs (e.g. 113511429 and 113511429). A SELECTION could also race in the UK (where COUNTRY = GB) one day and then race in Ireland (where COUNTRY = IRE) a day or two later.

      • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    4c. If we look at the dataset just before executing this step, the (first) SELECTION with the 'next lowest ODDS' for EVENT_ID=113511428 during the minute before SCHEDULED OFF is also Arab Spring with 1.71. Since you were expecting Mr Greenspan with 3.8, I think 4c wording (initial part) should be "Determine the 'next' (other than FAVORITE) SELECTION with the lowest ODDS during the ...". Please confirm.

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Yes you are correct - good observation - I will amend brief to match.

      • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    4b. Though not mentioned, it is implied from point 5 that the FAVORITE SELECTION is from IN_PLAY=PE. For EVENT_ID=100933526, we have all IN_PLAY=NI, for which we can determine 4b & 4c though not 4g. This is because 4g clearly specifies "where IN_PLAY=IP". Can we add the "where IN_PLAY=PE" to the specification at 4b & 4c? Or alternately, how do we handle cases like 100933526 where we get the first 2 NIs (not PEs) but no IP.

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      I've just realised that only UK and Ireland have in play markets so 4d now becomes:

      d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)

      g and h will not be applicable when COUNTRY is NOT equal to 'GB' or NOT egual to 'IRE'

      • 8 anos atrás
  • momo2619
    momo2619
    • 8 anos atrás

    That contest is nearly 1 a day, and if the fastest should win, so one of those freelancer could finish it since yesterday.
    Shall I continue or stop here? just one answer. Continue or not.

    • 8 anos atrás
    1. momo2619
      momo2619
      • 8 anos atrás

      Just to update you, the application will be fast and simple.
      if more details needed please let me know. like the interface or anything else.

      • 8 anos atrás
    2. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      I haven't given any thought to interface but may have some comments when I see it.

      • 8 anos atrás
  • bstoinev
    bstoinev
    • 8 anos atrás

    How about new button in the ribbon that allows you to do this?

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      I don't understand your comment - ribbon?

      • 8 anos atrás
  • artej11
    artej11
    • 8 anos atrás

    Is mathematica script fine?( you will need mathematica to run it)

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Can I download mathematica for free? If so, then it should be okay.

      • 8 anos atrás
  • artej11
    artej11
    • 8 anos atrás

    also what does f. Delete all rows for the EVENT_ID where SELECTION FAVOURITE mean?

    • 8 anos atrás
    1. artej11
      artej11
      • 8 anos atrás

      Is it just delete all favourites?

      • 8 anos atrás
    2. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Typo in original instructions. See amended brief where f is now:

      f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined

      • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    If we go by the specified sequence at 4c all PE records would get deleted and there is no way we can get another PE at 4d - would always be PE and IP

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      See amended brief as 4c and 4d were in the wrong order previously.

      • 8 anos atrás
  • pshyvw
    pshyvw
    • 8 anos atrás

    In 4b, how do we interpret "during the minute before the SCHEDULED_OFF" - does this mean that SCHEDULED_OFF - LATEST_TAKEN should be 0 or 1 as 2 and more are deleted?

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Perform some date/time comparison and only choose the minute before the off. So of the off time was 13:30 then the minute before would be 13:29. I'm only interested in rows that contain 13:29.

      • 8 anos atrás
  • mfvonh
    mfvonh
    • 8 anos atrás

    Would you accept a Python script?

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      Assuming I can easily run it on a Windows environment then yes.

      • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    Contest brief updated to reflect new amendments to instructions.

    • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 3 rows - two with a PE and one with an IP or two with PE if we exited at 4e above).

    6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.

    • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    c. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428). This will be called the 2ND_FAVOURITE.
    d. Delete all other rows where IN_PLAY = PE (i.e. where EVENT_ID = 113511428)
    e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b (1.7) and 4c (3.8), then delete all other rows apart from 4b and 4d and exit (i.e. move onto next EVENT_ID)
    f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined
    g. Sort by COUNTRY (column D), EVENT_ID, ODDS and find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
    h. Delete all other rows where IN_PLAY = IP (i.e. where EVENT_ID = 113511428)

    • 8 anos atrás
  • Collie33
    Proprietário do Concurso
    • 8 anos atrás

    Folks,

    I spotted an error in my instructions which may answer some of your questions. I've slightly amended the instructions which basically swap points 4c and 4d. There are some other tiny tweaks which should help explain the scenario. Amended instructions as follows (in separate messages due to 1000 character limit):

    4. For each EVENT_ID do the following:

    a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
    b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.

    • 8 anos atrás
  • anujbatham
    anujbatham
    • 8 anos atrás

    wait for my entry

    • 8 anos atrás
  • ACMAuk
    ACMAuk
    • 8 anos atrás

    Stay tuned- Almost complete.........

    • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    1. Delete all rows where COUNTRY = ANTEPOST --- There were even no country named ANTEPOST ( or list of ANTEPOST provided).

    • 8 anos atrás
    1. shenghuan
      shenghuan
      • 8 anos atrás

      Thank you.

      • 8 anos atrás
    2. soniapathania
      soniapathania
      • 8 anos atrás

      hi..is these employers seeking data entryjobs real..because they are asking for registration fee.

      • 8 anos atrás
  • FixTheTask
    FixTheTask
    • 8 anos atrás

    Please wait for something

    • 8 anos atrás
  • momo2619
    momo2619
    • 8 anos atrás

    I am nearly finished the job itself. But if you need it in an executable,i am creating currently a c# application to do the task,

    • 8 anos atrás
  • ACMAuk
    ACMAuk
    • 8 anos atrás

    Please wait for something VERY EXCELLENT!

    • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    Does the data set only contain IN_PLAY = IP or IN_PLAY = PE records?

    • 8 anos atrás
    1. Collie33
      Proprietário do Concurso
      • 8 anos atrás

      For each EVENT_ID you should have one row where IN_PLAY = IP and one row where IN_PLAY = PE.

      The row where IN_PLAY = PE should be a minute before the scheduled off time and the row where IN_PLAY = IP should be the lowest odds figure for the where SELECTION_ID is the same as that for the PE row.

      • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    Your description is not clear. It does not mention if IN_PLAY = other should be kept.

    • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    My feeling is freelancer is only for logo design. So only images are allowed to be submitted.\

    • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    It only allow to submit figures. I change the .csv to .gif or ,png, but cannot submit.

    • 8 anos atrás
  • shenghuan
    shenghuan
    • 8 anos atrás

    An error occurred, please try again later.

    • 8 anos atrás

Mostrar mais comentários

Como começar com concursos

  • Publique seu concurso

    Publique seu Concurso Rápido e fácil

  • Receba muitas inscrições

    Obtenha Toneladas de Inscrições De todo o mundo

  • Premie a melhor inscrição

    Premie a melhor inscrição Baixe os arquivos, é fácil!

Poste um Concurso ou Junte-se a nós Hoje!