In Progress

Weblog PPC Parser

Log PPC program

Required: a stand-alone program that parses a raw web server log file and performs the steps below.

The files have the following form (with no blank lines between entries):

[url removed, login to view] - - [30/Sep/2013:12:16:15 +0000] "GET / HTTP/1.1" 302 225 "-" "Pingdom.com_bot_version_1.4_([url removed, login to view])"

[url removed, login to view] - - [30/Sep/2013:12:16:33 +0000] "GET /images/headers/[url removed, login to view] HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +[url removed, login to view])"

[url removed, login to view] - - [30/Sep/2013:12:26:59 +0000] "GET /go-imi/[url removed, login to view]{adtype} HTTP/1.1" 200 7744 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +[url removed, login to view])"
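The sample entries look like the common "combined" access log layout, so the fields the later steps rely on (IP address, timestamp, request) can be pulled out with one regular expression. The following is a minimal Python sketch of that idea, not the requested deliverable; the name `parse_line`, the IP address and the user agent in the sample are made up for illustration, and requests containing escaped quote characters are not handled.

```python
import re

# Assumed layout: ip ident user [timestamp] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"'
)


def parse_line(line):
    """Return (ip, timestamp, request), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return m.group("ip"), m.group("time"), m.group("request")


# Hypothetical entry; 203.0.113.7 is a documentation-only address.
sample = ('203.0.113.7 - - [30/Sep/2013:12:16:15 +0000] '
          '"GET / HTTP/1.1" 302 225 "-" "ExampleBot/1.0"')
print(parse_line(sample))
```

Only the first field (the IP address) and the request are needed for the steps that follow; the timestamp is extracted here in case ordering ever has to be checked.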

1. Search each line for 'GET / HTTP' and replace it with 'GET /[url removed, login to view] HTTP', as in the first line above.

2. Search each line for either of the following strings, matched at the first "GET":

a. 'GET /go-imi/'

b. 'GET /MSN/'

3. For each line containing either of the above strings, insert a tag between ']' and 'GET': "ADWORDS" (with quotes) when 'go-imi' is found, or "BING" (with quotes) when 'MSN' is found.

4. For each line that follows (not precedes) a line containing 'go-imi' or 'MSN' and that has the same IP address (first field), also write "ADWORDS" or "BING" as appropriate. These lines may be adjacent or may be widely separated in the file if the visitor came back to the website days later. Note that the data will not span more than one month, though the first entries in the file may be dated the last day of the previous month.

5. For any line not meeting the above conditions, add "-" (with quote marks) in the same location as an empty-field placeholder.

6. Write only lines containing fetches of 'htm' or 'html' files (following the initial "GET") to the output file. That is, fetches of support files such as images (jpg, gif, etc.), css, js, and so forth are not written out. Only page fetches where the file type is .htm or .html are kept, such as the second line in the following example. Also write out the lines where '[url removed, login to view]' was inserted as in 1) above.

For example:

Original log file:

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "GET / HTTP/1.1"

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "GET /go-imi/[url removed, login to view]

[url removed, login to view] - - [30/Sep/2013:13:03:20 +0000] "GET /css/[url removed, login to view] HTTP/1.1"

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "GET /[url removed, login to view]

would be written out as:

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "-" "GET /[url removed, login to view] HTTP/1.1"

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "ADWORDS" "GET /go-imi/[url removed, login to view]

[url removed, login to view] - - [30/Sep/2013:13:03:18 +0000] "-" "GET /[url removed, login to view]
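Steps 1 to 6 can be sketched as a single pass that remembers the last campaign tag seen for each IP address. This is a minimal Python illustration under stated assumptions, not the requested deliverable: `index.html` is a hypothetical stand-in for the redacted default page, the function name `transform`, the IP addresses and the file names in the sample are invented, and a later campaign hit from the same IP is assumed to replace the earlier tag (the spec does not say).

```python
# Hypothetical stand-in for the redacted default page name in step 1.
DEFAULT_PAGE = "index.html"

# Request prefixes from step 2 mapped to the quoted tags from step 3.
CAMPAIGNS = (("GET /go-imi/", '"ADWORDS"'), ("GET /MSN/", '"BING"'))


def transform(lines, default_page=DEFAULT_PAGE):
    """Apply steps 1-6 to an iterable of log lines; return the kept lines."""
    tag_by_ip = {}  # step 4: last campaign tag seen for each IP address
    kept = []
    for line in lines:
        line = line.rstrip("\r\n")
        head, sep, rest = line.partition('] "')
        if not sep or not rest.startswith("GET "):
            continue  # not a GET request line, so never a page fetch
        ip = head.split(" ", 1)[0]

        # Step 1: a bare "GET /" becomes a fetch of the default page.
        if rest.startswith("GET / HTTP"):
            rest = "GET /" + default_page + rest[len("GET /"):]

        # Steps 2-3: a campaign landing request tags this IP from now on.
        for prefix, tag in CAMPAIGNS:
            if rest.startswith(prefix):
                tag_by_ip[ip] = tag
                break

        # Step 6: keep only .htm/.html fetches (query string ignored).
        target = rest[len("GET "):].split(" ", 1)[0].rstrip('"')
        path = target.split("?", 1)[0].lower()
        if not path.endswith((".htm", ".html")):
            continue

        # Steps 3-5: insert the tag, or "-" when the IP has none yet.
        tag = tag_by_ip.get(ip, '"-"')
        kept.append(head + "] " + tag + ' "' + rest)
    return kept


# Invented sample entries using documentation-only IP addresses.
log = [
    '203.0.113.7 - - [30/Sep/2013:13:03:18 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [30/Sep/2013:13:03:19 +0000] "GET /go-imi/offer.html?ad=1 HTTP/1.1" 200 7744',
    '203.0.113.7 - - [30/Sep/2013:13:03:20 +0000] "GET /css/site.css HTTP/1.1" 200 90',
    '198.51.100.4 - - [30/Sep/2013:13:03:21 +0000] "GET /about.html HTTP/1.1" 200 300',
    '203.0.113.7 - - [02/Oct/2013:09:00:00 +0000] "GET /contact.htm HTTP/1.1" 200 410',
]
for row in transform(log):
    print(row)
```

In this sample the first line keeps "-" because it precedes the campaign hit, the css fetch is dropped, the unrelated IP gets "-", and the return visit two days later carries "ADWORDS". Memory use grows only with the number of tagged IP addresses, not with the number of lines.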

The files can contain more than 1 million lines of data, and each line can be quite long, up to the limits allowed by the internet protocol standards.
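With files of this size, one reasonable approach is to stream the input one line at a time instead of loading it whole, which also gives a natural hook for the activity indicator. The Python sketch below shows the pattern only; `stream_lines`, `handle_line` and `report` are hypothetical names, and the per-line handler here is a trivial filter rather than the full set of rules.

```python
import io


def stream_lines(src, dst, handle_line, report=None, every=100_000):
    """Copy src to dst one line at a time through handle_line.

    handle_line returns the text to write, or None to drop the line.
    report, if given, is called with the running line count every
    `every` lines. Returns (lines_read, lines_written).
    """
    read = written = 0
    for line in src:  # file objects yield one line at a time
        read += 1
        result = handle_line(line.rstrip("\r\n"))
        if result is not None:
            dst.write(result + "\n")
            written += 1
        if report is not None and read % every == 0:
            report(read)
    return read, written


# In-memory demonstration; real use would pass two open file objects.
src = io.StringIO("a.html\nb.css\nc.htm\n")
dst = io.StringIO()
counts = stream_lines(
    src, dst, lambda s: s if s.endswith((".htm", ".html")) else None
)
print(counts, repr(dst.getvalue()))
```

For real log files, opening the input with an encoding such as `latin-1` avoids decode errors on stray bytes in user-agent strings, since that encoding maps every byte to a character.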

7. The program shall run under Windows 7 and 8.

8. A GUI shall be provided with a standard Windows Explorer-style dialog for selecting the input and output file names. The interface shall remember the last entries that were used. A status bar or some other indication of program activity shall be provided.

9. The program should be launchable from a desktop shortcut that opens the GUI for file name input.

10. The program should be stand-alone and should not require loading any other program to run. That is, it should be a .exe or similar. It may require support files such as .NET to be installed on the computer, but a complete package is preferred.

11. The output should be a text file in the same format as the input file as shown above.

12. Documented source files shall be provided for future support.
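The requirement that the interface remember the last entries can be met by saving the two paths to a small settings file on exit and reading them back on start-up. A minimal Python sketch of that idea follows; the file name `ppc_parser_settings.json`, the JSON layout and both function names are assumptions made for illustration, and a Windows program might equally use the registry or a per-user application data folder.

```python
import json
import os
import tempfile

BLANK = {"input": "", "output": ""}


def load_last_paths(settings_file):
    """Return the saved input/output paths, or blanks if none are usable."""
    try:
        with open(settings_file, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # missing, unreadable or corrupt file
        return dict(BLANK)
    if not isinstance(data, dict):
        return dict(BLANK)
    return {key: str(data.get(key, "")) for key in BLANK}


def save_last_paths(settings_file, input_path, output_path):
    """Persist the paths the user picked so the next run can prefill them."""
    with open(settings_file, "w", encoding="utf-8") as fh:
        json.dump({"input": input_path, "output": output_path}, fh)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    settings = os.path.join(tmp, "ppc_parser_settings.json")
    print(load_last_paths(settings))  # nothing saved yet, so blanks
    save_last_paths(settings, r"C:\logs\access.log", r"C:\logs\out.log")
    print(load_last_paths(settings))
```

Falling back to blank paths when the file is missing or corrupt keeps a first run, or a damaged settings file, from blocking the file dialogs.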

Skills: Anything Goes, C Programming, C++ Programming


About the Employer:
(24 reviews) San Diego, United States

Project ID: #5096898