In Progress

data processing command line software

This software will operate from the command line (bash preferred) and process a set of files in a directory. Each input file will be used to generate a modified output file. The input file consists of two columns of data. The second column needs to be parsed into 25 columns and output as a tab-delimited text file. In addition to parsing, the numbers in the column need to be de-scaled and de-normalized according to equations that will be provided. Statistics also need to be calculated by comparing each of the 25 output columns to a column in a reference file. All the columns in the reference file need to be included as the first columns in the output file. A statistic from the statistical processing will be concatenated onto the name of the input file to create the output file name.
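The output layout described above (reference columns first, then the parsed columns, tab-delimited) could be sketched roughly as follows. This is only an illustration: the function name `write_output`, the presence of a header row, and the column labels used in the usage note are assumptions, not part of the specification.

```python
import csv
import io


def write_output(handle, ref_header, ref_rows, column_headers, columns):
    """Write reference columns first, then the parsed columns, tab-delimited.

    ref_rows is a list of rows from the reference file; columns is a list of
    parsed output columns, each with one value per reference row.
    The header row is an assumption made for this sketch.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(list(ref_header) + list(column_headers))
    for i, ref_row in enumerate(ref_rows):
        # Row i of the output = row i of the reference + row i of every column.
        writer.writerow(list(ref_row) + [col[i] for col in columns])
```

For example, passing an `io.StringIO()` buffer as `handle` makes it easy to inspect the result before writing it to disk.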

The number of rows in each of the 25 columns can be determined by reading the reference file. The numerical values for the de-scale and de-normalize operations can be included in this file as well, as opposed to passing them in with the command line arguments. Sample files will be provided along with the .xls macro that is currently used for processing. Software should be in C or C++, compiled under Cygwin GNU g++ using makefiles. A Java, Ruby, scripting-language (Perl, Python, etc.) or database solution using H2 or another freeware database would also be acceptable. Feel free to make other suggestions.

## Deliverables

This software will be used to process the output of an artificial neural net. The neural net outputs a predicted value for a set of input patterns (rows) every 100 training epochs. This output is appended to a file, resulting in a single column of data. There are currently no separators between the sets of output (the column is continuous), although this could be changed if absolutely necessary.

The software will convert this single column of data into multiple columns (one for each 100-epoch output) in spreadsheet format. In the included example, column one of the output will be the first 571 rows, the next column will be rows 572-1142, etc. The software must read the block size (571 in this example) from the reference file so that any dimension can be processed. Currently there are 25 columns in the final output, but this number should be flexible. Along with parsing the single column to multiple columns, some numeric manipulation is necessary. The numbers have been normalized (z-score) and scaled (from 0.2-0.8). This must be reversed. I will provide equations if you do not know how to do this. The equations may also be taken from the included spreadsheet.
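To make the parsing and reversal steps concrete, here is a minimal sketch. The actual equations are to be taken from the spreadsheet; this sketch assumes a plain linear min-max scaling onto 0.2-0.8 applied after a standard z-score, and all function and parameter names (`split_into_columns`, `descale`, `denormalize`, `restore`, `z_min`, `z_max`, `mean`, `std`) are hypothetical.

```python
from typing import List


def split_into_columns(values: List[float], block_size: int) -> List[List[float]]:
    """Cut one continuous column into consecutive blocks of block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(values) % block_size != 0:
        raise ValueError("column length is not a multiple of the block size")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def descale(s: float, z_min: float, z_max: float,
            lo: float = 0.2, hi: float = 0.8) -> float:
    """Undo an ASSUMED linear scaling s = lo + (hi - lo) * (z - z_min) / (z_max - z_min)."""
    return z_min + (s - lo) / (hi - lo) * (z_max - z_min)


def denormalize(z: float, mean: float, std: float) -> float:
    """Undo a z-score normalization z = (x - mean) / std."""
    return z * std + mean


def restore(s: float, z_min: float, z_max: float, mean: float, std: float) -> float:
    """Reverse the two steps in the opposite order they were applied:
    de-scale first, then de-normalize."""
    return denormalize(descale(s, z_min, z_max), mean, std)
```

With the example dimensions, `split_into_columns(values, 571)` on a 14,275-row column would yield 25 columns. The number of columns follows from the data length, so it stays flexible.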

After parsing, de-scaling and de-normalizing, statistics must be calculated by comparing the values in each column to values in the reference file. A squared correlation coefficient (r2) and a mean absolute error will be calculated. Ask for equations if you need them. There are three subsets of data: train, cross validate and external validate. In the included example reference file, rows 2-422 are the train rows (T in the group column), rows 423-467 are the cross validate rows (S in the group column) and rows 468-572 are the external validate rows (V in the group column). Three separate sets of statistics need to be calculated. If you open the sample output file in Excel, you will see the statistics in the cells above each of the 25 columns of output, starting with column F for epoch 100. These statistics are calculated in the included spreadsheet [url removed, login to view]. The minimum EV-MAE value, in this case cell AD7, must be determined and concatenated with an underscore to the beginning of the output filename.
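The statistics and the file-naming rule could look roughly like the sketch below. It assumes that r2 means the squared Pearson correlation coefficient and that the EV-MAE is written with four decimals in the file name; both should be checked against the spreadsheet. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Average of |predicted - reference| over all rows."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def r_squared(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient (an assumed definition of r2)."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    sxx = sum((p - mean_p) ** 2 for p in pred)
    syy = sum((r - mean_r) ** 2 for r in ref)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return sxy * sxy / (sxx * syy)


def stats_by_group(pred: Sequence[float], ref: Sequence[float],
                   groups: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Compute r2 and MAE separately for each group label (T, S, V)."""
    out: Dict[str, Dict[str, float]] = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        p = [pred[i] for i in idx]
        r = [ref[i] for i in idx]
        out[g] = {"r2": r_squared(p, r), "mae": mean_absolute_error(p, r)}
    return out


def output_name(input_name: str, columns: List[Sequence[float]],
                ref: Sequence[float], groups: Sequence[str],
                ev_group: str = "V") -> str:
    """Prefix the input name with the minimum external-validate MAE and an underscore."""
    ev_maes = [stats_by_group(col, ref, groups)[ev_group]["mae"] for col in columns]
    return f"{min(ev_maes):.4f}_{input_name}"  # four decimals is an assumption
```

Selecting rows by the group column rather than by fixed row ranges keeps the code independent of the 421/45/105 split in the example reference file.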

I think that C or C++ would provide good performance for this, as there could be a large number of files to process. Input to the command line should be the directory where files are to be processed (./ as default) and the name of the reference file. Other languages and solutions are permissible, such as Java, Ruby or a database application. All components should be freeware unless cleared in advance. Software needs to be compiled under Windows using Cygwin (Eclipse if it is a Java app).
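As an illustration of the command-line interface, the following sketch uses Python's standard `argparse` module. The choice of a positional `reference` argument and a `-d/--directory` option is an assumption for this example. The posting only states that the directory (defaulting to ./) and the reference file name are the inputs.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two inputs: the reference file and the directory to process."""
    parser = argparse.ArgumentParser(
        description="Process neural net output files against a reference file.")
    parser.add_argument("reference", help="name of the reference file")
    parser.add_argument("-d", "--directory", default="./",
                        help="directory containing the files to process (default: ./)")
    return parser.parse_args(argv)


def list_input_files(directory, reference):
    """Return the regular files in directory, skipping the reference file itself."""
    ref_path = Path(reference).resolve()
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.resolve() != ref_path)
```

A call such as `parse_args(["reference.txt", "-d", "runs"])` would then select the `runs` directory, while omitting `-d` falls back to the current directory.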

I am open to other suggestions, just let me know.

Thanks for considering my project,

LMH_medchemist

Sample Files:

sample file to be processed

[url removed, login to view]

reference file to go with input file

[url removed, login to view]

sample output file (output of software)

[url removed, login to view]

.xls file and macro currently used for processing

[url removed, login to view]

plots of final output (for context)

[url removed, login to view]

Skills: C Programming, Engineering, Java, Linux, Microsoft, MySQL, PHP, Project Management, Python, Software Architecture, Software Testing, Windows Desktop


About the Employer:
(39 reviews) Quincy, United States

Project ID: #3049342

Awarded to:

renardpaul

See private message.

$7 USD in 14 days
(116 reviews)
6.7

3 freelancers are bidding an average of $59 for this job

setosl

See private message.

$85 USD in 14 days
(55 reviews)
5.0
spx2vw

See private message.

$85 USD in 14 days
(41 reviews)
4.8