I would like a stand alone program preferably written in VB that would help me prepare data for statistical analysis. I am not looking for statistical analysis software. I want something that will help me prepare the data.
Because I need stock data often, I would like a module that will download stock data from the Yahoo finance site. I know some of these are already available on the market, but the ones I have found do not do everything I like (like automated downloads) and you have to put up with advertising and ugly interfaces. Yahoo provides the data for a click of a button and it is not a difficult thing to program, so I would like to include such a download module as an integrated part of my program.
If I need data from other sources (like interest rate, currency, company cash flow data, etc.) I will get it myself.
The primary function of the program is to prepare data for statistical analysis. For example, to analyze stocks, I might want to have monthly data on prices and merge that data with monthly data on interest rates or currency exchange rates that come from another source. The program would need to merge the interest rate or currency exchange rate data or both by date.
Further, the data may need to be processed before the correct subset of data can be selected. For example, I might want to look only at a subset of stocks whose price has increased (or decreased) X% in the last N days/weeks/months. This would require the creation of calculated fields upon which the data selection is made.
It is possible that I would need to have several calculated fields and even calculated fields based upon calculated fields (e.g., select all stocks whose cash flow is greater than X, whose PE ratio is < average of DOW PE (itself a "nested" calculation), etc.) The calculations should user defined and not hard coded (though the program should "remember" previously devised calculations so I do not have to reinvent the wheel).
4) Stock price data collection module (which downloads stock data from Yahoo!). The module should be capable of downloading a single stock, or multiple stocks, or mutiple stocks from user created lists. The module should prompt for the amount of price history desired. It should check to see if that data has already been downloaded and exists in a database table already and give the options to update only, or to reload all data. It should be configurable so that it can be run manually or automatically on a user defined schedule. It should store the data in an Access or SQL 2000 table.
5) Data Query module that will take data in table or spread sheet form and
a) run summary calculations and create calculated fields. The summary calculations might be something like this: compute average PE ratio of the DOW Industrials stocks. (I anticipate that I would create a field in the table that marks each of the DOW Industrials stocks so all that the program would have to do is make a selection on that field and then compute the average). Likewise I would need calculated fields, such as % change in stock price for day/week/month/year. Both summary calculations and calculated fields should be user definable. The program should provide an option to remember the calculation so that it can be later chosen from a list.
b) select one or more subsets of data using one or multiple queries which may be simple queries or queries based upon the results of the summary calculations and calculated fields. I anticipate going through a process something like this: First run a query selecting only a subset of stocks. This selection might be on a field in the data table or it might be a calculated field. For example, select only those stocks which had a 10% increase in daily price on some day within the last 3 years; or select only the DOW stocks. Then select the stocks from that list whose average weekly continuously compounded rate of return (CCR) from 1 Jan 03 to 31 Mar 03 was > X. Now compute the the CCR from 31 Mar 03 to today for each stock to today. Merge this data for later statistical analysis to determine if there is a correlation between the prior quarter performance and the subsequent daily, weekly, monthly, quarterly performance. It is very important to be able to run multiple rather simple queries against the data to separate the wheat from the chaff because a simple query is easier to create and because it is less prone to errors than a complex SQL query involving multiple joined tables, etc.
c) merge the resulting result sets of the different data tables or spread sheets into a single data table or spreadsheet for further statistical analysis. Export of data should optionally include export of the calculated fields. The program needs to check the merged data to make sure the merge is correct. For example, for time series data the program should check when it merges the data to make sure that the dates are matched (a stock may not have traded on a day or week and therefore one cannot simply assume that data for that day/week/month exists). The program should warn when there is a problem with the merge and provide options such as 1) include the data anyway using the nearest data, 2) exclude the data for that missing date, 3) include the data and include a user defined value for the missing data.
Windows 2000 or later; Access, SQL 2000, Excel