Hi there! I have 130+ spreadsheets, each with multiple sheets, plus some additional data, all in Excel. I would like the following:
1) The data should be moved into a low-latency database such as Cassandra. Latency is very important.
2) The database should be updated automatically using APIs. I will provide the links.
3) I would also like web scraping for at least 2 web pages, possibly more, which are updated monthly, plus scraping of 50 PDFs, which are also updated monthly. I can provide the links.
4) All the data is either with me or publicly available online from multiple websites.
5) The database architecture should be scalable, so that I can add more data sources in the future.
6) I plan to connect the database to Python and run econometric, data science and machine learning models on it, so I need that functionality.
7) The database has to be extremely low latency as I will also be connecting my stock broker's API to it. It should be able to capture and store live prices of some currencies, stocks and commodities that are in my watchlist. I am open to alternative solutions for this.
8) The data covers different time horizons, so ONE KEY DELIVERABLE is the ability to convert the data between horizons, such as monthly, quarterly and annual: linear interpolation when going down to a shorter horizon, and summing when rolling up to a longer one. I will build the front end myself in Tableau.
9) The database and models will run on a purchased machine/server and not on the cloud or online.
10) I will also be connecting the Twitter API to build NLP (natural language processing) models using text data, so the database has to be able to process and store that data for the models I build. I want to be able to pick and choose what is stored and for how long, including updates.
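To illustrate what I mean in point 8, here is a rough Python sketch of the two directions. It is only an example under my own assumptions, not a specification: the function names and the (year, month) key format are made up for illustration, and the interpolation treats values as levels at a point in time rather than flows to be split across months.

```python
from collections import defaultdict


def roll_up(monthly, horizon):
    """Sum monthly values {(year, month): value} into quarterly or annual totals."""
    totals = defaultdict(float)
    for (year, month), value in monthly.items():
        if horizon == "quarterly":
            key = (year, (month - 1) // 3 + 1)  # (year, quarter number 1..4)
        elif horizon == "annual":
            key = (year,)
        else:
            raise ValueError(f"unsupported horizon: {horizon}")
        totals[key] += value
    return dict(totals)


def interpolate_monthly(points):
    """Fill every month between sparse observations by linear interpolation."""
    def index(ym):
        # Count months from year 0 so that gaps across year ends work.
        return ym[0] * 12 + (ym[1] - 1)

    known = sorted(points.items(), key=lambda kv: index(kv[0]))
    if not known:
        return {}
    filled = {}
    for (start, v0), (end, v1) in zip(known, known[1:]):
        i0, i1 = index(start), index(end)
        for i in range(i0, i1):
            frac = (i - i0) / (i1 - i0)
            filled[(i // 12, i % 12 + 1)] = v0 + frac * (v1 - v0)
    last_key, last_value = known[-1]
    filled[last_key] = last_value  # keep the final observation as-is
    return filled


# Example: monthly figures summed to quarters, and two quarterly
# observations (January and April) filled in for the months between.
quarterly = roll_up({(2023, 1): 10, (2023, 2): 20, (2023, 4): 5}, "quarterly")
monthly = interpolate_monthly({(2023, 1): 100.0, (2023, 4): 130.0})
```

In the real system I would expect this to work on data pulled from the database rather than plain dictionaries, and I am open to a different approach if it gives the same results.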
Please let me know of any credible references that I can check. I really want someone who knows what they are doing, and I want to get this right.
Your support might become ongoing after the first few months of work on the initial deliverable, and there is a chance of regular work as I scale up in the future. This project can generate long-term monthly income. I am looking for someone I can work with long term, as a team.