Hi Bruce,

I've never played with "billions of lines of data" and use only sqlite3 for storage (with a framework written in Java). I keep it simple: one file per table. So if, for example, I download a few months of 1-sec bars with "what to show" set to "trades", they all end up in one file; if "what to show" is something else, they go into a different file. It works for me and covers all my needs.

The most complex use case for that storage was replaying a few years of all 500+ components of the S&P 500. Back then it took about 2-3 weeks to download the 1-min history bars covering more than 10 years (more for some tickers, less for others), and I ended up with ~40-50 GB of DB files or so (won't fit in RAM :).

Then I solved the challenge of reading historical data from multiple DBs/tables in parallel without doing a "join" on those huge datasets. I put the "prototype" here as a self-sufficient standalone project:

I then spent a couple of evenings integrating it into my Java "speculant" framework, so it is now part of the "replay" mechanism. A rough sketch of the merge idea is at the end of this mail.

Here is an example of the generated files for one particular contract:

I put all the scanner results in one file (one table), and have "symbol_store.db" for all the contract details I've ever seen. When I run scanners and get back another 50 conids, I automatically query ContractDetails on all of them. I also recently added "tags.db" so I can simply associate a tag (string) with a contract id - this is useful for grouping things, etc.

If interested - PM me and I'll send the "create table" statements for all the DBs mentioned (the sketches below only guess at the shape).
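Since the no-"join" reading is the interesting bit: the prototype does the reading in parallel threads, but the core idea is a k-way merge by timestamp. Here is a stripped-down, single-threaded sketch of just that idea - not the actual speculant code, and the "bars" schema below is simplified on purpose:

import java.sql.*;
import java.util.*;

// Stripped-down sketch of the no-"join" replay: one ordered scan per DB
// file, merged by timestamp with a heap. Assumes the xerial sqlite-jdbc
// driver and a simplified schema
//   CREATE TABLE bars (ts INTEGER, open REAL, high REAL, low REAL,
//                      close REAL, volume INTEGER)
// (the real tables have more columns - PM me for the statements).
public class MultiDbReplay {

    // One open cursor per DB file, positioned on its current row.
    static final class Cursor {
        final String file;
        final ResultSet rs;
        long ts; // timestamp of the current row

        Cursor(String file, ResultSet rs) { this.file = file; this.rs = rs; }

        boolean advance() throws SQLException {
            if (!rs.next()) return false;
            ts = rs.getLong("ts");
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // args: AAPL_1min_trades.db MSFT_1min_trades.db ...
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingLong(c -> c.ts));

        for (String file : args) {
            Connection conn = DriverManager.getConnection("jdbc:sqlite:" + file);
            // Each DB is scanned independently in ts order - no cross-DB join.
            ResultSet rs = conn.createStatement()
                    .executeQuery("SELECT ts, close FROM bars ORDER BY ts");
            Cursor c = new Cursor(file, rs);
            if (c.advance()) heap.add(c);
        }

        // K-way merge: always replay the globally oldest bar next.
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            System.out.printf("%d %s close=%.2f%n",
                    c.ts, c.file, c.rs.getDouble("close"));
            if (c.advance()) heap.add(c); // refill from the same file
        }
        // (connections left open for brevity - the process exits here)
    }
}

Because each file is already sorted by ts, the heap always knows which DB holds the globally oldest bar, so the replay stays in time order with O(log N) work per bar and nothing ever needs to be joined or loaded fully into RAM.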
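And since I mentioned "tags.db", here is the shape of that idea, again simplified (not the real schema - PM me for the actual statements, and the 12345 conid is made up):

import java.sql.*;

// Simplified shape of "tags.db": map a tag string to a contract id.
public class TagsDbSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:tags.db");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS tags ("
                    + "conid INTEGER NOT NULL, "
                    + "tag   TEXT    NOT NULL, "
                    + "PRIMARY KEY (conid, tag))");

            // Tag a contract (12345 is a made-up conid), then list the group:
            st.execute("INSERT OR IGNORE INTO tags VALUES (12345, 'sp500')");
            try (ResultSet rs = st.executeQuery(
                    "SELECT conid FROM tags WHERE tag = 'sp500'")) {
                while (rs.next()) System.out.println(rs.getLong("conid"));
            }
        }
    }
}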
Cheers,
Dmitry Shevkoplyas