Re: What type of storage or db is best for a lot of trading data that streams to you?
I'm locking this thread now. It's really off-topic, and there are probably as many answers to it as there are members of the group! Please, folks, try to remember what the purpose of this group is: to help people who are having problems with the TWS API. It's not a free-for-all discussion on any topic that's in some way related (or even unrelated) to the API. You need to make your own decisions about what technologies to use. Members who persist in making off-topic posts, or who appear not to be willing to make an effort to find answers themselves before asking the group, are likely to be put back on moderation. Richard King, Group Owner and Moderator
Re: What type of storage or db is best for a lot of trading data that streams to you?
Hi Bruce, I use pandas (Python) to and from CSV, and that works for me. My datasets are not in the GB range, so YMMV depending on whether you are trading at the tick level. -Ajay On Sun, Feb 14, 2021 at 2:19 PM Amaganset <joe.paoloni@...> wrote:
Re: What type of storage or db is best for a lot of trading data that streams to you?
MS Access
Re: What type of storage or db is best for a lot of trading data that streams to you?
Hi Bruce, I've never played with "billions of lines of data" and use sqlite3 for storage (with a framework written in Java). I keep it simple: one file per table. So if, for example, I download a few months of 1-sec bars with "what to show" set to "trades", it all ends up in one file; if "what to show" is something else, it goes into a different file. It works for me and covers all my needs.

The most complex use case for that storage was replaying a few years of all 500+ components of the S&P 500. It took about 2-3 weeks back then to download all the 1-min history bars for more than 10 years (for some tickers more, for some less), and I ended up with ~40-50GB of db files or so (it won't fit in RAM :). Then I solved the challenge of reading historical data from multiple DBs/tables in parallel without doing a "join" on those huge datasets. I put a "prototype" here as a self-sufficient standalone project: And then I spent a couple of evenings integrating it into my Java "speculant" framework, so it is now part of the "replay" mechanism. Here is an example of the generated files for one particular contract:

I put all the scanner results in one file (one table), and have "symbol_store.db" for all the contract details I've ever seen. When I run scanners and get back another 50 conids, I automatically query ContractDetails for all of them. I also recently added "tags.db" so I can associate a tag (string) with a contract id - this is useful for grouping things together. If interested, PM me and I'll send the "create table" statements for all the mentioned DBs. Cheers, Dmitry Shevkoplyas On Sun, Feb 14, 2021 at 2:06 PM Sean McNamara <tank@...> wrote:
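[Editor's note: for readers who want to try the same layout, here is a minimal sketch of the one-file-per-table SQLite approach described above, assuming the sqlite-jdbc driver (org.xerial:sqlite-jdbc) on the classpath. The file, table, and column names are illustrative, not taken from Dmitry's framework.]

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class BarStore {
        public static void main(String[] args) throws Exception {
            // One DB file per (contract, bar size, whatToShow) combination.
            try (Connection db = DriverManager.getConnection(
                    "jdbc:sqlite:AAPL_1sec_TRADES.db")) {
                try (Statement st = db.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS bars (" +
                            "ts INTEGER PRIMARY KEY, " + // epoch seconds
                            "open REAL, high REAL, low REAL, close REAL, " +
                            "volume INTEGER)");
                }
                // Insert one bar; in practice you would batch these.
                try (PreparedStatement ins = db.prepareStatement(
                        "INSERT OR REPLACE INTO bars VALUES (?,?,?,?,?,?)")) {
                    ins.setLong(1, 1613331540L);
                    ins.setDouble(2, 135.32);
                    ins.setDouble(3, 135.40);
                    ins.setDouble(4, 135.30);
                    ins.setDouble(5, 135.37);
                    ins.setLong(6, 1200);
                    ins.executeUpdate();
                }
            }
        }
    }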
Re: What type of storage or db is best for a lot of trading data that streams to you?
You would be better off using 4K blocks, as that's the filesystem block size on disk. The next size up is 2MB: one parent block filled with 64-bit pointers to full 4K blocks. Those are the fastest sizes of data to write on most filesystems. Hunter
On Sunday, February 14, 2021, 12:43:16 PM PST, btw <newguyanon@...> wrote:
Re: What type of storage or db is best for a lot of trading data that streams to you?
I use gzipped text files, basically just as they come from IB. Gzip can compress on the fly, so I append 32k blocks when I have enough data, so it doesn't write all the time. I get about 7x compression on my data, which increases to 8x if I recompress an existing file. Reading is faster for compressed files. I use a normal SSD.
            if (rtvBytes > 16000 || isForce) { // ~16k block size, but prefer more writes over fewer; force on close, still good compression
                try (GZIPOutputStream gzout = new GZIPOutputStream(new FileOutputStream(outf, true))) { // true = append
                    for (; idxF < indVals.size(); idxF++) {
                        RTV rtv = (RTV) indVals.get(idxF).misc;
                        gzout.write(rtv.toSCSV(fmtStr).getBytes());
                    }
                }
            }
Oops, maybe I use 16k blocks. I don't need any specific database queries since I just ask for a full month/year/contract etc. at a time. If you needed one symbol for one week and another symbol for a different week in reverse order or something, then a database would make sense. Obviously that never happens. For the historical 5-sec bars I pick up each week, I gzip them all at the end. If you screw up and get too many, or forget a week and fetch them out of order, the file ends up out of order. So when loading I put them in a set and then sort (all by date). I was going to write code to detect errors and save the fixed file, but I never bothered, since it takes just a few msecs to fix every time. I think these would all be called timeseries, since I save a timestamp.
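[Editor's note: as a companion to the writer snippet above, here is a hedged sketch of the load step described in this post (read the gzipped file, de-duplicate, sort by date). It assumes semicolon-separated records with an epoch timestamp in the first field, which is only a guess based on the toSCSV name in the snippet; the file name is illustrative.]

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;
    import java.util.zip.GZIPInputStream;

    public class GzipBarLoader {
        public static List<String> load(String path) throws Exception {
            // A TreeMap keyed by timestamp drops duplicates and sorts in one pass.
            TreeMap<Long, String> byTime = new TreeMap<>();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(new FileInputStream(path))))) {
                String line;
                while ((line = in.readLine()) != null) {
                    long ts = Long.parseLong(line.substring(0, line.indexOf(';')));
                    byTime.put(ts, line);
                }
            }
            return new ArrayList<>(byTime.values());
        }
    }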
Re: What type of storage or db is best for a lot of trading data that streams to you?
If you are doing large-scale backtesting with the full sets of data, then the suggestion from ds-avatar is a good one, as the Parquet file format is quite nice.
If you are looking for a more database-centric approach, I've had great luck using PostgreSQL () with the TimescaleDB () module enabled. I like the fact that you get excellent compression of data, that you can generate on-demand subsets of history (vs. file-based persistence like Parquet), and that the interface to the data is normal SQL queries.
I've played around with a few other options such as InfluxDB, but found that for my personal use-case TimescaleDB was preferable.
It's not clear how you intend to interact with the data, and those specifics will most likely point you in the right direction.
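[Editor's note: for illustration, a minimal sketch of this setup from Java, assuming the PostgreSQL JDBC driver and a database with the timescaledb extension already installed. The connection string, table, and columns are placeholders, not from the post.]

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TickDb {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/market", "user", "pass");
                 Statement st = db.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS ticks (" +
                        "time TIMESTAMPTZ NOT NULL, symbol TEXT, " +
                        "price DOUBLE PRECISION, size BIGINT)");
                // Turn the plain table into a time-partitioned hypertable.
                st.execute("SELECT create_hypertable('ticks', 'time', " +
                        "if_not_exists => TRUE)");
                // Normal SQL afterwards, e.g. an on-demand subset of history:
                // SELECT * FROM ticks WHERE symbol = 'AAPL'
                //   AND time > now() - interval '1 day';
            }
        }
    }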
Re: What type of storage or db is best for a lot of trading data that streams to you?
Thanks for the feedback. I will read up on Parquet. I am surprised by the sheer number of companies tackling the storage issue. Another tool with quite a bit of bragging vs. Redis is Tarantool, which I have read about; apparently they solve the hot/cold issue with a cache... It seems picking a storage technology is a whole project in itself nowadays. Reading just a bit about kdb+, it seems pretty interesting how simple, high-level, and meaningful the syntax is, and column-oriented storage instead of row-oriented storage makes sense. I wish someone had benchmarked all the top storage systems and written about them all in one place, specifically for trading use :) Maybe Dmitry and Richard can give some feedback too. Thanks, On Sun, Feb 14, 2021, 5:02 AM ds-avatar <dimsal.public@...> wrote:
Re: What type of storage or db is best for a lot of trading data that streams to you?
Trying out Parquet. It's a binary, columnar table file format, good for time series and apparently designed for big data. I've only started storing harvested tick data with it recently, but I hear lots of good things about its efficiency, and Matlab seems to make it easy to use for out-of-memory processing. Sun, 14 Feb 2021, 5:55 Bruce B <bruceb444@...>: Hello,
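[Editor's note: a rough sketch of writing tick data to Parquet from Java using the parquet-avro bindings (org.apache.parquet:parquet-avro plus a Hadoop client on the classpath). The schema, field names, and file name are illustrative, not from the post.]

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class ParquetTicks {
        public static void main(String[] args) throws Exception {
            Schema schema = SchemaBuilder.record("Tick").fields()
                    .requiredLong("ts")        // epoch millis
                    .requiredDouble("price")
                    .requiredLong("size")
                    .endRecord();
            try (ParquetWriter<GenericRecord> writer =
                    AvroParquetWriter.<GenericRecord>builder(new Path("ticks.parquet"))
                            .withSchema(schema)
                            .build()) {
                GenericRecord tick = new GenericData.Record(schema);
                tick.put("ts", 1613331540123L);
                tick.put("price", 135.37);
                tick.put("size", 100L);
                writer.write(tick); // columnar layout is handled by the library
            }
        }
    }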
Re: Individual examples for each function of API?
For C#, there is a pretty good sample app that comes with the API, source included. It is not perfect, due to its somewhat outdated and contrived backbone (it uses WinForms and lacks full support for task-based async code, while relying on an internal custom messaging subsystem that is a bit overwhelming), but it is very extensive and well structured, and can be used for tinkering and prototyping minor custom incremental functionality. Sun, 14 Feb 2021, 5:52 Bruce B <bruceb444@...>: Hello,
Re: What type of storage or db is best for a lot of trading data that streams to you?
Dean Williams
Bruce, unfortunately I have no experience with Redis. This gives a short overview of kdb: There is also a developers group for the non-commercial version: Dean
Re: Semantic difference between tickByTickBidAsk() and tickByTickAllLast()?
That is not correct. tickPrice() and tickByTickAllLast() are both Level 1. BidAsk is Level 2.
I point you again to
AllLast includes additional trade types, such as combos, derivatives, and average-price trades, which are not included in Last. On Sat, Feb 13, 2021 at 08:48 PM, Bruce B wrote:
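[Editor's note: to make the distinction concrete, here is a hedged sketch of the three subscriptions side by side, assuming an already-connected EClientSocket; the contract and request ids are arbitrary.]

    import com.ib.client.Contract;
    import com.ib.client.EClientSocket;

    public class TickByTickDemo {
        // Assumes `client` is already connected; see the TWS API docs for setup.
        static void subscribe(EClientSocket client) {
            Contract contract = new Contract();
            contract.symbol("AAPL");
            contract.secType("STK");
            contract.exchange("SMART");
            contract.currency("USD");

            // Top-of-book quote changes -> EWrapper.tickByTickBidAsk()
            client.reqTickByTickData(1001, contract, "BidAsk", 0, false);

            // Every trade print, including combos, derivatives, and
            // average-price trades -> EWrapper.tickByTickAllLast()
            client.reqTickByTickData(1002, contract, "AllLast", 0, false);

            // Same callback, but regular trades only
            client.reqTickByTickData(1003, contract, "Last", 0, false);
        }
    }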
Re: What type of storage or db is best for a lot of trading data that streams to you?
I just use CSV files for pricing data. The size of the pricing data just isn't that large, and it makes the complexity of using a DB for pricing data unnecessary. I use a DB for other things in my system, and I still don't use it for pricing data. If you must, you probably want a time-series database, which is what kdb+ is. Q is analogous to SQL; it's not suitable for general-purpose programming, but it's fine as a query language. I imagine you would have to write something in your language to call the Q queries. Redis is probably completely unsuitable for your needs. Hunter
On Saturday, February 13, 2021, 7:31:10 PM PST, Bruce B <bruceb444@...> wrote:
Re: What type of storage or db is best for a lot of trading data that streams to you?
Dean,
Thanks for the feedback. Can you expand on this please, especially if you have Redis experience and can compare? What about kdb+ is useful for this purpose: the Q language and future analytics capability, the speed of reads/writes, or something else? Thanks,
||||
What type of storage or db is best for a lot of trading data that streams to you?
Hello,
Those of you who record streaming and historical data: what do you use for storage? And what type of storage do you use for persistent and non-persistent use (on-the-fly analysis)? Do you use any timeseries database? I am talking billions of lines of data, of course. I would like to hear about your experience. Thanks,
Individual examples for each function of API?
Hello,
Are there any individual/simple examples (single files) for each function listed in the API, posted anywhere (in any of the supported languages)? Or a project that breaks down the whole API, or big parts of it, into examples? The samples that come with the SDK are full of bugs and cluttered, with a lot of things in one place; it's not efficient to pull simple, runnable examples out of them. Thanks,
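[Editor's note: until such a collection exists, a single-file example can be fairly small. Below is a hedged sketch (not an official sample) that connects, requests the server time, prints it, and exits. It assumes TWS or IB Gateway listening on 127.0.0.1:7497, the TWS API jar on the classpath, and that DefaultEWrapper (empty EWrapper stubs) is available in your API version.]

    import com.ib.client.DefaultEWrapper;
    import com.ib.client.EClientSocket;
    import com.ib.client.EJavaSignal;
    import com.ib.client.EReader;
    import com.ib.client.EReaderSignal;

    public class MinimalExample extends DefaultEWrapper {
        public static void main(String[] args) throws Exception {
            MinimalExample wrapper = new MinimalExample();
            EReaderSignal signal = new EJavaSignal();
            EClientSocket client = new EClientSocket(wrapper, signal);
            client.eConnect("127.0.0.1", 7497, 0);

            // Standard reader loop: one thread decodes, another dispatches.
            EReader reader = new EReader(client, signal);
            reader.start();
            new Thread(() -> {
                while (client.isConnected()) {
                    signal.waitForSignal();
                    try { reader.processMsgs(); } catch (Exception e) { e.printStackTrace(); }
                }
            }).start();

            client.reqCurrentTime();
            Thread.sleep(2000); // crude wait for the callback, demo only
            client.eDisconnect();
        }

        @Override
        public void currentTime(long time) {
            System.out.println("Server time: " + time);
        }
    }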
Re: Semantic difference between tickByTickBidAsk() and tickByTickAllLast()?
Thanks for the explanation. In IBKR jargon, tickPrice is Level 1 and tickByTickAllLast is Level 2 data. I think the only main question I have left now is: what is the difference between tickType = AllLast and tickType = Last from the tickByTickAllLast() callback? Thanks, On Sat, Feb 13, 2021, 8:55 PM JR <TwsApiOnGroupsIo@...> wrote: The main difference is that reqMktData() returns aggregated data snapshots, while reqTickByTickData() does not aggregate and reports all relevant events individually.
Re: Semantic difference between tickByTickBidAsk() and tickByTickAllLast()?
Thanks JR.
"This is very different and distinct from the Tick-By-Tick data interfaces." This is different and distinct in the way request is made versus how tickByTickLastAll request is made or also result is different too? If result is different from BidAskLast/LasAll then how is tickPrice different? Below is how they describe it and it shows "contract traded" so this is really a trade too. How many trades types are there??? there must be only one. Ref:?
Thanks, On Sat, Feb 13, 2021 at 02:04 PM, JR wrote: