
reqHistoricalTicks() & batching vs. reqTickByTick() vs. reqHistoricalData(), Backtesting vs. Realtime execution


 

Hi,

I have discussed this with the API support team, but I am still fuzzy on it and was not given a clear answer. I am backtesting a strategy on data collected with reqHistoricalTicks() and whatToShow='TRADES', going back to early 2023 for natural gas. In real time, I imagined I would use reqTickByTick() to get the most recent ticks and act on the strategy accordingly. However, I am aware that reqHistoricalTicks() appears to batch all ticks to each second; at least, that's what the timestamps suggest when received from IB. Because reqTickByTick() is more granular and, I believe, does not batch to the second, I am concerned that any strategy I form will not be reliable in execution: the live data will be too granular, and you cannot stream reqHistoricalTicks() in real time the way you can with reqHistoricalData(). The main problem is that my strategy revolves around large "tick" trades, so more granularity means matched orders get broken up, and in turn large-order sightings become less frequent. In practice, should I just stream some sort of 1-second bar using reqHistoricalData(), since that is really what reqHistoricalTicks() is giving me? Or am I wrong?

Thank you.


 

I do see multiple tick entries for the same timestamp with different price and size values in my historical tick data, though; that's the problem. So it's not exactly equivalent to 1-second bars. I need my real-time trade execution to match, as closely as possible, the conditions under which my backtest data was collected. Any thoughts?


 

While your post is quite short, you include a lot of interconnected details that probably cause the "fuzziness" and unspecific answers from API support. So let's try to "divide and conquer".

What kind of data does your strategy really need?

I guess your first step is to define exactly what data your strategy really needs. If it needs size information for every individual trade that happens, only the TickByTick feeds can give you that information. But if cumulative volume over a certain time period is sufficient, candles/bars may work, too. Keep in mind, though, that the only real-time bar feed currently available is the 5-second reqRealTimeBars(); there is no 1-second real-time feed. But you could easily make one from a TickByTick feed yourself, as sketched below.
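
For illustration, a rough Python sketch of deriving 1-second bars from a tick-by-tick trade feed; the (time, price, size) tuples and class names are my own placeholders for whatever your tickByTickAllLast() callback delivers:

from dataclasses import dataclass

@dataclass
class Bar:
    time: int       # start of the second (epoch seconds)
    open: float
    high: float
    low: float
    close: float
    volume: int

class OneSecondBarBuilder:
    """Aggregates individual trades into 1-second OHLCV bars."""
    def __init__(self, on_bar):
        self.on_bar = on_bar    # callback invoked when a second completes
        self.bar = None

    def on_tick(self, time_s: int, price: float, size: int):
        if self.bar is not None and time_s != self.bar.time:
            self.on_bar(self.bar)           # previous second is complete
            self.bar = None
        if self.bar is None:
            self.bar = Bar(time_s, price, price, price, price, size)
        else:
            self.bar.high = max(self.bar.high, price)
            self.bar.low = min(self.bar.low, price)
            self.bar.close = price
            self.bar.volume += size

# Example: the bar for second 100 is emitted when the first tick of second 101 arrives.
builder = OneSecondBarBuilder(on_bar=print)
for t, p, s in [(100, 3.01, 2), (100, 3.02, 5), (101, 3.00, 1)]:
    builder.on_tick(t, p, s)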

Since you specifically mention 'large "tick" trades', I am going to assume that none of the bar feeds will work for you and that you need to use TickByTick feeds.

Keep in mind, though, that you see only the TRADES (level 1) not the ORDERS (level 3). A single order for, say, 10,000 units, will probably be filled from many smaller trades and likely at several different prices. You will see those individual trades but never one with "size == 10,000". Having said that, you can detect from reqTickByTickData(TRADES) feeds when something out of the ordinary goes on.


Back-test with the same data that your live trader gets

Assume that the various data feeds provide different values, so you must make sure that you back-test with the same (equivalent) data that your live trader would experience. I suggest you make a small client that records next week's reqTickByTickData(TRADES), download the same period via reqHistoricalTicks() next weekend, and compare the results.

For many instruments, reqHistoricalTicks() will provide additional data that is not part of the real-time reqTickByTickData(TRADES) feed for the same period. Those are generally "non-reportable" trades and can be filtered out by looking at the specialConditions field of the HistoricalTickLast objects. You should be able to convert downloaded reqHistoricalTicks() data into feeds that are almost identical to what reqTickByTickData(TRADES) would have provided.
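
For illustration, a minimal Python sketch of that conversion, assuming HistoricalTickLast-like objects; which specialConditions codes are "non-reportable" varies by exchange, so the NON_REPORTABLE set below is a hypothetical placeholder you would build by comparing the two feeds for the same period:

# Hypothetical set of condition codes seen only in the historical feed.
NON_REPORTABLE = {"T", "I"}

def to_tick_by_tick_equivalent(historical_ticks):
    """historical_ticks: HistoricalTickLast-like objects with .time, .price,
    .size and .specialConditions. Returns the subset that should match a
    recorded reqTickByTickData(TRADES) stream."""
    kept = []
    for t in historical_ticks:
        # specialConditions is a string of codes; split on whitespace here,
        # adjust if your data separates codes differently.
        codes = set((t.specialConditions or "").split())
        if codes & NON_REPORTABLE:
            continue            # drop trades the live feed would not carry
        kept.append(t)
    return kept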


Make your trading logic truly TickByTick

Set your trading logic up such that it is not aware of where the data actually comes from.

During a live session you will receive one tick at a time (though possibly hundreds or even thousands of ticks during busy seconds), so your trading logic would provide an event handler interface such as "onNextTick( TickByTickLast last )" that consumes each individual tick as it arrives. You would wire that handler up to a reqTickByTickData(TRADES) feed in a live session or, during back-testing, feed it one tick at a time from a previously recorded reqTickByTickData(TRADES) file or from a (filtered) reqHistoricalTicks() download.
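
A minimal Python sketch of that wiring; the class and file layout are illustrative, not a prescribed design:

import csv

class Strategy:
    """Trading core: sees one tick at a time, never the data source."""
    def on_next_tick(self, time_s: int, price: float, size: int):
        ...  # trading logic goes here

def replay_file(path: str, strategy: Strategy):
    """Back-test driver: feeds recorded ticks one at a time, in file order."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            strategy.on_next_tick(int(row["time"]), float(row["price"]),
                                  int(row["size"]))

# Live driver: inside your EWrapper subclass, forward the callback:
#   def tickByTickAllLast(self, reqId, tickType, time, price, size,
#                         tickAttribLast, exchange, specialConditions):
#       self.strategy.on_next_tick(time, price, int(size))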

And when you back-test, you obviously want that to run as quickly as possible, many times faster than real time, so the "chunked" download behavior of reqHistoricalTicks() does not really matter. BTW, last time I checked, IBKR allowed you to request up to 1,000 ticks per call to reqHistoricalTicks(), though you always receive data for full seconds, so you occasionally get more than 1,000 ticks back.
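
A rough sketch of such a download loop; `fetch` stands for a helper you would write that issues one reqHistoricalTicks() request (whatToShow='TRADES') and blocks until the historicalTicksLast() callback delivers the chunk, and the UTC "yyyymmdd-HH:MM:SS" start-time format is an assumption that may need adjusting for your API version:

from datetime import datetime, timezone

def epoch_to_ib(ts: int) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d-%H:%M:%S")

def download_all_ticks(fetch, start: str, end_epoch: int):
    all_ticks, last_second = [], -1
    cursor = start
    while True:
        chunk = fetch(startDateTime=cursor, numberOfTicks=1000)
        # IBKR returns whole seconds, so chunks can overlap at the boundary;
        # keep only ticks from seconds we have not fully consumed yet.
        fresh = [t for t in chunk if t.time > last_second]
        if not fresh:
            break                      # no new data: done
        all_ticks.extend(fresh)
        last_second = fresh[-1].time
        if last_second >= end_epoch:
            break
        cursor = epoch_to_ib(last_second)
    return all_ticks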


Time stamps in TickByTick data

Most of IBKR's time stamps still have a resolution of 1 second, and that includes TickByTick data (live and historical). But that is generally no issue, since the logic consumes one tick at a time anyway. During live feeds you can stamp the arrival time at higher resolution (I use the Java Instant, with a precision of 1 nanosecond but a realistic resolution on my server of about 1 microsecond) and get a rough feel for the spacing between trades within the second. For recorded feeds or historical downloads, you just need to make sure that you save the ticks in the order you receive them.
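
In Python terms, the tagging could look like this rough sketch (field names are my own placeholders):

import itertools, time
from typing import NamedTuple

class TaggedTick(NamedTuple):
    serial: int        # global arrival order across all recorded symbols
    arrival_ns: int    # local receive time, nanosecond resolution
    ib_time: int       # IBKR time stamp (1-second resolution)
    price: float
    size: int

_serial = itertools.count()

def tag(ib_time: int, price: float, size: int) -> TaggedTick:
    """Call from the tick callback; preserves exact arrival order."""
    return TaggedTick(next(_serial), time.time_ns(), ib_time, price, size)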


I guess I could write an entire book on this topic, but I hope these thoughts help.

Jürgen



 

Jurgen,
Yes, definitely a complicated task with a lot of considerations, both in the backtest and in the application. I do need to compare the data streams of the two functions, reqHistoricalTicks() and reqTickByTick(), and check feasibility there.

My strategy is similar to another classification machine learning model I wrote for 1-minute OHLCV bars, which I used to map a bunch of different features and ran in real time. That was straightforward, as you can stream reqHistoricalData() in real time for each new bar update on the minute, feed it through the model to get your prediction for the next candle (up or down), and form a strategy around existing positions that way.

With this one, though, I plan to center my classification model around 'large' orders to answer the same question (up or down), but I want to pass a new x-vector through my trained model on each 'large' order. Assume it's a basic logistic regression binary classification problem, so the x-vector I pass into the model will generate a signal every time the large-order threshold is met. I will obviously set a threshold for 'large', probably dynamic based on current tick-by-tick volumes, say outside 2 standard deviations for example (see the sketch below). I want to avoid bar-like data structures such as tick bars, volume bars, imbalance bars, etc. I said a basic logistic regression binary classification model, but in reality it will probably end up being a multi-class classification model using an LSTM-like architecture with classes like long, short, and do nothing, passing a multi-dimensional vector that packs in all tick data and the associated, mapped features between large orders into the next signal.

None of the strategy matters, though, if the data I use in real time is not as close as possible to the data I back-test on. I am worried that a 'large' order, as defined by the data I collected via reqHistoricalTicks() for the backtest, will fit feature parameters to values I may never see in real time if I then use reqTickByTick(), making the backtest obsolete with the data I currently have. Collecting enough reqTickByTick() data could take years to build a valid training dataset if I started now. I hope that clears up my intentions for the strategy, and I appreciate your thoughts.
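
For illustration, here is a rough Python sketch of the kind of dynamic threshold I mean; the window length, warm-up count, and 2-sigma cutoff are placeholders I would tune:

from collections import deque
from statistics import mean, stdev

class LargeTradeDetector:
    """Flags trades more than n_sigma standard deviations above the
    rolling mean of recent tick sizes."""
    def __init__(self, window: int = 1000, n_sigma: float = 2.0):
        self.sizes = deque(maxlen=window)
        self.n_sigma = n_sigma

    def is_large(self, size: int) -> bool:
        large = False
        if len(self.sizes) >= 30:       # warm-up before flagging anything
            m, s = mean(self.sizes), stdev(self.sizes)
            large = size > m + self.n_sigma * s
        self.sizes.append(size)
        return large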
Thank you,
Brendan


 

Also, most of my live strategies are written in a C++ client application I wrote for IB communications, etc. It's built with CMake to run on Linux. But backtesting is done in Python.


 

Sounds similar to what I have been banging my head against.

As I said in my post, you should be able to filter the additional ticks out of reqHistoricalTicks() and convert them to streams that are identical to recorded reqTickByTick() streams for the same period. Just try it for a day or two next week.

I have no recorded natural gas ticks, but here is the trade profile for ESH5 last week, from recorded reqTickByTick() data:
  • There were 2.5Mio trades last week, with a total volume of just shy of 7Mio contracts.
  • 55% of the trades, and 20% of the total volume, came from trades with a size of 1.
  • 90% of the trades had a size of 5 or less, while 90% of the total volume came from trades with sizes of 25 or less.
  • You can see there is a "long tail", ending with a trade of 1,895 contracts (just over $555Mio), which is where you can find "large" trades.
Jürgen


 

Will do this week. Interesting finds there on /ES. Do you also work with machine learning? If so, how has your experience been with it?



 

I also found the same issue as you, where the historical ticks are rounded to the second, and after much searching I couldn't find any way to get higher resolution from IBKR.

I wasn't sure I could trust that the order of the ticks would be the same order they would have come in via the live tickbytick method, so I just went and bought historical ticks from Databento for the backtest time range.

It's a bit of a pain to use another data source, map it all, etc., but at least now I'm confident that the data is accurate. Good luck.


 

So, in real time though, assuming your backtest went well and you are now running live, are you using reqTickByTick()? Or are you using a live data stream from Databento, if that's even offered, and just submitting orders through IBKR?


 

Yea, in real time I'm just getting the ticks from IBKR via reqTickByTick. Databento does provide live data, but I have not subscribed to it.

Apologies; it is relevant that the data is not consistent between the two providers, and that Databento has more trades. I haven't sorted out why, aside from reading in a few places that IBKR shouldn't really be trusted to reliably provide tick data.

Here's an example; these are tick_type=48 from IBKR vs. trade ticks from Databento:

dt          ticks_ibkr  ticks_ibkr_size  ticks_dbento  ticks_dbento_size
2024-12-22       12744            30344         23458              30344
2024-12-23      136348           486266        340921             486486
2024-12-25        6701            16119         11993              16119

Note that the reported sums of volume (the size columns) are the same (or nearly so on 2024-12-23), but the counts of rows differ.

A couple of notes:
- For the actual contract traded under continuous futures, IBKR chooses the front contract by highest volume, with a fuzzy transition period. I have not found a way to determine with certainty which contract they used on which date, so I've had to make a best guess by trade volume.
- While the prices are very close, they are not identical, and it appears that IBKR averages ticks a little (not completely sure here).
- Since I'm using these ticks for order-execution simulation, I figured higher resolution would be better regardless of what I get in real time from IBKR. This may not be the case for you, since you're using the ticks for predictions.

The tick data for NQ from Databento was about $30 for the last 3 months; I suggest checking it out and comparing. Hope that helps.
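
P.S. If anyone wants to reproduce the comparison, a rough pandas sketch; it assumes each provider's trades are in a DataFrame with 'ts' (datetime) and 'size' columns, and the column names are just placeholders:

import pandas as pd

def daily_profile(df: pd.DataFrame) -> pd.DataFrame:
    g = df.groupby(df["ts"].dt.date)
    return pd.DataFrame({"ticks": g.size(), "volume": g["size"].sum()})

def compare(ibkr: pd.DataFrame, dbento: pd.DataFrame) -> pd.DataFrame:
    # One row per day: tick counts and summed sizes, side by side.
    return daily_profile(ibkr).join(daily_profile(dbento),
                                    lsuffix="_ibkr", rsuffix="_dbento")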


 

Shoot, I was hoping markdown-formatted tables worked. :/


 

Okay, thank you for clarifying. $30 for 3 months is not bad at all. I checked out Portara CQG a couple of months back for /NQ and /ES, and I believe it was $895 per instrument for level-1 tick data going back to 2014, that is, matched-order info and the NBBO at the time of execution. That is also obviously terabytes of data, yet I still might go with them; but then I still run into the real-time problem, although I think they may have real-time tick streams. I am continuing the discussion with them. I mean, IBKR states that their data is not reliable, and they do recommend other data providers for tick data. I might just have to use them for what they are, a broker, and simply submit my orders through them. That definitely means more work, though, to pull all these things together in a live, real-time system. Still, I am hesitant to hack away until I have a solid, actionable plan for this tick-data problem.

Thank you,
Brendan


 

I am not disputing your results, but you are now comparing reqMktData() tick #48 "RtVolume" with the tick-by-tick feed from Databento. Our discussion here is about reqTickByTick(), which is entirely different and independent from reqMktData(RtVolume). Have you compared IBKR TickByTick data (historical or real-time) with Databento tick feeds?

The few times I had access to high-resolution tick-by-tick feed samples from trustworthy sources, IBKR reqTickByTick(LAST) showed the same price/size ticks in the same order as the comparable feeds. However, these were spot checks and only for CME/CBOT futures, so I cannot guarantee that IBKR TickByTick feeds for all symbols and all exchanges are complete.

The 1-second resolution of time stamps in historical and real-time TickByTick data does not pose an issue for many applications, since the order of tick arrival appears to be consistent and correct within each second:

  • If you record live feeds, you can tag each data arrival with a serial number and a high-resolution arrival time stamp (as I do) so that you preserve not only the temporal order of ticks for each symbol but also the order across all symbols you record. My "session tape" contains all available reqMktData() ticks and 5-second reqRealTimeBars() for many instruments, reqTickByTick(Last and BidAsk) for some instruments, and L2 market depth for a few instruments. That "tape" had 404 million ticks last week, and there are weeks with much more than that. Depending on the task at hand, the arrival tags allow me to replay the entire tape, or filtered sub-streams, at "real time", much faster, or in "slomo" (see the sketch after this list).
  • If you are only interested in TickByTick trades, filtered reqHistoricalTicks() downloads can be used as an equivalent replacement for reqTickByTick() live data.
  • But you will not be able to combine different reqHistoricalTicks() downloads into a consistent feed at the sub-second level (say, Trades and BidAsk for a single instrument, or Trades from multiple instruments). You would probably need nanosecond-resolution and -precision time stamps to do that accurately. Milliseconds won't do it, but that moves you into an entirely different class of specialized data providers. Keep in mind that IBKR is a brokerage, some exchanges do not even provide that resolution, few data servicers do, and you must hope that they all have perfectly synchronized clocks.
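
To illustrate the replay idea from the first bullet, a rough Python sketch; it assumes tape records ordered by serial number, each carrying its original arrival_ns:

import time

def replay(tape, handler, speed: float = 1.0):
    """speed=1.0 replays in real time, 10.0 ten times faster, 0.1 in slomo."""
    if not tape:
        return
    t0 = tape[0].arrival_ns
    start = time.monotonic_ns()
    for tick in tape:
        due = start + (tick.arrival_ns - t0) / speed
        delay_s = (due - time.monotonic_ns()) / 1e9
        if delay_s > 0:
            time.sleep(delay_s)         # wait until this tick is "due"
        handler(tick)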
Jürgen

 

Where have you seen that?
On Sun, Jan 12, 2025 at 01:10 PM, Brendan Lydon wrote:

.... I mean IBKR states that their data is not reliable and they do recommend other data providers for tick data ...


 

I was in talks with Portara CQG in November of 2024 about purchasing a large amount of tick data for /NQ and /ES. They may not have live tick feeds; I am looking over their website again right now, if that's what you are asking about. I reopened the thread with them, though, to see if they have some sort of software solution there. They seemed willing to work to your needs and were quite accommodating in our last discussion.


 

My apologies, I did not see your quoted text. This is straight from the documentation:

Pacing Violations for Small Bars (30 secs or less)

Although Interactive Brokers offers our clients high quality market data, IB is not a specialised market data provider and as such it is forced to put in place restrictions to limit traffic which is not directly associated to trading. A Pacing Violation occurs whenever one or more of the following restrictions is not observed:
  • Making identical historical data requests within 15 seconds.
  • Making six or more historical data requests for the same Contract, Exchange and Tick Type within two seconds.
  • Making more than 60 requests within any ten minute period.
  • Note that when BID_ASK historical data is requested, each request is counted twice. In a nutshell, the information above can simply be put as "do not request too much data too quick".
Important: these limitations apply to all our clients and it is not possible to overcome them. If your trading strategy's market data requirements are not met by our market data services please consider contacting a specialized provider.
This is what I was referring to. Not exactly saying it's unreliable, but they do say they're a broker, not a data provider.


 

Just a note that the price I mentioned was for trade ticks only. The bid/ask and book ticks cost more.


 

Yes, agreed, it does appear that the sequence in which they arrive is correct, as long as it's maintained. Good idea tagging them with a serial number; I'm gonna add that for good measure.
-----
Side note (just because I'm looking at it), in case it's helpful for anyone: here's a chart of NQ with the prices from each source. Similar but not identical; Databento seems to have higher price resolution, while IBKR sort of averages the ticks.


 

Very insightful discussion. When rounding to the second, does it mean that any tick with a "real" sub-second timestamp is rounded to the nearest second (i.e., ±500 ms), or is it more of a ceiling/floor/truncation method?
In other words, assuming a two-tick scenario:
If a real tick timestamp is 11:05:48.5001 (hh:mm:ss.ms), will the corresponding IBKR timestamp be 11:05:49? (1st arrival)
If a real tick timestamp is 11:05:49.4999 (hh:mm:ss.ms), will the corresponding IBKR timestamp be 11:05:49? (2nd arrival)


 

Good question. See for yourself: when calling reqTickByTick(), try converting the time from an epoch value to a formatted timestamp ('hh:mm:ss.ms'). Then call reqHistoricalTicks() for the same period and compare the two, to try to bridge the gap in how they are doing this and decipher whether it is a ceiling, floor, or truncation, based on the summed tick sizes/prices, etc. I have not messed around with that yet, but I may try later when I get the time. I will post my results here when I do, assuming you do not find out yourself first. If you do try it, posting your results would be appreciated. Hopefully I am understanding your question correctly.
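
If anyone wants a head start, here is the rough kind of comparison I have in mind, as a Python sketch (the tick record layout is a placeholder): bucket both feeds by their whole-second stamps and compare per-second counts and summed sizes; a consistent one-bucket shift near the boundaries would hint at rounding rather than truncation:

from collections import Counter

def per_second_volume(ticks):
    c = Counter()
    for t in ticks:
        c[t.time] += t.size        # t.time is whole epoch seconds
    return c

def boundary_mismatches(live, hist):
    """Seconds where the two feeds disagree on total size."""
    lv, hv = per_second_volume(live), per_second_volume(hist)
    return {s: (lv.get(s, 0), hv.get(s, 0))
            for s in sorted(set(lv) | set(hv))
            if lv.get(s, 0) != hv.get(s, 0)}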