

Collecting tick-by-tick price data


 

Below is my code to collect tick-by-tick daily data for SPY. When I run it I get around 40k trades for SPY, but when I check the log file I see around 400k trades a day, so some of the data is missing. My first thought is that my code is running too slow and some data gets dumped because of an API buffer overflow. Is this an accurate guess? Is there a way to confirm it? How else would you go about collecting tick-by-tick data?
Minor detail: for illustration purposes I showed only the last-price request for one ticker, but when I run the actual code, there is also a bid and ask request for another ticker along with the last-price request.

from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
from ibapi.ticktype import TickTypeEnum
import time
import threading

class TestApp(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)
        self.last_exchange = []
        self.last_market_time = []
        self.last_price = []
        self.last_size = []

    def error(self, reqId, errorCode, errorString):
        print("Error: ", reqId, " ", errorCode, " ", errorString)

    def tickByTickAllLast(self, reqId, tickType, time, price, size, tickAttribLast,
                          exchange, specialConditions):
        super().tickByTickAllLast(reqId, tickType, time, price, size, tickAttribLast,
                                  exchange, specialConditions)
        self.last_exchange = self.last_exchange + [exchange]
        self.last_market_time = self.last_market_time + [time]
        self.last_price = self.last_price + [price]
        self.last_size = self.last_size + [size]

app = TestApp()

contract = Contract()
contract.symbol = "SPY"
contract.secType = "STK"
contract.exchange = "SMART"
contract.currency = "USD"
contract.primaryExchange = "ARCA"

app.connect("127.0.0.1", 4002, 0)
time.sleep(1)
app.reqTickByTickData(1, contract, "Last", 0, False)
api_thread = threading.Thread(target=app.run)
api_thread.start()




 

I believe you're suffering from the issue discussed here.


 

Thanks for the link, buddy!

This is very interesting. I think this is next level for me. For now I would be happy to reconcile why api-log has more data than my python code.


 

That link is not relevant to your problem. It relates to reqMktData and not to the TickByTick data that you are asking for.

When you say you see 400k ticks in the log file, is that the TWS/IBG API log file with market data included? And how do you know your client only saw 40k ticks?

We cannot fix your code for you by email here in the group, but here are a few things that come to mind that you could check:
  • You should review all call paths in your client to make sure that the tickByTick callbacks properly handle all data presented to them. Maybe add a simple counter so that you know exactly how often the API calls them.
  • Does your code contain any time-consuming operations that prevent it from handling callbacks on a timely basis?
  • You use time.sleep after the connection call. Does your client use sleep anywhere else? Calling time.sleep (in any API language) is a really bad idea and should not be done.
  • Is your client co-located with TWS/IBGW on the same server, or is there a network between them?
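The counter suggestion can be a few lines. A minimal sketch (the `TickCounter` name and handler signature are illustrative, not part of the ibapi API):

```python
import threading

class TickCounter:
    """Count how often the API actually invokes a callback, so the client-side
    tally can be reconciled against the TWS/IBG log file."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def on_tick(self, price, size):
        # Increment atomically; the real handler body would go here.
        with self._lock:
            self.calls += 1

counter = TickCounter()
for p in range(1000):          # simulate 1000 callback invocations
    counter.on_tick(100.0 + p * 0.01, 10)
print(counter.calls)           # -> 1000
```

At the end of the day, comparing `counter.calls` against the log file tells you whether the data was lost before or after your callback was invoked.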

Your client (even in Python) should be able to handle 400k ticks without breaking a sweat. Our client (while implemented in Java) handles the data streams from 80+ instruments, plus 15 TickByTick streams (Last, BidAsk, and MidPoint for each of five instruments), plus several Level II Market Book feeds. We see between 40 and 90 million ticks per day (500 per second on average over 24hrs, with peaks of 15k ticks per second and more).

Jürgen



On Wed, Sep 14, 2022 at 02:27 PM, GreenGreen wrote:
Thanks for the link, buddy!

This is very interesting. I think this is next level for me. For now I would be happy to reconcile why api-log has more data than my python code.


 

Thank you for the reply, Jürgen!

I ran my code for the whole day and then checked my Python list: it had 40k entries. Then I analyzed the log for the same day; the log had 400k price points. I ran nothing else but my single piece of code. This is how I know.

  • This is a good idea, to review the call path. I am also going to run my code for a much shorter time frame (a minute or two) and check how much data I have in the log vs the Python code. Hopefully this way I can pinpoint which data points are missing.
  • There is really nothing time-consuming in my code. The only deficiency I see right now is that I used a = a + [some_value] on a Python list instead of a.append(some_value). Append should be faster.
  • I do have a sleep in my main thread. I put my main thread to sleep until the end of the trading day. However, I do not think it should affect the thread with the app.
  • Yes, my client is on the same machine as IB Gateway. There are no network issues.
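The append-vs-concatenation point is easy to measure. A quick sketch (sizes are illustrative): `a = a + [x]` copies the entire list on every tick, so total cost grows quadratically with the number of ticks collected, while `append` is amortized constant time per element.

```python
import timeit

n = 10_000

def concat_build():
    a = []
    for i in range(n):
        a = a + [i]      # copies the whole list every iteration: O(n^2) total
    return a

def append_build():
    a = []
    for i in range(n):
        a.append(i)      # amortized O(1) per element: O(n) total
    return a

t_concat = timeit.timeit(concat_build, number=1)
t_append = timeit.timeit(append_build, number=1)
print(f"concat: {t_concat:.4f}s  append: {t_append:.4f}s")
```

Both build the same list, but at 400k ticks a day the quadratic variant does billions of element copies inside the callback path.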

This is a bit off topic, but decrypting a 300MB api-log file takes about 3 hours. This is on a Google compute instance with 2 cores. It seems too long.

Regards


 

´³¨¹°ù²µ±ð²Ô makes extremely valid points. If you're seeing more data in the TWS log file than your Python code is collecting, then you're probably not trapping everything you think you are. Otherwise, a modern computer is quite fast and should generally have enough memory to deal with TWS/IBGW... unless there's something like a memory leak in your code or some inefficient O(exp(x)) algorithm you've written.

As far as the link I provided, I think it's helpful to be aware of in case you are comparing TWS data to another provider's data (I wasn't sure what "log" you were referring to). That said, I'm not sure why it would pertain to reqMktData and not to tickByTickAllLast; perhaps you can elaborate on that, Jürgen? It could be helpful for future reference.


 

The post you pointed to is 15 years old and, at that point, only reqMktData existed. The API Reference Guide chapter "" describes how sampling/aggregation is handled today. So the post is still relevant for reqMktData (even if sampling is a little faster and smarter now).

Tick-by-tick data was introduced only a few years back and is not sampled or aggregated. Consequently, the data volume is huge and the limits on how many simultaneous subscriptions can exist are much tighter. Every trade (and every ask/bid) is reported to you as it happens.

If you subscribe, for example, via reqMktData for an instrument, you will never get more than 4 to 6 updates per second even if that instrument is traded heavily. If you subscribe to TickByTickLast for the same instrument, you might get hundreds if not thousands of ticks in very busy seconds. And you will see every price change, including the "102" the post in your link refers to.

Jürgen

On Wed, Sep 14, 2022 at 03:37 PM, buddy wrote:


 

Helpful, thanks!


 

It looks like I found the culprit: thread locking. I pasted the full code below. I implemented thread locking in order to make sure that the lists of tick data get fully updated before they are analyzed for any trade opportunity. Without this lock I have seen cases where the price is updated but the size is not, so I was getting two lists of different lengths. Now, however, this thread locking prevents the lists from fully updating. I ran the code for 5 minutes and the Python script accumulated only 3622 rows, while the log file collected 3631. I repeated the exercise, but with the thread locking commented out, and got the same number of lines in both the Python script and the log. I am a bit new to writing threaded code; does anybody have any suggestions on how to fix it?


from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
from ibapi.ticktype import TickTypeEnum
import time
import datetime
from pytz import timezone
import threading
import pandas as pd

EST_TZ = timezone('US/Eastern')
GMT_TZ = timezone('GMT')


class TestApp(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)
        self.last_ticker = []
        self.last_exchange = []
        self.last_market_time = []
        self.last_server_time = []
        self.last_price = []
        self.last_size = []
        self.last_tick_type = []
        self.last_past_limit = []
        self.last_unreported = []
        self.last_special_cond = []

        self.lock_last = threading.Lock()

    def error(self, reqId, errorCode, errorString):
        print("Error: ", reqId, " ", errorCode, " ", errorString)

    def tickByTickAllLast(self, reqId, tickType, time, price, size, tickAttribLast,
                          exchange, specialConditions):
        super().tickByTickAllLast(reqId, tickType, time, price, size, tickAttribLast,
                                  exchange, specialConditions)
        self.update_last_list(reqId, tickType, time, price, size, tickAttribLast,
                              exchange, specialConditions)

    def update_last_list(self, reqId, tickType, time, price, size, tickAttribLast,
                         exchange, specialConditions):
        self.lock_last.acquire()

        self.last_ticker = self.last_ticker + [reqId]
        self.last_exchange = self.last_exchange + [exchange]
        self.last_server_time = self.last_server_time + [datetime.datetime.now(EST_TZ)]
        # The tick time is a UTC epoch; convert it directly rather than
        # localizing the machine-local result of a bare fromtimestamp()
        self.last_market_time = self.last_market_time + [datetime.datetime.fromtimestamp(time, tz=GMT_TZ).astimezone(EST_TZ)]
        self.last_price = self.last_price + [price]
        self.last_size = self.last_size + [size]
        self.last_tick_type = self.last_tick_type + [tickType]
        self.last_past_limit = self.last_past_limit + [tickAttribLast.pastLimit]
        self.last_unreported = self.last_unreported + [tickAttribLast.unreported]
        self.last_special_cond = self.last_special_cond + [specialConditions]

        self.lock_last.release()

app = TestApp()

contract = Contract()
contract.symbol = "SPY"
contract.secType = "STK"
contract.exchange = "SMART"
contract.currency = "USD"
contract.primaryExchange = "ARCA"


app.connect("127.0.0.1", 7497, 0)
time.sleep(1)
app.reqTickByTickData(1, contract, "Last", 0, False)
api_thread = threading.Thread(target=app.run)
api_thread.start()
time.sleep(300)
app.cancelTickByTickData(1)
app.disconnect()
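One way to avoid the price-without-size inconsistency without holding a lock while analyzing is to append a single tuple per tick, so all fields of one tick land atomically, and to analyze a snapshot copy. A sketch under that assumption (the function and variable names are illustrative, not ibapi calls):

```python
import threading

rows = []                     # one entry per tick: all fields arrive together
lock = threading.Lock()

def on_last_tick(reqId, tick_time, price, size, exchange):
    # A reader can never see a price without its matching size,
    # because the whole tick is appended as one tuple.
    with lock:
        rows.append((reqId, tick_time, price, size, exchange))

def snapshot():
    # Copy under the lock; analyze the copy outside it, so the
    # callback thread is never blocked for long.
    with lock:
        return list(rows)

for i in range(100):          # simulate 100 callback invocations
    on_last_tick(1, 1663188000 + i, 400.0 + i * 0.01, 100, "ARCA")
print(len(snapshot()))        # -> 100
```

The lock is held only for the duration of one append or one copy, so the callback thread cannot fall behind the way it can when analysis happens inside the critical section.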


 

It sounds like you found your problem so I don't know exactly what you're trying to fix here.

I can, however, offer advice w.r.t. writing multi-threaded code... avoid it like the plague! Lol, if you can't, at least avoid low-level locking/unlocking (acquire/release). Seriously, give major consideration to the need/requirement and the design.

You might find you only need multi-processing, not multi-threading... using a simple fork can save a ton of effort in this case. Knowing when to use which requires experience and if you're asking yourself basic questions about multi-threading you're better off multi-processing first; then seeing if it's worth optimizing. Don't go down a rabbit hole of a poorly thought out multi-threaded design because the cool kids are doing it.

If you've gotten this far, the keyword can be your friend. I think it turns out easier in the long run. In addition, trying to optimize critical sections on a very granular basis (w/ lock/unlock, etc.) is generally a fool's errand... there's not that much to gain from a performance aspect IMHO. Some may consider this just syntactic sugar... it's sweet just the same.

Also, the best way to write multi-threaded code IMHO is to never use global variables, put everything on the call-stack... this way you're writing "thread-safe" code instead of trying to crowbar multi-threading into code that wasn't designed for it from the start. In effect, consider a functional design instead of an imperative one.

Finally, worker pools can be very helpful. Instead of starting/stopping individual threads in an ad-hoc fashion, design your software so that it starts a predetermined number of threads (usually getconf _NPROCESSORS_ONLN) and then feed them data. This habit couples nicely with first solving the problem via multi-processing since it should be straightforward to change process-oriented code into thread-oriented; you're just launching it in a different manner.
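In Python that worker-pool pattern is available off the shelf via concurrent.futures. A sketch (the `analyze` function and batch sizes are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor
import os

def analyze(batch):
    # Placeholder for per-batch analysis work.
    return sum(batch)

# Ten batches of 100 simulated tick values each.
batches = [list(range(i, i + 100)) for i in range(0, 1000, 100)]

# Start a fixed-size pool (sized to the core count) and feed it data.
workers = os.cpu_count() or 2
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(analyze, batches))

print(sum(results))   # -> 499500 (sum of 0..999)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is a one-line change, which is exactly the process-first, threads-later progression described above.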

As far as your example is concerned specifically, what pops out at me is that you start but never join the thread. In a correct design this shouldn't be the case but we all get lazy and let the OS handle some cleanup on occasion; not to mention it's clearly sample code.


 

Thank you! I need to spend some time thinking about how to both acquire and analyze data simultaneously.
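One common way to acquire and analyze simultaneously is a producer/consumer queue: the API callback only enqueues and returns immediately, while a separate thread dequeues and analyzes. A sketch with simulated ticks (nothing here is ibapi-specific):

```python
import queue
import threading

tick_queue = queue.Queue()           # thread-safe FIFO; no manual locking needed
SENTINEL = object()                  # tells the consumer to stop
collected = []

def consumer():
    # Runs in its own thread: drains ticks and "analyzes" them,
    # never blocking the producer (the API callback thread).
    while True:
        tick = tick_queue.get()
        if tick is SENTINEL:
            break
        collected.append(tick)       # analysis would happen here

t = threading.Thread(target=consumer)
t.start()

# The producer side: a real tickByTickAllLast callback would just do
# tick_queue.put((time, price, size)) and return.
for i in range(5000):
    tick_queue.put((i, 100.0 + i * 0.01, 10))
tick_queue.put(SENTINEL)
t.join()

print(len(collected))                # -> 5000
```

Because the callback never waits on the analysis, no ticks are dropped even if the analysis occasionally takes longer than the gap between ticks; the queue simply absorbs the burst.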


 

You're welcome. I'll add one "stupid trick" that has helped me when playing with initial design ideas. I'll first take a rough multi-processing approach and see how amenable the program is to being used in a command pipeline with xargs -P (or ). BTW (shameless plug) this is exactly how I tripped over my pet issue. Anyway...

If the program works well with xargs -P then I feel I'm onto something. Very often that's enough to keep me happy. After that I'll consider re-factoring the code into a single monolithic process with multiple threads.

Threads are rather low-level so starting out by considering them is like choosing to build a prototype in assembly; unorthodox to say the least. This is all coming from an over-grown, but agile, script-kiddie who isn't a big fan of waterfall or top-down therefore YMMV.

Good luck.


 


Jurgen

Yes, it is an old post, but it's still accurate in its description of the basic mechanism. The sampling intervals are shorter now, especially for Forex, but it still works the same way.

Also changed is that when I wrote it, each reqMktData call knocked 1 off your count of available market data lines, and was actually a slightly different data stream because the alignment of the sampling periods wasn't the same. But at some point IB improved that so that no matter how many requests you make for a particular contract, and whether in the same or different programs, or even live or paper-trading, it only counts as one line.

It's quite easy to check out my description simply by making simultaneous requests for 'realtime' data and 'tick-by-tick' data and writing the received ticks to a text file. What you find is that the 'realtime' data ticks always arrive before the corresponding 'tick-by-tick' ticks, and actually I was amazed to discover how badly behind the 'tick-by-tick' data is: it usually arrives several hundred milliseconds behind the 'realtime' data.

So while you might get every tick with 'tick-by-tick' data, it would be useless for making snappy trading decisions because it's so out-of-date by the time it arrives.

Richard

From: [email protected] <[email protected]> On Behalf Of Jürgen Reinold via groups.io
Sent: 14 September 2022 22:17
To: [email protected]
Subject: Re: [TWS API] Collecting tick-by-tick price data



 

I agree that the old post is still relevant for reqMktData, Richard. I think I even said so. And price sampling at a rate of a few samples per second is also highly relevant and useful.

But I cannot confirm your observation that TickByTickLast data is badly behind sampled LastPrice ticks, though. At least not in our scenario:
  • Very liquid instrument ESZ2 (ES futures)
  • Very low flight time between IBKR and us (<4ms) and exchange and us (<10ms)
  • Very high speed internet connection with huge bandwidth
  • Java API 9.85 and TWS 10.18.1c

I don't have the time right now to do an in-depth analysis with accurate time measurements (though it is overdue and I should be able to do something more formal next week), but attached are a couple of quick snapshots from ESZ2 at market open yesterday.

One snapshot is for the first ~30 seconds:
  • There were 1,972 trades reported via individual TickByTickLast callbacks (the little blue dots)
  • That corresponded to 79 LastPrice samples (red lines and larger red dots)
The second snapshot zooms in a little and shows the first ~10 seconds:
  • In that time frame, 1,027 trades were reported by individual TickByTickLast callbacks
  • With 36 corresponding LastPrice samples

In both charts, time is not to scale. In other words, ticks are listed left-to-right with equal spacing in the order they were received. Distance between dots in the charts does not represent the time between dots.

Obviously, LastPrice samples at a rate of 3 to 4 per second adequately represent the price movement, though the details, and specifically fast price changes, can only be learned from TickByTick since there are 25x more of them. And I am not saying that there are no instances where TickByTickLast trades arrive after a sampled LastPrice tick, but in our case, at least empirically, TickByTickLast ticks are not consistently behind LastPrice samples by 100++ ms.

And I agree with you that reqTickByTick data does not suddenly allow you to run HFT schemes. But for us they are invaluable for measuring changes in price movements:
  • Are prices pretty much stagnant and just jumping between Ask and Bid?
  • Can we see a clear direction where every few trades the price moves up or down by a tick?
  • Or are we seeing erratic jumps within a window of six to ten ticks?
  • And was that fast and huge price change just a spike from one or a few orders, or does it look like the price has moved permanently and will stay there?

TickByTick data helps us in risk assessment even if trading has a horizon of many seconds to many minutes or even hours. It's one more gauge for flying by instruments.

More on this, hopefully, next week.

Jürgen


PS: For completeness, here is a histogram of all ESZ2-related ticks in the first 15 minutes after market open yesterday.

Count of ticks received from 08:30 to 08:45 US/Central on 20220915

Count      TickType                      API source
---------  ----------------------------  --------------
250,265    MarketDepth                   reqMarketDepth
228,906    TickByTickMidPoint            reqTickByTick
228,906    TickByTickBidAsk              reqTickByTick
37,238     TickByTickLast                reqTickByTick
7,447      AskSize                       reqMktData
7,442      BidSize                       reqMktData
5,144      LastSize                      reqMktData
3,354      Volume                        reqMktData
2,361      AskPrice                      reqMktData
2,349      BidPrice                      reqMktData
2,091      OptionHistoricalVolatility    reqMktData
2,087      LastPrice                     reqMktData
900        LastTimestamp                 reqMktData
753        MarkPrice                     reqMktData
357        OptionPutVolume               reqMktData
330        OptionCallVolume              reqMktData
180        RtDataBar                     reqMktData
30         VolumeRate                    reqMktData
30         TradeRate                     reqMktData
30         TradeCount                    reqMktData
15         ShortTermVolume5Minutes       reqMktData
15         ShortTermVolume3Minutes       reqMktData
15         ShortTermVolume10Minutes      reqMktData
11         Low                           reqMktData
8          OptionImpliedVolatility       reqMktData

780,264    Total recorded ESZ2 ticks
867        Ticks per second for ESZ2
2,968,365  Total recorded ticks, all instruments
3,298      Ticks per second recorded


On Thu, Sep 15, 2022 at 06:09 PM, Richard L King wrote:



 


Jürgen

Sorry for the delay in responding to this.

You're clearly seeing different results from me, and I've been wondering what is the cause of this.

Let's begin by describing what we'd expect to see when subscribing to tick-by-tick data. The assertion is that every tick is sent, and one would hope that each tick is sent with no appreciable delay. So you'd expect to see a stream of ticks arriving from TWS at more or less random intervals. But there may be some output buffering to reduce protocol overheads, both at the server farm and in TWS, so you can also expect some 'clumping' of ticks, depending on how the servers and TWS manage their buffer flushing, but you have to hope/expect that such buffering does not add more than a few milliseconds to the latency.

Now as far as the non-tick-by-tick market data is concerned, a cursory glance at recorded data with accurate timestamps shows that at the end of each sampling period, the values that changed compared with the start of the period are sent consecutively with no intervening data (always in the same order: Last, LastSize, Bid, Ask, BidSize, AskSize, though values that have not changed are not included). Volume ticks are sent on their own, not usually batched with the other values.

So if subscribed to both tick-by-tick and 'normal' data streams for a particular security (whether via the same or different API connections), you'd expect to receive the normal data clumped in blocks as above, and the tick-by-tick data more or less randomly spread through the gaps between these blocks (and for a busy security like ES futures, there are many more tick-by-tick events, so you wouldn't expect to see long periods of inactivity between the blocks of non-tick-by-tick data). And in particular I'd expect to consistently see the prices in the tick-by-tick data being directly related to the end-of-sampling-period values (with the varying intra-interval values as described in my original post): so if the Last price at the end of a sampling interval is X, you'd expect to be able to identify at least one tick-by-tick X (and potentially many more than one) preceding this but after the previous sampling interval values.

But that is absolutely not what I'm observing!

The tick-by-tick data is 'batched' in much the same way as the normal data, arriving in large blocks of many ticks, with significant intervals between them. And it's often not possible to make sense of the data unless the tick-by-tick values (or at least some of them) are being sent out after the end-of-sample values.

I've attached a file showing data gathered at the open of ESZ2 on 19 September (timestamps are UTC, so the market open time was 13:30:00). The program I used to generate this timestamps each data block arriving from the socket (this is using my own API implementation, so it has access to the raw data) and logs the raw data to the program logfile; as each tick message is unpacked it is also individually timestamped and logged (with each field recorded exactly as it was parsed from the message, with no interpretation); and the program also displays the received data on-screen in an interpreted form (eg showing the tick type names): it is this latter format that is captured in the attached file. Note that the 'normal' data and tick-by-tick data are formatted rather differently, the latter being much longer, but that's handy as it makes them easier to distinguish. These various levels of timestamping enable me to see that the program itself has very little overhead.

I actually used two instances of the program on different computers, one capturing 'normal' market data and the other tick-by-tick data (the machines' clocks are very tightly synchronised: by checking against the same NTP server using the Windows w32tm /stripchart command I can see that they are usually well within 5 milliseconds of each other). But if I do the capture on a single computer with a single API connection subscribed to both, it all looks exactly the same.

I then merged the two sets of data, and edited it to include a blank line between each consecutive batch of ticks to aid readability, but the data itself is exactly as captured.

The attached file contains a small part of this merged data, just a couple of seconds' worth, and I chose this bit because it actually contains quite a rapid change in price, which makes it easier to see the correlation between the two types of data.

Looking at the data, the first 'normal' Last price is 3854.5 at 13:30:28.951. The latest prior tick-by-tick at this price is at 13:30:28.314, more than 600 millisecs earlier, and is followed by a good many at 3854.25 and one at 3854. So it's a bit of a stretch to believe that this first 'normal' tick is a reflection of this prior data. On the other hand, the very first tick-by-tick after this first 'normal' tick, at 13:30:29.332, is indeed also 3854.5. So this seems a better correlation, but it doesn't align with the expectations described above.

There are many other similar puzzles in this small extract (and far more in the larger files that I collected), but it's late at night and I don't have the time or the inclination to detail them here.

The bottom line is that what I see is not consistent with what you reported, and I'm unclear why this should be. I have to say that what you see is much (MUCH!) closer to what I would have expected, hence my surprise when I found that in my case it's so different.

Here are some things where my environment differs from yours:

  • Ping time to IB 35ms
  • 73 Mbps full fibre broadband
  • .Net 6 and TWS 10.17.1u

Now the flight time should be pretty much irrelevant in that it would affect all data the same way. However, I do actually wonder whether the normal market data and the tick-by-tick data have the same source: it may be that tick-by-tick data comes from the historical data farm rather than the market data farm, and this might possibly explain why the tick-by-tick data appears to be behind the normal market data for me (I'll have to try and figure out the relevant IP addresses and check the ping times).

It's possible that some of the gaps could be explained by the program doing garbage collections, but that seems pretty far-fetched (the program only uses about 16MB of memory at most, way below the level where .Net would even begin to consider thinking about garbage collection).

And it's possible that some intermediate networking infrastructure (or even TWS itself) is batching up the tick-by-tick data. I can sort of imagine something deciding that my data is not voice or video/gaming, and therefore giving it lower priority (I don't have any kind of quality-of-service controls in place; maybe I should investigate this?).

Anyway, sorry for this rather long epistle, but as you can see there is something going on here that is not satisfactory.

Richard


 


Oops, forgot to attach the file. Here it is.


From: [email protected] <[email protected]> On Behalf Of Richard L King
Sent: 22 September 2022 01:19
To: [email protected]
Subject: Re: [TWS API] Collecting tick-by-tick price data

?

´³¨¹°ù²µ±ð²Ô

?

Sorry for the delay in responding to this.

?

You¡¯re clearly seeing different results from me, and I¡¯ve been wondering what is the cause of this.

?

Let¡¯s begin by describing what we¡¯d expect to see when subscribing to tick-by-tick data. The assertion is that every tick Is sent, and one would hope that each tick is sent with no appreciable delay. So you¡¯d expect to see a stream of ticks arriving from TWS at more or less random intervals. But there may be some output buffering to reduce protocol overheads both at the server farm and in TWS, so you can also expect some ¡®clumping¡¯ of ticks, depending on how the servers and TWS manage their buffer flushing, but you have to hope/expect that such buffering does not add more than a few milliseconds to the latency.

?

Now as far as the non-tick-by-tick market data is concerned, a cursory glance at recorded data with accurate timestamps show that at the end of each sampling period, the changed values compared with the start of the period are sent consecutively with no intervening data (always in the same order Last, LastSize, Bid, Ask, BidSize, AskSize, though values that have not changed are not included). Volume ticks are sent on their own, not usually batched with the other values.

?

So if subscribed to both tick-by-tick and ¡®normal¡¯ data streams for a particular security (whether via the same or different API connections), you¡¯d expect to receive the normal data clumped in blocks as above, and the tick-by-tick data more or less randomly spread through the gaps between these blocks (and for a busy security like ES futures, there are many more tick-by-tick events, so you wouldn¡¯t expect to see long periods of inactivity between the blocks of non-tick-by-tick data). And in particular I¡¯d expect to consistently see the prices in the tick-by-tick data being directly related to the end-of-sampling period values (with the varying intra-interval values as described in my original post): so if the Last price at the end of a sampling interval is X, you¡¯d expect to be able to identify at least one tick-by-tick X (and potentially many more than one) preceding this but after the previous sampling interval values.

?

But that is absolutely not what I¡¯m observing!

?

The tick-by-tick data is ¡®batched¡¯ in much the same way as the normal data, arriving in large blocks of many ticks, with significant intervals between them. And It¡¯s often not possible to make sense of the data unless the tick-by-tick values (or at least some of them) are being sent out after the end-of-sample values.

I've attached a file showing data gathered at the open of ESZ2 on 19 September (timestamps are UTC, so the market open time was 13:30:00). The program I used to generate this timestamps each data block arriving from the socket (this is using my own API implementation, so it has access to the raw data) and logs the raw data to the program logfile; as each tick message is unpacked it is also individually timestamped and logged (with each field recorded exactly as it was parsed from the message, with no interpretation); and the program also displays the received data on-screen in an interpreted form (e.g. showing the tick type names): it is this latter format that is captured in the attached file. Note that the 'normal' data and tick-by-tick data are formatted rather differently, the latter being much longer, but that's handy as it makes them easier to distinguish. These various levels of timestamping enable me to see that the program itself adds very little overhead.
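For reference, this kind of raw-level timestamping is straightforward because each API message on the wire (in the v100+ protocol) is preceded by a 4-byte big-endian length. A sketch of splitting a received byte buffer into individual messages, each of which could then be timestamped on arrival; the payload bytes here are invented, not real API messages:

```python
import struct

def split_api_messages(buffer: bytes):
    """Split a raw TWS API byte stream into individual messages.
    Assumes v100+ framing: a 4-byte big-endian length before each message."""
    messages = []
    offset = 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        offset += 4
        if offset + length > len(buffer):
            break  # incomplete trailing message: wait for more data
        messages.append(buffer[offset:offset + length])
        offset += length
    return messages

# Two invented messages packed back to back, as they might arrive in one recv().
raw = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"xy"
print(split_api_messages(raw))  # -> [b'abc', b'xy']
```

In a real capture loop you would record a timestamp per `recv()` and one per unpacked message, which is exactly the two levels of logging described above.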

I actually used two instances of the program on different computers, one capturing 'normal' market data and the other tick-by-tick data (the machines' clocks are very tightly synchronised: by checking against the same NTP server using the Windows w32tm /stripchart command I can see that they are usually well within 5 milliseconds of each other). But if I do the capture on a single computer with a single API connection subscribed to both, it all looks exactly the same.


I then merged the two sets of data, and edited it to include a blank line between each consecutive batch of ticks to aid readability, but the data itself is exactly as captured.

The attached file contains a small part of this merged data, just a couple of seconds' worth, and I chose this bit because it contains quite a rapid change in price, which makes it easier to see the correlation between the two types of data.

Looking at the data, the first 'normal' Last price is 3854.5 at 13:30:28.951. The latest prior tick-by-tick trade at this price is at 13:30:28.314, more than 600 milliseconds earlier, followed by a good many at 3854.25 and one at 3854. So it's a bit of a stretch to believe that this first 'normal' tick is a reflection of this prior data. On the other hand, the very first tick-by-tick trade after this first 'normal' tick, at 13:30:29.332, is indeed also 3854.5. That seems a better correlation, but it doesn't align with the expectations described above.
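This kind of check can be made mechanical: for each sampled Last price, search the tick-by-tick stream for a trade at the same price in the window since the previous sample. A rough sketch, with a hypothetical helper and invented tick data loosely modelled on the extract discussed here (seconds past 13:30):

```python
def match_sample_to_ticks(sample_time, sample_price, ticks, window_start):
    """Return the tick-by-tick trades in (window_start, sample_time]
    whose price equals the sampled Last price.
    `ticks` is a time-sorted list of (timestamp, price) tuples."""
    return [
        (t, p) for (t, p) in ticks
        if window_start < t <= sample_time and p == sample_price
    ]

# Invented data: trades at 28.314/3854.5, then 3854.25 and 3854.0,
# then 29.332/3854.5. Sampled Last of 3854.5 at 28.951, prior sample at 28.7.
ticks = [(28.314, 3854.5), (28.5, 3854.25), (28.7, 3854.0), (29.332, 3854.5)]
print(match_sample_to_ticks(28.951, 3854.5, ticks, 28.7))  # -> []
```

An empty result, as here, is the puzzle described above: the sampled price has no matching tick-by-tick trade in the interval that should have produced it.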

There are many other similar puzzles in this small extract (and far more in the larger files that I collected), but it's late at night and I don't have the time or the inclination to detail them here.

The bottom line is that what I see is not consistent with what you reported, and I'm unclear why this should be. I have to say that what you see is much (MUCH!) closer to what I would have expected, hence my surprise when I found that in my case it's so different.


Here are some things where my environment differs from yours:


  • Ping time to IB 35ms
  • 73 Mbps full fibre broadband
  • .Net 6 and TWS 10.17.1u

Now the flight time should be pretty much irrelevant, in that it would affect all data the same way. However, I do wonder whether the normal market data and the tick-by-tick data have the same source: it may be that tick-by-tick data comes from the historical data farm rather than the market data farm, and this might possibly explain why the tick-by-tick data appears to be behind the normal market data for me (I'll have to try to figure out the relevant IP addresses and check the ping times).

It's possible that some of the gaps could be explained by the program doing garbage collection, but that seems pretty far-fetched (the program only uses about 16MB of memory at most, way below the level where .Net would even begin to consider garbage collection).

And it's possible that some intermediate networking infrastructure (or even TWS itself) is batching up the tick-by-tick data. I can sort of imagine something deciding that my data is not voice or video/gaming traffic, and therefore giving it lower priority (I don't have any kind of quality-of-service controls in place; maybe I should investigate this?).


Anyway, sorry for this rather long epistle, but as you can see there is something going on here that is not satisfactory.


Richard


 

Richard,

thank you for your explanation and the data. I am still mostly consumed with travel but have made progress on my analysis. I hope I can wrap that up and post soon.

In the meantime, I found an hour (donated by early rising due to jet lag) and I lined up your logged data with ours. XLSX and CSV versions attached. The TickByTick data records match perfectly (we subscribe to TickByTickAllLast, though), while the LastTimeStamp, LastPrice, LastSize, and Volume data is similar but arrival times vary. We receive all data from the same feed.

For the TickByTick data I calculated the difference between when we received each tick and when you did. Assuming that neither of our servers made major time adjustments during the few seconds covered by this log, I had expected some fluctuation around an approximately constant difference. It shows a much more complex pattern, though, and it looks like TickByTick data is delivered to you in large chunks (back-to-back data arrives in the same or next millisecond) while we get a stream, where only a few back-to-back items arrive at essentially the same time and the time between data items varies greatly. I guess there is the answer.
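This comparison can be reproduced mechanically: align the two logs on the tick content, subtract arrival times, and measure how often back-to-back ticks arrive within a millisecond of each other. A sketch with invented arrival times (integer milliseconds, to sidestep floating-point comparison issues):

```python
def arrival_deltas(times_a, times_b):
    """Per-tick arrival-time difference between two logs of the same
    ticks (positive = log A received the tick later than log B)."""
    return [a - b for a, b in zip(times_a, times_b)]

def chunk_fraction(times, max_gap_ms=1):
    """Fraction of back-to-back ticks arriving within max_gap_ms:
    near 1.0 suggests chunked delivery, near 0.0 a smooth stream."""
    gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return sum(g <= max_gap_ms for g in gaps) / len(gaps)

# Invented: log A receives a burst, log B a smooth stream of the same ticks.
log_a = [10000, 10000, 10001, 10001, 10500]
log_b = [9950, 10050, 10150, 10250, 10450]
print(chunk_fraction(log_a), chunk_fraction(log_b))  # -> 0.75 0.0
print(arrival_deltas(log_a, log_b))  # -> [50, -50, -149, -249, 50]
```

A roughly constant set of deltas would indicate a fixed relative latency; the wildly varying deltas and the high chunk fraction on one side are the pattern described above.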

I agree with your overall description (saves me a bunch of paragraphs for when I post my analysis) but I would like to offer a small addition that explains occasional chunking of TickByTick data (but not the kind you experience). ESZ2 is very liquid, but a lot of trades have only a few contracts (fewer than 10, most of them just 1 or 2). Larger orders (say 20 contracts and more) will quickly be filled, but that results in a burst of smaller trades. When you sum up the sizes for TickByTickLast trades in our log with sub-millisecond back-to-back arrival, you frequently get sums such as 20, 30, or 50.
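That 'large order filled by a burst of small trades' effect can be checked directly: group trades with sub-millisecond back-to-back arrival and sum their sizes. A sketch with invented trades (arrival times in milliseconds):

```python
def burst_sizes(trades, max_gap_ms=1):
    """Sum trade sizes within bursts of back-to-back arrivals.
    `trades` is a time-sorted list of (arrival_ms, size) tuples."""
    sums = []
    current = 0
    last_t = None
    for t, size in trades:
        if last_t is not None and t - last_t > max_gap_ms:
            sums.append(current)  # gap ends the burst
            current = 0
        current += size
        last_t = t
    sums.append(current)
    return sums

# Invented: a 20-lot order filled as 2+8+10 in one burst, then isolated 1-lots.
trades = [(100, 2), (100, 8), (101, 10), (500, 1), (900, 1)]
print(burst_sizes(trades))  # -> [20, 1, 1]
```

Round-number burst sums (20, 30, 50) amid mostly 1- and 2-lot singletons are the signature of such fills.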

Anyway. A more comprehensive Last vs TickByTick analysis soon.

Jürgen




On Wed, Sep 21, 2022 at 07:18 PM, Richard L King wrote:

Jürgen

Sorry for the delay in responding to this.

You're clearly seeing different results from me, and I've been wondering what is the cause of this.

...


 


Jürgen

Thanks. Very informative. Clearly my statement about "how badly behind the 'tick-by-tick' data is" in my post of 16 September is incorrect, and there is something in the path between IB's servers and my network that is causing the individual tick messages to be amalgamated into larger transmission lumps.

There are 14 hops between my network and either IB's US or European servers (87ms ping to the US, 35ms to Europe). I guess any of those could be contributing to this, but it seems pretty unlikely to me that massive, general-purpose internet routers would do anything other than send every packet on its way as soon as possible.


So the most likely culprits are my ISP, their broadband router, and something in my own network.

As it happens, I'm switching to a new ISP early next week, with a different router, so it'll be interesting to see if that makes any difference. This will be higher bandwidth, but latency will be much the same.

But before that, I intend to use the second (currently unused) network adapter on my virtualisation server to create a direct route just for IB market data to the broadband router: at the moment there's about 40Mbps of other data flowing through the same network adapter as the market data. Given that it's a 1Gbps connection, I don't really see that this should cause any real congestion. But separating the market data out via a dedicated network adapter can only help. We shall see…

I might also try relocating TWS to another virtual machine that has less of a workload, perhaps my Linux VM (which does nothing at all most of the time; in fact it's not even powered on, as I don't really use it except when I have to).

Anyway, I'll let you know if I discover anything of interest.

Richard

From: [email protected] <[email protected]> On Behalf Of Jürgen Reinold via groups.io
Sent: 23 September 2022 07:11
To: [email protected]
Subject: Re: [TWS API] Collecting tick-by-tick price data

...