
How to create a performance-efficient trading app in Python so that each sub-process runs smoothly?


 

What would be the best way to do on-the-fly computations with NumPy and Pandas for trading via the IB TWS API, read from and write to MySQL, and in the meantime receive and send data and orders?

From what I have read, I conclude that for CPU-bound programs, those that spend most of their time processing data (like on-the-fly computations using Pandas, which is synchronous and single-threaded), solutions like threading (pre-emptive multitasking) and asyncio are inferior (and may disrupt the sending/receipt of data or introduce lag or data drops), since there is only one Python interpreter on a single core that takes care of both the CPU computations and the checking for socket data (asyncio merely schedules I/O more efficiently).

I know there is the ib_insync library, which implements cooperative multitasking (asyncio) in Python, but will the overall program be fast enough if you do computations on large dataframes? I read that at the socket level there might not be any limitations ("the socket itself is actually thread-safe"). I do not assume that Python has it implemented this way, since the Global Interpreter Lock is in place (single thread, synchronous).

IB.sleep(0) may be an alternative, but I do not think a Pandas groupby() can be interrupted once called.
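One common way around exactly this problem is to keep the event loop for I/O and push the blocking computation into a separate process. Below is a minimal sketch (all names are illustrative, and a plain CPU-bound function stands in for a long pandas groupby(), which would be submitted the same way):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def heavy_compute(values):
    # Stand-in for a long pandas computation; it runs in a separate
    # process, so it does not hold this interpreter's GIL.
    return sum(v * v for v in values)

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=1) as pool:
        # The event loop stays free to service socket callbacks while
        # the worker process grinds through the computation.
        result = await loop.run_in_executor(pool, heavy_compute, range(1000))
    return result

if __name__ == "__main__":
    print(asyncio.run(main()))  # 332833500
```

With this pattern the groupby() does not need to be interruptible at all; it simply runs on another core while the loop keeps receiving data.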

Should I use the multiprocessing library and run every task in its own process (dataframe calculations, MySQL reads/writes, sending and receiving orders and data)?
Or should I combine some things using another library (some next-level threading that can be combined with Python's normal interpretation of code, or built-in async versions of Scikit-Learn, Pandas, etc. functions, if I understand correctly)? I think this is next-level and may be overkill.

Or should I stick to Java (I do not know whether there is a Pandas-like alternative that allows easy data manipulation and machine learning, for example, but threading may be organized better there)?

Any suggestions or tutorials?

(I read about another framework as well, but that is a whole different story, since it is not directly applicable to the IB TWS API, and it may be prone to delays from single-core processing/interpreting by Python too, if I understand correctly.)


 

I don't know how your overall design is structured, but indeed there are performance issues due to Python's GIL, so you do need to use multiprocessing. But I believe you can get a lot of performance just using Python + multiprocessing. Many of the core building blocks in numpy/pandas are highly optimized, so it is unlikely your performance will suffer because of them (unless, of course, you are extremely latency-sensitive). I use IB for orders but a different third-party data provider. What I have is:
#1) First, a process that starts IB as a background thread. This process creates the IB thread with all connections established, then enters a wait loop forever, waking up periodically.
#2) A separate process that handles data feeds, runs model predictions, etc., and sends messages to the waiting process. I use multiprocessing shared memory to exchange signal output between the data-feed process and the IB process. The waiting IB process periodically picks items off the queue and posts orders to IB. Just use locks as appropriate. (If you need IB for the feed too, you can still do it by making the waiting process query and receive data in an async fashion. Locks and shared memory are your best friends here.)

It takes a bit to get this going, but I want to stick to Python, and this design has worked for me so far (I haven't tested it with a lot of symbols, but at that point I suspect the bottleneck would be data + model evaluation).
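The two-process layout above can be sketched roughly like this (all names are my own, not from the poster's code, and a queue stands in for the shared-memory channel; the real IB side would post the orders to TWS instead of formatting strings):

```python
from multiprocessing import Process, Queue

def model_process(signals: Queue):
    # Stand-in for the data-feed / model-prediction process (#2).
    for symbol in ("AAPL", "MSFT"):
        signals.put({"symbol": symbol, "action": "BUY", "qty": 10})
    signals.put(None)  # sentinel: no more signals

def ib_process(signals: Queue, results: Queue):
    # Stand-in for the waiting IB process (#1): block on the queue,
    # "post" each order as it arrives.
    while True:
        signal = signals.get()  # real code would use get(timeout=...) in a wait loop
        if signal is None:
            break
        results.put(f"order sent: {signal['action']} {signal['qty']} {signal['symbol']}")

if __name__ == "__main__":
    signals, results = Queue(), Queue()
    procs = [Process(target=model_process, args=(signals,)),
             Process(target=ib_process, args=(signals, results))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    while not results.empty():
        print(results.get())
```

A multiprocessing.Queue already serializes access internally; explicit locks become necessary once you move to raw shared memory instead.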


 

I love Python, but it has its place, and that place is not real-time-ish parallel compute. You can hack around things, but in the end it won't be great.

Among mainstream languages you can consider Go and Node.js (apart from C++).
I personally always wanted to try Elixir but never got around to it; it's perfect for highly concurrent stuff.

On Sat, Mar 13, 2021 at 2:52 PM matt <rmdeboer82@...> wrote:


 

You can also think of using something like ZeroMQ as your messaging/concurrency backbone.
Python has a bindings module for it: pyzmq.

On Wed, Mar 17, 2021 at 2:40 AM Alex Gorbachev <ag@...> wrote:


 

Thank you Alex, for both suggestions. I will have a look at them.


 

Thank you Kris K,

I tried suggestion #1 using a minimal IB program and put it in a Process, like this:

from multiprocessing import Process

from ibapi.wrapper import EWrapper
from ibapi.client import EClient
from ibapi.utils import iswrapper
from ibapi.common import *
from ibapi.contract import *
from ibapi.ticktype import *

# Request IB data in less than 50 lines of code
class BasicApp(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)

    def error(self, reqId: TickerId, errorCode: int, errorString: str):
        print('Error:', reqId, " ", errorCode, " ", errorString)

    @iswrapper
    def tickPrice(self, reqId: TickerId, tickType: TickType, price: float, attrib: TickAttrib):
        super().tickPrice(reqId, tickType, price, attrib)
        print("Tick Price. Ticker Id:", reqId, "tickType:", tickType, "Price:", price,
              "CanAutoExecute:", attrib.canAutoExecute, "PastLimit:", attrib.pastLimit)

    @iswrapper
    def tickSize(self, reqId: TickerId, tickType: TickType, size: int):
        super().tickSize(reqId, tickType, size)
        print("Tick Size. Ticker Id:", reqId, "tickType:", tickType, "Size:", size)

    @iswrapper
    def tickString(self, reqId: TickerId, tickType: TickType, value: str):
        super().tickString(reqId, tickType, value)
        print("Tick string. Ticker Id:", reqId, "Type:", tickType, "Value:", value)

    @iswrapper
    def tickGeneric(self, reqId: TickerId, tickType: TickType, value: float):
        super().tickGeneric(reqId, tickType, value)
        print("Tick Generic. Ticker Id:", reqId, "tickType:", tickType, "Value:", value)

def main():
    app = BasicApp()
    app.connect("127.0.0.1", 7497, 0)

    contract = Contract()
    contract.symbol = "EOE"
    contract.secType = "IND"
    contract.exchange = "FTA"
    contract.currency = "EUR"

    app.reqMarketDataType(3)
    app.reqMktData(0, contract, "", False, False, [])

    app.run()

if __name__ == '__main__':
    p = Process(target=main)
    p.start()
    p.join()

This initial basic program appears to be working (now I am going to extend it), but sometimes it gives me this:
unhandled exception in EReader thread
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.8/site-packages/ibapi-9.76.1-py3.8.egg/ibapi/reader.py", line 34, in run
    data = self.conn.recvMsg()
  File "/opt/anaconda3/lib/python3.8/site-packages/ibapi-9.76.1-py3.8.egg/ibapi/connection.py", line 99, in recvMsg
    buf = self._recvAllMsg()
  File "/opt/anaconda3/lib/python3.8/site-packages/ibapi-9.76.1-py3.8.egg/ibapi/connection.py", line 119, in _recvAllMsg
    buf = self.socket.recv(4096)
ConnectionResetError: [Errno 54] Connection reset by peer
^CTraceback (most recent call last):
  File "minimal_process.py", line 57, in <module>
    p.join()
  File "/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
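The second traceback is simply Ctrl-C landing inside p.join(). One way to shut down more cleanly (a sketch with a stand-in worker, not tied to ibapi) is to catch KeyboardInterrupt in the parent and stop the child deliberately; inside main(), calling app.disconnect() in a finally block would likewise give the EReader thread a cleaner exit than a hard connection reset:

```python
from multiprocessing import Process
import time

def worker():
    # Stand-in for main() running app.run() until stopped.
    time.sleep(0.2)

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    try:
        p.join()
    except KeyboardInterrupt:
        p.terminate()  # stop the child instead of propagating the traceback
        p.join()
    print("exited, exitcode:", p.exitcode)
```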


Matthias Frener
 

and maybe to add a #3):
build a microservice architecture.

I'm on Node.js, and the only way to scale there is running multiple processes. You can do this with a controller/worker concept (e.g. via shared memory, as mentioned), or you can isolate things even further and treat every task as an independent service with a defined interface. My "trading app" backend consists of an nginx reverse proxy, an auth service for login, a user-management service, a portfolio service to watch the portfolio, a ticker service to watch tickers, an IB-API service that converts the TWS API into a REST/WebSocket API, a trade-log service, etc. A lot of small processes, each with a very specific task, connected to each other via their APIs. Every service defines an OpenAPI (Swagger) model, and whoever needs it generates client code from the JSON and uses it. It's scalable across CPUs, machines, and data centers, and it makes it easy to implement failover (when machine 1 is down, fall back to machine 2) or to scale dynamically (start more processes, more machines, or more clouds, depending on how much power you need).
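To make the per-service idea concrete in Python terms (standard library only; the service name, route, and quote value are made up): each microservice is just a small HTTP process exposing a narrow API, e.g. a ticker service with a single endpoint.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class TickerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One narrow responsibility: serve the latest quote for a symbol.
        if self.path == "/ticker/EOE":
            body = json.dumps({"symbol": "EOE", "last": 731.25}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

def start_service(port=0):
    # port=0 lets the OS pick a free port; server.server_port holds it.
    server = HTTPServer(("127.0.0.1", port), TickerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    import urllib.request
    srv = start_service()
    url = f"http://127.0.0.1:{srv.server_port}/ticker/EOE"
    print(urllib.request.urlopen(url).read().decode())
    srv.shutdown()
```

Other services (portfolio, trade log, IB gateway) would be separate processes with their own endpoints, glued together behind the reverse proxy.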


 

Thank you Matthias, that's an interesting view, and it has the advantages you describe. I will have a look at it.