Jurgen,
Yes, it is definitely a complicated task, with a lot to consider in both the backtest and the live application. I do need to compare the data streams from reqHistoricalTicks() and reqTickByTick() and check feasibility there.

My strategy is similar to another machine learning classification model I wrote for 1-minute OHLCV bars, which mapped a bunch of different features and ran in real time. That was straightforward, since you can stream reqHistoricalData() in real time, pick up each new bar update on the minute, and feed it through the model to get a prediction for the next candle (up or down), then form a strategy around existing positions that way.

With this one, I plan to center the classification model around 'large' orders. It answers the same question, up or down, but I want to pass a new x-vector through my trained model on each 'large' order. Assume it's a basic logistic regression binary classification problem, so the model produces a signal every time the large-order threshold is met. I will obviously set a threshold for 'large', probably a dynamic one based on current tick-by-tick volumes, say outside 2 standard deviations, for example. I want to avoid bar-like data structures such as tick bars, volume bars, imbalance bars, etc.

I said a basic logistic regression binary classification model, but in reality it will probably end up being a multi-class classification model using an LSTM-like architecture, with classes like long, short, and do nothing. It would take a multi-dimensional vector packing in all the tick data between large orders, plus the associated mapped features, and feed that into the next signal. None of the strategy matters, though, if the data I use in real time is not close to the data I backtest on.
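To make the dynamic 'large' threshold concrete, here is a minimal sketch of one way it could work. It is only an illustration under assumptions that are not settled above: 'outside 2 standard deviations' is read as size > rolling mean + 2 * rolling std, the window is a fixed number of recent ticks, and the class name, window length, and warm-up count are all placeholders. It takes plain trade sizes, so it does not depend on which API call produced them.

```python
from collections import deque
from math import sqrt


class LargeOrderDetector:
    """Flags a tick as 'large' when its size exceeds the rolling
    mean + k * std of the ticks seen before it (hypothetical helper)."""

    def __init__(self, window=500, k=2.0, min_ticks=30):
        self.sizes = deque(maxlen=window)  # most recent tick sizes only
        self.k = k                         # number of standard deviations
        self.min_ticks = min_ticks         # warm-up before any flag is raised

    def update(self, size):
        """Return True if `size` is large relative to the prior window."""
        is_large = False
        n = len(self.sizes)
        if n >= self.min_ticks:
            mean = sum(self.sizes) / n
            var = sum((s - mean) ** 2 for s in self.sizes) / n
            is_large = size > mean + self.k * sqrt(var)
        # Append after the check so a tick is never compared against itself.
        self.sizes.append(size)
        return is_large


# Toy example: thirty small ticks, one very large tick, then a small one.
detector = LargeOrderDetector(window=50, k=2.0, min_ticks=10)
flags = [detector.update(s) for s in [1, 2, 3] * 10 + [50, 2]]
```

In this toy run only the tick of size 50 is flagged. Recomputing the mean and variance on every tick costs O(window) and is kept for clarity; a running-sum version would be the obvious optimisation if the window is large.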
I am worried that a 'large' order as defined by my backtest data, collected via reqHistoricalTicks(), will lead me to fit feature parameters on values I may never see in real time if I then decide to use reqTickByTick(), which would make a backtest on my current data obsolete. If I started collecting reqTickByTick() data now, it could take years to build a dataset large enough to train on. I hope that clears up my intentions for the strategy, and I appreciate your thoughts.
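A first feasibility check on that mismatch could be as simple as the sketch below. It assumes trade sizes for the same instrument and the same time window have already been saved from both calls into two plain lists; how they are fetched is left out, and the function names and dictionary keys are made up for illustration. The idea is to derive the 'large' threshold from the historical sizes only, then see how often each stream crosses it.

```python
from statistics import mean, pstdev


def large_threshold(sizes, k=2.0):
    """Static 'large' cutoff: mean + k * population std of the sizes."""
    return mean(sizes) + k * pstdev(sizes)


def compare_streams(hist_sizes, live_sizes, k=2.0):
    """Compare how often each stream exceeds a threshold that was
    fitted on the historical stream only (as a backtest would do)."""
    threshold = large_threshold(hist_sizes, k)

    def large_rate(sizes):
        return sum(s > threshold for s in sizes) / len(sizes)

    return {
        "threshold": threshold,
        "hist_tick_count": len(hist_sizes),
        "live_tick_count": len(live_sizes),
        "hist_large_rate": large_rate(hist_sizes),
        "live_large_rate": large_rate(live_sizes),
    }


# Toy data only: the historical list has a few big prints, the live list has none.
hist = [1] * 95 + [100] * 5
live = [1] * 100
report = compare_streams(hist, live)
```

If the two tick counts or the two large-order rates differ a lot over the same window, features fitted on one stream are unlikely to transfer to the other. If they roughly agree, that is some evidence the existing historical data is still usable.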
Thank you,
Brendan