Thank you both.
So, I'd try to make the smallest possible (but complete) standalone unit test in order to really isolate the issue.
@Buddy, these code snippets are pretty self-explanatory and really stand alone: put the depth bars in a tuple, append the tuple to a list, put the list in a dataframe and save it to a parquet file. Even in the small toy example below, which uses the same logic, there's clearly no way this code flow can fail on its own, even if the input is trash. However, you might be correct that it could come from an unfortunate thread concurrency issue, i.e. a huge amount of data being added unevenly to the tuple/list at the exact moment the dataframe is created, so I might simply need to make a copy of the list first and convert that copy to a dataframe instead of the list itself. I'll try that next week and keep the group posted.
from time import time
import pandas as pd

a = []
a.append((time(), '1', '2', '2'))  # complete row
a.append((time(), '1', 0, '3'))    # mixed types in col2
a.append((time(), '1', '1'))       # short row: col3 missing
a.append((0, 0, None))             # "trash" row: col2 is None, col3 missing
df = pd.DataFrame(a, columns=['time', 'col1', 'col2', 'col3'])
print(df)
output:
           time col1  col2  col3
0  1.698522e+09    1     2     2
1  1.698522e+09    1     0     3
2  1.698522e+09    1     1  None
3  0.000000e+00    0  None  None
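For reference, the "copy the list before converting" idea mentioned above could look roughly like the sketch below. This is only an illustration under assumptions: the names `bars`, `bars_lock`, `add_bar` and `snapshot_to_dataframe` are made up for the example and are not from my actual code, and it assumes the feed thread and the saving code can share a `threading.Lock`.

```python
import threading
from time import time

import pandas as pd

COLUMNS = ['time', 'col1', 'col2', 'col3']

bars = []                     # shared list, appended to by the feed thread
bars_lock = threading.Lock()  # hypothetical lock guarding `bars`


def add_bar(bar):
    """Called from the feed thread: append one complete tuple."""
    with bars_lock:
        bars.append(bar)


def snapshot_to_dataframe():
    """Copy the list under the lock, then build the DataFrame from the copy."""
    with bars_lock:
        snapshot = list(bars)  # shallow copy is enough, the tuples are immutable
    # The DataFrame is built outside the lock, from data that can no longer change.
    return pd.DataFrame(snapshot, columns=COLUMNS)


add_bar((time(), '1', '2', '2'))
add_bar((time(), '1', 0, '3'))
df = snapshot_to_dataframe()
add_bar((time(), '1', '1', '4'))  # arrives after the snapshot, so it is not in df
print(len(df), len(bars))
```

The point of the design is that the dataframe only ever sees a frozen copy, so rows arriving while it is being built or saved simply end up in the next snapshot.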
Anyway, in this case, I would even wonder what the real value of market depth data is on the forex market.
@fordfrog, well, that's precisely what I am trying to find out! :) But for this I first need to extract and save a substantial amount of data in order to study it.