Dask lets you distribute computation across a cluster. Basically, it lets you work with dataframes larger than your available memory. How? By applying the computation first and making the result small enough before you receive it.
So you can store high-frequency price data and, for example, just retrieve the last X bars.
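A minimal plain-Python sketch of the "reduce before you receive" idea (this is not Dask's API): the data arrives one partition at a time, standing in for files on disk, and only the last `n` bars are ever held in memory. The `last_bars` and `fake_partitions` names and the (date, close) bar layout are hypothetical. In Dask itself, `DataFrame.tail(n)` by default only looks at the last partition.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Bar = Tuple[str, float]  # hypothetical minimal bar: (ISO date, close)

def last_bars(partitions: Iterable[Iterable[Bar]], n: int) -> List[Bar]:
    """Stream partitions one at a time and keep only the last n bars."""
    window: Deque[Bar] = deque(maxlen=n)  # older bars fall off automatically
    for part in partitions:
        window.extend(part)
    return list(window)

def fake_partitions() -> Iterator[List[Bar]]:
    # Stand-in for reading one file/partition at a time from disk.
    yield [("2024-05-01", 100.0), ("2024-05-02", 101.5)]
    yield [("2024-05-03", 99.8), ("2024-05-06", 102.1)]

print(last_bars(fake_partitions(), 3))
```

Memory use is bounded by `n`, not by the size of the stored history.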
I’m not there yet; I’m mainly using daily data. And parquet (the pystore storage backend) is really good at keeping files small: I have the Russell 3000 (RUS3000) companies going back 15 years and it’s only 401 MB.
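For scale, a back-of-envelope estimate of the same dataset stored raw. Only the ticker count and the 15 years come from the post; 252 trading days per year, six 8-byte columns (date, open, high, low, close, volume) and full history for every ticker are assumptions. The last one makes this an upper bound, since many current index members have less than 15 years of data.

```python
# Rough uncompressed size of the daily-bar dataset (inputs are assumptions).
tickers = 3_000          # roughly the Russell 3000 universe
years = 15
trading_days = 252       # approximate US trading days per year
columns = 6              # assumed: date, open, high, low, close, volume
bytes_per_value = 8      # 64-bit timestamp / float / int

rows = tickers * years * trading_days
raw_bytes = rows * columns * bytes_per_value
print(f"{rows:,} rows, ~{raw_bytes / 1e6:.0f} MB uncompressed")
```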
I have developed a small layer on top of pystore and ib_insync (to be migrated to ib_async), and with it I update my data collection daily.
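One piece of such a daily update is deciding how much history to request. A hypothetical helper (not part of pystore or ib_insync) that turns the gap since the last stored bar into an IB-style duration string, which could be passed as `durationStr` to ib_insync's `IB.reqHistoricalData(...)` with `barSizeSetting='1 day'`. It deliberately requests a couple of days of overlap, assuming duplicates are dropped when the new bars are appended.

```python
from datetime import date

def duration_for_update(last_stored: date, today: date, overlap_days: int = 2) -> str:
    """Return an IB-style durationStr (e.g. '5 D') covering the gap since the last stored bar.

    Assumes a short gap (a daily job); for long gaps IB may require W/M/Y units instead.
    """
    gap = (today - last_stored).days
    if gap <= 0:
        return ""  # nothing new to request
    return f"{gap + overlap_days} D"

print(duration_for_update(date(2024, 5, 3), date(2024, 5, 6)))
```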
For in-memory data, Dask is overkill. Its DataFrame is a subset of the pandas API with cluster capabilities. But for pystore it offers the possibility to scale up if needed.
The most challenging scenario is data updates, for example adding new data every day. Dask can do this without loading the whole file into memory.
Dask is a big framework: you can distribute your own code across a cluster and run machine learning on it. I have not used that so far, but in principle I could run a few things across the 2-3 PCs I have here. I have work to do on my algorithm before getting there.
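Distributing your own code in Dask typically goes through `dask.distributed.Client`, whose `submit(func, *args)` returns futures with a `.result()` method, modeled on Python's `concurrent.futures`. A single-machine stdlib sketch of that pattern; `backtest` and the sample data are hypothetical. Moving it to several PCs would mean replacing the local pool with a `Client` connected to a Dask scheduler (e.g. `Client("tcp://scheduler-host:8786")`, host name hypothetical).

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List

def backtest(ticker: str, closes: List[float]) -> float:
    """Hypothetical per-ticker job; here it just averages the closes."""
    return mean(closes)

data: Dict[str, List[float]] = {
    "AAA": [10.0, 11.0, 12.0],
    "BBB": [20.0, 22.0],
}

# Submit one independent task per ticker, then collect the results.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {t: pool.submit(backtest, t, c) for t, c in data.items()}
    results = {t: f.result() for t, f in futures.items()}

print(results)
```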
On 6 May 2024, at 18:05, Mel <climbermel@...> wrote:
What would be the use of Dask when running IB? Would you be able to distribute one login across multiple machines?
-------- Original message --------
From: "Gonzalo Saenz via groups.io" <yo@...>
Date: 2024-05-06 4:23 a.m. (GMT-08:00)
Well, I think pystore is a good package.
As I said, I faced a few issues recently, mainly because pandas and dask moved from fastparquet to pyarrow, which broke a few things. The package is not under much maintenance, which is not great.
But since it is a relatively simple package, I managed to make the necessary changes to get it working.
This is my pull request on pystore to move to pyarrow, so if you want to test it, pull from there, because the official package is broken unless you find the right combination of pandas, dask and pyarrow.
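When chasing the right combination, it helps to know exactly what is installed. pandas' `read_parquet`/`to_parquet` take an `engine` argument (`'auto'`, `'pyarrow'` or `'fastparquet'`), and with `'auto'` pandas tries pyarrow first and falls back to fastparquet. A small stdlib-only sketch (helper names hypothetical) that reports package versions and the available parquet engines, e.g. for a bug report or for pinning a known-good set; with pandas installed, the engine can then be pinned explicitly, as in `pd.read_parquet(path, engine="pyarrow")`.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List

# Packages whose versions have to line up for pystore to work.
STACK = ("pystore", "pandas", "dask", "pyarrow", "fastparquet")

def stack_versions(packages=STACK) -> Dict[str, str]:
    """Installed version of each package, or 'not installed'."""
    found: Dict[str, str] = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = "not installed"
    return found

def parquet_engines() -> List[str]:
    """Parquet engines importable here (values pandas accepts for engine=)."""
    return [name for name in ("pyarrow", "fastparquet")
            if importlib.util.find_spec(name) is not None]

for pkg, ver in stack_versions().items():
    print(f"{pkg}: {ver}")
print("parquet engines:", parquet_engines())
```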
It’s a lot simpler than running a database, and VERY scalable because it works with Dask.
On 4 May 2024, at 09:23, Mel <climbermel@...> wrote:
I've been using MySQL, but you need a database server running. I have one on all the time for various things, so I just use it. I'm just interested in how it works and how it stores the data.
-------- Original message --------
From: "Gonzalo Saenz via groups.io" <yo@...>
Date: 2024-05-03 11:36 p.m. (GMT-08:00)
Yes, I’m using pystore.
I’m quite happy with it, except for some problems I found recently. Unfortunately, it does not seem to be under active development; I have sent a pull request to get it working with the latest pandas and dask, but no answer so far.
I store daily price bars for many stocks and ETFs and it works quite well. The library is really simple, so it’s easy to go through the code if needed.
I haven’t seen many options in this space. There is ArcticDB, but it’s not open source, and from what I understand a “production” DB requires a paid license. So I won’t spend time building my code around it.
Do you know any other alternatives in this space?
Regards,
Gonzalo
On 3 May 2024, at 06:07, Mel <climbermel@...> wrote:
Welcome. Have you used Pystore? Sounds interesting.