
Re: Welcome to [email protected]


 


Dask lets you distribute computation across a cluster. It basically lets you work with dataframes larger than your available memory. How? By applying the computation first and reducing the result until it is small enough before you receive it.

So you can store high-frequency price data and just get the last X bars, for example.
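
To illustrate the "reduce it before you receive it" idea, here is a minimal stdlib-only sketch. It is not Dask's API, just the same principle on a single machine: rows are streamed one at a time and only the last few are kept. The `last_bars` helper and the generated price data are hypothetical, made up for the example.

```python
import csv
import io
from collections import deque


def last_bars(rows, n):
    """Stream rows one at a time and keep only the final n of them."""
    # deque(maxlen=n) discards the oldest row whenever a new one arrives,
    # so memory use stays proportional to n, not to the size of the input.
    return list(deque(rows, maxlen=n))


# Hypothetical price file: 10,000 bars generated in memory for the demo.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["bar", "close"])
for i in range(10_000):
    writer.writerow([i, 100 + i % 7])
buffer.seek(0)

# DictReader yields rows lazily, so the full file is never held as a list.
reader = csv.DictReader(buffer)
tail = last_bars(reader, 3)
print([row["bar"] for row in tail])  # ['9997', '9998', '9999']
```

Dask applies the same idea per partition and across machines, so only the reduced result travels back to you.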

I’m not there yet, I’m mainly using daily data. And parquet (the pystore storage backend) is really good at keeping files small. I have the RUS3000 companies going back 15 years and it’s only 401M.

I have developed a small layer on top of pystore and ib_insync (to be migrated to ib_async), and with it I can keep my data collection updated daily.

For in-memory data, dask is overkill. It’s a subset of pandas with cluster capabilities. But with pystore it offers the possibility to scale up if needed.

The most challenging scenario is data updates, for example adding new data every day. Dask can do this without loading the whole file into memory.
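
One way a daily update can avoid touching old data is to keep the dataset as separate partition files, so that a new day is just a new file. The sketch below shows that principle with plain CSV files and the standard library. It is an assumption for illustration, not how pystore or Dask implement appends, and the helper names, symbols, and prices are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def append_partition(store, day, bars):
    """Write one day's bars as a new partition file; old files are untouched."""
    path = Path(store) / f"{day}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["symbol", "close"])
        writer.writerows(bars)
    return path


def read_latest(store, days):
    """Read only the most recent `days` partitions, oldest first."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    paths = sorted(Path(store).glob("*.csv"))[-days:]
    rows = []
    for path in paths:
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                rows.append((path.stem, row["symbol"], float(row["close"])))
    return rows


# Hypothetical symbols and prices, stored in a temporary directory.
store = tempfile.mkdtemp()
append_partition(store, "2024-05-03", [("AAA", 10.0), ("BBB", 20.0)])
append_partition(store, "2024-05-06", [("AAA", 10.5), ("BBB", 19.5)])

latest = read_latest(store, 1)
print(latest)
```

Reading the latest day opens a single file, however many days are already stored.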

Dask is a big framework: you can distribute your own code across a cluster, and run machine learning on a cluster. I have not used it that way so far, but in principle I could run a few things across the 2-3 PCs that I have here. I have work to do on my algorithm before going there.

On 6 May 2024, at 18:05, Mel <climbermel@...> wrote:

What would be the use of dask when running ib? Would you be able to distribute one login across multiple machines?





-------- Original message --------
From: "Gonzalo Saenz via groups.io" <yo@...>
Date: 2024-05-06 4:23 a.m. (GMT-08:00)
Subject: Re: [ib-async] Welcome to [email protected]

Well I think pystore is a good package.

As I said, I faced a few issues recently, mainly because pandas and dask moved from fastparquet to pyarrow, which has broken a few things. The package is not under much maintenance, which is not great.

But since it is a relatively simple package, I managed to make the necessary changes to get it working.

This is my pull request on pystore to move to pyarrow, so if you want to test it, pull from there. The official package is broken unless you find the right combination of pandas, dask and pyarrow.

It’s a lot simpler than having a database, and VERY scalable because it works with dask


On 4 May 2024, at 09:23, Mel <climbermel@...> wrote:

I've been using MySQL, but you need a database server running. I have one on all the time for various things, so I just use it. I'm just interested in how it works and how it stores the data.





-------- Original message --------
From: "Gonzalo Saenz via" <yo@...>
Date: 2024-05-03 11:36 p.m. (GMT-08:00)
Subject: Re: [ib-async] Welcome to [email protected]

Yes, I’m using pystore

I’m quite happy with it, except for some problems I found recently. Unfortunately it does not seem to be under active development. I have sent a pull request to get it working with the latest pandas and dask, but no answer so far.

I store daily price bars for many stocks and ETFs and it works quite well. The library is really simple so it’s easy to go through the code if needed.

I haven’t seen many options available in this space. There is ArcticDB, but it’s not open source, and from what I understand a “production” DB requires a paid license. So I won’t spend time building my code around it.

Do you know any other alternatives in this space?

Regards,
Gonzalo

On 3 May 2024, at 06:07, Mel <climbermel@...> wrote:

Welcome. Have you used Pystore? Sounds interesting.



