
Ib 1m hist #401

Merged
goodboy merged 40 commits into master from ib_1m_hist
Oct 29, 2022

Conversation

Contributor

@goodboy goodboy commented Sep 24, 2022

~~Super WIP~~ Ready for action, but just a start at doing a 1m long term (slow) chart alongside our classic 1s OHLC in the fast chart.

This adjusts our history loading in the data feed layer (`piker.data.feed`) to do multi-time-frame data loading concurrently and in a highly reliable manner such that both sample rates can be stored in the tsdb as well as explicitly queried, loaded and processed in shared mem arrays.


ib related

Since ib is the only currently supported backend with 1s OHLC history, this patch focuses on it but contains the necessary adjustments to handle backends (like all the crypto$) which don't have this support (at least not without us writing our own sampler). When a backend doesn't have 1s OHLC history the fast chart simply starts empty and begins filling when the brokerd feed is first booted - during the pikerd parent's lifetime.

Further enhancements in this backend include:

  • way better and faster history loading by re-jigging the data feed reset hack task-concurrency (6 years spy loaded to marketstore in like, <= 2mins 🥳 )
    • adjust the frame query timeout to 3s
    • also includes a mutex around the reset hack request (task) to support multi-symbol backfilling from multiple clients
  • add back support for the ad-hoc symbol table system for things like bitcoin futes (brr.cmecrypto) which seem to require inconsistent contract params when selecting in ib_insync
  • (28535fa) add feed reconnect task which reloads whenever a network/feed reset event is detected to avoid hanging for whatever internal timeout-reset ib's api does..
  • api adjustments to get the first datetime from Client.get_head_time() with an fqsn input str and use this stamp as the earliest stamp allowed before raising DataUnavailable to the history mgmt layer
  • also add a "no data"-for-x-queries threshold where after 6 days worth of empty frame-results we presume the contract has no earlier history and we also raise a DataUnavailable
  • a variety of other small backend-internal improvements to the history loading apis and mechanics to support the above.
  • (ceca0d9) some tweaks to trades ledger parsing/loading as briefly mentioned in Order ledger entries by processed datetime #412 but which need to land with this change set.

The summary of enhancements and bug fixes is more or less in the todo section below:


TODO:

  • avoid throttle state condition that shows up with too many open data reset hack requests:
    • use global (mutex) state var to support simultaneous contract queries (dabb9e8)
  • improve the 1m loading algo to avoid slow waits on queries after 3s (completed and working well after dabb9e8)
    • multi-contract history queries need a mutex around the data reset hackery
    • better task conc around data resets: one task for query, one for gw reset poll loop
  • actually load a day (or more)'s worth of 1s history (went with 6d if the tsdb is up and 1d if not)
    • if the backend supports 1s OHLC write loaded frames to tsdb
    • handle providers who don't support 1s

@goodboy goodboy force-pushed the ib_1m_hist branch 3 times, most recently from a4c67ea to 12de756 Compare September 30, 2022 21:24
@goodboy goodboy requested a review from guilledk September 30, 2022 21:38
@goodboy goodboy force-pushed the ib_1m_hist branch 4 times, most recently from 4d7adae to d2b6216 Compare October 10, 2022 13:27
@goodboy goodboy marked this pull request as ready for review October 26, 2022 16:07
@goodboy goodboy mentioned this pull request Oct 27, 2022
datetime, # start
datetime, # end
]:
if timeframe != 60:
Contributor Author

So this is how we indicate that a brokerd can't deliver 1s OHLC.
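A minimal sketch of that signal path; `DataUnavailable` and the fetcher shape are paraphrased from this PR's description, while the function name and return layout here are illustrative:

```python
import asyncio


class DataUnavailable(Exception):
    '''Signal that a sample period has no (more) history.'''


async def get_hist(timeframe: float, end_dt=None, start_dt=None):
    # hypothetical backend which only serves 1m bars: any other
    # sample period is reported as unavailable so the history mgmt
    # layer leaves that shm buffer empty until the rt feed fills it
    if timeframe != 60:
        raise DataUnavailable(f'no {timeframe}s OHLC history')
    return [], None, None
```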

to_prepend = ohlcv[ohlcv['time'] < ts['Epoch'][0]]

profiler('Finished db arrays diffs')
# for secs in (1, 60):
Contributor Author

Yeah just ignore most of this stuff since it's part of provisional tools we likely want for interactive tsdb mucking.

It relies on goodboy/tractor#306 which is still far from ready 😂


godwidget.resize_all()

await link_views_with_region(
Contributor Author

this factoring was mostly for sanity and because i don't even really get what layer this whole subsystem fits in 😂

@goodboy
Contributor Author

goodboy commented Oct 28, 2022

Just pushed a bad timeframe wiper hack @guilledk

Allow data feed sub-system to specify the timeframe (aka OHLC sample
period) to the `open_history_client()` delivered history fetching API.

Factor the data keycombo hack into a new routine to be used also from
the history backfiller code when request latency increases; there is
a first draft at trying to use the feed reset to speed up 1m frame
throttling by timing out on the history frame response, but it needs
a lot of fine tuning.

The `Storage.load()`, `.read_ohlcv()`, `.write_ohlcv()` and
`.delete_ts()` methods can now take a `timeframe: Optional[float]` param which
is used to look up the appropriate sampling period table-key from
`marketstore`.

Adjust all history query machinery to pass a `timeframe: int` in seconds
and set default of 60 (aka 1m) such that history views from here forward
will be 1m sampled OHLCV. Further when the tsdb is detected as up load
a full 10 years of data if possible on the 1m - backends will eventually
get a config section (`brokers.toml`) that allows users to tune this.

Manual tinker-testing demonstrated that triggering data resets
completely independent of the frame request gets more throughput and
further, that repeated requests (for the same frame after cancelling on
the `trio`-side) can yield duplicate frame responses. Re-work the
dual-task structure to instead have one task wait indefinitely on the
frame response (and thus not trigger duplicate frames) and the 2nd data
reset task poll for the first task to complete in a poll loop which
terminates when the frame arrives via an event.

Dirty deatz:
- make `get_bars()` take an optional timeout (which will eventually be
  dynamically passed from the history mgmt machinery) and move request
  logic inside a new `query()` closure meant to be spawned in a task
  which sets an event on frame arrival, add data reset poll loop in the
  main/parent task, deliver result on nursery completion.
- handle frame request cancelled event case without crash.
- on no-frame result (due to real history gap) hack in a 1 day decrement
  case which we need to eventually allow the caller to control likely
  based on measured frame rx latency.
- make `wait_on_data_reset()` a predicate without output indicating
  reset success as well as `trio.Nursery.start()` compat so that it can
  be started in a new task with the started values yielded being
  a cancel scope and completion event.
- drop the legacy `backfill_bars()`, no longer used.
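The reworked task structure can be sketched with `asyncio` standing in for the PR's `trio` nursery + cancel scope; all names below are illustrative, not the actual `get_bars()` internals:

```python
import asyncio


async def get_frame_with_resets(get_frame, do_reset, poll_period: float):
    '''One task waits indefinitely on the (single) frame response so no
    duplicate requests are sent; the parent polls, firing a data reset
    each interval until the frame-arrived event is set.'''
    done = asyncio.Event()
    result = {}

    async def query():
        result['frame'] = await get_frame()
        done.set()

    task = asyncio.ensure_future(query())
    while not done.is_set():
        try:
            # poll for query completion..
            await asyncio.wait_for(done.wait(), poll_period)
        except asyncio.TimeoutError:
            # ..frame still pending: trigger (another) data reset
            await do_reset()
    await task
    return result['frame']
```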
It doesn't seem to be any slower on our least throttled backend
(binance) and it removes a bunch of hard to get correct frame
re-ordering logic that i'm not sure really ever fully worked XD

Commented some issues we still need to resolve as well.

When we get a timeout or a `NoData` condition still return a tuple of
empty sequences instead of `None` from `Client.bars()`. Move the
sampling period-duration table to module level.
This allows the history manager to know the decrement size for
`end_dt: datetime` on the next query if a no-data / gap case was
encountered; subtract this in `get_bars()` in such cases. Define the
expected `pendulum.Duration`s in the `.api._samplings` table.

Also add a bit of query latency profiling that we may use later to more
dynamically determine timeout driven data feed resets. Factor the `162`
error cases into a common exception handler block.
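The decrement amounts to a table lookup keyed by sample period; a sketch with stdlib `timedelta` standing in for the `pendulum.Duration`s and illustrative frame durations (the real table lives in `.api._samplings`):

```python
from datetime import datetime, timedelta

# sample period (secs) -> duration of one history frame;
# these durations are illustrative stand-ins
_samplings: dict[int, timedelta] = {
    1: timedelta(hours=1),
    60: timedelta(days=1),
}


def next_end_dt(end_dt: datetime, timeframe: int) -> datetime:
    '''On a no-data/gap result, step the query window back by one
    frame duration so the next request probes earlier history.'''
    return end_dt - _samplings[timeframe]
```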
Must have gotten left in during refactor from the `trimeter` version?

Drop down to 6 years for 1m sampling.

Allows for easier restarts of certain `trio` side tasks without killing
the `asyncio`-side clients; support via flag.

Also fix a bug in `Client.bars()`: we need to return the duration on the
empty bars case..

When a network outage or data feed connection is reset often the
`ib_insync` task will hang until some kind of (internal?) timeout takes
place or, in some (worst) cases it never re-establishes (the event
stream) and thus the backend needs to restart or the live feed will
never resume..

In order to avoid this issue once and for all this patch implements an
additional (extremely simple) task that is started with the real-time
feed and simply waits for any market data reset events; when detected
restarts the `open_aio_quote_stream()` call in a loop using
a surrounding cancel scope.

Been meaning to implement this for ages and it's finally working!
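The reconnect task boils down to: start the stream, wait for a reset event, then cancel and restart. An `asyncio` sketch of that shape (the PR wraps `open_aio_quote_stream()` in a `trio`-side cancel scope; the names and finite event source below are illustrative):

```python
import asyncio


async def maintain_feed(open_stream, reset_events) -> int:
    '''(Re)start the quote stream, tearing down and reconnecting on
    every market-data reset event; returns the (re)start count.'''
    starts = 1
    stream = asyncio.ensure_future(open_stream())
    async for _ in reset_events:
        # reset detected: cancel the (possibly hung) stream..
        stream.cancel()
        try:
            await stream
        except asyncio.CancelledError:
            pass
        # ..and reconnect by restarting the stream task
        stream = asyncio.ensure_future(open_stream())
        starts += 1
    stream.cancel()
    try:
        await stream
    except asyncio.CancelledError:
        pass
    return starts
```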
Allows keeping mutex state around data reset requests which (if more
than one are sent) can cause a throttling condition where ib's servers
will get slower and slower to conduct a reconnect. With this you can
have multiple ongoing contract requests without hitting that issue and
we can go back to having a nice 3s timeout on the history queries before
activating the hack.
Our default sample periods are 60s (1m) for the history chart and 1s for
the fast chart. This patch adds concurrent loading of both (or more)
different sample period data sets using the existing loading code but
with new support for looping through a passed "timeframe" table which
points to each shm instance.

More detailed adjustments include:
- breaking the "basic" and tsdb loading into 2 new funcs:
  `basic_backfill()` and `tsdb_backfill()` the latter of which is run
  when the tsdb daemon is discovered.
- adjust the fast shm buffer to offset with one day's worth of 1s so
  that only up to a day is backfilled as history in the fast chart.
- adjust bus task starting in `manage_history()` to deliver back the
  offset indices for both fast and slow shms and set them on the
  `Feed` object as `.izero_hist/rt: int` values:
  - allows the chart-UI linked view region handlers to use the offsets
    in the view-linking-transform math to index-align the history and
    fast chart.
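The timeframe-table loop can be pictured roughly as below; plain lists stand in for the shm buffers and the periods are shown sequentially even though the PR loads them concurrently (names illustrative):

```python
import asyncio


async def backfill_all(load_history, shms: dict[int, list]) -> dict[int, int]:
    '''Backfill each sample period's buffer via one shared loading
    routine, driven by a passed timeframe -> shm table.'''
    filled: dict[int, int] = {}
    for timeframe, shm in shms.items():
        frames = await load_history(timeframe)
        shm[:0] = frames  # prepend history ahead of any rt data
        filled[timeframe] = len(frames)
    return filled
```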
Turns out querying for a high freq timeframe (like 1sec) will still
return a lower freq timeframe (like 1Min) SMH, and no idea if it's the
server or the client's fault, so we have to explicitly check the sample
step size and discard lower freq series-results. Do this inside
`Storage.read_ohlcv()` and return an empty `dict` when the wrong time
step is detected from the query result.

Further enforcements,
- both `.load()` and `read_ohlcv()` now require an explicit `timeframe:
  int` input to guarantee the time step of the output array.
- drop all calls to `.load()` with non-timeframe-specific input.
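The step guard reduces to comparing the epoch-column delta against the requested period; a standalone sketch (the real check lives inside `Storage.read_ohlcv()` over a `marketstore` result, the helper name here is made up):

```python
def enforce_sample_step(epochs: list[float], timeframe: int) -> list[float]:
    '''Return the series only if its time step matches the requested
    sample period; otherwise discard it (empty result), since e.g.
    a 1s query can come back 1m-sampled.'''
    if len(epochs) > 1 and (epochs[1] - epochs[0]) != timeframe:
        return []
    return epochs
```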
If a history manager raises a `DataUnavailable` just assume the sample
rate isn't supported and that no shm prepends will be done. Further seed
the shm array in such cases as before from the 1m history's last datum.

Also, fix tsdb -> shm back-loading, cancelling tsdb queries when either
no array-data is returned or a frame is delivered which has a start time
no earlier than the oldest already retrieved. Use strict timeframes for every
`Storage` API call.

Factor the multi-sample-rate region UI connecting into a new helper
`link_views_with_region()` which reads in the shm buffer offsets from
the `Feed` and appropriately connects the fast and slow chart handlers
for the linear region graphics. Add detailed comments writeup for the
inter-sampling transform algebra.

Not only improves startup latency but also avoids a bug where the rt
buffer was being tsdb-history prepended *before* the backfilling of
recent data from the backend was complete, resulting in out-of-order
frames in shm.

There never was any underlying db bug; it was a hardcoded timeframe in
the column series write key.. Now we always assert a matching timeframe
in results.

To make it easier to manually read/decipher long ledger files this adds
`dict` sorting based on record-type-specific (api vs. flex report)
datetime processing prior to ledger file write.

- break up parsers into separate routines for flex and api record
  processing.
- add `parse_flex_dt()` for special handling of the weird semicolon
  stamps in flex reports.
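A rough sketch of that helper; the exact `YYYYMMDD;HHMMSS` stamp layout is an assumption from the "semicolon stamps" description, with stdlib `strptime` standing in for whatever the real `parse_flex_dt()` uses:

```python
from datetime import datetime


def parse_flex_dt(stamp: str) -> datetime:
    '''Parse a flex-report style 'YYYYMMDD;HHMMSS' stamp (assumed
    layout) into a ``datetime`` usable as a ledger sort key.'''
    date, _, time = stamp.partition(';')
    return datetime.strptime(date + time, '%Y%m%d%H%M%S')
```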
@goodboy goodboy changed the base branch from master to even_moar_kraken_order_fixes October 28, 2022 21:34
Base automatically changed from even_moar_kraken_order_fixes to master October 28, 2022 23:52
@goodboy goodboy merged commit 11ecf9c into master Oct 29, 2022
@goodboy goodboy deleted the ib_1m_hist branch October 29, 2022 17:14