
Ib 1m hist #401

Merged
goodboy merged 40 commits into master from ib_1m_hist
Oct 29, 2022

Conversation

Contributor

@goodboy goodboy commented Sep 24, 2022

~~Super WIP~~ Ready for action, but just a start at doing a 1m long term (slow) chart alongside our classic 1s OHLC in the fast chart.

This adjusts our history loading in the data feed layer (`piker.data.feed`) to do multi-time-frame data loading concurrently and in a highly reliable manner such that both sample rates can be stored in the tsdb as well as explicitly queried, loaded and processed in shared mem arrays.


ib related

Since ib is the only currently supported backend with 1s OHLC history, this patch focuses on it but contains the necessary adjustments to handle backends (like all the crypto$) which don't have this support (at least not without us writing our own sampler). When a backend doesn't have 1s OHLC history the fast chart simply starts empty and begins filling when the brokerd feed is first booted - during the pikerd parent's lifetime.

Further enhancements in this backend include:

  • way better and faster history loading by re-jigging the data feed reset hack task-concurrency (6 years spy loaded to marketstore in like, <= 2mins 🥳 )
    • adjust the frame query timeout to 3s
    • also includes a mutex around the reset hack request (task) to support multi-symbol backfilling from multiple clients
  • add back support for the ad-hoc symbol table system for things like bitcoin futes (brr.cmecrypto) which seem to require inconsistent contract params when selecting in ib_insync
  • (28535fa) add feed reconnect task which reloads whenever a network/feed reset event is detected to avoid hanging for whatever internal timeout-reset ib's api does..
  • api adjustments to get the first datetime from Client.get_head_time() with an fqsn input str and use this stamp as the earliest stamp allowed before raising DataUnavailable to the history mgmt layer
  • also add a "no data"-for-x-queries threshold where after 6 days worth of empty frame-results we presume the contract has no earlier history and we also raise a DataUnavailable
  • a variety of other small backend-internal improvements to the history loading apis and mechanics to support the above.
  • (ceca0d9) some tweaks to trades ledger parsing/loading as briefly mentioned in Order ledger entries by processed datetime #412 but which need to land with this change set.

The summary of enhancements and bug fixes is more or less in the todo section below:


TODO:

  • avoid throttle state condition that shows up with too many open data reset hack requests:
    • use global (mutex) state var to support simultaneous contract queries (dabb9e8)
  • improve the 1m loading algo to avoid slow waits on queries after 3s (completed and working well after dabb9e8)
    • multi-contract history queries need a mutex around the data reset hackery
    • better task conc around data resets: one task for query, one for gw reset poll loop
  • actually load a day (or more)'s worth of 1s history (went with 6d if the tsdb is up and 1d if not)
    • if the backend supports 1s OHLC write loaded frames to tsdb
    • handle providers who don't support 1s

@goodboy goodboy force-pushed the ib_1m_hist branch 3 times, most recently from a4c67ea to 12de756 Compare September 30, 2022 21:24
@goodboy goodboy requested a review from guilledk September 30, 2022 21:38
@goodboy goodboy force-pushed the ib_1m_hist branch 4 times, most recently from 4d7adae to d2b6216 Compare October 10, 2022 13:27
@goodboy goodboy marked this pull request as ready for review October 26, 2022 16:07
@goodboy goodboy mentioned this pull request Oct 27, 2022
datetime, # start
datetime, # end
]:
if timeframe != 60:
Contributor Author

So this is how we indicate that a brokerd can't deliver 1s OHLC.
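A minimal sketch of that signal path; `DataUnavailable` and the fetcher shape are paraphrased from this PR's description, while the function name and return layout here are illustrative:

```python
import asyncio


class DataUnavailable(Exception):
    '''Signal that a sample period has no (more) history.'''


async def get_hist(timeframe: float, end_dt=None, start_dt=None):
    # hypothetical backend which only serves 1m bars: any other
    # sample period is reported as unavailable so the history mgmt
    # layer leaves that shm buffer empty until the rt feed fills it
    if timeframe != 60:
        raise DataUnavailable(f'no {timeframe}s OHLC history')
    return [], None, None
```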

to_prepend = ohlcv[ohlcv['time'] < ts['Epoch'][0]]

profiler('Finished db arrays diffs')
# for secs in (1, 60):
Contributor Author

Yeah just ignore most of this stuff since it's part of provisional tools we likely want for interactive tsdb mucking.

It relies on goodboy/tractor#306 which is still far from ready 😂


godwidget.resize_all()

await link_views_with_region(
Contributor Author

this factoring was mostly for sanity and because i don't even really get what layer this whole subsystem fits in 😂

@goodboy
Contributor Author

goodboy commented Oct 28, 2022

Just pushed a bad timeframe wiper hack @guilledk

Allow data feed sub-system to specify the timeframe (aka OHLC sample
period) to the `open_history_client()` delivered history fetching API.

Factor the data keycombo hack into a new routine to be used also from
the history backfiller code when request latency increases; there is
a first draft at trying to use the feed reset to speed up 1m frame
throttling by timing out on the history frame response, but it needs
a lot of fine tuning.

The `Storage.load()`, `.read_ohlcv()`, `.write_ohlcv()` and
`.delete_ts()` methods can now take a `timeframe: Optional[float]` param which
is used to look up the appropriate sampling period table-key from
`marketstore`.

Adjust all history query machinery to pass a `timeframe: int` in seconds
and set default of 60 (aka 1m) such that history views from here forward
will be 1m sampled OHLCV. Further when the tsdb is detected as up load
a full 10 years of data if possible on the 1m - backends will eventually
get a config section (`brokers.toml`) that allows users to tune this.

Manual tinker-testing demonstrated that triggering data resets
completely independent of the frame request gets more throughput and
further, that repeated requests (for the same frame after cancelling on
the `trio`-side) can yield duplicate frame responses. Re-work the
dual-task structure to instead have one task wait indefinitely on the
frame response (and thus not trigger duplicate frames) and the 2nd data
reset task poll for the first task to complete in a poll loop which
terminates when the frame arrives via an event.

Dirty deatz:
- make `get_bars()` take an optional timeout (which will eventually be
  dynamically passed from the history mgmt machinery) and move request
  logic inside a new `query()` closure meant to be spawned in a task
  which sets an event on frame arrival, add data reset poll loop in the
  main/parent task, deliver result on nursery completion.
- handle frame request cancelled event case without crash.
- on no-frame result (due to real history gap) hack in a 1 day decrement
  case which we need to eventually allow the caller to control likely
  based on measured frame rx latency.
- make `wait_on_data_reset()` a predicate without output indicating
  reset success as well as `trio.Nursery.start()` compat so that it can
  be started in a new task with the started values yielded being
  a cancel scope and completion event.
- drop the legacy `backfill_bars()`, no longer used.
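The reworked task structure can be sketched with `asyncio` standing in for the PR's `trio` nursery + cancel scope; all names below are illustrative, not the actual `get_bars()` internals:

```python
import asyncio


async def get_frame_with_resets(get_frame, do_reset, poll_period: float):
    '''One task waits indefinitely on the (single) frame response so no
    duplicate requests are sent; the parent polls, firing a data reset
    each interval until the frame-arrived event is set.'''
    done = asyncio.Event()
    result = {}

    async def query():
        result['frame'] = await get_frame()
        done.set()

    task = asyncio.ensure_future(query())
    while not done.is_set():
        try:
            # poll for query completion..
            await asyncio.wait_for(done.wait(), poll_period)
        except asyncio.TimeoutError:
            # ..frame still pending: trigger (another) data reset
            await do_reset()
    await task
    return result['frame']
```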
It doesn't seem to be any slower on our least throttled backend
(binance) and it removes a bunch of hard to get correct frame
re-ordering logic that i'm not sure really ever fully worked XD

Commented some issues we still need to resolve as well.

When we get a timeout or a `NoData` condition still return a tuple of
empty sequences instead of `None` from `Client.bars()`. Move the
sampling period-duration table to module level.
This allows the history manager to know the decrement size for
`end_dt: datetime` on the next query if a no-data / gap case was
encountered; subtract this in `get_bars()` in such cases. Define the
expected `pendulum.Duration`s in the `.api._samplings` table.

Also add a bit of query latency profiling that we may use later to more
dynamically determine timeout driven data feed resets. Factor the `162`
error cases into a common exception handler block.
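The decrement amounts to a table lookup keyed by sample period; a sketch with stdlib `timedelta` standing in for the `pendulum.Duration`s and illustrative frame durations (the real table lives in `.api._samplings`):

```python
from datetime import datetime, timedelta

# sample period (secs) -> duration of one history frame;
# these durations are illustrative stand-ins
_samplings: dict[int, timedelta] = {
    1: timedelta(hours=1),
    60: timedelta(days=1),
}


def next_end_dt(end_dt: datetime, timeframe: int) -> datetime:
    '''On a no-data/gap result, step the query window back by one
    frame duration so the next request probes earlier history.'''
    return end_dt - _samplings[timeframe]
```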
Must have gotten left in during refactor from the `trimeter` version?

Drop down to 6 years for 1m sampling.

Allows for easier restarts of certain `trio` side tasks without killing
the `asyncio`-side clients; support via flag.

Also fix a bug in `Client.bars()`: we need to return the duration on the
empty bars case..

When a network outage or data feed connection is reset often the
`ib_insync` task will hang until some kind of (internal?) timeout takes
place or, in some (worst) cases it never re-establishes (the event
stream) and thus the backend needs to restart or the live feed will
never resume..

In order to avoid this issue once and for all this patch implements an
additional (extremely simple) task that is started with the real-time
feed and simply waits for any market data reset events; when detected
restarts the `open_aio_quote_stream()` call in a loop using
a surrounding cancel scope.

Been meaning to implement this for ages and it's finally working!
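The reconnect task boils down to: start the stream, wait for a reset event, then cancel and restart. An `asyncio` sketch of that shape (the PR wraps `open_aio_quote_stream()` in a `trio`-side cancel scope; the names and finite event source below are illustrative):

```python
import asyncio


async def maintain_feed(open_stream, reset_events) -> int:
    '''(Re)start the quote stream, tearing down and reconnecting on
    every market-data reset event; returns the (re)start count.'''
    starts = 1
    stream = asyncio.ensure_future(open_stream())
    async for _ in reset_events:
        # reset detected: cancel the (possibly hung) stream..
        stream.cancel()
        try:
            await stream
        except asyncio.CancelledError:
            pass
        # ..and reconnect by restarting the stream task
        stream = asyncio.ensure_future(open_stream())
        starts += 1
    stream.cancel()
    try:
        await stream
    except asyncio.CancelledError:
        pass
    return starts
```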
Allows keeping mutex state around data reset requests which (if more
than one are sent) can cause a throttling condition where ib's servers
will get slower and slower to conduct a reconnect. With this you can
have multiple ongoing contract requests without hitting that issue and
we can go back to having a nice 3s timeout on the history queries before
activating the hack.
Our default sample periods are 60s (1m) for the history chart and 1s for
the fast chart. This patch adds concurrent loading of both (or more)
different sample period data sets using the existing loading code but
with new support for looping through a passed "timeframe" table which
points to each shm instance.

More detailed adjustments include:
- breaking the "basic" and tsdb loading into 2 new funcs:
  `basic_backfill()` and `tsdb_backfill()` the latter of which is run
  when the tsdb daemon is discovered.
- adjust the fast shm buffer to offset with one day's worth of 1s so
  that only up to a day is backfilled as history in the fast chart.
- adjust bus task starting in `manage_history()` to deliver back the
  offset indices for both fast and slow shms and set them on the
  `Feed` object as `.izero_hist/rt: int` values:
  - allows the chart-UI linked view region handlers to use the offsets
    in the view-linking-transform math to index-align the history and
    fast chart.
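The timeframe-table loop can be pictured roughly as below; plain lists stand in for the shm buffers and the periods are shown sequentially even though the PR loads them concurrently (names illustrative):

```python
import asyncio


async def backfill_all(load_history, shms: dict[int, list]) -> dict[int, int]:
    '''Backfill each sample period's buffer via one shared loading
    routine, driven by a passed timeframe -> shm table.'''
    filled: dict[int, int] = {}
    for timeframe, shm in shms.items():
        frames = await load_history(timeframe)
        shm[:0] = frames  # prepend history ahead of any rt data
        filled[timeframe] = len(frames)
    return filled
```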
Turns out querying for a high freq timeframe (like 1sec) will still
return a lower freq timeframe (like 1Min) SMH, and no idea if it's the
server or the client's fault, so we have to explicitly check the sample
step size and discard lower freq series-results. Do this inside
`Storage.read_ohlcv()` and return an empty `dict` when the wrong time
step is detected from the query result.

Further enforcements,
- both `.load()` and `read_ohlcv()` now require an explicit `timeframe:
  int` input to guarantee the time step of the output array.
- drop all calls to `.load()` with non-timeframe-specific input.
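The step guard reduces to comparing the epoch-column delta against the requested period; a standalone sketch (the real check lives inside `Storage.read_ohlcv()` over a `marketstore` result, the helper name here is made up):

```python
def enforce_sample_step(epochs: list[float], timeframe: int) -> list[float]:
    '''Return the series only if its time step matches the requested
    sample period; otherwise discard it (empty result), since e.g.
    a 1s query can come back 1m-sampled.'''
    if len(epochs) > 1 and (epochs[1] - epochs[0]) != timeframe:
        return []
    return epochs
```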
If a history manager raises a `DataUnavailable` just assume the sample
rate isn't supported and that no shm prepends will be done. Further seed
the shm array in such cases as before from the 1m history's last datum.

Also, fix tsdb -> shm back-loading, cancelling tsdb queries when either
no array-data is returned or a frame is delivered which has a start time
no earlier than the oldest already retrieved. Use strict timeframes for every
`Storage` API call.

Factor the multi-sample-rate region UI connecting into a new helper
`link_views_with_region()` which reads in the shm buffer offsets from
the `Feed` and appropriately connects the fast and slow chart handlers
for the linear region graphics. Add detailed comments writeup for the
inter-sampling transform algebra.

Not only improves startup latency but also avoids a bug where the rt
buffer was being tsdb-history prepended *before* the backfilling of
recent data from the backend was complete, resulting in out-of-order
frames in shm.

There never was any underlying db bug; it was a hardcoded timeframe in
the column series write key.. Now we always assert a matching timeframe
in results.

To make it easier to manually read/decipher long ledger files this adds
`dict` sorting based on record-type-specific (api vs. flex report)
datetime processing prior to ledger file write.

- break up parsers into separate routines for flex and api record
  processing.
- add `parse_flex_dt()` for special handling of the weird semicolon
  stamps in flex reports.
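A rough sketch of that helper; the exact `YYYYMMDD;HHMMSS` stamp layout is an assumption from the "semicolon stamps" description, with stdlib `strptime` standing in for whatever the real `parse_flex_dt()` uses:

```python
from datetime import datetime


def parse_flex_dt(stamp: str) -> datetime:
    '''Parse a flex-report style 'YYYYMMDD;HHMMSS' stamp (assumed
    layout) into a ``datetime`` usable as a ledger sort key.'''
    date, _, time = stamp.partition(';')
    return datetime.strptime(date + time, '%Y%m%d%H%M%S')
```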
@goodboy goodboy changed the base branch from master to even_moar_kraken_order_fixes October 28, 2022 21:34
Base automatically changed from even_moar_kraken_order_fixes to master October 28, 2022 23:52
@goodboy goodboy merged commit 11ecf9c into master Oct 29, 2022
@goodboy goodboy deleted the ib_1m_hist branch October 29, 2022 17:14