@eboasson
Contributor

This is a terribly large PR that does not at all follow the general rule that one should not mix unrelated things in a single PR. It is rather bad to set such a poor example, but I have nonetheless decided to do it this way:

  • Each commit (potentially) builds on top of the preceding one, so splitting it up wouldn’t really make life simpler, considering that it all ends up in my lap anyway.
  • A good number of the small commits in here were the result of a discovery at some point, then inserted at the appropriate point in the history to keep each commit in a working state (verified on Fedora 32 with the address sanitizer enabled).
  • A large part of it comes from two other PRs that I would have merged into master had I not held back for other reasons. They've been sitting there for long enough now, and with the 0.6.0 release process under way, ROS2 Dashing not yet having moved to the releases/0.5.x branch, and the preparations for merging security support into master, it really makes more sense to merge these into the security branch.

I welcome anyone commenting on it. I'd recommend looking at individual commits ... In particular I'd be much obliged if @MarcelJordense could find some time to look at d2ca446, 9579461, cacd39a and 50fa943, and if @dennis-adlink would find time to look at b0a19ef and 816d016, because I think they know that code well enough to do meaningful review without taking an inordinate amount of time.

What follows is a summary:

Part the First

In which #499 and #505 are moved over to the security branch. They are very nearly identical to the originals: a few types have been given better names, a few unsigneds got changed to uint32_ts, and a complete mess-up in one of the intermediate commits of #499 (such that it wouldn’t even compile) has been fixed. For those who like to read diffs of diffs, comparing them is easy enough ...

In short, from #499:

  • improve the tracing of sample content by distinguishing between unsigned and signed integers and floating-point numbers
  • bring sanity to the implementation of dds_read_instance (and similar)
  • add dds_readcdr (this affects the RHC plugin interface)
  • refactor & reduce code duplication in the implementation of reading/taking
  • strip out some wrapper functions that served no purpose in the RHC implementation

and #505:

  • saner behaviour of auto-dispose when there are multiple active writers: do the auto-dispose when the instance would otherwise have gone to the NOT_ALIVE_NO_WRITERS state
  • make dds_lookup_instance on a built-in topic work instead of crashing because of a missing implementation of the from_sample operation …
  • various bits of refactoring

(This part ends at "Rework listener tests" 24ccb39)

Part the Second

In which the security branch came in for a looking-over. It was there that a few mouldy spots were discovered. A number of them could have developed into real problems, so it is a good thing that this happened. That is:

d2ca4467 Vet lengths of property lists in crypto deserialization
b0a19ef2 Fix double free if local identity validation fails on certificate expiry
816d0160 Memory leak reading access control configuration
95794611 Atomic update next heartbeat time for p2p writers
cacd39a7 Fix leak when inserting p2p message in WHC fails
50fa943b Fix conversion of user_data to security plugin
9e827098 Properly pair entity_pin/mutex_lock

Part the Third

In which the reworked listener tests were subjected to a bit of further rework: it makes vastly more sense to have an init/step/fini set of functions so that this little monster can be paired with custom code. No doubt there is more work to be done (e.g., turning the individual operations into nice functions so they can be used in more contexts; some of them might come in useful).

This then led to a few more things:

  • The instance state is now set to ALIVE when a sample is rejected (too old, doesn’t pass the filter), which means at least transient-local data will always go back to ALIVE after a reconnect.
  • It turns out that after a disconnect, rediscovery is blocked for 10s. For no good reason that I can think of. (Being able to block it for a while can be useful, but not for doing so by default.)
  • The dds_merge_listeners function changed to copying each listener from the source to the destination if and only if the destination is clear (for that listener) and the source is set (for that listener). Before, it would copy it if the destination was clear, regardless of whether the source was set. The subtle difference lies in the listener arguments, and I think this makes more sense. (Need to add set/get stuff that includes the argument …)
  • A macro DDS_STATUS_ID_MAX got added to the interface that makes life easier for anyone wanting to range over all status IDs.
  • The implementation of dds_wait_for_acks is now capable of waiting for acknowledgements from a specific reader, though this is not currently exposed in the interface. It’s worth considering, but there needs to be a compelling reason to do so, I’d say.
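The new dds_merge_listeners rule described above can be sketched as follows. This is a toy with a single callback slot and made-up names (`toy_listener_t`, `toy_merge`); the real dds_listener_t is opaque and carries many callbacks:

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical listener with one callback slot and its argument. */
typedef struct {
  void (*on_data_available) (void *arg);
  void *arg;
} toy_listener_t;

static void toy_cb (void *arg) { (void) arg; }

/* New rule: copy a callback (and its argument) from src to dst only if
   dst has no callback set for it AND src does.  Previously the copy
   happened whenever dst was clear, even when src was clear too, which
   would clobber dst's argument with an irrelevant value. */
static void toy_merge (toy_listener_t *dst, const toy_listener_t *src)
{
  if (dst->on_data_available == NULL && src->on_data_available != NULL) {
    dst->on_data_available = src->on_data_available;
    dst->arg = src->arg;
  }
}
```

With this rule, merging a fully-clear source into a destination leaves the destination's argument untouched, which is where the subtle difference shows up.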

Part the Fourth

In which Clang’s static analyzer in its latest incarnation picked up a few dead stores (all eliminated) and GCC 10 complained about a few things more. This does not yet eliminate all warnings from GCC 10.

@eboasson
Copy link
Contributor Author

Thanks @MarcelJordense

eboasson added 27 commits May 15, 2020 15:35
* Add a flag to indicate signed integral values and one to indicate
  floating-point values
* Set these flags in the output of idlc
* Use them when printing sample contents to the trace

By encoding the information as flags in reserved bits the actual
serialization and deserialization is unaffected.

Signed-off-by: Erik Boasson <[email protected]>
Scanning all instances was never good for anything: the RHC is organised
as hash table on instance id (which is an alias for "instance handle")
and it was always designed to do this with a fast lookup.

Signed-off-by: Erik Boasson <[email protected]>
Do not pass a dangling pointer to update_conditions_locked after
dropping an instance.  The dangling pointer did not actually get
dereferenced because of the state changes caused by dropping the
samples, but that is cutting it a bit fine.

Signed-off-by: Erik Boasson <[email protected]>
Deadline registration, renewal and deregistration was somewhat spread
through the code and relied on the "isdisposed" flag as a proxy for
whether it was registered or not.  This consolidates the deadline
handling code in a final step of updating the instance and uses a
separate flag to track whether the instance is currently registered in
the deadline administration or not.

This also makes it possible to trivially change the rules for when
deadline notifications are required, and so allows for, e.g., adding a
mode in which instances in the "no writers" state do not trigger any
deadline missed notifications, or just once (both of which seem useful
modes).

Signed-off-by: Erik Boasson <[email protected]>
This changes the behaviour of auto-dispose writers: instead of always
disposing when the writer disposes the data, it now only disposes the
data when the instance would otherwise go to the "no writers" state.
This only affects the behaviour when there are multiple writers for the
same instance.

In case the writers use a different value for the auto-dispose setting,
it now tracks whether an instance has ever been touched by a writer
with auto-dispose enabled, and auto-disposes the instance when the
last writer leaves if this is the case.  This way, if an instance is
registered by one auto-dispose and one non-auto-dispose writer, the
order of unregistering does not matter.

Signed-off-by: Erik Boasson <[email protected]>
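The order-independence described above can be sketched with a toy instance state (names like `toy_instance_t` are illustrative, not the actual Cyclone DDS code): the instance remembers whether any auto-dispose writer ever touched it, and the decision is taken only when the writer count drops to zero.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical instance state: live writer count plus a sticky flag
   recording whether an auto-dispose writer has ever written it. */
typedef struct {
  unsigned wr_count;
  bool autodispose_seen;
} toy_instance_t;

static void toy_register (toy_instance_t *inst, bool wr_autodispose)
{
  inst->wr_count++;
  if (wr_autodispose)
    inst->autodispose_seen = true;
}

/* Returns true iff unregistering this writer must dispose the instance:
   only when no writers remain and at least one auto-dispose writer ever
   touched it.  Because the flag is sticky, the order in which the
   auto-dispose and non-auto-dispose writers unregister does not matter. */
static bool toy_unregister (toy_instance_t *inst)
{
  assert (inst->wr_count > 0);
  inst->wr_count--;
  return (inst->wr_count == 0 && inst->autodispose_seen);
}
```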
The entire point of this test program is to exercise the RHC while
checking its internal state.  The likelihood of (at least some)
forgetting to enable the "expensive" checks has been proven to be
significant.

Signed-off-by: Erik Boasson <[email protected]>
Use of an auto-dispose writer meant the NO_WRITERS case did not actually
get tested.  The behaviour of the implementation was to generate
deadline missed notifications for such instances, but the test expected
otherwise.

There is a disagreement between different DDS implementations on the
desirability of generating deadline missed notifications for NOT_ALIVE
instances.  Deadline notifications on DISPOSED instances seems silly, as
it means end-of-life.  Deadline notifications on a NO_WRITERS instance
are certainly valuable for applications that don't pay attention to the
number of writers (otherwise one has to monitor both liveliness changed
and deadline missed notifications to be sure to get some
notification).

Different usage patterns definitely affect what is desirable and I doubt
one-size-fits-all is the right approach.  This commit changes the test
and retains the behaviour, and if it errs, it at least errs on the side
of caution.

Signed-off-by: Erik Boasson <[email protected]>
Disposing an instance would only add an invalid sample if the instance
was empty, but it should also do so when the latest sample has been
read.  Otherwise reading all NOT_READ samples gives nothing or
nonsensical output.

Signed-off-by: Erik Boasson <[email protected]>
The standard defines GUIDs as an array of 16 uint8_t, and that is how
they are presented on the network and in built-in topic samples.
Internally they are arrays of 4 uint32_t, requiring byte-order
conversion.

A keyhash is also an array of 16 uint8_t, and used almost exclusively on
the network.  The exception is the generation of built-in topic samples,
which relies on the "from_keyhash" function.  One would expect the
keyhash here to contain the GUID in the external representation, and
this commit adds the byte-order conversions to conform to the
expectation.

Signed-off-by: Erik Boasson <[email protected]>
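The conversion described above can be sketched like this. The names (`toy_guid_t`, `toy_guid_from_keyhash`) are illustrative; the real code lives in the built-in topic support, but the byte-order logic is the same idea: the 16-byte external form is big-endian, the internal form is four native uint32_t.

```c
#include <stdint.h>
#include <assert.h>

/* Internal GUID: four native-endianness uint32_t.  External form
   (network, keyhash): 16 bytes in big-endian order. */
typedef struct { uint32_t u[4]; } toy_guid_t;

static uint32_t toy_from_be32 (const uint8_t *b)
{
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
         ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/* Convert a keyhash containing a GUID in the external representation
   into the internal one, performing the byte-order conversion the
   commit adds on the "from_keyhash" path. */
static toy_guid_t toy_guid_from_keyhash (const uint8_t kh[16])
{
  toy_guid_t g;
  for (int i = 0; i < 4; i++)
    g.u[i] = toy_from_be32 (kh + 4 * i);
  return g;
}
```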
One cannot create writers for built-in topics, therefore one generally
does not create samples for them.  However, one can look up an instance
handle from a sample with just a key value in it, and so the function is
needed.

Signed-off-by: Erik Boasson <[email protected]>
A few failures to signal DATA_AVAILABLE (as well as some where it was
signalled unnecessarily) were discovered while refactoring the RHC
despite the tests all passing.  Clearly the tests were inadequate.

The enormous amount of boilerplate in the tests prompted a small rewrite
to a programmable listener invocation tester that one simply feeds a
noise-like one-liner in a string.  This trades the boilerplate for
somewhat inscrutable code.

Signed-off-by: Erik Boasson <[email protected]>
Coverity has difficulty observing that dds_entity_pin /
ddsrt_mutex_lock / dds_entity_unlock is correct.  It is perhaps a bit
confusing, so change it.

Signed-off-by: Erik Boasson <[email protected]>
The security plugins currently use the standardized representations of
octet sequences, unlike the DDSI stack's internal representation.  A
shallow copy is therefore not simply a memcpy.

Signed-off-by: Erik Boasson <[email protected]>
Sending a heartbeat to all matched readers for the P2P builtin
participant volatile secure writer unlocks the writer before pushing
each individual message out, and so determining the time of the next
heartbeat event before writing and updating it afterwards means the
state may have changed.  While this appears benign, it is better to
do the update atomically.

Signed-off-by: Erik Boasson <[email protected]>
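One way to do such an update atomically, sketched with C11 atomics and made-up names (`toy_tsched`, `toy_resched_if_unchanged`; the actual Cyclone DDS code uses its own ddsrt atomics): a compare-and-swap applies the new heartbeat time only if nobody else moved it while the lock was released.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical scheduled time of the next heartbeat event, in ns. */
static _Atomic int64_t toy_tsched;

/* Apply a new event time only if the value is still the one observed
   before the writer lock was dropped; if another thread rescheduled
   the event meanwhile, the stale update is simply discarded. */
static bool toy_resched_if_unchanged (int64_t observed, int64_t tnew)
{
  return atomic_compare_exchange_strong (&toy_tsched, &observed, tnew);
}
```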
The memory allocation in deserializing property lists within the crypto
code should not trust the deserialized length and try to allocate that
much memory but should first verify that the length is consistent with
the number of bytes remaining in the input.  (Noted by Coverity as use
of tainted data.)

Signed-off-by: Erik Boasson <[email protected]>
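The shape of such a check, as a hedged sketch (the function name and signature are made up, not the crypto code itself): before allocating `claimed_len` elements, verify that many elements can even fit in the bytes remaining in the input, taking care that the multiplication cannot overflow.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

/* Returns true iff a sequence of claimed_len elements of elem_size
   bytes can be present in the remaining input bytes.  Dividing instead
   of multiplying avoids overflow for hostile claimed_len values, so a
   tainted length can never drive a huge allocation. */
static bool toy_vet_seq_length (uint32_t claimed_len, size_t elem_size,
                                size_t remaining)
{
  if (elem_size != 0 && claimed_len > remaining / elem_size)
    return false;
  return true;
}
```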
This leaves the argument pointer in the destination unchanged, rather
than resetting it to an irrelevant value.

Signed-off-by: Erik Boasson <[email protected]>
The code for executing one-liner tests might be more generally useful.

Signed-off-by: Erik Boasson <[email protected]>
In particular, this means instances published by a transient-local
writer will go back to ALIVE following a disconnect and reconnect.

Signed-off-by: Erik Boasson <[email protected]>
eboasson added 6 commits May 15, 2020 15:35
Signed-off-by: Erik Boasson <[email protected]>
The dds_wait_for_acks function follows the DCPS specification and allows
waiting for all matching readers to have acknowledged all data written
prior to that point.  This commit leaves the API unchanged but extends
the implementation to make it possible to wait until a specific reader
has acknowledged everything, as this is a useful device in testing with
deliberate one-way disconnections using dds_domain_set_deafmute.

Signed-off-by: Erik Boasson <[email protected]>
When defining a new topic, the serializer instructions, which are
generated by the IDL compiler and typically reside in constant memory,
are copied into memory managed by the Cyclone implementation.  For
this it needs to compute the size of the serializer program, which the
IDL compiler doesn't provide.  It does this by effectively dry-running
the program.  (Note that it doesn't validate the program.)

All but the JSR operations move the program counter forward, but the JSR
operation can cause it to go backward instead and allows implementing
recursive types (the IDL compiler doesn't support them, but one might
decide to work around that limitation).  When dry-running the program,
following a backwards jump can cause a non-terminating loop.

The jump could potentially be to an unexplored address and so ignoring
all backwards jumps potentially means it skips part of the program.  As
this is not a validator and the program can always be arranged so that
following a backwards jump is not relevant to computing the size
correctly, this is a reasonable approximation.

Signed-off-by: Erik Boasson <[email protected]>
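The termination argument above can be illustrated with a toy instruction stream (this is a sketch, not the real serializer VM; opcodes and the offset convention are made up): the dry run tracks the highest word index reached, follows JSR only when the offset is forward, and ignores backward jumps so the walk always terminates.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Toy program: one word per op; TOY_JSR carries a signed offset in the
   next word, relative to the JSR opcode itself; TOY_RET terminates. */
enum { TOY_OP = 0, TOY_JSR = 1, TOY_RET = 2 };

static void toy_walk (const int32_t *prog, size_t pc, size_t *maxidx)
{
  while (prog[pc] != TOY_RET) {
    if (prog[pc] == TOY_JSR) {
      const int32_t off = prog[pc + 1];
      if (pc + 1 > *maxidx)
        *maxidx = pc + 1;
      /* follow forward jumps only; backward jumps (used for recursive
         types) are skipped so the dry run is guaranteed to terminate */
      if (off > 0)
        toy_walk (prog, pc + (size_t) off, maxidx);
      pc += 2;
    } else {
      pc += 1;
    }
  }
  if (pc > *maxidx)
    *maxidx = pc;
}

/* Size in words: highest index reached via the dry run, plus one. */
static size_t toy_prog_size (const int32_t *prog)
{
  size_t maxidx = 0;
  toy_walk (prog, 0, &maxidx);
  return maxidx + 1;
}
```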
eboasson added 3 commits May 16, 2020 08:56
* Compute the time at which the handshake must have completed from the
  initial timeout specification, rather than using it as a timeout for
  the individual steps of the handshake
* If the handshake fails because an expected message is not present,
  print this, including whether the timeout occurred because the message
  queue was empty or because the expected message could not be found in
  a non-empty queue.
* Replace the 0.5s per-step timeout with a single 2s timeout.

Signed-off-by: Erik Boasson <[email protected]>
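The deadline computation in the first bullet can be sketched as follows (illustrative names, dds_time-style nanosecond timestamps; not the actual test code): the absolute deadline is fixed once, and each handshake step waits at most for the time remaining until it.

```c
#include <stdint.h>
#include <assert.h>

typedef int64_t toy_time_t;  /* nanoseconds, as in dds_time_t */

/* Time left until the handshake deadline; each step uses this as its
   wait bound, so the total never exceeds the initial timeout no matter
   how many steps (or spurious wakeups) occur. */
static toy_time_t toy_remaining (toy_time_t deadline, toy_time_t now)
{
  return (deadline > now) ? deadline - now : 0;
}
```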
* Increase the matching timeout to 5s (there are some hints the failures
  on Travis are timing related)
* Replace the relative timeout in the waitset by a timestamp so that it
  gives up after the specified timeout regardless of the number of
  events that occur

Signed-off-by: Erik Boasson <[email protected]>
@eboasson eboasson merged commit 2aa7054 into eclipse-cyclonedds:security May 16, 2020
@eboasson eboasson deleted the sec-coverity branch May 10, 2021 12:03