RPL test 08-rpl-dao-route-loss-2 fails; I will have a look at it.
These DAO errors are problematic. We did a lot of experimentation with this configuration a while back at Thingsquare, as we use DAO routes a lot, and found that being somewhat more aggressive about rebuilding the network let it recover from node outages more quickly (which is also the backstory to this particular regression test). Very nice to see more work go into this!
@laurentderu Is there any update on this PR coming soon? The functionality is of interest, but we need all tests to pass before merging.
It's still on my to-do list :) I hope to rebase it and have a look at it next week.
The issue is caused by a (non-)interaction between NDP and RPL. The receiver node has only one parent: node 8, or node 4 after the swap. It is a pure receiver; its only outgoing traffic is the unicast DAOs sent to its parent. When node 8 is swapped with node 4, the receiver no longer receives DIOs from node 8, so no more DAOs are triggered. As node 4 has exactly the same rank as node 8, the receiver does not select node 4 as its new parent, and since no more outgoing unicast traffic is sent, the rank of node 8 (as seen by the receiver) never increases. Likewise, NUD on node 8 would only be triggered by outgoing traffic towards node 8. In 6LoWPAN-ND this issue is avoided because hosts periodically check the reachability of all their default routers, so the receiver would discover that node 8 is unreachable and switch to node 4. I have made a small workaround in uip_ds6_neighbor_periodic(): when a neighbor that is a default router leaves the REACHABLE state, it enters the DELAY state instead of STALE, which forces NUD on it. This mimics the 6LoWPAN-ND behavior, and I expect it is still less energy-consuming than triggering a global repair.
Thanks for looking into the problem and proposing a workaround. I'm reluctant to accept a change to the ND implementation's state machine to solve this problem, because it may break compatibility with the standard. The ND implementation should follow RFC 4861, and unless you can show that the workaround is not a problem in this regard, I have to propose that we consider other fixes, primarily within the RPL implementation. A less appealing alternative would be to wrap the current workaround in a preprocessor conditional that checks whether RPL is enabled.
This workaround does not break compatibility with NDP: the STALE -> DELAY transition would occur anyway when the host sends a unicast packet to its neighbor (in this case, its preferred parent), so we are only anticipating the transition, not introducing an unexpected one. I just want to stress that, with the current implementation, a node that does not send upstream unicast traffic is not aware of changes in the network topology and could become unreachable. The current implementation resolves this by triggering a global repair, which is quite extreme. Another, more RPL-centric (but somewhat more complex) workaround would be to use the default router lifetime: discard the preferred parent when its router lifetime expires and trigger the selection of another parent. This would also require modifying the default RPL route lifetime, which in Contiki is currently set to 6 months.
OK, I think the simplest solution is to embed the ND block in a preprocessor conditional that checks for RPL. Furthermore, it would be good to have a comment within this block that states the purpose of both the code and the conditional. I also have some minor line comments that follow. @adamdunkels Do you think this is a suitable solution?
core/net/ipv6/uip6.c
Please change this to /* Packet cannot be forwarded. */
308b43f to 9184c54
9184c54 to 35e876e
It seems Travis got stuck while trying to access github.com; could someone restart the job?
Sure, I'll restart the Travis tests.
Code rebased and updated as suggested, and Travis is happy this time.
I think this looks good, 👍 from me!
👍 |
Currently, when a RPL node receives a packet with the Forwarding-Error flag set in the RPL Hop-by-Hop option, it forwards the packet back to its originator. When such a packet reaches the border router, it triggers a global repair. This means that any route error in the DAG triggers a complete reconfiguration of the network, wasting a large amount of energy, as all the timers are reset and all nodes must relearn their neighbourhood. In our testbed, we have observed global repairs being triggered several times per hour.
As a solution, we have removed the forwarding of packets with the Forwarding-Error flag set; instead, a No-Path DAO is sent back to remove the offending route on the originating node, as suggested by the RFC. On the border router, the route is simply removed and the DIO timer is reset to trigger DAOs from all its children.
Also, packets with the Rank-Error flag set are currently forwarded. This is forbidden by the RFC (except for the first error, as single-hop rank errors are tolerated); instead, the packet should be dropped and the DIO timer must be reset to refresh the nodes' ranks.