Skip to content

BUG REPORT [ the NullPtr bug within communication between two pkgs -- nav2_planner & nav2_costmap_2d] #3940

@GoesM

Description

@GoesM

<1> Bug Description

Bug Type: NullPtr referenced

nav2_planner referenced a NULL-Pointer of nav2_costmap_2d

workstation_environment set

ROS system navigation2 TURTLEBOT3_MODEL
ROS2-humble nav2_humble "waffle"

code location

/navigation2/nav2_costmap_2d/src/layered_costmap.cpp

/navigation2/nav2_planner/src/planner_server.cpp

function isCurrent() from /navigation2/nav2_costmap_2d/src/layered_costmap.cpp

bool LayeredCostmap::isCurrent()
{
  current_ = true;
  for (vector<std::shared_ptr<Layer>>::iterator plugin = plugins_.begin();
    plugin != plugins_.end(); ++plugin)
  {
    current_ = current_ && ((*plugin)->isCurrent() || !(*plugin)->isEnabled());
  }
  for (vector<std::shared_ptr<Layer>>::iterator filter = filters_.begin();
    filter != filters_.end(); ++filter)
  {
    current_ = current_ && ((*filter)->isCurrent() || !(*filter)->isEnabled());
  }
  return current_;
}

isCurrent() is accessed by /navigation2/nav2_planner/src/planner_server.cpp

void PlannerServer::waitForCostmap()
{
  // Don't compute a plan until costmap is valid (after clear costmap)
  rclcpp::Rate r(100);
  while (!costmap_ros_->isCurrent()) {
    r.sleep();
  }
}

<2> References [ log_files ]

More details are provided here.

function calling stack [by Asan report]

=================================================================
==23106==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000000a1 (pc 0x7f18e85ce4e0 bp 0x7f18da8b7af0 sp 0x7f18da8b7a18 T54)
==23106==The signal is caused by a WRITE memory access.
==23106==Hint: address points to the zero page.
    #0 0x7f18e85ce4e0 in nav2_costmap_2d::LayeredCostmap::isCurrent() (/.../..../...../navigation2/install/nav2_costmap_2d/lib/libnav2_costmap_2d_core.so+0xda4e0) (BuildId: 41e31d3f75e7a644fd0534297548b67bd616808d)
    #1 0x7f18e8d79c87 in nav2_planner::PlannerServer::waitForCostmap() (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x179c87) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #2 0x7f18e8d70740 in nav2_planner::PlannerServer::computePlan() (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x170740) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #3 0x7f18e8e74909 in nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::work() (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x274909) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #4 0x7f18e8e73d88 in std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void>::operator()() const (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x273d88) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #5 0x7f18e8e73aa9 in std::enable_if<is_invocable_r_v<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void>&>, std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> >::type std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void>&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void>&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x273aa9) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #6 0x7f18e8e73930 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void> >::_M_invoke(std::_Any_data const&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x273930) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #7 0x7f18e8e73546 in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x273546) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #8 0x7f18e6a99ee7 in __pthread_once_slow nptl/./nptl/pthread_once.c:116:7
    #9 0x7f18e8e7170a in std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::ComputePathToPose> >)::'lambda'()> >, void>::_M_run() (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x27170a) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #10 0x7f18e6edc252  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc252) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #11 0x7f18e6a94ac2 in start_thread nptl/./nptl/pthread_create.c:442:8
    #12 0x7f18e6b26a3f  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
...
...
...
>Thread T15 created by T0 here:
    #0 0x55be89e317dc in __interceptor_pthread_create (/.../..../...../navigation2/install/nav2_planner/lib/nav2_planner/planner_server+0x977dc) (BuildId: 595dafd2fd88fddc63cc968e55abb47694f6d722)
    #1 0x7f18e6edc328 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc328) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 0x7f18e8e5262c in nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::SimpleActionServer(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeClockInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeWaitablesInterface>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, std::function<void ()>, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, rcl_action_server_options_s const&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x25262c) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #3 0x7f18e8e4fd78 in nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>::SimpleActionServer<std::shared_ptr<nav2_util::LifecycleNode> >(std::shared_ptr<nav2_util::LifecycleNode>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, std::function<void ()>, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, rcl_action_server_options_s const&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x24fd78) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #4 0x7f18e8d9c42d in std::_MakeUniq<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose> >::__single_object std::make_unique<nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose>, std::shared_ptr<nav2_util::LifecycleNode>, char const (&) [21], std::_Bind<void (nav2_planner::PlannerServer::* (nav2_planner::PlannerServer*))()>, std::nullptr_t, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool>(std::shared_ptr<nav2_util::LifecycleNode>&&, char const (&) [21], std::_Bind<void (nav2_planner::PlannerServer::* (nav2_planner::PlannerServer*))()>&&, std::nullptr_t&&, std::chrono::duration<long, std::ratio<1l, 1000l> >&&, bool&&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x19c42d) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #5 0x7f18e8d6e702 in nav2_planner::PlannerServer::on_configure(rclcpp_lifecycle::State const&) (/.../..../...../navigation2/install/nav2_planner/lib/libplanner_server_core.so+0x16e702) (BuildId: 88d2332dc0a7b289359c7c3dfdca327d13582e9f)
    #6 0x7f18e91a0b8c  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x28b8c) (BuildId: ccb1b39efb3eaee63de490386a8850a5752f307d)
==23106==ABORTING

planner_server work_log [by ros_node_log]

We have met the same bug more than 50 times totally, just in one week

Each time, the planner_server met the bug after a [INFO] as: "Received request to clear entirely the global_costmap",

The following is representative log_files of planner_server from these 50 attempts

...
...
[INFO] [*****3176.955120154] [planner_server]: Configuring plugin GridBased of type NavfnPlanner
[INFO] [*****3176.955580448] [planner_server]: Planner Server has GridBased  planners available.
[INFO] [*****3177.746643121] [planner_server]: Activating
[INFO] [*****3177.746690562] [global_costmap.global_costmap]: Activating
[INFO] [*****3177.746702439] [global_costmap.global_costmap]: Checking transform
[INFO] [*****3177.746767047] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: Lookup would require extrapolation into the past.  Requested time 0.647000 but the earliest data is at time 1.201000, when looking up transform from frame [base_link] to frame [map]
[INFO] [*****3178.246867321] [global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: Lookup would require extrapolation into the past.  Requested time 1.157000 but the earliest data is at time 1.201000, when looking up transform from frame [base_link] to frame [map]
[INFO] [*****3178.747014193] [global_costmap.global_costmap]: start
[INFO] [*****3179.798119907] [planner_server]: Activating plugin GridBased of type NavfnPlanner
[INFO] [*****3179.799260621] [planner_server]: Creating bond (planner_server) to lifecycle manager.
[INFO] [*****3180.284098882] [global_costmap.global_costmap]: Message Filter dropping message: frame 'base_scan' at time 0.601 for reason 'the timestamp on the message is earlier than all the data in the transform cache'
[WARN] [*****3196.562101008] [planner_server]: GridBased: failed to create plan with tolerance 0.50.
[WARN] [*****3196.562149795] [planner_server]: Planning algorithm GridBased failed to generate a valid path to (0.56, 1.51)
[WARN] [*****3196.562165888] [planner_server]: [compute_path_to_pose] [ActionServer] Aborting handle.
[INFO] [*****3196.580592290] [global_costmap.global_costmap]: Received request to clear entirely the global_costmap
[INFO] [*****3196.642112157] [global_costmap.global_costmap]: Received request to clear entirely the global_costmap
[WARN] [*****3196.753239141] [planner_server]: GridBased: failed to create plan with tolerance 0.50.
[WARN] [*****3196.753289160] [planner_server]: Planning algorithm GridBased fa

at this line, the planner_server shutdown suddenly.

<3> Analysis

In what situations bugs would happen?

When executing instructions [ nav2_goal action sent by user ], planner-node need the current costmap-result so that would access the pointer variable [ costmap_ros_ ] from nav2_costmap_2d

However, it seems that there's no checks by planner_server before accessing the pointer,

There may be a coincidental collision causing the bug:

nav2_costmap_2d node happens to have reached its lifecycle or is undergoing recalculation due to changes in sensors_msg (like odom and scan) . At this stage, the pointer may have been released, and at this point, the planner_server coincidentally called its pointer for the need of executing an action instruction , finally resulting in a null pointer access.

<4> POC design

This seems to be a concurrency problem caused by multithreaded execution, with a high frequency of bugs but also full of coincidences, so we cannot provide a 100% successful POC design. But we can provide some ideas to try and trigger the bug:

Method 1:

At present, it seems that when a certain situation occurs (such as changes in sensor information or end of life cycle), it will lead to nav2_ costmap_ 2d's operation -- "clear entirely the global_costmap".

Therefore, it could be tried:

when program executing nav2_goal aciton normally, we could send some interference sensor messages, constantly make nav2_ costmap_2d perform the operations of clearing or recalculating, at the same time observing if the same bug will occur.

Method 2:

this method is just used for checking the bug, but not a real POC:

when program executing nav2_goal aciton normally, try to restart nodes related to nav2_costmap_ 2d;

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions