Wait for process completion for sflow script#15809

Merged
liat-grozovik merged 1 commit into sonic-net:master from weiguo-nvidia:fix_sflow_script
Dec 3, 2024

@weiguo-nvidia
Contributor

Description of PR

Summary: Wait for process completion for sflow script
Fixes #
The sflow reboot test case, pytest sflow/test_sflow.py::TestReboot, may fail because no sflow packets appear to be received on the active collector.
However, opening the /tmp/collector0 and /tmp/collector1 files shows that the sflow tool did record sflow packets:

{"datagramSourceIP":"20.1.1.1","datagramSize":"448","unixSecondsUTC":"1732083938","localtime":"2024-11-20T06:25:38+0000","datagramVersion":"5","agentSubId":"100000","agent":"20.1.1.1","packetSequenceNo":"8","sysUpTime":"50697","samplesInPacket":"3","samples":[{"sampleType_tag":"0:2","sampleType":"COUNTERSSAMPLE","sampleSequenceNo":"1","sourceId":"0:205","elements":[{"counterBlock_tag":"0:1005","ifName":"Ethernet204"},{"counterBlock_tag":"0:1","ifIndex":"205","networkType":"6","ifSpeed":"200000000000","ifDirection":"0","ifStatus":"0","ifInOctets":"0","ifInUcastPkts":"0","ifInMulticastPkts":"4294967295","ifInBroadcastPkts":"4294967295","ifInDiscards":"0","ifInErrors":"0","ifInUnknownProtos":"4294967295","ifOutOctets":"0","ifOutUcastPkts":"0","ifOutMulticastPkts":"4294967295","ifOutBroadcastPkts":"4294967295","ifOutDiscards":"0","ifOutErrors":"0","ifPromiscuousMode":"0"}]},{"sampleType_tag":"0:2","sampleType":"COUNTERSSAMPLE","sampleSequenceNo":"1","sourceId":"0:81","elements":[{"counterBlock_tag":"0:1005","ifName":"Ethernet80"},{"counterBlock_tag":"0:1","ifIndex":"81","networkType":"6","ifSpeed":"200000000000","ifDirection":"0","ifStatus":"3","ifInOctets":"0","ifInUcastPkts":"0","ifInMulticastPkts":"4294967295","ifInBroadcastPkts":"4294967295","ifInDiscards":"0","ifInErrors":"0","ifInUnknownProtos":"4294967295","ifOutOctets":"0","ifOutUcastPkts":"0","ifOutMulticastPkts":"4294967295","ifOutBroadcastPkts":"4294967295","ifOutDiscards":"0","ifOutErrors":"0","ifPromiscuousMode":"0"}]},{"sampleType_tag":"0:2","sampleType":"COUNTERSSAMPLE","sampleSequenceNo":"1","sourceId":"0:25","elements":[{"counterBlock_tag":"0:1005","ifName":"Ethernet24"},{"counterBlock_tag":"0:1","ifIndex":"25","networkType":"6","ifSpeed":"200000000000","ifDirection":"0","ifStatus":"3","ifInOctets":"0","ifInUcastPkts":"0","ifInMulticastPkts":"4294967295","ifInBroadcastPkts":"4294967295","ifInDiscards":"0","ifInErrors":"0","ifInUnknownProtos":"4294967295","ifOutOctets":"0","ifOutUcastPkts":"0","ifOutMulticastPkts":"4294967295","ifOutBroadcastPkts":"4294967295","ifOutDiscards":"0","ifOutErrors":"0","ifPromiscuousMode":"0"}]}]}

Running ansible/roles/test/files/ptftests/py3/sflow_test.py on them reports:

collector0 Sampled Packets : Total flow samples -> 800 Total counter samples -> 47

However, the test itself logged:

06:19:28.572  root      : INFO    : collector0 Sampled Packets : Total flow samples -> 0 Total counter samples -> 0

This is very likely due to incorrect process termination in the test:

        outfile = '/tmp/%s' % collector
        with open(outfile, 'w') as f:
            process = subprocess.Popen(['/usr/local/bin/sflowtool', '-j', '-p'] + sflow_port,
                                       stdout=f,
                                       stderr=subprocess.STDOUT,
                                       shell=False
                                       )

            ....

            # Wait for event to be set from Main Thread or to pass out by timeout
            event_is_set = event.wait(timeout=timeout)
            logging.info("{}; Event set: {}".format(
                threading.current_thread().getName(), event_is_set))

        process.terminate()
        f.close()
        with open(outfile, 'r') as sflow_data:
            ...

        logging.info("%s Sampled Packets : Total flow samples -> %s Total counter samples -> %s" %
                     (collector, flow_count, counter_count))
        return (port_sample)

process.terminate() sends SIGTERM to sflowtool, but there is no process.wait() afterwards to wait for the process to exit. So the file content was likely still buffered and not fully written by the time the file was opened for reading.
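The terminate-then-wait pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `run_and_collect`, `CHILD`, the `time.sleep` stand-in for `event.wait`, and the kill fallback are all assumptions made for the example, and a small Python child process stands in for sflowtool.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for sflowtool: writes one line to a block-buffered
# stdout and only flushes it when it exits cleanly on SIGTERM.
CHILD = r"""
import signal, sys, time
signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
print("sample")
while True:
    time.sleep(0.1)
"""


def run_and_collect(cmd, outfile, run_seconds=1.0, wait_timeout=5):
    """Run cmd with stdout redirected to outfile, stop it, then read the file."""
    with open(outfile, 'w') as f:
        process = subprocess.Popen(cmd, stdout=f,
                                   stderr=subprocess.STDOUT, shell=False)
        time.sleep(run_seconds)  # stand-in for event.wait(timeout=...)
        process.terminate()      # sends SIGTERM and returns immediately
        try:
            # Block until the child has really exited, so whatever it
            # writes while shutting down is in the file before we read it.
            process.wait(timeout=wait_timeout)
        except subprocess.TimeoutExpired:
            process.kill()       # assumed fallback if SIGTERM is ignored
            process.wait()
    with open(outfile, 'r') as data:
        return process.returncode, data.read()


outfile = os.path.join(tempfile.mkdtemp(), 'collector0')
returncode, text = run_and_collect([sys.executable, '-c', CHILD], outfile)
print(returncode, repr(text))
```

Without the `process.wait()` call, the read can race with the child's shutdown, which would match the zero-sample log line seen in the failing test.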

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@liat-grozovik liat-grozovik merged commit dfbff2d into sonic-net:master Dec 3, 2024
@weiguo-nvidia weiguo-nvidia deleted the fix_sflow_script branch December 13, 2024 02:53