Skip to content
Merged
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
e4fbed9
impacted are based PR test
yutongzhang-microsoft Sep 23, 2024
9b45596
Add three types of subtype topology
yutongzhang-microsoft Oct 25, 2024
0fe1a50
Merge branch 'sonic-net:master' into yutongzhang/pr_impacted_area
yutongzhang-microsoft Oct 25, 2024
a6971d3
Add the script to calculate instance number
yutongzhang-microsoft Oct 25, 2024
6a536b3
Merge branch 'sonic-net:master' into yutongzhang/pr_impacted_area
yutongzhang-microsoft Oct 31, 2024
c8df319
test new template
yutongzhang-microsoft Oct 31, 2024
e2a7b90
Add comments
yutongzhang-microsoft Oct 31, 2024
0c5a270
test
yutongzhang-microsoft Oct 31, 2024
48d07b6
test whole scope
yutongzhang-microsoft Oct 31, 2024
01d4f1d
comment out
yutongzhang-microsoft Oct 31, 2024
d6f0f1f
test script
yutongzhang-microsoft Nov 1, 2024
14f39fa
test
yutongzhang-microsoft Nov 1, 2024
7c1a565
test
yutongzhang-microsoft Nov 1, 2024
993510e
Merge branch 'sonic-net:master' into yutongzhang/pr_impacted_area
yutongzhang-microsoft Nov 20, 2024
4fd2f19
fix the logic -- get impacted area
yutongzhang-microsoft Nov 21, 2024
0013342
Use template
yutongzhang-microsoft Nov 21, 2024
ff96ff5
fix
yutongzhang-microsoft Nov 21, 2024
0c4818d
Modify as comment
yutongzhang-microsoft Dec 4, 2024
614b700
Merge branch 'master' into yutongzhang/pr_impacted_area
yutongzhang-microsoft Dec 4, 2024
7f54427
test access token
yutongzhang-microsoft Dec 6, 2024
91a8002
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 9, 2024
a28903f
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 9, 2024
fe511fe
modify
yutongzhang-microsoft Dec 9, 2024
dd02492
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 12, 2024
2e8d8c4
Using script instead of AzureCli
yutongzhang-microsoft Dec 12, 2024
5f06dca
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 24, 2024
93074e1
Add $ to prevent partial matches
yutongzhang-microsoft Dec 24, 2024
2b54633
Fail the pipeline if any issues occur
yutongzhang-microsoft Dec 24, 2024
ae477d1
Add some try...except
yutongzhang-microsoft Dec 24, 2024
0e9fffa
Use fstring
yutongzhang-microsoft Dec 24, 2024
92cf002
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 25, 2024
27892c1
Unblock current PR testing
yutongzhang-microsoft Dec 25, 2024
c6be95b
Add condition in query
yutongzhang-microsoft Dec 25, 2024
c7ee5bd
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 26, 2024
bb90ffa
Modify query to get real entry
yutongzhang-microsoft Dec 26, 2024
a4fe6c1
Modify
yutongzhang-microsoft Dec 27, 2024
3f80e54
Add stop on failure
yutongzhang-microsoft Dec 29, 2024
16527f4
New git command
yutongzhang-microsoft Dec 29, 2024
6065d75
New git command
yutongzhang-microsoft Dec 30, 2024
e028618
Merge remote-tracking branch 'origin/master'
yutongzhang-microsoft Dec 30, 2024
e158c2a
New git command
yutongzhang-microsoft Dec 30, 2024
9abad0c
New git command
yutongzhang-microsoft Dec 30, 2024
9b2a127
trailing-whitespace
yutongzhang-microsoft Dec 30, 2024
2aa919c
test
yutongzhang-microsoft Dec 30, 2024
85dc5c1
test
yutongzhang-microsoft Dec 30, 2024
ff0ee30
test new command
yutongzhang-microsoft Dec 30, 2024
4315251
uncomment
yutongzhang-microsoft Dec 30, 2024
c94f1dd
Modify
yutongzhang-microsoft Dec 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
parameters:
- name: TOPOLOGY
type: string
default: ""

- name: BUILD_BRANCH
type: string
default: ""

steps:
- script: |
set -x

sudo apt-get update && sudo apt-get install -y jq

TEST_SCRIPTS=$(echo '$(TEST_SCRIPTS)' | jq -r -c '."${{ parameters.TOPOLOGY }}"')

if [[ $? -ne 0 ]]; then
echo "##vso[task.complete result=Failed;]Get test scripts of specfic topology fails."
exit 1
fi

SCRIPTS=$(echo "$TEST_SCRIPTS" | jq -r '. | join(",")')
echo -n "##vso[task.setvariable variable=SCRIPTS]$SCRIPTS"
displayName: "Get ${{ parameters.TOPOLOGY }} test scripts"

- script: |
set -x

# Check if azure cli is installed. If not, try to install it
if ! command -v az; then
echo "Azure CLI is not installed. Trying to install it..."

echo "Get packages needed for the installation process"
sudo apt-get -o DPkg::Lock::Timeout=600 update
sudo apt-get -o DPkg::Lock::Timeout=600 -y install apt-transport-https ca-certificates curl gnupg lsb-release

echo "Download and install the Microsoft signing key"
sudo mkdir -p /etc/apt/keyrings
curl -sLS https://packages.microsoft.com/keys/microsoft.asc |
gpg --dearmor | sudo tee /etc/apt/keyrings/microsoft.gpg > /dev/null
sudo chmod go+r /etc/apt/keyrings/microsoft.gpg

echo "Add the Azure CLI software repository"
AZ_DIST=$(lsb_release -cs)
echo "Types: deb
URIs: https://packages.microsoft.com/repos/azure-cli/
Suites: ${AZ_DIST}
Components: main
Architectures: $(dpkg --print-architecture)
Signed-by: /etc/apt/keyrings/microsoft.gpg" | sudo tee /etc/apt/sources.list.d/azure-cli.sources

echo "Update repository information and install the azure-cli package"
sudo apt-get -o DPkg::Lock::Timeout=600 update
sudo apt-get -o DPkg::Lock::Timeout=600 -y install azure-cli
else
echo "Azure CLI is already installed"
fi
displayName: "Install azure-cli"

- script: |
set -x

pip install azure-kusto-data
pip install azure-kusto-data azure-identity

INSTANCE_NUMBER=$(python ./.azure-pipelines/impacted_area_testing/calculate_instance_number.py --scripts $(SCRIPTS) --topology ${{ parameters.TOPOLOGY }} --branch ${{ parameters.BUILD_BRANCH }})

if [[ $? -ne 0 ]]; then
echo "##vso[task.complete result=Failed;]Get instances number fails."
exit 1
fi

echo "$INSTANCE_NUMBER"
echo -n "##vso[task.setvariable variable=INSTANCE_NUMBER]$INSTANCE_NUMBER"
displayName: "Calculate instance number"
134 changes: 134 additions & 0 deletions .azure-pipelines/impacted_area_testing/calculate_instance_number.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
import os
import argparse
import math
from constant import PR_CHECKER_TOPOLOGY_NAME, MAX_INSTANCE_NUMBER, MAX_GET_TOKEN_RETRY_TIMES
from azure.kusto.data import KustoConnectionStringBuilder, KustoClient


def parse_list_from_str(s):
# Since Azure Pipeline doesn't support to receive an empty parameter,
# We use ' ' as a magic code for empty parameter.
# So we should consider ' ' as en empty input.
if isinstance(s, str):
s = s.strip()
if not s:
return None
return [single_str.strip()
for single_str in s.split(',')
if single_str.strip()]


def get_access_token():
managed_identity_id = os.environ.get("SONIC_AUTOMATION_UMI")

# 1. Run az login with re-try
az_login_cmd = f"az login --identity --username {managed_identity_id}"
az_login_attempts = 0
while az_login_attempts < MAX_GET_TOKEN_RETRY_TIMES:
try:
result = os.popen(az_login_cmd)
result.read()
break
except Exception as exception:
az_login_attempts += 1
raise Exception(
f"Failed to az login with exception: {repr(exception)}. "
f"Retry {MAX_GET_TOKEN_RETRY_TIMES - az_login_attempts} times to login."
)

# If az login failed, return with exception
if az_login_attempts >= MAX_GET_TOKEN_RETRY_TIMES:
raise Exception(f"Failed to az login after {MAX_GET_TOKEN_RETRY_TIMES} attempts.")

# 2. Get access token with re-try
get_token_cmd = "az account get-access-token --resource https://api.kusto.windows.net --query accessToken -o tsv"
get_token_attempts = 0
while get_token_attempts < MAX_GET_TOKEN_RETRY_TIMES:
try:
result = os.popen(get_token_cmd)
access_token = result.read()
if not access_token:
raise Exception("Parse token from stdout failed, accessToken is None.")

return access_token

except Exception as exception:
get_token_attempts += 1
raise Exception(f"Failed to get token with exception: {repr(exception)}.")

# If az get token failed, return with exception
if get_token_attempts >= MAX_GET_TOKEN_RETRY_TIMES:
raise Exception(f"Failed to get token after {MAX_GET_TOKEN_RETRY_TIMES} attempts")


def main(scripts, topology, branch):
ingest_cluster = os.getenv("TEST_REPORT_QUERY_KUSTO_CLUSTER_BACKUP")
access_token = get_access_token()

if not ingest_cluster or not access_token:
raise RuntimeError(
"Could not load Kusto Credentials from environment")

try:
kcsb = KustoConnectionStringBuilder.with_aad_application_token_authentication(ingest_cluster,
access_token) # noqa F841
client = KustoClient(kcsb)
except Exception as e:
raise Exception("Connect to kusto fails, error {}".format(e))

Comment on lines +22 to +78
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If query from kusto fail, post action will be blocked, right? So, suggest to enhance it with setting default instance_num for the calculate_instance_number task.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have set the default instance number MAX_INSTANCE_NUMBER

scripts = parse_list_from_str(scripts)

scripts_running_time = {}
total_running_time = 0

for script in scripts:
# As baseline test is the universal set of PR test
# we get the historical running time of one script here
# We get recent 5 test plans and calculate the average running time
query = "V2TestCases " \
"| join kind=inner" \
"(TestPlans " \
"| where TestPlanType == 'PR' and Result == 'FINISHED' " \
f"and Topology == '{PR_CHECKER_TOPOLOGY_NAME[topology][0]}' " \
f"and TestBranch == '{branch}' and TestPlanName contains '{PR_CHECKER_TOPOLOGY_NAME[topology][1]}' " \
"and TestPlanName contains '_BaselineTest_'" \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baseline only applied on master branch, it would cause empty testplans for other branches, so I suggest to filter recent 5 success testcases, not success testplans

Copy link
Copy Markdown
Contributor Author

@yutongzhang-microsoft yutongzhang-microsoft Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In table V2TestCases, there is too little information. Here, we must filiter out the test case which has the same branch and topology. To get these information, we must join the table TestPlans.

Copy link
Copy Markdown
Contributor

@xwjiang-ms xwjiang-ms Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you are not clear with my points, I have 2 points here:

  1. baseline only applied for master branch, so your query would not get any result for other branches like 202405, so I suggest to remove condition TestPlanName contains 'BaselineTest'
  2. I agree with we must join testplans table, but in your query, if recent 5 test results are "failure", it's not accurate, so I suggest to add another filter "where result !in ("failure", "error") in V2TestCases table

Copy link
Copy Markdown
Contributor Author

@yutongzhang-microsoft yutongzhang-microsoft Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first question,
We have many other testplans except for baseline test. Here, we need to use the result of baseline test, so the condition is necessary. And according to our design, we will get the result from our baseline test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second one, the Result == 'FINISHED' for testplan can't represent the success of all test scipts in this testplan?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Dose TestPlanType == 'PR' and CreatedByType == 'PR' not enough to filter them? I suppose there won't be much difference, or you can limit results with a time range. Impact area feature would extend to other branches even internal, baseline may not cover such a range of branches, it's only a matter of time to make this change.
  2. testplan result == 'FINISHED' doesn't mean testcase result not in ("failure", "error"), PR test would retry, baseline would not stop in failure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Impact area feature would extend to other branches even internal, baseline may not cover such a range of branches, it's only a matter of time to m
  1. No, there are other kinds of PR testing, like created by pull request in sonic-mgmt/sonic-buildimage, and some manually triggered PRs.
  2. I will add this condition.

"| order by UploadTime desc | take 5) on TestPlanId " \
f"| where FilePath == '{script}' " \
"| where Result !in ('failure', 'error') " \
"| summarize sum(Runtime)"

try:
response = client.execute("SonicTestData", query)
except Exception as e:
raise Exception("Query results from Kusto fails, error {}".format(e))

for row in response.primary_results[0]:
# We have obtained the results of the most recent five times.
# To get the result for a single time, we need to divide by five
# If response.primary_results is None, which means where is no historical data in Kusto,
# we will use the default 1800s for a script.
average_running_time = row["sum_Runtime"] / 5
# There is no relevant records in Kusto
if average_running_time == 0:
average_running_time = 1800

total_running_time += average_running_time
scripts_running_time[script] = average_running_time
# Total running time is calculated by seconds, divide by 60 to get minutes
# For one instance, we plan to assign 90 minutes to run test scripts
# Obtain the number of instances by rounding up the calculation.
# To prevent unexpected situations, we set the maximum number of instance
print(min(math.ceil(total_running_time / 60 / 90), MAX_INSTANCE_NUMBER))


if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--topology", help="The topology of testplan", type=str, default="")
parser.add_argument("--scripts", help="Test scripts to be executed", type=str, default="")
parser.add_argument("--branch", help="Test branch", type=str, default="")
args = parser.parse_args()

scripts = args.scripts
topology = args.topology
branch = args.branch
main(scripts, topology, branch)
28 changes: 28 additions & 0 deletions .azure-pipelines/impacted_area_testing/constant.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# Now, we only have below types of PR checker
# - dpu
# - dualtor-t0
# - multi-asic-t1-lag
# - t0
# - t0-2vlans
# - t0-sonic
# - t1- lag
PR_TOPOLOGY_TYPE = ["t0", "t0-2vlans", "t0-sonic", "t1", "t1-multi-asic", "dpu", "dualtor"]

EXCLUDE_TEST_SCRIPTS = [
"test_posttest.py",
"test_pretest.py"
]

# The mapping of topology type in PR test and topology recorded in kusto and the name of PR test.
PR_CHECKER_TOPOLOGY_NAME = {
"t0": ["t0", "_kvmtest-t0_"],
"t0-2vlans": ["t0", "_kvmtest-t0-2vlans_"],
"t0-sonic": ["t0-64-32", "_kvmtest-t0-sonic_"],
"t1": ["t1-lag", "_kvmtest-t1-lag_"],
"t1-multi-asic": ["t1-8-lag", "_kvmtest-multi-asic-t1-lag_"],
"dpu": ["dpu", "_kvmtest-dpu_"],
"dualtor": ["dualtor", "_kvmtest-dualtor-t0_"]
}

MAX_INSTANCE_NUMBER = 25
MAX_GET_TOKEN_RETRY_TIMES = 3
76 changes: 76 additions & 0 deletions .azure-pipelines/impacted_area_testing/get-impacted-area.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
steps:
- script: |
set -x

DIFF_FOLDERS=$(git diff HEAD^ HEAD --name-only | xargs -n1 dirname | sort -u | tr '\n' ' ')

if [[ $? -ne 0 ]]; then
echo "##vso[task.complete result=Failed;]Get diff folders fails."
exit 1
else
echo -n "##vso[task.setvariable variable=DIFF_FOLDERS]$DIFF_FOLDERS"
fi
continueOnError: false
displayName: "Get diff folders"

- script: |
set -x

pip install PyYAML
pip install natsort

sudo apt-get install -y jq

FINAL_FEATURES=""
IFS=' ' read -ra FEATURES_LIST <<< "$(DIFF_FOLDERS)"
for FEATURE in "${FEATURES_LIST[@]}"
do
# If changes contains the common part in tests folder,the scope of PR testing is all test scripts.
if [[ "$FEATURE" == *tests/common* ]]; then
FINAL_FEATURES=""
break

# If changes only limited to specific feature, the scope of PR testing is impacted area.
elif [[ "$FEATURE" =~ tests\/* ]]; then
# Cut the feature path
if [[ $FEATURE == */*/* ]]; then
FEATURE=$(echo "$FEATURE" | cut -d'/' -f1-2)
fi

if [[ -z "$FINAL_FEATURES" ]]; then
FINAL_FEATURES="${FEATURE#tests/}"
else
FINAL_FEATURES="$FINAL_FEATURES,${FEATURE#tests/}"
fi

# If changes related to other folders excpet tests, we also consider them as common part.
# The scope of PR testing is all test scripts.
else
FINAL_FEATURES=""
break
fi
done

if [[ -z "$FINAL_FEATURES" ]]; then
TEST_SCRIPTS=$(python ./.azure-pipelines/impacted_area_testing/get_test_scripts.py --features "" --location tests)
else
TEST_SCRIPTS=$(python ./.azure-pipelines/impacted_area_testing/get_test_scripts.py --features ${FINAL_FEATURES} --location tests)
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here do not need if-else

Copy link
Copy Markdown
Contributor Author

@yutongzhang-microsoft yutongzhang-microsoft Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [[ -z "$FINAL_FEATURES" ]]; doesn't mean $FINAL_FEATURES is "". If we remove the if-else here, and change the code to

TEST_SCRIPTS=$(python ./.azure-pipelines/impacted_area_testing/get_test_scripts.py --features ${FINAL_FEATURES} --location tests)

We will get below error:

python ./.azure-pipelines/impacted_area_testing/get_test_scripts.py --features --location tests
usage: get_test_scripts.py [-h] [--features FEATURES] [--location LOCATION]
get_test_scripts.py: error: argument --features: expected one argument

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you can add python parse params to support no value or None value passing.

nargs="?", 
const="", 

Such as test_plan.py:
image

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this change can be applied in our second stage of rolling out.
In the first stage, we only focus on the changes which only modify the feature folders.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I will modify the code for the first stage of rolling out.


if [[ $? -ne 0 ]]; then
echo "##vso[task.complete result=Failed;]Get test scripts fails."
exit 1
fi

PR_CHECKERS=$(echo "${TEST_SCRIPTS}" | jq -c 'keys')

if [[ $? -ne 0 ]]; then
echo "##vso[task.complete result=Failed;]Get valid PR checkers fails."
exit 1
fi

echo "##vso[task.setvariable variable=PR_CHECKERS;isOutput=true]$PR_CHECKERS"
echo "##vso[task.setvariable variable=TEST_SCRIPTS;isOutput=true]$TEST_SCRIPTS"
name: SetVariableTask
continueOnError: false
displayName: "Get impacted area"
Loading