Merged
Changes from 3 commits
51 changes: 48 additions & 3 deletions .github/workflows/generate-release-notes.yaml
@@ -66,6 +66,38 @@ jobs:
"https://github.com/${GITHUB_REPO_OWNER}/${GITHUB_REPO_NAME}/releases/tag/${RELEASE_VERSION}" \
-o release_page.html

+          # Extract system prompt content from HTML
+          echo "Extracting system prompt content..."
+          SYSTEM_PROMPT=""
+
+          # Find the line number where system prompt starts
+          system_prompt_start=$(grep -n "<h2>system prompt</h2>" release_page.html | cut -d: -f1)
+
+          if [ -n "$system_prompt_start" ]; then
+              # Find the next h2 tag after system prompt
+              next_h2_line=$(tail -n +$((system_prompt_start + 1)) release_page.html | grep -n "<h2>" | head -1 | cut -d: -f1)
+
+              if [ -n "$next_h2_line" ]; then
+                  # Calculate actual line number and extract content between the two h2 tags
+                  system_prompt_end=$((system_prompt_start + next_h2_line))
+                  system_prompt_raw=$(sed -n "${system_prompt_start},$((system_prompt_end - 1))p" release_page.html)
+              else
+                  # If no next h2 found, extract till end of file
+                  system_prompt_raw=$(tail -n +${system_prompt_start} release_page.html)
+              fi
+
+              # Clean up HTML tags and decode entities, store in variable
+              SYSTEM_PROMPT=$(echo "$system_prompt_raw" | \
+                  sed 's/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g; s/&quot;/"/g; s/&#39;/'"'"'/g' | \
+                  sed '/^$/d' | \
+                  sed '1d')  # Remove the first line (system prompt header)
+
+              echo "System Prompt Content:"
+              echo "$SYSTEM_PROMPT"
+          else
+              echo "No system prompt found in the release page."
+          fi
Comment on lines 69 to 101

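Use a more robust method to parse the HTML content instead of relying on grep and sed.

🟡 Major | 🧹 Code Smells

📋 Issue details

Extracting content from HTML with grep and sed is error-prone and hard to maintain. If the HTML structure changes even slightly, or the tag attributes change, this hard-coded parsing will stop working. Processing HTML content with shell commands is also less safe and is prone to injection attacks. A dedicated HTML parsing tool or library would make the code more robust and more secure.

💡 Solution

Consider using a Python script with the BeautifulSoup library to parse the HTML, which locates and extracts the required content more accurately. For example: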

-          # Extract system prompt content from HTML
-          echo "Extracting system prompt content..."
-          SYSTEM_PROMPT=""
-
-          # Find the line number where system prompt starts
-          system_prompt_start=$(grep -n "<h2>system prompt</h2>" release_page.html | cut -d: -f1)
-
-          if [ -n "$system_prompt_start" ]; then
-              # Find the next h2 tag after system prompt
-              next_h2_line=$(tail -n +$((system_prompt_start + 1)) release_page.html | grep -n "<h2>" | head -1 | cut -d: -f1)
-
-              if [ -n "$next_h2_line" ]; then
-                  # Calculate actual line number and extract content between the two h2 tags
-                  system_prompt_end=$((system_prompt_start + next_h2_line))
-                  system_prompt_raw=$(sed -n "${system_prompt_start},$((system_prompt_end - 1))p" release_page.html)
-              else
-                  # If no next h2 found, extract till end of file
-                  system_prompt_raw=$(tail -n +${system_prompt_start} release_page.html)
-              fi
-
-              # Clean up HTML tags and decode entities, store in variable
-              SYSTEM_PROMPT=$(echo "$system_prompt_raw" | \
-                  sed 's/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g; s/&quot;/"/g; s/&#39;/'"'"'/g' | \
-                  sed '/^$/d' | \
-                  sed '1d')  # Remove the first line (system prompt header)
-
-              echo "System Prompt Content:"
-              echo "$SYSTEM_PROMPT"
-          else
-              echo "No system prompt found in the release page."
-          fi
+          # Extract system prompt content from HTML using Python
+          echo "Extracting system prompt content..."
+          SYSTEM_PROMPT=$(python3 -c "
+import sys
+from bs4 import BeautifulSoup
+
+with open('release_page.html', 'r') as f:
+    soup = BeautifulSoup(f, 'html.parser')
+
+system_prompt_header = soup.find('h2', string='system prompt')
+if system_prompt_header:
+    content = []
+    for sibling in system_prompt_header.next_siblings:
+        if sibling.name == 'h2':
+            break
+        content.append(str(sibling))
+    print(''.join(content).strip())
+else:
+    # Report on stderr so the notice is not captured into SYSTEM_PROMPT
+    print('No system prompt found in the release page.', file=sys.stderr)
+")
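
BeautifulSoup is a third-party package, so a standard-library variant of the same idea may be useful. The following is a minimal sketch, not the PR's implementation: `SectionExtractor` and `extract_section` are hypothetical names, the case-insensitive heading match is an assumption, and it assumes the prompt text sits between two `<h2>` elements, as in the original grep logic. It relies on Python's `html.parser`, which matches the tag regardless of its attributes and decodes entities such as `&lt;`.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect the text that follows a given <h2> heading, up to the next <h2>."""

    def __init__(self, heading):
        # convert_charrefs=True makes the parser decode entities before handle_data.
        super().__init__(convert_charrefs=True)
        self.heading = heading.strip().lower()
        self.in_h2 = False       # currently inside an <h2> element
        self.h2_text = []        # text of the <h2> being read
        self.capturing = False   # currently inside the target section
        self.found = False       # target heading seen at least once
        self.chunks = []         # text collected from the target section

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Any new <h2> ends the section; attributes on the tag do not matter.
            self.capturing = False
            self.in_h2 = True
            self.h2_text = []

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_h2:
            self.in_h2 = False
            title = "".join(self.h2_text).strip().lower()
            if not self.found and title == self.heading:
                self.found = True
                self.capturing = True

    def handle_data(self, data):
        if self.in_h2:
            self.h2_text.append(data)
        elif self.capturing:
            self.chunks.append(data)


def extract_section(html, heading="system prompt"):
    """Return the section text with blank lines removed, or None if the heading is absent."""
    parser = SectionExtractor(heading)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    lines = [line.strip() for line in "".join(parser.chunks).splitlines()]
    return "\n".join(line for line in lines if line)
```

Returning `None` for a missing heading lets the caller print its notice on stderr, so the message is never captured into `SYSTEM_PROMPT`.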


<!-- This is an auto-generated comment by LingmaAgent -->


echo "Extracting PR numbers from ${GITHUB_REPO_OWNER}/${GITHUB_REPO_NAME} release notes..."
PR_NUMS=$(cat release_page.html | grep -o "/${GITHUB_REPO_OWNER}/${GITHUB_REPO_NAME}/pull/[0-9]*" | grep -o "[0-9]*$" | sort -n | uniq | tr '\n' ',')
PR_NUMS=${PR_NUMS%,}
@@ -88,11 +120,24 @@ jobs:
           cd higress-report-agent
           pip install uv
           uv sync
+
+          # Build command
+          CMD_ARGS="--mode 2 --choice 2 --pr_nums ${PR_NUMS}"
           if [ -n "${IMPORTANT_PR_NUMS}" ]; then
-              uv run report_main.py --mode 2 --choice 2 --pr_nums ${PR_NUMS} --important_prs ${IMPORTANT_PR_NUMS}
-          else
-              uv run report_main.py --mode 2 --choice 2 --pr_nums ${PR_NUMS}
+              CMD_ARGS="${CMD_ARGS} --important_prs ${IMPORTANT_PR_NUMS}"
           fi
+          if [ -n "${SYSTEM_PROMPT}" ]; then
+              echo "${SYSTEM_PROMPT}" > temp_system_prompt.txt
+              CMD_ARGS="${CMD_ARGS} --sys_prompt_file temp_system_prompt.txt"
+          fi
+
+          uv run report_main.py ${CMD_ARGS}
+
+          # Clean up temporary file
+          if [ -f "temp_system_prompt.txt" ]; then
+              rm temp_system_prompt.txt
+          fi
Comment on lines +138 to +141

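Make sure the temporary file is cleaned up on every exit path, so that leftover files do not take up disk space.

🟢 Minor | 🧹 Code Smells

📋 Issue details

At the moment the temporary file temp_system_prompt.txt is removed only after the script runs to completion. If the script exits early because of an error, the file is never removed and keeps taking up disk space. To guarantee cleanup, use the trap command to remove the temporary file automatically when the script exits.

💡 Solution

Use the trap command to clean up the temporary file automatically when the script exits.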

-          # Clean up temporary file
-          if [ -f "temp_system_prompt.txt" ]; then
-              rm temp_system_prompt.txt
-          fi
+          # Set trap to clean up temporary file on exit
+          trap 'if [ -f "temp_system_prompt.txt" ]; then rm temp_system_prompt.txt; fi' EXIT




cp report.md ../
cp report.EN.md ../
cd ..