feat: Infrastructure and CI/CD for test environment #63
- Create Pennie.sln solution file for project organization
- Add tests/PennieBot.Tests.csproj with xUnit, FluentAssertions, Moq
- Extract helper methods to bot/Helpers/MeetingHelpers.cs for testability
- Add 51 unit tests for meeting ID parsing, passcode extraction, @mention stripping
- Update MediaBot.cs to use MeetingHelpers class
- Add InternalsVisibleTo for test project access

Addresses #6, #33

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add appsettings.local.json to .gitignore for developer secrets
- Create appsettings.local.json.template with documented placeholders
- Add Properties/launchSettings.json for IDE launch profiles (http/https)
- Update Program.cs to load optional appsettings.local.json

Addresses #61

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update infra/main.parameters.test.json with KnowAll DevOps project
- Create bot/teams-manifest/manifest.test.json with purple accent (#9C27B0)
- Create bot/appsettings.Test.json for test environment settings
- Test environment uses separate AI Foundry hub and resource group

Addresses #59

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Infrastructure:
- Add Spot VM parameters to windows-vm.bicep (useSpotVM, evictionPolicy, maxPrice)
- Add auto-shutdown schedule resource for cost savings
- Create windows-vm.parameters.test.json for test environment

CI/CD:
- Update deploy.yml with tag-based production releases (v*.*.*)
- Add set-environment job to determine target environment
- Update test.yml to run actual unit tests from Pennie.sln

Documentation:
- Update TESTING.adoc with detailed Test Environment section
- Document Spot VM behavior, cost savings, and usage commands

Cost savings: Test environment ~$10-15/month vs ~$70-90/month (85% reduction)

Addresses #59, #60

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Quick start guide for building and running locally
- Project structure documentation
- Configuration hierarchy explanation
- Development workflow (branches, PRs, CI/CD)
- Local testing options (dev tunnel, Bot Framework Emulator)
- Troubleshooting common issues
- Links to related documentation

Addresses #61

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Azure Key Vault integration from Program.cs (use GitHub Secrets)
- Remove AZURE_KEY_VAULT_NAME from appsettings.json
- Remove keyVaultName parameter from windows-vm.bicep module
- Remove Key Vault role assignment for VM managed identity
- Update parameter files to remove keyVaultName

Secrets will be managed via GitHub Secrets and set as environment variables during deployment, which is simpler for small teams.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Key Vault references from scripts (use GitHub Secrets)
- Update setup-bot-app-registration.sh to output gh commands
- Remove .env file updates from scripts (use GitHub Secrets)
- Update CLAUDE.md security principles
- Update .env.example with TEAMS_APP_ID/PASSWORD
- Delete zip files and publish folders from repo
- Add src/*.zip to .gitignore
- Fix repository URL in agent-config.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The deploy-bot-to-vm.ps1 script has optional Key Vault support and will work without KeyVaultName parameter. Credentials are managed via appsettings.json backup/restore during deployment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove appsettings.json modification step from deploy-bot.sh
- Remove KeyVaultName parameter from deploy-bot-remote.sh
- Credentials now managed via GitHub Secrets

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace Key Vault credential lookup with environment variables (TEAMS_APP_ID, TEAMS_APP_PASSWORD) which are set via GitHub Secrets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add scripts overview table to DEPLOYMENT.adoc Appendix A
- Update Secrets Management section for GitHub Secrets
- Update component status (Key Vault -> GitHub Secrets)
- Update setup-bot-app-registration.sh description
- Remove AZURE_KEY_VAULT_NAME from bot/README.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Key Vault from .env example and configuration sections
- Update status to reflect GitHub Secrets management
- Remove "Phase 1/2" terminology (just document current state)
- Update completion status section with checkmarks
- Simplify appsettings.json example

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Clear production URLs/FQDNs from appsettings.json (addresses #34)
- Update deploy-bot.sh to inject config via appsettings.Production.json
- Remove reliance on VM backup for configuration
- Add --env parameter to setup-bot-app-registration.sh for test/prod
- Update DEPLOYMENT.adoc with environment-specific documentation

Configuration is now injected at deployment time:
1. Script queries VM FQDN from Azure
2. Creates appsettings.Production.json on VM with correct values
3. .NET configuration hierarchy loads: base -> Production -> env vars

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
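The second injection step in this commit (creating appsettings.Production.json from the VM FQDN) could be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the contents of deploy-bot.sh: the function name `render_production_settings` and the JSON keys (`Bot`, `ServiceFqdn`, `BotBaseUrl`) are hypothetical placeholders, and the FQDN is passed in as an argument rather than queried from Azure.

```shell
#!/bin/sh
# Sketch only: prints the JSON that a deploy script might write to
# appsettings.Production.json on the VM. All key names are hypothetical.
render_production_settings() {
    fqdn="$1"
    # Fail fast on an empty FQDN instead of writing a broken config file.
    if [ -z "$fqdn" ]; then
        echo "error: VM FQDN is empty; refusing to render settings" >&2
        return 1
    fi
    cat <<EOF
{
  "Bot": {
    "ServiceFqdn": "${fqdn}",
    "BotBaseUrl": "https://${fqdn}"
  }
}
EOF
}

# Example usage with a placeholder FQDN (not a real host).
render_production_settings "pennie-vm-test.example.com"
```

Because .NET configuration sources added later override earlier ones, values written to the Production file replace the empty placeholders in the base appsettings.json, and environment variables can still override both.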
- Remove Key Vault module from main.bicep (using GitHub Secrets instead)
- Remove teamsAppId parameter (now in GitHub Secrets)
- Fix cross-scope role assignment in windows-vm.bicep (document CLI workaround)
- Update main.parameters.test.json to remove Key Vault reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
teamsAppId is now stored in GitHub Secrets and not passed to Bicep (Key Vault module was removed from infrastructure).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
… to AsciiDoc

- Change deploy-infrastructure condition from 'if: false' to check AZURE_DEPLOYMENT_ENABLED
- This allows test environment to deploy infrastructure while prod remains protected
- Convert BRAND_GUIDE.md to BRAND_GUIDE.adoc (AsciiDoc format)
- AZURE_DEPLOYMENT_ENABLED is now set per environment (test=true, prod=false)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add section explaining required Azure role assignments for GitHub Actions
- Document Contributor and Storage Blob Data Contributor roles
- Include concrete example commands for test environment setup
- Note about role propagation timing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ditions

GitHub Actions job-level 'if:' conditions are evaluated before environment scope is applied. This fix:
- Sets environment context on set-environment job
- Outputs AZURE_DEPLOYMENT_ENABLED from environment scope
- Other jobs use the output via needs.set-environment.outputs.deployment_enabled

This enables per-environment control of deployment (test=true, prod=false).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
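The decision a set-environment step makes can be sketched as a small shell script. This is a sketch under assumptions, not the workflow's actual step: the function names and the `none` fallback are hypothetical, the ref-to-environment mapping follows the triggers listed in the PR description (main to test, v*.*.* tags to prod, manual choice for workflow_dispatch), and `deployment_enabled` is a stand-in for reading the per-environment AZURE_DEPLOYMENT_ENABLED variable.

```shell
#!/bin/sh
# Sketch of a set-environment step. Refs such as refs/heads/main are real
# GitHub Actions values; function names and the "none" result are assumptions.
resolve_environment() {
    ref="$1"            # e.g. refs/heads/main or refs/tags/v1.2.3
    manual_choice="$2"  # workflow_dispatch input; empty for push events
    if [ -n "$manual_choice" ]; then
        echo "$manual_choice"
        return 0
    fi
    case "$ref" in
        refs/tags/v*.*.*) echo "prod" ;;
        refs/heads/main)  echo "test" ;;
        *)                echo "none" ;;
    esac
}

# Stand-in for the per-environment AZURE_DEPLOYMENT_ENABLED variable
# (test=true, prod=false, as stated in the commit message).
deployment_enabled() {
    case "$1" in
        test) echo "true" ;;
        *)    echo "false" ;;
    esac
}

env_name="$(resolve_environment "refs/heads/main" "")"
# In a workflow these lines would be appended to the file named by $GITHUB_OUTPUT.
echo "environment=${env_name}"
echo "deployment_enabled=$(deployment_enabled "$env_name")"
```

Downstream jobs would then gate on the emitted `deployment_enabled` output rather than on an environment variable that is not yet in scope when the job-level condition is evaluated.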
…ions

- Remove subscription-level targetScope from main.bicep
- Resource groups must now be pre-created (one-time setup per environment)
- Update workflow to deploy at resource group scope
- Remove resourceGroupName and location params (use resource group defaults)
- Create main.parameters.prod.json for prod environment
- Update docs with resource group creation prerequisites

This follows principle of least privilege - GitHub Actions only needs Contributor on the resource group, not the entire subscription.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add deployAiServices parameter (default: true) to main.bicep
- Set deployAiServices=false in test parameters (test uses prod AI)
- Fix storage account name exceeding 24 char limit using take()
- Update outputs to handle conditional AI services deployment

Fixes GPT-4o SKU availability issue in UK South for test environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
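The storage account fix relies on the Azure naming rule that account names are 3-24 characters long and contain only lowercase letters and digits. The shell sketch below is an analogue of truncating with take() in Bicep, not the Bicep change itself; the function name `storage_account_name` and the sample input are hypothetical.

```shell
#!/bin/sh
# Sketch: normalize a candidate storage account name to Azure's rules
# (lowercase letters and digits only, at most 24 characters).
storage_account_name() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cd 'a-z0-9' \
        | cut -c1-24
}

# Hypothetical candidate name that would otherwise exceed the limit.
storage_account_name "Pennie-Test-Storage-UKSouth-001"
```

Truncation alone can make two long names collide, so a real template would usually keep a unique suffix inside the first 24 characters.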
- Explain why certain steps cannot be automated in CI/CD
- Document security concerns (Global Admin for consent)
- Add per-environment setup checklist (6 steps)
- Include Teams manifest creation for test environment
- Link to existing setup-bot-app-registration.sh script

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add deployVM parameter to conditionally deploy Windows VM
- Add useSpotVM parameter to use Azure Spot VMs for cost savings
- Test environment now deploys a Spot VM (60-80% cheaper)
- Spot VMs can be evicted by Azure when capacity is needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed from param() block to $args[0] for Azure CLI parameter passing
- Added verbose logging to debug deployment issues
- Fixed TLS protocol and added -UseBasicParsing for Invoke-WebRequest
- Added file listing after extraction to verify deployment

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The $args[0] approach doesn't work with az vm run-command invoke. Now using double-quoted here-string to embed the URL directly into the script before sending to the VM.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Double quotes were being stripped during here-string expansion, causing PowerShell to interpret paths as commands. Using single quotes instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Single quotes inside here-string don't expand PowerShell variables. The PackageUrl variable needs double quotes to get the actual URL.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The ACME HTTP-01 challenge was failing because the Windows Firewall was blocking inbound connections on port 80. The Azure NSG had the rule, but the VM's Windows Firewall didn't.

Added: New-NetFirewallRule for port 80 before running win-acme

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Check common Chocolatey paths before falling back to PATH
- Use GitHub mirror for NSSM download (nssm.cc was returning 502)
- Add fallback to nssm.cc if GitHub fails
- Use $nssmExe variable throughout instead of relying on PATH

Fixes service installation failure when NSSM was installed via Chocolatey but not visible in run-command's PATH environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documented the issue where `dotnet build --output` overwrites appsettings.json with the source version (empty placeholders), losing all configured secrets. Added fix: backup before build and restore after, plus correct script execution order.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
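The backup-then-restore fix described in this commit can be sketched as a shell wrapper around the build command. This is a minimal sketch, not the project's script: the function name `build_preserving_settings`, the `.bak` suffix, and the stand-in `fake_build` step are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run a build command while preserving a configured appsettings.json.
# The function name and the way the build command is passed are assumptions.
build_preserving_settings() {
    settings="$1"   # path to the deployed appsettings.json
    shift           # remaining arguments form the build command
    backup="${settings}.bak"

    cp "$settings" "$backup" || return 1
    status=0
    "$@" || status=$?
    # Restore even when the build fails, so configured secrets are not lost.
    mv "$backup" "$settings" || return 1
    return "$status"
}

# Demonstration with a stand-in build step that clobbers the settings file.
workdir="$(mktemp -d)"
printf '{"secret":"configured"}\n' > "$workdir/appsettings.json"
fake_build() { printf '{"secret":""}\n' > "$workdir/appsettings.json"; }
build_preserving_settings "$workdir/appsettings.json" fake_build
cat "$workdir/appsettings.json"
rm -rf "$workdir"
```

In a real deployment the stand-in would be replaced by the actual build invocation, for example passing `dotnet build --output <dir>` as the trailing arguments.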
🔍 Test Bot Investigation Status (Dec 5, 2025)
Current Issue
The test bot
Key Discovery: Production vs Test Credential Mystery
Production Bot (
Test Bot (
Error Progression
What We've Tried
Configuration Verified
The Mystery
Production bot works with empty credentials, meaning it somehow:
But the test bot with identical empty credentials fails on incoming token validation.
What Still Needs Investigation
Recommended Next Steps
Files Modified
Commands for Continuation
# Check test bot logs
AZURE_RESOURCE_GROUP=TMinus15Agents-Test VM_NAME=pennie-vm-test ./scripts/bot-logs.sh
# Test bot via Direct Line
BOT_NAME=pennie-bot-test AZURE_RESOURCE_GROUP=TMinus15Agents-Test ./tests/bot-direct-line.sh "Hello" 20
# Update test VM credentials
az vm run-command invoke --resource-group TMinus15Agents-Test --name pennie-vm-test \
  --command-id RunPowerShellScript --scripts '<PowerShell to update appsettings.json>'
- Add workflow step to grant VM managed identity "Cognitive Services OpenAI User" role for Azure OpenAI access
- Add troubleshooting entry for InternalServerError caused by missing RBAC permissions

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PR Review: Infrastructure and CI/CD for Test Environment

This is a substantial PR that sets up a complete test environment and CI/CD pipeline. Overall, the implementation is solid, but there are several security concerns, potential bugs, and areas for improvement.

Critical Issues

1. Secret Exposure Risk in Deployment Script (.github/workflows/deploy.yml:345-355)
Severity: High
The Base64-encoded secrets are being logged. While you've added add-mask for the original values, the Base64-encoded versions are NOT masked, so anyone with access to the logs could decode them.
Recommendation: Remove these debug logging statements or ensure Base64 values are also masked.

2. PFX Password Stored in Plain Text (scripts/configure-ssl.ps1:34-35)
Severity: High
The PFX password is stored in a plain text file at C:\Pennie\certs\pfx-password.txt. Any user with access to the VM can read this password and extract the private key.
Recommendation: Use Windows Data Protection API (DPAPI) to encrypt the password file, or store the password in Azure Key Vault, or use certificate stores without exporting PFX files.

3. Anonymous Backend Authentication (CLAUDE.md:39)
Per CLAUDE.md: "Anonymous authentication (no API keys)" for the Azure Functions backend. This means anyone who discovers the backend URL can call your Azure DevOps integration functions.
Recommendation: Add function-level authentication using function keys or managed identity.

Medium Priority Issues

4. Error Handling in Deployment Steps (.github/workflows/deploy.yml:338-445)
The 7-step VM deployment process doesn't have robust error handling. If step 3 fails, steps 4-7 will still run with potentially corrupt state.
Recommendation: Add error checking between steps.

5. Certificate Renewal Task (scripts/configure-ssl.ps1:194)
The scheduled task runs renewal daily, but doesn't restart the bot service after renewal. The bot will continue using the old certificate until manually restarted.
Recommendation: Add a post-renewal script that restarts the PennieBot service.

6. Hardcoded Timeout Values (bot/Helpers/MeetingHelpers.cs:11)
The 100ms regex timeout might be too aggressive for complex but legitimate inputs. Consider making this configurable or increasing it to 500ms-1s.

7. VM FQDN Resolution Inconsistency (.github/workflows/deploy.yml:387-392)
The SSL step has good error handling, but the deploy-bot step echoes an empty string and continues, so the two steps behave inconsistently.

Positive Highlights

Excellent Code Organization
Comprehensive Unit Tests (tests/Helpers/MeetingHelpersTests.cs)
Security-Conscious Regex Usage
Spot VM for Cost Savings
Environment Separation (.github/workflows/deploy.yml:30-73)
Comprehensive Deployment Steps
Recommendations & Best Practices

9. Add Rollback Capability
The notification step mentions rollback but doesn't implement it. Consider storing previous deployment artifact in Azure Storage or adding a manual rollback workflow.

10. Improve Health Check Robustness
The health check should also verify:
- Bot can authenticate with Azure OpenAI
- Bot can reach Azure Functions backend
- Certificate is valid and not self-signed

11. Add Integration Tests
Currently only endpoint connectivity tests exist. Consider adding:
- Test Teams message sending/receiving
- Test meeting join flow
- Test Azure DevOps work item creation

12. Documentation: Spot VM Limitations
Spot VMs can be evicted with 30 seconds notice. Document this risk in DEPLOYMENT.adoc.

13. Consider Using Managed Certificates
Instead of Let's Encrypt + win-acme, consider Azure App Service Managed Certificates or Azure Key Vault certificates with auto-rotation.

Test Coverage Assessment

Good Coverage:
Missing Coverage:
Minor Issues

15. Magic Numbers (bot/Helpers/MeetingHelpers.cs:56)
Add a comment explaining why 9-29: "Teams meeting IDs are 10-30 characters including spaces"

16. Inconsistent Error Handling (scripts/configure-bot-settings.ps1)
Lines 48-65 have good null/empty checks, but lines 104-112 assume Azure OpenAI values are valid without validation.

Summary
Overall Assessment: This is a well-structured PR with excellent code organization and comprehensive testing for the helpers. However, the security concerns around secret handling and certificate storage must be addressed before merging to production. The test environment deployment is safe to merge after addressing the secret logging issue.

Recommendation:
Actionable Next Steps
Great work on this comprehensive infrastructure setup! The code quality is high, especially the refactoring and testing. Address the security concerns and this will be production-ready.
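For the secret-logging concern in item 1 of the review, one possible mitigation is to register the Base64 form of each secret with the log masker as well as the raw value. The sketch below uses the GitHub Actions `::add-mask::` workflow command; the function name `mask_secret_variants` and the sample value are hypothetical, and it assumes the workflow encodes the secret as plain UTF-8 Base64.

```shell
#!/bin/sh
# Sketch: emit GitHub Actions add-mask commands for a secret and for its
# Base64 encoding, so neither form is shown in later log lines.
mask_secret_variants() {
    secret="$1"
    [ -n "$secret" ] || return 0   # nothing to mask
    # Strip newlines in case the base64 tool wraps long output.
    encoded="$(printf '%s' "$secret" | base64 | tr -d '\n')"
    echo "::add-mask::${secret}"
    echo "::add-mask::${encoded}"
}

# Placeholder value for illustration only; never hard-code real secrets.
mask_secret_variants "hunter2"
```

The masked string has to match exactly what is later printed, so if the deployment script encodes the secret differently (for example as UTF-16 for PowerShell), that exact encoded string is the one to mask. Removing the debug logging remains the simpler fix.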
Summary
TMinus15Agents-Test) with separate infrastructure

Issues Addressed

Fixes #60 - CI/CD workflow with Git tag releases
- main → Deploy to test (when AZURE_DEPLOYMENT_ENABLED=true)
- v*.*.* → Auto-deploy to production
- workflow_dispatch → Manual deploy to test or prod

Fixes #59 - Set up test environment (TMinus15Agents-Test)
manifest.test.json) with purple accent (#9C27B0)
main.parameters.test.json)

Does NOT fix (out of scope)

Still TODO before merge
appsettings.json on test VM (Teams credentials, etc.)

Key Changes
- .github/workflows/deploy.yml - Full CI/CD pipeline with 6-step VM deployment
- infra/main.parameters.test.json - Test environment with Spot VM
- infra/main.bicep - Optional AI services and VM deployment
- bot/teams-manifest/manifest.test.json - Test Teams app manifest

Test Plan
workflow_dispatch to test environment

🤖 Generated with Claude Code