Add App Services Heartbeat with environment variable monitor#873
Add App Services Heartbeat with environment variable monitor#873
Conversation
…s) that will add some infrastructure-identifying properties. - Implements part of Option 1 in issue #868
…into WindowsServer. Update logic to ensure we keep heartbeat values updated. - Update logic in AzureWebAppRoleEnvironment to also signal AppServicesHeartbeat to update itself when changes are detected - Update AppServicesHeartbeat to expose an Instance variable to make it easier for update signals to be sent to it - Update CHANGELOG - Issue #868 discusses the need for the change to the AzureWebAppRoleEnvrionmeentInitializer
- make new AppServicesHeartbeat module public! - add try/catch scenarios around initialize - add further diagnostic messages where appropriate
- Use to keep the value of WEBSITE_SITE_NAME and other App Services values current during runime
…, update code for testability.
…erviceEnvVar monitor
…Server.Net45.Tests project
- Add heartbeat provider for Web Apps/App Services - Add monitor for App Services environment variables (which can change during the runtime of an AppServices app) - Ensure adequate testing is added, update current Env Var tests to be testable in a parallel environment
- Use env var monitor to check for udpates to environment in controlled/minimal fashion - default to 30 second interval for checking the changes to environment - corrections for stylecop
- PR suggestion
…nges with a subscriber model - Make env var monitor a singleton, and add an event handler for env var updates that consumers can subscribe to - Env var monitor updates according to its own timer - App services telemetry module no longer a singleton, and is no longer coupled to Web App role TelemetryInitializer - Web app role telemetry initializer only updates the environment variables when it has to now
…onment variable monitor - Make environment variable monitor an abstract class - App Services environment variable implements the abstract Environment variable monitor - Improves testability of the environment monitor - Handle proper disposal of the timer in environment monitor
- Tests center around the base class - No unit tests should be indeterminant
|
This is option B, option A is PR #870, and we can go with either PR for 2.6.0-beta3. |
- ensure tests read a special environment variable specific to each test - Ensure parallel tests do not impede ability to test this class (Now that we have a monitor in place, we need to make the tests get around it)
| /// <summary> | ||
| /// Initializes a new instance of the <see cref="EnvironmentVariableMonitor" /> class. | ||
| /// </summary> | ||
| public EnvironmentVariableMonitor(IEnumerable<string> envVars, TimeSpan checkInterval) |
There was a problem hiding this comment.
Is there any advantage to pass the initial envVars to this constructor?
I mean, the monitor can build up its own list of monitored environment variables via the "GetOrAdd" within GetCurrentEnvironmentVariableValue.
So, why not simplify it and only monitor those variables that are actually queried via the public API.
There was a problem hiding this comment.
Oh, I guess the difference is in the "OnEnvironmentVariableUpdated" call-back. You can get a call-back even for variables you didn't actually call GetCurrentEnvironmentVariableValue on.
There was a problem hiding this comment.
Yes, I am optimizing for the Azure App Services scenario here.
| /// <summary> | ||
| /// Initializes a new instance of the <see cref="EnvironmentVariableMonitor" /> class. | ||
| /// </summary> | ||
| public EnvironmentVariableMonitor(IEnumerable<string> envVars, TimeSpan checkInterval) |
There was a problem hiding this comment.
Constructor of an abstract class should be protected
There was a problem hiding this comment.
Yes of course, thanks for the catch.
…ableMonitor to be protected
Dmitry-Matveev
left a comment
There was a problem hiding this comment.
Overall, I like this PR more than #869. Couple comments and I think we can give it a go.
| { | ||
| try | ||
| { | ||
| var hbeatManager = this.GetHeartbeatPropertyManager(); |
There was a problem hiding this comment.
Minor: This will be called every time MonitoredAppServiceEnvVarUpdatedEvent fires, do we expect that GetHeartBeatPropertyManager() may yield new results across process lifetime or we can cache it?
No immediate need to cache it, though - it looks like operation is light enough due to insignificant amount of the telemetry modules we have in the list.
There was a problem hiding this comment.
Right now, we are pretty much guaranteed that the HeartbeatPropertyManager is present (since it is the DiagnosticsTelemetryModule itself). And for the lifetime of the app it will either be there or it won't, no change. I think I will change this to cache it.
There was a problem hiding this comment.
Please do cache. Otherwise we'll be logging 'AppServiceHeartbeatManagerNotAvailable' everytime this is called. (if hbeatmanager is not available)
d3r3kk: Done.
| this.CheckedValues.TryAdd(varName, Environment.GetEnvironmentVariable(varName)); | ||
| } | ||
|
|
||
| this.environmentCheckTimer = new Timer(this.CheckVariablesIntermittent, null, this.checkInterval, TimeSpan.FromMilliseconds(-1)); |
There was a problem hiding this comment.
If this.checkInterval can ever be 0 (i hope not), then there is a race condition possible - this.CheckVariablesIntermittent can be executed in its entirety before the assignment above happens, so this.environmentCheckTimer() will still be null. The last thing this.CheckVariablesIntermittent() does is to change this.environmentCheckTimer that may cause NullReference.
If I remember correctly, the right pattern was to create Timer like Timer(callback, null, Timeout.Infinite, Timeout.Infinite) and immediately follow up with timer.Change(this.checkInterval, TimeoutInfinite).
Again, I think this only applies if you may have really-really low value as this.checkInterval.
There was a problem hiding this comment.
Ok, I wasn't aware of that possibility. I will make the change to a 2-step initialization here.
There was a problem hiding this comment.
I think I will enforce a minimum check interval as well - 5 seconds.
| { | ||
| return Environment.GetEnvironmentVariable(WebAppHostNameEnvironmentVariable) ?? string.Empty; | ||
| string nodeName = string.Empty; | ||
| AppServiceEnvVarMonitor.Instance.GetCurrentEnvironmentVariableValue(this.WebAppHostNameEnvironmentVariable, ref nodeName); |
There was a problem hiding this comment.
Is this the quickest read possible out of ConcurrentDictionary (I see that internally it uses GetOrAdd()) - is there anything that reads faster than that for ConsurrentDictionary with no locks whatsoever?
We are OK if several items are still having old data, but having all ongoing requests/dependencies and other telemetry items to sync on read of that dictionary may lead to a bottleneck here.
There was a problem hiding this comment.
Good catch to call this out. However, I think I can put any fears to rest.
Notice that by the time this call is being made, the cached value from the environment has already been stored for this update cycle, the key is already in place. This all works together to ensure the GetOrAdd call is simply a TryGetValue call (that has no lock on it as you can see here).
However, in the scenario you suggest (in some future class that we decide to tax the EnvironmentVariableMonitor class as much as possible), it absolutely could be possible to create a nasty bottleneck.
Even then, as far as I am aware, GetOrAdd is the best, fastest way to deal with updates of this nature.
ConcurrentDictionary actually does the only thing I can think of that might be faster, breaking the problem down into a TryGetValue and then using a call to TryUpdateValue only when things have changed.
In the scenario that you site though, where everything is all being updated at once, we still have a bottleneck (albeit one that is handled very nicely by ConcurrentDictionary without us having to write and maintain any complex code).
There was a problem hiding this comment.
I probably could have made this shorter:
... We are OK if several items are still having old data...
This is exactly the path that we follow today.
- Update all environment variables in a loop, detecting any change.
- If we detected a change, notify subscribers to collect their new cached values, ensuring only
TryGetValueoperations on already tracked items.
But thanks for making me read into how ConcurrentDictionary works under the hood. I do feel better about using it now. I was worried it was locking things even in the case of reads!
There was a problem hiding this comment.
@d3r3kk I do not understand how your reply address @Dmitry-Matveev request to avoid any locks in happy path. Simple code of taking current values from local variable will work perfectly fine here
There was a problem hiding this comment.
Maybe I was diving too deep. I was investigating if there was a faster way to read from the ConcurrentDictionary whenever it was necessary to do so, I misunderstood the question regarding 'happy path'.
In the common case ('happy path'), we always read from the local variables:
From AzureWebAppRoleEnvironmentTelemetryInitializer.cs, line 45
public void Initialize(ITelemetry telemetry)
{
if (this.updateEnvVars) // <= this is set in the update callback when change is detected
{
this.roleName = this.GetRoleName();
this.nodeName = this.GetNodeName();
this.updateEnvVars = false; // <= this ensures we don't check again until another change
}
...
Only when a change in the environment is detected do we ever go and get the information from the monitor.
I have no regrets of going down the rabbit hole I did though, now I know more about ConcurrentDictionary!
- Using signed package based off of PR microsoft/ApplicationInsights-dotnet-server#873
| internal readonly KeyValuePair<string, string>[] WebHeartbeatPropertyNameEnvVarMap = new KeyValuePair<string, string>[] | ||
| { | ||
| new KeyValuePair<string, string>("appSrv_SiteName", "WEBSITE_SITE_NAME"), | ||
| new KeyValuePair<string, string>("appSrv_SiteName", "WEBSITE_SLOT_NAME"), |
There was a problem hiding this comment.
are those intentionally has the same key appSrv_SiteName?
There was a problem hiding this comment.
No, that was a mistake, thank you for catching this.
| { | ||
| return Environment.GetEnvironmentVariable(WebAppHostNameEnvironmentVariable) ?? string.Empty; | ||
| string nodeName = string.Empty; | ||
| AppServiceEnvVarMonitor.Instance.GetCurrentEnvironmentVariableValue(this.WebAppHostNameEnvironmentVariable, ref nodeName); |
There was a problem hiding this comment.
@d3r3kk I do not understand how your reply address @Dmitry-Matveev request to avoid any locks in happy path. Simple code of taking current values from local variable will work perfectly fine here
| public void Initialize(TelemetryConfiguration configuration) | ||
| { | ||
| this.UpdateHeartbeatWithAppServiceEnvVarValues(); | ||
| AppServiceEnvVarMonitor.Instance.MonitoredAppServiceEnvVarUpdatedEvent += this.UpdateHeartbeatWithAppServiceEnvVarValues; |
There was a problem hiding this comment.
You should only start this monitor if you detected that an application was actually deployed as Azure Web App. Otherwise the timer inside env var monitor will be firing and not do anything ever.
There was a problem hiding this comment.
Agreed. I am going to make an assumption that the presence of one of these environment variables means we are running under Azure App Services.
| /// intermittently. | ||
| /// </summary> | ||
| [EditorBrowsable(EditorBrowsableState.Never)] | ||
| internal sealed class AppServiceEnvVarMonitor : EnvironmentVariableMonitor |
There was a problem hiding this comment.
minor: C# guidance on naming says not to use shortened name like EnvVar.
There was a problem hiding this comment.
Will update this now, before we go stable. Consistency is good.
| { | ||
| var kvp = iter.Current; | ||
|
|
||
| string envValue = Environment.GetEnvironmentVariable(kvp.Key); |
There was a problem hiding this comment.
this may throw exception in partially trusted environment. Which in turn will throw out of timer callback. Throwing from timer callback is bad as CLR may decide that an app crashed due to unhandled exception
There was a problem hiding this comment.
Agreed. I capture the SecurityException explicitly and disable the EnvironmentVariableMonitor completely when it occurs. No need to continue generating exceptions/logs for this.
| /// in the configured interval once complete. | ||
| /// </summary> | ||
| /// <param name="state">Variable left unused in this implementation of TimerCallback.</param> | ||
| protected void CheckVariablesIntermittent(object state) |
There was a problem hiding this comment.
see comment about try{}catch below. Timer methods needs to be safe - I'd suggest global catch for this method
There was a problem hiding this comment.
Global try{}catch added, I disable the monitor in the specific case a SecurityException is caught when getting the environment variable value, but I only log a message for any other exception type.
|
|
||
| if (this.isEnabled) | ||
| { | ||
| this.environmentCheckTimer.Change(this.checkInterval, TimeSpan.FromMilliseconds(-1)); |
There was a problem hiding this comment.
single exception will stop timer and there will be no notification to the customer. This will be hard to troubleshoot.
There was a problem hiding this comment.
Logged messages to the WindowsServerEventSource that should help with this.
| internal abstract class EnvironmentVariableMonitor : IDisposable | ||
| { | ||
| // Environment variables tracked by this monitor. | ||
| protected readonly ConcurrentDictionary<string, string> CheckedValues; |
There was a problem hiding this comment.
Why is it a concurrent collection? Do you plan to run multiple instances of timer callback at the same time?
There was a problem hiding this comment.
For this abstract class I set out to run it on a single timer callback, and I even guard specifically against re-entrancy with how I use the timer.
However, because it is possible that some subscriber to the event might exist on different threads, I wanted to ensure those cases were covered.
Perhaps I was being over cautious here.
There was a problem hiding this comment.
@SergeyKanzhelev Would you like me to change this for this PR?
cijothomas
left a comment
There was a problem hiding this comment.
Please address the comments.
| { | ||
| var hbeatProviderMock = new HeartbeatProviderMock(); | ||
| var appSrvHbeatModule = this.GetAppServiceHeartbeatModuleWithUniqueTestEnvVars(hbeatProviderMock); | ||
| var envVars = this.GetEnvVarsAssociatedToModule(appSrvHbeatModule); |
There was a problem hiding this comment.
better to move these as TestInitialize
| Assert.Equal(hbeatProviderMock.HbeatProps[kvp.Key], envVars[kvp.Value]); | ||
| } | ||
|
|
||
| this.RemoveTestEnvVarsAssociatedToModule(appSrvHbeatModule); |
There was a problem hiding this comment.
better to move these as TestCleanup
| { | ||
| try | ||
| { | ||
| var hbeatManager = this.GetHeartbeatPropertyManager(); |
There was a problem hiding this comment.
Please do cache. Otherwise we'll be logging 'AppServiceHeartbeatManagerNotAvailable' everytime this is called. (if hbeatmanager is not available)
d3r3kk: Done.
| /// <summary> | ||
| /// Utility to monitor the value of environment variables which may change | ||
| /// during the run of an application. Checks the environment variables | ||
| /// intermittently. |
There was a problem hiding this comment.
intermittently .> at regular fixed intervals may sound better.
|
|
||
| [Event(35, | ||
| Message = "App Services Heartbeat Provider: Accessing the Hearbeat Manager failed as it is not in the list of available modules.", | ||
| Level = EventLevel.Informational)] |
There was a problem hiding this comment.
This could be Warning just like the above is warning.
There was a problem hiding this comment.
All these messages are Warning I think.
|
|
||
| /// <summary> | ||
| /// Value used for keeping track of when the hostname changes (slot swaps occur). We use this | ||
| /// to notify the other class that makes use of these environment variables. |
|
|
||
| /// <summary> | ||
| /// Initializes <see cref="ITelemetry" /> device context. | ||
| /// Initializes <see cref="ITelemetry" /> device context and helps keep any heartbeat values in sync as well. |
There was a problem hiding this comment.
This initializer does not have any knowledge of heartbeat right? The comment look incorrect.
There was a problem hiding this comment.
Yes, thanks for catching this. Leftover from when I updated it last.
| internal readonly KeyValuePair<string, string>[] WebHeartbeatPropertyNameEnvVarMap = new KeyValuePair<string, string>[] | ||
| { | ||
| new KeyValuePair<string, string>("appSrv_SiteName", "WEBSITE_SITE_NAME"), | ||
| new KeyValuePair<string, string>("appSrv_SiteName", "WEBSITE_SLOT_NAME"), |
| [TestMethod] | ||
| public void EnsureEnvironmentVariablesAreCapturedImmediately() | ||
| { | ||
| var envVars = GetCurrentAppServiceEnvironmentVariableValues(); |
There was a problem hiding this comment.
Thanks for the catch. Oversight on my part (did it in one place, not the other). Updated to use [TestInitialize] and [TestCleanup] here as well.
| { | ||
| Environment.SetEnvironmentVariable("WEBSITE_HOSTNAME", "TestRoleName.azurewebsites.net"); | ||
|
|
||
| var testVarName = "WEBSITE_" + Guid.NewGuid().ToString() + "_HOSTNAME"; |
There was a problem hiding this comment.
Why do we change the ENV vars?
There was a problem hiding this comment.
We now have 3 classes that all use the same environment variables and testing each of them means we update/clear/change that small set of environment variables as we go. If we run the tests in parallel, tests will have that value changed unexpectedly during their execution by other tests.
…re App Service environment
… guidance on not abbreviating names
…t variable monitor. - On SecurityExceptions, disable the monitor altogether
- Change logged event level for heartbeat manager missing from Informational to Warning - for environment variable monitor tests, update test classes to have a test init/cleanup that set up the environment
|
Build failure exceeding time limit of 60 minutes... |
| }; | ||
|
|
||
| // singleton pattern, this is the one instance of this class allowed | ||
| private static readonly AppServiceEnvironmentVariableMonitor SingletonInstance = new AppServiceEnvironmentVariableMonitor(); |
There was a problem hiding this comment.
Is it possible to avoid this static initialization? If the app is not running on WebApp, then the entire effort of this class ctr() and reading multiple env vars is wasted and leading to increasing application startup time. This maybe very minimal but .net core apps could be running in containers and restarted very often.
Perhaps move the entire logic to a lazy initializer?
(Lets track it as separate issue)
There was a problem hiding this comment.
Agreed. Let's track it in a separate issue. I've added it as #875.
|
|
||
| [Event(33, | ||
| Message = "App Services Heartbeat Provider: Failed to obtain Azure App Services environment variable '{0}'. Exception raised: {1}", | ||
| Level = EventLevel.Informational)] |
|
@cijothomas I have altered the AI_WebSDK_All-external build to time out at 90 minutes, we were pushing the upper limit of that build prior to this change. Let's discuss ways we can reduce this going forward, perhaps parallelize the builds across a few machines? |
|
@d3r3kk AI_WebSDK_All-external takes a long time because it has to download docker images everytime as it runs on 'throw away' machine. Not much we can do here. The internal builds are fast enough. -external is meant for PR from forked repos only. (which are rare) |
- Using signed package based off of PR microsoft/ApplicationInsights-dotnet-server#873
Fix Issue #868 and #872.
Add heartbeat properties when App Services environment is detected. Keep App Services-specific environment variables in sync across slot-swaps (and anything else that would update environment variables).
- No change to public API