@@ -14,15 +14,22 @@ To return to the *Settings* page, from the navigation panel select {MenuAEAdminSettings}.
//include::platform/proc-controller-authentication.adoc[leveloffset=+1]
//[ddacosta] subscription content moved to access management guide
//include::platform/proc-controller-configure-subscriptions.adoc[leveloffset=+1]

include::platform/proc-controller-configure-system.adoc[leveloffset=+1]

include::platform/proc-controller-configure-jobs.adoc[leveloffset=+1]

include::platform/ref-controller-logging-settings.adoc[leveloffset=+1]

//The only directly controller related thing here is the custom logo which is covered separately
//include::platform/proc-controller-configure-user-interface.adoc[leveloffset=+1]
//include::platform/proc-controller-configure-usability-analytics.adoc[leveloffset=+2]
//include::platform/con-controller-custom-logos.adoc[leveloffset=+1]

include::platform/proc-controller-configure-analytics.adoc[leveloffset=+1]

include::platform/con-controller-additional-settings.adoc[leveloffset=+1]

//This should be in Hala's documentation
//include::platform/proc-controller-obtaining-subscriptions.adoc[leveloffset=+1]
//include::platform/con-controller-keep-subscription-in-compliance.adoc[leveloffset=+2]
@@ -6,38 +6,63 @@

Tune your {ControllerName} to optimize performance and scalability. When planning your workload, ensure that you identify your performance and scaling needs, adjust for any limitations, and monitor your deployment.

{ControllerNameStart} is a distributed system with multiple components that you can tune, including the following:
{ControllerNameStart} is a distributed system with many components that you can tune, including the following:

* Task system in charge of scheduling jobs
* Control Plane in charge of controlling jobs and processing output
* Execution plane where jobs run
* Web server in charge of serving the API
* Websocket system that serve and broadcast websocket connections and data
* Database used by multiple components
* WebSocket system that serves and broadcasts WebSocket connections and data
* Database used by many components

include::platform/con-websocket-setup.adoc[leveloffset=+1]

include::platform/proc-configuring-discovery.adoc[leveloffset=+2]

include::platform/ref-controller-capacity-planning.adoc[leveloffset=+1]

include::platform/ref-controller-workload-characteristics.adoc[leveloffset=+2]

include::platform/ref-controller-node-types.adoc[leveloffset=+2]

include::platform/ref-scaling-control-nodes.adoc[leveloffset=+3]

include::platform/ref-scaling-execution-nodes.adoc[leveloffset=+3]

include::platform/ref-scaling-hop-nodes.adoc[leveloffset=+3]

include::platform/ref-ratio-control-execution.adoc[leveloffset=+3]

include::platform/ref-controller-capacity-planning-exercise.adoc[leveloffset=+1]

include::platform/ref-controller-performance-troubleshooting.adoc[leveloffset=+1]

include::platform/con-controller-metrics-monitor-controller.adoc[leveloffset=+1]

include::platform/ref-controller-database-settings.adoc[leveloffset=+1]

include::platform/ref-encrypting-plaintext-passwords.adoc[leveloffset=+2]

include::platform/proc-create-password-hashes.adoc[leveloffset=+3]

include::platform/proc-encrypt-postgres-password.adoc[leveloffset=+3]

include::platform/con-controller-tuning.adoc[leveloffset=+1]

include::platform/proc-controller-managing-live-events.adoc[leveloffset=+2]

include::platform/proc-controller-disabling-live-events.adoc[leveloffset=+3]

include::platform/ref-controller-settings-to-modify-events.adoc[leveloffset=+3]

include::platform/ref-controller-settings-job-events.adoc[leveloffset=+2]

include::platform/ref-controller-settings-control-execution-nodes.adoc[leveloffset=+2]

include::platform/ref-controller-capacity-instance-container.adoc[leveloffset=+2]

include::platform/ref-controller-settings-scheduling-jobs.adoc[leveloffset=+2]

include::platform/ref-controller-internal-cluster-routing.adoc[leveloffset=+2]

include::platform/ref-controller-web-service-tuning.adoc[leveloffset=+2]
@@ -23,9 +23,10 @@ Application level metrics provide data that the application knows about the system
Using system and application metrics can help you identify what was happening in the application when a service degradation occurred. Information about {ControllerName}'s performance over time helps when diagnosing problems or doing capacity planning for future growth.

include::ref-controller-metrics-monitoring.adoc[leveloffset=+1]

include::con-controller-system-level-monitoring.adoc[leveloffset=+1]

.Additional resources

* For more information about configuring monitoring, see xref:assembly-controller-metrics[Metrics].
* Additional insights into automation usage are available when you enable data collection for automation analytics. For more information, see link:https://www.ansible.com/products/insights-for-ansible[Automation analytics and Red Hat Insights for Red Hat Ansible Automation Platform].
* xref:assembly-controller-metrics[Metrics]
* link:https://www.ansible.com/products/insights-for-ansible[Automation analytics and Red Hat Insights for Red Hat Ansible Automation Platform]
2 changes: 1 addition & 1 deletion downstream/modules/platform/con-websocket-setup.adoc
@@ -18,5 +18,5 @@ You can configure websockets at `/etc/tower/conf.d/websocket_config.py` in all of your nodes
[IMPORTANT]
====
Your {ControllerName} nodes are designed to broadcast websocket traffic across a private, trusted subnet (and not the open Internet).
Therefore, if you turn off HTTPS for websocket broadcasting, the websocket traffic, composed mostly of Ansible playbook stdout, is sent unencrypted between {ControllerName} nodes.
Therefore, if you turn off HTTPS for websocket broadcasting, the websocket traffic, composed mostly of Ansible Playbook stdout, is sent unencrypted between {ControllerName} nodes.
====
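
For illustration, a minimal sketch of an override in that file (the setting names are assumptions modeled on AWX-style broadcast settings, not confirmed defaults):

[source,python]
----
# /etc/tower/conf.d/websocket_config.py -- illustrative sketch; setting names are assumptions.
BROADCAST_WEBSOCKET_PROTOCOL = "https"  # keep inter-node broadcast traffic encrypted
BROADCAST_WEBSOCKET_VERIFY_CERT = True  # verify peer certificates between nodes
----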
@@ -8,4 +8,4 @@

. Disable live streaming events by using one of the following methods:
.. In the API, set `UI_LIVE_UPDATES_ENABLED` to *False*.
.. Navigate to your {ControllerName}. Open the *Miscellaneous System Settings* window. Set the *Enable Activity Stream* toggle to *Off*.
.. Go to your {ControllerName}. Open the *Miscellaneous System Settings* window. Set the *Enable Activity Stream* toggle to *Off*.
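
As an illustration of the API method in the step above, a minimal sketch using Python `requests` (the URL, token, and endpoint path are assumptions; adjust for your deployment):

[source,python]
----
import requests

# Hypothetical controller URL and token; adjust for your deployment.
resp = requests.patch(
    "https://controller.example.com/api/v2/settings/ui/",
    headers={"Authorization": "Bearer <token>"},
    json={"UI_LIVE_UPDATES_ENABLED": False},
)
resp.raise_for_status()
----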
4 changes: 3 additions & 1 deletion downstream/modules/platform/proc-create-password-hashes.adoc
@@ -4,7 +4,10 @@

= Creating PostgreSQL password hashes

Supply the hash values that replace the plain text passwords within the {ControllerName} configuration files.

.Procedure

. On your {ControllerName} node, run the following:
+
[literal, options="nowrap" subs="+quotes,attributes"]
@@ -42,4 +45,3 @@ $encrypted$AESCBC$Z0FBQUFBQmNONU9BbGQ1VjJyNDJRVTRKaFRIR09Ib2U5TGdaYVRfcXFXRjlmdm
+
Note that the `$*_PASS` values are already in plain text in your inventory file.

These steps supply the hash values that replace the plain text passwords within the {ControllerName} configuration files.
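
A hedged sketch of generating such a hash from a Python shell on the controller node (the `encrypt_value` helper and its import path are assumptions; verify them against your installed version):

[source,python]
----
# Run inside an `awx-manage shell_plus` session on the controller node.
# The helper name and import path are assumptions for illustration.
from awx.main.utils.encryption import encrypt_value

print(encrypt_value("<plain text password>"))  # prints a value like $encrypted$...
----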
@@ -15,5 +15,5 @@ Use the `max_concurrent_jobs` and `max_forks` settings available on instance groups
----
((number of worker nodes in kubernetes cluster) * (memory available on each worker)) / (memory request on pod_spec) = maximum number of forks
----
** For example, given a single worker node with 8 Gb of Memory, we determine that the `max forks` we want to run is 81. This way, either 39 jobs with 1 fork can run (task impact is always forks + 1), or 2 jobs with forks set to 39 can run.
** For example, given a single worker node with 8 GB of Memory, we determine that the `max forks` we want to run is 81. This way, either 39 jobs with 1 fork can run (task impact is always forks + 1), or 2 jobs with forks set to 39 can run.
* You might have other business requirements that motivate using `max_forks` or `max_concurrent_jobs` to limit the number of jobs launched in a container group.
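
A short sketch of the fork arithmetic above, using the same example values (one worker node, 8 GiB of memory, and a 100 MB memory request on the pod spec; all values are illustrative):

[source,python]
----
# Illustrative values from the example above.
worker_nodes = 1
memory_per_worker_mb = 8192   # 8 GiB expressed in MB
memory_request_mb = 100       # memory request on the pod_spec

max_forks = (worker_nodes * memory_per_worker_mb) // memory_request_mb
print(max_forks)  # 81, matching the example; task impact is always forks + 1
----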
@@ -35,9 +35,6 @@ For this example, for a workload on 300 managed hosts, executing 1000 tasks per
* Keep the default fork setting of 5 on job templates.
* Use the capacity change feature in the instance view of the UI on the control node to reduce the capacity to 16, the lowest value, to reserve more of the control node's capacity for processing events.


.Additional Resources

* For more information about workloads with high levels of API interaction, see link:https://www.ansible.com/blog/scaling-automation-controller-for-api-driven-workloads[Scaling Automation Controller for API Driven Workloads].
* For more information about managing capacity with instances, see link:{BaseURL}/red_hat_ansible_automation_platform/{PlatformVers}/html-single/using_automation_execution/index#assembly-controller-instances[Managing capacity with Instances].
* For more information about operator-based deployments, see link:{URLOCPPerformanceGuide}/index[{PlatformName} considerations for operator environments].
.Additional resources

* link:https://www.ansible.com/blog/scaling-automation-controller-for-api-driven-workloads[Scaling Automation Controller for API Driven Workloads]
* link:{BaseURL}/red_hat_ansible_automation_platform/{PlatformVers}/html-single/using_automation_execution/index#assembly-controller-instances[Managing capacity with Instances]
* link:{URLOCPPerformanceGuide}/index[{PlatformName} considerations for operator environments]
26 changes: 22 additions & 4 deletions downstream/modules/platform/ref-controller-database-settings.adoc
@@ -6,7 +6,6 @@

To improve the performance of {ControllerName}, you can configure the following parameters in the database:


*Maintenance*

The `VACUUM` and `ANALYZE` tasks are important maintenance activities that can impact performance. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a `VACUUM` is done. Therefore it is necessary to run `VACUUM` periodically, especially on frequently updated tables. `ANALYZE` collects statistics about the contents of tables in the database and stores the results in the `pg_statistic` system catalog. The query planner then uses these statistics to help determine the most efficient execution plans for queries. The autovacuuming PostgreSQL configuration parameter automates the execution of `VACUUM` and `ANALYZE` commands. Setting autovacuuming to *true* is a good practice. However, autovacuuming does not occur if there is never any idle time on the database. If you observe that autovacuuming is not sufficiently cleaning up space on the database disk, schedule specific vacuum tasks during maintenance windows.
@@ -17,7 +16,22 @@ To improve the performance of the PostgreSQL server, configure the following _Grand Unified Configuration_ (GUC) parameters:

* `shared_buffers`: determines how much memory is dedicated to the server for caching data. The default value for this parameter is 128 MB. When you modify this value, you must set it between 15% and 25% of the machine's total RAM.

NOTE: You must restart the database server after changing the value for shared_buffers.

[NOTE]
====
You must restart the database server after changing the value for `shared_buffers`.
====

[WARNING]
====
If you are compiling PostgreSQL against OpenSSL 3.2, your system regresses to remove the parameter for `User` during startup. You can rectify this by using the `BIO_get_app_data` call instead of `BIO_get_data`. Only an administrator can make these changes, but they impact all users connected to the PostgreSQL database.

If you update your systems without the OpenSSL patch, you are not impacted, and you do not need to take action.
====

* `work_mem`: provides the amount of memory to be used by internal sort operations and hash tables before disk-swapping. Sort operations are used for `ORDER BY`, `DISTINCT`, and merge join operations. Hash tables are used in hash joins and hash-based aggregation. The default value for this parameter is 4 MB. Setting the correct value of the `work_mem` parameter improves the speed of a search by reducing disk-swapping.
** Use the following formula to calculate the optimal value of the `work_mem` parameter for the database server:
@@ -38,7 +52,11 @@ NOTE: Setting a large `work_mem` can cause the PostgreSQL server to go out of memory
Total RAM * 0.05
----

NOTE: Set `maintenance_work_mem` higher than `work_mem` to improve performance for vacuuming.
[NOTE]
====
Set `maintenance_work_mem` higher than `work_mem` to improve performance for vacuuming.
====
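
As a quick arithmetic sketch of the sizing guidance above, assuming a hypothetical host with 16 GB of RAM:

[source,python]
----
total_ram_mb = 16384  # hypothetical host with 16 GB of RAM

# shared_buffers: between 15% and 25% of total RAM (restart the server after changing it).
print(f"shared_buffers: {int(total_ram_mb * 0.15)}-{int(total_ram_mb * 0.25)} MB")  # 2457-4096 MB

# maintenance_work_mem: Total RAM * 0.05; keep it higher than work_mem.
print(f"maintenance_work_mem: {int(total_ram_mb * 0.05)} MB")  # 819 MB
----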

.Additional resources
For more information on autovacuuming settings, see link:https://www.postgresql.org/docs/13/runtime-config-autovacuum.html[Automatic Vacuuming].

* link:https://www.postgresql.org/docs/13/runtime-config-autovacuum.html[Automatic Vacuuming]
@@ -4,5 +4,4 @@

= Logging and aggregation settings


For information about these settings, see xref:proc-controller-set-up-logging[Setting up logging].
@@ -8,6 +8,3 @@ The following settings impact capacity calculations on the cluster. Set them to

* `AWX_CONTROL_NODE_TASK_IMPACT`: Sets the impact of controlling jobs. You can use it to limit the number of jobs that your control plane can run at the same time when it exceeds desired CPU or memory usage.
* `SYSTEM_TASK_FORKS_CPU` and `SYSTEM_TASK_FORKS_MEM`: Influence how many resources are estimated to be consumed by each fork of Ansible. By default, 1 fork of Ansible is estimated to use 0.25 of a CPU and 100 MB of memory.
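
As a sketch of how these per-fork estimates bound capacity (the node size is hypothetical, and the real capacity algorithm can weigh the two bounds differently):

[source,python]
----
# Defaults from above: each Ansible fork is estimated at 0.25 CPU and 100 MB of memory.
cpus, memory_mb = 8, 32768         # hypothetical node size

forks_by_cpu = int(cpus / 0.25)    # 32 forks before CPU becomes the constraint
forks_by_mem = memory_mb // 100    # 327 forks before memory becomes the constraint
print(min(forks_by_cpu, forks_by_mem))  # 32: CPU is the tighter bound here
----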

//.Additional resources
//For information about file-based settings, see xref:con-controller-additional-settings[Additional settings for {ControllerName}].
@@ -8,4 +8,4 @@ The callback receiver processes all the output of jobs and writes this output as

Administrators can override the number of callback receiver workers with the setting `JOB_EVENT_WORKERS`. Do not set more than 1 worker per CPU, and there must be at least 1 worker. Greater values make more workers available to clear the Redis queue as events stream to the {ControllerName}, but they can compete with other processes, such as the web server, for CPU seconds, use more database connections (1 per worker), and can reduce the batch size of events that each worker commits.

Each worker builds up a buffer of events to write in a batch. The default amount of time to wait before writing a batch is 1 second. This is controlled by the `JOB_EVENT_BUFFER_SECONDS` setting. Increasing the amount of time the worker waits between batches can result in larger batch sizes.
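
A hypothetical file-based override is sketched below (the file path and values are assumptions; keep workers to no more than 1 per CPU):

[source,python]
----
# /etc/tower/conf.d/callback_receiver.py -- hypothetical override; values are illustrative.
JOB_EVENT_WORKERS = 4          # for a 4-CPU node: at most 1 worker per CPU
JOB_EVENT_BUFFER_SECONDS = 2   # waiting longer between writes yields larger batches
----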
@@ -17,6 +17,3 @@ If you cannot disable live streaming of events because of their size, reduce the
* `MAX_UI_JOB_EVENTS`: Number of events to display. This setting hides the rest of the events in the list.
* `MAX_EVENT_RES_DATA`: The maximum size of the Ansible callback event's `res` data structure. The `res` is the full `result` of the module. When the maximum size of Ansible callback events is reached, the remaining output is truncated. The default value is 700000 bytes.
* `LOCAL_STDOUT_EXPIRE_TIME`: The amount of time before a `stdout` file is expired and removed locally.
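
For example, a hypothetical file-based override that tightens these limits (the values are illustrative, not recommendations):

[source,python]
----
# Hypothetical overrides; tune the values to your own event sizes.
MAX_UI_JOB_EVENTS = 2000     # display fewer events in the UI
MAX_EVENT_RES_DATA = 350000  # truncate oversized module results sooner (bytes)
----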

//.Additional resources
//For more information on file based settings, see xref:con-controller-additional-settings[Additional settings for {ControllerName}].
@@ -16,9 +16,10 @@ To optimize {ControllerName}'s web service on the client side, follow these guidelines
* Direct users to use dynamic inventory sources instead of individually creating inventory hosts by using the API.
* Use webhook notifications instead of polling for job status.
* Use the bulk APIs for host creation and job launching to batch requests.
* Use token authentication. For automation clients that must make many requests very quickly, using tokens is a best practice, because depending on the type of user, there may be additional overhead when using basic authentication.
* Use token authentication. For automation clients that must make many requests very quickly, using tokens is a best practice, because depending on the type of user, there might be additional overhead when using Basic authentication.
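
The sketch below combines the token and bulk-API guidance (the URL, token, endpoint path, and payload shape are assumptions for illustration):

[source,python]
----
import requests

session = requests.Session()
# Token authentication avoids the per-request overhead that Basic authentication can add.
session.headers["Authorization"] = "Bearer <token>"  # hypothetical token

# Batch host creation into a single bulk request (endpoint path is an assumption).
resp = session.post(
    "https://controller.example.com/api/v2/bulk/host_create/",
    json={"inventory": 1, "hosts": [{"name": f"host-{i}"} for i in range(100)]},
)
resp.raise_for_status()
----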

.Additional resources
* For more information on workloads with high levels of API interaction, see link:https://www.ansible.com/blog/scaling-automation-controller-for-api-driven-workloads[Scaling Automation Controller for API Driven Workloads].
* For more information on bulk API, see link:https://www.ansible.com/blog/bulk-api-in-automation-controller[Bulk API in Automation Controller].
* For more information on how to generate and use tokens, see link:https://docs.ansible.com/automation-controller/latest/html/administration/oauth2_token_auth.html#ag-oauth2-token-auth[Token-Based Authentication].

* link:https://www.ansible.com/blog/scaling-automation-controller-for-api-driven-workloads[Scaling Automation Controller for API Driven Workloads]
* link:https://www.ansible.com/blog/bulk-api-in-automation-controller[Bulk API in Automation Controller]
* link:https://docs.ansible.com/automation-controller/latest/html/administration/oauth2_token_auth.html#ag-oauth2-token-auth[Token-Based Authentication]
@@ -2,9 +2,9 @@

[id="ref-encrypting-plaintext-passwords"]

= Encrypting plaintext passwords in {ControllerName} configuration files
= Encrypting plain text passwords in {ControllerName} configuration files

Passwords in {ControllerName} configuration files are stored in plain text.
A user with access to the `/etc/tower/conf.d/` directory can view the passwords used to access the database.
Access to the directories is controlled with permissions, so they are protected, but some security findings deem this protection to be inadequate.
The solution is to encrypt the passwords individually.
2 changes: 1 addition & 1 deletion downstream/modules/platform/ref-scaling-control-nodes.adoc
@@ -15,4 +15,4 @@ Scaling CPU and memory in the same proportion is recommended, for example, 1 CPU

NOTE: Vertically scaling a control node does not automatically increase the number of workers that handle web requests.

An alternative to vertically scaling is horizontally scaling by deploying more control nodes. This allows spreading control tasks across more nodes as well as allowing web traffic to be spread over more nodes, given that you provision a load balancer to spread requests across nodes. Horizontally scaling by deploying more control nodes in many ways can be preferable as it additionally provides for more redundancy and workload isolation in the event that a control node goes down or experiences higher than normal load.
An alternative to vertically scaling is horizontally scaling by deploying more control nodes. This allows spreading control tasks across more nodes and allowing web traffic to be spread over more nodes, given that you provision a load balancer to spread requests across nodes. Horizontally scaling by deploying more control nodes in many ways can be preferable as it additionally provides for more redundancy and workload isolation when a control node goes down or experiences higher than normal load.