Description
We're ready to prepare the StackStorm v3.5 release and start pre-release testing.
Release Process Preparation
Per the Release Management Schedule, @amanda11 is the Release Manager and @winem is assisting for v3.5. They will freeze master for the major repositories in the StackStorm org and follow the StackStorm Release Process, which is now publicly available, along with the Useful Info for Release Managers. Communication is happening in the #releasemgmt and #development Slack channels.
The first step is pre-release manual user-acceptance testing for v3.5dev.
Why Manual testing?
StackStorm is very serious about testing and has a lot of it:
- Linting, Compile, and Unit tests
- Integration tests
- User Interface tests
- Packaging tests
- Deployment Integrity checks
- Smoke tests
- End-to-end tests, where automation spins up a new AWS instance for each OS/flavor we support, installs a real st2 the way a user would, and runs a set of st2tests (for each st2 PR, nightly, periodically, and during release), including end-to-end ChatOps tests
See st2ci and st2cd for more examples and workflows about how StackStorm automation is used to test StackStorm (dogfooding).
That's a perfect way to verify what we already know and codify expectations about how StackStorm should function.
However, it's not enough.
There are always new unknowns to discover, edge cases to experience and tests to add. Hence, manual Exploratory Testing is an exercise where the entire team gathers together and starts trying (or breaking) new features before the new release. Because we're all different, perceive software differently and try different things, we might find new bugs, improper design, oversights, edge cases and more.
This is how StackStorm previously managed to land fewer major/critical bugs in production.
TL;DR
curl -sSL https://stackstorm.com/packages/install.sh | bash -s -- --user=st2admin --password=Ch@ngeMe --unstable
See the Testing Process, which walks you through:
- Spinning up a StackStorm instance using st2vagrant
- Running StackStorm's self-check script
- Running through the basic manual testing steps
Additionally, try to use that StackStorm instance as you normally would, maybe try to break it in new and interesting ways that you haven't tried before, and report any regressions found compared to v3.4.
At this stage, the following installation methods for 3.5 are available for testing:
- vagrant
- one line bash installer
- manual installation instructions - https://docs.stackstorm.com/latest/
Due to changes in O/S versions the ansible and puppet installers will be updated post-release.
Extra points for PR hotfixes, reporting entirely new bugs, and missing test cases!
Specific changes to test
- Test the web UI and the integrated flow UI - see instructions here
- Test install on each OS
  - Ubuntu 20.04
    - Manual
    - Automated Bash installer
    - Run the self-verification script - see instructions here
  - Ubuntu 18.04
    - Manual
    - Automated Bash installer
    - Run the self-verification script - see instructions here
  - EL 7
    - Manual
    - Automated Bash installer
    - Run the self-verification script - see instructions here
  - EL 8
    - Manual
    - Automated Bash installer
    - Run the self-verification script - see instructions here
- Ubuntu 20.04
- StackStorm upgrade from 3.4 to 3.5, following the Upgrade Instructions: https://docs.stackstorm.com/latest/install/upgrades.html#v3-5
- Without optional data migration script
- With optional data migration script - tested on smallish data set...
- EL 7
- EL 8
- Ubuntu 18.04
- Major changes as detailed in next section...
If you have successful test results, please post a summary of what you tested (OSes and features).
If you run into any bugs, please open them in the respective repositories and link to this issue from there. I will add them to the list at the bottom of this description.
If you have any issues running StackStorm or running the tests, please post down below.
Major changes
- New Ubuntu 20.04 Focal Support with MongoDB 4.4
- Removed support for Ubuntu 16.04 Xenial
- Performance improvements, including database operations, JSON serialization, YAML parsing in CLI, WebUI
- Redis enabled as co-ordination backend by default
- NGINX default config updated to include TLS v1.3 (in addition to v1.2)
- Upgrade to node 14 for st2chatops
- Ability to retrieve more than one resource with st2 get commands, see st2#4912 (Allow user to retrieve more than one resource using "st2 resource get" CLI command)
- Support for task retry and task delay in Workflow Designer UI
Full Changelog
Changes which are recommended to ack, explore, check and try in a random way.
st2
Added
-
Added web header settings for additional security hardening to nginx.conf: X-Frame-Options,
Strict-Transport-Security, X-XSS-Protection and server-tokens. #5183 Contributed by @Shital.
-
Added support for limit and offset arguments to the list_values datastore
service method (#5097 and #5171). Contributed by @anirudhbagri.
-
Various additional metrics have been added to the action runner service to provide for better
operational visibility. (improvement) #4846 Contributed by @Kami.
-
Added sensor model to list of JSON schemas auto-generated by make schemasgen that can be used
by development tools to validate pack contents. (improvement)
-
Added the command line utility st2-validate-pack that can be used by pack developers to
validate pack contents. (improvement)
-
Fix a bug in the API and CLI code which would prevent users from being able to retrieve resources
which contain non-ascii (utf-8) characters in the names / references. (bug fix) #5189 Contributed by @Kami.
-
Fix a bug in the API router code and make sure we return correct and user-friendly error to the
user in case we fail to parse the request URL / path because it contains invalid or incorrectly
URL encoded data.
Previously such errors weren't handled correctly, which meant the original exception with a stack
trace got propagated to the user. (bug fix) #5189 Contributed by @Kami.
-
Make redis the default coordinator backend.
-
Fix a bug in the pack config loader so that objects covered by an additionalProperties schema
can use encrypted datastore keys and have their default values applied correctly. #5225 Contributed by @cognifloyd.
-
Add new database.compressors and database.zlib_compression_level config options which
specify the compression algorithms the client supports for network / transport level compression
when talking to MongoDB.
The actual compression algorithm used is then decided by the server and depends on the
algorithms which are supported by both the server and the client.
Possible / valid values include: zstd, zlib. Keep in mind that zstandard (zstd) is only supported
by MongoDB >= 4.2.
Our official Debian and RPM packages bundle the zstandard dependency by default, which means
setting this value to zstd should work out of the box as long as the server runs
MongoDB >= 4.2. #5177 Contributed by @Kami.
-
Add support for compressing the payloads which are sent over the message bus. Compression is
disabled by default and the user can enable it by setting the messaging.compression config option
to one of the following values: zstd, lzma, bz2, gzip.
In most cases we recommend using zstd (zstandard) since it offers the best trade-off between
compression ratio and number of CPU cycles spent on compression and decompression.
How this will affect the deployment and throughput is very much user specific (workflow and
resources available). It may make sense to enable it when the generic action trigger is enabled
and when working with executions with large textual results. #5241 Contributed by @Kami.
-
Mask secrets in output of an action execution in the API if the action has an output schema
defined and one or more output parameters are marked as secret. #5250 Contributed by @mahesh-orch.
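To get a rough feel for the message bus compression trade-off described above, the following sketch compresses a large, repetitive textual payload with the three listed algorithms that ship in the Python standard library. It is only an illustration under stated assumptions: the payload is made up, zstd is left out because it needs the third-party zstandard package, and none of this is the StackStorm implementation.

```python
import bz2
import gzip
import json
import lzma

# Hypothetical payload standing in for an execution with a large textual result.
payload = json.dumps(
    {"result": {"stdout": "line of repetitive log output\n" * 20000}}
).encode("utf-8")

# Standard-library codecs for three of the four values named in the changelog.
# zstd is omitted here because it requires the third-party zstandard package.
codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in codecs.items():
    blob = compress(payload)
    # Round-trip check: decompression must return the original bytes.
    assert decompress(blob) == payload
    sizes[name] = len(blob)

for name, size in sizes.items():
    print(f"{name}: {len(payload)} -> {size} bytes")
```

Real execution results are usually less repetitive than this payload, so actual ratios and CPU cost will differ per deployment.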
Changed
-
All the code has been refactored using black and black style is automatically enforced and
required for all the new code. (#5156) Contributed by @Kami.
-
Default nginx config (conf/nginx/st2.conf) which is used by the installer and Docker
images has been updated to only support TLS v1.2 and TLS v1.3 (support for TLS v1.0 and v1.1
has been removed).
Keep in mind that TLS v1.3 will only be used when nginx is running on more recent distros
where nginx is compiled against OpenSSL v1.1.1 which supports TLS 1.3. #5183 #5216
-
Add new -x argument to the st2 execution get command which allows the
result field to be excluded from the output. (improvement) #4846
-
Update st2 execution get <id> command to also display the execution log attribute, which
includes execution state transition information.
By default, the end_timestamp and duration attributes displayed in the command
output only include the time it took the action runner to finish running the actual action; they
don't include the time it takes the action runner container to fully finish running the
execution, which includes persisting the execution result in the database.
For actions which return large results, there could be a substantial discrepancy - e.g.
the action itself could finish in 0.5 seconds, but writing data to the database could take
an additional 5 seconds after the action code itself was executed.
For all purposes, until the execution result is persisted to the database, the execution is
not considered finished.
While writing the result to the database, the action runner is also consuming CPU cycles since
serialization of large results is a CPU intensive task.
This means that the "elapsed" attribute and start_timestamp + end_timestamp will make it look
like the actual action completed in 0.5 seconds, but in reality it took 5.5 seconds (0.5 + 5 seconds).
The log attribute can be used to determine the actual duration of the execution (from start to
finish). (improvement) #4846 Contributed by @Kami.
-
Various internal improvements (reducing number of DB queries, speeding up YAML parsing, using
DB object cache, etc.) which should speed up pack action registration by 15-30%. This is
especially pronounced with packs which have a lot of actions (e.g. aws one).
(improvement) #4846 Contributed by @Kami.
-
Underlying database field type and storage format for the Execution, LiveAction,
WorkflowExecutionDB, TaskExecutionDB and TriggerInstanceDB database models has
changed.
This new format is much faster and more efficient than the previous one. Users with larger executions
(executions with larger results) should see the biggest improvements, but the change also scales
down so there should also be improvements when reading and writing executions with small and
medium sized results.
Our micro and end-to-end benchmarks have shown improvements of up to 15-20x for the write path (storing
the model in the database) and up to 10x for the read path.
To put things into perspective - with the previous version, running a Python runner action which
returns an 8 MB result would take around 18 seconds total, but with this new storage format, it
takes around 2 seconds (in this context, duration means the time from when the execution was
scheduled to when the execution model and result were written and available in the database).
The difference is even larger when working with Orquesta workflows.
Overall performance improvement doesn't just mean a large decrease in those operation timings, but
also a large overall reduction of CPU usage - previously serializing large results was a CPU
intensive task since it included tons of conversions and transformations back and forth.
The new format is also around 10-20% more storage efficient, which means that it should allow
for larger model values (MongoDB document size limit is 16 MB).
The actual change should be fully opaque and transparent to the end users - it's purely a
field storage implementation detail and the code takes care of automatically handling both
formats when working with those objects.
The same field data storage optimizations have also been applied to workflow related database models,
which should result in the same performance improvements for Orquesta workflows which pass larger
data sets / execution results around.
The trigger instance payload field has also been updated to use this new field type, which should
result in lower CPU utilization and better throughput of the rules engine service when working with
triggers with larger payloads.
This should address a long standing issue where StackStorm was reported to be slow and CPU
inefficient when handling large executions.
If you want to migrate existing database objects to utilize the new type, you can use the
st2common/bin/migrations/v3.5/st2-migrate-db-dict-field-values migration
script. (improvement) #4846 Contributed by @Kami.
-
Add new result_size field to the ActionExecutionDB model. This field will only be
populated for executions which utilize the new field storage format.
It holds the size of the serialized execution result field in bytes. This field will allow us to
implement more efficient execution result retrieval and provide better UX since we will be
able to avoid loading execution results in the WebUI for executions with very large results
(which cause the browser to freeze). (improvement) #4846 Contributed by @Kami.
-
Add new /v1/executions/<id>/result[?download=1&compress=1&pretty_format=1] API endpoint
which can be used to retrieve or download the raw execution result as a (compressed) JSON file.
This endpoint will primarily be used by st2web when executions produce very large results so
we can avoid loading, parsing and formatting those very large results as JSON in the browser,
which freezes the browser window / tab. (improvement) #4846 Contributed by @Kami.
-
Update jinja2 dependency to the latest stable version (2.11.3). #5195
-
Update pyyaml dependency to the latest stable version (5.4). #5207
-
Update various dependencies to latest stable versions (bcrypt, apscheduler, pytz,
python-dateutil, psutil, passlib, gunicorn, flex, cryptography,
eventlet, greenlet, webob, mongoengine, pymongo, requests,
pyyaml, kombu, amqp, python-ldap). #5215, st2-auth-ldap#94 (Upgrade python-ldap to latest stable version)
Contributed by @Kami.
-
Update code and dependencies so it supports Python 3.8 and MongoDB 4.4. #5177
-
StackStorm Web UI (st2web) has been updated to not render and display execution results
larger than 200 KB directly in the history panel in the right side bar by default anymore.
Instead a link to view or download the raw result is displayed.
The execution result widget was never optimized to display very large results (especially for
executions which return large nested dictionaries) so it would freeze and hang the whole
browser tab / window when trying to render / display large results.
If for some reason you want to revert to the old behavior (this is almost never a good idea
since it will cause the browser to freeze when trying to display large results), you can do that by
setting the max_execution_result_size_for_render option in the config to a very large value (e.g.
max_execution_result_size_for_render: 16 * 1024 * 1024). Contributed by @Kami.
-
Some of the config option registration code has been refactored to ignore "option already
registered" errors. That was done as a work around for an occasional race in the tests and
also to make all of the config option registration code expose the same consistent API. #5234 Contributed by @Kami.
-
Update pywinrm dependency to the latest stable version (0.4.1). #5212 Contributed by @chadpatt.
-
Monkey patch on st2stream earlier in flow #5240
Contributed by Amanda McGuinness (@amanda11 Ammeon Solutions)
-
Support % in CLI arguments by reading the ConfigParser() arguments with raw=True.
This removes support for '%' interpolations on the configuration arguments.
See https://docs.python.org/3.8/library/configparser.html#configparser.ConfigParser.get for
further details. #5253 Contributed by @winem.
-
Remove duplicate host header in the nginx config for the auth endpoint.
-
Update orquesta to v1.4.0.
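The effect of raw=True described in the ConfigParser item above can be reproduced with the standard library alone. This is a minimal sketch rather than the st2 client code: the section name, option name and value are made up for illustration.

```python
import configparser

# Hypothetical config; the section and option names are invented for this sketch.
parser = configparser.ConfigParser()
parser.read_string("[credentials]\npassword = p%ssw0rd\n")

# A default lookup attempts '%' interpolation and rejects the bare '%'.
try:
    parser.get("credentials", "password")
    interpolation_failed = False
except configparser.InterpolationSyntaxError:
    interpolation_failed = True

# raw=True skips interpolation and returns the stored string untouched.
raw_value = parser.get("credentials", "password", raw=True)

print(interpolation_failed)
print(raw_value)
```

The flip side, as the changelog entry notes, is that values relying on '%' interpolation are no longer expanded when read this way.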
Improvements
-
CLI has been updated to use orjson when parsing the API response and the C version of the YAML
safe dumper when formatting the execution result for display. This should result in a speed up when
displaying the execution result (st2 execution get, etc.) for executions with large results.
When testing it locally, the difference for an execution with an 8 MB result was 18 seconds vs ~6
seconds. (improvement) #4846 Contributed by @Kami.
-
Update various Jinja functions to utilize the C version of the YAML safe_{load,dump} functions and
orjson for better performance. (improvement) #4846 Contributed by @Kami.
-
For performance reasons, use the udatetime library for parsing ISO8601 / RFC3339 date strings
where possible. (improvement) #4846 Contributed by @Kami.
-
Speed up service start up time by speeding up runners registration on service start up by
re-using the existing stevedore ExtensionManager instance instead of instantiating a new
DriverManager instance per extension, which is not necessary and is slow since it requires a
disk / pkg resources scan for each extension. (improvement) #5198 Contributed by @Kami.
-
Add new ?max_result_size query parameter filter to the GET /v1/executions/<id> API
endpoint.
This query parameter allows clients to implement conditional execution result retrieval and
only retrieve the result field if it's smaller than the provided value.
This comes in handy in various client scenarios (such as st2web) where we don't display and
render very large results directly, since it speeds things up and decreases the amount of
data retrieved and parsed. (improvement) #5197 Contributed by @Kami.
-
Update default nginx config which is used for proxying API requests and serving static
content to only allow HTTP methods which are actually used by the services (get, post, put,
delete, options, head).
If a not-allowed method is used, nginx will abort the request early and return 405 status
code. #5193 Contributed by @ashwini-orchestral
-
Update default nginx config which is used for proxying API requests and serving static
content to not allow range requests. #5193 Contributed by @ashwini-orchestral
-
Drop unused python dependencies: prometheus_client, python-gnupg, more-itertools, zipp. #5228
Contributed by @cognifloyd.
-
Update majority of the "resource get" CLI commands (e.g. st2 execution get,
st2 action get, st2 rule get, st2 pack get, st2 apikey get, st2 trace get,
st2 key get, st2 webhook get, st2 timer get, etc.) so they allow for retrieval
and printing of information for multiple resources using the following notation:
st2 <resource> get <id 1> <id 2> <id n>, e.g. st2 action get pack.show packs.get packs.delete
This change is fully backward compatible when retrieving only a single resource (aka single
id is passed to the command).
When retrieving a single resource the command will throw an error and exit with a non-zero status if the resource is
not found, but when retrieving multiple resources, the command will just print an error and
continue with printing the details of any other found resources. (new feature) #4912 Contributed by @Kami.
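The single versus multiple lookup behaviour of the "resource get" commands described above can be sketched as follows. Everything here is hypothetical (the in-memory store, the resource ids, the get_resources helper and the ResourceNotFoundError class); it only models the semantics stated in the changelog entry and is not the st2 CLI code.

```python
# Hypothetical in-memory stand-in for the API backend; the ids are made up.
RESOURCES = {
    "examples.alpha": {"ref": "examples.alpha", "enabled": True},
    "examples.beta": {"ref": "examples.beta", "enabled": False},
}


class ResourceNotFoundError(Exception):
    """Raised when a single requested resource does not exist."""


def get_resources(ids, store=RESOURCES):
    """Return (found, errors) for the requested resource ids."""
    if not ids:
        raise ValueError("at least one id is required")
    if len(ids) == 1:
        # Single id: keep the earlier behaviour and fail hard on a miss.
        if ids[0] not in store:
            raise ResourceNotFoundError(f'Resource "{ids[0]}" not found')
        return [store[ids[0]]], []
    found, errors = [], []
    for resource_id in ids:
        if resource_id in store:
            found.append(store[resource_id])
        else:
            # Multiple ids: record the miss and keep going.
            errors.append(f'Resource "{resource_id}" not found')
    return found, errors


found, errors = get_resources(["examples.alpha", "examples.missing", "examples.beta"])
for message in errors:
    print(message)
for resource in found:
    print(resource)
```

When testing the real CLI, the interesting cases are the same ones the sketch separates: one missing id on its own, and one missing id mixed in with valid ones.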
Fixed
-
Refactor spec_loader util to use yaml.load with SafeLoader. (security)
Contributed by @ashwini-orchestral
-
Import ABC from collections.abc for Python 3.10 compatibility. (#5007)
Contributed by @tirkarthi
-
Updated to use virtualenv 20.4.0 / pip 20.3.3 and fixate-requirements to work with pip 20.3.3 #512
Contributed by Amanda McGuinness (@amanda11 Ammeon Solutions)
-
Fix st2 execution get --with-schema flag. (bug fix) #4846 Contributed by @Kami.
-
Fix SensorTypeAPI schema to use class_name instead of name since documentation for pack
development uses class_name and the registrar used to load sensors to the database assigns class_name
to name in the database model. (bug fix)
-
Updated paramiko version to 2.7.2, to go with updated cryptography to prevent problems
with ssh keys on remote actions. #5201 Contributed by Amanda McGuinness (@amanda11 Ammeon Solutions)
-
Update rpm package metadata and fix the Provides section for RHEL / CentOS 8 packages.
In the previous versions, RPM metadata would incorrectly signal that the st2 package
provides various Python libraries which it doesn't (those Python libraries are only used
internally for the package local virtual environment). Contributed by @Kami.
-
Make sure st2common.util.green.shell.run_command() doesn't leave stray / zombie processes
lying around in some command timeout scenarios. #5220 Contributed by @r0m4n-z.
-
Fix support for skipping notifications for workflow actions. Previously if action metadata
specified an empty list for the notify parameter value, that would be ignored / not handled
correctly for workflow (orquesta, action chain) actions. #5221 #5227 Contributed by @khushboobhatia01.
-
Clean up to remove unused methods in the action execution concurrency policies. #5268
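For context on the collections.abc item in the list above: the abstract base class aliases that used to be importable from collections directly were removed in Python 3.10, so type checks need the collections.abc import path. A minimal sketch, where the describe helper is hypothetical and not part of st2:

```python
# ABC aliases such as collections.Mapping were removed in Python 3.10;
# collections.abc has provided them since Python 3.3.
from collections.abc import Mapping, Sequence


def describe(value):
    """Classify a value the way schema or config handling code often needs to."""
    if isinstance(value, Mapping):
        return "mapping"
    # Strings and bytes are sequences too, so exclude them explicitly.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return "sequence"
    return "scalar"


print(describe({"a": 1}))
print(describe([1, 2, 3]))
print(describe("text"))
```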
st2web
Changed
- If a user is inactive for a long time, the user is logged out of the application
- Update dependencies
- Logout action logs the user out of all tabs
- Fix integer overflow issue when the limit or timeout parameter values were entered beyond the maximum value
- Don't display and render large execution results
- Fix all the various lint violations detected by eslint.
Added
- Added Delay Property as a new field under task properties.
- Added Retry Feature as a new field under task properties.
- Add focal support
- Set maxLength attribute for username and password for input validation
Removed
- Remove xenial support
orquesta 1.4.0
Changed
- Migrate from Travis to Github Actions. (improvement)
Fixed
- Fix unreachable join when an inbound task fails as defined in the task transition. (bug fix)
- Throw exception if the workflow definition that is written in YAML contains duplicate keys (i.e. task name). (bug fix)
st2chatops
Changed
- Updated to node 14
- Update dependencies
Added
- Added Focal support
Removed
- Remove Xenial support
Conclusion
Please report findings here and bugs/regressions in respective repositories.
Depending on severity and importance, bugs might be fixed before the release or postponed to the next release if they're very minor and not a release blocker.
Issues Found During Release
- TLS versions not updated - upgrade instructions updated
- Packs with duplicate keys won't load - upgrade notes updated
- Data migration scripts not available
- slack pack doesn't install on Focal due to dependencies that do not support Python 3.8, see StackStorm-Exchange/stackstorm-slack#72 ([WIP] Update lxml and slackclient to versions that supports python 3.8)
PRs Merged for Release
- Further upgrade info st2docs#1077
- Remove EWC reference and further upgrade info st2docs#1078
- Add missing setup.py entry for the new migration script st2#5291
TODOs
- Blog post for release
- Blog post for exchange/community update