Run configured helm tests during deploy #596
Open
+281
−20
Adds a HelmTest specification to the HelmOptions on a Bundle. If enabled, this causes a ReleaseTesting action to be run during deployment, after the install/upgrade.
The happy path works well here: we can see the helm testing action run successfully, the bundle is unready until the tests are complete, and once they are it becomes ready. We currently use helm test for our service smoke tests in our non-fleet deployment mechanism, so this is perfect for our use-case.
The error case is a bit less ideal, but does work for our use-case. Our existing non-fleet deployment mechanism uses a helm test failure to trigger a rollback, and that's the behaviour we'd like to be able to build on fleet. This change does give us that: the failure of the tests propagates back up through the bundle status and then the GitRepo, and we can use that to trigger a rollback of the GitRepo to the previous version (we also manage the GitRepo deployment itself via helm). What's not ideal is that after the tests have failed, the next attempt to reconcile the bundle re-runs the tests, and this repeats until something changes, either a rollback or a fix for the underlying issue. Since the test runs block the fleet-agent, this doesn't feel like a great workflow, but it's one we could definitely work with for the time being (see below for some possible ideas on this; I'm sure it's something you've already put some thought into as well).
The `HelmTest` specification has an `enabled` flag to toggle the test run, as well as a `timeout` and `filter` spec that matches what is exposed by the helm ReleaseTesting API. I've added a default timeout into the deployer, which means we never try to run the tests without a timeout. Since waiting for the tests is a blocking operation, I don't think it would be wise to allow running without a timeout.

I see two main issues with this PR as-is: