Once a user has configured the alert endpoint, they will want to test the set-up to confirm it will receive alerts and function as expected. This is common when setting up integrations, so that the user has peace of mind that critical alerts are not going to get dropped. This will improve the onboarding experience for new users and therefore increase adoption.
Moreover, this will be really helpful for the Monitor:Health team to test new features.
This work drives the direction of the Alert Management category.
Proposal
Add functionality to the Alert management configuration page where users can send an example payload to the endpoint to confirm that it was set up correctly. The user will see an example payload input pre-populated with some example data. They can modify this if they wish. Clicking the test button triggers the alert workflow.
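To make the "send an example payload" step concrete, here is a minimal sketch of what such a test request could look like from the client side. The payload fields, the bearer-token header, and the helper name `build_test_alert_request` are assumptions for illustration only; they are not taken from this issue and may not match the actual endpoint contract.

```python
import json
import urllib.request

# Hypothetical example payload. The field names are assumptions for
# illustration, not the payload the settings page would pre-populate.
EXAMPLE_PAYLOAD = {
    "title": "Test alert",
    "description": "Sent from the alert endpoint settings page",
    "severity": "info",
}


def build_test_alert_request(endpoint_url, auth_key, payload=None):
    """Build (but do not send) a POST request carrying the test payload."""
    body = json.dumps(payload if payload is not None else EXAMPLE_PAYLOAD)
    return urllib.request.Request(
        endpoint_url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the endpoint authenticates with a bearer token.
            "Authorization": f"Bearer {auth_key}",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would actually send it. Building the request separately keeps the sketch easy to inspect without hitting a real endpoint.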
Design
Updated default page
Endpoint activated
Test failed
Auth key reset
In scope for this issue:
Text updates in body copy
Adding a field where the test alert will show
Adding a test and save changes button
Success and failure alert messages
Out of scope for this issue:
Adding the monitoring tool dropdown
Up for discussion: the auth key reset success message. It doesn't necessarily need to be part of this issue. It's just a nice add given the other changes we're discussing on this page. We can make adding this a separate issue, if need be.
Note: the alerts endpoint section will be moved to Settings > Operations as part of #219142 (closed). So, the location of the alerts endpoint config page will depend on the status of that issue.
Sarah Waldner changed title from [Feature proposal] Allow triggering a test alert from the GitLab interface to Trigger test alert from the GitLab interface
Leaving an update on this issue: I'm going to work on a proposal that will hopefully address the three changes we're wanting to make to the alert endpoint page in 13.2:
I'll loop everyone in to discuss the designs when they are posted, so we can align on a way forward that will address all three required improvements :)
@ck3g, @allison.browne, @syasonik, @seanarnold - you all came up with some great ideas for triggering a test alert, deleting customer data and allowing for the possibility of selecting different monitoring tools for the alert endpoints. Since these changes are all scheduled to occur at the same time (13.2), I took a pass at mocking these ideas up, to make sure they all work together in one cohesive whole. Wanted to share what I've come up with so far for your review :)
For the testing piece - I noticed that all the other integrations pages have a "test settings and save changes" button. So I wondered if we could utilize a similar "test and save" experience on the alert endpoint page?
If we do that, we could:
Introduce a "test settings and save changes" button on the alerts endpoint page
Clicking that button would run the test
If the test is successful, the results of the alert test would populate the text field and the user would see the "endpoint activated" message
If the test fails, the user would see a "test failed" message
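As a rough illustration of the flow in the list above, the sketch below maps the outcome of a test to what the page would show. `Outcome`, `run_test_and_save`, and the `send_alert` callable are hypothetical names, and leaving the user's payload in the text field after a failure is an assumption that this thread does not settle.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    activated: bool  # whether the endpoint is shown as activated
    banner: str      # message shown to the user
    field_text: str  # contents of the payload text field


def run_test_and_save(send_alert: Callable[[str], str], payload: str) -> Outcome:
    """Run the test alert and decide which message the page displays."""
    try:
        # send_alert is a stand-in for whatever performs the test;
        # it returns the test result text or raises on failure.
        result = send_alert(payload)
    except Exception:
        # Assumption: on failure the user's payload stays in the field.
        return Outcome(False, "Test failed", payload)
    return Outcome(True, "Endpoint activated", result)
```

Keeping the transport behind a callable means the success and failure messages can be exercised without a live endpoint.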
Further, we could:
Handle deleting customer data by introducing a "delete all alert data" button on the endpoint screen. Clicking this button would trigger a confirmation modal, so users can confirm they want to complete this destructive action
We could handle selecting different monitoring tools through a dropdown (as Sarah Y suggested)
To complete this flow and make the "reset key" action clear and distinct from these other actions, we can also introduce an alert to communicate when the key has been successfully reset.
All of that would look like this:
Updated default page
Endpoint activated
Test failed
Delete all data
Auth key reset
What do you all think? Do any of you have concerns about anything you're seeing here?
@sarahwaldner - also looping you in here. I know the alert integrations piece may or may not happen immediately. But, do you have any thoughts/concerns about anything you're seeing here?
When I've gathered all your feedback, I'll make the required revisions and add the appropriate designs to this issue, as well as to #215420 (closed) and #214035 (closed) :)
@ameliabauerly This is really fantastic. No concerns. Can we split the discussion, though? Deleting customer data is this one: #215420 (closed). I completely understand why we started here though!
From the mockups, it looks like the "Delete all alert data" button is something that can be used on a regular basis. It's very close to "Cancel". I also understand that it has a confirmation window, so that will probably reduce the number of misclicks.
Anyway, what if we have a separate section for deleting the data, with a description saying that the data will be lost permanently, etc.?
Similar to the one we have for removing a project.
I have no strong opinion on "Test settings and save changes", besides that we may need to add an explanation of what that means. (That it will create an "Alert").
We will force users to test it every time, but I assume that settings won't be changed regularly, maybe that won't be a problem for users.
How will the button look when the "Active" switch is off? I assume there is no reason to "test" anything.
Thanks, everyone, for taking a look. Appreciate it!
I'm not a UI/UX designer, so please forgive me
@ck3g, these are excellent questions, thanks so much for bringing them up!
what if we have a separate section for deleting the data, with a description saying that the data will be lost permanently, etc.? Similar to the one we have for removing a project.
I can definitely explore that. I'll continue explorations on that piece as part of #215420 (closed)
I have no strong opinion on "Test settings and save changes", besides that we may need to add an explanation of what that means. (That it will create an "Alert").
I'm hoping that, since the "test and save changes" button exists on all the other integration pages, the functionality won't be too surprising for users. With your suggestion that we add additional clarification that the button creates an alert - could we add that to the confirmation message? For instance, "Alert successfully received and your endpoint is activated. You can view the test alert on the alerts list page." Would that help at all? Note that we're planning to deprecate the "automatically create issues from alerts" functionality in the near future, so I'm not considering that workflow as part of this plan.
We will force users to test it every time, but I assume that settings won't be changed regularly, maybe that won't be a problem for users.
Yes, that's true. I'm also hoping it won't be too intrusive since this settings page would, I assume, only be used for the initial set up. We could separate out the test and save functionality, but that would mean that this page will function differently than all the other integrations pages (which all seem to have the "save and test" functionality). So, I opted to keep things consistent for now, with the understanding that we can try to improve and further optimize this later, if necessary. Not sure how you feel about that, but that was my thinking anyway :)
Will add the designs in a WIP section of this issue but am happy to keep refining based on additional feedback. Thank you all!
could we add that to the confirmation message? For instance, "Alert successfully received and your endpoint is activated. You can view the test alert on the alerts list page."
I think the underlying difference is whether we want to notify users about the accomplished action or to describe the action which is about to happen.
"Alert successfully received and your endpoint is activated. You can view the test alert on the alerts list page."
vs
"That will activate your endpoint and trigger a test alert. You can view the test alert on the alerts list page."
With the addition of the Monitoring tool dropdown, would we be expecting to move the Prometheus alert endpoint configuration here as well? And remove it from Settings > Integrations > Prometheus?
If so, that'd certainly be its own issue. If not, would we just have 'Generic' as the only option in the dropdown for now & always default to it?
Either way! Idea inspired by the package group's #198524 (closed):
Convert #214035 (closed) to an epic, including at least one sub-issue for the Prometheus integration
Link to #214035 (closed) from text description of Monitoring tool dropdown
This could increase awareness of upcoming features without being too intrusive, offer folks the ability to find out more & advocate for their preferred tools, and would be very little additional development overhead for us. Thoughts?
Thanks @ck3g and @syasonik for the additional feedback. I believe I've made all the required updates and have updated the designs in the issue description. Is there anything else or should I mark this particular issue ready for development?
@syasonik, with the monitoring tool dropdown - I think I had imagined that we wouldn't show it until we had some additional options available. But, I like your suggestion to add some "advertising" of the upcoming features to the page! I'll add the updated design to #214035 (closed). I'll also port over your suggestion to promote the issue to an epic since there was already some discussion in progress there for consolidating the existing Prometheus endpoints.
I think everything is resolved for this particular issue, then. But, if I've missed anything here, just let me know :)
I have two questions given this is the last ticket for %13.2 on my end that touches this section:
Should we disable the fields inside: Settings > Prometheus > Manual Configuration? I.e. Active, URL and "Save"?
Should we consider a path towards removing the Alerts section inside: Settings > Prometheus > Manual Configuration given it is now disabled by default with this feature? I don't know what our process for retiring a section / screen is but I would be curious about the path forward on those
I think Prometheus settings are outside of the scope of this issue. This issue is about triggering generic alerts, which are now located in Settings > Operations > Alerts. (The deprecated location is Settings > Integrations > Alerts endpoint)
Should we disable the fields inside: Settings > Prometheus > Manual Configuration? I.e. Active, URL and "Save"?
@oregand, no, as Vitali suggested, those fields are used outside of this alerting functionality (for example, for metrics) so they should be left as they are :)
Should we consider a path towards removing the Alerts section inside: Settings > Prometheus > Manual Configuration given it is now disabled by default with this feature? I don't know what our process for retiring a section / screen is but I would be curious about the path forward on those
I don't think this section would need to be disabled by default with this feature. Users should still be able to utilize these fields if, for instance, they wanted to manually configure Prometheus for metrics (or similar).
The Prometheus settings will be moved by the APM team over to Settings > Operations in 13.3 as part of: #24651 (closed). I think they are considering moving them into a new Metrics section. At this point, all the fields on Settings > Integrations > Prometheus will be disabled. The whole page will probably be removed during the next major release. That's when we usually deprecate all old features/functionality.
Hope that helps but, if this doesn't align with what you were thinking, just let me know :)
Thank you for the incredible breakdown and overall path forward, it answers my questions and allows me to continue with this ticket in isolation and gives me less of a chance to break things :)