Charts help users visualize what went wrong with triaging an incident. If a particular threshold was exceeded, we can reduce time spent during investigation by automatically embedding the relevant metrics chart in the issue.
In #119016 (closed), we introduced automatically adding relevant charts for GitLab configured alerts. In that instance, we have an existing chart defined on the metrics dashboard that we can pull and display automatically when an issue is created from an alert.
In this issue, we'll focus on metrics where we don't have an existing chart. What defaults do we need to set from the alert so that we can display a chart in the related issue even in those cases when a chart hasn't been previously defined?
Automatically embed a chart visualization for the metric that triggered an alert when both title and y_label are present. Set the time frame to be event time +/- 30 minutes. As on the related issue, the embedded chart will display between the Summary and the rest of the payload.
Additional details
y_label is required for automatically embedding metrics. When that information is present, we will automatically add a metric to the issue created from the alert. The embed will fail silently if y_label is missing.
If title is missing from the alert, we will attempt to substitute in the metric name for the title. If the metric name is also missing, the embed will fail silently.
The default x-axis will be time, with the range being a one hour span around the event (event time ± 30 mins)).
When we do not have the metric defined in GitLab, the embed url will remain hidden from the description.
We'll want to disable the "Generate link to chart" dropdown option for 'metric-less' alerts, since we don't have a chart to link to.
Permissions and Security
Documentation
We will need to add y_label is required for automatically embedding metrics in issues created from incidents
Testing
What does success look like, and how can we measure that?
During investigations, I found that we have a little bug here: #196078 (closed) which will be helpful to potentially fix to give this issue a greater value to more users working with prometheus.
If we have access to full_query, then we can populate a chart with data. The chart formatting would be independent of that. I'm not sure which chart attributes (title, axis labels, units, etc) are truly required on the FE side, though.
When we define which chart attributes are required, we can define defaults for them. Those defaults would likely need to be defined before work can begin on this issue.
@tristan.read - for this issue, we are guessing we'll need to set default values for required chart attributes so that we can display a graph even when some of the values are undefined (as would likely be the case when we don't have a chart for the metric available on the metric dashboard).
But, we wanted to confirm - which values are required and which can be considered optional? Do you know at all?
When we do not have the metric defined, what will we want as the default values for the chart attributes such as title, y_label, and units? Currently mandatory chart attributes are defined here: Defining Custom Dashboards. @tristan.read From a FE perspective, could we consider all of these attributes downgradable to optional? I'm not sure what will break on the FE if the values aren't included.
@syasonik's proposal on the original issue was to set defaults for whatever is required based on information from the alert. Perhaps this could leave us with something like the following, depending on what's required:
title --> metric name and time range (?)
x-axis --> time (with the range being a one hour span around the event (event time ± 30 mins))
y-axis --> y-axis title would be the metric that has been named in the alert. Question: do we need to define a scale for the y-axis values or the units for the metric? Or, are those things we can pull from the alert?
@ohoral or @lauraMon, we're hoping for some insight on the questions above. Do you have a sense of which values are required for charts and which are considered optional?
@ameliabauerly I think the only required ones are metric, which is the type of chart (easy), title, and y-axis
y-axis --> y-axis title would be the metric that has been named in the alert. Question: do we need to define a scale for the y-axis values or the units for the metric?
Normally yes, you do need to define a unit. But if the y-axis is going to always be time, we can use those units. I do think we may be able to get this from the alert?
From a FE perspective, could we consider all of these attributes downgradable to optional?
@syasonik I think we can work around it. Would this mean all fields, even title? I haven't worked in metrics in a while, so apologies if this was already explained somewhere!
Wanted to clarify one thing: I believe the x-axis will be the time rather than the y-axis. So I guess I'm curious if we can pull the metric's units from the alert for the y-axis? Or would the y-axis units not be required, as long as we have the x-axis units defined?
Sorry for not replying earlier @ameliabauerly, I totally missed your ping
I think the original question from @syasonik was along the lines of - will the frontend cope with missing chart information. The short answer is - the chart components are very complex so we probably won't know for sure until we have some sample data of a 'metric-less' alert to try rendering.
The underlying chart library (ECharts) is flexible and pretty much all properties are optional, so any gotchas would be introduced by our components. The usages that I could see by looking through our code are:
panel.title. This is referenced a lot, I would guess that our components would require some rework if panel.title wasn't available.
The legend and tooltips rely on panel.y_label being populated. We could choose to disable both if a panel.y_label isn't present, but this wouldn't be a trivial change. panel.y_label can optionally be overridden by metric.label, but this is optional so we can't rely on that.
metric.unit is only used by the single stat component. I'm assuming only time-series is supported.
For scale: the chart will automatically determine the scale based on the values it finds.
So my recommendation would be - include title and y_label (both from panel) in the backend response if at all possible. Document this as a requirement in the prometheus docs that both these should be present on the alert for embedding / automatic issue creation to work.
If this is not possible, we could get away with using the query as the title and by disabling the legend and tooltip, though this will be a bit of additional work.
The short answer is - the chart components are very complex so we probably won't know for sure until we have some sample data of a 'metric-less' alert to try rendering.
Okay, good to know! Is this something worth creating a spike for, do you think @tristan.read, or will this just be something that whoever is assigned to this issue can explore during the course of their work?
So my recommendation would be - include title and y_label (both from panel) in the backend response if at all possible. Document this as a requirement in the prometheus docs that both these should be present on the alert for embedding / automatic issue creation to work.
So, if we require title and y_label for automatically embedding metrics:
When that information is present, we'd have what we need to add a metric to the issue created from the alert automatically. But, when we display the metric, there wouldn't be a link to display with it - it would just be the chart. Unless we could link back to Prometheus in some useful way?
When that information is not present, we just wouldn't be able to automatically display the metric.
WDYT about that proposal, @tristan.read, @syasonik or @sarahwaldner? It seems like this would at least give us a small, workable way forward?
@ameliabauerly There wouldn't be able to show a prometheus link. Prometheus may not always be available to the client, for example if prometheus is behind an auth mechanism, or if gitlab and prometheus are running on an internal network and only gitlab has access to the outside world. I don't think we can determine if this is the case for the user ahead of time.
Thanks so much for these additional details! Have added this WIP proposal to the issue description. Will update depending on whether title is a requirement, or if we can somehow pull it from the alert (either the alert title or the metric name). Both seem like viable options to me but, I'll let Sarah and Peter confirm :)
I cannot find any evidence to suggest that there wouldn't be a name-like field we could use for title. It might not be unique (as alert names don't need to be on Prometheus), but I'm guessing that's not an issue?
I'll let @tristan.read confirm if uniqueness is an issue (I'm guessing not ).
In the hopes that it's not, I've updated the proposal in the issue description. Is this an accurate summary of everything we discussed, @syasonik and @tristan.read? Is there anything else we need to plan for to categorize this is as "ready for development"?
Sounds good. I will tentatively mark this as "ready for development" then. But, if anyone else has any further thoughts about anything that's needed here, definitely let me know
That's correct. I don't think we got any responses on that question on the original issue. Will ask again so we can get this one ready to go in case it's needed for 12.9!
EDIT: have also updated the issue description to better reflect latest discussions. Hopefully I've captured everything correctly but, if not, definitely feel free to edit further
@sarahwaldner, is this still meant to be scheduled for 12.9? We're still waiting for feedback on #195739 (comment 286527319) so we can set reasonable chart defaults. Maybe we can quickly align on what these defaults should be and still work on this in 12.9, or maybe this issue is no longer a top priority, given the status page MVC issue. WDYT?
@oregand this may have little to no frontend work but I wanted to assign you to this since it's a Deliverable and want to make sure whoever is working on this from the backend side has frontend support
@oregand I think there's some FE work here I wasn't expecting!
Basically, it looks like there was a regression a long time ago, and its impact is more noticeable with this feature.
Regression:
The query label for charts with multiple series no longer falls back to using the data from the response. It you look at the docs for adding custom metrics, you'll see the corresponding note about 'Legend label' from the UI.
Impact:
Because the user is largely unable to configure the chart in this scenario, falling back to the y-axis label for the embed is pretty undesirable, as it essentially hides information from the user. Seeing the same label for every series is completely meaningless.
Actions to take:
I think we should ensure a more sensible fallback exists for series' legend labels based on the data from prometheus. However, since we have new functionality with legend labels since metrics were first implemented, we'll probably want to revisit what the appropriate overall functionality needs to be. In my opinion, ideal would be trying out the substitution-based labels, then fallback to data from the prometheus response, then fallback to the user-defined label only if the first 3 approaches didn't spit out an option.
While testing/debugging, I put together a few FE changes. I'm pretty sure I missed some edge cases, and I'm rusty on my JS, but it's at least touching some of the right files.
master...correct-query-labels
@oregand mentioned not feeling well earlier this week, if you don't hear back let me know so I can get another frontend engineer to help as I do not want us to delay if possible :)
I'm afraid I'm going to be OOO until Monday but if it can wait until then I'm super happy to get stuck in first thing to make sure there isn't a big delay but if that's too long please feel free to fire ahead with another dev!
And just what a amazing breakdown Sarah! If more people wrote issue instructions/explanations like that the world would be a better place :)
@oregand - thanks for flagging this! Would it be possible for us to somehow define or identify which item shows up in the labels for both the popovers and the labels to avoid having such long labels in the first place? I'm not sure if there is anything that could uniquely be used but, wanted to check. It might also be good to have something that's a little more "human readable" as the label but, again, I don't know if that's possible?
I think if we extend the space for the labels as suggested in gitlab-ui!1274 (merged), we might break the labels in other scenarios. If there's not a single thing we can use for the label, perhaps we could truncate the labels after a set number of characters to avoid having the long labels break everything?
Personally I am a bit unsure which of the metrics can be considered meaningful from prometheus given in my own tests they return very different values depending on the label but maybe @syasonik might be able to shed some light on that?
perhaps we could truncate the labels after a set number of characters to avoid having the long labels break everything?
I quite like this suggestion and have updated the MR to reflect that idea with the truncate happening after 160 characters but I would be open to any feedback on a character limit if we go for that option? I've added some screenshots to that MR to show the difference once we're past 160 characters.
I am open to any and all feedback here as this seems like a pretty fundamental change
Thanks for so quickly making the change, @oregand. Since some of these charts are half the width as the one you're showing in the screengrab, we may need to truncate even further but, let's see what others say. Maybe there's a better option here
Also, with truncation, I believe we usually truncate with a ... and have the full text visible in a tooltip. I don't think we should have a tooltip on the popover , but perhaps we could introduce a tooltip for the chart legend. Unless the label would still be so prohibitively long as to make the tooltip not very helpful?
That was the main issue I was having, both the label below and in the tooltip became completely driven offscreen with the sheer amount of text it contained. So I wasn't sure what made the most sense for an end user though IMO a scroll or perhaps a word break style in the tooltip would make it useable? As even my change only stops the text overflowing off the screen but doesn't help the user understand whats happening which I think we need to account for.
If however we can target specific keys from the prometheus data I am all for that!
Hovering over the chart shows a tooltip that contains un-truncated info.
The hover tooltip positions itself to maximise visibility.
Not saying that we necessarily need to follow the same rules, but I think it adheres to the principles of: 1) hide data where it makes the UI clearer and 2) provide the user with a way to see the hidden data when they want to.
In the edge case that the label is so long that it doesn't fit on the screen even in the tooltip, I don't think there's anything we could do.
Thank you for the awesome food for thought with how grafana deals with this.
What I am going to do if @ameliabauerly is happy with it is:
After 120(for smaller charts) characters truncated legend labels.
Hovering the chart will show the un-truncated info.
The information will be formatted to make it readable i.e. doesn't bleed off the screen.
Screenshots
< 120
> 120
I've tested this with all the charts available in the @gitlab/ui lib on multiple screen sizes and think this might be a good starting point for a solution but I am open to any feedback!
@oregand - I think this is a good proposal! We can always adjust again if we find it isn't working in the way we expect
A couple thoughts (though not deal breakers for moving this forward):
In the popover, when the label wraps, is it possible to align the value with the first row of the text? Right now, it looks like it's centered, but I'm thinking it might be easier to read if it's aligned with the top of that first row (ie, if the 7 is aligned with the pink line in the screenshot).
With the legend, could we display a ... when the text truncates? I think that might help let users know that something has been truncated here. You can kind of see that the "h" has been cut off in your example but, I can imagine that, in other cases, you wouldn't necessarily be able to tell that something has been truncated, depending on how the text gets cut off. Adding an ellipsis might help make the fact that the text has been truncated immediately visible to users.
Fantastic suggestions! I have attacked this one again and have now included these suggestions! Please see the screenshots below and let me know if they look suitable!
Yes, that looks great, @oregand Thank you! A couple final thoughts:
The legend probably shouldn't extend beyond the chart "container". It's currently shifting to the right, so it's going beyond the edge of the chart a bit.
We actually only need 16px between the label name and the value. So, I think if you tighten the space there, it might fix the issue where the value is going beyond the edge of the chart
Is documentation up to date with functionality from this issue? It kind of looked like it, but I was unsure. If not, please add doc updates this section. Thank you!