The current way we display the AI-generated summary duplicates the human-written description. The use case where we get real value is when authors don't provide a description and the AI creates a correct summary. But the AI currently isn't getting summaries 100% correct, so there is a lot of distrust from viewers of the MR. Some users mentioned that they would feel better if they could see that the summary was reviewed and approved by a human.
We should provide a way for authors to generate their summaries when creating the MR, right on the creation/edit screen, so they can decide whether to use the summary as the description of the MR. Even if the AI is not 100% correct, the author can edit the text and make it their own.
@andr3 We chatted about that some... it's sort of not different, but does also use a different prompt. There's also a bit of a difference between filling out a template and just providing a short summary. For now, the thing that's really resonating (likely because the AI Actions menu is a bit buried) seems to be the summary piece.
I think ultimately we'll only end up with one of these, but for now we'll keep both.
Here are some user flows when the MR author completes code and pushes it to GitLab:
Our rationale was to add this type of code box where the user could edit or add context within the AI content and insert it into the description box. Code owners would have control over the AI-generated content: they can edit and fix what's wrong when the AI doesn't get things 100% correct. When they insert the content, we could save it to be used when they request a reviewer to review the MR (review rounds), or use it in some way to make our AI smarter.
My first impression is this looks simple and I like it.
I worked on #409509 (closed) before, which is much the same as this one, and I actually like this workflow more. This can help prevent the LLM from destroying (intentionally exaggerated) the MR description if it starts to hallucinate and tries to be smarter than the author, since the author has the choice to insert or not.
I do have some questions/suggestions:
How does this work with automated diff summary? I see you mentioned "we could save it to be used when they request a reviewer to review the MR (review rounds), or use it in some way to make our AI smarter." which is what we're already doing with automated diff summary.
Based on my experience working with LLMs, even if I regenerate a summary using the same prompt, I rarely get a different response (I sometimes get a different response if I send the same prompt on a different day). Should we remove the "Regenerate" functionality for now?
Is this going to replace the fill in MR description template experiment? I personally think it should.
I'm kind of confused about the "Author switches to Rich text" scenario. I am assuming that the AI summary won't be inserted unless the user clicks "Insert". Why does it need to show the AI-generated summary code block if it's not inserted at all?
I'm not sure I follow the plain text editing stuff. I don't use the rich text editor, so do I not get access to this?
Based on my experience working with LLMs, even if I regenerate a summary using the same prompt, I rarely get a different response (I sometimes get a different response if I send the same prompt on a different day). Should we remove the "Regenerate" functionality for now?
@patrickbajao Could we change the temperature in the hope it generates something different? Or is it possible to have another prompt that includes the previously generated text and asks to regenerate it? I don't know how well this would work as I've not tried anything like this.
Could we change the temperature in the hope it generates something different?
@iamphill we can do that. The current default is 0.2, and as we get nearer to 1, the LLM will try to be more creative. FWIW, we're changing the temperature of diff summary to 0 in !136964 (merged).
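For anyone following along who hasn't played with this setting, here is a rough sketch of why a low temperature tends to give the same summary on every regenerate. It is not how any particular provider implements sampling; the scores in `logits` and the helpers `softmax_with_temperature` and `sample_token` are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is handled as greedy decoding")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index. Temperature 0 is treated as greedy (always the top score)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # top token gets roughly 0.99
high = softmax_with_temperature(logits, 1.0)  # top token gets roughly 0.63

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At 0.2 the top candidate already takes almost all of the probability, so repeated requests mostly walk the same path. At 0 there is nothing left to sample, and at values near 1 the alternatives get picked often enough that the output varies.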
Or is it possible to have another prompt that includes the previously generated text and asks to regenerate it?
I'm not sure what the "another prompt" would be. Would it contain an instruction saying not to generate the same response as "<insert previous response>"?
How does this work with automated diff summary? I see you mentioned "we could save it to be used when they request a reviewer to review the MR (review rounds), or use it in some way to make our AI smarter." which is what we're already doing with automated diff summary.
@patrickbajao I'm proposing to move the automated diff summary to be generated when an MR author creates a new merge request. It's the same feature but positioned as the description of the MR.
Based on my experience working with LLMs, even if I regenerate a summary using the same prompt, I rarely get a different response (I sometimes get a different response if I send the same prompt on a different day). Should we remove the "Regenerate" functionality for now?
Yes, I think so. I believe we are not ready to have a summary that will auto-update. We need to give the user more control over the experience if we decide to provide it. It feels like a separate feature we could work on in the future.
Is this going to replace the fill in MR description template experiment? I personally think it should.
I'm kind of confused about the "Author switches to Rich text" scenario. I am assuming that the AI summary won't be inserted unless the user clicks "Insert". Why does it need to show the AI-generated summary code block if it's not inserted at all?
Yes, I agree with you and @iamphill on this. I've removed the "Insert" button from the design to make it more aligned with what we currently have on 'Diagrams'.
I'm not sure what the "another prompt" would be. Would it contain an instruction saying not to generate the same response as "<insert previous response>"?
@patrickbajao Yeah, that's what I was thinking. I don't know how well that would work, or even changing the temperature. I guess we need to figure out a way to stop the AI hallucinating as well, otherwise the regenerated responses could end up worse.
Yeah, that's what I was thinking. I don't know how well that would work, or even changing the temperature. I guess we need to figure out a way to stop the AI hallucinating as well, otherwise the regenerated responses could end up worse.
@iamphill I see. Based on experience, the AI gets more creative the higher the temperature is, which leads to incorrect summaries. It's something we can explore though (as a separate feature). Maybe letting the LLM know that its previous response is not good will let it come up with a better summary.
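To make the "another prompt" idea concrete, here is a minimal sketch of what a regenerate prompt could look like if it carried the rejected summary along. The function name `build_regenerate_prompt` and the prompt wording are hypothetical and are not taken from the existing diff summary prompt; whether this actually produces a better summary is untested.

```python
def build_regenerate_prompt(diff_text, previous_summary=None):
    """Build a summary prompt; on regenerate, include the rejected summary."""
    lines = [
        "Summarize the following merge request diff in a few sentences.",
        "Only describe changes that appear in the diff.",
    ]
    if previous_summary:
        # Hypothetical wording: tell the model its earlier attempt was not accepted.
        lines += [
            "",
            "A previous summary was rejected by the author. Write a different one.",
            "Previous summary:",
            previous_summary.strip(),
        ]
    lines += ["", "Diff:", diff_text.strip()]
    return "\n".join(lines)


# First request has no history; a regenerate passes the earlier output back in.
first_prompt = build_regenerate_prompt("+ added line")
retry_prompt = build_regenerate_prompt("+ added line", "Adds a line.")
print(retry_prompt)
```

The first request and the regenerate share one code path, so the only difference between them is the extra block naming the previous response.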
I'm proposing to move the automated diff summary to be generated when an MR author creates a new merge request. It's the same feature but positioned as the description of the MR.
@afracazo Got it. I'm just wondering if we still need to save the summary in the DB somehow. But if the MR is not created yet, we won't be able to associate the summary with an MR.
Yes, I agree with you and @iamphill on this. I've removed the "Insert" button from the design to make it more aligned with what we currently have on 'Diagrams'.
Since we don't have the "Insert" button, I'm assuming that when the user clicks the button to generate a summary, it's automatically inserted? Is my assumption correct?
I see. Based on experience, the AI gets more creative the higher the temperature is, which leads to incorrect summaries. It's something we can explore though (as a separate feature). Maybe letting the LLM know that its previous response is not good will let it come up with a better summary.
I'd be curious to know how the chat apps do this, where you can regenerate a response and get a different one.
@afracazo @iamphill @patrickbajao GREAT DISCUSSION! A couple of thoughts below on things I don't think were completely covered, but let me know if I missed any other questions.
How does this work with automated diff summary? I see you mentioned "we could save it to be used when they request a reviewer to review the MR (review rounds), or use it in some way to make our AI smarter." which is what we're already doing with automated diff summary.
In my mind, this replaces the automated diff summary. While I personally believe that's the correct implementation - it's clear the AI isn't there yet and people aren't ready for it yet. This is a compromise to generate it, but then use it in some of those same ways, while still giving users control.
Is this going to replace the fill in MR description template experiment? I personally think it should.
Yes - I think it will replace that as well, although there's no immediate need to remove that even if we go down this path.
In my mind, this replaces the automated diff summary. While I personally believe that's the correct implementation - it's clear the AI isn't there yet and people aren't ready for it yet. This is a compromise to generate it, but then use it in some of those same ways, while still giving users control.
Got it @phikai. If this replaces the automated diff summary, are we still envisioning showing the generated summary on GL To-do and email? I'm assuming we aren't, since the summary will be included in the MR description, but I'm just clarifying.
are we still envisioning showing the generated summary on GL To-do and email
@patrickbajao I don't think so - in fact, I'm pretty sure that's been off for a couple weeks for GitLab team members and we haven't heard a single thing about it. I personally want to revisit it at some point, but it doesn't seem people liked it as much as I did.
@phikai I see. I'm asking this question because if we don't need to display the summary alone elsewhere, then we won't need to store the summary as a separate DB record. But if we do plan to show the summary alone somewhere, we'll need to find a way to store it in the DB after MR creation.
@patrickbajao Yeah - I don't think we'll need to store it as a separate DB record for now. When I had talked with @afracazo, I had a thought that if the user generated it and then edited it, we could save it so we could use it later. However, I don't think we need to figure out that flow now. WDYT?
I had a thought that if the user generated it and then edited it, we could save it so we could use it later. However, I don't think we need to figure out that flow now. WDYT?
@phikai yup, we can opt not to do it now. If we decide later on that we want to do it, we can iterate and add that functionality I imagine.
@iamphill @patrickbajao Thanks so much for taking the time to review and provide feedback. I've adjusted the experience. I'm going to move the designs to the design section and tag both of you so we can continue the conversation from there.