Recently, the Release Management Apps team has begun exploring how to integrate AI-powered features directly into our products. The introduction of Atlassian’s Rovo - now available for free across Jira and Confluence - has created an exciting foundation for Marketplace partners like us. This advancement opens the door to building smarter, more adaptive apps that can elevate project management and release processes to a new level.
In this article, we share insights from our journey exploring Rovo and building our first Rovo agent to extend our app’s functionality. We believe our experience will help save you time if you’re considering building Rovo applications yourself - whether in Studio or as a Marketplace partner - or if you’re simply curious about the system’s capabilities and current limitations.
What is Rovo?
Is it just another AI model like ChatGPT or Google Gemini, or simply a white-labeled wrapper around an existing LLM? Surprisingly, it’s neither.
Rovo started as a wrapper around ChatGPT with native Jira data access, ensuring sensitive information stayed protected and was never used for training. But it has since evolved into something much more. While Atlassian hasn’t shared full technical details, real-world behavior and insights from partners suggest Rovo is now a hybrid AI platform. It likely blends models from OpenAI, Google, Anthropic, and others - choosing the best model for each task, whether content generation, analytical reasoning, or even autonomous code creation from Jira user stories.
What About Our Experience?
We had plenty of ideas for AI-powered features - from smarter release analysis to automated approvals, dependency checks, and nicer release notes. But very quickly we ran into a big limitation: as of mid-2025, Marketplace apps still can’t call Rovo through an API - the chat interface is the only way in.
Unexpected? Yes. A blocker? Not really.
We decided to start small and build an agent that could answer questions about release status, find the right release, and summarize its scope, blockers, and main risks.
And, of course, that was only the beginning. Below are the main pitfalls we hit along the way.
Talking to a Goldfish

The first wall we hit was Rovo’s memory problem. Every new chat started from zero - no context, no preferences, nothing from previous conversations. It felt like talking to a goldfish that’s thrilled to meet you every 10 seconds.
For our agent, this was a real issue. We needed it to remember basic settings instead of asking the same questions over and over.
The fix was simple: we added our own storage inside the app and adjusted the prompts so Rovo rebuilt the needed context each time.
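For the curious, here’s roughly what that looks like in a Forge app. Treat it as a minimal sketch rather than our production code - the function names (getAgentContext, saveAgentContext) and the storage key are invented for illustration, and the exact wiring between an agent action and a resolver depends on your manifest:
```typescript
import Resolver from '@forge/resolver';
import { storage } from '@forge/api';

const resolver = new Resolver();

// Called by an agent action at the start of a conversation, so the agent
// can rebuild context instead of asking the same questions again.
resolver.define('getAgentContext', async ({ context }) => {
  // context.accountId identifies the current user in Forge resolvers.
  const saved = await storage.get(`agent-context-${context.accountId}`);
  return saved ?? { note: 'No saved preferences yet' };
});

// Called whenever the user confirms a preference worth remembering.
resolver.define('saveAgentContext', async ({ payload, context }) => {
  await storage.set(`agent-context-${context.accountId}`, payload.preferences);
  return { saved: true };
});

export const handler = resolver.getDefinitions();
```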
Problem solved. Or so we thought… because things were about to get even more interesting.
Inventing Releases Out of Thin Air

Next came the classic AI issue - hallucinations. If we asked Rovo about releases, sprints, or epics for a timeframe where we had no data, it didn’t admit it. Instead, it confidently invented releases. And the scary part? They looked completely real. Names, dates, scopes - everything sounded perfectly believable.
After many experiments (and way too much Reddit browsing), we found a couple of tricks that actually helped. The most effective one was surprisingly simple: adding “temperature = 0” to the end of the prompt.
We also sometimes prefaced prompts with a dramatic line like, “Your life depends on the correctness of the results.” Not sure if it truly helps… but it definitely didn’t hurt.
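If you want to try the same tricks, the “fix” is literally just text appended to the prompt - something like the snippet below. Whether the temperature hint is formally honored is undocumented; it simply worked noticeably better in our tests:
```typescript
const basePrompt = 'List all releases planned for Q3 with their status.';

// The first line is probably a placebo, but it didn't hurt;
// the second one noticeably reduced invented data in our tests.
const ANTI_HALLUCINATION_SUFFIX = [
  'Your life depends on the correctness of the results.',
  'temperature = 0',
].join('\n');

const prompt = `${basePrompt}\n\n${ANTI_HALLUCINATION_SUFFIX}`;
```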
Managing Large Data and Prompt Complexity

Another issue we ran into was Rovo losing context when dealing with a lot of information. This showed up in two ways: it struggled with large datasets - like summarizing a release with dozens of work items - and it performed worse when we sent long, complicated prompts in one go.
We tackled both problems separately.
For big datasets, we switched to a chunking approach: break the data into small pieces, summarize each part, then ask Rovo to summarize the summaries. Simple, but it made the output far more reliable.
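Here’s the pattern as a sketch. The summarizeText callback is a stand-in for whatever summarization call your setup provides - in our case the agent’s own prompt flow, since Rovo still has no direct API:
```typescript
type Summarize = (text: string) => Promise<string>;

// Chunk-then-summarize: split the data, summarize each piece,
// then summarize the summaries.
async function summarizeRelease(
  workItems: string[],
  summarizeText: Summarize,
  chunkSize = 20
): Promise<string> {
  const chunks: string[][] = [];
  for (let i = 0; i < workItems.length; i += chunkSize) {
    chunks.push(workItems.slice(i, i + chunkSize));
  }

  // Each chunk stays small enough for the model to handle reliably.
  const partials = await Promise.all(
    chunks.map((chunk) => summarizeText(chunk.join('\n')))
  );

  return summarizeText(`Combine these partial summaries:\n${partials.join('\n')}`);
}
```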
For long prompts, we stopped writing giant paragraphs and rewrote everything as a step-by-step guide with short, concrete instructions for each step - more like a 10-step algorithm with a clear sequence of actions than a single sentence trying to describe everything.
Surprisingly, that change alone boosted the quality and predictability of the results in a very noticeable way.
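To give you a feel for the style, here is a condensed, invented example of the kind of instructions that replaced our wall-of-text prompt - the real version is longer and app-specific:
```typescript
// Illustrative only - step wording is made up, not our production prompt.
const agentInstructions = `
Follow these steps in order:
1. Identify the release the user is asking about. If several match, ask which one.
2. Call the backend action to fetch that release's data. Use only the returned values.
3. Summarize the scope in at most five bullet points.
4. List blockers. If there are none, say "No blockers".
5. Finish with the main risks, one line each.
Do not skip, merge, or reorder steps.
`;
```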
When Basic Math Becomes Advanced AI

You’d think counting work items in a release would be the easiest task possible. Well… not for Rovo during our early implementation.
No matter how we asked, it kept miscounting - skipping items, double-counting them, or confidently giving numbers that didn’t exist at all. Even small releases produced unreliable results.
So we took the math away from the model entirely: we computed everything on the backend, passed the numbers to the agent, and clearly told it to use our values instead of trying to calculate on its own. That simple shift removed the errors completely and finally made the results consistent.
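Here’s a sketch of the backend side, assuming a Forge app talking to the Jira REST API - the endpoint, JQL, and function name are illustrative, not our exact implementation:
```typescript
import api, { route } from '@forge/api';

// Jira does the counting; the agent only has to repeat the number.
export async function getReleaseCounts(releaseName: string) {
  const jql = `fixVersion = "${releaseName}"`;
  const res = await api
    .asApp()
    .requestJira(route`/rest/api/3/search?jql=${jql}&maxResults=0`);
  const data = await res.json();
  return { release: releaseName, workItemCount: data.total };
}
```
On the prompt side, the matching instruction is blunt: use the returned count exactly as provided, and never count items yourself.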
Stabilizing Date-Based Queries

Rovo ran into a similar problem with dates. Simple tasks like showing releases for this week or last month should’ve been easy, but the agent often struggled with something as basic as knowing today’s date. Depending on the model, it sometimes used old internal timestamps or simply guessed - leading to confusing and inconsistent results.
We solved it the same way we handled counting.
We built a backend method that always returns the current date, passed that value to Rovo on every request, and told it clearly to use our date instead of relying on its own “internal clock.”
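This one barely needs a code sample, but for completeness, the entire fix looks something like this:
```typescript
// Returned to the agent on every request; the prompt tells Rovo to treat
// this value as the only valid "today".
export async function getCurrentDate() {
  const now = new Date();
  return {
    today: now.toISOString().slice(0, 10), // e.g. "2025-06-15"
    timestamp: now.toISOString(),
  };
}
```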
After that, date-based queries finally became consistent - and Rovo stopped living in its own alternate timeline.
Good Answers, Slow Delivery

We also hit performance issues. Even after fixing accuracy and context, the setup often felt slow - Rovo’s processing time plus multiple backend calls caused noticeable delays. For quick, everyday questions, this really hurt the user experience.
To fix it, we streamlined data fetching, removed redundant backend calls, and simplified prompts. It turned out that speed mattered just as much as correctness.
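Much of the win came from boring engineering, like fetching independent pieces of data in parallel instead of one after another. A sketch - the fetch helpers here are placeholders for your own backend calls:
```typescript
declare function fetchRelease(id: string): Promise<unknown>;
declare function fetchWorkItems(id: string): Promise<unknown>;
declare function fetchBlockers(id: string): Promise<unknown>;

// One round trip's worth of latency instead of three.
async function getReleaseSnapshot(releaseId: string) {
  const [release, workItems, blockers] = await Promise.all([
    fetchRelease(releaseId),
    fetchWorkItems(releaseId),
    fetchBlockers(releaseId),
  ]);
  return { release, workItems, blockers };
}
```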
The Testing Challenge

Last - but definitely not least - came testing. With AI, results aren’t consistent, so getting predictable, repeatable outcomes is extremely hard. Automated testing didn’t help much either, since traditional tests expect deterministic behavior and AI simply isn’t deterministic. Building reliable tests for an AI-driven workflow turned out to be far more complex than we imagined.
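We haven’t fully solved this, but one pattern that softened it: assert on the facts the answer must contain rather than on its exact wording. A hypothetical sketch - the names and values are made up:
```typescript
// The phrasing varies between runs; the injected facts must not.
function answerMentionsFacts(answer: string, facts: string[]): boolean {
  return facts.every((fact) =>
    answer.toLowerCase().includes(fact.toLowerCase())
  );
}

declare const agentAnswer: string; // response captured from a test conversation

const ok = answerMentionsFacts(agentAnswer, ['Release 3.2', '42 work items']);
```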
Here’s the Good News
After all the hurdles above, you might think building with Rovo is a tough ride - and sometimes it really is. But that’s only part of the story. The platform is improving fast, and from our chats with the Atlassian AI team, it’s clear they know the current pain points and have a solid roadmap to address them.
During Atlassian Team ’25 Europe, we also learned that Rovo may soon support API calls, which would be a huge milestone. It would unlock far more use cases and let heavy operations run quietly in the background instead of inside the chat.
And there’s one big advantage people often overlook: Rovo is free and deeply integrated into the Atlassian ecosystem. For Marketplace partners, that’s a huge win. For end users, it means we’ll soon see much more automation across the tools they already use every day. And that’s genuinely exciting.