What is backlog refinement?

Backlog refinement is the process of reviewing, clarifying, and organizing product backlog items so they are ready for sprint planning. It involves adding detail, estimates, priorities, and acceptance criteria to user stories and tasks. Refine Backlog automates this process using AI.

How does AI backlog refinement work?

Refine Backlog uses Claude AI to analyze your raw backlog items, deduplicate similar tasks, add clear problem statements, estimate effort using t-shirt sizing (S/M/L/XL), assign priorities (P0-P3), categorize work, and identify dependencies. You paste your items and get structured, sprint-ready stories back in seconds.

How much does Refine Backlog cost?

Refine Backlog offers three plans: Free (10 items per session, 3 sessions/month, no signup required), Pro at $9/month (100 items per session, unlimited sessions), and Team at $29/month (500 items per session, team sharing & collaboration).

Can I import from Jira, Linear, or GitHub?

Yes. Refine Backlog accepts plain text (one item per line), CSV exports from Jira, Linear, and GitHub Issues, or JSON format. Just paste directly into the text area. You can also export results as CSV compatible with all major project management tools.

Is my backlog data private?

Yes. Refine Backlog does not store your backlog data. Processing happens in real time, and results are returned directly to your browser. No data is retained after your session.

What's the difference between Pro and Team?

Pro ($9/month) is for individual product managers and includes 100 items per session with unlimited sessions. Team ($29/month) raises the limit to 500 items per session and adds team sharing & collaboration, custom export templates, bulk processing, and dedicated support.

Gherkin Acceptance Criteria: Examples and Best Practices for Agile Teams

What Is Gherkin, and Why Should Your Team Care?

Gherkin is a plain-language syntax for writing executable acceptance criteria. It's the language behind Behavior-Driven Development (BDD) frameworks like Cucumber and SpecFlow. Instead of writing vague requirements like 'users should be able to log in,' Gherkin forces you to spell out the exact conditions, actions, and expected outcomes.

A scenario is built around three core keywords: Given (the starting state), When (the action), and Then (the expected result). This structure sounds simple, but it's surprisingly powerful at uncovering hidden assumptions. When a product manager, developer, and QA engineer each draft the same Gherkin scenario, their disagreements surface immediately, before a single line of code is written.

That said, Gherkin isn't a silver bullet. Teams often adopt it because it sounds professional, only to discover they're spending more time debating syntax than clarifying requirements. The real value emerges when your features are genuinely complex, cross-functional, or prone to misinterpretation.

How Does the Given/When/Then Structure Actually Work?

The Given/When/Then pattern breaks acceptance criteria into three logical layers, each answering a specific question:

Given describes the preconditions: the world before anything happens. It answers: 'What state must exist for this scenario to be relevant?'

When describes the action the user (or system) takes. It answers: 'What triggers the behavior we're testing?'

Then describes the observable outcome. It answers: 'What should happen as a result?'

Here's a simple example: Given a user is logged in, When they click 'Forgot Password,' Then they receive a password reset email within 2 minutes. Each clause is specific, testable, and independent. A developer can code it. QA can verify it. A business analyst can validate it makes sense.
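Written in Gherkin's usual one-step-per-line layout, the example above might look like the text embedded in the sketch below. The scenario title and the parse_steps helper are assumptions made for illustration; the parser is a toy that only shows how each line splits into a keyword and a step, not how Cucumber or any other BDD tool works internally.

```python
# Illustrative only: a toy parser, not part of Cucumber, SpecFlow, or Behave.
# The scenario title is a hypothetical label added for this sketch.
SCENARIO = """\
Scenario: Request a password reset
  Given a user is logged in
  When they click 'Forgot Password'
  Then they receive a password reset email within 2 minutes
"""

STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def parse_steps(text: str) -> list[tuple[str, str]]:
    """Return (keyword, step text) pairs for every step line in a scenario."""
    steps = []
    for line in text.splitlines():
        stripped = line.strip()
        for keyword in STEP_KEYWORDS:
            if stripped.startswith(keyword + " "):
                steps.append((keyword, stripped[len(keyword) + 1:]))
                break
    return steps


for keyword, step in parse_steps(SCENARIO):
    print(f"{keyword:<5} | {step}")
```

Each of the three steps lands in its own row, which makes it easy to see whether a clause is describing state, action, or outcome.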

The power lies in the constraint. You can't be vague with Gherkin. You can't say 'the system should respond quickly'—you have to say 'within 2 seconds' or 'within 2 minutes.' This precision prevents the costly rework that comes from misaligned expectations.
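One way to make this constraint concrete is a small check that flags vague wording and looks for a measurable time limit. The sketch below is a minimal illustration under stated assumptions: the word list, the regular expression, and both function names are hypothetical and would need tuning to your team's conventions.

```python
import re

# Assumed word list; adjust it to match your own style guide.
VAGUE_TERMS = ("quickly", "fast", "soon", "promptly", "user-friendly")

# Matches a number followed by a time unit, e.g. "2 seconds", "500ms", "5 minutes".
MEASURABLE = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:ms|milliseconds?|seconds?|minutes?|hours?|days?)\b",
    re.IGNORECASE,
)


def find_vague_terms(step: str) -> list[str]:
    """Return the vague words from VAGUE_TERMS that appear in a step."""
    lowered = step.lower()
    return [
        term
        for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


def has_measurable_timing(step: str) -> bool:
    """Return True if the step states a numeric time limit."""
    return MEASURABLE.search(step) is not None


for step in (
    "the system should respond quickly",
    "the page loads within 2 seconds",
):
    print(step, "->", find_vague_terms(step), has_measurable_timing(step))
```

A check like this cannot judge whether a limit is the right one; it only catches steps where no limit was written down at all.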

What Are 10+ Real Examples of Gherkin Acceptance Criteria?

The best way to understand Gherkin is to see it in action across different feature types. Here are 10 real-world examples that cover common scenarios your team likely encounters:

E-commerce: Adding Items to Cart
Given a user is browsing the product catalog, When they click 'Add to Cart' on a product priced at $49.99, Then the cart total updates to reflect the new item within 1 second, and a confirmation message appears for 3 seconds.

Authentication: Password Reset Flow
Given a user has entered an invalid password 3 times, When they click 'Forgot Password,' Then they are redirected to a password reset form, and a reset link is sent to their registered email address within 5 minutes.

Search: Filter Results by Price Range
Given a user is viewing search results with 50+ products, When they select a price range of $10–$50, Then only products within that range are displayed, and the result count updates from 50 to the filtered number within 500ms.

Notifications: Email Digest Scheduling
Given a user has opted into daily email digests, When it is 9 AM in their timezone, Then they receive an email containing the past 24 hours of activity, and the email is sent only once per day.

Dashboard: Data Export to CSV
Given a user is on the analytics dashboard with 1,000+ rows of data, When they click 'Export to CSV,' Then a file is generated and downloaded within 10 seconds, and the file contains all visible columns and rows.

Mobile: Offline Mode Sync
Given a mobile user has no internet connection, When they create a new task, Then the task is saved locally, and when connectivity is restored, the task syncs to the server within 30 seconds without duplicating.

API: Rate Limiting Response
Given a client has made 1,000 API requests in the past hour, When they attempt the 1,001st request, Then the API returns a 429 (Too Many Requests) status code with a Retry-After header indicating 1 hour.

Form Validation: Required Field Error
Given a user is filling out a checkout form, When they leave the 'Email Address' field empty and click 'Submit,' Then an inline error message appears below the field saying 'Email is required,' and the form is not submitted.

Permissions: Admin-Only Feature Access
Given a non-admin user is logged in, When they try to access the 'User Management' page via direct URL, Then they are redirected to the dashboard, and a toast notification says 'You do not have permission to access this page.'

Payments: Refund Processing
Given a customer has requested a refund for an order placed 5 days ago, When an admin approves the refund, Then the original payment method is credited within 3–5 business days, and the customer receives a confirmation email immediately.
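To show how one of these scenarios maps onto a check a developer could run, here is a rough sketch of the API rate limiting example. FixedWindowRateLimiter is a hypothetical toy class written for this illustration: it counts requests per client and never resets the window, so it is a stand-in for a real limiter, not a recommended implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindowRateLimiter:
    """Toy limiter: allows `limit` requests per client (window reset not modeled)."""

    limit: int = 1000
    window_seconds: int = 3600
    counts: dict[str, int] = field(default_factory=dict)

    def handle(self, client_id: str) -> tuple[int, dict[str, str]]:
        """Return (status code, headers) for one request from client_id."""
        used = self.counts.get(client_id, 0)
        if used >= self.limit:
            # Simplification: always report the full window as the wait time.
            return 429, {"Retry-After": str(self.window_seconds)}
        self.counts[client_id] = used + 1
        return 200, {}


# Given a client has made 1,000 API requests in the past hour
limiter = FixedWindowRateLimiter()
for _ in range(1000):
    status, _headers = limiter.handle("client-a")
    assert status == 200

# When they attempt the 1,001st request
status, headers = limiter.handle("client-a")

# Then the API returns 429 with a Retry-After header indicating 1 hour
assert status == 429
assert headers["Retry-After"] == "3600"
print(status, headers)
```

The three comments mirror the Given, When, and Then clauses, which is the same mapping a BDD tool makes when it binds steps to code.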

When Does Gherkin Actually Add Value vs. Overhead?

Here's the uncomfortable truth: not every story needs Gherkin. If you're building a simple feature with obvious behavior, writing Gherkin scenarios can feel like bureaucracy. You spend 20 minutes debating the exact wording of a Given clause when the story is straightforward enough that any competent developer would implement it correctly.

Gherkin shines when: (1) the feature is complex with multiple conditional paths, (2) multiple teams need to align on behavior before development starts, (3) non-technical stakeholders need to validate the acceptance criteria, or (4) the feature has been misunderstood or reworked before. In these cases, the 15 minutes spent writing Gherkin saves hours of rework.

Gherkin adds overhead when: (1) the story is simple and self-explanatory, (2) your team is new to BDD and is still learning the syntax, (3) you lack tooling to execute Gherkin as automated tests (it becomes just documentation), or (4) your team is distributed across timezones and synchronous debate over Gherkin wording becomes a bottleneck.

The pragmatic approach: use Gherkin for your high-risk, high-complexity stories. Use plain-language acceptance criteria for straightforward features. Your backlog will be a mix, and that's healthy.

What Are the Most Common Gherkin Mistakes Teams Make?

Even teams committed to Gherkin often stumble. Here are the pitfalls we see repeatedly:

Writing too many scenarios per story. A story should have 2–4 scenarios, not 10. If you're writing more, your story is too big and should be split.

Writing scenarios that are too vague. 'The user should be happy' isn't a Then clause. 'The user receives a success message within 2 seconds' is.

Mixing implementation details with behavior. 'Given the database is queried' is implementation. 'Given a user is logged in' is behavior.

Skipping the And keyword. If you have multiple conditions, use And to keep scenarios readable: Given a user is logged in, And they have admin permissions, And it is after 5 PM, When they access the reports page, Then...

Forgetting to define 'done.' Gherkin scenarios are testable, but only if you define what 'done' means. 'Within 2 seconds' is done. 'Quickly' is not.
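Two of these checks are mechanical enough to sketch in code: counting scenarios per story and reading And as a continuation of the step before it. The example below reuses the And-chained scenario from the text, with its Then outcome left elided as in the original. The feature and scenario titles, the MAX_SCENARIOS constant, and both helper functions are assumptions added for illustration.

```python
# Hypothetical feature text; titles are invented, and the Then outcome is
# left elided ("...") exactly as in the article's example.
FEATURE = """\
Feature: Reports page access

  Scenario: Admin opens reports after hours
    Given a user is logged in
    And they have admin permissions
    And it is after 5 PM
    When they access the reports page
    Then ...
"""

MAX_SCENARIOS = 4  # upper end of the 2-4 range suggested in the text


def count_scenarios(feature_text: str) -> int:
    """Count lines that open a scenario."""
    return sum(
        1 for line in feature_text.splitlines()
        if line.strip().startswith("Scenario:")
    )


def resolve_and(feature_text: str) -> list[tuple[str, str]]:
    """Replace each And/But with the keyword of the step it continues."""
    resolved = []
    current = None
    for line in feature_text.splitlines():
        parts = line.strip().split(" ", 1)
        if len(parts) != 2:
            continue
        keyword, text = parts
        if keyword in ("Given", "When", "Then"):
            current = keyword
        elif keyword in ("And", "But") and current is not None:
            keyword = current
        else:
            continue  # Feature:, Scenario:, or a stray And with nothing before it
        resolved.append((keyword, text))
    return resolved


if count_scenarios(FEATURE) > MAX_SCENARIOS:
    print("Consider splitting this story.")
for keyword, text in resolve_and(FEATURE):
    print(keyword, "|", text)
```

Here both And lines resolve to Given, confirming that all three conditions are preconditions and not part of the action.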

How Can You Integrate Gherkin Into Your Backlog Refinement Process?

Gherkin works best when it's introduced early—during backlog refinement, not during sprint planning. This gives your team time to debate and align before the story hits the sprint board.

During refinement, start with a rough user story. Then, as a group, write 2–4 Gherkin scenarios. Ask the product manager: 'Is this the behavior you want?' Ask the developer: 'Can you build this?' Ask QA: 'Can you test this?' If anyone hesitates, that's a signal to refine further. This collaborative process is where Gherkin delivers its real value—not as documentation, but as a thinking tool.

If you're drowning in vague, unrefined backlog items, consider that your real bottleneck might not be the format of your acceptance criteria—it's the rigor of your refinement process. This is where tools like Refine Backlog can help. By automating the initial structuring of messy backlog items into clear problem statements and acceptance criteria, you free up your team to focus on the strategic conversations: Is this the right behavior? Is this the right story size? Should this be Gherkin or plain language? The tool handles the grunt work; your team handles the thinking.

Should You Use Gherkin Without Automated Testing Tools?

This is a critical question many teams don't ask until they're already committed. Gherkin was designed to be executable. Tools like Cucumber, SpecFlow, and Behave parse Gherkin scenarios and run them as automated tests. If you're writing Gherkin without this infrastructure, you're using it as documentation only.

Documentation-only Gherkin is still valuable—it's clearer than prose—but you're missing the biggest payoff: automated validation. Without automation, someone still has to manually verify each scenario. With automation, the scenarios run on every commit. Bugs surface in seconds, not days.
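To give a feel for what that automation involves, the sketch below shows the general idea of binding step text to code with pattern matching. It is a deliberately tiny, hypothetical runner: the step decorator, run_step, STEP_REGISTRY, and the cart scenario are all invented for this illustration, and real tools such as Cucumber, SpecFlow, and Behave provide their own decorators, matching rules, and reporting.

```python
import re

# Hypothetical mini-runner: a list of (compiled pattern, function) pairs.
STEP_REGISTRY = []


def step(pattern: str):
    """Register a function to run when a step's text matches `pattern`."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(context: dict, text: str) -> None:
    """Find the first matching step definition and call it with captured values."""
    for pattern, func in STEP_REGISTRY:
        match = pattern.fullmatch(text)
        if match:
            func(context, *match.groups())
            return
    raise LookupError(f"No step definition matches: {text!r}")


@step(r"a cart containing (\d+) items?")
def given_cart(context, count):
    context["cart"] = int(count)


@step(r"the user adds (\d+) more items?")
def when_user_adds(context, count):
    context["cart"] += int(count)


@step(r"the cart shows (\d+) items?")
def then_cart_shows(context, expected):
    assert context["cart"] == int(expected), context["cart"]


context = {}
for text in (
    "a cart containing 1 item",
    "the user adds 2 more items",
    "the cart shows 3 items",
):
    run_step(context, text)
print("scenario passed, cart =", context["cart"])
```

Once steps are bound like this, the same scenario text serves as both the agreed requirement and the regression test, which is the payoff documentation-only Gherkin misses.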

If your team lacks the tooling or expertise to set up Gherkin automation, consider a hybrid approach: write Gherkin for your most critical, complex features where automation will pay dividends. Use plain-language acceptance criteria for everything else. As your team matures and gains confidence with BDD, you can expand Gherkin adoption. There's no shame in starting small.

What's the Difference Between Gherkin and Other Acceptance Criteria Formats?

Gherkin isn't the only way to write acceptance criteria. The most common alternatives are user-story format ('As a [user], I want [action], so that [benefit]') and traditional checklists ('The user should be able to X, Y, and Z').

User-story format is great for capturing intent and motivation. Gherkin is better for capturing behavior and edge cases. A checklist is fast but often vague. In practice, many teams use all three: a user story for context, Gherkin scenarios for complex behavior, and a checklist for simple requirements.

If you're starting from scratch and unsure which format to adopt, ask yourself: How complex is this feature? How many edge cases are there? How many people need to agree on the behavior? If the answers are 'very,' 'many,' and 'several,' Gherkin is your answer. If the answers are 'simple,' 'few,' and 'just the dev,' plain language wins.

How Can Your Team Get Better at Writing Gherkin Scenarios?

Like any skill, writing good Gherkin takes practice. Here are three ways to accelerate your team's proficiency:

First, establish a Gherkin style guide. Decide on conventions: How specific should your timing be? (2 seconds or 'immediately'?) Should you use 'the user' or a specific role like 'the admin'? How many scenarios per story is too many? Consistency makes scenarios easier to read and reduces debate.

Second, do a Gherkin workshop with your team. Spend an hour writing scenarios together for a real story from your backlog. Discuss why certain wording works and why other wording fails. This shared experience builds intuition faster than reading a guide.

Third, start small. Don't convert your entire backlog to Gherkin overnight. Pick your next 5–10 complex stories and write Gherkin for those. Learn from the experience. Adjust your approach. Expand gradually. This measured approach prevents the common failure mode where teams adopt Gherkin, get frustrated with the overhead, and abandon it entirely. For more on refining your backlog systematically, check out our guide on backlog refinement best practices.