Group 05 · Capstone · Fall 2025

BURT++ — the bug report, translated.

A guided assistant that turns the kind of bug report a non-technical user actually writes — "the page is weird now" — into the kind of ticket an engineering team can act on: reproduction steps, environment, severity hint, and the obvious next question, asked back before the ticket files.

Domain Dev tools · QA

Stack LLM clarificationJira-shaped outputReact widgetNode APIPostgres

Demoed · Fall 2025

IWhat we built

What problem this solves.

The cost of a bad bug report is enormous. A title like "it's broken" routes the wrong way, sits in triage, and resurfaces a week later when the user is even angrier. Most of the friction is upstream: the user who hits the bug is rarely the one who can describe it in the words a sprint board expects.

BURT++ steps into that gap. Instead of asking the user for a perfect ticket, it asks the user for the report they would write anyway — then runs a clarification loop that fills in the gaps with the smallest number of follow-up questions it can get away with.

IIHow it works

The system, end to end.

Each report goes through a structured clarification cycle. The model reads the user's freeform description, identifies the missing slots — reproduction steps, browser / OS, expected vs actual, frequency — and asks only for what is missing. Questions are scored by information gain, so the user is never asked something the report already implied.

When the slots are filled, BURT++ emits a Jira-shaped ticket: title, description, repro steps, environment, severity hint. The hint is explicit about uncertainty ("likely P3, escalate if reproducible in production") so the triage engineer gets a starting point, not a foregone conclusion.

Pipeline · BURT++

Ingest

Bug draft

reporter writes

Model

Slot detection

structured-output classifier

Transform

Gain-ranked Qs

information-gain ranking

Surface

Follow-up

≤2 follow-up turns

Transform

Ticket emission

Jira-shaped JSON

Surface

Embed surface

Jira / Linear / GitHub Issues

Ingest Transform Model Surface Feedback

IIIThe stack

What it's built on.

Layer · tool / library
Slot detection	Structured-output LLM call against the ticket schema
Follow-up generation	Information-gain ranking over candidate questions
Ticket emission	Jira-shaped JSON with title / desc / repro / env / severity
Embed surface	Drop-in React widget that hosts the chat in-app

Slot detection

Structured-output LLM call against the ticket schema
Per-slot "is-filled" classification with calibrated thresholds
Optional free-form notes preserved verbatim into the ticket

Follow-up generation

Information-gain ranking over candidate questions
Cap of two follow-up turns before the system commits
Per-question reasoning kept in the audit log, not surfaced

Ticket emission

Jira-shaped JSON with title / desc / repro / env / severity
Webhook adapters for Jira, Linear, and GitHub Issues
Severity hint phrased as a confidence-tagged statement

Embed surface

Drop-in React widget that hosts the chat in-app
Token-scoped per-tenant; no shared LLM context
Telemetry surface for triage-engineer feedback signals

IVDeliverables

What the team shipped.

Source repository GitHub · code, tests, README

Demo video Capstone day · screen recording, 4–6 min

Write-up PDF Final brief · methods, evaluation, reflection

Slide deck Capstone presentation · 10 slides

VWhat sets it apart

What sets this capstone apart.

Takeaway 01 · Slots, not forms

Detect what's missing, ask for that.

BURT++ doesn't show a 12-field form. It reads the user's description, identifies the slots that are actually empty, and asks for those — and only those. The user feels heard; the engineer gets a complete ticket.

Takeaway 02 · Gain-ranked questions

One good follow-up beats five bad ones.

Each follow-up is scored by expected information gain. The system asks for severity-determining context first and stylistic details never. The whole interaction is two messages, not ten.

Takeaway 03 · Hint with calibration

Severity, with the uncertainty attached.

The output ticket includes a severity hint phrased with its own confidence — "likely P3, escalate if reproducible in production." Triage starts from a sane prior instead of a guess.

VIIInstructor note

How this project landed.

Solo capstones are scope traps. The first proposal had BURT++ doing slot detection, ticket emission, telemetry, and a separate auto-deduplication system on top — four projects in one. The first review cut the dedup work to a future-work footnote and made the team commit to the conversational core.

What shipped is tight: a working clarification loop, a clean Jira-shaped output, and a defensible answer to the "why two turns and not five" question that came up in critique. The capstone reads as one focused product, not four sketches.

BURT++ — the bug report, translated.

Detect what's missing, ask for that.

One good follow-up beats five bad ones.

Severity, with the uncertainty attached.

Related work in the cohort.

GenAI Claim Verification

CodeCaster