BURT++: the bug report, translated.
A guided assistant that turns the kind of bug report a non-technical user actually writes ("the page is weird now") into the kind of ticket an engineering team can act on: reproduction steps, environment, severity hint, and the obvious next question, asked back before the ticket is filed.
What problem this solves.
The cost of a bad bug report is enormous. A title like "it's broken" routes the wrong way, sits in triage, and resurfaces a week later when the user is even angrier. Most of the friction is upstream: the user who hits the bug is rarely the one who can describe it in the words a sprint board expects.
BURT++ steps into that gap. Instead of asking the user for a perfect ticket, it asks the user for the report they would write anyway, then runs a clarification loop that fills in the gaps with the smallest number of follow-up questions it can get away with.
The system, end to end.
Each report goes through a structured clarification cycle. The model reads the user's free-form description, identifies the missing slots (reproduction steps, browser/OS, expected vs. actual behavior, frequency), and asks only for what is missing. Questions are scored by information gain, so the user is never asked something the report already implied.
When the slots are filled, BURT++ emits a Jira-shaped ticket: title, description, repro steps, environment, severity hint. The hint is explicit about uncertainty ("likely P3, escalate if reproducible in production") so the triage engineer gets a starting point, not a foregone conclusion.
What it's built on.
| Component | Approach |
|---|---|
| Slot detection | Structured-output LLM call against the ticket schema |
| Follow-up generation | Information-gain ranking over candidate questions |
| Ticket emission | Jira-shaped JSON with title / desc / repro / env / severity |
| Embed surface | Drop-in React widget that hosts the chat in-app |
Slot detection:
- Structured-output LLM call against the ticket schema
- Per-slot "is-filled" classification with calibrated thresholds
- Optional free-form notes preserved verbatim into the ticket
Follow-up generation:
- Information-gain ranking over candidate questions
- Cap of two follow-up turns before the system commits
- Per-question reasoning kept in the audit log, not surfaced
Ticket emission:
- Jira-shaped JSON with title / desc / repro / env / severity
- Webhook adapters for Jira, Linear, and GitHub Issues
- Severity hint phrased as a confidence-tagged statement
Embed surface:
- Drop-in React widget that hosts the chat in-app
- Token-scoped per-tenant; no shared LLM context
- Telemetry surface for triage-engineer feedback signals
What the team shipped.
What sets this capstone apart.
Detect what's missing, ask for that.
BURT++ doesn't show a 12-field form. It reads the user's description, identifies the slots that are actually empty, and asks for those — and only those. The user feels heard; the engineer gets a complete ticket.
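The empty-slot check can be sketched as a threshold test over per-slot confidences. This is a minimal illustration, assuming the structured-output call reports, for each slot, a value and a confidence that the slot is filled. The slot names follow the ones listed in the description; the `THRESHOLDS` values, the `SlotReading` type, and `missing_slots` are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-slot thresholds; the project's calibrated values are not given.
THRESHOLDS = {
    "repro_steps": 0.7,
    "environment": 0.6,
    "expected_vs_actual": 0.7,
    "frequency": 0.5,
}


@dataclass
class SlotReading:
    """One slot as a structured-output call might report it (assumed shape)."""
    name: str
    value: Optional[str]
    filled_confidence: float  # confidence that the slot is filled, 0..1


def missing_slots(readings: List[SlotReading]) -> List[str]:
    """Return the slots whose fill confidence falls below their threshold."""
    seen = {r.name: r for r in readings}
    missing = []
    for name, threshold in THRESHOLDS.items():
        reading = seen.get(name)
        # A slot that was not returned at all, or has no value, counts as empty.
        if (
            reading is None
            or reading.value is None
            or reading.filled_confidence < threshold
        ):
            missing.append(name)
    return missing


readings = [
    SlotReading("repro_steps", "open settings, click save", 0.91),
    SlotReading("environment", "probably Chrome?", 0.40),
    SlotReading("expected_vs_actual", None, 0.05),
]
print(missing_slots(readings))
```

Only the slots returned by `missing_slots` would become candidates for a follow-up question; a slot that clears its threshold is never asked about again.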
One good follow-up beats five bad ones.
Each follow-up is scored by expected information gain. The system asks for severity-determining context first and stylistic details never. The whole interaction is two messages, not ten.
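One way to read "expected information gain" is as the expected drop in entropy over the severity label once a question is answered. The sketch below assumes that reading. The prior, the candidate questions, the answer probabilities, `min_gain`, and the function names are all illustrative, and the project's actual scoring may differ.

```python
import math

MAX_FOLLOW_UPS = 2  # mirrors the two-turn cap in the component list


def entropy(dist):
    """Shannon entropy in bits of a {label: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, answers):
    """answers: list of (probability_of_answer, posterior_over_severity)."""
    expected_posterior_entropy = sum(p * entropy(post) for p, post in answers)
    return entropy(prior) - expected_posterior_entropy


def pick_follow_ups(prior, candidates, min_gain=0.05):
    """Rank questions by gain, drop uninformative ones, keep at most the cap."""
    scored = [(expected_information_gain(prior, a), q) for q, a in candidates.items()]
    useful = [(g, q) for g, q in scored if g >= min_gain]
    useful.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in useful[:MAX_FOLLOW_UPS]]


# Illustrative belief over severity before any follow-up is asked.
prior = {"P1": 0.25, "P2": 0.25, "P3": 0.5}

# Each question maps to its possible answers and the belief after each answer.
candidates = {
    "Does it happen in production?": [
        (0.5, {"P1": 0.5, "P2": 0.5, "P3": 0.0}),
        (0.5, {"P1": 0.0, "P2": 0.0, "P3": 1.0}),
    ],
    "Does it happen every time?": [
        (0.5, {"P1": 0.5, "P2": 0.0, "P3": 0.5}),
        (0.5, {"P1": 0.0, "P2": 0.5, "P3": 0.5}),
    ],
    "What colour is the button?": [
        (1.0, {"P1": 0.25, "P2": 0.25, "P3": 0.5}),
    ],
}

print(pick_follow_ups(prior, candidates))
```

In this toy setup the production question gains 1.0 bit, the frequency question 0.5 bits, and the stylistic question 0 bits, so the stylistic question falls under `min_gain` and is never asked.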
Severity, with the uncertainty attached.
The output ticket includes a severity hint phrased with its own confidence — "likely P3, escalate if reproducible in production." Triage starts from a sane prior instead of a guess.
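A rough sketch of how a confidence-tagged hint could be attached to the emitted ticket, assuming the system holds a probability per severity level. The wording bands (0.85 and 0.5), the `severity_hint` helper, and the exact JSON key names are assumptions for illustration; only the field list (title / desc / repro / env / severity) and the example phrasing come from the description.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class Ticket:
    """Jira-shaped ticket; key names are an assumed spelling of the field list."""
    title: str
    desc: str
    repro: List[str]
    env: str
    severity: str


def severity_hint(probabilities: Dict[str, float], escalate_if: Optional[str] = None) -> str:
    """Phrase the most probable severity with a qualifier reflecting confidence."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    # Hypothetical wording bands, not the project's calibration.
    if confidence >= 0.85:
        qualifier = "very likely"
    elif confidence >= 0.5:
        qualifier = "likely"
    else:
        qualifier = "possibly"
    hint = f"{qualifier} {label}"
    if escalate_if:
        hint += f", escalate if {escalate_if}"
    return hint


hint = severity_hint({"P2": 0.3, "P3": 0.7}, escalate_if="reproducible in production")
ticket = Ticket(
    title="Settings page layout breaks after saving",
    desc="User reports the page 'is weird now' after clicking save.",
    repro=["Open settings", "Click save"],
    env="Chrome on macOS (user-reported)",
    severity=hint,
)
payload = json.dumps(asdict(ticket), indent=2)
print(payload)
```

Keeping the hint as a plain sentence rather than a bare priority field leaves the final call with the triage engineer, which matches the "starting point, not a foregone conclusion" framing.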
How this project landed.
Solo capstones are scope traps. The first proposal had BURT++ doing slot detection, ticket emission, telemetry, and a separate auto-deduplication system on top: four projects in one. The first review cut the dedup work to a future-work footnote and committed the project to the conversational core.
What shipped is tight: a working clarification loop, a clean Jira-shaped output, and a defensible answer to the "why two turns and not five" question that came up in critique. The capstone reads as one focused product, not four sketches.