Reading the provenance recording
A Red Stet recording is process evidence — keystrokes, paste events, cursor moves, pauses. The Review panel turns the stream into a scrubbable timeline plus a queue of moments worth your eyes. This doc covers what each signal means and how to tell a reassuring recording from one that wants a closer look.
Red Stet is not an AI detector. It doesn't score the finished text and tell you "63% AI." It records the writing as it happens and shows you the receipts.
- Opening a recording
- The summary header — what the count means
- The scrubber timeline
- Focus mode and skipping around
- Paste flags — five flavors
- Why there's no "AI score"
- The composition fingerprint
- Baseline match — the author profile
- Cross-doc comparison
- When to trust, when to dig deeper
- Sharing & exporting a recording
- Related & feedback
Opening a recording
Three paths reach the same Review panel.
From a submission
Most common. Open the assignment, click the student's row. Multi-session writing shows as one chain-linked timeline.
From an assignment overview
Click Provenance on the assignment header to see every recording, sorted by flag count. Triage view — start with the noisiest.
From the verifier
Drop a .red.md file into the standalone verifier at /verify/. Same Review panel, no account. Right path for parents, external reviewers, or a file from another school.
The summary header — what the count means
The big number is events to review, not events of cheating. The detector surfaces things you might want to see; the verdict is yours. Three colors:
- Green (zero events) — nothing tripped the detector. About 70% of submissions land here.
- Yellow (some events, no high-severity) — pastes, an unusual cursor segment, or a short typing burst. Usually one explained gesture (citation paste, copy from notes).
- Red (one+ high-severity) — paste after tab switch, off-page paste, programmatic input, or sharp baseline divergence. Open every one before grading.
The meta line shows session length, keystrokes, and paste count (all sizes). A 90-minute session with 8,000 keystrokes and three small pastes looks different from a 4-minute session with 12 keystrokes and one giant paste — even when both show "1 event."
If the classroom has a Provenance policy, the count reflects what the policy surfaces. Off and Manual categories move under "Suppressed by policy." The recording is unchanged. Hover the count for underlying totals.
- Green example: nothing tripped the detector. Reads as ordinary composition.
- Yellow example: low/medium-severity pastes; likely a quote and a citation.
- Red example: includes a tab-switch paste and an off-page paste. Open both.
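The color rule in this section can be sketched as a small function. This is a minimal illustration, not Red Stet's implementation: the `Flag` type, the `summary_color` name, and the severity strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    category: str  # e.g. "large-paste", "tab-paste"
    severity: str  # "low", "medium", or "high" (assumed labels)


def summary_color(flags: list[Flag]) -> str:
    """Green for zero events, red if any event is high-severity, else yellow."""
    if not flags:
        return "green"
    if any(flag.severity == "high" for flag in flags):
        return "red"
    return "yellow"
```

Note that the color depends only on severity, not on how many events there are, which is why two recordings with the same count can deserve very different attention.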
The scrubber timeline
The horizontal bar is the full session. Drag the thumb and the panel below updates: absolute time (0:00 / 47:12), event description ("paste of 312 chars at position 1,408"), and a dot tracking the cursor.
Each tick is a flagged moment, color-coded:
- Red — paste flag (large, tab-switch, or off-page)
- Blue-grey — focus / visibility event tied to a paste
- Amber — style-shift (paragraph voice diverges)
- Sage — cursor / motion anomaly
- Teal dot — reflection saved. Writers who pause to reflect mid-draft leave a different signature than ones who fill prompts at the end.
Hover a tick for category + timestamp. Show Replay → in the flag queue jumps the scrubber and zooms.
cur samples; the panel skips cursor signals (straight-line motion, off-page motion).
Focus mode and skipping around
A 60-minute session on a 320-pixel scrubber is hard to drag precisely. Show Replay → on a queue item jumps the playhead and opens a focused window — a sub-track zoomed ±2.5 minutes.
The full-session bar stays above with a highlighted region showing the zoom. The scrubber moves into the focused bar.
View full session in the time read-out exits.
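As a rough sketch of the zoom arithmetic, the focused window is the flagged moment plus and minus 2.5 minutes. Clamping to the session edges is an assumption here, and `focus_window` is a hypothetical name.

```python
def focus_window(center_s: float, session_s: float,
                 half_width_s: float = 150.0) -> tuple[float, float]:
    """Return (start, end) in seconds for a window of +/- 2.5 minutes.

    The window is clamped so it never extends past the start or end
    of the session (assumed behavior).
    """
    start = max(0.0, center_s - half_width_s)
    end = min(session_s, center_s + half_width_s)
    return start, end
```

For a flag at 38:21 in a 47:12 session, this gives a window from 35:51 to 40:51.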
What playback looks like
Scrubber-driven. Drag, the panel paints the moment under the playhead. No play/pause, no speed selector.
Paste flags — five flavors
Pastes are the most useful signal because cost is asymmetric: typing 312 characters takes a minute; pasting takes a fraction of a second. Every paste captures when, where, and size. Content is never stored.
Pastes ≥100 characters classify into five categories. Below 100, recorded but unflagged — the "could have been hand-typed" threshold.
large-paste
100+ characters. The reason line names the shape: sentence fragment, full paragraph, multi-paragraph block, multi-sentence. Multi-paragraph blocks 300+ tag high — the AI block-paste fingerprint. Sentence fragments 100–150 tag low — likely a quoted source.
tab-paste
Tab regained focus, then a paste inside 3 seconds. Signature of copying out of ChatGPT, Google Docs, chat. High.
off-page-paste
Cursor left the editor pane, then a paste inside 3 seconds of return. Suggests assembly in another window. High.
programmatic-input
Text appeared via a JavaScript input event with no matching keystroke. Extension or script — Grammarly rewrite, AI browser extension, paste-as-keystrokes. High.
external-edit
The document changed between Red Stet sessions — work happened in a tool Red Stet couldn't observe (Google Docs, Pages, Word). Chain head hash mismatches; the gap is recorded. Medium.
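The paste-driven categories above can be sketched as one decision function. This is an illustration under stated assumptions: the `PasteEvent` fields, the precedence of tab-paste over off-page-paste, and the "medium" default for shapes the text does not grade are all hypothetical. programmatic-input and external-edit come from other signals (input events and chain hashes) and are left out.

```python
from dataclasses import dataclass
from typing import Optional

MIN_FLAG_CHARS = 100    # below this, recorded but unflagged
CONTEXT_WINDOW_S = 3.0  # paste must land inside 3 seconds of the return


@dataclass(frozen=True)
class PasteEvent:
    chars: int
    shape: str  # "sentence fragment", "full paragraph", "multi-paragraph block", ...
    since_tab_focus_s: Optional[float] = None      # seconds since the tab regained focus
    since_cursor_return_s: Optional[float] = None  # seconds since the cursor re-entered the editor


def classify_paste(p: PasteEvent) -> Optional[tuple[str, str]]:
    """Return (category, severity), or None when the paste is unflagged."""
    if p.chars < MIN_FLAG_CHARS:
        return None
    if p.since_tab_focus_s is not None and p.since_tab_focus_s < CONTEXT_WINDOW_S:
        return ("tab-paste", "high")
    if p.since_cursor_return_s is not None and p.since_cursor_return_s < CONTEXT_WINDOW_S:
        return ("off-page-paste", "high")
    if p.shape == "multi-paragraph block" and p.chars >= 300:
        return ("large-paste", "high")
    if p.shape == "sentence fragment" and p.chars <= 150:
        return ("large-paste", "low")
    return ("large-paste", "medium")  # assumed default; the text does not grade other shapes
```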
Why there's no "AI score"
No "% likely AI" on the finished document. The recording is process evidence: paste events with position and size, keystroke timings, cursor coordinates. Raising an issue points at "there's a 312-character paste at 38:21, two seconds after you switched back to this tab," a specific, falsifiable claim the student can respond to.
One statistical check runs: a paragraph-level style-shift compares each paragraph to the rest along sentence length, vocabulary richness, and AI-tell phrase density ("furthermore," "in conclusion," "delve"). Sharp divergence flags yellow — never a verdict alone.
.red.md. Process evidence holds up where statistical scores fall apart.
22:08 — sentence fragment paste (likely a quote).
06:14 — ¶3 voice diverges from ¶1, ¶2, ¶4.
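The paragraph-level style-shift check described in this section might look roughly like the sketch below. Everything beyond the three named dimensions is an assumption: the feature formulas, the comparison against the median of the other paragraphs, the 0.5 threshold, and the function names are illustrative, not Red Stet's actual method.

```python
import re
from statistics import median

TELL_PHRASES = ("furthermore", "in conclusion", "delve")  # examples named in the text


def paragraph_features(text: str) -> dict[str, float]:
    """Sentence length, vocabulary richness, and tell-phrase density for one paragraph."""
    lowered = text.lower()
    sentences = [s for s in re.split(r"[.!?]+", lowered) if s.strip()]
    words = re.findall(r"[a-z']+", lowered)
    n_words = max(len(words), 1)
    tells = sum(lowered.count(phrase) for phrase in TELL_PHRASES)
    return {
        "sentence_length": len(words) / max(len(sentences), 1),
        "vocab_richness": len(set(words)) / n_words,
        "tell_density": tells / n_words,
    }


def style_shift(paragraphs: list[str], threshold: float = 0.5) -> list[int]:
    """Return indexes of paragraphs that diverge sharply from the rest."""
    feats = [paragraph_features(p) for p in paragraphs]
    if len(feats) < 2:
        return []
    flagged = []
    for i, own in enumerate(feats):
        rest = [f for j, f in enumerate(feats) if j != i]
        for key, value in own.items():
            typical = median(f[key] for f in rest)
            scale = max(abs(typical), abs(value), 1e-9)
            if abs(value - typical) / scale > threshold:
                flagged.append(i)
                break
    return flagged
```

Comparing against the median of the other paragraphs keeps one outlier from dragging the reference point toward itself, so only the divergent paragraph is flagged.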
The composition fingerprint (cf-*)
Expand Composition fingerprint below the flag queue for aggregate stats across all sessions:
- Typing rhythm sparkline — eight buckets from <50ms through ≥3.2s. Human writing has a long pause tail; near-zero pauses with fast keystrokes is the bot signature. A healthy distribution is bottom-heavy on the right.
- Typo / correction rate — backspace fraction. Below 0.5% is unusually clean (yellow); 1–5% is human-typical (green); 10%+ is heavy revision (fine).
- Read-back behavior — forward-only scrolling suggests one-shot paste; re-reading suggests composition.
- Tab-out pattern — quick lookups (<30s) vs. long absences (>30s). Long absences cluster with paste events.
Each renders as a one-line verdict with color. Open when something in the queue isn't sitting right.
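A minimal sketch of the first two fingerprint stats follows. Only the endpoints of the bucket range (<50ms and ≥3.2s) come from the text; the doubling edges in between are an assumption that happens to yield eight buckets, and the function names are hypothetical. Correction rates that fall between the stated bands return `None` because the text does not label them.

```python
from bisect import bisect_right
from typing import Optional

# Assumed edges in milliseconds; only the first and last are given in the text.
BUCKET_EDGES_MS = (50, 100, 200, 400, 800, 1600, 3200)


def rhythm_histogram(intervals_ms: list[float]) -> list[int]:
    """Count inter-keystroke gaps per bucket, from <50ms up to >=3.2s."""
    counts = [0] * (len(BUCKET_EDGES_MS) + 1)
    for gap in intervals_ms:
        counts[bisect_right(BUCKET_EDGES_MS, gap)] += 1
    return counts


def correction_verdict(backspaces: int, keystrokes: int) -> Optional[str]:
    """Map the backspace fraction onto the bands named in the text."""
    if keystrokes <= 0:
        return None
    rate = backspaces / keystrokes
    if rate < 0.005:
        return "unusually clean"
    if 0.01 <= rate <= 0.05:
        return "human-typical"
    if rate >= 0.10:
        return "heavy revision"
    return None  # between bands; not labeled in the text
```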
At the top, the chain badge — "Chain intact across 4 sessions" (green) or "Chain broken between session 2 and 3" (red). The chain hash links each session's manifest to the previous; a break means the local sidecar was tampered with or rebuilt. Rare and worth investigating.
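The chain check can be pictured as follows: each session manifest stores the hash of the one before it, and verification recomputes those hashes. This is a sketch under assumptions. SHA-256 over canonical JSON and the `prev_hash` field name are hypothetical; the text does not specify either.

```python
import hashlib
import json
from typing import Optional


def manifest_hash(manifest: dict) -> str:
    """Hash a manifest via a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def first_break(manifests: list[dict]) -> Optional[tuple[int, int]]:
    """Return the 1-based session pair where the chain breaks, or None if intact."""
    for i in range(1, len(manifests)):
        if manifests[i].get("prev_hash") != manifest_hash(manifests[i - 1]):
            return (i, i + 1)
    return None
```

Editing any field of session 2 after the fact changes its hash, so session 3's stored `prev_hash` no longer matches and the break is reported between sessions 2 and 3.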
Baseline match — the author profile (authorProfiles)
A per-student baseline aggregates session manifests across every document — typing rhythm, correction rate, sentence-start pause profile, scroll patterns. After three sessions, each recording gets a baseline match score 0–100.
- 75–100 (sage) — strong match. Isolated flags get implicitly downgraded.
- 50–74 (amber) — partial. Some metrics line up, others drift. Worth opening.
- Below 50 (red) — significant divergence. The authorship-anomaly case: a fingerprint that doesn't match the student's prior work is worth a conversation, even with no flag triggers.
The card shows four drift bars: typing rhythm, correction rate, pause profile, scroll behavior. 12% drift on rhythm with 78% on correction rate tells you something different than the inverse.
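The score bands above reduce to a short lookup. In this sketch, `baseline_band` is a hypothetical name, and returning `None` before three sessions exist is an assumed reading of "after three sessions."

```python
from typing import Optional


def baseline_band(score: int, sessions_seen: int) -> Optional[str]:
    """Map a 0-100 baseline match score to its color band."""
    if sessions_seen < 3:
        return None  # no baseline yet (assumed behavior)
    if score >= 75:
        return "sage"
    if score >= 50:
        return "amber"
    return "red"
```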
Cross-doc comparison (provenanceChains)
Inside the fingerprint, Compare against another document by this author → opens a side-by-side. Pick a prior recording; both sparklines render with a per-metric drift table.
Right move when the baseline is amber-to-red and you want to know "is this drift from the average, or from a specific recent piece I trust?" Side-by-side against last month's in-class essay sharpens the read.
Comparison uses the provenance chain — a per-student identifier linking sessions across documents and devices. Lives in localStorage, mirrors to Convex, so a Chromebook-last-week / iPad-today student still has a coherent history.
The authorship-anomaly flag is the cross-doc verdict raised to a flag. Sharp divergence surfaces in the main queue — you don't go looking.
| Metric | Drift |
|---|---|
| Avg keystroke interval | 8% |
| Typo correction rate | 14% |
| Sentence-start pause | 31% |
| Scroll-up fraction | 11% |
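One plausible way to produce a per-metric drift table like the one above is relative difference against the reference recording. The formula, the metric keys, and the `drift_table` name are assumptions for illustration; the text does not define how drift is computed.

```python
def drift_table(current: dict[str, float], reference: dict[str, float]) -> dict[str, int]:
    """Percent drift of each metric in `current` relative to `reference`.

    Metrics missing from `current`, or with a zero reference value,
    are skipped rather than guessed at.
    """
    table = {}
    for name, ref in reference.items():
        if name not in current or ref == 0:
            continue
        table[name] = round(abs(current[name] - ref) / abs(ref) * 100)
    return table
```

For example, a reference keystroke interval of 200 ms against a current 216 ms yields 8% drift.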
When to trust the recording, when to dig deeper
The detector surfaces possibilities, not pronouncements. Three rules:
The recording is reassuring when
- Summary green, or yellow with only large-paste sentence fragment flags — almost always quotes from assigned reading.
- Baseline match 75–100, drift metrics under 25%.
- Sparkline healthy across all eight buckets with a pause tail on the right.
- Read-back behavior present, typo corrections 1–5%.
Worth opening every flag before grading when
- One+ tab-paste, off-page-paste, or programmatic-input fires.
- Baseline match below 50, especially with a single drift metric above 70%.
- Session unusually short — 12 keystrokes for a 1,200-word essay tells its own story.
- Fingerprint shows uniformly fast typing (>70% under 100ms) with near-zero corrections.
Talk to the student when
High-severity flag and baseline divergence and rhythm fingerprint off. Any one alone is a question; all three is a conversation. Pull up the focused replay, show the student, ask them to walk you through the paragraph. The recording is a starting point, not a verdict.
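The "any one is a question, all three is a conversation" rule can be written down as a tiny triage helper. This is only a sketch of the guidance above: `next_step` and its return strings are hypothetical, and treating exactly two signals the same as one is an assumption, since the text does not cover that case.

```python
from typing import Optional


def next_step(high_severity_flag: bool,
              baseline_score: Optional[int],
              rhythm_off: bool) -> str:
    """Combine the three signals into a suggested next step for the reviewer."""
    baseline_diverges = baseline_score is not None and baseline_score < 50
    signals = sum([high_severity_flag, baseline_diverges, rhythm_off])
    if signals == 3:
        return "talk to the student"
    if signals >= 1:
        return "open every flag"  # two signals handled like one (assumed)
    return "reassuring"
```

The output is a prompt for the reviewer, not a verdict; the decision stays with the person grading.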
1 confirmed · 1 non-issue · 2 pending
Sharing & exporting a recording
Export review
A snapshot of the Review panel — flag queue with your decisions, baseline card, composition fingerprint. Small; no raw event stream.
Export provenance — the .red.md bundle
Full recording with raw event stream, in a single .red.md file with the provenance bundle as a fenced JSON block. Drop into the standalone verifier at /verify/ and the recipient sees the same Review panel — scrubber, flags, baseline. No account.
The file that travels with the work. Parents, honor councils, transfer schools, archive-of-record. .red.md is canonical; the live view is convenient.
Verify import
The export round-trips through Verify import. A checks panel runs: chain integrity, content-hash match, paste positions valid, baseline consistent. Right tool when a student forwards a file from another device or claims their local sidecar got lost.
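Because the bundle travels as a fenced JSON block inside the .red.md file, a reader can pull it out with nothing but a standard library. The sketch below assumes the fence is labeled `json` and that the first such block is the bundle; the `extract_bundle` name and the bundle's fields are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly so this example nests cleanly


def extract_bundle(markdown: str) -> dict:
    """Parse the first fenced JSON block found in a .red.md document."""
    pattern = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)
    match = pattern.search(markdown)
    if match is None:
        raise ValueError("no fenced JSON block found")
    return json.loads(match.group(1))
```

A verifier would run its checks (chain integrity, content-hash match, paste positions, baseline consistency) on the parsed bundle; those checks are not shown here.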
A .red.md bundle reads in the standalone verifier with no server, account, or network. The file is the proof.