Red Stet

Reading the provenance recording

A Red Stet recording is process evidence — keystrokes, paste events, cursor moves, pauses. The Review panel turns the stream into a scrubbable timeline plus a queue of moments worth your eyes. This doc covers what each signal means and how to tell a reassuring recording from one that wants a closer look.

Red Stet is not an AI detector. It doesn't score the finished text and tell you "63% AI." It records the writing as it happens and shows you the receipts.

Opening a recording

Three paths reach the same Review panel.

From a submission

Most common. Open the assignment, click the student's row. Multi-session writing shows as one chain-linked timeline.

From an assignment overview

Click Provenance on the assignment header to see every recording, sorted by flag count. Triage view — start with the noisiest.

From the verifier

Drop a .red.md file into the standalone verifier at /verify/. Same Review panel, no account. Right path for parents, external reviewers, or a file from another school.

The recording lives where the student wrote. Keystrokes and cursor events stay in localStorage until submission. At submit, the sidecar uploads as one file. See the Privacy doc for the full inventory.

Example submission list:

  • Ava Chen · 0 flags · Submitted 9:42 PM
  • Ben Kowalski · 3 flags · Submitted 8:11 PM
  • Diego Reyes · 1 flag · Submitted 6:55 PM

Click a row to open the Review panel. The flag chip counts events the detector wants you to look at; it is not a verdict.

The summary header — what the count means

The big number is events to review, not events of cheating. The detector surfaces things you might want to see; the verdict is yours. Three colors:

  • Green (zero events) — nothing tripped the detector. About 70% of submissions land here.
  • Yellow (some events, no high-severity) — pastes, unusual cursor segment, short typing burst. Usually one explained gesture (citation paste, copy from notes).
  • Red (one+ high-severity) — paste after tab switch, off-page paste, programmatic input, or sharp baseline divergence. Open every one before grading.
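
The color rule above can be sketched in a few lines. This is an illustration only: it assumes each surfaced event carries a severity of "low", "medium", or "high", and the function name is hypothetical rather than part of Red Stet.

```python
from typing import Iterable


def summary_color(severities: Iterable[str]) -> str:
    """Map the severities of the surfaced events to the header color."""
    levels = list(severities)
    if not levels:
        return "green"   # zero events: nothing tripped the detector
    if "high" in levels:
        return "red"     # one or more high-severity events
    return "yellow"      # some events, none of them high-severity


print(summary_color(["low", "medium"]))  # yellow
```

Note that the color depends only on whether any event is high-severity, not on how many events there are, which is why the meta line matters for context.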

The meta line shows session length, keystrokes, and paste count (all sizes). A 90-minute session with 8,000 keystrokes and three small pastes looks different from a 4-minute session with 12 keystrokes and one giant paste — even when both show "1 event."

If the classroom has a Provenance policy, the count reflects what the policy surfaces. Off and Manual categories move under "Suppressed by policy." The recording is unchanged. Hover the count for underlying totals.

Example summary headers:

  • 0 events to review · Session 47:12 long · 6,418 keystrokes · 2 pastes. Nothing tripped the detector. Reads as ordinary composition.
  • 2 events to review · Session 32:04 long · 4,210 keystrokes · 5 pastes. Low/medium-severity pastes; likely a quote and a citation.
  • 4 events to review · Session 8:30 long · 47 keystrokes · 3 pastes. Includes a tab-switch paste and an off-page paste. Open both.

The scrubber timeline

The horizontal bar is the full session. Drag the thumb and the panel below updates: absolute time (0:00 / 47:12), event description ("paste of 312 chars at position 1,408"), and a dot tracking the cursor.

Each tick is a flagged moment, color-coded:

  • Red — paste flag (large, tab-switch, or off-page)
  • Blue-grey — focus / visibility event tied to a paste
  • Amber — style-shift (paragraph voice diverges)
  • Sage — cursor / motion anomaly
  • Teal dot — reflection saved. Writers who pause to reflect mid-draft leave a different signature than ones who fill prompts at the end.

Hover a tick to see its category and timestamp. Clicking Show Replay → on a flag-queue item jumps the scrubber to that moment and zooms in.

Touch-only sessions hide the cursor row. iPad sessions record no cur (cursor) samples, so the panel skips the cursor signals (straight-line motion, off-page motion).

Example read-out: Scrubbable replay at 12:34 / 47:12. Tick legend: Paste, Tab / focus, Style shift, Cursor. Event line: "paste · pos 1408 · len 312 chars", where pos is the position in the document and len is the character count of the inserted text.

Focus mode and skipping around

A 60-minute session on a 320-pixel scrubber is hard to drag precisely. Show Replay → on a queue item jumps the playhead and opens a focused window — a sub-track zoomed ±2.5 minutes.

The full-session bar stays above with a highlighted region showing the zoom. The scrubber moves into the focused bar.

Click View full session in the time read-out to exit focus mode.
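
To see why the focused window helps, here is a small worked sketch of the arithmetic. The 320-pixel width and ±2.5-minute window come from the text above; clamping the window to the session bounds is an assumption, and all function names are hypothetical.

```python
def fmt(seconds: float) -> str:
    """Format a number of seconds as m:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def focus_window(playhead_s: float, session_s: float, half_width_s: float = 150):
    """Return (start, end) of a +/-2.5 minute window around the playhead.

    Clamping to [0, session length] is an assumption; the help text does
    not say what happens near the ends of a session.
    """
    start = max(0, playhead_s - half_width_s)
    end = min(session_s, playhead_s + half_width_s)
    return start, end


def seconds_per_pixel(span_s: float, width_px: int = 320) -> float:
    """How much session time a single pixel of scrubber covers."""
    return span_s / width_px


start, end = focus_window(38 * 60 + 21, 47 * 60 + 12)
print(fmt(start), "→", fmt(end))   # 35:51 → 40:51
print(seconds_per_pixel(60 * 60))  # 11.25 seconds per pixel, full session
print(seconds_per_pixel(5 * 60))   # 0.9375 seconds per pixel, focused
```

On a 60-minute session each pixel of the full bar covers more than eleven seconds; inside the focused window it covers less than one.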

What playback looks like

Playback is scrubber-driven: as you drag, the panel paints the moment under the playhead. There is no play/pause button and no speed selector.

Example: the full-session bar reads 38:21 / 47:12 · focused ±2.5m, and the focused bar spans 35:51 → 40:51.

Paste flags — five flavors

Pastes are the most useful signal because the cost is asymmetric: typing 312 characters takes a minute; pasting them takes a fraction of a second. Every paste event records when it happened, where it landed in the document, and how large it was. The pasted content is never stored.

Pastes of 100 characters or more are classified into five categories. Pastes under 100 characters are recorded but not flagged; 100 is the "could have been hand-typed" threshold.

large-paste

100+ characters. The reason line names the shape: sentence fragment, full paragraph, multi-paragraph block, or multi-sentence. Multi-paragraph blocks of 300+ characters are tagged high; this is the AI block-paste fingerprint. Sentence fragments of 100–150 characters are tagged low, since they are likely a quoted source.

tab-paste

Tab regained focus, then a paste inside 3 seconds. Signature of copying out of ChatGPT, Google Docs, chat. High.

off-page-paste

Cursor left the editor pane, then a paste inside 3 seconds of return. Suggests assembly in another window. High.

programmatic-input

Text appeared via a JavaScript input event with no matching keystroke. Extension or script — Grammarly rewrite, AI browser extension, paste-as-keystrokes. High.

external-edit

The document changed between Red Stet sessions — work happened in a tool Red Stet couldn't observe (Google Docs, Pages, Word). Chain head hash mismatches; the gap is recorded. Medium.
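
The three paste rules above (large-paste, tab-paste, off-page-paste) can be sketched as a single classifier. This is a minimal illustration under stated assumptions, not Red Stet's detector: the field names are hypothetical, the order in which rules are checked is a guess, and the "medium" fallback for shapes the text does not grade is an assumption. programmatic-input and external-edit are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_FLAGGED_CHARS = 100  # below this, pastes are recorded but unflagged
WINDOW_S = 3.0           # tab-paste and off-page-paste window, in seconds


@dataclass
class Paste:
    length: int                                    # characters inserted
    shape: str                                     # e.g. "sentence fragment"
    s_since_tab_focus: Optional[float] = None      # hypothetical field
    s_since_cursor_return: Optional[float] = None  # hypothetical field


def classify_paste(p: Paste) -> Optional[Tuple[str, str]]:
    """Return (category, severity), or None when the paste is unflagged."""
    if p.length < MIN_FLAGGED_CHARS:
        return None
    if p.s_since_tab_focus is not None and p.s_since_tab_focus < WINDOW_S:
        return ("tab-paste", "high")
    if p.s_since_cursor_return is not None and p.s_since_cursor_return < WINDOW_S:
        return ("off-page-paste", "high")
    if p.shape == "multi-paragraph block" and p.length >= 300:
        return ("large-paste", "high")
    if p.shape == "sentence fragment" and p.length <= 150:
        return ("large-paste", "low")
    # ASSUMPTION: the help text does not grade the remaining shapes.
    return ("large-paste", "medium")


print(classify_paste(Paste(312, "full paragraph", s_since_tab_focus=2.0)))
# ('tab-paste', 'high')
```

Checking the context rules before the size rule means a paste that follows a tab switch is reported as tab-paste even when it would also qualify as a large paste; the real detector may report both.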

Read the reason line. Every flag carries a one-sentence explanation: character count, shape, what triggered the rule. Half the time the flag explains itself ("paste of 142 chars · sentence fragment — verify against allowed-source list").

Example flag queue (each item carries Confirm, Non-issue, and Pending pills):

  • 38:21 · Tab switch + paste: Tab regained focus, then a paste of 312 characters within 2s; strong AI-paste fingerprint.
  • 22:08 · Large paste: Paste of 142 characters · sentence fragment; verify against allowed-source list.
  • 06:14 · Style shift: ¶3 diverges in sentence length and AI-tell density from the rest of the document.

Why there's no "AI score"

There is no "% likely AI" on the finished document. The recording is process evidence: paste events with position and size, keystroke timings, cursor coordinates. Raising an issue means pointing at "there's a 312-character paste at 38:21, two seconds after you switched back to this tab," a specific, falsifiable claim the student can respond to.

One statistical check runs: a paragraph-level style-shift check compares each paragraph to the rest of the document along sentence length, vocabulary richness, and AI-tell phrase density ("furthermore," "in conclusion," "delve"). Sharp divergence flags yellow; it is never a verdict on its own.
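
For intuition, a paragraph-level comparison along those three axes might look like the sketch below. Everything about the scoring is an assumption made for illustration: the feature definitions, the use of the median of the other paragraphs as the reference, the averaged relative gap, and the 0.5 threshold are not taken from Red Stet.

```python
import re
from statistics import mean, median

TELLS = ("furthermore", "in conclusion", "delve")  # phrases named in the text


def features(paragraph: str) -> dict:
    """Sentence length, vocabulary richness, and tell density for one paragraph."""
    text = paragraph.lower()
    words = re.findall(r"[a-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    tell_hits = sum(text.count(t) for t in TELLS)
    return {
        "sentence_length": len(words) / max(1, len(sentences)),
        "vocab_richness": len(set(words)) / max(1, len(words)),
        "tell_density": tell_hits / max(1, len(words)),
    }


def style_shift(paragraphs: list, threshold: float = 0.5) -> list:
    """Return indexes of paragraphs that diverge sharply from the rest."""
    feats = [features(p) for p in paragraphs]
    flagged = []
    for i, own in enumerate(feats):
        rest = feats[:i] + feats[i + 1:]
        if not rest:
            continue  # a single paragraph has nothing to compare against
        gaps = []
        for key, value in own.items():
            # Median of the others, so one outlier does not skew the reference.
            base = median(r[key] for r in rest)
            scale = max(abs(base), abs(value), 1e-9)
            gaps.append(abs(value - base) / scale)
        if mean(gaps) > threshold:
            flagged.append(i)
    return flagged
```

A paragraph flagged this way is only a prompt to read it next to its neighbors.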

Asked for a "% AI" number? Share the recording or the exported .red.md. Process evidence holds up where statistical scores fall apart.

What Red Stet surfaces instead

  • 38:21 · paste of 312 chars within 2s of regaining tab focus.
  • 22:08 · sentence fragment paste (likely a quote).
  • 06:14 · ¶3 voice diverges from ¶1, ¶2, ¶4.

Specific moments. Falsifiable claims. The conversation is about the work. No "% AI" number anywhere. The recording answers what happened; you decide what it means.

The composition fingerprint (cf-*)

Expand Composition fingerprint below the flag queue for aggregate stats across all sessions:

  • Typing rhythm sparkline: eight buckets from <50ms through ≥3.2s. Human writing has a long pause tail; near-zero pauses with fast keystrokes is the bot signature. A healthy distribution keeps a visible pause tail in the right-hand buckets.
  • Typo / correction rate: the backspace fraction. Below 0.5% is unusually clean (yellow); 1–5% is human-typical (green); 10%+ is heavy revision (fine).
  • Read-back behavior: forward-only scrolling suggests a one-shot paste; re-reading suggests composition.
  • Tab-out pattern: quick lookups (<30s) vs. long absences (>30s). Long absences cluster with paste events.
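
The first two stats can be sketched as follows. The bucket edges and the correction-rate bands come from this page; the function names are hypothetical, and treating the correction count as backspaces divided by total keystrokes is an assumption. The text leaves the 0.5–1% and 5–10% ranges ungraded, so the sketch reports them as "unspecified" rather than guessing.

```python
from bisect import bisect_right

# Upper bounds (ms) of the first seven buckets; the eighth is >= 3200 ms.
BOUNDS_MS = [50, 100, 200, 400, 800, 1600, 3200]
LABELS = ["<50ms", "<100ms", "<200ms", "<400ms", "<800ms", "<1.6s", "<3.2s", ">=3.2s"]


def rhythm_histogram(intervals_ms):
    """Count inter-keystroke intervals into the eight sparkline buckets."""
    counts = [0] * len(LABELS)
    for gap in intervals_ms:
        # bisect_right puts a gap equal to a bound into the next bucket up.
        counts[bisect_right(BOUNDS_MS, gap)] += 1
    return dict(zip(LABELS, counts))


def correction_band(backspaces: int, keystrokes: int) -> str:
    """Grade the backspace fraction using the bands in the list above."""
    if keystrokes <= 0:
        raise ValueError("keystrokes must be positive")
    rate = backspaces / keystrokes
    if rate < 0.005:
        return "unusually clean"
    if 0.01 <= rate <= 0.05:
        return "human-typical"
    if rate >= 0.10:
        return "heavy revision"
    return "unspecified"  # ranges the help text does not grade


print(correction_band(84, 6418))  # human-typical (about 1.3%)
```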

Each renders as a one-line verdict with color. Open when something in the queue isn't sitting right.

At the top, the chain badge — "Chain intact across 4 sessions" (green) or "Chain broken between session 2 and 3" (red). The chain hash links each session's manifest to the previous; a break means the local sidecar was tampered with or rebuilt. Rare and worth investigating.
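
A hash chain of this kind can be sketched as below. The page does not describe the hash function, the manifest layout, or how a manifest is serialized, so SHA-256, the `prev_hash` field name, and the sorted-key JSON encoding are all assumptions chosen for illustration.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a manifest via a stable JSON encoding (assumed format)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(bodies: list) -> list:
    """Link each session manifest to the hash of the one before it."""
    chain = []
    for body in bodies:
        prev = manifest_hash(chain[-1]) if chain else None
        chain.append({**body, "prev_hash": prev})
    return chain


def chain_badge(manifests: list) -> str:
    """Report the first broken link, or confirm the chain is intact."""
    for i in range(1, len(manifests)):
        if manifests[i].get("prev_hash") != manifest_hash(manifests[i - 1]):
            return f"Chain broken between session {i} and {i + 1}"
    n = len(manifests)
    return f"Chain intact across {n} session{'s' if n != 1 else ''}"


chain = build_chain([{"session": n, "keystrokes": 100 * n} for n in range(1, 5)])
print(chain_badge(chain))      # Chain intact across 4 sessions
chain[1]["keystrokes"] = 999   # tamper with session 2 after the fact
print(chain_badge(chain))      # Chain broken between session 2 and 3
```

Because each manifest commits to the hash of its predecessor, editing an earlier session changes its hash and the break shows up at the next link.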

Example fingerprint card (typing rhythm · 6,418 keystrokes · chain intact · 1 session):

  • Sparkline buckets: <50ms, <100ms, <200ms, <400ms, <800ms, <1.6s, <3.2s, ≥3.2s
  • OK: Natural typing rhythm (42% fast, 28% pauses)
  • OK: 84 typo corrections, human-typical
  • OK: Re-read behavior present (34% upward scroll)

Baseline match: the author profile (authorProfiles)

A per-student baseline aggregates session manifests across every document — typing rhythm, correction rate, sentence-start pause profile, scroll patterns. After three sessions, each recording gets a baseline match score 0–100.

  • 75–100 (sage) — strong match. Isolated flags get implicitly downgraded.
  • 50–74 (amber) — partial. Some metrics line up, others drift. Worth opening.
  • Below 50 (red) — significant divergence. The authorship-anomaly case — a fingerprint that doesn't match the student's prior work is worth a conversation, even with no flag triggers.
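
The banding can be sketched as a small lookup. The cut-offs and the three-session minimum come from this page; the function name and the returned strings are hypothetical.

```python
from typing import Optional

MIN_SESSIONS = 3  # a match score appears only after three sessions


def baseline_status(prior_sessions: int, score: Optional[int] = None) -> str:
    """Translate a 0-100 baseline match score into the band shown on the card."""
    if prior_sessions < MIN_SESSIONS:
        return f"Baseline being built · {prior_sessions} of {MIN_SESSIONS} sessions needed"
    if score is None or not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "sage: strong match"
    if score >= 50:
        return "amber: partial match"
    return "red: significant divergence"


print(baseline_status(14, 87))  # sage: strong match
print(baseline_status(9, 38))   # red: significant divergence
```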

The card shows four drift bars: typing rhythm, correction rate, pause profile, scroll behavior. 12% drift on rhythm with 78% on correction rate tells you something different than the inverse.

The baseline lives on the student's device. Until three sessions accumulate, you'll see "Baseline being built · 1 of 3 sessions needed." A freshman in week two has no baseline; a senior who has used Red Stet for a year has a strong one.

Example baseline cards:

  • 87/100 · Matches established author profile. Comparing against this author's 14 prior sessions across 6 documents. Typing rhythm 12% drift · Correction rate 18% drift · Pause profile 9% drift · Scroll behavior 14% drift.
  • 38/100 · Significant divergence from this author's prior work. Comparing against this author's 9 prior sessions across 4 documents. Typing rhythm 72% drift · Correction rate 84% drift.

Cross-doc comparison (provenanceChains)

Inside the fingerprint, Compare against another document by this author → opens a side-by-side. Pick a prior recording; both sparklines render with a per-metric drift table.

Right move when the baseline is amber-to-red and you want to know "is this drift from the average, or from a specific recent piece I trust?" Side-by-side against last month's in-class essay sharpens the read.

Comparison uses the provenance chain, a per-student identifier linking sessions across documents and devices. It lives in localStorage and mirrors to Convex, so a student who wrote on a Chromebook last week and an iPad today still has a coherent history.

The authorship-anomaly flag is the cross-doc verdict raised to a flag. Sharp divergence surfaces in the main queue — you don't go looking.

Example: comparing against Memoir draft · 3 weeks ago
81/100 · Consistent voice; same author with high confidence

Metric                    Drift
Avg keystroke interval    8%
Typo correction rate      14%
Sentence-start pause      31%
Scroll-up fraction        11%

When to trust the recording, when to dig deeper

The detector surfaces possibilities, not pronouncements. Three rules:

The recording is reassuring when

  • Summary green, or yellow with only large-paste sentence fragment flags — almost always quotes from assigned reading.
  • Baseline match 75–100, drift metrics under 25%.
  • Sparkline healthy across all eight buckets with a pause tail on the right.
  • Read-back behavior present, typo corrections 1–5%.

Worth opening every flag before grading when

  • One+ tab-paste, off-page-paste, or programmatic-input fires.
  • Baseline match below 50, especially with a single drift metric above 70%.
  • Session unusually short — 12 keystrokes for a 1,200-word essay tells its own story.
  • Fingerprint shows uniformly fast typing (>70% under 100ms) with near-zero corrections.

Talk to the student when

A high-severity flag, baseline divergence, and an off rhythm fingerprint all show up together. Any one alone is a question; all three is a conversation. Pull up the focused replay, show the student, and ask them to walk you through the paragraph. The recording is a starting point, not a verdict.

Triage decisions stick. The pills — Confirm, Non-issue, Pending — save to your device. Counts show on the assignment dashboard.

Example queue filters: All 4 · High 1 · Medium 2 · Low 1 · Pending 2
Progress: 2 of 4 reviewed (1 confirmed · 1 non-issue · 2 pending)

Triage state is per-teacher, per-device. If you teach the same class with a co-teacher, each of you has your own running queue.

Sharing & exporting a recording

Export review

A snapshot of the Review panel — flag queue with your decisions, baseline card, composition fingerprint. Small; no raw event stream.

Export provenance — the .red.md bundle

Full recording with raw event stream, in a single .red.md file with the provenance bundle as a fenced JSON block. Drop into the standalone verifier at /verify/ and the recipient sees the same Review panel — scrubber, flags, baseline. No account.

This is the file that travels with the work: to parents, honor councils, transfer schools, or an archive of record. The .red.md file is canonical; the live view is a convenience.
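
Because the bundle is a fenced JSON block inside a Markdown file, it can in principle be read with ordinary tools. The sketch below assumes the fence is labeled `json` and is the first such fence in the file; the sample keys are invented for illustration, since the real bundle schema is not described on this page.

```python
import json
import re

TICKS = "`" * 3  # the three-backtick fence marker, built up to keep this example readable

# ASSUMPTION: the bundle is the first fence whose info string is "json".
FENCE = re.compile(
    r"^" + TICKS + r"json[ \t]*\n(.*?)\n" + TICKS + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_bundle(markdown_text: str) -> dict:
    """Pull the first fenced JSON block out of a .red.md document."""
    match = FENCE.search(markdown_text)
    if match is None:
        raise ValueError("no fenced JSON block found")
    return json.loads(match.group(1))


# Hypothetical file contents; the real keys may differ.
sample = (
    "# Essay\n\nBody text.\n\n"
    + TICKS + "json\n"
    + '{"sessions": 1, "events": []}\n'
    + TICKS + "\n"
)
print(extract_bundle(sample))  # {'sessions': 1, 'events': []}
```

Reading the block this way only parses it. Checking chain integrity and content hashes is the job of the verifier.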

Verify import

The export round-trips through Verify import. A checks panel runs: chain integrity, content-hash match, paste positions valid, baseline consistent. Right tool when a student forwards a file from another device or claims their local sidecar got lost.

The .red.md bundle reads in the standalone verifier with no server, account, or network. The file is the proof.

  • Export review: flag queue plus your decisions; shareable summary.
  • Export provenance (.red.md): full bundle; verifiable offline; canonical artifact.
  • Verify import: drop in a .red.md to check chain integrity.