Reading the provenance recording

Opening a recording

Three paths reach the same Review panel.

From a submission

Most common. Open the assignment, click the student's row. Multi-session writing shows as one chain-linked timeline.

From an assignment overview

Click Provenance on the assignment header to see every recording, sorted by flag count. Triage view — start with the noisiest.

From the verifier

Drop a .red.md file into the standalone verifier at /verify/. Same Review panel, no account. Right path for parents, external reviewers, or a file from another school.

The recording lives on the device where the student wrote. Keystrokes and cursor events stay in localStorage until submission. At submit, the sidecar uploads as one file. See the Privacy doc for the full inventory.

Ava Chen · 0 flags · Submitted 9:42 PM

Ben Kowalski · 3 flags · Submitted 8:11 PM

Diego Reyes · 1 flag · Submitted 6:55 PM

Click a row to open the Review panel. The flag chip counts events the detector wants you to look at; it is not a verdict.

The summary header — what the count means

The big number is events to review, not events of cheating. The detector surfaces things you might want to see; the verdict is yours. Three colors:

Green (zero events) — nothing tripped the detector. About 70% of submissions land here.
Yellow (some events, no high-severity) — pastes, unusual cursor segment, short typing burst. Usually one explained gesture (citation paste, copy from notes).
Red (one+ high-severity) — paste after tab switch, off-page paste, programmatic input, or sharp baseline divergence. Open every one before grading.
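The three-color rule can be sketched in a few lines. This is a minimal illustration, assuming each surfaced event carries a severity string of "low", "medium", or "high"; the function name `summary_color` is hypothetical and not part of Red Stet.

```python
from typing import Iterable


def summary_color(severities: Iterable[str]) -> str:
    """Map the severities of surfaced events to the summary header color."""
    events = list(severities)
    if not events:
        return "green"   # zero events to review
    if any(s == "high" for s in events):
        return "red"     # one or more high-severity events
    return "yellow"      # some events, none of them high-severity


print(summary_color([]))                  # green
print(summary_color(["low", "medium"]))   # yellow
print(summary_color(["low", "high"]))     # red
```

The color only summarizes severity. It says nothing about how many events there are, which is why the meta line is worth reading alongside it.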

The meta line shows session length, keystrokes, and paste count (all sizes). A 90-minute session with 8,000 keystrokes and three small pastes looks different from a 4-minute session with 12 keystrokes and one giant paste — even when both show "1 event."

If the classroom has a Provenance policy, the count reflects what the policy surfaces. Off and Manual categories move under "Suppressed by policy." The recording is unchanged. Hover the count for underlying totals.
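As a rough sketch of how a policy changes the count without touching the recording, the snippet below splits a flag list into surfaced and suppressed groups. The policy shape (a dict of category to mode), the mode strings, and the rule that unlisted categories stay visible are all assumptions made for illustration.

```python
def split_by_policy(flags, policy):
    """Split flags into (surfaced, suppressed) using a per-category policy."""
    surfaced, suppressed = [], []
    for flag in flags:
        # Assumption: a category missing from the policy stays visible.
        mode = policy.get(flag["category"], "on")
        if mode in ("off", "manual"):
            suppressed.append(flag)   # shown under "Suppressed by policy"
        else:
            surfaced.append(flag)     # counted in "events to review"
    return surfaced, suppressed


flags = [
    {"category": "large-paste", "at": "22:08"},
    {"category": "tab-paste", "at": "38:21"},
    {"category": "style-shift", "at": "06:14"},
]
policy = {"large-paste": "off", "style-shift": "manual"}

surfaced, suppressed = split_by_policy(flags, policy)
print(len(surfaced), "surfaced;", len(suppressed), "suppressed by policy")
```

The input list is never modified, which mirrors the point above: the policy filters the view, not the recording.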

0 events to review

Session · 47:12 long · 6,418 keystrokes · 2 pastes
Nothing tripped the detector. Reads as ordinary composition.

2 events to review

Session · 32:04 long · 4,210 keystrokes · 5 pastes
Low/medium-severity pastes; likely a quote and a citation.

4 events to review

Session · 8:30 long · 47 keystrokes · 3 pastes
Includes a tab-switch paste and an off-page paste. Open both.

The scrubber timeline

The horizontal bar is the full session. Drag the thumb and the panel below updates: absolute time (0:00 / 47:12), event description ("paste of 312 chars at position 1,408"), and a dot tracking the cursor.

Each tick is a flagged moment, color-coded:

Red — paste flag (large, tab-switch, or off-page)
Blue-grey — focus / visibility event tied to a paste
Amber — style-shift (paragraph voice diverges)
Sage — cursor / motion anomaly
Teal dot — reflection saved. Writers who pause to reflect mid-draft leave a different signature than ones who fill prompts at the end.

Hover a tick for category + timestamp. Show Replay → in the flag queue jumps the scrubber and zooms.

Touch-only sessions hide the cursor row. iPad sessions have no `cur` (cursor) samples, so the panel skips cursor signals (straight-line motion, off-page motion).

Scrubbable replay · 12:34 / 47:12

Legend: Paste · Tab / focus · Style shift · Cursor

paste · pos 1408 · len 312 chars (pos is the position in the document; len is the character count of the inserted text)

Focus mode and skipping around

A 60-minute session on a 320-pixel scrubber is hard to drag precisely. Show Replay → on a queue item jumps the playhead and opens a focused window — a sub-track zoomed ±2.5 minutes.

The full-session bar stays above with a highlighted region showing the zoom. The scrubber moves into the focused bar.

To exit focus mode, click View full session in the time read-out.
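The arithmetic behind focus mode is small enough to check by hand. The sketch below assumes times are whole seconds and that the window is clamped to the session bounds (the clamping is an assumption; the behavior at the edges is not described here). All function names are hypothetical.

```python
def focus_window(playhead_s, session_s, half_width_s=150):
    """Return (start, end) of a window of ±2.5 minutes around the playhead."""
    start = max(0, playhead_s - half_width_s)
    end = min(session_s, playhead_s + half_width_s)
    return start, end


def mmss(total_seconds):
    """Format whole seconds as M:SS, the way the time read-out shows them."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def seconds_per_pixel(span_s, track_px=320):
    """How much session time one pixel of scrubber track covers."""
    return span_s / track_px


start, end = focus_window(38 * 60 + 21, 47 * 60 + 12)
print(mmss(start), "→", mmss(end))   # 35:51 → 40:51
print(seconds_per_pixel(60 * 60))    # 11.25 seconds per pixel, full hour
print(seconds_per_pixel(5 * 60))     # 0.9375 seconds per pixel, focused
```

On a 320-pixel track, a 60-minute session moves more than eleven seconds per pixel, while the five-minute focused window moves under one second per pixel. That difference is the reason the focused bar is easier to drag precisely.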

What playback looks like

Scrubber-driven. Drag, the panel paints the moment under the playhead. No play/pause, no speed selector.

Full session 38:21 / 47:12 · focused ±2.5m

Focused ±2.5m 35:51 → 40:51

Paste flags — five flavors

Pastes are the most useful signal because cost is asymmetric: typing 312 characters takes a minute; pasting takes a fraction of a second. Every paste event records when it happened, where it landed, and how large it was. The pasted content is never stored.

Pastes ≥100 characters classify into five categories. Below 100, recorded but unflagged — the "could have been hand-typed" threshold.

`large-paste`

100+ characters. The reason line names the shape: sentence fragment, full paragraph, multi-paragraph block, multi-sentence. Multi-paragraph blocks 300+ tag high — the AI block-paste fingerprint. Sentence fragments 100–150 tag low — likely a quoted source.

`tab-paste`

Tab regained focus, then a paste inside 3 seconds. Signature of copying out of ChatGPT, Google Docs, chat. High.

`off-page-paste`

Cursor left the editor pane, then a paste inside 3 seconds of return. Suggests assembly in another window. High.

`programmatic-input`

Text appeared via a JavaScript input event with no matching keystroke. Extension or script — Grammarly rewrite, AI browser extension, paste-as-keystrokes. High.

`external-edit`

The document changed between Red Stet sessions — work happened in a tool Red Stet couldn't observe (Google Docs, Pages, Word). Chain head hash mismatches; the gap is recorded. Medium.
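The paste-related rules above can be approximated as one small classifier. This is a sketch under stated assumptions, not the detector itself: the field names are hypothetical, "inside 3 seconds" is read as strictly less than 3, the order in which rules are checked is a guess, and the "medium" default for large pastes that match neither named shape rule is an assumption. `programmatic-input` and `external-edit` are left out because they depend on input events and chain hashes rather than paste timing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Paste:
    length: int                                       # characters inserted
    shape: str = "multi-sentence"                     # label from the reason line
    secs_since_tab_focus: Optional[float] = None      # None: no recent tab switch
    secs_since_cursor_return: Optional[float] = None  # None: cursor never left


def classify_paste(p: Paste) -> Optional[Tuple[str, str]]:
    """Return (category, severity), or None when the paste stays unflagged."""
    if p.length < 100:
        return None  # recorded but unflagged: could have been hand-typed
    if p.secs_since_tab_focus is not None and p.secs_since_tab_focus < 3:
        return ("tab-paste", "high")
    if p.secs_since_cursor_return is not None and p.secs_since_cursor_return < 3:
        return ("off-page-paste", "high")
    if p.shape == "multi-paragraph block" and p.length >= 300:
        return ("large-paste", "high")
    if p.shape == "sentence fragment" and p.length <= 150:
        return ("large-paste", "low")
    return ("large-paste", "medium")  # assumption: default for other shapes


print(classify_paste(Paste(312, secs_since_tab_focus=2.0)))  # ('tab-paste', 'high')
print(classify_paste(Paste(142, "sentence fragment")))       # ('large-paste', 'low')
print(classify_paste(Paste(80)))                             # None
```

Checking the timing rules before the size rules means a 312-character paste two seconds after a tab switch is reported as `tab-paste` rather than `large-paste`, which matches the sample flag at 38:21.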

Read the reason line. Every flag carries a one-sentence explanation: character count, shape, what triggered the rule. Half the time the flag explains itself ("paste of 142 chars · sentence fragment — verify against allowed-source list").

38:21

Tab switch + paste

Tab regained focus, then a paste of 312 characters within 2s — strong AI-paste fingerprint

Confirm Non-issue Pending

22:08

Large paste

Paste of 142 characters · sentence fragment — verify against allowed-source list

Confirm Non-issue Pending

06:14

Style shift

¶3 diverges in sentence length and AI-tell density from the rest of the document

Confirm Non-issue Pending

Why there's no "AI score"

No "% likely AI" on the finished document. The recording is process evidence: paste events with position and size, keystroke timings, cursor coordinates. Raising an issue points at "there's a 312-character paste at 38:21, two seconds after you switched back to this tab", a specific, falsifiable claim the student can respond to.

One statistical check runs: a paragraph-level style-shift compares each paragraph to the rest along sentence length, vocabulary richness, and AI-tell phrase density ("furthermore," "in conclusion," "delve"). Sharp divergence flags yellow — never a verdict alone.
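To make the three style-shift features concrete, here is a minimal sketch that computes them per paragraph and compares one paragraph with the average of the others. The tokenization, the use of type-token ratio for vocabulary richness, and the three-phrase tell list (taken from the examples above) are assumptions; the real check and its thresholds are not specified here.

```python
import re

AI_TELLS = ("furthermore", "in conclusion", "delve")  # examples only


def paragraph_features(text):
    """Sentence length, vocabulary richness, and AI-tell density for one paragraph."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tell_hits = sum(lowered.count(phrase) for phrase in AI_TELLS)
    n_words = max(len(words), 1)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "vocab_richness": len(set(words)) / n_words,  # type-token ratio
        "ai_tell_density": tell_hits / n_words,
    }


def divergence_from_rest(paragraphs, index):
    """Absolute gap between one paragraph and the mean of all the others."""
    target = paragraph_features(paragraphs[index])
    rest = [paragraph_features(p) for i, p in enumerate(paragraphs) if i != index]
    return {k: abs(target[k] - sum(r[k] for r in rest) / len(rest)) for k in target}


doc = [
    "One two three. Four five six.",
    "One two three. Four five six.",
    "Furthermore, we delve deep. In conclusion, it works.",
]
print(divergence_from_rest(doc, 2))
```

`divergence_from_rest` needs at least two paragraphs. A large gap on any feature is only a prompt to read the paragraph, in line with the rule that a style shift is never a verdict alone.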

Asked for a "% AI" number? Share the recording or the exported .red.md. Process evidence holds up where statistical scores fall apart.

What Red Stet surfaces instead

38:21 — paste of 312 chars within 2s of regaining tab focus.
22:08 — sentence fragment paste (likely a quote).
06:14 — ¶3 voice diverges from ¶1, ¶2, ¶4.

Specific moments. Falsifiable claims. The conversation is about the work.

No "% AI" number anywhere. The recording answers what happened; you decide what it means.

The composition fingerprint `cf-*`

Expand Composition fingerprint below the flag queue for aggregate stats across all sessions:

Typing rhythm sparkline: eight buckets from <50ms through ≥3.2s. Human writing has a long pause tail; near-zero pauses with fast keystrokes is the bot signature. A healthy distribution is tallest in the fast buckets on the left and tapers into a low but non-empty pause tail on the right.
Typo / correction rate: backspaces as a fraction of keystrokes. Below 0.5% is unusually clean (yellow); 1–5% is human-typical (green); 10%+ is heavy revision (fine).
Read-back behavior: forward-only scrolling suggests one-shot paste; re-reading suggests composition.
Tab-out pattern: quick lookups (<30s) vs. long absences (>30s). Long absences cluster with paste events.
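Two of these stats are easy to reproduce by hand. The sketch below bins inter-keystroke gaps into the eight sparkline buckets and applies the correction-rate bands. It assumes gaps are measured in milliseconds and that the correction rate is backspaces divided by total keystrokes; the function names are hypothetical, and the ranges between the named bands (0.5–1% and 5–10%) are deliberately left unlabeled because no verdict is given for them.

```python
from bisect import bisect_right

# Upper bounds of the first seven buckets, in ms; the eighth is ">= 3.2s".
BUCKET_UPPER_MS = [50, 100, 200, 400, 800, 1600, 3200]


def rhythm_histogram(gaps_ms):
    """Count inter-keystroke gaps per bucket, from <50ms through >=3.2s."""
    counts = [0] * (len(BUCKET_UPPER_MS) + 1)
    for gap in gaps_ms:
        counts[bisect_right(BUCKET_UPPER_MS, gap)] += 1
    return counts


def correction_verdict(backspaces, keystrokes):
    """Apply the correction-rate bands to a backspace count."""
    rate = backspaces / keystrokes if keystrokes else 0.0
    if rate < 0.005:
        return "unusually clean (yellow)"
    if 0.01 <= rate <= 0.05:
        return "human-typical (green)"
    if rate >= 0.10:
        return "heavy revision (fine)"
    return "between named bands"  # no verdict is documented for this range


print(rhythm_histogram([10, 50, 99, 3200, 5000]))  # [1, 2, 0, 0, 0, 0, 0, 2]
print(correction_verdict(84, 6418))                # human-typical (green)
```

With 84 corrections in 6,418 keystrokes the rate is about 1.3%, which lands in the human-typical band.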

Each renders as a one-line verdict with color. Open when something in the queue isn't sitting right.

At the top, the chain badge — "Chain intact across 4 sessions" (green) or "Chain broken between session 2 and 3" (red). The chain hash links each session's manifest to the previous; a break means the local sidecar was tampered with or rebuilt. Rare and worth investigating.
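A hash chain of this kind can be illustrated with the standard library. The sketch below is an assumption-heavy stand-in: it uses SHA-256 over canonical JSON and a hypothetical `prev_hash` field, neither of which is confirmed as Red Stet's actual format.

```python
import hashlib
import json


def manifest_hash(manifest):
    """Hash a manifest as canonical JSON (sorted keys, compact separators)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def first_chain_break(manifests):
    """Return the 0-based index of the first manifest that fails to link back."""
    for i in range(1, len(manifests)):
        if manifests[i].get("prev_hash") != manifest_hash(manifests[i - 1]):
            return i
    return None


def chain_badge(manifests):
    """Render the badge text for a list of session manifests."""
    broken = first_chain_break(manifests)
    if broken is None:
        plural = "" if len(manifests) == 1 else "s"
        return f"Chain intact across {len(manifests)} session{plural}"
    # Index i means sessions i and i+1 (1-based) no longer link.
    return f"Chain broken between session {broken} and {broken + 1}"


s1 = {"session": 1, "prev_hash": None}
s2 = {"session": 2, "prev_hash": manifest_hash(s1)}
s3 = {"session": 3, "prev_hash": manifest_hash(s2)}
print(chain_badge([s1, s2, s3]))

tampered = dict(s2, keystrokes=9999)  # edit session 2 after session 3 linked to it
print(chain_badge([s1, tampered, s3]))
```

Because each manifest stores the hash of the one before it, editing any earlier session changes its hash and breaks the link held by the next session.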

Typing rhythm · 6,418 keystrokes

Chain intact · 1 session

<50ms · <100ms · <200ms · <400ms · <800ms · <1.6s · <3.2s · ≥3.2s

OK Natural typing rhythm (42% fast, 28% pauses)

OK 84 typo corrections — human-typical

OK Re-read behavior present (34% upward scroll)

Baseline match — the author profile `authorProfiles`

A per-student baseline aggregates session manifests across every document — typing rhythm, correction rate, sentence-start pause profile, scroll patterns. After three sessions, each recording gets a baseline match score 0–100.

75–100 (sage) — strong match. Isolated flags get implicitly downgraded.
50–74 (amber) — partial. Some metrics line up, others drift. Worth opening.
Below 50 (red) — significant divergence. The authorship-anomaly case — a fingerprint that doesn't match the student's prior work is worth a conversation, even with no flag triggers.

The card shows four drift bars: typing rhythm, correction rate, pause profile, scroll behavior. 12% drift on rhythm with 78% on correction rate tells you something different than the inverse.

The baseline lives on the student's device. Until three sessions accumulate, you'll see "Baseline being built · 1 of 3 sessions needed." A freshman in week two has no baseline; a senior who has used Red Stet for a year has a strong one.
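The score bands and the warm-up message combine into one small lookup. This sketch assumes an integer score from 0 to 100 and a count of recorded sessions; the function name and the exact return strings for the bands are illustrative.

```python
def baseline_status(score, sessions_recorded, sessions_needed=3):
    """Return the baseline card's headline for a recording."""
    if sessions_recorded < sessions_needed:
        # No score is shown until enough sessions exist.
        return (
            f"Baseline being built · {sessions_recorded} of "
            f"{sessions_needed} sessions needed"
        )
    if score >= 75:
        return "sage: strong match"
    if score >= 50:
        return "amber: partial match"
    return "red: significant divergence"


print(baseline_status(None, 1))  # Baseline being built · 1 of 3 sessions needed
print(baseline_status(87, 14))   # sage: strong match
print(baseline_status(62, 5))    # amber: partial match
print(baseline_status(38, 9))    # red: significant divergence
```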

87/100 Matches established author profile

Comparing against this author's 14 prior sessions across 6 documents.

Typing rhythm 12% drift

Correction rate 18% drift

Pause profile 9% drift

Scroll behavior 14% drift

38/100 Significant divergence from this author's prior work

Comparing against this author's 9 prior sessions across 4 documents.

Typing rhythm 72% drift

Correction rate 84% drift

Cross-doc comparison `provenanceChains`

Inside the fingerprint, Compare against another document by this author → opens a side-by-side. Pick a prior recording; both sparklines render with a per-metric drift table.
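How each drift percentage is computed is not spelled out here. One plausible reading, shown purely as an assumption, is the relative difference between the two recordings' values for a metric. The metric values below are made up for illustration and are not taken from any real recording.

```python
def drift_percent(current, reference):
    """Relative difference between two metric values, as a whole percent."""
    largest = max(abs(current), abs(reference))
    if largest == 0:
        return 0  # both values are zero: no drift
    return round(100 * abs(current - reference) / largest)


def drift_table(current, reference):
    """Per-metric drift for the metrics present in both recordings."""
    shared = [name for name in current if name in reference]
    return {name: drift_percent(current[name], reference[name]) for name in shared}


# Hypothetical metric values for two recordings by the same author.
this_essay = {"avg_keystroke_interval_ms": 184, "scroll_up_fraction": 0.30}
memoir_draft = {"avg_keystroke_interval_ms": 200, "scroll_up_fraction": 0.40}

for metric, drift in drift_table(this_essay, memoir_draft).items():
    print(f"{metric}\t{drift}%")
```

Dividing by the larger of the two values keeps the result between 0 and 100 for non-negative metrics and makes the comparison symmetric, so it does not matter which document is treated as the reference.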

Right move when the baseline is amber-to-red and you want to know "is this drift from the average, or from a specific recent piece I trust?" Side-by-side against last month's in-class essay sharpens the read.

Comparison uses the provenance chain, a per-student identifier linking sessions across documents and devices. It lives in localStorage and mirrors to Convex, so a student who wrote on a Chromebook last week and an iPad today still has a coherent history.

The authorship-anomaly flag is the cross-doc verdict raised to a flag. Sharp divergence surfaces in the main queue — you don't go looking.

Comparing against Memoir draft · 3 weeks ago

81/100 Consistent voice — same author with high confidence

Metric	Drift
Avg keystroke interval	8%
Typo correction rate	14%
Sentence-start pause	31%
Scroll-up fraction	11%

When to trust the recording, when to dig deeper

The detector surfaces possibilities, not pronouncements. Three rules:

The recording is reassuring when

Summary green, or yellow with only large-paste sentence fragment flags — almost always quotes from assigned reading.
Baseline match 75–100, drift metrics under 25%.
Sparkline healthy across all eight buckets with a pause tail on the right.
Read-back behavior present, typo corrections 1–5%.

Worth opening every flag before grading when

One+ tab-paste, off-page-paste, or programmatic-input fires.
Baseline match below 50, especially with a single drift metric above 70%.
Session unusually short — 12 keystrokes for a 1,200-word essay tells its own story.
Fingerprint shows uniformly fast typing (>70% under 100ms) with near-zero corrections.

Talk to the student when

A high-severity flag, baseline divergence, and an off rhythm fingerprint all show up together. Any one alone is a question; all three is a conversation. Pull up the focused replay, show the student, ask them to walk you through the paragraph. The recording is a starting point, not a verdict.
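The three rules above can be condensed into a short decision helper. This is a simplification for illustration: it looks at only three signals (a high-severity flag, a baseline score below 50, and an off rhythm fingerprint), treats two signals the same as one, and uses hypothetical names throughout. It is a reading aid, not a grading rule.

```python
def triage(has_high_flag, baseline_score, rhythm_off):
    """Suggest a next step from three signals; baseline_score may be None."""
    baseline_diverges = baseline_score is not None and baseline_score < 50
    signals = sum([bool(has_high_flag), baseline_diverges, bool(rhythm_off)])
    if signals == 3:
        return "talk to the student"             # all three: a conversation
    if signals >= 1:
        return "open every flag before grading"  # any one alone: a question
    return "reassuring"


print(triage(False, 87, False))   # reassuring
print(triage(True, 87, False))    # open every flag before grading
print(triage(True, 38, True))     # talk to the student
print(triage(False, None, False)) # reassuring (no baseline yet)
```

A missing baseline counts as no divergence here, so a new student is not penalized for having fewer than three sessions.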

Triage decisions stick. The pills — Confirm, Non-issue, Pending — save to your device. Counts show on the assignment dashboard.

All 4 · High 1 · Medium 2 · Low 1 · Pending 2

Progress

2 of 4 reviewed
1 confirmed · 1 non-issue · 2 pending

Triage state is per-teacher, per-device. If you teach the same class with a co-teacher, each of you has your own running queue.

Sharing & exporting a recording

`Export review`

A snapshot of the Review panel — flag queue with your decisions, baseline card, composition fingerprint. Small; no raw event stream.

`Export provenance` — the `.red.md` bundle

Full recording with raw event stream, in a single .red.md file with the provenance bundle as a fenced JSON block. Drop into the standalone verifier at /verify/ and the recipient sees the same Review panel — scrubber, flags, baseline. No account.
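Because the bundle is a fenced JSON block inside an otherwise ordinary text file, it can be pulled out with the standard library. The sketch below assumes the fence is tagged `json` and that the first such block is the bundle; the sample file and its field names (`sessions`, `pastes`, `pos`, `len`) are hypothetical.

```python
import json
import re

TICKS = "`" * 3  # built indirectly so this sketch can sit inside a fence itself
FENCE_RE = re.compile(TICKS + r"json[ \t]*\n(.*?)\n" + TICKS, re.DOTALL)


def read_bundle(markdown_text):
    """Return the first fenced JSON block in the text, parsed into a dict."""
    match = FENCE_RE.search(markdown_text)
    if match is None:
        raise ValueError("no fenced JSON block found")
    return json.loads(match.group(1))


sample = "\n".join([
    "# Essay draft",
    "",
    "The essay text sits here as ordinary prose.",
    "",
    TICKS + "json",
    '{"sessions": 1, "pastes": [{"pos": 1408, "len": 312}]}',
    TICKS,
])

bundle = read_bundle(sample)
print(bundle["pastes"][0]["len"])  # 312
```

Nothing here touches a server or an account, which is the property that lets the file travel on its own.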

The file that travels with the work. Parents, honor councils, transfer schools, archive-of-record. .red.md is canonical; the live view is convenient.

Verify import

The export round-trips through Verify import. A checks panel runs: chain integrity, content-hash match, paste positions valid, baseline consistent. Right tool when a student forwards a file from another device or claims their local sidecar got lost.

The .red.md bundle reads in the standalone verifier with no server, account, or network. The file is the proof.

