Audit humans and agents: don't leave them unobserved

Apr 27, 2026 | by Maximilian Kaske | [engineering]

Most audit logs are lies of omission.

They cover the dashboard. They miss the API key your CI script uses. They miss the cron job. They miss the Slack agent that just renamed a monitor at 2am because someone in #incident asked it to.

An audit log that only logs some of the writes isn't an audit log. It's vibes.

Here's how we made sure ours isn't.

Three surfaces, three CRUDs, zero blame

Three surfaces talk to our database: the public REST API (Hono), the dashboard tRPC layer (Next.js), and the Slack agent (tools, MCP upcoming). Until recently, all three reimplemented the same CRUD operations independently. Three "create status report" functions. Three "update page" paths. Three places to forget a side-effect, three places to fix the same bug, and exactly one of them would still be wrong on Friday afternoon.

If two code paths can write the same row, you have two audit logs. And at least one of them is wrong.

We're not in "add a new table" mode right now - we're consolidating around what's already there. Sometimes you have to take a few steps back to take more steps forward. Before adding anything new, we wanted to stop maintaining the same query three times. Turns out the fix was a package, not a process.

One package, one signature

We didn't build an audit log first. We built a chokepoint.

@openstatus/services is one package, one source of truth for every mutating database operation. REST calls into it. tRPC calls into it. The Slack agent calls into it. Tests for the CRUD layer live with the service, not scattered across three consumers each pretending they invented INSERT.
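
A minimal sketch of the chokepoint idea, using an in-memory store instead of the real database. The `renameMonitor` function and the three adapter functions are hypothetical names invented for this illustration, not the actual @openstatus/services API. The point is only that each surface builds a context and delegates the write.

```typescript
// Hypothetical, in-memory illustration; not the real @openstatus/services code.
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string }
  | { type: "slack"; slackUserId: string };

type ServiceContext = { workspace: { id: number }; actor: Actor };
type MonitorRow = { id: number; name: string };

const monitors: Record<number, MonitorRow> = { 1: { id: 1, name: "api" } };
const auditTrail: { actorType: string; action: string; entityId: number }[] = [];

// The only function allowed to write. Every surface calls this one.
function renameMonitor(args: {
  ctx: ServiceContext;
  input: { id: number; name: string };
}): MonitorRow {
  const existing = monitors[args.input.id];
  if (!existing) throw new Error("monitor not found");
  const updated: MonitorRow = { id: existing.id, name: args.input.name };
  monitors[existing.id] = updated;
  auditTrail.push({
    actorType: args.ctx.actor.type,
    action: "monitor.update",
    entityId: existing.id,
  });
  return updated;
}

// Thin adapters: each surface only translates its caller into a ServiceContext.
function restRename(keyId: string, id: number, name: string): MonitorRow {
  const ctx: ServiceContext = { workspace: { id: 1 }, actor: { type: "apiKey", keyId } };
  return renameMonitor({ ctx, input: { id, name } });
}

function trpcRename(userId: number, id: number, name: string): MonitorRow {
  const ctx: ServiceContext = { workspace: { id: 1 }, actor: { type: "user", userId } };
  return renameMonitor({ ctx, input: { id, name } });
}

function slackRename(slackUserId: string, id: number, name: string): MonitorRow {
  const ctx: ServiceContext = { workspace: { id: 1 }, actor: { type: "slack", slackUserId } };
  return renameMonitor({ ctx, input: { id, name } });
}
```

Because the adapters contain no write logic of their own, a side-effect added to `renameMonitor` reaches all three surfaces at once.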

It sounds stupid simple, and it is. The hard part wasn't the design, it was carving out the week to ship it inside the openstatusHQ/openstatus monorepo without breaking everything our users were doing in the meantime.

The shape every service function takes:

export async function updateMonitor(args: {
  ctx: ServiceContext;
  input: UpdateMonitorInput;
}): Promise<Monitor> {
  const { ctx } = args;
  // Validate once at the service boundary, so every surface gets the same checks.
  const input = UpdateMonitorInput.parse(args.input);

  return withTransaction(ctx, async (tx) => {
    // Look the row up within the caller's workspace; it doubles as the "before" snapshot.
    const existing = await getMonitorInWorkspace({
      tx,
      id: input.id,
      workspaceId: ctx.workspace.id,
    });

    const updated = await tx
      .update(monitor)
      .set({ /* ...fields... */ })
      .where(eq(monitor.id, existing.id))
      .returning()
      .get();

    // The audit row is written on the same transaction handle as the update.
    await emitAudit(tx, ctx, {
      action: "monitor.update",
      entityType: "monitor",
      entityId: existing.id,
      before: existing,
      after: updated,
    });

    return selectMonitorSchema.parse(updated);
  });
}

Three things matter here. ctx carries the actor and workspace - so the service knows who is doing the thing and where. withTransaction makes the audit insert atomic with the mutation. emitAudit is the single chokepoint for "something happened, write it down."

That last one is the whole point. There's exactly one place in the codebase where audit rows get written. Forget to call it and the type system tells you. Call it twice and the test catches you. There is no fourth path.

The audit log, almost for free

Once every write goes through one function, the audit log writes itself. There's nowhere left to forget.

export const auditLog = sqliteTable(
  "audit_log",
  {
    id: integer("id").primaryKey({ autoIncrement: true }),
    workspaceId: integer("workspace_id").notNull(),
    actorType: text("actor_type").notNull(),     // "user" | "apiKey" | "slack" | "system" | "webhook"
    actorId: text("actor_id").notNull(),         // polymorphic (userId | keyId | slackUserId | ...)
    actorUserId: integer("actor_user_id"),       // denormalized blame; nullable
    action: text("action").notNull(),            // "monitor.update", "page.delete", ...
    entityType: text("entity_type").notNull(),
    entityId: text("entity_id").notNull(),
    before: text("before", { mode: "json" }).$type<Record<string, unknown>>(),
    after:  text("after",  { mode: "json" }).$type<Record<string, unknown>>(),
    metadata: text("metadata", { mode: "json" }).$type<Record<string, unknown>>(),
    changedFields: text("changed_fields", { mode: "json" }).$type<string[]>(),
    createdAt: integer("created_at", { mode: "timestamp_ms" })
      .notNull()
      .$defaultFn(() => new Date()),
  },
  (t) => ({
    workspaceCreatedIdx: index("audit_log_workspace_created_idx")
      .on(t.workspaceId, t.createdAt),
    entityIdx: index("audit_log_entity_idx")
      .on(t.workspaceId, t.entityType, t.entityId, t.createdAt),
  }),
);

A few choices worth flagging - each one a small fight we picked on purpose:

No foreign keys. Audit rows outlive their subjects. Deleting a workspace shouldn't cascade-nuke its history. A soft-deleted user shouldn't block inserts. References are informational, enforced in app code. Foreign keys on an audit table are a foot-gun dressed up as integrity.
entityId is text. api_key uses string keyIds; everything else uses integers. Text is the only common shape, so text it is. Casting at write-time, not query-time.
Two indexes, on purpose. (workspace, created_at) for "show my audit log." (workspace, entityType, entityId, created_at) for per-entity history. "By actor" can be added later if a query demands it - we won't pay the write cost for a use-case nobody's asked for yet.
Order by (created_at DESC, id DESC). Bulk operations emit several rows in the same millisecond. The autoincrement id breaks ties deterministically. Without that, the UI flickers under load and you'll spend an afternoon convinced your clock is broken.
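
The tie-break in the last point can be sketched with a plain comparator. This is an in-memory illustration with made-up rows, not the actual query; in SQL it corresponds to ORDER BY created_at DESC, id DESC.

```typescript
// Made-up rows for illustration; createdAt is a millisecond timestamp.
type AuditRow = { id: number; createdAt: number; action: string };

// Newest first; rows sharing a millisecond fall back to the autoincrement id.
function newestFirst(a: AuditRow, b: AuditRow): number {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return b.id - a.id;
}

const auditRows: AuditRow[] = [
  { id: 2, createdAt: 1000, action: "monitor.delete" },
  { id: 1, createdAt: 1000, action: "monitor.delete" },
  { id: 3, createdAt: 1000, action: "monitor.delete" },
  { id: 4, createdAt: 2000, action: "page.update" },
];

// Copy before sorting so the input array is left untouched.
const orderedIds = auditRows
  .slice()
  .sort(newestFirst)
  .map((row) => row.id);
```

Sorting on the timestamp alone would leave the relative order of rows 1 to 3 up to the database, which is where the flicker comes from.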

An audit entry that compiles is an audit entry that's complete

Every variant of action has its own shape. monitor.update looks different from api_key.create. We could've shoved everything into a Record<string, unknown> and validated at runtime. We didn't.

Discriminated unions are the contract. The verb decides what's required. The compiler enforces it.

const monitorActions = [
  action("monitor.create", "monitor", intId, { optionalMetadata: true }),
  action("monitor.update", "monitor", intId),
  action("monitor.delete", "monitor", intId),
] as const;

// ... pages, page_components, api_keys, notifications, maintenances, etc.

export const auditActionSchema = z.discriminatedUnion("action", [
  ...monitorActions,
  ...pageActions,
  ...apiKeyActions,
  // ...
]);

export const auditEntrySchema = auditActionSchema.and(
  z.object({
    before: snapshotSchema.optional(),
    after:  snapshotSchema.optional(),
  }),
);

/**
 * Snapshot presence is derived from the action verb at the type level:
 *   *.create  → after required, before forbidden
 *   *.delete  → before required, after forbidden
 *   *.update  → both required
 */
export type AuditEntry = AuditAction extends infer A
  ? A extends { action: `${string}.create` }
    ? A & { after: Snapshot; before?: never }
    : A extends { action: `${string}.delete` }
      ? A & { before: Snapshot; after?: never }
      : A extends { action: `${string}.update` }
        ? A & { before: Snapshot; after: Snapshot }
        : A & { before?: Snapshot; after?: Snapshot }
  : never;

Adding a new audit action is a single variant edit. Forgetting before on a monitor.update is a red squiggle at the call site - not a sad gap in your audit log six months later when someone's trying to reconstruct who broke production.
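
To make the red squiggle concrete, here is a self-contained sketch of the same conditional-type idea, with zod left out and only the monitor actions spelled out. The `WithSnapshots` helper and the `snapshotsPresent` function are illustrative names, not part of the actual codebase.

```typescript
type Snapshot = Record<string, unknown>;

type AuditAction =
  | { action: "monitor.create"; entityType: "monitor"; entityId: number }
  | { action: "monitor.update"; entityType: "monitor"; entityId: number }
  | { action: "monitor.delete"; entityType: "monitor"; entityId: number };

// Distributes over the union, so each variant gets its own snapshot rules.
type WithSnapshots<A> = A extends { action: `${string}.create` }
  ? A & { after: Snapshot; before?: never }
  : A extends { action: `${string}.delete` }
    ? A & { before: Snapshot; after?: never }
    : A & { before: Snapshot; after: Snapshot };

type AuditEntry = WithSnapshots<AuditAction>;

const updateEntry: AuditEntry = {
  action: "monitor.update",
  entityType: "monitor",
  entityId: 1,
  before: { name: "api" },
  after: { name: "api-v2" },
};

const createEntry: AuditEntry = {
  action: "monitor.create",
  entityType: "monitor",
  entityId: 2,
  after: { name: "web" },
};

// Neither of these would compile as an AuditEntry:
//   { action: "monitor.update", entityType: "monitor", entityId: 1, after: { name: "x" } }
//     -> `before` is required on updates
//   { action: "monitor.create", entityType: "monitor", entityId: 2, before: { name: "x" }, after: { name: "x" } }
//     -> `before` is forbidden on creates

// Runtime helper, only here to show which snapshots each entry carries.
function snapshotsPresent(entry: AuditEntry): string[] {
  const present: string[] = [];
  if (entry.before !== undefined) present.push("before");
  if (entry.after !== undefined) present.push("after");
  return present;
}
```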

The actor is its own discriminated union for the same reason:

export type Actor =
  | { type: "user";    userId: number }
  | { type: "apiKey";  keyId: string;   userId?: number }
  | { type: "slack";   teamId: string;  slackUserId: string; userId?: number }
  // ...

Two columns, two questions. actorId answers what authenticated the call - that's forensics. actorUserId answers which human gets blamed - that's accountability. When an API key is owned by a user, both columns get filled. "Show me everything Alex did, including via her CLI tokens and Slack" becomes a single filter on actorUserId, not an archaeological dig.

This split matters more as agents take on more work. An MCP server acting on a user's behalf is the user. The log says so.
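
One way the two columns could be derived from the actor union. The source only shows the call sites of `extractActorId` and `tryGetActorUserId`, so the bodies below are an assumption about how they might behave, and they cover only the three actor variants listed above.

```typescript
type Actor =
  | { type: "user"; userId: number }
  | { type: "apiKey"; keyId: string; userId?: number }
  | { type: "slack"; teamId: string; slackUserId: string; userId?: number };

// What authenticated the call (forensics). Always a string, since the column is text.
function extractActorId(actor: Actor): string {
  switch (actor.type) {
    case "user":
      return String(actor.userId);
    case "apiKey":
      return actor.keyId;
    case "slack":
      return actor.slackUserId;
  }
}

// Which human is accountable, when one is known.
// Returns null for keys without an owner or Slack users without a linked account.
function tryGetActorUserId(actor: Actor): number | null {
  return actor.userId ?? null;
}

const cliToken: Actor = { type: "apiKey", keyId: "key_abc", userId: 7 };
const unlinkedSlackUser: Actor = { type: "slack", teamId: "T1", slackUserId: "U1" };
```

With this shape, the CLI token writes actorId "key_abc" and actorUserId 7, while the unlinked Slack user writes actorId "U1" and a null actorUserId.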

Atomic, or it's a wish list

If your audit log can fail without rolling back the mutation, you don't have an audit log. You have a probability distribution.

emitAudit writes inside the caller's transaction. If the audit insert fails, the surrounding update rolls back. If the surrounding update fails, the audit row never exists. Same transaction, same fate.

export async function emitAudit(
  tx: DB,
  ctx: ServiceContext,
  entry: AuditEntry,
): Promise<void> {
  const parsed = auditEntrySchema.parse(entry);  // throws → tx rolls back

  const changedFields =
    parsed.before !== undefined && parsed.after !== undefined
      ? diffTopLevel(parsed.before, parsed.after)
      : null;

  // No-op update: caller passed both snapshots, nothing changed → skip.
  if (changedFields !== null && changedFields.length === 0) return;

  await tx.insert(auditLog).values({
    workspaceId: ctx.workspace.id,
    actorType: ctx.actor.type,
    actorId: extractActorId(ctx.actor),
    actorUserId: tryGetActorUserId(ctx.actor),
    action: parsed.action,
    entityType: parsed.entityType,
    entityId: String(parsed.entityId),
    before: parsed.before ?? null,
    after: parsed.after ?? null,
    metadata: "metadata" in parsed ? (parsed.metadata ?? null) : null,
    changedFields,
  });
}

Fail-closed is the entire point. "The audit trail is complete for every successful operation" is a promise. "The audit trail mostly works, probably" is a liability. The first you can hand to a customer, an auditor, or a future you debugging an incident at 3am. The second you can hand to nobody.
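
The fail-closed behavior can be sketched without a database. The toy `withTransaction` below is a stand-in written for this example (synchronous, in-memory, copy-on-begin). It is not the helper from the codebase; a real database provides the same guarantee through its own transaction machinery.

```typescript
type ToyDb = { monitors: Record<number, { name: string }>; auditLog: string[] };

const toyDb: ToyDb = { monitors: { 1: { name: "api" } }, auditLog: [] };

// Toy transaction: work on a copy, commit only if the callback returns normally.
function withTransaction<T>(fn: (tx: ToyDb) => T): T {
  const tx: ToyDb = JSON.parse(JSON.stringify(toyDb));
  const result = fn(tx); // a throw here skips the commit below
  toyDb.monitors = tx.monitors;
  toyDb.auditLog = tx.auditLog;
  return result;
}

// auditShouldFail simulates the audit insert throwing after the mutation ran.
function renameMonitor(id: number, name: string, auditShouldFail: boolean): void {
  withTransaction((tx) => {
    tx.monitors[id] = { name };
    if (auditShouldFail) throw new Error("audit insert failed");
    tx.auditLog.push("monitor.update:" + id);
  });
}

let renameFailed = false;
try {
  renameMonitor(1, "api-v2", true);
} catch (_err) {
  renameFailed = true; // the rename was discarded together with the audit row
}
```

After the failed call, the monitor still has its old name and the audit log is still empty. A later call with `auditShouldFail` set to false commits both changes together.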

changedFields is a top-level shallow diff. Nested JSON columns (headers, assertions, locales) report as a single key - the UI does deeper diffs client-side. Empty array means "both supplied, nothing changed" and the row is skipped. null means "this was a create or delete" and the row stands.
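
A minimal sketch of what a top-level shallow diff could look like. `diffTopLevel` is named in the code above but its body is not shown, so this implementation is an assumption. In particular, comparing nested values by their JSON serialization is a choice made for this example, and it is sensitive to key order.

```typescript
type Snapshot = Record<string, unknown>;

// Returns the top-level keys whose values differ between the two snapshots.
// Nested objects and arrays are compared as a whole, so a change deep inside
// `headers` is reported as just "headers".
function diffTopLevel(before: Snapshot, after: Snapshot): string[] {
  const beforeKeys = Object.keys(before);
  const addedKeys = Object.keys(after).filter((key) => !(key in before));
  const changed: string[] = [];
  for (const key of beforeKeys.concat(addedKeys)) {
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      changed.push(key);
    }
  }
  return changed;
}

const original: Snapshot = { name: "api", headers: [{ key: "x-env", value: "prod" }] };

const renamed = diffTopLevel(original, {
  name: "api-v2",
  headers: [{ key: "x-env", value: "prod" }],
});

const headerEdited = diffTopLevel(original, {
  name: "api",
  headers: [{ key: "x-env", value: "staging" }],
});

const untouched = diffTopLevel(original, original);
```

Here `renamed` contains only "name", `headerEdited` contains only "headers", and `untouched` is empty, which is the case where the audit row is skipped.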

Where it lands

Live now in the dashboard for Pro workspaces.

The audit log table view in the openstatus dashboard.

The same table that answers "who renamed this monitor at 2am?" for an on-call engineer also answers "show me every privileged change in the last 14 days" for whoever shows up with an auditor's checklist. Compliance work that's also useful on a Tuesday.

Open Audit Logs in your openstatus dashboard to see every mutation across REST, tRPC, and Slack - by human or agent.

Every mutation in openstatus - any surface, any actor - lands in one row, in one table. That's the part you can't bolt on later. Everything else is a refactor.
