
Prism Draft Specification

Status: Ideas under consideration. Nothing here is implemented. Items may be adopted, modified, or rejected. Once implemented, they move to SPEC.md.

---

1. Defer 2.0: Channels & Goto-Patch Emission

This is not a separate flag — it's the next evolution of -fdefer.

Problem 1: Error-path vs success-path cleanup

The most common goto cleanup pattern in C isn't "undo everything" — it's "undo on error, keep on success." A function allocates resources progressively, and on failure it must release the ones already acquired, but on success it returns ownership to the caller. Current defer fires on ALL exit paths — there's no way to say "only on error."

Problem 2: Inline emission bloats hot paths

Current defer inlines the cleanup body at every return/goto/break/scope-exit site. With 5 defers and 10 returns, that's 50 inlined cleanup statements — cold code polluting the hot path, bloating the instruction cache, and inflating binary size.

// Current emission — cleanup duplicated at every exit:

if (bad) { free(buf); fclose(f); return -1; }     // inlined

if (worse) { free(buf); fclose(f); return -2; }   // inlined again

{ fclose(f); return buf; }                          // inlined again

Solution: Layered goto-patch at end of function

Defer 2.0 changes the emission strategy. Instead of inlining cleanup at every exit point, each exit becomes a single goto into a layered cleanup chain at the bottom of the function. This is exactly the pattern that expert kernel developers write by hand — Prism automates it.

Syntax

| Construct | Meaning |
|---|---|
| `defer stmt;` | Always-fire, goto-patch emission (cold, end-of-function) |
| `defer(name) stmt;` | Channel-tagged, goto-patch, fires only on `return(name:)` |
| `defer_inline stmt;` | Always-fire, inlined at every exit point (for tiny one-liners) |
| `defer_inline(name) stmt;` | Channel-tagged, inlined at every exit point |
| `return expr;` | Fires only untagged defers |
| `return(name:) expr;` | Fires untagged defers AND name-tagged defers |
| `defer_flush;` | Fire and consume all active always-defers now (inlined) |
| `defer_flush(name);` | Fire and consume all active name-channel defers now (inlined) |

The colon in return(name:) disambiguates from return(expr). The tokenizer sees return + ( + identifier + : + ) — unambiguous, no conflict with standard C.

Example: source

int *parse_file(const char *path) {
    FILE *f = fopen(path, "r");
    defer fclose(f);

    int *buf = malloc(100);
    defer(err) free(buf);

    if (!f)  return(err:) NULL;
    if (bad) return(err:) NULL;

    return buf;
}

Example: emitted code (defer 2.0 goto-patch)

int *parse_file(const char *path) {
    FILE *f = fopen(path, "r");
    int *buf = malloc(100);

    int *__prism_rv;

    if (!f)  { __prism_rv = NULL; goto __prism_err_2; }
    if (bad) { __prism_rv = NULL; goto __prism_err_2; }

    __prism_rv = buf;
    goto __prism_defer_1;

    // — cold cleanup patch (end of function) —

__prism_err_2:              // error channel entry point

    free(buf);              // defer(err) — LIFO

__prism_defer_1:            // always-defer entry point

    fclose(f);              // defer — LIFO

    return __prism_rv;
}

What this achieves

Hot path optimization: The if (!f) and if (bad) branches contain a single goto — no cleanup code. The branch predictor marks them as not-taken. The instruction cache stays warm on the success path.

Cold code consolidation: All cleanup lives in one place at the end of the function. Binary size drops because cleanup statements appear once, not once per exit point.

Natural channel layering: The goto-patch is a cascade — error-channel defers sit above always-defers. A return(err:) enters at the top (fires both), a plain return enters below (fires only always-defers). No flag variables, no conditionals in the cleanup chain.

Emission rules

  1. Return value capture: Each return expr; becomes __prism_rv = expr; goto __prism_defer_N; where the entry point depends on which channels fire
  2. Void functions: No __prism_rv, just goto __prism_defer_N; and bare return; at the end
  3. LIFO ordering: Within each tier, defers execute in reverse declaration order (same as current)
  4. Cross-channel interleaving: If defer and defer(err) interleave, the cleanup chain preserves declaration order across tiers:

```c
defer A;        // always
defer(err) B;   // error
defer C;        // always
defer(err) D;   // error

// Goto-patch (LIFO):
__prism_err:
    D;          // defer(err) — most recent
    C;          // defer — always (interleaved)
    B;          // defer(err)
__prism_defer:
    A;          // defer — always (only the non-err defers below the lowest err)
    return __prism_rv;
```

Wait — interleaving requires more thought. See "Open questions" below.

  5. Scope nesting: Defers inside nested { } blocks create sub-chains that fire when that scope's } is reached (same as current), using local goto labels

Backward compatibility

Existing defer code with no channels works identically — just with better emission. defer in 2.0 defaults to goto-patch. To get the old inline behavior, use defer_inline. The return value of every exit path is unchanged.

Emission strategy summary

| Keyword | At return / scope exit | At defer_flush |
|---|---|---|
| `defer` | goto-patch (cold, end-of-function) | inlined at flush site |
| `defer_inline` | inlined at every exit point | inlined at flush site |
| `defer(name)` | goto-patch at `return(name:)` | inlined at `defer_flush(name)` |
| `defer_inline(name)` | inlined at `return(name:)` | inlined at `defer_flush(name)` |

defer_inline: Inlined at exit points

The goto-patch model trades a goto + label round-trip for code deduplication. For tiny cleanup statements, the goto overhead is worse than just inlining the statement. defer_inline opts into the old emission model — cleanup is duplicated at every exit point.

void locked_operation(mutex_t *m) {
    mutex_lock(m);
    defer_inline mutex_unlock(m);   // one instruction — cheaper to inline than to goto


    if (bad) return;                // emits: { mutex_unlock(m); return; }

    if (worse) return;              // emits: { mutex_unlock(m); return; }

    do_work();
}                                   // emits: mutex_unlock(m);

When to use defer_inline vs defer: the choice is purely an emission optimization — semantics are identical. Prefer defer_inline for tiny, cheap cleanup statements (a single unlock, a flag clear) where the goto round-trip would cost more than the duplicated instruction; prefer defer everywhere else.

defer_inline supports channels: defer_inline(err) flags &= ~IN_PROGRESS;

defer_flush: Explicit fire-and-consume

Sometimes you need to fire defers without returning — mid-function cleanup, resource recycling, or transitioning between phases. defer_flush fires and consumes pending defers at any point in the function body.

defer_flush is always inlined at the call site (not goto-patched). The developer explicitly asked to run cleanup here — this is wanted hot-path work, not cold error handling. Inlining keeps the end-of-function goto-patch clean and one-directional.

Syntax

| Construct | Meaning |
|---|---|
| `defer_flush;` | Fire and consume all active always-defers (LIFO), inline |
| `defer_flush(name);` | Fire and consume all active name-channel defers (LIFO), inline |

"Consume" means the defers are removed from the defer stack after firing. They will not fire again at return or scope exit. This is "flush the cleanup queue now."

Usage: resource recycling

void process_files(const char **paths, int n) {
    for (int i = 0; i < n; i++) {
        FILE *f = fopen(paths[i], "r");
        defer fclose(f);

        char *buf = malloc(4096);
        defer free(buf);

        process(f, buf);

        defer_flush;            // fires: free(buf), fclose(f) — LIFO, inlined here

        // buf and f are now cleaned up, loop continues fresh

    }
    // no defers pending here — all consumed by defer_flush

}

Usage: phase transition with channels

void pipeline(void) {
    int *scratch = malloc(1024);
    defer(setup) free(scratch);

    int *result = malloc(2048);
    defer(err) free(result);

    if (!init(scratch, result)) return(err:);   // goto-patch: free(result), free(scratch)


    // Setup phase complete — release setup resources, keep result

    defer_flush(setup);         // fires: free(scratch) only, inlined here

    // scratch is freed, result survives


    // ... use result ...

    return;                     // fires: nothing (no remaining always-defers)

}

Emitted code

// Source:

FILE *f = fopen(path, "r");
defer fclose(f);
char *buf = malloc(4096);
defer free(buf);
process(f, buf);
defer_flush;

// Emitted (inlined at call site):

FILE *f = fopen(path, "r");
char *buf = malloc(4096);
process(f, buf);
{ free(buf); fclose(f); }       // inlined, LIFO

// defers consumed — return has nothing to fire

No goto, no labels, no resume point. The cleanup is inlined directly because the developer explicitly requested it. The end-of-function goto-patch stays clean — only return-triggered defers generate goto jumps.

Semantics

  1. Consume model: Fired defers are removed. Subsequent return only fires defers registered after the defer_flush call.
  2. LIFO order: Same as normal defer — most recently registered fires first.
  3. Scope-aware: defer_flush only fires defers in the current scope and its parents, same as a return would.
  4. Channel-specific: defer_flush(name) fires only defers tagged with name. Always-defers are NOT fired by defer_flush(name) — only defer_flush (no argument) fires always-defers.
  5. No-op safety: defer_flush with no pending defers is a no-op (no error, no warning).
  6. Cannot appear in defer bodies: defer_flush inside a defer body is a compile-time error (prevents infinite recursion in the cleanup chain).

Why defer_flush inlines and return goto-patches

return-triggered defers are almost always error paths — cold code that shouldn't pollute the icache. The goto-patch keeps them out of the hot path.

defer_flush is an explicit developer action — "I want cleanup to happen here, now." This is hot-path code by intent. Inlining it avoids the goto round-trip overhead and keeps the end-of-function cleanup patch purely one-directional (no backward jumps to resume points).

Why not just use a { } scope?

Scoped defers already fire at }. But scope-based cleanup has two problems: it forces an extra level of nesting around code that isn't logically nested, and it fires every defer in the scope — there is no channel selectivity.

defer_flush gives explicit, flat, channel-aware control.

Open questions

Interleaved channel ordering: When defer and defer(err) interleave, do error-channel defers fire in strict LIFO relative to all defers (preserving declaration order), or do they form a separate LIFO chain? The interleaved model is correct for resource cleanup (resources depend on allocation order), but the goto-patch cascade becomes more complex — it may require a conditional flag per entry rather than a simple label cascade.

Scope-exit defers: Current defer emits cleanup at } for nested scopes. The goto-patch model works naturally for function-level returns but scope-exit defers (not return, just leaving a { }) may still need inline emission or a per-scope sub-patch.

Return type inference: __prism_rv needs a type. For functions with explicit return types, this is straightforward. For functions returning structs, the temp declaration needs typeof or the explicit struct type from the function signature.

---

2. Taint Qualifiers (-ftaint)

Problem: Direct dereference of untrusted pointers is a systemic bug class across C codebases — not just the Linux kernel (__user), but network daemons (recv buffers), embedded systems (MMIO registers), database engines (mmap'd pages), sandboxes (guest memory), and IPC (shared memory). Today, catching these requires either a separate static analysis tool (Sparse) or runtime instrumentation (ASAN). Most projects use neither.

Design: User-defined taint qualifiers via pragma, with compile-time enforcement of dereference safety.

Declaration

#pragma prism taint untrusted
#pragma prism taint mmio
#pragma prism taint guest

Or via CLI:

prism -ftaint=untrusted,mmio,guest

Usage

void handle_request(untrusted char *buf, size_t len) {
    char c = *buf;              // ERROR: direct dereference of 'untrusted' pointer

    char c = buf[0];            // ERROR: subscript dereference of 'untrusted' pointer

    char *p = buf;              // ERROR: taint stripped without boundary function

    untrusted char *q = buf;    // OK: taint preserved


    char local[256];
    safe_copy(local, buf, len); // OK: passed to function (not dereferenced)

    char c = local[0];          // OK: local is not tainted

}

Core rule: taint can never be silently stripped

The enforcement model is error-on-strip, not dataflow tracking. No branch analysis, no assignment tracking, no CFG:

untrusted char *buf;

*buf;                        // ERROR: dereference of tainted pointer

buf[0];                      // ERROR: subscript of tainted pointer

buf->field;                  // ERROR: member access on tainted pointer


char *p = buf;               // ERROR: stripping taint qualifier

untrusted char *q = buf;     // OK: taint preserved


safe_copy(local, buf, len);  // OK: passed as function argument (boundary crossing)

The assignment char *p = buf; is the error — not the later dereference of p. This eliminates the aliasing bypass (the developer strips the annotation and the checker goes blind) without requiring any branch or dataflow analysis.

What about branches?

untrusted char *buf;
char *p = default_ptr;
if (cond) { p = buf; }   // ERROR fires HERE: taint stripped at assignment

char c = *p;              // irrelevant — already caught above

No CFG needed. The error fires at the assignment site, not the dereference site. This is simpler and catches the alias bypass that a dereference-only check misses.

Scope

Pure lexical enforcement within a function body. Does NOT track through:

This is a syntactic taint linter, not a provenance tracker.

Emitted output

The taint qualifier is stripped from emitted C. Optionally emitted as:

Architecture fit

Extends existing infrastructure:

Real-world applicability

| Domain | Taint name | Protects against |
|---|---|---|
| Kernel | `__user` | User pointer dereference → privilege escalation |
| Kernel | `__iomem` | MMIO direct access → bus error / race |
| Network servers | `untrusted` | Recv buffer direct parse → injection |
| Embedded/RTOS | `mmio` | Register direct access → hardware fault |
| Databases | `mapped` | Direct mmap access → locking bypass |
| Sandboxes | `guest` | Guest memory access → sandbox escape |
| IPC | `foreign` | Shared memory direct use → TOCTOU |

---

3. Built-in min / max / clamp (-fminmax)

Problem: The standard C #define min(x, y) ((x) < (y) ? (x) : (y)) evaluates arguments twice. Side effects (++, function calls) cause double-evaluation bugs. The Linux kernel's safe min() macro is 50+ lines of _Generic/typeof/statement-expression soup. Every C project either has this bug or has its own ugly workaround.

Design: Recognize min(a, b), max(a, b), and clamp(val, lo, hi) as built-in function-like identifiers with strict side-effect rejection.

Usage

int x = min(a, b);              // emits: ((a) < (b) ? (a) : (b))

int y = max(f(), g());          // ERROR: arguments have side effects

int z = clamp(val, 0, 255);     // emits: ((val) < (0) ? (0) : (val) > (255) ? (255) : (val))

int w = min(a++, b);            // ERROR: argument has side effects

Side-effect detection

Reuses the reject_orelse_side_effects scanner, which flags ++, --, assignments (=, +=, and compound variants), and function calls (the ident( pattern).

On rejection: hard error with actionable message:

error: arguments to min() have side effects; hoist to a temporary:
    int tmp = f(); int x = min(tmp, b);

Why not auto-hoist?

Auto-hoisting into temps (via statement expressions or pre-statement declarations) breaks short-circuit evaluation. In if (cond && min(f(), g())), pre-hoisting evaluates f() and g() unconditionally. Statement expressions fix this but are GNU-only (no MSVC). Strict rejection is the safe choice.

Namespace collision

If the source already #defines min/max/clamp, Prism defers to the user's macro (same as defer/orelse — check typedef table, skip if user-defined).

---

4. Compiler Attribute Normalization (-fnormalize-attrs)

Problem: GCC, Clang, MSVC, and C23 all use different syntax for the same compiler attributes. Cross-platform C projects litter their headers with #ifdef chains.

Design: Write the canonical form, Prism emits the right syntax for the target compiler.

Candidates

| Canonical | GCC/Clang | MSVC | C23 |
|---|---|---|---|
| `[[noreturn]]` | `__attribute__((noreturn))` | `__declspec(noreturn)` | `[[noreturn]]` |
| `[[deprecated]]` | `__attribute__((deprecated))` | `__declspec(deprecated)` | `[[deprecated]]` |
| `[[fallthrough]]` | `__attribute__((fallthrough))` | n/a | `[[fallthrough]]` |
| `[[maybe_unused]]` | `__attribute__((unused))` | n/a | `[[maybe_unused]]` |
| `[[nodiscard]]` | `__attribute__((warn_unused_result))` | `_Check_return_` | `[[nodiscard]]` |

Status: Low priority

Prism already handles _Noreturn / [[noreturn]] / __attribute__((noreturn)) / __declspec(noreturn) for its own noreturn analysis. Generalizing to all attributes is straightforward but low impact — the #ifdef boilerplate is annoying but not dangerous.

---

5. sizeof Array Parameter Decay Check (-fsizeof-decay)

Problem: When an array is passed as a function parameter, it decays to a pointer. sizeof(arr) then returns the pointer size, not the array size — a silent, catastrophic bug that every C beginner hits and many experienced developers still miss. GCC has -Wsizeof-array-argument but it's not in -Wall.

Design: Hard error when sizeof is applied to a parameter that was declared with array syntax.

Detection

void process(int arr[], size_t n) {
    // Phase 1 sees: parameter 'arr' declared as 'int arr[]' (array syntax)

    size_t len = sizeof(arr);              // ERROR: sizeof on decayed array parameter 'arr'

    size_t elem = sizeof(arr) / sizeof(arr[0]); // ERROR: same

}

void ok(int *arr, size_t n) {
    size_t len = sizeof(arr);  // OK: declared as pointer, developer knows what they're getting

}

Rules

  1. In function parameter list, if an identifier is declared with [] or [N] syntax, tag it as "array-declared parameter"
  2. In the function body, if sizeof(ident) or sizeof ident appears where ident is tagged, emit error:

error: sizeof() on array parameter 'arr' returns pointer size, not array size;
       use an explicit size parameter instead

  3. sizeof(arr[0]) (element size) is allowed — the subscript dereference produces the element type, not the array
  4. sizeof(*arr) is allowed — same reason

Architecture fit

Why this matters

This is possibly the most common C bug that compilers don't warn about loudly enough. Stack Overflow has thousands of questions about it. It causes buffer overflows, truncated reads, and wrong-size allocations — all silently.

---

6. Mandatory Control-Flow Braces (-fmandate-braces)

Problem: Braceless if/for/while bodies are the root cause of the Apple goto fail vulnerability (CVE-2014-1266). A developer adds a second indented statement expecting it to belong to the if, but it executes unconditionally.

if (condition)
    check_something();
    do_critical_thing();   // always executes — indentation is a lie

Design: When enabled, any braceless control-flow body is a hard error.

Implementation

Prism already tracks braceless control flow via ctrl_state.pending (set by TT_IF, TT_LOOP, TT_SWITCH). When the flag is enabled and ctrl_state.pending is true, the next non-noise token must be {. If it isn't:

error: braceless control flow is forbidden (-fmandate-braces);
       wrap statement in { }

Exceptions

Architecture fit

ctrl_state.pending already exists in Pass 2. The check is a 4-line gate on the existing code path that injects braces for braceless bodies.

---

7. Strict Implicit Fallthrough Ban (-fno-fallthrough)

Problem: Missing break in switch cases causes silent execution bleed-through. This is one of the most common C bugs — CWE-484 (Omitted Break Statement in Switch). GCC/Clang have -Wimplicit-fallthrough but it's not universally in -Wall, and MSVC lacks it entirely.

switch (state) {
    case INIT:
        start_engine();
        // forgot break — falls through silently

    case RUNNING:
        update_engine();   // executes when state == INIT too

        break;
}

Design: When enabled, every case/default label must be preceded by a terminating statement (break, return, continue, goto, _Noreturn function call) or be an empty fallthrough (case X: case Y:). The C23 [[fallthrough]] attribute explicitly opts into intentional fallthrough.

Detection

Phase 1D already tracks P1K_CASE and P1K_DEFAULT. On encountering a new case/default:

  1. Scan backward from the : to find the previous statement's terminator
  2. Skip over nested { } blocks when scanning (a return inside a nested block within the case counts)
  3. If no terminator found and the case is non-empty (has statement-producing tokens), error:

error: implicit fallthrough from 'case INIT' to 'case RUNNING' (-fno-fallthrough);
       add 'break;' or '[[fallthrough]];' if intentional

Allowed patterns

case 1: case 2:         // OK: empty fallthrough (grouping cases)

    handle_both();
    break;

case 3:
    handle_three();
    [[fallthrough]];     // OK: explicit annotation

case 4:
    handle_four();
    break;

case 5:
    return;              // OK: return terminates


case 6: {
    if (x) return;
    break;               // OK: break inside nested block

}

Architecture fit

Phase 1D tracks case/default positions. The backward scan for terminators is the same kind of look-behind Prism already does for defer shadow checking and label resolution. The [[fallthrough]] attribute detection reuses existing C23 attribute recognition.

---

8. Forward-Only goto Enforcement (-fstrict-goto)

Problem: Backward goto creates unstructured loops that defeat human comprehension, static analysis, and code review. In modern C, goto is considered acceptable only for forward jumps to cleanup labels. Backward goto is spaghetti code — use a real loop construct.

retry:
    result = try_operation();
    if (result == RETRY)
        goto retry;        // ERROR: backward goto — use while/for loop

Design: When enabled, any goto that jumps to a label appearing earlier in the function is a hard error.

Implementation

The CFG verifier (p1_verify_cfg) already computes the topological direction of every goto relative to its label. A backward goto is one where the label index li satisfies li < goto_index. When the flag is enabled:

error: backward goto to 'retry' is forbidden (-fstrict-goto);
       use a loop construct (while, for, do-while)

What remains allowed

    if (init_failed)
        goto cleanup;      // OK: forward goto to cleanup


    // ... normal code ...


cleanup:
    free(resources);
    return -1;

Architecture fit

Zero new scanning logic required. The label direction check already exists in p1_verify_cfg for VLA scope validation. The flag simply converts a "this is a backward goto" fact that Prism already knows into a hard error.

---

9. Auto-Static Constant Arrays (-fauto-static)

Problem: A common pattern in parsers, cryptography, and state machines is declaring a local const array initialized with literals (e.g., const uint32_t K[64] = { 0x428a2f98, ... };). Because it's a local variable, the C standard requires it to be instantiated on the stack. The compiler emits a hidden O(N) memcpy from .rodata to the stack on every function call. Compilers often refuse to optimize this to a static reference when the array is passed to an opaque function (aliasing/mutation fears).

Design: Automatically inject static into local const array declarations whose initializer consists strictly of compile-time constants.

Usage

// Source:

void hash_block(uint8_t *data) {
    const uint32_t K[64] = { 0x428a2f98, 0x71374491, /* ... */ };
    transform(data, K);
}

// Emitted:

void hash_block(uint8_t *data) {
    static const uint32_t K[64] = { 0x428a2f98, 0x71374491, /* ... */ };
    transform(data, K);
}

Detection rules

  1. Declaration is a local array (decl.is_array at brace_depth > 0)
  2. Type has const qualifier
  3. Initializer { ... } contains only TK_NUM, TK_STR, punctuation (,, {, }), and sign operators (-, +). No identifiers, no function calls, no casts.
  4. Not already static

On match: inject static before the type specifier.

Safety

100% semantics-preserving. Mutating a const array is Undefined Behavior — so sharing a single .rodata instance across all stack frames is identical to per-call stack copies. The only observable difference is address identity (&K returns the same address across calls), but comparing addresses of local const arrays is pathological and not a realistic concern.

Architecture fit

Impact

Eliminates hidden memcpy calls on every invocation of functions with large constant tables. Particularly impactful for:

---

10. Bounds Checking (-fbounds-check)

Problem: Buffer overflows from unchecked array subscripts are the #1 exploited vulnerability class in C (CWE-787, CWE-125). This is the single biggest argument for Rust over C. Rust inserts a runtime bounds check on every vec[i] and slice[i] — if the index is out of range, the program panics instead of silently corrupting memory or leaking secrets.

C has no equivalent. ASAN catches these at runtime with heavy shadow-memory instrumentation. Static analyzers find some at compile time. Neither is on by default. Most C code ships with zero bounds protection.

Design: Prism instruments array subscript accesses with lightweight runtime bounds checks. The check fires before the access, trapping on out-of-bounds instead of silently corrupting. Three tiers of coverage, from fully automatic to annotation-driven.

Tier 1: Fixed-size local arrays (automatic)

No annotation needed. Prism uses C's own sizeof operator to derive the array length — no transpile-time size evaluation, no stored constants.

// Source:

void process(void) {
    int arr[100];
    arr[i] = 5;
    int x = arr[j];
}

// Emitted:

void process(void) {
    int arr[100];
    arr[__prism_bchk((size_t)(i), sizeof(arr)/sizeof(arr[0]), "arr", __FILE__, __LINE__)] = 5;
    int x = arr[__prism_bchk((size_t)(j), sizeof(arr)/sizeof(arr[0]), "arr", __FILE__, __LINE__)];
}

sizeof(arr)/sizeof(arr[0]) is a compile-time constant for fixed arrays — the compiler folds sizeof(int[100])/sizeof(int) to 100, and the bounds check against a constant is trivially optimizable. For arr[5] with size 100, the entire check is dead-code-eliminated.

The (size_t) cast on the index handles negative indices correctly — they wrap to huge positive values, which are >= len, triggering the trap.

Tier 2: VLAs (automatic)

Same mechanism, same sizeof trick. C99 §6.5.3.4 guarantees sizeof evaluates VLAs at runtime, so sizeof(arr)/sizeof(arr[0]) returns the correct runtime length with no extra bookkeeping.

// Source:

void process(int n) {
    int arr[n];
    arr[i] = 5;
}

// Emitted:

void process(int n) {
    int arr[n];
    arr[__prism_bchk((size_t)(i), sizeof(arr)/sizeof(arr[0]), "arr", __FILE__, __LINE__)] = 5;
}

This is the key insight from using sizeof: Tiers 1 and 2 use identical emission. Prism doesn't need to distinguish fixed arrays from VLAs at the check site — sizeof handles both uniformly. The only thing Prism needs to know is "this identifier is a local array" (already tracked in the typedef table via is_array).

Tier 3: Function parameters (annotation)

Array parameters decay to pointers — the size is lost. The developer annotates the bound:

// Source:

void fill(int arr[bounds(n)], size_t n) {
    for (size_t i = 0; i < n; i++)
        arr[i] = 0;
    arr[n] = 0;  // BUG: caught at runtime

}

// Emitted (annotation stripped):

void fill(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++)
        arr[__prism_bchk(i, n, "arr", __FILE__, __LINE__)] = 0;
    arr[__prism_bchk(n, n, "arr", __FILE__, __LINE__)] = 0;  // TRAP

}

The bounds(expr) annotation lives inside the array brackets — valid declarator position, stripped from emitted C. The expr is any expression visible at the function scope (typically a size parameter).

C99 [static N] is also recognized for constant bounds:

void process(int arr[static 10]) {
    arr[9] = 1;   // OK

    arr[10] = 1;  // TRAP

}

The __prism_bchk wrapper

static inline size_t __prism_bchk(size_t idx, size_t len,
        const char *name, const char *file, int line) {
    if (__builtin_expect(idx >= len, 0)) {
        fprintf(stderr, "%s:%d: index %zu out of bounds for '%s' (size %zu)\n",
                file, line, idx, name, len);
        __builtin_trap();
    }
    return idx;
}

Why an inline wrapper, not a macro? Single evaluation of idx. No double-eval bugs. The compiler inlines it and eliminates the check entirely when it can prove the index is in range (e.g., arr[0] where size > 0). __builtin_expect marks the failure path as cold — zero branch-prediction penalty on the hot path.

Return type: size_t. The wrapper replaces the original index at the subscript site: arr[expr] → arr[__prism_bchk(expr, ...)]. This is type-safe because C array subscript accepts any integer type.

What gets checked

| Pattern | Checked? | Why |
|---|---|---|
| `arr[i]` | Yes | Direct subscript on tracked array |
| `arr[i][j]` | Both dims | Each `[` is a separate check against its dimension |
| `arr[i].field` | `i` checked | Subscript followed by member access |
| `arr[f()]` | Rejected | Side-effectful index — same rejection as min/max |
| `arr[i++]` | Rejected | Side-effectful index |
| `sizeof(arr[0])` | No | sizeof doesn't evaluate |
| `&arr[i]` | Yes | OOB address formation is UB |
| `p[i]` where `p = arr` | No | Bounds lost at pointer assignment |
| `*(arr + i)` | No (v1) | Pointer arithmetic — future tier |

Side-effect rejection in indices

Indices with side effects are rejected at compile time:

error: bounds-checked subscript 'arr[f()]' has side effects in index;
       hoist to a temporary: size_t tmp = f(); arr[tmp]

This reuses reject_orelse_side_effects — the same scanner used for min/max/clamp and bare orelse. The check fires on ++, --, =, +=, and ident( (function calls).

Opting out: raw blocks

For performance-critical inner loops where the developer has already validated bounds, suppress checking with raw:

int arr[1024];
// ... validate that 0 <= lo && hi <= 1024 ...


raw {
    for (int i = lo; i < hi; i++)
        arr[i] = 0;  // no bounds check — raw block

}

raw already suppresses Prism transformations (zero-init, orelse, defer emit). Extending it to suppress bounds checks is natural and consistent.

Bounds table

For Tier 1/2 (local arrays), no dedicated bounds table is needed. The existing typedef table already tracks is_array per identifier. At emit time, Prism sees TK_IDENT + [, looks up the typedef entry, and if is_array is set, wraps the subscript with __prism_bchk using sizeof(ident)/sizeof(ident[0]). Zero new infrastructure.

Tier 3 (annotated parameters) does need per-parameter tracking for the bounds(expr) size expression:

```c
BoundsParamEntry {
    Token   *name;          // parameter identifier token
    Token   *size_start;    // first token of bounds(expr)
    Token   *size_end;      // last token of bounds(expr)
    uint32_t scope_open;    // function body '{' token index
    uint32_t scope_close;   // function body '}' token index
    uint8_t  ndim;          // number of dimensions
    bool     is_param : 1;  // always true for this table
}
```

Registration happens in process_declarators when a parameter has bounds(...) annotation. Lookup happens in the Pass 2 emit loop when emitting TK_IDENT + [.

Multi-dimensional arrays

sizeof scales naturally to multi-dimensional arrays using C's type system:

int matrix[3][4];
matrix[i][j] = 1;

// Emitted:

matrix[__prism_bchk((size_t)(i), sizeof(matrix)/sizeof(matrix[0]), "matrix", __FILE__, __LINE__)]
       [__prism_bchk((size_t)(j), sizeof(matrix[0])/sizeof(matrix[0][0]), "matrix", __FILE__, __LINE__)] = 1;

sizeof(matrix)/sizeof(matrix[0]) → 3 (first dimension). sizeof(matrix[0])/sizeof(matrix[0][0]) → 4 (second dimension). Each dimension's check uses the appropriate sizeof ratio. The compiler constant-folds all of these.

For the Nth subscript on identifier arr, Prism emits sizeof(arr[0]...[0])/sizeof(arr[0]...[0]) with N-1 and N zero-subscripts respectively. This is mechanical token emission — no evaluation needed.

Struct member arrays

struct Packet {
    uint8_t data[1500];
    int len;
};

void parse(struct Packet *p) {
    p->data[i] = 0;  // Can Prism check this?
}

Tier 1 covers local struct instances: struct Packet pkt; pkt.data[i] — Prism knows data is uint8_t[1500] from the struct definition.

Pointer-to-struct (p->data[i]) requires struct field size tracking — feasible but heavier. Deferred to a later phase.

Comparison with Rust

|                    | Rust                                   | Prism -fbounds-check           |
|--------------------|----------------------------------------|--------------------------------|
| Array subscript    | Runtime panic                          | Runtime trap                   |
| Default            | Always on                              | Opt-in flag                    |
| Opt-out            | `.get_unchecked()` (unsafe)            | `raw { }` block                |
| Slices             | Native `&[T]` fat pointer              | Tier 3 `bounds(n)` annotation  |
| Pointer arithmetic | No raw pointer deref outside `unsafe`  | Not checked (v1)               |
| Cost               | One branch per subscript               | Same                           |
| Overhead           | ~0 with branch prediction              | Same                           |

The key insight: Rust's bounds safety is primarily a runtime mechanism, not a compile-time one. The borrow checker handles lifetimes (use-after-free, double-free), but bounds checking is a simple runtime comparison. Prism can match Rust's bounds safety with zero syntax overhead — the same C code, with a flag.

Performance

Modern CPUs predict the "not-taken" branch (the trap path) with near-100% accuracy. The bounds check costs one comparison and one predicted-not-taken branch per subscript — typically < 1 cycle. In practice, enabling bounds checking adds 2-5% overhead across a full program. Disabling for hot inner loops via raw brings this to near-zero for compute-intensive workloads.

For production use: leave it on. The 2-5% overhead is dwarfed by the cost of a single buffer overflow vulnerability.

For debug/CI: mandatory. Catches OOB bugs that ASAN would catch, at a fraction of the memory and CPU overhead.

Why not slice(T) fat pointers?

An alternative design (inspired by Rust's &[T]) would introduce a slice(T) type that bundles a pointer with its length — a "fat pointer" that carries bounds across function boundaries.

This is a non-starter for Prism. The bounds(n) annotation is superior: the function signature stays int *arr in the emitted C, so the ABI doesn't change, existing C code calls the function without modification, and only the function body gets bounds checks injected.

Architecture fit

Future tiers

---

12. Orelse Postcondition Injection

Problem: After int *p = malloc(100) orelse default_buf;, the developer knows p is guaranteed non-null (orelse provides a fallback). But the backend compiler doesn't — it sees a ternary expression and can't prove the result is non-null. Every subsequent if (!p) check and null-pointer sanitizer branch is wasted.

Design: After orelse expansion, inject __builtin_assume(result != 0) to communicate the postcondition to the backend.

Usage

// Source:

int *p = malloc(100) orelse (int *)fallback_buf;

// Current emission (simplified):

int *p = (malloc(100)) ? (malloc(100)) : ((int *)fallback_buf);
// (actual emission uses a temp to avoid double-eval)


// With postcondition:

int *__prism_tmp = malloc(100);
int *p = __prism_tmp ? __prism_tmp : (int *)fallback_buf;
__builtin_assume(p != ((void*)0));

What the backend gains

Scope

Only inject when the orelse fallback is a non-null expression:

Skip injection when the fallback could itself be null/zero (e.g., orelse other_ptr where other_ptr could be null).

Why only Prism can do this

The compiler sees a ternary — it doesn't know the developer's intent was "guarantee a valid fallback." Prism understands the orelse contract: "if the LHS evaluates to a falsy value, substitute the RHS." If the RHS is a non-zero constant or known-valid address, the result is guaranteed non-zero.

Architecture fit

---

13. Const-to-Literal VLA Demotion

Problem: In C (unlike C++), const int N = 10; does not create a constant expression — N is a variable with a const qualifier. Using it as an array dimension creates a VLA, which forces the compiler to dedicate the frame pointer register, emit alloca-style allocation, and generate VLA cleanup code. This is a well-known C/C++ gap that bites every C developer who writes:

void process(void) {
    const int N = 10;
    int arr[N];           // VLA in C, fixed array in C++
}

Design: When an array dimension is a single identifier that resolves to a const-qualified local initialized with a compile-time constant literal, substitute the literal value at the array declaration.

Usage

// Source:

void process(void) {
    const int N = 10;
    int arr[N];
}

// Emitted:

void process(void) {
    const int N = 10;
    int arr[10];          // fixed array — no VLA overhead
}

Detection rules

  1. Array dimension is a single TK_IDENT (no operators, no function calls)
  2. Identifier resolves in the typedef table to a local variable with is_const = true
  3. The variable's initializer is a single TK_NUM literal (no expressions, no identifiers)
  4. The variable is declared in the same scope or an enclosing scope

On match: substitute the TK_NUM value for the TK_IDENT in the dimension.

What this eliminates

Edge cases

Architecture fit

---

14. Out-of-Line Assembly Extraction (-fnaked-asm)

Problem: Standard C inline assembly (__asm__ volatile (...)) is the least portable feature in the C ecosystem. It blinds the compiler's optimizer, requires complex register constraint boilerplate, and fragments codebases between AT&T syntax (GCC/Clang) and Intel syntax (MSVC). Maintaining separate assembly files and C header definitions to bypass this is a massive boilerplate tax.

Design: Introduce a naked_asm block. Prism extracts the raw assembly strings, automatically generates a parallel assembly file with the correct directives for the target compiler's native assembler (GNU AS or MASM), and replaces the block in the C output with a clean extern prototype.

Usage

// User's pure C file (main.c):

#include <stdio.h>

naked_asm void my_custom_dispatcher(void) {
    "add spl, 8\n"
    "jmp qword ptr [rsp]\n"
}

int main(void) { my_custom_dispatcher(); return 0; }

Prism's Dual Output

  1. Emitted C File (main.c.tmp):

Prism completely strips the assembly block and replaces it with a standard ISO C forward declaration. This guarantees 100% portability to any C compiler. The backend compiler treats it as an opaque external function, preventing it from flushing registers to RAM or panicking about clobbers.


#include <stdio.h>

extern void my_custom_dispatcher(void);

int main(void) { my_custom_dispatcher(); return 0; }

  2. Synthesized Assembly File (Backend-Aware):

Prism uses its existing compiler detection (cc_is_msvc, cc_is_clang) to wrap the exact strings the user wrote in the correct native assembler directives.

If targeting GCC / Clang / TCC (Generates prism_extracted.S):

#if defined(__APPLE__)
#define SYM(x) _##x
#else
#define SYM(x) x
#endif

.intel_syntax noprefix
.text
.globl SYM(my_custom_dispatcher)
SYM(my_custom_dispatcher):
    add spl, 8
    jmp qword ptr [rsp]

If targeting MSVC (Generates prism_extracted.asm):

PUBLIC my_custom_dispatcher
.code
my_custom_dispatcher PROC
    add spl, 8
    jmp qword ptr [rsp]
my_custom_dispatcher ENDP
END

Architecture Fit

Detection: In Pass 2, when the naked_asm keyword is encountered, Prism parses the function signature, then loops through the { ... } block capturing all TK_STR (string literal) tokens.

Dual-Stream Emission: Prism introduces an asm_fp alongside the standard out_fp. The function signature is emitted to out_fp as extern ... ;, and the strings are stripped of their quotes and written directly to asm_fp.

The Pipeline: Prism's compile_sources already knows how to invoke the backend compiler with multiple files. If asm_fp has content, Prism dumps it to a temporary .S or .asm file and appends it to the compile_argv array.

Universal Build Routing:

  - On GCC/Clang, gcc main.c prism_extracted.S delegates to GNU AS.
  - On MSVC, Prism automatically invokes ml64.exe (MASM) on the .asm file to produce an .obj, then passes that object file to cl.exe alongside the C source.

Core Directives Maintained This respects Prism's core architectural constraint: We do not parse expressions. Prism doesn't need an x86 opcode table. It treats the assembly exactly as it treats C: as raw tokens to be macro-structurally reorganized and forwarded to the appropriate backend tool.

Impact

The developer gets the raw power and zero overhead of bare-metal Intel assembly, but the developer experience is as seamless as writing a standard C function in a single file.

No .h header files to keep in sync.

100% portability across any C compiler (GCC, Clang, MSVC, TCC), because the C compiler only ever sees ISO C.

The agonizing assembler directive fragmentation (.globl vs PUBLIC, .text vs .code) is completely absorbed by the transpiler.

---

Priority Assessment

| Feature                           | Bug severity                           | Arch fit                                | Effort        | Priority |
|-----------------------------------|----------------------------------------|-----------------------------------------|---------------|----------|
| Defer 2.0 (channels + goto-patch) | High (resource leaks + icache bloat)   | High (extends existing defer infra)     | Medium        | 1        |
| Bounds checking                   | Critical (CWE-787/125, #1 exploit class) | High (declaration scanner + emit loop) | Medium        | 2        |
| Taint qualifiers                  | Critical (security)                    | High (extends existing taint infra)     | Medium        | 3        |
| min/max/clamp                     | Medium (double-eval bugs)              | High (reuses orelse scanner)            | Low           | 4        |
| sizeof decay check                | Medium (silent wrong results)          | High (trivial token pattern)            | Very low      | 5        |
| Mandatory braces                  | Medium (CVE-2014-1266 class)           | High (ctrl_state exists)                | Very low      | 6        |
| Fallthrough ban                   | Medium (CWE-484 class)                 | High (Phase 1D case tracking)           | Low           | 7        |
| Forward-only goto                 | Low (code quality)                     | High (CFG verifier has it)              | Near zero     | 8        |
| Attribute normalization           | None (convenience)                     | High (trivial)                          | Low           | 9        |
| Auto-static const arrays          | High (eliminates hidden memcpy)        | Very high (trivial token scan)          | Low           | 10       |
| Unreachable after noreturn        | ~~Medium~~                             | ~~Very high~~                           | ~~Near zero~~ | DONE     |
| Orelse postcondition              | Low (missed optimizations)             | Very high (orelse semantics)            | Near zero     | 12       |
| Const-to-literal VLA demotion     | Medium (wastes frame pointer GPR)      | Very high (typedef table lookup)        | Very low      | 13       |