module Noir::ZigCalleeExtractor

Overview

Shared structural helper for the Zig framework analyzers (jetzig, zap, httpz, tokamak). Zig has no vendored tree-sitter grammar in noir, so the analyzers lean on this hand-rolled miniparser for two jobs:

The scanners blank out comments and string/char literals first (#strip_non_code) so a call-shaped token inside a doc-string or a // comment is never surfaced as a phantom callee, and they address the source through an Array(Char) (O(1) indexing) so a single non-ASCII byte anywhere in the file can't turn the per-character loops into O(n²).

Extended Modules

Defined in:

miniparsers/zig_callee_extractor.cr

Constant Summary

CALL_REGEX = /(?<![A-Za-z0-9_.@])(@?[A-Za-z_][A-Za-z0-9_]*(?:\s*\.\s*[A-Za-z_][A-Za-z0-9_]*)*)\s*\(/

A call expression: an optional @ (builtin) + identifier, then any number of .identifier accessors, immediately followed by (. The lookbehind stops the match from starting in the middle of an identifier or chain, so foo.bar( yields foo.bar once, not also bar.

FUNCTION_REGEX = /(?:^|[^A-Za-z0-9_.])fn\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(/

(?:pub )? (modifiers)* fn name ( — captures the function name. The extern "C" calling-convention string is already blanked by #strip_non_code, so only the keyword spacing has to be tolerated.

KEYWORDS = Set {"addrspace", "align", "allowzero", "and", "anyframe", "anytype", "asm", "async", "await", "break", "callconv", "catch", "comptime", "const", "continue", "defer", "else", "enum", "errdefer", "error", "export", "extern", "fn", "for", "if", "inline", "noalias", "noinline", "nosuspend", "opaque", "or", "orelse", "packed", "pub", "resume", "return", "linksection", "struct", "suspend", "switch", "test", "threadlocal", "try", "union", "unreachable", "usingnamespace", "var", "volatile", "while"}

Zig keywords that are followed by ( in normal code (if (…), while (…), switch (…), catch (…)) and would otherwise be reported as callees. fn/return/try etc. never precede a call paren directly but are kept here for clarity.

NOISE_ROOTS = Set {"std"}

Receiver roots whose calls are pure noise for endpoint review context. std.* (std.debug.print, std.mem.eql, std.fmt.*) appears in nearly every handler and drowns the meaningful callees.

TEST_BLOCK_RE = /(?:^|[^A-Za-z0-9_.])test\s*(?:"(?:[^"\\]|\\.)*"\s*)?\{/

test { … } / test "name" { … } block opener. Route registrations inside a test block are unit-test fixtures — and, in a framework's own source vendored as a loose file (modules/httpz.zig, modules/router.zig), its self-tests — never runtime endpoints.

VENDORED_FRAMEWORK_RE = /\/(?:deps|dep|lib|libs|vendor|vendored|pkg|pkgs|zig-pkg|packages|third_party|third-party|modules|subprojects|external|\.deps)\/(?:zap|httpz|http\.zig|tokamak|jetzig|zmpl|zmd)\//

A .zig file that belongs to a vendored copy of a framework checked into the source tree (zig-pkg/zap/…, src/deps/tokamak/…) rather than to the application. Such trees ship the framework's own tests and examples (zap/src/tests/test_auth.zig's .path = "/test", tokamak/example/'s @"GET /:name"), whose route literals would otherwise surface as phantom app endpoints. The standard fetched-dependency cache (.zig-cache) is already pruned upstream; this matches the manual-vendoring layouts — a vendor directory immediately followed by a framework package directory.

Instance Method Summary

Instance Method Detail

def attach_to(endpoint : Endpoint, callees : Array(Entry)) #

[View source]
def callees_for_body(body : String, file_path : String, start_line : Int32) : Array(Entry) #

[View source]
def find_matching(chars : Array(Char), open_index : Int32, open : Char, close : Char) : Int32 | Nil #

Find the index of the delimiter matching the opener at open_index. Returns nil on imbalance. Operates on the already-stripped char array so braces inside strings/comments are gone.


[View source]
def function_bodies(source : String, file_path : String) : Hash(String, FunctionBody) #

Convenience map keyed by simple function name. When two functions share a name (e.g. several zap endpoint structs each defining get) the first wins; callers that need struct scoping should filter #function_table by offset instead.


[View source]
def function_table(source : String, file_path : String) : Array(FunctionInfo) #

[View source]
def in_test_block?(offset : Int32, ranges : Array(Tuple(Int32, Int32))) : Bool #

[View source]
def line_at(chars : Array(Char), offset : Int32) : Int32 #

[View source]
def strip_comments(source : String) : String #

Like #strip_non_code but keeps the contents of double-quoted string literals — route paths live inside "…", so the framework analyzers scan this form to read the URL while still ignoring routes that sit in a comment, a \\ multiline doc-string, or a char literal.


[View source]
def strip_comments(chars : Array(Char)) : Array(Char) #

[View source]
def strip_non_code(source : String) : String #

[View source]
def strip_non_code(chars : Array(Char)) : Array(Char) #

[View source]
def test_block_ranges(stripped : String) : Array(Tuple(Int32, Int32)) #

Byte ranges of test blocks, brace-matched on the string-blanked source so a {/} inside a literal can't throw the matching off. #in_test_block? then lets an analyzer drop a route whose registration sits inside one.


[View source]
def vendored_framework_path?(path : String) : Bool #

[View source]