class NoirAIContext::Builder

Overview

Builds an AIContext for each endpoint by running every populate step (route / technology / method / param / tag / callee / source scan / guard-absence / combination signals) over the endpoint and its source. Pattern detection is delegated to PatternMatcher and source/snippet reading to a per-run SourceReader (file cache).

Defined in:

ai_context/builder.cr

Constant Summary

ACCOUNT_NAMESPACE_SEGMENT_PATTERN = /(?:^|\/)(user|users|account|accounts|auth)(?:\/|$)/i
ACCOUNT_REGISTER_SEGMENT_PATTERN = /(?:^|\/)(register|sign[_-]?up|signup)(?:\/|$)/i
AUTH_LIFECYCLE_SEGMENT_PATTERN = /(?:^|\/)(login|log[_-]?in|logout|log[_-]?out|authenticate|refresh[_-]?token|verify|verification|reset[_-]?password|forgot[_-]?password|password[_-]?reset)(?:\/|$)/i
AUTH_NAMESPACE_PATTERN = /(?:^|\/)auth(?:\/|$)/i
BODY_LIKE_PARAM_TYPES = Set {"json", "form"}
CALLEE_SNIPPET_RADIUS = 1
CREDENTIAL_KEY_IN_RESPONSE = /[^"'\w](password|passwd|token|secret|api[_-]?key|session_id|access_token|refresh_token|private_key)\s*:/i
CREDENTIAL_SOURCE_PATTERNS = [/(?:const|let|var)\s*\{[^}]*\b(password|passwd|token|secret|api[_-]?key|jwt|bearer)\b[^}]*\}\s*=\s*req\.body/i, /\breq\.body\.(password|passwd|token|secret|api[_-]?key|jwt|bearer)\b/i, /\brequest\.(form|json|data)\[\s*['"](password|passwd|token|secret|api[_-]?key|jwt|bearer)['"]/i, /\b(payload|credentials|creds|input|body)\.(password|passwd|token|secret|api[_-]?key|jwt|bearer)\b/i, /\b(?:r|req|request)\.(?:FormValue|PostFormValue)\s*\(\s*['"](password|passwd|token|secret|api[_-]?key|jwt|bearer)['"]/i, /\b(?:c|ctx|context)\.(?:PostForm|DefaultPostForm|FormValue|QueryParam)\s*\(\s*['"](password|passwd|token|secret|api[_-]?key|jwt|bearer)['"]/i, /\b(?:context\.Request|HttpContext\.Request|Request)\.(?:Form|Query|Headers|Cookies)\s*\[\s*['"](password|passwd|token|secret|api[_-]?key|authorization|jwt|bearer)['"]/i, /\b(password|passwd|token|secret|api[_-]?key|jwt|bearer)\s*=\s*req\./i]

Some analyzers miss credential-bearing parameters at extract time (for example Express's loose destructuring, or Java route extractors that don't follow @RequestBody Credentials c through to c.password, …). When that happens, the param-based credential_input signal never fires and downstream heuristics like rate_limit_absence silently skip the route.

As a backstop, scan the route-scope snippet for the canonical request-body destructuring shapes (req.body.password, request.form['password'], { password } = req.body, …) and emit a credential_input signal with a slightly lower confidence than the param-based one. The kind is the same so downstream rate_limit_absence / guard_absence logic catches it transparently.
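The backstop scan described above can be sketched roughly as follows. This is a minimal illustration, not the builder's actual code: it copies two of the entries from CREDENTIAL_SOURCE_PATTERNS, and the constant name CREDENTIAL_SOURCE_SUBSET and the helper credential_source_hit are hypothetical names introduced only for this example.

```crystal
# Two of the shapes listed in CREDENTIAL_SOURCE_PATTERNS, copied verbatim.
# CREDENTIAL_SOURCE_SUBSET is a hypothetical name used only in this sketch.
CREDENTIAL_SOURCE_SUBSET = [
  /\breq\.body\.(password|passwd|token|secret|api[_-]?key|jwt|bearer)\b/i,
  /\brequest\.(form|json|data)\[\s*['"](password|passwd|token|secret|api[_-]?key|jwt|bearer)['"]/i,
]

# Hypothetical helper: returns the text of the first credential-bearing
# match found in the snippet, or nil when no pattern matches.
def credential_source_hit(snippet : String) : String?
  CREDENTIAL_SOURCE_SUBSET.each do |pattern|
    if md = pattern.match(snippet)
      return md[0]
    end
  end
  nil
end

puts credential_source_hit("const pw = req.body.password;").inspect
puts credential_source_hit("user = request.form['password']").inspect
puts credential_source_hit("res.send('ok')").inspect
```

A non-nil result is the point at which a credential_input signal would be emitted. The lowered confidence value itself is not shown here because the page does not state it.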

FOREIGN_IDENTIFIER_ASSIGNMENT_PATTERN = /\b([A-Za-z_][A-Za-z0-9_]*Id)\s*=\s*(?:[A-Za-z_][A-Za-z0-9_]*\.)?([A-Za-z_][A-Za-z0-9_]*Id)\b/
GRAPHQL_OPERATION_ROOTS = ["Query", "Mutation", "Subscription"]
GRAPHQL_RESOLVER_ANNOTATION_PATTERN = /@(QueryMapping|MutationMapping|SchemaMapping|SubscriptionMapping|BatchMapping|DgsQuery|DgsMutation|DgsData|DgsSubscription)\b/
KOTLIN_COLLECTION_ID_LOOKUP_PATTERN = /\b([A-Za-z_][A-Za-z0-9_]*)\s*\.\s*(firstOrNull|find|singleOrNull)\s*\{[^}]*\b(?:it|[A-Za-z_][A-Za-z0-9_]*)\.(?:id|[A-Za-z_][A-Za-z0-9_]*Id)\s*==/i
KOTLIN_COLLECTION_ID_LOOKUP_THROWS_PATTERN = /\b(?:firstOrNull|find|singleOrNull)\s*\{[^}]*\b(?:id|[A-Za-z_][A-Za-z0-9_]*Id)\b[^}]*\}[\s\S]{0,180}(?:\?:\s*throw|\borElseThrow\b|\bthrow\s+\w*(?:NotFound|NotExist|Missing)\b)/i
KOTLIN_CREDENTIAL_RETURN_PATTERN = /\bfun\s+\w+[^{|;]*=\s*(?:this\.)?(password|passwd|token|secret|api[_-]?key|session_?id|access_?token|refresh_?token|private_?key)\b/i
KOTLIN_WRITE_IN_SNIPPET_PATTERN = /\b(?:save|insert|persist|update|create|add|addAll)\s*\(|\.\s*(?:save|insert|persist|update|add|addAll)\s*\(/i
LOG_EMITTER_PATTERN = /\b(?:logger\.(?:info|warn|warning|error|debug|critical|fatal)|log\.(?:info|warn|warning|error|debug)|console\.(?:log|info|warn|error|debug)|print|puts|printf|System\.out\.println|Log\.[dwiev])\b/i

Log injection / sensitive data in logs. Catches the canonical "log user input or credential field directly" shape. False positives are bounded because both the log call and a user-input reference (or credential noun) must appear on the same line / snippet window.

LOG_INPUT_OR_CRED_PATTERN = /\b(?:req\.body|req\.query|req\.params|request\.form|request\.json|request\.args|params\[|password|passwd|token|secret|api[_-]?key|jwt|session_id|access_token|refresh_token)\b/i
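As a rough sketch of the "both must match" rule for log lines, the example below applies the two patterns to a single line at a time. The helper name logs_sensitive_input? is hypothetical, and checking one line (rather than a wider snippet window) is a simplifying assumption.

```crystal
# Both constants are copied verbatim from this page.
LOG_EMITTER_PATTERN       = /\b(?:logger\.(?:info|warn|warning|error|debug|critical|fatal)|log\.(?:info|warn|warning|error|debug)|console\.(?:log|info|warn|error|debug)|print|puts|printf|System\.out\.println|Log\.[dwiev])\b/i
LOG_INPUT_OR_CRED_PATTERN = /\b(?:req\.body|req\.query|req\.params|request\.form|request\.json|request\.args|params\[|password|passwd|token|secret|api[_-]?key|jwt|session_id|access_token|refresh_token)\b/i

# Hypothetical helper: a line is flagged only when it contains a log call
# AND a user-input reference or credential noun.
def logs_sensitive_input?(line : String) : Bool
  LOG_EMITTER_PATTERN.matches?(line) && LOG_INPUT_OR_CRED_PATTERN.matches?(line)
end

puts logs_sensitive_input?("console.log(req.body)")           # log call + user input
puts logs_sensitive_input?("console.log(\"server started\")") # log call only
puts logs_sensitive_input?("user.password = hash(pw)")        # credential noun only
```

Only the first line satisfies both conditions, which is what keeps a plain log call or a plain credential assignment from being reported on its own.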
METHOD_DISPATCH_PATTERN = /(?:request\.method\s*==|req\.method\s*==|r\.Method\s*==|request\.getMethod\(\)\s*\.equals|\.match\(\s*['"](?:GET|POST|PUT|PATCH|DELETE)['"])/i

Suppression for unsafe_method: route handlers that branch on the request method (request.method in Flask / Django, req.method in Express / Node, r.Method in Go, request.getMethod() in Java) typically register a single route under multiple methods. Analyzers split that into one endpoint per method but share the callees list, so the mutating callee seen on the GET endpoint is often the POST branch's call and is not actually reachable via GET.
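A minimal sketch of how this suppression check might look, assuming the builder simply tests the handler snippet against METHOD_DISPATCH_PATTERN. The helper name method_dispatch? and the sample handler strings are hypothetical.

```crystal
# Copied verbatim from this page.
METHOD_DISPATCH_PATTERN = /(?:request\.method\s*==|req\.method\s*==|r\.Method\s*==|request\.getMethod\(\)\s*\.equals|\.match\(\s*['"](?:GET|POST|PUT|PATCH|DELETE)['"])/i

# Hypothetical helper: true when the handler branches on the HTTP method,
# in which case an unsafe_method signal on the GET endpoint is suspect.
def method_dispatch?(snippet : String) : Bool
  METHOD_DISPATCH_PATTERN.matches?(snippet)
end

flask_handler = "if request.method == 'POST':\n    db.save(user)\nreturn render_template('form.html')"
go_handler    = "if r.Method == http.MethodPost { store.Save(u) }"
plain_handler = "return render_template('index.html')"

puts method_dispatch?(flask_handler)
puts method_dispatch?(go_handler)
puts method_dispatch?(plain_handler)
```

In the Flask-style handler, db.save belongs to the POST branch, so reporting it against the GET endpoint would be a false positive. That is the case the suppression is meant to cover.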

MOBILE_SOURCE_EXTS = Set {".swift", ".m", ".mm", ".kt", ".java"}
MUTATING_CALLEE_PATTERN = /\b(create|destroy|delete|update|save|insert|remove|drop|truncate|persist|flush|commit|rollback|set_)\w*/i

Each verb may be followed by additional word characters (destroy_all, createMany, deleteOne, updateUser), so the pattern is not anchored with a trailing \b. A trailing \b would miss those suffixed forms because _ and letters are both word characters, which leaves no word boundary between the verb and its suffix.
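The effect of leaving off the trailing \b can be checked directly. In the sketch below, ANCHORED_VARIANT is a hypothetical stricter pattern written only for comparison; it is not part of the builder.

```crystal
# Copied verbatim from this page.
MUTATING_CALLEE_PATTERN = /\b(create|destroy|delete|update|save|insert|remove|drop|truncate|persist|flush|commit|rollback|set_)\w*/i

# Hypothetical variant with a trailing \b, shown only to illustrate what
# the real pattern avoids.
ANCHORED_VARIANT = /\b(create|destroy|delete|update)\b/i

%w(destroy_all createMany deleteOne updateUser).each do |callee|
  unanchored = MUTATING_CALLEE_PATTERN.matches?(callee)
  anchored   = ANCHORED_VARIANT.matches?(callee)
  puts "#{callee}: unanchored=#{unanchored} anchored=#{anchored}"
end
```

Every sample callee matches the unanchored pattern and none matches the anchored variant, because the character right after the verb is always a word character.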

MUTATING_POST_CALLEE_PATTERN = /(?:\A|[.:_])(?:create|save|update|delete|insert|remove|destroy|modify|persist|revoke|store)|(?:Create|Save|Update|Delete|Insert|Remove|Destroy|Modify|Persist|Revoke|Store)/

A mutating verb anywhere in the callee name (start, after a separator, or as a CamelCase segment) disqualifies the endpoint from "read-only POST". Without this, a callee like getOrCreate/findAndDelete matches the read-only pattern via its leading read verb, silently suppressing the state-change review signal on an endpoint that does mutate.
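The interaction between the two patterns can be sketched as follows. READ_ONLY_POST_CALLEE_PATTERN is the constant listed further down this page. The helper read_only_post_callee? and the way it combines the two checks are assumptions made for illustration, not the builder's confirmed logic.

```crystal
# Both constants are copied verbatim from this page.
READ_ONLY_POST_CALLEE_PATTERN = /(?:^|[.:])(?:find|findAll|findOne|findBy\w+|get|list|listAll|count|search|query|retrieve)\w*\b/i
MUTATING_POST_CALLEE_PATTERN  = /(?:\A|[.:_])(?:create|save|update|delete|insert|remove|destroy|modify|persist|revoke|store)|(?:Create|Save|Update|Delete|Insert|Remove|Destroy|Modify|Persist|Revoke|Store)/

# Hypothetical helper: a callee counts as read-only only when it looks like
# a read AND contains no mutating verb anywhere in its name.
def read_only_post_callee?(name : String) : Bool
  READ_ONLY_POST_CALLEE_PATTERN.matches?(name) && !MUTATING_POST_CALLEE_PATTERN.matches?(name)
end

%w(findById userRepo.findAll getOrCreate findAndDelete).each do |callee|
  puts "#{callee}: read_only=#{read_only_post_callee?(callee)}"
end
```

getOrCreate and findAndDelete both match the read-only pattern through their leading verb, and it is the CamelCase alternative in MUTATING_POST_CALLEE_PATTERN that rules them out.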

OBJECT_LOOKUP_FALLBACK_CALLEE_PATTERN = /(?:^|\.)(?:deleteById|removeById|get\w*ById|retrieve\w*)\b/i
OBJECT_LOOKUP_PRIMARY_CALLEE_PATTERN = /(?:^|\.)(?:findById|findOne|getOne|existsById|find\w*ById|find\w*By\w*Id|(?:find|count|exists)By\w*Id)\b/i
PRIORITY_SCORING_SINK_BLACKLIST = Set {"sql", "data_store_query", "command_exec", "code_eval", "deserialization", "template_injection", "xss", "mass_assignment", "crypto_weak", "webview_load", "intent_redirect"}

Categories whose mere presence is a security-review signal — used alongside concrete signals to compute the overall priority bucket.

READ_ONLY_POST_CALLEE_PATTERN = /(?:^|[.:])(?:find|findAll|findOne|findBy\w+|get|list|listAll|count|search|query|retrieve)\w*\b/i
REQUEST_COOKIE_READ_PATTERN = /\b[A-Za-z_][A-Za-z0-9_]*\s*\.\s*(?:cookies\b|getCookies\s*\()/i
RESPONSE_EMITTER_PATTERN = /\b(jsonify|res\.json|json_response|JsonResponse|render\s+json:|to_json|respond_with)\b/i

Sensitive-response detection runs as a two-step check on the route-scope snippet:

  1. RESPONSE_EMITTER_PATTERN: the snippet calls a response-serializing helper (res.json, jsonify, render json:, to_json, JsonResponse, …).
  2. CREDENTIAL_KEY_IN_RESPONSE: the snippet also has a credential noun used as a key, i.e. followed by : and not preceded by a quote or word character, so a noun appearing inside a string value like { message: "Set X-API-KEY header" } doesn't fire.

Both have to match in the same scope. The earlier single-regex version was too loose and caught English prose mentioning credentials in response strings.
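The two-step check can be sketched like this. The helper name sensitive_response? is hypothetical, and treating the whole string as "the route-scope snippet" is a simplification.

```crystal
# Both constants are copied verbatim from this page.
RESPONSE_EMITTER_PATTERN   = /\b(jsonify|res\.json|json_response|JsonResponse|render\s+json:|to_json|respond_with)\b/i
CREDENTIAL_KEY_IN_RESPONSE = /[^"'\w](password|passwd|token|secret|api[_-]?key|session_id|access_token|refresh_token|private_key)\s*:/i

# Hypothetical helper: both steps must match within the same snippet.
def sensitive_response?(snippet : String) : Bool
  RESPONSE_EMITTER_PATTERN.matches?(snippet) && CREDENTIAL_KEY_IN_RESPONSE.matches?(snippet)
end

# Credential noun used as an object key inside a response call.
puts sensitive_response?("res.json({ token: user.token })")
# Credential noun appears only inside a quoted string value.
puts sensitive_response?("res.json({ message: \"token: expired\" })")
# Credential key present, but no response emitter in scope.
puts sensitive_response?("const opts = { token: t }")
```

In the second case the noun is immediately preceded by a quote, so the leading character class in CREDENTIAL_KEY_IN_RESPONSE rejects it even though a colon follows.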

REVIEW_PRIORITY_SIGNAL_KINDS = Set {"guard_absence", "authz_absence", "rate_limit_absence", "idor_review", "csrf_exempt", "jwt_unsafe", "cors_open", "open_redirect", "ssrf", "path_traversal", "sensitive_response", "server_secret_source", "unsafe_method", "log_injection", "deep_link_input"}

Concrete review-worthy signal kinds. These are the "this is actually scary" categories — not the structural ones (route_definition, technology, path_param) that every endpoint surfaces.

ROUTE_SNIPPET_RADIUS = 2
SAFE_METHODS = Set {"GET", "HEAD", "OPTIONS"}

HTTP method intent vs implementation mismatch. A GET/HEAD endpoint that mutates server state through a callee (User.create, record.destroy, db.delete, …) is a textbook CSRF / side-effect-on-read bug — the verb says "safe / idempotent" but the code says otherwise.
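A minimal sketch of the mismatch check, assuming it reduces to "safe method plus at least one mutating callee". The helper name unsafe_method? is hypothetical, and the real builder also applies the METHOD_DISPATCH_PATTERN suppression, which is omitted here.

```crystal
# Both constants are copied verbatim from this page.
SAFE_METHODS            = Set{"GET", "HEAD", "OPTIONS"}
MUTATING_CALLEE_PATTERN = /\b(create|destroy|delete|update|save|insert|remove|drop|truncate|persist|flush|commit|rollback|set_)\w*/i

# Hypothetical helper: true when a safe HTTP method reaches a callee whose
# name contains a mutating verb.
def unsafe_method?(http_method : String, callees : Array(String)) : Bool
  return false unless SAFE_METHODS.includes?(http_method.upcase)
  callees.any? { |callee| MUTATING_CALLEE_PATTERN.matches?(callee) }
end

puts unsafe_method?("GET", ["User.find", "record.destroy"])
puts unsafe_method?("GET", ["User.find"])
puts unsafe_method?("POST", ["User.create"])
```

The POST case is not reported because a mutating callee is expected there. Only the combination of a safe verb and a mutating call counts as a mismatch.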

SOURCE_SCAN_RADIUS = 6
SPRING_CONFIG_KEY_PATTERN = /\$\{([^}:]+)/
SPRING_MVC_MAPPING_ANNOTATION = /@(GetMapping|PostMapping|PutMapping|PatchMapping|DeleteMapping|RequestMapping)\b/
SPRING_MVC_VIEW_EXPR = /\bfun\b[^{\n]*:\s*String\b[^{\n]*=\s*["']([^"']+)["']/
SPRING_MVC_VIEW_RETURN = /\breturn\s+["']([^"']+)["']/
SPRING_SECRET_NAME_PATTERN = /\b(pass(word)?|secret|token|api[_.-]?key|credential|jwt|private[_.-]?key)\b/i
SPRING_VALUE_ANNOTATION_PATTERN = /@Value\s*\(([^)]*)\)/
STATE_CHANGING_METHODS = Set {"POST", "PUT", "PATCH", "DELETE"}

Constructor Detail

def self.new


Instance Method Detail

def apply(endpoints : Array(Endpoint)) : Array(Endpoint)
