module NoirPassiveScan::FalsePositive

Overview

Heuristics that drop passive-scan results which trip a keyword/regex matcher but are demonstrably not hard-coded secrets — chiefly runtime indirections (environment-variable reads, CI templating expressions) where the real value lives outside the source tree.

Guiding invariant: never hide a real literal. Every form matched here is one that cannot carry a checked-in secret — a ${{ secrets.FOO }} reference, an os.getenv("FOO") read, a <your-token> placeholder — so suppressing it cannot turn a true positive into a false negative. Anything that could be a literal (an actual ghp_… token, a PEM block, a quoted high-entropy value) is left untouched.

Scope is intentionally narrow: only secret-category findings are eligible. The dominant false-positive source for the bundled secret rules is their word matcher firing on a variable name (GITHUB_TOKEN, AWS_ACCESS_KEY_ID, …) on lines that merely reference the variable rather than assign it a literal value.

Defined in:

passive_scan/false_positive.cr

Constant Summary

ASSIGNMENT_VALUE = /(?::|=>?)\s*(.+?)\s*$/

Captures the value to the right of the first assignment separator (:, =, or the PHP/Ruby hash arrow =>), trimming surrounding whitespace. Lines with no assignment separator (e.g. a bare -----BEGIN RSA PRIVATE KEY-----) never match, so PEM blocks and similar literals fall through untouched.
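
As an illustration of the capture described above, here is a minimal sketch. The helper name assigned_value is hypothetical and not part of the module; only the constant is copied from this page.

```crystal
ASSIGNMENT_VALUE = /(?::|=>?)\s*(.+?)\s*$/

# Hypothetical helper: returns the captured right-hand side,
# or nil when the line carries no assignment separator.
def assigned_value(line : String) : String?
  line.match(ASSIGNMENT_VALUE).try &.[1]
end

p assigned_value("GITHUB_TOKEN = ${{ secrets.GITHUB_TOKEN }}  ") # => "${{ secrets.GITHUB_TOKEN }}"
p assigned_value("password: hunter2")                            # => "hunter2"
p assigned_value("-----BEGIN RSA PRIVATE KEY-----")              # => nil
```

Quotes around a value are part of the capture, so a caller that wants the bare value would presumably strip them separately.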

COMMENT_PREFIXES = ["#", "//", "/*", "*", "<!--"]

Whole-line comment markers across the common languages noir scans: # (shell, Python, Ruby, YAML), //, /* and * (C-family), and <!-- (HTML, XML, Markdown). A variable name mentioned in a comment is never a leaked secret; a real literal in a comment is still caught by the value-shape regex gate, so this can only drop false positives.

EMPTY_ASSIGNMENT = /(?::|=>?)\s*$/

Same separators as ASSIGNMENT_VALUE but with an empty value — AWS_ACCESS_KEY_ID= / password: in a .env.example or config template. An empty value cannot carry a secret, so a keyword match on such a line is always a false positive.
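
A short sketch of the empty-value check, assuming the constant is applied to the raw line. The helper name empty_assignment? is hypothetical.

```crystal
EMPTY_ASSIGNMENT = /(?::|=>?)\s*$/

# Hypothetical helper: true when the line ends right after a separator,
# optionally followed by whitespace.
def empty_assignment?(line : String) : Bool
  EMPTY_ASSIGNMENT.matches?(line)
end

p empty_assignment?("AWS_ACCESS_KEY_ID=") # => true
p empty_assignment?("password:   ")       # => true
p empty_assignment?("password: hunter2")  # => false
```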

ENV_ACCESSOR_MARKERS = ["process.env", "import.meta.env", "os.environ", "os.getenv", "getenv(", "System.getenv", "System.getProperty", "Deno.env", "ENV[", "ENV.fetch", "Environment.GetEnvironmentVariable", "Sys.getenv"]

Runtime environment-variable accessors. A line that pulls its value from the environment at runtime has, by construction, no literal secret to leak. These are substring-matched anywhere on the line so key = os.getenv("OPENAI_API_KEY") is covered regardless of where the accessor sits.
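
The substring matching described above could look roughly like this. The helper name env_accessor? is hypothetical, and case-sensitive matching is an assumption.

```crystal
ENV_ACCESSOR_MARKERS = ["process.env", "import.meta.env", "os.environ", "os.getenv", "getenv(", "System.getenv", "System.getProperty", "Deno.env", "ENV[", "ENV.fetch", "Environment.GetEnvironmentVariable", "Sys.getenv"]

# Hypothetical helper: true when any accessor marker occurs anywhere on the line.
def env_accessor?(line : String) : Bool
  ENV_ACCESSOR_MARKERS.any? { |marker| line.includes?(marker) }
end

p env_accessor?(%(key = os.getenv("OPENAI_API_KEY"))) # => true
p env_accessor?("token = process.env.GITHUB_TOKEN")   # => true
p env_accessor?(%(GITHUB_TOKEN = "some-literal"))     # => false
```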

PLACEHOLDER_VALUE = /\A(?:<[^>]*>|your[-_ ]|insert[-_ ]your|replace[-_ ](?:me|this|with)|(?:changeme|change[-_]me|replaceme|replace[-_]me|placeholder|redacted|dummy|todo|fixme|none|null|nil|undefined|x{4,}|\*{4,})\b)/i

Documentation/template placeholder values — what a reader is told to replace, never a real secret. Matched at the start of the value so a KEY=<token> … or KEY=your-access-key-id example is caught even with trailing text:

  • angle-bracket stubs (<token>, <your-key>)
  • your-… / your_… (your-access-key-id, your_api_key)
  • explicit "replace this" / "insert your" prose
  • bare null / dummy tokens (nil, null, none, changeme, placeholder, redacted, xxxx…, ****)

All are forms a genuine high-entropy literal can never take, so this only removes false positives.

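
A few placeholder shapes checked against the pattern, as a sketch. The helper name placeholder? is hypothetical, and it assumes the value has already been extracted from the line.

```crystal
PLACEHOLDER_VALUE = /\A(?:<[^>]*>|your[-_ ]|insert[-_ ]your|replace[-_ ](?:me|this|with)|(?:changeme|change[-_]me|replaceme|replace[-_]me|placeholder|redacted|dummy|todo|fixme|none|null|nil|undefined|x{4,}|\*{4,})\b)/i

# Hypothetical helper: true when the value starts with a placeholder form.
def placeholder?(value : String) : Bool
  PLACEHOLDER_VALUE.matches?(value)
end

p placeholder?("<token>")            # => true
p placeholder?("your-access-key-id") # => true
p placeholder?("CHANGEME")           # => true (the pattern is case-insensitive)
p placeholder?("nullable_value")     # => false (the word boundary after null is missing)
p placeholder?("s3cr3t-value")       # => false
```
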
PURE_REFERENCE = /\A(?:\$\{?\{?[A-Za-z_][\w.\- ]*\}?\}?|\$\(.+\)|%[A-Za-z_]\w*%|\{\{.+\}\}|<[^>]+>|env\(\s*['"][^'",]+['"]\s*\))\z/

Whole-value forms that are references or placeholders, never literals: shell/template variable substitutions ($VAR, ${VAR}, ${{ … }}, $(…), %VAR%, {{ … }}), angle-bracket placeholders (<your-token>), and single-argument env-helper calls (env('AWS_ACCESS_KEY_ID'), common in Laravel/Symfony/Rails config). Anchored so it only fires when the entire value is a reference — a real secret that merely contains a $ (e.g. P$ssw0rd…) is not matched. The env-call form is deliberately single-argument: a two-argument env('K', 'default') could hide a literal default, so it is left to fall through.
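
The whole-value anchoring can be seen in a small sketch. The helper name pure_reference? is hypothetical, and the input is assumed to be the already-extracted value.

```crystal
PURE_REFERENCE = /\A(?:\$\{?\{?[A-Za-z_][\w.\- ]*\}?\}?|\$\(.+\)|%[A-Za-z_]\w*%|\{\{.+\}\}|<[^>]+>|env\(\s*['"][^'",]+['"]\s*\))\z/

# Hypothetical helper: true only when the entire value is a reference.
def pure_reference?(value : String) : Bool
  PURE_REFERENCE.matches?(value)
end

p pure_reference?("$HOME")                    # => true
p pure_reference?("${VAR}")                   # => true
p pure_reference?("$(cat token.txt)")         # => true
p pure_reference?("%APPDATA%")                # => true
p pure_reference?("{{ vault_token }}")        # => true
p pure_reference?("env('AWS_ACCESS_KEY_ID')") # => true
p pure_reference?("env('K', 'default')")      # => false (two arguments)
p pure_reference?("P$ssw0rd")                 # => false (merely contains a $)
```

One observation from reading the pattern on its own: the first alternative expects a letter or underscore directly after the opening braces, so the compact ${{secrets.FOO}} matches while the spaced ${{ secrets.FOO }} does not. Whether the spaced form is handled elsewhere in the module is not shown on this page.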

SECRET_NAME = /\A(?=[A-Za-z0-9_]*[A-Z_])[A-Za-z_][A-Za-z0-9_]*\z/

A secret variable name. The bundled secret rules carry two kinds of word pattern: environment-variable names (GITHUB_TOKEN, DATABASE_URL, AWS_ACCESS_KEY_ID) and literal secret markers (-----BEGIN PRIVATE KEY-----). Only the former are eligible for the "merely mentioned" suppression below — a PEM marker is itself the secret and must never be dropped on a mention basis.

Names are required to look like an env var: an identifier carrying at least one uppercase letter or underscore (DATABASE_URL, github_pat_). This deliberately excludes bare lowercase words (token, secret), so a rule keyword that doubles as ordinary prose is never suppressed on a mention basis, which keeps the heuristic firmly on the false-positive-only side.
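
To make the name shape concrete, a brief sketch follows. The helper name secret_name? is hypothetical.

```crystal
SECRET_NAME = /\A(?=[A-Za-z0-9_]*[A-Z_])[A-Za-z_][A-Za-z0-9_]*\z/

# Hypothetical helper: true when the word pattern is shaped like an env-var name.
def secret_name?(word : String) : Bool
  SECRET_NAME.matches?(word)
end

p secret_name?("GITHUB_TOKEN")                # => true
p secret_name?("github_pat_")                 # => true
p secret_name?("token")                       # => false (no uppercase letter or underscore)
p secret_name?("-----BEGIN PRIVATE KEY-----") # => false (not an identifier)
```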

Class Method Detail

def self.assigns_literal?(line : String, name : String) : Bool

True when name is assigned a (non-empty) value on the line — NAME=…, NAME: …, "NAME": …, NAME => …. A bare mention ("DATABASE_URL", env.delete("DATABASE_URL"), prose) does not match, so it is treated as a non-secret reference.
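
The method body is not shown on this page. The following is only a sketch of one way the documented behavior could be approximated; the name assigns_literal_sketch? and the regex are assumptions, not the module's implementation.

```crystal
# Hypothetical approximation of the documented behavior.
def assigns_literal_sketch?(line : String, name : String) : Bool
  # Optional quotes around the name, a separator (=>, : or =), then at least
  # one non-space character. The atomic group stops "=>" from backtracking to "=".
  pattern = /["']?#{Regex.escape(name)}["']?\s*(?>=>|:|=)\s*\S/
  pattern.matches?(line)
end

p assigns_literal_sketch?("DATABASE_URL=postgres://localhost/app", "DATABASE_URL") # => true
p assigns_literal_sketch?(%(env.delete("DATABASE_URL")), "DATABASE_URL")           # => false
p assigns_literal_sketch?("DATABASE_URL=", "DATABASE_URL")                         # => false
```

This sketch treats a comparison such as NAME == value as an assignment; a stricter version would need to rule that out.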


def self.comment_line?(line : String) : Bool

True when line's leading non-space content is a comment marker.
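
A minimal sketch of such a check, assuming it simply strips leading whitespace and tests each entry of COMMENT_PREFIXES. The name comment_line_sketch? is hypothetical.

```crystal
COMMENT_PREFIXES = ["#", "//", "/*", "*", "<!--"]

# Hypothetical approximation: true when the first non-space content
# of the line is one of the comment markers.
def comment_line_sketch?(line : String) : Bool
  stripped = line.lstrip
  COMMENT_PREFIXES.any? { |prefix| stripped.starts_with?(prefix) }
end

p comment_line_sketch?("  # export GITHUB_TOKEN first")   # => true
p comment_line_sketch?(" * reads AWS_ACCESS_KEY_ID")      # => true
p comment_line_sketch?("token = load() # trailing note")  # => false
```

A trailing comment does not count, which fits the "whole-line" wording used for COMMENT_PREFIXES.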


def self.matched_secret_name(rule : PassiveScan, line : String) : String | Nil

The first env-var-name-shaped word pattern of rule that occurs on the line, or nil. Literal markers like -----BEGIN …----- and bare lowercase words are not env-var-shaped and are excluded.


def self.regex_value_hit?(rule : PassiveScan, line : String) : Bool

True when any value-shape (regex) matcher of rule matches the line. This mirrors the matching in detect.cr, so the regex gate in the rule-aware suppress? agrees with what actually fired.


def self.secret_reference?(line : String) : Bool

True when line exposes its secret-bearing value only through an indirection (env read / templating) or a placeholder — i.e. there is no literal secret on the line to leak.


def self.suppress?(category : String, line : String) : Bool

Decide whether a result on line for a rule of category should be dropped as a false positive. Only secret findings are eligible; everything else passes through unchanged. This category-only form is the reference/placeholder check; the rule-aware overload below adds matcher-type gating and the "merely mentioned" heuristic.


def self.suppress?(rule : PassiveScan, line : String) : Bool

Rule-aware suppression. In addition to the reference/placeholder check it gates on which matcher fired:

  • If a value-shape regex matcher hits the line, a real secret literal is present (ghp_…, AKIA…, a credentialed URL) — high confidence, never suppressed.
  • Otherwise the finding is backed only by a word matcher on a variable name. When that name is merely mentioned — in a comment, a string literal, prose, a dependency list — rather than assigned a literal value, it is not a leaked secret.
