class Noir::PhpLexer
- Noir::PhpLexer
- Reference
- Object
Overview
PhpLexer is a hand-rolled structural lexer for PHP source. It exists to replace the per-analyzer character state machines that every PHP analyzer re-implements (balanced-brace matching, statement-end scanning, string/comment skip ranges) with a single shared pass that is:
- heredoc/nowdoc aware: <<<EOT … EOT / <<<'EOT' … EOT bodies are masked, so route-shaped text or a stray { } ; inside a heredoc can no longer leak as a false endpoint or corrupt a brace/statement bound. None of the pre-existing scanners handled <<< at all.
- PHP-8 attribute aware: #[Route(...)] is code, not a # comment.
- linear on multi-byte input: the source is materialised once into an Array(Char) with O(1) indexing, so CJK-commented controllers stay O(n) instead of the O(n^2) that String#[](Int) caused.
The lexer masks every non-code region (strings, comments, heredoc/nowdoc
bodies) into spaces in @masked while preserving newlines and overall
length, so the structural helpers below are plain depth counters over
@masked with no string-state bookkeeping of their own.
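The masking idea described above can be sketched as follows. This is a simplified illustration, not the Noir implementation: mask_php is a hypothetical helper, it handles only quoted strings, //, # and /* */ comments plus the #[ attribute exception, and it leaves out heredoc/nowdoc handling entirely.

```crystal
# Hypothetical helper (not part of Noir): blanks strings and comments to
# spaces while keeping newlines and the overall length unchanged.
# Heredoc/nowdoc bodies are NOT handled in this sketch.
def mask_php(source : String) : Array(Char)
  chars = source.chars # materialise once, so indexing is O(1) afterwards
  masked = chars.dup
  n = chars.size
  i = 0
  while i < n
    c = chars[i]
    nxt = i + 1 < n ? chars[i + 1] : '\0'
    stop = i # exclusive end of the region to blank; stays at i for code
    if c == '\'' || c == '"'
      j = i + 1
      while j < n && chars[j] != c
        j += 1 if chars[j] == '\\' # skip the escaped character
        j += 1
      end
      stop = Math.min(j + 1, n)
    elsif (c == '/' && nxt == '/') || (c == '#' && nxt != '[')
      # line comment; "#[" is a PHP 8 attribute and stays code
      j = i
      while j < n && chars[j] != '\n'
        j += 1
      end
      stop = j
    elsif c == '/' && nxt == '*'
      j = i + 2
      while j + 1 < n && !(chars[j] == '*' && chars[j + 1] == '/')
        j += 1
      end
      stop = Math.min(j + 2, n)
    end
    if stop > i
      (i...stop).each { |k| masked[k] = ' ' unless chars[k] == '\n' }
      i = stop
    else
      i += 1
    end
  end
  masked
end

src = "$a = 'x;y'; // hi\n#[Route('/p')]\n"
masked = mask_php(src)
puts masked.size == src.size # length is preserved
puts masked.join
```

Because the ; inside 'x;y' is blanked while the real statement terminator is not, a later pass can look for delimiters in the masked array without tracking string state.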
Defined in:
minilexers/php_lexer.cr
Constructors
Instance Method Summary
- #expression_end(start_pos : Int32) : Int32
  Index of the first top-level expression terminator (, ; or a closing ) ] } that would pop above the starting level) at or after start_pos.
- #in_code?(pos : Int32) : Bool
- #masked : Array(Char)
  Code with strings/comments/heredoc bodies blanked to spaces.
- #matching_delimiter(open_pos : Int32) : Int32 | Nil
  Index of the delimiter that closes the (, [ or { at open_pos, or nil.
- #skip_ranges : Array(Range(Int32, Int32))
  Character ranges occupied by strings, comments and heredoc/nowdoc bodies.
- #statement_end(start_pos : Int32) : Int32
  Index just after the top-level ; at or after start_pos, or the source size when none is found.
- #tokens : Array(PhpToken)
  Lazily produce a flat token stream over the source: structural delimiters, ->/::/=> operators, identifiers, $variables, and one token per string/comment/heredoc span.
Constructor Detail
Instance Method Detail
def expression_end(start_pos : Int32) : Int32
Index of the first top-level expression terminator (, ; or a closing
) ] } that would pop above the starting level) at or after start_pos.
Mirrors find_arrow_expression_end.
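A minimal sketch of the depth-counting scan that expression_end describes, written against an already-masked character array. The name expression_end_sketch is hypothetical, and returning the array size when no terminator is found is an assumption; the documentation above does not state what the real method returns in that case.

```crystal
# Hypothetical sketch, not the Noir source: scans already-masked chars for
# the first , or ; at depth 0, or a closer with no opener since start_pos.
def expression_end_sketch(masked : Array(Char), start_pos : Int32) : Int32
  depth = 0
  i = start_pos
  while i < masked.size
    case masked[i]
    when '(', '[', '{'
      depth += 1
    when ')', ']', '}'
      return i if depth == 0 # this closer would pop above the starting level
      depth -= 1
    when ',', ';'
      return i if depth == 0
    end
    i += 1
  end
  masked.size # assumption: fall back to the end of input
end

code = "foo($x, [1, 2]), $b".chars
puts expression_end_sketch(code, 0) # index of the comma after foo(...)
```

Commas nested inside the call and the array literal are skipped because depth is above zero when they are reached.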
def masked : Array(Char)
Code with strings/comments/heredoc bodies blanked to spaces. Same character length as the source; newlines are preserved, so line/offset math against the original content stays valid.
def matching_delimiter(open_pos : Int32) : Int32 | Nil
Index of the delimiter that closes the (, [ or { at open_pos, or nil.
Counts only the matching pair type, which is correct for balanced code
and mirrors the engine's find_matching_php_close_brace.
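The "counts only the matching pair type" behaviour of matching_delimiter can be illustrated with a short sketch over a masked character array. matching_delimiter_sketch is a hypothetical name, and returning nil for a non-delimiter or out-of-range open_pos is an assumption made for the example.

```crystal
# Hypothetical sketch, not the Noir source: tracks depth for one pair type
# only, so other bracket kinds between the pair are ignored.
def matching_delimiter_sketch(masked : Array(Char), open_pos : Int32) : Int32?
  opener = masked[open_pos]?
  closer = case opener
           when '(' then ')'
           when '[' then ']'
           when '{' then '}'
           else          return nil # assumption: not a delimiter, no match
           end
  depth = 0
  (open_pos...masked.size).each do |i|
    ch = masked[i]
    if ch == opener
      depth += 1
    elsif ch == closer
      depth -= 1
      return i if depth == 0
    end
  end
  nil # unbalanced input: no closing delimiter found
end

code = "f(a[1], (b)) { x }".chars
puts matching_delimiter_sketch(code, 1) # closes the ( at index 1
```

Tracking a single pair type keeps the scan simple; on unbalanced input it can pair delimiters differently from a full bracket stack, which is why the description limits the guarantee to balanced code.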
def skip_ranges : Array(Range(Int32, Int32))
Character ranges occupied by strings, comments and heredoc/nowdoc bodies.
def statement_end(start_pos : Int32) : Int32
Index just after the top-level ; at or after start_pos, or the source
size when none is found. Mirrors find_php_statement_end.
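As an illustration of the statement_end contract (index just after the top-level ;, else the source size), here is a small sketch over a masked character array. statement_end_sketch is a hypothetical name, and clamping the depth at zero on stray closers is an assumption, not documented behaviour.

```crystal
# Hypothetical sketch, not the Noir source: returns the index just after the
# first ; found at depth 0, or masked.size when there is none.
def statement_end_sketch(masked : Array(Char), start_pos : Int32) : Int32
  depth = 0
  i = start_pos
  while i < masked.size
    case masked[i]
    when '(', '[', '{'
      depth += 1
    when ')', ']', '}'
      depth -= 1 if depth > 0 # assumption: ignore stray closers
    when ';'
      return i + 1 if depth == 0
    end
    i += 1
  end
  masked.size
end

# The string argument is already blanked, as it would be in a masked array.
src = "$r->get(    , function () { return 1; }); $x = 2;"
puts statement_end_sketch(src.chars, 0) == src.index("});").not_nil! + 3
```

The ; after return 1 sits inside the closure body at depth 2, so the scan passes over it and stops at the ; that ends the whole call.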
def tokens : Array(PhpToken)
Lazily produce a flat token stream over the source: structural
delimiters, ->/::/=> operators, identifiers, $variables, and one
token per string/comment/heredoc span. This is the reusable miniparser
surface for consumers that want to walk PHP structurally (e.g. following
a Route::a(...)->b(...)->group(...) method chain).