PHP Cross Reference of DokuWiki

/inc/parser/ -> lexer.php (summary)

Author Marcus Baker: http://www.lastcraft.com
Version adapted from Simple Test: http://sourceforge.net/projects/simpletest/
For an intro to the Lexer see: https://web.archive.org/web/20120125041816/http://www.phppatterns.com/docs/develop/simple_test_lexer_notes

Author: Marcus Baker
Version: $Id: lexer.php,v 1.1 2005/03/23 23:14:09 harryf Exp $
File Size: 614 lines (20 kb)
Included or required: 2 times
Referenced: 0 times
Includes or requires: 0 files

Defines 3 classes

Doku_LexerParallelRegex:: (6 methods):
  __construct()
  addPattern()
  match()
  split()
  _getCompoundedRegex()
  _getPerlMatchingFlags()

Doku_LexerStateStack:: (4 methods):
  __construct()
  getCurrent()
  enter()
  leave()

Doku_Lexer:: (14 methods):
  __construct()
  addPattern()
  addEntryPattern()
  addExitPattern()
  addSpecialPattern()
  mapHandler()
  parse()
  _dispatchTokens()
  _isModeEnd()
  _isSpecialMode()
  _decodeSpecial()
  _invokeParser()
  _reduce()
  Doku_Lexer_Escape()


Class: Doku_LexerParallelRegex  - X-Ref

Compounded regular expression. Any of
the contained patterns could match, and
when one does its label is returned.

__construct($case)   X-Ref
Constructor. Starts with no patterns.

param: boolean $case    True for case sensitive, false for case insensitive.

addPattern($pattern, $label = true)   X-Ref
Adds a pattern with an optional label.

param: mixed       $pattern Perl style regex. Must be UTF-8
param: bool|string $label   Label of regex to be returned

match($subject, &$match)   X-Ref
Attempts to match all patterns at once against a string.

param: string $subject      String to match against.
param: string $match        First matched portion of
return: boolean             True on success.

split($subject, &$split)   X-Ref
Attempts to split the string against all patterns at once

author: Christopher Smith <chris@jalakai.co.uk>
param: string $subject      String to match against.
param: array $split         The split result: array containing, pre-match, match & post-match strings
return: boolean             True on success.
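
The three-way split described for split() can be sketched as below. This is a minimal illustration, not the DokuWiki code: the function name `sketch_split` is hypothetical, and it takes a ready-made regex instead of the compounded one the class builds internally.

```php
<?php
// Hypothetical helper: splits $subject around the first match of $regex.
// On success $split holds [pre-match, match, post-match].
function sketch_split(string $regex, string $subject, &$split): bool
{
    if (!preg_match($regex, $subject, $m, PREG_OFFSET_CAPTURE)) {
        // Assumption: on failure the whole subject is reported as pre-match.
        $split = [$subject, '', ''];
        return false;
    }
    [$match, $offset] = $m[0]; // the offset is counted in bytes
    $split = [
        (string) substr($subject, 0, $offset),
        $match,
        (string) substr($subject, $offset + strlen($match)),
    ];
    return true;
}
```

Calling `sketch_split('/\*\*/', 'plain **bold', $parts)` returns true and leaves `['plain ', '**', 'bold']` in `$parts`.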

_getCompoundedRegex()   X-Ref
Compounds the patterns into a single
regular expression separated with the
"or" operator. Caches the regex.
Will automatically escape (, ) and / tokens.

return: null|string
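
The compounding idea can be sketched as follows: each pattern becomes one group in an alternation, and the first non-empty group identifies which label to return. The helper names are hypothetical, the flag choice is an assumption, and the sketch assumes the patterns contain no capturing groups of their own and no pre-escaped "/" characters. Caching is left out.

```php
<?php
// Hypothetical helpers; a sketch, not the DokuWiki implementation.

/** Joins [pattern, label] pairs into one regex, one group per pattern. */
function sketch_compound_regex(array $pairs, bool $caseSensitive): string
{
    $groups = [];
    foreach ($pairs as [$pattern, $label]) {
        // Escape the delimiter so the pattern can sit between "/" marks.
        $groups[] = '(' . str_replace('/', '\/', $pattern) . ')';
    }
    return '/' . implode('|', $groups) . '/' . ($caseSensitive ? 'ms' : 'msi');
}

/** Returns [label, matched text] for the first match, or false if none. */
function sketch_match(array $pairs, string $subject, bool $caseSensitive = false)
{
    $regex = sketch_compound_regex($pairs, $caseSensitive);
    if (!preg_match($regex, $subject, $matches)) {
        return false;
    }
    // $matches[0] is the whole match; group $i belongs to pattern $i - 1.
    for ($i = 1; $i < count($matches); $i++) {
        if ($matches[$i] !== '') {
            return [$pairs[$i - 1][1], $matches[0]];
        }
    }
    return false;
}
```

With the pairs `[['\d+', 'number'], ['[a-z]+', 'word']]`, the subject `'42 abc'` yields `['number', '42']`.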

_getPerlMatchingFlags()   X-Ref
Accessor for perl regex mode flags to use.

return: string       Perl regex flags.

Class: Doku_LexerStateStack  - X-Ref

States for a stack machine.

__construct($start)   X-Ref
Constructor. Starts in named state.

param: string $start        Starting state name.

getCurrent()   X-Ref
Accessor for current state.

return: string       State.

enter($state)   X-Ref
Adds a state to the stack and sets it
to be the current state.

param: string $state        New state.

leave()   X-Ref
Leaves the current state and reverts
to the previous one.

return: boolean    False if we drop off
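
A stack of states with these four operations might look like the sketch below. The class name `SketchStateStack` is hypothetical, and treating an attempt to leave the starting state as the failure case is an assumption about what "drop off" means here.

```php
<?php
// Hypothetical class mirroring the four documented methods.
final class SketchStateStack
{
    /** @var string[] */
    private $stack;

    public function __construct(string $start)
    {
        $this->stack = [$start];
    }

    public function getCurrent(): string
    {
        return $this->stack[count($this->stack) - 1];
    }

    public function enter(string $state): void
    {
        $this->stack[] = $state;
    }

    public function leave(): bool
    {
        // Assumption: the starting state can never be left.
        if (count($this->stack) === 1) {
            return false;
        }
        array_pop($this->stack);
        return true;
    }
}
```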

Class: Doku_Lexer  - X-Ref

Accepts text and breaks it into tokens.
Some optimisation to make sure the
content is only scanned by the PHP regex
parser once. Lexer modes must not start
with leading underscores.

__construct($parser, $start = "accept", $case = false)   X-Ref
Sets up the lexer in case insensitive matching
by default.

param: Doku_Parser $parser  Handling strategy by
param: string $start            Starting handler.
param: boolean $case            True for case sensitive.

addPattern($pattern, $mode = "accept")   X-Ref
Adds a token search pattern for a particular
parsing mode. The pattern does not change the
current mode.

param: string $pattern      Perl style regex, but ( and )
param: string $mode         Should only apply this

addEntryPattern($pattern, $mode, $new_mode)   X-Ref
Adds a pattern that will enter a new parsing
mode. Useful for entering parentheses, strings,
tags, etc.

param: string $pattern      Perl style regex, but ( and )
param: string $mode         Should only apply this
param: string $new_mode     Change parsing to this new

addExitPattern($pattern, $mode)   X-Ref
Adds a pattern that will exit the current mode
and re-enter the previous one.

param: string $pattern      Perl style regex, but ( and )
param: string $mode         Mode to leave.

addSpecialPattern($pattern, $mode, $special)   X-Ref
Adds a pattern that has a special mode. Acts as an entry
and exit pattern in one go, effectively calling a special
parser handler for this token only.

param: string $pattern      Perl style regex, but ( and )
param: string $mode         Should only apply this
param: string $special      Use this mode for this one token.

mapHandler($mode, $handler)   X-Ref
Adds a mapping from a mode to another handler.

param: string $mode        Mode to be remapped.
param: string $handler     New target handler.

parse($raw)   X-Ref
Splits the page text into tokens. Will fail
if the handlers report an error or if no
content is consumed. If successful then each
unparsed and parsed token invokes a call to the
held listener.

param: string $raw        Raw HTML text.
return: boolean           True on success, else false.
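
The parse loop can be illustrated with a heavily simplified, single-mode sketch: every matched token and every stretch of unmatched text is passed to a handler method named after the mode, and parsing fails if a handler returns false or a match consumes nothing. `SketchHandler` and `sketch_parse` are hypothetical names, and mode switching is omitted entirely.

```php
<?php
// Hypothetical handler with one method per mode.
final class SketchHandler
{
    public $calls = [];

    public function accept(string $content, bool $isMatch, int $pos): bool
    {
        $this->calls[] = [$content, $isMatch, $pos];
        return true;
    }
}

// Hypothetical single-mode parse loop; not the DokuWiki implementation.
function sketch_parse(string $raw, string $regex, string $mode, object $handler): bool
{
    $pos = 0;
    $length = strlen($raw);
    while ($pos < $length) {
        if (!preg_match($regex, $raw, $m, PREG_OFFSET_CAPTURE, $pos)) {
            break; // nothing left to match
        }
        [$token, $at] = $m[0];
        if ($token === '') {
            return false; // no content consumed
        }
        // Report unmatched leading text first, then the token itself.
        if ($at > $pos && !$handler->$mode(substr($raw, $pos, $at - $pos), false, $pos)) {
            return false;
        }
        if (!$handler->$mode($token, true, $at)) {
            return false;
        }
        $pos = $at + strlen($token);
    }
    if ($pos < $length) {
        // Trailing text that no pattern matched.
        return $handler->$mode(substr($raw, $pos), false, $pos);
    }
    return true;
}
```

Positions passed to the handler are byte offsets into the raw text, in line with the `$pos` parameter documented for _invokeParser().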

_dispatchTokens($unmatched, $matched, $mode = false, $initialPos, $matchPos)   X-Ref
Sends the matched token and any leading unmatched
text to the parser changing the lexer to a new
mode if one is listed.

param: string $unmatched Unmatched leading portion.
param: string $matched Actual token match.
param: bool|string $mode Mode after match. A boolean
param: int $initialPos
param: int $matchPos
return: boolean             False if there was any error

_isModeEnd($mode)   X-Ref
Tests to see if the new mode is actually to leave
the current mode and pop an item from the matching
mode stack.

param: string $mode    Mode to test.
return: boolean        True if this is the exit mode.

_isSpecialMode($mode)   X-Ref
Tests to see if the mode is one where this mode
is entered for this token only and automatically
left immediately afterwards.

param: string $mode    Mode to test.
return: boolean        True if this is a special (single-token) mode.

_decodeSpecial($mode)   X-Ref
Strips the magic underscore marking single token
modes.

param: string $mode    Mode to decode.
return: string         Underlying mode name.
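
The mode-name conventions behind _isModeEnd(), _isSpecialMode() and _decodeSpecial() can be sketched as below. The leading underscore for single-token modes follows the descriptions above; the reserved exit name `__exit` and all function names are assumptions made for illustration.

```php
<?php
// Assumption: a reserved mode name signals "leave the current mode".
const SKETCH_EXIT_MODE = '__exit';

function sketch_is_mode_end(string $mode): bool
{
    return $mode === SKETCH_EXIT_MODE;
}

function sketch_is_special_mode(string $mode): bool
{
    // A single leading underscore marks a mode used for one token only.
    return strncmp($mode, '_', 1) === 0;
}

function sketch_decode_special(string $mode): string
{
    // Strip the marker to recover the underlying mode name.
    return (string) substr($mode, 1);
}
```

Because the exit name also begins with an underscore, a caller using this scheme would need to test for the exit mode before testing for a special mode. It also shows why ordinary mode names must not start with an underscore.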

_invokeParser($content, $is_match, $pos)   X-Ref
Calls the parser method named after the current
mode. Empty content will be ignored. The lexer
has a parser handler for each mode in the lexer.

param: string $content Text parsed.
param: boolean $is_match Token is recognised rather
param: int $pos Current byte index location in raw doc
return: bool

_reduce(&$raw)   X-Ref
Tries to match a chunk of text and if successful
removes the recognised chunk and any leading
unparsed data. Empty strings will not be matched.

param: string $raw         The subject to parse. This is the
return: array              Three item list of unparsed

Doku_Lexer_Escape($str)   X-Ref
Escapes regex characters other than (, ) and /

param: string $str
return: mixed
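
A minimal sketch of this kind of escaping follows. The function name is hypothetical and the exact set of characters escaped is an assumption; the only property taken from the description above is that (, ) and / are left untouched.

```php
<?php
// Hypothetical helper: backslash-escapes common regex metacharacters
// while leaving "(", ")" and "/" as they are.
function sketch_lexer_escape(string $str): string
{
    // A single pass, so backslashes added here are never escaped again.
    return preg_replace('/[\\\\.+*?\[\]^$|{}]/', '\\\\$0', $str);
}
```

For example, `sketch_lexer_escape('a.b*(c)/d')` returns `a\.b\*(c)/d`.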