PHP Cross Reference of DokuWiki

/inc/Utf8/ -> Clean.php (summary)

(no description)

File Size: 204 lines (6 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class

Clean:: (8 methods):
  isASCII()
  isUtf8()
  strip()
  stripspecials()
  replaceBadBytes()
  deaccent()
  romanize()
  correctIdx()


Class: Clean

Methods to assess and clean UTF-8 strings

isASCII($str)
Checks whether a string contains only 7-bit ASCII characters

param: string $str
author: Andreas Haerter <andreas.haerter@dev.mail-node.com>
return: bool
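
A minimal sketch of such a 7-bit check, assuming a byte-wise regular expression is an acceptable way to do it. The function name is_ascii_sketch() is hypothetical and is not part of the Clean class.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function is_ascii_sketch(string $str): bool
{
    // Without the /u modifier PCRE works on bytes, so this looks for
    // any byte outside the 7-bit range 0x00-0x7F.
    return preg_match('/[^\x00-\x7F]/', $str) !== 1;
}
```

Any multi-byte UTF-8 character contains at least one byte above 0x7F, so a single match is enough to reject the string.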

isUtf8($str)
Tries to detect whether a string is encoded as UTF-8

link: http://php.net/manual/en/function.utf8-encode.php
param: string $str
author: <bmorel@ssi.fr>
return: bool
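
One way to approximate this kind of detection is to let PCRE validate the string. This is a swapped-in technique for illustration, not necessarily how isUtf8() is implemented, and is_utf8_sketch() is a hypothetical name.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function is_utf8_sketch(string $str): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 before
    // matching; preg_match() returns false (not 0) if validation fails.
    // The empty pattern matches any valid subject, including ''.
    return preg_match('//u', $str) === 1;
}
```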

strip($str)
Strips all high-byte characters (bytes above 0x7F)

Returns a pure 7-bit ASCII string

param: string $str
author: Andreas Gohr <andi@splitbrain.org>
return: string
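
The stripping described above can be sketched as a simple byte loop. The name strip_high_bytes_sketch() is hypothetical; the sketch only assumes that "high byte" means any byte value of 128 or more.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function strip_high_bytes_sketch(string $str): string
{
    $ascii = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // Keep only bytes in the 7-bit range; every byte of a
        // multi-byte UTF-8 sequence is >= 128 and is dropped.
        if (ord($str[$i]) < 128) {
            $ascii .= $str[$i];
        }
    }
    return $ascii;
}
```

Note that this drops non-ASCII characters entirely rather than transliterating them.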

stripspecials($string, $repl = '', $additional = '')
Removes special (non-alphanumeric) characters from a UTF-8 string

In addition to the characters in $UTF8_SPECIAL_CHARS, this function strips the
control characters 0x00 to 0x19, which that list does not include.

param: string $string The UTF-8 string to strip special characters from
param: string $repl Replace special characters with this string
param: string $additional Additional characters to strip (used in a regexp character class)
author: Andreas Gohr <andi@splitbrain.org>
return: string
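
A rough sketch of how the three parameters could combine into one character class. The short list of special characters below is a hypothetical stand-in for the real table, and strip_specials_sketch() is not part of the Clean class.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function strip_specials_sketch(string $string, string $repl = '', string $additional = ''): string
{
    // Assumed stand-in for the real special-character list.
    $specials = preg_quote('!?.,;:"()', '/');

    // $additional is inserted unescaped so it may contain class syntax
    // such as \s; the control range 0x00-0x19 is always included.
    $pattern = '/[' . $additional . '\x00-\x19' . $specials . ']/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // this sketch then falls back to the unchanged input.
    return preg_replace($pattern, $repl, $string) ?? $string;
}
```

Because $additional goes into the character class as-is, callers are responsible for escaping anything that has a special meaning there, such as "]" or "^".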

replaceBadBytes($str, $replace = '')
Replaces bad bytes with an alternative character

An ASCII character is recommended as the replacement.

The PCRE pattern used to locate bad bytes in a UTF-8 string
comes from the W3C FAQ "Multilingual Forms".
Note: the pattern was modified to cover the full ASCII range, including control characters.

param: string $str String to search
param: string $replace String to replace bad bytes with (defaults to '', which removes them); use ASCII
see: http://www.w3.org/International/questions/qa-forms-utf-8
author: Harry Fuecks <hfuecks@gmail.com>
return: string
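
A minimal sketch of the approach, assuming the pattern from the W3C FAQ extended to the full ASCII range as the note above describes. The name replace_bad_bytes_sketch() and the scanning loop are illustrative and may differ from the actual method.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function replace_bad_bytes_sketch(string $str, string $replace = ''): string
{
    // One well-formed UTF-8 sequence, anchored at the current offset (\G).
    $valid = '/\G(?:[\x00-\x7F]'              // ASCII, including controls
        . '|[\xC2-\xDF][\x80-\xBF]'           // non-overlong 2-byte
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-byte, excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-byte, excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'        // planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})/'; // plane 16

    $out = '';
    $pos = 0;
    $len = strlen($str);
    while ($pos < $len) {
        if (preg_match($valid, $str, $m, 0, $pos) === 1) {
            $out .= $m[0];           // copy the valid sequence
            $pos += strlen($m[0]);
        } else {
            $out .= $replace;        // substitute exactly one bad byte
            $pos++;
        }
    }
    return $out;
}
```

Matching one sequence per iteration is slow on long strings but keeps the logic easy to follow; each bad byte is replaced individually.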

deaccent($string, $case = 0)
Replaces accented UTF-8 characters with unaccented 7-bit ASCII equivalents

Use the optional $case parameter to deaccent only lowercase ($case = -1) or only
uppercase ($case = 1) letters. The default is to deaccent both cases ($case = 0).

param: string $string
param: int $case
author: Andreas Gohr <andi@splitbrain.org>
return: string
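
The $case switch can be sketched with two lookup tables. The three-entry tables and the name deaccent_sketch() are hypothetical; the real mapping is much larger and may transliterate some characters differently.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function deaccent_sketch(string $string, int $case = 0): string
{
    // Tiny illustrative tables (assumed, not the real data).
    $lower = ['é' => 'e', 'à' => 'a', 'ñ' => 'n'];
    $upper = ['É' => 'E', 'À' => 'A', 'Ñ' => 'N'];

    // -1 = lowercase only, 1 = uppercase only, 0 = both.
    if ($case <= 0) {
        $string = strtr($string, $lower);
    }
    if ($case >= 0) {
        $string = strtr($string, $upper);
    }
    return $string;
}
```

With an array argument, strtr() replaces whole keys, so multi-byte characters are handled correctly as long as the source file is saved as UTF-8.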

romanize($string)
Romanizes a non-Latin string

param: string $string
author: Andreas Gohr <andi@splitbrain.org>
return: string
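
As an illustration only, romanization can be sketched as a table lookup that is skipped for pure ASCII input. The Cyrillic entries and the name romanize_sketch() are assumptions and do not reflect the actual romanization table.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function romanize_sketch(string $string): string
{
    // Pure 7-bit ASCII input has nothing to romanize.
    if (preg_match('/[^\x00-\x7F]/', $string) !== 1) {
        return $string;
    }
    // Assumed three-entry table (Cyrillic to Latin) for illustration.
    $table = ['д' => 'd', 'а' => 'a', 'н' => 'n'];
    return strtr($string, $table);
}
```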

correctIdx($str, $i, $next = false)
Adjusts a byte index into a UTF-8 string so that it falls on a UTF-8 character boundary

param: string $str UTF-8 character string
param: int $i byte index into $str
param: bool $next direction to search for the boundary: false = up (start of the current character), true = down (start of the next character)
author: chris smith <chris@jalakai.co.uk>
return: int byte index into $str, now pointing to a UTF-8 character boundary
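
A minimal sketch of the boundary search, assuming it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. The name correct_idx_sketch() and the clamping of out-of-range indexes are assumptions.

```php
<?php
// Hypothetical standalone helper, not the DokuWiki method itself.
function correct_idx_sketch(string $str, int $i, bool $next = false): int
{
    $limit = strlen($str);
    // Clamp indexes that fall outside the string (assumed behaviour).
    if ($i <= 0) {
        return 0;
    }
    if ($i >= $limit) {
        return $limit;
    }
    if ($next) {
        // Move forward past continuation bytes (10xxxxxx).
        while ($i < $limit && (ord($str[$i]) & 0xC0) === 0x80) {
            $i++;
        }
    } else {
        // Move back to the lead byte of the current character.
        while ($i > 0 && (ord($str[$i]) & 0xC0) === 0x80) {
            $i--;
        }
    }
    return $i;
}
```

For the string "a\xC3\xA9b" (the letters a, é, b), byte index 2 points into the middle of "é"; the sketch moves it to 1 by default and to 3 when $next is true.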