
PHP Cross Reference of DokuWiki

/inc/ -> indexer.php (summary)

Functions to create the fulltext search index

Author: Andreas Gohr
Author: Tom N Harris
License: GPL 2 (http://www.gnu.org/licenses/gpl.html)
File Size: 1611 lines (55 kb)
Included or required: 1 time
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class

Doku_Indexer:: (29 methods):
  addPageWords()
  getPageWords()
  addMetaKeys()
  renamePage()
  renameMetaValue()
  deletePage()
  deletePageNoLock()
  clear()
  tokenizer()
  getPID()
  getPIDNoLock()
  getPageFromPID()
  lookup()
  lookupKey()
  getIndexWords()
  getPages()
  histogram()
  lock()
  unlock()
  getIndex()
  saveIndex()
  getIndexKey()
  saveIndexKey()
  addIndexKey()
  listIndexLengths()
  indexLengths()
  updateTuple()
  parseTuples()
  countTuples()

Defines the following functions outside the class:

  idx_get_version()
  wordlen()
  idx_get_indexer()
  idx_addPage()
  idx_lookup()
  idx_tokenizer()
  idx_getIndex()
  idx_listIndexLengths()
  idx_indexLengths()
  idx_cleanName()

Class: Doku_Indexer  - X-Ref

Class that encapsulates operations on the indexer database.

addPageWords($page, $text)   X-Ref
Adds the contents of a page to the fulltext index

The added text replaces previous words for the same page.
An empty value erases the page.

author: Tom N Harris <tnharris@whoopdedo.org>
author: Andreas Gohr <andi@splitbrain.org>
return: string|boolean  true if the function completed successfully, otherwise false or an error message
param: string    $page   a page name
param: string    $text   the body of the page

getPageWords($text)   X-Ref
Split the words in a page and add them to the index.

author: Andreas Gohr <andi@splitbrain.org>
author: Christopher Smith <chris@jalakai.co.uk>
author: Tom N Harris <tnharris@whoopdedo.org>
return: array            list of word IDs and number of times used
param: string    $text   content of the page

addMetaKeys($page, $key, $value=null)   X-Ref
Add or update keys in the metadata index.

Adding new keys does not remove other keys for the page.
An empty value will erase the key.
The $key parameter can be an array to add multiple keys. $value will
not be used if $key is an array.

author: Tom N Harris <tnharris@whoopdedo.org>
author: Michael Hamann <michael@content-space.de>
return: boolean|string     true if the function completed successfully, otherwise false or an error message
param: string    $page   a page name
param: mixed     $key    a key string or array of key=>value pairs
param: mixed     $value  the value or list of values
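The following Python sketch shows the documented argument rules of addMetaKeys(). `$key` can be a single name or a mapping. `$value` is ignored when `$key` is a mapping. An empty value erases the key. Keys that are not named in the call are left alone. The in-memory layout `index[key][page] = set of values` is a hypothetical assumption and does not describe DokuWiki's on-disk format.

```python
def add_meta_keys(index, page, key, value=None):
    """Add/update metadata keys for `page` in a hypothetical in-memory index."""
    # A mapping of key => value(s) takes precedence; `value` is then unused.
    pairs = key if isinstance(key, dict) else {key: value}
    for name, vals in pairs.items():
        if isinstance(vals, str):
            vals = [vals] if vals else []
        per_key = index.setdefault(name, {})
        if vals:
            per_key[page] = set(vals)  # replace this page's values for the key
        else:
            per_key.pop(page, None)  # empty value erases the key for the page
    return True
```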

renamePage($oldpage, $newpage)   X-Ref
Rename a page in the search index without changing the indexed content. This function doesn't check whether the
old or new name exists in the filesystem. It returns an error if the old page isn't in the indexer's page list,
and it deletes all previously indexed content of the new page.

return: string|bool true if the page was successfully renamed; an error message in case of an error
param: string $oldpage The old page name
param: string $newpage The new page name

renameMetaValue($key, $oldvalue, $newvalue)   X-Ref
Renames a meta value in the index. This doesn't change the meta value in the pages themselves; it assumes that
all pages will be updated.

return: bool|string      true if the value was renamed successfully, false or an error message on error
param: string $key       The metadata key of which a value shall be changed
param: string $oldvalue  The old value that shall be renamed
param: string $newvalue  The new value to which the old value shall be renamed; if it already exists, the values are merged
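Here is a small Python sketch of the merge behaviour described for renameMetaValue(). If the new value already exists, the page sets of the old and new values are combined. The mapping `value -> set of pages` for a single metadata key is a hypothetical layout. Treating an absent old value as a successful no-op is also an assumption of this sketch.

```python
from __future__ import annotations


def rename_meta_value(values: dict[str, set[str]], old: str, new: str) -> bool:
    """Rename `old` to `new` in a hypothetical value -> pages map for one key."""
    # Assumption: renaming a value that is not indexed is a no-op success.
    if old not in values or old == new:
        return True
    pages = values.pop(old)
    values.setdefault(new, set()).update(pages)  # merge if `new` already exists
    return True
```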

deletePage($page)   X-Ref
Remove a page from the index

Erases entries in all known indexes.

author: Tom N Harris <tnharris@whoopdedo.org>
return: string|boolean  true if the function completed successfully, otherwise false or an error message
param: string    $page   a page name

deletePageNoLock($page)   X-Ref
Remove a page from the index without locking it. Only use this function if the index is already locked.

Erases entries in all known indexes.

author: Tom N Harris <tnharris@whoopdedo.org>
return: boolean          the function completed successfully
param: string    $page   a page name

clear()   X-Ref
Clear the whole index

return: bool true if the index was cleared successfully

tokenizer($text, $wc=false)   X-Ref
Split the text into words for fulltext search

TODO: does this also need &$stopwords?

triggers: INDEXER_TEXT_PREPARE
author: Tom N Harris <tnharris@whoopdedo.org>
author: Andreas Gohr <andi@splitbrain.org>
return: array            list of words in the text
param: string    $text   plain text
param: boolean   $wc     are wildcards allowed?
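The effect of the `$wc` flag can be shown with a deliberately simplified Python tokenizer. When wildcards are allowed, `*` is kept as part of a token. Otherwise it acts as a separator. Lowercasing is an assumption. The real tokenizer() also triggers INDEXER_TEXT_PREPARE and may treat stopwords and scripts differently.

```python
import re


def tokenize(text, wc=False):
    """Split text into lowercase word tokens; keep '*' only if wc is True."""
    pattern = r"[\w*]+" if wc else r"\w+"
    return [t.lower() for t in re.findall(pattern, text)]
```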

getPID($page)   X-Ref
Get the numeric PID of a page

return: bool|int The PID on success, false on error
param: string $page The page to get the PID for

getPIDNoLock($page)   X-Ref
Get the numeric PID of a page without locking the index.
Only use this function when the index is already locked.

return: bool|int The PID on success, false on error
param: string $page The page to get the PID for

getPageFromPID($pid)   X-Ref
Get the page id of a numeric PID

return: string The page id
param: int $pid The PID to get the page id for

lookup(&$tokens)   X-Ref
Find pages in the fulltext index containing the words.

The search words must be pre-tokenized, meaning only letters and
numbers with an optional wildcard.

The returned array will have the original tokens as keys. Each value
in the returned list is an array with the page names as keys and the
number of times that token appears on the page as value.

author: Tom N Harris <tnharris@whoopdedo.org>
author: Andreas Gohr <andi@splitbrain.org>
return: array         list of page names with usage counts
param: array  $tokens list of words to search for
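The Python sketch below reproduces the documented result shape of lookup(): `{token: {page: count}}`. `INDEX` is a hypothetical in-memory word index. Wildcard matching is done with `fnmatchcase`, and counts from all words matched by a wildcard are summed. Both choices are assumptions of this sketch, not a description of DokuWiki's file-based lookup.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory fulltext index: word -> {page: occurrences}
INDEX = {
    "wiki": {"start": 3, "about": 1},
    "wikipedia": {"links": 2},
    "page": {"start": 1},
}


def lookup(tokens):
    """Return {token: {page: count}} for each pre-tokenized search word."""
    result = {}
    for tok in tokens:
        hits = {}
        for word, pages in INDEX.items():
            # '*' acts as a wildcard (fnmatchcase also gives '?' and '[' meaning,
            # but pre-tokenized input should not contain them).
            if fnmatchcase(word, tok):
                for page, n in pages.items():
                    hits[page] = hits.get(page, 0) + n
        result[tok] = hits
    return result
```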

lookupKey($key, &$value, $func=null)   X-Ref
Find pages containing a metadata key.

The metadata values are compared as case-sensitive strings. Pass a
callback function that returns true or false to use a different
comparison function. The function will be called with the $value being
searched for as the first argument, and the word in the index as the
second argument. The function preg_match can be used directly if the
values are regexes.

author: Tom N Harris <tnharris@whoopdedo.org>
author: Michael Hamann <michael@content-space.de>
return: array            list of page names; if $value is an array, one list per query value, keyed by that value
param: string    $key    name of the metadata key to look for
param: string    $value  search term to look for, must be a string or array of strings
param: callback  $func   comparison function
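A Python sketch of the lookupKey() comparison contract. The default comparison is a case-sensitive string equality. A custom callback receives the searched value first and the indexed word second, mirroring how `preg_match($pattern, $subject)` can be passed directly. The `meta_index` layout `{key: {stored_value: [page, ...]}}` and the helper `regex_compare` are hypothetical.

```python
import re


def regex_compare(pattern, word):
    """Python analogue of passing preg_match: the search value is the pattern."""
    return re.search(pattern, word) is not None


def lookup_key(meta_index, key, value, func=None):
    """Find pages whose metadata `key` matches `value` (a string or list)."""
    values = value if isinstance(value, list) else [value]
    compare = func or (lambda query, word: query == word)  # case-sensitive default
    stored = meta_index.get(key, {})
    result = {}
    for query in values:
        pages = []
        for word, word_pages in stored.items():
            # As documented: search value first, indexed word second.
            if compare(query, word):
                pages.extend(word_pages)
        result[query] = pages
    # A single query returns a plain list; a list of queries is keyed by query.
    return result if isinstance(value, list) else result[value]
```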

getIndexWords(&$words, &$result)   X-Ref
Find the index ID of each search term.

The query terms should only contain valid characters, with a '*' at
either the beginning or end of the word (or both).
The $result parameter can be used to merge the index locations with
the appropriate query term.

author: Tom N Harris <tnharris@whoopdedo.org>
return: array         Set to length => array(id ...)
param: array  $words  The query terms.
param: array  $result Set to word => array("length*id" ...)
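The wildcard rule for getIndexWords() allows a `*` only at the start and/or end of a term. It translates naturally into anchored regular expressions. The Python sketch below assumes a hypothetical `index_by_len` mapping `{length: [word, ...]}`, where a word's list position is its ID. It returns both structures described above: `length => [id, ...]` and `term => ["length*id", ...]`.

```python
import re


def get_index_words(words, index_by_len):
    """Resolve query terms to word IDs in a hypothetical per-length word list."""
    by_length = {}
    result = {}
    for term in words:
        core = term.strip("*")
        # Leading '*': match any prefix; trailing '*': match any suffix.
        regex = re.compile(
            ("" if term.startswith("*") else "^")
            + re.escape(core)
            + ("" if term.endswith("*") else "$")
        )
        hits = []
        for length, wordlist in sorted(index_by_len.items()):
            for wid, word in enumerate(wordlist):
                if regex.search(word):
                    hits.append(f"{length}*{wid}")
                    by_length.setdefault(length, set()).add(wid)
        result[term] = hits
    return {n: sorted(ids) for n, ids in by_length.items()}, result
```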

getPages($key=null)   X-Ref
Return a list of all pages
Warning: pages may not exist!

author: Tom N Harris <tnharris@whoopdedo.org>
return: array            list of page names
param: string    $key    list only pages containing the metadata key (optional)

histogram($min=1, $max=0, $minlen=3, $key=null)   X-Ref
Return a list of words sorted by number of times used

author: Tom N Harris <tnharris@whoopdedo.org>
return: array            list of words as the keys and frequency as values
param: int       $min    bottom frequency threshold
param: int       $max    upper frequency limit. No limit if $max<$min
param: int       $minlen minimum length of words to count
param: string    $key    metadata key to list. Uses the fulltext index if not given
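The Python sketch below covers the histogram() threshold parameters, including the documented rule that `$max` imposes no limit when it is smaller than `$min`. The `freq` input (word to total occurrences) is a hypothetical precomputed map. Word length is measured with Python's `len()`. Sorting most-frequent-first is an assumption.

```python
def histogram(freq, min_count=1, max_count=0, min_len=3):
    """Filter {word: frequency} by the documented thresholds, most frequent first."""
    kept = {
        w: n
        for w, n in freq.items()
        if len(w) >= min_len
        and n >= min_count
        and (max_count < min_count or n <= max_count)  # no upper limit if max < min
    }
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))
```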

lock()   X-Ref
Lock the indexer.

author: Tom N Harris <tnharris@whoopdedo.org>
return: bool|string

unlock()   X-Ref
Release the indexer lock.

author: Tom N Harris <tnharris@whoopdedo.org>
return: bool
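This page does not say how lock() and unlock() are implemented. One common cross-process technique, assumed here for illustration, is atomic directory creation. `mkdir` fails if the directory already exists, so only one process can succeed. The Python sketch below returns plain booleans instead of the `bool|string` documented for lock(). The timeout and stale-lock age are hypothetical parameters.

```python
import os
import tempfile
import time


def lock(lock_dir, timeout=5.0, stale_after=300.0):
    """Acquire a cross-process lock by atomically creating a directory."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # atomic: fails if the directory already exists
            return True
        except FileExistsError:
            try:
                # Break a lock left behind by a crashed process.
                if time.time() - os.path.getmtime(lock_dir) > stale_after:
                    os.rmdir(lock_dir)
                    continue
            except FileNotFoundError:
                continue  # released in the meantime; retry immediately
            if time.monotonic() > deadline:
                return False
            time.sleep(0.05)


def unlock(lock_dir):
    """Release the lock; returns False if it was not held."""
    try:
        os.rmdir(lock_dir)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "_indexer.lock")
    print(lock(path), unlock(path))
```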

getIndex($idx, $suffix)   X-Ref
Retrieve the entire index.

The $suffix argument is for an index that is split into
multiple parts. Different index files should use different
base names.

author: Tom N Harris <tnharris@whoopdedo.org>
return: array            list of lines without CR or LF
param: string    $idx    name of the index
param: string    $suffix subpart identifier

saveIndex($idx, $suffix, &$lines)   X-Ref
Replace the contents of the index with an array.

author: Tom N Harris <tnharris@whoopdedo.org>
return: bool             If saving succeeded
param: string    $idx    name of the index
param: string    $suffix subpart identifier
param: array     $lines  list of lines without LF

getIndexKey($idx, $suffix, $id)   X-Ref
Retrieve a line from the index.

author: Tom N Harris <tnharris@whoopdedo.org>
return: string           a line with trailing whitespace removed
param: string    $idx    name of the index
param: string    $suffix subpart identifier
param: int       $id     the line number

saveIndexKey($idx, $suffix, $id, $line)   X-Ref
Write a line into the index.

author: Tom N Harris <tnharris@whoopdedo.org>
return: bool             If saving succeeded
param: string    $idx    name of the index
param: string    $suffix subpart identifier
param: int       $id     the line number
param: string    $line   line to write

addIndexKey($idx, $suffix, $value)   X-Ref
Retrieve or insert a value in the index.

author: Tom N Harris <tnharris@whoopdedo.org>
return: int|bool          line number of the value in the index or false if writing the index failed
param: string    $idx    name of the index
param: string    $suffix subpart identifier
param: string    $value  line to find in the index
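The five methods getIndex(), saveIndex(), getIndexKey(), saveIndexKey() and addIndexKey() describe a line-oriented store in which a value's line number serves as its ID. The hypothetical Python class below sketches that model. The file naming scheme (`base name + suffix + ".idx"`) is an assumption. The real methods may also use caching or locking that is not shown here.

```python
import os
import tempfile


class LineIndex:
    """Hypothetical line-oriented index store: one value per line, line number = ID."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, idx, suffix):
        # Assumed naming scheme: base name + subpart identifier, e.g. "w" + "3".
        return os.path.join(self.directory, f"{idx}{suffix}.idx")

    def get_index(self, idx, suffix):
        try:
            with open(self._path(idx, suffix), encoding="utf-8") as fh:
                return fh.read().splitlines()  # lines without CR or LF
        except FileNotFoundError:
            return []

    def save_index(self, idx, suffix, lines):
        with open(self._path(idx, suffix), "w", encoding="utf-8") as fh:
            fh.write("".join(line + "\n" for line in lines))
        return True

    def get_index_key(self, idx, suffix, line_no):
        lines = self.get_index(idx, suffix)
        return lines[line_no].rstrip() if line_no < len(lines) else ""

    def save_index_key(self, idx, suffix, line_no, line):
        lines = self.get_index(idx, suffix)
        lines.extend([""] * (line_no + 1 - len(lines)))  # pad missing lines
        lines[line_no] = line
        return self.save_index(idx, suffix, lines)

    def add_index_key(self, idx, suffix, value):
        lines = self.get_index(idx, suffix)
        if value in lines:
            return lines.index(value)  # retrieve: existing line number
        lines.append(value)  # insert: the next line number becomes the ID
        return len(lines) - 1 if self.save_index(idx, suffix, lines) else False


if __name__ == "__main__":
    store = LineIndex(tempfile.mkdtemp())
    print(store.add_index_key("w", "3", "foo"))
```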

listIndexLengths()   X-Ref
Get the list of lengths indexed in the wiki.

Reads the index directory or a cache file and returns
a sorted array of lengths of the words used in the wiki.

author: YoBoY <yoboy.leguesh@gmail.com>
return: array

indexLengths($filter)   X-Ref
Get the word lengths that have been indexed.

Reads the index directory and returns an array of the lengths
for which indices exist.

author: YoBoY <yoboy.leguesh@gmail.com>
return: array
param: array|int $filter

updateTuple($line, $id, $count)   X-Ref
Insert or replace a tuple in a line.

author: Tom N Harris <tnharris@whoopdedo.org>
return: string
param: string $line
param: string|int $id
param: int    $count

parseTuples(&$keys, $line)   X-Ref
Split a line into an array of tuples.

author: Tom N Harris <tnharris@whoopdedo.org>
author: Andreas Gohr <andi@splitbrain.org>
return: array
param: array $keys
param: string $line

countTuples($line)   X-Ref
Sum the counts in a list of tuples.

author: Tom N Harris <tnharris@whoopdedo.org>
return: int
param: string $line
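The tuple helpers updateTuple(), parseTuples() and countTuples() operate on single index lines. This page does not give the line format. The Python sketch below assumes a `:`-separated list of `id*count` pairs, which is consistent with the "length*id" notation in getIndexWords() but remains an assumption. It also assumes that a count of 0 removes the tuple. In the sketch, `keys` is a list that translates numeric IDs to names.

```python
def update_tuple(line, tid, count):
    """Insert or replace the tuple for `tid`; a zero count removes it (assumed)."""
    kept = [p for p in line.split(":") if p and p.split("*")[0] != str(tid)]
    if count:
        kept.insert(0, f"{tid}*{count}")
    return ":".join(kept)


def parse_tuples(keys, line):
    """Split a line into {name: count}, translating IDs through `keys`."""
    result = {}
    for pair in filter(None, line.split(":")):
        tid, count = pair.split("*")
        result[keys[int(tid)]] = int(count)
    return result


def count_tuples(line):
    """Sum the counts of all tuples on a line."""
    return sum(int(p.split("*")[1]) for p in line.split(":") if p)
```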

idx_get_indexer()   X-Ref
Create an instance of the indexer.

author: Tom N Harris <tnharris@whoopdedo.org>
return: Doku_Indexer    a Doku_Indexer

idx_addPage($page, $verbose=false, $force=false)   X-Ref
No description

idx_lookup(&$words)   X-Ref
Find tokens in the fulltext index

Takes an array of words and will return a list of matching
pages for each one.

Important: No ACL checking is done here! All results are
returned, regardless of permissions.

return: array             list of pages found, associated with the search terms
param: array      $words  list of words to search for

idx_tokenizer($string, $wc=false)   X-Ref
Split a string into tokens

return: array
param: string $string
param: bool $wc

idx_getIndex($idx, $suffix)   X-Ref
Read the list of words in an index (if it exists).

author: Tom N Harris <tnharris@whoopdedo.org>
return: array
param: string $idx
param: string $suffix

idx_listIndexLengths()   X-Ref
Get the list of lengths indexed in the wiki.

Reads the index directory or a cache file and returns
a sorted array of lengths of the words used in the wiki.

author: YoBoY <yoboy.leguesh@gmail.com>
return: array

idx_indexLengths($filter)   X-Ref
Get the word lengths that have been indexed.

Reads the index directory and returns an array of the lengths
for which indices exist.

author: YoBoY <yoboy.leguesh@gmail.com>
return: array
param: array|int $filter

idx_cleanName($name)   X-Ref
Clean a name of a key for use as a file name.

Romanizes non-latin characters, then strips away anything that's
not a letter, number, or underscore.

author: Tom N Harris <tnharris@whoopdedo.org>
return: string
param: string $name
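A rough Python analogue of the two steps of idx_cleanName(). Unicode NFKD decomposition with ASCII filtering is used as a crude substitute for romanization. Unlike real romanization, it only strips accents from Latin letters and does not transliterate other scripts, which would simply be removed. Everything that is not a letter, digit or underscore is then stripped, as documented.

```python
import re
import unicodedata


def clean_name(name):
    """Approximate idx_cleanName(): ASCII-fold, then keep [A-Za-z0-9_] only."""
    # Crude stand-in for romanization: decompose and drop non-ASCII marks.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Strip anything that is not a letter, digit or underscore.
    return re.sub(r"[^A-Za-z0-9_]", "", ascii_name)
```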

Functions
Functions that are not part of a class:

idx_get_version()   X-Ref
Version of the indexer taking into consideration the external tokenizer.
The indexer is only compatible with data written by the same version.

triggers: INDEXER_VERSION_GET
author: Tom N Harris <tnharris@whoopdedo.org>
author: Michael Hamann <michael@content-space.de>
return: int|string

wordlen($w)   X-Ref
Measure the length of a string.
Differs from strlen in its handling of Asian characters.

author: Tom N Harris <tnharris@whoopdedo.org>
return: int
param: string $w