PHP Cross Reference of DokuWiki
Functions to create the fulltext search index
| Author: | Andreas Gohr |
| Author: | Tom N Harris |
| License: | GPL 2 (http://www.gnu.org/licenses/gpl.html) |
| File Size: | 1397 lines (49 kb) |
| Included or required: | 2 times |
| Referenced: | 0 times |
| Includes or requires: | 0 files |
Doku_Indexer:: (23 methods):
addPageWords()
getPageWords()
addMetaKeys()
deletePage()
tokenizer()
lookup()
lookupKey()
getIndexWords()
getPages()
histogram()
lock()
unlock()
getIndex()
saveIndex()
getIndexKey()
saveIndexKey()
addIndexKey()
cacheIndexDir()
listIndexLengths()
indexLengths()
updateTuple()
parseTuples()
countTuples()
Functions (10):
idx_get_indexer()
idx_addPage()
idx_lookup()
idx_tokenizer()
idx_getIndex()
idx_listIndexLengths()
idx_indexLengths()
idx_cleanName()
idx_get_version()
wordlen()
Class: Doku_Indexer
Class that encapsulates operations on the indexer database.
| addPageWords($page, $text) |
| Adds the contents of a page to the fulltext index. The added text replaces previous words for the same page. An empty value erases the page. author: Tom N Harris <tnharris@whoopdedo.org> author: Andreas Gohr <andi@splitbrain.org> param: string $page a page name param: string $text the body of the page return: boolean the function completed successfully |
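The replace-and-erase rule can be pictured with a minimal in-memory sketch. It assumes a plain PHP array of the form `[word => [page => count]]` and a crude ASCII-only word splitter; `sketch_add_page_words` is a hypothetical name, not DokuWiki code, and the real indexer keeps this data in index files.

```php
<?php
/**
 * Hypothetical sketch, not DokuWiki's implementation: an in-memory
 * inverted index of the form [word => [page => count]].
 */
function sketch_add_page_words(array &$index, string $page, string $text): void
{
    // Remove every previous entry for this page, so the new text
    // replaces the old words instead of being merged with them.
    foreach ($index as $word => $pages) {
        unset($index[$word][$page]);
        if (empty($index[$word])) {
            unset($index[$word]); // drop words that no page uses any more
        }
    }

    // Crude ASCII-only word split (an assumption for this sketch).
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Record how often each word occurs on the page. An empty $text adds
    // nothing, which leaves the page erased from the index.
    foreach (array_count_values($words) as $word => $count) {
        $index[$word][$page] = $count;
    }
}

$index = [];
sketch_add_page_words($index, 'start', 'Wiki wiki page');
sketch_add_page_words($index, 'faq', 'page');
sketch_add_page_words($index, 'start', ''); // erase "start"
print_r($index); // only "page" => ["faq" => 1] remains
```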
| getPageWords($text) |
| Split the words in a page and add them to the index. author: Andreas Gohr <andi@splitbrain.org> author: Christopher Smith <chris@jalakai.co.uk> author: Tom N Harris <tnharris@whoopdedo.org> param: string $text content of the page return: array list of word IDs and number of times used |
| addMetaKeys($page, $key, $value=null) |
| Add or update keys in the metadata index. Adding new keys does not remove other keys for the page. An empty value will erase the key. The $key parameter can be an array to add multiple keys. $value will not be used if $key is an array. author: Tom N Harris <tnharris@whoopdedo.org> author: Michael Hamann <michael@content-space.de> param: string $page a page name param: mixed $key a key string or array of key=>value pairs param: mixed $value the value or list of values return: boolean the function completed successfully |
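A minimal sketch of these argument rules for a single page, assuming an in-memory `[key => list of values]` array in place of the metadata index files; `sketch_add_meta_keys` is hypothetical.

```php
<?php
/**
 * Hypothetical sketch of the addMetaKeys() argument rules for ONE page,
 * using an in-memory array [key => list of values] instead of index files.
 */
function sketch_add_meta_keys(array &$pageMeta, $key, $value = null): void
{
    // Accept either a single key plus $value, or an array of key => value
    // pairs (in which case $value is ignored).
    $pairs = is_array($key) ? $key : [$key => $value];

    foreach ($pairs as $k => $v) {
        if ($v === null || $v === '' || $v === []) {
            unset($pageMeta[$k]); // an empty value erases the key
        } else {
            $pageMeta[$k] = array_values((array) $v); // one value or a list of values
        }
        // Keys not mentioned in $pairs are left untouched.
    }
}

$meta = ['title' => ['Start']];
sketch_add_meta_keys($meta, 'tags', ['php', 'search']);
sketch_add_meta_keys($meta, ['title' => '']); // erase "title", keep "tags"
print_r($meta);
```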
| deletePage($page) |
| Remove a page from the index. Erases entries in all known indexes. author: Tom N Harris <tnharris@whoopdedo.org> param: string $page a page name return: boolean the function completed successfully |
| tokenizer($text, $wc=false) |
| Split the text into words for fulltext search. TODO: does this also need &$stopwords? author: Tom N Harris <tnharris@whoopdedo.org> author: Andreas Gohr <andi@splitbrain.org> triggers: INDEXER_TEXT_PREPARE param: string $text plain text param: boolean $wc are wildcards allowed? return: array list of words in the text |
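A simplified tokenizer sketch, assuming that a word is any run of Unicode letters or digits. `sketch_tokenize` is hypothetical: it skips stopwords, minimum lengths, Asian-character handling and the INDEXER_TEXT_PREPARE event, and it lowercases only ASCII.

```php
<?php
/**
 * Hypothetical tokenizer sketch, NOT DokuWiki's tokenizer(): it ignores
 * stopwords, minimum word lengths and the INDEXER_TEXT_PREPARE event.
 */
function sketch_tokenize(string $text, bool $wc = false): array
{
    // strtolower() only folds ASCII letters; real code needs UTF-8 case folding.
    $text = strtolower($text);

    // A word is a run of Unicode letters/digits; with $wc, a '*' directly
    // before or after the word is kept as a wildcard marker.
    $pattern = $wc ? '/\*?[\p{L}\p{N}]+\*?/u' : '/[\p{L}\p{N}]+/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(sketch_tokenize('Hello, wiki-World 42!')); // hello, wiki, world, 42
print_r(sketch_tokenize('wiki* *dok', true));     // wiki*, *dok
```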
| lookup(&$tokens) |
| Find pages in the fulltext index containing the words. The search words must be pre-tokenized, meaning only letters and numbers with an optional wildcard. The returned array will have the original tokens as keys. The values in the returned list are arrays with the page names as keys and the number of times that token appears on the page as values. author: Tom N Harris <tnharris@whoopdedo.org> author: Andreas Gohr <andi@splitbrain.org> param: arrayref $tokens list of words to search for return: array list of page names with usage counts |
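The return structure described above can be illustrated with an in-memory index. This sketch assumes `[word => [page => count]]` as the index layout and supports only exact tokens and a trailing `*`; summing the counts of several wildcard expansions per page is an assumption, and `sketch_lookup` is not DokuWiki's method.

```php
<?php
/**
 * Hypothetical sketch of the lookup() result shape over an in-memory
 * index [word => [page => count]]. Supports exact tokens and a trailing
 * '*' (prefix) wildcard only.
 */
function sketch_lookup(array $index, array $tokens): array
{
    $result = [];
    foreach ($tokens as $token) {
        $result[$token] = []; // every original token appears as a key
        $prefix = substr($token, -1) === '*';
        $stem = rtrim($token, '*');
        foreach ($index as $word => $pages) {
            $word = (string) $word; // numeric words come back as int keys
            $hit = $prefix ? strncmp($word, $stem, strlen($stem)) === 0 : $word === $stem;
            if (!$hit) {
                continue;
            }
            foreach ($pages as $page => $count) {
                // Assumption: counts of several wildcard expansions are summed.
                $result[$token][$page] = ($result[$token][$page] ?? 0) + $count;
            }
        }
    }
    return $result;
}

$index = [
    'wiki'  => ['start' => 2],
    'wikis' => ['start' => 1, 'faq' => 1],
    'page'  => ['faq' => 3],
];
print_r(sketch_lookup($index, ['wiki*', 'page', 'nope']));
```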
| lookupKey($key, &$value, $func=null) |
| Find pages containing a metadata key. The metadata values are compared as case-sensitive strings. Pass a callback function that returns true or false to use a different comparison function. The function will be called with the $value being searched for as the first argument, and the word in the index as the second argument. The function preg_match can be used directly if the values are regexes. author: Tom N Harris <tnharris@whoopdedo.org> author: Michael Hamann <michael@content-space.de> param: string $key name of the metadata key to look for param: mixed $value search term to look for, must be a string or array of strings param: callback $func comparison function return: array lists of page names; keys are the query values if $value is an array |
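A sketch of the comparison rules, assuming the index for one metadata key is held in memory as `[indexed value => list of pages]`. `sketch_lookup_key` is hypothetical; it shows the default case-sensitive comparison, the `$func($value, $indexedWord)` argument order, and `preg_match` used directly as the callback.

```php
<?php
/**
 * Hypothetical sketch of lookupKey()'s comparison rules over an in-memory
 * metadata index [indexed value => list of pages] for a single key.
 */
function sketch_lookup_key(array $metaIndex, $value, ?callable $func = null): array
{
    $result = [];
    foreach ((array) $value as $query) {
        $query = (string) $query;
        $pages = [];
        foreach ($metaIndex as $indexed => $indexedPages) {
            $indexed = (string) $indexed; // numeric keys come back as ints
            // Default: case-sensitive string equality. Otherwise call
            // $func(search value, indexed value), e.g. preg_match.
            $hit = $func === null ? $indexed === $query : (bool) $func($query, $indexed);
            if ($hit) {
                $pages = array_merge($pages, $indexedPages);
            }
        }
        $result[$query] = array_values(array_unique($pages));
    }
    // Keys are the query values only when an array was passed in.
    return is_array($value) ? $result : $result[(string) $value];
}

$meta = [
    'alice'  => ['wiki:start', 'wiki:syntax'],
    'bob'    => ['wiki:start'],
    'alfred' => ['playground'],
];
print_r(sketch_lookup_key($meta, 'bob'));                 // [wiki:start]
print_r(sketch_lookup_key($meta, '/^al/', 'preg_match')); // pages of alice and alfred
```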
| getIndexWords(&$words, &$result) |
| Find the index ID of each search term. The query terms should only contain valid characters, with a '*' at either the beginning or end of the word (or both). The $result parameter can be used to merge the index locations with the appropriate query term. author: Tom N Harris <tnharris@whoopdedo.org> param: arrayref $words The query terms. param: arrayref $result Set to word => array("length*id" ...) return: array Set to length => array(id ...) |
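Because the result is grouped by length, the words can be pictured as stored per word length. The sketch below assumes an in-memory `[length => [id => word]]` layout and the `length*id` strings mentioned above; an exact term only needs the bucket of its own length, while a wildcard term must scan every bucket at least as long as the term. Function names are hypothetical, and `strlen()` stands in for a proper word-length measure.

```php
<?php
/** Hypothetical matcher: '*' may appear at the start and/or end of a term. */
function sketch_wildcard_match(string $word, string $stem, bool $lead, bool $trail): bool
{
    if ($lead && $trail) {
        return strpos($word, $stem) !== false;             // *stem*
    }
    if ($trail) {
        return strncmp($word, $stem, strlen($stem)) === 0; // stem*
    }
    if ($lead) {
        return substr($word, -strlen($stem)) === $stem;    // *stem
    }
    return $word === $stem;                                // exact
}

/**
 * Hypothetical sketch of the getIndexWords() idea over [length => [id => word]].
 * Returns [length => list of ids]; fills $result with [term => list of "length*id"].
 * Terms must contain at least one character besides '*'.
 */
function sketch_get_index_words(array $wordsByLength, array $terms, &$result): array
{
    $found = [];  // length => [id => id], keyed by id to avoid duplicates
    $result = [];
    foreach ($terms as $term) {
        $stem = trim($term, '*');
        $lead = substr($term, 0, 1) === '*';
        $trail = substr($term, -1) === '*';
        $result[$term] = [];
        foreach ($wordsByLength as $len => $words) {
            // Exact terms only match their own length; wildcards need >=.
            if ($len < strlen($stem) || (!$lead && !$trail && $len !== strlen($stem))) {
                continue;
            }
            foreach ($words as $id => $word) {
                if (sketch_wildcard_match($word, $stem, $lead, $trail)) {
                    $found[$len][$id] = $id;
                    $result[$term][] = "$len*$id";
                }
            }
        }
    }
    return array_map('array_values', $found);
}

$words = [3 => ['foo', 'bar'], 6 => ['foobar']];
$ids = sketch_get_index_words($words, ['foo*', 'bar'], $result);
// $ids    === [3 => [0, 1], 6 => [0]]
// $result === ['foo*' => ['3*0', '6*0'], 'bar' => ['3*1']]
```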
| getPages($key=null) |
| Return a list of all pages. Warning: pages may not exist! author: Tom N Harris <tnharris@whoopdedo.org> param: string $key list only pages containing the metadata key (optional) return: array list of page names |
| histogram($min=1, $max=0, $minlen=3, $key=null) |
| Return a list of words sorted by number of times used. author: Tom N Harris <tnharris@whoopdedo.org> param: int $min bottom frequency threshold param: int $max upper frequency limit. No limit if $max<$min param: int $minlen minimum length of words to count param: string $key metadata key to list. Uses the fulltext index if not given return: array list of words as the keys and frequency as values |
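With the defaults `$min = 1` and `$max = 0`, `$max < $min` holds, so no upper limit applies. A sketch over precomputed frequencies, assuming an in-memory `[word => count]` array and most-frequent-first ordering; `sketch_histogram` is hypothetical.

```php
<?php
/**
 * Hypothetical sketch of histogram() over precomputed [word => count]
 * frequencies. Uses strlen() for the length test (ASCII assumption).
 */
function sketch_histogram(array $counts, int $min = 1, int $max = 0, int $minlen = 3): array
{
    $hist = [];
    foreach ($counts as $word => $count) {
        if (strlen((string) $word) < $minlen || $count < $min) {
            continue; // too short or too rare
        }
        if ($max >= $min && $count > $max) {
            continue; // above the upper limit (only enforced when $max >= $min)
        }
        $hist[$word] = $count;
    }
    arsort($hist); // most frequent first, keys preserved
    return $hist;
}

$counts = ['wiki' => 5, 'the' => 9, 'an' => 20, 'page' => 2];
print_r(sketch_histogram($counts));       // the => 9, wiki => 5, page => 2
print_r(sketch_histogram($counts, 1, 6)); // wiki => 5, page => 2
```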
| lock() |
| Lock the indexer. author: Tom N Harris <tnharris@whoopdedo.org> |
| unlock() |
| Release the indexer lock. author: Tom N Harris <tnharris@whoopdedo.org> |
| getIndex($idx, $suffix) |
| Retrieve the entire index. The $suffix argument is for an index that is split into multiple parts. Different index files should use different base names. author: Tom N Harris <tnharris@whoopdedo.org> param: string $idx name of the index param: string $suffix subpart identifier return: array list of lines without CR or LF |
| saveIndex($idx, $suffix, &$lines) |
| Replace the contents of the index with an array. author: Tom N Harris <tnharris@whoopdedo.org> param: string $idx name of the index param: string $suffix subpart identifier param: arrayref $lines list of lines without LF |
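A sketch of reading and replacing a line-per-entry index file, assuming index files named `<idx><suffix>.idx` in one directory (an illustrative naming scheme, not confirmed by this page). Writing to a temporary file and renaming it into place is a design choice made here so that readers never see a half-written index on POSIX filesystems. All function names are hypothetical.

```php
<?php
/** Hypothetical naming scheme, e.g. "w3.idx" for $idx "w" and $suffix "3". */
function sketch_index_path(string $dir, string $idx, string $suffix): string
{
    return $dir . '/' . $idx . $suffix . '.idx';
}

/** Read all lines of an index; a missing file is treated as an empty index. */
function sketch_get_index(string $dir, string $idx, string $suffix): array
{
    $path = sketch_index_path($dir, $idx, $suffix);
    if (!is_file($path)) {
        return [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return [];
    }
    // Strip a stray CR left by CRLF line endings.
    return array_map(function ($line) { return rtrim($line, "\r"); }, $lines);
}

/** Replace the whole index: write a temp file, then rename it into place. */
function sketch_save_index(string $dir, string $idx, string $suffix, array $lines): bool
{
    $path = sketch_index_path($dir, $idx, $suffix);
    $data = $lines ? implode("\n", $lines) . "\n" : '';
    if (file_put_contents($path . '.tmp', $data) === false) {
        return false;
    }
    return rename($path . '.tmp', $path);
}
```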
| getIndexKey($idx, $suffix, $id) |
| Retrieve a line from the index. author: Tom N Harris <tnharris@whoopdedo.org> param: string $idx name of the index param: string $suffix subpart identifier param: int $id the line number return: string a line with trailing whitespace removed |
| saveIndexKey($idx, $suffix, $id, $line) |
| Write a line into the index. author: Tom N Harris <tnharris@whoopdedo.org> param: string $idx name of the index param: string $suffix subpart identifier param: int $id the line number param: string $line line to write |
| addIndexKey($idx, $suffix, $value) |
| Retrieve or insert a value in the index. author: Tom N Harris <tnharris@whoopdedo.org> param: string $idx name of the index param: string $suffix subpart identifier param: string $value line to find in the index return: int line number of the value in the index |
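A minimal in-memory sketch of retrieve-or-insert, assuming the index is a plain list of lines and that line numbers are 0-based array positions. `sketch_add_index_key` is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: return the line number of $value in $lines,
 * appending it first if it is not there yet. $lines must be a plain list.
 */
function sketch_add_index_key(array &$lines, string $value): int
{
    $id = array_search($value, $lines, true); // strict: "01" does not match "1"
    if ($id === false) {
        $lines[] = $value;       // new values are only ever appended
        $id = count($lines) - 1;
    }
    return $id;
}

$pages = ['start', 'wiki:syntax'];
echo sketch_add_index_key($pages, 'wiki:syntax'), "\n"; // 1 (already present)
echo sketch_add_index_key($pages, 'playground'), "\n";  // 2 (appended)
```

Since new values are only appended in this sketch, existing line numbers stay stable, which is what allows a line number to serve as an ID.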
| cacheIndexDir($idx, $suffix, $delete=false) |
| No description |
| listIndexLengths() |
| Get the list of lengths indexed in the wiki. Reads the index directory or a cache file and returns a sorted array of lengths of the words used in the wiki. author: YoBoY <yoboy.leguesh@gmail.com> |
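A sketch of the directory scan, assuming one index file per word length named `w<length>.idx` (a hypothetical naming scheme) and leaving out the cache file; `sketch_list_index_lengths` is not DokuWiki's function.

```php
<?php
/**
 * Hypothetical sketch: collect the word lengths that have an index file
 * named "w<length>.idx" in $dir, sorted ascending. No cache file is used.
 */
function sketch_list_index_lengths(string $dir): array
{
    $lengths = [];
    foreach (scandir($dir) ?: [] as $file) {
        // Anchored pattern, so temp files such as "w4.idx.tmp" are ignored.
        if (preg_match('/^w(\d+)\.idx$/', $file, $m)) {
            $lengths[] = (int) $m[1];
        }
    }
    sort($lengths, SORT_NUMERIC); // numeric sort: 3, 5, 10 rather than 10, 3, 5
    return $lengths;
}
```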
| indexLengths($filter) |
| Get the word lengths that have been indexed. Reads the index directory and returns an array of lengths that there are indices for. author: YoBoY <yoboy.leguesh@gmail.com> |
| updateTuple($line, $id, $count) |
| Insert or replace a tuple in a line. author: Tom N Harris <tnharris@whoopdedo.org> |
| parseTuples(&$keys, $line) |
| Split a line into an array of tuples. author: Tom N Harris <tnharris@whoopdedo.org> author: Andreas Gohr <andi@splitbrain.org> |
| countTuples($line) |
| Sum the counts in a list of tuples. author: Tom N Harris <tnharris@whoopdedo.org> |
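The three tuple helpers above can be sketched together. The line format is an assumption here: `id*count` tuples joined by `:`, such as `0*3:2*5`, which resembles the `length*id` notation used by getIndexWords(). All function names are hypothetical.

```php
<?php
/** Hypothetical: insert or replace the tuple for $id; a count of 0 removes it. */
function sketch_update_tuple(string $line, string $id, int $count): string
{
    $kept = [];
    foreach (explode(':', $line) as $tuple) {
        if ($tuple === '') {
            continue;
        }
        [$tupleId] = explode('*', $tuple, 2);
        if ($tupleId !== $id) {
            $kept[] = $tuple; // keep tuples for other IDs unchanged
        }
    }
    if ($count > 0) {
        array_unshift($kept, "$id*$count");
    }
    return implode(':', $kept);
}

/** Hypothetical: split a line into [key => count], mapping IDs through $keys. */
function sketch_parse_tuples(array $keys, string $line): array
{
    $result = [];
    foreach (explode(':', $line) as $tuple) {
        $parts = explode('*', $tuple, 2);
        if (count($parts) !== 2 || !isset($keys[$parts[0]]) || (int) $parts[1] === 0) {
            continue; // skip empty, malformed, unknown or zero-count tuples
        }
        $result[$keys[$parts[0]]] = (int) $parts[1];
    }
    return $result;
}

/** Hypothetical: sum the counts of all tuples in a line. */
function sketch_count_tuples(string $line): int
{
    $sum = 0;
    foreach (explode(':', $line) as $tuple) {
        $parts = explode('*', $tuple, 2);
        $sum += isset($parts[1]) ? (int) $parts[1] : 0;
    }
    return $sum;
}

$pages = ['start', 'wiki:syntax', 'playground']; // line number => page
$line = sketch_update_tuple('', '0', 3);          // "0*3"
$line = sketch_update_tuple($line, '2', 5);       // "2*5:0*3"
$line = sketch_update_tuple($line, '0', 1);       // "0*1:2*5"
print_r(sketch_parse_tuples($pages, $line));      // start => 1, playground => 5
echo sketch_count_tuples($line), "\n";            // 6
```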
| idx_get_indexer() |
| Create an instance of the indexer. author: Tom N Harris <tnharris@whoopdedo.org> return: object a Doku_Indexer |
| idx_addPage($page, $verbose=false, $force=false) |
| No description |
| idx_lookup(&$words) |
| Find tokens in the fulltext index. Takes an array of words and returns a list of matching pages for each one. Important: No ACL checking is done here! All results are returned, regardless of permissions. param: arrayref $words list of words to search for return: array list of pages found, associated with the search terms |
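Since idx_lookup() does no ACL checking, callers have to filter the results themselves. A sketch of such a post-filter, assuming results of the form `[term => [page => count]]` and a caller-supplied `$canRead` callback that stands in for a real permission check (both assumptions, not DokuWiki APIs).

```php
<?php
/**
 * Hypothetical post-filter for lookup results of the assumed form
 * [term => [page => count]]. $canRead is NOT a DokuWiki function.
 */
function sketch_filter_readable(array $found, callable $canRead): array
{
    foreach ($found as $term => $pages) {
        foreach (array_keys($pages) as $page) {
            if (!$canRead((string) $page)) {
                unset($found[$term][$page]); // hide pages the user may not read
            }
        }
    }
    return $found;
}

$found = ['wiki' => ['start' => 2, 'private:notes' => 1]];
$public = function (string $page): bool {
    return strpos($page, 'private:') !== 0; // hypothetical permission rule
};
print_r(sketch_filter_readable($found, $public)); // wiki => [start => 2]
```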
| idx_tokenizer($string, $wc=false) |
| Split a string into tokens. |
| idx_getIndex($idx, $suffix) |
| Read the list of words in an index (if it exists). author: Tom N Harris <tnharris@whoopdedo.org> |
| idx_listIndexLengths() |
| Get the list of lengths indexed in the wiki. Reads the index directory or a cache file and returns a sorted array of lengths of the words used in the wiki. author: YoBoY <yoboy.leguesh@gmail.com> |
| idx_indexLengths($filter) |
| Get the word lengths that have been indexed. Reads the index directory and returns an array of lengths that there are indices for. author: YoBoY <yoboy.leguesh@gmail.com> |
| idx_cleanName($name) |
| Clean a name of a key for use as a file name. Romanizes non-latin characters, then strips away anything that's not a letter, number, or underscore. author: Tom N Harris <tnharris@whoopdedo.org> |
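A sketch of the two steps, using a tiny illustrative transliteration table in place of a real romanizer; the table and `sketch_clean_name` are assumptions, and characters missing from the table are simply dropped.

```php
<?php
/**
 * Hypothetical sketch of idx_cleanName(): romanize, then keep only ASCII
 * letters, digits and underscores. The table below is illustrative only;
 * a real romanizer covers far more scripts.
 */
function sketch_clean_name(string $name): string
{
    $romanize = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'é' => 'e'];
    $name = strtr($name, $romanize);
    // Everything else that is not [A-Za-z0-9_] is removed byte by byte.
    return preg_replace('/[^A-Za-z0-9_]/', '', $name);
}

echo sketch_clean_name('Größe über/alles'), "\n"; // Groesseueberalles
```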
| idx_get_version() |
| Version of the indexer taking into consideration the external tokenizer. The indexer is only compatible with data written by the same version. author: Tom N Harris <tnharris@whoopdedo.org> author: Michael Hamann <michael@content-space.de> triggers: INDEXER_VERSION_GET |
| wordlen($w) |
| Measure the length of a string. Differs from strlen in handling of Asian characters. author: Tom N Harris <tnharris@whoopdedo.org> |
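This page does not spell out DokuWiki's exact rule for Asian characters, so the sketch only illustrates the underlying difference: `strlen()` counts bytes, and a CJK character takes 3 bytes in UTF-8. `sketch_wordlen` is hypothetical and simply counts UTF-8 characters, which is not necessarily what wordlen() returns.

```php
<?php
/**
 * Hypothetical character counter, NOT DokuWiki's wordlen(): counts UTF-8
 * characters instead of bytes.
 */
function sketch_wordlen(string $w): int
{
    // '/./us' matches one UTF-8 character at a time (s: include newlines).
    $n = preg_match_all('/./us', $w);
    return $n === false ? strlen($w) : $n; // fall back to bytes on invalid UTF-8
}

echo strlen('索引'), ' ', sketch_wordlen('索引'), "\n"; // 6 2
```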
| Generated: Wed May 23 03:00:10 2012 | Cross-referenced by PHPXref 0.7 |