tx_indexedsearch_lexer Class Reference


Public Member Functions

 tx_indexedsearch_lexer ()
 split2Words ($wordString)
 addWords (&$words, &$wordString, $start, $len)
 get_word (&$str, $pos=0)
 utf8_is_letter (&$str, &$len, $pos=0)
 charType ($cp)
 utf8_ord (&$str, &$len, $pos=0, $hex=false)

Public Attributes

 $debug = FALSE
 $debugString = ''
 $csObj
 $lexerConf


Detailed Description

Definition at line 73 of file class.lexer.php.


Member Function Documentation

tx_indexedsearch_lexer::tx_indexedsearch_lexer ()

Constructor. Initializes the charset class, t3lib_cs.

Returns:
void

Definition at line 105 of file class.lexer.php.

tx_indexedsearch_lexer::split2Words ( $wordString )

Splits a string into words. Used for indexing; can also be used to find the words in a query.

Parameters:
string $wordString: String with UTF-8 content to process.
Returns:
array Array of words in UTF-8.

Definition at line 116 of file class.lexer.php.
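
The splitting described above can be illustrated with a minimal sketch. It is written in Python rather than PHP and is not the code from class.lexer.php: the name split_to_words is hypothetical, and the sketch assumes that a word is a maximal run of Unicode letters or digits. The special treatment of CJK sequences that addWords provides is left out.

```python
import re

# Hypothetical stand-in for split2Words; NOT the TYPO3 implementation.
# Assumption: a "word" is a maximal run of Unicode letters or digits.
_WORD_RE = re.compile(r"[^\W_]+")


def split_to_words(word_string: str) -> list:
    """Return every word found in word_string, in order of appearance."""
    return _WORD_RE.findall(word_string)
```

Called on a string that mixes words and punctuation, the function returns only the words, which is the shape of result the "Array of words" return value above describes.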

tx_indexedsearch_lexer::addWords ( &$words, &$wordString, $start, $len )

Adds a word to the word array. Use this function to make sure CJK sequences are split up correctly.

Parameters:
array &$words: Array of accumulated words
string &$wordString: Complete input string from which to extract the word
integer $start: Start position of the word in the input string
integer $len: Length of the word string, counted from the start position
Returns:
void

Definition at line 178 of file class.lexer.php.

tx_indexedsearch_lexer::get_word ( &$str, $pos = 0 )

Gets the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters:
string &$str: Input string (reference)
integer $pos: Starting position in the input string
Returns:
array Array with 0: start and 1: length of the word, or false if no word has been found

Definition at line 239 of file class.lexer.php.
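
The return convention described above (start and length, or false) can be sketched as follows. This is a Python illustration, not the PHP code from class.lexer.php. It assumes that digits count as word characters alongside letters, and it works on character positions of a Python string, whereas the original operates on a UTF-8 string passed by reference.

```python
def get_word(text: str, pos: int = 0):
    """Locate the first word at or after pos.

    Returns [start, length], or False if no word is found.
    Assumption: a word is a run of letters or digits.
    """
    n = len(text)
    start = pos
    # Skip initial non-word characters.
    while start < n and not text[start].isalnum():
        start += 1
    if start >= n:
        return False
    # Extend to the end of the run of word characters.
    end = start
    while end < n and text[end].isalnum():
        end += 1
    return [start, end - start]
```

A caller can walk through a string by feeding start + length back in as the next pos until the function returns False.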

tx_indexedsearch_lexer::utf8_is_letter ( &$str, &$len, $pos = 0 )

Checks whether a character is a letter (or a string of letters or non-letters).

Parameters:
string &$str: Input string (reference)
integer &$len: Byte length of the character sequence (reference, used as a return value)
integer $pos: Starting position in the input string
Returns:
boolean True if a letter (or word) was found

Definition at line 264 of file class.lexer.php.

tx_indexedsearch_lexer::charType ( $cp )

Determines the type of a character.

Parameters:
integer $cp: Unicode code point to evaluate
Returns:
array Type of character; index 0 holds the main type: num, alpha or CJK (Chinese / Japanese / Korean)

Definition at line 329 of file class.lexer.php.
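
A classification of this kind could look like the Python sketch below. It is not the code from class.lexer.php: the ranges that TYPO3 actually tests are not shown in this reference, so the sketch assumes ASCII digits for "num", a few well-known Unicode blocks for "cjk" (CJK Unified Ideographs, Hiragana and Katakana, Hangul Syllables), and Python's own notion of a letter for "alpha". The lowercase type strings and the empty list for other characters are assumptions as well.

```python
# Assumed CJK blocks; the ranges used by the real lexer may differ.
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
)


def char_type(cp: int) -> list:
    """Return a list whose index 0 is the main type of code point cp."""
    if 0x30 <= cp <= 0x39:
        return ["num"]
    # Test CJK before the generic letter check, because ideographs
    # also count as letters for str.isalpha().
    if any(low <= cp <= high for low, high in CJK_RANGES):
        return ["cjk"]
    if chr(cp).isalpha():
        return ["alpha"]
    return []  # assumption: no type for punctuation, whitespace, etc.
```

The order of the checks matters in this sketch: moving the letter test ahead of the CJK test would classify every ideograph as "alpha".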

tx_indexedsearch_lexer::utf8_ord ( &$str, &$len, $pos = 0, $hex = false )

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters:
string &$str: UTF-8 multibyte character string (reference)
integer &$len: Length of the character (reference, used as a return value)
integer $pos: Starting position in the input string
boolean $hex: If set, a hexadecimal number is returned
Returns:
integer Unicode code point

Definition at line 383 of file class.lexer.php.
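
The conversion can be illustrated with standard UTF-8 decoding. The Python sketch below is not the PHP code from class.lexer.php. Because Python has no by-reference integers, it returns the byte length together with the code point instead of writing it to a $len argument. The format of the hexadecimal result (a lowercase string without prefix) is an assumption.

```python
def utf8_ord(data: bytes, pos: int = 0, as_hex: bool = False):
    """Decode the UTF-8 character that starts at byte offset pos.

    Returns (code_point, byte_length). Overlong encodings are not
    rejected; this is a sketch, not a validating decoder.
    """
    first = data[pos]
    if first < 0x80:                # 0xxxxxxx: one byte (ASCII)
        length, cp = 1, first
    elif first >> 5 == 0b110:       # 110xxxxx: two bytes
        length, cp = 2, first & 0x1F
    elif first >> 4 == 0b1110:      # 1110xxxx: three bytes
        length, cp = 3, first & 0x0F
    elif first >> 3 == 0b11110:     # 11110xxx: four bytes
        length, cp = 4, first & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
    if pos + length > len(data):
        raise ValueError("truncated UTF-8 sequence at offset %d" % pos)
    for byte in data[pos + 1:pos + length]:
        if byte >> 6 != 0b10:       # continuation bytes are 10xxxxxx
            raise ValueError("invalid UTF-8 continuation byte")
        cp = (cp << 6) | (byte & 0x3F)
    return (format(cp, "x") if as_hex else cp), length
```

Returning the length alongside the code point lets a caller advance pos by exactly one character, which is the role the $len reference parameter plays in the signature above.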


The documentation for this class was generated from the following file: class.lexer.php
This documentation has been generated automatically from TYPO3 source code using Doxygen and is provided as is by Cast Iron Coding.