<?php
/**
* Class for efficiently looking up and mapping string keys to string values, with limits.
*
* @package WordPress
* @since 6.6.0
*/
/**
* WP_Token_Map class.
*
* Use this class in specific circumstances with a static set of lookup keys which map to
* a static set of transformed values. For example, this class is used to map HTML named
* character references to their equivalent UTF-8 values.
*
* This class works differently from code that calls `in_array()` and similar lookup
* functions. It internalizes the lookup logic and provides helper interfaces to optimize
* lookup and transformation. It also provides a method for precomputing the lookup
* tables and storing them as PHP source code.
*
* All tokens and substitutions must be shorter than 256 bytes.
*
* Example:
*
*     $smilies = WP_Token_Map::from_array( array(
*         '8O' => '😯',
*         ':(' => '🙁',
*         ':)' => '🙂',
*         ':?' => '😕',
*     ) );
*
*     true  === $smilies->contains( ':)' );
*     false === $smilies->contains( 'simile' );
*
*     '😕' === $smilies->read_token( 'Not sure :?.', 9, $length_of_smily_syntax );
*     2    === $length_of_smily_syntax;
*
* ## Precomputing the Token Map.
*
* Creating the class involves some work sorting and organizing the tokens and their
* replacement values. To skip this work, the class can export its state as PHP source
* code that recreates the token map directly.
*
* Example:
*
*     // Export with four spaces as the indent, only for the sake of this docblock.
*     // The default indent is a tab character.
*     $indent = '    ';
*     echo $smilies->precomputed_php_source_table( $indent );
*
*     // Output, to be pasted into a PHP source file:
*     WP_Token_Map::from_precomputed_table(
*         array(
*             "storage_version" => "6.6.0",
*             "key_length" => 2,
*             "groups" => "",
*             "long_words" => array(),
*             "small_words" => "8O\x00:)\x00:(\x00:?\x00",
*             "small_mappings" => array( "😯", "🙂", "🙁", "😕" )
*         )
*     );
*
* ## Large vs. small words.
*
* This class uses a short prefix called the "key" to optimize lookup of its tokens.
* This means that some tokens may be shorter than or equal in length to that key.
* Those words that are longer than the key are called "large" while those shorter
* than or equal to the key length are called "small."
*
* This separation of large and small words is incidental to the way this class
* optimizes lookup, and should be considered an internal implementation detail
* of the class. It may still be important to be aware of it, however.
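*
* As a rough illustration, with a key length of 2 the split depends only on the byte
* length of each token. The snippet below is a standalone sketch with made-up sample
* tokens; it does not call into the class or show its internal storage.
*
*     $key_length = 2;
*     foreach ( array( ':)', '8O', ':mrgreen:' ) as $token ) {
*         // `strlen()` counts bytes, matching the byte-based limits described above.
*         $kind = strlen( $token ) > $key_length ? 'large' : 'small';
*         echo "{$token} is a {$kind} word.\n";
*     }
*
*     // Prints:
*     // :) is a small word.
*     // 8O is a small word.
*     // :mrgreen: is a large word.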
*
* ## Determining Key Length.
*
* The key length should be chosen based on the data being stored in the token map.
* It should divide the data as evenly as possible, but should not create so many
* groups that a large fraction of them contain only a single token.
*
* For the HTML5 named character references, a key length of 2 was found to provide a
* sufficient spread and should be a good default for relatively large sets of tokens.
*
* However, for some data sets this might be too long. For example, a list of smilies
* may be too small for a key length of 2. Perhaps 1 would be more appropriate. It's
* best to experiment and determine empirically which values are appropriate.
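*
* For instance, a small map of smilies could be built with a key length of 1. This is
* only a sketch: it assumes that `WP_Token_Map::from_array()` accepts the key length
* as an optional second argument, which the examples above do not show.
*
*     // Assumed signature: from_array( array $mappings, int $key_length = 2 ).
*     $smilies = WP_Token_Map::from_array(
*         array(
*             ':(' => '🙁',
*             ':)' => '🙂',
*         ),
*         1
*     );
*
*     // Lookups are expected to return the same results regardless of the key length.
*     true === $smilies->contains( ':)' );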
*
* ## Generating Precomputed Source Code.
*
* Since the `WP_Token_Map` is designed for relatively static lookups, it can be
* advantageous to precompute the values and instantiate a table that has already
* sorted and grouped the tokens and built the lookup strings.
*
* This can be done with `WP_Token_Map::precomputed_php_source_table()`.
*
* Note that if there is a leading character that all tokens need, such as `&` for
* HTML named character references, it can be beneficial to exclude this from the
* token map. Instead, find occurrences of the leading character and then use the
* token map to see if the following characters complete