Manager

Back Back Up Advanced Editor

<?php

/**
 * Class for efficiently looking up and mapping string keys to string values, with limits.
 *
 * @package    WordPress
 * @since      6.6.0
 */

/**
 * WP_Token_Map class.
 *
 * Use this class in specific circumstances with a static set of lookup keys which map to
 * a static set of transformed values. For example, this class is used to map HTML named
 * character references to their equivalent UTF-8 values.
 *
 * This class works differently than code calling `in_array()` and other methods. It
 * internalizes lookup logic and provides helper interfaces to optimize lookup and
 * transformation. It provides a method for precomputing the lookup tables and storing
 * them as PHP source code.
 *
 * All tokens and substitutions must be shorter than 256 bytes.
 *
 * Example:
 *
 *     $smilies = WP_Token_Map::from_array( array(
 *         '8O' => '😯',
 *         ':(' => '🙁',
 *         ':)' => '🙂',
 *         ':?' => '😕',
 *      ) );
 *
 *      true  === $smilies->contains( ':)' );
 *      false === $smilies->contains( 'simile' );
 *
 *      '😕' === $smilies->read_token( 'Not sure :?.', 9, $length_of_smily_syntax );
 *      2    === $length_of_smily_syntax;
 *
 * ## Precomputing the Token Map.
 *
 * Creating the class involves some work sorting and organizing the tokens and their
 * replacement values. In order to skip this, it's possible for the class to export
 * its state and be used as actual PHP source code.
 *
 * Example:
 *
 *      // Export with four spaces as the indent, only for the sake of this docblock.
 *      // The default indent is a tab character.
 *      $indent = '    ';
 *      echo $smilies->precomputed_php_source_table( $indent );
 *
 *      // Output, to be pasted into a PHP source file:
 *      WP_Token_Map::from_precomputed_table(
 *          array(
 *              "storage_version" => "6.6.0",
 *              "key_length" => 2,
 *              "groups" => "",
 *              "long_words" => array(),
 *              "small_words" => "8O\x00:)\x00:(\x00:?\x00",
 *              "small_mappings" => array( "😯", "🙂", "🙁", "😕" )
 *          )
 *      );
 *
 * ## Large vs. small words.
 *
 * This class uses a short prefix called the "key" to optimize lookup of its tokens.
 * This means that some tokens may be shorter than or equal in length to that key.
 * Those words that are longer than the key are called "large" while those shorter
 * than or equal to the key length are called "small."
 *
 * This separation of large and small words is incidental to the way this class
 * optimizes lookup, and should be considered an internal implementation detail
 * of the class. It may still be important to be aware of it, however.
 *
 * ## Determining Key Length.
 *
 * The choice of the size of the key length should be based on the data being stored in
 * the token map. It should divide the data as evenly as possible, but should not create
 * so many groups that a large fraction of the groups only contain a single token.
 *
 * For the HTML5 named character references, a key length of 2 was found to provide a
 * sufficient spread and should be a good default for relatively large sets of tokens.
 *
 * However, for some data sets this might be too long. For example, a list of smilies
 * may be too small for a key length of 2. Perhaps 1 would be more appropriate. It's
 * best to experiment and determine empirically which values are appropriate.
 *
 * ## Generate Pre-Computed Source Code.
 *
 * Since the `WP_Token_Map` is designed for relatively static lookups, it can be
 * advantageous to precompute the values and instantiate a table that has already
 * sorted and grouped the tokens and built the lookup strings.
 *
 * This can be done with `WP_Token_Map::precomputed_php_source_table()`.
 *
 * Note that if there is a leading character that all tokens need, such as `&` for
 * HTML named character references, it can be beneficial to exclude this from the
 * token map. Instead, find occurrences of the leading character and then use the
 * token map to see if the following characters complete