Flourish PHP Unframework

fUTF8

fUTF8 is a static class that provides UTF-8-compatible versions of almost every string function built into PHP. Because UTF-8 encodes some characters with multiple bytes and PHP's built-in string functions assume single-byte encodings, many of those functions produce incorrect results on UTF-8 strings.
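The mismatch between bytes and characters can be seen with core PHP alone. The following is a minimal sketch, not taken from Flourish; it assumes only that PCRE was built with UTF-8 support (needed for the /u modifier), and the variable names are illustrative.

```php
<?php
// "café", with é (U+00E9) written as its two UTF-8 bytes 0xC3 0xA9
$string = "caf\xC3\xA9";

// strlen() counts bytes, so it reports 5 for this 4-character string
$byte_length = strlen($string);

// Counting matches of "any character" in PCRE's UTF-8 mode (/u)
// gives the number of characters instead
$char_length = preg_match_all('/./us', $string, $matches);

// strrev() reverses bytes, which splits the two-byte sequence for é
// and leaves a string that is no longer valid UTF-8
$reversed = strrev($string);

// preg_match() with /u returns 1 only when the subject is valid UTF-8
$reversed_is_valid = (preg_match('//u', $reversed) === 1);

var_dump($byte_length);        // int(5)
var_dump($char_length);        // int(4)
var_dump($reversed_is_valid);  // bool(false)
```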

PHP has an extension called mbstring that is designed for multi-byte string encodings; however, it is not installed by default, does not include many commonly used functions, and contains some bugs. When mbstring is installed, fUTF8 uses it in appropriate situations for better performance.

Method to Function Mapping

The table below lists the built-in PHP string functions alongside the equivalent fUTF8 methods, together with any additional features or differences.

| PHP Function | fUTF8 Method | Differences |
|---|---|---|
| chr() | chr() | Accepts U+hex or decimal Unicode code point instead of ASCII decimal value |
| explode() | explode() | Parameter order is switched to $string, $delimiter; also accepts a NULL delimiter to explode into characters |
| ltrim() | ltrim() | |
| ord() | ord() | Returns U+hex Unicode code point instead of ASCII decimal value |
| rtrim() | rtrim() | |
| str_ireplace() | ireplace() | |
| str_pad() | pad() | |
| str_replace() | replace() | |
| strcasecmp() | icmp() | ASCII letters with diacritics are sorted right after the base ASCII letter |
| strcmp() | cmp() | ASCII letters with diacritics are sorted right after the base ASCII letter |
| stripos() | ipos() | |
| stristr() | istr() | |
| strlen() | len() | |
| strnatcasecmp() | inatcmp() | ASCII letters with diacritics are sorted right after the base ASCII letter |
| strnatcmp() | natcmp() | ASCII letters with diacritics are sorted right after the base ASCII letter |
| strpos() | pos() | |
| strrev() | rev() | |
| strripos() | irpos() | |
| strrpos() | rpos() | |
| strstr() | str() | |
| strtolower() | lower() | |
| strtoupper() | upper() | |
| substr() | sub() | |
| trim() | trim() | |
| ucfirst() | ucfirst() | |
| ucwords() | ucwords() | |
| wordwrap() | wordwrap() | |
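
As a rough illustration of the mapping above, the sketch below calls a few of the listed methods. It assumes the Flourish classes are already loaded or autoloadable. The function name demo_futf8_methods() and the array keys are hypothetical and exist only for this example. Return values are not shown because they depend on the library.

```php
<?php
// Sketch only: assumes Flourish is loaded or autoloadable. If the fUTF8
// class cannot be found, the function returns NULL and does nothing else.
function demo_futf8_methods($string)
{
    if (!class_exists('fUTF8')) {
        return NULL;
    }

    return array(
        // Counterpart of strlen()
        'length'     => fUTF8::len($string),
        // Note the switched parameter order: string first, then delimiter
        'words'      => fUTF8::explode($string, ' '),
        // A NULL delimiter explodes the string into characters
        'characters' => fUTF8::explode($string, NULL),
        // Counterpart of strtoupper()
        'upper'      => fUTF8::upper($string),
        // chr() accepts a code point in U+hex notation
        'e_acute'    => fUTF8::chr('U+00E9'),
        // ord() returns the code point for a character
        'code_point' => fUTF8::ord("\xC3\xA9"),
    );
}

// "crème brûlée" written with explicit UTF-8 byte sequences
$result = demo_futf8_methods("cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e");
```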

Cleaning Strings (Security)

Due to the way that UTF-8 is implemented, certain byte sequences are not allowed. Allowing such invalid data into a system can easily lead to bugs in any code that parses characters. To solve this issue, the clean() method removes any malformed UTF-8 characters from a string.

This method should be used when importing data into a system from an external data source that may contain invalid data. Please note that fRequest::get() and fCookie::get() call this method automatically, so values obtained through them do not need to be cleaned again.

$cleaned_string = fUTF8::clean($imported_string);
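
To make "malformed" concrete, the sketch below detects a few invalid byte sequences using core PHP only. It does not show how clean() is implemented. The helper is_well_formed_utf8() is hypothetical, and the check assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper, not part of Flourish: reports whether a string is
// well-formed UTF-8 by asking PCRE to process it in UTF-8 mode.
function is_well_formed_utf8($string)
{
    // With the /u modifier, preg_match() returns 1 for this empty pattern
    // on valid input and fails when the subject is not valid UTF-8
    return preg_match('//u', $string) === 1;
}

$valid     = "na\xC3\xAFve";  // "naïve", with ï encoded as 0xC3 0xAF
$truncated = "na\xC3";        // lead byte with no continuation byte
$overlong  = "\xC0\xAF";      // overlong encoding of "/", not allowed in UTF-8

var_dump(is_well_formed_utf8($valid));      // bool(true)
var_dump(is_well_formed_utf8($truncated));  // bool(false)
var_dump(is_well_formed_utf8($overlong));   // bool(false)
```

A check like this only reports a problem; removing the offending bytes is the job of fUTF8::clean().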