You may need to extract part of a string when you work with text in PHP. The mb_substr() function gives you better results than substr() when the text contains multibyte characters, such as non-ASCII text encoded in UTF-8.
Table of Contents
- Understand the PHP mb_substr Function
- Handle Multibyte Strings with PHP mb_substr Function
- Use Negative Start and Length Values in PHP mb_substr Function
- The Difference Between mb_substr() and mb_strcut()
- How to Detect Encoding Before Using mb_substr
- How mb_substr() Works with Invalid Encoding
- How to Set Default Encoding Globally
- Supported PHP Versions and Requirements
- Wrapping Up
- FAQs
Understand the PHP mb_substr Function
The mb_substr() function is a multibyte-safe substr operation in PHP. It lets you extract a substring from a text. It counts characters, not bytes, which makes it a good fit for non-ASCII text in encodings like UTF-8.
By default, it uses PHP's internal encoding, the value reported by mb_internal_encoding(). You can override it by passing a different encoding as the fourth argument.
Here is the syntax:
mb_substr($string, $start, $length, $encoding)
- $string refers to the input text.
- $start is the character position, where 0 is the first character.
- $length is the maximum number of characters to extract.
- $encoding is the optional character set, like "UTF-8".
Here are the key differences from substr():
- substr() returns the portion based on bytes, which can break multibyte characters.
- mb_substr() returns a clean part of a string even when working with non-ASCII text.
- If you set $length to null, mb_substr() extracts from the start position to the end of the string. Passing null explicitly lets you skip the length limit and still supply the encoding argument.
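The null-length behavior described in the last point can be sketched as follows. This is a minimal illustration that assumes a current PHP version with mbstring enabled; the sample sentence is made up for the example.

```php
<?php
$text = "Mañana será otro día";

// Omitting $length reads from the start position to the end of the string.
$tail = mb_substr($text, 7);

// Passing null for $length does the same and still lets you
// supply the encoding as the fourth argument.
$tailExplicit = mb_substr($text, 7, null, "UTF-8");

echo $tail, PHP_EOL;         // será otro día
echo $tailExplicit, PHP_EOL; // será otro día
```

Both calls return the same substring as long as the internal encoding is UTF-8; the second form does not depend on that default.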
Use mb_substr() to avoid broken output. It reads characters, not bytes, so it doesn’t split letters like ñ, ö, or 日.
Here is a quick example:
$text = "Mañana";
echo mb_substr($text, 0, 3);
Output:
Mañ
Here:
- It starts at character position 0.
- It extracts 3 characters, not bytes.
- It uses the default internal encoding, which is UTF-8 unless you have configured something else.
- So, mb_substr() gets part of the string safely.
Handle Multibyte Strings with PHP mb_substr Function
UTF-8 uses more than one byte for some characters. For example, “日” uses three bytes. If you use substr(), it might grab only the first byte. That gives invalid text.
mb_substr() solves this. It reads whole characters. You do not need to know how many bytes each character takes. Just work with the start and length parameters as you would with plain ASCII text.
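To make the byte-versus-character difference concrete, here is a small comparison. It is a sketch that assumes the source file is saved as UTF-8 and that mbstring is enabled; the sample string is chosen only for illustration.

```php
<?php
$text = "日本語";

// substr() counts bytes: 1 byte is only a third of "日".
$bytes = substr($text, 0, 1);

// mb_substr() counts characters: 1 character is the whole "日".
$chars = mb_substr($text, 0, 1, "UTF-8");

var_dump(strlen($text));                      // int(9), three bytes per character
var_dump(mb_check_encoding($bytes, "UTF-8")); // bool(false), a broken fragment
var_dump(mb_check_encoding($chars, "UTF-8")); // bool(true)
echo $chars, PHP_EOL;                         // 日
```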
Always make sure the function reads your text with the encoding it is actually stored in. If the default does not match, pass the encoding directly in the function:
mb_substr($text, 0, 5, "UTF-8");
Set it as the default if you deal with UTF-8:
mb_internal_encoding("UTF-8");
Use Negative Start and Length Values in PHP mb_substr Function
You can pass negative numbers to mb_substr(). This counts from the end of the text.
- Negative start: counts from the last character backward
- Negative length: leaves out characters at the end
Example:
$text = "Hello World";
// Get last 5 characters
echo mb_substr($text, -5); // Output: World
// Skip last 3 characters
echo mb_substr($text, 0, -3); // Output: Hello Wo
This helps when you need to extract the substring near the end.
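The same rules apply to multibyte text, where counting from the end by bytes would be error-prone. A short sketch, assuming a UTF-8 source file and the mbstring extension:

```php
<?php
$text = "こんにちは世界"; // 7 characters, 21 bytes in UTF-8

// Last 2 characters
echo mb_substr($text, -2, null, "UTF-8"), PHP_EOL; // 世界

// Everything except the last 2 characters
echo mb_substr($text, 0, -2, "UTF-8"), PHP_EOL;    // こんにちは

// Start 4 characters from the end, take 2 characters
echo mb_substr($text, -4, 2, "UTF-8"), PHP_EOL;    // ちは
```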
The Difference Between mb_substr() and mb_strcut()
Both functions return a piece of the string. But they count in different ways.
- mb_substr() counts characters
- mb_strcut() counts bytes
This matters when you need a certain number of bytes instead of characters. You might do this when you work with database or file size limits.
For example:
$text = "あいうえお"; // each character takes 3 bytes in UTF-8
// 3 characters
echo mb_substr($text, 0, 3); // Output: あいう
// Up to 7 bytes: only two whole characters (6 bytes) fit
echo mb_strcut($text, 0, 7); // Output: あい
Use mb_strcut() only when you need to stay under a byte limit. For normal text handling, stick with mb_substr().
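A typical byte-limit case is a storage field that accepts a fixed number of bytes. The helper below is a hypothetical sketch, not part of PHP: the name fit_to_bytes and the 10-byte limit are assumptions made for the example.

```php
<?php
// Hypothetical helper: trims $text so it fits in $maxBytes bytes
// without splitting a multibyte character.
function fit_to_bytes(string $text, int $maxBytes, string $encoding = "UTF-8"): string
{
    return mb_strcut($text, 0, $maxBytes, $encoding);
}

$text = "あいうえお";         // 5 characters, 15 bytes in UTF-8
$short = fit_to_bytes($text, 10);

echo $short, PHP_EOL;         // あいう
echo strlen($short), PHP_EOL; // 9
```

The result is 9 bytes rather than 10 because a fourth character would need 12 bytes, and mb_strcut() stops at the last whole character that fits.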
How to Detect Encoding Before Using mb_substr
PHP provides the mb_detect_encoding() function to guess the character encoding of a string. It checks the string against a list of encodings and returns the one that matches best.
For example:
$text = "こんにちは世界";
// Japanese greeting that means Hello, World
$encoding = mb_detect_encoding($text, mb_list_encodings(), true);
if ($encoding) {
    $substring = mb_substr($text, 0, 5, $encoding);
    echo $substring;
} else {
    echo "Encoding could not be detected.";
}
The output:
こんにちは
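- mb_list_encodings() returns a list of supported encodings. Passing every supported encoding makes false matches more likely, so a short list of the encodings you actually expect is usually more reliable.
- The third argument (strict) improves detection accuracy by verifying that the string strictly matches the encoding.
- Always pass the detected encoding to mb_substr.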
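One common follow-up is to normalize the input to UTF-8 once and then slice it. The sketch below assumes the input is either UTF-8 or ISO-8859-1; the helper name to_utf8 and the candidate list are hypothetical choices for this example, not part of mbstring.

```php
<?php
// Hypothetical helper: returns $text as UTF-8, or null if none of the
// candidate encodings matches in strict mode.
function to_utf8(string $text, array $candidates = ["UTF-8", "ISO-8859-1"]): ?string
{
    $encoding = mb_detect_encoding($text, $candidates, true);
    if ($encoding === false) {
        return null;
    }
    return mb_convert_encoding($text, "UTF-8", $encoding);
}

$latin1 = "Ma\xF1ana"; // "Mañana" stored as ISO-8859-1, not valid UTF-8
$utf8 = to_utf8($latin1);

if ($utf8 !== null) {
    echo mb_substr($utf8, 0, 3, "UTF-8"), PHP_EOL; // Mañ
}
```

Detection is still a guess. When the source of the text declares its encoding (an HTTP header, a database column charset), that information is more trustworthy than detection.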
How mb_substr() Works with Invalid Encoding
The mb_substr() function needs valid multibyte encoding to read the string correctly. If the string has bad byte sequences, the function may return an empty result, cut the string in the wrong spot, or show a warning.
By default, mb_substr() follows the encoding set by mb_internal_encoding(). If that encoding does not match the string, the function may read it wrong and return the wrong result.
To avoid this, handle encoding carefully:
- Always pass the correct encoding in the fourth parameter.
- Use mb_check_encoding() to check if the string is valid before you pass it.
- Use mb_convert_encoding() to fix the input before calling mb_substr().
Here is a quick example:
$str = "\xE3\x81"; // broken UTF-8 byte sequence
echo mb_substr($str, 0, 1, 'UTF-8'); // may return an empty string or unreadable output
To avoid that problem, check the encoding first:
if (mb_check_encoding($str, 'UTF-8')) {
    echo mb_substr($str, 0, 1, 'UTF-8');
} else {
    echo "Invalid encoding detected.";
}
This way, you avoid bad output and stop the function from breaking.
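If you would rather repair the input than reject it, the mb_convert_encoding() approach mentioned earlier can be sketched like this. It assumes the text is meant to be UTF-8; the sample bytes are made up, and the exact replacement characters depend on the mbstring substitute character setting.

```php
<?php
$raw = "abc\xE3\x81"; // "abc" followed by a truncated UTF-8 sequence

if (!mb_check_encoding($raw, "UTF-8")) {
    // Converting UTF-8 to UTF-8 replaces invalid bytes with the
    // substitute character (usually "?").
    $raw = mb_convert_encoding($raw, "UTF-8", "UTF-8");
}

var_dump(mb_check_encoding($raw, "UTF-8"));   // bool(true)
echo mb_substr($raw, 0, 3, "UTF-8"), PHP_EOL; // abc
```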
How to Set Default Encoding Globally
You don’t need to pass the encoding manually to every multibyte function. You can set a global default with mb_internal_encoding() and mb_http_output().
Set internal encoding:
mb_internal_encoding("UTF-8");
After this call, mbstring functions use UTF-8 whenever you omit the encoding argument. The setting stays in effect until you set a different encoding.
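A short sketch of the effect, assuming mbstring is enabled and the source file is saved as UTF-8. Calling mb_internal_encoding() without arguments returns the current setting.

```php
<?php
mb_internal_encoding("UTF-8");

echo mb_internal_encoding(), PHP_EOL;   // UTF-8

// No encoding argument needed: both calls use the internal encoding.
echo mb_substr("Größe", 0, 3), PHP_EOL; // Grö
echo mb_strlen("Größe"), PHP_EOL;       // 5
```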
Set HTTP output encoding:
mb_http_output("UTF-8");
Use mb_http_output() to set the character encoding for output (like echo or print) when output buffering or conversion is enabled.
Supported PHP Versions and Requirements
You can use mb_substr() if the mbstring extension is active.
- It works in PHP 4.0.6 and above
- It runs on PHP 7 and PHP 8
- You must enable mbstring in your php.ini
To check if it is active, run this:
var_dump(extension_loaded('mbstring')); // Prints bool(true) or bool(false)
If it is not installed, you can add it:
- On Ubuntu: sudo apt install php-mbstring
- On Windows: enable extension=mbstring in php.ini
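In application code, the same check can act as an early guard. This is one possible pattern, not a requirement; the exception type and message are choices made for the example.

```php
<?php
// Stop early with a clear message instead of failing later with
// "Call to undefined function mb_substr()".
if (!extension_loaded("mbstring")) {
    throw new RuntimeException("The mbstring extension is required.");
}

echo mb_substr("Mañana", 0, 3, "UTF-8"), PHP_EOL; // Mañ
```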
Wrapping Up
Use mb_substr() when your text includes multibyte characters. It reads characters the right way and avoids broken letters.
It helps with:
- UTF-8 and other non-ASCII text
- Extracting parts of a text by character position
- Negative start and length values
- Avoiding bugs from byte-based slicing
The mb_substr() function is the right tool when you need to slice text by character. It works better than substr() when your data uses any language beyond English.
The start and length parameters count characters, not bytes. So mb_substr() gives you exact control over what you extract.