It might not be Windows per se, but PCRE and how it's compiled in. From the PCRE docs (http://www.pcre.org/pcre.txt), under "Generic character types":
- Character classes like \s normally include only ASCII characters.
- If PCRE is compiled with Unicode support and it's called with the PCRE_UCP option (which PHP does), the character classes also include non-ASCII characters, e.g. 'A0' (the full list is given in the docs).
Finally, there is this paragraph: "In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are relevant."
That seems to imply that, when not in UTF-8 mode, PCRE compiled with Unicode support and called with PCRE_UCP will match '\s' against two additional bytes: 'A0' (NBSP) and '85' (next line).
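A quick way to check what a given build actually does is a small probe like the one below. This is only a sketch: the function name probe_whitespace() and the case labels are made up for illustration, and the results depend on the PCRE build and how PHP calls it, so no expected output is given.

```php
<?php
// Hypothetical probe: reports whether \s splits on the bytes discussed above.
// Results vary with the PCRE version/build, so nothing is assumed here.
function probe_whitespace()
{
    $cases = array(
        'byte A0, no /u' => array('/\s/',  "a\xA0b"),
        'byte 85, no /u' => array('/\s/',  "a\x85b"),
        'C2 A0, with /u' => array('/\s/u', "a\xC2\xA0b"),
    );
    $results = array();
    foreach ($cases as $label => $case) {
        list($pattern, $subject) = $case;
        $parts = preg_split($pattern, $subject);
        // preg_split() returns false on error (e.g. a bad subject under /u).
        if ($parts === false) {
            $results[$label] = 'error';
        } else {
            $results[$label] = (count($parts) > 1) ? 'split' : 'no split';
        }
    }
    return $results;
}

echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, ', SAPI ', PHP_SAPI, "\n";
foreach (probe_whitespace() as $label => $result) {
    echo $label, ': ', $result, "\n";
}
```

Running it both from the CLI and through the web server should show whether the two SAPIs disagree on the same machine.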
Also see the PCRE changelog, http://www.pcre.org/changelog.txt
for Version 8.10 (25-Jun-2010), point 20.
And bug reports, https://bugs.php.net/bug.php?id=52971
- Windows binaries for PHP are supplied with the bundled PCRE library compiled in.
- My PHP versions, the OS X-supplied 5.3.26 (PCRE 8.02), MacPorts 5.5.5 (PCRE 8.33), and OpenIndiana (Solaris/illumos) 5.2.17 (PCRE 8.02), don't use the bundled library.
Running something similar to yonizaf's small test script, I get the following weird results.
All three, when running under a web server (two use mod_php, and 5.5.5 is CGI), do not split the string at 'A0' when not using the '/u' flag.
The two OS X versions split it at any 'A0' when run from the CLI (!); the Solaris version doesn't.
The two OS X versions split the string at a valid UTF-8 non-breaking space (C2 A0) when using '/u'. The Solaris version doesn't.
All of this weirdness argues against using '\s' anywhere.
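One way to avoid '\s' is to spell out the whitespace bytes you actually mean. A minimal sketch, assuming only ASCII whitespace should separate tokens (split_ascii_ws() is a made-up helper name, not anything from PHP or PCRE):

```php
<?php
// Hypothetical helper: split on an explicit ASCII whitespace set instead of \s,
// so the result does not depend on how PCRE was built or called.
function split_ascii_ws($text)
{
    // Space, tab, LF, CR, form feed, vertical tab; no byte above 0x7F.
    return preg_split('/[ \t\n\r\f\x0B]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// The A0 byte is not in the class, so "bar\xA0baz" stays in one piece.
var_dump(split_ascii_ws("foo \t bar\xA0baz"));
```

If a non-breaking space should also count as a separator, it would have to be added to the class explicitly, in whichever encoding the input really uses.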
Plus, on a related note, the really bad performance of '/u' appears to have been fixed as of PHP 5.2.9, https://bugs.php.net/bug.php?id=44336