This is a static dump of issues in the old "Flyspray" bugtracker for DokuWiki. Bugs and feature requests
are now tracked in the issue tracker on GitHub.
Closed
Duplicate
FS#2321 Search doesn't work well with <html> tags
CSS, XHTML, JS, Browsers
2011-09-02 zioth
When HTML is enabled, you can put the following in your page:
<html><script>var str = "Hello";</script></html>
If you then search for the string "Hello" and click through to this page, you end up with this:
This creates a javascript syntax error, which, in many cases, prevents the page from rendering correctly. It also puts invalid text in a javascript context. The same problem can happen if a plug-in generates javascript or php. It can also happen in a straight <html> tag without <script>.
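The failure can be reproduced in a few lines. The sketch below is not DokuWiki's actual highlighting code; `naiveHighlight` and `escapeRegExp` are hypothetical helpers that stand in for any highlighter that treats the rendered page as one plain string.

```javascript
// Hypothetical stand-in for a highlighter that works on the raw page string.
// This is an illustration only, not DokuWiki's real implementation.
function escapeRegExp(s) {
  // Escape characters that have a special meaning in a regular expression.
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function naiveHighlight(html, needle) {
  const re = new RegExp(escapeRegExp(needle), "gi");
  // Wrap every occurrence, no matter where in the markup it appears.
  return html.replace(re, (m) => '<span class="search_hit">' + m + "</span>");
}

const page = '<script>var str = "Hello";</script>';
const highlighted = naiveHighlight(page, "Hello");
console.log(highlighted);
// <script>var str = "<span class="search_hit">Hello</span>";</script>
```

The quote that opens `class="search_hit"` ends the JavaScript string literal early, which is where the syntax error comes from.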
Solving this problem in PHP would be messy, since you'd have to account for javascript, HTML attributes, PHP script and plug-ins. It would be easier and more effective to move syntax highlighting from php to javascript. A javascript function could go into the "bodyContent" div, and iterate through the DOM tree. It would highlight text only in text nodes. This code is pretty easy to write - if I have time, I might add the solution to this bug.
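A minimal sketch of such a client-side highlighter, assuming a browser DOM and the "search_hit" class mentioned in this report. The function names `splitOnMatches` and `highlightTextNodes` are made up for illustration, and skipping `script` and `style` elements is an extra assumption.

```javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pure helper: split text into parts and flag case-insensitive matches.
function splitOnMatches(text, needle) {
  if (!needle) return [{ text: text, hit: false }];
  const re = new RegExp("(" + escapeRegExp(needle) + ")", "gi");
  // With one capturing group, split() alternates non-match, match, non-match...
  return text
    .split(re)
    .map((part, i) => ({ text: part, hit: i % 2 === 1 }))
    .filter((part) => part.text !== "");
}

// Browser only: wrap matches in <span class="search_hit">, touching text nodes only.
function highlightTextNodes(root, needle) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  // Collect first, then modify, so the walker is not disturbed by DOM changes.
  while (walker.nextNode()) {
    const parentTag = walker.currentNode.parentNode.nodeName.toLowerCase();
    if (parentTag !== "script" && parentTag !== "style") {
      textNodes.push(walker.currentNode);
    }
  }
  for (const node of textNodes) {
    const parts = splitOnMatches(node.nodeValue, needle);
    if (!parts.some((p) => p.hit)) continue;
    const fragment = doc.createDocumentFragment();
    for (const part of parts) {
      if (part.hit) {
        const span = doc.createElement("span");
        span.className = "search_hit";
        span.textContent = part.text;
        fragment.appendChild(span);
      } else {
        fragment.appendChild(doc.createTextNode(part.text));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}

// Possible usage in a page (the element id is taken from the description above):
// highlightTextNodes(document.getElementById("bodyContent"), "Hello");
```

Because only text nodes are rewritten, attribute values and script contents are never touched, so the quoting problem cannot occur.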
Partial work-around:
In fulltext.php, html.php and search.php, remove the quotes around "search_hit." That creates invalid HTML, but I'm pretty sure all browsers are okay with it. This does not fix the bug, but it does prevent javascript errors.
2011-09-04 ach
The problem is not in fulltext.php and search.php, only in html.php, because the search results display special characters as HTML entities (so the HTML won't get interpreted).
2012-09-24 zioth
This url has code to find all text nodes in a document. Those nodes can then be searched for the search string.
The main disadvantage is that it's hard to find strings that span two text nodes. For example, finding "I like Dokuwiki" in this HTML:
<span>I like</span><span> Dokuwiki</span>
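One possible way around this, sketched under the assumption that the text nodes have already been collected in document order: join their contents, search the joined string, and map each match back to per-node offsets. `findAcrossNodes` is a hypothetical helper, and the search here is case-sensitive to keep it short.

```javascript
// texts: contents of consecutive text nodes, in document order.
// Returns one entry per match; each entry lists the slices of the nodes it covers.
function findAcrossNodes(texts, needle) {
  const matches = [];
  if (!needle) return matches;
  const joined = texts.join("");

  // starts[i] is the offset of node i inside the joined string.
  const starts = [];
  let offset = 0;
  for (const t of texts) {
    starts.push(offset);
    offset += t.length;
  }

  let from = 0;
  let idx;
  while ((idx = joined.indexOf(needle, from)) !== -1) {
    const end = idx + needle.length;
    const slices = [];
    for (let i = 0; i < texts.length; i++) {
      const nodeStart = starts[i];
      const nodeEnd = nodeStart + texts[i].length;
      // Overlap between the match and this node, in joined-string coordinates.
      const lo = Math.max(idx, nodeStart);
      const hi = Math.min(end, nodeEnd);
      if (lo < hi) {
        slices.push({ node: i, start: lo - nodeStart, end: hi - nodeStart });
      }
    }
    matches.push(slices);
    from = end;
  }
  return matches;
}

const result = findAcrossNodes(["I like", " Dokuwiki"], "I like Dokuwiki");
console.log(JSON.stringify(result));
// [[{"node":0,"start":0,"end":6},{"node":1,"start":0,"end":9}]]
```

Each slice would then need its own highlight span, and joining nodes blindly can also match text that is not visually adjacent (for example across two table cells), so this is only a starting point.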
A further effect of this bug, as described in FS#2651: an entity will get ripped apart as well, e.g. `&#39;` will become `&#<span class="search_hit">3</span>9;`.
2013-02-16 andi
I believe this can't be fixed properly. It boils down to parsing HTML with RegExps, which is a very bad idea but is what we currently do for highlighting. It works fine in many cases, but not all.
1) Remove the quotes from the class name. This will mess up javascript and html, but at least there won't be javascript errors.
2) Don't do syntax highlighting inside of html tags. This is also not a perfect solution, since some words won't be highlighted.
3) Do the highlighting in javascript. This has multiple advantages: It's more efficient (less server-side load), it fixes the bug completely, and it highlights text produced by javascript plugins, which the php highlighter is guaranteed to miss.
2013-07-31 Michitux
What I would suggest is to turn off search highlighting for pages that contain html tags by extending the metadata renderer to record when the html-function is called, i.e. when HTML is actually used in a page. However, I don't think this will fix the problem with entities.
2014-12-30 glen
Maybe the highlighting should be done on the client side, i.e. with JavaScript?
I mean, jQuery (or even pure JS) should be able to walk the DOM.