-
2006-08-20
randhol
Hi
I found a bug in dokuwiki 0.0.20060309-5. I also tracked it down so it should be easy to fix if you agree with me.
The bug occurs if you have a section heading that starts with a number or that is not written in ASCII, like:
===== 9th =====
===== Γεια =====
etc...
The _headerToLink function in inc/parser/xhtml.php will create an ASCII-only anchor or id for these sections and will strip the numbers at the start.
This is fine. A TOC in the same page will also work just fine.
However, the problem arises when one writes [[#9th]] or [[somepage#Γεια]] as a local link. When clicking on the link, one jumps to the correct page, but not to the correct place inside the page.
The reason for this is that in the function internallink the 9th and Γεια are not transcoded with _headerToLink.
In the internallink function it is written:
//keep hash
if($hash) $link['url'].='#'.$hash;
This should be changed to:
//keep hash
if($hash) {
    $hash = $this->_headerToLink($hash);
    $link['url'].='#'.$hash;
}
because $hash needs to be transcoded the same way as the sections were in order to refer to the same location.
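To make the mismatch concrete, here is a minimal sketch. The function sketch_header_to_id() is a hypothetical stand-in that I made up for illustration; it is not the real _headerToLink (it does no transliteration, so it only covers the ASCII "9th" case), but it shows why an untransformed hash misses the anchor:

```php
<?php
// Hypothetical stand-in for _headerToLink(); NOT DokuWiki's real implementation.
function sketch_header_to_id(string $title): string {
    $id = strtolower($title);
    // Collapse every run of characters outside [a-z0-9] into one underscore.
    $id = preg_replace('/[^a-z0-9]+/', '_', $id);
    // Strip leading digits and underscores so the id starts with a letter.
    $id = ltrim($id, '0123456789_');
    return rtrim($id, '_');
}

// Anchor emitted for the heading "===== 9th =====".
$anchor = sketch_header_to_id('9th');

// Current behaviour: the hash from [[#9th]] is appended unchanged.
$urlBefore = '#' . '9th';

// Proposed behaviour: the hash goes through the same function as the heading.
$urlAfter = '#' . sketch_header_to_id('9th');

var_dump($urlBefore === '#' . $anchor); // bool(false): the link misses the anchor
var_dump($urlAfter === '#' . $anchor);  // bool(true): the link hits the anchor
```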
HTH
Preben
-
2006-08-22
ChrisS
I can't help thinking this is really only a partial description of the problem and the solution.
_headerToLink by its nature can create the same "base" string from many different input strings. The fragment IDs generated for a page's TOC and header overcome duplicates by attaching a unique number to fragments with the same base string. The above description and solution don't take this into account. I.e. the solution described will generate an existing fragment identifier, but it may not be the correct one.
A complete solution would be to store the entire TOC, including both original section text and processed fragment ids in the page metadata (which I think has been mentioned before as desirable) and to reference that data to retrieve the correct fragment id.
A workable solution is to require users to copy/paste a link from the page TOC.
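A rough sketch of the duplicate problem and of the lookup idea follows. All function names here are hypothetical, the base-id function is a simplified stand-in for _headerToLink, and the numeric suffix format is an assumption rather than what DokuWiki actually emits:

```php
<?php
// Hypothetical stand-in for _headerToLink(); not DokuWiki's real code.
function sketch_base_id(string $title): string {
    $id = preg_replace('/[^a-z0-9]+/', '_', strtolower($title));
    return rtrim(ltrim($id, '0123456789_'), '_');
}

// Build a TOC that keeps the original heading text next to its final id.
// Repeated base ids get a numeric suffix (the suffix format is an assumption).
function sketch_build_toc(array $titles): array {
    $seen = [];
    $toc  = [];
    foreach ($titles as $title) {
        $base = sketch_base_id($title);
        $n    = $seen[$base] ?? 0;   // how often this base id appeared so far
        $seen[$base] = $n + 1;
        $toc[$title] = $n === 0 ? $base : $base . $n;
    }
    return $toc;
}

$toc = sketch_build_toc(['1.1 Project Team', '2.1 Project Team']);

// Re-deriving the id from the link text only ever yields the base id,
// which belongs to the first heading.
echo sketch_base_id('2.1 Project Team'), "\n"; // project_team

// Looking the heading up in the stored TOC yields the id actually assigned.
echo $toc['2.1 Project Team'], "\n";           // project_team1
```

The sketch keys the TOC by heading text, so two headings with identical text would still need some other way to tell them apart.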
-
2006-08-22
randhol
That I agree with, but how does the TOC calculate the right name? I have only looked at parts of the code. Anyway, requiring users to copy/paste links from a TOC is a Bad Solution[tm].
Firstly, wikis are meant to be highly dynamic and should not involve intricate technical tricks to get them to work. I mean, wikis are meant to be an easy system for adding content, not layout and functionality.
Secondly, how can one know that _headerToLink will generate the same unique number each time? As you say, in case of a conflict it generates a unique number and as one cannot call _headerToLink to get the same number in the internallink, then one cannot know which number the page will have once the cache is flushed. So the link won't work again.
But one thing that I don't understand is why one needs to do the conversion in the first place. Why does _headerToLink need to first convert the link to ASCII and then strip all leading numbers? Why can't we use UTF-8 or better? I don't see why we cannot use the string as it is. Page links certainly work this way: I can make a page named Γεια and DokuWiki handles it correctly.
-
2006-08-22
randhol
OK, I now have an improved solution. If one makes the change I suggest above and then changes the _headerToLink function to be simply:
function _headerToLink($title,$create=false) {
    return rawurlencode(utf8_strtolower($title)); //alternatively remove the rawurlencoding
}
Then it works. I don't see why one needs to care whether people write two sections with the same name in the same page. One should rather let it show that this doesn't work, since the TOC will go to the first one, so that the author renames the first or second section...
PS: Why isn't utf8_strtolower used in xhtml.php instead of strtolower?
-
2006-08-22
randhol
ARG! I see now that this doesn't work if there are spaces in the section header, as spaces are not replaced by underscores. OK, some more digging in the source then...
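For the record, a rough sketch of what handling the spaces might look like. This is only an illustration: sketch_utf8_header_id() is a made-up name, and mb_strtolower() is assumed here as a stand-in for DokuWiki's utf8_strtolower():

```php
<?php
// Hypothetical variant of the simplified function above; not DokuWiki code.
function sketch_utf8_header_id(string $title): string {
    // mb_strtolower() stands in for utf8_strtolower() (assumption);
    // fall back to strtolower() when the mbstring extension is missing.
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($title, 'UTF-8')
        : strtolower($title);
    // Replace each run of whitespace with a single underscore.
    $id = preg_replace('/\s+/u', '_', trim($lower));
    return rawurlencode($id);
}

echo rawurlencode('my section'), "\n";           // my%20section (space only gets percent-encoded)
echo sketch_utf8_header_id('My Section'), "\n";  // my_section
```

The heading and the link hash would both have to go through the same function for the two to match.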
I'll be back :-)
-
2006-08-22
ChrisS
Preben, it's a little more complex than that.
e.g.
1. Project #1
1.1 Project Team
2. Project #2
2.1 Project Team
The headings are different, but _headerToLink() will generate the same base fragment.
I believe a similar thing can happen with the conversion from non-ASCII UTF-8 text to ASCII. Different text can generate the same fragment.
As to your earlier question regarding the character restrictions, it's tied in with the HTML and XHTML rules for id attributes. I looked into this earlier today. My thought is it may be feasible to have a switch(*) allowing UTF-8 characters in the fragment identifier. They are allowed under XHTML (not under HTML).
(*) iirc, there already is a config setting which governs the characters allowed in page IDs. Using that switch in _headerToLink() may make sense.
-
2006-08-22
randhol
I see. In the examples you wrote above, only the # causes problems. The others work. However, couldn't that be fixed by patching the link parser, and also by replacing spaces and # with underscores? HTML isn't generally strict, so it shouldn't matter much what it allows or not. If HTML doesn't allow UTF-8 then it is obsolete IMHO. Isn't DokuWiki also XHTML?
Dokuwiki isn't much use to me if it doesn't support utf-8. I need something that works for utf-8 so this is why I would like to help find a solution. :-)
-
2006-11-04
ChrisS
Patches created for the devel version.
See the patches "no forcing of ASCII in section IDs".