2012-02-23
Michitux
When a plugin dynamically adds links to a page that are also stored in the metadata (the tag plugin does this with its topic syntax), the metadata search index isn't updated until the page itself (i.e. the .txt file) changes.
The easiest solution is to update the metadata index whenever the .meta file has changed (which only happens when the metadata really changes). However, some metadata might change very frequently, which would cause a very high load on the server.
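A minimal sketch of that check, assuming a hypothetical per-page marker file (`indexed_path`) that is touched after each successful (re-)indexing of the page. Only the .meta file comes from the text above; the marker file and the function name are illustrative, not DokuWiki's actual API.

```python
import os


def meta_index_is_stale(meta_path: str, indexed_path: str) -> bool:
    """Return True if the metadata index should be refreshed for one page.

    meta_path:    the page's .meta file
    indexed_path: hypothetical marker file touched after each successful
                  (re-)indexing of that page
    """
    if not os.path.exists(meta_path):
        return False  # no metadata yet, nothing to index
    if not os.path.exists(indexed_path):
        return True  # page has never been indexed
    # The .meta file is only rewritten when the metadata really changes,
    # so a newer mtime means the indexed values may be out of date.
    return os.path.getmtime(meta_path) > os.path.getmtime(indexed_path)
```

The check itself is cheap; the cost lies in what follows it. For a page whose metadata is rewritten on nearly every request, this returns True almost every time, which is exactly the load problem described above.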
Some other ideas:
- Determine in an indexer event whether the index needs to be updated. But how would a plugin know that the index needs updating? It would need to load the indexed values and compare them with the current values. This would be slow, every plugin would have to do it explicitly, and it is unclear what should happen when more than one plugin uses the same metadata property.
- Maintain a list of pages that need to be re-indexed and update that list, e.g. whenever metadata is updated. During the metadata update, an event could allow plugins to flag the page's index entry as out of date. Saving a page could also add it to the list, and so could detecting an external edit. Additionally, plugins could add a page directly, e.g. in an event in lib/exe/indexer.php. The advantage is that checking whether anything needs to be indexed would be very fast, and any pending indexing could be triggered from every page. Index version updates could be a problem; they would need to be managed somewhere else, or maybe a list of all pages could "simply" be generated whenever such an update is detected in lib/exe/indexer.php. Instead of one file, a directory with one file for each page that needs to be indexed could be used. That way adding a page would be atomic, without any need to check whether the page is already on the list, although this might not be ideal for adding all pages of the wiki. There could also be a trigger for a scan through all pages, one page per run: the current page would be read from the corresponding line of the index's list of all pages, the line number would be stored in a file, and everything would be protected by a lock. That way only one page would be processed at a time, but the index doesn't support concurrent updates anyway. Such a trigger could also be used to send digest subscription emails, ensuring that infrequently visited pages still trigger emails when somebody has a digest subscription for them.
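The second idea can be sketched as follows, assuming a POSIX system (for `fcntl.flock`) and using hypothetical names throughout (`flag_page`, `start_full_scan`, `run_once`, and the queue, cursor, lock and page-list paths). This is an illustration of the one-file-per-page queue plus the locked, one-page-per-run scan cursor, not how DokuWiki actually implements its indexer.

```python
import fcntl  # POSIX-only advisory file locking
import os
from contextlib import contextmanager
from typing import Callable, Iterator, Optional
from urllib.parse import quote, unquote


@contextmanager
def try_lock(lock_path: str) -> Iterator[bool]:
    """Non-blocking exclusive lock; yields False if another run holds it."""
    with open(lock_path, "a") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(lock, fcntl.LOCK_UN)


def flag_page(queue_dir: str, page_id: str) -> None:
    """Queue a page for re-indexing as one empty file per page.

    Creating an existing file again is harmless, so callers (page save,
    metadata update, external-edit detection, plugins) never need to check
    whether the page is already queued.
    """
    os.makedirs(queue_dir, exist_ok=True)
    # Page ids such as "wiki:syntax" are percent-encoded into safe file names.
    with open(os.path.join(queue_dir, quote(page_id, safe="")), "a"):
        pass


def start_full_scan(cursor_path: str) -> None:
    """Request a scan of all pages, e.g. after an index version change."""
    with open(cursor_path, "w") as f:
        f.write("0")


def run_once(queue_dir: str, page_list_path: str, cursor_path: str,
             lock_path: str, index_page: Callable[[str], None]) -> Optional[str]:
    """Index at most one page per call; return its id, or None if idle."""
    with try_lock(lock_path) as acquired:
        if not acquired:
            return None  # another request is already indexing
        # 1. Flagged pages first. The flag is removed before indexing, so a
        #    page flagged again meanwhile is picked up on a later run.
        queued = sorted(os.listdir(queue_dir)) if os.path.isdir(queue_dir) else []
        if queued:
            os.remove(os.path.join(queue_dir, queued[0]))
            page_id = unquote(queued[0])
            index_page(page_id)
            return page_id
        # 2. Otherwise advance a pending full scan by one line of the page list.
        if not os.path.exists(cursor_path):
            return None
        with open(cursor_path) as f:
            line_no = int(f.read().strip() or "0")
        with open(page_list_path) as f:
            pages = [line.strip() for line in f if line.strip()]
        if line_no >= len(pages):
            os.remove(cursor_path)  # scan finished
            return None
        index_page(pages[line_no])
        with open(cursor_path, "w") as f:
            f.write(str(line_no + 1))
        return pages[line_no]
```

Because `run_once` does nothing but list a directory when there is no work, it is cheap enough to call on every page request. A drawback of removing the flag before indexing is that a page is dropped from the queue if indexing fails; keeping the flag until afterwards would avoid that, at the cost of possibly indexing the page twice.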
I currently prefer the second idea, but I think we should think about it a bit more.