Wanted Pages/Discuss
Source for the Wanted Pages feature: WikiPatches/MagicWantedPages
Presentation
list order
Wormbo: Maybe it would be nice to be able to change the sorting of the wanted pages list between number of requests (like now) and alphabetically-only.
Tarquin: That's relatively easy for me to code (as long as I can figure out the sort function), but how do you suggest users access the two different flavours? I suppose I can make a [Wanted Pages Alpha]?...
I've had an idea of how to tackle this: put lines like
#MAGIC throttle=1
#MAGIC order=alpha
at the very end of the page. I'm having a few problems implementing it. These lines have to be read before the wikitext is parsed, because I want to pull them out, but the call to the Magic module comes after parsing. So I have to store them in the function (yet another function var) and then pass them to the module, probably as a hash, which the module has to reconstruct from the flat var list. It's all sounding rather complicated.
Mychaeel: I think the neatest way would be designing your "Wanted Pages" class in a way that it can be easily subclassed to sort its results alphabetically (or by number of consonants in the page title, or whatever) – then create "Wanted Pages/By Name" and "Wanted Pages/By Requests" that use those "Wanted Pages" MagicPage classes.
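A minimal sketch of the subclassing idea, written in Python purely for illustration (the wiki script itself is Perl). The class names, the `sort_key` hook and the sample data are all hypothetical; nothing here is taken from MagicWantedPages.

```python
from dataclasses import dataclass


@dataclass
class WantedPage:
    title: str
    requested_by: list


class WantedPages:
    """Base listing: most-requested pages first (the current behaviour)."""

    def sort_key(self, page):
        # Negative count sorts descending; the title breaks ties predictably.
        return (-len(page.requested_by), page.title.lower())

    def listing(self, pages):
        return sorted(pages, key=self.sort_key)


class WantedPagesByName(WantedPages):
    """Hypothetical "Wanted Pages/By Name" variant: plain alphabetical order."""

    def sort_key(self, page):
        return page.title.lower()


# Made-up sample data, only to exercise the two orderings.
pages = [
    WantedPage("Mesh Overview", ["Home Page"]),
    WantedPage("The Weather In London",
               ["Mychaeel", "Tarquin", "Wanted Pages", "Wiki Markup"]),
    WantedPage("Graphics Design", ["Home Page", "Tarquin"]),
]

by_requests = [p.title for p in WantedPages().listing(pages)]
by_name = [p.title for p in WantedPagesByName().listing(pages)]
```

Only the sort key differs between the two classes, so a further ordering would be one more small subclass.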
Tarquin: following debate with Sunir over at UseMod, I wanted to give control of the page count throttle to users, rather than hard-coded in the script. I agree though, it's probably best to be able to view both sort orders without any fiddly editing. Not sure I can use a "/" in the package name though
Mychaeel: You also can't have blanks in package names even though "Wanted Pages" contains one, right? There must be some abstraction layer between the package name and the actual page name...
Tarquin: I guess I have three options:
- use some sort of system that represents a "/" in the package name
- switch to using an arbitrary package name, which holds a $pageName. Complication: The registering process must then store two things: "this page name => has a script in this package". Advantage: page name can be anything
- as discussed with Sunir a while ago, move the information about which pages are magic to the wiki content itself, allowing the community to choose which pages are magic. Something like: "#MAGIC WantedPages throttle=1" at the foot of the page. Advantage: it's the WikiWay. Complication: I need to work out a simple syntax which also allows parameters (like the pagecount throttle) to be passed.
Mychaeel: The third option is probably the best. The #MAGIC header should be at the top of the page though, not the bottom... – Parsing the parameters isn't very complicated. Let $line be the variable holding everything that follows "#MAGIC WantedPages"; here's a regular expression to parse it:
%params = map { s/^"|"$//g; $_ } ($line =~ m[(\w+)\s*=\s*("[^"]*"?|\S+)]g);
Be sure to do some tight security checks on the "package name" parameter following #MAGIC before using it; we don't want to introduce security holes by this.
Tarquin: basically, if it's a package in MagicPages.pm, run it. If not – ignore. Thanks for your advice Mych, it's much appreciated
Tarquin: Slowly reading the above regexp... it's this, right?
m[
  (\w+)        # grab word as $1
  \s*          # optional whitespace
  =            # equals sign
  \s*          # optional whitespace
  (            # now grab as $2... either:
    "[^"]*"?   #   something in quotes
    |          #   or
    \S+        #   some non-whitespace
  )            # end of $2 grab
]gx
but I'm confused about $2. Why is the " at the end of "something in quotes" optional?
Mychaeel: Just in order to capture something like that as well:
foo=bar name="Michael Buschbeck" baz=quux base="Unreal Wiki
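For illustration, here is the same parameter regexp applied to that example line, as a Python sketch rather than the Perl above. The function name `parse_magic_params` is made up for this example; the pattern itself is the one Mychaeel gave.

```python
import re

# Same pattern as the Perl one-liner: key, "=", then either a quoted
# value (closing quote optional) or a run of non-whitespace.
PARAM_RE = re.compile(r'(\w+)\s*=\s*("[^"]*"?|\S+)')


def parse_magic_params(line):
    """Return a dict of key/value pairs found after "#MAGIC <Name>"."""
    params = {}
    for key, value in PARAM_RE.findall(line):
        # Strip one leading and one trailing quote, like s/^"|"$//g.
        params[key] = re.sub(r'^"|"$', '', value)
    return params


line = 'foo=bar name="Michael Buschbeck" baz=quux base="Unreal Wiki'
params = parse_magic_params(line)
```

Because the closing quote is optional, the unterminated `base="Unreal Wiki` still yields a value instead of being cut off at the first space.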
Tarquin: ah... MagicContent is about ready to run. Future expansion:
- MAGIC Category – appends the search results for the page name to the page content
formatting
I think a nice formatting would be
- The Weather In London (4 requests)
Requested by Mychaeel, Tarquin, Wanted Pages, Wiki Markup
Or don't add links to the pages but just put their titles in the edit link's "title" attribute; when you hover the link you see where the page was requested from.
Known problems
Using the algorithm in wiki.cgi
It causes the UMake bug:
It's the dot in UMake 1.1 ...which means the bug is NOT in my code (yay!) it's in GetFullLinkList, and it's outputting: "UMake /Discuss 1.0 1.1"
Mychaeel: The algorithm used in wiki.cgi to gather all page links from a given page is inherently flawed... it would be way better to make that part of the regular markup parser. The link list returned by GetPageLinks is potentially inconsistent with what is actually displayed as a link.
Mychaeel: Actually it'd be neater if that function had Wookee output a list of page links. (Wookee can do that.) However, Wookee's parsing is considerably more expensive than the simplistic (and technically Wookee-incompatible) parsing in GetPageLinks.
Tarquin: I'd be inclined to keep the less expensive option, since it's already quite a slow-loading page. I'll see if I can work with GetFullLinkList, but it won't be for a while as there's only one place it's causing a problem (he says, HOPING no idiot will create the page "1". .... yeah, I'm an optimist). Formatting done with a DL and a little bit of CSS inserted directly into the page, so it's better than the OL in appearance AND semantically!
Mychaeel: Anyway... that bug should be fixed, regardless of who caused it.
Tarquin: I've found the bug, but I lack the perlness to fix it. It's the definition of $InterLinkPattern, which can't handle a link containing a space, such as [[InterSite:Foo bar]], correctly. The options to fix it are:
1. rewrite the definition. It's in InitLinkPatterns, line 294 of wiki.cgi:
$InterLinkPattern = "((?:$InterSitePattern:[^\\]\\s\"<>$FS]+)$QDelim)";
Mychaeel: For some reason that pattern expects page names in InterWiki links not to contain any spaces. If that is the problem, just remove the "\\s" from the pattern.
Tarquin: 2. write a Wookee-based version of GetPageLinks, line 4090. (specs on usemod page)
Mychaeel: While parsing and formatting a page Wookee records information about the parsed page that can be retrieved from the BlockWiki object after calling BlockWiki->parseBlock(). That information includes a list of all page links found in the page.
Obviously using that method involves having the page parsed by Wookee; and obviously having a page parsed in full by Wookee is computationally more expensive than using the simple and UseMod-specific parsing algorithm used by GetPageLinks. Whether it'd be prohibitively expensive would be a matter of trying it out.
Tarquin: Bug fixed. tx Mych
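A rough Python sketch of what removing the "\\s" changes. This is a simplified stand-in, not the wiki.cgi pattern: `INTER_SITE` and `FS` are assumed placeholder values, and the trailing $QDelim part is left out entirely.

```python
import re

INTER_SITE = r'[A-Z][A-Za-z]+'  # assumption: simplified site-name pattern
FS = '\xb3'                     # assumption: stand-in field separator


def inter_link_pattern(allow_spaces):
    """Build a cut-down InterLink pattern, with or without \\s excluded."""
    excluded = r'\]"<>' + FS
    if not allow_spaces:
        excluded += r'\s'       # the original pattern also excludes whitespace
    return re.compile('(' + INTER_SITE + ':[^' + excluded + ']+)')


text = '[[InterSite:Foo bar]]'
strict = inter_link_pattern(False).search(text).group(1)
relaxed = inter_link_pattern(True).search(text).group(1)
```

With whitespace excluded the match stops at "InterSite:Foo"; without it the match runs up to the closing bracket and keeps "Foo bar" together.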
Subpages aren't listed at all
Tarquin: Trying to fix this. The problem is matching subpages with no main page, i.e. "/Foo". But I have a problem matching "/Foo" when it's at the start of a string, i.e.:
"/MatchThisWithNoInitialSpace /MatchThisToo Foo/DontMatchThis DontMatchThis"
Hmm... I think \B/\w+ will do it ... fixed it
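A quick check of \B/\w+ against the test string above, as a Python sketch (the regexp behaves the same way as in Perl for this input). The variable names are made up for the example.

```python
import re

# \B succeeds where there is no word boundary: at the start of the string
# before "/", or between a space and "/", but not between "o" and "/".
HEADLESS_SUBPAGE = re.compile(r'\B/\w+')

links = ("/MatchThisWithNoInitialSpace /MatchThisToo "
         "Foo/DontMatchThis DontMatchThis")
matches = HEADLESS_SUBPAGE.findall(links)
```

The "/DontMatchThis" in "Foo/DontMatchThis" is skipped because a word character sits directly before the slash.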
Fixed
Mychaeel: Home Page/Changes is listed as "wanted." Also, very frequently the list of referrers still includes the same page several times.
Tarquin: Aw, crap. will investigate. sigh... is this whole wanted pages feature actually useful?
Tarquin: Right. Three bugs:
1 UT
The pagename "UT" isn't in the output, AT ALL. So I suspect it's pagenames like "Canvas (UT)" being incorrectly parsed.... got it. It's Sunir's "while" statement.
while( /(\w+)\s*/g ) {
grrr... That should be using one of the ready-made UseMod LinkPatterns.
2 Home page/Changes
I suspect this is because Home Page/Changes itself requests a "/Changes" page. hmm.. the raw output line is
Home Page /Changes /Discuss Graphics_Design Mesh_Overview
That lists the requests made by Home Page. GetPageLinks is the culprit again: it considers /Changes and /Discuss to be ghost links, but they are not. I know what to look for now
3 Repeated links
Same as bug 1. "Category_Class_Tree" is requested by Actor Class Hierarchy (UT) and Actor Class Hierarchy.
PS: I've tracked down the above bugs without looking at the code, but with the output of an intermediary function. It's actually quite helpful to do this without the code. I'd compare this to Mychaeel's suggestion of working out a design brief before writing any code
Mychaeel: Good luck fixing all these bugs, but I slowly get the impression that with every fix you apply you get two more new bugs... maybe the code is at a stage at the moment where it'd be worthwhile to just create it from scratch and see to it that it's free of bugs conceptually already (probably even under the same simple premises as the original code, as opposed to using Wookee's parser for example).
Tarquin: As far as I can tell, it's now free of bugs. "UT" is no longer there, there are no repeated links, and I can't see "Home Page/Changes" either (though that is possibly because it only has one request. I think I know how to fix it, I just need to set up my local wiki to reproduce it, and I think I know how to do that too. Question of finding the courage to do it). It wasn't the fixing that caused a bug, it was fixing + me thinking "hey, now I've fixed it, let's implement subpage support". The problems were down to the code I filched from Sunir, which uses \w+ to match elements of a list delimited by space characters (and hence wasn't reading "Foo (UT)" correctly). \S+ is far more sensible. Update: The "Home Page/Changes" bug is now fixed locally; it was the UseMod function not handling "headless" subpages correctly. Will upload soon.
Tarquin: AFAIK all bugs fixed
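For the record, a small Python sketch of why \w+ broke on names like "Canvas (UT)" while \S+ does not. The raw line is hypothetical, modelled on the space-delimited, underscore-joined output quoted earlier on this page.

```python
import re

# Hypothetical link-list line: page names joined with underscores,
# entries separated by spaces.
raw = "Home_Page Canvas_(UT) Mesh_Overview"

# The borrowed loop's pattern: parentheses are not word characters,
# so "Canvas_(UT)" falls apart and "UT" shows up as its own entry.
with_w = re.findall(r'(\w+)\s*', raw)

# Splitting on whitespace only keeps each entry whole.
with_S = re.findall(r'\S+', raw)
```

The stray "UT" entry and the repeated referrers both follow from the first behaviour: "Canvas_" and "UT" are counted as separate pages.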