Tarquin/Perl
Tarquin's Little Corner of Perl
(a pun on Wisdom of Perl ... has that been done before? probably. anyway, I can hardly class this as "wisdom"!)
Insane ramblings on perl...
scripts
These both use Mych's Wookee playground, but with different 'brains' – much simpler brains.
I wonder ... has anyone made a script like the playground, with two input fields:
- text to process
- perl script to execute on the text
Mychaeel: I suppose anything has been done already. Are you asking because you'd like to create such a script yourself?
Tarquin: I made the reverser script for wikipedia; and the core is so simple (4 lines but it could be just one), that I thought it would be neat if one script could take different instructions. Not sure how to do it, the dark arts of reading url parameters are a bit beyond me. BTW, why is the main portion of your playground script within an "eval"?
Mychaeel: To catch errors (which would otherwise end the script and make the server output "Internal Server Error" because no headers have been printed yet).
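A minimal sketch of that eval idea, not the playground's actual code: build_page and render are made-up names, and the "brain" here just reverses the text. The point is only that a die inside eval ends the eval, not the script, so a header can still be printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real page-building code; it dies on bad input.
sub build_page {
    my ($text) = @_;
    die "no text given\n" unless defined $text && length $text;
    return "<p>" . scalar(reverse $text) . "</p>";
}

# Run the risky part inside eval so a die cannot end the script
# before any headers have been printed.
sub render {
    my ($text) = @_;
    my $body = eval { build_page($text) };
    if (!defined $body) {
        my $error = $@ || "unknown error\n";
        $body = "<p>Script error: $error</p>";
    }
    return "Content-Type: text/html\n\n" . $body;
}

print render("hello"), "\n";
print render(""), "\n";
```

Either way the output starts with a Content-Type header, so the server has something valid to send back.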
references
The use of [, { and ( for arrays and hashes confuses me, but I think I've cracked it. The basic token is (. Perl knows whether this is to be an array or a hash from the variable name & the nature of the list:
@anArray = ( 3, 4, 5 );
%aHash = ( 'Waterloo' => 'Abba' , 'Diva' => 'Dana International' );
and so forth. I think a lot of confusion sprang up when I was creating an array of hashes (or was it a hash of arrays), and I read perlreftut with an insufficient concentration of mood-enhancing chemicals in my system.
This came to light when I set off today to understand what → means. It's converted into a nice little arrow on this wiki, but in its primordial state, ->, it's altogether different. Look at it. It's like goo oozing through a ceiling grate in cheap sci-fi. It's like chicken with teeth. Those two little characters together are not of this world; something about the shape they form together projects the idea that it is not of this world. Someone once took a hyphen, ordinary, and a right-angle-bracket aka a greater than symbol, also fairly rudimentary and put them together (incidentally, I've encountered people who can't remember which of < and > means greater and lesser, and resort to complex mnemonics (more on those later). What's there to miss? Big side, big number. Pointy small side, small number) – two innocuous symbols, yet adjacency breeds unfamiliarity. It's certainly gestalt. Or emergent. Or something.
So... if $myRef holds a reference, {$myRef} can be made to stand for the array, as in @{$myRef}, so ${$myRef}[1] is an element just like $myArray[1] is, and the evil arrow comes in as a shorthand to write $myRef->[1]. Hooray. One step closer to understanding Wookee...
Mychaeel: Good luck on your pursuit of the Evil Arrow. I actually find the $myRef->[1] syntax much neater looking than the ${$myRef}[1] one, that's why I'm using it wherever possible; and it's mandatory for calling object or class methods as in $myObject->blah(), so accessing myObject's properties as $myObject->{foo} makes for a nice analogy between the two.
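For anyone else chasing the arrow, here is a small sketch putting the spellings discussed above side by side. The variable names follow the examples on this page; the @songs array of hashes is made up for illustration.

```perl
use strict;
use warnings;

my @myArray = (3, 4, 5);
my $myRef   = \@myArray;          # reference to the array

# Three spellings of "element 1 of the array":
my $direct = $myArray[1];         # plain array access
my $long   = ${$myRef}[1];        # dereference with a block
my $arrow  = $myRef->[1];         # arrow shorthand

print "$direct $long $arrow\n";   # all three are the same element

# The same pattern for a hash reference.
my %aHash   = ('Waterloo' => 'Abba', 'Diva' => 'Dana International');
my $hashRef = \%aHash;
print ${$hashRef}{'Waterloo'}, " ", $hashRef->{'Waterloo'}, "\n";

# An array of hashes: each element is itself a hash reference.
my @songs = ({ title => 'Waterloo', artist => 'Abba' });
print $songs[0]->{artist}, "\n";
```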
Tarquin: It is nicer, but like nice things in maths, I find I need to understand the complicated thing before I can really grasp the shorthand.
Dma: You mean this stuff doesn't just intuitively make sense to you?
Tarquin: I'm currently feeling extremely smug for having done:
foreach ( \@classPublicVars , \@classPrivateVars ) { @$_ = sort {uc($a) cmp uc($b)} @$_ ; }
Sure, I could have done the same thing with two consecutive sort functions, once on each array – but see under "laziness"
objects
If I understand correctly, objects in Perl work exactly like the hash references above, with $myObject->property analogous to myUnrealObject.property.
So an object just works like a souped-up reference.
Mychaeel: That's a pretty to-the-point description of what Perl objects are. In fact, the manual describes them as "references that know to which package they belong." Perl just memorizes a package name along with a regular reference (via the use of the bless command, given the reference and the package name) and uses that package name to call methods.
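A minimal sketch of what bless does, assuming a made-up Counter class: the object is an ordinary hash reference until bless attaches the package name to it, after which method calls are looked up in that package.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class name, for illustration only

    sub new {
        my ($class, %args) = @_;
        my $self = { count => $args{start} || 0 };   # an ordinary hash reference
        return bless $self, $class;                  # now it remembers its package
    }

    sub increment {
        my $self = shift;
        return ++$self->{count};
    }
}

my $counter = Counter->new(start => 5);
$counter->increment;                                 # method found via the package name
print ref($counter), " ", $counter->{count}, "\n";   # ref() reports the package
```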
I'm reading perltoot. It seems that the object stuff has to be within a package that's called up by a procedural main perl file.
Mychaeel: The "package" part is right, I don't understand the "procedural main Perl file" part though.
Tarquin: it's not like UnrealScript which has no single "way in", or Java where the root class (I think) has a main() method. There's a layer of Perl above the classes which created the objects and does stuff – a bit like Unreal has the native code that creates objects and makes them do stuff.
Mychaeel: ...or like C++ which has a main() procedure too. In theory this main program could just create an "application" object and call a "main" method of it, if you wish to stay as object-oriented as possible. (Delphi does that, basically.)
Birelli: I'm pretty sure in Java it's actually called init() (at least in applets), but that really doesn't matter so I'll be quiet now
Olorin: I'll pitch in: I think perl implicitly uses a routine called 'MAIN' if you don't have one. I think it's good practice to explicitly have a MAIN label to indicate where your execution begins, so my Perl scripts start with:
#!/usr/bin/perl
MAIN:
#stuff
exit(0);
#other routines
From which you can see I grew up on C/C++.
wookee
moving on to working out what wookee does...
the modified sub WikiToHTML does this:
- create a new BlockWiki object
- feed it the html from @_
- let BlockWiki cascade everything from there
Questions:
Tarquin: is it necessary to create a new object? why not just use 'static' things in the class?
Mychaeel: If you look into that class you'll find that it indeed has to save a lot of state information – for instance, the variable that accumulates the formatted output. Hence, we need an object. (Of course that information could also be stored in "static" (package- or class-level) variables, but that'd somewhat defeat the purpose of OOP and would make the code non-reentrant, and you couldn't nest <wiki> blocks in <wiki> blocks.)
Tarquin: Thank you, I think I get it. So it's not necessary from a syntax point of view, it's just useful to do it that way. For my page idea I might as well use static, since I don't need multiple instances of the page object... hmm. Not sure how I'll handle redirects, maybe that could fit in like that. More pondering required.
Mychaeel: From an OOP point of view, it'd be advisable to use static properties and methods only if you need them and stick to instance properties and methods otherwise.
shift
Tarquin: "shift" called with no argument means the first element from the @_ array, and pops it off too so the next use of shift gets the next one – I think. It's syntax like that that makes Perl scary. It takes an initial Klein-bottle flip of the brain to understand, but once you know it it's extremely simple to use. It's like maths.
Mychaeel: Hehe. How true, how true.
The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
That means ALL perl subroutine parameters are "out" parameters in Uscript jargon!
Mychaeel: If you use them like that, yes. (Usually you copy the @_ elements to local variables.) But be careful: If you try to assign to a @_ element that represents a constant expression, you'll get a fatal error. The much better way of dealing with "out" parameters is passing references.
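A small sketch of the three styles mentioned above: writing through the @_ alias, copying with shift, and passing a reference as an explicit "out" parameter. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Modifies the caller's variable through the @_ alias.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# The usual style: copy the argument, leave the caller's data alone.
sub doubled {
    my $value = shift;      # shift with no argument takes from @_ inside a sub
    return $value * 2;
}

# An explicit "out" parameter, passed as a reference.
sub double_via_ref {
    my $ref = shift;
    $$ref *= 2;
    return;
}

my $n = 5;
double_in_place($n);        # $n is now 10
my $copy = doubled($n);     # $n unchanged, $copy is 20
double_via_ref(\$n);        # $n is now 20
print "$n $copy\n";

# Passing a literal to double_in_place is the fatal case described above;
# eval is used here only to keep the sketch running.
my $ok = eval { double_in_place(5); 1 };
print "assigning through \@_ to a literal failed\n" unless $ok;
```

The reference version makes the intent visible at the call site (the backslash), which is a large part of why it is the tidier way to do "out" parameters.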
Tarquin: Do some people consider perl to have too many built-in functions? I think it's great – once you get the hang of enough of them to do something.
Something like join("<br>", @MyArray );
in another language might go like...
for( i=0, i< ...
// urg whats the syntax for counting an arrays elements
// urg used , instead of ; again
// urg is it "i<Elements", "i<=Elements" or "i<Elements-1"?
// (Why is there a zero? Life would be so much easier without it)
MyArray[i] =
// damn. I need something to output to.
// go back up and create a local variable
// decide to put this all in a separate function
// spend 5 minutes moving code around
// make up a new variable name for the local version of the array
MyString = "<br>"$ MyArray[i] ;
// hm. would it be handy if my function took an arbitrary joining string?
...and so on.
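For comparison, a sketch of the built-in next to a hand-rolled loop in Perl itself. The array contents are made up; $#MyArray is the index of the last element, which answers the "i<Elements or i<=Elements" question for this one case.

```perl
use strict;
use warnings;

my @MyArray = ('one', 'two', 'three');

# The built-in: one call, any separator.
my $joined = join("<br>", @MyArray);

# The same thing by hand.
my $by_hand = '';
for my $i (0 .. $#MyArray) {          # $#MyArray is the last index
    $by_hand .= "<br>" if $i > 0;     # separator goes between elements only
    $by_hand .= $MyArray[$i];
}

print "$joined\n";
print "$by_hand\n";
```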
Mychaeel: Perl certainly isn't a "minimal" language like C where the language base consists of only a small handful of statements and everything else is added through libraries. Some people might consider Perl "ugly" from a language design point of view because of that, but Perl never claimed to be a "beautiful" language, just a useful and efficient one. (C++ at least provides syntax constructs to make library functionality look as if it was built in.)
For that matter, PHP is worse than Perl: Virtually every functionality extension has to be provided as native code and preferably be compiled directly into the PHP executable. Perl is powerful enough to allow for such extensions to be provided as Perl modules themselves. Only very few Perl modules require support through native code, and usually you can clearly see the reason why.
Linguistics
something I'll write about later...
I had an idea about all the shortcuts in perl like "shift" with no argument implies shift @_. Analogous to things like "shouldn't" in English.
General comments
Mychaeel: tarquin, if you're looking for a book about Perl, get [Programming Perl] (co-authored by Larry Wall, Perl's creator), usually only referred to as "the Camel Book" (for obvious reasons). It's actually a pretty entertaining read.
Links
Dma: http://www.bbspot.com/News/2001/03/perl_test.html <==
Tarquin: http://www.perl.com/pub/a/1998/08/show/onion.html
Mychaeel: A very good read... thanks for the pointer. "The fact is, your brains are built to do Perl programming." – The only thing that has always bothered me in Perl is that if and all sorts of loops (for, while and so on) require a block in braces. It has me wondering whether that's a problem related to parsing the otherwise pretty form-free Perl source, or whether Perl's creators just had a fit of wanting to inflict something they consider "good programming style" on lowly Perl programmers...
T1: if doesn't require a block in braces. You can do dosomething() if (condition);
Mychaeel: Yes, I know. What I meant is the if (cond) { block } form, which you'll have to use if you need an else or elsif.
Mychaeel: Hehe: "We can debug relationships, but it's always good policy to consider the people themselves to be features. People get annoyed when you try to debug them." How true, how true...
Tarquin: For Mych: http://c2.com/cgi/wiki?ThereIsNothingPerlCannotDo There's plenty of perl stuff on PPR, including the idea of using ||= as an assignment operator for OO (and here I say OO.. !). I wonder if there's a Perl-specific Wiki.
Mychaeel: I agree with that Wiki page. There's a link to an article on [Why Perl prototypes are bad] which I definitely disagree with though. The "inherent bugs" the author talks about aren't in fact any – it's just that Perl prototypes and C prototypes aren't the same, and aren't supposed to be. (Like that a prototype becomes effective only after it has been parsed.) Quite simply, Perl isn't a language for the easily confused.
Tarquin: Hmm... that wiki on perl we found the other day is offline. I was looking forward to rambling there, too.
Problems with Magic content
I have, basically, this:
package Parent {
    sub ListParameters {
        my $class = shift;
        my @parameters = $class->CommandParameters;
        $text .= join '', map { "<LI>$_</LI>" } @parameters ;
    }
    # called from UseModWiki
    sub MakeSection {
        my $class = shift;
        my $page = shift;          # the page being browsed
        my $magicmodule = shift;   # the requested magicmodule
        my %params = map { s/^"|"$//g; $_ } (shift =~ m[(\w+)\s*=\s*("[^"]*"?|\S+)]g); #"
        $params{'thispage'} = $page;   # add the page to the parameters hash
        $text .= qq[<div class="magic" id="$magicmodule">\n];
        if( grep /^\Q$magicmodule\E$/ , @registered ){
            # if one of our packages matches the name of the wiki page, run the script
            $text .= $magicmodule->GenerateContent(%params);
            $text .= $magicmodule->ListParameters;
            return $text;
        }
    }
}
package CommandClass {
    @ISA = qw(Parent);
    sub CommandParameters { }
}
If CommandParameters contains something, eg
sub CommandParameters { 'foo', 'bar' }
then I get it out as a list. If it contains NOTHING, the array @parameters gets the value of $class in it. Now I understand that CommandParameters is called with $class as the first element of its @_. But why is it spitting it back out, and how do I stop it?
...
I've switched to using
sub CommandParameters { undef }
but I still can't work out how to test that the @parameters array is empty.
ALWAYS FALSE:
return "" if scalar @parameters == 0;
return "" if @parameters == ()
ALWAYS TRUE:
return "" if $parameters == 0;
return "" unless defined @parameters; # I've put "undef" into an array ... what does that do to it?
Mychaeel: Well... an array with "undef" in it has at least a single (undefined) element. An empty array is "()". And: An empty function always returns the value of its "@_" – you have to do something in that function to set a different return value. Also, "$parameters" and "@parameters" are two different and completely unrelated variables.
Tarquin: For some reason I had it in my head that $array returns the number of elements. :con: So how do I check @array eq (undef) ? That gives a "Use of uninitialized value in string eq at emptyarray.pl line 31." Though I could just keep the function empty and test for a return equal to ( $class ).
Mychaeel: The much neater way would be simply having the function return an empty array:
sub CommandParameters { () }
That could be the default implementation in the common base class, so that only subclasses which do have parameters need to override that function.
Tarquin: Done. Thanks. I've just used "scalar @parameters == 0" as I can't work out how to check it's equal to ().
Mychaeel: Since "==" expects scalar operands, the "scalar" operator is redundant; otherwise yes, that'd be the way to check whether an array is empty.
if (@parameters == 0) { ... }
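Pulling the thread above together, a sketch of the suggested arrangement: a default CommandParameters that returns the empty list, and an emptiness test that relies on an array giving its element count in numeric context. WithParams and WithoutParams are made-up subclass names, and ListParameters returns its text here instead of appending to a shared $text.

```perl
use strict;
use warnings;

{
    package Parent;
    sub CommandParameters { () }          # default: no parameters

    sub ListParameters {
        my $class = shift;
        my @parameters = $class->CommandParameters;
        return '' if @parameters == 0;    # array in numeric context = element count
        return join '', map { "<LI>$_</LI>" } @parameters;
    }
}
{
    package WithParams;                   # hypothetical subclass
    our @ISA = ('Parent');
    sub CommandParameters { ('foo', 'bar') }
}
{
    package WithoutParams;                # hypothetical subclass, inherits the default
    our @ISA = ('Parent');
}

print "[", WithParams->ListParameters, "]\n";
print "[", WithoutParams->ListParameters, "]\n";

my @one_undef = (undef);                  # one element, which happens to be undefined
print scalar(@one_undef), "\n";           # so this array is not empty
```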
Tarquin: Stylistic question. Would the following be considered really bad, because a sub is calling a child implementation of itself?
{
    package WantedPages;
    @ISA = qw(MagicContentMaker);
    # ...snip...
    sub GenerateContent {
        my ( $class, %params ) = @_;
        my ($text, @links);
        #my $testoutput;
        if( $params{sort} eq 'alpha' ) {
            return WantedPagesAlpha->GenerateContent;
        }
        else {
            return WantedPagesRequests->GenerateContent;
        }
    }
}
{
    package WantedPagesAlpha;
    @ISA = qw(WantedPages);
    # ...etc
}
Mychaeel: Well... I'd consider it bad OOP, and it also only works because GenerateContent is technically a static method for class WantedPages whereas it is a normal (non-static) method in its superclass – which in turn is bad OOP too and wouldn't work with any programming language except Perl which doesn't have an all-too-strict syntactical notion of OOP in the first place.
So I'd say it is ugly on several levels; but why make "sort=..." a parameter anyway? It'd be very neat and OOP-nice to create an (abstract) class WantedPages with WantedPagesAlpha and WantedPagesRequests as subclasses, which can then be used as such in the #MAGIC header directly.
Tarquin: It seemed to me easier for the end-user to think of WantedPages as a single command with different options. But it's a lot easier to code your way. I was trying to understand how to travel down a class tree based on options for the Lunatic UseMoo project. BTW, have you thought of making a perl wrapper like UMake? Just something that registers to open perl filetypes, displays output and provides a "run last script again" button.
Mychaeel: No, I never considered doing that. Setting up the file types is a one-time configuration task, and my editor allows setting up a custom menu command to run the script I'm editing without any helper tools.
Actually, with something like UnrealScript's "states" you could create your "runtime subclass selection" quite neatly. States are more or less a type of subclassing that's "orthogonal" to the normal subclassing. Unlike normal subclasses however it is possible to switch an object's state at runtime, so you could in theory have a single WantedPages class that has two states, "SortAlpha" and "SortRequests", each of which provides its own implementation of the GenerateContent method. However, because states can be switched at runtime, it is impossible to have different data in different states – and that's also why you cannot simply switch classes at runtime for a given object because there you can have different data in different objects. (Typecasting objects – as opposed to object references – therefore is a non-trivial task that usually involves the creation of an all-new object of the target type in the process.)
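One possible middle ground in Perl, offered as a sketch and not as anything from the Magic content code: keep the subclasses, and let a small lookup outside the class tree turn the "sort" option into a class name. That way no method calls its own subclasses. The %class_for_sort table, the wanted_pages_content sub and the default of sorting by requests are all assumptions.

```perl
use strict;
use warnings;

{
    package WantedPages;                  # abstract base; shared helpers would live here
    sub GenerateContent { die "subclass must implement GenerateContent\n" }
}
{
    package WantedPagesAlpha;
    our @ISA = ('WantedPages');
    sub GenerateContent { 'sorted alphabetically' }
}
{
    package WantedPagesRequests;
    our @ISA = ('WantedPages');
    sub GenerateContent { 'sorted by number of requests' }
}

# Hypothetical lookup table from the "sort" option to a class name.
my %class_for_sort = (
    alpha    => 'WantedPagesAlpha',
    requests => 'WantedPagesRequests',
);

sub wanted_pages_content {
    my (%params) = @_;
    my $sort  = defined $params{sort} ? $params{sort} : 'requests';
    my $class = $class_for_sort{$sort} || 'WantedPagesRequests';
    return $class->GenerateContent(%params);   # a class name in a scalar works as an invocant
}

print wanted_pages_content(sort => 'alpha'), "\n";
print wanted_pages_content(), "\n";
```

The end user still sees one command with options; only the lookup knows that two classes sit behind it.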
Tarquin: Which text editor do you use?
Mychaeel: Still this one. It may not be as feature-complete as other text editors that are around, but what's more important to me is that it's entirely feature-complete in respect to the features I want. – Oh, and I have replaced that very old "umake" version by UMake meanwhile.
Feb 2004
I think I FINALLY really get references! I wrote this last night to check:
#!/usr/bin/perl
%myHash = (
    foo => 'bar',
    biz => 'bax',
);
sub takeHash {
    my $hashRef = shift;
    foreach (keys %$hashRef) {
        print $hashRef->{$_}, ', ';
    }
}
takeHash(\%myHash);
I am having to make design decisions in UseMOO. The key problem is that I am breaking OO principles in bad ways. The basic idea behind MOO is that we have a class tree like this:
- Page
- PrefsPage
- AdminPage
- WikiPage
- RecentChanges
- EditPage
... and so on. Each object adds new functionality. To make this actually work, the script needs to start by choosing which class to create an instance of, and then run that instance's browsePage() method. The problem is how to choose the class in the first place. I've borrowed Mychaeel's registration system, and using this, the new() function is able to travel down the class tree, checking which conditions the browser parameters satisfy. So:
- register() needs to call its immediate parent, but the parent itself needs to call its parent!
- new() needs to call implementations of itself in subclasses, which is BAD.
For register, I think the least yucky solution is this:
sub WikiPage::register {
    my $class = shift;
    $class = (ref $class or $class);
    # WikiPage can't call this, it must call something further up
    if ($class eq 'WikiPage') {
        $class->SUPER::register;
        return;
    }
    push @registered, $class
        if $class->isa(Page) and not grep /^\Q$class\E$/, @registered;
}
and for new(), to avoid the danger of recursive calling, have new(), newPageChild(), newWikiPageChild() etc.
Feb 2004 part 2
I think I have a tweak to the above that makes it slightly more palatable.
Give each class a sub called "this" that returns the class name. (AFAIK there's no way to magically do this in perl.) To use the example from the perlboot manpage:
{
    package Animal;
    sub this { 'Animal' }
    sub speak {
        my $class = shift;
        $class = (ref $class or $class);
        print "a $class goes ", $class->sound, "!\n";
        if ( $class->this ne this ) {
            print "we're not where we were";
        }
        return;
    }
}
Now if we call Horse->speak (or some other subclass), the speak sub knows it's running in a superclass. This approach would eliminate the use of the class name in my register sub above, which makes it look a bit cleaner.
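A possible alternative to a hand-written "this" sub, sketched here as a suggestion: Perl's built-in __PACKAGE__ token gives the name of the package the code was compiled in. Comparing it with the class the method was called on tells speak whether it was reached through a subclass. The Animal and Horse classes follow the perlboot example; the sound subs and the returned message are made up.

```perl
use strict;
use warnings;

{
    package Animal;
    sub sound { 'generic noise' }
    sub speak {
        my $class = shift;
        $class = (ref $class or $class);
        my $message = "a $class goes " . $class->sound . "!";
        # __PACKAGE__ is the package this line was compiled in ('Animal'),
        # regardless of which class the method was called on.
        if ($class ne __PACKAGE__) {
            $message .= " (called via a subclass of " . __PACKAGE__ . ")";
        }
        return $message;
    }
}
{
    package Horse;
    our @ISA = ('Animal');
    sub sound { 'neigh' }
}

print Animal->speak, "\n";
print Horse->speak, "\n";
```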
Feb 2004 part 3
I think I finally understand what Mych means when he says that trying something and finding out it doesn't work days later can be a good thing.
I've had to change the way Moo creates instances, because the first way isn't flexible enough. The new method is a little unorthodox, but having seen for myself the drawbacks of the original method, I'm more convinced the new method is right.
What I'm now doing is this:
my $instance = {};
Page->constructor( $instance, @variables );
In effect, constructor() is really a blesser.
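A sketch of what such a blessing constructor might look like. This is a guess at the shape, not the UseMOO code: it assumes the extra arguments are key/value pairs, and the title accessor is made up.

```perl
use strict;
use warnings;

{
    package Page;
    # Takes an existing hash reference, fills it in and blesses it.
    sub constructor {
        my ($class, $instance, %variables) = @_;
        %$instance = (%$instance, %variables);   # merge the new values in
        return bless $instance, $class;          # same reference, now an object
    }
    sub title { $_[0]->{title} }                 # hypothetical accessor
}

my $instance = {};
Page->constructor($instance, title => 'HomePage');   # blesses $instance in place
print ref($instance), ": ", $instance->title, "\n";
```

Because bless works on the thing the reference points to, the caller's $instance becomes a Page object even though the return value is ignored.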