Perl, threading, locking and semaphores
Jan. 26th, 2012 11:15 pm
So, being a fan - as I am - of Perl, I've had a reason to take a bit of a look at threaded code.
Threading is one way of running code in parallel and - for my purposes - is most useful when you've got multiple things going on that each spend their time waiting for 'something else' to respond.
For example, if you've got to log in to a lot of different servers to perform a simple task, the 'login' takes far more real time than 'processing time' - so by threading, the waits overlap and the whole task gets done faster.
Perl is actually quite easy to 'thread' with - the only really hard part is that your perl interpreter must be compiled to support threading, and if it wasn't, then it's a bit of a hassle to recompile it. The good news is that current versions of perl seem to have it enabled by default (in my small sample of 'a couple of systems', I didn't need to rebuild).
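If you're not sure whether a given perl was built with thread support, you can ask it. This is a minimal sketch using the core Config module; the helper name perl_has_threads is just something I've made up for illustration.

```perl
use strict;
use warnings;
use Config;

# Returns 1 if this perl was built with interpreter threads (ithreads), else 0.
sub perl_has_threads
{
    return $Config{useithreads} ? 1 : 0;
}

print "threads supported: ", ( perl_has_threads() ? "yes" : "no" ), "\n";
```

From a shell, 'perl -V:useithreads' reports the same setting without needing a script.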
The basics of how to do it can be found in 'perldoc perlthrtut' - but it goes a bit like this:
Add 'use threads;' to the start of your code.
In your perl code, create a subroutine that will run as a thread.
sub thread_test_subroutine
{
    my ( $arg_1, $arg_2 ) = @_;
    print $arg_1;
    sleep ( rand(10) );
    print $arg_2;
    return $arg_1 + $arg_2;
}
When you call that subroutine, rather than doing so in the normal fashion, do so using
threads -> create.
E.g.
my $thread = threads -> create ( \&thread_test_subroutine, ( $first_arg, $second_arg ) );
And for the sake of neatness, you need to 'join' the thread (joining is perlish for 'wait for it to finish, and get any return values').
my $result = $thread -> join();
And it's just like that.
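Put together, the create-then-join steps above look something like this as a complete script. It's a sketch rather than anything definitive: add_slowly is a made-up stand-in for the real work, and the arguments 2 and 3 are arbitrary.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: stands in for "log in to a server and do something".
sub add_slowly
{
    my ( $arg_1, $arg_2 ) = @_;
    sleep 1;    # pretend we're waiting on the network
    return $arg_1 + $arg_2;
}

# Assigning to a scalar means the thread is created in scalar context,
# so its return value is kept for join() to hand back.
my $thread = threads -> create ( \&add_slowly, 2, 3 );

# join() blocks until the thread finishes, then returns what the sub returned.
my $result = $thread -> join();
print "thread returned $result\n";
```

One thing to watch: the context you call 'create' in decides the context the thread runs in, so a 'create' whose result you throw away gives you nothing back from 'join'.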
Because I'm a smartarse, I wanted to get a bit more clever - you can extend this idea for creating several threads, to all run in parallel.
So for example:
for ( my $count=0; $count < 10; $count++ )
{
    #assign to a scalar, so the thread keeps its (scalar) return value for join()
    my $thread = threads -> create ( \&thread_test_subroutine, ( $first_arg, $second_arg ) );
}
foreach my $thread ( threads -> list() )
{
    if ( $thread -> tid() )
    {
        my $result = $thread -> join();
        print $thread -> tid(), " returned ", $result, "\n";
    }
}
Which creates 10 threads, then joins each in turn - waiting for all 10 to 'do their thing' - and prints what each one returned. Hopefully you can see where that gets a bit handy, if you've got a lot of networked devices to do stuff with.
But the hard part when messing with threads is things like communicating between them. There are two libraries that I've been looking at today for that purpose.
You see, when I create a 'load' of threads, the last thing I want to do is to do so in an open-ended fashion - 100 threads for 100 servers might be ok.
10,000 threads for 10,000 servers might cause a bit of a problem.
That's where Thread::Semaphore comes in, and - in case Thread::Semaphore isn't available on a given system - I've also been looking at threads::shared.
Thread::Semaphore lets you create a 'shared' counter (which defaults to 1).
There's two bits to it - 'down()' and 'up()'.
down() is used to decrease the semaphore, and - if it's zero - will wait until it can do this.
up() increases the semaphore.
So if you add 'use Thread::Semaphore;' and insert this above the loop that creates the threads:
my $resource_limit = Thread::Semaphore -> new ( 5 );
And within the subroutine, before the real work, used:
$resource_limit -> down();
And
$resource_limit -> up();
when done, you'd end up with a scenario where - with 10 threads 'existing' - you'd only actually have 5 'running' at any time.
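To convince yourself the limit really holds, here's a rough sketch that counts how many workers are inside the down()/up() section at once. The names limited_worker, $running and $peak are my own inventions for the demonstration, and the 'sleep 1' stands in for real work.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $resource_limit = Thread::Semaphore -> new ( 5 );

# Shared counters, only here to show how many workers are 'running' at once.
my $running : shared = 0;
my $peak    : shared = 0;

sub limited_worker
{
    my ( $id ) = @_;
    $resource_limit -> down();    # waits here if 5 workers are already busy
    {
        lock ( $running );
        $running++;
        $peak = $running if $running > $peak;
    }
    sleep 1;                      # the 'real work' would go here
    {
        lock ( $running );
        $running--;
    }
    $resource_limit -> up();      # hand the slot back
    return $id;
}

my @threads;
for my $id ( 1 .. 10 )
{
    my $thread = threads -> create ( \&limited_worker, $id );
    push @threads, $thread;
}
my @results = map { $_ -> join() } @threads;
print "peak running at once: $peak\n";
```

All 10 threads exist from the start, but the peak it prints should never go above 5.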
You can also have your resource limit locally, within a loop:
sub thread_test_subroutine
{
    my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
    print "$host";
    print $arg_1;
    #check there's a resource available to do this bit
    $semaphore -> down();
    sleep ( rand(10) );
    $semaphore -> up();
    print $arg_2;
    return $arg_1 + $arg_2;
}
my @server_list = ( "one", "two", "three", "four" );
foreach my $host ( @server_list )
{
    my $resources_per_host = Thread::Semaphore -> new ( 2 );
    for ( my $count=0; $count < 10; $count++ )
    {
        #the semaphore object is already a reference to shared data, so pass it as-is
        threads -> create ( \&thread_test_subroutine, ( $resources_per_host, $host, $first_arg, $second_arg ) );
    }
}
#wait for everything to finish before the main program exits
$_ -> join() for threads -> list();
Which passes your 'semaphore' into the thread as a reference, such that you'll only have two 'active' threads per server in your list.
If you don't have Thread::Semaphore available, you can 'fake it' by using a (thread) shared variable, and a lock.
E.g.
#needs 'use threads;' and 'use threads::shared;' at the top
sub thread_test_subroutine
{
    my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
    print "$host";
    print $arg_1;
    #check there's a resource available to do this bit
    {
        #$semaphore is a reference to the shared variable; lock() follows it one level
        lock ( $semaphore );
        sleep ( rand(10) );
    }
    #lock is released, because it's now out of scope.
    print $arg_2;
    return $arg_1 + $arg_2;
}
my @server_list = ( "one", "two", "three", "four" );
foreach my $host ( @server_list )
{
    my $resources_per_host : shared;
    for ( my $count=0; $count < 10; $count++ )
    {
        threads -> create ( \&thread_test_subroutine, ( \$resources_per_host, $host, $first_arg, $second_arg ) );
    }
}
#wait for everything to finish before the main program exits
$_ -> join() for threads -> list();
Not quite as good, as you can only have two states per host - 'in use' or 'not' - so only one thread per host does the locked bit at a time. But it does still allow for a bit of crude throttling.
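If you want more than two states without Thread::Semaphore, one option is a shared counter combined with cond_wait and cond_signal from threads::shared. This is only a sketch of the idea: take_slot, give_slot, throttled_worker and the counters are names I've made up, and the limit of 2 is arbitrary.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Take one slot, waiting until one is free.
# $slots_ref is a reference to a shared counter of free slots.
sub take_slot
{
    my ( $slots_ref ) = @_;
    lock ( $$slots_ref );
    cond_wait ( $$slots_ref ) until $$slots_ref > 0;
    $$slots_ref--;
    return;
}

# Give a slot back and wake one waiting thread, if there is one.
sub give_slot
{
    my ( $slots_ref ) = @_;
    lock ( $$slots_ref );
    $$slots_ref++;
    cond_signal ( $$slots_ref );
    return;
}

my $slots     : shared = 2;    # two 'active' threads allowed
my $busy      : shared = 0;    # bookkeeping, just for the demonstration
my $most_busy : shared = 0;

sub throttled_worker
{
    my ( $slots_ref, $id ) = @_;
    take_slot ( $slots_ref );
    { lock ( $busy ); $busy++; $most_busy = $busy if $busy > $most_busy; }
    sleep 1;                   # the 'real work' would go here
    { lock ( $busy ); $busy--; }
    give_slot ( $slots_ref );
    return $id;
}

my @workers;
for my $id ( 1 .. 6 )
{
    my $thread = threads -> create ( \&throttled_worker, \$slots, $id );
    push @workers, $thread;
}
$_ -> join() for @workers;
print "most busy at once: $most_busy\n";
```

cond_wait releases the lock while it waits and takes it again before returning, which is why the 'until' check is safe to repeat.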
But it was somewhat easier than I thought to do something practically useful with threaded perl. These snippets are for my own reference rather than production use, and bear in mind - if you're playing with parallel programs, you can end up with all kinds of exciting and interesting things going on, if you're not careful.