sobrique: (bubble tree)
You may have found Object Oriented Perl useful - it's not a tool for every job, but if you've a complicated data model, or a data driven process, it's invaluable. (Not to mention code encapsulation - but that doesn't actually seem to come up as much).

You may have also found being able to thread and queue useful. (Perl threading and queues)

However, what you'll probably also have found is that multithreading with objects is a significant nuisance. Objects are jumped-up hashes, and there are some quite significant annoyances with sharing hashes between threads.

However, another module which I find very useful in this context is Storable (CPAN).
What Storable does - essentially - is allow you to easily store and retrieve data to the local filesystem. It's geared up to hashes particularly:
use Storable;  
store \%table, 'file';  
$hashref = retrieve('file');  


This is quite a handy way for handling 'saved state' in your Perl code. (Less useful for config files, because the 'stored' file is binary formatted).

However, what Storable _also_ supports is objects - which, as you'll recall from the previous blog post, are basically hashes with some extra bells and whistles. Better yet, there are two other functions that let Storable store to memory rather than to a file. (They aren't exported by default, so you have to ask for them.)
use Storable qw ( freeze thaw );
my $packed_table = freeze ( \%table );
my $hashref = thaw ( $packed_table );


This also works very nicely with objects, which in turn means you can then 'pass' an object around a set of threads using the Thread::Queue module.
use threads;
use Storable qw ( freeze thaw );
use MyObject;
use Thread::Queue;

my $work_q = Thread::Queue -> new();

sub worker_thread {
  #dequeue blocks until there's an item, and returns undef once the queue is ended and empty
  while ( my $packed_item = $work_q -> dequeue )  {
    my $object = thaw ( $packed_item );
    $object -> run_some_methods();
    $object -> set_status ( "processed" );
    #maybe return $object via 'freeze' and a queue?
  }
}

my $thr = threads -> create ( \&worker_thread );
my $newobject = MyObject -> new ( "some_parameters" );
$work_q -> enqueue ( freeze ( $newobject ) );
$work_q -> end();
$thr -> join();
 


Because you're passing the object around within the queue, you're effectively cloning it between threads. So bear in mind that you may need to freeze it and 'return' it somehow once you've done something to its internal state. But it does mean you can do this asynchronously without needing to arbitrate locking or shared memory. You may also find it useful to be able to 'store' and 'retrieve' an object - this works as you might expect. (Although I daresay you might need to be careful about availability of module versions vs. defined attributes if you're retrieving a stored object.)
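To make the 'return it somehow' part concrete, here's a minimal sketch using a second queue for results. The MyObject class here is a made-up stand-in defined inline (with hypothetical 'set_status' and 'status' methods) so the snippet runs on its own, and it assumes a Thread::Queue recent enough to have 'end' (3.01 or later).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Storable qw ( freeze thaw );

# Hypothetical stand-in for your own class, inlined to keep the sketch self-contained.
{
  package MyObject;
  sub new {
    my ( $class, $name ) = @_;
    return bless { name => $name, status => "new" }, $class;
  }
  sub set_status { my ( $self, $status ) = @_; $self -> {status} = $status; return; }
  sub status     { my ( $self ) = @_; return $self -> {status}; }
}

my $work_q   = Thread::Queue -> new();
my $result_q = Thread::Queue -> new();

sub worker_thread {
  while ( defined ( my $packed_item = $work_q -> dequeue() ) ) {
    my $object = thaw ( $packed_item );
    $object -> set_status ( "processed" );
    # The worker only holds a clone, so send the modified copy back.
    $result_q -> enqueue ( freeze ( $object ) );
  }
  return;
}

my $thr      = threads -> create ( \&worker_thread );
my $original = MyObject -> new ( "some_parameters" );
$work_q -> enqueue ( freeze ( $original ) );
$work_q -> end();
$thr -> join();

my $returned = thaw ( $result_q -> dequeue() );
print "original: ", $original -> status(), "\n";    # original: new
print "returned: ", $returned -> status(), "\n";    # returned: processed
```

The object in the main thread is untouched; only the copy that came back through the result queue carries the new status.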
sobrique: (Default)
This week, I have mostly been playing with Thread::Queue.
One of the downsides of Perl threading is that it's not particularly lightweight. Spawning lots of new threads that each do a single task isn't very efficient - especially if you have libraries imported, and large data tables.

So the method I've been playing with is queue oriented - spawn a number of threads equal to some arbitrary parallelism target - 1 per 'resource' consumed is a good bet (so for processor intensive stuff, one per processor - if you're doing remote access to 15 servers, one each).

And then implement a 'queue' which is a thread safe implementation of a FIFO queue (FIFO = First in, First out).

It uses the library Thread::Queue, so you include that at the start of your program. Strictly speaking, you don't actually need to be threading to use it though - there are other reasons to use a FIFO.

So as a sample:

Read more... )
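The original sample sits behind the cut above and isn't reproduced here. As a stand-in, this is a rough sketch of the queue-oriented pattern described: a fixed number of workers all pulling from one queue. The worker count, the 'double the number' placeholder task and the second queue for results are my assumptions for illustration, and 'end' needs Thread::Queue 3.01 or later.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $num_workers = 3;    # arbitrary parallelism target for this sketch
my $work_q      = Thread::Queue -> new();
my $done_q      = Thread::Queue -> new();

sub worker {
  # dequeue() blocks until an item arrives, and returns undef once the
  # queue has been ended and drained.
  while ( defined ( my $item = $work_q -> dequeue() ) ) {
    $done_q -> enqueue ( $item * 2 );    # placeholder for the real work
  }
  return;
}

my @workers;
for ( 1 .. $num_workers ) {
  push ( @workers, threads -> create ( \&worker ) );
}

$work_q -> enqueue ( 1 .. 10 );
$work_q -> end();
foreach my $thr ( @workers ) { $thr -> join(); }
$done_q -> end();

my @results;
while ( defined ( my $result = $done_q -> dequeue() ) ) {
  push ( @results, $result );
}
# Workers finish in no particular order, so sort before printing.
@results = sort { $a <=> $b } @results;
print "@results\n";    # 2 4 6 8 10 12 14 16 18 20
```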

Fairly simple, but does allow for daisy chained processing (e.g. moving from one FIFO queue to the next).
The only slightly complicated part is handling thread exit. I've taken to using an 'exit' signaller in the queue (use an arbitrary pattern, and 'catch' it when it occurs).
The other possibility is some kind of 'all done' shared variable that you set once the queue is fully populated. What you don't want to do is assume that work is finished just because the queue is empty: that might be the case when the thread first starts, or while it's waiting on a dependency, or when the first items have been dequeued and the other threads see an empty queue.
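A minimal sketch of the 'exit signaller' idea, assuming a marker string that can never turn up as real work (the '__EXIT__' value and the item names are invented for this example). One marker is queued per worker, so each of them sees exactly one and stops.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $EXIT        = "__EXIT__";    # arbitrary marker; must never be a real work item
my $num_workers = 2;
my $work_q      = Thread::Queue -> new();

sub worker {
  my $handled = 0;
  while ( 1 ) {
    my $item = $work_q -> dequeue();    # blocks while the queue is empty
    last if $item eq $EXIT;
    $handled++;
  }
  return $handled;
}

my @workers;
for ( 1 .. $num_workers ) {
  # Request scalar context explicitly so join() hands back the count.
  push ( @workers, threads -> create ( { 'context' => 'scalar' }, \&worker ) );
}

$work_q -> enqueue ( qw ( one two three four five six ) );
# One exit marker per worker, queued after all the real work.
$work_q -> enqueue ( ( $EXIT ) x $num_workers );

my $total = 0;
foreach my $thr ( @workers ) { $total += $thr -> join(); }
print "processed $total items\n";    # processed 6 items
```

How the six items split between the two workers varies from run to run, but an empty queue never causes an early exit, because the workers only stop on the marker.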

I've been using this mechanism to create a 'cascade' of tasks - run something on one (group of) server(s). Do some processing. Run something based on the result on another server. This is well suited to queue style processing.
Similarly - because you're queue oriented, then it's also well suited to scaling up (or down) the parallelism. Such as when you're in a multi processor environment, for example - you may want to hog all the processors that are available, but you'll lose efficiency if you overdo it.
sobrique: (Default)
This is perhaps in the wrong order, but to follow on from a couple of rambles lately - threading and perl.
How do you basically do this?
Here is an example:
Read more... )
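The example itself is behind the cut above and isn't reproduced here. As a rough stand-in, this sketch starts one thread per 'host' and collects what each returns. The host names and the 'pretend_task' subroutine are invented for illustration, and the short sleep simply stands in for real work.

```perl
use strict;
use warnings;
use threads;

sub pretend_task {
  my ( $host ) = @_;
  sleep ( int ( rand ( 2 ) ) );    # stand-in for something slow, like a remote login
  return "finished $host";
}

my @threads;
foreach my $host ( qw ( alpha beta gamma ) ) {
  # Scalar context is requested explicitly so the return value is kept.
  push ( @threads, threads -> create ( { 'context' => 'scalar' }, \&pretend_task, $host ) );
}

# Joining in creation order keeps the output order predictable,
# whichever thread actually finishes first.
my @results;
foreach my $thr ( @threads ) {
  push ( @results, scalar ( $thr -> join() ) );
}
print "$_\n" for @results;
```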

OK, that's pretty simplistic I know - but the major way I've found threads useful is for what amount to embarrassingly parallel problems, like 'connect to 200 servers, and run the same commands on each of them'. You could quite easily replace that rather trivial 'sleep' subroutine, with one that does an ssh to a host, to run a command and capture the output. (And maybe process the results, before returning them)

Also note: Perl has had threads for a while, but the module doesn't necessarily contain all the functions you need - latest version as of 2012-07-23 is 1.86 which is available from CPAN. (http://search.cpan.org/~jdhedden/threads-1.86/lib/threads.pm)

More detail: http://perldoc.perl.org/threads.html
sobrique: (Default)
So, I've been fiddling around with Perl, and threading.
One of the things that's been bugging me, is that when I've tried to do a 'return' from a perl subroutine, it's not worked - and I couldn't for the life of me figure out why.
What's _supposed_ to happen, is that you do 'thread -> join' to join the thread (once it's finished running) and that's supposed to capture the return result.

Why it wasn't working is thanks to this little snippet in the documentation (Yes, RTFM, I know. But to be fair, I wasn't looking for it in _that_ bit).

"The context (void, scalar or list) for the return value(s) for "->join()" is determined at the time of thread creation."

Perhaps I'd better backtrack a little though - I mean, anyone who's not really 'into' perl, might not have a clue what I'm talking about when I say 'context'.
So I shall summarise.

Perl is quite clever - it has two 'real' variable types - scalar - which contains anything that's a single value (So any string, integer, float, character, reference). And array (or list) - which is a group of zero or more scalars.

The clever bit is that it can figure out what you mean, by the context in which you do it - a brief illustration (lj cut to avoid breaking formatting).

code here )

What's happening is that the rather clever function 'wantarray' is being used to tell the call context of the subroutine. 'wantarray' is undefined if it's a void context (the result is discarded), true in a list context, and false (but defined) in a scalar context.
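The code behind the cut isn't reproduced here, so as a small sketch of the same idea: a subroutine that reports which context it was called in. The name 'report_context' and the '$last_context' variable are just for this example.

```perl
use strict;
use warnings;

my $last_context;

sub report_context {
  my $want = wantarray;
  # undef means void, true means list, false-but-defined means scalar.
  $last_context = !defined $want ? "void" : $want ? "list" : "scalar";
  return $last_context;
}

my @in_list   = report_context();
my $in_scalar = report_context();
report_context();    # result thrown away, so this is void context

print "list assignment:   $in_list[0]\n";      # list
print "scalar assignment: $in_scalar\n";       # scalar
print "no assignment:     $last_context\n";    # void
```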

As for why this is useful - consider something like 'grep', which finds matches against text patterns (Perl's built-in is named after the Unix command). In a list context, getting the list of lines it found is useful. In a scalar context, the number of matches is probably more useful (0 being 'false', you could do 'if ( grep { /pattern/ } @text_block )' for example).
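A quick illustration of grep behaving differently by context, using a made-up '@text_block':

```perl
use strict;
use warnings;

my @text_block = ( "apple pie", "banana split", "apple tart" );

# List context: you get the matching elements back.
my @matches = grep { /apple/ } @text_block;

# Scalar context: you get the number of matches.
my $count = grep { /apple/ } @text_block;

print "matches: ", join ( ", ", @matches ), "\n";    # matches: apple pie, apple tart
print "count:   $count\n";                           # count:   2

# The condition of an 'if' is scalar context, so a zero count reads as false.
print "no cherries here\n" if not grep { /cherry/ } @text_block;
```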

So anyway - the context of a threaded subroutine is set when the thread is _created_.
Which means you need to do something like:
Read more... )

On the face of it, you immediately discard '$thread' because it drifts out of scope (and it does). However, it also means your thread is created in a scalar context, so any results it returns will be scalar.
If you _don't_ do this, it'll be in a void context, and any return is discarded. Which was what was tripping me up.
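The snippet behind the cut isn't reproduced here, but a minimal sketch of the difference might look like this ('add_up' is an invented subroutine for the example):

```perl
use strict;
use warnings;
use threads;

sub add_up {
  my ( $x, $y ) = @_;
  return $x + $y;
}

# Assigning the thread object to a scalar means create() is called in
# scalar context, so the thread's return value is kept for join().
my $thread = threads -> create ( \&add_up, 2, 3 );
my $sum    = $thread -> join();
print "scalar context: $sum\n";    # scalar context: 5

# No assignment means void context, and the return value is thrown away.
threads -> create ( \&add_up, 2, 3 );
my ( $void_thread ) = threads -> list();
my $nothing = $void_thread -> join();
print "void context: ", ( defined $nothing ? $nothing : "nothing came back" ), "\n";
```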

And you will then be able to do:

Read more... )
Which won't work if you don't start the thread in a scalar context.
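If assigning the thread object just to force a context feels clumsy, the threads module also accepts the context explicitly as a first argument to create. A small sketch, with an invented 'min_and_max' subroutine, asking for list context:

```perl
use strict;
use warnings;
use threads;

sub min_and_max {
  my @sorted = sort { $a <=> $b } @_;
  return ( $sorted[0], $sorted[-1] );
}

# Nothing is assigned here, so the context is stated explicitly instead.
threads -> create ( { 'context' => 'list' }, \&min_and_max, 4, 9, 1, 7 );

my ( $thread ) = threads -> list();
my ( $min, $max ) = $thread -> join();
print "min $min, max $max\n";    # min 1, max 9
```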
sobrique: (Default)
So, being a fan - as I am - of Perl, I've had a reason to take a bit of a look at threaded code.
Threading is one possible implementation of parallel code and - for my purposes - is quite useful when you've got multiple things going on, which require 'something else' to respond.

Such as - for example - if you've got to log in to a lot of different servers, to perform a simple task - the 'login' takes more real time, than it does 'processing time' - so by threading, you end up with the task being accomplished faster.

Perl is actually quite easy to 'thread' with - the only really hard part is that your perl interpreter must be compiled to support threading - and if it isn't by default, then it's a bit of a hassle to recompile it. The good news is, current versions of perl seem to be by default (in my small sample of 'a couple of systems', I didn't need to rebuild).

The basics of how to do it, can be found in 'perldoc perlthrtut' - but it goes a bit like this:

add 'use threads;' to the start of your code.

In your perl code, create a sub routine, that will run as a thread.
sub thread_test_subroutine
{
  my ( $arg_1, $arg_2 ) = @_;
  print $arg_1;
  sleep ( rand(10) );
  print $arg_2;
  return $arg_1 + $arg_2;
}

When you call that subroutine, rather than doing so in the normal fashion, do so using
threads -> create.

E.g.
my $thread = threads -> create ( \&thread_test_subroutine, ( $first_arg, $second_arg ) );

And for the sake of neatness, you need to 'join' the thread (joining is perlish for 'wait for it to finish, and get any return codes').

my $result = $thread -> join();

And it's just like that.
Because I'm a smartarse, I wanted to get a bit more clever - you can extend this idea for creating several threads, to all run in parallel.

So for example:
for ( my $count=0; $count < 10; $count++ )
{
  #ask for scalar context explicitly - in void context the return value is discarded
  threads -> create ( { 'context' => 'scalar' }, \&thread_test_subroutine, ( $first_arg, $second_arg ) );
}

foreach my $thread ( threads -> list() )
{
  if ( $thread -> tid() )
  {
    my $result = $thread -> join();
    print $thread -> tid(), " returned ", $result, "\n";
  }
}


Which creates 10 threads, waits for all 10 to 'do their thing' and then joins them. Hopefully you can see where that gets a bit handy, if you've got a lot of networked devices to do stuff with.

But the hard part when messing with threads is things like communicating between them. There are two libraries that I've been looking at today for that purpose.

You see, when I create a 'load' of threads, the last thing I want to do is to do so in an open ended fashion - 100 threads for 100 servers might be ok.
10,000 threads for 10,000 servers might cause a bit of a problem.

That's where Thread::Semaphore comes in, and - in case Thread::Semaphore isn't available on your system - I've also been looking at threads::shared.

Thread::Semaphore lets you create a 'shared' counter (which defaults to 1).
There are two bits to it - 'down()' and 'up()'.
down() is used to decrease the semaphore, and - if it's zero - will wait until it can do this.
up() increases the semaphore.

So if you insert above:

my $resource_limit = Thread::Semaphore -> new ( 5 );

And within each subroutine, used:

$resource_limit -> down();
And
$resource_limit -> up();
when done, you'd end up with a scenario that - with 10 threads 'existing' - you'd only actually have 5 'running' at any time.
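Put together as something runnable, a rough sketch might look like the following. The '$running' and '$peak' counters are only there to show that the limit holds, and are my additions rather than anything the semaphore needs.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $resource_limit = Thread::Semaphore -> new ( 5 );

# Bookkeeping for the demonstration only.
my $running : shared = 0;
my $peak    : shared = 0;

sub limited_task {
  $resource_limit -> down();    # waits here if 5 threads already hold a slot
  {
    lock ( $running );
    $running++;
    $peak = $running if $running > $peak;
  }
  sleep ( 1 );                  # stand-in for the real work
  {
    lock ( $running );
    $running--;
  }
  $resource_limit -> up();      # hand the slot back
  return;
}

for ( 1 .. 10 ) {
  threads -> create ( \&limited_task );
}
foreach my $thr ( threads -> list() ) { $thr -> join(); }

print "peak concurrency: $peak\n";
```

All 10 threads exist from the start, but the reported peak should never go above 5.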

You can also have your resource limit locally, within a loop:
sub thread_test_subroutine
{
  my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
  print "$host";
  print $arg_1;
  #check there's a resource available to do this bit
  $semaphore -> down();
  sleep ( rand(10) );
  $semaphore -> up();
  print $arg_2;
  return $arg_1 + $arg_2;
}

my @server_list = ( "one", "two", "three", "four" );

foreach my $host ( @server_list )
{
  my $resources_per_host = Thread::Semaphore -> new ( 2 );
  for ( my $count=0; $count < 10; $count++ )
  {
    #the semaphore object is already a (shared) reference, so pass it as is
    threads -> create ( \&thread_test_subroutine, ( $resources_per_host, $host, $first_arg, $second_arg ) );
  }
}

Which passes your 'semaphore' into the thread (the object is itself a shared reference), such that you'll only have two 'active' threads per server in your list.

If you don't have Thread::Semaphore available, you can 'fake it' by using a (thread) shared variable, and a lock.

E.g.

use threads;
use threads::shared;

sub thread_test_subroutine
{
  my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
  print "$host";
  print $arg_1;
  #check there's a resource available to do this bit
  {
    #$semaphore is a reference to the shared variable, so lock what it points at
    lock ( ${$semaphore} );
    sleep ( rand(10) );
  }
  #lock is released, because it's now out of scope.
  print $arg_2;
  return $arg_1 + $arg_2;
}

my @server_list = ( "one", "two", "three", "four" );

foreach my $host ( @server_list )
{
  my $resources_per_host : shared;
  for ( my $count=0; $count < 10; $count++ )
  {
    threads -> create ( \&thread_test_subroutine, ( \$resources_per_host, $host, $first_arg, $second_arg ) );
  }
}


Not quite as good, as you can only have two states per host - 'in use' or 'not'. But it does still allow for a bit of crude throttling.

But it was somewhat easier than I thought to do something that was practically useful, using threaded perl. These snippets are for my own reference, rather than practical usefulness, and bear in mind - if you're playing with parallel programs, you can end up with all kinds of exciting and interesting things going on, if you're not careful.
