Klutz was originally designed to fetch online comic strips into one place where I could read them. Its original version was written back when I had a 14.4 kbps modem and preferred to get all my comics on one page and walk away for a while to let it load. I grabbed a whopping five strips each day. Even then a few things kept throwing me off, so when I decided to rewrite it as a Horde module instead of a Perl script, I added quite a few features that make it more flexible. Though I haven't tried it yet, I'm reasonably certain it could be used to fetch the latest photo on a friend's blog (though currently limited to one per day), and with a few minor tweaks it could grab other media types as well.
But enough rambling, let's get to some configuration.
When you first install Klutz you are required to copy the comics.php.dist configuration file to comics.php (under the klutz/config folder). This is a PHP file that defines an array holding a list of comics and all their settings, so standard PHP syntax applies. The default file includes dozens of comics as samples, and each configuration can grow to be insanely complex. So far the only comics I've found that Klutz won't handle are those made up of multiple images.
Each comic definition is made up of a few simple building blocks. A sample might look like:
'doonesbury' => array(
    'name' => 'Doonesbury',
    'author' => 'Gary Trudeau',
    'homepage' => 'http://www.doonesbury.com/',
    'method' => 'direct',
    'url' => 'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',
    'days' => array('mon', 'thu'),
    'enabled' => true
),
Because the overall layout of the file is a giant array declaration, each comic defines an element in an array. The first line:
'doonesbury' => array(
states that we're creating a new comic definition for key "doonesbury". The key will be used in two ways: (1) to internally identify the comic, and (2) to name the files when running in caching mode. That means it needs to be unique and follow all filename constraints of PHP and your operating system.
The next three settings (name, author and homepage) are mandatory but are purely annotation, and fairly self-explanatory.
The next setting, method, sets which method should be used for fetching the comic, and will determine other options that are needed. The current modes are:
The url setting is the first address the library uses to start the fetch cycle. For direct this is the image itself; otherwise it is only the starting page, and what is found there, combined with the other settings, directs the fetch from that point on.
The days setting specifies how often the comic is published. Available options are:
Finally, for this simplified example, enabled is set to true. Rather than having to comment out large chunks of comics.php, each comic has a setting that can disable it, in which case the comic is neither fetched nor displayed. It is also a handy way to keep older comics around without them showing up in the user interface.
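As a small illustration of that last point, here is the same sample entry with the comic switched off. It is only a sketch that reuses the keys from the example above; nothing else in the entry needs to change.

```php
'doonesbury' => array(
    'name' => 'Doonesbury',
    'author' => 'Gary Trudeau',
    'homepage' => 'http://www.doonesbury.com/',
    'method' => 'direct',
    'url' => 'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',
    'days' => array('mon', 'thu'),
    // Keep the definition in the file, but do not fetch or display it.
    'enabled' => false
),
```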
Here is a "date" example in which the image name is derived from the date on which the comic appeared.
'url' => 'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',
If, for example, you fetch the comic on June 24, 2009, {%y%m%d} is replaced by 090624 and {%Y} by 2009, which finally gives a working URL:
http://images.ucomics.com/comics/db/2009/db090624.gif
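To make the substitution concrete, here is a minimal sketch of how such a URL could be expanded. It is not Klutz's actual code: the function name klutzSketchExpandDates is hypothetical, and it only understands the four codes used in the example (%Y, %y, %m, %d), which it maps onto PHP's date format characters.

```php
<?php
// Hypothetical helper, not part of Klutz: expands {...} date groups in a URL.
function klutzSketchExpandDates(string $url, DateTimeInterface $when): string
{
    // Map the strftime-style codes seen in the example to date() characters.
    $map = array('%Y' => 'Y', '%y' => 'y', '%m' => 'm', '%d' => 'd');

    return preg_replace_callback(
        '/\{((?:%[Yymd])+)\}/',
        function (array $m) use ($map, $when): string {
            // e.g. "%y%m%d" becomes "ymd", which format() turns into "090624".
            return $when->format(strtr($m[1], $map));
        },
        $url
    );
}

$when = new DateTimeImmutable('2009-06-24', new DateTimeZone('UTC'));
echo klutzSketchExpandDates(
    'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',
    $when
), "\n";
// prints http://images.ucomics.com/comics/db/2009/db090624.gif
```

Text outside the braces is left untouched, so the rest of the URL passes through as written.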
Some web comics use an incrementing number in their file names, as in this example:
'lfg' => array(
    'name' => 'Looking for Group',
    'author' => 'Ryan Sohmer and Lar deSouza',
    'method' => 'direct',
    'url' => 'http://archive.lfgcomic.com/lfg{i}.gif',
    'icount' => 263,
    'idate' => 'June 22, 2009',
    'iformat' => '%04d',
    'itype' => 'ref',
    'days' => array('mon', 'thu'),
    'homepage' => 'http://lfgcomic.com/',
    'enabled' => true
),
The url setting now contains a variable, {i}, that is controlled by the icount, idate and iformat settings. Its value is incremented (or decremented) on each day listed in the days setting.
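The arithmetic can be sketched roughly as follows. This is an illustration under assumptions, not Klutz's implementation: it assumes icount is the comic number published on idate, that the counter goes up by one on every later day listed in days, and that iformat is applied with sprintf. The function name klutzSketchComicNumber is hypothetical, the itype setting is ignored, and only the incrementing case is handled.

```php
<?php
// Hypothetical helper, not part of Klutz: counts listed days after $idate.
function klutzSketchComicNumber(int $icount, string $idate, array $days, string $target): int
{
    $utc = new DateTimeZone('UTC');
    $day = new DateTimeImmutable($idate, $utc);
    $end = new DateTimeImmutable($target, $utc);

    $count = $icount;
    while ($day < $end) {
        $day = $day->modify('+1 day');
        // format('D') yields "Mon", "Tue", ...; compare in lower case.
        if (in_array(strtolower($day->format('D')), $days, true)) {
            $count++;
        }
    }
    return $count;
}

// June 22, 2009 was a Monday (263); Thursday the 25th gives 264
// and Monday the 29th gives 265.
$n = klutzSketchComicNumber(263, 'June 22, 2009', array('mon', 'thu'), 'June 29, 2009');
echo str_replace('{i}', sprintf('%04d', $n), 'http://archive.lfgcomic.com/lfg{i}.gif'), "\n";
// prints http://archive.lfgcomic.com/lfg0265.gif
```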