6.0.0-git
2024-03-19
Last Modified 2009-06-24 by Guest

Adding Comics to Klutz

Background

Klutz was originally designed to fetch online comic strips into one place where I could read them. Its original version was written back when I had a 14.4 kbps modem and preferred to get all my comics on one page and walk away for a while to let it load. I grabbed a whopping five strips each day. Even then a few things kept throwing me off, so when I decided to rewrite it as a Horde module instead of a Perl script, I added quite a few features to make it more flexible. Though I haven't tried it yet, I'm reasonably certain it could be used to fetch the latest photo on a friend's blog (though currently limited to one per day), and with a few minor tweaks it could grab other media types as well.

But enough rambling, let's get to some configuration.

klutz/config/comics.php

When you first install Klutz you are required to copy the comics.php.dist configuration file to comics.php (under the klutz/config folder). This is a PHP file that defines an array holding a list of comics and all their settings. Standard PHP syntax applies. The default file ships with dozens of comics as samples, and each configuration can grow to be insanely complex. So far the only comics I've found that Klutz won't handle are those made up of multiple images.
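As a rough picture of that layout, the file boils down to one big array with one element per comic. This is only a sketch: the variable name $comics and the two keys are assumptions made for illustration, so check comics.php.dist for the names it really uses.

```php
<?php
// Sketch of the file layout only. The variable name $comics is an
// assumption; use whatever name comics.php.dist actually declares.
$comics = array(
    // One element per comic, keyed by a unique identifier.
    'some_comic' => array(
        // settings for this comic go here
    ),
    'another_comic' => array(
        // settings for the next comic go here
    ),
);
```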

Basic Syntax

Each comic definition is made up of a few simple building blocks. A sample might look like:

    'doonesbury' => array(
        'name'      => 'Doonesbury',
        'author'    => 'Garry Trudeau',
        'homepage'  => 'http://www.doonesbury.com/',
        'method'    => 'direct',
        'url'       => 'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',
        'days'      => array('mon', 'thu'),
        'enabled'   => true
    ),

Because the overall layout of the file is a giant array declaration, each comic defines an element in an array. The first line:

'doonesbury' => array(

states that we're creating a new comic definition for key "doonesbury". The key will be used in two ways: (1) to internally identify the comic, and (2) to name the files when running in caching mode. That means it needs to be unique and follow all filename constraints of PHP and your operating system.

The next three settings are mandatory but are purely annotation, and fairly self-explanatory.

  • name sets the display name for the comic.
  • author displays the author name.
  • homepage is the home page for the comic. I usually recommend this be the comic's main page, not a link to "today's strip" pages, unless they're the same. If you click on the title of the comic on a display page it will automatically take you to the site.

The next setting, method, sets which method should be used for fetching the comic, and determines which other options are needed. The current methods are:

  • direct - The URL will have any substitutions done, and the result of that should be a direct URL to the image.
  • search - Fetch the URL, then search the page for text matching a regular expression. The first capture group is assumed to be the next URL to fetch. The last URL found this way should be the URL of the actual image.
  • bysize - Try to make a best guess at which image on a page is the comic based on the size of the image (after some filters are applied).
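To make the search cycle a bit more concrete, here is a rough sketch of how I read it. It is not Klutz's actual code: the function resolve_by_search, the example.com pages and the two patterns are all made up for illustration, and a small in-memory array stands in for real HTTP requests so the sketch runs offline.

```php
<?php
// Hypothetical sketch of a "search" fetch cycle; not Klutz's actual code.
// $fetch is any callable that returns the body of a URL as a string.
function resolve_by_search(string $url, array $patterns, callable $fetch): ?string
{
    foreach ($patterns as $pattern) {
        $body = $fetch($url);
        // The first capture group is taken as the next URL to follow.
        if (!preg_match($pattern, $body, $m) || !isset($m[1])) {
            return null; // nothing matched, so give up
        }
        $url = $m[1];
    }
    // After the last pattern, $url should point at the image itself.
    return $url;
}

// Stand-in for real HTTP requests (made-up pages).
$pages = array(
    'http://example.com/' => '<a href="http://example.com/today.html">Today</a>',
    'http://example.com/today.html' => '<img src="http://example.com/strips/0263.gif">',
);
$fetch = function (string $u) use ($pages): string {
    return $pages[$u] ?? '';
};

$image = resolve_by_search(
    'http://example.com/',
    array('/href="([^"]+today\.html)"/', '/src="([^"]+\.gif)"/'),
    $fetch
);
echo $image, "\n"; // http://example.com/strips/0263.gif
```

Each pattern gets one page to work on, and whatever its first capture group catches becomes the next address, which is why the last pattern has to capture the image URL.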

The url setting is the first address Klutz uses to start the fetch cycle. For direct this is the image itself; for the other methods it's only the starting page, and what is found there, combined with the other settings, directs things from there.

The days setting specifies how often the comic appears. Available options are:

  • array('mon', 'thu') - The comic entry appears only on the listed days.
  • random - The comic appears only on dates for which an image file was actually fetched.
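For the array form, the check amounts to comparing a date's weekday against the list. A minimal sketch, assuming three-letter lowercase day names as in the sample; the helper appears_on is hypothetical and not part of Klutz:

```php
<?php
// Hypothetical helper, not part of Klutz: true if $date falls on one of $days.
function appears_on(array $days, string $date): bool
{
    // date('D') gives 'Mon', 'Tue', ...; lowercase it to match the config.
    $day = strtolower(date('D', strtotime($date)));
    return in_array($day, $days, true);
}

var_dump(appears_on(array('mon', 'thu'), '2009-06-22')); // a Monday: bool(true)
var_dump(appears_on(array('mon', 'thu'), '2009-06-24')); // a Wednesday: bool(false)
```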

Finally, for this simplified example, enabled is set to true. Rather than having to comment out large chunks of comics.php, each comic has a setting that can disable it, so that it is neither fetched nor displayed. It's also a handy way to keep older comics around without them showing up in the user interface.

URL Example

Here is a date-based example, in which the image name is derived from the date the strip appeared.

'url'       => 'http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif',

If, for example, you fetch the comic on June 24, 2009, {%y%m%d} is replaced by 090624 and {%Y} by 2009, which gives a working URL:

http://images.ucomics.com/comics/db/2009/db090624.gif
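If you want to check what a date pattern will expand to before adding it to comics.php, something like the following sketch can help. It is my own re-implementation for illustration, not Klutz's code: the function expand_url is hypothetical and only understands the four codes used in the example above.

```php
<?php
// Hypothetical re-implementation for illustration; Klutz's own code may differ.
function expand_url(string $url, int $timestamp): string
{
    $values = array(
        '%Y' => date('Y', $timestamp), // four-digit year
        '%y' => date('y', $timestamp), // two-digit year
        '%m' => date('m', $timestamp), // month, zero-padded
        '%d' => date('d', $timestamp), // day of month, zero-padded
    );
    // Only touch {...} groups made of %-codes, so other placeholders survive.
    return preg_replace_callback(
        '/\{((?:%[A-Za-z])+)\}/',
        function ($m) use ($values) {
            return strtr($m[1], $values);
        },
        $url
    );
}

// mktime() takes hour, minute, second, month, day, year.
$when = mktime(0, 0, 0, 6, 24, 2009);
echo expand_url('http://images.ucomics.com/comics/db/{%Y}/db{%y%m%d}.gif', $when), "\n";
// http://images.ucomics.com/comics/db/2009/db090624.gif
```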

Some web comics use an incrementing number in their file names, as in this example:

    'lfg' => array(
        'name'      => 'Looking for Group',
        'author'    => 'Ryan Sohmer and Lar deSouza',
        'method'    => 'direct',
        'url'       => 'http://archive.lfgcomic.com/lfg{i}.gif',
        'icount'    => 263,
        'idate'     => 'June 22, 2009',
        'iformat'   => '%04d',
        'itype'     => 'ref',
        'days'      => array('mon', 'thu'),
        'homepage'  => 'http://lfgcomic.com/',
        'enabled'   => true
    ),

The url setting now uses a variable, {i}, controlled by the other settings icount, idate and iformat. It increments (or decrements) on each day listed in the days setting.

  • icount - The initial value from which the counter starts incrementing or decrementing.
  • idate - The initial date from which the counter starts incrementing or decrementing.
  • iformat - A printf-style format for the counter. In this example '%04d' pads the number with zeros to four digits, giving 0263.
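The arithmetic behind these settings can be sketched as below. This is my reading of the description, not Klutz's actual code: the function comic_number is hypothetical, it only counts forward from idate, and it ignores itype entirely.

```php
<?php
// Hypothetical sketch of the counter arithmetic; not Klutz's actual code.
// Counts forward only: one step for every publication day after $idate.
function comic_number(int $icount, string $idate, array $days, string $date): int
{
    $current = new DateTime($idate);
    $target = new DateTime($date);
    $n = $icount;
    while ($current < $target) {
        $current->modify('+1 day');
        // format('D') gives 'Mon', 'Tue', ...; lowercase it to match the config.
        if (in_array(strtolower($current->format('D')), $days, true)) {
            $n++;
        }
    }
    return $n;
}

// Values taken from the 'lfg' sample above.
$n = comic_number(263, 'June 22, 2009', array('mon', 'thu'), 'June 29, 2009');
$url = str_replace('{i}', sprintf('%04d', $n), 'http://archive.lfgcomic.com/lfg{i}.gif');
echo $url, "\n"; // http://archive.lfgcomic.com/lfg0265.gif
```

June 22, 2009 was a Monday, so the counter is 263 that day, 264 on Thursday the 25th and 265 on Monday the 29th.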