RabbitMQ

posted by crafterm, 31 August 2009

Last week I was privileged to present at our local Melbourne Ruby/Rails user group with fellow CLEAR Interactive colleague Daniel Neighman. Daniel and I gave a talk about RabbitMQ, the exciting AMQP-based messaging platform.

We discussed how RabbitMQ and AMQP came into existence, and their architecture. I also showed two demo applications I’d prepared: a Rails application that used RabbitMQ to resize and process images via Core Image in the background, and a RubyCocoa desktop client that posted surf report measurements to a fanout exchange, which drove a video news feed of surfer quotes.

The slides for the presentation are available at slideshare.

The example applications I demonstrated during the talk are available as a GitHub project as well.

Big thanks to Nick Marfleet for organising and Square Circle Triangle for hosting the night. Looking forward to next month already!

Comma, CSV for all

posted by crafterm, 10 March 2009

CSV can be quite uninspiring at times, but as I’m sure many of you are all too aware, many modern applications still need to parse and generate CSV to interface with legacy systems and/or desktop software, notably Excel.

One of my Ruby on Rails clients required CSV data generation to support an ‘export to excel’ feature – so I embarked on a journey to look at the various CSV gems/plugins available at the time to export our data.

This adventure gave birth to Comma, a small and simple gem (the implementation is just over 60 lines) that adds CSV generation support to arbitrary Ruby objects.

Using a declarative approach, you specify the output CSV format by naming attributes, methods, associations, etc., all within a block, with optional header names. Comma traverses these definitions to fetch model data, and infers any headers you don’t specify using sensible conventions.

I had a few particular requirements while researching, which led to Comma’s development:

  1. Support pure Ruby objects

    I wanted to export arbitrary instances to CSV, not just ActiveRecord derived objects, and hence didn’t want to use a plugin specific to Rails, or one that had internal knowledge of ActiveRecord or similar models for inferring information such as associations and attributes.

  2. Flexibility

    Transparency across associations, attributes and methods – they should all be treated the same. Some of the plugins I looked at required different configuration to name methods or associations, as opposed to attributes. I wanted to be able to cleanly define where the data for export should come from, and have Comma transparently access it (after all, Ruby’s #send mechanism provides the foundation for this).

  3. Multiple CSV output formats per class

    One class we have requires several CSV output formats, one for delivery to end users, and another for escrow purposes. I wanted to be able to define multiple output formats per class, and be able to call upon them when required.

  4. Integration

    We’re using Ruby on Rails, so integration with Rails would be useful, particularly at the controller level, which should be DRY and able to ‘render :csv => @objects’.

  5. Simplicity

    CSV export shouldn’t be that hard on the plugin/gem implementer, nor the plugin/gem user – ideally I wanted to be able to define a CSV configuration (with an optional name) using a declarative syntax that names what should be exported, and have that same definition used for data access and header name generation.

An example use of Comma follows:

class Book < ActiveRecord::Base

  # ================
  # = Associations =
  # ================
  has_many   :pages
  has_one    :isbn
  belongs_to :publisher

  # ===============
  # = CSV support =
  # ===============
  comma do

    name
    description

    pages :size => 'Pages'
    publisher :name
    isbn :number_10 => 'ISBN-10', :number_13 => 'ISBN-13'
    blurb 'Summary'

  end

end

Annotated, the ‘Comma’ description includes:

# starts a Comma description block, generating 2 methods #to_comma and #to_comma_headers for this class.
comma do

  # name, description are attributes of Book with the header being reflected as 'Name', 'Description'
  name
  description

  # pages is an association returning an array, :size is called on the association results, with the header name specified as 'Pages'
  pages :size => 'Pages'

  # publisher is an association returning an object, :name is called on the associated object, with the reflected header 'Name'
  publisher :name

  # isbn is an association returning an object, :number_10 and :number_13 are called on the object with the specified headers 'ISBN-10' and 'ISBN-13'
  isbn :number_10 => 'ISBN-10', :number_13 => 'ISBN-13'

  # blurb is an attribute of Book, with the header being specified directly as 'Summary'
  blurb 'Summary'

end

Notice above how attributes and associations are all specified and treated the same. Header names are reflected from the method names using sensible conventions unless provided directly, and more complex combinations of data can be grouped together into methods if required.
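To make the mechanics a little more concrete, here is a minimal sketch of how a declarative block like this could be recorded and then replayed with #send. It is not Comma’s implementation: FormatRecorder, CsvSketch, csv_format, to_csv_headers and to_csv_row are all hypothetical names invented for illustration, and CSV quoting is left out entirely.

```ruby
# Hypothetical sketch, NOT Comma's actual code.
# Records every call made inside the block as [method chain, header].
class FormatRecorder
  attr_reader :columns

  def initialize
    @columns = []
  end

  def method_missing(source, *args)
    # name                 -> attribute, header inferred from the name
    @columns << [[source], humanize(source)] if args.empty?

    args.each do |arg|
      case arg
      when String # blurb 'Summary' -> attribute with an explicit header
        @columns << [[source], arg]
      when Symbol # publisher :name -> call :name on the result, inferred header
        @columns << [[source, arg], humanize(arg)]
      when Hash   # pages :size => 'Pages' -> call :size, explicit header
        arg.each { |meth, header| @columns << [[source, meth], header] }
      end
    end
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  private

  def humanize(sym)
    sym.to_s.tr('_', ' ').capitalize
  end
end

module CsvSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :csv_columns

    # Evaluate the block against a recorder and keep the resulting columns.
    def csv_format(&block)
      recorder = FormatRecorder.new
      recorder.instance_eval(&block)
      @csv_columns = recorder.columns
    end
  end

  def to_csv_headers
    self.class.csv_columns.map { |_chain, header| header }
  end

  # Attributes, methods and associations are all reached the same way: #send.
  def to_csv_row
    self.class.csv_columns.map do |chain, _header|
      chain.inject(self) { |object, meth| object.send(meth) }.to_s
    end
  end
end

Publisher = Struct.new(:name)

# A pure Ruby object, no ActiveRecord involved.
class Book
  include CsvSketch
  attr_reader :name, :pages, :publisher, :blurb

  def initialize(name, pages, publisher, blurb)
    @name, @pages, @publisher, @blurb = name, pages, publisher, blurb
  end

  csv_format do
    name
    pages :size => 'Pages'
    publisher :name
    blurb 'Summary'
  end
end

book = Book.new('Ruby', %w[one two three], Publisher.new('Acme'), 'A short book')
book.to_csv_headers # => ["Name", "Pages", "Name", "Summary"]
book.to_csv_row     # => ["Ruby", "3", "Acme", "A short book"]
```

The same recorded definition drives both the header row and the data row, which is the property the requirements above were after.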

Multiple descriptions can be specified with a named Comma block:

# ===============
# = CSV support =
# ===============
comma do  # implicitly named :default

  name
  description

  pages :size => 'Pages'
  publisher :name
  isbn :number_10 => 'ISBN-10', :number_13 => 'ISBN-13'
  blurb 'Summary'

end

comma :brief do

  name
  description
  blurb 'Summary'

end

You can specify which format you’d prefer as an optional parameter to #to_comma.
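As a rough illustration of the idea, named formats can be kept in a per-class registry keyed by name, with :default used when no name is given. The sketch below is a simplified stand-in, not Comma’s code: NamedFormats, export_format, to_row and Article are hypothetical names, and each format is a plain list of method names rather than a block.

```ruby
# Hypothetical sketch of a per-class registry of named export formats.
module NamedFormats
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def export_formats
      @export_formats ||= {}
    end

    # Register a list of fields under a format name (:default if omitted).
    def export_format(fields, as: :default)
      export_formats[as] = fields
    end
  end

  # Build one row using the requested format.
  def to_row(format = :default)
    fields = self.class.export_formats.fetch(format) do
      raise ArgumentError, "unknown export format: #{format.inspect}"
    end
    fields.map { |field| send(field).to_s }
  end
end

Article = Struct.new(:name, :description, :blurb) do
  include NamedFormats

  export_format %i[name description blurb]
  export_format %i[name blurb], as: :brief
end

article = Article.new('Comma', 'CSV for all', 'Tiny CSV gem')
article.to_row         # => ["Comma", "CSV for all", "Tiny CSV gem"]
article.to_row(:brief) # => ["Comma", "Tiny CSV gem"]
```

Asking for a format that was never registered raises an ArgumentError in this sketch; how Comma itself reacts in that case is not covered here.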

If you’re using Ruby on Rails, your controllers automatically gain Comma-fu.

class BooksController < ApplicationController

  def index
    respond_to do |format|
      format.csv { render :csv => Book.limited(50) }
    end
  end

end

Comma is licensed under the MIT License, and can be installed directly from GitHub’s gem server.

sudo gem install crafterm-comma

Please feel free to contact me if you have any questions and/or feedback regarding Comma.

Distributed Image Processing with Airbrush

posted by crafterm, 02 November 2008

Airbrush is a lightweight distributed processing tool that Rails applications can use to offload heavy image processing, and/or other tasks, while they continue processing requests.

Early this year, one of my clients started experiencing issues with their Rails application, which managed the content for approximately 40 sites. The problem was in the area of image processing: site administrators would upload very large images, many several hundred megabytes in size, for publishing on various sorts of media.

When an image was uploaded, several previews were generated, and this process was bringing the system to a halt. It consumed all the resources of the Mongrel processing the request, to the point where the virtual server hosting that process would kill it off as a runaway process. Even testing some of the offending images would bring our MacBook Pros to a grinding halt, with memory use soaring into swap and everything slowing to a snail’s pace.

We employed various tools for the platforms we were testing on: strace under Linux and later dtrace under Mac OS X. We noticed one example image, 10 MB in size, would allocate 700 MB of memory while it was being read by the image library (RMagick 2.x at the time). Image Science and even Quartz on my Mac OS X Tiger install exhibited similar behaviour.

After much research and testing, we found many of the offending images to be in non-RGB colour profiles, and to include all sorts of metadata (one even included an entire XML-formatted Mac OS X plist file in its header). Installing profile management and pre-processing metadata alleviated much of the memory exhaustion pain.

The production environment consisted of many smaller Ubuntu Linux virtual private servers (~256 MB RAM, etc.), so we decided to isolate image processing into a dedicated slice that could be shared across all the application servers and configured to whatever specification we required to handle the scale of content being rendered.

This gave birth to Airbrush, which has now been in use for several months with great success. Square Circle Triangle, the client who paid for its development, has allowed us to open source Airbrush for all to use under the MIT license.

Airbrush was designed to abstract the three main roles in its architecture – the listening, the processing and the publishing of results from incoming jobs. This was done primarily to allow us to provide any style of access to Airbrush’s services now and in the future (eg. a queuing system, webservice, etc), and to allow processing of any job type, not just related to image processing (eg. bulk emailing, report generation, etc).

Around the time Airbrush was architected, Starling, a memcache-derived persistent queue implementation, was released and became the perfect fit for Airbrush’s first listener implementation.

To get up and running with Airbrush, first install the following gems:

$> gem install starling airbrush rmagick

Then, create a memcache queue using Starling (specifying the queue and pid file locations):

$> starling -q /var/tmp/starling -P /var/tmp/starling.pid
I, [2008-11-02T11:58:13.443012 #77820]  INFO -- : Starling STARTUP on 127.0.0.1:22122

Then, start any number of Airbrush server instances:

$> airbrush -v
Sun Nov 02 11:58:22 +1100 2008: Accepting incoming jobs

‘-v’ indicates verbose operation, so that you receive extra logging information. Several other options can be passed to Airbrush, such as the memcache server location and port, job poll frequency and a log target; run ‘airbrush -h’ for further details. Both Starling and Airbrush default to localhost as the memcache server on port 22122, so if you are running both locally the defaults will be fine.

To send a preview request to an Airbrush server, you can use the example airbrush-example-client command included with the Airbrush gem:

$> airbrush-example-client -i leaves_desktop.jpg -o resized
Sending leaves_desktop.jpg for preview processing

This sends the image ‘leaves_desktop.jpg’ to be resized into two smaller previews, with filenames starting with ‘resized’.

Back on the Airbrush server side, you’ll notice some further logging output when this happens:

Sun Nov 02 12:02:16 +1100 2008: Processing generate-previews
Sun Nov 02 12:02:18 +1100 2008: Processed previews ({:filename=>"leaves_desktop.jpg", :sizes=>{:small=>[300], :large=>[600]}, :image=>"[FILTERED]"})
Sun Nov 02 12:02:18 +1100 2008: Published results from generate-previews
Sun Nov 02 12:02:18 +1100 2008: Processed generate-previews: 0.806346 seconds processing time

This indicates a successful job. Should an error occur, both the client and server will report what happened.

Internally, the API request to create several image previews is:

client = Airbrush::Client.new(memcache_host)
client.process(
  'generate-previews', :previews, 
    :image => File.read(OPTIONS[:image]), 
    :sizes => { :small => [300], :large => [600] } )

This creates an instance of the Airbrush client, and instructs it to process a given job via the #process method. The parameters to #process specify a unique job id (also used as the return memcache queue name for any results the job may provide), the job name (:previews in this case), and arguments to the job (two sizes in this example, for the creation of ‘small’ and ‘large’ previews, with a longest edge of 300 and 600 pixels respectively). Airbrush will calculate the resultant dimensions using the aspect ratio of the image.
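The longest-edge calculation can be sketched in a few lines. This is an assumption about the arithmetic, not Airbrush’s own calculate_dimensions; fit_longest_edge is a hypothetical helper that scales both sides by the same factor so the longer one lands on the requested size.

```ruby
# Hypothetical helper, NOT Airbrush's calculate_dimensions.
# Scales width x height so that the longest edge equals longest_edge,
# preserving the aspect ratio.
def fit_longest_edge(width, height, longest_edge)
  scale = longest_edge.to_f / [width, height].max
  [(width * scale).round, (height * scale).round]
end

fit_longest_edge(1600, 1200, 300) # => [300, 225]
fit_longest_edge(1200, 1600, 600) # => [450, 600]
```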

An Airbrush server reads this job request from the Starling memcache queue, processes it and places the results back on the queue for the client to read either immediately, or at a later time.
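The round trip can be sketched with an in-memory stand-in for Starling. Everything here is hypothetical (NamedQueues, INCOMING, submit, work_once and the handler table are illustrative names, and no memcache protocol is involved); it only shows the shape of the exchange, with results published on a queue named after the job id.

```ruby
# In-memory stand-in for a queue server: a set of named FIFO queues.
# Hypothetical sketch, NOT Starling or the Airbrush client/server code.
class NamedQueues
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def push(name, item)
    @queues[name] << item
  end

  # Returns nil when the queue is empty.
  def pop(name)
    @queues[name].shift
  end
end

INCOMING = 'incoming-jobs' # hypothetical name for the shared job queue

# Client side: enqueue a job tagged with its id.
def submit(queues, id, command, args)
  queues.push(INCOMING, { id: id, command: command, args: args })
end

# Server side: take one job, run it, and publish the result on a queue
# named after the job id. Returns false when there was nothing to do.
def work_once(queues, handlers)
  job = queues.pop(INCOMING)
  return false unless job

  result = handlers.fetch(job[:command]).call(job[:args])
  queues.push(job[:id], result)
  true
end

queues = NamedQueues.new
handlers = { previews: ->(args) { args[:sizes].keys } }

# The job simply waits on the queue until a server picks it up.
submit(queues, 'generate-previews', :previews, { sizes: { small: [300], large: [600] } })
work_once(queues, handlers)

# The client reads the result now, or at any later time.
preview_names = queues.pop('generate-previews') # => [:small, :large]
```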

A processor is a simple Ruby class with method names matching job names. Here’s an example taken from Airbrush’s RMagick-based image processor, with the resize and previews job implementations:

module Airbrush
  module Processors
    module Image
      class Rmagick < ImageProcessor
        filter_params :image # ignore any argument called 'image' in any logging

        def resize(image, width, height = nil)
          width, height = calculate_dimensions(image, width) unless height

          process image do
            change_geometry("#{width}x#{height}") { |cols, rows, image| image.resize!(cols, rows) }
          end
        end

        def previews(image, sizes) # sizes => { :small => [200,100], :medium => [400,200], :large => [600,300] }
          sizes.inject(Hash.new) { |m, (k, v)| m[k] = crop_resize(image, *v); m }
        end

        # ... snip ...

      end
    end
  end
end

The parameters to each method are extracted from the options hash passed in as arguments to the named job (similar to how Merb can extract values from the params[] hash automatically).
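One way to do this kind of name-based extraction in current Ruby is Method#parameters. The sketch below is an assumption about the technique rather than Airbrush’s code; SketchProcessor and dispatch are hypothetical names, and resize just returns a string so the example stays self-contained.

```ruby
# Hypothetical dispatcher, NOT Airbrush's implementation.
class SketchProcessor
  def resize(image, width, height = nil)
    "#{image} resized to #{width}x#{height || 'auto'}"
  end

  # Finds the method named after the job and fills its positional
  # parameters from the options hash, matching on parameter name.
  def dispatch(command, options)
    values = []
    method(command).parameters.each do |kind, name|
      if options.key?(name)
        values << options[name]
      elsif kind == :req
        raise ArgumentError, "missing option #{name.inspect} for #{command}"
      else
        break # leave remaining optional parameters to their defaults
      end
    end
    public_send(command, *values)
  end
end

processor = SketchProcessor.new
processor.dispatch(:resize, { image: 'leaves.jpg', width: 300 })
# => "leaves.jpg resized to 300xauto"
processor.dispatch(:resize, { image: 'leaves.jpg', width: 300, height: 200 })
# => "leaves.jpg resized to 300x200"
```

A real server would also want to restrict which method names a job is allowed to invoke; that check is omitted here.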

The current RMagick-based image processor supports resizing, cropping and multiple preview generation, with images in RGB and CMYK colour profiles. ‘before’ and ‘after’ filters are also supported to pre- or post-process images (e.g. to add a watermark or filter out metadata). A Core Image based processor is also available.
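The general shape of a before/after filter chain can be sketched as follows. This is not Airbrush’s filter API: FilterSketch, before_filter, after_filter and WatermarkingProcessor are hypothetical names, and the “image” is a plain hash so the example runs without any image library.

```ruby
# Hypothetical filter chain, NOT Airbrush's filter API.
class FilterSketch
  class << self
    def before_filters
      @before_filters ||= []
    end

    def after_filters
      @after_filters ||= []
    end

    def before_filter(&block)
      before_filters << block
    end

    def after_filter(&block)
      after_filters << block
    end
  end

  # Runs the image through the before filters, then the job body (the
  # block), then the after filters, passing each result to the next step.
  def process(image)
    image = self.class.before_filters.inject(image) { |img, filter| filter.call(img) }
    image = yield(image)
    self.class.after_filters.inject(image) { |img, filter| filter.call(img) }
  end
end

class WatermarkingProcessor < FilterSketch
  # Strip metadata before the job runs.
  before_filter { |image| image.reject { |key, _value| key == :metadata } }
  # Add a watermark once the job has finished.
  after_filter { |image| image.merge(watermark: 'example') }
end

image = { pixels: 'raw', metadata: '<plist/>' }
result = WatermarkingProcessor.new.process(image) { |img| img.merge(pixels: 'resized') }
# result => { pixels: 'resized', watermark: 'example' }
```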

The distributed nature of memcache allows us to daisy chain as many Airbrush servers as we need to handle anticipated load, and to spread them across any number of VPS or physical servers (even other platforms such as Xserves with dedicated video processing hardware). Another benefit of using Starling for the incoming job queue is that Airbrush servers can be added or removed without affecting queue reliability. Even when no Airbrush servers are running, jobs are simply added to the Starling queue and wait until one is started.

Airbrush is available as a gem on GitHub and RubyForge, and is under the MIT license. We’re excited to see it released, and keen to continue its development. Please feel free to contact us with any suggestions or ideas to make Airbrush more useful for everyone.