Red Artisan

Agile Ruby on Rails Specialists

Sprinkle Some Powder!

posted by crafterm, 27 May 2008

Provisioning a brand new server or VPS slice can be quite tricky, tedious and time consuming, particularly if done manually with changing software versions and configurations.

In the Rails world, most of us use virtual private servers which are instantiated from base operating system images. It takes only a few minutes to create a slice; however, installing the rest of your server's stack, be it Rails, Merb or another framework, is where the work begins. Provisioning, in this sense, means installing all the software required after the operating system install.

Sprinkle is a new prototype tool that you can use to provision your servers/slices. Its declarative policy/state based approach for specifying how a remote system should be provisioned with intelligent logic to support dependencies, multiple installer types and remote installation is really compelling.

Several free and commercially available tools already exist to help automate the installation of software; however, most fall into one of two styles of design:

1 - Task based, where the tool issues a list of commands to run on the remote system, either remotely via a network connection or smart client.
2 - Policy/state based, where the tool determines what needs to be run on the remote system by examining its current and final state.

Task based solutions are usually quite easy and fast to get up and running, but can be problematic as the user has to define all of the commands manually (not to mention get them right with testing). Policy/state based solutions have much more intelligence about how to modify and adapt the remote system, but often require specialized software to run remotely.

Sprinkle is a prototype tool I've been working on recently in this space that merges both concepts together, using a Ruby domain specific language to declaratively describe the state of the remote system. Using Sprinkle, provisioning your brand new remote server or slice can be automated using pre-defined and/or customized scripts from a single machine at your fingertips.

Sprinkle reads a script that defines a set of packages, a set of policies that define what packages should be installed on what roles of target machines, and a deployment section that defines the delivery mechanism for communicating with remote machines, and any default settings.

Packages can have relationships between each other to support dependencies. Virtual packages are also supported allowing you to define a role that a package (or multiple) fulfills, with the user or Sprinkle selecting which concrete package should be used at runtime.

Packages can also support arbitrary installer types, allowing you to install packages from source, gems, apt, or any other installer you'd like to employ. Installer types know what commands need to be issued to install packages, so all that needs to be specified in a script is the installer type and metadata about the package itself.

In essence, Sprinkle is about defining a domain specific meta-language for describing and processing the installation of software.

Example Sprinkle Script

Here's an example Sprinkle deployment script, annotated to explain each section:

# Annotated Example Sprinkle Rails deployment script
#
# This is an example Sprinkle script configured to install Rails from Gems, Apache, Ruby and
# Sphinx from source, and MySQL from APT on an Ubuntu system.
#
# Installation is configured to run via capistrano (and an accompanying deploy.rb recipe script).
# Source based packages are downloaded and built into /usr/local on the remote system.
#
# A sprinkle script is separated into 3 different sections. Packages, policies and deployment:


# Packages (separate files for brevity)
#
#  Defines the world of packages as we know it. Each package has a name and
#  set of metadata including its installer type (eg. apt, source, gem, etc). Packages can have
#  relationships to each other via dependencies.

require 'packages/essential'
require 'packages/rails'
require 'packages/database'
require 'packages/server'
require 'packages/search'


# Policies
#
#  Names a group of packages (optionally with versions) that apply to a particular set of roles:
#
#   Associates the rails policy to the application servers. Contains rails, and surrounding
#   packages. Note, appserver, database, webserver and search are all virtual packages defined above.
#   If there's only one implementation of a virtual package, it's selected automatically, otherwise
#   the user is requested to select which one to use.

policy :rails, :roles => :app do
  requires :rails, :version => '2.0.2'
  requires :appserver
  requires :database
  requires :webserver
  requires :search
end


# Deployment
#
#  Defines script wide settings such as a delivery mechanism for executing commands on the target
#  system (eg. capistrano), and installer defaults (eg. build locations, etc):
#
#   Configures sprinkle to use capistrano for delivery of commands to the remote machines (via
#   the named 'deploy' recipe). Also configures 'source' installer defaults to put package gear
#   in /usr/local

deployment do

  # mechanism for deployment
  delivery :capistrano do
    recipes 'deploy'
  end

  # source based package installer defaults
  source do
    prefix   '/usr/local'
    archives '/usr/local/sources'
    builds   '/usr/local/build'
  end

end

# End of script. Given the above information, Sprinkle will apply the defined policy on all roles using the
# deployment settings specified.

Given such a script, Sprinkle will apply the defined policy rails on the target machines identified by the role app, where the policy rails is composed of packages for the rails gem itself, an application server, database, webserver and search daemon, along with their dependencies such as the ruby runtime.

Currently, Sprinkle uses Capistrano internally for communicating with remote systems; however, this is pluggable as well, allowing for just about any conceivable delivery mechanism in the future. The deployment section above identifies Capistrano as the delivery mechanism, specifying a local deploy.rb script that defines what roles are available, and what machines are defined within those roles.

This particular script breaks the package section up into multiple files. Here are some of the actual package definitions (complete example available here):

package :ruby do
  description 'Ruby Virtual Machine'
  version '1.8.6'
  source "ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-#{version}-p111.tar.gz" # implicit :style => :gnu
  requires :ruby_dependencies
end

package :rubygems do
  description 'Ruby Gems Package Management System'
  version '1.0.1'
  source "http://rubyforge.org/frs/download.php/29548/rubygems-#{version}.tgz" do
    custom_install 'ruby setup.rb'
  end
  requires :ruby
end

package :rails do
  description 'Ruby on Rails'
  gem 'rails'
  version '2.0.2'
end

package :sphinx, :provides => :search do
  description 'MySQL full text search engine'
  version '0.9.8-rc2'
  source "http://www.sphinxsearch.com/downloads/sphinx-#{version}.tar.gz"
  requires :mysql_dev
end

package :apache, :provides => :webserver do
  description 'Apache 2 HTTP Server'
  version '2.2.6'
  source "http://apache.wildit.net.au/httpd/httpd-#{version}.tar.bz2" do
    enable %w( mods-shared=all proxy proxy-balancer proxy-http rewrite cache headers ssl deflate so )
    prefix "/opt/local/apache2-#{version}"
    post :install, 'install -m 755 support/apachectl /etc/init.d/apache2', 'update-rc.d -f apache2 defaults'
  end
  requires :apache_dependencies
end

Each package includes a description, optional version, optional list of dependencies and an installer type (also optional allowing for meta-packages).

Source installers are particularly intelligent and will download, configure and install source archives from a remote location directly on the target machine. They assume GNU style source archives by default (ie. tar.gz/tar.bz2 compressed archives, configure script and make, make install style semantics), but are completely customizable to support any arbitrary build style (rubygems for example does this above), with pre and post commands.

The Apache installer, for example, specifies a few extra source installer options such as a set of --enable options, an alternate installation prefix and a series of post installation commands to be executed.
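
To make the GNU style defaults a little more concrete, here's a rough Ruby sketch of the kind of command sequence a source installer could generate for the Apache package. This is not Sprinkle's actual implementation: the gnu_source_commands helper, its parameters and the exact shell commands (wget, tar, etc) are assumptions made purely for illustration.

```ruby
# Hypothetical helper, not part of Sprinkle: builds the shell commands a
# GNU style source install would roughly need, given a URL and a few options.
def gnu_source_commands(url, prefix:, archives:, builds:, enable: [], post: [])
  archive = File.basename(url)                           # eg. httpd-2.2.6.tar.bz2
  dir     = archive.sub(/\.(tar\.gz|tar\.bz2|tgz)\z/, '') # eg. httpd-2.2.6
  extract = archive.end_with?('bz2') ? 'tar xjf' : 'tar xzf'

  # --prefix first, then one --enable flag per requested option
  configure_args = ["--prefix=#{prefix}", *enable.map { |opt| "--enable-#{opt}" }].join(' ')

  [
    "mkdir -p #{prefix} #{archives} #{builds}",
    "wget -cq --directory-prefix=#{archives} #{url}",
    "cd #{builds} && #{extract} #{archives}/#{archive}",
    "cd #{builds}/#{dir} && ./configure #{configure_args}",
    "cd #{builds}/#{dir} && make",
    "cd #{builds}/#{dir} && make install",
    *post                                                # post install commands run last
  ]
end

commands = gnu_source_commands(
  'http://apache.wildit.net.au/httpd/httpd-2.2.6.tar.bz2',
  prefix:   '/opt/local/apache2-2.2.6',
  archives: '/usr/local/sources',
  builds:   '/usr/local/build',
  enable:   %w(proxy rewrite ssl),
  post:     ['update-rc.d -f apache2 defaults']
)
puts commands
```

The point of the sketch is that the package definition only carries metadata (URL, prefix, options), while the installer type owns the knowledge of which commands to run and in what order.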

With this example configuration, let's take a look at actually using Sprinkle to provision a remote server.

Usage

Sprinkle supports several command line options:

Usage
=====

$> sprinkle [options]

Options are:

  -s, --script=PATH                Path to a sprinkle script to run
  -t, --test                       Process but don't perform any actions
  -v, --verbose                    Verbose output
  -c, --cloud                      Show powder cloud, ie. package hierarchy and installation order
  -h, --help                       Show this help message.

where you can name the script to be processed, enable testing mode or verbose output, and/or examine the cloud of packages and operations that will be performed.

Viewing the powder cloud!

Sprinkle calculates all operations to be performed on remote servers upfront, which is nice as it allows you to inspect what modifications will be made to the system before any are actually performed. Let's inspect the powder (ie. package) cloud for the above script:

$> sprinkle -c -t -s rails.rb
--> Cloud hierarchy for policy rails

Policy rails requires package rails
        Package rails requires rubygems
                Package rubygems requires build_essential
                Package rubygems requires ruby
                        Package ruby requires build_essential
                        Package ruby requires ruby_dependencies

Policy rails requires package appserver
Selecting mongrel_cluster for virtual package appserver
        Package mongrel_cluster requires rubygems
                Package rubygems requires build_essential
                Package rubygems requires ruby
                        Package ruby requires build_essential
                        Package ruby requires ruby_dependencies
        Package mongrel_cluster requires mongrel
                Package mongrel requires rubygems
                        Package rubygems requires build_essential
                        Package rubygems requires ruby
                                Package ruby requires build_essential
                                Package ruby requires ruby_dependencies

Policy rails requires package database
Selecting mysql for virtual package database

Policy rails requires package webserver
Selecting apache for virtual package webserver
        Package apache requires build_essential
        Package apache requires apache_dependencies

Policy rails requires package search
Selecting sphinx for virtual package search
        Package sphinx requires build_essential
        Package sphinx requires mysql_dev

--> Normalized installation order for all packages: build_essential, ruby_dependencies, ruby, rubygems, rails, mongrel, mongrel_cluster, mysql, apache_dependencies, apache, mysql_dev, sphinx

-c indicates that Sprinkle should print the powder cloud (ie. the output above)
-t indicates that we're operating in test mode, so we won't actually perform any remote commands
-s identifies the Sprinkle script that should be processed

Above we can see that the policy rails required packages rails, appserver, database, webserver and search.

Note that all of these packages bar rails are actually virtual packages, so Sprinkle has selected an appropriate implementation of each virtual package automatically based on the supplied package definitions. If more than one package provided an implementation of a virtual package, then the user would be given the opportunity to select which one they prefer.
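
As a rough illustration of that selection logic, here's a small Ruby sketch. It isn't taken from Sprinkle's source: the PROVIDES table and resolve method are hypothetical names, the block stands in for prompting the user, and nginx is added only to show the multiple provider case.

```ruby
# Hypothetical registry: concrete package name => virtual package it provides.
PROVIDES = {
  mongrel_cluster: :appserver,
  mysql:           :database,
  apache:          :webserver,
  sphinx:          :search
}

# Returns the concrete package for a name. Concrete names pass through,
# a virtual name with a single provider resolves automatically, and with
# several providers the supplied block chooses (standing in for a user prompt).
def resolve(name, provides = PROVIDES)
  candidates = provides.select { |_package, virtual| virtual == name }.keys
  return name if candidates.empty?
  return candidates.first if candidates.size == 1
  raise ArgumentError, "multiple providers for #{name}, a choice is required" unless block_given?
  yield candidates
end

p resolve(:rails)      # concrete package, returned unchanged
p resolve(:webserver)  # single provider, selected automatically

# With a second (made up) webserver provider, the block makes the choice.
with_nginx = PROVIDES.merge(nginx: :webserver)
p resolve(:webserver, with_nginx) { |choices| choices.last }
```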

Under each package is a textual representation of that package's dependency tree, including all sub-dependencies, etc. Dependencies are packages that need to be installed first before a higher level package can be installed.

You'll notice that several packages have the same dependencies, eg. both rails and mongrel require ruby, which has its own dependencies as well. Sprinkle will install all packages in reverse dependency order so that lower level dependencies are installed before higher level packages, and it will also normalize the final package list to remove duplicates so that packages aren't installed multiple times unnecessarily. This is the final line in the output above which lists the actual packages to be installed and order of installation.
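
The ordering and normalization described above can be sketched as a depth-first walk over the dependency tree that emits each package only once. This is a minimal sketch rather than Sprinkle's real code: the DEPENDENCIES table is transcribed by hand from the cloud output, install_order is a hypothetical name, and cyclic dependencies aren't handled.

```ruby
# Dependencies as listed in the powder cloud output (package => requirements).
DEPENDENCIES = {
  rails:           [:rubygems],
  rubygems:        [:build_essential, :ruby],
  ruby:            [:build_essential, :ruby_dependencies],
  mongrel_cluster: [:rubygems, :mongrel],
  mongrel:         [:rubygems],
  mysql:           [],
  apache:          [:build_essential, :apache_dependencies],
  sphinx:          [:build_essential, :mysql_dev]
}

# Depth-first, post-order walk: a package's dependencies are emitted before
# the package itself, and anything already emitted is skipped.
def install_order(packages, deps = DEPENDENCIES, order = [])
  packages.each do |package|
    next if order.include?(package)
    install_order(deps.fetch(package, []), deps, order)
    order << package
  end
  order
end

# The concrete packages selected for the rails policy, in policy order.
puts install_order([:rails, :mongrel_cluster, :mysql, :apache, :sphinx]).join(', ')
```

Run against the packages selected for the rails policy, this yields the same sequence as the normalized installation order printed by Sprinkle, starting with build_essential and ending with sphinx.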

Provisioning a remote system

To actually provision a remote server we simply remove the testing (and if desired cloud) flags from the command issued above and Sprinkle will process the configuration and provision the remote system. Note for the moment, you'll need to ensure that your SSH keys are appropriately installed on the remote server under a user that has enough privileges to install software (generally the root user):

$> sprinkle -s rails.rb
--> Installing build_essential for roles: app
--> Installing ruby_dependencies for roles: app
--> Installing ruby for roles: app
--> Installing rubygems for roles: app
--> Installing rails for roles: app
--> Installing mongrel for roles: app
--> Installing mongrel_cluster for roles: app
--> Installing mysql for roles: app
--> Installing apache_dependencies for roles: app
--> Installing apache for roles: app
--> Installing mysql_dev for roles: app
--> Installing sphinx for roles: app

(It's also possible to put the #!/usr/bin/env sprinkle -c line at the top of a sprinkle script and make it executable.)

After the command is finished, all of the requested software will have been applied on your target system.

If you'd like to see more action printed as commands are run, specify the --verbose (-v) flag. Internally, Capistrano tasks are dynamically defined and executed at runtime for each package's installation, using a Capistrano configuration file to identify the actual roles and hostnames associated with those roles to communicate with. The verbose option will display Capistrano activity in addition to the usual Sprinkle output.

An extra benefit of leveraging Capistrano is that you can actually provision multiple servers/slices simultaneously and in parallel if desired.

I want!

If you're interested in downloading and experimenting with Sprinkle, you can clone and/or watch the project at GitHub, or download it from GitHub's gem server using:

$> sudo gem install crafterm-sprinkle --source http://gems.github.com/

The official Rubyforge gem server will also be updated over the coming days. If you download the source, you can create a gem package for installation by:

$> rake package
$> sudo gem install -l pkg/sprinkle-0.1.0

There are also specs with a decent amount of coverage over the code base that you can run as well:

$> rake spec

Prerequisites

Installing the Sprinkle gem will also install all prerequisite gems such as activesupport, highline and capistrano. The only other prerequisite is that you have SSH connectivity to the remote system you wish to provision, preferably with SSH keys in place to prevent passwords being asked for.

Finally

Sprinkle is a young project and, while operational, is still in development, with several limitations. Currently, only Ubuntu/Debian has been tested as a target deployment platform; operating system abstraction and other platforms will be tested and supported in the future, along with several new features that are in the pipeline.

I'm most certainly open to ideas, suggestions and thoughts about how Sprinkle can be improved and generally made better for the community, and I really welcome any bug reports, patches and suggestions. Please feel free to contact me with any comments at all.

Special Thanks!

Several people have been really helpful during the development of Sprinkle. In particular I'd like to thank Ben Schwarz and Pete Yandell for their initial feedback and help after my first demos. I'd also really like to thank Matthew and Jared from Slicehost for their help and support as well.

Ruby && DTrace!

posted by crafterm, 18 May 2008

Performance, memory and runtime analysis of software has always been a tricky subject, often requiring special debug versions of code or application specific parameters to determine what's going on. Additionally, developer debugging information can often clutter source code making it harder to see the intent and design of source.

DTrace offers an interesting solution to this problem, by dynamically instrumenting your application at runtime to enable probes to report various pieces of information as to how your application is running. DTrace is a part of Solaris, but is now also available under Mac OS X Leopard. A typical installation of DTrace can offer 20,000 different types of probes (or more depending on what applications are running), from kernel level information, all the way to application specific data.

Applications such as the Ruby interpreter can also define their own domain specific probes. Last year Joyent added support for DTrace probes to MRI (Matz's Ruby Interpreter), allowing developers to analyze runtime behaviour of their Ruby based applications. Starting with this particular commit, compatible DTrace probes for Rubinius have been developed as well.

What can you do with DTrace?

The list of uses for DTrace is endless, as it provides the means to gain answers to many questions about how your application is behaving. Some practical questions DTrace can answer include:

  1. Tracing execution flow through your application as it steps through each method/class

  2. Determining runtime performance analysis, working out what methods are the most expensive (excellent for 80/20 performance analysis)

  3. Heap analysis, determining what objects are consuming the most memory

  4. Garbage collection impact, determining how often the garbage collector is running and impacting your application's performance

  5. and much more...

Getting started with DTrace and Ruby

To get started with Ruby and DTrace, you'll need a compatible operating system, eg. Mac OS X Leopard, and you'll need to get the Ruby source to build MRI with Joyent's DTrace patches. Luckily, if you're using Mac OS X Leopard, Apple has already appropriately patched their bundled version of Ruby to include DTrace patches for you, and no extra compilation is necessary. DTrace itself is also included by default under all Mac OS X Leopard operating system installs.

DTrace primer

There is a wealth of information available on the internet that I'd certainly recommend taking a look at to learn using DTrace in depth (in particular Sun's DTrace admin guides). Essentially to interact with DTrace, you write a script in a language called 'D', which defines what probes you're interested in, and what to do with the data when the probe fires. This script is then read and bytecode compiled by DTrace's command line and user land libraries, and then passed to the DTrace virtual machine running inside of the kernel to be interpreted. Probes are enabled, and appropriately fired, with data being collected according to your scripts for analysis.

Anatomy of a DTrace script

provider : module : function : name
/ predicate /
{
  action
}

Above is a generic breakdown of a DTrace script (somewhat bearing similarity to an awk script). Most parts of the script are optional as you'll see further, such as the module or function name, or even the action.

Probes are grouped by providers, of which there are many (io, pid, objc, profile, to name a few). The meaning of module and function is somewhat dependent on the provider being used. The name parameter identifies the actual name of the probe that is to be fired.

The predicate identifies a clause that must evaluate to true for the probe to fire and allows for conditional firing of probes.

The action contains arbitrary instructions to be performed when the probe fires.

To probe your application you also need root privileges on the machine you'll be running DTrace on, due to DTrace's kernel level interaction.

Usually you'll run DTrace and pass it the name of a D script (or include the script on the command line if it's brief), either with a command to run, or a process ID of an application that's already running that should be attached to. For example:

$> sudo dtrace -s profile.d -c 'ruby script.rb'

or:

$> ps aux|grep ruby
crafterm 29877   0.0  0.0   590472    188 s001  R+    5:04pm   0:00.00 grep ruby
crafterm 29875   0.0  1.2   622564  25016 s001  S     5:04pm   0:00.85 ruby
$> sudo dtrace -s profile.d 29875

Ruby Provider Probes

To see how many probes are available, and those that are Ruby related, we can ask DTrace to print their specifics to the console, eg:

$> sudo dtrace -l | wc -l
31569

indicating I currently have 31569 probes available to query. This number will change depending on what applications are running at the time of running the dtrace command.

Ruby specific probes can be found by:

$> sudo dtrace -l | grep ruby
21427  ruby53816   libruby.1.dylib                          rb_call0 function-entry
21428  ruby53816   libruby.1.dylib                          rb_call0 function-return
21429  ruby53816   libruby.1.dylib                   garbage_collect gc-begin
21430  ruby53816   libruby.1.dylib                   garbage_collect gc-end
21431  ruby53816   libruby.1.dylib                           rb_eval line
21432  ruby53816   libruby.1.dylib                      rb_obj_alloc object-create-done
21433  ruby53816   libruby.1.dylib                      rb_obj_alloc object-create-start
21434  ruby53816   libruby.1.dylib                   garbage_collect object-free
21435  ruby53816   libruby.1.dylib                        rb_longjmp raise
21436  ruby53816   libruby.1.dylib                           rb_eval rescue
21437  ruby53816   libruby.1.dylib                 ruby_dtrace_probe ruby-probe

As you can see from the list in the last column of the output, probes are available for method invocations, runs of the garbage collector, creation and destruction of objects, and exceptions. The last probe actually allows the application writer to fire an arbitrary ruby probe containing application specific data from ruby code.

A full list of probes and arguments supplied to them is available at Joyent's Ruby provider wiki page.

Example 1: Tracing execution flow through your application

Let's start by tracing the execution flow through a small Ruby program.

Simple Ruby Program

class World
  def say(message)
    puts message
  end
end

world = World.new
world.say('hello')

This small Ruby program creates an instance of the World class, sends it the say message with hello as a String parameter which is printed to the console.

execution-flow.d

ruby$target:::function-entry
{
    printf("%s:%s\n", copyinstr(arg0), copyinstr(arg1));
}

ruby$target:::function-return
{
   printf("%s:%s\n", copyinstr(arg0), copyinstr(arg1));
}

This particular script enables the function-entry and function-return probes on the Ruby provider, and prints the first and second arguments passed to each probe using a C-style printf command.

The first and second arguments passed are provider specific, but in the Ruby provider's case for these probes they are always the class/module and method names being executed.

The $target variable enables the probe on the PID of the command specified via the -c parameter to DTrace itself.

Results

$> sudo dtrace -q -F -s execution-flow.d -c "ruby hello.rb"
CPU FUNCTION
1  -> rb_call0                              Class:inherited
1  <- rb_call0                              Class:inherited
1  -> rb_call0                              Module:method_added
1  <- rb_call0                              Module:method_added
1  -> rb_call0                              Class:new
1    -> rb_call0                            Object:initialize
1    <- rb_call0                            Object:initialize
1  <- rb_call0                              Class:new
1  -> rb_call0                              World:say
1    -> rb_call0                            Object:puts
1      -> rb_call0                          IO:write
1      <- rb_call0                          IO:write
1      -> rb_call0                          IO:write
1      <- rb_call0                          IO:write
1    <- rb_call0                            Object:puts
1  <- rb_call0                              World:say

Taking a look at the results we can almost visually 'see' how the script was parsed and executed. First Class was 'inherited', this is part of the creation of our 'World' class, then we defined World#say (invoking the Module:method_added method) to contain some operations. We then created a new instance of our class (Class:new and Object:initialize to create and construct our object), and then invoked World#say which we can see calls Object:puts and IO:write.

(In this particular case, the program is small and the instructions simple, however just add "require 'rubygems'" to the top of the source and re-run the DTrace script again and you'll quickly be overwhelmed with too much information - writing effective DTrace scripts is an art, but it's well worth learning to ensure you get the answers you're looking for)

Example 2: Individual Method Performance

Let's use DTrace to take another look at our application from a different perspective and see what methods are most expensive. To do this we'll use the function entry and return probes to capture a time stamp interval for each method call.

We'll also use an aggregate DTrace variable to store a running average of how long each method takes so that multiple method calls are recorded together and averaged across the count of method invocations, and we'll print the results according to most expensive method execution time.

Method Performance DTrace script

timestamps.d

ruby$target:::function-entry
{
  self->start = timestamp;
}

ruby$target:::function-return
/self->start/
{
  @[copyinstr(arg0), copyinstr(arg1)] = avg(timestamp - self->start);
  self->start = 0;
}

This script introduces a few more DTrace constructs: thread-local variables, associative arrays and aggregate functions.

We enable two probes, function entry and function return on our Ruby program. When any method is entered, we capture a timestamp and store it in the thread-local 'start' variable. When any method is exited, we gather another timestamp, subtract the entry timestamp from it, and pass the difference to the avg DTrace aggregate function to be averaged.

The average execution time is then stored in an associative array, indexed by the module/class and method name (arg0 and arg1 respectively). Finally, we reset 'start' to zero once it's no longer required.

A predicate is also set on the function return probe to fire only if we have a 'start' timestamp value, which prevents us from seeing any errors or miscalculations if we attach our DTrace script to an application that is already running (since after attaching, a return probe could fire for which we have no start timestamp).
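
For readers more at home in Ruby than D, the same bookkeeping can be sketched in plain Ruby. This is only an analogy for what the script does, not how DTrace works internally: the EVENTS list holds made up probe firings (including a return with no matching entry, as if we'd attached mid-call), and average_durations is a hypothetical name.

```ruby
# Made up probe events for illustration: [probe, class, method, timestamp in ns].
# The first return has no matching entry, as if we attached mid-call.
EVENTS = [
  [:return, 'IO', 'write', 1_000],
  [:entry,  'IO', 'write', 2_000],
  [:return, 'IO', 'write', 2_300],
  [:entry,  'IO', 'write', 5_000],
  [:return, 'IO', 'write', 5_500]
]

def average_durations(events)
  start  = nil                                   # plays the role of self->start
  totals = Hash.new { |hash, key| hash[key] = { sum: 0, count: 0 } }

  events.each do |probe, klass, meth, timestamp|
    case probe
    when :entry
      start = timestamp
    when :return
      next unless start                          # the /self->start/ predicate
      entry = totals[[klass, meth]]              # keyed like @[arg0, arg1]
      entry[:sum]   += timestamp - start
      entry[:count] += 1
      start = nil                                # like self->start = 0
    end
  end

  # Integer average per [class, method] key
  totals.transform_values { |t| t[:sum] / t[:count] }
end

p average_durations(EVENTS)
```

The orphaned return is skipped by the guard, and the two complete calls (300 and 500 nanoseconds) average to 400. Note that, as in the D script, there's only a single 'start' slot, so a nested call overwrites the outer method's timestamp and only the innermost calls end up being measured.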

Results

$> sudo dtrace -s timestamps.d -c "ruby hello.rb"
Object                   initialize                     9185
Module                   method_added                   10021
Class                    inherited                      25323
IO                       write                          98956

Unless you supply a custom output format (eg. by using the C-style printf DTrace function as in the execution flow example above), DTrace automatically prints the data it has collected, which in this case is the associative array, indexed by class/module and method name.

From these results we can see that IO#write was the slowest of the methods called, taking an average of 98956 nanoseconds to run, most likely due to its interaction with the rest of the system and IO nature.

Example 3: Quantized Method Performance

Average values can often be affected by a few large values during program startup, so let's take a closer look and see what values constitute the averages calculated above. To do this we'll use the DTrace quantize aggregate function, which will provide us with a distribution breakdown of each individual component within the average. We'll also specifically target the IO#write method, and update our Ruby program to print 'hello' 10 times to collect some more data over a period of invocations.
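
The updated program isn't listed here; a minimal version, assuming it simply wraps the original call in a loop, might look like:

```ruby
class World
  def say(message)
    puts message
  end
end

world = World.new
10.times { world.say('hello') }  # ten invocations instead of one
```

Since each puts resulted in two IO:write calls in the execution flow trace from Example 1, ten iterations should produce around twenty IO#write invocations.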

timestamps-q.d

ruby$target:::function-entry
/copyinstr(arg0) == "IO" && copyinstr(arg1) == "write"/
{
  self->start = timestamp;
}

ruby$target:::function-return
/copyinstr(arg0) == "IO" && copyinstr(arg1) == "write" && self->start/
{
  @[copyinstr(arg0), copyinstr(arg1)] = quantize(timestamp - self->start);
  self->start = 0;
}

Results

$> sudo dtrace -s timestamps-q.d -c "ruby hello.rb"

  IO                                                  write
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@                     10
           16384 |@@@@@@@@@@@@@@@@                         8
           32768 |@@                                       1
           65536 |                                         0
          131072 |@@                                       1
          262144 |                                         0

Here we see the distribution of how long each invocation of IO#write took. One invocation was particularly long, taking over 131k nanoseconds, with most landing in the 8k and 16k buckets (ie. between roughly 8k and 32k nanoseconds).
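
The row labels in a quantize distribution are powers of two, with each row counting the values from its label up to (but not including) the next label. As a rough illustration of that bucketing only, and not of how DTrace implements it, here's a small Ruby sketch. The quantize_bucket helper and the sample durations are made up.

```ruby
# Power-of-two bucket for a positive integer: the largest 2**n that is <= value,
# mirroring how the rows in the distribution above are labelled.
def quantize_bucket(value)
  1 << (value.bit_length - 1)
end

# Made up durations in nanoseconds, for illustration only.
durations = [9_000, 12_500, 15_999, 16_384, 20_000, 140_000]

# Count how many durations fall into each bucket.
histogram = Hash.new(0)
durations.each { |duration| histogram[quantize_bucket(duration)] += 1 }

histogram.sort.each do |bucket, count|
  puts format('%8d | %s %d', bucket, '@' * count, count)
end
```

With these sample values, three durations land in the 8192 bucket, two in the 16384 bucket and one in the 131072 bucket, which is why a value of 15,999 and a value of 9,000 are reported on the same row.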

From here we could step further into the runtime and determine which particular IO#write call was slower than the others by inspecting the user stack inside the virtual machine, and/or by enabling probes in lower level providers.

Example 4: Memory allocation

To profile memory allocation we need to enable the object-create-start, object-create-done, and object-free probes. Creation of objects is separated into two probes to allow you to determine exactly how long it takes to construct an object.

First, let's create a simple balance script to check that objects are being allocated and deallocated correctly within the Ruby runtime. The script we'll use will create an associative array and index a counter by object type. Each time an object is created we'll increment the counter; conversely, each time an object is freed we'll decrement the counter.

object-balance.d

ruby$target:::object-create-done { @[copyinstr(arg0)] = sum(1); }
ruby$target:::object-free        { @[copyinstr(arg0)] = sum(-1); }

Results

$> sudo dtrace -s object-balance.d -c "ruby hello.rb"

  NoMemoryError                                                     1
  SystemStackError                                                  1
  ThreadGroup                                                       1
  World                                                             1
  fatal                                                             1
  Object                                                            3

Glancing over the results we can see that Ruby is creating an instance of several error classes and a thread group during the run of our application (perhaps during startup), we can also see a single instance of our World class, and three other Objects that have also been allocated.

Our particular hello.rb script is quite small, and probably finishes executing before the garbage collector has had a chance to reclaim any unused objects. If you run this script over a large application though, you'll see a line for each Object type in the application, and essentially a reference count of how many have been created and freed.

In an ideal application, after the garbage collector has finished, all object types (except those required to keep the application running) will be listed with the value '0' alongside them, indicating a corresponding deallocation for each allocation.

Example 5: Inspecting memory allocation points

The object-create-start/done probes also provide the source file and line number where the allocation was made in the Ruby script. For example, if our World class came from another developer's library and we wanted to find out where it was allocated, we could use the following script:

object-world-location.d

ruby$target:::object-create-start
/copyinstr(arg0) == "World"/
{
  printf("%s was allocated in file %s, line number %d\n", copyinstr(arg0), copyinstr(arg1), arg2);
}

Results

$> sudo dtrace -s object-world-location.d -c "ruby hello.rb"

  World was allocated in file hello.rb, line number 7

which matches our source file.
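On newer MRI versions (2.1 and later) a similar question can also be asked from inside the process using the objspace standard library. This is a different mechanism from the DTrace probes, shown only for comparison. The World class here is a stand-in for the one in hello.rb, and allocation_site is a hypothetical helper.

```ruby
require 'objspace'

class World; end # stand-in for the class defined in hello.rb

# Runs the block with allocation tracing enabled and reports the file and
# line where the object returned by the block was allocated.
def allocation_site
  site = nil
  ObjectSpace.trace_object_allocations do
    object = yield
    site = [ObjectSpace.allocation_sourcefile(object),
            ObjectSpace.allocation_sourceline(object)]
  end
  site
end

file, line = allocation_site { World.new }
puts "World was allocated in file #{file}, line number #{line}"
```

Unlike the D script, this requires changing the program under inspection, so it suits development more than investigating a running system.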

Example 6: Inspecting stack traces

We also saw above that several other 'Object's are created within the C portion of the Ruby interpreter upon startup. We can also inspect where these objects were created by saving the user C stack inside the virtual machine at the point of allocation.

object-user-stack.d

ruby$target:::object-create-start
/copyinstr(arg0) == "Object"/
{
  @[ustack(4)] = count();
}

This particular script uses the DTrace ustack function to capture the userland stack of the Ruby interpreter at the point in time when the probe fired, to a depth of 4 stack frames.

We then use the stack as an index into an associative array and store the number of times an Object type was created at the same point in the interpreter. This example really shows how flexible associative arrays can be in DTrace, by using a full stack trace as an index.

Results

$> sudo dtrace -s object-user-stack.d -c "ruby hello.rb"

          libruby.1.dylib`rb_obj_alloc+0x90
          libruby.1.dylib`Init_Object+0x130b
          libruby.1.dylib`rb_call_inits+0x15
          libruby.1.dylib`ruby_init+0x14f
            1

          libruby.1.dylib`rb_obj_alloc+0x90
          libruby.1.dylib`Init_IO+0x1059
          libruby.1.dylib`rb_call_inits+0x6a
          libruby.1.dylib`ruby_init+0x14f
            1

          libruby.1.dylib`rb_obj_alloc+0x90
          libruby.1.dylib`Init_Hash+0x903
          libruby.1.dylib`rb_call_inits+0x51
          libruby.1.dylib`ruby_init+0x14f
            1

Here we can see three separate stack traces, showing an Object being allocated from each of Init_Object, Init_IO and Init_Hash as part of the ruby_init function inside the Ruby interpreter.
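The idea of indexing a counter by a whole stack trace carries over to Ruby itself, where an Array of frames is a perfectly good Hash key. The following is a small sketch of that idea at the Ruby level (not the C level that ustack sees); the method names are made up for the example.

```ruby
# Count how often each call path reaches a given point, using the call
# stack itself as the hash key (as the D script does with ustack).
def record_call_path(counts, depth = 4)
  # caller(1, depth) skips this method's own frame and keeps `depth` frames.
  counts[caller(1, depth)] += 1
end

def build_widget(counts)
  record_call_path(counts)
  Object.new
end

def build_gadget(counts)
  record_call_path(counts)
  Object.new
end

counts = Hash.new(0)
3.times { build_widget(counts) }
build_gadget(counts)

counts.each do |stack, count|
  puts stack
  puts "  #{count}"
  puts
end
```

Calls that arrive via the same path share one key, so the three build_widget calls collapse into a single entry while build_gadget gets its own.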

Summary

DTrace is a very powerful framework, allowing you to really hypothesise and ask arbitrary questions about the behaviour of your system and applications. Often, the answer to one question will lead to another, and this is very much in the spirit of DTrace. The ability to script questions and format results allows you to slice behavioural data from any perspective and depth, from your application all the way to the operating system kernel.

DTrace scripts are the key to reducing complexity and understanding the true behaviour of your application at runtime, and I certainly recommend learning as much about the D script format and language as you can. Fine tuning your script to return the exact data you're after can be an art, but it's well worth learning, so that your results aren't cluttered with information that obscures what you are searching for.

Collections of commonly used DTrace scripts are available as part of the DTraceToolkit, in particular several very useful and high quality Ruby DTrace scripts. I'd recommend taking a look at them to see how probes can be used in combination with each other, also in multi-threaded environments.

In future articles I'll step further into using DTrace via Instruments, and also look at instrumenting your Rails or Merb application to collect runtime data about the performance of your web applications.

Cool Effects with Core Image!

posted by crafterm, 31 Dec 2007

In a previous article I described how to use Core Image as the backend image processor for Attachment Fu in your Rails applications. In that particular article we looked at supporting image scaling and thumbnails to be compatible with the other Attachment Fu backends such as RMagick and ImageScience.

With Core Image available however, we have an entire range of post processing filters available to use at our fingertips. In this article we’ll step through a few of these additional filter options that you can use to post process your images with.

Here are a few examples of what we can do with Core Image after a file upload. All of them use the following input image, taken in Berlin while at RailsConf EU (also used in the performance measurements article):

Greyscale or Sepia

A scaled version of the source image is cool, but how about an automatic greyscale or sepia version of the image:

Greyscale

Sepia

Code Fragment

module RedArtisan
  module CoreImage
    module Filters
      module Color

        def greyscale(color = nil, intensity = 1.00)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          color = OSX::CIColor.colorWithString("1.0 1.0 1.0 1.0") unless color

          @original.color_monochrome :inputColor => color, :inputIntensity => intensity do |greyscale|
            @target = greyscale
          end
        end

        def sepia(intensity = 1.00)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          @original.sepia_tone :inputIntensity => intensity do |sepia|
            @target = sepia
          end
        end

      end
    end
  end
end

Exposure and Noise Control

Another option for us is to automatically adjust exposure and noise parameters upon upload to brighten images up, or remove unwanted noise from lower quality images:

1 F-Stop

2 F-Stops

Noise Removal

Code Fragment

module RedArtisan
  module CoreImage
    module Filters
      module Quality

        def reduce_noise(level = 0.02, sharpness = 0.4)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          @original.noise_reduction :inputNoiseLevel => level, :inputSharpness => sharpness do |noise_reduced|
            @target = noise_reduced
          end
        end

        def adjust_exposure(input_ev = 0.5)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          @original.exposure_adjust :inputEV => input_ev do |adjusted|
            @target = adjusted
          end          
        end

      end
    end
  end
end
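For a sense of scale, Apple documents the underlying exposure filter (CIExposureAdjust) as multiplying colour values by 2 raised to the EV value, so each full stop doubles the brightness. A quick sketch of that arithmetic follows; exposure_gain is a hypothetical helper and not part of the processor.

```ruby
# Linear gain applied to each colour component for a given EV adjustment,
# following the documented CIExposureAdjust formula: rgb * 2**ev.
def exposure_gain(input_ev)
  2.0**input_ev
end

[0.5, 1.0, 2.0].each do |ev|
  puts format("%.1f stop(s) -> x%.3f", ev, exposure_gain(ev))
end
```

So the default of 0.5 in adjust_exposure brightens by roughly 1.4 times, while 2.0 quadruples the component values before clamping.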

Watermarking

Sometimes we’d like to automatically add a watermark to our images, either with a single watermark image, or as a tiled watermark image:

Single Watermark

Tiled Watermark

Code Fragment

module RedArtisan
  module CoreImage
    module Filters
      module Watermark

        def watermark(watermark_image, tile = false, strength = 0.1)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          if watermark_image.respond_to? :to_str
            watermark_image = OSX::CIImage.from(watermark_image.to_str)
          end

          if tile
            tile_transform = OSX::NSAffineTransform.transform
            tile_transform.scaleXBy_yBy 1.0, 1.0

            watermark_image.affine_tile :inputTransform => tile_transform do |tiled|
              tiled.crop :inputRectangle => vector(0, 0, @original.extent.size.width, @original.extent.size.height) do |tiled_watermark|
                watermark_image = tiled_watermark
              end
            end
          end

          @original.dissolve_transition :inputTargetImage => watermark_image, :inputTime => strength do |watermarked|
            @target = watermarked
          end
        end

      end
    end
  end
end
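The strength parameter is passed to the dissolve transition as its time value. Assuming the dissolve behaves as a plain linear cross-fade between the two images (an assumption on my part, not something the code above states), the per-component effect can be sketched like this, with dissolve as a hypothetical helper:

```ruby
# Per-component cross-fade, assuming the dissolve is a linear blend:
# time = 0.0 gives the original, time = 1.0 gives the watermark.
def dissolve(original, watermark, time)
  original * (1.0 - time) + watermark * time
end

# A mid-grey pixel (0.5) under a white watermark pixel (1.0).
[0.1, 0.5, 1.0].each do |strength|
  puts format("strength %.1f -> %.2f", strength, dissolve(0.5, 1.0, strength))
end
```

Under that assumption the default strength of 0.1 keeps 90% of the original image, which is why the watermark reads as a faint overlay.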

Funky Effects

We can also use cool and funky effects like those in applications such as Photo Booth. Here’s an example using the edge colouring algorithm:

Edges

Code Fragment

module RedArtisan
  module CoreImage
    module Filters
      module Effects

        def edges(intensity = 1.00)
          create_core_image_context(@original.extent.size.width, @original.extent.size.height)

          @original.edges :inputIntensity => intensity do |edged|
            @target = edged
          end
        end

      end
    end
  end
end

The sign artwork works particularly well with this algorithm.

Core Image Processor

All of the code above is also available as a usable image processor via git.

Some examples of using the processor:

require 'red_artisan/core_image/processor'

# generate some test output images for various effects

processor = RedArtisan::CoreImage::Processor.new('berlin.jpg')

grey = processor.greyscale
grey.save 'results/berlin-grey.jpg'

sepia = processor.sepia
sepia.save 'results/berlin-sepia.jpg'

watermarked = processor.watermark('watermark_image.png')
watermarked.save 'results/berlin-watermarked.jpg'

watermarked = processor.watermark('watermark_image.png', true)
watermarked.save 'results/berlin-watermarked-tiled.jpg'

noise_reduced = processor.reduce_noise
noise_reduced.save 'results/berlin-noise-reduced.jpg'

exposure_adjusted = processor.adjust_exposure
exposure_adjusted.save 'results/berlin-exposure-adjusted-half-stop.jpg'

exposure_adjusted = processor.adjust_exposure(2.0)
exposure_adjusted.save 'results/berlin-exposure-adjusted-two-stops.jpg'

edge = processor.edges
edge.save 'results/berlin-edge.jpg'

Summary

The above shows us only a fraction of what can be done with the 100+ filters Core Image provides by default. There are many other filters that let you create all sorts of effects with single and multiple images combined. Enjoy!

Getting Started with Rubinius II: Coding!

posted by crafterm, 11 Oct 2007

In a previous article we examined the process of checking out Rubinius, building it from source and discussed its directory structure. In this article, we’ll take it one step further and examine the process of implementing an example method that can be contributed back to the project as a patch for inclusion in the official Rubinius source base.

If you haven’t checked out or built Rubinius please see my previous post which details the preliminary steps required before we can start implementing.

The feature we’ll implement is the File.link method. The implementation is quite simple, only a few lines of code, but it will take us through the process of adding a method to an existing class with an existing spec, and will also take us into the system call layer where we’ll interact with the underlying operating system to create a hard link.

In this case it’s not required, but it’s generally a good idea to run the dev:setup rake task before implementation to ensure that we have pristine copies of our runtime archives available. We do this because the compiler itself requires working runtime archives, and if we introduce a defect it’s possible to end up in a situation where we cannot compile a fix.

dev:setup essentially makes a backup of the runtime archives that will always be used for compilation. In our particular case the compiler doesn’t create any hard links, so this step is optional, but it’s a good idea if you’re working on existing code or low level methods such as File.stat, Hash, Array, etc.

Normally when using git we would create a feature branch, implement our specs and changes on that branch, commit it locally and then rebase the source code off the master branch before pushing it to the main repository (this is how Rubinius committers integrate their work into the main line development). In this article we’ll omit these stages as they’re well documented on the Rubinius project pages, and here we want to focus on the changes to be made to Rubinius itself.

Specification

Back to our new feature - a spec already exists for File.link and it’s in the spec/core/file/link_spec.rb file:

require File.dirname(__FILE__) + '/../../spec_helper'

describe "File.link" do
  before do 
    @file = "test.txt"
    @link = "test.lnk"     
    File.delete(@link) if File.exists?(@link)
    File.delete(@file) if File.exists?(@file)
    File.open(@file, "w+")
  end

  platform :not, :mswin do
    it "link a file with another" do
      File.link(@file, @link).should == 0
      File.exists?(@link).should == true
      File.identical?(@file, @link).should == true
    end

    it "raise an exception if the target already exists" do
      File.link(@file, @link)
      should_raise(Errno::EEXIST) { File.link(@file, @link) }
    end

    it "raise an exception if the arguments are of the wrong type or are of the incorrect number" do
      should_raise(ArgumentError) { File.link }
      should_raise(ArgumentError) { File.link(@file) }
    end
  end

  after do
    File.delete(@link)
    File.delete(@file)
  end
end

The core specification suite is laid out in the spec/core directory using the convention of having a spec file per method on each class, containing all behaviour for that corresponding method. Platform and bootstrap specs are in the spec/platform and spec/bootstrap directories respectively.

Examining the specification above, there are three tests that run on all non-mswin platforms (ie. those supporting the creation of hard links). The tests ensure that when called, File.link creates a hard link between the source and target, or raises an exception either if the target already exists or if it’s given incorrect arguments.

This identifies what we need to implement.
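To see the target behaviour in action, the same three expectations can be checked against an existing Ruby such as MRI. The script below is a standalone sketch rather than part of the Rubinius spec suite; observe_file_link is a hypothetical helper, and it works in a temporary directory so nothing is left behind.

```ruby
require 'tmpdir'

# Exercises the three behaviours from link_spec.rb against the Ruby that
# runs this script, and returns what was observed for each.
def observe_file_link
  Dir.mktmpdir do |dir|
    file = File.join(dir, "test.txt")
    link = File.join(dir, "test.lnk")
    File.write(file, "")

    observed = {}
    observed[:return_value] = File.link(file, link)
    observed[:identical]    = File.identical?(file, link)

    # Linking onto an existing target should raise Errno::EEXIST.
    observed[:second_link] = begin
      File.link(file, link)
      :nothing_raised
    rescue Errno::EEXIST
      :eexist
    end

    # Calling with too few arguments should raise ArgumentError.
    observed[:no_arguments] = begin
      File.link
      :nothing_raised
    rescue ArgumentError
      :argument_error
    end

    observed
  end
end

p observe_file_link
```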

Let’s run the spec to see what’s failing:

$> bin/mspec -f s spec/core/file/link_spec.rb

File.link
- link a file with another  (ERROR - 1)
- raise an exception if the target already exists (ERROR - 2)
- raise an exception if the arguments are of the wrong type or are of the incorrect number (ERROR - 3)


1)
File.link link a file with another  FAILED
No method link on an instance of Class.: 
    Object(Class)#link (method_missing) at kernel/core/object.rb:98
                        main.__script__ at spec/core/file/link_spec.rb:14
                              Proc#call at kernel/core/context.rb:262
                          SpecRunner#it at spec/mini_rspec.rb:337
                                main.it at spec/mini_rspec.rb:369
                        main.__script__ at spec/core/file/link_spec.rb:24
                          main.platform at ./spec/core/file/../../spec_helper.rb:96
                        main.__script__ at spec/core/file/link_spec.rb:30
                              Proc#call at kernel/core/context.rb:262
                    SpecRunner#describe at spec/mini_rspec.rb:347
                          main.describe at spec/mini_rspec.rb:365
                        main.__script__ at spec/core/file/link_spec.rb:3
                              main.load at kernel/core/compile.rb:78
                   main.__eval_script__ at (eval):8
                             Array#each at kernel/core/array.rb:526
                  Integer(Fixnum)#times at kernel/core/integer.rb:19
                             Array#each at kernel/core/array.rb:526
                   main.__eval_script__ at (eval):5
                CompiledMethod#activate at kernel/core/compiled_method.rb:110
                        Compile.execute at kernel/core/compile.rb:34
                        main.__script__ at kernel/loader.rb:170
..snip..
$>

In the stack trace, the message:

No method link on an instance of Class.

indicates that File.link doesn’t even exist inside the current implementation of File.

Design

The corresponding source file to implement File.link is in kernel/core/file.rb:

# depends on: io.rb

class File < IO
  ..snip..

  def self.new(path, mode)
    return open_with_mode(path, mode)
  end

  def self.open(path, mode="r")
    raise Errno::ENOENT if mode == "r" and not exists?(path)

    f = open_with_mode(path, mode)
    return f unless block_given?

    begin
      yield f
    ensure
      f.close unless f.closed?
    end
  end

  def self.exist?(path)
    out = Stat.stat(path, true)
    if out.kind_of? Stat
      return true
    else
      return false
    end
  end

  def self.file?(path)
    st = Stat.stat(path, true)
    return false unless st.kind_of? Stat
    st.kind == :file
  end

  ..snip..
end

Here we see methods implementing various parts of the File API. The above methods show the implementation of File.new, File.open, File.exist? and File.file? (to compare MRI’s implementation of the above methods check the file.c source file in the Ruby tar.gz source archive).

Let’s look at a first implementation of File.link. The primary behaviour of File.link is to create a hard link between two filenames. To do this we need to invoke the link(2) system call on the underlying operating system to create the link.

A quick examination of the link(2) man page yields:

$> man 2 link

LINK(2)             BSD System Calls Manual            LINK(2)

NAME
     link - make a hard file link

SYNOPSIS
     #include <unistd.h>

     int
     link(const char *name1, const char *name2);

DESCRIPTION
     The link() function call atomically creates the specified directory entry
     (hard link) name2 with the attributes of the underlying object pointed at
     by name1 If the link is successful: the link count of the underlying
     object is incremented; name1 and name2 share equal access and rights to
     the underlying object.

     ..snip..

RETURN VALUES
     Upon successful completion, a value of 0 is returned.  Otherwise, a value
     of -1 is returned and errno is set to indicate the error.

     ..snip..

STANDARDS
     The link() function is expected to conform to IEEE Std 1003.1-1988
     (POSIX.1).

According to the man page, link(2) accepts the source and target of the hard link as parameters, and returns an integer indicating success or failure.
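Since hard links and symbolic links are easy to confuse, here is a short sketch (runnable on MRI, on a filesystem that supports both) contrasting the two. The file names and the compare_links helper are only for illustration.

```ruby
require 'tmpdir'

# Creates one hard link and one symbolic link to the same file and reports
# how the operating system sees them.
def compare_links
  Dir.mktmpdir do |dir|
    file = File.join(dir, "test.txt")
    hard = File.join(dir, "hard.lnk")
    soft = File.join(dir, "soft.lnk")
    File.write(file, "hello")

    File.link(file, hard)    # backed by link(2)
    File.symlink(file, soft) # backed by symlink(2)

    {
      link_count:      File.stat(file).nlink,
      same_inode:      File.stat(file).ino == File.stat(hard).ino,
      hard_is_symlink: File.symlink?(hard),
      soft_is_symlink: File.symlink?(soft)
    }
  end
end

p compare_links
```

The incremented link count and shared inode correspond to the man page's description of name1 and name2 sharing the same underlying object.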

FFI

To invoke link(2) we need to add a new method to the ffi layer inside of Rubinius. ffi stands for ‘foreign function interface’, and it’s a really neat way of being able to interact with system calls on the underlying operating system without needing to write a lot of stub or native integration code.

ffi bindings are compiled into the platform.rba archive, and since link(2) conforms to a POSIX standard the file we need to modify is kernel/platform/posix.rb.

Opening kernel/platform/posix.rb we’ll see blocks of code such as the following inside the Platform::POSIX module:

# file system
attach_function nil, 'access', [:string, :int], :int
attach_function nil, 'chmod',  [:string, :int], :int
attach_function nil, 'fchmod', [:int, :int], :int
attach_function nil, 'unlink', [:string], :int
attach_function nil, 'getcwd', [:string, :int], :string
attach_function nil, 'umask', [:int], :int

This code dynamically attaches methods to the module, and specifies the parameter types and return values of each method.

The general format of the ‘attach_function’ method is as follows:

attach_function library, method name, [ parameters ], return value

  • library, library name to load dynamically, nil otherwise
  • name, name of the method to attach, this is also the name the method will be available as inside the module
  • parameters, array of symbols identifying the types this method accepts as parameters
  • return value, type of the return value

(attach_function can also accept several other formats of parameters, please take a closer look at kernel/platform/ffi.rb for more details)

Symbols are defined for most primitive types, ie: :short, :int, :long, :string, :char, etc, which can be used in the parameter list and return value specifier.

Following the examples above, link(2) can be attached to the Platform::POSIX module with one line of code:

attach_function nil, 'link', [:string, :string], :int

After adding this line of code to the Platform::POSIX module, we need to update the platform.rba archive to ensure it now includes knowledge of the link(2) system call.

$> rake build:platform
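As a point of comparison outside Rubinius, MRI's Fiddle library can bind the same system call with a similar declaration of parameter and return types. This sketch is not Rubinius code: it assumes Fiddle is available and that link can be resolved from the libraries already loaded into the process, and POSIX_LINK and try_posix_link are names made up for the example.

```ruby
require 'fiddle'
require 'tmpdir'

# Bind link(2) from the C library already loaded into the process.
POSIX_LINK = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['link'],
  [Fiddle::TYPE_VOIDP, Fiddle::TYPE_VOIDP], # const char *name1, *name2
  Fiddle::TYPE_INT                          # int return value
)

# Calls the raw binding twice on the same pair of names and returns both
# results, so the success and failure values can be compared.
def try_posix_link
  Dir.mktmpdir do |dir|
    from = File.join(dir, "test.txt")
    to   = File.join(dir, "test.lnk")
    File.write(from, "")

    first  = POSIX_LINK.call(from, to) # expected to succeed
    second = POSIX_LINK.call(from, to) # target now exists
    [first, second]
  end
end

p try_posix_link
```

Note that the raw call reports failure through its integer return value, exactly as the man page describes, rather than by raising a Ruby exception.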

Implementation

Now that we have access to the link(2) system call, we can invoke it via ffi from the file module.

Open up kernel/core/file.rb, and in between two existing methods, enter the following code:

def self.link(from, to)
  Platform::POSIX.link(from, to)
end

As with the platform archive, we’ll need to update the core archive:

$> rake build:core

Let’s re-run our specifications to see if it passes:

$> bin/mspec -f s spec/core/file/link_spec.rb

File.link
- link a file with another 
- raise an exception if the target already exists (ERROR - 1)
- raise an exception if the arguments are of the wrong type or are of the incorrect number


1)
File.link raise an exception if the target already exists FAILED
Expected EEXIST, nothing raised: 
          main.should_raise at ./spec/core/file/../../mspec_helper.rb:27
            main.__script__ at spec/core/file/link_spec.rb:21
                  Proc#call at kernel/core/context.rb:262
              SpecRunner#it at spec/mini_rspec.rb:337
                    main.it at spec/mini_rspec.rb:369
            main.__script__ at spec/core/file/link_spec.rb:24
              main.platform at ./spec/core/file/../../spec_helper.rb:96
            main.__script__ at spec/core/file/link_spec.rb:30
                  Proc#call at kernel/core/context.rb:262
        SpecRunner#describe at spec/mini_rspec.rb:347
              main.describe at spec/mini_rspec.rb:365
            main.__script__ at spec/core/file/link_spec.rb:3
                  main.load at kernel/core/compile.rb:78
       main.__eval_script__ at (eval):8
                 Array#each at kernel/core/array.rb:526
      Integer(Fixnum)#times at kernel/core/integer.rb:19
                 Array#each at kernel/core/array.rb:526
       main.__eval_script__ at (eval):5
    CompiledMethod#activate at kernel/core/compiled_method.rb:110
            Compile.execute at kernel/core/compile.rb:34
            main.__script__ at kernel/loader.rb:170

3 examples, 1 failures
$>

We’re in better shape: two specs are now passing, including the link test, so we’re successfully creating a hard link between two filenames. One spec is still failing in the area of handling error conditions, in particular when the target filename already exists.

Let’s update our File.link implementation appropriately:

def self.link(from, to)
  raise Errno::EEXIST if exists?(to)
  Platform::POSIX.link(from, to)
end

and naturally, rebuild the core:

$> rake build:core

and re-run our specification:

$> bin/mspec -f s spec/core/file/link_spec.rb

File.link
- link a file with another 
- raise an exception if the target already exists
- raise an exception if the arguments are of the wrong type or are of the incorrect number


3 examples, 0 failures

Hooray, all link specifications passed.

If the specs for File.link are complete (ie. document all areas of File.link’s behaviour), we are ready to submit a patch back to the Rubinius community. Alternatively, if some behaviour is lacking from the specs, we could now iterate through the above process adding a spec to document additional behaviour, and implement it following TDD/BDD practices until all expected behaviour has been added.
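One refinement worth considering in such an iteration: checking exists? and then calling link(2) are two separate steps, so another process could create the target in between, and other failures (a missing source file, for instance) still go unreported. An alternative is to inspect the return value and translate errno into the matching exception. How the Rubinius ffi layer exposes errno is not covered here, so the sketch below takes the system call as a callable returning [result, errno]; that shape, and both helper names, are assumptions for illustration only.

```ruby
# Raises the Errno exception matching `errno` when a system call reports
# failure (-1); otherwise returns the result unchanged.
def raise_on_failure(result, errno, detail = nil)
  return result unless result == -1
  raise SystemCallError.new(detail, errno)
end

# `syscall` is any callable returning [result, errno]. It stands in for the
# real binding, which this sketch does not attempt to reproduce.
def checked_link(from, to, syscall)
  result, errno = syscall.call(from, to)
  raise_on_failure(result, errno, "(#{from}, #{to})")
end

# Two stand-in system calls: one that succeeds, one that reports EEXIST.
succeeds      = ->(_from, _to) { [0, 0] }
target_exists = ->(_from, _to) { [-1, Errno::EEXIST::Errno] }

puts checked_link("test.txt", "test.lnk", succeeds)
begin
  checked_link("test.txt", "test.lnk", target_exists)
rescue Errno::EEXIST => e
  puts "raised #{e.class}"
end
```

SystemCallError.new picks the Errno subclass that corresponds to the numeric code, so the same helper covers ENOENT, EACCES and the rest without extra branches.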

Patch

To create a patch we can use git and issue the command:

$> git diff > file_link.diff

This will create a patch for us containing the changes we made across the entire Rubinius project. We can then send this back to the community for inclusion into the official Rubinius repository, by submitting it in a Rubinius Lighthouse ticket.

Summary

We’ve stepped through the process of implementing a feature in Rubinius by examining the behaviour of a particular method via its corresponding specification tests. As part of the implementation we’ve added a binding to an underlying operating system call via the ffi layer in Rubinius, and then called upon that binding in the class where the functionality is expected.

We then ensured that all required behaviour, including error conditions, has been met by making sure the spec test suite passes. Finally we’ve created a patch using git that we can submit back to the Rubinius project via Lighthouse.

Implementing a feature in Rubinius can certainly be as straightforward and as easy as what we’ve seen above. There are many specifications that have been written that don’t have corresponding implementations, so pick a class, check its specs, write an implementation and join in on building a fantastic, extendable and awesome Ruby virtual machine! :)

Getting started with Rubinius

posted by crafterm, 05 Oct 2007

Rubinius is an alternate implementation of the Ruby virtual machine, loosely based on the architecture and implementation of Smalltalk-80.

The primary difference between Rubinius and MRI (aka Matz Ruby) is that it’s modeled around the design of a small, light and fast C kernel, with the surrounding language, libraries and classes implemented in the target language, Ruby. MRI on the other hand, is a larger body of C code.

Rubinius also compiles Ruby classes into byte code before execution and also includes an RSpec test suite that (when complete) documents the Ruby language, core library and Rubinius compiler.

“what can be written in Ruby, will be”

The focus on using Ruby where possible opens the implementation up to a much wider audience of contributors, and I certainly encourage you to take a look and implement a few core library methods or write some RSpec tests. The barrier of entry is quite low, some methods can even be implemented with a single line of code.

The Rubinius team have published several point releases in the past few months, however the latest and greatest version of Rubinius can be retrieved by checking out the project from source code control.

Recently, Rubinius switched source code control from using Subversion to Git. In this article we’ll step through the process of checking out Rubinius, building it, and examining the project’s layout. In a future article we’ll look at implementing a simple method to step through the process of building a patch that can be submitted back to the project.

Checking out Rubinius

Since Rubinius is managed by Git, you’ll need to install it for your platform first. The Git home page is http://git.or.cz/, which has the Git source, and also hosts binary packages for several platforms. I personally used Fink to install Git (macports also has a package for it, as do many popular Linux distributions).

Git provides a fully distributed development experience. When you check out a project using Git, you are actually cloning an upstream repository which gives you local access to all history and changes within the project. This means you can work on Rubinius when offline, and your source code control system isn’t limited by network bandwidth or connectivity.

Distributed development using Git often works with developers ‘pulling’ changes from each other (as with the Linux kernel), without a central repository to which modifications are sent. Rubinius, on the other hand, uses Git in a similar fashion to Subversion: a central repository hosts the latest changes, and all developers ‘pull’ changes from it. To check out the latest source from this central repository, run the following command:

$> git clone git://git.rubini.us/code rubinius

This will print some interesting output while checking out the source. Note that since you’re obtaining a full copy of the repository, it will take slightly longer than a Subversion checkout, which normally gives you only the latest versions of all source files.

$> time git clone git://git.rubini.us/code rubinius
Initialized empty Git repository in /work/rubinius/.git/
remote: Generating pack..
remote: Done counting 24773 objects.
remote: Deltifying 24773 objects..
remote:  100% (24773/24773) done
Indexing 24773 objects..
remote: Total 24773 (delta 15683), reused 22174 (delta 13918)
 100% (24773/24773) done
Resolving 15683 deltas..
 100% (15683/15683) done
Checking 4286 files out..
 100% (4286/4286) done

real    7m8.927s
user    0m5.006s
sys     0m2.964s
$>

Building Rubinius

Before building Rubinius, ensure that you have installed the required dependencies. These are listed in the INSTALL file in the root Rubinius directory.

Once these are installed, building Rubinius is straightforward by running ‘configure’ and finally ‘make’:

$> cd rubinius
$> ./configure
Rubinius is configured.
$> make
cd shotgun; make rubinius
cd external_libs/libtommath; make
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bncore.o bncore.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_init.o bn_mp_init.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_clear.o bn_mp_clear.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_exch.o bn_mp_exch.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_grow.o bn_mp_grow.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_shrink.o bn_mp_shrink.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_clamp.o bn_mp_clamp.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_zero.o bn_mp_zero.c
cc -I./ -Wall -W -Wshadow -Wsign-compare -fPIC -O3 -funroll-loops -fomit-frame-pointer   -c -o bn_mp_set.o bn_mp_set.c
…snip…
CC string.o
CC strlcat.o
CC strlcpy.o
CC subtend/PortableUContext.o
CC subtend/ffi.o
CC subtend/handle.o
CC subtend/library.o
CC subtend/nmc.o
CC subtend/nmethod.o
CC subtend/ruby.o
CC subtend/setup.o
CC symbol.o
CC tuple.o
CC var_table.o
CC subtend/PortableUContext_asm.o
LINK librubinius-0.8.0.dylib
gcc -Wall -g -ggdb3  -iquote . -iquote lib `pkg-config glib-2.0 --cflags` -Iexternal_libs/libbstring -Iexternal_libs/libcchash `pkg-config glib-2.0 --cflags`  -c -o main.o main.c
CC rubinius.bin
./shotgun/rubinius compile lib/ext/syck
Cleaning up objects…
Created rbxext.bundle
$>

From here you can run the Rubinius interpreter which is located in the shotgun directory:

$> shotgun/rubinius
sirb(eval):000>

which will give you an sirb (ie. shotgun irb) prompt. From here you can enter code just as you would in a normal irb session.

You can also run the specs, either individually or as a suite. Rubinius includes a mini-rspec implementation called mspec written in just over a hundred lines of code so that it can self host the full test suite and runner:

$> bin/mspec -f s spec/core/file/link_spec.rb

The parameter ‘-f s’ indicates that specdoc format should be used for spec result output. In this example we’re running the specs associated with the File.link method only.

$> rake spec

will run all known good specs.
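To give a feel for how little is needed to self-host a spec runner, here is a toy runner in the same spirit. It is emphatically not mspec's source: TinySpec, expect_that and equals are names invented for this sketch, and the output format only loosely imitates the specdoc style.

```ruby
# A toy spec runner; every name here is invented for illustration.
module TinySpec
  class ExpectationFailed < StandardError; end

  # Wraps a value so that an expectation can raise on mismatch.
  class Expectation
    def initialize(actual)
      @actual = actual
    end

    def equals(expected)
      return true if @actual == expected
      raise ExpectationFailed, "expected #{expected.inspect}, got #{@actual.inspect}"
    end
  end

  class Runner
    attr_reader :lines, :examples, :failures

    def initialize
      @lines = []
      @examples = 0
      @failures = 0
    end

    # Runs a group of examples and appends a summary line.
    def describe(name)
      @lines << name
      yield self
      @lines << "#{@examples} examples, #{@failures} failures"
      self
    end

    # Runs one example, recording a failure instead of aborting the run.
    def it(description)
      @examples += 1
      yield
      @lines << "- #{description}"
    rescue ExpectationFailed
      @failures += 1
      @lines << "- #{description} (ERROR - #{@failures})"
    end

    def expect_that(actual)
      Expectation.new(actual)
    end
  end
end

runner = TinySpec::Runner.new.describe("Integer#+") do |spec|
  spec.it("adds two numbers") { spec.expect_that(1 + 1).equals(2) }
  spec.it("fails on purpose") { spec.expect_that(1 + 1).equals(3) }
end
puts runner.lines
```

Keeping the runner this small matters for a young implementation, because every language feature the runner relies on has to work before any spec can be run at all.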

Directory structure

Browsing the root level Rubinius directory:

$> ls
AUTHORS    Makefile   Rakefile   compiler   examples   kernel     shotgun    test
INSTALL    README     THANKS     configure  extensions lib        spec
LICENSE    ROADMAP    bin        doc        hashi      runtime    stdlib
$>

The most important directories can be summarised as follows:

  • bin - shell scripts to run mspec, continuous integration, and other tools
  • compiler - rubinius compiler implementation
  • kernel - platform, bootstrap and core language/library implementation
  • runtime - compiled rubinius archives (.rba files) of the compiler, bootstrap and core library
  • shotgun - rubinius C interpreter implementation
  • spec - rspec style test suite
  • stdlib - standard library code imported from Ruby 1.8.

In addition to this there are several miscellaneous files including installation, build and license information.

Generally, most Rubinius development action takes place in the kernel, spec and shotgun directories. Inside the kernel directory you’ll find a subdirectory for the bootstrap, core and platform components of Rubinius. Bootstrap is initial code that Rubinius reads and uses to start running the compiler and interpreter. Core implements the core language of Ruby, and platform provides the binding to the underlying operating system.

Integrating changes

Changes can be made to Rubinius using your favourite text editor. How you compile them depends on where you actually make a change.

Modifications made to the low level C interpreter can be built using the ‘make’ command. Changes made to Ruby files (eg. in the kernel directory) can be built using one of the following rake commands:

rake build:bootstrap    # Compiles the Rubinius bootstrap archive
rake build:compiler     # Compiles the Rubinius compiler archive
rake build:core         # Compiles the Rubinius core directory
rake build:platform     # Compiles the Rubinius platform archive

(to see all available rake tasks run ‘rake -T’)

These commands will recompile any changes made to the bootstrap, compiler, core and platform source files (located in the kernel/bootstrap, compiler, kernel/core and kernel/platform directories respectively) and update the compiled archives in the runtime directory.

Something to be aware of is that the Rubinius compiler uses the bootstrap and core archives itself, so if you accidentally introduce a defect and break a class such as File, Hash or Array, it’s quite likely the compiler will no longer work, leaving you in a state where you can’t recompile a fix to the breakage. To handle this catch-22 situation when you’re working on critical methods, run the ‘dev:setup’ rake task. ‘dev:setup’ ensures that compilation occurs with pristine copies of the bootstrap, core and platform archives, which will be unaffected in case of an error.

Summary

So far we’ve covered checking Rubinius out from source, building it and running some simple tests, with a brief discussion of the project’s layout. In a future article I’ll walk through implementing a small method to step through the process of creating a patch that can be submitted back to the Rubinius project. In the meantime, feel free to join the #rubinius IRC channel on irc.freenode.net, and read the forums/pages at http://rubini.us/forums.