Extending Ruby With C

1 Getting Started

  • There's a lot to cover when it comes to extending Ruby. To demonstrate the point, we will start by creating a basic gem written entirely in Ruby and modify it as we go.
  • We will start with the basic structure of a gem. We first need to create a folder that will house everything.
$ mkdir xpanther

1.1 Initial Code

  • Let's put in our foundation:
lib/xpanther.rb
require 'rexml/document'

class XPanther
  def self.search_pure(filename, xpath)
    document = REXML::Document.new(File.read(filename))
    REXML::XPath.match(document.root, xpath)
  end
end

1.2 Initial Gemspec

  • Next we need to tell Rubygems how to build and package this gem:
xpanther.gemspec
Gem::Specification.new do |s|
  s.name        = 'xpanther'
  s.version     = '0.0.0'
  s.date        = '2012-02-21'
  s.summary     = "Made with real bits of panther"
  s.description = "60% of the time it works all the time"
  s.authors     = ["Aaron Bedra"]
  s.email       = 'aaron@aaronbedra.com'
  s.files       = ["lib/xpanther.rb"]
  s.homepage    = "http://example.com"
end

1.3 Building and testing the gem

  • To test the gem we need to build and install it:
$ gem build xpanther.gemspec
$ gem install xpanther-0.0.0.gem
  • We also need some test data. Let's grab a copy of the Twitter public timeline:
$ mkdir examples
$ curl http://api.twitter.com/1/statuses/public_timeline.xml > examples/twitter.xml
  • Let's start an irb session using the new gem:
$ irb -rubygems -rxpanther
>> XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()").count
=> 20
>> XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()").first
=> "Hoje sai com a @is_nanny"
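  • If the timeline URL is not reachable from your machine, the same kind of search can be tried against a small inline document. The following is a minimal sketch: the SAMPLE_XML data and the search_string helper are made up for illustration and are not part of the gem.

```ruby
require 'rexml/document'

# Made-up sample data shaped like the timeline document used above.
SAMPLE_XML = <<XML
<statuses>
  <status><text>first status</text></status>
  <status><text>second status</text></status>
</statuses>
XML

# Hypothetical helper: like XPanther.search_pure, but it takes the XML as a
# string instead of reading it from a file.
def search_string(xml, xpath)
  document = REXML::Document.new(xml)
  REXML::XPath.match(document.root, xpath)
end

# The matches are REXML text nodes, so convert them to plain strings.
texts = search_string(SAMPLE_XML, "/statuses/status/text/text()").map { |node| node.to_s }
puts texts.count
puts texts.first
```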

2 Testing

2.1 Adding the first test

  • We should add at least one test to make sure things are in place and behaving like we expect them to:
test/test_xpanther.rb
require 'test/unit'
require 'xpanther'

class XPantherTest < Test::Unit::TestCase
  def test_search_pure
    results = XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()")
    assert_equal(20, results.count)
    assert_equal("Hoje sai com a @is_nanny", results.first.to_s)
  end
end

2.2 Adding a Rake task

  • It's always easier to control things via Rake:
Rakefile
require 'rake/testtask'

Rake::TestTask.new do |t|
  t.libs << "test"
end

task :default => :test

2.3 Running our tests

  • With the default task wired up we can just run rake. Here's what you should see:
Loaded suite /Users/abedra/.rvm/gems/ree-1.8.7-2011.12/gems/rake-0.9.2.2/lib/rake/rake_test_loader
Started
.
Finished in 0.090969 seconds.

1 tests, 2 assertions, 0 failures, 0 errors

3 How fast is it?

3.1 Timing a run

  • We can now use our gem to see how long it takes to ask some questions about the sample XML document.
examples/pure.rb
require 'rubygems'
require 'xpanther'

results = XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()")
puts results.count
puts results[1]
  • Now we can time a run of this to see how we're doing on performance:
$ time ruby examples/pure.rb
20
gelitik cewe paling binal dan buset ,,
ruby examples/pure.rb  0.15s user 0.01s system 98% cpu 0.167 total
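  • The time command also counts interpreter startup and the loading of REXML. To time only the search itself, Ruby's standard Benchmark module can be used. This is a rough sketch: the inline document and the run count of 100 are arbitrary choices for illustration, and the numbers will differ from machine to machine.

```ruby
require 'benchmark'
require 'rexml/document'

# Arbitrary inline document with 20 <status> entries, standing in for
# examples/twitter.xml.
xml = "<statuses>" + ("<status><text>hello</text></status>" * 20) + "</statuses>"

count = 0
# Benchmark.realtime returns the wall-clock seconds spent inside the block.
elapsed = Benchmark.realtime do
  100.times do
    document = REXML::Document.new(xml)
    count = REXML::XPath.match(document.root, "/statuses/status/text/text()").count
  end
end

puts "matches per run: #{count}"
puts "100 runs took #{'%.4f' % elapsed} seconds"
```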

3.2 Should we stop here?

4 Experiment: XPath search in C with libxml

  • libxml is a very widely used library in the XML parsing game. If you are in C and need to get the job done, libxml is your best friend.

4.1 The test

examples/xml.c
#include <stdlib.h>
#include <stdio.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>

int search(const char* filename, const xmlChar* xpathExpr) {
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodePtr cur;
  xmlNodeSetPtr nodes;
  int size;
  int i;

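  doc = xmlParseFile(filename);
  if (doc == NULL) {
    fprintf(stderr, "Error: unable to parse file \"%s\"\n", filename);
    return(-1);
  }
  xpathCtx = xmlXPathNewContext(doc);
  xpathObj = xmlXPathEvalExpression(xpathExpr, xpathCtx);
  if (xpathObj == NULL) {
    fprintf(stderr, "Error: unable to evaluate xpath expression \"%s\"\n", xpathExpr);
    xmlXPathFreeContext(xpathCtx);
    xmlFreeDoc(doc);
    return(-1);
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;

  for (i = 0; i < size; ++i) {
    /* xmlNodeGetContent returns a copy that the caller must free */
    xmlChar *content;
    cur = nodes->nodeTab[i];
    content = xmlNodeGetContent(cur);
    if (content != NULL) {
      fprintf(stderr, "%s\n", content);
      xmlFree(content);
    }
  }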

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);
  xmlFreeDoc(doc);

  return(0);
}

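int main(int argc, char **argv) {
  /* Both the file name and the XPath expression are required */
  if (argc != 3) {
    fprintf(stderr, "Usage: %s <xml-file> <xpath-expr>\n", argv[0]);
    return 1;
  }
  xmlInitParser();
  search(argv[1], (const xmlChar *)argv[2]);
  xmlCleanupParser();
  xmlMemoryDump();
  return 0;
}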

4.2 Compiling

  • You can compile the example using the following command:
gcc xml.c -o xml `xml2-config --cflags` `xml2-config --libs`

4.3 How fast is the C version?

time ./xml twitter.xml "/statuses/status/text"

Hoje sai com a @is_nanny
gelitik cewe paling binal dan buset ,,
めっさひまやで!

....

./xml twitter.xml "/statuses/status/text"  0.00s user 0.00s system 40% cpu 0.0010 total
  • Without too much modification we can turn this into a Ruby C extension

5 Adding the extension infrastructure

5.1 Directory structure

  • When adding a C extension, the common folder structure is ext/gemname/*.c. We will create the ext/xpanther directory and create a file called extconf.rb inside it.

5.2 extconf.rb

  • extconf.rb will generate a Makefile for the project. It is also what you will add to the gemspec to tell it how to build your extension.
ext/xpanther/extconf.rb
require 'mkmf'
create_makefile('xpanther/xpanther')

5.3 A simple example

  • Now we just need to add a short example to test our structure and wiring.
ext/xpanther/xpanther.c
#include <ruby.h>

static VALUE hello(VALUE self) {
  return rb_str_new2("Hello from C");
}

void Init_xpanther(void) {
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "hello", hello, 0);
}
  • We also need to have our gem load the extension
lib/xpanther.rb
require 'xpanther/xpanther'
require 'rexml/document'

class XPanther
  def self.search_pure(filename, xpath)
    document = REXML::Document.new(File.read(filename))
    REXML::XPath.match(document.root, xpath)
  end
end
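  • Three names have to agree for this wiring to work: the target passed to create_makefile, the path given to require, and the Init_ function in the C file, which Ruby looks up using the base name of the extension it loads. As a rough sketch of that relationship (the extension_names helper is hypothetical and exists only for illustration):

```ruby
# Hypothetical helper, for illustration only: derives the names that must
# line up for a given create_makefile target.
def extension_names(target)
  {
    :require_path  => target,                           # require 'xpanther/xpanther'
    :init_function => "Init_#{File.basename(target)}",  # void Init_xpanther(void)
    :install_dir   => File.dirname(target)              # lib/xpanther/
  }
end

names = extension_names('xpanther/xpanther')
puts names[:require_path]
puts names[:init_function]
puts names[:install_dir]
```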

5.4 Updating the gemspec

  • In order to have the extension built when our gem is installed, we have to tell the gemspec about it.
xpanther.gemspec
Gem::Specification.new do |s|
  s.name        = 'xpanther'
  s.version     = '0.0.0'
  s.date        = '2012-02-21'
  s.summary     = "Made with real bits of panther"
  s.description = "60% of the time it works all the time"
  s.authors     = ["Aaron Bedra"]
  s.email       = 'aaron@aaronbedra.com'
  s.files = Dir.glob('lib/**/*.rb') + Dir.glob('ext/**/*.c')
  s.extensions = ['ext/xpanther/extconf.rb']
  s.homepage    = "http://example.com"
end

5.5 Trying out the extension

  • Let's install our gem and give the new method a try. Since we have made significant changes we should bump the version number as well: set s.version to '0.0.1' in the gemspec, then rebuild and install.
$ gem build xpanther.gemspec
$ gem install xpanther-0.0.1.gem
Building native extensions.  This could take a while...
Successfully installed xpanther-0.0.1
1 gem installed
Installing ri documentation for xpanther-0.0.1...
Installing RDoc documentation for xpanther-0.0.1...
  • Notice the new message about building the native extension. If you don't see that, your extension is not being installed.
  • Fire up an irb session and run the new method:
$ irb -rubygems -rxpanther
>> XPanther.hello
=> "Hello from C"

5.6 Adding a test

  • We are going to add a test for our new extension. You might be wondering why, but it will present an interesting challenge for us to solve when we try to run the tests.
test/test_xpanther.rb
def test_extension
  assert_equal("Hello from C", XPanther.hello)
end  

5.7 Updating the Rakefile to autocompile for tests

  • When we try to run rake we are now presented with an error.
$ rake
./lib/xpanther.rb:1:in 'require': no such file to load -- xpanther/xpanther (LoadError)
from ./lib/xpanther.rb:1

....
  • This error occurs because our extension has not been compiled yet, so it is not available to our tests. Luckily, there's an easy solution to this.
  • Before we open our Rakefile, we should do a quick test on our system in irb:
$ irb -rrbconfig
>> RbConfig::CONFIG['DLEXT']
=> "bundle"
  • This lets us know that the compiled extension will have the file extension .bundle. If you are on Linux you would see .so instead of .bundle.
  • Let's add some code to our Rakefile to automatically compile our extension when we run rake:
Rakefile
require 'rake/testtask'
require 'rake/clean'
require 'rbconfig'
require 'fileutils'

EXT = RbConfig::CONFIG['DLEXT']

file "lib/xpanther/xpanther.#{EXT}" => Dir.glob('ext/xpanther/*.c') do
  Dir.chdir('ext/xpanther') do
    ruby "extconf.rb"
    sh "make"
  end
  FileUtils.mkdir_p('lib/xpanther')
  cp "ext/xpanther/xpanther.#{EXT}", "lib/xpanther/xpanther.#{EXT}"
end

task :test => "lib/xpanther/xpanther.#{EXT}"

# Double quotes are required here so that #{EXT} is interpolated
CLEAN.include("ext/**/*{.o,.log,.#{EXT}}")
CLEAN.include('ext/**/Makefile')
CLOBBER.include("lib/**/*.#{EXT}")

Rake::TestTask.new do |t|
  t.libs << 'test'
end

desc "Run tests"
task :default => :test
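  • The file task and the clean patterns build their paths with string interpolation, which Ruby performs only inside double-quoted strings. A minimal sketch of the difference, using a local variable ext in place of the Rakefile's EXT constant:

```ruby
require 'rbconfig'

ext = RbConfig::CONFIG['DLEXT']    # for example "bundle" on OS X, "so" on Linux

single_quoted = 'lib/**/*.#{ext}'  # kept literally, so it never matches the compiled file
double_quoted = "lib/**/*.#{ext}"  # expands to the platform's extension

puts single_quoted
puts double_quoted
```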

6 Moving the example into a real Ruby extension

6.1 How should the API look?

  • There are quite a few different ways to create an API. Since we know that every instance of our class exists to run XPath searches against a document, it would be nice to have the constructor go ahead and parse the XML into memory for us. This obviously has limitations based on file size, but we are going to ignore that for the purposes of this example.

6.1.1 A note about GC and memory management

  • Note that in our C example libxml allocated the memory and we freed it explicitly. Ruby's garbage collector does not know about memory allocated by libxml, so we will introduce a memory leak unless we tell Ruby how to free it.

6.2 Object creation

  • When we create the object we should parse the XML document into memory and make it available for reference. Here's what our object creation will look like.
document = XPanther.new("/path/to/document.xml")

6.3 When the basic constructor just won't do

  • Since libxml needs to manage its own memory here, we will need to modify the constructor just a bit to account for this.
ext/xpanther/xpanther.c
VALUE constructor(VALUE self, VALUE filename) 
{
  xmlDocPtr doc;  
  VALUE argv[1];
  VALUE t_data;

  doc = xmlParseFile(StringValueCStr(filename));
  if (doc == NULL) {
    /* rb_raise raises a Ruby exception and never returns */
    rb_raise(rb_eRuntimeError, "Error: unable to parse file \"%s\"\n",
                               StringValueCStr(filename));
  }

  t_data = Data_Wrap_Struct(self, 0, xml_free, doc);
  argv[0] = filename;
  rb_obj_call_init(t_data, 1, argv);
  return t_data;
}
  • There are a few new ideas here. We accept a filename as an argument to our constructor. StringValueCStr converts it from a Ruby string to a C string, which we pass to xmlParseFile.
  • Error checking is important here. If the file cannot be parsed, we notify the user by raising a RuntimeError with rb_raise. rb_raise does not return control to our function, so nothing after it runs.
  • We then have to wrap our document pointer in an object representation for Ruby, which we do via Data_Wrap_Struct. We provide it the class, a mark function for garbage collection (0 here, since the document holds no references to Ruby objects), a pointer to the function to call when it's time to free the memory, and a pointer to the data that we want to stuff inside. We will examine the xml_free function in a minute.
  • Finally, we manually initialize our object with rb_obj_call_init, feeding it our new object, the argument count, and the argument data. This is the C way to manually create a constructor for a Ruby class.

6.4 Freeing the memory

  • Previously, we pointed to a function called xml_free that is supposed to instruct Ruby's garbage collection routines on how to deal with the memory allocated by libxml during object construction. Let's take a look.
ext/xpanther/xpanther.c
static void xml_free(void *doc) {
  xmlFreeDoc(doc);
}
  • All we are doing here is delegating the memory management to libxml. We just have to pass the function a pointer to the document in memory.

6.5 Wiring up our new constructor

  • In order for us to be able to accept an argument in our constructor, we also need to create an initialize method.
ext/xpanther/xpanther.c
static VALUE initialize(VALUE self, VALUE filename) 
{
  rb_iv_set(self, "@filename", filename);
  return self;
}
  • We have a new constructor that will serve our purpose well, but we still need to wire it up to our object inside the Init_xpanther function.
ext/xpanther/xpanther.c
void Init_xpanther(void)
{
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
}
  • Here we are defining what the new method is going to do. In this case, we are going to use our constructor and the cycle is complete.
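  • The same flow can be sketched in plain Ruby to see why the custom new has to call initialize itself. This is only an analogy: the Wrapped class is hypothetical, allocate stands in for Data_Wrap_Struct, and the explicit send stands in for rb_obj_call_init.

```ruby
# Hypothetical class, used only to mirror the C constructor in Ruby.
class Wrapped
  attr_reader :filename

  # Replacing new means Ruby no longer calls initialize for us,
  # so the replacement has to do it explicitly.
  def self.new(filename)
    instance = allocate                   # plays the role of Data_Wrap_Struct
    instance.send(:initialize, filename)  # plays the role of rb_obj_call_init
    instance
  end

  def initialize(filename)
    @filename = filename
  end
end

document = Wrapped.new("examples/twitter.xml")
puts document.filename
```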

6.6 Putting it all together

  • We have all the structure in place. We just have to drop our search routine in place and wire it up and our task will be complete. Let's define our search method.
ext/xpanther/xpanther.c
VALUE search(VALUE self, VALUE xpathExpr)
{
  VALUE results = rb_ary_new();
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodeSetPtr nodes;
  xmlNodePtr cur;
  int size;
  int i;

  Data_Get_Struct(self, xmlDoc, doc);

  xpathCtx = xmlXPathNewContext(doc);
  if (xpathCtx == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to create new XPath context\n");
  }

  xpathObj = xmlXPathEvalExpression((const xmlChar *)StringValueCStr(xpathExpr), xpathCtx);
  if (xpathObj == NULL) {
    /* rb_raise never returns, so free the context before raising */
    xmlXPathFreeContext(xpathCtx);
    rb_raise(rb_eArgError, "Error: unable to evaluate xpath expression \"%s\"\n",
                           StringValueCStr(xpathExpr));
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;

  if (size == 1) {
    /* xmlNodeGetContent returns a copy that we must release with xmlFree */
    xmlChar *content = xmlNodeGetContent(nodes->nodeTab[0]);
    results = rb_str_new2((const char *)content);
    xmlFree(content);
  } else {
    for (i = 0; i < size; ++i) {
      xmlChar *content;
      cur = nodes->nodeTab[i];
      content = xmlNodeGetContent(cur);
      rb_ary_push(results, rb_str_new2((const char *)content));
      xmlFree(content);
    }
  }

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);

  return results;
}
  • Here we define a function that accepts an XPath expression just like we had in our pure Ruby example. We set up our variables just like we did in our experiment, with the exception of VALUE results. This is the value that we will pass back to Ruby after we are done. We now see the reverse of our object packing, Data_Get_Struct. It checks that self is a wrapped data object and then uses the DATA_PTR macro to retrieve the pointer we packed up in our constructor, placing it inside the doc variable.
  • The rest of this function looks pretty similar to our experiment, with a few exceptions. These are all Ruby/C interop functions that make it possible for C to understand the Ruby data and then pass it back so that Ruby can understand it.
  • We already covered StringValueCStr, but we haven't seen rb_str_new2 yet. This function, along with the other rb_str_new functions, turns a char * into a VALUE for Ruby to consume. rb_str_new2 is the most commonly used of the conversion functions because it automatically calculates the length of the string, making the function call more convenient.
  • rb_ary_new and rb_ary_push do exactly what you think they do.
  • Let's wire up the search function:
ext/xpanther/xpanther.c
void Init_xpanther(void)
{
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
  rb_define_method(klass, "search", search, 1);
}
  • We need to do a little bit of housekeeping to make things build properly. First, adjust the includes in xpanther.c to include the libxml pieces from our experiment.
ext/xpanther/xpanther.c
#include <ruby.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>
  • We also need to modify extconf.rb and tell it to link against libxml so that our extension can compile properly.
ext/xpanther/extconf.rb
require 'mkmf'
have_library("xml2")
find_header("libxml/tree.h", "/usr/include/libxml2")
find_header("libxml/parser.h", "/usr/include/libxml2")
find_header("libxml/xpath.h", "/usr/include/libxml2")
find_header("libxml/xpathInternals.h", "/usr/include/libxml2")
create_makefile('xpanther/xpanther')
  • Since we moved around a bit inside the xpanther.c file, here's a complete sample.
ext/xpanther/xpanther.c
#include <ruby.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>

static void xml_free(void *doc) {
  xmlFreeDoc(doc);
}

static VALUE initialize(VALUE self, VALUE filename)
{
  rb_iv_set(self, "@filename", filename);
  return self;
}

VALUE constructor(VALUE self, VALUE filename)
{
  xmlDocPtr doc;
  VALUE argv[1];
  VALUE t_data;

  doc = xmlParseFile(StringValueCStr(filename));
  if (doc == NULL) {
    /* rb_raise raises a Ruby exception and never returns */
    rb_raise(rb_eRuntimeError, "Error: unable to parse file \"%s\"\n",
                               StringValueCStr(filename));
  }

  t_data = Data_Wrap_Struct(self, 0, xml_free, doc);
  argv[0] = filename;
  rb_obj_call_init(t_data, 1, argv);
  return t_data;
}

VALUE search(VALUE self, VALUE xpathExpr)
{
  VALUE results = rb_ary_new();
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodeSetPtr nodes;
  xmlNodePtr cur;
  int size;
  int i;

  Data_Get_Struct(self, xmlDoc, doc);

  xpathCtx = xmlXPathNewContext(doc);
  if (xpathCtx == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to create new XPath context\n");
  }

  xpathObj = xmlXPathEvalExpression((const xmlChar *)StringValueCStr(xpathExpr), xpathCtx);
  if (xpathObj == NULL) {
    /* rb_raise never returns, so free the context before raising */
    xmlXPathFreeContext(xpathCtx);
    rb_raise(rb_eArgError, "Error: unable to evaluate xpath expression \"%s\"\n",
                           StringValueCStr(xpathExpr));
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;

  if (size == 1) {
    /* xmlNodeGetContent returns a copy that we must release with xmlFree */
    xmlChar *content = xmlNodeGetContent(nodes->nodeTab[0]);
    results = rb_str_new2((const char *)content);
    xmlFree(content);
  } else {
    for (i = 0; i < size; ++i) {
      xmlChar *content;
      cur = nodes->nodeTab[i];
      content = xmlNodeGetContent(cur);
      rb_ary_push(results, rb_str_new2((const char *)content));
      xmlFree(content);
    }
  }

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);

  return results;
}

void Init_xpanther(void)
{
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
  rb_define_method(klass, "search", search, 1);
}
  • Now all we have to do is rebuild and reinstall our gem to give it a try. While we are at it we should also bump the version number.
$ gem build xpanther.gemspec 
  Successfully built RubyGem
  Name: xpanther
  Version: 0.0.1
  File: xpanther-0.0.1.gem
$ gem install xpanther-0.0.1.gem 
Building native extensions.  This could take a while...
Successfully installed xpanther-0.0.1
1 gem installed
Installing ri documentation for xpanther-0.0.1...
Installing RDoc documentation for xpanther-0.0.1...
$ irb -rubygems -rxpanther
>> document = XPanther.new("examples/twitter.xml")
=> #<XPanther:0x108eb7c98>
>> document.search("/statuses/status/text").count
=> 20
>> document.search("/statuses/status/text").first
=> "Hoje sai com a @is_nanny"
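  • One thing to keep in mind when calling search: because of the size == 1 branch, it returns a String when exactly one node matches and an Array otherwise. A caller that always wants an Array could normalize the result. A minimal sketch, where normalize_results is a hypothetical helper and not part of the gem:

```ruby
# Hypothetical helper: wraps a lone String result in an Array and leaves
# Array results untouched.
def normalize_results(result)
  result.is_a?(Array) ? result : [result]
end

puts normalize_results("only match").inspect
puts normalize_results(["first", "second"]).inspect
puts normalize_results([]).inspect
```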

7 The results!

  • It's time to see how much we gained from our efforts. Let's put together a test:
examples/extended.rb
require 'rubygems'
require 'xpanther'

document = XPanther.new("twitter.xml")
results = document.search("/statuses/status/text")
puts results.count
puts results.first
  • We can time our run as we did before and take a look at the difference.
$ time ruby extended.rb
20
Hoje sai com a @is_nanny
ruby extended.rb  0.02s user 0.01s system 95% cpu 0.029 total

7.1 A better example

require 'rubygems'
require 'xpanther'

document = XPanther.new("iTunes Music Library.xml")
document.search("/plist/dict/dict/dict/key[text()='Artist']/following-sibling::string[1]").uniq
$ time ruby c.rb
ruby c.rb  0.03s user 0.01s system 95% cpu 0.037 total

$ time ruby pure.rb
ruby pure.rb  0.53s user 0.03s system 99% cpu 0.559 total

Date: 2012-03-15 15:45:02 PDT

Author: Aaron Bedra
