Extending Ruby With C
1 Getting Started
- There's a lot to cover when it comes to extending Ruby. To demonstrate the point, we will start by creating a basic gem written entirely in Ruby, and modify it as we go.
- We will start with the basic structure of a gem. We first need to create a folder that will house everything.
$ mkdir xpanther
1.1 Initial Code
- Let's put in our foundation:
require 'rexml/document'

class XPanther
  def self.search_pure(filename, xpath)
    document = REXML::Document.new(File.read(filename))
    REXML::XPath.match(document.root, xpath)
  end
end
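As a quick way to see what this search returns without a file on disk, here is a minimal sketch that runs the same REXML calls against an inline document. `SAMPLE_XML` and `search_string` are hypothetical names introduced only for this illustration; they are not part of the gem.

```ruby
require 'rexml/document'

# Hypothetical stand-in for examples/twitter.xml, trimmed to two statuses.
SAMPLE_XML = <<~XML
  <statuses>
    <status><text>first tweet</text></status>
    <status><text>second tweet</text></status>
  </statuses>
XML

# Same logic as XPanther.search_pure, but taking an XML string
# instead of a filename (an assumption made to keep this self-contained).
def search_string(xml, xpath)
  document = REXML::Document.new(xml)
  REXML::XPath.match(document.root, xpath)
end

texts = search_string(SAMPLE_XML, "/statuses/status/text/text()")
puts texts.count
puts texts.first
```

The matches are REXML text nodes, so call `to_s` on them when plain Ruby strings are needed.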
1.2 Initial Gemspec
- Next we need to tell Rubygems how to build and package this gem:
Gem::Specification.new do |s|
  s.name        = 'xpanther'
  s.version     = '0.0.0'
  s.date        = '2012-02-21'
  s.summary     = "Made with real bits of panther"
  s.description = "60% of the time it works all the time"
  s.authors     = ["Aaron Bedra"]
  s.email       = 'aaron@aaronbedra.com'
  s.files       = ["lib/xpanther.rb"]
  s.homepage    = "http://example.com"
end
1.3 Building and testing the gem
- To test the gem we need to build and install it:
$ gem build xpanther.gemspec
$ gem install xpanther-0.0.0.gem
- We also need some test data. Let's grab a copy of the twitter public timeline:
mkdir examples
curl http://api.twitter.com/1/statuses/public_timeline.xml > examples/twitter.xml
- Let's start an irb session using the new gem:
$ irb -rubygems -rxpanther
>> XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()").count
=> 20
>> XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()").first
=> "Hoje sai com a @is_nanny"
2 Testing
2.1 Adding the first test
- We should add at least one test to make sure things are in place and behaving like we expect them to:
require 'test/unit'
require 'xpanther'

class XPantherTest < Test::Unit::TestCase
  def test_search_pure
    results = XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()")
    assert_equal(20, results.count)
    assert_equal("Hoje sai com a @is_nanny", results.first.to_s)
  end
end
2.2 Adding a Rake task
- It's always easier to control things via Rake:
require 'rake/testtask'

Rake::TestTask.new do |t|
  t.libs << "test"
end

task :default => :test
2.3 Running our tests
- With the default task wired up we can just run rake. Here's what you should see:
Loaded suite /Users/abedra/.rvm/gems/ree-1.8.7-2011.12/gems/rake-0.9.2.2/lib/rake/rake_test_loader
Started
.
Finished in 0.090969 seconds.
1 tests, 2 assertions, 0 failures, 0 errors
3 How fast is it?
3.1 Timing a run
- We can now use our gem to see how long it takes to ask some questions about the sample XML document.
require 'rubygems'
require 'xpanther'

results = XPanther.search_pure("examples/twitter.xml", "/statuses/status/text/text()")
puts results.count
puts results[1]
- Now we can time a run of this to see how we're doing on performance:
$ time ruby examples/pure.rb
20
gelitik cewe paling binal dan buset ,,
ruby examples/pure.rb  0.15s user 0.01s system 98% cpu 0.167 total
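A single `time` run also includes interpreter startup, so it can help to average several in-process runs. The following is a minimal sketch using Ruby's standard Benchmark module; `TIMELINE_XML` and `average_search_time` are hypothetical names, and the inline document is an assumption standing in for examples/twitter.xml.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical in-memory document with 20 statuses, so the sketch
# does not depend on examples/twitter.xml being present.
TIMELINE_XML = "<statuses>" + ("<status><text>hi</text></status>" * 20) + "</statuses>"

# Parse and search `runs` times, returning the average seconds per run.
def average_search_time(xml, xpath, runs)
  total = Benchmark.realtime do
    runs.times do
      document = REXML::Document.new(xml)
      REXML::XPath.match(document.root, xpath)
    end
  end
  total / runs
end

seconds = average_search_time(TIMELINE_XML, "/statuses/status/text/text()", 10)
puts format("%.6f seconds per search", seconds)
```

The absolute numbers will differ from machine to machine; only the comparison between two implementations measured the same way is meaningful.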
3.2 Should we stop here?
4 Experiment: XPath search in C with libxml
- libxml is one of the most widely used XML parsing libraries. If you are working in C and need to get the job done, libxml is your best friend.
4.1 The test
examples/xml.c
#include <stdlib.h>
#include <stdio.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>

/* Evaluate xpathExpr against the file and print the content of every match. */
int search(const char* filename, const xmlChar* xpathExpr) {
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodeSetPtr nodes;
  xmlChar *content;
  int size;
  int i;

  doc = xmlParseFile(filename);
  if (doc == NULL) {
    fprintf(stderr, "Error: unable to parse file \"%s\"\n", filename);
    return -1;
  }

  xpathCtx = xmlXPathNewContext(doc);
  xpathObj = xmlXPathEvalExpression(xpathExpr, xpathCtx);
  if (xpathObj == NULL) {
    fprintf(stderr, "Error: unable to evaluate xpath expression \"%s\"\n", xpathExpr);
    xmlXPathFreeContext(xpathCtx);
    xmlFreeDoc(doc);
    return -1;
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;
  for (i = 0; i < size; ++i) {
    /* xmlNodeGetContent allocates a copy that the caller must free. */
    content = xmlNodeGetContent(nodes->nodeTab[i]);
    fprintf(stderr, "%s\n", content);
    xmlFree(content);
  }

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);
  xmlFreeDoc(doc);

  return 0;
}

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "Usage: %s <xml-file> <xpath-expr>\n", argv[0]);
    return 1;
  }

  xmlInitParser();
  search(argv[1], (const xmlChar *)argv[2]);
  xmlCleanupParser();
  xmlMemoryDump();
  return 0;
}
4.2 Compiling
- You can compile the example using the following command:
gcc xml.c -o xml `xml2-config --cflags` `xml2-config --libs`
4.3 How fast is the C version?
time ./xml twitter.xml "/statuses/status/text"
Hoje sai com a @is_nanny
gelitik cewe paling binal dan buset ,,
めっさひまやで!
....
/xml twitter.xml "/statuses/status/text"  0.00s user 0.00s system 40% cpu 0.0010 total
- Without too much modification we can turn this into a Ruby C extension
5 Adding the extension infrastructure
5.1 Directory structure
- When adding a C extension, the common folder structure is ext/gemname/*.c. We will create the ext/xpanther directory and create a file called extconf.rb in the xpanther folder.
5.2 extconf.rb
- extconf.rb will generate a Makefile for the project. It is also what you will add to the gemspec to tell it how to build your extension.
require 'mkmf'

create_makefile('xpanther/xpanther')
5.3 A simple example
- Now we just need to add a short example to test our structure and wiring.
#include <ruby.h>

static VALUE hello(VALUE self) {
  return rb_str_new2("Hello from C");
}

void Init_xpanther(void) {
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "hello", hello, 0);
}
- We also need to have our gem load the extension
require 'xpanther/xpanther'
require 'rexml/document'

class XPanther
  def self.search_pure(filename, xpath)
    document = REXML::Document.new(File.read(filename))
    REXML::XPath.match(document.root, xpath)
  end
end
5.4 Updating the gemspec
- In order to have the extension built when our gem is installed, we have to tell the gemspec about it.
Gem::Specification.new do |s|
  s.name        = 'xpanther'
  s.version     = '0.0.0'
  s.date        = '2012-02-21'
  s.summary     = "Made with real bits of panther"
  s.description = "60% of the time it works all the time"
  s.authors     = ["Aaron Bedra"]
  s.email       = 'aaron@aaronbedra.com'
  s.files       = Dir.glob('lib/**/*.rb') + Dir.glob('ext/**/*.c')
  s.extensions  = ['ext/xpanther/extconf.rb']
  s.homepage    = "http://example.com"
end
5.5 Trying out the extension
- Let's install our gem and give the new method a try. Since we have made significant changes we should bump the version number as well.
$ gem install xpanther-0.0.1.gem
Building native extensions.  This could take a while...
Successfully installed xpanther-0.0.1
1 gem installed
Installing ri documentation for xpanther-0.0.1...
Installing RDoc documentation for xpanther-0.0.1...
- Notice the new message about building the native extension. If you don't see that, your extension is not being installed.
- Fire up an irb session and run the new method:
$ irb -rubygems -rxpanther
>> XPanther.hello
=> "Hello from C"
5.6 Adding a test
- We are going to add a test for our new extension. You might be wondering why, but it will present an interesting challenge for us to solve when we try to run the tests.
def test_extension
  assert_equal("Hello from C", XPanther.hello)
end
5.7 Updating the Rakefile to autocompile for tests
- When we try to run rake we are now presented with an error.
$ rake
./lib/xpanther.rb:1:in 'require': no such file to load -- xpanther/xpanther (LoadError)
from ./lib/xpanther.rb:1
....
- This error is caused because our extension is not compiled and available for our tests. Luckily, there's an easy solution to this.
- Before we open our Rakefile, we should do a quick test on our system in irb:
$ irb -rrbconfig
>> RbConfig::CONFIG['DLEXT']
=> "bundle"
- This lets us know that the compiled extension will have the file extension .bundle. If you are on Linux you would see .so instead of .bundle.
- Let's add some code into our Rakefile to automatically compile our extension when we run rake:
require 'rake/testtask'
require 'rake/clean'
require 'rbconfig'
require 'fileutils'

EXT = RbConfig::CONFIG['DLEXT']

# Rebuild the shared library whenever a C source file changes.
file "lib/xpanther/xpanther.#{EXT}" => Dir.glob('ext/xpanther/*.c') do
  Dir.chdir('ext/xpanther') do
    ruby "extconf.rb"
    sh "make"
  end
  FileUtils.mkdir_p('lib/xpanther')
  cp "ext/xpanther/xpanther.#{EXT}", "lib/xpanther/xpanther.#{EXT}"
end

task :test => "lib/xpanther/xpanther.#{EXT}"

# Double quotes are required here so that #{EXT} is interpolated.
CLEAN.include("ext/**/*{.o,.log,.#{EXT}}")
CLEAN.include('ext/**/Makefile')
CLOBBER.include("lib/**/*.#{EXT}")

Rake::TestTask.new do |t|
  t.libs << 'test'
end

desc "Run tests"
task :default => :test
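The Rakefile leans on two small details: the platform-specific value of RbConfig::CONFIG['DLEXT'], and the fact that Ruby only interpolates `#{...}` inside double-quoted strings. The sketch below illustrates both; `extension_path` is a hypothetical helper used only for this example.

```ruby
require 'rbconfig'

# e.g. "bundle" on macOS or "so" on Linux
EXT = RbConfig::CONFIG['DLEXT']

# Hypothetical helper: where the compiled extension for `name` would live.
def extension_path(name, ext = EXT)
  "lib/#{name}/#{name}.#{ext}"
end

puts extension_path('xpanther')

# Single quotes keep the text literally, so this glob would never match:
puts 'lib/**/*.#{EXT}'
# Double quotes interpolate the constant, giving a usable glob:
puts "lib/**/*.#{EXT}"
```

This matters for the CLEAN and CLOBBER patterns, which only match the compiled files when the extension is actually interpolated into the glob.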
6 Moving the example into a real Ruby extension
6.1 How should the API look?
- There are quite a few different ways to create an API. Since we know that we are going to perform an XPath search when we instantiate our class, it would be nice to have it go ahead and parse the XML into memory for us. This obviously has limitations based on file size, but we are going to ignore that for the purposes of this example.
6.1.1 A note about GC and memory management
- Note that in our C example libxml created and freed the memory. Ruby will not be able to handle the cleanup here and we will introduce a memory leak if we ignore this.
6.2 Object creation
- When we create the object we should parse the XML document into memory and make it available for reference. Here's what our object creation will look like.
document = XPanther.new("/path/to/document.xml")
6.3 When the basic constructor just won't do
- Since libxml needs to manage its own memory here, we will need to modify the constructor just a bit to account for this.
VALUE constructor(VALUE self, VALUE filename) {
  xmlDocPtr doc;
  VALUE argv[1];
  VALUE t_data;

  doc = xmlParseFile(StringValueCStr(filename));
  if (doc == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to parse file \"%s\"\n", StringValueCStr(filename));
    return Qnil; /* not reached: rb_raise does not return */
  }

  /* self is the class here, because new is a singleton method. */
  t_data = Data_Wrap_Struct(self, 0, xml_free, doc);
  argv[0] = filename;
  rb_obj_call_init(t_data, 1, argv);
  return t_data;
}
- There are a few new ideas going on here. We are accepting a filename as an argument to our constructor. This is then converted from a Ruby string to a C string via the StringValueCStr function and passed into xmlParseFile. Error checking is important here. If the user passes in a file that cannot be parsed, we notify them by raising an exception with rb_raise; since rb_raise does not return, the return Qnil that follows it is never reached. We then have to take our document pointer and wrap it in an object representation for Ruby. We can do this via Data_Wrap_Struct. We have to provide it the class of the new object (self here, because new is a singleton method), a mark function for garbage collection (0, since the document holds no references to Ruby objects), a pointer to the function to call when it's time to free the memory, and a pointer to the data that we want to stuff inside. We will examine the xml_free function in a minute. Finally, we manually initialize our object with rb_obj_call_init and feed it our object, argument count, and argument data. This is the C way to manually create a constructor for a Ruby class.
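For readers more at home in Ruby than in the C API, the shape of this constructor can be sketched in pure Ruby: allocate an object, then call initialize on it by hand. `FakeDocument` is a hypothetical class used only for illustration, and the mapping to the C calls in the comments is a loose analogy rather than an exact equivalence.

```ruby
# Hypothetical pure-Ruby class mirroring the shape of the C constructor.
class FakeDocument
  attr_reader :filename

  # Overriding new by hand, as the C extension does.
  def self.new(filename)
    object = allocate                   # roughly the role of Data_Wrap_Struct
    object.send(:initialize, filename)  # roughly the role of rb_obj_call_init
    object
  end

  def initialize(filename)
    @filename = filename
  end
end

document = FakeDocument.new("examples/twitter.xml")
puts document.filename
```

The C version differs in one important way: the object it creates also carries the parsed libxml document and a function for freeing it.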
6.4 Freeing the memory
- Previously, we pointed to a function called xml_free that is supposed to instruct Ruby's garbage collection routines on how to deal with the memory allocated by libxml during object construction. Let's take a look.
static void xml_free(void *doc) {
  xmlFreeDoc(doc);
}
- All we are doing here is delegating the memory management to libxml. We just have to pass the function a pointer to the document in memory.
6.5 Wiring up our new constructor
- In order for us to be able to accept an argument in our constructor, we also need to create an initialize method.
static VALUE initialize(VALUE self, VALUE filename) {
  rb_iv_set(self, "@filename", filename);
  return self;
}
- We have a new constructor that will serve our purpose well, but we still need to wire it up to our object inside the Init_xpanther function.
void Init_xpanther(void) {
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
}
- Here we are defining what the new method is going to do. In this case, we are going to use our constructor and the cycle is complete.
6.6 Putting it all together
- We have all the structure in place. We just have to drop our search routine in place and wire it up and our task will be complete. Let's define our search method.
VALUE search(VALUE self, VALUE xpathExpr) {
  VALUE results = rb_ary_new();
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodeSetPtr nodes;
  xmlNodePtr cur;
  xmlChar *content;
  int size;
  int i;

  Data_Get_Struct(self, xmlDoc, doc);

  xpathCtx = xmlXPathNewContext(doc);
  if (xpathCtx == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to create new XPath context\n");
    return Qnil; /* not reached: rb_raise does not return */
  }

  xpathObj = xmlXPathEvalExpression((const xmlChar *)StringValueCStr(xpathExpr), xpathCtx);
  if (xpathObj == NULL) {
    /* Free the context before raising, because rb_raise does not return. */
    xmlXPathFreeContext(xpathCtx);
    rb_raise(rb_eArgError, "Error: unable to evaluate xpath expression \"%s\"\n", StringValueCStr(xpathExpr));
    return Qnil; /* not reached */
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;
  if (size == 1) {
    /* A single match is returned as a String rather than an Array. */
    content = xmlNodeGetContent(nodes->nodeTab[0]);
    results = rb_str_new2((const char *)content);
    xmlFree(content); /* xmlNodeGetContent allocates; rb_str_new2 copies */
  } else {
    for (i = 0; i < size; ++i) {
      cur = nodes->nodeTab[i];
      content = xmlNodeGetContent(cur);
      rb_ary_push(results, rb_str_new2((const char *)content));
      xmlFree(content);
    }
  }

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);

  return results;
}
- Here we define a function that accepts an XPath expression just like we had in our pure Ruby example. We set up our variables just like we did in our experiment, with the exception of VALUE results. This is the value that we will pass back to Ruby after we are done. We now see the reverse of our object packing, Data_Get_Struct. It's a type-safe wrapper around the DATA_PTR macro which essentially just returns the data we packed up in our constructor and places it inside the doc variable.
- The rest of this function looks pretty similar to our experiment, with a few exceptions. These are all Ruby/C interop functions that make it possible for C to understand the Ruby data and then pass it back so that Ruby can understand it.
- We already covered StringValueCStr, but we haven't seen rb_str_new2 yet. This function, along with the other rb_str_new functions, turns a char * into a VALUE for Ruby to consume. rb_str_new2 is the most commonly used of the conversion functions, because it automatically calculates the length of the string, making the function call more convenient. rb_ary_new and rb_ary_push do exactly what you think they do.
- Let's wire up the search function:
void Init_xpanther(void) {
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
  rb_define_method(klass, "search", search, 1);
}
- We need to do a little bit of housekeeping to make things build properly. First, adjust the includes in xpanther.c to include the libxml pieces from our experiment.
#include <ruby.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>
- We also need to modify extconf.rb and tell it to link against libxml so that our extension can compile properly.
require 'mkmf'

have_library("xml2")
find_header("libxml/tree.h", "/usr/include/libxml2")
find_header("libxml/parser.h", "/usr/include/libxml2")
find_header("libxml/xpath.h", "/usr/include/libxml2")
find_header("libxml/xpathInternals.h", "/usr/include/libxml2")

create_makefile('xpanther/xpanther')
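The include path /usr/include/libxml2 is not the same on every system. As a hedged alternative, a variant of extconf.rb could ask pkg-config for the flags first and stop early with a clear message when libxml2 is missing. This is a sketch rather than the file used in this gem; it assumes pkg-config knows the package under the name libxml-2.0, and the error messages are invented for illustration.

```ruby
require 'mkmf'

# Ask pkg-config for libxml2's compiler and linker flags first.
# pkg_config returns nil when the package cannot be found.
unless pkg_config('libxml-2.0')
  # Fall back to the conventional include path (an assumption).
  unless find_header('libxml/tree.h', '/usr/include/libxml2')
    abort('libxml2 headers not found')
  end
end

# Confirm that we can actually link against the library.
abort('libxml2 library not found') unless have_library('xml2', 'xmlParseFile')

create_makefile('xpanther/xpanther')
```

Failing in extconf.rb gives the person installing the gem a readable message instead of a compiler error from deep inside make.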
- Since we moved around a bit inside the xpanther.c file, here's a complete sample.
#include <ruby.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>

/* Called by Ruby's GC when an XPanther object is collected. */
static void xml_free(void *doc) {
  xmlFreeDoc(doc);
}

static VALUE initialize(VALUE self, VALUE filename) {
  rb_iv_set(self, "@filename", filename);
  return self;
}

VALUE constructor(VALUE self, VALUE filename) {
  xmlDocPtr doc;
  VALUE argv[1];
  VALUE t_data;

  doc = xmlParseFile(StringValueCStr(filename));
  if (doc == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to parse file \"%s\"\n", StringValueCStr(filename));
    return Qnil; /* not reached: rb_raise does not return */
  }

  /* self is the class here, because new is a singleton method. */
  t_data = Data_Wrap_Struct(self, 0, xml_free, doc);
  argv[0] = filename;
  rb_obj_call_init(t_data, 1, argv);
  return t_data;
}

VALUE search(VALUE self, VALUE xpathExpr) {
  VALUE results = rb_ary_new();
  xmlDocPtr doc;
  xmlXPathContextPtr xpathCtx;
  xmlXPathObjectPtr xpathObj;
  xmlNodeSetPtr nodes;
  xmlNodePtr cur;
  xmlChar *content;
  int size;
  int i;

  Data_Get_Struct(self, xmlDoc, doc);

  xpathCtx = xmlXPathNewContext(doc);
  if (xpathCtx == NULL) {
    rb_raise(rb_eRuntimeError, "Error: unable to create new XPath context\n");
    return Qnil; /* not reached */
  }

  xpathObj = xmlXPathEvalExpression((const xmlChar *)StringValueCStr(xpathExpr), xpathCtx);
  if (xpathObj == NULL) {
    /* Free the context before raising, because rb_raise does not return. */
    xmlXPathFreeContext(xpathCtx);
    rb_raise(rb_eArgError, "Error: unable to evaluate xpath expression \"%s\"\n", StringValueCStr(xpathExpr));
    return Qnil; /* not reached */
  }

  nodes = xpathObj->nodesetval;
  size = (nodes) ? nodes->nodeNr : 0;
  if (size == 1) {
    /* A single match is returned as a String rather than an Array. */
    content = xmlNodeGetContent(nodes->nodeTab[0]);
    results = rb_str_new2((const char *)content);
    xmlFree(content); /* xmlNodeGetContent allocates; rb_str_new2 copies */
  } else {
    for (i = 0; i < size; ++i) {
      cur = nodes->nodeTab[i];
      content = xmlNodeGetContent(cur);
      rb_ary_push(results, rb_str_new2((const char *)content));
      xmlFree(content);
    }
  }

  xmlXPathFreeObject(xpathObj);
  xmlXPathFreeContext(xpathCtx);

  return results;
}

void Init_xpanther(void) {
  VALUE klass = rb_define_class("XPanther", rb_cObject);
  rb_define_singleton_method(klass, "new", constructor, 1);
  rb_define_method(klass, "initialize", initialize, 1);
  rb_define_method(klass, "search", search, 1);
}
- Now all we have to do is rebuild and reinstall our gem to give it a try. While we are at it we should also bump the version number.
$ gem build xpanther.gemspec
  Successfully built RubyGem
  Name: xpanther
  Version: 0.0.1
  File: xpanther-0.0.1.gem
$ gem install xpanther-0.0.1.gem
Building native extensions.  This could take a while...
Successfully installed xpanther-0.0.1
1 gem installed
Installing ri documentation for xpanther-0.0.1...
Installing RDoc documentation for xpanther-0.0.1...
$ irb -rubygems -rxpanther
>> document = XPanther.new("examples/twitter.xml")
=> #<XPanther:0x108eb7c98>
>> document.search("/statuses/status/text").count
=> 20
>> document.search("/statuses/status/text").first
=> "Hoje sai com a @is_nanny"
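One detail of the C search function is easy to miss from the caller's side: when exactly one node matches it returns a String, and otherwise an Array. A caller that always wants a list can normalize the result with Kernel#Array. The sketch below uses `search_shape`, a hypothetical pure-Ruby stand-in that only mimics that return shape, so it runs without the extension installed.

```ruby
# Hypothetical stand-in that mimics the return shape of the C search
# function: a bare String for one match, an Array otherwise.
def search_shape(matches)
  matches.size == 1 ? matches.first : matches
end

# Kernel#Array wraps a lone String and leaves an Array untouched.
def normalized(result)
  Array(result)
end

puts normalized(search_shape(["only match"])).inspect
puts normalized(search_shape(["first", "second"])).inspect
puts normalized(search_shape([])).inspect
```

With the real extension the same idea would read `Array(document.search(xpath))`.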
7 The results!
- It's time to see how much we gained from our efforts. Let's put together a test
require 'rubygems'
require 'xpanther'

document = XPanther.new("twitter.xml")
results = document.search("/statuses/status/text")
puts results.count
puts results.first
- We can time our run as we did before and take a look at the difference.
$ time ruby extended.rb
20
Hoje sai com a @is_nanny
ruby extended.rb  0.02s user 0.01s system 95% cpu 0.029 total
7.1 A better example
require 'rubygems'
require 'xpanther'

document = XPanther.new("iTunes Music Library.xml")
document.search("/plist/dict/dict/dict/key[text()='Artist']/following-sibling::string[1]").uniq
$ time ruby c.rb
ruby c.rb  0.03s user 0.01s system 95% cpu 0.037 total
$ time ruby pure.rb
ruby pure.rb  0.53s user 0.03s system 99% cpu 0.559 total
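To put the timings in perspective, the ratio of the wall-clock totals can be worked out directly. The sketch below uses the totals reported in this document (0.167s against 0.029s for the Twitter file, 0.559s against 0.037s for the iTunes library); `speedup` is a hypothetical helper, and because each figure comes from a single run the ratios are rough indications rather than benchmarks.

```ruby
# Hypothetical helper: how many times faster the extension run was,
# rounded to one decimal place.
def speedup(pure_seconds, extension_seconds)
  (pure_seconds / extension_seconds).round(1)
end

puts speedup(0.167, 0.029)  # Twitter timeline: pure Ruby vs. extension
puts speedup(0.559, 0.037)  # iTunes library: pure Ruby vs. extension
```

That works out to roughly 5.8 times faster on the small document and roughly 15.1 times faster on the larger one.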
8 References
- RubyGems Guides, C Extensions http://guides.rubygems.org/c-extensions/
- Programming Ruby, Extending Ruby http://ruby-doc.org/docs/ProgrammingRuby/html/extruby.html