Updates to CodeIgniter MongoDB library

This library can now be found at https://github.com/alexbilbie/codeigniter-mongodb-library and has been updated with numerous bug fixes since this post was written. If you find a new bug please add it to the issue tracker. Thanks!

I’ve spent some time this evening updating my CodeIgniter MongoDB library. You can get the latest release (4.0.1 at time of writing) at https://bitbucket.org/alexbilbie/codeigniter-mongo-library/

hg clone https://bitbucket.org/alexbilbie/codeigniter-mongo-library

Or if you’re one of the cool kids and using Sparks run

php tools/spark install -v4.0.1 mongodb

So what’s new?

You can now pass a mongo id in where and it will automatically be converted to the correct MongoId object type. You can also pass a field and value to the where function instead of an array. Thanks to Phil Sturgeon for this.

$this->mongo_db->where('_id', 'ced141265b96c037a3cab9dee0f3b4fa')->get('post')

I’ve also added a lot of new update functions which should make expressing updates much easier. Now you can write queries like:

$this->mongo_db
->where('_id', 'ced141265b96c037a3cab9dee0f3b4fa')
->set('title', 'My new blog post')
->inc('comment_count', 1)
->push('comments', array('id'=>1, 'name'=>'Alex', 'text'=>'Hello, world!'))
->update('post')

The new functions you can use are:

  • inc – increment the (integer) value of a field
  • dec – decrement the (integer) value of a field
  • set – sets the field to a new value
  • unset – unsets a field
  • push – pushes a new element into an array
  • pop – pops the last element from an array
  • pull – removes all occurrences of value from field
  • rename_field – renames a field (key remains intact)

There are a few missing functions that I couldn’t get to work this evening but I’ll add them shortly:

  • push_all – appends each value (where value is an array) to field
  • pull_all – removes all occurrences of value (where value is an array) in field
  • bit – does a bitwise update of a field
  • add_to_set – adds value to the array only if its not in the array already, if field is an existing array, otherwise sets field to the array value if field is not present

There are a number of other small enhancements, and I’ve updated the licence to the MIT License.

Coda syntax styles

A number of people in the past have asked me if they can have a copy of the syntax stylesheets I use in Coda. Well I’ve put them up on my Bitbucket so anyone can use them. Included are stylesheets for CSS, PHP, HTML, JavaScript and SQL.

https://bitbucket.org/alexbilbie/coda-styles/src

I’ve found that a dark stylesheet with pastel highlighting is the most comfortable to look at for long periods of time.

To import into Coda, open Preferences > Colors > Import.

#totalrecal: We need faster interwebz!

Now that we’ve got live data being produced from Blackboard and CEMIS we can start writing scheduled jobs to insert this data into the Total ReCal database however in the case of CEMIS we’re having a few problems.

Everyday a CSV file is created of all of the timetable events for each student. This file is (currently) 157mb in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service which is going to be the repository for all of this time space data. Currently we’re able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9s per row. This isn’t good enough. As my tweet the other day shows, we need to significantly speed this up:

So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. HmmmTue Oct 12 17:24:31 via web

At the moment we’re simply streaming data out of the CSV file line by line (using PHP’s fgets function) and then sending it to Nucleus over cURL. Our two main problems are that the CSV file is generated one student at a time and so ideally needs to be re-ordered to group events by the unique event ID in order to improve performance by reducing the number of calls to Nucleus because we can just send the event with all associated students as one. Our second problem is parsing and building large arrays results in high memory usage and currently our server only has 2gb of memory to play with. We’ve capped PHP at 1gb memory at the moment however that is going to impact on Apache performance and other processes running on the server. Ideally we don’t want to just stick more memory into the machine because that isn’t really going to encourage us to fine tune our code so that isn’t an option at the moment.

Over the next few days we’re going to explore a number of options including altering the current script to instead send batched data using asynchronous cURL requests, and also then re-writing that script in a lower level language, however the second is going to take a bit of time as one of us learns a new language. Both should hopefully result in significantly improved performance and a decrease in execution time.

I’ll write another post soon that explains our final solution.