Update

This has been a big month for Total ReCal. We’ve now perfected our event importers for Blackboard assignments and academic timetables, and we’ve started working on the main web application (screenshots too). We’ve also launched a beta registration page for interested staff and students to sign up for early access. Finally, the Talis Keystone service that the University recently purchased will be in place very soon, meaning we can also start importing book return dates for staff and students.

After numerous code re-writes we’ve got a rock-solid API for adding, updating and deleting events in our Nucleus data store. Our import code has also had many updates to support logging of changes to events, which will be invaluable in keeping students up to date. Once the main Total ReCal application has been developed we’re going to sit down and work out how best to make use of these logs.
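To give a flavour of what our import scripts are doing, here’s a rough sketch of an insert over that API. The endpoint, field names and authentication shown are illustrative assumptions rather than the real Nucleus interface:

    <?php
    // Illustrative only: the URL, payload fields and auth header are
    // placeholders, not the actual Nucleus events API.
    function nucleus_event_request($method, $path, array $event, $token)
    {
        $ch = curl_init('https://nucleus.example.com' . $path);
        curl_setopt_array($ch, array(
            CURLOPT_CUSTOMREQUEST  => $method, // POST = add, PUT = update, DELETE = delete
            CURLOPT_POSTFIELDS     => http_build_query($event),
            CURLOPT_HTTPHEADER     => array('Authorization: Bearer ' . $token),
            CURLOPT_RETURNTRANSFER => true,
        ));
        $response = curl_exec($ch);
        curl_close($ch);
        return json_decode($response, true);
    }

    // For example, adding a timetable event:
    nucleus_event_request('POST', '/events', array(
        'title' => 'Example lecture',
        'start' => '2010-10-18T09:00:00',
        'end'   => '2010-10-18T11:00:00',
    ), $token);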

When a lecturer calls in sick the central timetabling department isn’t informed (unless it will affect lectures for a long period of time), so based on our current nightly timetable imports we won’t find out about changes like that. We’re therefore going to develop a tool for faculty administration staff to make changes to events themselves, as they have a much better picture of the day-to-day situation. This means we can inform students of changes on the same day, as soon as someone makes them.

In terms of the front end, I’ve forked our common web design, called it ‘common web design x’, made it fluid so that it adapts to browser size, rebuilt it on completely semantic HTML5, and taken the concept of progressive enhancement to new levels. It will also make use of our new OAuth 2.0-based single sign-on service that I’ve written, and it will automatically adapt to mobile layouts.

Moving Forward

Over the past week we’ve worked tirelessly to perfect our timetable import code, and we’ve now got a system that is working with real data. A select few students have been given access to iCal feeds for both their timetables and their Blackboard assignments, and the Library is hoping to have its Talis Keystone system in place very soon, meaning we can start producing feeds of people’s book return dates.
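For anyone curious what those feeds actually are, an iCal feed is just a plain-text iCalendar document. A minimal sketch of building one in PHP might look like this; the property names follow the iCalendar format, but everything else is made up for illustration, and a real feed also needs escaping, time zones and line folding handled properly:

    <?php
    // Minimal illustrative iCalendar feed builder.
    function build_ical(array $events)
    {
        $lines = array('BEGIN:VCALENDAR', 'VERSION:2.0', 'PRODID:-//Total ReCal//EN');
        foreach ($events as $event) {
            $lines[] = 'BEGIN:VEVENT';
            $lines[] = 'UID:' . $event['id'] . '@totalrecal';
            $lines[] = 'SUMMARY:' . $event['title'];
            $lines[] = 'DTSTART:' . gmdate('Ymd\THis\Z', $event['start']);
            $lines[] = 'DTEND:' . gmdate('Ymd\THis\Z', $event['end']);
            $lines[] = 'END:VEVENT';
        }
        $lines[] = 'END:VCALENDAR';
        return implode("\r\n", $lines);
    }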

Our next big job is to move away from bulk imports of data and instead start developing code that will validate and verify events. This could mean looking for changes to the times of events, or verifying that the right students are seeing the right events (when a student changes course, for example). With these changes logged we can then tackle one of the top requests students have of the University: to be better informed of changes to their timetables.
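A rough sketch of the kind of check we have in mind: compare an incoming event against the copy we already hold and log any differences. The field names and logging helper below are placeholders, not our final schema:

    <?php
    // Compare an imported event with the stored copy and record what changed.
    function diff_event(array $stored, array $imported)
    {
        $changes = array();
        foreach (array('start', 'end', 'location', 'title') as $field) {
            if ($stored[$field] !== $imported[$field]) {
                $changes[$field] = array('from' => $stored[$field], 'to' => $imported[$field]);
            }
        }
        return $changes; // an empty array means nothing to log
    }

    $changes = diff_event($stored, $imported);
    if ($changes) {
        log_event_change($stored['id'], $changes, time()); // hypothetical logging helper
    }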

The main timetables are produced by the Registry department; however, they aren’t informed if a lecturer is ill on a particular day, and in any case timetables currently aren’t updated until the following morning. We’re therefore planning to develop a tool for faculty offices so that they can make individual amendments to timetables when rooms need changing or lectures have been cancelled, and students can be informed sooner.

The logging of these changes will be important for Blackboard too. Certain schools and faculties like the idea of personalised assignment calendars, but their internal policies don’t allow staff to set deadlines inside Blackboard, because deadlines may be changed by lecturers without senior staff being informed. This is why the Computing School, for example, releases a huge Excel spreadsheet of deadlines: it means only two people have access to change them. We don’t want to end up building each department its own tool for managing assignment deadlines; we’d prefer everyone used Blackboard. With the ability to log changes to events, what we could do instead is delay the update of a deadline in the student calendars for 24 or 48 hours, giving senior staff a period in which to change it back to the original date or leave it (i.e. approve the change).
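The mechanism would be fairly simple: hold a deadline change in a pending state until the approval window has passed, and drop it if someone reverts it in the meantime. The window length and helper functions below are illustrative only:

    <?php
    // Hold Blackboard deadline changes for an approval window before they
    // reach student calendars. The helper functions are hypothetical.
    define('APPROVAL_WINDOW', 48 * 3600); // 24 or 48 hours, to be decided

    function apply_pending_deadline_changes(array $pending_changes)
    {
        foreach ($pending_changes as $change) {
            if ($change['reverted']) {
                continue; // senior staff put the original date back, so discard it
            }
            if (time() - $change['logged_at'] >= APPROVAL_WINDOW) {
                update_calendar_deadline($change['event_id'], $change['new_deadline']);
            }
        }
    }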

Our plan over the next few weeks is to perfect our API for querying events, give more students access to their iCal feeds and start developing the front end calendar application.

We need faster interwebz!

Now that we’ve got live data being produced from Blackboard and CEMIS we can start writing scheduled jobs to insert this data into the Total ReCal database; however, in the case of CEMIS we’re having a few problems.

Every day a CSV file is created of all of the timetable events for each student. This file is (currently) 157MB in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service, which is going to be the repository for all of this time-space data. Currently we’re able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9 seconds per row. This isn’t good enough. As my tweet the other day shows, we need to significantly speed this up:

“So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. Hmm” (Tue Oct 12 17:24:31, via web)
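Stripped right down, the import script is currently doing something like this, with one blocking HTTP request per row (the file path, column positions and helper function are made up for illustration):

    <?php
    // Simplified view of the current approach: one request to Nucleus per CSV row.
    // At roughly 0.9 seconds a row, 1.7 million rows takes about 19 days.
    $handle = fopen('/path/to/cemis_timetable.csv', 'r'); // illustrative path
    while (($line = fgets($handle)) !== false) {
        $row = str_getcsv($line);
        $event = array(
            'event_id'   => $row[0], // column positions are placeholders
            'student_id' => $row[1],
            'start'      => $row[2],
            'end'        => $row[3],
        );
        send_to_nucleus($event); // hypothetical wrapper around a blocking cURL call
    }
    fclose($handle);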

As that sketch suggests, at the moment we’re simply streaming data out of the CSV file line by line (using PHP’s fgets function) and then sending it to Nucleus over cURL. Our first problem is that the CSV file is generated one student at a time, so it ideally needs to be re-ordered to group events by their unique event ID; that would reduce the number of calls to Nucleus, because we could send each event with all of its associated students as one request.

Our second problem is that parsing and building large arrays results in high memory usage, and currently our server only has 2GB of memory to play with. We’ve capped PHP at 1GB of memory for now, but that is going to impact Apache’s performance and the other processes running on the server. Ideally we don’t want to just stick more memory into the machine, because that isn’t really going to encourage us to fine-tune our code, so it isn’t an option at the moment.
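A first pass at that re-ordering might look something like the sketch below, grouping rows by event ID before anything is sent (column positions are again placeholders, and building the whole grouping in one array is exactly where the memory problem bites, so in practice it would need doing in chunks):

    <?php
    // Group CSV rows by event ID so each event is sent to Nucleus once with
    // all of its students, rather than once per student.
    $events = array();
    $handle = fopen('/path/to/cemis_timetable.csv', 'r');
    while (($line = fgets($handle)) !== false) {
        $row = str_getcsv($line);
        $event_id = $row[0];
        if (!isset($events[$event_id])) {
            $events[$event_id] = array(
                'event_id' => $event_id,
                'start'    => $row[2],
                'end'      => $row[3],
                'students' => array(),
            );
        }
        $events[$event_id]['students'][] = $row[1];
    }
    fclose($handle);

    foreach ($events as $event) {
        send_to_nucleus($event); // one call per event, not per student
    }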

Over the next few days we’re going to explore a number of options, including altering the current script to send batched data using asynchronous cURL requests, and then re-writing that script in a lower-level language; the second option is going to take a bit of time, though, as one of us learns a new language. Both should hopefully result in significantly improved performance and a decrease in execution time.
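The first of those options would look roughly like this, using PHP’s curl_multi functions to fire a batch of requests concurrently rather than waiting on each one in turn (the endpoint and payloads are placeholders):

    <?php
    // Send a batch of event inserts to Nucleus concurrently using curl_multi.
    function send_batch(array $payloads)
    {
        $multi   = curl_multi_init();
        $handles = array();
        foreach ($payloads as $i => $payload) {
            $ch = curl_init('https://nucleus.example.com/events'); // placeholder URL
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($payload));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($multi, $ch);
            $handles[$i] = $ch;
        }
        // Run all of the requests until every one has finished.
        do {
            curl_multi_exec($multi, $running);
            curl_multi_select($multi);
        } while ($running > 0);
        foreach ($handles as $ch) {
            curl_multi_remove_handle($multi, $ch);
            curl_close($ch);
        }
        curl_multi_close($multi);
    }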

I’ll write another post soon that explains our final solution.