#totalrecal: We need faster interwebz!

Now that we’ve got live data being produced from Blackboard and CEMIS, we can start writing scheduled jobs to insert this data into the Total ReCal database. In the case of CEMIS, however, we’re having a few problems.

Every day a CSV file is created containing all of the timetable events for each student. This file is (currently) 157 MB in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service, which is going to be the repository for all of this time-space data. Currently we’re able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9 s per row. This isn’t good enough. As my tweet the other day shows, we need to speed this up significantly:

"So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. Hmmm" (Tue Oct 12 17:24:31)
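As a quick sanity check, here is the arithmetic behind those numbers as a small Python script. Python is used only for illustration, since the import itself is PHP. At 0.9 s per row, 1.7 million rows comes to about 1.53 million seconds, or roughly 17.7 days. That is close to the estimate in the tweet. To hit the 90-minute target we would need to sustain a little over 300 rows per second.

```python
# Back-of-the-envelope throughput figures for the CEMIS timetable import,
# using only the numbers quoted in this post.
ROWS = 1_700_000          # approximate number of rows in the daily CSV
SECONDS_PER_ROW = 0.9     # current per-row insert time over the Nucleus API
TARGET_MINUTES = 90       # target duration for the whole import

current_seconds = ROWS * SECONDS_PER_ROW
current_days = current_seconds / 86_400          # 86,400 seconds in a day

target_seconds = TARGET_MINUTES * 60
required_rows_per_second = ROWS / target_seconds
required_ms_per_row = target_seconds / ROWS * 1000
speedup_needed = current_seconds / target_seconds

print(f"Current import time: {current_days:.1f} days")
print(f"Required throughput: {required_rows_per_second:.0f} rows/s "
      f"({required_ms_per_row:.2f} ms per row)")
print(f"Speed-up needed: {speedup_needed:.0f}x")
```

Running it prints about 17.7 days, 315 rows/s (3.18 ms per row) and a required speed-up of around 283x. A gap that large is unlikely to close through micro-optimising the per-row loop alone.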

At the moment we’re simply streaming data out of the CSV file line by line (using PHP’s fgets function) and then sending each row to Nucleus over cURL. We have two main problems. First, the CSV file is generated one student at a time, so ideally it needs to be re-ordered to group events by their unique event ID. That would cut the number of calls to Nucleus, because we could send each event together with all of its associated students in a single request. Second, parsing the file and building large arrays results in high memory usage, and our server currently has only 2 GB of memory to play with. We’ve capped PHP at 1 GB for now, but that is going to hurt the performance of Apache and the other processes running on the server. We don’t want to just stick more memory into the machine, because that won’t encourage us to fine-tune our code, so that isn’t an option at the moment.
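To illustrate the re-ordering idea, here is a minimal Python sketch that collapses one-row-per-student data into one record per event. The column names (`event_id`, `student_id`, `room`, `start`) are hypothetical, because the real CEMIS layout isn’t described here. The sketch also assumes that the event details are identical on every row for a given event. The file is still read one row at a time. Each event’s details are kept only once, plus a list of student IDs, rather than every full row.

```python
import csv
import io
from typing import Dict, TextIO


def group_by_event(csv_file: TextIO) -> Dict[str, dict]:
    """Collapse one-row-per-student CSV data into one record per event.

    Hypothetical layout: each row has an ``event_id`` column, a
    ``student_id`` column, and event detail columns that are the same
    for every student attending that event.
    """
    events: Dict[str, dict] = {}
    # DictReader pulls rows from the file object one at a time, so the
    # raw file is never loaded into memory in full.
    for row in csv.DictReader(csv_file):
        event_id = row.pop("event_id")
        student_id = row.pop("student_id")
        event = events.get(event_id)
        if event is None:
            # First time we see this event: store its details once.
            event = {"details": row, "students": []}
            events[event_id] = event
        event["students"].append(student_id)
    return events


# Usage with a tiny in-memory sample (hypothetical data):
sample = io.StringIO(
    "student_id,event_id,room,start\n"
    "s1,e100,R1,09:00\n"
    "s2,e100,R1,09:00\n"
    "s1,e200,R2,11:00\n"
)
events = group_by_event(sample)
print(len(events))                 # 2 events, so 2 API calls instead of 3
print(events["e100"]["students"])  # ['s1', 's2']
```

This still holds every unique event in memory at once. If that turns out to be too much within the 1 GB cap, another option is to sort the file by event ID on disk first. The sorted file could then be streamed one event group at a time, for example with `itertools.groupby`.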

Over the next few days we’re going to explore a number of options. One is altering the current script to send batched data using asynchronous cURL requests. Another is rewriting that script in a lower-level language, although that will take a bit of time because one of us has to learn a new language first. Both should hopefully cut the execution time significantly.
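To illustrate the batching idea, here is a rough Python sketch that groups events into batches and keeps several HTTP requests in flight at once. In PHP the equivalent would be built on the `curl_multi_*` functions. This sketch uses a thread pool (`concurrent.futures.ThreadPoolExecutor`) with `urllib.request` instead. The endpoint URL, the JSON payload shape and the batch size are all assumptions, since the Nucleus API details aren’t covered here. The `send` parameter lets the sender be swapped out, for example in tests.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# Hypothetical endpoint: the real Nucleus events API URL is not given here.
NUCLEUS_EVENTS_URL = "https://nucleus.example.org/events"


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Lazily yield lists of up to `size` items from `items`."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(batch: List[dict]) -> int:
    """POST one batch as a JSON array (payload format is an assumption)."""
    request = urllib.request.Request(
        NUCLEUS_EVENTS_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def send_all(events: Iterable[dict], batch_size: int = 500, workers: int = 8,
             send: Callable[[List[dict]], int] = post_batch) -> List[int]:
    """Send batches using `workers` concurrent requests.

    Returns one status code per batch, in batch order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batched(events, batch_size)))
```

With 500 events per batch, 1.7 million events would need about 3,400 requests instead of 1.7 million. Eight or so of those requests would be in flight at any one time. Note that `Executor.map` submits every batch up front. For very large inputs it would be worth limiting the number of outstanding batches to keep memory in check.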

I’ll write another post soon that explains our final solution.

Mobile CWD v0.0000001 alpha

So, after much messing around, we’ve finally got a semi-decent mobile stylesheet for CWD websites. For the moment we’ve only worked on getting mobile Safari to look right. Now that we have a platform, though, we can hack in support for other popular mobile browsers such as Opera Mobile.

Here are a few examples:

Print From My PC mobile edition
Gateway mobile edition

Only a few sites are working correctly at the moment, and there are some other bugs we need to iron out. These include a flash that appears when some sites load and a few other funky quirks. We’ll do our best to fix these and make sure that all future sites work as expected.

If anything is breaking monumentally, please let me know at abilbie@lincoln.ac.uk. Thanks.