Linking You to Safe Sites

Our URI shortening service, Linking You, has been steadily growing in usage since we launched it last year. We’ve now shortened over 2,500 URIs, which have been visited over 80,000 times. As Linking You becomes more deeply rooted in the University and is used by more and more people, we’ve decided that we ought to spend some time ensuring that our service is not being used for malicious purposes.

I’ve just implemented a new feature which will help protect you against malicious links shortened by Linking You. Every new URI shortened as of this evening is now checked against Google’s Safe Browsing service and SURBL’s URI reputation database.
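To illustrate how a SURBL-style check works (the service itself is written in PHP; this is a simplified Python sketch, and the function names are my own): SURBL is queried over DNS by looking up the domain prepended to a public list zone, where any answer means the domain is listed and NXDOMAIN means it is clean.

```python
# Illustrative sketch of a SURBL-style reputation check. Simplified: a
# production check should reduce the hostname to its registrable domain
# before querying, per SURBL's guidelines.
from urllib.parse import urlparse
import socket

SURBL_ZONE = "multi.surbl.org"  # SURBL's combined list zone

def surbl_query_name(uri: str, zone: str = SURBL_ZONE) -> str:
    """Build the DNS name to query for a URI's host."""
    host = urlparse(uri).hostname or ""
    return f"{host}.{zone}"

def is_listed(uri: str) -> bool:
    """True if the list returns an address for the host (i.e. it is listed)."""
    try:
        socket.gethostbyname(surbl_query_name(uri))
        return True   # any A record in the answer means "listed"
    except socket.gaierror:
        return False  # NXDOMAIN: not on the list
```

Google’s Safe Browsing service works differently (an HTTP lookup API rather than DNS), but the shape of the check is the same: build a query from the URI, ask the service, and flag the short link if either source considers it dangerous.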

In the interest of keeping the Internets working, URIs that are considered dangerous by either of the above services will still be shortened. However, when you visit the short link you’ll be presented with a warning message instead of being forwarded on.

In the coming weeks we’re going to require all users of the API to use an API key in order to prevent misuse.

#totalrecal: We need faster interwebz!

Now that we’ve got live data being produced from Blackboard and CEMIS, we can start writing scheduled jobs to insert this data into the Total ReCal database. However, in the case of CEMIS we’re having a few problems.

Every day a CSV file is created of all of the timetable events for each student. This file is currently 157 MB in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service, which is going to be the repository for all of this time-space data. Currently we’re able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9 s per row. This isn’t good enough. As my tweet the other day shows, we need to significantly speed this up:

“So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. Hmmm” — Tue Oct 12 17:24:31 via web
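The arithmetic behind those numbers is worth spelling out, because it sets the performance target for everything that follows:

```python
# Back-of-envelope check of the figures in the tweet.
rows = 1_700_000
current_rate = 0.9        # seconds per row today
target_secs = 90 * 60     # the 90-minute target

current_days = rows * current_rate / 86_400   # seconds in a day
required_rate = rows / target_secs            # rows per second needed

print(round(current_days, 1))  # ≈ 17.7 days — roughly the "19 days" quoted
print(round(required_rate))    # ≈ 315 rows/s, versus ~1.1 rows/s today
```

In other words, hitting the 90-minute target means going roughly 280× faster than the current one-row-at-a-time approach.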

At the moment we’re simply streaming data out of the CSV file line by line (using PHP’s fgets function) and then sending it to Nucleus over cURL. We have two main problems. First, the CSV file is generated one student at a time, so it ideally needs to be re-ordered to group events by their unique event ID; that would reduce the number of calls to Nucleus, because we could send each event with all of its associated students in a single request. Second, parsing and building large arrays results in high memory usage, and our server currently only has 2 GB of memory to play with. We’ve capped PHP at 1 GB for now, but that is going to impact Apache’s performance and other processes running on the server. Ideally we don’t want to just stick more memory into the machine, because that isn’t really going to encourage us to fine-tune our code, so that isn’t an option at the moment.
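The re-ordering step can be sketched like this (in Python for brevity; our actual script is PHP, and the column names here are assumptions for illustration, not the real CEMIS format):

```python
# Sketch: stream the student-ordered CSV and group rows by event ID so
# each event, with all of its students, can be sent to Nucleus in one
# call instead of one call per row.
import csv
from typing import Dict, TextIO

def group_by_event(handle: TextIO) -> Dict[str, dict]:
    """Collapse per-student rows into one record per event."""
    events: Dict[str, dict] = {}
    for row in csv.DictReader(handle):
        event = events.setdefault(row["event_id"], {
            "start": row["start"],
            "end": row["end"],
            "students": [],
        })
        event["students"].append(row["student_id"])
    return events
```

This trades memory for fewer API calls, which is exactly the tension described above: with 1.7 million rows the dictionary itself gets large, so an external sort on the event-ID column first (letting us flush each event as soon as its rows are exhausted) would keep memory flat.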

Over the next few days we’re going to explore a number of options, including altering the current script to send batched data using asynchronous cURL requests, and re-writing the script in a lower-level language. The second option is going to take a bit of time while one of us learns a new language, but both should hopefully result in significantly improved performance and a decrease in execution time.
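The batching idea, in sketch form (Python standing in for PHP’s curl_multi; the `post` function here is a stub for illustration, where the real one would POST a batch of events to Nucleus):

```python
# Sketch: send N events per request, with several requests in flight at
# once, instead of one synchronous request per row.
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List

def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` events."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def send_all(events: Iterable[dict], post: Callable[[List[dict]], int],
             batch_size: int = 100, workers: int = 8) -> int:
    """POST events in batches, `workers` requests concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(post, batches(events, batch_size)))
    return sum(results)  # e.g. total rows acknowledged by the API
```

Even modest numbers help here: 100 events per request turns 1.7 million calls into 17,000, and eight concurrent requests divides the wall-clock wait for those again.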

I’ll write another post soon that explains our final solution.

Mobile CWD v0.0000001 alpha

So after much messing around we’ve finally got a semi-decent mobile stylesheet for CWD web sites. For the moment we’ve only worked on getting mobile Safari to look right, but now we’ve got a platform on which we can hack in support for other popular mobile browsers such as Opera Mobile.

Here are a few examples:

Print From My PC mobile edition
Gateway mobile edition

Only a few sites are working correctly at the moment, and there are a few other bugs that we need to iron out, such as a flash that appears when some sites are loaded and some other funky quirks. We’ll do our best to fix these and ensure that future sites all work as expected from now on.

If anything is breaking monumentally, please let me know. Thanks!