My wishes for OS X 10.7

So Apple have announced a media event on 20th October dubbed ‘Back to the Mac’. To me, the lion behind the cut-out Apple logo implies that this event will preview OS X 10.7, which I’m very much looking forward to. I’ve been a Mac user for about 18 months now and generally I’m very happy: there are some really fantastic programs which I rely on day to day – Coda, Sequel Pro, MAMP, iCal and, recently, Sparrow – and with Spaces and Exposé I feel so much more productive. However, there are a few niggling issues that I hope 10.7 will solve.

Finder

Compared to Windows Explorer, the Finder really is crap. It’s slow, using the arrow keys to navigate folders doesn’t work as expected, you can’t refresh a folder with F5 as you can in Explorer, and various other little things make it counterproductive. In fact, I can’t think of a single thing that Finder does that Windows Explorer doesn’t do better. Even if 10.7 only brings a new Finder I’ll be happy, and if it doesn’t I’m first in line to buy a copy of Path Finder.

Growl

I don’t know why Apple haven’t bought/leased/licensed/copied Growl already. It is one of the most useful enhancements to the operating system there is, and I’d like to see either Growl itself or an Apple equivalent built into the OS. In terms of enhancements, I’d really like a way to push things such as URLs, pictures and textual content to my iPhone, as Android users can with various browser plugins; Growl could be the way to do this.

Sparkle

Sparkle is a software update framework for Cocoa developers and is used by almost all of the non-Apple applications I use. Like Growl, I think it should be incorporated into 10.7 so that it becomes the de facto standard for software updates that are intuitively simple for users.

Installing software

I love the approach that most software written for OS X takes, whereby installation is just a case of dragging the app into the Applications folder. Other software uses an installer similar to Windows software, with a progress bar, and some (like Adobe’s) have completely custom installers. Again, I’d like to see a de facto standard for installing software, because I’m increasingly finding myself moving what I think are apps into the Applications folder only to discover that they are actually installers. I appreciate that some software requires you to agree to a license or install other utilities into other folders, but why can’t this be done on first run?

Uninstalling software

OS X also has no method to completely uninstall software. Yes, you can drag an app into the Trash, but that doesn’t remove the preference and cache files and other junk that some apps leave behind. I currently use TrashIt, but there are numerous others, such as AppZapper, which do the same thing. Again, would it really be that difficult to build an uninstaller into the OS? Or could the process that empties the Trash be improved to automatically remove an app’s cache and preference files when it realises it’s deleting an app?

Hardware-accelerated apps

Now, I don’t claim to completely know what I’m on about here, but I think there is still a big barrier for developers wanting to make hardware-accelerated apps. Why is it that Mozilla can so easily add hardware acceleration to Firefox 4 for Windows Vista/7 and Linux users, but can’t for OS X? Does OS X not provide the right APIs yet, or is there just a lack of documentation from Apple on how to do this easily?

I think that is everything; if I think of any more I’ll update this post.

Roll on 20th October, when all will be revealed!

#totalrecal: We need faster interwebz!

Now that we’ve got live data being produced from Blackboard and CEMIS, we can start writing scheduled jobs to insert this data into the Total ReCal database. However, in the case of CEMIS we’re having a few problems.

Every day a CSV file is created of all of the timetable events for each student. This file is (currently) 157MB in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service, which is going to be the repository for all of this time-space data. Currently we’re able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9s per row. This isn’t good enough. As my tweet the other day shows, we need to significantly speed this up:

“So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. Hmmm” (Tue Oct 12 17:24:31, via web)

At the moment we’re simply streaming data out of the CSV file line by line (using PHP’s fgets function) and then sending each row to Nucleus over cURL. We have two main problems. First, the CSV file is generated one student at a time, so it ideally needs to be re-ordered to group rows by their unique event ID; that would reduce the number of calls to Nucleus, because we could send each event with all of its associated students in one go. Second, parsing and building large arrays results in high memory usage, and our server currently only has 2GB of memory to play with. We’ve capped PHP at 1GB for now, but that is going to impact on Apache performance and the other processes running on the server. We don’t want to just stick more memory into the machine, because that isn’t really going to encourage us to fine-tune our code, so that isn’t an option at the moment.
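As a rough sketch of that grouping idea, something like the following streams the file with fgetcsv (rather than fgets, to save parsing each line by hand) and sends one POST per event instead of one per row. To be clear, the column layout, the Nucleus endpoint and the payload format here are all made up for illustration:

```php
<?php
// Hypothetical sketch: stream the CSV one line at a time and group rows
// by event ID, so each event goes to Nucleus once with all of its
// students attached. The column order (event_id, student_id, start, end,
// title) and the endpoint URL are assumptions, not the real
// CEMIS/Nucleus formats.

$handle = fopen('cemis_timetable.csv', 'r');
$events = array();

while (($row = fgetcsv($handle)) !== false) {
    list($eventId, $studentId, $start, $end, $title) = $row;

    if (!isset($events[$eventId])) {
        $events[$eventId] = array(
            'start'    => $start,
            'end'      => $end,
            'title'    => $title,
            'students' => array(),
        );
    }

    $events[$eventId]['students'][] = $studentId;
}
fclose($handle);

// One request per unique event rather than one per CSV row.
foreach ($events as $eventId => $event) {
    $ch = curl_init('http://nucleus.example.com/events/' . $eventId);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($event));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
}
```

The obvious catch is that $events is exactly the sort of large array that’s eating our memory, so in practice it would need flushing to Nucleus in chunks, or the file pre-sorting by event ID so that each event arrives contiguously and can be sent as soon as it’s complete.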

Over the next few days we’re going to explore a number of options, including altering the current script to send batched data using asynchronous cURL requests, and also re-writing the script in a lower-level language. The second option is going to take a bit of time as one of us learns a new language, but both should hopefully result in significantly better performance and a much shorter execution time.
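For the batched asynchronous option, PHP’s curl_multi functions let a window of requests run concurrently instead of paying each round-trip in sequence. A minimal sketch, again with a hypothetical endpoint:

```php
<?php
// Hypothetical sketch: send a batch of payloads to Nucleus in parallel
// using curl_multi, rather than one blocking request at a time.

function send_batch(array $payloads)
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($payloads as $payload) {
        $ch = curl_init('http://nucleus.example.com/events');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Pump all the transfers until every one has finished.
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh); // wait for activity instead of spinning
        }
    } while ($running > 0);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}
```

If most of our 0.9s per row is network round-trip rather than processing (which seems likely), running even a couple of dozen requests at a time should cut the wall-clock time roughly in proportion.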

I’ll write another post soon that explains our final solution.
