Authentication at Lincoln

Planning auth.lincoln.ac.uk

One of the biggest parts of core.lincoln is the authentication API. The current applications used at Lincoln all use either our Windows domain logins (e.g. “abilbie” for my staff account or “06081032” for my student account) or our SafeCom printing ID (which is a member of staff’s employment ID or, for students, their eight-digit student number). This presents a small problem because there currently isn’t any sort of service to easily convert one to the other.

Our plan for the new Lincoln authentication service is to expose authentication logins over a standard known as OAuth, which will hopefully start to bring some consistency to the way we sign in to apps at Lincoln. We’re going to be implementing the OAuth 1.0a spec, “2 legged” OAuth and desktop PIN OAuth (see http://apiwiki.twitter.com/Authentication). Additionally we’ll have a private SAML and ADFS authentication service for apps, which we’ll talk about in the future.

OAuth authentication works with the concept of a consumer (a web/desktop application that wishes to access user data), a provider (a service which stores user data) and a user (who has an account with the provider). The conversation goes something like this:

Consumer: Hi there, I’d like to access your data on behalf of a user please. Here is my API Key and API Secret.

Service Provider validates the API Key and API Secret.

Service Provider: That’s cool. Here is a Request Token and a Request Secret. Please send the User to this URL and append the Request Token to the query string.

Consumer redirects User to the Service Provider sign-in URL with the Request Token in the query string.

User signs into the Service Provider.

User approves the Consumer to access Protected Resources on their behalf.

Service Provider redirects User back to the Consumer Callback URL.

Consumer: Hello again, I’ve had a user sent back to my Callback URL. Here is the Request Token and Request Secret you gave me earlier, can I please now have access.

Service Provider validates the Request Token and Request Secret to check the User has authorised them.

Service Provider: Sure, here is an Access Token and an Access Secret which you can use to access Protected Resources on behalf of the User.

Consumer sends the Access Token and Access Secret to the Service Provider to establish which User has just authorised access.

Obviously it’s a bit more complicated than that but that’s roughly how it works.
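To make the conversation above a bit more concrete, here is a rough Python sketch of the provider's side of it. This is a toy that follows the simplified conversation only: the class and method names (ServiceProvider, issue_request_token and so on) and the "posters-key" credentials are made up for illustration, everything lives in memory, and secrets are passed around directly, whereas real OAuth 1.0a signs each request with the secrets instead of sending them.

```python
import secrets


class ServiceProvider:
    """Toy in-memory provider mirroring the simplified conversation.

    Not a real OAuth 1.0a implementation: there are no signatures,
    nonces or timestamps, and all names here are hypothetical.
    """

    def __init__(self):
        self.consumers = {}       # api_key -> api_secret
        self.request_tokens = {}  # request token -> details
        self.access_tokens = {}   # access token -> details

    def register_consumer(self, api_key, api_secret):
        self.consumers[api_key] = api_secret

    def issue_request_token(self, api_key, api_secret):
        # Validate the API Key and API Secret, then hand out a Request Token.
        if self.consumers.get(api_key) != api_secret:
            raise PermissionError("unknown consumer")
        token, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.request_tokens[token] = {
            "secret": secret, "api_key": api_key, "user": None,
        }
        return token, secret

    def authorise(self, request_token, username):
        # Called once the User has signed in and approved the Consumer.
        self.request_tokens[request_token]["user"] = username

    def exchange(self, request_token, request_secret):
        # Swap an authorised Request Token for an Access Token.
        entry = self.request_tokens.get(request_token)
        if entry is None or entry["secret"] != request_secret or entry["user"] is None:
            raise PermissionError("request token not authorised")
        del self.request_tokens[request_token]  # request tokens are single use
        token, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.access_tokens[token] = {
            "secret": secret, "api_key": entry["api_key"], "user": entry["user"],
        }
        return token, secret

    def whoami(self, access_token, access_secret):
        # Tell the Consumer which User the Access Token belongs to.
        entry = self.access_tokens.get(access_token)
        if entry is None or entry["secret"] != access_secret:
            raise PermissionError("invalid access token")
        return entry["user"]


provider = ServiceProvider()
provider.register_consumer("posters-key", "posters-secret")
req_token, req_secret = provider.issue_request_token("posters-key", "posters-secret")
provider.authorise(req_token, "abilbie")  # the User signs in and approves
acc_token, acc_secret = provider.exchange(req_token, req_secret)
user = provider.whoami(acc_token, acc_secret)
```

The redirects to the sign-in URL and back to the Callback URL are left out; authorise() stands in for both of them.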

The OAuth service needs to store information about applications (such as their API tokens and secrets), and also the request and access tokens and secrets. Additionally we’ve decided to write into it a permissions layer so that applications can only access certain information (e.g. they can only access basic information about users (such as their name and faculty) unless they’ve been granted additional access to extended information such as a user’s home address or phone number).
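As a rough illustration of the permissions layer, the sketch below filters a user record down to the fields an application has been granted. The field names and the "basic"/"extended" permission labels are assumptions made up for this example, not a decided design.

```python
# Hypothetical field groupings; the real split is still to be decided.
BASIC_FIELDS = {"name", "faculty"}
EXTENDED_FIELDS = {"home_address", "phone_number"}


def visible_fields(user_record, granted_permissions):
    """Return only the parts of a user record the application may see."""
    allowed = set(BASIC_FIELDS)
    if "extended" in granted_permissions:
        allowed |= EXTENDED_FIELDS
    return {key: value for key, value in user_record.items() if key in allowed}


record = {
    "name": "Example Student",
    "faculty": "Computing",
    "home_address": "1 Example Street",
    "phone_number": "01234 567890",
}

basic_view = visible_fields(record, {"basic"})
extended_view = visible_fields(record, {"basic", "extended"})
```

Here basic_view only contains the name and faculty, while extended_view contains the whole record.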

Traditionally we’d just create our app using a [insert your favourite relational database here] database. However, an OAuth service has high read + write requirements: for each application there will potentially be 12,000 (the current number of staff + students) sets of request tokens and secrets and access tokens and secrets. We’ve therefore decided to take a different approach and are currently looking at using either Memcached or MongoDB to store these resources.

Memcached

Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

http://memcached.org/

MongoDB

MongoDB (from “humongous”) is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language.

MongoDB is designed for problems without heavy transactional requirements that aren’t easily solved by traditional RDBMSs, including problems which require the database to span many servers.

http://en.wikipedia.org/wiki/MongoDB

The advantage of Memcached to us is that it’s just so fast because everything is stored in memory. The disadvantage is that we need servers with a beefy amount of RAM, and if the machine should power off for any reason it loses its entire index, meaning that lookups have to keep hitting the database until everyone has used an application again and the cache has refilled. Additionally, Memcached indexes aren’t replicated across Memcached instances.
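The "lose the index and hit the database again" problem is easier to see in code. The sketch below uses a plain dictionary as a stand-in for Memcached (FakeCache is not a real Memcached client, and TokenLookup is a made-up name) and counts how often the backing database gets hit.

```python
class FakeCache:
    """Stand-in for a Memcached client: a dict that is emptied on restart."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def restart(self):
        # Simulates the machine powering off: everything in memory is gone.
        self._data.clear()


class TokenLookup:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database
        self.db_hits = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            self.db_hits += 1  # cache miss, so go to the database
            value = self.database.get(key)
            if value is not None:
                self.cache.set(key, value)
        return value


cache = FakeCache()
lookup = TokenLookup(cache, {"token:abc": "abilbie"})

lookup.get("token:abc")  # miss: goes to the database
lookup.get("token:abc")  # hit: served from the cache
hits_before_restart = lookup.db_hits

cache.restart()
lookup.get("token:abc")  # miss again, because the cache was wiped
hits_after_restart = lookup.db_hits
```

After the restart every key has to be fetched from the database once more before the cache is useful again.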

MongoDB on the other hand is a document database. This means that unlike something like MySQL, where you have to define a database schema made up of tables with set columns, with Mongo you are free to define things how you like, creating and deleting things on the fly (basically no two rows have to have the same columns if you don’t want them to). Recent builds of Mongo have shown seriously impressive benchmarks:

Storage engine request/second benchmark

Like Memcached, Mongo is seriously fast, peaking at just under 1600 requests/second in the benchmark graph above (the server hardware was a 2.2 GHz quad core AMD Opteron with 2GB of RAM – benchmark source). Mongo can be configured to shard nicely across multiple servers, so we don’t have to worry so much about “what ifs” in the event a node goes down, and it’s also got a very stable PHP PECL module. Additionally, because it is disk based we don’t have to worry about indexes being rebuilt and such.

We’re thinking at the moment that we’re going to have two document stores in Mongo. The first is the apps document, which will hold information about apps such as name, admin contact details, API key + secret and also permissions. The second document will contain request and access tokens and also a copy of the application permissions which the tokens are linked to (to remove the need to perform joins between the two documents).
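Something like the following is what we have in mind for the two kinds of document, written here as plain Python dictionaries rather than real Mongo calls. Every field name and value is an assumption for illustration only; the point is the copy of the permissions that travels with each token.

```python
import copy

# Hypothetical "apps" document.
app_doc = {
    "_id": "posters",
    "name": "Posters",
    "admin_contact": "admin@example.org",
    "api_key": "posters-key",
    "api_secret": "posters-secret",
    "permissions": ["basic"],
}


def make_token_doc(app, username, access_token, access_secret):
    """Build a token document carrying its own copy of the app's permissions."""
    return {
        "app_id": app["_id"],
        "user": username,
        "access_token": access_token,
        "access_secret": access_secret,
        # Copied rather than referenced, so checking a token never needs
        # a second lookup against the apps document.
        "permissions": copy.deepcopy(app["permissions"]),
    }


def token_allows(token_doc, permission):
    """Permission check using the token document alone."""
    return permission in token_doc["permissions"]


token_doc = make_token_doc(app_doc, "abilbie", "token-123", "secret-456")
```

The trade-off with copying is that if an application's permissions change later, the copies held in its existing token documents would need updating too.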

Nick and I did some planning the other night (see the first picture in this post) and we think we’ve got a solid internal and external API planned out, it’s really just a case of building it now and convincing the powers that be that this is something worth investing in. The plan is that all the new apps that we build will make use of the OAuth authentication process (starting with Posters), which means we can start to implement (to some extent) the beginnings of a single sign on service (I can just hear all the Lincoln staff and students crying out with joy at this!). Additionally once we’ve got this authentication service built and tested we can start to expose some of the APIs we’ve been building for consumption by outsiders (so if you’ve got any ideas for apps then please let us know and we’ll do our best to give you the APIs to build it).

So that’s what we’re up to, if you’ve got any ideas or experience with any of the above then please get in touch, this is all very new and any guidance will be welcomed.

Core dot Lincoln

So yesterday afternoon, Nick and I found ourselves talking about the data available for us to mash, mungle and meddle with here at Lincoln. Both of us have been working on our own pet projects utilising what data we can lay our paws (read: code) on, however there really isn’t much we can play with.

Therefore, armed with borrowed marker pens we headed to the “bean bag room” (MB 1009) to see if we could lay out a University data flow diagram, highlighting the core stores that we believe are needed to provide a functional base from which to build almost any possible service.

After much discussion we decided that the following seven core data stores are needed:

Core Data Stores - Location, People, Messages, Me, Groups, Events, and Auth

Location – This is a store listing campuses, buildings on a campus, rooms in a building (and other objects such as corridors, stairs, lifts, etc) and a list of links between different building objects (for use in routing). Buildings have geolocation meta (i.e. latitude and longitude) and building objects have XY meta so that routes can be made between objects (more on this in another post).
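As a taste of how the links between building objects could be used for routing, here is a small sketch that finds the route with the fewest hops using a breadth-first search. The object names and the links between them are invented for the example, and it ignores the XY meta entirely, so treat it as one possible approach rather than the plan.

```python
from collections import deque

# Hypothetical links between building objects (each link listed both ways).
LINKS = {
    "MB1009": ["MB-corridor-1"],
    "MB-corridor-1": ["MB1009", "MB-stairs", "MB-lift"],
    "MB-stairs": ["MB-corridor-1", "MB-corridor-0"],
    "MB-lift": ["MB-corridor-1", "MB-corridor-0"],
    "MB-corridor-0": ["MB-stairs", "MB-lift", "MB-entrance"],
    "MB-entrance": ["MB-corridor-0"],
}


def shortest_route(links, start, goal):
    """Breadth-first search: returns the route with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


route = shortest_route(LINKS, "MB1009", "MB-entrance")
```

With distances worked out from the XY meta, the same link list could feed a weighted search instead.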

People – This store is basically an interface to the University’s LDAP servers that provide basic personal information on staff and students (such as names, email addresses, phone numbers, etc) and extended meta such as which year group students are in or which departments staff work for.

Both Location and People are considered “static” stores (hence the different colour) because the data inside will rarely change.

Me – This store contains custom meta for individual users, e.g. which student societies they are members of, how they’d like to be contacted (e.g. SMS, email, Twitter, snail mail) and what preferences they have to communications (e.g. an interest in computing related news but not sport).

Groups – Groups are collections of users who are associated in the same way, e.g. all level 2 students, all students in the computing faculty or all students in the drama society. This data is drawn from both People and Me.
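A quick sketch of how groups might be derived from the other two stores: the sample users and field names below are invented, and the real People data would come from LDAP rather than a dictionary.

```python
# Hypothetical sample data standing in for the People and Me stores.
PEOPLE = {
    "student_a": {"level": 2, "faculty": "computing"},
    "student_b": {"level": 2, "faculty": "drama"},
    "student_c": {"level": 3, "faculty": "computing"},
}
ME = {
    "student_a": {"societies": ["drama"]},
    "student_b": {"societies": []},
    "student_c": {"societies": ["drama", "chess"]},
}


def group_by_people(people, field, value):
    """Users whose People record has the given value for a field."""
    return sorted(user for user, rec in people.items() if rec.get(field) == value)


def group_by_society(me, society):
    """Users who list the given society in their Me record."""
    return sorted(user for user, rec in me.items() if society in rec.get("societies", []))


level_two = group_by_people(PEOPLE, "level", 2)
computing = group_by_people(PEOPLE, "faculty", "computing")
drama_society = group_by_society(ME, "drama")
```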

Events – This store holds information concerning the availability of locations. These events could be academic lessons or seminars, room bookings for staff meetings, or events that the student union are putting on; however, this information is not stored at this level. Events merely stores the fact that a location is (un)available at a specific date and time.

Messages – Messages is a global notifications system for pushing content out to individuals or groups according to their individual contact preferences (defined in Me). For example, a timetabling application can push a message out to students if a lecture is cancelled and they’ll be informed of it however they elected, be it email or Twitter.
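Here is a small sketch of that dispatching idea. The users, channel names and send functions are all hypothetical (the send functions just return a string instead of contacting anyone), and falling back to email when a user has no stated preference is an assumption.

```python
# Hypothetical contact preferences, as they might be read from Me.
PREFERENCES = {"student_a": "email", "student_b": "twitter"}


def send_email(user, text):
    return f"email to {user}: {text}"


def send_tweet(user, text):
    return f"twitter to {user}: {text}"


def send_sms(user, text):
    return f"sms to {user}: {text}"


CHANNELS = {"email": send_email, "twitter": send_tweet, "sms": send_sms}


def push_message(recipients, text, preferences, default_channel="email"):
    """Deliver one message to each recipient over their preferred channel."""
    deliveries = []
    for user in recipients:
        channel = preferences.get(user, default_channel)
        deliveries.append(CHANNELS[channel](user, text))
    return deliveries


deliveries = push_message(
    ["student_a", "student_b", "student_c"],
    "Your 10am lecture is cancelled",
    PREFERENCES,
)
```

The timetabling application only calls push_message; it never needs to know which channel each student picked.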

Auth – Auth is a store containing data concerning session data for signed in users, application API keys and application authentication states (e.g. OAuth tokens).

Using these core data stores applications can then be built on top, all using the same authentication methods, and all making use of the available services meaning users get very fine control over their computing experience and applications are able to access very rich datasets.

We’re looking into exposing personal user data (from People and Me) using OAuth, which gives both users and the University control as to how personal data can be used. Obviously some internal services such as timetabling can’t be opted out of, however other applications such as those used by student societies can be granted or revoked access (much like Facebook Connect or Twitter’s OAuth implementation) as needed.

When working on the white board we tried to identify the flow of data between services and this is what we came up with:

Our white board data flow diagram

Here is the cleaned up version:

University of Lincoln's data flow between services

I’m not going to go over all of the points on this chart as some are internal projects which we’ll talk about later and others are just self explanatory, however I will cover a few:

Find a Free Room – a frequent complaint by both staff and students is that it can be difficult to find a free room at a moment’s notice. This could be for an impromptu meeting or a student society looking to find a room for a session. Staff can currently access a number of timetables via Portal, including their own timetable and room timetables; however, going through every room just to find a free one isn’t a quick task (and then, as Nick and I found yesterday when we were trying to find a room with a whiteboard, it doesn’t mean a room is free just because there isn’t a timetabled event). Therefore, a “find a free room” application would make use of the Events store to return a list of rooms that are currently available (which can be further limited to specific rooms/buildings and to certain dates/times). All staff and students would have to go through this application to make bookings in order for this to work.
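The core of a "find a free room" lookup could be as small as the sketch below: a room is free for a time slot if none of its entries in Events overlap that slot. The room names, dates and the tuple layout are invented for the example.

```python
from datetime import datetime

# Hypothetical Events data: (room, unavailable from, unavailable until).
ROOMS = ["MB1009", "MB1010", "MB1011"]
BOOKINGS = [
    ("MB1009", datetime(2010, 3, 1, 10), datetime(2010, 3, 1, 12)),
    ("MB1010", datetime(2010, 3, 1, 9), datetime(2010, 3, 1, 10)),
]


def free_rooms(rooms, bookings, start, end):
    """Rooms with no booking overlapping the half-open slot [start, end)."""
    busy = {
        room
        for room, booked_from, booked_until in bookings
        if booked_from < end and start < booked_until
    }
    return [room for room in rooms if room not in busy]


available = free_rooms(
    ROOMS, BOOKINGS, datetime(2010, 3, 1, 10), datetime(2010, 3, 1, 11)
)
```

A booking that ends at exactly 10:00 does not clash with a slot starting at 10:00, which is why MB1010 counts as free here.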

Calendar – the calendar service would provide data drawn from the Events and Me service. For students this would show their academic timetable, as well as any events for any affiliations they’re involved with (e.g. student societies or research groups) or custom events they’ve created themselves (such as assignment deadlines). For staff this could show academic timetables (for teaching staff), meetings and personal events.

Hopefully implementing some (or even better, all) of these services will result in us being able to truly implement the Strategic Plan Overview that describes the experience of the 2012 student, and hopefully Nick’s rewrite too.