Wednesday, 21 March 2012

QCon London – Thursday Edition

In my first week at a new job I was lucky enough to be sent to my first conference, hurrah. It was only for a single day out of the three that QCon runs for, but as that is an infinity-percent improvement on everywhere else I’ve worked, I can’t complain. I wrote up my experiences to share with the team, and they went a little something like this:

First up was a keynote titled “Simple Made Easy” from Rich Hickey, the guy behind the newish up-and-coming language Clojure, a Lisp dialect that runs on the JVM. This was a high-level talk about how we let complexity into our software. I really liked the way he defined the difference between simple and easy by reaching back to the words’ etymology: simple comes from “single fold”, whereas easy derives from “near by”. He went on to talk about how we often make choices because they are easy for us, even though they carry inherent complexity, while a simple alternative may seem hard in comparison because we don’t yet understand the techniques or tools involved. A good example that sprang to my mind is IoC. Getting to grips with something like Windsor can seem quite hard, but in doing so we remove the responsibility for managing the creation and lifetime of dependencies from a given method or class. The class just needs to focus on doing its work and leaves the IoC container to look after its dependencies; so whilst the result is simple, it starts out hard until we’ve done it enough. He also made the point that state is inherently complex because it combines values and time. Some complexity can’t be eliminated (stateless systems won’t do a whole lot), but it needs to be minimised and managed appropriately. If you look through the slides you’ll see references to using functions instead of methods, meaning functions in the functional programming style: stateless and free of side effects, taking parameters and returning a value based upon them and nothing else. I’d recommend that everyone furtling with code have a flick through the slides for this one; there are a few places where the lack of commentary is a nuisance, but most of the key points should be clear enough. http://qconlondon.com/dl/qcon-london-2012/slides/RichHickey_SimpleMadeEasy.pdf
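To make that functions-instead-of-methods point concrete, here’s a tiny sketch of my own (not from the talk, and in Java, it being the JVM’s lingua franca) contrasting a stateful method with a pure function:

// A stateful method: the result depends on when you call it as well as what
// you pass it, because it combines a value with time.
class RunningTotal {
    private int total = 0;
    int add(int amount) {
        total += amount;    // mutates state
        return total;
    }
}

// A pure function: the result depends only on its parameters and it has no
// side effects, so it can be tested and reasoned about in isolation.
class Totals {
    static int add(int total, int amount) {
        return total + amount;
    }
}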

After this session the conference broke off into its different tracks, and I bounced between the architecture and finance tracks as they had some nice, geeky-sounding sessions. “Progressive Architectures at The Royal Bank of Scotland” was a pretty decent talk about a few things RBS have been doing; it could probably have fitted into either track, but happened to be in architecture. Latency is very important in the finance world, where a second’s delay in making a trade can mean a change of price that loses you money. In managed languages (they were using Java, but .Net has the same issues) the garbage collector kicks in from time to time, pausing the system whilst it tidies up. If a pause of a second or three might cause your business to fail, using such languages is a problem, so work in vital areas is often done entirely in C/C++ to eliminate the unpredictability, at the expense of increased development time from working at a low level. The approaches RBS described for gaining some control over GC runs involved either having a huge generation 1 space while creating minimal garbage, or having a tiny generation 1 which can be collected in a known, short time (around 1ms), so that third-party or base class components can create garbage (because they will) whilst the main working set doesn’t change. Techniques for avoiding the creation of garbage include non-allocating queues and object pools.

The talk then switched to their big data requirements in risk modelling. The general gist is that simulations of events create lots of data; some events influence other events, so the amount of data reflects all of those permutations, and advances in modelling have meant growth from 2MB to 10TB of data being created. Using Hadoop as a distributed filesystem has been a good answer. It then moved on to messaging: different services may keep historical info pertaining to messages, but each has its own interpretation, so a centralised store can bring back some consistency. This led them to a message bus with a sharded in-memory NoSQL data store in front of it. The last segment was about data virtualisation. Different systems have their own data requirements and need to interface with other systems. This is typically handled either by feeding and replicating data between systems, with all the headaches that brings, or by providing a huge central database that has to fulfil every system’s requirements. Data virtualisation instead creates the appearance of a single database from a series of different databases. At a couple of points during the talk there was mention of investigating GPGPUs to enhance performance. I don’t know how much processor-intensive work goes on here, but as there are .Net libraries for such things it seems worth looking into wherever performance is critical and CPU-bound. The last point made was that investment in collaboration and sharing of knowledge to bring everyone’s skills up always seems to pay off. http://qconlondon.com/dl/qcon-london-2012/slides/BenjaminStopford_and_FarzadPezeshkpour_and_MarkAtwell_ProgressiveArchitecturesAtTheRoyalBankOfScotland.pdf
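Going back to those garbage-avoidance techniques: the slides don’t include code, but a bare-bones object pool along these lines shows the general idea (my own illustrative sketch, not RBS’s implementation). Rather than allocating a fresh object per message and leaving it for the GC, a fixed set of instances gets recycled, so the steady state creates no garbage.

import java.util.ArrayDeque;
import java.util.Deque;

// Minimal object pool: acquire() reuses a previously released instance where
// possible, so steady-state processing produces nothing for the GC to collect.
class ObjectPool<T> {
    interface Factory<T> { T create(); }

    private final Deque<T> free = new ArrayDeque<T>();
    private final Factory<T> factory;

    ObjectPool(Factory<T> factory, int initialSize) {
        this.factory = factory;
        for (int i = 0; i < initialSize; i++) {
            free.push(factory.create());
        }
    }

    T acquire() {
        // Only allocates if the pool has been drained.
        return free.isEmpty() ? factory.create() : free.pop();
    }

    void release(T instance) {
        free.push(instance);    // recycle rather than discard
    }
}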

Next up was “Architecting for Failure at the Guardian.co.uk”, another good talk, based loosely on the premise that whatever can fail, will fail, at 4am. It started with a high-level overview of their site’s architecture from a few years ago: a typical web-server/app-server/database setup. Recently this has switched to using what they refer to as microapps. The main site knows how to serve up content, but all the ancillary bits hanging off the side, like twitter feeds and related stories, are separate systems. They are connected to over HTTP; because the protocol is so well known, not only is it easy for their devs to dig in when debugging problems, but responses can also be cached by existing infrastructure. Pulling these out into their own separate chunks allowed faster development and increased innovation, partly because devs could work in their language or framework of choice. That flexibility has a maintenance cost, however, as not everyone will be familiar with everyone else’s choices, so they decided to pull everything towards the JVM stack to balance choice against consistency. The microapp separation allows coding, testing, and deployment to happen in isolation, reducing the risk of problems and keeping each app very small and focussed. It does increase the architectural complexity somewhat, but as this was splitting up a monolithic system it was an improvement overall.

He then went on to talk about some of the performance issues they encounter. Being a large, public-facing website means that speed is key, whether the problem is latency from microapps or huge spikes in traffic due to important breaking news, or stories about cute furry animals going viral. To handle this they have an emergency mode which stops caches from expiring and can “press” important pages into static content that can be served far more quickly. Monitoring was discussed next, the key questions being: what went wrong, when, what changed, and what can I turn off? Automatic switches driven by the monitoring mean that emergency mode can be triggered when required. Analysing logs to determine the causes of problems is also vital; a number of Unix command-line tools like grep were recommended, and the choice of what data to log and how to format it for parsability also needs consideration. Various reasons for failure were discussed, from code-level problems and exhausted system resources to infrastructure failure and network problems. The time between failures and the time taken to recover both affect the perception of how well your system copes. http://qconlondon.com/dl/qcon-london-2012/slides/MichaelBrunton-Spall_ArchitectingForFailureAtTheGuardianCoUk.pdf
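On that emergency mode: the talk described the behaviour rather than the code, but the stale-serving half of it might look something like this (entirely my own sketch, not the Guardian’s implementation). A switch, flippable by the monitoring, makes the cache ignore expiry and serve the last good copy rather than re-rendering pages under load:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "stop caches expiring" idea: normally a stale page gets
// re-rendered; in emergency mode the last good copy is served instead,
// trading freshness for staying up.
class PageCache {
    interface Renderer { String render(String path); }

    private static class Entry {
        final String html;
        final long expiresAt;
        Entry(String html, long expiresAt) { this.html = html; this.expiresAt = expiresAt; }
    }

    private static final long TTL_MILLIS = 60 * 1000;
    private final Map<String, Entry> cache = new ConcurrentHashMap<String, Entry>();
    private volatile boolean emergencyMode = false;    // flipped by monitoring

    void setEmergencyMode(boolean on) { emergencyMode = on; }

    String get(String path, Renderer renderer) {
        Entry entry = cache.get(path);
        boolean fresh = entry != null && System.currentTimeMillis() < entry.expiresAt;
        if (fresh || (emergencyMode && entry != null)) {
            return entry.html;                         // cached, possibly stale, copy
        }
        String html = renderer.render(path);           // the expensive bit
        cache.put(path, new Entry(html, System.currentTimeMillis() + TTL_MILLIS));
        return html;
    }
}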

Next up were a brace of disappointing sessions. On the finance track I went to “Extreme FIX Messaging for Low-Latency”, which promised to push the earlier theme of low-latency, garbage-collector-beating geekery further, but ended up feeling more like a pitch for a great product they’d created, with no apparent regard for the business knowledge in the audience. It was all about the FIX messaging protocol, but what FIX actually is was never explained; the assumption was that everyone knew all about it, so I could only glean bits here and there. Oddly, this was then followed by a simple step-by-step example of currency trading to show how money can be made when latency is tiny, which anyone who did know FIX would have found child’s play. Sadly there was very little about the techniques used to achieve this great latency. Flitting back to architecture for the “Big Data Architectures at Facebook” session didn’t help. This was a slow-paced talk which told us that Facebook have a lot of data, use Hadoop for storage, and have some tools to help get data in and out. I’m not sure that’s news to anyone. Slides are provided, but I don’t really recommend bothering with them: http://qconlondon.com/dl/qcon-london-2012/slides/KevinHoustoun_and_RupertSmith_ExtremeFIXMessagingForLowLatency.pdf http://qconlondon.com/dl/qcon-london-2012/slides/AshishThusoo_BigDataArchitecturesAtFacebook.pdf

My last session was another on the finance track, “Lock-free Algorithms for Ultimate Performance”, the blurb for which included the snippet “Make no mistake this is not a subject for beginners but if you are brave, and enjoy understanding how memory and processors really work…”, which sets the tone for the level it was pitched at. I’ll not try to highlight too many of the stand-out points, mostly because it is, as it claims, hard. For the super performance these guys are after, they apply “mechanical sympathy” to their code: writing things in such a way as to make optimal use of processor architectures. At one end of the scale is the simple stuff many of us may know, such as division being expensive whilst bit-shifting or masking tricks can do the same thing cheaply for the right sort of numbers. At the other end, they discussed how processors access memory in cache-line-sized chunks, so keeping bits of data that are updated by different threads far enough apart to sit on separate cache lines means they won’t be bounced between processor cores, avoiding cache misses and wasted cycles. The slides include links to the speakers’ blogs and sample code for more info: http://qconlondon.com/dl/qcon-london-2012/slides/MartinThompson_and_MichaelBarker_LockFreeAlgorithmsForUltimatePerformance.pdf
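To give a flavour of both ends of that scale, here are two tiny Java sketches of my own (the speakers’ real sample code is linked from the slides): replacing modulus with a bit mask for power-of-two sizes, and padding a hot field so it doesn’t share a cache line with anything else.

// 1. For power-of-two sizes, modulus can be replaced by a bit mask --
//    a trick ring buffers commonly use to pick a slot from a sequence number.
class PowerOfTwoMaths {
    static final int SIZE = 1024;           // must be a power of two
    static final int MASK = SIZE - 1;

    static int slotFor(long sequence) {
        return (int) (sequence & MASK);     // same result as sequence % SIZE, far cheaper
    }
}

// 2. Two counters written by different threads that share a 64-byte cache
//    line will ping-pong between cores ("false sharing"). Padding pushes the
//    hot field onto its own line. (Real implementations have to be careful
//    the JVM doesn't reorder or strip the unused fields.)
class PaddedCounter {
    volatile long value;
    long p1, p2, p3, p4, p5, p6, p7;        // fills out the rest of the cache line
}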

A few of the things that grabbed my attention were all about performance, so I feel that, as a responsible senior dev type, I should make the point that premature optimisation is evil. If you can measure a performance issue that is a genuine bottleneck for your requirements then it needs addressing, but good, clean, understandable, maintainable code should be the first aim, as performance optimisations tend to make code far less clear.

Tuesday, 13 March 2012

Source control? What a git!

So, I’ve just started a new job and for the first time I’m using git in a multiuser environment. That means it’s time to start making mistakes :)

Standard practice is to create branches for anything beyond trivial single-commit tweaks, and as I’ve been started off writing tests around some untested legacy-type code, lots of small commits feels like a good idea. After a few of these I realised that I’d failed to branch, so everything I’d done had gone into the “trunk” type branch. Ideally I’d like to create a new branch based on the state of the code when I started, move all of my commits onto it, and remove them from the trunk, as if I’d branched right at the start.

A quick googling led me to a handy stackoverflow post with the following response:

git stash                                       # set aside any uncommitted changes first
git checkout -b edge SHA1_before_your_commits   # create the new branch at the commit before your work
git rebase master                               # replay all your commits on top of this new branch
git checkout master
git reset --hard SHA1_before_your_commits       # rewind master to where it was before your commits
git checkout edge                               # switch to the new branch, which now has the commits
git stash apply                                 # restore the uncommitted changes

The stash commands are just there for work that hadn’t been committed, so I didn’t need them. As a Windows developer I’m used to my GUI tools, so I’m using Git Extensions, but I found it simple enough to translate the command-line directions. Lo and behold, everything seems to be working as expected. I hope…
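Incidentally, since the new branch ends up pointing at the same commits that master already had, I believe the same result can be had with fewer steps (again assuming no uncommitted work):

git branch edge                             # create the branch at the current tip, keeping the commits
git reset --hard SHA1_before_your_commits   # rewind master back to before the commits
git checkout edge                           # carry on working on the new branch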