Ruminations of a Programmer: oscon08

Showing posts with label oscon08. Show all posts

Monday, August 04, 2008

Erlang as middleware

Delicious is also using Erlang. Well, that's yet another addition to the Erlangy list of Facebook, SimpleDB, CouchDB, Twitter and many more. All these applications/services rely on the intrinsic scalability of Erlang as a platform. RabbitMQ provides an implementation of AMQP based on Erlang, ejabberd, an XMPP implementation is also Erlang based. EngineYard is also betting on Erlang for Vertebrae, its platform for Cloud Computing. It's like Erlang is carving out it's own niche as the dominant choice of service based backends.

I can make my application scale using distributed hashmap technologies of memcached or in-process JVM clustering techniques of Terracotta or a host of other techniques that treat distribution and scalability as a concern separate from the core application design. But with Erlang/OTP, I start with shared nothing concurrency oriented process design, which can naturally be distributed across the cores of your deployment server. What is a module in the codebase can be made to map to a process in the runtime, instances of which can be distributed transparently to the nodes of your cluster.

Why Erlang ?

Erlang is naturally concurrent, with ultralightweight processes based on green threads that can be spawned in millions on a cluster of commodity hardware. As a functional language, Erlang applications are designed as shared nothing architectures that interact with each other through asynchronous message passing primitives - as if the whole code can be mathematically analyzed. This is unlike an imperative language runtime that offers shared state concurrency through threads and mutexes. Erlang runtime offers dynamic hotswapping of code, you can change code on-the-fly, converting your application to a non stop system. Finally Erlang processes can be organized into supervisor hierarchies that manage the lifetimes of their child processes and automatically restart in case of exceptions and failures. And almost all of these come out of the box through the goodness of platforms like OTP.

But do we find enough Erlang programmers ? Well, the syntax .. umm ..

Isn't your OS mainstream ?

Then why don't you find chores of developers programming with the kernel APIs that your OS publishes ? The OS offers the service which developers use everyday when they open up a host of windows, manage their filesystems, send out an IM or open up the browser to get the latest quotes on their tickers. And all these, being completely oblivious of how the kernel handles scheduling of native threads to serve up your long running requests.

Erlang is becoming mainstream in the same context.

I do not need to know a bit of Erlang to design my trading service that can process millions of messages from my AMQP endpoints in real time. In fact while prototyping for the next version of a trading back-office system, I cooked up all my services using Scala actors, that happily could use RabbitMQ's Erlang based queue and exchange implementation through well-published Java APIs of the client.

I can still architect scalable websites that need not poll Flickr 3 million times a day to fetch 6000 updates, without an iota of Erlang awareness. The technology is called XMPP application server, which scaffolds all the Erlang machinery underneath while exposing easy-to-use client interfaces in your favorite programming language ..

Erlang is becoming mainstream as a middleware service provider. And, one could feel the buzz in OSCON 2008.

Monday, July 28, 2008

The Programmable Web is getting more complex

OSCON is a conference which has various amorphous tracks and most of those relate to the "trends" of tomorrow. Some of the people whom I met were complaining about the lack of Java's presence in the conference. Well, as Tim Bray says, cool kids are not using Java these days. Personally I was a bit disappointed with the lack of presence of the Scala community in OSCON. In fact, except in one of Tim Bray's keynotes, Scala, I guess, never figured in the radar of this year's conference. There were tutorials and BOFs on actor models, where also Scala got only a passing reference. But I digress ..

Web Architecture is getting more complicated!

There is no doubt about the fact that we are looking at more and more options to structure services (and data) delivered over the Web. While we are all huge fans of RESTful APIs delivering the goods at optimal payloads for many of today's Web 2.0 sites, things may start changing with the scale of data that we are witnessing today as a result of frivolous cross pollination of data streams. Inundating message streams, flooding of tweets, flickr uploads and continuous exchange of other social objects over the Web have started challenging the inherent polling model for some of the usecases that REST based architecture employs.

In the session titled Beyond REST? Building Data Services with XMPP PubSub, Rabble and Kellan made a great presentation of how FriendFeed polled Flickr 2.9 million times on one day to check on updates for 45 thousand users, of whom only 6.7 thousand were logged in at any one time. With Flickr publishing feeds as the only way to advertise presence, Friendfeed had no other alternative but to "poll". Kellan called this "Polling sucks!". The alternative is asynchronous message passing with shared nothing architectures and non-blocking event loop based processing. Then Kellan and Rabble goes on to present how the pub-sub model over XMPP/Jabber can be hijacked to implement Web scale message passing and thereby getting rid of the polling nightmare. When we are talking XMPP, we are talking XML payloads, and this makes perfect sense when you are operating with undefined (or loosely defined) endpoints, high latency and low bandwidth systems. Again a good mix for all the social mashups out there ..

Take a hard look at your usecase ..

While polling sucks for the FriendFeed-Flickr usecase that Rabble and Kellan discussed, polling has its legitimate and rational usecases too. RSS and Atom feeds are ubiquitous today, while very few clients support XMPP. After all, when I have a huge number of clients interested in my feed, it makes perfect sense to go for the polling model for all my clients. Google Reader has been doing this day in and day out.

In another related presentation Open Source XMPP for Cloud Services, Matt Tucker of Jive Software, also discussed the virtues of XMPP based message passing paradigm for handling complex cloud services. While comparing alternatives like SOAP, he talked about performance optimizations that XMPP based services offer through long lived persistent connections as opposed to overhead of establishing encryption and repeated authentication in polling based systems. Well, I am not sure, what impact will there be on the system through 1 million persistent connections - I would love to get some ground realities from people who are actually using it. Matt demonstrated how easily an XMPP based system can be implemented using open source OpenFire XMPP server and a host of client APIs available for a multitude of languages.

But that's really one part of the story, that, at best, adds another dimension to our thought process when designing services for the Web. You need to define what scalability and performance optimizations mean for you. What Google can achieve through Bigtable and MapReduce makes no sense for Twitter. DareObasanjo has a great piece on this subject of ubiquitous scalability.

Whether it's 100% RESTful or not is purely driven by requirements ..

Architecture changes when you have defined end points, low latency and high bandwidth networks to play with. You can afford to enforce constraints over your end points and have binary payloads like Protocol Buffer or Thrift that need a specific subset of programming languages and runtimes. Google and Facebook have been using them with great success. When you need to process lots of data and perform data intensive computations, it makes sense to use the binary wire format instead of XML.

So options start getting multiplied as to whether you need to use polling based RESTful Web services, or message oriented XMPP services or the more classical distributed computing model of using RPC through binary serialization. OSCON 2008 had quite a few sessions that discussed many of these cloud computing infrastructure components - the key message was, of course, one size doesn't fit all. And the Web architecture is getting complicated by the day.

Wednesday, July 23, 2008

OSCON 2008 (Day 1 and 2) - Great Tutorial Sessions

It has been a typical bootstrapping for the first 2 days at OSCON 2008. The tutorial sessions have started as per schedule on the usual low-key note, the irc channels yet to set up the buzz tone and all the pubs yet to get heated up with the latest of the conference musings. For the early comers, this is the time to start getting soaked into the spirit of the conference and build up the setting for the next 3 days of fun, frolic and frenzy.

I had registered for 4 tutorial sessions.

OSCON day 1 for me began with the early registration, the usual conference style breakfast and then a jump into the "Python in 3 hours" tutorial. This session was perfect in setting up the tone for Python non-programmers. The initial part of the tutorial was quite slow paced and focused much more on the basics of programming than what the time demanded. As a result, the tutorial faced lots of time pressure towards the end - hence we missed out on the more Pythonesque stuff that I had expected to see as part of this tutorial. Otherwise, it was managed well, Steve Holden is a nice speaker and had all things under control for tutoring on the basics of a language. It's not always that easy to control the entropy of an audience so varied in profile and skill level while teaching on the basics of a programming language.

The second pick for me was "Memcached and MySQL: Everything You Need to Know", which ended up being a great one for me. Both the speakers had been working on memcached codebase for quite some time, I guess. Hence lots of implementation issues were discussed and sorted out with expected seamlessness. In fact after the tutorial I heard some people complaining about the too-much-implementation-level-detail orientation of the session. I liked it though and it gave pointers to many of the memcached internals viz. memory allocation (slab allocators), consistent hashing (the secret sauce of memcached) and lots of tuning and stats tips.

OSCON day 2 began with the tutorial on "Ubiquitous Multithreading for a Multicore World", where the Intel guys took us through the pains of managing shared state concurrency. They introduced Threading Building Blocks (TBB) that offer parallelism within the C++ programming model. TBB offers a higher level task based parallelism that abstracts platform dependencies and threading implementations away from the programmer. TBB is offered as a library with GPLv2 licensing and comes bundled with lots of C++ compilers today. Being a higher level of abstraction, TBB will make concurrent programming easier in C++ than what it is today. But given the clumsiness and verbosity of C++ syntax, and the emergence of newer languages and concurrency paradigms, I have strong doubts on its acceptability to mainstream development community. One other interesting stuff that the Intel guys discussed was the lambda expression implementation in Intel C++ compiler, also part of the official C++ specification coming soon to an implementation near you. A lambda expression in C++ will have to be implemented as an object, which can contain references to local and global variables of the enclosing context. C++, not being a garbage collected language, needs to implement a copy-semantics for all such context elements. It is not yet clear to me how non-copyable objects will be handled in such cases .. need to look up more for implementation details. Overall the tutorial session was informative and it was good to see C++ getting a facelift through some positive movement forward.

The second tutorial that I attended was "An Introduction to Actors for Performance, Scalability and Resilience", conducted by Steven Parkes. The session began with the discussion of the actor model as implemented in Erlang. Steven was thorough in his exercise of designing real life problems modeled with actor based concurrency, implemented first using Erlang and then in Ruby using Dramatis actor library. A straight mapping of actors to native threads does not scale - Dramatis implementation also does not usually dedicate a thread to an actor. The receive is implicit, much like Erlang/OTP's gen_server, though it was not clear from the discussion regarding the exact payload of the actor model implementation in dynamic OO languages like Ruby and Python. Steven also mentioned that there is nothing implicitly functional about actors and that he chose to implement the imperative way with a nice OO scaffolding around the actor interface. Scala also provides a similar interface of objects with event based implementation - it will be a very interesting exercise to benchmark how much all the actor implementations scale under heavy concurrent throttling. The Erlang implementation still stands out tall and large as far as the seamless distribution and fault tolerance of the actor model is concerned. None of the other implementations can be made to distribute as easily as the Erlang model.

Friday, July 18, 2008

My Picks for OSCON 2008

Yay! I will be in OSCON 2008. Along with the growth of the open source ecosystem, OSCON has also grown up to the diversities of multitude of technologies. And this year, they are celebrating the tenth anniversary of promoting open source.

I will be in Portland from 21st till 25th of July as a part of this celebration. I will be attending 4 tutorials and a number of regular sessions. Each of them will be great inductions to some of the newer technologies that I have been planning for quite some time to get into. However, I will be focusing on 2 primary areas as my main takeaways from this conference ..

Scaling out with large datasets

Indoctrinated in the basics of relational databases, normalization and joins, I find that almost all of the Web 2.0 scalability stories are taking alternate routes. Have a look at Google BigTable, Amazon SimpleDB and Facebook Cassandra - all of them have got around the scalability problems of normalized relational databases through denormalization and usage based data models, replication among multiple nodes and delayed eventual consistency. Processing large tables with SQL based queries are being replaced by map-reduce jobs that operate on tuple models and enforce data integrity through code rather than database constraints.

OSCON 2008 has a number of sessions that deal with this topic of scalability in the cloud ..

Cloud Computing with bigdata

CouchDB from 10,000 ft

Processing Large Data with Hadoop and EC2

HDFS Under the Hood

Cloud Computing with Persistent Data: Pushing the Envelope of Amazon Web Services

Scale into the Cloud with Open Source

Asynchronous Messaging as the enabler of enterprise architectures

Today's common wisdom is that asynchronous messaging enables loose coupling in distributed architectures. And this has led to the recent popularity of the actor model of programming which relies on message-send and receive as the basic constructs of the programming model. Erlang has been encouraging this model for years, though only recently has started getting the mindshare and marketshare of the mainstream. Scala also offers great support for actor model of programming that makes your application scale so easily.

Along with the growing popularity of message queueing frameworks, we are looking at recent implementations of the Extensible Messaging and Presence Protocol (XMPP) and the Advanced Message Queueing Protocol (AMQP) that offer multiple messaging solutions to connect across distributed systems.

OSCON 2008 has multiple sessions and BOFs that discuss this paradigm at length ..

Beyond REST? Building Data Services with XMPP PubSub

Beautiful Concurrency with Erlang

XMPP/Open Source Components for Cloud Services

And this is just the beginning of the fun. There will be a number of events, parties, fun and frolic centered around open source innovation. I am really excited to be a party to it.

See you all in Portland !