Comments on Ruminations of a Programmer: A Sketch as the Query Model of an EventSourced System
Updated 2024-02-11T13:21:47.930+05:30 (3 comments)

Anonymous (https://www.blogger.com/profile/08505006785205027715), 2014-01-24T04:22:19.905+05:30

This already exists and is being used in production. Take a look at Storehaus (https://github.com/twitter/storehaus), and especially MergeableStore. There are implementations available for most common data stores, and when you insert a value, it uses a user-provided monoid to merge the new value with the existing value. So you can use a CM Sketch, HyperLogLog, or any monoid your heart desires.

Anonymous (https://www.blogger.com/profile/01613713587074301135), 2014-01-23T22:21:12.801+05:30

+1 on your thoughts. Lambda architecture looks quite powerful, and we are also in the process of implementing one.

And clever data structures like Count-Min Sketch, AMS Sketch, and Bloom filters (the counting version of a Bloom filter is also a sketch) provide Swiss-army-knife capabilities for certain use cases.

In fact, I see charts and graphs today being generated from an RDBMS where an in-memory succinct data structure like a sketch would be much more powerful. Sketches also provide some benefits over sampling, hence I am really exploring the possibilities of using sketches as part of a lambda architecture.

Another advantage of a sketch is the linearity property and the fact that merging sketches is associative. Think monoids and you have a great use case for distributed systems. Collect summaries from individual nodes and just mappend.

Sam BESSALAH (https://www.blogger.com/profile/13266636197721230723), 2014-01-23T02:55:18.754+05:30

Nice post. Consider an analytics system where you're computing real-time reports over fairly large data coming in as streams. You might need to compute heavy hitters for threshold detection using a CMS, discard bad IP addresses using Bloom filters, and so on. In the meantime you might want some correct, i.e. non-approximate, results. The common way of doing this these days is to have a lambda architecture with Hadoop for batched and exact views, and a real-time layer with Storm or any other stream processor. Or just use both Shark and Spark Streaming, or Summingbird for instance.

This architecture, called lambda architecture, looks a lot like an event sourcing architecture to me.

We deployed a similar system for a real-time risk system in an algo trading platform, and sketches were useful as the number of data sources was constantly growing, with data coming in at an uneven rate.

Sketches and approximate data structures in general, in the era of big data, are the equivalent of the Swiss army knife for real-time analytics.
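
The "collect summaries from individual nodes and just mappend" idea raised in this thread can be illustrated with a small example. What follows is a minimal Python sketch, not Storehaus or Algebird code: the class `CountMinSketch`, the helper `sketch_of`, the width and depth values, and the sample IP addresses are all assumptions made up for illustration. It shows a Count-Min Sketch whose merge is cell-wise addition, with the empty sketch acting as the identity element.

```python
import hashlib
from functools import reduce


class CountMinSketch:
    """Tiny Count-Min Sketch; width and depth are illustrative, not tuned."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Never underestimates; may overestimate when items collide.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # The monoid operation ("mappend"): cell-wise addition of two tables.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged


def sketch_of(events, width=256, depth=4):
    """Summarize one node's event stream (hypothetical helper)."""
    sketch = CountMinSketch(width, depth)
    for event in events:
        sketch.add(event)
    return sketch


# Hypothetical per-node event streams (made-up addresses).
node_a = ["10.0.0.1", "10.0.0.2", "10.0.0.1"]
node_b = ["10.0.0.1", "10.0.0.3"]
node_c = ["10.0.0.2", "10.0.0.1"]

# Each node summarizes locally; the collector folds the summaries together,
# starting from the empty sketch (the monoid identity).
summaries = [sketch_of(events) for events in (node_a, node_b, node_c)]
combined = reduce(CountMinSketch.merge, summaries, CountMinSketch())

# Sketching the concatenated stream on a single machine, for comparison.
single = sketch_of(node_a + node_b + node_c)
```

Because the update is linear, `combined.table` equals `single.table` cell for cell, so merging per-node summaries loses nothing relative to sketching the whole stream in one place. Associativity also means the collector can merge summaries in whatever grouping they arrive.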