DataStax Day London 2016 – taking Blockchain and Graph forward

As part of our partnership with DataStax, we were very kindly invited to exhibit at the DataStax Day conference in London this week. Many of the sessions centred on one of the major new features of DataStax Enterprise (DSE) 5.0: its graph capability, DSE Graph. This allows us to store data at Cassandra scale, but also to model and query the data as a graph of nodes (vertices) and connections (edges). With the focus of the day so squarely on Graph, I decided to reprise my earlier Ethereum-graph prototype by marrying two of our partners’ platforms: DataStax Enterprise and Chain.com.

A while ago I wrote a small trading application to simulate six agents trading an asset stored at Chain.com. The agents trade with each other, then submit their transactions to the blockchain using the following code:

1.png

The above API is very flexible. The transaction request allows us to specify any number of inputs and any number of outputs. These are effectively changes to (fungible) asset balances within chain.com’s UTXO model. This enables us to model any kind of exchange of assets without being prescriptive about how transactions should be structured.
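To make the inputs-and-outputs idea concrete, here is a minimal Python sketch of a transaction shaped this way. It is not Chain's API: the `Entry` and `Transaction` classes, the account names and the asset names are all hypothetical, and the sketch treats inputs and outputs simply as debits and credits on fungible balances rather than modelling real unspent outputs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Entry:
    """One side of a transfer: an amount of an asset tied to an account."""
    account: str
    asset: str
    amount: int


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction shape: any number of inputs and outputs."""
    inputs: Tuple[Entry, ...]
    outputs: Tuple[Entry, ...]

    def is_balanced(self) -> bool:
        """True when, per asset, the total taken in equals the total paid out."""
        totals: Dict[str, int] = defaultdict(int)
        for entry in self.inputs:
            totals[entry.asset] += entry.amount
        for entry in self.outputs:
            totals[entry.asset] -= entry.amount
        return all(total == 0 for total in totals.values())

    def balance_changes(self) -> Dict[Tuple[str, str], int]:
        """Net change per (account, asset): inputs debit, outputs credit."""
        changes: Dict[Tuple[str, str], int] = defaultdict(int)
        for entry in self.inputs:
            changes[(entry.account, entry.asset)] -= entry.amount
        for entry in self.outputs:
            changes[(entry.account, entry.asset)] += entry.amount
        return dict(changes)


# alice sells 10 units of GOLD to bob for 500 GBP, as a single transaction.
tx = Transaction(
    inputs=(Entry("alice", "GOLD", 10), Entry("bob", "GBP", 500)),
    outputs=(Entry("bob", "GOLD", 10), Entry("alice", "GBP", 500)),
)
```

Because nothing in this shape says "buyer" or "seller", the same structure could describe a two-party sale, a three-way swap or a simple issuance, which is the flexibility described above.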

I then use some of Chain’s other APIs to watch for transactions as they are committed to the blockchain, which simulates commit events (native events are, I am assured, on Chain’s roadmap). On receiving each commit event from Chain, I execute the following Gremlin code:

2.png

Contrast this with the Cypher code I wrote a while ago to persist to Neo4j:

3.png

For me, Neo4j’s Cypher feels a little more declarative and SQL-like than DSE’s Gremlin syntax, which will be more familiar to someone working in Java or Groovy. The Gremlin DSL gives you the feeling that you are manipulating a graph of nodes, which is important for retrieval (especially when retrieving vertices of differing types).

Originally, I had modelled the trades as edges between account vertices. However, I soon encountered a fundamental design principle of DSE Graph: it is not possible to query based on edge properties, which meant I needed to pull the trade up into a vertex. (It also proves that I should take my own advice and design data models based on the required access patterns!) This has a number of interesting (and helpful) side effects:


  • Manipulating graph data is somewhat decoupled from the underlying persistence model. This means that should we need to add more complex trades using the same domain model (e.g. a multi-party asset swap/butterfly, or a block trade), we can add them to and retrieve them from the graph without any invasive changes to the schema.
  • It enables walking the tree of buyers and sellers involved in trades (which could be used to identify or investigate suspicious trading activity such as price collusion).
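Both side effects can be illustrated with a small in-memory model. This is a sketch in plain Python, not DSE Graph or Gremlin: the `TradeGraph` class, its method names, the role labels and the sample trades are all hypothetical, chosen only to show a trade held as a vertex with its own properties and any number of attached accounts.

```python
from collections import defaultdict, deque


class TradeGraph:
    """Toy graph in which both accounts and trades are vertices."""

    def __init__(self):
        self.trades = {}                   # trade id -> trade properties
        self.parties = defaultdict(set)    # trade id -> {(role, account)}
        self.trades_of = defaultdict(set)  # account -> {trade id}

    def add_trade(self, trade_id, parties, **properties):
        """Add a trade vertex and link it to any number of accounts."""
        self.trades[trade_id] = properties
        for role, account in parties:
            self.parties[trade_id].add((role, account))
            self.trades_of[account].add(trade_id)

    def find_trades(self, predicate):
        """Query trades by their properties, possible because a trade is a vertex."""
        return sorted(t for t, props in self.trades.items() if predicate(props))

    def counterparties(self, account, max_hops):
        """Breadth-first walk: accounts reachable through at most max_hops trades."""
        seen = {account}
        frontier = deque([(account, 0)])
        while frontier:
            current, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for trade_id in self.trades_of.get(current, ()):
                for _, other in self.parties[trade_id]:
                    if other not in seen:
                        seen.add(other)
                        frontier.append((other, hops + 1))
        return seen - {account}


graph = TradeGraph()
graph.add_trade("t1", [("seller", "alice"), ("buyer", "bob")], price=50)
graph.add_trade("t2", [("seller", "bob"), ("buyer", "carol")], price=51)
# A three-party swap fits the same structure with no change to the model.
graph.add_trade("t3", [("party", "carol"), ("party", "dave"), ("party", "erin")], kind="swap")

direct = graph.counterparties("alice", max_hops=1)
within_three = graph.counterparties("alice", max_hops=3)
expensive = graph.find_trades(lambda props: props.get("price", 0) > 50)
```

Here `direct` contains only bob, while `within_three` also picks up carol, dave and erin by following chains of trades, which is the kind of walk one might start from when investigating linked trading activity.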

Consider how such a relationship might be modelled in a traditional relational database approach: a trade would need a buyer and a seller field. These would likely be foreign keys into some account/trader/counterparty table. In order to view a trade plus its seller and buyer information, one would need to join trades to the accounts and then back again. Worse, as an account could be either a buyer or a seller, if we wanted to view the trades that an account had taken any part in, we would need to join accounts to trades and ‘union all’ the same SQL again while changing the join field from buyer to seller.
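The relational version can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows below are assumptions made purely for illustration; the point is the double join needed to show a trade with both parties, and the `UNION ALL` needed to list every trade an account took part in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE trades (
    id        INTEGER PRIMARY KEY,
    buyer_id  INTEGER NOT NULL REFERENCES accounts(id),
    seller_id INTEGER NOT NULL REFERENCES accounts(id),
    quantity  INTEGER NOT NULL
);
INSERT INTO accounts VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO trades VALUES (10, 2, 1, 100), (11, 3, 2, 40), (12, 1, 3, 5);
""")

# A trade plus its buyer and seller: the accounts table is joined twice.
TRADE_WITH_PARTIES = """
SELECT t.id, b.name AS buyer, s.name AS seller
FROM trades t
JOIN accounts b ON b.id = t.buyer_id
JOIN accounts s ON s.id = t.seller_id
WHERE t.id = :trade_id
"""

# Every trade an account took part in: the same query twice,
# once per role, stitched together with UNION ALL.
TRADES_FOR_ACCOUNT = """
SELECT t.id, 'buyer' AS role
FROM accounts a JOIN trades t ON t.buyer_id = a.id
WHERE a.name = :name
UNION ALL
SELECT t.id, 'seller' AS role
FROM accounts a JOIN trades t ON t.seller_id = a.id
WHERE a.name = :name
ORDER BY 1
"""

trade_detail = conn.execute(TRADE_WITH_PARTIES, {"trade_id": 10}).fetchone()
bob_trades = conn.execute(TRADES_FOR_ACCOUNT, {"name": "bob"}).fetchall()
```

Adding a third role, or a trade with more than two parties, would mean another column and another branch in the union, which is where this schema starts to feel awkward.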

Compare this with the graph model of retrieving all accounts involved in a trade:

4.png
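As a rough illustration of why the graph form is simpler (this is plain Python, not the Gremlin used in the post, and the `Graph` class, edge labels and sample data are hypothetical), a single "step to the neighbouring vertices" operation answers the question in either direction, with no per-role branches:

```python
from collections import defaultdict


class Graph:
    """Toy labelled graph; edges can be followed from either end."""

    def __init__(self):
        self._adjacent = defaultdict(set)  # vertex -> {(edge label, other vertex)}

    def add_edge(self, source, label, target):
        self._adjacent[source].add((label, target))
        self._adjacent[target].add((label, source))

    def neighbours(self, vertex):
        """All vertices one edge away, whatever the edge label or direction."""
        return sorted({other for _, other in self._adjacent.get(vertex, ())})


trade_graph = Graph()
trade_graph.add_edge("alice", "sold", "t1")
trade_graph.add_edge("bob", "bought", "t1")
trade_graph.add_edge("bob", "sold", "t2")
trade_graph.add_edge("carol", "bought", "t2")

accounts_in_t1 = trade_graph.neighbours("t1")  # every account involved in trade t1
trades_for_bob = trade_graph.neighbours("bob")  # every trade bob took part in
```

The same one-hop step serves both questions, whereas the relational form needs a join per role and a union to combine them.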

The new notebooks functionality in DSE 5 further improves the developer experience. It includes at least one “hidden” feature that Jonathan Lacefield (DSE Graph product manager) showed me: Code Assist. Within a piece of Gremlin in a notebook section, one can press Ctrl+Space after a dot to list the allowed methods and their parameters on the current type, much as one would in Eclipse or IntelliJ. Features like this are like gold dust to a developer, and it is conferences like this one that let you pick up these kinds of useful hints.

The conference itself shows just how healthy the DataStax ecosystem is, with a large and diverse range of attendees both from within and outside of financial services. Our demo sparked so many interesting conversations in the exhibition hall that we will definitely be bringing along technical exhibits to many of our forthcoming conference appearances.

I see huge potential in graph databases: for many problems they allow one to model the data much closer to the required domain, blockchain being just one use case amongst many. Other graph use cases include entitlements, that is, modelling the access rights of users across an enterprise. Graph databases are also great for cluster analysis, such as in fraud detection or looking for investment opportunities.

Notebooks bring graph data to life in a way rarely seen in the relational, or even the big data, sphere. The relationships are explicitly materialised in the data, so the tools can leverage them directly for meaningful visualisations. The exciting thing is that this is just the beginning: the DSE Graph team mentioned all sorts of interesting directions they might take the product in future!

Please get in touch on Twitter or DM me if you’d like to see more of the above code in context.

James Bowkett

James is a financial services software engineering specialist who always strives to incorporate the industry’s best practices and tools. With over a decade and a half of development experience using Java in an enterprise environment, he is now a principal consultant at Excelian Luxoft within the Technical Consulting practice. He is passionate about the wider software development community, often attending community events and conferences (such as JAX London and QCon), and he is an Associate at the Graduate Developer Community, helping to prepare undergraduates and graduates for their role in the IT industry.
