
Serving multiple Titan graphs over Gremlin Server (TinkerPop3)

A detailed walkthrough on how to properly configure Gremlin Server to expose multiple graphs using the Titan v1.0.0 graph database.

Define each Titan graph's storage and indexing backends

When configuring Titan's storage backends and external indexing for multiple graphs, there are two important things to keep in mind.

Assuming a single storage backend cluster, you must define a distinct Cassandra keyspace or HBase table name for each graph. Assuming a single indexing backend, such as an Elasticsearch cluster, make sure you configure each graph with a distinct index name.

Let’s define two graphs, “movies” and “music”, stored in the same Cassandra cluster within distinct keyspaces and indexed in the same Elasticsearch cluster within distinct indexes.

Define the first graph, ‘movies.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=movies

index.movies.backend=elasticsearch
index.movies.hostname=127.0.0.1
index.movies.index-name=movies

And the second graph ‘music.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=music

index.music.backend=elasticsearch
index.music.hostname=127.0.0.1
index.music.index-name=music

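You’re not required to give keyspaces and indexes exactly the same names as your graphs, though doing so makes things easier to keep track of. Note that the movies and music segments in index.movies.* and index.music.* only identify the index backend (this is the name you pass to buildMixedIndex() when defining mixed indexes); the actual Elasticsearch index name is set by index-name, which defaults to titan and would otherwise be shared by both graphs.
Depending on your needs, you’re free to store each graph on separate storage and/or index clusters, in which case distinct keyspaces/table names and index names become optional.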

Please refer to Chapter 12, Configuration Reference, in the Titan documentation for further information on configuring storage and indexing backends.
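Before wiring the files into Gremlin Server, a quick sanity check is to open each one yourself and confirm it reaches its backends. The sketch below assumes it is run from the Titan 1.0.0 installation directory in Titan's Gremlin Console (bin/gremlin.sh), with Cassandra and Elasticsearch running locally and the two files saved under conf/gremlin-server/ as referenced in the next section. Variables are declared without def so they persist between console lines. A full V().count() is only reasonable on fresh or small graphs.

```groovy
import com.thinkaurelius.titan.core.TitanFactory

// Open each graph from the same .properties files Gremlin Server will load.
// Paths are relative to the Titan installation directory (an assumption).
movies = TitanFactory.open('conf/gremlin-server/movies.properties')
music  = TitanFactory.open('conf/gremlin-server/music.properties')

// Each graph lives in its own keyspace, so these counts are independent.
moviesCount = movies.traversal().V().count().next()
musicCount  = music.traversal().V().count().next()
println "movies: ${moviesCount} vertices, music: ${musicCount} vertices"

// Release the backend connections before Gremlin Server opens the same graphs.
movies.close()
music.close()
```

If either TitanFactory.open call fails, the problem lies in that file's storage or index settings rather than in the Gremlin Server configuration.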

Configure Gremlin Server to initialize the graphs at launch

The next step consists of editing the Gremlin Server configuration file, located at conf/gremlin-server/gremlin-server.yaml, so that it points to each graph's .properties file. This file defines a graphs property that maps each graph variable name to its corresponding .properties file. An example configuration for two graphs could be:

graphs: {
  movies: conf/gremlin-server/movies.properties,
  music: conf/gremlin-server/music.properties
}

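This exposes two graphs, referenced respectively by the movies and music variables within the Gremlin script execution context. Again, these variable names are not required to match the keyspace or index names used in the .properties files, but we’ll keep them identical for simplicity. If other entries in gremlin-server.yaml still refer to the old graph variable (for instance a serializer configured with useMapperFromGraph: graph), point them at one of the new names as well.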

Reference each graph's TraversalSource in the Gremlin Server .groovy bootstrap script

After exposing your graphs as the movies and music variables, you’re almost done. You must now update the Gremlin Server bootstrap script, scripts/empty-sample.groovy, to define a reference to each graph’s TraversalSource (the path to this script is also set in gremlin-server.yaml and can be changed). Because we no longer expose a graph variable, but movies and music graph variables instead, the script needs the following bindings:

mo = movies.traversal()
mu = music.traversal()

The complete empty-sample.groovy file should then look something like this:

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
  onStartUp: { ctx ->
    ctx.logger.info("Executed once at startup of Gremlin Server.")
  },
  onShutDown: { ctx ->
    ctx.logger.info("Executed once at shutdown of Gremlin Server.")
  }
] as LifeCycleHook]

// bind one TraversalSource per graph declared in gremlin-server.yaml.
// there is no longer a "graph" variable, so the default g binding is dropped.
globals << [mo : movies.traversal(), mu : music.traversal()]

Since TinkerPop3, graph traversals are no longer issued directly from a Graph instance but from a TraversalSource obtained with graph.traversal(). The default empty-sample.groovy script binds the default graph’s TraversalSource to g, mimicking the old TinkerPop 2.x convention where a traversal would typically start with g. Because we now have two graphs, we must bind each graph’s TraversalSource to a distinct variable; let’s call these mo and mu. The initialization script above then allows you to execute traversals such as mo.V() on the movies graph or mu.V() on the music graph, as declared in gremlin-server.yaml.
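To check the new bindings from outside the server, the sketch below submits scripts through TinkerPop’s Java driver (gremlin-driver) from a standalone Groovy script. It assumes Gremlin Server listens on localhost at the default port 8182, that a gremlin-driver jar matching the server’s TinkerPop version is on the classpath, and that the driver’s default serializer is one the server accepts; adjust these to your gremlin-server.yaml.

```groovy
import org.apache.tinkerpop.gremlin.driver.Cluster

// Connect to Gremlin Server; host and port are assumptions for a local setup.
def cluster = Cluster.build('localhost').port(8182).create()
def client = cluster.connect()

// Each script is evaluated against the server's global bindings,
// where mo and mu were defined by the bootstrap script.
def moviesCount = client.submit('mo.V().count()').all().get()[0].getLong()
def musicCount  = client.submit('mu.V().count()').all().get()[0].getLong()
println "movies: ${moviesCount} vertices, music: ${musicCount} vertices"

// Close the client first, then the cluster and its connection pool.
client.close()
cluster.close()
```

Submitting g.V().count() instead should now fail, since no g binding is defined any more.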

Putting this into practice: interacting with multiple graphs within the same Gremlin query

A nice side effect of this approach is that you can now query multiple graphs within the same Gremlin script. This makes it easy to set up simple scripts for migrating moderately sized graphs from one graph database implementation to another.
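As an illustration, the following script, submitted to Gremlin Server like any other, copies every vertex and edge of the movies graph into the music graph; the same pattern applies when the two graphs are backed by different TinkerPop implementations. It is a naive sketch under stated assumptions: the target schema accepts the same labels and property keys, the in-memory id map (idMap, an illustrative name) fits in the server’s heap, meta-properties are not copied, and everything is committed in a single transaction. It also scans the whole source graph, which Titan will log a warning about.

```groovy
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.structure.Vertex

// Maps each source vertex id to its copy in the target graph.
def idMap = [:]

// Copy vertices, keeping non-default labels and all property values.
movies.vertices().each { v ->
    def copy = (v.label() == Vertex.DEFAULT_LABEL) ?
        music.addVertex() : music.addVertex(T.label, v.label())
    v.properties().each { p -> copy.property(p.key(), p.value()) }
    idMap[v.id()] = copy
}

// Recreate each edge between the copied endpoints, with its label and properties.
movies.edges().each { e ->
    def copy = idMap[e.outVertex().id()].addEdge(e.label(), idMap[e.inVertex().id()])
    e.properties().each { p -> copy.property(p.key(), p.value()) }
}

// Persist the copied elements and return the number of migrated vertices.
music.tx().commit()
idMap.size()
```

For larger graphs, committing every few thousand elements, or using Titan’s bulk loading tools, is likely a better fit than one large transaction.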


This approach is not limited to the Titan graph database: it can be adapted to serve multiple graphs from any combination of graph databases implementing the TinkerPop framework, such as JanusGraph, ArangoDB, OrientDB or Neo4j.

PS: This is a bunch of information I saved over time in my TiddlyWiki.