Posts tagged with: howto

Serving multiple Titan graphs over Gremlin Server (TinkerPop3)

A detailed walkthrough on how to properly configure Gremlin Server to expose multiple graphs using Titan v1.0.0 graph database.

Define each Titan graph’s storage and indexing backends

Within the context of Titan graph database, there are two things to keep in mind when configuring storage backends and external indexing for multiple graphs.

Assuming a single storage backend cluster, you’ll be required to define a distinct Cassandra keyspace or HBase table name for each graph. Assuming a single indexing backend such as an Elasticsearch cluster, make sure you configure each graph with a distinct index name.

Let’s define two graphs, “movies” and “music” stored in the same Cassandra cluster within distinct keyspaces and indexed in the same Elasticsearch cluster in distinct indexes.

Define the first graph, ‘movies.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=movies

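index.movies.backend=elasticsearch
index.movies.hostname=127.0.0.1
# Elasticsearch index name for this graph (Titan defaults to "titan" when unset)
index.movies.index-name=movies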

And the second graph ‘music.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=music

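index.music.backend=elasticsearch
index.music.hostname=127.0.0.1
# Elasticsearch index name for this graph (Titan defaults to "titan" when unset)
index.music.index-name=music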

You’re not required to give keyspaces and index names the exact same name as your graphs, though it may be easier for keeping track of things.
Depending on your needs, you’re obviously free to store each graph on distinct storage and/or index clusters. Supplying distinct keyspaces/table names and index names may then become optional.

Please refer to the Chapter 12 — Configuration reference in the Titan documentation for further information on how to configure storage and indexing backends.
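
If you end up with more than a couple of graphs, writing these files by hand gets tedious. Here is a minimal Python sketch that generates one .properties file per graph. The function names are hypothetical, it assumes the keyspace and the index simply reuse the graph name, and the index-name line relies on Titan’s index.[X].index-name option to name the Elasticsearch index explicitly.

```python
from pathlib import Path


def titan_properties(graph_name, storage_host="127.0.0.1", index_host="127.0.0.1"):
    """Return the contents of a Titan .properties file for one graph.

    Assumption: the Cassandra keyspace and the Elasticsearch index are
    both named after the graph.
    """
    lines = [
        "gremlin.graph=com.thinkaurelius.titan.core.TitanFactory",
        "",
        "storage.backend=cassandrathrift",
        f"storage.hostname={storage_host}",
        f"storage.cassandra.keyspace={graph_name}",
        "",
        f"index.{graph_name}.backend=elasticsearch",
        f"index.{graph_name}.hostname={index_host}",
        f"index.{graph_name}.index-name={graph_name}",
    ]
    return "\n".join(lines) + "\n"


def write_graph_configs(graph_names, target_dir):
    """Write <name>.properties for each graph and return the created paths."""
    # Two graphs sharing a name would share a keyspace and an index.
    if len(set(graph_names)) != len(graph_names):
        raise ValueError("graph names must be distinct")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in graph_names:
        path = target / f"{name}.properties"
        path.write_text(titan_properties(name))
        paths.append(path)
    return paths
```

Calling write_graph_configs(["movies", "music"], "conf/gremlin-server") would write the two files used in the rest of this walkthrough.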

Configure Gremlin Server to initialize the graphs at launch

The next step consists of editing the Gremlin Server configuration file, located in conf/gremlin-server/gremlin-server.yaml, to point to each graph’s .properties file. This configuration file defines a graphs property that maps each graph name to its corresponding .properties file. An example configuration for two graphs could be:

graphs: {
movies: conf/gremlin-server/movies.properties,
music: conf/gremlin-server/music.properties
}

This will expose two graphs respectively referenced by the movies and music variables within the Gremlin script execution context. Then again, the variable names are not required to match the names of the graphs as defined in the .properties file, but we’ll do so for simplicity.
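
For a longer list of graphs, the graphs entry can be rendered from the graph names. The sketch below is only an illustration: render_graphs_block is a hypothetical helper, and it assumes every .properties file lives in conf/gremlin-server and is named after its graph.

```python
def render_graphs_block(graph_names, conf_dir="conf/gremlin-server"):
    """Render the graphs entry of gremlin-server.yaml as a YAML flow mapping."""
    entries = [f"  {name}: {conf_dir}/{name}.properties" for name in graph_names]
    return "graphs: {\n" + ",\n".join(entries) + "\n}\n"


print(render_graphs_block(["movies", "music"]))
```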

Reference each graph Traversal object in the Gremlin Server .groovy bootstrap script

After exposing your graphs as movies and music variables, you’re almost done. You must now update the Gremlin Server bootstrap script located in scripts/empty-sample.groovy to define references to each graph’s Traversal object (the path to this script is also defined in the gremlin-server.yaml file and can be edited). Because we no longer expose a graph variable but movies and music variables, the empty-sample.groovy file should now define one traversal binding per graph:

mo = movies.traversal()
mu = music.traversal()

The complete empty-sample.groovy file should then look something like this:

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
  onStartUp: { ctx ->
    ctx.logger.info("Executed once at startup of Gremlin Server.")
  },
  onShutDown: { ctx ->
    ctx.logger.info("Executed once at shutdown of Gremlin Server.")
  }
] as LifeCycleHook]

// define one TraversalSource per graph to bind queries to: "mo" for movies and "mu" for music.
// the default "g : graph.traversal()" binding is dropped because no "graph" variable is exposed anymore.
globals << [mo : movies.traversal(), mu : music.traversal()]

Since TinkerPop 3, graph traversals are no longer issued directly from a Graph instance but from a traversal source obtained by calling traversal() on the graph. The default empty-sample.groovy script binds that traversal source to g, which mimics the old TinkerPop 2.x behavior where a graph traversal would typically start with g. Because we now have two graphs, we must bind each graph’s Traversal object to a distinct variable. Let’s call these mo and mu. The above initialization script will allow you to execute graph traversals such as mo.V() for the movies graph or mu.V() for the music graph, as defined in the gremlin-server.yaml file.

Putting this into practice: interacting with multiple graphs within the same Gremlin query

A nice side effect of this approach is that you can now query multiple graphs within the same Gremlin query. You could then easily set up simple scripts for migrating moderately sized graphs from one database implementation to another.
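
As a rough illustration, here is a Python sketch that sends a single Gremlin script touching both graphs. It assumes Gremlin Server was switched to its HTTP channelizer (org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer), which is not the default WebSocket setup, and that it listens on localhost:8182. The helper names are hypothetical; the script itself only relies on the mo and mu bindings.

```python
import json
import urllib.request

# Gremlin (Groovy) script evaluated server-side. "mo" and "mu" are the
# traversal sources bound in the init script; the result is a map holding
# the vertex count of each graph.
COUNT_BOTH = "[movies: mo.V().count().next(), music: mu.V().count().next()]"


def build_request(script, url="http://localhost:8182"):
    """Build the POST request carrying the Gremlin script as JSON."""
    body = json.dumps({"gremlin": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(script, url="http://localhost:8182"):
    """Send the script and return the "data" part of the server response."""
    with urllib.request.urlopen(build_request(script, url)) as response:
        return json.load(response)["result"]["data"]
```

With a running server configured as assumed above, run(COUNT_BOTH) would return the vertex counts of both graphs from a single request.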


This example is not limited to Titan graph database and can be tweaked to serve multiple graphs from a combination of any other graph databases implementing the TinkerPop framework such as JanusGraph, ArangoDB, OrientDB or Neo4j.

PS: This is a bunch of information I saved over time into my TiddlyWiki.


Install the Scoop package manager for Windows

Scoop is a command-line package manager for Windows that lets you easily download the tools we need. Open a PowerShell terminal and run:

Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

iex (new-object net.webclient).downloadstring('https://get.scoop.sh')

You can then install tools such as cURL and Node.js using Scoop:

scoop install nodejs
scoop install curl

Pretty Print JSON from terminal

This requires Python to be installed.

cat file.json | python -m json.tool > pretty_file.json

Python ships with a built-in JSON encoding/decoding library, and you can use it to your advantage to get nicely formatted output. Alternatively, if you are receiving JSON from an API or HTTP request, you can pipe the output of a curl call directly into this tool as well.
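
The same thing can be done from a script. A minimal sketch, assuming 4-space indentation (the json.tool default); pretty is a hypothetical helper name:

```python
import json


def pretty(json_text):
    """Parse a JSON string and re-serialize it with 4-space indentation."""
    return json.dumps(json.loads(json_text), indent=4)


print(pretty('{"name": "scoop", "tools": ["curl", "nodejs"]}'))
```

Invalid input makes json.loads raise json.JSONDecodeError, so this doubles as a quick validity check.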


Creating a sitemap for your WordPress blog

Sitemaps speed up search engine indexing by providing search engine robots with a detailed map of your site. Instead of having to discover your content by following internal links one by one, the crawler can instantly know where every public page in your website is located.

While by no means a magic bullet for SEO, sitemaps will improve the indexing of your site. And that means it’s more likely that all your posts and pages get included in search results.

They also allow you to provide search engines with optional information like when a page was last updated, how often a page changes, and how important a page is. This information can further help search engines optimize how they crawl your site.
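
To make this concrete, here is a small Python sketch that builds such a file with the standard library. It follows the sitemaps.org 0.9 schema (loc, lastmod, changefreq, priority); the build_sitemap helper and the example.com URLs are made up for illustration, and a plugin would normally generate this for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, mirroring the optional hints described above.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    {"loc": "https://example.com/", "lastmod": "2017-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://example.com/about/"},
]))
```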

XML vs HTML Sitemaps

XML sitemaps are the most common implementation. They’re exactly what I discussed above – a map dedicated almost entirely to search engines. Their data isn’t really useful to humans, so the only reason to create one is to boost your indexing.

HTML sitemaps, on the other hand, can be used by both humans and search engines. It’s an actual page on your website where humans and search engines can get a high-level overview of all the content your site offers. Search engines can still crawl this page, but it also gives some curious visitors a better user experience.

So which type should you use?

The answer is BOTH. It’s not an either/or question. They don’t conflict and both types offer benefits. If you don’t believe me, former Google SEO guru Matt Cutts has also talked about why you should include both XML and HTML sitemaps.

XML Sitemap – plugins

Given the popularity of SEO plugins, there’s a good chance you already have all the necessary functionality to create an XML sitemap. If you use Jetpack, Yoast SEO, All in One SEO, Google Sitemaps, or SEOPressor, you just need to find the relevant plugin setting to set up your XML sitemap.

1. Jetpack

Jetpack allows you to generate such files thanks to the Sitemaps module.

If you already have the plugin activated & running, ensure you are using the latest version of Jetpack.

Now, head to Jetpack > Settings > Engagement, look for the module called Sitemaps, and activate it.

Unlike other sitemap plugins for WordPress, there is nothing to configure here. That is a limitation for big WordPress sites, but for a new or medium-sized blog all you need to do is submit your sitemap to search engines.

Once you activate the module, Jetpack will generate two different sitemaps for you: a sitemap listing all your public posts and pages, and a News sitemap built specifically for Google News:

  • Normal sitemap: yoursitename.com/sitemap.xml
  • News Sitemap: yoursitename.com/news-sitemap.xml

Sitemap generated by Jetpack WordPress

2. Yoast SEO

With Yoast SEO, you just need to navigate to SEO → XML Sitemaps to enable and configure it.

One nice feature of Yoast SEO’s sitemap tool is the ability to include media attachments in your XML sitemap.

When enabled, this can boost your traffic from Image Search by increasing the indexing of your media uploads.

404 Errors:

How do I know if I need to add the redirect rules?

The Yoast SEO XML sitemap URL uses a pretty permalink of example.com/sitemap_index.xml but, behind the scenes, this URL also has a non-pretty permalink of example.com/?sitemap=1. If the pretty permalink fails but you can load and see the sitemap using the non-pretty permalink, your server is not set up to redirect and, thus, you’ll need to add redirect rules.

  • Using NGINX:
    In some cases, you may need to add server level redirects if you receive an NGINX server error or a wrong page when loading the XML sitemaps.

    Basic Code
    # Rewrites for Yoast SEO XML Sitemap
    rewrite ^/sitemap_index\.xml$ /index.php?sitemap=1 last;
    rewrite ^/([^/]+?)-sitemap([0-9]+)?\.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;

    Expanded Code
    # Yoast SEO Sitemaps
    location ~ ([^/]*)sitemap(.*)\.x(m|s)l$ {
        ## this redirects sitemap.xml to /sitemap_index.xml
        rewrite ^/sitemap\.xml$ /sitemap_index.xml permanent;
        ## this makes the XML sitemaps work
        rewrite ^/([a-z]+)?-?sitemap\.xsl$ /index.php?xsl=$1 last;
        rewrite ^/sitemap_index\.xml$ /index.php?sitemap=1 last;
        rewrite ^/([^/]+?)-sitemap([0-9]+)?\.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
        ## The following lines are optional for the premium extensions
        ## News SEO
        rewrite ^/news-sitemap\.xml$ /index.php?sitemap=wpseo_news last;
        ## Local SEO
        rewrite ^/locations\.kml$ /index.php?sitemap=wpseo_local_kml last;
        rewrite ^/geo-sitemap\.xml$ /index.php?sitemap=wpseo_local last;
        ## Video SEO
        rewrite ^/video-sitemap\.xsl$ /index.php?xsl=video last;
    }
  •  Using Apache:

    Open your .htaccess file and add the following code before the main WordPress rewrite rules:
    # Yoast SEO - XML Sitemap Rewrite Fix
    RewriteEngine On
    RewriteBase /
    RewriteRule ^sitemap_index.xml$ /index.php?sitemap=1 [L]
    RewriteRule ^locations.kml$ /index.php?sitemap=wpseo_local_kml [L]
    RewriteRule ^geo_sitemap.xml$ /index.php?sitemap=geo [L]
    RewriteRule ^([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 [L]
    RewriteRule ^([a-z]+)?-?sitemap.xsl$ /index.php?xsl=$1 [L]
    # END Yoast SEO - XML Sitemap Rewrite Fix
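
To see what these rules actually do, here is a Python sketch that replays the two basic rewrite patterns against a few request paths. It is only an approximation for illustration: the patterns are copied from the NGINX rules above with the literal dots escaped, and the rewrite helper is hypothetical, so it does not reproduce every detail of how NGINX or Apache process a request.

```python
import re

# (pattern, replacement) pairs mirroring the two basic rewrite rules.
RULES = [
    (re.compile(r"^/sitemap_index\.xml$"), "/index.php?sitemap=1"),
    (re.compile(r"^/([^/]+?)-sitemap([0-9]+)?\.xml$"),
     r"/index.php?sitemap=\1&sitemap_n=\2"),
]


def rewrite(path):
    """Return the internal URL for path, or None if no rule matches."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            # An optional group that did not match expands to an empty string.
            return match.expand(target)
    return None


for path in ("/sitemap_index.xml", "/post-sitemap.xml", "/post-sitemap2.xml", "/about/"):
    print(path, "->", rewrite(path))
```

A path that comes back as None here is one the rules leave alone, which is what you want for regular posts and pages.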

3. All in One SEO

With the All in One SEO plugin installed, just navigate to All in One SEO → XML Sitemap.

4. Google XML Sitemaps

After you install the plugin, you can configure it by going to Settings → XML-Sitemap.

Here’s what you’ll definitely want to configure on that page:

  • Post priority: Set how you want to calculate posts’ crawling priorities. You can have the plugin automatically calculate priority by the number of comments, or manually set the priority later on.
  • Sitemap content: Choose what types of content get included in your sitemap. For example, if you want to exclude category archives, you just need to uncheck that box.
  • Change frequencies: Set how often each type of content gets changed. This gives search engines an idea of how to prioritize their crawling. For example, you’ll definitely want to set the page which displays your recent posts to be crawled daily.
  • Priorities: Lets you set manual crawling priorities for different content. You definitely want your homepage and posts page (if different) to be high priority.

Tell Search Engines About Your XML Sitemap

Now that you’ve created your sitemaps, there’s one more thing you need to do:

Tell search engines exactly where they can find them.

By showing search engines where you keep your sitemap, you ensure they find it, which means they’ll know whenever you publish changes to it and your site.

To submit your sitemap to Google, you’ll need to sign up at Google Search Console and follow their instructions for submitting a sitemap.

The process is quite similar for submitting your sitemap to Bing. You’ll need to sign up at Bing Webmaster Tools and then submit your sitemap by following their directions.


Reading an mbox file with Thunderbird

I’ll spare you the Thunderbird installation and first-time setup instructions.

1. Download and launch Thunderbird

http://www.mozilla.org/en-US/thunderbird/

2. Mail Account Setup

https://support.mozilla.org/en-US/products/thunderbird/emails-thunderbird

3. Find your Thunderbird “Local Folders” Directory

Click on “Local Folders” and then “View Settings for this account” to see where it is looking for local mail folders. Something like this:

C:\Users\harsha\AppData\Roaming\Thunderbird\Profiles\\Mail\Local Folders

(You may have to enable “Show hidden files, folders, and drives”.)

On a Mac, the path looks something like this:

/Users/harsha/Library/Thunderbird/Profiles//Mail/Local Folders

4. Drop the .mbox file into Local Folders

Quit Thunderbird. Navigate to your “Local Folders” directory. Drag and drop your mbox file in there. You may change the name to something appropriate (something like mbox-harsha-20161009-gmail).

5. Browse mbox

Restart Thunderbird, and you should see the mailbox in your Local Folders list.
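
If the mailbox does not show up, it can help to check that the file really is a readable mbox. This is a side check rather than part of the Thunderbird procedure: a minimal Python sketch using the standard mailbox module, where list_subjects is a hypothetical helper and the file name is the example one from step 4.

```python
import mailbox


def list_subjects(mbox_path):
    """Return the Subject header of every message in an mbox file."""
    # create=False makes a missing file an error instead of creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message["subject"] for message in box]
    finally:
        box.close()


# Example (file name taken from step 4):
# print(list_subjects("mbox-harsha-20161009-gmail"))
```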

Feel free to leave comments.