How to package a Python application to make it pip-installable

Python scripts are useful for a lot of things, e.g. fetching the latest posts from /r/python or counting the total number of lines in your project folder. However, remembering where a script is placed and having to type

python path/to/script/myscript.py

every time can be a real pain. Of course, you can make an alias so you only have to type

myscript

to run it, but it would be much nicer to make the script available from any computer, anywhere, and possibly help others with the same problem along the way.

It only takes three simple steps to make your Python app pip-installable:

  1. Write your script or application
  2. Add a setup script
  3. Upload to PyPI

After this, you and anyone else can install it by typing:

pip install my_awesome_script

Write your script

The script can be really simple or big and advanced. We’ll cover the most basic example:

#!/usr/bin/env python
print("Hello World")

You can name the script anything, but this is the name you’ll be typing every time, so make sure it’s not too difficult. We’ll name our script helloworld (no .py extension).
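As the script grows, a common Python layout is to keep the logic in a function and call it only when the file is run directly. The sketch below is one way to arrange helloworld like that; the names greeting and main are our own choices, not something setuptools requires.

```python
#!/usr/bin/env python
"""helloworld: print a greeting (minimal sketch)."""


def greeting():
    """Return the text this script prints."""
    return "Hello World"


def main():
    """Entry point: print the greeting and return exit status 0."""
    print(greeting())
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With this layout you could also save the file as helloworld.py and, instead of scripts=['helloworld'], pass py_modules=['helloworld'] and entry_points={'console_scripts': ['helloworld=helloworld:main']} to setup(); setuptools then generates the helloworld command itself and uses main's return value as the exit status.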

Add a setup script

The setup.py file is the centre of building, distributing and installing modules using Distutils; the example below uses setuptools, which builds on Distutils.

from setuptools import setup

setup(
    name='my-awesome-helloworld-script',  # This is the name of your PyPI package.
    version='0.1',  # Update the version number for new releases.
    scripts=['helloworld'],  # The name of your script, and also the command you'll use to call it.
)

Optional: We can now package the script using python setup.py sdist. This will create a dist folder containing all your distributions. After unpacking the distribution file, you can simply install it using sudo python setup.py install.

Upload to PyPI

First, you need to register the package on PyPI. This is done by typing python setup.py register. If you haven’t registered a package from this computer before, you’ll be prompted with this message:

$ python setup.py register
running register
We need to know who you are, so please choose either:
1. use your existing login,
2. register as a new user,
3. have the server generate a new password for you (and email it to you), or
4. quit
Your selection [default 1]:
...

Once this is done, register will ask whether you want to save your login information in the .pypirc file. By default, this stores the login name and the password. The next step is to upload your package. Just type

python setup.py sdist upload

and the package is now available on PyPI! You can save a few keystrokes by doing it all in one command:

python setup.py register sdist upload
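The .pypirc file that register offers to save is a plain INI-style file in your home directory, and the password is stored in it as plain text. To show what it contains, the sketch below parses a sample .pypirc with Python's standard configparser module; the username and password values are placeholders.

```python
import configparser

# Sample ~/.pypirc content; the username and password are placeholders.
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    pypi

[pypi]
username = your-username
password = your-password
"""


def read_pypi_credentials(text):
    """Return (username, password) from the [pypi] section of .pypirc text.

    Either value is None if it is not stored in the file.
    """
    # interpolation=None so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["pypi"]
    return section.get("username"), section.get("password")
```

If you chose not to save the password, the second value comes back as None and you are asked for it at upload time.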

Install using pip

Finally, you can now install your package with

pip install my-awesome-helloworld-script

You will probably need to run that line with sudo (or use pip install --user) if you are installing into the system Python. You can now run the script with:

$ helloworld
Hello World


Add Insecure Registry to Docker

To add an insecure Docker registry, list it under the "insecure-registries" key in /etc/docker/daemon.json (create the file if it does not exist).
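daemon.json is plain JSON, so adding a registry can be scripted. The following is a minimal Python sketch, assuming you run it with permission to write the file; registry.example.com:5000 is a placeholder for your own registry's host and port. It keeps any settings already in the file and only appends the registry if it is not listed yet.

```python
import json
import os

# Placeholder: replace with your registry's host:port.
REGISTRY = "registry.example.com:5000"


def add_insecure_registry(path, registry):
    """Add `registry` to the "insecure-registries" list in a Docker daemon.json.

    Settings already in the file are kept; the file is created if it is
    missing or empty. Returns the resulting configuration dict.
    """
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            content = f.read().strip()
        if content:
            config = json.loads(content)
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:  # avoid duplicate entries on re-runs
        registries.append(registry)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    return config
```

Called as add_insecure_registry("/etc/docker/daemon.json", REGISTRY) on a machine with no daemon.json yet, it writes a file whose only setting is the insecure-registries list containing that one registry.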

Then restart Docker, for example with sudo systemctl restart docker on systemd-based Linux hosts.


A Fast, Flexible JSON Library for Java

Don’t we already have objectMapper.readValue? Why another JSON library? I found myself stuck in a couple of situations, and jsoniter (json-iterator) really worked for me:

  • Working with PHP: You want an int, so they might give you 100 or “100”. You want an object, so they might give you [] as an empty object.
  • Parsing large JSON streams: Parsing a large stream of JSON, then extracting only what you need from the jungle.
  • Cannot bind: The JSON is organized in a key/value form, and it cannot bind directly to my object model.

Also, jsoniter is much faster than existing libraries (Jackson, Gson, you name it). Third-party benchmarking is welcome. Here are the shameless self-benchmarking results for data binding 1 KB of JSON:

(Benchmark chart: data binding 1 KB of JSON)

The advantages of jsoniter come from these innovations:

  • Any type: Capture raw bytes as the Any type. Parsing is done lazily. Any can be used like a PHP array or JavaScript Object and be weakly typed.
  • Iterator abstraction: Take the JSON input stream as an iterator-like object. You can walk through the graph in a streaming way, just like iterating over collections. It’s similar to the Gson API, but greatly simplified.
  • Trie-tree: The biggest drawback (and maybe the biggest benefit) of JSON is the string-typed field name. Binding object fields by comparing strings is time-consuming, so jsoniter uses a trie to boost performance.
  • Code generation: All decoder/encoder logic can be code generated. You have plenty of options, such as reflection, dynamic codegen or static codegen.
  • Only pay for the feature you want: Taking an InputStream as input is slower than a byte[]. Traditional parsers use a virtual method or feature flags to generalize, which is a performance killer. Jsoniter uses dynamic class shadowing to switch implementations.
  • Required field validation: When you parse an object with an int field, you cannot tell whether the field is zero because it was missing from the JSON or because it was explicitly specified as zero. Jsoniter implements required-field tracking with a bit mask, so now you can tell.
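To make the trie and required-field ideas concrete, here is a small Python sketch of both. It is not jsoniter’s code: jsoniter works in Java on the raw input, while this sketch operates on an already-parsed dict. The field names and REQUIRED_MASK are hypothetical, borrowed from the order example later in this post.

```python
class FieldTrie:
    """Maps field names to bit positions by walking one character at a time."""

    END = None  # key marking "a complete field name ends here"

    def __init__(self, names):
        self.root = {}
        for bit, name in enumerate(names):
            node = self.root
            for ch in name:
                node = node.setdefault(ch, {})
            node[self.END] = bit

    def lookup(self, name):
        """Return the bit for `name`, or None as soon as no field can match."""
        node = self.root
        for ch in name:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self.END)


# Hypothetical schema: the two fields of the order-details example.
FIELDS = ["product_id", "start"]
REQUIRED_MASK = 0b11  # bit 0 = product_id, bit 1 = start; both required
TRIE = FieldTrie(FIELDS)


def decode(obj):
    """Bind a parsed JSON object; return (values, missing required fields)."""
    values = dict.fromkeys(FIELDS, 0)  # every field defaults to 0, like a Java int
    seen = 0
    for key, value in obj.items():
        bit = TRIE.lookup(key)
        if bit is None:
            continue  # unknown field: ignore it
        values[FIELDS[bit]] = value
        seen |= 1 << bit  # remember that this field was present
    missing_bits = REQUIRED_MASK & ~seen
    missing = [name for bit, name in enumerate(FIELDS) if missing_bits & (1 << bit)]
    return values, missing
```

Both {"product_id": 0, ...} and an object with no product_id leave the value at 0, but only the second reports product_id as missing, which is exactly the distinction the bit mask buys.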

A lot of work has been done to make sure jsoniter is the fastest out there. Benchmarking aside, what most people truly want is to get their job done fast. Here is an example to show you how flexible the API is:

[1024, {"product_id": 100, "start": "beijing"}]
["1025", {"product_id": 101, "start": "shanghai"}]
// many many more lines

Each line is an object. The first element is the order ID, and the second element is the order details. Notice:

  • There are many lines, and reading them all in at once will lead to memory issues.
  • Some order IDs are ints and some are strings. This is very common when working with PHP.
  • The order details have many fields and need object binding.

In 6 lines, we have solved all the problems.

JsonIterator iter = JsonIterator.parse(input); // input stream
OrderDetails orderDetails = new OrderDetails(); // reused
while(iter.whatIsNext() != ValueType.INVALID) {
    Any order = iter.readAny(); // lazy
    int orderId = order.toInt(0); // weakly typed
    String start = order.get(1).bindTo(orderDetails).start; // data binding
}
  • JsonIterator.parse takes an InputStream as input and parses everything as a stream.
  • readAny returns an instance of Any. Parsing is done lazily, when a field is actually accessed, which makes it simple and performant.
  • bindTo(orderDetails): data binding can reuse existing objects.
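For comparison, the same job (stream the lines, accept both int and string IDs, pull out the start field) can be sketched in Python with only the standard json module. This is an eager parser applied line by line, not a port of jsoniter’s lazy Any, and it assumes one complete JSON array per line, as in the sample above.

```python
import json


def read_orders(lines):
    """Yield (order_id, start) for each line holding a JSON array
    [order_id, {"product_id": ..., "start": ...}].

    `lines` can be an open file, so input is read one line at a time.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        order_id, details = json.loads(line)
        # Weak typing: the ID may arrive as 1024 or "1025"; int() accepts both.
        yield int(order_id), details.get("start")
```

Usage would look like: with open("orders.txt") as f: for order_id, start in read_orders(f): ..., where orders.txt is a hypothetical input file.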

This example is just a demo of the flexibility. It might seem overly complex, but it will be handy when you need it. For everyday use, just remember two lines:

JsonIterator.deserialize("[1,2,3]"); // JSON => object
JsonStream.serialize(new int[]{1,2,3}); // object => JSON

I hope you are interested. This library is new, so bug reports or pull requests should be submitted to https://github.com/json-iterator/java. The Golang version will be available soon.

PS: This is not my original writing. Blog post completely copied from dzone.com/articles/dealing-with-json-in-a-new-way for my future reference. All credit goes to the original author.


[Mac] Reset MySQL Root Password

Reset your MySQL root password:

1.  Stop the mysqld server. Typically this can be done from System Preferences > MySQL > Stop MySQL Server.

2.  Start the server in safe mode with privilege bypass

From a terminal:

sudo /usr/local/mysql/bin/mysqld_safe --skip-grant-tables

3.  In a new terminal window:

sudo /usr/local/mysql/bin/mysql -u root

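UPDATE mysql.user SET authentication_string = PASSWORD('MyNewPass'), password_expired = 'N' WHERE User = 'root' AND Host = 'localhost';

FLUSH PRIVILEGES;

(On MySQL 8.0 and later the PASSWORD() function no longer exists; there, run FLUSH PRIVILEGES; first and then ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewPass'; instead of the UPDATE statement.)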

\q

4.  Stop the mysqld server again and restart it in normal mode.


Serving multiple Titan graphs over Gremlin Server (TinkerPop3)

A detailed walkthrough on how to properly configure Gremlin Server to expose multiple graphs using Titan v1.0.0 graph database.

Define each Titan graph storage and indexing backends

Within the context of Titan graph database, there are two important things to have in mind when configuring graph backends and external indexing to work with multiple graphs.

Assuming a single storage backend cluster, you’ll need to define distinct Cassandra keyspaces or HBase table names for each graph. Assuming a single indexing backend such as an Elasticsearch cluster, make sure you configure each graph with distinct index names.

Let’s define two graphs, “movies” and “music” stored in the same Cassandra cluster within distinct keyspaces and indexed in the same Elasticsearch cluster in distinct indexes.

Define the first graph, ‘movies.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=movies

index.movies.backend=elasticsearch
index.movies.hostname=127.0.0.1

And the second graph ‘music.properties’

gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=music

index.music.backend=elasticsearch
index.music.hostname=127.0.0.1

You’re not required to give keyspaces and index names the exact same name as your graphs, though it may be easier for keeping track of things.
Depending on your needs, you’re obviously free to store each graph on distinct storage and/or index clusters. Supplying distinct keyspaces/table names and index names may then become optional.
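If you serve more than a couple of graphs, writing these files by hand gets repetitive. A small, hypothetical Python helper like the one below can render one .properties file per graph from the pattern above, using the graph name as both keyspace and index name and refusing duplicate names. The host defaults simply mirror the 127.0.0.1 values used in the examples.

```python
import re

# Same layout as movies.properties / music.properties above; {name} is used
# for both the Cassandra keyspace and the index.<name> settings.
TEMPLATE = """\
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory

storage.backend=cassandrathrift
storage.hostname={storage_host}
storage.cassandra.keyspace={name}

index.{name}.backend=elasticsearch
index.{name}.hostname={index_host}
"""

# Cassandra keyspace names may only use letters, digits and underscores.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def render_properties(name, storage_host="127.0.0.1", index_host="127.0.0.1"):
    """Return the .properties text for the graph called `name`."""
    if not VALID_NAME.fullmatch(name):
        raise ValueError("invalid graph/keyspace name: %r" % name)
    return TEMPLATE.format(name=name, storage_host=storage_host,
                           index_host=index_host)


def render_all(names, **hosts):
    """Map '<name>.properties' to its contents, one entry per graph."""
    if len(set(names)) != len(names):
        raise ValueError("graph names must be distinct")
    return {name + ".properties": render_properties(name, **hosts)
            for name in names}
```

Each entry of render_all(["movies", "music"]) could then be written into conf/gremlin-server/ under its key as the file name.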

Please refer to the Chapter 12 — Configuration reference in the Titan documentation for further information on how to configure storage and indexing backends.

Configure Gremlin Server to initialize the graphs at launch

The next step consists in editing the Gremlin server configuration file, located in conf/gremlin-server/gremlin-server.yaml, to point to each of the graph .properties files. This configuration file defines a graphs property as a list of graphs with their corresponding .properties file. An example configuration for two graphs could be:

graphs: {
movies: conf/gremlin-server/movies.properties,
music: conf/gremlin-server/music.properties
}

This will expose two graphs respectively referenced by the movies and music variables within the Gremlin script execution context. Then again, the variable names are not required to match the names of the graphs as defined in the .properties file, but we’ll do so for simplicity.

Reference each graph Traversal object in the Gremlin Server .groovy bootstrap script

After exposing your graphs as movies and music variables, you’re almost done. You must now update the Gremlin Server bootstrap script located in scripts/empty-sample.groovy to define references to each graph’s Traversal object (the path to this script is also defined in the gremlin-server.yaml file and can be edited). Because we no longer expose a graph variable, but movies and music graph variables instead, the empty-sample.groovy file needs to define:

mo = movies.traversal()
mu = music.traversal()

So the complete empty-sample.groovy file should look something like this:

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
    onStartUp: { ctx ->
        ctx.logger.info("Executed once at startup of Gremlin Server.")
    },
    onShutDown: { ctx ->
        ctx.logger.info("Executed once at shutdown of Gremlin Server.")
    }
] as LifeCycleHook]

// bind one TraversalSource per graph: "mo" for movies and "mu" for music.
// there is no "graph" variable any more, so the default "g" binding is dropped.
globals << [mo : movies.traversal(), mu : music.traversal()]

Since TinkerPop3, graph traversals are no longer issued via a Graph instance. The default empty-sample.groovy script mimics the old TinkerPop 2.x behavior where a graph traversal would typically start with g. Because we now have two graphs, we must bind each graph’s Traversal object to distinct variables. Let’s call these mo and mu. The above initialization script will allow you to execute graph traversals such as mo.V() for the movies graph or mu.V() for the music graph, as defined in the gremlin-server.yaml file.

Putting this into practice: interacting with multiple graphs within the same Gremlin query

A nice side effect of this approach is that you can now query multiple graphs within the same Gremlin query. You could then easily set up simple scripts for migrating moderately sized graphs from one graph database implementation to another.


This example is not limited to Titan graph database and can be tweaked to serve multiple graphs from a combination of any other graph databases implementing the TinkerPop framework such as JanusGraph, ArangoDB, OrientDB or Neo4j.

ps: This is a bunch of information I saved over time in my TiddlyWiki.


Install the Scoop package manager for Windows

Scoop is a command-line package manager for Windows that lets you easily install the tools you need. Open a PowerShell terminal and run:

set-executionpolicy remotesigned -s cu

iex (new-object net.webclient).downloadstring('https://get.scoop.sh')

You can install tools like cURL, Node.js, etc. using Scoop:

scoop install nodejs
scoop install curl

Pretty Print JSON from terminal

Make sure Python is installed, then run:

cat file.json | python -m json.tool > pretty_file.json

Python comes built in with a JSON encoding/decoding library, and you can use it to your advantage to get nice formatted output. Alternatively, if you are receiving JSON from an API or HTTP request, you can pipe your results from a curl call directly into this tool as well.
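The same formatting is available from inside a Python program. The helper below is a minimal sketch that re-indents a JSON string the way python -m json.tool does by default (four-space indentation); the name pretty is our own, and sort_keys mirrors json.tool’s --sort-keys option.

```python
import json


def pretty(text, sort_keys=False):
    """Re-indent a JSON string with 4 spaces, like `python -m json.tool`.

    Raises ValueError (json.JSONDecodeError) if `text` is not valid JSON.
    """
    return json.dumps(json.loads(text), indent=4, sort_keys=sort_keys)
```

Like the command-line tool, it refuses invalid input instead of guessing, which makes it a quick validity check as well.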


Creating a sitemap for your WordPress blog

Sitemaps expedite search engine indexing by providing search engine robots with a detailed map of your site. Instead of having to crawl your site by following internal links to all your content, the crawler can instantly know where every public page on your website is located.

While by no means a magic bullet for SEO, sitemaps will improve the indexing of your site. And that means it’s more likely that all your posts and pages get included in search results.

They also allow you to provide search engines with optional information like when a page was last updated, how often a page changes, and how important a page is. This information can further help search engines optimize how they crawl your site.
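Under the hood, an XML sitemap is a small document defined by the sitemaps.org protocol: a urlset element holding one url entry per page, each with a required loc and the optional lastmod, changefreq and priority fields described above. The plugins below generate this for you; purely to show the format, here is a minimal Python sketch using the standard xml.etree.ElementTree module, with example.com URLs as placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return sitemap XML for `pages`, a list of dicts.

    Each dict needs a "loc" (the page URL) and may carry any of the optional
    "lastmod", "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    # ElementTree escapes characters such as "&" in URLs automatically.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "https://example.com/", "changefreq": "daily", "priority": 1.0},
        {"loc": "https://example.com/about/", "lastmod": "2016-10-09"},
    ]))
```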

XML vs HTML Sitemaps

XML sitemaps are the most common implementation. They’re exactly what I discussed above – a map dedicated almost entirely to search engines. Their data isn’t really useful to humans, so the only reason to create one is to boost your indexing.

HTML sitemaps, on the other hand, can be used by both humans and search engines. It’s an actual page on your website where humans and search engines can get a high-level overview of all the content your site offers. Search engines can still crawl this page, but it also gives some curious visitors a better user experience.

So which type should you use?

The answer is BOTH. It’s not an either/or question: they don’t conflict, and both types offer benefits. If you don’t believe me, check out former Google SEO guru Matt Cutts explaining why you should include both XML and HTML sitemaps.

XML Sitemap – plugins

Given the popularity of SEO plugins, there’s a good chance you already have all the functionality you need to create an XML sitemap. If you use Jetpack, Yoast SEO, All in One SEO, Google XML Sitemaps, or SEOPressor, you just need to find the relevant plugin setting to set up your XML sitemap.

1. Jetpack

Jetpack allows you to generate such files thanks to the Sitemaps module.

If you already have the plugin activated & running, ensure you are using the latest version of Jetpack.

Now, head to Jetpack > Settings > Engagement, look for the module called Sitemaps, and activate it.

Unlike other sitemap plugins for WordPress, Jetpack gives you nothing to configure here. That is a limitation for big WordPress sites, but for a new or medium-sized blog you don’t need to configure anything apart from submitting your sitemap to search engines.

Once you activate the module, Jetpack will generate two different sitemaps for you: a sitemap listing all your public posts and pages, and a News sitemap built specifically for Google News:

  • Normal sitemap: yoursitename.com/sitemap.xml
  • News Sitemap: yoursitename.com/news-sitemap.xml

2. Yoast SEO

With Yoast SEO, you just need to navigate to SEO → XML Sitemaps to enable and configure it.

One nice feature of Yoast SEO’s sitemap tool is the ability to include media attachments in your XML sitemap.

When enabled, this can boost your traffic from Image Search by increasing the indexing of your media uploads.

404 Errors:

How do I know if I need to add the redirect rules?

The Yoast SEO XML sitemap URL uses a pretty permalink of example.com/sitemap_index.xml, but behind the scenes this URL also has a non-pretty permalink of example.com/?sitemap=1. If the pretty permalink returns a 404 error but you can load the sitemap using the non-pretty permalink, your server is not set up to redirect, and you’ll need to add redirect rules.

  • Using NGINX:
    In some cases, you may need to add server-level redirects if you receive an NGINX server error or the wrong page when loading the XML sitemaps. Basic code:
    # Rewrites for Yoast SEO XML Sitemap
    rewrite ^/sitemap_index.xml$ /index.php?sitemap=1 last;
    rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;

    Expanded code:
    #Yoast SEO Sitemaps
    location ~ ([^/]*)sitemap(.*).x(m|s)l$ {
    ## this redirects sitemap.xml to /sitemap_index.xml
    rewrite ^/sitemap.xml$ /sitemap_index.xml permanent;
    ## this makes the XML sitemaps work
    rewrite ^/([a-z]+)?-?sitemap.xsl$ /index.php?xsl=$1 last;
    rewrite ^/sitemap_index.xml$ /index.php?sitemap=1 last;
    rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
    ## The following lines are optional for the premium extensions
    ## News SEO
    rewrite ^/news-sitemap.xml$ /index.php?sitemap=wpseo_news last;
    ## Local SEO
    rewrite ^/locations.kml$ /index.php?sitemap=wpseo_local_kml last;
    rewrite ^/geo-sitemap.xml$ /index.php?sitemap=wpseo_local last;
    ## Video SEO
    rewrite ^/video-sitemap.xsl$ /index.php?xsl=video last;
    }
  • Using Apache:

    Open your .htaccess file and add the following code before the main WordPress rewrite rules:
    # Yoast SEO - XML Sitemap Rewrite Fix
    RewriteEngine On
    RewriteBase /
    RewriteRule ^sitemap_index.xml$ /index.php?sitemap=1 [L]
    RewriteRule ^locations.kml$ /index.php?sitemap=wpseo_local_kml [L]
    RewriteRule ^geo_sitemap.xml$ /index.php?sitemap=geo [L]
    RewriteRule ^([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 [L]
    RewriteRule ^([a-z]+)?-?sitemap.xsl$ /index.php?xsl=$1 [L]
    # END Yoast SEO - XML Sitemap Rewrite Fix
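These rules are ordinary regular expressions, so you can check what they match before touching your server. The sketch below runs the ^/([^/]+?)-sitemap([0-9]+)?.xml$ rule from the NGINX basic code through Python’s re module, which treats these particular constructs the same way; the rewrite function is a hypothetical test helper, not part of Yoast SEO.

```python
import re

# The Yoast rule from the NGINX basic code above:
#   rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
SITEMAP_RULE = re.compile(r"^/([^/]+?)-sitemap([0-9]+)?.xml$")


def rewrite(path):
    """Return what the rule rewrites `path` to, or None if it does not match."""
    m = SITEMAP_RULE.match(path)
    if m is None:
        return None
    # A group that did not take part in the match ($2 when there is no page
    # number) is substituted as an empty string.
    return "/index.php?sitemap=%s&sitemap_n=%s" % (m.group(1), m.group(2) or "")


for p in ["/post-sitemap.xml", "/post-sitemap2.xml", "/sitemap_index.xml"]:
    print(p, "->", rewrite(p))
```

Note that the unescaped . before xml matches any character, so /post-sitemapXxml is caught too; that is harmless here, but writing it as \. would make the rule stricter.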

3. All in One SEO

With All in One SEO plugin installed, just navigate to All in One SEO → XML Sitemap:

4. Google XML Sitemaps

After you install the plugin, you can configure it by going to Settings → XML-Sitemap:

Here’s what you’ll definitely want to configure on that page:

  • Post priority: Set how you want to calculate posts’ crawling priorities. You can have the plugin automatically calculate priority by the number of comments, or manually set the priority later on.
  • Sitemap content: Choose what types of content get included in your sitemap. For example, if you want to exclude category archives, you just need to uncheck that box.
  • Change frequencies: Set how often each type of content gets changed. This gives search engines an idea of how to prioritize their crawling. For example, you’ll definitely want to set the page which displays your recent posts to be crawled daily.
  • Priorities: Lets you set manual crawling priorities for different content. You definitely want your homepage and posts page (if different) to be high priority.

Tell Search Engines About Your XML Sitemap

Now that you’ve created your sitemaps, there’s one more thing you need to do:

Tell search engines exactly where they can find it.

By showing search engines where you keep your sitemap, you ensure they find it, which means they’ll know whenever you publish changes to it and your site.

To submit your sitemap to Google, you’ll need to sign up at Google Search Console and follow their instructions for submitting a sitemap.

The process is quite similar for submitting your sitemap to Bing. You’ll need to sign up at Bing Webmaster Tools and then submit your sitemap by following their directions.


Reading an mbox file with Thunderbird

I’ll spare you the Thunderbird installation and first-time setup instructions.

1. Download and launch Thunderbird

http://www.mozilla.org/en-US/thunderbird/

2. Mail Account Setup

https://support.mozilla.org/en-US/products/thunderbird/emails-thunderbird

3. Find your Thunderbird “Local Folders” Directory

Click on “Local Folders” and then “View Settings for this account” to see where it is looking for local mail folders. Something like this:

C:\Users\harsha\AppData\Roaming\Thunderbird\Profiles\\Mail\Local Folders

(You may have to allow “Show hidden files, folders, and drives”) 

On a Mac:

/Users/harsha/Library/Thunderbird/Profiles//Mail/Local Folders

4. Drop the .mbox file into Local Folders

Quit Thunderbird. Navigate to your “Local Folders” directory. Drag and drop your mbox file in there. You may change the name to something appropriate (something like mbox-harsha-20161009-gmail).
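If Thunderbird does not show the folder later, it is worth checking that the file really is in mbox format. Python’s standard mailbox module can read it; the sketch below (the function name is our own) lists the Subject header of each message, and an empty list suggests the file has no “From ” separator lines and is not a usable mbox.

```python
import mailbox


def list_subjects(path):
    """Return the Subject header of every message in the mbox file at `path`.

    create=False makes a missing file raise mailbox.NoSuchMailboxError
    instead of silently creating an empty mailbox.
    """
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["subject"] for msg in box]
    finally:
        box.close()
```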

5. Browse mbox

Restart Thunderbird, and you should see the mailbox in your Local Folders list.

Feel free to leave comments.