Friday, 19 September 2014

PyMongo almost 2x faster with PyPy

  1. Install PyPy and MongoDB with pacman on Arch Linux based distros:

    $ sudo pacman -Sy
    $ sudo pacman -S pypy mongodb
  2. Create a new virtualenv (see the virtualenvwrapper howto below) and install PyMongo:

    $ mkvirtualenv testPyPy --python=/usr/bin/pypy
    $ pip install pymongo
  3. Start MongoDB:

    $ mkdir /home/mictadlo/databases/
    $ mongod --dbpath /home/mictadlo/databases/
  4. I found a PyMongo benchmark script online:

    import sys
    import pymongo
    import time
    import random
    
    from datetime import datetime
    
    min_date = datetime(2012, 1, 1)
    max_date = datetime(2013, 1, 1)
    delta = (max_date - min_date).total_seconds()
    
    job_id = '1'
    
    if len(sys.argv) < 2:
        sys.exit("You must supply the item_number argument")
    elif len(sys.argv) > 2:
        job_id = sys.argv[2]   
    
    documents_number = int(sys.argv[1])
    batch_number = 5 * 1000
    
    job_name = 'Job#' + job_id
    start = datetime.now()
    
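    # obtain a mongo connection; safe=True makes every insert wait for the
    # server's acknowledgement, so the timings include the round trips
    connection = pymongo.Connection("mongodb://localhost", safe=True)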
    
    # obtain a handle to the random database
    db = connection.random
    collection = db.randomData
    
    # reusable buffer that is filled and then inserted every batch_number documents
    batch_documents = [None] * batch_number
    
    for index in range(documents_number):
        try:           
            date = datetime.fromtimestamp(time.mktime(min_date.timetuple()) + int(round(random.random() * delta)))
            value = random.random()
            document = {
                'created_on' : date,   
                'value' : value,   
            }
            batch_documents[index % batch_number] = document
            if (index + 1) % batch_number == 0:
                collection.insert(batch_documents)     
            index += 1
            if index % 100000 == 0:
                print job_name, ' inserted ', index, ' documents.'     
        except:
            print 'Unexpected error:', sys.exc_info()[0], ', for index ', index
            raise
    print job_name, ' inserted ', documents_number, ' in ', (datetime.now() - start).total_seconds(), 's'
  5. PyPy is clearly faster than CPython, as the results below show: about 1.5x in wall-clock time (464 s vs. 710 s) and more than 2x in user CPU time:

    • CPython:

      $ python --version
      Python 2.7.8
      
      $ time python test_pymongo.py 12000000
      Job#1  inserted  12000000  in  709.949361 s
      
      real    11m50.123s
      user    6m55.263s
      sys     0m48.803s
    • PyPy:

      $ python --version
      Python 2.7.6 (3cf384e86ef7, Jun 27 2014, 00:09:47)
      [PyPy 2.4.0-alpha0 with GCC 4.9.0 20140604 (prerelease)]
      
      $ time python test_pymongo.py 12000000
      Job#1  inserted  12000000  in  464.130798 s
      
      real    7m44.711s
      user    3m2.693s
      sys     0m41.667s
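
The timings above can be turned into speedup factors with a few lines of Python. This is only a small sketch: the numbers are copied from the two runs, while the helper to_seconds and the dictionary names are made up for this example.

```python
def to_seconds(minutes, seconds):
    """Convert the 'XmY.YYYs' values printed by `time` into seconds."""
    return minutes * 60 + seconds


# Numbers copied from the two runs above.
cpython = {
    'script': 709.949361,            # time reported by the script itself
    'real': to_seconds(11, 50.123),  # wall-clock time
    'user': to_seconds(6, 55.263),   # user CPU time
}
pypy = {
    'script': 464.130798,
    'real': to_seconds(7, 44.711),
    'user': to_seconds(3, 2.693),
}

# Speedup = CPython time divided by PyPy time.
speedups = {key: cpython[key] / pypy[key] for key in cpython}

for key in ('script', 'real', 'user'):
    print('%-6s %.2fx' % (key, speedups[key]))
```

The wall-clock speedup comes out at roughly 1.5x, while the user CPU time drops by a bit more than 2x; the gap between the two is time spent waiting on MongoDB, which PyPy cannot speed up.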

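One detail of the benchmark script in step 4: it only inserts when a batch of 5000 is full, so with an item count that is not a multiple of 5000 the last partial batch is never written. With 12000000 items this does not matter. Below is a minimal sketch of the same batching idea that also flushes the remainder. The function names random_documents and insert_in_batches are hypothetical, and insert_batch stands for whatever does the actual write (in the script above that would be collection.insert); here a plain list is used so the sketch runs without MongoDB.

```python
import random
import time
from datetime import datetime


def random_documents(count, min_date, max_date):
    """Yield `count` documents shaped like the ones in the benchmark."""
    start = time.mktime(min_date.timetuple())
    delta = (max_date - min_date).total_seconds()
    for _ in range(count):
        offset = int(round(random.random() * delta))
        yield {
            'created_on': datetime.fromtimestamp(start + offset),
            'value': random.random(),
        }


def insert_in_batches(documents, insert_batch, batch_size=5000):
    """Call insert_batch(list) for every full batch and once for the remainder."""
    batch = []
    inserted = 0
    for document in documents:
        batch.append(document)
        if len(batch) == batch_size:
            insert_batch(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the trailing partial batch
        insert_batch(batch)
        inserted += len(batch)
    return inserted


if __name__ == '__main__':
    batches = []  # stand-in for a MongoDB collection
    docs = random_documents(12, datetime(2012, 1, 1), datetime(2013, 1, 1))
    total = insert_in_batches(docs, batches.append, batch_size=5)
    print('%d documents, batch sizes %s' % (total, [len(b) for b in batches]))
```

Passing the write function in as an argument keeps the batching logic testable without a running mongod.
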
Virtualenvwrapper makes it easier to use Python's virtualenv

  1. Install the following packages. I still need Python 2; if you want Python 3, just drop the 2 from the package names below:
    $ sudo pacman -Sy
    $ sudo pacman -S python-virtualenvwrapper python2-virtualenv python2-pip
  2. Enable virtualenvwrapper by adding the two lines below to your .bashrc:
    $ nano ~/.bashrc
    export WORKON_HOME=$HOME/.virtualenvs
    source /usr/bin/virtualenvwrapper.sh
  3. Create a virtualenv, e.g. myProject, and install a package into it:
    $ mkvirtualenv myProject --python=/usr/bin/python2
    $ pip install sphinx
  4. Other virtualenv commands:
  • $ workon myProject        # activate the virtualenv called myProject
  • $ deactivate              # deactivate the current virtualenv
  • $ rmvirtualenv myProject  # delete the virtualenv called myProject
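
After workon it is easy to lose track of which interpreter a virtualenv wraps (python2, python3 or pypy). The sketch below reports it using only the standard library. The function name interpreter_summary is made up for this example, and the virtualenv check assumes the usual markers: sys.real_prefix (set by older virtualenv releases) or a sys.base_prefix that differs from sys.prefix (newer virtualenv releases and venv).

```python
import platform
import sys


def interpreter_summary():
    """Describe the interpreter that is running this script."""
    in_virtualenv = (
        hasattr(sys, 'real_prefix')
        or getattr(sys, 'base_prefix', sys.prefix) != sys.prefix
    )
    return {
        'implementation': platform.python_implementation(),  # e.g. CPython or PyPy
        'version': platform.python_version(),
        'prefix': sys.prefix,
        'in_virtualenv': in_virtualenv,
    }


if __name__ == '__main__':
    for key, value in sorted(interpreter_summary().items()):
        print('%-15s %s' % (key, value))
```

Run it with python inside the activated virtualenv; in a virtualenv created with --python=/usr/bin/pypy the implementation line should read PyPy.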

How to install Pandoc on Arch Linux based distros

  1. Add the haskell-core repository above extra in /etc/pacman.conf:
    $ sudo nano /etc/pacman.conf
    [haskell-core]
    Server = http://xsounds.org/~haskell/core/$arch
  2. Remove alex, ghc, happy and any other Haskell packages that were installed from the extra repository
  3. Add the repository key and install Pandoc and TeX Live (needed to generate PDFs):
    $ sudo pacman-key -r 4209170B
    $ sudo pacman-key --lsign-key 4209170B
    $ sudo pacman -Scc
    $ sudo pacman -Syy
    $ sudo pacman -S haskell-pandoc haskell-pandoc-citeproc haskell-pandoc-types
    $ sudo pacman -S texlive-core 
  4. I am using Sphinx, a tool that makes it easy to create intelligent and beautiful documentation. Sphinx uses reStructuredText (rst) rather than Markdown (md), and so do I. Here is a comparison between Markdown and reStructuredText. The Try pandoc site converts between different markup formats online.
    Here is a little reStructuredText syntax:
    Title
    =====
    
    Heading 1
    ---------
    
    Heading 2
    `````````
    
    Heading 3
    '''''''''
    
    Heading 4
    .........
    
    Heading 5
    ~~~~~~~~~
    
    Heading 6
    *********
    
    Heading 7
    +++++++++
    
    Heading 8
    ^^^^^^^^^
    
    *Italic*
    
    **bold**
    See also the reStructuredText cheat sheet.
  5. Converting reStructuredText to PDF and DOCX:
    $ pandoc -V geometry:a4paper -f rst --toc --smart -o test.pdf test.rst
    $ pandoc -V geometry:a4paper -f rst --toc --smart -o test.docx test.rst
  6. Converting reStructuredText to other formats.
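
When the same file has to be converted to several formats, the pandoc calls from step 5 can be scripted. The following is a rough sketch, not a finished tool: pandoc_command and convert are hypothetical helper names, the flags are copied from step 5, and pandoc must be on the PATH for convert to work. The smart switch is there because pandoc 2.0 and later no longer accept --smart.

```python
import subprocess


def pandoc_command(source, target, smart=True):
    """Build the pandoc command line used in step 5 for one output file."""
    command = ['pandoc', '-V', 'geometry:a4paper', '-f', 'rst', '--toc']
    if smart:
        # Only valid for pandoc releases before 2.0.
        command.append('--smart')
    command += ['-o', target, source]
    return command


def convert(source, targets, smart=True):
    """Run pandoc once per target; raises CalledProcessError on failure."""
    for target in targets:
        subprocess.check_call(pandoc_command(source, target, smart))


if __name__ == '__main__':
    # Print the commands instead of running them.
    for target in ('test.pdf', 'test.docx'):
        print(' '.join(pandoc_command('test.rst', target)))
```

Calling convert('test.rst', ['test.pdf', 'test.docx']) would then produce both files in one go; pandoc picks the output format from the file extension.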