Friday, 19 September 2014

PyMongo almost 2x faster with PyPy

  1. Install PyPy and MongoDB with pacman on Arch Linux-based distros:

    $ sudo pacman -Sy
    $ sudo pacman -S pypy mongodb
  2. Create a new virtualenv (see howto) and install PyMongo:

    $ mkvirtualenv testPyPy --python=/usr/bin/pypy
    $ pip install pymongo
  3. Start MongoDB:

    $ mkdir /home/mictadlo/databases/
    $ mongod --dbpath /home/mictadlo/databases/
  4. I found a PyMongo benchmark script here and saved it as test_pymongo.py:

    import sys
    import pymongo
    import time
    import random

    from datetime import datetime

    # Random timestamps are drawn from the range [min_date, max_date].
    min_date = datetime(2012, 1, 1)
    max_date = datetime(2013, 1, 1)
    delta = (max_date - min_date).total_seconds()

    job_id = '1'

    # Usage: test_pymongo.py <documents_number> [job_id]
    if len(sys.argv) < 2:
        sys.exit("You must supply the item_number argument")
    elif len(sys.argv) > 2:
        job_id = sys.argv[2]

    documents_number = int(sys.argv[1])
    batch_number = 5 * 1000

    job_name = 'Job#' + job_id
    start = datetime.now()

    # obtain a mongo connection
    # (PyMongo 2.x API: later releases replace Connection with MongoClient
    # and Collection.insert() with insert_many())
    connection = pymongo.Connection("mongodb://localhost", safe=True)

    # obtain a handle to the random database
    db = connection.random
    collection = db.randomData

    # Fixed-size buffer; each slot is overwritten before the batch is written.
    batch_documents = [None] * batch_number

    for index in range(documents_number):
        try:
            date = datetime.fromtimestamp(time.mktime(min_date.timetuple()) + int(round(random.random() * delta)))
            value = random.random()
            document = {
                'created_on': date,
                'value': value,
            }
            batch_documents[index % batch_number] = document
            # Write the buffer to MongoDB every batch_number documents.
            if (index + 1) % batch_number == 0:
                collection.insert(batch_documents)
            if (index + 1) % 100000 == 0:
                print job_name, ' inserted ', index + 1, ' documents.'
        except:
            print 'Unexpected error:', sys.exc_info()[0], ', for index ', index
            raise

    # Write the last partial batch when documents_number is not a multiple of batch_number.
    remainder = documents_number % batch_number
    if remainder:
        collection.insert(batch_documents[:remainder])

    print job_name, ' inserted ', documents_number, ' in ', (datetime.now() - start).total_seconds(), 's'
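  5. PyPy finishes the 12 million inserts about 1.5x faster than CPython in wall-clock time (464 s vs. 710 s) and needs less than half the user CPU time (3m03s vs. 6m55s), as the results below show: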

    • CPython:

      $ python --version
      Python 2.7.8
      
      $ time python test_pymongo.py 12000000
      Job#1  inserted  12000000  in  709.949361 s
      
      real    11m50.123s
      user    6m55.263s
      sys     0m48.803s
    • PyPy:

      $ python --version
      Python 2.7.6 (3cf384e86ef7, Jun 27 2014, 00:09:47)
      [PyPy 2.4.0-alpha0 with GCC 4.9.0 20140604 (prerelease)]
      
      $ time python test_pymongo.py 12000000
      Job#1  inserted  12000000  in  464.130798 s
      
      real    7m44.711s
      user    3m2.693s
      sys     0m41.667s
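
As a rough check on these figures, the speedup ratios can be recomputed from the timings above. This is only a minimal sketch: the numbers are copied by hand from the two runs, and `to_seconds` is a hypothetical helper that is not part of the benchmark script.

```python
from __future__ import division, print_function


def to_seconds(minutes, seconds):
    """Convert a reading such as 6m55.263s (hypothetical helper) to seconds."""
    return minutes * 60 + seconds


# Figures copied from the CPython and PyPy runs above.
cpython = {"elapsed": 709.949361, "user": to_seconds(6, 55.263)}
pypy = {"elapsed": 464.130798, "user": to_seconds(3, 2.693)}

for metric in ("elapsed", "user"):
    # Speedup = CPython time divided by PyPy time for the same metric.
    speedup = cpython[metric] / pypy[metric]
    print("{0}: {1:.2f}x".format(metric, speedup))
```

The "elapsed" ratio includes time spent waiting on mongod, while the "user" ratio only counts CPU time spent in the Python process, which is where PyPy's JIT can help.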
