Huffman Coding in Python

Program will take the name of the file (with path is also fine), and will output the compressed file in the directory of the program. This program also shows how to write file byte-wise required for compression.

################################
# Name: Dhruv Pancholi
# Filename: huffman.py
# Input Format: filename
################################

from heapq import *

class Node(object):
	def __init__(self, freq, left = None, right = None, char = None):
		super(Node, self).__init__()
		self.freq = freq
		self.left = left
		self.right = right
		self.char = char
		self.code = [0, 0]
	def __lt__(self, other):
		return self.freq < other.freq
	def __gt__(self, other):
		return self.freq > other.freq
	def __eq__(self, other):
		if other == None:
			return False
		return self.freq == other.freq

def calcFrequency(filename):
	filep = open(filename, 'r')
	freq = {}
	for line in filep:
		for c in line:
			if c not in freq:
				freq[c] = 1
			else:
				freq[c] += 1
	return freq

def huffman(C):
	n = len(C)
	for i in xrange(n-1):
		z = Node(0)
		z.left = heappop(C)
		z.right = heappop(C)
		z.freq = z.left.freq + z.right.freq
		heappush(C, z)
	return heappop(C)

def generateHuffman(root, code, height, code_dict):
	if root==None:
		return
	root.code[0] = code>>1
	root.code[1] = height
	if root.char != None:
		code_dict[root.char] = root.code
	generateHuffman(root.left, code<<1, height+1, code_dict)
	generateHuffman(root.right, (code+1)<<1, height+1, code_dict)

def generateHuffman1(root, code, height, code_dict):
	if root==None:
		return
	root.code = code
	if root.char != None:
		code_dict[root.char] = root.code
	generateHuffman1(root.left, code+'0', height+1, code_dict)
	generateHuffman1(root.right, code+'1', height+1, code_dict)

def calcHuffman(filename):
	code_dict = {}
	freq = calcFrequency(filename)	
	heap = []
	for c in freq:
		heappush(heap, Node(freq[c], char = c))
	root = huffman(heap)
	generateHuffman1(root, '', 0, code_dict)
	return code_dict

def main():
	filename = raw_input('Enter the name of the file to be compressed: ')
	huffman = calcHuffman(filename)
	filep = open(filename, 'r')
	s=''
	for line in filep:
		for c in line:
			s += huffman[c]
	for i in xrange(8-len(s)%8):
		s+='0'
	packed_data = ''.join(chr(int(s[i:i+8], 2)) for i in xrange(0, len(s), 8))
	fileo = open('output.txt', 'w')
	fileo.write(packed_data)
	filep.close()
	fileo.close()
	import os
	print 'Name of the compressed file: output.txt'
	print "Compression ratio: %.6f"%(float(os.stat('output.txt').st_size)/float(os.stat(filename).st_size))

if __name__ == '__main__':
	main()

Heap Sort list of objects in Python

Here, we are consider a list of list objects which are supposed to be sorted based on one of its elements. Consider the list as shown below:

l=[[150,"Batman","DC"],[300,"Superman","DC"],[220,"Ironman","Marvel"],
[280,"Thor","Marvel"],[250,"Hulk","Marvel"]]

The above list consists of lists which in turn contains the rating of superhero, name of the super hero and it rights holder. Let the position of rating be the key of the nested list. So the aim is the sort the list based on the given set of keys(here power rating). The position of power rating is 0.

The generic heap sort algorithm is designed to sort the list of integer or float values, so a bit of tweaking is required to deal with this problem.

#Creating a function for sorting the list
l=[[150,"Batman","DC"],[300,"Superman","DC"],[220,"Ironman","Marvel"],
[280,"Thor","Marvel"],[250,"Hulk","Marvel"],]
key=0

About heaps

Heap data structure is an list object which can be viewed as a binary tree, which contains one added attribute heap-size. Here in this can we won’t consider heap-size and will just take heap-size equal to the size of the list. As it is a binary tree it contains a parent, a left child and a right child. The corresponding functions to find out the index is as follows:

def parent(i):
	return i/2-1

def left(i):
	return 2*i+1

def right(i):
	return 2*i+2

Implementation

There are two types of heaps max-heap and min-heap, in this case we will consider only max-heap. The property of the max-heap which needs to be satisfied is weight(A,parent(i))>weight(A,i). We define the weight function as follows:

def weight(A,i):
	return A[i][key]

min-heap is just opposite of the above condition. Some of the important procedures used in the context of Heap sort are as follows:

  • max_heapify
  • build_max_heap
  • heapsort

Max Heapify procedure

def max_heapify(A,i):
	l=left(i)
	r=right(i)
	if l<len(A) and weight(A,l)>weight(A,i):
		largest=l
	else:
		largest=i
	if r<len(A) and weight(A,r)>weight(A,largest):
		largest=r
	if largest!=i:
		temp=A[i]
		A[i]=A[largest]
		A[largest]=temp
		max_heapify(A,largest)

max_heapify procedure is called on the root of the sub-tree which needs to be converted in to max-heap.

Build Max Heap Procedure

def build_max_heap(A):
	#	Going from level above the leaf to the root
	for x in range(0,len(A)/2+1):
		i=len(A)/2-x
		max_heapify(A,i)

build_max_heap takes the list as the argument and makes necessary changes to the list to convert it into max-heap.

Heap Sort

def heapsort(A):
	build_max_heap(A)
	result=[]
	i=len(A)-1
	while len(A)!=1:
		temp=A[i]
		A[i]=A[key]
		A[key]=temp
		result.append(A[i])
		A.remove(A[i])
		max_heapify(A,key)
		i=i-1
	result.append(A[key])
	return result

This is the main function to which the list is passed and it calls all other functions in order to sort the given list, based on the key(index) required.

Complete Code

l=[[150,"Batman","DC"],[300,"Superman","DC"],
[220,"Ironman","Marvel"],[280,"Thor","Marvel"],[250,"Hulk","Marvel"]]

key=0

def parent(i):
	return i/2-1

def left(i):
	return 2*i+1

def right(i):
	return 2*i+2

def weight(A,i):
	return A[i][key]

def max_heapify(A,i):
	l=left(i)
	r=right(i)
	if l<len(A) and weight(A,l)>weight(A,i):
		largest=l
	else:
		largest=i
	if r<len(A) and weight(A,r)>weight(A,largest):
		largest=r
	if largest!=i:
		temp=A[i]
		A[i]=A[largest]
		A[largest]=temp
		max_heapify(A,largest)

def build_max_heap(A):
	#	Going from level above the leaf to the root
	for x in range(0,len(A)/2+1):
		i=len(A)/2-x
		max_heapify(A,i)

def heapsort(A):
	build_max_heap(A)
	result=[]
	i=len(A)-1
	while len(A)!=1:
		temp=A[i]
		A[i]=A[key]
		A[key]=temp
		result.append(A[i])
		A.remove(A[i])
		max_heapify(A,key)
		i=i-1
	result.append(A[key])
	return result
l=heapsort(l)

Setting up DJango Server in 5 minutes

Well the amount of time taken depends on the Internet speed. This tutorial assumes that you are working on Linux(Debian/Ubuntu), if not I recommend you Install one of those distros on dual-boot or on VMware/Oracle VM VirtualBox.

Install python pip, which is a tool for installing and managing python packages. Pop out the terminal and start inputing the following code.

sudo apt-get install python-pip

Installing DJango

sudo pip install django

To check for the correct installation

python -c "import django;print django.get_version()"

At this point Django supports three database engines

    • PostgreSQL
    • SQLite 3
    • MySQL

We will focus on installing SQLite and MySQL

Setting up SQLite

If you’re using a Python version over 2.5, you already have SQLite. If you’re working with Python 2.4 or older, you’ll need SQLite 3— not version 2. I assume that you are using Python version over 2.5.

sudo apt-get install python-pysqlite2

To check for the correct installation

python -c "import sqlite3;print sqlite3.version"

Setting up MySQLdb

This is a bit tricky, as many errors will pop out during this installation. The following steps worked out fine for me

Installing MySQL server
sudo apt-get install mysql-server
sudo apt-get install libmysqlclient-dev
Installing MySQLdb
easy_install -U distribute
sudo apt-get install python-mysqldb

To check for the  correct installation

python -c "import MySQLdb;print MySQLdb.version_info"

Starting the server

Pop out another terminal and change to the directory in which you want to start your project, in my case the name of the project is mysite and the directory in which I am going to have my projects is workspace

cd ~
mkdir workspace

After creating the directory, call out django-admin.py to start the django project, which will create the basic project structure

cd workspace
django-admin.py startproject mysite
cd mysite
python manage.py runserver 0.0.0.0:8000

If there are not any errors, then the output should look something like this

0 errors found
December 11, 2013 - 17:51:52
Django version 1.6, using settings 'mysite.settings'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.

Here 0.0.0.0 is for in case you want to access the webapp from a different machine, whereas 8000 is the port on which the webapp is accessible. localhost will also work with the above IP, port can be changed according to your convenience. If you want to access this webapp from other machine go to web browser and type

your_ip:8000

In case you want to make it accessible only to your machine, then change it to 127.0.0.1:8000 and type in the browser

127.0.0.1:8000 or localhost:8000