Benchmarking Python String Replacemnt

Jan 15, 2014

Python offers several ways of string replacement. But which one is the most efficient ? I set up a little experiment to find out. A file containing some 30,000 characters were generated using this lipsum generator tool. Then a few Python scripts were written to try out different string replacement strategies. The goal is to find every occurance of “o” and replace them with “0?.

The most straight forward way is to use the replace function.

import time

t0 = time.time()
print t0

fin = open("lipsum5000.txt", "r")
fout = open("lipsum5000.out", "w")
for line in fin :
    line = line.replace("o", "0") # replacing "o" with "0"
    fout.write(line)

t1 = time.time()
print t1

print "time difference : {0}".format(t1-t0)

A more sophisticated way would be to use regular expressions. The re module has a function called sub for matching patterns using regular expressions and substituting them with something else.

import time
import re # for regular expressions

t0 = time.time()
print t0

fin = open("lipsum5000.txt", "r")
fout = open("lipsum5000.out", "w")
for line in fin :
    line = re.sub("o", "0", line) # replacing "o" with "0"
    fout.write(line)

t1 = time.time()
print t1

print "time difference : {0}".format(t1-t0)

Python supports functional programming, that can be tried out as well.

import time

t0 = time.time()
print t0

fin = open("lipsum5000.txt", "r")
fout = open("lipsum5000.out", "w")
for line in fin :

    l = map( lambda y: y if y!='o' else '0' , list(line) )  # the lambda function replaces "o" with "0"
    fout.write("".join(l))

t1 = time.time()
print t1

print "time difference : {0}".format(t1-t0)

Note that list lines are split into lists here because the map function takes a list as its second argument. But the splitting of a string into a list could introduce significant overhead. This can be confirmed by comparing with another approach to splitting the string into a list. By list comprehension, for example.

import time

t0 = time.time()
print t0

fin = open("lipsum5000.txt", "r")
fout = open("lipsum5000.out", "w")
for line in fin :

    l = map( lambda y: y!='o' or 0 , [x for x in line] ) # list comprehension to split a string into a list
    fout.write(line)

t1 = time.time()
print t1

print "time difference : {0}".format(t1-t0)

List comprehension is actually one of the cool features of Python. The replacement act itself can be performed by list comprehension.

import time

t0 = time.time()
print t0

fin = open("lipsum5000.txt", "r")
fout = open("lipsum5000.out", "w")
for line in fin :

    l = [x if x!='o' else '0' for x in line]
    fout.write("".join(l))

t1 = time.time()
print t1

print "time difference : {0}".format(t1-t0)

Each of these scripts were ran 10 thousand times and the running time was recorded. Then the average running time with standard deviation was plotted using Matplotlib. The straight forward replace function appears to be the fastest one, as expected. But what I find interesting is that functional programming and list comprehension are comparable in terms of average running time. This can be the starting point of another benchmarking experiment in the future.

chart