Let's see how well the I/O layer of some of the most popular
scripting languages performs. The exercise consists of sequentially
reading several large input files and splitting each one into smaller files.
The languages under consideration are:
Ruby 1.8.6 p287 (2008-08-11) [i386-mswin32]
Ruby 1.8.7 p334 (2011-02-18) [i386-mingw32]
Ruby 1.9.2 p180 (2011-02-18) [i386-mingw32]
jruby 1.5.1 (ruby 1.8.7 patch 249) (Java HotSpot(TM) Client VM 1.6.0_14) [x86-java]
jruby 1.5.1 (ruby 1.8.7 patch 249) (Java HotSpot(TM) Client VM 1.6.0_24) [x86-java]
jruby 1.6.1 (ruby-1.8.7-p330) (Java HotSpot(TM) Client VM 1.6.0_24) [Windows XP-x86-java]
IronRuby 1.1.0.0 on .NET 4.0.30319.225
Python 2.6.2
Python 2.7.1
Python 3.2.0
PHP 5.3.6 VC9 Non Thread Safe
Lua 5.1.4 40
We start by creating the three input files needed for the test:
ruby new.rb input1.txt 185000 1799 => 330 MB
ruby new.rb input2.txt 500000 1799 => 880 MB
ruby new.rb input3.txt 1500000 1799 => 2.6 GB
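The sizes above are rounded. As a rough cross-check, each file should hold about lines x (record size + line terminator) bytes. The sketch below assumes a two-byte CRLF terminator, which is what a text-mode write on Windows normally produces; the helper name `estimated_size_bytes` is made up for illustration and is not part of the original scripts.

```python
def estimated_size_bytes(lines, record_size, newline_bytes=2):
    """Rough file size: every record plus its line terminator.

    newline_bytes=2 assumes CRLF line endings (an assumption, not measured).
    """
    return lines * (record_size + newline_bytes)


# The three input files created above, all with 1799-character records.
for name, lines in [("input1.txt", 185000),
                    ("input2.txt", 500000),
                    ("input3.txt", 1500000)]:
    size = estimated_size_bytes(lines, 1799)
    print("%s: %d bytes (about %.2f GB)" % (name, size, size / 1e9))
```

Depending on whether a tool reports decimal or binary units, the result will differ somewhat from the rounded figures quoted above.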
These measurements were made on a PC with an Intel Core 2 Duo E7300 CPU
at 2.66 GHz, 3.25 GB of RAM, Windows XP Professional 32-bit and an
ST3250310AS Barracuda 7200.10 hard disk (SATA 3.0 Gb/s, 250 GB).
I will soon repeat them on a Windows Server 2008 R2 64-bit machine on VMware
(dual-core Xeon X7460 at 2.66 GHz, 2 GB of RAM, SCSI disks).
I defragmented the disk both before and after creating the three input
files. If the timings are erratic, either the disk needs to be
defragmented or something is slowing the system down, such as
the antivirus, which must be disabled.
For every file I ran six benchmarks and, considering the poor performance
of the I/O system, dropped the three worst. Before each run, of course,
I removed the output files.
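The procedure just described (six runs, outputs removed before each run, three worst timings dropped) could be automated along these lines. This is only a sketch under assumptions: the helpers `remove_outputs`, `best_times` and `bench` are hypothetical names, and the timing here wraps the whole process, so it includes interpreter start-up, whereas the scripts in this post time themselves internally.

```python
import glob
import os
import subprocess
import time


def remove_outputs(pattern="out_*.txt"):
    """Delete the output files left over from a previous run."""
    for path in glob.glob(pattern):
        os.remove(path)


def best_times(times, keep=3):
    """Keep the `keep` fastest timings, i.e. drop the worst ones."""
    return sorted(times)[:keep]


def bench(command, runs=6, keep=3):
    """Run `command` `runs` times and return the `keep` best wall-clock times."""
    times = []
    for _ in range(runs):
        remove_outputs()                 # start every run from a clean state
        start = time.time()
        subprocess.check_call(command)   # raises if the script exits with an error
        times.append(time.time() - start)
    return best_times(times, keep)


# Example call (assumes ruby and split.rb are available in the current directory):
# kept = bench(["ruby", "split.rb", "input1.txt"])
# print("mean of kept runs: %.2f s" % (sum(kept) / len(kept)))
```

How the three remaining timings are combined is not stated here; the commented example takes their mean as one plausible choice.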
The graphs speak for themselves.
Just one comment about Ruby 1.9.2: it has obvious I/O problems, and
these results are not in line with the overall performance of the
language, which, as I have verified in previous tests, is very good.
These are the scripts that I wrote:
# Written by Marco Mastrodonato on 19/04/2011
# Script to split a file into n output files
# Example:
# ruby split.rb par1 par2
# par1 => name [default => input1.txt]
# par2 => record number that determines the number of output files [default => 1650]
strinput = ARGV[0] || 'input1.txt'
nrec_to_split = ARGV[1] ? ARGV[1].to_i : 1650
# File.exist? is available in every Ruby version tested here
unless File.exist? strinput
  puts "File #{strinput} doesn't exist!"
  exit 1
end
stroutput = "out_%03d.txt"
t1 = Time.now
puts "Ruby #{RUBY_VERSION} #{strinput} started at #{t1}, wait please..."
File.open(strinput, "r") do |f|
  nsplit = 0
  nrec = 0
  fileoutput = nil
  while line = f.gets
    # Start a new output file every nrec_to_split records
    if nrec % nrec_to_split == 0
      nsplit += 1
      fileoutput.close if fileoutput
      fileoutput = File.open(stroutput % nsplit, 'w')
    end
    fileoutput.write line
    nrec += 1
  end
  fileoutput.close if fileoutput
end
puts "Ended at #{Time.now}"
puts "Elapsed time #{Time.now - t1}"
exit 0
# Written by Marco Mastrodonato on 19/04/2011
# Script to split a file into n output files
# Example:
# python split.py par1 par2
# par1 => name [default => input1.txt]
# par2 => record number that determines the number of output files [default => 1650]
from time import time, gmtime, strftime
import sys

try:
    strinput = sys.argv[1]
except IndexError:
    strinput = 'input1.txt'
stroutput = "out_%03d.txt"
try:
    nrec_to_split = int(sys.argv[2])
except (IndexError, ValueError):
    nrec_to_split = 1650

t1 = time()
print(sys.version)
print(strftime("Started at %a, %d %b %Y %H:%M:%S +0000, wait please...", gmtime()))
nrec = 0
nsplit = 0
fileoutput = None  # no output file is open yet
fileinput = open(strinput, "r")
for line in fileinput:
    if nrec % nrec_to_split == 0:
        # Start a new output file every nrec_to_split records
        if fileoutput is not None:
            fileoutput.close()
        nsplit += 1
        fileoutput = open(stroutput % nsplit, "w")
    fileoutput.write(line)
    nrec += 1
if fileoutput is not None:
    fileoutput.close()
fileinput.close()
print(strftime("Ended at %a, %d %b %Y %H:%M:%S +0000", gmtime()))
print("Elapsed time %f" % (time() - t1))
<?php
// par1 => name [default => input1.txt]
// par2 => record number that determines the number of output files [default => 1650]
$strinput = isset($argv[1]) ? $argv[1] : 'input1.txt';
$nrec_to_split = isset($argv[2]) ? (int)$argv[2] : 1650;
$stroutput = 'out_%03d.txt';
$t1 = microtime_float();
echo "Php ".phpversion()." started at ".date('D, d M Y H:i:s T').", wait please...\n";
$nsplit = 0;
$nrec = 0;
$fileinput = fopen($strinput, "r");
$fileoutput = null;
// Reading in the loop condition avoids an extra, empty iteration at end of file
while (($buffer = fgets($fileinput)) !== false) {
    if ($nrec % $nrec_to_split == 0) {
        // Start a new output file every $nrec_to_split records
        ++$nsplit;
        if ($fileoutput) fclose($fileoutput);
        $fileoutput = fopen(sprintf($stroutput, $nsplit), 'w');
    }
    fwrite($fileoutput, $buffer);
    ++$nrec;
}
if ($fileoutput) fclose($fileoutput);
fclose($fileinput);
echo "Ended at ".date('D, d M Y H:i:s T')."\n";
echo "Elapsed time ".(microtime_float() - $t1)."\n";
function microtime_float() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
?>
--[[
Written by Marco Mastrodonato on 19/04/2011
Script to split a file into n output files
Example:
lua split.lua par1 par2
par1 => name [default => input1.txt]
par2 => record number that determines the number of output files [default => 1650]
--]]
strinput = arg and arg[1] or "input1.txt"
stroutput = "out_%03d.txt"
nrec_to_split = arg and arg[2] and tonumber(arg[2]) or 1650
local t1 = os.clock()
print(_VERSION .. " started at " .. os.date("%a, %d %b %Y %H:%M:%S +0000"))
nsplit = 0
nrec = 0
for line in io.lines(strinput) do
  if nrec % nrec_to_split == 0 then
    -- Start a new output file every nrec_to_split records
    if fileOut ~= nil then io.close(fileOut) end
    nsplit = nsplit + 1
    fileOut = io.open(string.format(stroutput, nsplit), 'w')
  end
  -- io.lines strips the line terminator, so add it back
  fileOut:write(line .. '\n')
  nrec = nrec + 1
end
-- Guard the final close: fileOut is nil when the input file is empty
if fileOut ~= nil then io.close(fileOut) end
print("Ended at " .. os.date("%a, %d %b %Y %H:%M:%S +0000"))
print(string.format("Elapsed time: %.2f\n", os.clock() - t1))
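All the split scripts follow the same rule: a new out_NNN.txt file is started every par2 records. A quick way to sanity-check a run is to compare the number of files produced with the expected count, which is the line count divided by par2, rounded up. The function name `expected_output_files` below is a hypothetical helper, not part of the benchmark.

```python
def expected_output_files(total_lines, nrec_to_split=1650):
    """Number of out_NNN.txt files a split run should create."""
    # Ceiling division: a partial last block still needs its own file.
    return (total_lines + nrec_to_split - 1) // nrec_to_split


# Line counts of the three input files, with the default of 1650 records per file.
for lines in (185000, 500000, 1500000):
    print("%d lines -> %d files" % (lines, expected_output_files(lines)))
# 185000 lines -> 113 files
# 500000 lines -> 304 files
# 1500000 lines -> 910 files
```

With the default setting even the largest input stays below 1000 files, so the three-digit out_%03d.txt pattern is enough.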
To create the files I used this simple Ruby script:
# Example:
# ruby new.rb [NAME] [LINES] [RECORD SIZE]
stroutput = ARGV[0] || 'input1.txt'
num = ARGV[1] ? ARGV[1].to_i : 185000
size = ARGV[2] ? ARGV[2].to_i : 1799
# Refuse to overwrite an existing file
if File.exist? stroutput
  puts "File #{stroutput} already exists!"
  exit 1
end
t1 = Time.now
puts "Ruby #{RUBY_VERSION} #{stroutput} started at #{t1}, wait please..."
# Every record is the same line of `size` asterisks
line = "*" * size
File.open(stroutput, "w") do |f|
  num.times do
    f.puts line
  end
end
puts "Ended at #{Time.now}"
puts "Elapsed time #{Time.now - t1}"
exit 0