This isn’t your grandpappy’s dd command

by Noah Gift, co-authored with Grig Gheorghiu

Background

The dd command is one of those ancient UNIX tools that is extremely powerful, yet at the same time, the syntax can make it feel slightly archaic. A lot of seasoned sysadmins and developers still remember the first time they saw the dd command used by a bearded wizard. He might have used it to test the disk I/O, capture a disk image, or restore it.

In some ways, dd can seem like Old Spice: only the guys over 60 use it. But the younger generation should know that dd still has some tricks up its sleeve. In this article, we’re going to put a new twist on this old favorite and show how grandpappy really does know best sometimes. The new twist is to mix dd with Python and the Google Chart API to make a UNIX 2.0 mashup tool. (“UNIX 2.0” is a play on words for what happens when you change the original behavior of a tool like dd to make it do something a bit different.)

Setup

For this article, we assume you’re running Fedora Core 8. We’re actually just renting some time from Amazon in all of these examples. To do that we allocated a 1 GB Elastic Block Store (EBS) volume from Amazon and attached it as the device /dev/sdd to an EC2 instance launched from a Fedora Core 8 Amazon Machine Image (AMI). Learn more about using Amazon Cloud Computing with Red Hat.

Using dd for disk benchmarking with Google Charts API and Python

We benchmarked the throughput of the disk by running the dd command with various block sizes from 128 KB to 1 MB. (Note: If you want to run the script on your own machine, make sure that the volume you use doesn’t contain any valuable data, because the data will be erased by the dd command. Remember, data loss makes grandpappy mad!)

For the benchmark, we wrote a Python script that uses the commands module to run and capture the output of the dd command. The script also uses the csv module to generate a comma-separated values file so that we can graph the results later. For this example, we chose to graph the results using the Google Chart API.

#!/usr/bin/env python
import commands
import re
import csv
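
The commands module exists only in Python 2. As a hedged sketch of the same capture step for Python 3, the snippet below uses subprocess instead and merges stderr into stdout, since dd prints its transfer statistics on stderr. The helper names dd_command and get_output are hypothetical and are not part of the article's script.

```python
import subprocess
import sys


def dd_command(device, blocksize_kb):
    """Build the dd argument list used in this article (hypothetical helper)."""
    return ["dd", "if=/dev/zero", "of=%s" % device, "bs=%dk" % blocksize_kb]


def get_output(argv):
    """Run argv and return stdout and stderr merged into one string.

    This mirrors what commands.getoutput did in Python 2: dd writes its
    summary to stderr, so stderr is redirected into stdout.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # Harmless demonstration: a child Python process that writes to stderr.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('captured\\n')"]
    print(get_output(demo))
    # Only print the dd command; running it would erase the target device.
    print(" ".join(dd_command("/dev/sdd", 128)))
```

Passing the command as a list avoids shell quoting issues with the device name, which is a design choice of this sketch rather than something the original script does.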

Next we define the script’s central function, get_disk_throughput, which takes a device name and a block size (in KB) as parameters, and returns the throughput measured with dd and the unit of measure (e.g. MB/s). Because the dd command is given no count, it writes zeros until the device is full. We use the regular expression module (re) to isolate the throughput value and unit of measure from the output of the dd command.

Editor’s note: In the code below, unit = "" has been added since the article was posted.

def get_disk_throughput(device, blocksize):
    blocksize = str(blocksize) + 'k'
    cmd = "dd if=/dev/zero of=%s bs=%s" % (device, blocksize)
    output = commands.getoutput(cmd)
    throughput = 0
    unit = ""
    for line in output.split('\n'):
        s = re.search(' copied,.*, (\S+) (\S+)$', line)
        if s:
            throughput = s.group(1)
            unit = s.group(2)
            break
    return (throughput, unit)
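
To see what the regular expression extracts, here is a small self-contained sketch written for Python 3. The sample text imitates the summary that GNU dd prints; its record counts, byte count, and elapsed time are made up for illustration, and parse_dd_summary is a hypothetical name.

```python
import re

# Same pattern as in get_disk_throughput, written as a raw string.
SUMMARY_RE = re.compile(r' copied,.*, (\S+) (\S+)$')


def parse_dd_summary(output):
    """Return (throughput, unit) from dd output, or ("0", "") if absent."""
    for line in output.split('\n'):
        match = SUMMARY_RE.search(line)
        if match:
            return match.group(1), match.group(2)
    return "0", ""


if __name__ == "__main__":
    # Illustrative output only; these numbers are not from a real run.
    sample = (
        "8192+0 records in\n"
        "8192+0 records out\n"
        "1073741824 bytes (1.1 GB) copied, 17.1 s, 62.8 MB/s"
    )
    print(parse_dd_summary(sample))
```

The sketch falls back to the string "0" rather than the integer 0, so that code which later concatenates the throughput with "," does not fail when no summary line is found.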

Here is the portion of the script that is executed when it’s run from the command line. We open a csv file and associate it with a csv writer. We then use the writerow method of the writer to append the header and each data row. We iterate over the list of block sizes and call the get_disk_throughput function for each block size.
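
The csv steps described above can be sketched on their own. This is a Python 3 illustration, not the article's script: write_throughput_csv is a hypothetical helper, and it accepts any file-like object so it can be tried without running dd.

```python
import csv
import io


def write_throughput_csv(fileobj, rows):
    """Write the header plus one (block size, throughput) row per entry."""
    writer = csv.writer(fileobj)
    writer.writerow(('Block size (KB)', 'Throughput'))
    for blocksize, throughput in rows:
        writer.writerow((blocksize, throughput))


if __name__ == "__main__":
    buf = io.StringIO()
    # Values taken from the sample run shown in this article.
    write_throughput_csv(buf, [(128, '62.8'), (256, '61.8')])
    print(buf.getvalue())
```

When writing to a real file in Python 3, open it with newline='' as the csv module documentation recommends.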

We also compose the Google Chart URL by filling in the exact data values, represented by the throughput numbers that we obtain from the get_disk_throughput function. Then we print the URL to stdout. If you check the URL, you’ll see the chart generated with our data.
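
As one possible way to assemble the same URL, the Python 3 sketch below lets urllib.parse.urlencode join the parameters. The parameter names and values (cht, chtt, chs, chd, chl, chxt, chxr, chds) are the ones used in this article; the function name build_chart_url and the headroom argument are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

CHART_BASE = "http://chart.apis.google.com/chart?"


def build_chart_url(blocksizes, throughputs, headroom=10.0):
    """Compose a bar-chart URL from block sizes (KB) and throughput values."""
    # Leave some room above the tallest bar, as the article's script does.
    top = max(throughputs) + headroom
    params = [
        ("cht", "bvs"),                       # vertical bar chart
        ("chtt", "Disk throughput"),          # chart title
        ("chs", "400x250"),                   # image size in pixels
        ("chd", "t:" + ",".join(str(t) for t in throughputs)),
        ("chl", "|".join("%dk" % b for b in blocksizes)),
        ("chxt", "x,y"),                      # show both axes
        ("chxr", "1,0,%s" % top),             # y-axis range
        ("chds", "0,%s" % top),               # data scaling
    ]
    # Keep ':', ',' and '|' readable; encode the space in the title as %20.
    return CHART_BASE + urlencode(params, safe=":,|", quote_via=quote)


if __name__ == "__main__":
    # Throughput values from the sample run shown in this article.
    print(build_chart_url([128, 256, 512, 1024], [62.8, 61.8, 57.1, 56.5]))
```

Keeping the parameters in a list of pairs makes it easy to see which value belongs to which key, and leaves the escaping of the title to the library.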

For details on the Google Chart API and what each parameter to the URL represents, see the Developer’s Guide.

Editor’s note: This is an updated version of the code that originally appeared with this article. It has command-line argument processing, and it composes the Google Chart URL in a better, more self-explanatory fashion. See the end of the article for the original.

# Requires "import sys" and "from optparse import OptionParser" (see the full script).
if __name__ == "__main__":

    usage = "usage: %prog options"
    parser = OptionParser(usage=usage)
    parser.add_option("-d", "--device", dest="device",
            help="Disk device to operate on (NOTE: any data on that device will be lost)")
    (options, args) = parser.parse_args()
    device = options.device
    if not device:
        parser.print_help()
        sys.exit(1)

    try:
        f = open('disk_throughput.csv', 'w')
        writer = csv.writer(f)
        writer.writerow( ('Block size (KB)', 'Throughput') )
        blocksizes = [128, 256, 512, 1024]
        gchart_url = "http://chart.apis.google.com/chart?"
        gchart_type = "cht=bvs"
        gchart_title = "&chtt=Disk%20throughput"
        gchart_size = "&chs=400x250"
        gchart_axis_labels = "&chxt=x,y"
        gchart_data = "&chd=t:"
        gchart_labels = "&chl="
        max_t = 0.0
        for blocksize in blocksizes:
            (t, u) = get_disk_throughput(device, blocksize)
            if float(t) > max_t:
                max_t = float(t)
            writer.writerow( (blocksize, t) )
            print 'Block Size: %sk Throughput: %s %s' % (blocksize, t, u)
            gchart_data += t + ","
            gchart_labels += str(blocksize) + "k" + "|"
        gchart_data = gchart_data.rstrip(',')
        gchart_labels = gchart_labels.rstrip('|')
        gchart_axis_range = "&chxr=1,0," + str(max_t+10.0)
        gchart_scaling = "&chds=0," + str(max_t+10.0)
        gchart_url += gchart_type + gchart_title + gchart_size + gchart_data + gchart_labels
        gchart_url += gchart_axis_labels + gchart_axis_range + gchart_scaling
        print "Google Chart URL (just paste in a browser):", gchart_url
    finally:
        f.close()

Here is the output of the script in one of our runs:

Block Size: 128 Throughput: 62.8 MB/s
Block Size: 256 Throughput: 61.8 MB/s
Block Size: 512 Throughput: 57.1 MB/s
Block Size: 1024 Throughput: 56.5 MB/s

Now here is the actual image that gets created:

Full script

#!/usr/bin/env python

import sys
import commands
import re
import csv
from optparse import OptionParser

def get_disk_throughput(device, blocksize):
    blocksize = str(blocksize) + 'k'
    cmd = "dd if=/dev/zero of=%s bs=%s" % (device, blocksize)
    output = commands.getoutput(cmd)
    throughput = 0
    unit = ""
    for line in output.split('\n'):
        s = re.search(' copied,.*, (\S+) (\S+)$', line)
        if s:
            throughput = s.group(1)
            unit = s.group(2)
            break
    return (throughput, unit)

if __name__ == "__main__":

    usage = "usage: %prog options"
    parser = OptionParser(usage=usage)
    parser.add_option("-d", "--device", dest="device",
            help="Disk device to operate on (NOTE: any data on that device will be lost)")
    (options, args) = parser.parse_args()
    device = options.device
    if not device:
        parser.print_help()
        sys.exit(1)

    try:
        f = open('disk_throughput.csv', 'w')
        writer = csv.writer(f)
        writer.writerow( ('Block size (KB)', 'Throughput') )
        blocksizes = [128, 256, 512, 1024]
        gchart_url = "http://chart.apis.google.com/chart?"
        gchart_type = "cht=bvs"
        gchart_title = "&chtt=Disk%20throughput"
        gchart_size = "&chs=400x250"
        gchart_axis_labels = "&chxt=x,y"
        gchart_data = "&chd=t:"
        gchart_labels = "&chl="
        max_t = 0.0
        for blocksize in blocksizes:
            (t, u) = get_disk_throughput(device, blocksize)
            if float(t) > max_t:
                max_t = float(t)
            writer.writerow( (blocksize, t) )
            print 'Block Size: %sk Throughput: %s %s' % (blocksize, t, u)
            gchart_data += t + ","
            gchart_labels += str(blocksize) + "k" + "|"
        gchart_data = gchart_data.rstrip(',')
        gchart_labels = gchart_labels.rstrip('|')
        gchart_axis_range = "&chxr=1,0," + str(max_t+10.0)
        gchart_scaling = "&chds=0," + str(max_t+10.0)
        gchart_url += gchart_type + gchart_title + gchart_size + gchart_data + gchart_labels
        gchart_url += gchart_axis_labels + gchart_axis_range + gchart_scaling
        print "Google Chart URL (just paste in a browser):", gchart_url
    finally:
        f.close()

Summary

In this article we shattered the myth that you must be 60, have a massive grey beard, and have worked at Bell Labs to use the dd command. Even for a newer generation, dd can be used in some inventive ways. We combined Python, the Google Chart API, and Red Hat on Amazon’s cloud computing infrastructure to create a novel way to measure and chart disk I/O and performance. Go celebrate by buying yourself a bottle of Old Spice.

References

Python: http://www.python.org/
dd example scripts: http://tldp.org/LDP/abs/html/extmisc.html
Google Chart API: http://code.google.com/apis/chart/

Original code

if __name__ == "__main__":

    try:
        f = open('disk_throughput.csv', 'w')
        writer = csv.writer(f)
        writer.writerow( ('Block size (KB)', 'Throughput') )
        device = '/dev/sdd'
        blocksizes = [128, 256, 512, 1024]
        google_chart_url = "http://chart.apis.google.com/chart?cht=bvs&chd=t:"
        google_chart_data = ""
        google_chart_labels = ""
        max_t = 0.0
        for blocksize in blocksizes:
            (t, u) = get_disk_throughput(device, blocksize)
            if float(t) > max_t:
                max_t = float(t)
            writer.writerow( (blocksize, t) )
            print 'Block Size: %s Throughput: %s %s' % (blocksize, t, u)
            google_chart_data += t + ","
            google_chart_labels += str(blocksize) + "k" + "|"
        google_chart_data = google_chart_data.rstrip(',')
        google_chart_labels = google_chart_labels.rstrip('|')
        google_chart_url += google_chart_data +"&chl=" + google_chart_labels
        google_chart_url += "&chtt=Disk%20throughput" +"&chs=400x250&chxt=x,y"
        google_chart_url += "&chxr=1,0,%s&chds=0,%s" % (str(max_t+10.0), str(max_t+10.0))
        print google_chart_url
    finally:
        f.close()

Authors

Noah Gift is the co-author of Python For Unix and Linux by O’Reilly, and Google App Engine in Action by Manning. He is an author, speaker, consultant, and community leader, writing for publications such as IBM developerWorks, Red Hat Magazine, O’Reilly, Manning, and MacTech. He has a master’s degree in CIS from Cal State Los Angeles, a B.S. in nutritional science from Cal Poly San Luis Obispo, and is an Apple and LPI certified sys admin. He’s worked at companies that include Caltech, Disney Feature Animation, Sony Imageworks, Turner Studios, and most recently, Weta Digital.

Grig Gheorghiu is the director of technology for RIS Technology, a web hosting company based in Los Angeles. Grig has 15 years of industry experience, during which time he has worked as a programmer, research lab manager, system/network/security architect, IT consultant, and lead test engineer.

Grig is an active member of the Python and agile testing communities. He maintains a blog dedicated to agile testing, Python programming, and automated testing tools and techniques. Grig is the founder of the Southern California Python Interest Group, aka “the SoCal Piggies”. He lives in Los Angeles with his wife and two children.

9 responses to “This isn’t your grandpappy’s dd command”

  1. Jeff Schroeder says:

    Can you put the script online as a whole so we don’t have to copy / paste snippets?

    The version in this article _does not work_

    Traceback (most recent call last):
      File "foo.py", line 34, in <module>
        (t, u) = get_disk_throughput(device, blocksize)
      File "foo.py", line 19, in get_disk_throughput
        return (throughput, unit)
    UnboundLocalError: local variable 'unit' referenced before assignment

    Copy here:
    http://pastebin.com/m7526b1ea

  2. Mike Russle says:

    There is also a Python module that does Google Charts for you without having to construct the URL yourself. It’s called pygooglechart.

  3. Ruth Suehle says:

    Jeff, we added the complete piece.

  4. Ivan Baldin says:

    Really interesting way of doing that but the script is missing few escape chars (‘\’) on lines 11 and 12.

  5. Finnbarr P. Murphy says:

    This is the correction for the regular expression:
    s = re.search(' copied,.*, (\S+) (\S+)$', line)

  6. Finnbarr P. Murphy says:

    Here is a modified script which directly calls Google Charts and displays the resulting chart in a Gnome window using pygtk+

    #!/usr/bin/env python
    #
    # FPMurphy 11/21/08 - Based on Redhat Magazine article
    #

    import sys
    import os
    import commands
    import re
    from optparse import OptionParser
    import urllib
    import urllib2
    import pygtk
    pygtk.require('2.0')
    import gtk

    class BadContentTypeException(Exception):
        # Not defined in the script as posted; added so the raise below works.
        pass

    class DisplayGraph:

        def delete_event(self, widget, event, data=None):
            return False

        def destroy(self, widget, data=None):
            gtk.main_quit()

        def __init__(self):
            self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
            self.window.connect("delete_event", self.delete_event)
            self.window.connect("destroy", self.destroy)
            self.window.set_border_width(10)
            self.window.set_position(gtk.WIN_POS_CENTER)
            self.window.set_title("Disk Throughput")
            # Load the chart image saved by GoogleChart.write, then clean up.
            pixbuf = gtk.gdk.pixbuf_new_from_file("/tmp/dd.png")
            os.remove("/tmp/dd.png")

            self.image = gtk.Image()
            self.image.set_from_pixbuf(pixbuf)
            self.image.show()
            self.window.add(self.image)
            self.window.show()

        def main(self):
            gtk.main()

    class GoogleChart:

        def __init__(self):
            self.gchart_url = "http://chart.apis.google.com/chart?"
            self.gchart_type = "cht=bvs"
            self.gchart_size = "&chs=400x250"
            self.gchart_axis_labels = "&chxt=x,y,x,y"
            self.gchart_axis_position = "&chxp=2,50|3,50"
            self.gchart_data = "&chd=t:"
            self.gchart_labels = "&chxl=0:|"
            self.gchart_title = "&chtt="
            self.gchart_bar_settings = "&chbh=30,15"

        def title(self,title):
            self.gchart_title = self.gchart_title + title

        def write(self, data, labels, max_t):
            self.gchart_data = self.gchart_data + data.rstrip(',')
            self.gchart_labels = self.gchart_labels + labels + "2:|Block%20Size|3:|Mb/s"
            self.gchart_axis_range = "&chxr=1,0," + str(max_t+10.0)
            self.gchart_scaling = "&chds=0," + str(max_t+10.0)
            self.gchart_url += self.gchart_type + self.gchart_title + self.gchart_size
            self.gchart_url += self.gchart_bar_settings + self.gchart_data + self.gchart_labels
            self.gchart_url += self.gchart_axis_labels + self.gchart_axis_position
            self.gchart_url += self.gchart_axis_range + self.gchart_scaling

            # Fetch the chart and save it where DisplayGraph expects it.
            opener = urllib2.urlopen(self.gchart_url)
            if opener.headers['content-type'] != 'image/png':
                raise BadContentTypeException('Server responded with a ' \
                    'content-type of %s' % opener.headers['content-type'])
            open("/tmp/dd.png", 'wb').write(opener.read())

    def get_disk_throughput(device, blocksize):
        blocksize = str(blocksize) + 'k'
        cmd = "dd if=/dev/zero of=%s bs=%s" % (device, blocksize)
        output = commands.getoutput(cmd)
        throughput = 0
        unit = ""
        for line in output.split('\n'):
            s = re.search(' copied,.*, (\S+) (\S+)$', line)
            if s:
                throughput = s.group(1)
                unit = s.group(2)
                break
        return (throughput, unit)

    if __name__ == "__main__":
        usage = "usage: %prog options"
        parser = OptionParser(usage=usage)
        parser.add_option("-d", "--device", dest="device",
            help="Disk device to operate on (NOTE: any data on that device will be lost)")
        (options, args) = parser.parse_args()
        device = options.device
        if not device:
            parser.print_help()
            sys.exit(1)

        max_t = 0.0
        blocksizes = [128, 256, 512, 1024, 2048, 4096, 8192]
        data=""
        labels=""
        for blocksize in blocksizes:
            (t, u) = get_disk_throughput(device, blocksize)
            if float(t) > max_t:
                max_t = float(t)
            data += str(t) + ","
            labels += str(blocksize) + "k" + "|"

        chart = GoogleChart()
        chart.title(device)
        chart.write(data, labels, max_t)

        graph = DisplayGraph()
        graph.main()

  7. Dan Yocum says:

    “I’m 37!”
    “What?”
    “I said, I’m 37. I’m not old!”

    You snarky little git who thinks that dd is only used by guys over 60 years old. Humph.

  8. Joe Smith says:

    I’ve been using dd to measure disk write speeds on 500 GB disk drives. I use this perl script to compare throughput of single disks, RAID1, RAID5, RAID1+0, etc.

    http://www.inwap.com/mybin/miscunix/?z1gb

  9. GaloAlgof says:

    emm. thank you :)