Writing Ruby Scripts That Respect Pipelines

Published on December 12, 2011 by Jesse Storimer

Pipes are arguably the most powerful concept on the command line.

With pipes you can string together small, simple commands into bigger, more useful pipelines. This is the secret sauce that goes along with the Unix philosophy of "Do one thing, and do it well". Take this as an example:

# Look for any lines mentioning 'user' in the current git diff
# and display them one page at a time using less(1).
$ git diff | grep user | less

A simple utility

Here's a Ruby script I'm calling hilong.

It's a small, simple utility that takes a file as an argument and prints each line prepended with its length. Lines longer than the maximum length (80 chars by default) are highlighted in red. Here's the simplest version of our script:

#!/usr/bin/env ruby

# A file is passed in as an argument
input = File.open(ARGV[0])

# ANSI escape codes for terminal colors
red = "\e[31m"
reset_color = "\e[0m"

maximum_line_length = 80

# For each line of the input...
input.each_line do |line|
  # Construct a string that begins with the length of this line
  # and ends with the content. The trailing newline is #chomp'ed
  # off of the content so we can control where the newline occurs
  # (#chomp leaves a final line without a newline intact).
  # The strings are joined with a tab character so that indentation
  # is preserved.
  output_line = [line.length, line.chomp].join("\t")

  if line.length > maximum_line_length
    # Turn the output to red starting at the first character.
    output_line.insert(0, red)
    # Reset the text color back to what it was at the end of the
    # line.
    output_line.insert(-1, reset_color)
  end

  $stdout.puts output_line
end

And here's how it works:

$ hilong Gemfile
17    source :rubygems
1
13    gem 'jekyll'
82    gem 'liquid', '2.2.2'         # The comment on this line is long-winded, not sure why... 
16    gem 'RedCloth'  

This works pretty well. It takes a file as input and puts the modified version onto its $stdout. Let's see how it fares when we combine it with other utilities.

Introducing Pipes

Let's start by trying to pipe the output of this utility to another utility:

# Only show 'gem' lines.
$ hilong Gemfile | grep gem

This works nicely. Since we put our output on $stdout, grep(1) can read it and do the proper filtering. It even preserves our color codes!

Let's try another one:

# View the output one page at a time.
$ hilong Gemfile | more

Eeee. Now we get some ugly escape codes in our output. It seems that more(1) doesn't know what to do with the ANSI color escape codes we included, so it just prints them as part of the output. The same thing happens if you redirect the output to a file.

When you are piping output to another program you should always send plain, unformatted text. Unix utilities expect to deal with plain text.

Is a tty?

So we can't include our color codes if our output is being piped to another program, but we want to include the color codes if our output is being displayed in a terminal. How do we tell?

Ruby's IO#isatty method (aliased as IO#tty?) will tell you whether or not the IO in question is attached to a terminal. Calling it on $stdout, for instance, when it's being piped will return false.

We'll rewrite our script to make use of this. Here's the relevant part:

  # If the line is long and our $stdout is not being piped then we'll
  # colorize this line.
  if $stdout.tty? && line.size > maximum_line_length
    # Turn the output to red starting at the first character.
    output_line.insert(0, red)
    # Reset the text color back to what it was at the end of the
    # line.
    output_line.insert(-1, reset_color)
  end 

  $stdout.puts output_line

Now if we try piping our output to more(1) again, or to a file, we get nice plain text.
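
As a rough sketch of the same idea, the colorizing decision can be pulled into a small helper that takes the output stream as a parameter. That makes it easy to try out without a real terminal. The names colorize_if_tty and FakeTerminal are hypothetical and are not part of hilong.

```ruby
require "stringio"

RED = "\e[31m"
RESET_COLOR = "\e[0m"

# Hypothetical helper: wrap text in red only when io is attached to a terminal.
def colorize_if_tty(text, io)
  return text unless io.tty?
  "#{RED}#{text}#{RESET_COLOR}"
end

# Stand-in for a terminal, used only for this demonstration.
class FakeTerminal < StringIO
  def tty?
    true
  end
end

# StringIO#tty? returns false, so it behaves like a pipe or a file here.
piped    = colorize_if_tty("82\tlong line", StringIO.new)
terminal = colorize_if_tty("82\tlong line", FakeTerminal.new)
```

In the real script you would pass $stdout as the io argument.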

Incoming!

Most Unix utilities also respond to pipes coming from the other direction, as input. Let's see how our utility responds when we pipe in some input:

$ cat Gemfile Gemfile.lock | hilong
/Users/jessestorimer/projects/hilong/hilong:4:in `initialize': can't convert nil into String (TypeError)
    from /Users/jessestorimer/projects/hilong/hilong:4:in `open'
    from /Users/jessestorimer/projects/hilong/hilong:4:in `<main>'

:/

Right now our utility is only written to handle input given as a filename passed in via ARGV. How can we make it accept raw data from a pipe?

Ruby has a wonderful facility for this called ARGF. ARGF provides a consistent interface for raw data coming in via a pipe, and filenames passed via ARGV. Let's rewrite our script to take advantage of it:

# Read input from files or pipes.
-input = File.open(ARGV[0])
+input = ARGF.read

Wonderful! If anything is passed in on ARGV, ARGF assumes the arguments are filenames and reads them in sequence. If ARGV is empty, it reads from $stdin to get the data passed in via a pipe.

Most Unix utilities ignore standard input when filenames are given, and ARGF follows the same convention. A filename of - stands for standard input.

Now let's look at all the ways we can use our new utility.

$ hilong Gemfile
$ hilong Gemfile | more
$ hilong Gemfile > output.txt
$ hilong Gemfile Gemfile.lock
$ cat Gemfile* | hilong
$ cat Gemfile | hilong - Gemfile.lock
$ hilong < Gemfile
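
To see how ARGF walks through several files in order, here is a minimal sketch that does not depend on a real command line. It assumes ARGF.class.new, which builds a separate ARGF-style reader over the paths you give it; a script like hilong would simply use the global ARGF. The temp files are throwaway stand-ins for Gemfile and Gemfile.lock.

```ruby
require "tempfile"

# Two throwaway files standing in for Gemfile and Gemfile.lock.
first = Tempfile.new("first")
first.write("source :rubygems\n")
first.close

second = Tempfile.new("second")
second.write("GEM\n")
second.close

# Assumption for this demo: ARGF.class.new gives us a private reader over the
# given paths, so the real ARGV and $stdin are left alone.
reader = ARGF.class.new(first.path, second.path)

# The lines of both files arrive as one continuous stream.
lines = reader.each_line.to_a

first.unlink
second.unlink
```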

One more case...

With only a few small changes we were able to get our utility to respect pipelines like any other Unix utility would. But there's one more case I want to demonstrate. What if we're getting input from a pipe coming from a command such as tail -f where the input never stops coming?

$ tail -f log/test.log | hilong

If you give this a try and append to the log file you'll notice that our utility seems to be suppressing the output. We're not seeing anything being printed.

This happens because we're using ARGF#read. #read blocks until it reaches EOF, but tail -f keeps the pipe open and never signals EOF, since more data may always arrive. So our utility blocks on that first read and never gets around to printing anything. We need to change the way we're reading from ARGF.

We'll read from ARGF one line at a time using #each_line. #each_line will, duh, read each line in succession. Each time a newline arrives, the line read so far is passed into the block, without waiting for EOF.

Here are the required changes:

# Keep reading lines of input as long as they're coming.
ARGF.each_line do |line|
  # Construct a string that begins with the length of this line
  # and ends with the content. The trailing newline is #chomp'ed
  # off of the content so we can control where the newline occurs
  # (#chomp leaves a final line without a newline intact).
  # The strings are joined with a tab character so that indentation
  # is preserved.
  output_line = [line.size, line.chomp].join("\t")

And that'll do it! Now our utility can handle the slew of input methods I showed above, plus handle continuous data from a pipe.
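
The difference between waiting for EOF and reading line by line can be seen with an in-process pipe. This is only a sketch: IO.pipe stands in for the pipe the shell would set up, and the writer plays the role of tail -f.

```ruby
reader, writer = IO.pipe

# The writer sends one line but keeps the pipe open, like tail -f would.
writer.puts "first entry"

# Reading a single line returns as soon as a full line is available.
# Calling reader.read here instead would block, because no EOF has arrived.
first = reader.gets

writer.puts "second entry"
writer.close # only now does the reader see EOF

# With the writer closed, each_line drains what is left and then stops.
rest = reader.each_line.to_a
reader.close
```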

UPDATE: A commenter brought up one more case that we're not handling, demonstrated by this usage of hilong.

$ cat /dev/urandom | base64 -b 80 | hilong | head

What's special about this pipeline is that hilong's output is being piped into head(1). The head(1) command will read the first ten lines of input, then close the pipe.

If you run this pipeline in your shell, you'll see that hilong raises Broken pipe - <STDOUT> (Errno::EPIPE). This is because hilong wasn't expecting STDOUT to close before it was finished writing, so when it attempted to write another line of data, it got the broken pipe error.

The solution here is to wrap the code that writes to STDOUT in a begin block that rescues this exception. Here's the updated code for hilong.

  begin
    $stdout.puts output_line
  rescue Errno::EPIPE
    exit(74)
  end

The exit(74) tells the program to exit with a non-successful exit code of 74. sysexits(3) specifies that this exit code represents an I/O error, which seems suited to this situation.

Now hilong won't choke when its output is fed to head(1).
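
If you want to watch the broken pipe happen without a shell, here is a small sketch using IO.pipe. The helper name puts_unless_closed is hypothetical. Instead of exiting, it returns false so that the caller can decide what to do, which also makes the behavior easy to check.

```ruby
# Hypothetical helper: write a line, reporting false if the reader has gone away.
def puts_unless_closed(io, line)
  io.puts line
  true
rescue Errno::EPIPE
  false
end

reader, writer = IO.pipe

# The reader is still open, so this write succeeds.
delivered = puts_unless_closed(writer, "11\tfirst line")

# Closing the read end plays the part of head(1) exiting early.
reader.close

# Writing now raises Errno::EPIPE, which the helper turns into false.
broken = puts_unless_closed(writer, "12\tsecond line")

writer.close
```

In hilong itself, a false return would be the place to call exit(74).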

The full, finished source for the hilong utility is at https://gist.github.com/jstorimer/1465437.

If you can think of a better way to accomplish any of this or there's another use case that I missed let me know in the comments.


Read the followup post: On Colorized Output where the colorized output becomes configurable.

