Writing Ruby Scripts That Respect Pipelines
Published on December 12, 2011 by Jesse Storimer
Pipes are the most powerful concept on the command line.
With pipes you can string together small, simple commands into bigger, more useful pipelines. This is the secret sauce that goes along with the Unix philosophy of "Do one thing, and do it well". Take this as an example:
# Look for any lines mentioning 'user' in the current git diff
# and display them one page at a time using less(1).
$ git diff | grep user | less
A simple utility
Here's a Ruby script I'm calling hilong.
It's a small, simple utility that takes a file as an argument and prints each line prepended with its length. Lines longer than the maximum length (80 chars by default) are highlighted in red. Here's the simplest version of our script:
#!/usr/bin/env ruby

# A file is passed in as an argument
input = File.open(ARGV[0])

# ANSI escape codes for terminal colors
red = "\e[31m"
reset_color = "\e[0m"

maximum_line_length = 80

# For each line of the input...
input.each_line do |line|
  # Construct a string that begins with the length of this line
  # and ends with the content. The trailing newline is #chop'ped
  # off of the content so we can control where the newline occurs.
  # The strings are joined with a tab character so that indentation
  # is preserved.
  output_line = [line.length, line.chop].join("\t")

  if line.length > maximum_line_length
    # Turn the output to red starting at the first character.
    output_line.insert(0, red)
    # Reset the text color back to what it was at the end of the
    # line.
    output_line.insert(-1, reset_color)
  end

  $stdout.puts output_line
end
And here's how it works:
$ hilong Gemfile
17	source :rubygems
1
13	gem 'jekyll'
82	gem 'liquid', '2.2.2' # The comment on this line is long-winded, not sure why...
16	gem 'RedCloth'
This works pretty well. It takes a file as input and puts the modified version onto its $stdout. Let's see how it fares when we combine it with other utilities.
Let's start by trying to pipe the output of this utility to another utility:
# Only show 'gem' lines.
$ hilong Gemfile | grep gem
This works nicely. Since we put our output on $stdout, grep(1) can read it and do the proper filtering. It even preserves our color codes!
Let's try another one:
# View the output one page at a time.
$ hilong Gemfile | more
Eeee. Now we get some ugly escape codes in our output. It seems that more(1) doesn't know what to do with the ANSI color escape codes we included, so it just prints them as part of the output. The same thing happens if you redirect the output to a file.
When you are piping output to another program you should always send plain, unformatted text. Unix utilities expect to deal with plain text.
Is a tty?
So we can't include our color codes if our output is being piped to another program, but we want to include the color codes if our output is being displayed in a terminal. How do we tell?
The IO#isatty method (aliased as IO#tty?) will tell you whether or not the IO in question is attached to a terminal. Calling it on $stdout, for instance, when it's being piped will return false.
We'll rewrite our script to make use of this. Here's the relevant part:
# If the line is long and our $stdout is not being piped then we'll
# colorize this line.
if $stdout.tty? && line.size > maximum_line_length
  # Turn the output to red starting at the first character.
  output_line.insert(0, red)
  # Reset the text color back to what it was at the end of the
  # line.
  output_line.insert(-1, reset_color)
end

$stdout.puts output_line
Now if we try piping our output to more(1) again, or to a file, we get nice plain text.
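If you want to poke at the terminal check on its own, here's a minimal sketch. The colorize_for helper and the RED/RESET constants are names I made up for illustration; they aren't part of hilong. A StringIO is never a terminal, so it stands in for a pipe or a file.

```ruby
require "stringio"

# ANSI escape codes, same values as in the script above.
RED   = "\e[31m"
RESET = "\e[0m"

# Hypothetical helper (not part of hilong): wrap text in red only when
# the destination IO is attached to a terminal.
def colorize_for(io, text)
  io.tty? ? "#{RED}#{text}#{RESET}" : text
end

# StringIO#tty? always returns false, so the text stays plain here.
piped = StringIO.new
piped.puts colorize_for(piped, "a very long line")
p piped.string
```

Passing the IO in as an argument, rather than hard-coding $stdout, is just a choice that makes the behaviour easy to check without a real terminal.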
Most Unix utilities also respond to pipes coming from the other direction, as input. Let's see how our utility responds when we pipe in some input:
$ cat Gemfile Gemfile.lock | hilong
/Users/jessestorimer/projects/hilong/hilong:4:in `initialize': can't convert nil into String (TypeError)
	from /Users/jessestorimer/projects/hilong/hilong:4:in `open'
	from /Users/jessestorimer/projects/hilong/hilong:4:in `<main>'
Right now our utility is only written to handle input given as a filename passed in via ARGV. How can we make it accept raw data from a pipe?
Ruby has a wonderful facility for this called ARGF. ARGF provides a consistent interface for raw data coming in via a pipe, and filenames passed via ARGV. Let's rewrite our script to take advantage of it:
# Read input from files or pipes.
-input = File.open(ARGV[0])
+input = ARGF.read
Wonderful! If something is passed in on ARGV, ARGF will assume the arguments are filenames and call IO#read on them sequentially. If ARGV is empty then it reads from $stdin to get data passed in via pipe.
Unix utilities conventionally ignore standard input if filenames are given, unless one of the names is -, which stands for standard input.
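To see ARGF walking through several files in isolation, here's a minimal sketch. The temp files and the LINES_SEEN constant are made up for the demo, and the ARGV.replace call is only there to fake command-line arguments; a real script would just use whatever ARGV it was handed.

```ruby
require "tempfile"

# Two throwaway files stand in for Gemfile and Gemfile.lock.
first  = Tempfile.new("first")
second = Tempfile.new("second")
first.write("source :rubygems\n")
first.close
second.write("GEM\n")
second.close

# Demo-only trick: pretend the script was invoked with two filenames.
ARGV.replace([first.path, second.path])

# ARGF reads the files named in ARGV in order, as one continuous stream.
LINES_SEEN = []
ARGF.each_line { |line| LINES_SEEN << line.chomp }

p LINES_SEEN

# Clean up the throwaway files.
first.unlink
second.unlink
```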
Now let's look at all the ways we can use our new utility.
$ hilong Gemfile
$ hilong Gemfile | more
$ hilong Gemfile > output.txt
$ hilong Gemfile Gemfile.lock
$ cat Gemfile* | hilong
$ cat Gemfile | hilong - Gemfile.lock
$ hilong < Gemfile
One more case...
With only a few small changes we were able to get our utility to respect pipelines like any other Unix utility would. But there's one more case I want to demonstrate. What if we're getting input from a pipe coming from a command such as tail -f, where the input never stops coming?
$ tail -f log/test.log | hilong
If you give this a try and append to the log file you'll notice that our utility seems to be suppressing the output. We're not seeing anything being printed.
This is due to the fact that we're using ARGF.read. #read will block until it receives EOF, but the tail utility will never send EOF because it keeps its end of the pipe open, waiting for more data. So the first time our utility is invoked with some data it simply blocks and never returns. So we need to change the way we're reading from ARGF.
We'll read from ARGF one line at a time using ARGF.each_line. #each_line will, duh, read each line in succession. So anytime a newline is encountered the String is passed into the block.
Here are the required changes:
# Keep reading lines of input as long as they're coming.
ARGF.each_line do |line|
  # Construct a string that begins with the length of this line
  # and ends with the content. The trailing newline is #chop'ped
  # off of the content so we can control where the newline occurs.
  # The strings are joined with a tab character so that indentation
  # is preserved.
  output_line = [line.size, line.chop].join("\t")
  # ... the rest of the block is unchanged.
And that'll do it! Now our utility can handle the slew of input methods I showed above, plus handle continuous data from a pipe.
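The difference between waiting for EOF and reading line by line is easy to reproduce without tail. Here's a minimal sketch using IO.pipe to play the part of a writer that never finishes; first_line_from is a hypothetical helper name, not something from hilong.

```ruby
# Hypothetical helper: return the first complete line an IO produces,
# without waiting for the writer to finish.
def first_line_from(io)
  io.each_line { |line| return line }
  nil # the stream ended before any line arrived
end

reader, writer = IO.pipe

# Write one line and keep the writer open, much like `tail -f` would.
writer.puts "first line"
writer.flush

# This returns right away. reader.read would block here instead,
# because EOF never arrives while the writer stays open.
puts first_line_from(reader).inspect

writer.close
reader.close
```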
UPDATE: A commenter brought up one more case that we're not handling, demonstrated by this usage of hilong:
$ cat /dev/urandom | base64 -b 80 | hilong | head
What's special about this pipeline is that hilong's output is being piped into head(1). The head(1) command will read the first ten lines of input, then close the pipe.
If you run this pipeline in your shell, you'll see that hilong fails with Broken pipe - <STDOUT> (Errno::EPIPE). This is because hilong wasn't expecting STDOUT to close before it was finished writing, so when it attempted to write another line of data, it got the broken pipe error.
The solution here is to wrap the code that writes to STDOUT in a begin block that rescues this exception. Here's the updated code for hilong:
begin
  $stdout.puts output_line
rescue Errno::EPIPE
  exit(74)
end
exit(74) tells the program to exit with a non-successful exit code of 74. sysexits(3) specifies that this exit code (EX_IOERR) represents an I/O error, which seems suited to this situation.
Now hilong won't choke when its output is fed to head(1).
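You can trigger the same error without head(1) by closing the read end of a pipe yourself. Here's a minimal sketch; try_puts is a hypothetical helper that reports the failure to its caller instead of exiting on the spot, which is a variation on the rescue above rather than what hilong does.

```ruby
# Hypothetical helper: returns false instead of raising when the reader
# has gone away, so the caller can decide how to exit.
def try_puts(io, line)
  io.puts line
  io.flush
  true
rescue Errno::EPIPE
  false
end

reader, writer = IO.pipe
reader.close # simulate head(1) closing its end of the pipe

unless try_puts(writer, "one more line")
  warn "reader went away; a real script would exit(74) here"
end
```

Returning a boolean keeps the example easy to run repeatedly. In a script like hilong, exiting straight from the rescue is simpler.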
The full, finished source for the hilong utility is at https://gist.github.com/jstorimer/1465437.
If you can think of a better way to accomplish any of this or there's another use case that I missed let me know in the comments.
Read the followup post: On Colorized Output where the colorized output becomes configurable.