The Ins and Outs of Clojure: Part I
(Written about Clojure 1.2.)
It is a truth universally acknowledged, that a programmer using Clojure will want to perform IO. Let me help you out (put).
I’ll go over some of the basics of IO, focusing on what you can use Clojure to do directly. I’ll move on after the basic introduction, to some of the more interesting and generally useful classes that Java offers, giving a little context for each.
Reading files in is generally one of the first things I want to do when playing with a new language, so I’ll start there. Before I get started though, I should mentioned that in Clojure, strings are always encoded using UTF-16. Generally this saves time and worry, but it’s something to keep in mind should you run into problems on the encoding front.
Clojure comes with a handy little function called
slurp that takes in a string representing a filename (or, really, pretty much anything; a File, a stream, a byte array, URL, etc) and returns a string containing the contents of your file. It’s pretty handy if you just need to get some information from a file that’s relatively small, and you’ll be parsing it yourself.
(slurp "/Users/ihodes/Desktop/afile.txt") => "A little bit\nof information here."
A nice thing about
slurp is that you can easily build up file paths with
str. For example, say you want to output to a file based on information you find at runtime:
(slurp (str "/Users/" username "/Desktop/" filename))
slurp is pretty basic, and once your files get large enough, totally impractical. Nonetheless, it’s a handy function to know about.
As a useful and comical aside, the function
spit is the counterpart to
slurp, except that instead of reading input,
spit does output. More on this in a future article, though.
One of my favorite IO functions has got to be
line-seq takes a reader object (which must implement
BufferedReader) and returns a lazy sequence of the lines of the text the reader supplies. This is handy when you’re dealing with files (if this offends you, let’s be Unixy here for now and say that everything is a file) that are too big to merely
slurp, but that are
\newline delimited (or CR/LF delimited, if you’re of the Windows persuasion).
(use '[clojure.java.io '(reader)]) (take 2 (line-seq (reader "bobby.txt"))) => ("Bobby was a good boy," "and didn't complain too much")
Notice how we
take 2 from the sequence we get from using
line-seq. We can
take as much or as little as we need; we won’t be reading much (Clojure will read a bit more than you tell it to in order to get more IO performance, but let’s not worry about that) more than we specify. We can do anything we want with the resulting seq; that’s the beauty of
line-seq and the ubiquitous sequence abstraction.
Back in the day, Clojurists had to sink a little lower than the
clojure.java.io namespace to use
line-seq; two Java classes were needed. One of these Java classes is the most wondrous and amazing thing just below the surface of the more elegant and beautiful Clojure code;
BufferedReader. Here’s how we used to do it;
(import '(java.io FileReader BufferedReader)) (take 2 (line-seq (BufferedReader. (FileReader. "bobby.txt"))) => ("Bobby was a good boy," "and didn't complain too much")
This might give you a better sense of what’s going on when you use
reader, though in reality
reader is far more complicated than just that: you can trust it to handle a variety of “readable things” and return to you a
BufferedReader if possible.
FileReader will return a
Reader on a file, and
BufferedReader takes and buffers a
Reader, as you might have extrapolated from the name.
Readers are basically just objects upon which a few methods (like
close) may be enacted and expected to return reasonable results.
reads up until a line-delimiter and returns the read chunk as an element in the sequence it is generating.
While on the subject of files, I should probably mentioned the
file function, from
file takes in an arbitrary number of string arguments, and pieces them together into a file hierarchy, returning a
File instance. This can come in handy.
Rivers? inputStreams? Brooks?
Streams are an especially useful class of readers. Oftentimes you’re reading in text; that’s what
Readers do. But often you need to read in a stream of bytes; that’s where you need to use
(use '[clojure.java.io '(reader)]) (def g (input-stream "t.txt")) (.read g) => 105 (.read g) => 115 (char (.read g)) => \space
As you can see, instead of getting characters from this file (like we get when we use a reader), we’re getting integer byte values. This can be useful when reading, for example, a media file.
In general, strings are always UTF-16, which are 16-bit pieces of data, whereas byte-streams are 8-bit pieces of data. It bears repeating that the stream operators should be used when you’re not dealing with strings: they are not trivially interchangeable, as they might be in other languages where strings are syntactic sugar for byte arrays.
Finally, let me introduce to you a spectacularly useful Java class.
RandomAccessFile is a class which allows you to quickly jump around in a large file, and read bytes from it.
(import '(java.io RandomAccessFile)) (def f (RandomAccessFile. "stuff.txt" "r"))
Note the second argument of the constructor, “r”; this indicates that we’re opening the file just for reading. Now that we have
f, we can use it to navigate and read the file:
(.read f) => 105 (.length f) => 2015 ;; this is the number of bytes this file is in length (.skipBytes f 20) (.getFilePointer h) => 21 ;; the position we're at in the file (.read f) => 89
As you can see, you can jump around (quickly!) through a file, and read from the parts you want, and skip the parts you do not want. The key methods/functions here (among many others that can also be useful; be sure to check the documentation) are
Every file that is opened should be closed, and what we’ve been doing is a little unsafe. In order to close an open reader/file, we should use the
close method on it; in the above example, when you’re done with
f, simply execute
(.close f) to tell the file system that you’re done with the file. Alternatively, and more idiomatically, you can open your files with the handy
(with-open [f (RandomAccessFile. "stuff.txt" "r")] (.read f))
When you’re done with
f, Clojure will
close it, and you won’t have to worry one iota about it.
line-seq not be enough for your reading needs (and chances are that, should you code enough in Clojure, they won’t always been), you might want to explore
clojure.java.io some more, as well as some of the Java classes (namely, those stemming from
BufferedReader, as well as
BufferedInputStream) mentioned above. See my previous article on using Java if you’re unfamiliar with using Java.
Next up is an introduction to the “outs” of Clojure and Java. Stay tuned!
I owe a big thank you to Phil Hagelberg for reading over this essay and offering advice. If you don’t already, you should be using his Leiningen for both dependency management and a stress-free development environment.