Words and Code

One writer’s journey from words to code.

Fleeting Filing With Ruby Tempfile

#technicaltuesdays, ruby


There are times in a Ruby program or Rails application that one comes to a single realization: Oh no, I need to deal with an external file! For larger applications, this might manifest as a request to your Amazon S3 bucket for a file, which you then need to modify in some way, or perhaps just simply read and have access to. But sometimes, even a simple Ruby script or plain old Ruby class may need to read or write to an external file.

I honestly didn’t know a lot about Ruby’s File class (wait, Ruby has a file class?! Yes, yes it does.) until recently, when I had to handle a situation that would allow me to download files from a file storage service (such as S3), and then process the file locally on my machine. The process was a bit complicated, and I still think that I have more to learn about how it actually works. But, one thing that I did actually start to wrap my head around is Ruby Tempfiles. Yup, that’s right: not only does Ruby have a File class — it also has a Tempfile class.

It turns out that these two classes intersect quite a bit, and it can be a little confusing to know how they differ. The only way to really understand Ruby’s Tempfile class is to play around with it and create some tempfiles. So let’s get filin’!

To File Or To Tempfile


The reason that the Tempfile class and File class seem so similar (and can therefore be so confusing) is that the Tempfile class inherits from a delegate class built around…File! A Tempfile object wraps a File object and forwards almost every method call to it, so a tempfile behaves like a type of file even though it is not, strictly speaking, a subclass of File. So, what can you do with files in Ruby? And what makes tempfiles different from a regular old file?
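If we want to see that relationship for ourselves, here’s a small sketch (not the only way to check it) that asks Ruby about Tempfile’s ancestors. The basename cats is just an illustrative choice.

```ruby
require 'tempfile'
require 'delegate'

tempfile = Tempfile.new('cats')

# Tempfile is built on a delegator, so File itself is not one of its ancestors...
inherits_from_file = Tempfile.ancestors.include?(File)
uses_delegator     = Tempfile.ancestors.include?(Delegator)

# ...but File's instance methods are still available on a tempfile.
responds_to_chmod = tempfile.respond_to?(:chmod)

puts inherits_from_file
puts uses_delegator
puts responds_to_chmod

# Clean up after ourselves.
tempfile.close
tempfile.unlink
```

In practice this distinction rarely matters, but it explains why an is_a?(File) check on a tempfile can surprise us.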

Well, the documentation for File objects is hugely informative, and if we dove into it, we’d learn that Ruby files actually inherit from a class called IO. But let’s not get too distracted here: what can we do with files, again? Well, we can read a file’s data, we can write more data to it, and we can even change its permissions (i.e. who can access and write to the file). Pretty straightforward, right?
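Here’s a rough sketch of those three File operations in one place. The filename cats.txt and the scratch directory are assumptions made for the example; the permission bits shown by the last line depend on the operating system.

```ruby
require 'tmpdir'
require 'fileutils'

# Work inside a throwaway directory so we don't leave anything behind.
dir  = Dir.mktmpdir
path = File.join(dir, 'cats.txt') # hypothetical filename

# Write some data, then append more.
File.write(path, "meow")
File.open(path, 'a') { |f| f.write(" meow") }

# Read everything back.
contents = File.read(path)

# Restrict the file to owner read/write (meaningful on POSIX systems).
File.chmod(0600, path)
mode = format('%o', File.stat(path).mode & 0777)

# File's parent class is IO.
parent = File.superclass

puts contents
puts parent
puts mode

FileUtils.remove_entry(dir)
```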

Now, onto tempfiles. As their name would suggest, they’re files that are temporary. But even though they are named pretty well, it still might not be clear in what way these files are temporary. Well, we can create tempfiles in the same way that we create regular Ruby files, but what makes them unique is that tempfiles only exist as long as there is a reference to them. In other words, tempfiles get deleted automatically by the Ruby garbage collector. If no variable is pointing (read: assigned) to a tempfile, the garbage collector will “finalize” the tempfile object, and the file would be deleted from our system.

So…why is this significant? Well, what would happen if we tried to access a tempfile that Ruby has deleted? Bad things, that’s what! Well, we’d actually just get an error, because we’d be trying to access a file at a path that doesn’t exist anymore. But still, things could get pretty bleak if we didn’t know what was going on — or worse, if we didn’t fundamentally understand how tempfiles worked!
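To make that error concrete, here’s a sketch that deletes a tempfile and then tries to read from its old path. Waiting for the garbage collector isn’t predictable, so this example stands in for it by deleting the file explicitly with close and unlink.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')
old_path = tempfile.path

# Stand-in for garbage collection: remove the file ourselves.
tempfile.close
tempfile.unlink

begin
  File.read(old_path)
  error_class = nil
rescue SystemCallError => e
  # Ruby raises an Errno error because nothing lives at that path anymore.
  error_class = e.class
end

puts error_class.inspect
```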

A good rule of thumb for deciding whether or not to use a tempfile is this: if we need access to the file outside of the context of our Ruby script, we probably shouldn’t be using a tempfile. However, if we want to temporarily create, read, or write to a file, and then have Ruby delete it for us (for free!) when we’re done using it, then a tempfile is our new best friend!

The Order of Filing Operations


When it comes to creating files, there’s a certain order of operations with method invocation. Even though the documentation for the Tempfile class has a list of helpful methods for us to use, there’s a lot of less obvious functionality at our disposal, as long as we know where to look. Because the Ruby Tempfile essentially inherits from File, a tempfile behaves just like a file object. This means that we can call any File instance method on a Tempfile object. This is particularly important to note since some of the most common methods that are called on a tempfile are actually defined within the File class. So, if we couldn’t find where a particular method on a tempfile was being defined, it probably means that we need to go look inside of the File class.

Okay, enough talk about where to find these methods; let’s figure out which methods are actually pertinent to tempfiles. Both files and tempfiles share a sequence of events: generally, we create the file, then we read or write to it, and then we close it. But with tempfiles, there’s a little twist at the end. Let’s take a look at the order of filing operations:

1. new

This method is pretty self-explanatory: it’s what we’ll use to create a new tempfile. It’s defined on the Tempfile class itself, and its first argument is the basename of our file (it can also take a directory and File-style options, but we won’t need those here). Let’s create a tempfile called cats:

♥ irb
irb(main):001:0> tempfile = Tempfile.new('cats')
=> #<Tempfile:/var/folders/v7/8rk39kc11ln54w3tl7twrhwc0000gn/T/cats20151005-24769-ac6qgw>

This creates a unique filename in our operating system’s temp directory, and it contains our filename cats in its basename. If we wanted to find out exactly where in our temp directory this file lives, we could just ask it for its path using (you guessed it) the path method:

irb(main):002:0> tempfile.path
=> "/var/folders/v7/8rk39kc11ln54w3tl7twrhwc0000gn/T/cats20151005-24769-ac6qgw"
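Outside of irb, we could sanity-check the same thing in a script. This is only a sketch: it compares the tempfile’s directory with Dir.tmpdir (which is what Tempfile falls back to when we don’t pass a directory) and checks that the basename starts with cats.

```ruby
require 'tempfile'
require 'tmpdir'

tempfile = Tempfile.new('cats')

dir_name  = File.dirname(tempfile.path)
base_name = File.basename(tempfile.path)

# The file lives in the system temp directory...
in_tmpdir = File.expand_path(dir_name) == File.expand_path(Dir.tmpdir)
# ...its name begins with the basename we asked for...
starts_with_cats = base_name.start_with?('cats')
# ...and it really exists on disk.
exists = File.exist?(tempfile.path)

puts in_tmpdir
puts starts_with_cats
puts exists

tempfile.close
tempfile.unlink
```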

We could also specify the extension of the file that we’re creating (i.e., pdf, gif, etc.). However, it’s not as simple as just appending it to our filename; if we do that, this is what happens:

irb(main):003:0> Tempfile.new('cats.pdf').path
=> "/var/folders/v7/8rk39kc11ln54w3tl7twrhwc0000gn/T/cats.pdf20151006-24769-xvpyh0"

Not great! We don’t want our extension to be a part of our filename; we want it to be at the end, obviously! Luckily, the new method allows us to pass the filename and extension as an array:

irb(main):004:0> Tempfile.new([ 'cats', '.pdf' ]).path
=> "/var/folders/v7/8rk39kc11ln54w3tl7twrhwc0000gn/T/cats20151006-24769-1kzx615.pdf"
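A quick way to compare the two forms side by side is to ask File.extname what it thinks the extension is. This is a small sketch; the variable names are just for illustration.

```ruby
require 'tempfile'

plain    = Tempfile.new('cats.pdf')        # extension buried mid-name
with_ext = Tempfile.new(['cats', '.pdf'])  # extension kept at the end

plain_ext = File.extname(plain.path)
array_ext = File.extname(with_ext.path)

puts plain_ext  # ".pdf" followed by the generated suffix
puts array_ext

[plain, with_ext].each do |t|
  t.close
  t.unlink
end
```

Anything that decides how to treat a file based on its extension will only be happy with the array form.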

Much better! Now, let’s open this file up.

2. binmode

The next step is to put our file into binary mode by using the binmode method. As the documentation explains, this method changes how data is written to and read from the tempfile: the content is treated as raw bytes.

irb(main):005:0> tempfile.binmode
=> #<File:/var/folders/v7/8rk39kc11ln54w3tl7twrhwc0000gn/T/cats20151005-24769-ac6qgw>

This turns off encoding conversion and newline translation, and it changes the way that we write content; setting a file to binary mode forces Ruby to treat the content as ASCII-8BIT. There’s also a handy binmode? method that we can use to check whether our file is in binary mode or not.
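Here’s a sketch of what that looks like from a script: we check binmode? before and after, and then look at the encoding of what we read back.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')

before = tempfile.binmode?   # not in binary mode yet
tempfile.binmode
after = tempfile.binmode?    # now it is

tempfile.write("meow")
tempfile.rewind

# Content read from a binary-mode stream is tagged as ASCII-8BIT.
encoding = tempfile.read.encoding

puts before
puts after
puts encoding

tempfile.close
tempfile.unlink
```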

3. write

Finally, once we’re in binary mode, we actually write to our file! And of course, the tool for the job is the write method. This takes a parameter of whatever it is that you want to write to the file.

irb(main):006:0> tempfile.write("meow meow meow")

Interestingly, this method is defined in the IO class, which File inherits from. (StringIO offers a write method with the same interface, but it isn’t a subclass of IO.)
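One detail worth knowing is that write hands back the number of bytes it wrote. The sketch below captures that return value and compares it with the tempfile’s size.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')
tempfile.binmode

# write returns how many bytes were written.
bytes_written = tempfile.write("meow meow meow")

# size reports how many bytes the file holds so far.
size = tempfile.size

puts bytes_written
puts size

tempfile.close
tempfile.unlink
```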

4. rewind

Now, if we wanted to read from our file, we could just read it, right?

irb(main):007:0> tempfile.read
=> ""

Wait, what happened to our "meow meow meow" string? Well, if we think about it, when we were writing to our file, our position in the file moved forward with everything we wrote, so we ended up at the end of our content. And that means that there’s nothing to read, because we’re at the end of our file. This calls for the rewind method, which will take us back to the beginning of our tempfile.

irb(main):008:0> tempfile.rewind
=> 0

We’re now at the beginning of our file!

5. read

After rewinding back to the beginning of our file, we can now actually read it using the read method:

irb(main):009:0> tempfile.read
=> "meow meow meow"
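Put together as a script, the write, rewind, and read steps might look like the sketch below. It also peeks at pos, the current position in the file, to show where we are at each step.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')
tempfile.write("meow meow meow")

pos_after_write = tempfile.pos   # we're sitting at the end of what we wrote
first_read      = tempfile.read  # so there's nothing left to read

rewind_result    = tempfile.rewind  # jump back to the start
pos_after_rewind = tempfile.pos
second_read      = tempfile.read    # now we get our content back

puts pos_after_write
puts first_read.inspect
puts pos_after_rewind
puts second_read.inspect

tempfile.close
tempfile.unlink
```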

So far, we’ve been working with this file as though it’s a normal Ruby File object. But let’s not forget…this is actually a Tempfile. And dealing with how to close and clean up a tempfile is where things can get tricky. Now all that’s left is for us to elegantly handle these fleeting, disappearing files.

Disappearing Files

The last step in the order of filing operations is explicitly closing our tempfile. This is probably the most complicated part to understand, especially if we’re not familiar with the concept of garbage collection. However, I think that the Ruby docs do a pretty great job of explaining the how and why of explicitly closing tempfiles, a good practice that the core team strongly encourages:

“When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted. This means that it’s unnecessary to explicitly delete a Tempfile after use, though it’s good practice to do so: not explicitly deleting unused Tempfiles can potentially leave behind large amounts of tempfiles on the filesystem until they’re garbage collected. The existence of these temp files can make it harder to determine a new Tempfile filename.”

The guides suggest that the best way to go about deleting a tempfile after we’re done using it is by calling two specific methods from within an ensure block (remember the ensure keyword? No? Don’t worry, you can read about it over here).

But first, let’s round out our order of filing operations with the two most important tempfile methods.

6. close

This method basically tells the Ruby interpreter, “you can’t read the file now!” It closes the file and doesn’t allow it to be read or written to.

irb(main):010:0> tempfile.close
=> nil

Some blogs have mentioned that this method isn’t always necessary, but it can’t hurt to use it.
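To see what close does (and doesn’t) do, here’s a sketch that closes a tempfile and then pokes at it. Notice that closing alone leaves the file sitting on disk.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')
path = tempfile.path
tempfile.write("meow")

tempfile.close

closed          = tempfile.closed?     # the stream is closed...
still_on_disk   = File.exist?(path)    # ...but the file hasn't gone anywhere
contents_on_disk = File.read(path)     # and our data was flushed to it

begin
  tempfile.read
  read_error = nil
rescue IOError => e
  # Reading through the closed tempfile object is no longer allowed.
  read_error = e.class
end

puts closed
puts still_on_disk
puts contents_on_disk
puts read_error.inspect

tempfile.unlink
```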

7. unlink

This method is what actually deletes the file from the filesystem.

irb(main):034:0> tempfile.unlink
=> #<Tempfile:>

We could also use the delete method, which is just an alias for unlink.
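Here’s a sketch of unlink in a script, checking that the file is really gone afterwards. It also shows close!, a Tempfile shortcut that closes and unlinks in one call; the variable names are just for illustration.

```ruby
require 'tempfile'

tempfile = Tempfile.new('cats')
path = tempfile.path

tempfile.close
tempfile.unlink

exists_after = File.exist?(path)  # the file is gone from the filesystem
path_after   = tempfile.path      # and the tempfile no longer reports a path

# close! closes and unlinks in a single step.
other      = Tempfile.new('cats')
other_path = other.path
other.close!
other_exists_after = File.exist?(other_path)

puts exists_after
puts path_after.inspect
puts other_exists_after
```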

Playing with these methods in irb has been fun, but what would this look like in a Rails application? Well, we’d probably want a single method to handle the creation, writing, and deletion of our tempfile (think separation of concerns!). And this is where the use of our ensure block would come in.

This might look something like this:

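require 'tempfile'
require 'securerandom'

def file_attachment
  # A random UUID gives each tempfile a distinct basename.
  tempfile = Tempfile.new(SecureRandom.uuid)
  tempfile.binmode

  begin
    tempfile.write("some text we could write dynamically")

    # Go back to the start so the content can be read.
    tempfile.rewind
  ensure
    # Runs even if an error was raised above.
    tempfile.close
    tempfile.unlink
  end
end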

Here, we’re using all of the methods in our order of tempfile operations! And, because ensure runs even if there were any errors raised, we’re basically always going to handle the closing and deleting of our tempfile. In this case, we probably don’t want to call all our tempfiles cats, so instead we can be a bit more fancy and use the SecureRandom module to create a random UUID for the tempfile name each time. We could even go crazy and take this yet another step further, and have our file_attachment method take a block, which we could yield to inside of our begin block, before we rewind to the beginning of our tempfile.
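A block-taking version might look something like the sketch below. This is one possible shape, not the only one: having the method read the content back and return it is an assumption made for the example, as are the block’s contents.

```ruby
require 'tempfile'
require 'securerandom'

# Hypothetical variation: yields the open tempfile to the caller's block,
# then returns whatever ended up in the file.
def file_attachment
  tempfile = Tempfile.new(SecureRandom.uuid)
  tempfile.binmode

  begin
    yield tempfile if block_given?

    tempfile.rewind
    tempfile.read
  ensure
    # Always close and delete, even if the block raised.
    tempfile.close
    tempfile.unlink
  end
end

seen_path = nil
contents = file_attachment do |file|
  seen_path = file.path
  file.write("meow meow meow")
end

puts contents
puts File.exist?(seen_path)
```

The caller gets to decide what goes into the file, while the method keeps ownership of the cleanup.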

As you can see, the possibilities are pretty endless! Tempfiles are our oysters! Well, until Ruby’s garbage collector deletes them, that is.


tl;dr?

  • The fundamental difference between Ruby’s File class and Tempfile class is that tempfiles are cleaned up (deleted by the garbage collector) once there is nothing pointing to them. Tempfile objects delegate to a File object, which means that we can use any file method on a Tempfile instance.
  • The unlink method is super important to use, since that’s what actually deletes our tempfile from the filesystem.
  • There are some great tutorials out there on dealing with creating files and directories from within a Ruby script. Check out my two favorites here and here.