My Experience with exists?, any?, and present?

A few weeks ago, I ran into a problem that I think was really, really weird!

How weird was that?

So, I have some tables with one-to-many relationships: let’s say the articles table has many images, and images has many image_sizes (the table names are different from the actual case). The system has a job that fetches Article records and eager loads them with Image and ImageSize. We thought it was safe, since there was no complicated processing at all. Just load and print it!

After a while, booooom! We had a 500 in production...

We looked at our log and saw a super strange error:

undefined method `image_sizes' for nil:NilClass

Let’s take a look at the code. Here’s an example of the code that caused the error:

def has_images?
  images.exists? && images.first.image_sizes....
  # and another chain of function calls
end

What’s wrong with that? Based on the error message, images.first was nil. If images.exists? had returned false, the error would never have occurred in the first place, because && would have stopped right there. Therefore, in that situation, images.exists? returned true. What the hell!?

So, we performed a thorough analysis to see why this error occurred. Based on our analysis, it was caused by the difference in commit time between saving data to the articles table and saving data to the images and image_sizes tables. First, a user saved data to articles, then the job started, and only after that did the system commit the data to the images and image_sizes tables.

Why does that sequence cause an error in the job?

Let me explain the steps:

  1. A user saves an article, and the database commits the data and creates an id: 1987.
  2. Somehow, concurrently, a job starts fetching all articles and eager loads their images and image_sizes, at a moment when the images and image_sizes data for article 1987 hasn’t been committed yet.
  3. As part of the process, the job executes has_images?
  4. has_images? executes images.exists?, which runs a query against the database to check the data. But by this time, the user’s process has finished committing data to images and image_sizes, so the query returns true.
  5. Since it’s true, it tries the next part of the code: images.first.image_sizes. Guess what? Since images was eager loaded, it won’t make a new query. It just reads from memory, where the collection is still empty, so images.first is nil. Yeah, NIL!
  6. Then comes the error! Yeay!
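
To make the steps above easier to follow, here is a small sketch of the same race in plain Ruby, without Rails. FakeDatabase and FakeImages are made-up stand-ins (they are not ActiveRecord classes), and the image row is invented for the example. The only thing it tries to show is the mismatch: one check asks the "database" right now, the other reads a snapshot taken earlier.

```ruby
# Plain-Ruby simulation of the race. No Rails required.
# FakeDatabase and FakeImages are hypothetical stand-ins, not ActiveRecord.

class FakeDatabase
  attr_reader :rows

  def initialize
    @rows = [] # rows of the images table
  end
end

class FakeImages
  def initialize(db, article_id)
    @db = db
    @article_id = article_id
    # "Eager loading": take a snapshot of the rows once, at load time.
    @loaded = rows_in_db
  end

  # Like exists?: always asks the database for its current state.
  def exists?
    !rows_in_db.empty?
  end

  # Like first on an already loaded collection: reads the snapshot only.
  def first
    @loaded.first
  end

  private

  def rows_in_db
    @db.rows.select { |row| row[:article_id] == @article_id }
  end
end

def simulate_race
  db = FakeDatabase.new

  # Steps 1-2: article 1987 exists, and the job eager loads its images
  # before any image row has been committed.
  images = FakeImages.new(db, 1987)

  # Step 4: the user's process commits the image afterwards.
  db.rows << { article_id: 1987, url: "cover.png" }

  # Steps 4-5: the fresh check and the stale snapshot disagree.
  [images.exists?, images.first]
end

exists_result, first_result = simulate_race
puts "exists? => #{exists_result.inspect}, first => #{first_result.inspect}"
```

In this toy version, exists? says true while first still returns nil, which is exactly the combination that makes the next method call in the chain blow up.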

So, based on this experience, we tried out some of the data checks provided by Ruby and Rails. These are the results:

  1. exists?: it always performs a query to check against the database.
  2. any?: it looks in memory when the collection is eager loaded, but it performs a query directly against the database if it isn’t.
  3. present?: it doesn’t care whether the collection was eager loaded or not. If the collection is already in memory, it reads from memory. If it isn’t, it performs a query against the database and stores the result in memory, so the next time you call it, it reads from memory.
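
The three behaviours in the list can be modelled with a toy class, again in plain Ruby. ToyAssociation, load_records and query_count are names I made up for this sketch. It only imitates what the list describes and is not how Rails implements these methods.

```ruby
# Toy model of the three checks described above. Not ActiveRecord itself.
class ToyAssociation
  attr_reader :query_count

  def initialize(table)
    @table = table   # the "database": an array that can change later
    @records = nil   # in-memory copy, nil until loaded
    @query_count = 0
  end

  def loaded?
    !@records.nil?
  end

  # Copy the rows into memory once (stands in for eager loading).
  def load_records
    unless loaded?
      @query_count += 1
      @records = @table.dup
    end
    self
  end

  # exists?: always queries the "database".
  def exists?
    @query_count += 1
    !@table.empty?
  end

  # any?: reads memory when loaded, otherwise queries without loading.
  def any?
    return !@records.empty? if loaded?

    @query_count += 1
    !@table.empty?
  end

  # present?: reads memory, loading (and caching) first if needed.
  def present?
    load_records
    !@records.empty?
  end
end

def compare_checks
  table = []
  images = ToyAssociation.new(table)
  images.load_records # eager load while the table is still empty
  table << :image     # a row is committed afterwards

  {
    exists: images.exists?,     # fresh query
    any: images.any?,           # stale memory
    present: images.present?,   # stale memory
    queries: images.query_count # one load plus one exists?
  }
end

p compare_checks
```

With the collection loaded before the row arrives, exists? answers true while any? and present? both answer false, and only two queries are counted: the initial load and the exists? call.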

I’m not talking about performance here, so which one to use all depends on your application’s goal.

Please choose wisely. Hope this helps you, guys. :)
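
For example, if the goal is simply that the check and the access never disagree, one possible way (just a sketch, not necessarily what we shipped) is to read the first image once and decide from that same value. Article and Image below are plain Structs standing in for the real models, and the "not empty" test on image_sizes is my assumption, since the real chain is cut off in the example above.

```ruby
# Sketch only: Article and Image are plain Structs, not the real models.
Image = Struct.new(:image_sizes)

Article = Struct.new(:images) do
  def has_images?
    # Read the collection once, so the check and the access
    # both look at the same in-memory value.
    first_image = images.first
    # Assumption: "has images" means the first image has at least one size.
    !first_image.nil? && !first_image.image_sizes.empty?
  end
end

puts Article.new([]).has_images?                       # no images loaded
puts Article.new([Image.new([:thumb])]).has_images?    # image with a size
```

An article whose images were loaded as an empty collection now simply returns false instead of raising, whatever the database says at that moment.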

