A few weeks ago, I ran into a problem that I found really, really weird.
How weird was it?
So, I have some tables with one-to-many relationships: let’s say articles has many images, and images has many image_sizes (the tables are different from the actual case). The system has a job that fetches Article records and eagerly loads their ImageSize records. We thought it was safe, since there was no complicated processing at all. Just load and print it!
After a while, booooom! We had a 500 in production.
We looked at our logs and saw a super strange error:
undefined method `image_sizes' for nil:NilClass
Let’s take a look at the code. Here’s an example of the code that caused the error:
def has_images?
  images.exists? && images.first.image_sizes.... # and another chain of method calls
end
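What’s wrong with that? Based on what the error said, it meant that images.first was nil. But if images.exists? had been false, the error would never have occurred in the first place, because && stops evaluating as soon as its left side is false. Therefore, in that condition, images.exists? was true. What the hell!?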
So, we performed a thorough analysis to see why this error occurred. Based on our analysis, it was caused by the difference in commit time between saving data to the articles and images tables. First, a user saved data to articles, then the job started, and only then did the system commit the save to images.
Why does that sequence cause an error in the job?
Let me explain the steps:
- A user saves an article, and the database commits it and creates an id: 1987.
- Somehow, concurrently, a job starts fetching all articles and eagerly loads them with image_sizes, at a moment when article 1987 hasn’t committed any data to images yet.
- In one part of the process, the job executes images.exists?, which runs a query against the database to check the data. But by this time, the user process has finished committing data to images and image_sizes, so the query returns true.
- Since it’s true, the job evaluates the next part of the code: images.first.image_sizes. Guess what? Since the association was eager loaded, it won’t make a new query. It just reads images.first from memory, which is nil. Yeah, NIL!
- Then comes the error! Yeay!
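The steps above can be replayed without Rails. This is only a plain-Ruby simulation under my own assumptions: FakeDatabase and FakeImagesAssociation are hypothetical stand-ins invented for illustration, not ActiveRecord classes. They model just the two behaviours involved, a check that always asks the database and a first that reads the eager-loaded snapshot.

```ruby
# Hypothetical stand-in for the images table, keyed by article id.
class FakeDatabase
  def initialize
    @images = Hash.new { |hash, article_id| hash[article_id] = [] }
  end

  def insert_image(article_id, image)
    @images[article_id] << image
  end

  # Returns a copy, so callers hold a snapshot rather than the live rows.
  def images_for(article_id)
    @images[article_id].dup
  end
end

# Hypothetical stand-in for an eager-loaded association.
class FakeImagesAssociation
  def initialize(database, article_id)
    @database = database
    @article_id = article_id
    @loaded = nil
  end

  # Mimics eager loading: snapshot the rows that exist right now.
  def load!
    @loaded = @database.images_for(@article_id)
    self
  end

  # Mimics exists?: always asks the database and ignores the snapshot.
  def exists?
    !@database.images_for(@article_id).empty?
  end

  # Mimics first on a loaded association: reads only the snapshot.
  def first
    load! if @loaded.nil?
    @loaded.first
  end
end

database = FakeDatabase.new
images = FakeImagesAssociation.new(database, 1987).load! # job eager loads: no images yet
database.insert_image(1987, { image_sizes: [:thumb] })    # the user's commit lands afterwards

exists_result = images.exists? # fresh query: sees the new row
first_result = images.first    # snapshot: still empty

puts "exists? => #{exists_result.inspect}, first => #{first_result.inspect}"
# prints: exists? => true, first => nil
```

Calling image_sizes on that nil is what raised the error in the job.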
So, based on this experience, we tried some of the collection checks provided by Ruby and Rails. This is the result:
- exists?: always performs a query against the database.
- any?: reads from memory when the collection is eager loaded, but always performs a query against the database when it isn’t.
- present?: doesn’t care whether the collection was eager loaded or not. If the collection is already in memory, it reads from memory. If it isn’t, it performs a query, stores the result in memory, and reads from memory the next time you call it.
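To make the difference between the three checks concrete, here is a small plain-Ruby model with a query counter. It is a sketch of the behaviour described above, not ActiveRecord code: ModelAssociation, load! and query_count are hypothetical names made up for this example.

```ruby
# Hypothetical model of the three checks; not ActiveRecord code.
class ModelAssociation
  attr_reader :query_count

  def initialize(rows_in_database)
    @rows_in_database = rows_in_database # stands in for the table
    @loaded_rows = nil                   # nil means "not loaded yet"
    @query_count = 0
  end

  def loaded?
    !@loaded_rows.nil?
  end

  # Loads every row into memory with one query, as eager loading would.
  def load!
    @query_count += 1
    @loaded_rows = @rows_in_database.dup
    self
  end

  # exists?: always queries, even when the rows are already in memory.
  def exists?
    @query_count += 1
    !@rows_in_database.empty?
  end

  # any?: uses memory when loaded, otherwise queries without caching anything.
  def any?
    return !@loaded_rows.empty? if loaded?

    @query_count += 1
    !@rows_in_database.empty?
  end

  # present?: uses memory when loaded, otherwise loads and keeps the rows.
  def present?
    load! unless loaded?
    !@loaded_rows.empty?
  end
end

association = ModelAssociation.new([:image_a])
association.any?     # 1 query, nothing cached
association.present? # 1 query, rows are now cached
association.present? # no query
association.any?     # no query, because the rows are loaded
association.exists?  # 1 query, always
puts association.query_count # prints 3
```

In this model, any? and present? stay consistent with what is already in memory after an eager load, while exists? can disagree with it whenever the table changed in between.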
I’m not talking about performance here, so which one to use depends on your application’s goal.
Please choose wisely. Hope this helps you, guys. :)