Is preloading/caching data before the actual method call an (anti)pattern?

Cyno@programming.dev · edit-2 1 year ago

booooop [any] · 1 year ago

If testing this properly is your problem, you should invest time in integration tests; running them against an in-memory database is an option as well. I think retrieving all the data up front and “caching” it, as you call it, has some negative consequences. For example, what if the validation for some action fails and you never needed whatever you preloaded? That's a wasted call to the db.

pohart@programming.dev · edit-2 1 year ago

You're right that this could introduce regressions, but it sounds like it's making the code more testable.

My biggest concern would be introducing db contention if locks are held for too long, or introducing race conditions if the records aren't locked while their cached copies are in use.

Edit: your->you're

Cyno@programming.dev · 1 year ago

Validation is usually the first step, so of course I only start preloading after it's done, but you are right - you can easily end up loading more data than is necessary.

However, it can also result in fewer overall queries: if I load all relevant entities at the beginning, I won't have to make 2+ separate calls later to get the relevant data. For example, if I'm processing weather for 3 users, I know to preload all 3 users and the weather data for the 3 locations where they live. The old implementation could end up loading the 3 users, then going into a loop and eventually into a method that processes their weather data and does 3 separate weather db hits, one per user (this is a simplified example, but it's something I've definitely seen happen in more subtle ways).
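To make the query counts in that example concrete, here is a minimal sketch in Python (the thread doesn't name a language, so that choice is an assumption). FakeDb, User, report_lazy and report_preloaded are hypothetical names invented for illustration; FakeDb only counts round trips and stands in for a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    location: str


class FakeDb:
    """Hypothetical stand-in for a database; it only counts round trips."""

    def __init__(self, users, weather):
        self._users = users      # user id -> User
        self._weather = weather  # location -> weather string
        self.queries = 0

    def get_users(self, user_ids):
        self.queries += 1
        return [self._users[i] for i in user_ids]

    def get_weather(self, location):
        self.queries += 1
        return self._weather[location]

    def get_weather_many(self, locations):
        self.queries += 1
        return {loc: self._weather[loc] for loc in locations}


def report_lazy(db, user_ids):
    # One query for the users, then one weather query per user in the loop.
    return {u.id: db.get_weather(u.location) for u in db.get_users(user_ids)}


def report_preloaded(db, user_ids):
    # One query for the users, then one batched query for all locations.
    users = db.get_users(user_ids)
    weather = db.get_weather_many({u.location for u in users})
    return {u.id: weather[u.location] for u in users}
```

With 3 users in 3 different locations, report_lazy makes 4 calls on the fake db and report_preloaded makes 2, and both return the same mapping.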

I guess I'm just trying to find a way to keep it a pure method with only "actual logic" in it, without depending on a database. Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.
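As a rough illustration of such a pure method, the sketch below takes already-loaded data as plain arguments and never touches a database. The function name, the dictionary shapes and the "rain" rule are all made up for the example, not taken from any real codebase.

```python
def users_needing_umbrella(location_by_user, weather_by_location):
    """Pure logic: works only on the data passed in, no database access.

    location_by_user:    hypothetical mapping of user name -> location
    weather_by_location: hypothetical mapping of location -> weather string
    """
    return sorted(
        user
        for user, location in location_by_user.items()
        if weather_by_location.get(location) == "rain"
    )
```

A test then needs no database or mock at all: calling it with {"ann": "Oslo", "bob": "Pune"} and {"Oslo": "snow", "Pune": "rain"} returns ["bob"].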

pohart@programming.dev · 1 year ago

Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

It does.

BehindTheBarrier@programming.dev · edit-2 1 year ago

I'm not sure how you do it at the moment, or whether you already know this since you mention the repository pattern, but here's how I know it.

A pattern I encountered at my workplace is a split between the repository and the data access (Dao) layer.

The repository implements an interface which other parts of your program use to get data. The repository asks the data access layer to make database calls.

For testing other parts of the program, we mock the repository interface and implement simple returns of data instead of relying on the database at all. Then we have full control of what goes in and out of your legacy code, assuming you are able to use this.
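A minimal sketch of that idea in Python, with every name hypothetical: UserRepository is the interface the rest of the program depends on, FakeUserRepository is a hand-written test double that returns canned data, and greeting stands in for the code under test.

```python
from typing import Protocol


class UserRepository(Protocol):
    """Hypothetical interface the rest of the program depends on."""

    def get_user_name(self, user_id: int) -> str: ...


class FakeUserRepository:
    """Test double: returns canned data and records what was asked for."""

    def __init__(self, names):
        self._names = names
        self.requested = []

    def get_user_name(self, user_id: int) -> str:
        self.requested.append(user_id)
        return self._names[user_id]


def greeting(repo: UserRepository, user_id: int) -> str:
    # Code under test: it only knows the interface, not the database.
    return f"Hello, {repo.get_user_name(user_id)}!"
```

In a test you would build FakeUserRepository({7: "Ada"}), call greeting(repo, 7), and check both the returned string and repo.requested, so you control what goes in and see what was asked of the repository.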

For testing the dao itself, I don't have much experience, since that's not a good option for us at the moment. As others mentioned, you can use an in-memory database, or manually mock the connection object provided to the dao, to test that your save methods store the correct data. The latter is somewhat clunky in my experience, but it's the best option when you are saving something with 20 values and want to make sure they end up in the right order, or have the right values when converting enum values to strings, for example.
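For the in-memory database route, a small sketch using Python's built-in sqlite3 module could look like this. UserDao, Status and the users table are invented for the example; the point is only that a test can save a row and read it back to check column order and the enum-to-string conversion.

```python
import sqlite3
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    BANNED = "banned"


class UserDao:
    """Hypothetical dao that owns the SQL for a small users table."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
        )

    def save(self, user_id, name, status):
        # The enum is stored as its string value.
        self._conn.execute(
            "INSERT INTO users (id, name, status) VALUES (?, ?, ?)",
            (user_id, name, status.value),
        )

    def load_row(self, user_id):
        cursor = self._conn.execute(
            "SELECT id, name, status FROM users WHERE id = ?", (user_id,)
        )
        return cursor.fetchone()
```

A test can open sqlite3.connect(":memory:"), save one user, and assert that load_row returns the values in the expected columns. SQLite may behave differently from your production database, so this checks the dao's mapping logic rather than vendor-specific SQL.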

I don't know much about caching, but if you want to keep it, it's possible to do it in the repository class.
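One way that could look, as a sketch only: a repository that remembers what it has already loaded and asks the dao only on a miss. CachingUserRepository, CountingDao and load_user are hypothetical names, and there is no invalidation here, so the stale-data concern raised earlier in the thread still applies.

```python
class CachingUserRepository:
    """Hypothetical repository that caches results from its dao."""

    def __init__(self, dao):
        self._dao = dao
        self._cache = {}

    def get_user(self, user_id):
        # Hit the dao only the first time a given id is requested.
        if user_id not in self._cache:
            self._cache[user_id] = self._dao.load_user(user_id)
        return self._cache[user_id]


class CountingDao:
    """Stand-in dao that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def load_user(self, user_id):
        self.calls += 1
        return {"id": user_id}
```

Callers keep using get_user as before; asking for the same id twice reaches the dao only once.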