Short explanation of the title: imagine you have a legacy mudball codebase in which most service methods query the database (through EF), modify some data, and then save it at the end of the method.

This code is hard to debug, impossible to write unit tests for, and generally performs badly because developers often make unoptimized or redundant db hits in these methods.

What I’ve started doing is making all the data loads before the method call, putting the results in a generic cache class (it’s mostly dictionaries internally), and then passing that in as a parameter or a member variable for the method. Everything in the method then gets or saves data through that cache; it’s not allowed to do db hits on its own anymore.

I can now also unit test this code as long as I manually fill the cache with test data beforehand. I just need to make sure that I actually preload everything in advance (which is not always possible) so I have it ready when I need it in the method.
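For concreteness, here is a minimal sketch of that shape. It is illustrative TypeScript (the actual codebase is EF/C#, which isn’t shown in the thread), and `DataCache`, `User`, and `applyDiscount` are all hypothetical names:

```typescript
// Hypothetical generic cache: one dictionary per entity type, keyed by id.
class DataCache {
  private stores = new Map<string, Map<number, unknown>>();

  private store(type: string): Map<number, unknown> {
    let s = this.stores.get(type);
    if (!s) {
      s = new Map<number, unknown>();
      this.stores.set(type, s);
    }
    return s;
  }

  get<T>(type: string, id: number): T {
    const value = this.store(type).get(id);
    if (value === undefined) throw new Error(`${type}#${id} was not preloaded`);
    return value as T;
  }

  set(type: string, id: number, value: unknown): void {
    this.store(type).set(id, value);
  }
}

interface User { id: number; balance: number; }

// The service method: pure logic, no db access. It only sees the cache,
// so forgetting to preload a User fails loudly instead of silently querying.
function applyDiscount(cache: DataCache, userId: number, percent: number): void {
  const user = cache.get<User>("User", userId);
  user.balance -= user.balance * (percent / 100);
  cache.set("User", userId, user);
}

// In a unit test, the cache is filled by hand instead of from the database.
const cache = new DataCache();
cache.set("User", 1, { id: 1, balance: 200 });
applyDiscount(cache, 1, 10);
```

The production path would fill the same `DataCache` from EF queries before calling the method; the test path fills it by hand, which is what makes the method unit-testable.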

Is this good practice? Is there a name for it, whether as a pattern or an anti-pattern? I’m tempted to say this is just a janky repository pattern, but it seems different, since it’s more about how you time and cache data loads for an individual method rather than the overall implementation of data access across the app.

Either way, I’d like to learn how to improve it or how to replace it.

  • moroni@lemmy.ca · 11 months ago

    How does the caching work? If the method is called again with the same parameters, does it load from the cache or fetch the data from the database into the cache again?

    • Cyno@programming.dev (OP) · 11 months ago

      Fetches from the database again; it’s just a temporary bundle of data rather than a persistent cache. We have caching for commonly-read/rarely-updated entities, but it’s not feasible for everything, of course.

  • booooop [any]@hexbear.net · 11 months ago

    If testing this properly is your problem, you should invest time in integration testing; running the tests on an in-memory database is an option as well. I think retrieving all the data and “caching” it, as you call it, has some negative consequences. For example, what if the validation for some action fails and you didn’t need whatever you preloaded? That’s a wasted call to the db.

    • pohart@programming.dev · 11 months ago

      You’re right that this could introduce regressions, but it sounds like it’s making the code more testable.

      My biggest concern would be introducing db contention with locks being held for too long, and introducing race conditions because the cached data isn’t locking the records when they’re cached.

      Edit: your->you’re

    • Cyno@programming.dev (OP) · 11 months ago

      Validation is usually the first step, so I only start preloading after it’s done, of course, but you are right - you can easily end up loading more data than is necessary.

      However, it can also result in fewer overall queries - if I load all relevant entities at the beginning, I won’t have to make 2+ separate calls later to get the same data. For example, if I’m processing weather for 3 users, I know to preload all 3 users and the weather data for the 3 locations where they live. The old implementation could load the 3 users, then go into a loop, and eventually into a method that processes their weather data with a separate weather db hit for each user (this is a simplified example, but something I’ve definitely seen happen in more subtle ways).
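To make the query-count difference concrete, here is a hedged sketch of that example (illustrative TypeScript rather than EF/C#; all names are made up, and the “database” is a fake that just counts round trips):

```typescript
interface User { id: number; locationId: number; }
interface Weather { locationId: number; tempC: number; }

let queryCount = 0;
const usersTable: User[] = [
  { id: 1, locationId: 10 }, { id: 2, locationId: 20 }, { id: 3, locationId: 10 },
];
const weatherTable: Weather[] = [
  { locationId: 10, tempC: 21 }, { locationId: 20, tempC: 14 },
];

// Fake db call: each invocation counts as one round trip.
function queryWeather(locationIds: number[]): Weather[] {
  queryCount++;
  return weatherTable.filter(w => locationIds.includes(w.locationId));
}

// Old shape: weather is fetched inside the loop, one query per user (N+1).
function processOld(users: User[]): number {
  let total = 0;
  for (const u of users) total += queryWeather([u.locationId])[0].tempC;
  return total;
}

// New shape: preload weather for the distinct locations in a single query,
// then run pure logic against the preloaded map.
function processPreloaded(users: User[]): number {
  const locations = [...new Set(users.map(u => u.locationId))];
  const byLocation = new Map(
    queryWeather(locations).map(w => [w.locationId, w] as [number, Weather]),
  );
  return users.reduce((sum, u) => sum + byLocation.get(u.locationId)!.tempC, 0);
}

queryCount = 0;
const oldTotal = processOld(usersTable);
const oldQueries = queryCount;

queryCount = 0;
const newTotal = processPreloaded(usersTable);
const newQueries = queryCount;
```

With the 3 users above, the loop shape issues 3 weather queries while the preloading shape issues 1, and both compute the same result.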

      I guess I’m just trying to find a way to keep it a pure method with only “actual logic” in it, without depending on a database. Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

      • pohart@programming.dev · 11 months ago

        Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

        It does.

  • BehindTheBarrier@programming.dev · 11 months ago

    I’m not sure how you do it at the moment, and you may already know this since you mention the repository pattern, but here’s how I know it.

    A pattern I encountered at my workplace is a split between the repository and the data access object (DAO) layer.

    The repository implements an interface that other parts of your program use to get data. The repository asks the data access layer to make the database calls.

    For testing other parts of the program, we mock the repository interface and implement simple returns of data instead of relying on the database at all. That gives us full control over what goes in and out of your legacy code, assuming you are able to use this approach.
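A minimal sketch of that split (hypothetical names, in TypeScript as a stand-in for the thread’s C#): the service logic depends only on the repository interface, so a test can substitute a simple in-memory fake.

```typescript
interface Order { id: number; total: number; }

// The interface the rest of the program depends on.
interface OrderRepository {
  getById(id: number): Order;
  save(order: Order): void;
}

// A production implementation would delegate to the DAO / database.
// In tests, an in-memory fake stands in for it:
class FakeOrderRepository implements OrderRepository {
  private orders = new Map<number, Order>();

  getById(id: number): Order {
    const o = this.orders.get(id);
    if (!o) throw new Error(`order ${id} not found`);
    return o;
  }

  save(order: Order): void {
    this.orders.set(order.id, order);
  }
}

// Legacy-style service logic, now testable against the fake.
function addSurcharge(repo: OrderRepository, orderId: number, amount: number): void {
  const order = repo.getById(orderId);
  order.total += amount;
  repo.save(order);
}

const repo = new FakeOrderRepository();
repo.save({ id: 7, total: 100 });
addSurcharge(repo, 7, 15);
```

The service code never learns whether it is talking to the fake or the real repository, which is what makes the mock swap possible.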

    For testing the DAO itself I don’t actually have much experience, since that’s not a good option for us at the moment. But as others mentioned, you can use in-memory databases, or manually mock the connection object provided to the DAO, to test that your save methods store the correct data. The latter is somewhat clunky in my experience, but it’s the best option when you’re trying to save something with 20 values and want to make sure they end up in the right order, or get the right values when converting enums to strings, for example.

    I don’t know much about the cache part, but if you want to keep it, it’s possible to do it in the repository class.

  • moroni@lemmy.ca · 11 months ago

    If you ignore the caching, the approach you’re describing loosely aligns with the concept of Domain-Driven Design (DDD). In DDD, the model is loaded before any business logic is executed, and then any changes made to the model are persisted back to the database.

    • lysdexic@programming.dev · 11 months ago

      In DDD, the model is loaded before any business logic is executed

      That’s really not a DDD requirement. Having a domain model does not require you to preload data to run business logic. For example, you can easily have business logic that only takes as input a value object and triggers a usecase, and you do not need to preload anything to instantiate a value.
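A small illustration of that point (hypothetical names, TypeScript as a stand-in): a value object is constructed and validated entirely in memory, and a use case can run on it with nothing preloaded from a database.

```typescript
// A value object: validated at construction, no database round trip involved.
class EmailAddress {
  readonly value: string;

  constructor(value: string) {
    if (!value.includes("@")) throw new Error("invalid email address");
    this.value = value;
  }
}

// A use case that takes only the value object; nothing is preloaded.
function buildWelcomeMessage(email: EmailAddress): string {
  return `Welcome! A confirmation was sent to ${email.value}.`;
}

const msg = buildWelcomeMessage(new EmailAddress("ada@example.com"));
```

Whether such a use case later touches a repository is a separate decision; the business rule itself runs on values alone.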

      • moroni@lemmy.ca · 11 months ago

        Agree.

        I’m just saying OP is loading stuff into a dictionary that perhaps functions as a Domain Model. Then they pass this Domain Model to a Use Case, where it gets modified and saved to the database.

        OP was asking for an architecture name or design pattern, and while it’s not a perfect match, it’s kinda like a Domain Model, although an anemic one.

        None of this is a DDD requirement.