Short explanation of the title: imagine you have a legacy mudball codebase in which most service methods are usually querying the database (through EF), modifying some data and then saving it in at the end of the method.

This code is hard to debug, impossible to write unit tests for and generally performs badly because developers often make unoptimized or redundant db hits in these methods.

What I've started doing is to often make all the data loads before the method call, put it in a generic cache class (it's mostly dictionaries internally), and then use that as a parameter or a member variable for the method - everything in the method then gets or saves the data to that cache, its not allowed to do db hits on its own anymore.

I can now also unit test this code as long as I manually fill the cache with test data beforehand. I just need to make sure that i actually preload everything in advance (which is not always possible) so I have it ready when I need it in the method.

Is this good practice? Is there a name for it, whether it's a pattern or an anti-pattern? I'm tempted to say that this is just a janky repository pattern but it seems different since it's more about how you time and cache data loads for that method individually, rather than overall implementation of data access across the app.

In either case, I'd like to learn either how to improve it, or how to replace it.

  • booooop [any]
    ·
    1 year ago

    If testing this properly is your problem you should invest time in integration testing, running them on an in-memory database is an option as well. I think retrieving all the data and “caching” it like you call it has some negative consequences, for example what if the validation for some action fails and you didn’t need to load whatever you preloaded? Waste of a call to the db

    • pohart@programming.dev
      ·
      edit-2
      1 year ago

      You're right that this could introduce regressions, but it sounds like it's making more testable.

      My biggest concern would be introducing db contention with locks being held for too long, and introducing race conditions because the cached data isn't locking the records when they're cached.

      Edit: your->you're

    • Cyno@programming.dev
      hexagon
      ·
      1 year ago

      Validation is usually the first step so I only start preloading after it's done of course, but you are right - you can easily end up loading more data than it necessary.

      However, it can also result in fewer overall queries - if I load all relevant entities at the beginning then later I won't have to do 2+ separate calls to get relevant data perhaps. For example, if I'm processing weather for 3 users, I know to preload all 3 users and weather data for the 3 locations where they live in. The old implementation could end up loading 3 users, then go into a loop and eventually into a method that processes their weather data and do 3 separate weather db hits for each of the users (this is a simplified example but something that I've definitely seen happen in more subtle ways).

      I guess I'm just trying to find a way to keep it a pure method with only "actual logic" in it, without depending on a database. Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

      • pohart@programming.dev
        ·
        1 year ago

        Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

        It does.

  • BehindTheBarrier@programming.dev
    ·
    edit-2
    1 year ago

    I'm not sure how you do it at the moment or already know this since you mention repository pattern. But here's how I know.

    A pattern I encountered at my workplace is a split between the repository and the data access (Dao) layer.

    The repository implements an interface which other parts of your program uses to get data. The repository askes the data access layer to make database calls.

    For testing other parts of the programs, we mock the repository interface, and implement simple returns of data instead of relying on the database at all. Then we have full control of what goes in and out of your legacy code, assuming you are able to use this.

    For testing the dao, I don't actually have much experience since that's not a good option for us at the moment, but as others mentioned you can use in memory databases or manually mock the connection object provided to the dao to test that your save methods store the correct data. The latter being somewhat clunky in my experience but the best option when you are trying to save something with 20 values and making sure they end up in the right order or have the right values when converting enum values to strings for example.

    I don't know much about cache, but if you want to keep it then it's possible to do it in the repository class.