Lemmy requires users to manually search for communities to discover content. Instances can choose to defederate from other instances, but I want an instance that blocks as few other users as possible so I can decide for myself what content I see.

I want to add a column to this script that analyzes Lemmy instances, so I can identify instances with high user activity but little blocking of other users.

Initially I was thinking of adding a column that calculates the ratio of:

(active users) / (total blocked users)

However, this raises a division-by-zero error for any instance with no blocked users.

I've thought of a few ways to handle the ZeroDivisionError case, but there could be a better metric entirely that avoids this issue or gives a good measure of high activity + low blocking.

Does anyone have ideas for a better metric or ratio to use here?

Some context on what the data looks like:

  • "active users" = number of active users in the past month
  • "total blocked users" = sum of active users from all instances blocking or being blocked by this instance
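A minimal Python sketch of the naive ratio and one common guard. It assumes each instance is a record with hypothetical `active_users` and `blocked_users` fields (the real script's field names may differ), and the instance names and numbers are made up. Returning `math.inf` for zero blocked users avoids the crash, but every unblocked instance then ties at infinity regardless of its size.

```python
import math


def naive_ratio(active_users: int, blocked_users: int) -> float:
    # Raises ZeroDivisionError when blocked_users == 0.
    return active_users / blocked_users


def guarded_ratio(active_users: int, blocked_users: int) -> float:
    # One possible guard (an assumption, not the script's behavior):
    # treat "no blocked users" as an infinitely good ratio.
    if blocked_users == 0:
        return math.inf
    return active_users / blocked_users


# Hypothetical example data, not real instance statistics.
instances = [
    {"name": "a.example", "active_users": 1200, "blocked_users": 300},
    {"name": "b.example", "active_users": 50, "blocked_users": 0},
]

for inst in instances:
    try:
        print(inst["name"], naive_ratio(inst["active_users"], inst["blocked_users"]))
    except ZeroDivisionError:
        print(inst["name"], "ZeroDivisionError; guarded:",
              guarded_ratio(inst["active_users"], inst["blocked_users"]))
```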

Let me know if you have any suggestions! I'm open to different formulas or metrics beyond a simple ratio.

Appreciate any help!

  • MagicShel@programming.dev · 8 months ago

    It's hard to say which algorithm would serve you better. This seems to do what you're seeking. It's not how I'd do it, but I don't prioritize unblocked users. To fix the division by zero, I'd assign a multiplier for zero blocked users. It might be 1, so that zero blocked users scores the same as one blocked user mathematically. But maybe free speech is so important to you that you give it a multiplier of 2 or whatever.

    active_users / MAX(1, blocked_users)

    This gives zero blocked users a multiplier of 1. Change the 1 to 0.5 for a multiplier of 2.

    I didn't look at the script, but whatever it's written in probably has a max function, which just returns the larger of two numbers, effectively putting a lower bound on the denominator.
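A rough Python translation of the MAX formula, assuming per-instance records with hypothetical `active_users` and `blocked_users` keys and made-up numbers. Python's built-in `max` plays the role of MAX, and the hypothetical `floor` parameter is the 1 (or 0.5) from the formula.

```python
def activity_score(active_users: int, blocked_users: int, floor: float = 1.0) -> float:
    """Return active_users / max(floor, blocked_users).

    floor=1.0 scores zero blocked users the same as one blocked user;
    floor=0.5 gives zero blocked users twice the score of one blocked user.
    """
    if floor <= 0:
        # A non-positive floor would bring back the division by zero.
        raise ValueError("floor must be positive")
    return active_users / max(floor, blocked_users)


# Hypothetical example data, not real instance statistics.
rows = [
    {"name": "a.example", "active_users": 1200, "blocked_users": 300},
    {"name": "b.example", "active_users": 50, "blocked_users": 0},
    {"name": "c.example", "active_users": 50, "blocked_users": 1},
]

# Add the new column, then rank instances by it (highest first).
for row in rows:
    row["score"] = activity_score(row["active_users"], row["blocked_users"], floor=0.5)

for row in sorted(rows, key=lambda r: r["score"], reverse=True):
    print(row["name"], row["score"])
```

With `floor=0.5`, b.example (zero blocked users) scores exactly twice c.example (one blocked user), which is the "multiplier of 2" described above.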