Are Lemmy sorting types just broken?

.Donuts@lemmy.world · 3 months ago

asudox@lemmy.asudox.dev · edit-2 3 months ago

No, it is not weird. Scaled, according to the docs, is like Hot, but less active communities’ posts get a boost.

Edit: here’s a bit more detail

The exact function to calculate the scaled rank is this one:

CREATE OR REPLACE FUNCTION scaled_rank (score numeric, published timestamp with time zone, users_active_month numeric)
    RETURNS float
    AS $$
BEGIN
    -- Add 2 to avoid divide by zero errors
    -- Default for score = 1, active users = 1, and now, is (0.1728 / log(2 + 1)) = 0.3621
    -- There may need to be a scale factor multiplied to users_active_month, to make
    -- the log curve less pronounced. This can be tuned in the future.
    RETURN (hot_rank (score, published) / log(2 + users_active_month));
END;
$$
LANGUAGE plpgsql;

The hot rank is calculated like this:

CREATE OR REPLACE FUNCTION hot_rank (score numeric, published timestamp without time zone)
    RETURNS integer
    AS $$
BEGIN
    -- hours_diff:=EXTRACT(EPOCH FROM (timezone('utc',now()) - published))/3600
    RETURN floor(10000 * log(greatest (1, score + 3)) / power(((EXTRACT(EPOCH FROM (timezone('utc', now()) - published)) / 3600) + 2), 1.8))::integer;
END;
$$
LANGUAGE plpgsql;
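For anyone who wants to poke at the numbers without a database, here is a rough Python port of the two functions quoted above. It is only a sketch: `hours_old` is a made-up parameter standing in for `now() - published`, and it assumes PostgreSQL's `log()` is base 10 (which it is for a single argument).

```python
import math


def hot_rank(score: float, hours_old: float) -> int:
    """Sketch of the quoted hot_rank; hours_old replaces now() - published."""
    vote_term = math.log10(max(1, score + 3))  # greatest(1, score + 3), base-10 log
    age_term = (hours_old + 2) ** 1.8          # the + 2 keeps a brand-new post finite
    return math.floor(10000 * vote_term / age_term)


def scaled_rank(score: float, hours_old: float, users_active_month: float) -> float:
    """Sketch of the quoted scaled_rank: hot rank divided by a log of community activity."""
    return hot_rank(score, hours_old) / math.log10(2 + users_active_month)


print(hot_rank(1, 0.0))        # 1728, i.e. the 0.1728 from the SQL comment times 10000
print(hot_rank(0, 0.0))        # 1370, so a post with 0 score still ranks above zero
print(hot_rank(-5, 0.0))       # 0, only clearly negative scores bottom out
print(scaled_rank(1, 0.0, 1))  # roughly 3621.7, matching the 0.3621 in the comment
```

The takeaway is that the vote term is `log(score + 3)`, not the raw score, so a score of 0 does not zero out the rank.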

dfyx@lemmy.helios42.de · 3 months ago

Exactly. In a small community that usually doesn’t see much activity, if a post gets even one upvote, scaled might consider it relevant.

.Donuts@lemmy.world · 3 months ago

But these are 3 posts with 0 upvotes that were posted recently. How is that “Hot”, let alone at the top of “Hot”?

MHLoppy@fedia.io · 3 months ago

“Hot” is a mix of recency and votes. The posts in your example score low on votes but very high on recency (<1 hour ago) and extremely high on the size scaling because that community ( !hp_fanfiction@literature.cafe ) is tiny with only two subscribers.
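To put a rough number on the size scaling: the scaled_rank function quoted earlier in the thread divides the hot rank by `log(2 + users_active_month)`. Here is a small Python sketch of just that divisor. The community sizes are made up for illustration, and note the function counts monthly active users, not subscribers, so the real figure for any given community may differ.

```python
import math


def scale_divisor(users_active_month: float) -> float:
    """The log(2 + users_active_month) term from scaled_rank (base-10 log)."""
    return math.log10(2 + users_active_month)


# Hypothetical community sizes, purely for illustration.
for users in (0, 2, 100, 20000):
    print(users, round(scale_divisor(users), 3))

# How much bigger the divisor is for the large community than for the tiny one.
boost = scale_divisor(20000) / scale_divisor(2)
print(round(boost, 1))
```

With these example sizes the tiny community's posts end up with a scaled rank around seven times higher than an otherwise identical post in the large one.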

You may consider Scaled to be a more appropriate sorting option for when you’re viewing the communities that you’ve subscribed to, rather than the firehose of /all

.Donuts@lemmy.world · 3 months ago

It’s starting to make sense, thanks. I’ll try to keep scaled to my subscriptions, although I like “All” for the variety.

Bezier@suppo.fi · edit-2 3 months ago

Ranking:
Hot = Upvotes / Age
Scaled = Hot / Community size

Hpfanfiction must be a fresh community that no one has joined yet, with the creator posting a lot right away. Alternatively, it just federated to LW.

On hot, I guess you managed to open it at the exact same second?

.Donuts@lemmy.world · edit-2 3 months ago

But 0 upvotes divided by any age is still 0. So Hot = 0, and Scaled would then be 0 divided by community size, and therefore also still 0.

“On hot, I guess you managed to open it at the exact same second?”

It was in response to “Scaled is like Hot”, so I wasn’t looking at the Hot page at that moment. I was trying to convey that it doesn’t make sense for a post with 0 upvotes to get to the top of Scaled.

asudox@lemmy.asudox.dev · 3 months ago

If you want a bit more detail, look at my edit. The functions that calculate the hot and scaled ranks for content are now there.

MHLoppy@fedia.io · 3 months ago

You’re making assumptions about how they work based on your intuition - luckily we don’t need to do much guesswork about how the sorts are actually implemented because we can just look at the code to check:

CREATE FUNCTION r.scaled_rank (score numeric, published timestamp with time zone, interactions_month numeric)
    RETURNS double precision
    LANGUAGE sql
    IMMUTABLE PARALLEL SAFE
    -- Add 2 to avoid divide by zero errors
    -- Default for score = 1, active users = 1, and now, is (0.1728 / log(2 + 1)) = 0.3621
    -- There may need to be a scale factor multiplied to interactions_month, to make
    -- the log curve less pronounced. This can be tuned in the future.
    RETURN (
        r.hot_rank (score, published) / log(2 + interactions_month)
);

And since it relies on the hot_rank function:

CREATE FUNCTION r.hot_rank (score numeric, published timestamp with time zone)
    RETURNS double precision
    LANGUAGE sql
    IMMUTABLE PARALLEL SAFE RETURN
    -- after a week, it will default to 0.
    CASE WHEN (now() - published) > '0 days'
        AND (now() - published) < '7 days' THEN
        -- Use greatest(2,score), so that the hot_rank will be positive and not ignored.
        log(greatest(2, score + 2)) / power(((EXTRACT(EPOCH FROM (now() - published)) / 3600) + 2), 1.8)
    ELSE
        -- if the post is from the future, set hot score to 0. otherwise you can game the post to
        -- always be on top even with only 1 vote by setting it to the future
        0.0
    END;

So if there are no further changes made elsewhere in the code (which may not be true!), it appears that hot never zeroes out a post for having a score of 0 or below, because it takes the greater of 2 and score + 2 in its calculation. If correct, the posts you’re pointing out still get a vote term of log(2) (about 0.30, since log is base 10 here) instead of 0, so only their age and the community-size divisor affect where they land, which I hope helps to explain things.

edit: while I was looking for the function, someone else beat me to it. The hot_rank function I posted may not be the current version, but hopefully you get the idea regardless!
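Here is a quick Python sketch of the two functions above, in case it helps to see the effect with numbers. Assumptions: `hours_old` is my stand-in for `now() - published`, and both example posts are invented, not taken from the screenshot.

```python
import math

WEEK_HOURS = 7 * 24


def hot_rank(score: float, hours_old: float) -> float:
    """Sketch of r.hot_rank: zero outside the (0, 7 days) window, floored vote term inside."""
    if not (0 < hours_old < WEEK_HOURS):
        return 0.0  # future-dated or older than a week
    return math.log10(max(2, score + 2)) / (hours_old + 2) ** 1.8


def scaled_rank(score: float, hours_old: float, interactions_month: float) -> float:
    """Sketch of r.scaled_rank: hot rank divided by log(2 + interactions_month)."""
    return hot_rank(score, hours_old) / math.log10(2 + interactions_month)


# Two hypothetical posts: a fresh 0-score post in a tiny community
# and a well-upvoted post from a few hours ago in a busy one.
tiny = dict(score=0, hours_old=0.5, interactions_month=2)
big = dict(score=500, hours_old=6.0, interactions_month=50000)

for name, post in (("tiny", tiny), ("big", big)):
    hot = hot_rank(post["score"], post["hours_old"])
    print(name, "hot:", round(hot, 4), "scaled:", round(scaled_rank(**post), 4))
```

With these made-up inputs the busy community's post still edges ahead on Hot, but on Scaled the fresh 0-score post from the tiny community comes out well on top, which is the behaviour being asked about.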

.Donuts@lemmy.world · 3 months ago

Thanks! That clears up a lot. Appreciate the paraphrasing too.

“You’re making assumptions about how they work based on your intuition”

Small difference: I made the assumption that the simplified version was exactly how it works, as in, taking the comment at face value.

MHLoppy@fedia.io · 3 months ago

Fair enough - glad you’ve found it helpful (Y)

Bezier@suppo.fi · 3 months ago

It’s a simplified version they had explained somewhere in the documentation. Details like that may be left out.