[META] MBFC bot

JonsJava@lemmy.world · 1 year ago

AbouBenAdhem@lemmy.world · 1 year ago

I was thinking of something like the graph of subreddits from this paper—although I think that’s based on subscriber overlap, and I don’t know if there’s a similar metric that would cover all news sites.

steventhedev@lemmy.world · 1 year ago

I don’t see an easy way to accomplish this without pulling in the full text of every article over some period, computing something like paragraph/doc/site vectors, and then clustering by site vector.

That’s putting a lot of faith in unsupervised learning, and it’s probably just as likely to pick up on stylistic conventions like byline and date formats as it is to cluster by some common thematic pattern like political leaning.

AbouBenAdhem@lemmy.world · 1 year ago

Maybe you could use a source site’s posts and upvotes in different fediverse communities as a proxy (assuming you could find representative communities with a similar range of biases).

steventhedev@lemmy.world · 1 year ago

That’s…actually not a bad idea. Take the (user, domain name) pairs and weight the edges between domains by the number of unique users who posted from both domains.
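Roughly what I have in mind, as a minimal sketch. It assumes you already have the posts as (user, domain) pairs; the function name and the sample data are made up for illustration, and nothing here touches the actual Lemmy API.

```python
from collections import defaultdict
from itertools import combinations


def domain_edge_weights(posts):
    """Map each unordered domain pair to the number of unique users
    who posted links from both domains.

    `posts` is an iterable of (user, domain) pairs (assumed input shape).
    """
    # Collapse repeat posts: each user contributes a *set* of domains.
    domains_by_user = defaultdict(set)
    for user, domain in posts:
        domains_by_user[user].add(domain)

    weights = defaultdict(int)
    for domains in domains_by_user.values():
        # Sort so every pair is stored in one canonical order.
        for a, b in combinations(sorted(domains), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical sample data.
posts = [
    ("alice", "a.example"), ("alice", "b.example"), ("alice", "a.example"),
    ("bob", "a.example"), ("bob", "b.example"),
    ("carol", "b.example"), ("carol", "c.example"),
]
weights = domain_edge_weights(posts)
```

Because each user is reduced to a set of domains first, someone who posts the same site fifty times still counts once per edge.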

Producing clusters from the resulting graph should be easy, but aside from just saying “these are similar websites”, does it really say much?
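To show how little the clustering step needs, here is a rough sketch. It assumes the edge weights are a dict of the form {(domain_a, domain_b): shared_user_count}, and it swaps in the crudest possible method (drop weak edges, then take connected components) as a stand-in for a proper community-detection algorithm. The threshold and the sample weights are made up.

```python
from collections import defaultdict


def cluster_domains(weights, min_shared_users=2):
    """Group domains into connected components, keeping only edges
    with at least `min_shared_users` shared posters."""
    adjacency = defaultdict(set)
    for (a, b), shared in weights.items():
        # Register both endpoints so isolated domains still get a cluster.
        adjacency[a]
        adjacency[b]
        if shared >= min_shared_users:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk over the thresholded graph.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical edge weights.
weights = {
    ("a.example", "b.example"): 3,
    ("b.example", "c.example"): 1,
    ("c.example", "d.example"): 2,
}
clusters = cluster_domains(weights)
```

With these weights the weak b/c edge is dropped, so the domains split into two groups. All this tells you is which sites share posters, which is exactly the “these are similar websites” limitation.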

You could do something similar with comment/upvote/downvote-based linkages - maybe they’ll have some deeper semantic meaning.