
POLL, please comment if you have an opinion:

Codeberg.org is being spammed by users using one-time/disposable email services and TOR connections. They spam projects with thousands of bogus issue comments, cause pain for project owners, and flood their notification email inboxes. This also harms Codeberg's SMTP reputation.

We are considering disabling access via TOR and one-time email providers to maintain smooth operation for all users.

What do you think? Is there a better approach?
Please have your say.

@codeberg If they use the same disposable mail address multiple times, maybe restricting such addresses to be used no more than once (or twice) per day (in the hope "creating new ones" is an effort they won't make too often)? Or do that with the combination of the address and TOR? IDK, but disabling both completely might harm other "legit uses". And no, I won't mention Captchas (unless you count "hidden input fields" which, when filled, tell you it's a bot).

@IzzyOnDroid @codeberg

A quick hack that we once employed is to delay mail delivery (to certain domains).

We mostly did that to spread the load, though. But it helped a lot when bots had to wait 1 to 20 hours before receiving the confirmation emails. Needs to be clearly communicated, though.

@berkes @IzzyOnDroid Emails are already throttled, could refine this a bit, tho. General problem is that new accounts are created easily, with throwaway-IDs (anonymous or pseudonymous).

Question is mostly if there is a practicable way to stop abuse without completely disallowing anonymous accounts.

@codeberg @berkes Maybe if "throw-aways" have to mail-confirm each post – with the mail containing a "phrased captcha", requiring to set the subject line in a specific way – or something like that?

@IzzyOnDroid @codeberg @berkes that's easier automated than done manually. So everyone suffers for no gain.

@codeberg maybe unpopular, but have you considered a captcha in the form of a JavaScript miner? Basically requiring anyone to halt for 60 seconds when registering and 5 when posting, etc.

I do understand the practical problems (blockers, malware detection) and understand that people might have ideological issues with 'mining'.

But requiring a proof of work is defensible to your audience, I'd say.

@berkes You say a captcha per issue? Hmm ... not sure what to think, also there are very legitimate use cases to create issues via API, for example from CI.

@codeberg @berkes You could do captcha per issue and restrict issue creation via API to own projects per default. At least until some kind of reputation criteria is met, like contributors to a project are allowed API access.

@codeberg technically, an API could require a proof-of-work. I'm not aware of existing libraries or implementations.

The client (including the web version) would then need to 'mine' some hashes before submitting a request (i.e. hashcash). This acts as a captcha for both API and web. It costs a normal user tiny amounts of electricity and delay, but bots large amounts of resources.

And if those hashes then bring in some micropayments, it's a win-win.

@codeberg to clarify: each request needs such a PoW in a header or as part of the payload.

But this sounds like a big project on its own. Maybe others have built this already? Could even be in the form of an HTTP proxy.
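A minimal sketch of the hashcash idea discussed here, not an existing library or anything Gitea ships. It assumes SHA-256, a `challenge:nonce` string format, and a difficulty of 16 leading zero bits; all three are illustrative choices, and the names `mine` and `verify` are hypothetical. The client searches for a nonce and sends it in a header or in the payload; the server checks it with a single hash.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16  # assumed value; higher means more client work per request


def _meets_target(challenge: str, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # Read the digest as a big integer and require `bits` leading zero bits.
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def mine(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: try nonces in order; expected cost is about 2**bits hashes."""
    for nonce in count():
        if _meets_target(challenge, nonce, bits):
            return nonce


def verify(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: one hash per request, so checking stays cheap."""
    return _meets_target(challenge, nonce, bits)
```

The challenge would have to be bound to the request (for example a server-issued token plus the request body) and accepted only once, otherwise a bot could reuse one solved nonce for many requests.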

@berkes It would be really interesting to build hashcash using a more modern PoW like Cuckoo Cycle and actually implement it in Gitea as DoS prevention.
For APIs it's a bit tricky to require your users to implement it themselves. If Gitea has client libs, it could be done.

@codeberg how about plain rate limiting? Like one request per 5 seconds.

@stevenroose @berkes Rate-limiting makes sense here, but need to check carefully that interactive use cases like GitNex client apps are not impacted.

@stevenroose @berkes one request per 5sec is possibly a bit tight for interactive use; at the same time, one spam issue comment every 5 seconds is already quite a lot

@codeberg @stevenroose rate-limiting based on a combination of session (ID in headers?), IP and 'user' often works.

X/session/minute
Y/user/minute
Z/IP/minute

Where Z > X, Z > Y. And Y > X.

For Rails, I always use github.com/jeremy/rack-ratelim. There might be something in Go that can be integrated into Gitea, or an agnostic proxy that is as flexible and tunable.

Though a proxy has no knowledge of things like 'user' or 'customer'.
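The X/Y/Z scheme above could be sketched roughly as follows. This is an in-memory sliding-window limiter, not the rack middleware mentioned in the post; the class name, the concrete limits, and the 60-second window are assumptions chosen only to satisfy Z > Y > X.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-minute limits, chosen so that Z (ip) > Y (user) > X (session).
LIMITS = {"session": 30, "user": 60, "ip": 120}
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    def __init__(self, limits=LIMITS, window=WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        self.hits = defaultdict(deque)  # (kind, key) -> timestamps of accepted requests

    def allow(self, session, user, ip, now=None):
        now = time.monotonic() if now is None else now
        keys = [("session", session), ("user", user), ("ip", ip)]
        # Drop timestamps that left the window, then check every limit.
        for key in keys:
            queue = self.hits[key]
            while queue and now - queue[0] >= self.window:
                queue.popleft()
            if len(queue) >= self.limits[key[0]]:
                return False
        # Only accepted requests are recorded against the three counters.
        for key in keys:
            self.hits[key].append(now)
        return True
```

A request passes only if the session, the user, and the IP are all under their limits, which is why this needs to live in (or be fed by) the application: a plain proxy would only see the IP.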

@codeberg also note that typically, rate limiting for GET requests can be an order of magnitude more lenient than for PUT/PATCH/POST/DELETE.

E.g. where you say: per IP we allow 100,000 read (GET) requests per hour, but only 100 writes (POST etc.) per hour.
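A rough sketch of separate read and write budgets per IP, using the example figures from the post. The class name, the fixed one-hour buckets, and treating HEAD and OPTIONS as reads are assumptions made for illustration.

```python
from collections import Counter

# Hourly budgets per IP, taken from the example figures in the post.
HOURLY_BUDGET = {"read": 100_000, "write": 100}
READ_METHODS = {"GET", "HEAD", "OPTIONS"}  # assumption: safe methods count as reads


class HourlyBudget:
    def __init__(self, budget=HOURLY_BUDGET):
        self.budget = budget
        self.used = Counter()  # (ip, kind, hour index) -> accepted requests

    def allow(self, ip: str, method: str, now_seconds: float) -> bool:
        kind = "read" if method.upper() in READ_METHODS else "write"
        bucket = (ip, kind, int(now_seconds // 3600))
        if self.used[bucket] >= self.budget[kind]:
            return False
        self.used[bucket] += 1
        return True
```

Old buckets are never pruned here, so a real deployment would need some expiry, or a store with TTLs.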

@codeberg @berkes Actual clients could implement the hashcash. Though if a client like GitNex needs an API call every second to operate normally (or occasional bursts of 10 calls at once), it's also gonna be negatively impacted.

@stevenroose I'd say you could implement it as a generic HTTP proxy, making it a language- and application-agnostic API protection. Probably even possible as SaaS.

Clients would need to implement it, though. But it could even be configured progressively, to remain backwards compatible. As in '5 requests per minute allowed without PoW-header, unlimited with such a header'.

Now, if only someone with more time liked this concept as much as I do...

@berkes I know the feeling. The biggest issue with something like hashcash is working out a spec. Because it's a very standard-sensitive thing. You don't want every website to go take an entirely different PoW etc so that you have to go program a miner every time you want to use a new service's API.

@berkes "Costing a normal user tiny amounts of electricity and delay, but bots large amounts of resources." A swarm of bots is based on piracy; they don't care about electricity because they don't pay for it. @codeberg I suggest asking project owners to moderate issues before these issues are made public. And flag projects that authorize too many issues for manual inspection.

@codeberg
"Keep Open-Source open for everyone"

Please keep in mind that people use Tor for very legitimate reasons and need to rely on it for their own safety and security.
Blocking Tor should be a last resort, if considered at all.

@ck What productive approach do you suggest to stop spamming?

@codeberg @ck try to detect the bogus accounts by the number of issues they commented on within a given amount of time.
Or make the confirmation link a captcha itself ("the last digit is the result of 3+4")

@dadosch @ck Per issue? This would disable issue creation via API completely, right?

@dadosch @ck That's currently not the problem. Accounts are created manually, stay dormant for a while, then start creating thousands of issues.

@codeberg Even GitHub allows Tor. Do you really want to be worse than Micro$oft? Sorry, but banning the right of anonymity is absolutely not an option, not even as a last chance. No, you cause too much harm to legitimate users. Also note that there are thousands of web proxies which could also be used, so I don't expect a Tor block to have much effect.

@nipos Then again new accounts cannot create issues at GitHub. (Immediately marked as spam and deleted).

@nipos What approach would you suggest to defend legitimate use cases against spam and DOS, that is practicable?

@codeberg You could use mogelmail.de or a similar service to detect fake email addresses and forbid their use, or at least lock those accounts temporarily until your team reviews them. I don't think there's any legitimate use for throwaway email addresses in a Git repository. If you want to stay anonymous, there's always the option to create a second real email address.

@codeberg 10 minute mail? Definitely block.
Tor connections? This is a more complicated problem, since some of them are real users who need more privacy or need to get around network blocks on Codeberg (I used to install Tor Browser each time in my informatics lessons in junior high school just to browse wykop.pl)

@codeberg I think adding a captcha would be the better way. Unfortunately Gitea currently only allows ReCaptcha - maybe an alternative would be to only block the signup page from TOR with a message?

@momar We have a captcha at registration (go-gitea captcha, not recaptcha). Problem is that accounts are created manually, then dormant, then spamming thousands of comments.

@codeberg Hm, then I guess rate-limiting the API with a notification being sent to someone who can then look into the issue would be a better way.

@momar sounds like a lot of manual effort (easy to spam the maintainer)

@codeberg I mean, how many accounts/issues are we talking? If it's e.g. 20 issues/minute, I'd imagine false alarms to be pretty rare, and the whole account could be closed pretty quickly.

@momar That's what we do right now. Problem is, creating new accounts is easy

@codeberg please don't block TOR access. Attackers would be able to use legitimate IPs anyway.

A captcha per issue/comment could be a good temporary countermeasure.

@codeberg I don't know, maybe a web of trust or a reputation system would be more suitable in the long term. Then allow trusted users to avoid captcha and use API freely.

@rdg Sounds like a bigger project, contributions absolutely welcome!

But what to do near-term?

@codeberg use captcha and disable API by default. You may want to enable the API for some users via a human-evaluated petition, similar to what some Mastodon instances do for newcomers.

Of course it is a temporary patch and won't scale, but it may stop the spam problem.

@rdg This would lock-out app users like GitNex, and long-term make federation hard to impossible .. ?

@codeberg unfortunately yes, for newcomers. Do you have statistics on what portion of GitNex users would be locked out? Also federation may need to implement a trust mechanism anyway

@codeberg I'd say do it until a less radical solution is found. Many websites already block access from the TOR network. Others apply "I am not a robot" filters.

@codeberg What about using an Ostrom-based system? Basically a reputation-based rate limiter - i.e. if you’re new here, you need to wait a bit before you get full rights, to ensure you’re trustworthy; projects with more invaders get more fences

@codeberg notabug.org appears to have had similar problems, as they disabled access over their Tor onion service. git over ssh is still possible with it, but I'd prefer if the website were to work as well

@utf8equalsX There are surely perfectly legitimate use cases, especially for users living in disadvantaged countries. Still we need to find a mode of operation that ensures that single troublemakers cannot disrupt functionality required by other contributors.

@codeberg Do you operate a hidden service and is this spam traffic coming through it, or through Tor exit nodes via clearnet?

@utf8equalsX The traffic is coming through Tor exit nodes, we do not operate a hidden service

@codeberg Maybe some text analysis could help? For example if more than 10 issues are created with at least 90% matching contents, automatically mark it as spam and make users request approval in some way.

Maybe you could also set up SpamAssassin on your outgoing server and have it learn what the spam messages look like to limit email spam, but that won't get rid of the issues that are created inside Gitea.
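The "more than 10 issues with at least 90% matching contents" idea could look roughly like this. It is a sketch using the standard library's `difflib.SequenceMatcher` as the similarity measure, which is an assumption: the post does not say how "matching" would be computed. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable

SIMILARITY_THRESHOLD = 0.9  # "at least 90% matching contents"
MAX_SIMILAR_ISSUES = 10     # "more than 10 issues"


def similar(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold


def looks_like_spam(new_body: str, recent_bodies: Iterable[str],
                    max_similar: int = MAX_SIMILAR_ISSUES) -> bool:
    matches = sum(similar(new_body, old) for old in recent_bodies)
    # Count the new issue together with its near-duplicates.
    return matches + 1 > max_similar
```

Comparing every new issue against all recent ones is quadratic, so in practice `recent_bodies` would probably be limited to one account or a short time window.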

@hugot This particular spammer was posting the content of random-not-so-random birdsite posts.

@codeberg Right. This needs some thought. You want to stay accessible to software users who just want to create an account and make an issue right away, so a lot of manual setup for users or blocking Tor would suck.

I think you need a solution that lets new users participate, but lets existing users maintain the peace of codeberg.org.

@codeberg How about this:

- New users are allowed to create a maximum of X issues/comments per 24 hours through the web UI by default. No API use allowed.

- Optional: They are allowed to use the API to create issues on their own repos, but creating them on other repos is not allowed.

- To be allowed to use the API and create more than X issues per 24h, a user needs to be approved by at least 2 already approved users.

@codeberg You'd need to make some sort of UI for users to grant/request approval, but with something like this in place you can let your users take care of stuff as a community.
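The three rules proposed above can be sketched as a small policy check. Everything here is hypothetical (the `User` type, `vouch`, `may_post`, and the value chosen for X); it shows the logic only, not how Gitea models users or permissions.

```python
from dataclasses import dataclass, field

DAILY_LIMIT = 5        # the "X" from the proposal; the value is an assumption
REQUIRED_VOUCHES = 2   # "approved by at least 2 already approved users"


@dataclass
class User:
    name: str
    approved: bool = False
    vouched_by: set = field(default_factory=set)
    posts_today: int = 0  # the caller is expected to reset this every 24 hours


def vouch(voucher: User, newcomer: User) -> None:
    """Record a vouch; only vouches from already approved users count."""
    if voucher.approved and voucher is not newcomer:
        newcomer.vouched_by.add(voucher.name)
        if len(newcomer.vouched_by) >= REQUIRED_VOUCHES:
            newcomer.approved = True


def may_post(user: User, via_api: bool, own_repo: bool) -> bool:
    if user.approved:
        return True
    if via_api and not own_repo:
        return False  # optional rule: unapproved users may use the API on their own repos only
    return user.posts_today < DAILY_LIMIT
```

Storing voucher names in a set means the same approved user cannot approve a newcomer twice.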

@hugot Seems this would involve significant changes within gitea. Long-term a built-in trust-scoring system will surely be great, the gitea core developers will surely like this as well?

Anybody volunteering?

@codeberg Well I'd like to help out but my time is limited and my knowledge of gitea is nonexistent.

If you can find more people who are willing I'd be happy to take part and see what I can take on.
