Bots scraping forum content for AI?

Bryn

I've recently been told that some bots that appear to be just "viewing" the board index are actually scraping content for AI training purposes. When I heard this, my forum was getting a lot of traffic (which has now stopped thanks to an urgent security measure being put in place), and I was very concerned, as I don't want any of them stealing content from my forum. But there's no way to prove that they're doing so...

Have any of you heard about this? And are you as concerned as I am?
 
What content would you not like to be scraped?

I personally don't mind, as it would link back to the source if it scrapes and uses your content.
 
I've noticed a good number of AI bots browsing RPG haven lately. I typically have at least 10 browsing my forum every day. I haven't noticed this on Thee Zone or other forums. I don't mind it; like Cedric said, if they scrape and use the content on your forums, you can at least track it back.
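
For anyone who wants a rough count of how many of their guests identify themselves as AI bots, here is a minimal sketch. It assumes you can get at the user-agent strings of your visitors somehow (which may not be possible on a hosted forum). The marker list only contains the two bot names mentioned in this thread, and `count_ai_bots` and the sample strings are made up for illustration, not real traffic. A user agent can be faked, so this only counts bots that announce themselves; it proves nothing about the ones that don't.

```python
from collections import Counter

# Bot names mentioned in this thread; extend the tuple as needed.
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot")


def count_ai_bots(user_agents):
    """Count user-agent strings that contain a known AI bot marker."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for marker in AI_BOT_MARKERS:
            if marker.lower() in lowered:
                counts[marker] += 1
                break  # count each visitor at most once
    return counts


# Illustrative strings only, not copied from real bots.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ExampleBrowser/1.0",
]

bot_counts = count_ai_bots(sample_user_agents)
print(dict(bot_counts))
```

Matching is case-insensitive on purpose, since the same bot name can show up with different capitalisation.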
 
What content would you not like to be scraped?

I personally don't mind, as it would link back to the source if it scrapes and uses your content.
Exactly. If there are links back from search engines/AI, it'll increase the SEO traffic our forums receive.
 
What content would you not like to be scraped?
Anything really...

And by the way, although the security measure used (Cloudflare Turnstile) does keep them out, apparently they're also kept out by robots.txt, where the likes of GPTBot and ClaudeBot are on the blacklist, as they're known to exhibit very aggressive behaviour.

So are some of you saying that AI bots scraping forum content can be a good thing? Right now, I'm very doubtful about that and don't think I'd get as many legitimate visitors either, despite having changed the meta tags to help on the SEO side of things. Maybe I'm just overly paranoid...
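
To see what a robots.txt blacklist like that does in practice, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt text below is a guess at what such a file could look like, not the real Jcink file, and the URL and the `allowed_agents` helper are placeholders for illustration. Keep in mind that robots.txt is only a request: it works for crawlers that choose to honour it, which is why a challenge like Turnstile is a separate layer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption, not the actual Jcink file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Placeholder URL; any path on the site gives the same answer here.
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://forum.example/index.php",
)
print(result)
```

With this file, the two named bots are told to stay away from every page, while anything else falls through to the `*` rule and is allowed.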
 
Many swear it isn't, but I wouldn't mind my content being shown to someone on ChatGPT with a link to the source.
 
Anything really...

And by the way, although the security measure used (Cloudflare Turnstile) does keep them out, apparently they're also kept out by robots.txt, where the likes of GPTBot and ClaudeBot are on the blacklist, as they're known to exhibit very aggressive behaviour.

So are some of you saying that AI bots scraping forum content can be a good thing? Right now, I'm very doubtful about that and don't think I'd get as many legitimate visitors either, despite having changed the meta tags to help on the SEO side of things. Maybe I'm just overly paranoid...
It’s a good thing. Your content will be sourced and linked to on ChatGPT and the other AI chatbot systems.

I wouldn’t block them unless you prefer not to receive any traffic from AI chatbots.
 
Your content will be sourced and linked to on ChatGPT and the other AI chatbot systems.
Well, GPTBot is one of the few AI bots that are blocked because, as I've been told, it exhibits very aggressive behaviour; the same goes for ClaudeBot.

I wouldn’t block them unless you prefer not to receive any traffic from AI chatbots.
That's what I'm getting at... I don't want any of them flooding the Online Users List or using up so many resources, though I don't think all of them are like that. Still, I don't want to take that chance...
 
Well, GPTBot is one of the few AI bots that are blocked because, as I've been told, it exhibits very aggressive behaviour; the same goes for ClaudeBot.


That's what I'm getting at... I don't want any of them flooding the Online Users List or using up so many resources, though I don't think all of them are like that. Still, I don't want to take that chance...
Why not? You’re not paying for bandwidth on Jcink. So whether you have 10 guests or 10,000 guests, who cares?
 
So whether you have 10 guests or 10,000 guests, who cares?
Well, I care, because while I don't mind my forum being quiet or busy in terms of activity, I personally don't want it too busy, and I certainly don't want most of my guests being bots either. That's why I have the CF Turnstile security measure enabled, so that only legit visitors can get in.
 
Well, GPTBot is one of the few AI bots that are blocked because, as I've been told, it exhibits very aggressive behaviour; the same goes for ClaudeBot.


That's what I'm getting at... I don't want any of them flooding the Online Users List or using up so many resources, though I don't think all of them are like that. Still, I don't want to take that chance...
The thing is, if you’re relying on just Google for traffic, then blocking ChatGPT isn’t a wise decision. Claude is similar to ChatGPT. These days, a lot of users are moving towards AI systems to search for information, and away from Google/Bing.

However, it is your choice at the end of day.


I’d keep them blocked if you want to save bandwidth, lose traffic in the long run and simply keep all your eggs in one basket.

🙂
 
The thing is, if you’re relying on just Google for traffic, then blocking ChatGPT isn’t a wise decision. Claude is similar to ChatGPT. These days, a lot of users are moving towards AI systems to search for information, and away from Google/Bing.
Well actually, it wasn't me that blocked their bots from crawling my forum... in fact they're blocked across all Jcink forums, as all forums including mine share a "universal" robots.txt. And they're blocked because, as stated in that file, they are "insanely, overwhelmingly aggressive AI scrapers" according to John (who is the main man behind Jcink).

EDIT: I've just spoken with John, and he will not permit them again, as they had drained so much of Jcink's resources, though he says he will if they change their behaviour. For now, my forum is staying AI bot-free for the foreseeable future, and the same goes for every other forum on the Jcink network.
 
Anything really...

And by the way, although the security measure used (Cloudflare Turnstile) does keep them out, apparently they're also kept out by robots.txt, where the likes of GPTBot and ClaudeBot are on the blacklist, as they're known to exhibit very aggressive behaviour.
If you are using Cloudflare (and if you are using Turnstile, you most likely have an account), you can also use their AI bot blocking feature.

 
Well actually, it wasn't me that blocked their bots from crawling my forum... in fact they're blocked across all Jcink forums, as all forums including mine share a "universal" robots.txt. And they're blocked because, as stated in that file, they are "insanely, overwhelmingly aggressive AI scrapers" according to John (who is the main man behind Jcink).

EDIT: I've just spoken with John, and he will not permit them again, as they had drained so much of Jcink's resources, though he says he will if they change their behaviour. For now, my forum is staying AI bot-free for the foreseeable future, and the same goes for every other forum on the Jcink network.
Until they find another way to bypass those restrictions 😜
 
If you are using Cloudflare (and if you are using Turnstile, you most likely have an account), you can also use their AI bot blocking feature.
Actually, it was John who enabled that setting for me, but I can tell him about this feature...
 
I can definitely see these bots being a problem for free forum hosts like Jcink; taking up all those resources by trying to scrape multiple forums would definitely affect the servers.
 
Well, GPTBot is one of the few AI bots that are blocked because, as I've been told, it exhibits very aggressive behaviour; the same goes for ClaudeBot.


That's what I'm getting at... I don't want any of them flooding the Online Users List or using up so many resources, though I don't think all of them are like that. Still, I don't want to take that chance...

Well, I care, because while I don't mind my forum being quiet or busy in terms of activity, I personally don't want it too busy, and I certainly don't want most of my guests being bots either. That's why I have the CF Turnstile security measure enabled, so that only legit visitors can get in.

Well actually, it wasn't me that blocked their bots from crawling my forum... in fact they're blocked across all Jcink forums, as all forums including mine share a "universal" robots.txt. And they're blocked because, as stated in that file, they are "insanely, overwhelmingly aggressive AI scrapers" according to John (who is the main man behind Jcink).

EDIT: I've just spoken with John, and he will not permit them again, as they had drained so much of Jcink's resources, though he says he will if they change their behaviour. For now, my forum is staying AI bot-free for the foreseeable future, and the same goes for every other forum on the Jcink network.
Until they find another way to bypass those restrictions 😜

Simply use a premium DNS.
 
Actually, it was John who enabled that setting for me, but I can tell him about this feature...
Don't know who John is... but if you're using a free/shared script offering, you have less control over it.
That's just one reason I prefer self-hosted scripts... and one reason I'm no longer paying much attention to Invision, since it's questionable how long they'll keep offering a self-hosted option. There are plenty of other scripts (both paid and free) out there for people to use.
You want the old-fashioned forum formats... go with any of the paid scripts or several of the free ones like SMF, phpBB, MyBB. You want something a little more modern... go with NodeBB or Discourse.
I run sites using NodeBB, Discourse and XenForo.
 
For sure.

But with that being said, it's not always a bad thing.

I've seen AI-generated search results that have used my blog posts as sources.

AI isn't going anywhere, and it will continue to disrupt how we search and rank our websites. Either you learn and master it and adapt or you get left behind.

It's that simple.
 