Re: webecv4 questions
By: Al to Hemo on Wed Apr 01 2020 02:28 pm
I've looked for a while but my Google-fu is failing me.
I want to keep the BBS web pages available, but not allow anyone
to browse the message areas unless logged in. Perhaps allow one or two
areas like a local/main, if possible. I want to stop the network
areas from being targets for web crawling/indexing.
You can stop the web crawlers with your robots.txt.
I'm not sure, but I think the default robots.txt that comes with Synchronet will do this. My own robots.txt looks like this:
User-agent: *
Disallow: /bbbs
I've got this:
User-agent: *
Disallow: /
It's not stopping things that are not identifying as a crawler, I think. I believe a legitimate crawler starts by looking for the robots.txt file, and I see some of those too.
Here are snippets of what I see in the log:
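For what it's worth, here is a rough sketch of how a well-behaved crawler would read the two robots.txt files quoted above. It uses Python's standard urllib.robotparser; the helper name `allowed` and the sample paths are just for illustration. The point is that the file is only advice: a client that never asks for it, or ignores it, is not affected at all.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets quoted in this thread.
AL_RULES = "User-agent: *\nDisallow: /bbbs\n"
HEMO_RULES = "User-agent: *\nDisallow: /\n"


def allowed(robots_txt, path, agent="*"):
    """Return True if a compliant crawler would consider `path` fetchable.

    `allowed` is a hypothetical helper, not part of Synchronet.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, rules in (("Al", AL_RULES), ("Hemo", HEMO_RULES)):
        for path in ("/", "/bbbs/index.html", "/api/files.ssjs?call=download-file"):
            print(name, path, allowed(rules, path))
```

With "Disallow: /" every path is off limits to a crawler that honors the file, so the direct file requests in the log are presumably coming from clients that don't check it.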
Apr 1 12:31:32 bbs synchronet: web 0045 HTTP connection accepted from: 52.82.96.27 port 49946
Apr 1 12:31:32 bbs synchronet: web 0045 Hostname: ec2-52-82-96-27.cn-northwest-1.compute.amazonaws.com.cn [52.82.96.27]
Apr 1 12:31:32 bbs synchronet: web 0045 Request: GET /api/files.ssjs?call=download-file&dir=sndmodv1mod_hl&file=INFLNCIA.MOD HTTP/1.1
Apr 1 12:31:32 bbs synchronet: web 0045 Unable to send to peer
Apr 1 12:31:32 bbs synchronet: web 0045 Sending file: /sbbs/tmp/SBBS_SSJS.31685.45.html (0 bytes)
Apr 1 12:31:33 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 219 served)
Apr 1 12:32:16 bbs synchronet: web 0045 HTTPS connection accepted from: 111.225.148.163 port 55238
Apr 1 12:32:17 bbs synchronet: web 0045 Hostname: bytespider-111-225-148-163.crawl.bytedance.com [111.225.148.163]
Apr 1 12:32:17 bbs synchronet: web 0045 Request: GET /robots.txt HTTP/1.1
Apr 1 12:32:17 bbs synchronet: web 0045 Sending file: /sbbs/webv4/root/robots.txt (2076 bytes)
Apr 1 12:32:17 bbs synchronet: web 0045 Sent file: /sbbs/webv4/root/robots.txt (2076 bytes)
Apr 1 12:32:18 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 220 served)
Apr 1 12:32:58 bbs synchronet: web 0045 HTTP connection accepted from: 111.225.148.177 port 46388
Apr 1 12:32:58 bbs synchronet: web 0045 Hostname: bytespider-111-225-148-177.crawl.bytedance.com [111.225.148.177]
Apr 1 12:32:58 bbs synchronet: web 0045 Request: GET /robots.txt HTTP/1.1
Apr 1 12:32:58 bbs synchronet: web 0045 Sending file: /sbbs/webv4/root/robots.txt (2076 bytes)
Apr 1 12:32:58 bbs synchronet: web 0045 Sent file: /sbbs/webv4/root/robots.txt (2076 bytes)
Apr 1 12:32:59 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 221 served)
Apr 1 12:33:42 bbs synchronet: web 0045 HTTPS connection accepted from: 52.83.249.124 port 52734
Apr 1 12:33:42 bbs synchronet: web 0045 Hostname: ec2-52-83-249-124.cn-northwest-1.compute.amazonaws.com.cn [52.83.249.124]
Apr 1 12:33:43 bbs synchronet: web 0045 Request: GET /api/files.ssjs?call=download-file&dir=st20s92msdosc&file=CNEWS003.ARC HTTP/1.1
Apr 1 12:33:43 bbs synchronet: web 0045 Sending file: /sbbs/tmp/SBBS_SSJS.31685.45.html (0 bytes)
Apr 1 12:33:44 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 222 served)
Every minute or so, something comes in, goes directly to a specific file, and tries to download it. Most of these seem to come from cn-northwest-1.compute.amazonaws.com.cn.
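To put numbers on "most of these", something like the following could tally requests by source domain and path. It is only a sketch: it assumes the log lines look exactly like the excerpt above (a "Hostname:" line followed by a "Request:" line for the same connection), and the function names are made up for this example.

```python
import re
from collections import Counter

# Patterns based only on the log excerpt in this post (an assumption).
HOST_RE = re.compile(r"Hostname: (\S+) \[[\d.]+\]")
REQ_RE = re.compile(r"Request: \S+ (\S+) HTTP/")


def parent_domain(hostname):
    """Drop the first label, e.g. ec2-1-2-3-4.example.com -> example.com."""
    return hostname.split(".", 1)[1] if "." in hostname else hostname


def tally_requests(lines):
    """Count (parent domain, request path without query string) pairs."""
    counts = Counter()
    host = None
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            host = match.group(1)
            continue
        match = REQ_RE.search(line)
        if match and host is not None:
            path = match.group(1).split("?", 1)[0]
            counts[(parent_domain(host), path)] += 1
            host = None  # pair each Request with one Hostname line only
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < logfile
    for (domain, path), n in tally_requests(sys.stdin).most_common(20):
        print(n, domain, path)
```

Sorting by count should make it obvious whether the file downloads really are concentrated in one hosting range, which would be worth knowing before deciding what to block.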
--
H
... It is impossible to please the whole world and your mother-in-law.
---
■ Synchronet ■ - Running madly into the wind and screaming - bbs.ujoint.org