00:03:10etnguyen03 quits [Client Quit]
00:30:52etnguyen03 (etnguyen03) joins
00:47:33xkey quits [Quit: WeeChat 4.6.3]
00:47:45xkey (xkey) joins
00:57:57sec^nd quits [Remote host closed the connection]
00:58:15sec^nd (second) joins
01:10:09fluke quits [Ping timeout: 260 seconds]
01:10:21fluke joins
01:25:46kansei- (kansei) joins
01:27:08kansei quits [Ping timeout: 276 seconds]
01:43:45Island joins
02:39:28PwnHoaX (PwnHoaX) joins
02:43:32<h2ibot>PaulWise edited Template:Instant messengers (+7, ICQ shut down): https://wiki.archiveteam.org/?diff=56371&oldid=55577
02:52:35etnguyen03 quits [Client Quit]
02:54:31etnguyen03 (etnguyen03) joins
03:14:01etnguyen03 quits [Remote host closed the connection]
03:17:54pabs quits [Ping timeout: 260 seconds]
03:52:45<h2ibot>Hans5958 edited Main Page/In The Media (+23): https://wiki.archiveteam.org/?diff=56372&oldid=50687
03:54:45<h2ibot>Hans5958 edited In The Media (+138): https://wiki.archiveteam.org/?diff=56373&oldid=55626
04:26:32riteo quits [Ping timeout: 276 seconds]
04:55:32flotwig_ joins
04:55:54flotwig quits [Ping timeout: 260 seconds]
04:55:54flotwig_ is now known as flotwig
05:03:03sec^nd quits [Remote host closed the connection]
05:03:13sec^nd (second) joins
05:38:35Guest58 joins
05:38:52Webuser649429 joins
05:41:05pabs (pabs) joins
05:42:17Webuser649429 quits [Client Quit]
05:51:54@hook54321 quits [Ping timeout: 615 seconds]
05:51:54mrfooooo quits [Read error: Connection reset by peer]
05:52:07mrfooooo joins
05:54:13hook54321 (hook54321) joins
05:54:13@ChanServ sets mode: +o hook54321
05:57:57qxtal quits [Read error: Connection reset by peer]
05:58:07qxtal (qxtal) joins
06:12:45nicolas17 quits [Quit: Konversation terminated!]
06:13:11nicolas17 joins
06:14:35UwU_93bydbco451y joins
06:38:05UwU_93bydbco451y1 joins
06:39:03UwU_93bydbco451y quits [Read error: Connection reset by peer]
06:39:03UwU_93bydbco451y1 quits [Read error: Connection reset by peer]
06:40:47UwU_93bydbco451y joins
06:42:12xarph quits [Read error: Connection reset by peer]
06:42:20xarph joins
07:06:00xarph quits [Read error: Connection reset by peer]
07:06:21xarph joins
07:12:56Island quits [Read error: Connection reset by peer]
07:13:34jonty quits [Ping timeout: 615 seconds]
07:13:44jonty (jonty) joins
07:42:15Onyx quits [Remote host closed the connection]
07:50:19todb quits [Ping timeout: 615 seconds]
07:51:24todb joins
08:02:28<@JAA>https://anubis.techaro.lol/docs/user/known-instances/
08:12:05archiveDrill joins
08:23:13Webuser194063 joins
08:34:31riteo (riteo) joins
08:37:04Dada joins
08:52:39Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
08:52:39Webuser194063 quits [Client Quit]
08:58:06<BlankEclair>anubis has a no-js version??
09:01:27<UwU_93bydbco451y>fr?!!
09:01:29lemuria_ quits [Ping timeout: 276 seconds]
09:02:08<UwU_93bydbco451y>oh is it the one that checks if your browser supports CSS animation or something?
09:02:42lemuria (lemuria) joins
09:02:49<UwU_93bydbco451y>I think I heard they discussed it a while back
09:03:28<BlankEclair>just noticed that functions via <meta http-equiv=refresh ... />
09:04:11<nyakase>there is an anubis clone called go-away that offers non-js challenges https://git.gammaspectra.live/git/go-away/wiki/Challenges#non-javascript
09:04:19<nyakase>im not sure if anubis itself has them?
09:04:38<nyakase>some sites there might have moved to it; it's not automatically maintained. i know sourcehut did
09:05:11<BlankEclair>> curl --no-progress-meter https://anubis.techaro.lol/docs/admin/algorithm-selection/ --user-agent 'Mozilla' | grep metarefresh
09:05:12<BlankEclair></script><script id="anubis_challenge" type="application/json">{"rules":{"algorithm":"metarefresh","difficulty":1,"report_as":4},"challenge":"..."}
09:05:28<nyakase>oh yeah it does as an optional thing https://github.com/TecharoHQ/anubis/pull/623
09:05:29<nyakase>cool
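
For reference, BlankEclair's check above can be reproduced from a shell; the URL and user agent are the ones from the log, while the grep patterns are assumptions about how the challenge markup appears in the served HTML:

    # Fetch the page with a bare 'Mozilla' user agent and look for either the
    # non-JS <meta http-equiv=refresh> tag or the challenge's "algorithm" field.
    # (The exact patterns are guesses based on the snippets quoted above.)
    curl --no-progress-meter 'https://anubis.techaro.lol/docs/admin/algorithm-selection/' \
        --user-agent 'Mozilla' \
      | grep -Eo 'http-equiv="?refresh|"algorithm":"[a-z]+"'

If this prints "algorithm":"metarefresh", the instance is serving the non-JS challenge variant discussed above.
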
09:10:14lemuria quits [Ping timeout: 260 seconds]
09:10:59<UwU_93bydbco451y>we live in a (AI scraper's) society after all
09:12:02lemuria (lemuria) joins
09:20:50Guest58 joins
09:39:42Dango3602 (Dango360) joins
09:42:54Dango360 quits [Ping timeout: 260 seconds]
09:42:54Dango3602 is now known as Dango360
09:44:08Guest58 quits [Client Quit]
09:50:29beardicus quits [Ping timeout: 260 seconds]
09:51:15beardicus (beardicus) joins
09:59:58Cornelius quits [Quit: bye.]
10:00:05Guest58 joins
10:02:12Cornelius (Cornelius) joins
10:10:07camrod636 (camrod) joins
10:34:41MrMcNuggets (MrMcNuggets) joins
11:00:01Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:51Bleo182600722719623455222 joins
11:08:15seadog007__ quits [Ping timeout: 612 seconds]
11:09:44seadog007__ joins
11:10:12<kpcyrd>[X] confirm you're not a robot [X] accept our cookies [X] find a partial hash collision to prove you're not an AI
11:10:23<kpcyrd>web2 is going just great
11:18:45<alexlehm>the hoops get bigger
11:18:57<alexlehm>or more hoops maybe
11:19:26nine quits [Quit: See ya!]
11:19:38nine joins
11:19:38nine quits [Changing host]
11:19:38nine (nine) joins
11:22:28<UwU_93bydbco451y>hopefully web4 would be better (web3 was DoA)
11:25:55<UwU_93bydbco451y>we'd just knocking a physical network of wires and listening to the vibrations
11:26:28<UwU_93bydbco451y>bah, grammer :(
11:27:58Webuser053660 joins
11:30:30Webuser053660 quits [Client Quit]
11:37:54Makusu joins
11:58:49tzt quits [Ping timeout: 260 seconds]
12:01:39Twirre8 joins
12:02:12<@arkiver>i don't entirely get the web1/2/3 stuff
12:02:13<@arkiver>the Internet is the Internet
12:06:04<nyakase>web1 to 2 made some sense (direct web coding vs mass content creation, there was definitely a split in web usage then). 3 was purely a buzzword to shoehorn blockchain tech
12:20:56tzt (tzt) joins
12:22:35Twirre8 quits [Client Quit]
12:27:24Nact joins
12:27:53UwU_93bydbco451y quits [Quit: Dats about it, see ya.]
12:56:47Wohlstand (Wohlstand) joins
12:58:04nine quits [Client Quit]
12:58:48nine joins
12:58:48nine quits [Changing host]
12:58:48nine (nine) joins
13:24:51Makusu quits [Client Quit]
13:25:01Makusu (Makusu) joins
13:36:12PredatorIWD25 quits [Read error: Connection reset by peer]
13:52:02<hexagonwin>i've been running grab-site on choimobile.vn for a bit less than 20 hours now and it generally seems to run well, but it seems to download some unnecessary things. For example, even if it has obtained /threads/rom-myster-v4-sky-860-slk-len-ke.17492/page-190, it tries to get /threads/rom-myster-v4-sky-860-slk-len-ke.17492/page-190#post-543213 and a bunch of other URLs for each post on that
13:52:02<hexagonwin>same page for some reason. I'm using the forums and xenforo igsets (github.com/ArchiveTeam/grab-site/pull/178); is there another igset i can add to let it ignore/skip those URLs with #post- ?
14:04:09<hexagonwin>it seems like even for the same page on the same thread, each time it crawls a different #post- url the page slightly differs due to google/cloudflare related strings getting randomized, like http://www.z80.kr/tmp/a.html and http://www.z80.kr/tmp/b.html (copied example to my server)
14:04:48<hexagonwin>sometimes it catches a dupe, sometimes it doesn't..
14:05:09lemuria quits [Read error: Connection reset by peer]
14:06:11lemuria (lemuria) joins
14:06:48PredatorIWD25 joins
14:07:02<hexagonwin>the best solution would be to internally strip the #post- suffix so that no valid page is ignored, but it doesn't seem to be possible :/
14:17:31<masterx244|m><hexagonwin> "the best solution would be to..." <- that's what the ignore rules are for. write a regex matching #post-DIGITS anchored to the end of the string
14:17:37<h2ibot>Exorcism edited PukiWiki (+25, /* Wikifarms */): https://wiki.archiveteam.org/?diff=56374&oldid=52428
14:18:20<masterx244|m>(those post links are only reachable once you've already reached the page; they're the URLs you get when clicking on an individual post)
14:18:22Nact quits [Remote host closed the connection]
14:19:29<hexagonwin>masterx244|m the igset only ignores the link entirely and can't rewrite it to drop that suffix right? i was a bit concerned since i thought the crawler could've got the link from another page that references it, not from the thread's page (where it's an obvious dupe)
14:19:56Nact joins
14:20:31<hexagonwin>while reading the xenforo igset I noticed this line, do you know what the %page% part is for?
14:20:36<hexagonwin>" /index\.php\?threads/.*/page-%page% "
14:21:18<masterx244|m>no idea on that weird %page% part
14:22:10<masterx244|m>also: there should be at least one way to a thread without direct post links by going over the topic list
14:22:46<hexagonwin>alright, i'll just let it ignore every #post- suffixed url
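
A sketch of the ignore rule masterx244|m describes, anchored to the end of the URL; the crawl directory path is hypothetical, and the 'ignores' file name assumes grab-site's usual on-disk layout rather than anything stated in the log:

    # Sanity-check the pattern against one of the URLs mentioned above.
    printf '%s\n' '/threads/rom-myster-v4-sky-860-slk-len-ke.17492/page-190#post-543213' \
        | grep -E '#post-[0-9]+$'

    # Append the same rule to the running crawl's ignores file; grab-site
    # re-reads it while running, so no restart should be needed.
    # (The crawl directory path here is made up.)
    echo '#post-[0-9]+$' >> ./choimobile.vn-*/ignores
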
14:49:26awauwa (awauwa) joins
15:04:03MrMcNuggets quits [Quit: WeeChat 4.3.2]
15:08:28nulldata-alt quits [Quit: Ping timeout (120 seconds)]
15:09:07nulldata-alt (nulldata) joins
15:33:13grill (grill) joins
15:44:05<murb>a/win 66
15:44:11<murb>ffs
16:01:26Zeether joins
16:08:27nine quits [Quit: See ya!]
16:08:39nine joins
16:08:39nine quits [Changing host]
16:08:39nine (nine) joins
16:14:25<Zeether>hi, I'm not sure if proboards stuff is in any danger but is it fine to submit a forum for archival?
16:46:31<@imer>Zeether: very much so, we archive a lot of things proactively :)
16:56:38flotwig quits [Ping timeout: 276 seconds]
16:56:57flotwig joins
17:04:48HP_Archivist quits [Read error: Connection reset by peer]
17:06:23awauwa quits [Client Quit]
17:29:47grill quits [Ping timeout: 276 seconds]
17:29:50<Zeether>okay, it's this one Kim Possible fan forum that has a few posts from showrunners etc
17:29:57<Zeether>https://ronstoppable.proboards.com/
17:31:28etnguyen03 (etnguyen03) joins
17:33:04grill (grill) joins
17:34:12Webuser675383 joins
17:35:30Webuser675383 quits [Client Quit]
17:37:56<@imer>Zeether: queued, thanks! you should be able to watch progress on http://archivebot.com/ if you like
17:44:14<Zeether>alright, thanks
17:53:49<nicolas17>nyakase: Berners-Lee had already coined web3 for semantic web stuff, the blockchain bros took over it
17:55:30nyakase learned something new
18:04:09etnguyen03 quits [Client Quit]
18:22:55Zeether quits [Client Quit]
18:59:30BornOn420 quits [Remote host closed the connection]
19:00:09BornOn420 (BornOn420) joins
19:04:04nine quits [Ping timeout: 260 seconds]
19:09:54grill quits [Ping timeout: 260 seconds]
19:14:36Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
19:26:52Mateon1 quits [Remote host closed the connection]
19:27:03Mateon1 joins
19:49:02Makusu quits [Quit: My Mac has gone to sleep. ZZZzzz…]
20:02:24riteo quits [Ping timeout: 260 seconds]
20:03:00salads joins
20:03:00<salads>hi
20:03:26salads quits [Client Quit]
20:05:44twirre joins
20:16:53twirre quits [Client Quit]
20:23:59pseudorizer quits [Ping timeout: 260 seconds]
20:24:10pseudorizer (pseudorizer) joins
20:50:13ThreeHM quits [Quit: WeeChat 4.6.3]
20:54:05ThreeHM (ThreeHeadedMonkey) joins
21:43:00<h2ibot>Pokechu22 edited Mailman/2 (+37, /* Archived */…): https://wiki.archiveteam.org/?diff=56375&oldid=56295
21:44:01<h2ibot>Pokechu22 edited Deathwatch (+171, /* 2025 */ umakeiba.com): https://wiki.archiveteam.org/?diff=56376&oldid=56272
21:50:29etnguyen03 (etnguyen03) joins
22:00:21etnguyen03 quits [Client Quit]
22:01:05etnguyen03 (etnguyen03) joins
22:12:20cuphead2527480 (Cuphead2527480) joins
22:16:19rvtr joins
22:17:41<rvtr>Hey all! I'm trying to figure out a good way to archive some documents for municipalities in Ontario, CA. I can make a list of URLs, but there will probably be many hundreds of thousands. Is there some way and some place I could submit them all for archiving?
22:20:54<rvtr>I should say this isn't an urgent thing. The governments just use eScribe, which doesn't give direct links for crawlers to save. The documents will probably stay up for a good while, but I'd like to have an eventual backup.
22:22:26<Vokun>Hey rvtr. Feel free to upload the list to https://transfer.archivete.am and paste it here. I'm sure someone will be by shortly that can help archive it
22:23:21<rvtr>Thanks! Will still have to track down all the domains, but I'll submit my lists there when I'm done!
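
For when the list is ready, a sketch of the upload Vokun suggests, assuming transfer.archivete.am accepts transfer.sh-style uploads (the filename here is made up):

    # Upload a newline-separated URL list; the service prints back a link
    # that can be pasted in the channel.
    # (Assumes transfer.sh-style PUT uploads; filename is hypothetical.)
    curl --upload-file ./ontario-municipal-urls.txt \
        https://transfer.archivete.am/ontario-municipal-urls.txt
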
22:37:37Dada quits [Remote host closed the connection]
22:49:35Webuser201884 joins
22:50:28<Webuser201884>hello, is there anyone who could help me set up a project to run in a docker container on Windows? when i try to start this project i get an error saying the downloader argument is required, but it's there
22:53:06<nyakase>Webuser201884: are you able to share the command with us, preferably using a pastebin like https://paste.debian.net ?
22:55:57<Webuser201884>I'll be honest, this is new to me. I can try, but to be sure and not go back and forth too much: I'm using Docker Desktop, and it's a container; I'm currently in the Inspect tab on the project I'm trying to make. Would that be the right thing to give you?
22:57:49<Webuser201884>I can get VMs to run, that's easy for me, but the whole Docker thing and its commands are new. I'm using the Docker GUI where I can, but I feel that might be the wrong way to be doing this
22:57:53<nyakase>ahh i assumed you were starting it through the terminal. i'm not very familiar with the interface of docker desktop
22:58:19<Webuser201884>I can start it through the terminal; i can give you an example of one of the commands i was trying to use
23:00:53<nyakase>sure! just put it in a pastebin so you dont flood the chat please, i think https://pastebin.com has a more user friendly interface
23:01:00<Webuser201884>https://paste.debian.net/plain/1384612
23:01:04<nyakase>ah thanks lets see
23:01:06<Webuser201884>I think i did that right?
23:01:17<nyakase>yeah
23:02:34<nyakase>to set the downloader (your nickname), you want to put it after the concurrency number
23:02:45<nyakase>like so: "--concurrent 10 nyakase"
23:02:49<Webuser201884>that's the container, and when i try to run it in the docker GUI, this is what i get: https://paste.debian.net/plain/1384613
23:02:57<nyakase>im not seeing that in your paste, so that might be why
23:02:57<Webuser201884>oh okay
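
Putting nyakase's correction together with the urls-grab image under discussion, the full command would look roughly like this; the container name and restart policy are conventional extras rather than anything from the paste, and YOURNICK stands in for the downloader nickname:

    # Standalone project container: the downloader nickname goes last,
    # after the concurrency number.
    # (--name and --restart are conventional; YOURNICK is a placeholder.)
    docker run -d --name urls-grab --restart unless-stopped \
        atdr.meo.ws/archiveteam/urls-grab \
        --concurrent 10 YOURNICK

Swapping the image for another project's (telegram-grab, for example) keeps the rest of the command the same.
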
23:03:20<nyakase>at the same time, i want to warn you about urls-grab specifically
23:03:34<Webuser201884> Here to listen
23:03:37<nyakase>that project can download a lot of urls at once, so you might not want to run it on your personal ip address
23:03:57<Webuser201884>Hm, any harm in doing so? Could i use a dedicated-IP VPN?
23:04:14<nyakase>you might face more aggressive bans than you would get with other projects
23:04:24<nyakase>given the wide variety of sites it would grab
23:04:35<Webuser201884>ahh okay. makes enough sense
23:05:18<@JAA>Your ISP might also get annoyed if they receive enough notices of alleged abuse due to website operators not understanding how the web works.
23:05:27<nyakase>thanks JAA
23:05:41<Webuser201884>ahh yes lovely okay
23:05:48<nyakase>i think urls-grab batches about 50 urls per item, waiting for one to complete before doing the next
23:05:56<nyakase>and concurrency 10 would let you run 10 items at the same time
23:05:57<Webuser201884>I mean im in canada, and with telus
23:06:00<nyakase>you might see the problem :-)
23:06:08<Webuser201884>yea i do now lol
23:06:11<Webuser201884>good explanation
23:06:15<Webuser201884>i appreciate it
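
As a rough back-of-the-envelope check on nyakase's numbers (both figures are the estimates given above, not exact project settings):

    # concurrency 10 items x ~50 URLs per item = ~500 URLs claimed at a time,
    # all from one IP address, which is why the warning applies.
    echo $((10 * 50))
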
23:06:59<nyakase>shit is the tracker down again
23:07:01<Webuser201884>think i'll just go with glitch then
23:07:31<nyakase>Webuser201884: glitch specifically just got paused due to problems with the site hoster :(
23:07:45<nyakase>i heard that telegram is always welcoming new users though, so you could do that in the meantime
23:07:59<Webuser201884>ohh okay
23:08:16<Webuser201884>do you have the image address handy by chance? or is it just telegram-grab at the end?
23:08:41<nyakase>but uhh.. the tracker, which distributes items to grabbers, appears to have crashed just now :-(
23:08:43<nyakase>quite bad timing
23:08:48<nyakase>you can still start the container and itll work when the tracker comes back
23:09:02<@JAA>YouTube, too, though that might impact your ability to watch videos by other means.
23:09:27<nyakase>Webuser201884: just replace atdr.meo.ws/archiveteam/urls-grab with atdr.meo.ws/archiveteam/telegram-grab
23:09:28<Webuser201884>oh okay and yeah i was thinking the yt one would be a little.. rather not lol
23:09:55<Webuser201884>thank you nyakase
23:11:07<Webuser201884>it says unable to find, but am i right to guess that's because the tracker is down?
23:12:34<nyakase>give me a moment
23:13:23<nyakase>oops you're right, that does appear to have broken as well
23:13:39<nyakase>for observers: Get "https://atdr.meo.ws/v2/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
23:14:49<Webuser201884>yeah thats what i got
23:15:09<nyakase>gonna have to wait a bit then :/
23:15:19<Webuser201884>alrighty :)
23:15:26<@JAA>The alarm bells have been rung.
23:15:29<Webuser201884>Thanks for the help its greatly appreciated
23:15:54<Webuser201884>new to this but want to put my 3g internet and my overclocked 5900x to use lol
23:17:41<Webuser201884>Am i correct in thinking that running a project in a standalone docker container is more efficient than running a warrior in a vm?
23:18:12KoleiTheBat joins
23:19:07<@JAA>Yes
23:19:14<Webuser201884>figured
23:19:20<@JAA>The VM is basically a wrapper around Docker anyway.
23:19:51<Webuser201884>ahh okay, so running it w docker desktop removes a whole layer
23:20:25<Webuser201884>My previous experience with docker goes as far as running PiHole, thats about it
23:20:25<@JAA>Also different purposes: warrior (VM or container) is for 'set it up once and forget about it'-type deployments while individual project containers need manual action if you want to support a new and urgent project or similar.
23:20:55<Webuser201884>okayy, that makes sense, and i was sensing that with the VMs
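
For contrast with the per-project container above, the warrior that JAA describes is itself just a container with a web UI for picking projects; a sketch assuming the usual ArchiveTeam warrior image and port, which are not named in the log:

    # 'Set it up once and forget about it': project selection happens in the
    # web UI on http://localhost:8001 instead of being baked into the image.
    # (Image name and port are the commonly documented ones, not from this log.)
    docker run -d --name archiveteam-warrior \
        --restart unless-stopped \
        --publish 8001:8001 \
        atdr.meo.ws/archiveteam/warrior-dockerfile
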
23:24:00etnguyen03 quits [Client Quit]
23:24:30etnguyen03 (etnguyen03) joins
23:33:24xarph quits [Read error: Connection reset by peer]
23:33:43xarph joins
23:34:06etnguyen03 quits [Client Quit]
23:34:41etnguyen03 (etnguyen03) joins
23:37:44nepeat (nepeat) joins
23:44:18etnguyen03 quits [Client Quit]
23:44:34<Webuser201884>Is the reddit project a good one to work on? if so what image should i use lol
23:45:04Pedrosso quits [Quit: Leaving]
23:45:04ScenarioPlanet quits [Quit: meow meowy meow]
23:45:04TheTechRobo quits [Quit: Leave message goes here]
23:45:13etnguyen03 (etnguyen03) joins
23:45:32<@JAA>The image names all derive from the repo names, which you can find on the projects' wiki pages.
23:45:43<@JAA>It's usually project-grab, but there are some exceptions.
23:45:55<Webuser201884>ahh okay
23:46:38<that_lurker>also the reddit project is paused
23:47:04<@JAA>I was about to say, the infobox also says something about the project status, though it might not always be up to date especially for short-term disruptions.
23:47:19Webuser717640 joins
23:47:40<Webuser201884>ahh okay
23:48:38kansei- quits [Quit: ZNC 1.10.0 - https://znc.in]
23:49:05kansei (kansei) joins
23:50:33etnguyen03 quits [Remote host closed the connection]
23:50:38Pedrosso joins
23:50:42ScenarioPlanet (ScenarioPlanet) joins
23:50:59TheTechRobo (TheTechRobo) joins
23:51:21<h2ibot>JustAnotherArchivist edited Reddit (+80, Clarify archival status): https://wiki.archiveteam.org/?diff=56377&oldid=55372
23:51:49etnguyen03 (etnguyen03) joins
23:51:59nepeat quits [Ping timeout: 276 seconds]
23:52:21<h2ibot>Savmor98 edited List of websites excluded from the Wayback Machine (+22): https://wiki.archiveteam.org/?diff=56378&oldid=56215
23:52:21nepeat (nepeat) joins
23:56:57kansei quits [Client Quit]
23:58:27<KoleiTheBat>so what is going on with the other projects? why did they all stop?
23:58:36<KoleiTheBat>and is that normal?