00:01:53<@arkiver>no idea of the size here
00:02:01<@arkiver>shall i send them an email?
00:02:10<@arkiver>will send them an email
00:04:06<@arkiver>do they actually have an email address
00:06:24<@OrIdow6>arkiver: Are those two that J A A found not working?
00:07:07<@arkiver>ah
00:07:12@arkiver cant read
00:08:05<@arkiver>JAA: adding you in cc
00:09:56KRG_ (KRG) joins
00:14:27<@JAA>Ack
00:15:58<@arkiver>JAA: sent
00:23:09Iki quits [Ping timeout: 258 seconds]
00:25:04Wayward quits [Ping timeout: 258 seconds]
00:25:12jazza quits [Read error: Connection reset by peer]
00:25:36jazza joins
00:31:10Wayward (wayward) joins
00:36:57Doranwen quits [Ping timeout: 258 seconds]
00:40:24sneezey quits [Ping timeout: 258 seconds]
00:43:52sneezey joins
00:59:09phiresky joins
01:02:51dm4v quits [Read error: Connection reset by peer]
01:03:57dm4v joins
01:04:00dm4v quits [Changing host]
01:04:00dm4v (dm4v) joins
01:05:22Doranwen (Doranwen) joins
01:09:23<@JAA>Looks like the Bethesda forums aren't read-only yet as of 14 minutes ago: https://bethesda.net/community/post/3264661
01:58:41<Larsenv>https://transfer.archivete.am/4Y91D/wikifoundry.txt
01:58:47<Larsenv>a sublist3r run, nothing special
02:00:14<@JAA>Larsenv: #archiveteam is for announcements and important messages only.
02:00:35<@JAA>Yes, discussion was here after I mentioned it there, as it's supposed to.
02:00:57<Larsenv>JAA: sorry, didn't notice that, and I also didn't notice the timestamp
02:00:58<Larsenv>I'm sorry
02:03:47<@OrIdow6>Not a good sign for size, 409/562 in Lars env's file are not in CDX
02:05:07Iki joins
02:06:24<@OrIdow6>CDX: https://transfer.archivete.am/joS8M/wikifoundry_subdomains_from_cdx_pagination.txt
02:06:35<Larsenv>Lars env
02:06:50<Larsenv>(if you're trying to mitigate pinging me, you don't have to)
02:10:06<@OrIdow6>Alright
02:10:34<@OrIdow6>Here is Larsenv's list in the same format, also with www. processed out: https://transfer.archivete.am/4NYaU/larsenvs_list_stripped.txt
02:12:11<@OrIdow6>Also, I made an error using comm, that number is only 142/552
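The list comparison discussed above (how many hosts from one subdomain list are missing from the CDX-derived list) can be sketched as a set difference. This is only an illustration, not the commands actually used in the channel: the function names and sample hostnames are made up, and the "www." stripping mirrors the "www. processed out" step mentioned above. A set difference also avoids `comm`'s requirement that both inputs be sorted.

```python
def normalize(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.' (assumed normalization)."""
    host = host.strip().lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def missing_from(candidates, known):
    """Return (sorted hosts in candidates but not in known, unique candidate count)."""
    cand = {normalize(h) for h in candidates if h.strip()}
    seen = {normalize(h) for h in known if h.strip()}
    return sorted(cand - seen), len(cand)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the linked files.
    found = ["www.alpha.example.com", "beta.example.com", "ALPHA.example.com"]
    in_cdx = ["alpha.example.com"]
    missing, total = missing_from(found, in_cdx)
    print(f"{len(missing)}/{total} not in CDX: {missing}")
```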
02:13:38archzz quits [Ping timeout: 250 seconds]
02:13:51AlsoHP_Archivist joins
02:14:39AlsoHP_Archivist quits [Client Quit]
02:17:00<@JAA>The ArchiveBot collection reached 20 billion URLs (responses) a couple days ago! :-)
02:17:00HP_Archivist quits [Ping timeout: 258 seconds]
02:19:41KRG_ quits [Client Quit]
02:19:43archzz joins
02:19:44<Ryz>Wooooooooooo
02:20:50<s-crypt>thuban it is not time sensitive, and semi-low priority, as this is mostly preemptive. thank you so much!
02:40:56xkey quits [Quit: WeeChat 2.9]
02:41:13xkey (eyo) joins
02:58:44<thuban>s-crypt: i'm getting 403s for the urls in that text file you posted
02:58:53<s-crypt>hmmm
02:59:31<s-crypt>oh NO. they were time based
03:00:51<s-crypt>they are in a "public" dropbox with the password 'wilson'. do you know of another way to download them?
03:01:08<thuban>link?
03:01:10<s-crypt>if not, I can just redo the list and dm the links as I get them
03:01:33<s-crypt>https://www.dropbox.com/sh/8ffmi8kzuzla500/AAA3P47QvYIHyHyEDzgrGgMsa?dl=0
03:04:09Wayward quits [Ping timeout: 258 seconds]
03:04:20lennier1 quits [Client Quit]
03:09:52C4K3 joins
03:12:53<thuban>hm... dbx-cli wants me to make an account. awful. i could download them individually through the web interface, i guess, but that'd be a pita; if you could go ahead and send me that first link i'd appreciate it
03:17:06<s-crypt>thuban dming now
03:19:48C4K3 quits [Client Quit]
03:20:56<pcr>arkiver: For wikifoundry, their website says that wiki admins can generate a zip of their site's HTML files. Should you ask for access to that?
03:22:47qw3rty__ joins
03:26:23qw3rty_ quits [Ping timeout: 258 seconds]
03:27:11<@arkiver>awesome on archivebot! congrats :)
03:30:33lennier1 (lennier1) joins
04:00:48etnguyen03 quits [Client Quit]
04:25:00HP_Archivist (HP_Archivist) joins
04:27:54BlueMaxima_ joins
04:31:33BlueMaxima quits [Ping timeout: 258 seconds]
04:32:37<mgrandi>I think dbx-cli only works for stuff you added to your own dropbox
05:26:16BlueMaxima__ joins
05:30:12BlueMaxima_ quits [Ping timeout: 258 seconds]
05:47:04HP_Archivist quits [Ping timeout: 258 seconds]
06:02:01nerdguy1138 quits [Ping timeout: 258 seconds]
06:16:35nerdguy1138 (nerdguy1138) joins
07:23:40flashfire42 quits [Remote host closed the connection]
07:23:40kiska quits [Remote host closed the connection]
07:23:40s-crypt quits [Remote host closed the connection]
07:37:57Doran (Doranwen) joins
07:38:14Doranwen quits [Ping timeout: 258 seconds]
07:47:08BlueMaxima__ quits [Client Quit]
07:59:47HP_Archivist (HP_Archivist) joins
08:33:26HP_Archivist quits [Ping timeout: 258 seconds]
08:37:48ave quits [Client Quit]
08:37:48lun4 quits [Quit: o/ https://thelounge.lasagna.dev]
08:38:11ave (ave) joins
08:38:17lun4 (lun4) joins
09:44:59HP_Archivist (HP_Archivist) joins
09:53:10kiskaWeebChat quits [Ping timeout: 258 seconds]
09:58:09HP_Archivist quits [Ping timeout: 258 seconds]
10:37:15Hosseinifard joins
10:37:47Hosseinifard quits [Remote host closed the connection]
10:38:38Hosseinifard joins
10:39:46Hosseinifard quits [Remote host closed the connection]
10:39:56Iki quits [Ping timeout: 258 seconds]
11:02:49Iki joins
11:05:09s-crypt (s-crypt) joins
11:05:12flashfire42 (flashfire42) joins
11:05:54kiska (kiska) joins
11:26:19Arcorann (Arcorann) joins
12:02:34Stiletto quits [Remote host closed the connection]
12:08:31Stiletto joins
12:17:17Stiletto quits [Remote host closed the connection]
12:28:52etnguyen03 (etnguyen03) joins
12:33:49Ryz quits [Remote host closed the connection]
12:35:07<@jrwr>I swear this is like a speed run of how fast you can kill an IRC Network
12:37:23<Arcorann>I think you want #archiveteam-ot
12:41:44Stiletto joins
12:49:05ddd joins
12:58:03gazorpazorp quits [Quit: WeeChat 2.3]
12:59:29ddd quits [Remote host closed the connection]
12:59:33ddd joins
13:01:10user joins
13:01:15user is now known as gazorpazorp
13:19:03nertzy (nertzy) joins
13:33:46LeGoupil joins
13:40:06Arcorann quits [Ping timeout: 258 seconds]
13:51:29JensRex quits [Client Quit]
14:12:40Jens (JensRex) joins
14:13:59Jonboy345 quits [Read error: Connection reset by peer]
14:27:48ddd quits [Remote host closed the connection]
14:39:05Jonboy345 joins
14:55:43Jens quits [Killed (NickServ (GHOST command used by jens_!~jens@hackint/user/JENS))]
14:55:58JensRex (JensRex) joins
15:04:03kiskaWeebChat (kiska) joins
15:04:59Stiletto quits [Remote host closed the connection]
15:09:25fuzzy8021 quits [Ping timeout: 258 seconds]
15:22:54fuzzy8021 (fuzzy8021) joins
16:28:17lennier1 quits [Client Quit]
16:29:39HP_Archivist (HP_Archivist) joins
16:36:49HP_Archivist quits [Ping timeout: 258 seconds]
16:37:20Daloader__ joins
17:34:57HP_Archivist (HP_Archivist) joins
17:42:23pcr leaves [Disconnected: Replaced by new connection]
17:42:26pcr joins
17:43:23Doran is now known as Doranwen
17:52:25Stiletto joins
17:57:19ddd joins
18:06:24Limeysoda joins
18:06:41Limeysoda quits [Remote host closed the connection]
18:06:51Lord_Nightmare quits [Read error: Connection reset by peer]
18:07:17Stiletto quits [Ping timeout: 258 seconds]
18:07:24Stilett0 joins
18:11:01Lord_Nightmare (Lord_Nightmare) joins
18:18:17Wayward (wayward) joins
18:28:33Ryz (Ryz) joins
18:32:58Daloader__ quits [Ping timeout: 250 seconds]
18:33:47Stiletto joins
18:35:39Stilett0 quits [Ping timeout: 258 seconds]
18:38:08Lord_Nightmare quits [Read error: Connection reset by peer]
18:40:56Lord_Nightmare (Lord_Nightmare) joins
18:42:57<@JAA>EggplantN, HCross, rewby: Does one of you have a box with a bunch of IPs where I could run a qwarc grab of LBPCentral? It's not huge, probably only roughly 300k requests, but they have stupid IP blocks and rate limits of ~0.5 req/s. It's going down on the weekend. Would take at least a week from one IP, so even just a few would be enough. As mentioned previously, qwarc needs iptables magic or similar
18:43:03<@JAA>to distribute the outgoing connections; it can't bind to specific interfaces/IPs.
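The "at least a week from one IP" estimate follows from the figures above: roughly 300k requests at about 0.5 req/s per IP. A rough worked sketch, assuming the rate limit applies independently per source IP and that every address in a prefix is usable (both assumptions, and the function names are hypothetical):

```python
SECONDS_PER_DAY = 86_400


def addresses_in_prefix(prefix_len: int) -> int:
    """Number of IPv4 addresses in a /prefix_len block (all assumed usable)."""
    return 2 ** (32 - prefix_len)


def estimate_days(requests: int, rate_per_ip: float, ips: int) -> float:
    """Days needed if each IP sustains rate_per_ip requests per second."""
    return requests / (rate_per_ip * ips) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"1 IP: {estimate_days(300_000, 0.5, 1):.2f} days")
    for prefix in (29, 28, 27):
        ips = addresses_in_prefix(prefix)
        days = estimate_days(300_000, 0.5, ips)
        print(f"/{prefix} ({ips} IPs): {days:.2f} days")
```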
18:50:41<@EggplantN>How many IPs you need. I can get 2 /24s ready and a /25 but it’ll be 3 hrs
18:50:52<@EggplantN>harry Might have something sooner if needed
18:52:36<@JAA>The more, the quicker I can grab it, but something like /29 or /28 would easily be enough to grab it in time.
18:55:02<@EggplantN>Aight can qwarc work with more than 1 subnet or not
18:55:36<@JAA>As long as the IP handling happens externally, qwarc doesn't care.
18:58:32<@HCross>EggplantN: not atm
18:58:36<@HCross>you'll need to configure NAT
18:58:45<nerdguy1138>JAA: google cloud has a free trial of $300usd, i ran 32 vms for a month.
18:58:49<@HCross>my hardware isn't set up
18:58:55<@HCross>nerdguy1138: won't do what JAA needs
18:59:19<nerdguy1138>better than nothing right?
18:59:29<@EggplantN>HCross got the iptables rule?
18:59:42<@HCross>EggplantN: not off the top of my head, you'll need to know the dest IPs as well
18:59:43<@HCross>to NAT too
18:59:46<@EggplantN>nerdguy1138 we have kit, just not configured/setup :P
19:00:08<@EggplantN>Or does JAA have the magic rule
19:00:40<@JAA>I wouldn't be surprised if Google Cloud was banned anyway. They drop everything from OVH's IP blocks, for example.
19:00:45<@JAA>EggplantN: I don't.
19:00:57<@HCross>rewby: may have it
19:00:59<@EggplantN>Aight well then someone call the Dutch May
19:01:02<@EggplantN>*man
19:01:08<nerdguy1138>JAA: vultr, linode, digitalocean?
19:01:24<@HCross>EggplantN: we have highly available dutch people
19:01:28<@EggplantN>yeah nerdguy1138 we run serious kit. Racks of magic. Just very little management 😂
19:01:35<nerdguy1138>well ok then
19:01:38<@EggplantN>plus our cloud provider of choice is Hetzner cloud ;)
19:01:52<@EggplantN>They’re really nice and cheap
19:02:18<nerdguy1138>EggplantN whats the project?
19:03:20<@JAA>Huh, Hetzner IPs aren't blocked, interesting.
19:04:16<@JAA>I could manually split it up across a couple of machines I suppose, but trying to avoid that since it's a pain to manage.
19:05:32<rewby>JAA: Will a /27 do? If so, drop me your SSH key and I'll get you sorted in a few minutes
19:06:28<rewby>That machine has NAT magic set up, as long as you can tell qwarc to bind to a specific ip
19:07:39<@EggplantN>If you can wait I can give a /24 if preferred
19:07:51<@JAA>rewby: I can't. :-/
19:07:59<@JAA>Well, not without patching qwarc in nasty ways.
19:08:03<rewby>Uh. Can you package it in a docker container?
19:08:21Stiletto quits [Remote host closed the connection]
19:08:29<rewby>Docker containers work fine with this setup
19:09:03<@JAA>Uh, I suppose so. Or just launch it in an interactive shell in a `sleep infinity` container I guess? I've done that monstrosity before.
19:09:39Vukky quits [Quit: Leaving]
19:09:45<rewby>Either way works for me
19:09:54<rewby>I can just drop you into a shell with access to docker
19:10:02<@JAA>Ok, sounds good. :-)
19:11:45<nerdguy1138>JAA: probably a stupid question but have you heard of clusterssh?
19:13:19<@JAA>I'm aware it exists, yeah. It's more about actually splitting the thing up into independent sets than running those.
19:13:32<@JAA>Much easier to just have a single process on one machine.
19:13:56<@JAA>Mostly due to how qwarc works.
19:15:05lennier1 (lennier1) joins
19:17:39<nerdguy1138>i agree, but then my pet project is probably much easier to parallelize.
19:21:23<@JAA>qwarc is fully parallel, it's just annoying to manage. I can basically do a `threadIds = range(upperLimit)` with one process whereas I'd have to split it into N ranges manually to distribute across N machines.
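The manual split described above (dividing `range(upperLimit)` into N ranges for N machines) could look roughly like this. It is a generic sketch, not part of qwarc; `split_range` is a hypothetical helper name.

```python
from __future__ import annotations


def split_range(upper_limit: int, n: int) -> list[range]:
    """Split range(upper_limit) into n contiguous ranges of near-equal size."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(upper_limit, n)
    ranges = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional ID each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for machine, ids in enumerate(split_range(10, 3)):
        print(machine, list(ids))
```

Each machine would then need its own range passed in and its own output collected, which is the management overhead being avoided by running a single process.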
19:24:12<rewby>nerdguy1138: I've tried clusterssh once. Wasn't a huge fan. I mostly do automation via saltstack which is a tad more powerful so maybe I'm spoiled
19:26:59Stiletto joins
19:27:40DogsRNice (Webuser299) joins
19:41:58HP_Archivist quits [Ping timeout: 258 seconds]
19:53:21HP_Archivist (HP_Archivist) joins
20:01:56Lilpea joins
20:02:00AlsoHP_Archivist joins
20:03:03HP_Archivist quits [Ping timeout: 258 seconds]
20:15:06<@EggplantN>we good JAA / rewby
20:15:09<@EggplantN>or need my box
20:16:18<rewby>Ask JAA. I've kinda left him on his own with his container
20:17:02<@JAA>Yep, all good, just applying the finishing touches before launching it. :-)
20:17:27<@EggplantN>okie :)
20:18:01<rewby>Hm actually. How much disk space are you expecting ? I think that vm has like 100G
20:18:09<rewby>Might be a problem
20:18:23AlsoHP_Archivist quits [Ping timeout: 258 seconds]
20:18:49AlsoHP_Archivist joins
20:19:01<rewby>I can attach more if need be
20:22:54<@JAA>I'll try to get an estimate now.
20:24:57<@JAA>But since I'm only fetching the thread page HTML, it should be pretty small.
20:25:57<rewby>I can imagine, I just don't want us to run out of disk space. I can easily arrange for up to 1T of space to be added
20:27:45<@JAA>Cool, they're sending invalid HTTP responses. God, I hate the internet.
20:31:00<@JAA>rewby: We're looking at 2-3 GB probably. :-)
20:31:31<rewby>Ah, I won't bother adding more space then. Should be plenty available
20:31:35<@JAA>And that's not even with zstd.
20:31:42<@JAA>Just plain old gzip.
20:39:24AlsoHP_Archivist quits [Client Quit]
20:39:25<@JAA>Also, if it should blow up unexpectedly, it will stop automatically (on 50 GiB free space with my standard options), and I can move the existing archives away to make space. So yeah, should be fine.
20:39:39space leaves
20:45:02<rewby>Awesome
21:01:37Jake4 (Jake) joins
21:02:02Jake quits [Ping timeout: 250 seconds]
21:02:02Jake4 is now known as Jake
21:03:34damianwebster joins
21:04:48<damianwebster>hello i recently found out about your project and love it that you do all these kinds of stuff. but im still wondering how i can access a specific site from an archive. if someone could help me with that i would appreciate it! thanks
21:10:08<@EggplantN>Sure what site
21:10:11<@EggplantN>You can enter it here
21:10:16<@EggplantN>https://web.archive.org/web/
21:10:50<damianwebster>Already tried it but it redirects to a different site. Its this site that im looking for: http://formspring.me/keekihime
21:11:25LeGoupil quits [Client Quit]
21:16:26<Jake>damianwebster: here's the last working one. https://web.archive.org/web/20111214194257/http://formspring.me/keekihime
21:17:03<damianwebster>Thank you!
21:19:23<damianwebster>I assume i cant see more of the site right? Like the rest of the responses.
21:19:27<@EggplantN>To be clear, thanks for the praise but we didnt archive it :D
21:19:59<damianwebster>Well it was a project of archive team. at least thats what the site said.
21:20:08<@EggplantN>Yep but that exact one wasnt us :D
21:20:15<@EggplantN>it was Alexa Crawls
21:20:24<@EggplantN>speaking of arkiver any reason they've stopped uploading
21:21:06Sylirana quits [Ping timeout: 244 seconds]
21:21:24Sylirana (Sylirana) joins
21:21:29<damianwebster>Hm alright. Too bad the rest of the responses are inaccessible. But i guess that comes with archiving stuff.
21:24:49<Jake>All of the captures from 2011 and 2010 seem to work, so you may be able to see more there? https://web.archive.org/web/2011*/http://formspring.me/keekihime
21:27:28<damianwebster>Hm doesnt seem like it. I meant when i click on the "load more responses" that it simply gets stuck. Thought it would load the other responses too.
21:32:03celestial quits [Quit: ZNC 1.8.0 - https://znc.in]
21:32:55<Jake>Sorry, I was trying to say there was different content on different captures. (Also for some reason, this user wasn't included on the ArchiveTeam captures, I believe.)
21:33:56celestial joins
21:36:35Atom quits [Ping timeout: 258 seconds]
21:45:30ddd quits [Remote host closed the connection]
22:08:02ddd joins
22:12:10Atom joins
22:36:23Iki quits [Ping timeout: 258 seconds]
22:56:13Iki joins
22:57:10monoxane7 (monoxane) joins
22:59:02monoxane quits [Ping timeout: 250 seconds]
22:59:02monoxane7 is now known as monoxane
23:27:40namespace joins
23:30:52namspc joins
23:33:39<namspc>So I have some relevant BS I'm working on: https://tzstamp.io/
23:34:07<namspc>It's a bit like OpenTimestamps except I want it to be actually documented and faster to verify proofs.
23:34:17<namspc>(Among other things)
23:49:13BlueMaxima joins
23:51:31ddd quits [Ping timeout: 258 seconds]