| 00:01:53 | <@arkiver> | no idea of the size here |
| 00:02:01 | <@arkiver> | shall i send them an email? |
| 00:02:10 | <@arkiver> | will send them an email |
| 00:04:06 | <@arkiver> | do they actually have an email address |
| 00:06:24 | <@OrIdow6> | arkiver: Are those two that J A A found not working? |
| 00:07:07 | <@arkiver> | ah |
| 00:07:12 | | @arkiver cant read |
| 00:08:05 | <@arkiver> | JAA: adding you in cc |
| 00:09:56 | | KRG_ (KRG) joins |
| 00:14:27 | <@JAA> | Ack |
| 00:15:58 | <@arkiver> | JAA: sent |
| 00:23:09 | | Iki quits [Ping timeout: 258 seconds] |
| 00:25:04 | | Wayward quits [Ping timeout: 258 seconds] |
| 00:25:12 | | jazza quits [Read error: Connection reset by peer] |
| 00:25:36 | | jazza joins |
| 00:31:10 | | Wayward (wayward) joins |
| 00:36:57 | | Doranwen quits [Ping timeout: 258 seconds] |
| 00:40:24 | | sneezey quits [Ping timeout: 258 seconds] |
| 00:43:52 | | sneezey joins |
| 00:59:09 | | phiresky joins |
| 01:02:51 | | dm4v quits [Read error: Connection reset by peer] |
| 01:03:57 | | dm4v joins |
| 01:04:00 | | dm4v is now authenticated as dm4v |
| 01:04:00 | | dm4v quits [Changing host] |
| 01:04:00 | | dm4v (dm4v) joins |
| 01:05:22 | | Doranwen (Doranwen) joins |
| 01:09:23 | <@JAA> | Looks like the Bethesda forums aren't read-only yet as of 14 minutes ago: https://bethesda.net/community/post/3264661 |
| 01:58:41 | <Larsenv> | https://transfer.archivete.am/4Y91D/wikifoundry.txt |
| 01:58:47 | <Larsenv> | a sublist3r run, nothing special |
| 02:00:14 | <@JAA> | Larsenv: #archiveteam is for announcements and important messages only. |
| 02:00:35 | <@JAA> | Yes, discussion was here after I mentioned it there, as it's supposed to. |
| 02:00:57 | <Larsenv> | JAA: sorry, didn't notice that, and I also didn't notice the timestamp |
| 02:00:58 | <Larsenv> | I'm sorry |
| 02:03:47 | <@OrIdow6> | Not a good sign for size, 409/562 in Lars env's file are not in CDX |
| 02:05:07 | | Iki joins |
| 02:06:24 | <@OrIdow6> | CDX: https://transfer.archivete.am/joS8M/wikifoundry_subdomains_from_cdx_pagination.txt |
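OrIdow6's list above was pulled from the Wayback Machine CDX server using pagination. The script itself isn't in the log, so the following is only a sketch of how such a query could be built. `url`, `matchType=domain`, `fl` and `page` are CDX server parameters (adding `showNumPages=true` returns the page count instead of results). `cdx_page_url` and `hosts_from_lines` are hypothetical helpers, and the actual fetching of each page is left out.

```python
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_page_url(domain: str, page: int) -> str:
    """URL for one page of CDX results covering `domain` and all of its subdomains."""
    params = {"url": domain, "matchType": "domain", "fl": "original", "page": page}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def hosts_from_lines(lines):
    """Collect distinct hostnames from CDX 'original' URLs, one URL per line."""
    hosts = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "://" not in line:
            line = "http://" + line  # urlsplit needs a scheme to find the host
        host = urlsplit(line).hostname  # lower-cased, port stripped
        if host:
            hosts.add(host)
    return sorted(hosts)

print(cdx_page_url("wikifoundry.com", 0))
```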
| 02:06:35 | <Larsenv> | Lars env |
| 02:06:50 | <Larsenv> | (if you're trying to mitigate pinging me, you don't have to) |
| 02:10:06 | <@OrIdow6> | Alright |
| 02:10:34 | <@OrIdow6> | Here is Larsenv's list in the same format, also with www. processed out: https://transfer.archivete.am/4NYaU/larsenvs_list_stripped.txt |
| 02:12:11 | <@OrIdow6> | Also, I made an error using comm, that number is only 142/552 |
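`comm` expects both of its inputs to be sorted in the same collation order, which is an easy thing to trip over when comparing subdomain lists. A minimal Python sketch of the same check uses a set difference instead, after stripping a leading `www.` as was done for Larsenv's list. The function names and sample hostnames below are hypothetical.

```python
def normalize(host: str) -> str:
    """Lower-case a hostname and strip a leading 'www.'."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host

def missing_from_cdx(sublist3r_hosts, cdx_hosts):
    """Hosts found by sublist3r that never appear in the CDX-derived list."""
    found = {normalize(h) for h in sublist3r_hosts if h.strip()}
    known = {normalize(h) for h in cdx_hosts if h.strip()}
    return sorted(found - known)

sublist3r = ["www.foo.wikifoundry.com", "bar.wikifoundry.com", "baz.wikifoundry.com"]
cdx = ["foo.wikifoundry.com", "bar.wikifoundry.com"]
print(missing_from_cdx(sublist3r, cdx))  # ['baz.wikifoundry.com']
```

Because sets ignore order, no pre-sorting is needed; the result is sorted only for readable output.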
| 02:13:38 | | archzz quits [Ping timeout: 250 seconds] |
| 02:13:51 | | AlsoHP_Archivist joins |
| 02:14:39 | | AlsoHP_Archivist quits [Client Quit] |
| 02:17:00 | <@JAA> | The ArchiveBot collection reached 20 billion URLs (responses) a couple days ago! :-) |
| 02:17:00 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 02:19:41 | | KRG_ quits [Client Quit] |
| 02:19:43 | | archzz joins |
| 02:19:44 | <Ryz> | Wooooooooooo |
| 02:20:50 | <s-crypt> | thuban it is not time sensitive, and semi-low priority, as this is mostly preemptive. thank you so much! |
| 02:40:56 | | xkey quits [Quit: WeeChat 2.9] |
| 02:41:13 | | xkey (eyo) joins |
| 02:58:44 | <thuban> | s-crypt: i'm getting 403s for the urls in that text file you posted |
| 02:58:53 | <s-crypt> | hmmm |
| 02:59:31 | <s-crypt> | oh NO. they were time based |
| 03:00:51 | <s-crypt> | they are in a "public" dropbox with the password 'wilson'. do you know of another way to download them? |
| 03:01:08 | <thuban> | link? |
| 03:01:10 | <s-crypt> | if not, I can just redo the list and dm the links as I get them |
| 03:01:33 | <s-crypt> | https://www.dropbox.com/sh/8ffmi8kzuzla500/AAA3P47QvYIHyHyEDzgrGgMsa?dl=0 |
| 03:04:09 | | Wayward quits [Ping timeout: 258 seconds] |
| 03:04:20 | | lennier1 quits [Client Quit] |
| 03:09:52 | | C4K3 joins |
| 03:09:52 | | C4K3 is now authenticated as C4K3 |
| 03:12:53 | <thuban> | hm... dbx-cli wants me to make an account. awful. i could download them individually through the web interface, i guess, but that'd be a pita; if you could go ahead and send me that first link i'd appreciate it |
| 03:17:06 | <s-crypt> | thuban dming now |
| 03:19:48 | | C4K3 quits [Client Quit] |
| 03:20:56 | <pcr> | arkiver: For wikifoundry, their website says that wiki admins can generate a zip of their site's HTML files. Should you ask for access to that? |
| 03:22:47 | | qw3rty__ joins |
| 03:26:23 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 03:27:11 | <@arkiver> | awesome on archivebot! congrats :) |
| 03:30:33 | | lennier1 (lennier1) joins |
| 04:00:48 | | etnguyen03 quits [Client Quit] |
| 04:25:00 | | HP_Archivist (HP_Archivist) joins |
| 04:27:54 | | BlueMaxima_ joins |
| 04:31:33 | | BlueMaxima quits [Ping timeout: 258 seconds] |
| 04:32:37 | <mgrandi> | I think dbx-cli only works for stuff you added to your own dropbox |
| 05:26:16 | | BlueMaxima__ joins |
| 05:30:12 | | BlueMaxima_ quits [Ping timeout: 258 seconds] |
| 05:47:04 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 06:02:01 | | nerdguy1138 quits [Ping timeout: 258 seconds] |
| 06:16:35 | | nerdguy1138 (nerdguy1138) joins |
| 07:23:40 | | flashfire42 quits [Remote host closed the connection] |
| 07:23:40 | | kiska quits [Remote host closed the connection] |
| 07:23:40 | | s-crypt quits [Remote host closed the connection] |
| 07:37:57 | | Doran (Doranwen) joins |
| 07:38:14 | | Doranwen quits [Ping timeout: 258 seconds] |
| 07:47:08 | | BlueMaxima__ quits [Client Quit] |
| 07:59:47 | | HP_Archivist (HP_Archivist) joins |
| 08:33:26 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 08:37:48 | | ave quits [Client Quit] |
| 08:37:48 | | lun4 quits [Quit: o/ https://thelounge.lasagna.dev] |
| 08:38:11 | | ave (ave) joins |
| 08:38:17 | | lun4 (lun4) joins |
| 09:44:59 | | HP_Archivist (HP_Archivist) joins |
| 09:53:10 | | kiskaWeebChat quits [Ping timeout: 258 seconds] |
| 09:58:09 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 10:37:15 | | Hosseinifard joins |
| 10:37:47 | | Hosseinifard quits [Remote host closed the connection] |
| 10:38:38 | | Hosseinifard joins |
| 10:39:46 | | Hosseinifard quits [Remote host closed the connection] |
| 10:39:56 | | Iki quits [Ping timeout: 258 seconds] |
| 11:02:49 | | Iki joins |
| 11:05:09 | | s-crypt (s-crypt) joins |
| 11:05:12 | | flashfire42 (flashfire42) joins |
| 11:05:54 | | kiska (kiska) joins |
| 11:26:19 | | Arcorann (Arcorann) joins |
| 12:02:34 | | Stiletto quits [Remote host closed the connection] |
| 12:08:31 | | Stiletto joins |
| 12:17:17 | | Stiletto quits [Remote host closed the connection] |
| 12:28:52 | | etnguyen03 (etnguyen03) joins |
| 12:33:49 | | Ryz quits [Remote host closed the connection] |
| 12:35:07 | <@jrwr> | I swear this is like a speed run and how fast you can kill an IRC Network |
| 12:37:23 | <Arcorann> | I think you want #archiveteam-ot |
| 12:41:44 | | Stiletto joins |
| 12:49:05 | | ddd joins |
| 12:58:03 | | gazorpazorp quits [Quit: WeeChat 2.3] |
| 12:59:29 | | ddd quits [Remote host closed the connection] |
| 12:59:33 | | ddd joins |
| 13:01:10 | | user joins |
| 13:01:15 | | user is now known as gazorpazorp |
| 13:19:03 | | nertzy (nertzy) joins |
| 13:33:46 | | LeGoupil joins |
| 13:40:06 | | Arcorann quits [Ping timeout: 258 seconds] |
| 13:51:29 | | JensRex quits [Client Quit] |
| 14:12:40 | | Jens (JensRex) joins |
| 14:13:59 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 14:27:48 | | ddd quits [Remote host closed the connection] |
| 14:39:05 | | Jonboy345 joins |
| 14:55:43 | | Jens quits [Killed (NickServ (GHOST command used by jens_!~jens@hackint/user/JENS))] |
| 14:55:58 | | JensRex (JensRex) joins |
| 15:04:03 | | kiskaWeebChat (kiska) joins |
| 15:04:59 | | Stiletto quits [Remote host closed the connection] |
| 15:09:25 | | fuzzy8021 quits [Ping timeout: 258 seconds] |
| 15:22:54 | | fuzzy8021 (fuzzy8021) joins |
| 16:28:17 | | lennier1 quits [Client Quit] |
| 16:29:39 | | HP_Archivist (HP_Archivist) joins |
| 16:36:49 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 16:37:20 | | Daloader__ joins |
| 17:34:57 | | HP_Archivist (HP_Archivist) joins |
| 17:42:23 | | pcr leaves [Disconnected: Replaced by new connection] |
| 17:42:26 | | pcr joins |
| 17:43:23 | | Doran is now known as Doranwen |
| 17:52:25 | | Stiletto joins |
| 17:57:19 | | ddd joins |
| 18:06:24 | | Limeysoda joins |
| 18:06:41 | | Limeysoda quits [Remote host closed the connection] |
| 18:06:51 | | Lord_Nightmare quits [Read error: Connection reset by peer] |
| 18:07:17 | | Stiletto quits [Ping timeout: 258 seconds] |
| 18:07:24 | | Stilett0 joins |
| 18:11:01 | | Lord_Nightmare (Lord_Nightmare) joins |
| 18:18:17 | | Wayward (wayward) joins |
| 18:28:33 | | Ryz (Ryz) joins |
| 18:32:58 | | Daloader__ quits [Ping timeout: 250 seconds] |
| 18:33:47 | | Stiletto joins |
| 18:35:39 | | Stilett0 quits [Ping timeout: 258 seconds] |
| 18:38:08 | | Lord_Nightmare quits [Read error: Connection reset by peer] |
| 18:40:56 | | Lord_Nightmare (Lord_Nightmare) joins |
| 18:42:57 | <@JAA> | EggplantN, HCross, rewby: Does one of you have a box with a bunch of IPs where I could run a qwarc grab of LBPCentral? It's not huge, probably only roughly 300k requests, but they have stupid IP blocks and rate limits of ~0.5 req/s. It's going down on the weekend. Would take at least a week from one IP, so even just a few would be enough. As mentioned previously, qwarc needs iptables magic or similar |
| 18:43:03 | <@JAA> | to distribute the outgoing connections; it can't bind to specific interfaces/IPs. |
| 18:50:41 | <@EggplantN> | How many IPs you need. I can get 2 /24s ready and a /25 but it’ll be 3 hrs |
| 18:50:52 | <@EggplantN> | harry Might have something sooner if needed |
| 18:52:36 | <@JAA> | The more, the quicker I can grab it, but something like /29 or /28 would easily be enough to grab it in time. |
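The sizing above follows from JAA's numbers: roughly 300k requests at ~0.5 req/s per IP. A back-of-the-envelope sketch, assuming the rate limit applies per source IP and requests are spread evenly across all addresses (the helper name is hypothetical, and 8 and 16 are the total address counts of a /29 and a /28, not all of which may be usable):

```python
SECONDS_PER_DAY = 86_400

def crawl_days(requests: int, rate_per_ip: float, n_ips: int = 1) -> float:
    """Days needed if each source IP is limited to rate_per_ip requests per second."""
    return requests / (rate_per_ip * n_ips) / SECONDS_PER_DAY

for n_ips in (1, 8, 16):  # one IP, a /29, a /28
    print(n_ips, round(crawl_days(300_000, 0.5, n_ips), 2))
# 1 6.94
# 8 0.87
# 16 0.43
```

One IP needs about 600,000 seconds (just under seven days), while even a /29 brings that under a day.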
| 18:55:02 | <@EggplantN> | Aight can qwarc work with more than 1 subnet or not |
| 18:55:36 | <@JAA> | As long as the IP handling happens externally, qwarc doesn't care. |
| 18:58:32 | <@HCross> | EggplantN: not atm |
| 18:58:36 | <@HCross> | you'll need to configure NAT |
| 18:58:45 | <nerdguy1138> | JAA: google cloud has a free trial of $300usd, i ran 32 vms for a month. |
| 18:58:49 | <@HCross> | my hardware isn't setup |
| 18:58:55 | <@HCross> | nerdguy1138: won't do what JAA needs |
| 18:59:19 | <nerdguy1138> | better than nothing right? |
| 18:59:29 | <@EggplantN> | HCross got the iptables rule? |
| 18:59:42 | <@HCross> | EggplantN: not off the top of my head, you'll need to know the dest IPs as well |
| 18:59:43 | <@HCross> | to NAT too |
| 18:59:46 | <@EggplantN> | nerdguy1138 we have kit, just not configured/setup :P |
| 19:00:08 | <@EggplantN> | Or does JAA have the magic rule |
| 19:00:40 | <@JAA> | I wouldn't be surprised if Google Cloud was banned anyway. They drop everything from OVH's IP blocks, for example. |
| 19:00:45 | <@JAA> | EggplantN: I don't. |
| 19:00:57 | <@HCross> | rewby: may have it |
| 19:00:59 | <@EggplantN> | Aight well then someone call the Dutch May |
| 19:01:02 | <@EggplantN> | *man |
| 19:01:08 | <nerdguy1138> | JAA: vultr, linode, digitalocean? |
| 19:01:24 | <@HCross> | EggplantN: we have highly available dutch people |
| 19:01:28 | <@EggplantN> | yeah nerdguy1138 we run serious kit. Racks of magic. Just very little management 😂 |
| 19:01:35 | <nerdguy1138> | well ok then |
| 19:01:38 | <@EggplantN> | plus our cloud provider of choice is Hetzner cloud ;) |
| 19:01:52 | <@EggplantN> | They’re really nice and cheap |
| 19:02:18 | <nerdguy1138> | EggplantN whats the project? |
| 19:03:20 | <@JAA> | Huh, Hetzner IPs aren't blocked, interesting. |
| 19:04:16 | <@JAA> | I could manually split it up across a couple of machines I suppose, but trying to avoid that since it's a pain to manage. |
| 19:05:32 | <rewby> | JAA: Will a /27 do? If so, drop me your SSH key and I'll get you sorted in a few minutes |
| 19:06:28 | <rewby> | That machine has NAT magic set up, as long as you can tell qwarc to bind to a specific ip |
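rewby's setup relies on the client binding each outgoing connection to a specific source IP, which JAA says qwarc can't do. For illustration only, this is how a generic Python client could rotate through an address block with the standard library. It is not a qwarc patch, `192.0.2.0/29` is a documentation range standing in for a real allocation, and the addresses must actually be configured on the host for the bind to succeed. `source_ips` and `connect_round_robin` are hypothetical helpers.

```python
import ipaddress
import itertools
import socket

def source_ips(cidr: str) -> list:
    """Usable host addresses in a block (network and broadcast addresses excluded)."""
    return [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]

def connect_round_robin(host: str, port: int, pool, timeout: float = 30.0) -> socket.socket:
    """Open a TCP connection whose source address is the next IP from `pool`."""
    src = next(pool)
    # Port 0 lets the kernel pick an ephemeral source port.
    return socket.create_connection((host, port), timeout=timeout, source_address=(src, 0))

# Cycle endlessly through the block so consecutive connections use different IPs.
pool = itertools.cycle(source_ips("192.0.2.0/29"))
```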
| 19:07:39 | <@EggplantN> | If you can wait I can give a /24 if preferred |
| 19:07:51 | <@JAA> | rewby: I can't. :-/ |
| 19:07:59 | <@JAA> | Well, not without patching qwarc in nasty ways. |
| 19:08:03 | <rewby> | Uh. Can you package it in a docker container? |
| 19:08:21 | | Stiletto quits [Remote host closed the connection] |
| 19:08:29 | <rewby> | Docker containers work fine with this setup |
| 19:09:03 | <@JAA> | Uh, I suppose so. Or just launch it in an interactive shell in a `sleep infinity` container I guess? I've done that monstrosity before. |
| 19:09:39 | | Vukky quits [Quit: Leaving] |
| 19:09:45 | <rewby> | Either way works for me |
| 19:09:54 | <rewby> | I can just drop you into a shell with access to docker |
| 19:10:02 | <@JAA> | Ok, sounds good. :-) |
| 19:11:45 | <nerdguy1138> | JAA: probably a stupid question but have you heard of clusterssh? |
| 19:13:19 | <@JAA> | I'm aware it exists, yeah. It's more about actually splitting the thing up into independent sets than running those. |
| 19:13:32 | <@JAA> | Much easier to just have a single process on one machine. |
| 19:13:56 | <@JAA> | Mostly due to how qwarc works. |
| 19:15:05 | | lennier1 (lennier1) joins |
| 19:17:39 | <nerdguy1138> | i agree, but then my pet project is probably much easier to parallelize. |
| 19:21:23 | <@JAA> | qwarc is fully parallel, it's just annoying to manage. I can basically do a `threadIds = range(upperLimit)` with one process whereas I'd have to split it into N ranges manually to distribute across N machines. |
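The manual splitting JAA wants to avoid looks roughly like this: instead of one `range(upperLimit)` over all thread IDs, each of N machines gets its own contiguous slice. A minimal sketch; `split_range` is a hypothetical helper, not part of qwarc.

```python
def split_range(upper_limit: int, n_machines: int) -> list:
    """Split range(upper_limit) into n_machines contiguous, near-equal ranges."""
    base, extra = divmod(upper_limit, n_machines)
    chunks, start = [], 0
    for i in range(n_machines):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more ID
        chunks.append(range(start, start + size))
        start += size
    return chunks

print(split_range(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

Each machine would then need its own process, its own output files and its own progress tracking, which is the management overhead a single process avoids.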
| 19:24:12 | <rewby> | nerdguy1138: I've tried clusterssh once. Wasn't a huge fan. I mostly do automation via saltstack which is a tad more powerful so maybe I'm spoiled |
| 19:26:59 | | Stiletto joins |
| 19:27:40 | | DogsRNice (Webuser299) joins |
| 19:41:58 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 19:53:21 | | HP_Archivist (HP_Archivist) joins |
| 20:01:56 | | Lilpea joins |
| 20:02:00 | | AlsoHP_Archivist joins |
| 20:03:03 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 20:15:06 | <@EggplantN> | we good JAA / rewby |
| 20:15:09 | <@EggplantN> | or need my box |
| 20:16:18 | <rewby> | Ask JAA. I've kinda left him on his own with his container |
| 20:17:02 | <@JAA> | Yep, all good, just applying the finishing touches before launching it. :-) |
| 20:17:27 | <@EggplantN> | okie :) |
| 20:18:01 | <rewby> | Hm actually. How much disk space are you expecting ? I think that vm has like 100G |
| 20:18:09 | <rewby> | Might be a problem |
| 20:18:23 | | AlsoHP_Archivist quits [Ping timeout: 258 seconds] |
| 20:18:49 | | AlsoHP_Archivist joins |
| 20:19:01 | <rewby> | I can attach more if need be |
| 20:22:54 | <@JAA> | I'll try to get an estimate now. |
| 20:24:57 | <@JAA> | But since I'm only fetching the thread page HTML, it should be pretty small. |
| 20:25:57 | <rewby> | I can imagine, I just don't want us to run out of disk space. I can easily arrange for up to 1T of space to be added |
| 20:27:45 | <@JAA> | Cool, they're sending invalid HTTP responses. God, I hate the internet. |
| 20:31:00 | <@JAA> | rewby: We're looking at 2-3 GB probably. :-) |
| 20:31:31 | <rewby> | Ah, I won't bother adding more space then. Should be plenty available |
| 20:31:35 | <@JAA> | And that's not even with zstd. |
| 20:31:42 | <@JAA> | Just plain old gzip. |
| 20:39:24 | | AlsoHP_Archivist quits [Client Quit] |
| 20:39:25 | <@JAA> | Also, if it should blow up unexpectedly, it will stop automatically (on 50 GiB free space with my standard options), and I can move the existing archives away to make space. So yeah, should be fine. |
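The automatic stop JAA describes amounts to checking free space before writing more data. A minimal sketch with `shutil.disk_usage`, using the same 50 GiB threshold; this is not qwarc's own implementation or option name, and `should_stop` is hypothetical.

```python
import shutil

GIB = 1024 ** 3

def should_stop(path: str = ".", min_free_gib: float = 50) -> bool:
    """True once free space on the filesystem holding `path` drops below the threshold."""
    return shutil.disk_usage(path).free < min_free_gib * GIB

if should_stop("."):
    print("Low on disk space: pausing so finished archives can be moved away")
```

With an expected 2-3 GB of output on a ~100G VM, a guard like this should never trigger; it only matters if the estimate is badly off.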
| 20:39:39 | | space leaves |
| 20:45:02 | <rewby> | Awesome |
| 21:01:37 | | Jake4 (Jake) joins |
| 21:02:02 | | Jake quits [Ping timeout: 250 seconds] |
| 21:02:02 | | Jake4 is now known as Jake |
| 21:03:34 | | damianwebster joins |
| 21:04:48 | <damianwebster> | hello i recently found out your project and love it that you do all these kinds of stuff. but im still wondering how i can access a specific site from an archive. if someone could help me with that i would appreciate it! thanks |
| 21:10:08 | <@EggplantN> | Sure what site |
| 21:10:11 | <@EggplantN> | You can enter it here |
| 21:10:16 | <@EggplantN> | https://web.archive.org/web/ |
| 21:10:50 | <damianwebster> | Already tried it but it redirects to a different site. It's this site that im looking for: http://formspring.me/keekihime |
| 21:11:25 | | LeGoupil quits [Client Quit] |
| 21:16:26 | <Jake> | damianwebster: here's the last working one. https://web.archive.org/web/20111214194257/http://formspring.me/keekihime |
| 21:17:03 | <damianwebster> | Thank you! |
| 21:19:23 | <damianwebster> | I assume i cant see more of the site right? Like the rest of the responses. |
| 21:19:27 | <@EggplantN> | To be clear, thanks for the praise but we didnt archive it :D |
| 21:19:59 | <damianwebster> | Well it was a project of archive team. at least thats what the site said. |
| 21:20:08 | <@EggplantN> | Yep but that exact one wasnt us :D |
| 21:20:15 | <@EggplantN> | it was Alexa Crawls |
| 21:20:24 | <@EggplantN> | speaking of arkiver any reason they've stopped uploading |
| 21:21:06 | | Sylirana quits [Ping timeout: 244 seconds] |
| 21:21:24 | | Sylirana (Sylirana) joins |
| 21:21:29 | <damianwebster> | Hm alright. Too bad the rest of the responses are inaccessible. But i guess that comes with archiving stuff. |
| 21:24:49 | <Jake> | All of the captures from 2011 and 2010 seem to work, so you may be able to see more there? https://web.archive.org/web/2011*/http://formspring.me/keekihime |
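Jake's links follow the Wayback Machine's URL patterns: `/web/<14-digit timestamp>/<original URL>` replays a single capture, and `/web/<year>*/<original URL>` lists the captures for that year. A small sketch that builds both; the function names are hypothetical.

```python
WAYBACK = "https://web.archive.org/web"

def replay_url(timestamp: str, original: str) -> str:
    """Link to one capture; timestamp is YYYYMMDDhhmmss."""
    return f"{WAYBACK}/{timestamp}/{original}"

def year_listing_url(year: int, original: str) -> str:
    """Link to the list of captures of `original` made during `year`."""
    return f"{WAYBACK}/{year}*/{original}"

print(replay_url("20111214194257", "http://formspring.me/keekihime"))
print(year_listing_url(2011, "http://formspring.me/keekihime"))
```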
| 21:27:28 | <damianwebster> | Hm doesnt seem like it. I meant when i click on the "load more responses" that it simply gets stuck. Thought it would load the other responses too. |
| 21:32:03 | | celestial quits [Quit: ZNC 1.8.0 - https://znc.in] |
| 21:32:55 | <Jake> | Sorry, I was trying to say there was different content on different capture. (Also for some reason, this user wasn't included on the ArchiveTeam captures, I believe.) |
| 21:33:56 | | celestial joins |
| 21:36:35 | | Atom quits [Ping timeout: 258 seconds] |
| 21:45:30 | | ddd quits [Remote host closed the connection] |
| 22:08:02 | | ddd joins |
| 22:12:10 | | Atom joins |
| 22:36:23 | | Iki quits [Ping timeout: 258 seconds] |
| 22:56:13 | | Iki joins |
| 22:57:10 | | monoxane7 (monoxane) joins |
| 22:59:02 | | monoxane quits [Ping timeout: 250 seconds] |
| 22:59:02 | | monoxane7 is now known as monoxane |
| 23:27:40 | | namespace joins |
| 23:30:52 | | namspc joins |
| 23:33:39 | <namspc> | So I have some relevant BS I'm working on: https://tzstamp.io/ |
| 23:34:07 | <namspc> | It's a bit like OpenTimestamps except I want it to be actually documented and faster to verify proofs. |
| 23:34:17 | <namspc> | (Among other things) |
| 23:49:13 | | BlueMaxima joins |
| 23:51:31 | | ddd quits [Ping timeout: 258 seconds] |