00:00:47DopefishJustin quits [Remote host closed the connection]
00:08:16DopefishJustin joins
00:11:43Wohlstand quits [Client Quit]
00:12:48etnguyen03 (etnguyen03) joins
00:37:20etnguyen03 quits [Client Quit]
00:59:17beastbg8__ quits [Read error: Connection reset by peer]
01:06:39<klea>https://wiki.archiveteam.org/index.php/Hacker_News links to https://github.com/HackerNews/API, maybe implement this api?
01:06:54<klea>i don't know what the limits to query the api are though
01:07:06astrinaut leaves [][]
01:09:13etnguyen03 (etnguyen03) joins
01:09:37<klea>the repo says: "There is currently no rate limit."
01:10:07<klea>but given maxitem returns 46029159 i'm not sure about that
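For reference, both numbers come from plain JSON endpoints; a minimal sketch of poking at them with curl and jq, assuming only the /v0/maxitem and /v0/item endpoints documented in the linked repo:

  # highest item id assigned so far
  curl -s https://hacker-news.firebaseio.com/v0/maxitem.json
  # fetch a single item (story, comment, job, ...) by id
  curl -s https://hacker-news.firebaseio.com/v0/item/46029159.json | jq .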
01:10:47DogsRNice_ joins
01:14:21DogsRNice quits [Ping timeout: 272 seconds]
01:17:19<that_lurker>The urls project is currently fetching https://news.ycombinator.com/newest and https://news.ycombinator.com/newcomments at random intervals
01:22:36sg72 quits [Remote host closed the connection]
01:23:45sg72 joins
01:27:09etnguyen03 quits [Client Quit]
01:30:45etnguyen03 (etnguyen03) joins
01:38:54BennyOtt quits [Ping timeout: 256 seconds]
01:45:46<klea>that_lurker: afaik fetching the direct api data from https://hacker-news.firebaseio.com/v0/item/$ID.json seems like a better idea?
01:45:52<klea>but yeah that's nice :3
01:46:37<klea>s/:3// # shouldn't add that in public discussion, not professional enough
01:57:18Webuser854399 joins
01:58:45Webuser854399 quits [Client Quit]
02:03:19<klea>--
02:03:50<klea>https://wiki.archiveteam.org/index.php/Dev/Tracker <- i noticed that the official Archive Team tracker is crossed out on that page, i wonder why the official AT tracker can't be published fully
02:05:03<klea>"The first line allows spawning maximum of 2 processes. The second line restarts Passenger after 10,000 requests to free memory caused by memory leaks. "
02:06:39<BlankEclair>https://wiki.archiveteam.org/index.php/Tracker#History: "Sometime in the late 2010s the open-source tracker was gradually replaced with the proprietary one"
02:08:07<BlankEclair>oh nvm, disregard me, i thought you were asking why it's not the official AT tracker ^^;
02:12:12<nicolas17>BlankEclair: maybe you can comment on the professionalism of using :3 in this channel (see above)
02:12:28<BlankEclair>i was tempted to interject, but i opted not to
02:12:33<BlankEclair>but since you prompted...
02:12:37<nicolas17>I think we need more :3 here
02:12:39<nicolas17>not less
02:12:39<BlankEclair>why nyot use :3?
02:12:52<BlankEclair>i once reported a security vulnyability entirely in UwUspeak
02:15:34jinn6 quits [Quit: WeeChat 4.7.1]
02:15:50jinn6 joins
02:54:30<nulldata>BlankEclair - https://www.youtube.com/watch?v=QXUSvSUsx80
03:09:09<NatTheCat>lol nicolas17 checks out.... how scummy
03:09:36<NatTheCat>and yes, very true. more :3 is never a bad thing
03:39:29etnguyen03 quits [Client Quit]
03:44:34etnguyen03 (etnguyen03) joins
04:05:44etnguyen03 quits [Remote host closed the connection]
04:13:08lennier2_ joins
04:16:07lennier2 quits [Ping timeout: 272 seconds]
04:45:19gosc joins
04:45:51<gosc>I wonder if there's a quicker way to get a large number of webpages saved at the same time without having to ask here?
04:46:40<gosc>there used to be google sheets for the wayback machine but they've since made it so they only run after like 2 days or something
04:47:54Island quits [Read error: Connection reset by peer]
05:01:13beastbg8 (beastbg8) joins
05:02:30sec^nd quits [Remote host closed the connection]
05:02:59sec^nd (second) joins
05:14:14arch quits [Ping timeout: 256 seconds]
05:17:10arch (arch) joins
05:59:03<pabs>gosc: the SPN email API still works
05:59:25<pabs>also asking for AB !ao < here works
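Besides the email interface, Save Page Now also has an authenticated HTTP endpoint (SPN2); a minimal sketch with curl, assuming you have archive.org S3-style keys (ACCESSKEY:SECRET below is a placeholder):

  # submit one URL per request
  curl -s -X POST "https://web.archive.org/save" \
    -H "Accept: application/json" \
    -H "Authorization: LOW ACCESSKEY:SECRET" \
    -d "url=https://example.com/page-to-save"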
06:19:22DogsRNice_ quits [Read error: Connection reset by peer]
06:20:14driib97 quits [Quit: Ping timeout (120 seconds)]
06:38:37unknownsrc quits [Ping timeout: 272 seconds]
07:00:53unknownsrc (unknownsrc) joins
07:24:39Webuser760579 joins
07:25:42Webuser760579 quits [Client Quit]
07:26:45mcint quits [Ping timeout: 272 seconds]
07:27:03mcint (mcint) joins
07:50:26BennyOtt (BennyOtt) joins
09:04:17valdikss quits [Ping timeout: 272 seconds]
09:05:32valdikss joins
09:09:13valdikss quits [Client Quit]
09:10:04valdikss joins
09:32:44Wohlstand (Wohlstand) joins
09:36:15choochaa quits [Remote host closed the connection]
09:36:37choochaa (choochaa) joins
09:38:36HackMii quits [Remote host closed the connection]
09:38:54HackMii (hacktheplanet) joins
10:02:40skyrocket quits [Ping timeout: 256 seconds]
10:03:24skyrocket joins
10:06:52Afanasiy joins
10:07:12nathang2184 quits [Ping timeout: 256 seconds]
10:08:20Afanasiy quits [Client Quit]
10:08:40Webuser327504 joins
10:08:59Webuser327504 quits [Client Quit]
10:23:17nathang2184 joins
10:34:46cyanbox quits [Read error: Connection reset by peer]
11:13:07<cruller>arkiver: I asked KCN Kyoto about kinet-tv.ne.jp. They said the sites will be deleted.
11:13:18<cruller>Fortunately, a Google search for site:http://www.kinet-tv.ne.jp returns only 1,320 results, indicating there are very few pages. Therefore, I'll create a page list (referencing https://wiki.archiveteam.org/index.php/Site_exploration).
11:16:05evergreen5 quits [Quit: Bye]
11:16:41evergreen5 joins
11:24:33justaguy is now known as mystique_altrosky
11:47:50Commander001 joins
12:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:45Bleo182600722719623455222 joins
12:21:48Wohlstand quits [Client Quit]
12:22:04Wohlstand (Wohlstand) joins
12:39:09ymgve_ joins
12:43:25ymgve quits [Ping timeout: 272 seconds]
13:14:59Commander001 quits [Remote host closed the connection]
13:19:16Commander001 joins
14:54:31ThreeHM quits [Ping timeout: 272 seconds]
14:55:58ThreeHM (ThreeHeadedMonkey) joins
15:02:19gosc_1 joins
15:05:55gosc quits [Ping timeout: 272 seconds]
16:13:41aninternettroll quits [Ping timeout: 272 seconds]
16:17:05aninternettroll (aninternettroll) joins
16:17:38sg72 quits [Remote host closed the connection]
16:18:46sg72 joins
17:00:00<klea>--
17:00:02<klea>I didn't know how big #archivebot's request count was but i made this and it helped me see how many reqs AB makes: websocat ws://archivebot.com:4568/ | jq -r '.job_data = {u: (.started_by//null), c: (.started_in//null), n: (.note//null), url: (.url//null), id: .ident} | "Queried \(.url) for \(.job_data.id) req by \(.job_data.u) in \(.job_data.c) for url(s) \(.job_data.url) with note: \(.job_data.n)"'
17:13:51ThetaDev quits [Ping timeout: 272 seconds]
17:14:02ThetaDev joins
17:20:56Webuser852275 joins
17:22:42Webuser852275 quits [Client Quit]
18:26:17Cornelius quits [Quit: Cornelius]
18:27:08Cornelius (Cornelius) joins
18:52:11<Thibaultmol>Q: are there backups of 3D print models from websites like printables? (Besides the thingiverse collection on archive.org itself, not sure how complete that even is)
18:55:11<justauser|m>archiveteam_thingiverse should be complete as of 2015.
18:56:46<justauser|m>archiveteam_googlepoly, remix3d.com_20191220000000, some WARCs in archiveteam_chromebot,
18:57:20<justauser|m>archiveteam_claraio, archiveteam_tinkercad_*...
18:59:14<pokechu22>I believe katia has been looking into that. I tried to do an archivebot job on their behalf, but it ended up not working well because rate-limits on the main site led to fake 404s on valid URLs (even at a 4-second delay); that was a normal (mostly) recursive job of the frontend pages, though, as opposed to just the models themselves
19:00:18<katia>Printables requires some hundreds of thousands of API requests for getting direct links
19:02:59Cornelius quits [Client Quit]
19:03:54Cornelius (Cornelius) joins
19:05:55<katia>But yes I’ve done printables in the past
19:06:12<katia>Via archivebot
19:06:33<katia>I got all models and PDFs for everything at the time
19:06:58<katia>Should do another run at some point
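For anyone curious what such a run looks like in shape, a sketch of paginating an API to collect direct links; everything about the endpoint below (host, path, parameters, field names) is hypothetical and purely illustrative, not Printables' real API:

  # hypothetical paginated listing -> one direct file link per line
  for page in $(seq 1 1000); do
    curl -s "https://api.example.com/models?page=$page&per_page=100" \
      | jq -r '.items[].files[].download_url'
    sleep 1  # be polite; actual rate limits unknown
  done > direct-links.txt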
19:14:41jspiros quits []
19:28:20andrewnyr quits [Quit: Ping timeout (120 seconds)]
19:28:46andrewnyr joins
19:38:08gosc_1 quits [Quit: Leaving]
19:40:39Cuphead2527480 (Cuphead2527480) joins
19:45:33SootBector quits [Remote host closed the connection]
19:46:40SootBector (SootBector) joins
20:00:21cyanbox joins
20:22:19kdy quits [Remote host closed the connection]
20:30:47kdy (kdy) joins
20:32:45DogsRNice joins
20:38:08that_lurker quits [Remote host closed the connection]
20:40:03jspiros (jspiros) joins
20:43:35that_lurker (that_lurker) joins
21:08:45MrMcNuggets quits [Quit: WeeChat 4.3.2]
21:11:36MrMcNuggets (MrMcNuggets) joins
21:25:26HP_Archivist quits [Quit: Leaving]
21:29:38cmlow joins
21:30:35TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night]
22:00:24Cuphead2527480 quits [Client Quit]
22:04:20HP_Archivist (HP_Archivist) joins
22:10:27<Guest>klea: the api doesn't have a rate limit. if you have enough concurrent downloads you can download the entire thing in a few hours (i believe between 30-50GB uncompressed).
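A minimal sketch of that kind of bulk pull with curl and xargs; the concurrency of 64 is arbitrary, and the no-rate-limit claim is only as good as the message above:

  # walk every item id up to maxitem, 64 fetches in parallel
  mkdir -p items
  max=$(curl -s https://hacker-news.firebaseio.com/v0/maxitem.json)
  seq 1 "$max" | xargs -P 64 -I{} \
    curl -s "https://hacker-news.firebaseio.com/v0/item/{}.json" -o "items/{}.json"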
22:18:03<Guest>was there anything happening to HN?
22:35:04etnguyen03 (etnguyen03) joins
22:37:44Island joins
23:04:13<hexagonwin>ah crap, even my browsertrix crashed. if nobody's interested guess i should try developing something..
23:06:18<@JAA>Browsertrix writes bad WARCs anyway, doesn't it?
23:08:02<hexagonwin>idk, but it's still much better than doing nothing
23:09:10<hexagonwin>my prev message here 23h ago jic you missed it https://termbin.com/o7mq
23:09:37<@JAA>Yeah, I saw. Haven't had time to look into it myself.
23:12:20<hexagonwin>archivebot still at 327GB sadly (vs my now dead crawler 424GB)
23:20:05Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
23:22:04nomadgeek (nomadgeek) joins
23:28:49nine quits [Quit: See ya!]
23:29:02nine joins
23:29:02nine quits [Changing host]
23:29:02nine (nine) joins
23:38:48superkuh_ joins
23:39:40<hexagonwin>not a good script, but this seems to work well https://termbin.com/gu15
23:40:24<hexagonwin>is there any way to have wget get multiple URLs in one run with different headers?
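wget applies --header globally to every URL in a run, so per-URL headers mean one wget process per URL; curl can do it in a single invocation with --next, which resets per-operation options between URLs. A sketch:

  # one curl run, different headers per URL
  curl -H "Referer: https://a.example/" -O "https://a.example/file1.bin" \
    --next \
    -H "Referer: https://b.example/" -O "https://b.example/file2.bin"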
23:42:05superkuh quits [Ping timeout: 272 seconds]
23:51:36Guest58 joins
23:52:16HugsNotDrugs` quits [Ping timeout: 256 seconds]
23:52:39HugsNotDrugs joins