| 00:00:47 | | DopefishJustin quits [Remote host closed the connection] |
| 00:08:16 | | DopefishJustin joins |
| 00:08:16 | | DopefishJustin is now authenticated as DopefishJustin |
| 00:11:43 | | Wohlstand quits [Client Quit] |
| 00:12:48 | | etnguyen03 (etnguyen03) joins |
| 00:37:20 | | etnguyen03 quits [Client Quit] |
| 00:59:17 | | beastbg8__ quits [Read error: Connection reset by peer] |
| 01:06:39 | <klea> | https://wiki.archiveteam.org/index.php/Hacker_News links to https://github.com/HackerNews/API, maybe implement this api? |
| 01:06:54 | <klea> | i don't know what the limits to query the api are however |
| 01:07:06 | | astrinaut leaves [][] |
| 01:09:13 | | etnguyen03 (etnguyen03) joins |
| 01:09:37 | <klea> | the repo says: "There is currently no rate limit." |
| 01:10:07 | <klea> | but given maxitem returns 46029159 i'm not sure about that |
| 01:10:47 | | DogsRNice_ joins |
| 01:14:21 | | DogsRNice quits [Ping timeout: 272 seconds] |
| 01:17:19 | <that_lurker> | The urls project is currently fetching https://news.ycombinator.com/newest and https://news.ycombinator.com/newcomments on random intervals |
| 01:22:36 | | sg72 quits [Remote host closed the connection] |
| 01:23:45 | | sg72 joins |
| 01:27:09 | | etnguyen03 quits [Client Quit] |
| 01:30:45 | | etnguyen03 (etnguyen03) joins |
| 01:38:54 | | BennyOtt quits [Ping timeout: 256 seconds] |
| 01:45:46 | <klea> | that_lurker: afaik fetching the direct api data from https://hacker-news.firebaseio.com/v0/item/$ID.json seems like a better idea? |
| 01:45:52 | <klea> | but yeah that's nice :3 |
| 01:46:37 | <klea> | s/:3// # shouldn't add that in public discussion, not professional enough |
| 01:57:18 | | Webuser854399 joins |
| 01:58:45 | | Webuser854399 quits [Client Quit] |
| 02:03:19 | <klea> | -- |
| 02:03:50 | <klea> | https://wiki.archiveteam.org/index.php/Dev/Tracker <- i just noticed the official Archive Team tracker is crossed off on that page, i wonder why the official AT tracker can't be published fully |
| 02:05:03 | <klea> | "The first line allows spawning maximum of 2 processes. The second line restarts Passenger after 10,000 requests to free memory caused by memory leaks. " |
| 02:06:39 | <BlankEclair> | https://wiki.archiveteam.org/index.php/Tracker#History: "Sometime in the late 2010s the open-source tracker was gradually replaced with the proprietary one" |
| 02:08:07 | <BlankEclair> | oh nvm, disregard me, i thought you were asking why it's not the official AT tracker ^^; |
| 02:12:12 | <nicolas17> | BlankEclair: maybe you can comment on the professionalism of using :3 in this channel (see above) |
| 02:12:28 | <BlankEclair> | i was tempted to interject, but i opted not to |
| 02:12:33 | <BlankEclair> | but since you prompted... |
| 02:12:37 | <nicolas17> | I think we need more :3 here |
| 02:12:39 | <nicolas17> | not less |
| 02:12:39 | <BlankEclair> | why nyot use :3? |
| 02:12:52 | <BlankEclair> | i once reported a security vulnyability entirely in UwUspeak |
| 02:15:34 | | jinn6 quits [Quit: WeeChat 4.7.1] |
| 02:15:50 | | jinn6 joins |
| 02:54:30 | <nulldata> | BlankEclair - https://www.youtube.com/watch?v=QXUSvSUsx80 |
| 03:09:09 | <NatTheCat> | lol nicolas17 checks out.... how scummy |
| 03:09:36 | <NatTheCat> | and yes, very true. more :3 is never a bad thing |
| 03:39:29 | | etnguyen03 quits [Client Quit] |
| 03:44:34 | | etnguyen03 (etnguyen03) joins |
| 04:05:44 | | etnguyen03 quits [Remote host closed the connection] |
| 04:13:08 | | lennier2_ joins |
| 04:16:07 | | lennier2 quits [Ping timeout: 272 seconds] |
| 04:45:19 | | gosc joins |
| 04:45:51 | <gosc> | I wonder if there's a quicker way to get a large amount of webpages saved at the same time without having to ask here? |
| 04:46:40 | <gosc> | there used to be google sheets for wayback machine but they've since made it so that it would only run after like 2 days or something |
| 04:47:54 | | Island quits [Read error: Connection reset by peer] |
| 05:01:13 | | beastbg8 (beastbg8) joins |
| 05:02:30 | | sec^nd quits [Remote host closed the connection] |
| 05:02:59 | | sec^nd (second) joins |
| 05:14:14 | | arch quits [Ping timeout: 256 seconds] |
| 05:17:10 | | arch (arch) joins |
| 05:59:03 | <pabs> | gosc: the SPN email API still works |
| 05:59:25 | <pabs> | also asking for AB !ao < here works |
| 06:19:22 | | DogsRNice_ quits [Read error: Connection reset by peer] |
| 06:20:14 | | driib97 quits [Quit: Ping timeout (120 seconds)] |
| 06:38:37 | | unknownsrc quits [Ping timeout: 272 seconds] |
| 07:00:53 | | unknownsrc (unknownsrc) joins |
| 07:24:39 | | Webuser760579 joins |
| 07:25:42 | | Webuser760579 quits [Client Quit] |
| 07:26:45 | | mcint quits [Ping timeout: 272 seconds] |
| 07:27:03 | | mcint (mcint) joins |
| 07:50:26 | | BennyOtt (BennyOtt) joins |
| 09:04:17 | | valdikss quits [Ping timeout: 272 seconds] |
| 09:05:32 | | valdikss joins |
| 09:09:13 | | valdikss quits [Client Quit] |
| 09:10:04 | | valdikss joins |
| 09:32:44 | | Wohlstand (Wohlstand) joins |
| 09:36:15 | | choochaa quits [Remote host closed the connection] |
| 09:36:37 | | choochaa (choochaa) joins |
| 09:38:36 | | HackMii quits [Remote host closed the connection] |
| 09:38:54 | | HackMii (hacktheplanet) joins |
| 10:02:40 | | skyrocket quits [Ping timeout: 256 seconds] |
| 10:03:24 | | skyrocket joins |
| 10:06:52 | | Afanasiy joins |
| 10:07:12 | | nathang2184 quits [Ping timeout: 256 seconds] |
| 10:08:20 | | Afanasiy quits [Client Quit] |
| 10:08:40 | | Webuser327504 joins |
| 10:08:59 | | Webuser327504 quits [Client Quit] |
| 10:23:17 | | nathang2184 joins |
| 10:34:46 | | cyanbox quits [Read error: Connection reset by peer] |
| 11:13:07 | <cruller> | arkiver: I asked KCN Kyoto about kinet-tv.ne.jp. They said the sites will be deleted. |
| 11:13:18 | <cruller> | Fortunately, the Google search results for site:http://www.kinet-tv.ne.jp return 1,320 results, indicating there are very few pages. Therefore, I'll create a page list (referencing https://wiki.archiveteam.org/index.php/Site_exploration). |
| 11:16:05 | | evergreen5 quits [Quit: Bye] |
| 11:16:41 | | evergreen5 joins |
| 11:24:33 | | justaguy is now known as mystique_altrosky |
| 11:47:50 | | Commander001 joins |
| 12:00:03 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:45 | | Bleo182600722719623455222 joins |
| 12:21:48 | | Wohlstand quits [Client Quit] |
| 12:22:04 | | Wohlstand (Wohlstand) joins |
| 12:39:09 | | ymgve_ joins |
| 12:43:25 | | ymgve quits [Ping timeout: 272 seconds] |
| 13:04:04 | | colla is now authenticated as colla |
| 13:14:59 | | Commander001 quits [Remote host closed the connection] |
| 13:19:16 | | Commander001 joins |
| 13:22:54 | | mystique_altrosky is now authenticated as mystique_altrosky |
| 14:54:31 | | ThreeHM quits [Ping timeout: 272 seconds] |
| 14:55:58 | | ThreeHM (ThreeHeadedMonkey) joins |
| 15:02:19 | | gosc_1 joins |
| 15:05:55 | | gosc quits [Ping timeout: 272 seconds] |
| 16:13:41 | | aninternettroll quits [Ping timeout: 272 seconds] |
| 16:17:05 | | aninternettroll (aninternettroll) joins |
| 16:17:38 | | sg72 quits [Remote host closed the connection] |
| 16:18:46 | | sg72 joins |
| 17:00:00 | <klea> | -- |
| 17:00:02 | <klea> | I didn't know how big #archivebot's request count was but i made this and it helped me see how many reqs AB makes: websocat ws://archivebot.com:4568/ | jq -r '.job_data = {u: (.started_by//null), c: (.started_in//null), n: (.note//null), url: (.url//null), id: .ident} | "Queried \(.url) for \(.job_data.id) req by \(.job_data.u) in \(.job_data.c) for url(s) \(.job_data.url) with |
| 17:00:04 | <klea> | note: \(.job_data.n)"' |
| 17:13:51 | | ThetaDev quits [Ping timeout: 272 seconds] |
| 17:14:02 | | ThetaDev joins |
| 17:20:56 | | Webuser852275 joins |
| 17:22:42 | | Webuser852275 quits [Client Quit] |
| 18:26:17 | | Cornelius quits [Quit: Cornelius] |
| 18:27:08 | | Cornelius (Cornelius) joins |
| 18:52:11 | <Thibaultmol> | Q: are there backups of 3D print models from websites like printables? (Besides the thingiverse collection on archive.org itself, not sure how complete that even if) |
| 18:52:16 | <Thibaultmol> | is* |
| 18:55:11 | <justauser|m> | archiveteam_thingiverse should be complete as of 2015. |
| 18:56:46 | <justauser|m> | archiveteam_googlepoly, remix3d.com_20191220000000, some WARCs in archiveteam_chromebot, |
| 18:57:20 | <justauser|m> | archiveteam_claraio, archiveteam_tinkercad_*... |
| 18:59:14 | <pokechu22> | I believe katia has been looking into that. I tried to do an archivebot job on their behalf but it ended up not working well because of rate-limits on the main site leading to fake 404s on valid URLs (even at a 4 second delay), but that was as a normal (mostly) recursive job of the frontend pages as opposed to just the models themselves |
| 19:00:18 | <katia> | Printables requires some hundreds of thousands of API requests for getting direct links |
| 19:02:59 | | Cornelius quits [Client Quit] |
| 19:03:54 | | Cornelius (Cornelius) joins |
| 19:05:55 | <katia> | But yes I’ve done printables in the past |
| 19:06:12 | <katia> | Via archivebot |
| 19:06:33 | <katia> | I got all models and PDFs for everything at the time |
| 19:06:58 | <katia> | Should do another run at some point |
| 19:14:41 | | jspiros quits [] |
| 19:28:20 | | andrewnyr quits [Quit: Ping timeout (120 seconds)] |
| 19:28:46 | | andrewnyr joins |
| 19:38:08 | | gosc_1 quits [Quit: Leaving] |
| 19:40:39 | | Cuphead2527480 (Cuphead2527480) joins |
| 19:45:33 | | SootBector quits [Remote host closed the connection] |
| 19:46:40 | | SootBector (SootBector) joins |
| 20:00:21 | | cyanbox joins |
| 20:22:19 | | kdy quits [Remote host closed the connection] |
| 20:30:47 | | kdy (kdy) joins |
| 20:32:45 | | DogsRNice joins |
| 20:38:08 | | that_lurker quits [Remote host closed the connection] |
| 20:40:03 | | jspiros (jspiros) joins |
| 20:43:35 | | that_lurker (that_lurker) joins |
| 21:08:45 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
| 21:11:36 | | MrMcNuggets (MrMcNuggets) joins |
| 21:25:26 | | HP_Archivist quits [Quit: Leaving] |
| 21:29:38 | | cmlow joins |
| 21:30:35 | | TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night] |
| 22:00:24 | | Cuphead2527480 quits [Client Quit] |
| 22:04:20 | | HP_Archivist (HP_Archivist) joins |
| 22:10:27 | <Guest> | klea: the api doesn't have a ratelimit. if you have enough concurrent downloads you can download the entire thing in a few hours (i believe between 30-50GB uncompressed). |
| 22:18:03 | <Guest> | was there anything happening to HN? |
| 22:35:04 | | etnguyen03 (etnguyen03) joins |
| 22:37:44 | | Island joins |
| 23:04:13 | <hexagonwin> | ah crap, even my browsertrix crashed. if nobody's interested guess i should try developing something.. |
| 23:06:18 | <@JAA> | Browsertrix writes bad WARCs anyway doesn't it? |
| 23:08:02 | <hexagonwin> | idk, but it's still much better than doing nothing |
| 23:09:10 | <hexagonwin> | my prev message here 23h ago jic you missed it https://termbin.com/o7mq |
| 23:09:37 | <@JAA> | Yeah, I saw. Haven't had time to look into it myself. |
| 23:12:20 | <hexagonwin> | archivebot still at 327GB sadly (vs my now dead crawler 424GB) |
| 23:20:05 | | Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…] |
| 23:22:04 | | nomadgeek (nomadgeek) joins |
| 23:28:49 | | nine quits [Quit: See ya!] |
| 23:29:02 | | nine joins |
| 23:29:02 | | nine is now authenticated as nine |
| 23:29:02 | | nine quits [Changing host] |
| 23:29:02 | | nine (nine) joins |
| 23:38:48 | | superkuh_ joins |
| 23:39:40 | <hexagonwin> | not a good script, but this seems to work well https://termbin.com/gu15 |
| 23:40:24 | <hexagonwin> | is there any way to have wget get multiple URLs in one run with different headers? |
| 23:42:05 | | superkuh quits [Ping timeout: 272 seconds] |
| 23:51:36 | | Guest58 joins |
| 23:52:16 | | HugsNotDrugs` quits [Ping timeout: 256 seconds] |
| 23:52:39 | | HugsNotDrugs joins |
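
The Hacker News API exchange above (klea at 01:06-01:45, Guest at 22:10) can be sketched roughly as follows. This is a minimal sketch, not what the urls project or anyone in the channel actually runs. The `item/$ID.json` URL is the one klea quoted; the helper names `item_url`, `fetch_json` and `fetch_items` and the worker count are made up for illustration. The "no rate limit" claim comes from the repo README as quoted in the log, so a modest worker count is still assumed here.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Base URL taken from the item endpoint quoted in the log.
BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL for a single item (hypothetical helper name)."""
    return f"{BASE}/item/{item_id}.json"


def fetch_json(url, timeout=30):
    """Fetch one URL and decode its JSON body (may decode to None)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_items(ids, fetch=fetch_json, workers=16):
    """Fetch many items concurrently and return {id: decoded item}.

    `fetch` is injectable so the function can be exercised offline;
    `workers=16` is an arbitrary assumption, not a recommended value.
    """
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        results = pool.map(lambda i: fetch(item_url(i)), ids)
        return dict(zip(ids, results))
```

To walk the whole ID space, one would first read the current maximum ID with `fetch_json(f"{BASE}/maxitem.json")` (the endpoint klea mentions as returning 46029159) and then call `fetch_items` over ranges of IDs, writing each batch to disk before moving on.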