00:19:19 | | DogsRNice_ joins |
00:22:43 | | DogsRNice quits [Ping timeout: 252 seconds] |
00:42:01 | | linuxgemini quits [Quit: getting (hopefully fresh) air o/] |
00:42:15 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
00:45:18 | | linuxgemini (linuxgemini) joins |
01:01:16 | | Webuser984809 joins |
01:01:48 | | Webuser984809 quits [Client Quit] |
01:04:25 | | PredatorIWD29 joins |
01:06:21 | | PredatorIWD2 quits [Ping timeout: 252 seconds] |
01:06:21 | | PredatorIWD29 is now known as PredatorIWD2 |
01:58:30 | | stormcynk joins |
01:59:55 | <stormcynk> | Hi, does anyone happen to know where I could find a live torrent of archiveteam-twitter-stream-2015-05 from here? https://archive.org/download/archiveteam-twitter-stream-2015-05. I've tried several I found and they're dead. An alternate mirror is also great! |
02:10:02 | <nicolas17> | stormcynk: did you actually find different torrents or were they all the same one? :P |
02:16:09 | | NeonGlitch quits [Client Quit] |
02:34:42 | <pabs> | https://news.ycombinator.com/item?id=42551900 Tell HN: John Friel my father, internet pioneer and creator of QModem, has died |
03:00:43 | | Webuser481608 joins |
03:01:36 | <Webuser481608> | Hello. |
03:02:53 | <Webuser481608> | I think I read https://wiki.archiveteam.org/index.php/Archive.today in the past. I read https://archive.ph/faq today. It would be cool if whoever runs archive.today had free and open source code so anyone could capture webpages as seen on archive.today. So you could download WARCs and have corresponding archive.today-like pages for replay. |
03:03:32 | <Webuser481608> | *so anyone could capture webpages like as seen on archive.today |
03:03:54 | <nicolas17> | yeah it would be nice, go let them know :P |
03:05:37 | <Webuser481608> | It's about archival fixity - a term I heard of in an ipwb issue thread. ipwb = Interplanetary Wayback and that's open source in GitHub. |
03:10:24 | <Webuser481608> | So archive.today pages usually look the way the page should look, without stuff missing. If you run an Apache HTTP Server HTML file "1370929" doesn't show up as rendered. You have to rename it to "1370929.html" for it to show up as rendered, but lots of mojibaking. If you run an IPFS gateway, HTML file "1370929" will show up as rendered but |
03:10:24 | <Webuser481608> | then you get a bit of mojibaking and a thing is missing. archive.today = it all looks perfect (but no WARC). |
03:11:12 | <Webuser481608> | *Apache HTTP Server, HTML file |
03:11:54 | | stormcynk quits [Client Quit] |
03:12:35 | <Flashfire42> | We have no affiliation with archive.today and no affiliation with archive.org tho we do host our stuff there |
03:12:41 | <Webuser481608> | I suspect that archive.today uses Selenium. Just faking the user agent alone may not work to bypass CF-block and similar anti-grabbing software. Plus, I've seen things which indicate that archive.is uses Chrome/Selenium. |
03:13:11 | <Webuser481608> | @Flashfire42 I know. Just wanted to talk about or post about this. |
03:14:45 | <Webuser481608> | So loading up a browser tab or window for every capture is a bit intensive and heavier than wget or grab-site, so you may have to use a medium or high-powered computer to do that. |
03:15:50 | <nicolas17> | does Save Page Now use a browser too? |
03:16:01 | <Webuser481608> | Yeah, I've seen stuff to indicate that. |
03:16:26 | <Webuser481608> | Such as some error message like "Save Page Now browser crashed while trying to load this URL." |
03:16:39 | <Webuser481608> | (in web.archive.org) |
03:17:35 | <@JAA> | Yes, SPN uses brozzler, I believe. |
03:18:22 | <@JAA> | I suppose 2-3 orders of magnitude qualifies as 'a bit heavier'. |
03:20:19 | <Webuser481608> | I was thinking of this recently due to https://derpibooru.org/forums/meta/topics/policy-update-regarding-ai-content which says "[ai generated images WILL BE DELETED after 2025-01-06]". There's 33,958 image/video uploads tagged as "ai content" in that website: https://derpibooru.org/search?q=ai+content |
03:20:42 | <nicolas17> | (and nothing of value will be lost >.>) |
03:20:45 | <Webuser481608> | So I was seeing if a couple captures of that site would replay OK. It was not so great. |
03:21:52 | <Webuser481608> | I didn't know about that JAA https://github.com/internetarchive/brozzler |
03:24:59 | <Webuser481608> | nicolas17 there's a couple of boorus related to that website. There's Twibooru and Ponerpics which apparently have all Derpibooru images/videos mirrored automatically starting in ~2019. However, those aren't WARCs, and worse, the binary file data is messed with so images and videos get "losslessly" "optimized" with tinypng, ffmpeg, and whatever. I |
03:24:59 | <Webuser481608> | wanted to put in some effort before this mass deletion. |
03:25:26 | <Webuser481608> | Huh, dunno why my post was split into two posts again. |
03:26:13 | <TheTechRobo> | (IRC has a protocol line limit of 512 bytes) |
03:26:30 | <Webuser481608> | Ok |
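A minimal sketch of why long posts arrive as two lines: a client has to cut a message into pieces that fit under the protocol's 512-byte line limit. The 400-byte budget below is an assumption that leaves room for the command, the target, and the prefix the server adds; real clients compute this differently, and `split_message` is a hypothetical helper, not any client's actual code.

```python
def split_message(text: str, max_bytes: int = 400) -> list[str]:
    """Split text on spaces so each chunk is at most max_bytes of UTF-8.

    max_bytes=400 is an assumed budget, not a value taken from any IRC client.
    A single word longer than max_bytes is left whole (not handled here).
    """
    chunks = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        # Measure the encoded length, since the limit is in bytes, not characters.
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Splitting on word boundaries also avoids cutting a multi-byte UTF-8 character in half, which a naive byte slice could do.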
03:26:54 | <TheTechRobo> | As someone who's been messing around with brozzler for a while now, I can certainly tell you it's slow. |
03:27:27 | <Webuser481608> | I never used it. Maybe I will in the future. |
03:28:05 | <TheTechRobo> | Definitely optimisable to at least some degree, though. For example, right now there's a hardcoded `time.sleep(5)` call after it visits each hashtag to let things load, which can absolutely be done in a more elegant way. |
03:28:45 | <TheTechRobo> | s/hashtag/anchor/ (Brozzler calls them hashtags) |
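One common alternative to a fixed `time.sleep(5)` is to poll a readiness check and return as soon as it passes, with the old delay kept as an upper bound. This is only a sketch of that general pattern, not brozzler's code: `wait_until` and the `condition` callable (for example, a check that the page has gone idle) are hypothetical.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With this shape the worst case still costs about five seconds, but pages that settle quickly stop waiting early.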
03:29:06 | <@JAA> | (There are some sudden feelings of anger building inside me.) |
03:29:25 | <Webuser481608> | Reminds me of a bug fix I sent to the GNU Wget mailing list: https://lists.gnu.org/archive/html/bug-wget/2024-12/msg00006.html |
03:30:38 | <TheTechRobo> | JAA: Even the original developer knew it was wrong. :-) https://github.com/internetarchive/brozzler/blob/eb922f515/brozzler/browser.py#L652 |
03:31:11 | <@JAA> | TheTechRobo: My anger was directed at 'hashtags' though. |
03:31:26 | <TheTechRobo> | Ah, I see. lol |
03:31:44 | <TheTechRobo> | Are they called anchors? 'fragment' is coming to mind as well. |
03:32:16 | <@JAA> | Depends on where you look. Fragments in URLs, anchors in HTML. |
03:32:46 | <TheTechRobo> | Eh, hashtags is close enough :P |
03:33:07 | | @JAA slaps TheTechRobo around a bit with a large trout |
03:34:06 | <TheTechRobo> | There's fewer load-bearing sleeps than I expected, though. Props to the IA team. |
03:35:17 | <Webuser481608> | Talk about fragments and anchors makes me think of Shadow DOM / Shadow root. Those are annoying! |
03:36:32 | <TheTechRobo> | I had no idea what those are, so I looked on MDN and now I *really* have no idea what those are. |
03:39:15 | <nicolas17> | JAA: I heard of music teachers getting really annoyed with young students calling notes "D hashtag" |
03:39:37 | <@JAA> | Eww, yeah |
03:40:15 | <Webuser481608> | In the past, IA's search, collection, and user pages didn't use shadow DOMs, but now they all do as a result of an update to that website. Such as https://archive.org/details/@someuser has ten uploads for example. If you download that link with wget it will have zero details about any of those uploads. It's like you gotta write a custom Selenium |
03:40:15 | <Webuser481608> | program to download it every time. Or use some fancy/exotic Browser-based software to download it. |
03:40:49 | <Webuser481608> | *custom Selenium script |
03:40:50 | <TheTechRobo> | You'd generally use IA's cli tool for that, though. |
03:44:47 | <Webuser481608> | About the derpi*u AI/ML deletion situation. Concerns of mine: comments, metadata, open WARCs, no-derive full images, newer uploads. That site does use some rate limiting, not sure how bad though. 2264 pages with no filter https://derpibooru.org/search?page=2264&q=ai+content - if I do do this, download the newest first. Also look at or use API/JSON |
03:44:47 | <Webuser481608> | to help me download that. |
03:47:02 | <Webuser481608> | Oh, there's a nightly database dump of that site - https://derpibooru.org/pages/data_dumps - nice that they do that, but it isn't warc or replay or raws. |
03:47:39 | <Webuser481608> | About 4.8GB per dump and "Note: these dumps do not include images." |
03:50:10 | <Webuser481608> | Not sure what's going on with that message from "*" above "@JAA slaps TheTechRobo around a bit with a large trout" |
03:53:29 | | TheTechRobo slaps Webuser481608 around a bit with a large trout |
03:53:43 | <TheTechRobo> | /me |
03:55:11 | <TheTechRobo> | Or /slap if you're using an IRC client that supports it and don't want to type it all out. |
04:02:21 | | Wohlstand (Wohlstand) joins |
04:06:23 | <Webuser481608> | Speaking of APIs, I think it's neat that conceptnet.io has a "View this term in the API" ( https://conceptnet.io/c/en/mare ). Imagine if anytime you went to whateversite.com/search?q= that search page had a link which said "view this search as JSON" or "view this search in the API". |
04:13:43 | | Wohlstand quits [Read error: Connection reset by peer] |
04:13:43 | | Wohlstand1 (Wohlstand) joins |
04:16:06 | | Wohlstand1 is now known as Wohlstand |
04:17:04 | | Wohlstand quits [Client Quit] |
04:17:07 | | Wohlstand1 (Wohlstand) joins |
04:17:32 | <Webuser481608> | How many gigabytes are needed to store 34,000 images/videos? For https://derpibooru.org/tags?tq=id:661924 - I think around 40GB to 80GB. For ~25,000 videos, one to four megabytes each (different site), that ended up being like 50 to 60GB. |
04:19:29 | | Wohlstand1 is now known as Wohlstand |
04:23:59 | | Wohlstand quits [Ping timeout: 252 seconds] |
04:28:07 | | Naruyoko5 quits [Remote host closed the connection] |
04:28:54 | | Naruyoko5 joins |
04:32:43 | <@OrIdow6> | Webuser481608: Am I correct in my reading that all the removed posts are being copied to https://tantabus.ai/ ? |
04:33:09 | | Naruyoko5 quits [Ping timeout: 252 seconds] |
04:33:51 | | Naruyoko joins |
04:39:10 | <Webuser481608> | Seems to be broken: https://derpibooru.org/api/v1/json/search/images?q=ai+content = ~1000 total = not true. https://derpibooru.org/api/v1/json/search/images?q=safe = ~2 million = true. |
04:40:43 | <Webuser481608> | @OrIdow6 Text in the OP of the thread "‘ai content’ images (which includes ai generated) have already been copied to tantabus.ai". Also https://tantabus.ai/images/35312 ~= the 34,000 number that I wrote about. I am guessing that comments and stuff won't be copied over. |
04:48:10 | <@OrIdow6> | Looks like all relevant pictures are "filtered" and filter options are set by POST |
04:53:01 | <Webuser481608> | Thanks for saying that, guess I would have given up if you didn't. /api/v1/json/search/images is said to be GET at https://derpibooru.org/pages/api |
04:55:43 | <Webuser481608> | So the corresponding one is https://derpibooru.org/api/v1/json/search/images?q=ai+content&filter_id=56027 - Everything filter (no filter) and "total":33958. Then just mess with &page= and/or &per_page= |
05:00:55 | <Webuser481608> | ( Unlike another Twibooru, that doesn't work outside of an API - https://derpibooru.org/search?q=explicit&filter_id=56027 and https://twibooru.org/search?q=explicit&filter_id=2 ) |
05:11:54 | <Webuser481608> | Neat, some metadata copied over. No comments copied - compare https://tantabus.ai/images/15549?q=sha512_hash:968144dc1a74d3293c4dbdc25cbe037d5d5e8b7148612ace416328ea134771cb93d05e734a5b2b5f7bf98e67d33474cb8a064de8c29136a8fd57d22537f6d840 and https://derpibooru.org/images/3310654 |
05:15:49 | <@OrIdow6> | Webuser481608: Going away for several hours but will come back to this later |
05:16:18 | <Webuser481608> | OK, glad there's some interest in this. |
05:19:24 | <Webuser481608> | Max 50 per page. First page is 1 and not 0: https://derpibooru.org/api/v1/json/search/images?q=ai+content&filter_id=56027&per_page=999&page=1 = search JSON. https://derpibooru.org/api/v1/json/images/3310654 = full image JSON. |
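The paging rules described above (pages start at 1, at most 50 results per page, `filter_id=56027` for the "Everything" filter) can be sketched as a URL generator. The endpoint, query, filter ID, and total of 33958 come from the messages above; `page_urls` is a hypothetical helper, and the total will drift as uploads change.

```python
import math
from urllib.parse import urlencode

API_URL = "https://derpibooru.org/api/v1/json/search/images"


def page_urls(query: str, filter_id: int, total: int, per_page: int = 50) -> list[str]:
    """Build one search URL per result page, numbering pages from 1."""
    pages = math.ceil(total / per_page)
    urls = []
    for page in range(1, pages + 1):
        params = {
            "q": query,
            "filter_id": filter_id,
            "per_page": per_page,
            "page": page,
        }
        # urlencode turns the space in "ai content" into "+".
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


urls = page_urls("ai content", 56027, 33958)
```

At 50 results per page, 33958 results need 680 requests, which is far fewer than walking the 2264 HTML search pages.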
05:22:38 | | Webuser481608 quits [Client Quit] |
05:39:03 | | benjins2 quits [Read error: Connection reset by peer] |
05:45:53 | | Mateon1 quits [Ping timeout: 260 seconds] |
06:01:28 | | Mateon1 joins |
06:04:54 | | Webuser015671 joins |
06:07:37 | <Webuser015671> | I think this channel is publicly logged. Where's the logs? https://archive.fart.website/ ? EMPTY: https://archive.fart.website/bin/irclogger_logs/archiveteam-bs |
06:09:59 | <pabs> | see https://wiki.archiveteam.org/index.php/Archiveteam:IRC#IRC_Logs |
06:10:09 | | DogsRNice_ quits [Read error: Connection reset by peer] |
06:10:40 | <Webuser015671> | thanks |
06:22:05 | <@JAA> | irclogs.archivete.am is not officially publicly operational. I don't know why it's listed there. |
06:22:45 | <@JAA> | (The logging part is operational, the web part is not.) |
06:24:33 | <Webuser015671> | So you're running a not-so-good web server for that? I don't get it. |
06:33:07 | <Webuser015671> | that wiki article: "irclogs.archivete.am is currently the only IRC logger we have running. It is half-broken and therefore was not previously listed on this page, but now it's the only one we have, and broken is preferable to nothing." |
06:34:13 | <nulldata> | !ig 4v1dllnuwa0np0q828shkpjpo ^https?://www\.gamingonlinux\.com/.*/page=\d/www\..*\.com/ |
06:44:41 | <that_lurker> | nulldata: Ignore noted but not applied :-P |
06:49:43 | <Webuser015671> | I was asking for logs because my computer crashed; details are in this nice text file titled "Fix for lost/zeroed boot partition" -- https://ipfs.ssi.eecc.de/ipfs/bafkreia7fwz3kbys6b3fm55ucqwzpp62245jxmklhkz5d4cbe7plgscyhq -- which may also show up at https://ar.secret-network.xyz/sAySI4jL5LsZGij08O0fIx3TMlJzc3UCWq5TDZKLgaw after a while. Crash |
06:49:43 | <Webuser015671> | resulted in my boot partition getting rekt for the 3rd time - fixed in record time by the steps in that text. |
07:00:29 | <Webuser015671> | That message of mine is kinda "Who cares?" Here's a message related to this channel recently that may be more generally interesting. So I made some posts about My Little Pony web data that's going to soon be deleted. I probably or certainly like MLP:FIM more than "Filly Funtasia". I watched the first two episodes of Filly Funtasia recently: 720p YT |
07:00:29 | <Webuser015671> | downloads. Bella is sorta cute/pretty and Rose is OK or nice. Bella = best mare I guess. The equines in that show look odd: large legs. Everypony is called "filly" even if they are male: also odd. Filly Funtasia is related to or inspired by MLP. Negative: feels a bit like brainrot dumb cartoon while watching Filly Funtasia. Positive: it's kinda |
07:00:29 | <Webuser015671> | interesting. Ad-free knowledge base entry for that show which came from Hong Kong but is dubbed in English or is originally English: https://www.wikidata.org/wiki/Q28224547 . Thought I had: a 4chan /mlp/ user would say '"Filly Funtasia"? More like "Filly CuntAsia".' I'd say this message is sorta archive-related because it's details about media and |
07:00:29 | <Webuser015671> | where you can watch/download it. |
07:05:52 | | Unholy23619246453771312 (Unholy2361) joins |
07:06:42 | | Webuser015671 quits [Client Quit] |
08:23:54 | <@OrIdow6> | Oh they're gone |
08:24:45 | <@OrIdow6> | JAA: I'm the one who added the new logs to the wiki, for the reason Webuser quoted |
08:25:52 | <Fijxu|m> | huh, do people actually use IPFS? |
08:26:01 | <Fijxu|m> | It uses a ton of RAM for me |
08:26:02 | <@OrIdow6> | Given that all the other logs are dead, is there any issue with it being listed there, overloading, privacy, etc.? I don't recall hearing the exact way in which it was broken but it seems like you've discouraged it rather than kept it a secret |
08:26:33 | <Fijxu|m> | I would love to use it more but there is no content there that I'm interested in |
08:27:30 | <@OrIdow6> | Fijxu|m: I think I ran a node for a few days but it was cumbersome for some reason (perhaps for the reasons you mention) |
08:28:09 | <@OrIdow6> | !remindme 2 days my little pony |
08:28:09 | <eggdrop> | [remind] error: "2" (parsed as 1735610400 → 2024-12-31T02:00:00Z) is in the past |
08:28:14 | <@OrIdow6> | !remindme 2d my little pony |
08:28:15 | <eggdrop> | [remind] ok, i'll remind you at 2025-01-02T08:28:14Z |
08:33:38 | <Fijxu|m> | Yeah OrIdow6, it really sucks |
08:33:46 | | katia_dect5284 is now known as katia- |
08:33:49 | <Fijxu|m> | I had some like 20GB of saved files and kubo was using like 8GB of RAM |
08:33:53 | <Fijxu|m> | It's just not usable |
08:33:57 | <Fijxu|m> | I like the concept but I think no one uses it because of that |
08:35:13 | <nimaje> | I wanted to look into ipfs a bit more to see if it is possible to implement it in a not so resource hungry way (seems like it is used a bit in the cryptocurrency space, they likely care less about stuff being resource hungry than I) |
08:52:11 | <that_lurker> | is ipfs still used outside of phishing |
08:55:36 | <@OrIdow6> | that_lurker: Hahaha |
09:04:43 | | Island quits [Read error: Connection reset by peer] |
09:23:55 | | emphatic quits [Ping timeout: 252 seconds] |
09:32:50 | | loug8318142 joins |
09:36:18 | | graham9 quits [Quit: The Lounge - https://thelounge.chat] |
09:58:21 | | loug8318142 quits [Client Quit] |
10:01:06 | | loug8318142 joins |
10:01:49 | | loug8318142 quits [Client Quit] |
10:03:05 | | emphatic joins |
10:07:11 | | loug8318142 joins |
10:12:50 | | bladem quits [Quit: Leaving] |
10:14:43 | | loug8318142 quits [Client Quit] |
10:16:21 | | wyatt8750 quits [Ping timeout: 252 seconds] |
10:16:45 | | wyatt8740 joins |
10:23:45 | | loug8318142 joins |
10:56:12 | <h2ibot> | Manu edited Discourse/archived (+80, Queued sendegate.de): https://wiki.archiveteam.org/?diff=54128&oldid=54026 |
11:04:14 | <h2ibot> | Manu edited Discourse/archived (+97, queued discourse.bits-und-baeume.org): https://wiki.archiveteam.org/?diff=54129&oldid=54128 |
11:07:15 | <h2ibot> | Manu edited LiveJournal (+66, /* ArchiveBot: Add information from the…): https://wiki.archiveteam.org/?diff=54130&oldid=53415 |
11:11:36 | <c3manu> | !a https://django.wiki.bits-und-baeume.org/ -u firefox |
11:11:43 | <c3manu> | :O |
11:11:54 | | bladem (bladem) joins |
11:22:00 | | MrMcNuggets (MrMcNuggets) joins |
11:40:39 | | PredatorIWD2 quits [Read error: Connection reset by peer] |
12:00:01 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:50 | | Bleo182600722719623 joins |
12:41:59 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:45:48 | | SkilledAlpaca418962 joins |
13:14:29 | | graham9 joins |
13:23:57 | <kiska> | I guess I'll try and see if I can import my own logs to whatever log thing |
13:32:46 | <h2ibot> | Bzc6p edited Blogger.hu (+88, dicovery takes a bit more time, also there will…): https://wiki.archiveteam.org/?diff=54131&oldid=54095 |
13:34:46 | <h2ibot> | Bzc6p edited Cafeblog.hu (+84, proclaimed started): https://wiki.archiveteam.org/?diff=54132&oldid=54100 |
13:37:47 | <h2ibot> | Bzc6p edited Volán (+106, /* Centralization, round 3 */ websites saved): https://wiki.archiveteam.org/?diff=54133&oldid=54097 |
13:38:47 | <h2ibot> | Bzc6p edited Volán (-255, /* Related websites */ remove duplicate entry): https://wiki.archiveteam.org/?diff=54134&oldid=54133 |
14:19:53 | | Webuser617418 joins |
14:20:04 | | Webuser617418 quits [Client Quit] |
14:26:09 | | Wohlstand (Wohlstand) joins |
14:38:53 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
15:04:41 | | notarobot joins |
15:06:15 | | driib9 quits [Quit: Ping timeout (120 seconds)] |
15:06:32 | | driib9 (driib) joins |
15:45:16 | <Barto> | alright, i finally donated to archive.org for fireonlive. |
15:45:59 | <Barto> | !tell icedice i checked again, fireonlive was 31 |
15:46:01 | <eggdrop> | [tell] ok, I'll tell icedice when they join next |
15:48:35 | <Barto> | fireonlive++ |
15:48:35 | <eggdrop> | [karma] 'fireonlive' now has 841 karma! |
15:55:47 | | NeonGlitch (NeonGlitch) joins |
15:58:07 | | eroc1990 (eroc1990) joins |
16:07:38 | | NeonGlitch quits [Client Quit] |
17:25:40 | | yasomi quits [Quit: ZNC 1.9.1 - https://znc.in] |
17:31:05 | | yasomi (yasomi) joins |
17:32:44 | <@JAA> | OrIdow6: Basically, yeah. It goes down sometimes for various reasons. As long as people don't complain about that, fine. |
17:41:48 | <steering> | i dont think it has ever worked for me :D |
17:51:11 | | NeonGlitch (NeonGlitch) joins |
17:53:54 | <@JAA> | It's usually fine, but I don't actively monitor it nor is it currently a priority to fix when it breaks. There are a few things I need to do to the backend, then that will change. But next year (heh). |
17:56:53 | <steering> | Soon(TM) |
18:01:00 | | DogsRNice joins |
18:37:03 | <that_lurker> | JAA: I can also set up logging stuff if needed. |
18:42:00 | | lennier2 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
18:51:29 | | lennier2 joins |
18:57:03 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
19:07:41 | | Webuser156468 joins |
19:08:32 | <Webuser156468> | This https://irclogs.archivete.am/archiveteam-bs/2024-12-31 is down now - I don't know what happened since I last posted here about a 3DCG cartoon. |
19:09:00 | <Webuser156468> | More importantly, I have all the full image IDs of all images tagged as "ai content" in derpibooru.org |
19:11:56 | <Webuser156468> | At the following link with a meaningful filename, 5 days left: https://transfer.archivete.am/dAJBR/2024-12-31_derpi_33961_ai.txt |
19:11:59 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/dAJBR/2024-12-31_derpi_33961_ai.txt |
19:22:11 | <@arkiver> | happy new year everyone :) |
19:22:39 | <@arkiver> | we'll see what 2025 brings us - it may be a busy year |
19:28:33 | <Webuser156468> | Thanks. Hope you have a good time. I also hope that 2025 will be nicer: probably not... |
19:30:07 | <Webuser156468> | I'm downloading WARCs+raws of those 34,000 webpages: `sort -r 2024-12-31_derpi_33961_ai.txt | tail -n+1 | xargs -d "\n" sh -c 'for args do curl -skL "https://10.0.0.200/cgi-bin/arcmni?url=https://derpibooru.org/images/$args" 1>/dev/null; sleep 0.2; done' _` |
19:33:59 | <Webuser156468> | .sh file "arcmni" looks something like this: https://github.com/ProximaNova/ipfs-kubo-rpc-api-for-cgi/tree/main/usr/lib/cgi-bin/ - it uses wget and downloads WARCs+raws. Raws go in ./memento/20241231193108/. I'm using ZFS but without dedup on. I didn't want to download CSS files and whatever a million times so I'm using shell script "arcnoi" - |
19:33:59 | <Webuser156468> | about the same but without wget option --directory-prefix=$basepath/memento/$time/ |
19:37:08 | <Webuser156468> | Sort by file size -- "ls -S ./warc/derpibooru.org/images/" (or "find . [...]") -- may help find any possible rate limiting webpage downloads. |
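The size-sort idea can be sketched as a small check that flags downloads much smaller than the typical file, since a rate-limit or error page is usually tiny compared with a real image page. The `ratio=0.5` cutoff against the median is an assumption for illustration, and both function names are hypothetical.

```python
import statistics
from pathlib import Path


def collect_sizes(directory: str) -> dict[str, int]:
    """Map each file path under directory to its size in bytes."""
    return {
        str(path): path.stat().st_size
        for path in Path(directory).rglob("*")
        if path.is_file()
    }


def smallest_outliers(sizes: dict[str, int], ratio: float = 0.5) -> list[str]:
    """Return names whose size is below ratio * median, smallest first.

    ratio=0.5 is an assumed threshold; tune it against known-good captures.
    """
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    flagged = [name for name, size in sizes.items() if size < ratio * median]
    return sorted(flagged, key=lambda name: sizes[name])
```

Calling `smallest_outliers(collect_sizes("./warc/derpibooru.org/images/"))` would list the candidates to re-download; whether they really are rate-limit responses still needs a manual look.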
19:42:15 | | NeonGlitch quits [Client Quit] |
19:45:47 | | wickedplayer494 quits [Ping timeout: 252 seconds] |
19:48:55 | <Webuser156468> | I'm using a ZFS mirror pool across two different HDDs. The hard drives aren't the crappy model(s) that I used in the past. With ZFS, how much fragmentation is too much? "zpool list" says FRAG:56%. The 16.4-TB pool is 92% full. Definitely don't fill a ZFS pool up above 94 or 95% full. While waiting on stats: more thoughts on "Filly Funtasia" (posted |
19:48:55 | <Webuser156468> | about earlier). Episodes 3 and 4 are maybe better or more gripping than episodes 1 and 2. Episode 1=about Isabella and lying, 2=about Rose and helping one another(?), 3=about an antagonist and subterfuge, 4=about Lynn and getting along with one another. |
19:49:59 | <Webuser156468> | Stats are in: 200 webpages downloaded from 2024-12-31T19:37:44.634666667Z to 2024-12-31T19:48:41.176222711Z |
19:54:24 | <Webuser156468> | arcnoi downloads one warc per url. Convert to Unix time: date -d "2024-12-31T19:48:41Z" +%s -> ... -> 200 pages downloaded in 657 seconds = 0.3044 webpages/second with 0.2-second sleep between requests -> will I meet the deadline? |
19:56:41 | <Webuser156468> | Will take 111,562 seconds at this rate, or about 31 hours, which is less than 5 days, so yes I can hopefully download it all in time. |
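The estimate above can be reproduced with a few lines, using the timestamps truncated to whole seconds and the 33961 IDs from the list file. This is a sketch that assumes the download rate stays constant; the function names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two UTC timestamps given in FMT."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


def estimate_total_seconds(done: int, elapsed: float, total: int) -> float:
    """Project total run time, assuming the observed rate holds."""
    return total * elapsed / done


elapsed = elapsed_seconds("2024-12-31T19:37:44Z", "2024-12-31T19:48:41Z")
rate = 200 / elapsed                       # pages per second
total_seconds = estimate_total_seconds(200, elapsed, 33961)
total_hours = total_seconds / 3600
```

Here `elapsed` is 657 seconds, `rate` is about 0.3044 pages per second, and `total_seconds` is about 111,562, or roughly 31 hours, well inside the 5-day window.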
19:56:51 | | Island joins |
19:59:27 | | Webuser156468 quits [Client Quit] |
20:00:01 | | wickedplayer494 joins |
20:00:18 | | wickedplayer494 is now authenticated as wickedplayer494 |
20:00:48 | | BlueMaxima joins |
20:18:57 | | PredatorIWD2 joins |
20:49:57 | | Wohlstand quits [Ping timeout: 252 seconds] |
20:50:43 | | linuxgemini quits [Quit: getting (hopefully fresh) air o/] |
20:53:35 | | linuxgemini (linuxgemini) joins |
21:45:17 | | BlueMaxima quits [Client Quit] |
21:47:18 | | Wohlstand (Wohlstand) joins |
21:48:58 | | HP_Archivist quits [Ping timeout: 260 seconds] |
22:01:56 | <@OrIdow6> | that_lurker: If you do set up your own logging stuff, it might be useful to combine it with WBM versions of kis ka's logs |
22:02:11 | <@OrIdow6> | Tho I guess we can just pool all our IRC client logs or something |
22:20:08 | | BornOn420 quits [Remote host closed the connection] |
22:20:45 | | BornOn420 (BornOn420) joins |
22:36:28 | <h2ibot> | Bear edited List of websites excluded from the Wayback Machine/Partial exclusions (-16, Time ranges were moved.): https://wiki.archiveteam.org/?diff=54136&oldid=54024 |
22:51:30 | <h2ibot> | Bear edited YouTube/Technical details (+232, /* Format codes */ clarifications): https://wiki.archiveteam.org/?diff=54138&oldid=53712 |
22:58:58 | <szczot3k> | happy new year |
23:02:49 | | NeonGlitch (NeonGlitch) joins |
23:15:34 | | BearFortress quits [] |
23:23:38 | <@OrIdow6> | 30 minutes til midnight UTC |
23:25:05 | <Barto> | happy new year gang |
23:25:06 | | NeonGlitch quits [Client Quit] |
23:55:21 | | BearFortress joins |