00:07:05Webuser769984 joins
00:07:50Webuser769984 quits [Client Quit]
00:13:05BearFortress quits [Read error: Connection reset by peer]
00:13:10atphoenix quits [Read error: Connection reset by peer]
00:13:14BearFortress joins
00:13:42atphoenix (atphoenix) joins
00:16:52tzt (tzt) joins
00:26:40etnguyen03 quits [Client Quit]
01:01:16rewby (rewby) joins
01:01:16@ChanServ sets mode: +o rewby
01:03:33etnguyen03 (etnguyen03) joins
01:27:15tertu quits [Quit: so long...]
01:47:40tertu (tertu) joins
02:05:46HP_Archivist (HP_Archivist) joins
03:11:57Wohlstand quits [Quit: Wohlstand]
03:26:27<eggdrop>[remind] OrIdow6: my little pony
03:27:09HP_Archivist quits [Read error: Connection reset by peer]
03:27:29HP_Archivist (HP_Archivist) joins
03:27:41<that_lurker>* . °•★|•°∵ friendship is magic ∵°•|☆•° . *
03:57:43Arachnophine (Arachnophine) joins
04:14:13Arachnophine quits [Remote host closed the connection]
04:14:31Arachnophine (Arachnophine) joins
04:26:25sonick (sonick) joins
04:27:54sonick quits [Client Quit]
04:29:45sonick_ (sonick) joins
04:31:01etnguyen03 quits [Remote host closed the connection]
04:31:58sonick_ is now known as sonick
04:42:52Webuser059697 joins
04:43:59<Webuser059697>Derpibooru.org will start deleting all AI content starting tomorrow. Logs https://irclogs.archivete.am/archiveteam-bs/2024-12-31 (IPFS is great, whatutalking about), https://irclogs.archivete.am/archiveteam-bs/2025-01-05
04:44:28<Webuser059697>What I've done in regards to https://transfer.archivete.am/dAJBR/2024-12-31_derpi_33961_ai.txt =
04:44:29<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/dAJBR/2024-12-31_derpi_33961_ai.txt
04:45:22<Webuser059697>Got WARC+raws of all those IDs in these forms: webpage for full image, webpage for tag history for full image, JSON for API metadata for full image
04:45:58<Webuser059697>What I didn't do: get WARCs+raws for all of those full image files (about 34,000 of them).
04:46:23<Webuser059697>Interested user: @OrIdow6
04:47:13<Webuser059697>Regret or something: since I did one-URL-one-WARC-file grabs, the CSS files and whatever other duplicates were saved a million times to multiple .warc.gz files.
04:47:54<Webuser059697>However they weren't saved a million times to /memento/$time as a posted about days ago.
04:48:38<Webuser059697>*./memento/$time [duplicate raws weren't saved a million times, that is]
04:50:59<Webuser059697>Actually, for tag history webpages I'm on line 27,914 out of 33,961 of that URL/ID list text file. Question: webpages for full images sometimes have comments. Those comments sometimes span multiple pages. Do my downloads include all comment pages for each full image page?
04:52:24<Webuser059697>*as I posted about days ago.
04:59:40<Webuser059697>Convert ID list to URL list text file: /^/https:\/\/derpibooru.org\/images\// = full image webpages, /^(\d+)$/https:\/\/derpibooru.org\/images\/\1\/tag_changes/ = tag history webpages, /^/https:\/\/derpibooru.org\/api\/v1\/json\/images\// = full image metadata, [parse JSON] = full image file URLs.
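A runnable form of those substitutions, as a minimal sed sketch: it assumes ids.txt holds one numeric image ID per line, and the output filenames are placeholders, not from the log.

    # full image webpages
    sed 's|^|https://derpibooru.org/images/|' ids.txt > urls_images.txt
    # tag history webpages
    sed -E 's|^([0-9]+)$|https://derpibooru.org/images/\1/tag_changes|' ids.txt > urls_tag_changes.txt
    # full image API metadata (JSON)
    sed 's|^|https://derpibooru.org/api/v1/json/images/|' ids.txt > urls_api.txt
    # full image file URLs come later, by parsing the JSON responses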
05:00:18<h2ibot>JustAnotherArchivist edited University Web Hosting (+468, Add University of Massachusetts Amherst…): https://wiki.archiveteam.org/?diff=54161&oldid=54007
05:04:00<Webuser059697>On the question of comments across multiple pages -> https://derpibooru.org/images/3024337?q=ai+content%2Ccomment_count.gt%3A23 -> ./derpibooru.org/images/3024337.html = ...
05:13:35DogsRNice quits [Read error: Connection reset by peer]
05:13:38<Webuser059697>"49 comments posted" -- https://10.0.0.200/.../cunt/warc/derpibooru.org/images/3024337.html -- page 1 comment="Amazing", page 2 comment="[...]I don’t care for AI.[...]", page 2 link= https://derpibooru.org/images/3024337/comments?page=2
05:15:50<Webuser059697>Page 2 comment isn't in "3024337.html". wget didn't download derpibooru.org/images/3024337/comments?page=2 , comments per page = ...
05:16:33<@OrIdow6>Webuser059697: https://transfer.archivete.am/inline/dAJBR/2024-12-31_derpi_33961_ai.txt is the list of affected IDs?
05:17:06<Webuser059697>It's all full images tagged as ai content as of 2024-12-31, so basically yes
05:20:57<Webuser059697>comments per page = 25, per ctrl+f 'posted <time' in webpage source code. filtered search for that: https://derpibooru.org/search?q=ai+content,comment_count.gt:25 , unfiltered JSON search for that: ...
05:22:44<Webuser059697>https://derpibooru.org/api/v1/json/search/images?q=ai+content,comment_count.gt:25&filter_id=56027&per_page=999&page=1 = 2 or more pages = 41 total.
05:23:24<Webuser059697>https://derpibooru.org/api/v1/json/search/images?q=ai+content,comment_count.gt:50&filter_id=56027&per_page=999&page=1 = 3 or more pages = 5 total
05:24:02<Webuser059697>https://derpibooru.org/api/v1/json/search/images?q=ai+content,comment_count.gt:75&filter_id=56027&per_page=999&page=1 = 4 or more pages = 2 total
05:24:28<Webuser059697>https://derpibooru.org/api/v1/json/search/images?q=ai+content,comment_count.gt:100&filter_id=56027&per_page=999&page=1 = 5 or more pages of comments = 1 total
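A hedged sketch of how the missing comment pages could be fetched for one ID, assuming 25 comments per page as observed above and assuming the API JSON carries a comment_count field for the image; the /comments?page=N URL shape is the one seen earlier in the log.

    id=3024337
    # comment_count field is an assumption about the API layout
    total=$(curl -s "https://derpibooru.org/api/v1/json/images/$id" | jq -r '.image.comment_count')
    pages=$(( (total + 24) / 25 ))        # ceil(total / 25)
    for p in $(seq 2 "$pages"); do        # page 1 is already in the main image webpage
        wget "https://derpibooru.org/images/$id/comments?page=$p"
    done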
05:25:54<@OrIdow6>Webuser059697: Do you have lists of full/preview images?
05:26:34<Webuser059697>no, I didn't parse any of the 34000 jsons that I downloaded.
05:28:12<@OrIdow6>Webuser059697: So your WARCs don't contain the images themselves, only metadata?
05:28:37<@OrIdow6>Oh, I see, I think you say that
05:28:51<@OrIdow6>Webuser059697: Could you send over those JSONs?
05:34:05<Webuser059697>I think I only have warc+raw of webpages and jsons, no previews or full images. Image files are stored at derpicdn.net and shown in the webpage via an HTML <picture> tag that JS or something fills in, so wget downloads look like "<picture></picture>" and rendered versions look like "<picture>[preview/full link here]</picture>"
05:36:11<@OrIdow6>Webuser059697: It's difficult to parse the page on the fly, I'm asking if you could send those JSONs over so I can parse them externally and then archive the resulting list of URLs
05:36:20<Webuser059697>So there's only links to derpicdn.net - nothing that wget with page requisites but no recursion would download.
05:36:57<Webuser059697>I can send the JSONs or something... don't have links to those full/preview image file URLs yet, but I'm running...
05:44:37<Webuser059697>$ utc; cat /zc/put/cunt/warc/memento/*/derpibooru.org/api/v1/json/images/* | zstd > /zc/put/2024-12-31_derpi_33961_ai_jsons.zst; utc # zstdcat
05:45:43<Webuser059697>bash: /usr/bin/cat: Argument list too long
05:45:43<Webuser059697>:(
05:45:51<Webuser059697>Will do some other command
05:48:53<@OrIdow6>Webuser059697: I usually end up using find for that
05:49:07<@OrIdow6>"find" the command
05:51:15BlueMaxima quits [Read error: Connection reset by peer]
05:53:45<Webuser059697>Did that before you posted anything about find, now running this which works: "$ zstd --filelist 2024-12-31_derpi_33961_ai_jsons.zst.2.txt -o 2024-12-31_derpi_33961_ai_jsons.zst # zstdcat"
05:54:20<@OrIdow6>Since they seem internetty enough I've gone and asked on their IRC
05:54:43<Webuser059697>cat says "Argument list too long" for 34000 args. Somewhat curious what's the limit, 10,000?
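The limit is not an argument count but the total byte size of the argument list plus environment (ARG_MAX, commonly about 2 MiB on Linux; getconf ARG_MAX prints it), so 34,000 long paths overflow it. A sketch of the find-based route OrIdow6 mentions above, reusing the paths from the failed command:

    # xargs batches the file list, so ARG_MAX is never exceeded
    find /zc/put/cunt/warc/memento/ -type f \
        -path '*/derpibooru.org/api/v1/json/images/*' -print0 \
        | xargs -0 cat | zstd > /zc/put/2024-12-31_derpi_33961_ai_jsons.zst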
05:56:33<Webuser059697>Derpibooru has an IRC? I don't know much or anything about that. Better than having a "Discord server" I guess. Oh, and all those websites where you see a Cl0udflare captcha block every time anyone accesses a page/file: those are a step away from Dicksword with how inaccessible and non-public they are. They aren't really part of the open Internet.
05:58:13<Webuser059697>33961 files compressed : 48.65% ( 85.6 MiB => 41.6 MiB) 8 KiB ==> 44% # ZST file at...
06:07:02qinplus_mobile joins
06:10:02qinplus_mobile is now known as qinplus_phone
06:20:27<Webuser059697>It's at the following links, pick one because they're basically the same thing:
06:20:28<Webuser059697>AT: https://transfer.archivete.am/11f4Gu/2024-12-31_derpi_33961_ai_jsons.zst
06:20:28<Webuser059697>IPFS, no gateway: http://127.0.0.1/ipfs/bafybeidgcm26hzwab4axkchkd6oxrmdtn2ocx4bxtbgaw5mbkw2doph2wq/1736057184.774618991
06:20:28<Webuser059697>IPFS, clearweb gateway: https://ipfs.hypha.coop/ipfs/bafybeidgcm26hzwab4axkchkd6oxrmdtn2ocx4bxtbgaw5mbkw2doph2wq/1736057184.774618991
06:20:28<Webuser059697>IPFS, privacy gateway: http://ponypalsh4y6olziyjlswfv674utokqhz3y6beym2erqtstcgadmacid.onion/ipfs/bafybeidgcm26hzwab4axkchkd6oxrmdtn2ocx4bxtbgaw5mbkw2doph2wq/1736057184.774618991
06:21:50<Webuser059697>*127.0.0.1:8080 and pasting multi-line message = not one message but multiple messages at https://chat.hackint.org/#/chan-2
06:24:32<@JAA>IRC doesn't support multi-line messages (yet).
06:24:56<@JAA>But just the transfer link is sufficient.
06:29:48<Webuser059697>"just the transfer link is sufficient." I thought so, but content-based addressing is nice too :P
06:34:32<Webuser059697>I think the parts in the JSONs next to '"full"' are the full/original image or video file urls - 'zstdcat 2024-12-31_derpi_33961_ai_jsons.zst | grep -o "\"full\":\"[^\"]*" > 2024-12-31_derpi_33961_ai_jsons.zst.3.txt'
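If the JSONs follow the usual Philomena layout (an assumption; the grep above already does the job), jq pulls the same field a bit more robustly, since it handles JSON escaping and a stream of concatenated objects; the output filename is a placeholder.

    # .image.representations.full is assumed to be where the "full" URL lives
    zstdcat 2024-12-31_derpi_33961_ai_jsons.zst \
        | jq -r '.image.representations.full' > full_urls.txt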
06:37:12<Webuser059697>URL list of 34000 images and videos that are gonna get deleted off of a server tomorrow: https://transfer.archivete.am/qfiZq/2024-12-31_derpi_33961_ai_jsons.zst.4.txt
06:37:13<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/qfiZq/2024-12-31_derpi_33961_ai_jsons.zst.4.txt
06:39:19<Webuser059697>Both "inline (for browser viewing)" and the non-inline text file have a SHA1 hash of eaa8248098b76f0b72a20d7e5e147e8264c96f26, but only the inline one shows up as not download-only. Odd.
06:41:00<Webuser059697>@OrIdow6 requested url list posted^
06:42:30<@JAA>Yeah, transfer forces a download normally, and the inline thing suppresses that.
06:43:52<Webuser059697>Also: the requested JSONs are shared, as one file rather than a bunch of individual files. Not shared = warc+raws of tag history, main webpages. Not downloaded = multipage comments, source history, whatever else.
06:44:17<Webuser059697>Also didn't share warcs of JSONS.
06:44:58<Webuser059697>Not downloaded = full images
06:49:07<Webuser059697>Will keep that in mind about IRC multiline and the transfer subdomain being download-only. That also highlights a significant thing about warc vs. raw: the same bit-identical raw file shows up differently depending on how the server serves it in different places; one notable artifact of this is that the filename sent by the server survives only in the warc, not in the raw, e.g. with dropbox.
06:50:07<Webuser059697>So with some dropbox DDL links the raw file will be a bunch of hexadecimal gibberish or whatever. The warc will contain the actual filename, such as "tw.png"
06:50:25<Webuser059697>*the raw file's filename
06:55:39<@JAA>Yeah, full preservation of the HTTP data is part of why WARC is the optimal format for web archival.
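A rough way to see that point, with example.warc.gz as a hypothetical file: the served filename travels in the HTTP response headers (typically Content-Disposition), which the WARC keeps but a raw payload dump loses.

    # list any served filenames recorded in the WARC's HTTP headers
    zcat example.warc.gz | grep -ai '^content-disposition:' | head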
06:58:16<@OrIdow6>Webuser059697: Thank you for the file, I will wait a few hours more at least to see if they reply to me on IRC
07:15:08BornOn420 quits [Remote host closed the connection]
07:15:52BornOn420 (BornOn420) joins
07:16:10<qwertyasdfuiopghjkl2>https://clyp.it/ is deleting all content uploaded by non-premium users on 2025-01-19 or earlier: https://ice.arimelody.me/notes/a2gimcpiiumyfvnu (https://archive.today/fKOss), https://ice.arimelody.me/files/20f1bd51-7db6-48b0-a873-716d11e6956e
07:19:22<@JAA>YOuR coNteNt wIlL be ARChIvEd.
07:20:43<@JAA>Looks like they abandoned the free tier a few years ago: https://clypblog.tumblr.com/
07:21:16<@JAA>And now the grandfathering ends, I guess.
07:21:57<@JAA>qwertyasdfuiopghjkl2: Where did you find the 2025-01-19 date?
07:22:17<@JAA>Oh, derived from the email screenshot, right.
07:23:57<qwertyasdfuiopghjkl2>Yeah, "Dec 20" plus 30 days. Haven't been able to find any other announcement of it.
07:25:17<@arkiver>thanks qwertyasdfuiopghjkl2
07:25:24<@arkiver>i wonder how much they actually host
07:25:31<@JAA>I guess there's no indication of the endangered content. *Maybe* it only applies to those grandfathered accounts from pre-2020, but maybe not.
07:26:00<@JAA>At least it's only audio, not video. So can't be that big, right? :-)
07:26:02<@arkiver>seems like non sequential IDs
07:26:04<@arkiver>annoying
07:26:08<@arkiver>JAA: yeah :P
07:26:24<@JAA>Yeah, [0-9a-z]{8} IDs it seems.
07:27:47<qwertyasdfuiopghjkl2>custom IDs too https://clyp.it/c/the-adventures-of-carlo
07:27:56<@arkiver>pff
07:28:09<@JAA>== https://clyp.it/onhisw0l
07:28:14<@JAA>So they all have that 8-char ID.
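For scale, eight characters over [0-9a-z] give

    36^8 = 2,821,109,907,456 ≈ 2.8 × 10^12

possible IDs, which is why blind enumeration is not an option and URL collection comes up further down.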
07:28:37<@arkiver>i'll send clyp.it an email anyway, who knows they might want to work with us
07:28:46<@arkiver>(likely not)
07:29:31<@JAA>> Status: DownloadDisabled
07:29:36<@JAA>So that's a thing, apparently.
07:29:51<@arkiver>when did that ever stop us :P
07:31:43<@JAA>:-)
07:34:57<qwertyasdfuiopghjkl2>"Private" accounts can have public clyps https://clyp.it/4ea2tcmd https://clyp.it/user/kwhmnrbn (example found via the random button)
07:35:54<@JAA>Oh right, the random button. Time to invoke the inverse coupon collector's problem. :-)
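The estimate behind that quip, as a back-of-envelope sketch: with N public clyps and t presses of the random button, assuming uniform random draws,

    E[presses to see all N clyps]      = N * H_N ≈ N * (ln N + 0.5772)
    E[distinct clyps after t presses]  = N * (1 - (1 - 1/N)^t)

so the number of distinct IDs the button has returned can be inverted to estimate N.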
07:36:34<@JAA>Channel time?
07:45:31M60_ quits [Quit: Going offline, see ya! (www.adiirc.com)]
07:47:35<f_>JAA: proposed channel name: #clippy
07:47:47<@OrIdow6>clop.it /j
07:47:56<f_>clap.it
07:47:56<qwertyasdfuiopghjkl2>apocaclyps
07:48:33<f_>qwertyasdfuiopghjkl2: your suggestion is way better than mine :D
07:50:04<@JAA>OrIdow6 has been looking at too much MLP 'fan fic'.
07:50:59<@JAA>:-P
07:50:59<@OrIdow6>>:3
07:51:16<@JAA>#apocaclyps is nice.
07:58:12<@OrIdow6>I feel smugly clever with my idea but it violates my dislike of vulgar channel names anyway
08:17:20<@JAA>Update on gagadaily.com: it's still in danger. So I will resume the continuous archival soon.
08:43:34ljcool2006__ quits [Read error: Connection reset by peer]
09:02:50APOLLO03 joins
09:03:23nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
09:04:38APOLLO03 quits [Client Quit]
09:06:49nyakase (nyakase) joins
09:16:43<@arkiver>let's do the channel JAA came up with
09:17:45<@JAA>Not mine :-)
09:18:03<@arkiver>if we cannot get a list of all clyps.it stuff, it may be URL collection time again
09:19:12<@arkiver>JAA: oh yeah i see now
09:19:16<@arkiver>qwertyasdfuiopghjkl2 it is :)
09:19:32<@arkiver>i foresee many typos when writing apocaclyps
09:21:45<@JAA>I know I made one already earlier. lol
10:09:38rohvani quits [Ping timeout: 260 seconds]
10:16:50qinplus_phone quits [Quit: Connection closed for inactivity]
10:36:21<h2ibot>Manu edited Discourse/archived (+79, Queued llllllll.co): https://wiki.archiveteam.org/?diff=54162&oldid=54142
11:16:45Hackerpcs quits [Quit: Hackerpcs]
11:30:25<steering>so... what bout a channel name for livejournal ;)
11:31:51<steering>(unless it could be done with AB but i suspect it's too big for that really)
11:32:41<@arkiver>steering: is it shutting down?
11:32:46<steering>no
11:33:30MrMcNuggets (MrMcNuggets) joins
11:33:46<steering>it came up in -ot :) but it is somewhat on-topic
11:33:49<@arkiver>actually
11:33:51<@arkiver>#recordedjournal
11:33:54<@arkiver>we have one already
11:34:00<steering>hah
11:34:02<steering>ofc
11:56:14Hackerpcs (Hackerpcs) joins
12:00:01Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:02:50Bleo182600722719623 joins
12:38:58SkilledAlpaca418962 joins
13:12:12f_ quits [Remote host closed the connection]
13:12:20f_ (funderscore) joins
13:16:08ducky quits [Ping timeout: 260 seconds]
13:23:04etnguyen03 (etnguyen03) joins
13:32:30ducky (ducky) joins
13:39:14<kiska>Ping me when the tracker is up, so websocket can be done
14:03:27nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
14:23:44etnguyen03 quits [Client Quit]
14:33:57etnguyen03 (etnguyen03) joins
15:23:04etnguyen03 quits [Client Quit]
15:35:36etnguyen03 (etnguyen03) joins
15:42:38katia- is now known as BouncerServ
15:58:33APOLLO03 joins
15:59:09APOLLO03 quits [Client Quit]
16:10:06AlsoHP_Archivist joins
16:10:06HP_Archivist quits [Read error: Connection reset by peer]
16:12:18etnguyen03 quits [Client Quit]
16:27:51DogsRNice joins
16:40:40MrMcNuggets quits [Quit: WeeChat 4.3.2]
16:50:08andrew1 (andrew) joins
16:52:13andrew quits [Ping timeout: 252 seconds]
16:52:13andrew1 is now known as andrew
16:57:09lurk3r joins
17:07:21lurk3r quits [Changing host]
17:07:21lurk3r (lurk3r) joins
17:44:27etnguyen03 (etnguyen03) joins
17:58:10etnguyen03 quits [Client Quit]
18:04:04AlsoHP_Archivist quits [Client Quit]
18:14:35graham9 joins
18:27:18etnguyen03 (etnguyen03) joins
18:56:02<immibis>deadjournal would be too obvious i suppose
19:04:01etnguyen03 quits [Client Quit]
19:27:51immibis quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:30:05immibis joins
19:33:40immibis leaves
19:37:58Wohlstand (Wohlstand) joins
19:54:01sec^nd quits [Remote host closed the connection]
19:54:19sec^nd (second) joins
19:55:41ParanoidAndroid joins
19:55:57SootBector quits [Remote host closed the connection]
19:56:20SootBector (SootBector) joins
19:59:18<ParanoidAndroid>Hey y'all, can anyone advise on how to see if an old Posterous blog is backed up and how I could access it? I see there are hundreds of 25-50GB archive files on Archive.org, but how would I know which one to download?
20:01:34<pokechu22>If you have the URL for the blog, you should be able to just find it on https://web.archive.org/
20:03:17<ParanoidAndroid>pokechu22 The blog comes up there but none of the uploaded images to the posts show up
20:04:43<pokechu22>Hmm. Unfortunately the Posterous job was before my time so I'm not too sure of the details, but if web.archive.org isn't finding the images, that might mean that they weren't archived. (It could also mean that they were archived in a different format/at a slightly different URL; someone else would know more)
20:11:17etnguyen03 (etnguyen03) joins
20:15:41<alexlehm>posterous used another url for images, getfileN.posterous.com, which fetched the images from aws or something similar; that was apparently not caught by the archive crawler (or the url changed between crawls)
20:16:51<alexlehm>the url contains some kind of identifier, not sure if that was static, a typical url was like this: http://getfile5.posterous.com/getfile/files.posterous.com/alexlehm/RyryOZXo4cyudM1M0vjpHaFuMidkk50Ea8ACj8fSMgGbBDOKefFrSU24b37r/20130101_002149.jpg.scaled.1000.jpg
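One way to check whether any such getfileN.posterous.com URLs were captured is the Wayback Machine CDX API; a sketch using the example URL above as a prefix (the parameters here are illustrative, adjust the prefix to the blog in question):

    curl -s 'https://web.archive.org/cdx/search/cdx?url=getfile5.posterous.com/getfile/files.posterous.com/alexlehm/&matchType=prefix&fl=timestamp,original,statuscode&limit=50'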
20:20:41<ParanoidAndroid>alexlehm So you think the images weren't part of the archive?
20:25:34<alexlehm>i guess so, i cannot find any of mine
20:25:40driib9 quits [Quit: Ping timeout (120 seconds)]
20:27:46<alexlehm>there is a page archived as getfile5.posterous.com, strange
20:39:47BlueMaxima joins
20:47:53tertu quits [Quit: so long...]
20:55:46<that_lurker>The service we all love, Akamai, is most likely going to buy Edgio https://www.akamai.com/newsroom/press-release/akamai-announces-winning-bid-for-select-assets-of-edgio
21:00:04driib9 (driib) joins
21:00:44driib9 quits [Client Quit]
21:02:40driib9 (driib) joins
21:06:47driib9 quits [Client Quit]
21:09:37driib9 (driib) joins
21:17:57etnguyen03 quits [Client Quit]
21:18:55driib9 quits [Client Quit]
21:20:57driib9 (driib) joins
21:27:37ParanoidAndroid quits [Client Quit]
21:28:41BlueMaxima quits [Read error: Connection reset by peer]
21:35:16driib9 quits [Client Quit]
21:37:25driib9 (driib) joins
21:42:24Webuser249671 joins
21:42:34Webuser249671 quits [Client Quit]
21:42:45driib99 (driib) joins
21:44:49driib9 quits [Ping timeout: 252 seconds]
21:44:49driib99 is now known as driib9
21:51:44etnguyen03 (etnguyen03) joins
21:54:46Naruyoko quits [Quit: Leaving]
21:56:24<szczot3k>Any info on IRC bots being down?
21:56:44<szczot3k>People try to invoke them in #down-the-tube, #telegrab, but they've been shut down
21:57:01<@JAA>arkiver: ^
21:57:38<szczot3k>also #archivebot-alerts is dead
21:57:57<@JAA>Yes, already pinged there.
21:58:24<@JAA>(Also entirely unrelated to qubert.)
21:58:25<szczot3k>ack, just bumping it here
22:26:27pixel leaves [Error from remote client]
22:47:50<katia>is it entirely unrelated to h2ibot?
22:48:14<@JAA>Yes
22:51:52<that_lurker>damn was about to comment that no one mentioned h2ibot this time :-P
22:52:11ljcool2006 joins
23:02:00andrew1 (andrew) joins
23:03:43andrew quits [Ping timeout: 260 seconds]
23:03:43andrew1 is now known as andrew
23:09:06Naruyoko joins
23:12:19@JAA slaps that_lurker around a bit with a large trout
23:13:35that_lurker absorbs the omega-3 from the trout
23:13:53<szczot3k>Can I also get a slap please?
23:18:25loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
23:20:48Webuser201125 joins
23:21:04Webuser201125 quits [Client Quit]
23:40:15pixel (pixel) joins