00:12:49ats quits [Ping timeout: 255 seconds]
00:13:43ats (ats) joins
00:19:34Earendil7 quits [Ping timeout: 255 seconds]
00:21:36Notrealname1234 (Notrealname1234) joins
00:22:08Earendil7 (Earendil7) joins
00:31:16Earendil7 quits [Ping timeout: 255 seconds]
00:38:41Notrealname1234 quits [Client Quit]
01:25:10flotwig_ joins
01:26:10flotwig quits [Ping timeout: 255 seconds]
01:26:10flotwig_ is now known as flotwig
01:39:49Perk quits [Ping timeout: 272 seconds]
01:44:18Perk joins
01:47:38etnguyen03 (etnguyen03) joins
02:02:35etnguyen03 quits [Client Quit]
02:21:37midou quits [Ping timeout: 272 seconds]
02:57:16etnguyen03 (etnguyen03) joins
03:13:20midou joins
04:43:50etnguyen03 quits [Client Quit]
04:54:38midou quits [Ping timeout: 265 seconds]
04:55:20midou joins
05:03:44etnguyen03 (etnguyen03) joins
05:13:50etnguyen03 quits [Remote host closed the connection]
05:15:28Dango360_ (Dango360) joins
05:18:04_Dango360 (Dango360) joins
05:18:49pabs quits [Ping timeout: 255 seconds]
05:19:16Dango360 quits [Ping timeout: 255 seconds]
05:21:31Dango360_ quits [Ping timeout: 255 seconds]
05:28:27pabs (pabs) joins
05:31:37nulldata quits [Client Quit]
05:32:03nulldata (nulldata) joins
05:50:45<@arkiver>let's make a channel for post.news - does anyone have ideas?
05:55:53<audrooku|m>past-news?
06:43:44tapos joins
06:45:22<tapos>michaelblob: How is it going with the Kemono URL scraper, by the way?
06:52:41<h2ibot>JacksonChen666 edited Deathwatch (+283, generatorland.com): https://wiki.archiveteam.org/?diff=52071&oldid=52056
06:59:27_Dango360 quits [Client Quit]
06:59:42Dango360 (Dango360) joins
07:02:21Arcorann (Arcorann) joins
07:05:02Unholy23619 quits [Remote host closed the connection]
07:06:10Unholy23619 (Unholy2361) joins
07:06:25<fireonlive>#past-news wfm
07:15:09<michaelblob>tapos: i already dropped the links here
07:15:22<michaelblob>i threw the blogger sites into #frogger
07:15:32<michaelblob>tapos: https://transfer.archivete.am/bhDE0/kemono-blogspot-com-fixed.txt
07:15:33<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/bhDE0/kemono-blogspot-com-fixed.txt
07:15:41<michaelblob>tapos: https://transfer.archivete.am/CbXSO/kemono-sites-google-com-fixed.txt
07:15:41<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/CbXSO/kemono-sites-google-com-fixed.txt
07:47:55Larsenv quits [Quit: The Lounge - https://thelounge.chat]
07:57:52Larsenv (Larsenv) joins
08:21:42<c3manu>JAA: got a little further with the help of a friend yesterday. didn't feel like finishing it, but it looks promising :)
08:23:06<@arkiver>made some changes to my znc settings that should hopefully prevent these cases in which i do not get a message (which are rare, but they happened ~1 day ago)
09:00:01Bleo182600 quits [Client Quit]
09:01:17Bleo182600 joins
09:48:39Wohlstand (Wohlstand) joins
10:14:26f_ (funderscore) joins
10:54:31Doranwen quits [Ping timeout: 255 seconds]
10:55:07Doranwen (Doranwen) joins
10:58:47<nulldata>#postalone ?
11:15:35Doranwen quits [Remote host closed the connection]
11:16:18f_ quits [Ping timeout: 255 seconds]
11:19:01Doranwen (Doranwen) joins
11:25:20<datechnoman>Haha I like that one
11:31:51Doranwen quits [Remote host closed the connection]
11:32:18Doranwen (Doranwen) joins
11:36:08jacksonchen666 (jacksonchen666) joins
11:37:30jacksonchen666 quits [Remote host closed the connection]
11:38:01jacksonchen666 (jacksonchen666) joins
11:41:21jacksonchen666 quits [Remote host closed the connection]
11:50:44Doranwen quits [Read error: Connection reset by peer]
11:51:10Doranwen (Doranwen) joins
12:00:56Doranwen quits [Ping timeout: 265 seconds]
12:09:18imer quits [Client Quit]
12:11:58Doranwen (Doranwen) joins
12:17:51kiryu leaves
12:24:37Doranwen quits [Ping timeout: 265 seconds]
12:39:08imer (imer) joins
12:47:44imer quits [Client Quit]
12:49:20Doranwen (Doranwen) joins
12:55:55imer (imer) joins
12:56:01Doranwen quits [Ping timeout: 255 seconds]
13:26:32etnguyen03 (etnguyen03) joins
13:32:00<tapos>michaelblob: Nice, I'll take a look at it in a bit when I'm back
13:32:04tapos quits [Client Quit]
13:42:28Doranwen (Doranwen) joins
13:51:49Doranwen quits [Ping timeout: 255 seconds]
13:55:58Arcorann quits [Ping timeout: 265 seconds]
13:56:53nimaje1 joins
13:56:59nimaje quits [Read error: Connection reset by peer]
13:58:22Doranwen (Doranwen) joins
13:59:19nimaje1 is now known as nimaje
14:41:35f_ (funderscore) joins
14:45:09BPCZ quits [Ping timeout: 272 seconds]
14:49:54etnguyen03 quits [Client Quit]
15:01:06booh joins
15:33:19f_ quits [Remote host closed the connection]
15:34:15f_ (funderscore) joins
15:36:27daxxy_ quits [Ping timeout: 272 seconds]
16:03:36booh quits [Client Quit]
16:03:50etnguyen03 (etnguyen03) joins
16:23:30tapos joins
16:26:35<tapos>We'd need to get the entire sites that the blog posts are on since some don't link to the main site, but to individual blog posts instead. I can do that manually today or tomorrow.
16:27:40<tapos>michaelblob: Can you run the same thing for posts on Kemono that link to catbox.moe, wordpress.com, wixsite.com, and tumblr.com?
16:27:42<tapos>https://kemono.su/posts?q=catbox.moe
16:27:48<tapos>https://kemono.su/posts?q=wordpress.com
16:27:54<tapos>https://kemono.su/posts?q=wixsite.com
16:28:03<tapos>https://kemono.su/posts?q=tumblr.com
16:28:40<tapos>With WordPress, Wix, and Tumblr we'd cover the other most popular free site hosts
16:29:53Notrealname1234 (Notrealname1234) joins
16:29:57<tapos>And Catbox.moe is the most popular direct link file host, allowing up to 200 MB/file
16:30:05<tapos>Seems to be popular among animators
16:30:57<tapos>With 19,522 posts using Catbox I'd assume that it's going to take some storage space though
16:36:04Notrealname1234 quits [Client Quit]
16:45:06<@arkiver>so, got serious connection problems, hopefully fixed soon
16:48:31<h2ibot>Ryz edited List of websites excluded from the Wayback Machine (+21, Added https://abload.de/): https://wiki.archiveteam.org/?diff=52072&oldid=52039
16:49:31<h2ibot>Ryz edited Deathwatch (+166, /* 2024 */ Added Abload): https://wiki.archiveteam.org/?diff=52073&oldid=52071
16:50:44<Ryz>So uhh, https://abload.de/ is excluded from WBM, so this is double-oof since the content would be much less likely to be seen if we decide to do an archiving project
16:53:55<tapos>Might be worth grabbing links for the major (mostly uncompressed) image hosts from Kemono as well:
16:53:57<tapos>https://kemono.su/posts?q=lensdump.com
16:54:03<tapos>https://kemono.su/posts?q=imgur.com
16:54:08<tapos>https://kemono.su/posts?q=postimg.cc
16:54:13<tapos>https://kemono.su/posts?q=postimg.org
16:54:19<tapos>https://kemono.su/posts?q=cubeupload.com
16:55:51<tapos>Found another video host that is used by animators:
16:55:52<tapos>https://kemono.su/posts?q=webmshare.com
16:58:02archivst joins
16:58:03<tapos>I think that should be it
16:58:12<archivst>Please archive this twitter account: https://twitter.com/dipshitsecrets
16:58:24archivst quits [Client Quit]
16:59:01BPCZ (BPCZ) joins
16:59:47<tapos>catbox.moe and webmshare.com are the most important ones of the bunch
17:00:00<tapos>So at least those two would be great to grab
17:00:33<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=52074&oldid=52072
17:32:20Island joins
17:44:08<nicolas17>arkiver: how do you stay online on IRC yet lose messages? :P
17:52:53<nicolas17>Vokun: looks like Samsung replaced SM-S928B_14_Opensource.zip since I grabbed the metadata
17:53:15<nicolas17>can you try uploading it again or did you lose the page by now?
17:56:58JaffaCakes118 quits [Remote host closed the connection]
17:57:22JaffaCakes118 (JaffaCakes118) joins
18:09:36<tapos>Twitter links might be worth getting as well: https://kemono.su/posts?q=twitter.com
18:09:47<tapos>That's 200,750 posts though
18:10:28<tapos>Kemono is NSFW for anyone unaware, by the way
18:12:06f_ quits [Ping timeout: 255 seconds]
19:14:36etnguyen03 quits [Client Quit]
19:14:37<tapos>https://kemono.su/posts?q=redgifs.com
19:14:39<tapos>https://kemono.su/posts?q=gfycat.com
19:15:05<tapos>^ Two more video hosts that I remembered, these ones for short videos
19:16:03<tapos>Gfycat is dead, but the IDs are the same for the videos that migrated over to RedGifs, so it's easy to get the RedGifs URLs as long as the ID is in there
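A rough sketch of the ID mapping tapos describes, in Python. The Gfycat URL shapes matched here, the lowercasing of the ID, and the `https://www.redgifs.com/watch/<id>` target format are assumptions for illustration, not something confirmed in this log; `gfycat_to_redgifs` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: Gfycat links carry a purely alphabetic ID right after the host,
# optionally behind an "ifr/" or "gifs/detail/" path prefix.
GFYCAT_RE = re.compile(
    r"https?://(?:\w+\.)?gfycat\.com/(?:gifs/detail/|ifr/)?([A-Za-z]+)"
)

def gfycat_to_redgifs(url: str) -> Optional[str]:
    """Return a candidate RedGifs URL for a Gfycat link, or None if no ID is found."""
    match = GFYCAT_RE.search(url)
    if match is None:
        return None
    # Assumption: RedGifs watch pages use the same ID in lowercase.
    return f"https://www.redgifs.com/watch/{match.group(1).lower()}"

if __name__ == "__main__":
    # "SomeExampleId" is a made-up ID, not a real video.
    print(gfycat_to_redgifs("https://gfycat.com/SomeExampleId"))
```

Only videos that were actually migrated would resolve, so each candidate URL would still need to be checked before being queued.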
19:16:14michaelblob_ (michaelblob) joins
19:17:22<tapos>Great work on the Kemono URL scraper by the way
19:17:33<tapos>I'm not sure if I said thanks, so I'll say it now
19:19:48michaelblob quits [Ping timeout: 265 seconds]
19:20:17Guest quits [Ping timeout: 265 seconds]
19:24:36<fireonlive>wickerz: I think JAA handled/is handling the Sims Forums
19:25:32<michaelblob_>tapos: working on the other sites now, catbox.moe is a hefty boi
19:25:43<tapos>Roger that
19:25:46<wickerz>fireonlive: alright :)
19:25:59<tapos>Can Wayback Machine handle Google Drive and Dropbox links?
19:26:18<tapos>And thanks
19:26:38<fireonlive>:)
19:26:39BornOn420 quits [Client Quit]
19:38:20<fireonlive>Apple has reportedly acquired Datakalab https://9to5mac.com/2024/04/22/apple-startup-acquire-ai-compression-and-computer-vision/ https://news.ycombinator.com/item?id=40114350
19:39:15<tapos>Abload.de is shutting down:
19:39:16<tapos>https://www.resetera.com/threads/image-uploading-site-abload-to-shut-down-by-june-30th-2024-all-legacy-images-and-links-to-go-offline-as-well.850704/
19:39:23<tapos>https://abload.de/blogpost.php?id=635
19:40:47<@JAA>fireonlive: I may or may not have forgotten about it. So thanks.
19:41:09<fireonlive>ah :) welcome
19:42:26BornOn420 (BornOn420) joins
19:44:00Dango360_ (Dango360) joins
19:44:38Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
19:44:59Terbium joins
19:45:54Dango360 quits [Ping timeout: 265 seconds]
19:56:39Guest joins
20:05:41Wohlstand quits [Client Quit]
20:05:51Wohlstand (Wohlstand) joins
20:06:37<wickerz>Potential dumb question inc… what’s the difference between using AB and the “Save Page” function of WBM?
20:06:46Wohlstand quits [Read error: Connection reset by peer]
20:07:01Wohlstand (Wohlstand) joins
20:07:21<tapos>ArchiveBot works on some pages the Save Page function errors on
20:07:35<tapos>ArchiveBot can also archive entire websites
20:09:30<wickerz>So WBM Save Page is just for a snapshot of e.g. the front page/specific url given to it?
20:10:59<wickerz>Since you write “entire websites” that is
20:12:38<tapos>Yeah, Wayback Machine Save Page is only for individual webpages
20:13:42<tapos>ArchiveBot can do individual webpages via the !ao (archive only) command or entire websites and any external webpages that it links to via the !a (archive) command
20:26:18<thuban>wickerz: archivebot is based on wpull (https://github.com/ArchiveTeam/wpull), whereas 'save page now' is based on brozzler (https://github.com/internetarchive/brozzler). the most important difference (aside from being recursive) is that archivebot simply makes http requests, whereas spn emulates an actual browser; some pages which use javascript to load assets will not be
20:26:20<thuban>captured properly by archivebot (unless manually reverse-engineered), but will be by spn.
20:31:37<fireonlive>ah i thought AB was on ludios_wpull but that's just grab-site
20:33:35<fireonlive>tapos: can also use !a with --no-offsite-links to augment as well
20:33:48<tapos>True
20:36:20driib (driib) joins
20:44:55ymgve quits [Ping timeout: 255 seconds]
20:48:02ymgve joins
20:48:08<@JAA>The ArchiveBot WARC data is publicly accessible, SPN's is not.
20:48:14driib quits [Client Quit]
20:48:46driib (driib) joins
20:51:11zhongfu quits [Quit: cya losers]
20:53:29sec^nd quits [Remote host closed the connection]
20:54:12sec^nd (second) joins
20:56:26zhongfu (zhongfu) joins
20:57:18driib quits [Client Quit]
21:00:39ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
21:00:46ThetaDev joins
21:15:41driib (driib) joins
21:16:00driib quits [Client Quit]
21:17:28driib (driib) joins
21:28:55Dango360_ quits [Client Quit]
21:29:11Dango360 (Dango360) joins
21:30:19<michaelblob_>JAA: i have a list of sites.google.com links, where can those get processed? could i start an archivebot job for each one?
21:30:54<michaelblob_>similarly working on a list of catbox.moe, wordpress.com, wixsite.com, and tumblr.com sites from kemono for tapos
21:32:26<pokechu22>Archivebot works for those, yeah. tumblr requires the singletumblr ignoreset and wix requires a specific ignore that's not currently in an ignoreset
21:43:06<michaelblob_>hm ok
21:43:17<michaelblob_>i'll just make the list and hopefully someone can do something with it
21:50:01BlueMaxima joins
21:56:14whoom joins
21:56:28<whoom>Sorry if this is off-topic, but could someone help me find potential archives of something?
21:56:43<whoom>I'm looking for archives of an old Invisionfree board
21:56:53<whoom>There are only a few captures directly accessible on the wayback machine
21:57:01<whoom>but I'm wondering if I could possibly find more elsewhere
21:57:16<whoom>here’s a link to an IA snapshot: https://web.archive.org/web/20091204051950/http://z6.invisionfree.com/Ponyville/index.php
21:57:20whoom quits [Client Quit]
22:01:56<tapos>Not sure how he expects to get help with something like that within 1 minute and 5 seconds
22:02:13<tapos>Must be from the Subway Surfers generation
22:10:19BlueMaxima quits [Read error: Connection reset by peer]
22:10:33BlueMaxima joins
22:13:06Notrealname1234 (Notrealname1234) joins
22:14:55lunik1 quits [Ping timeout: 255 seconds]
22:20:54Notrealname1234 quits [Client Quit]
22:22:34Wohlstand quits [Ping timeout: 255 seconds]
22:26:00pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
22:26:29pixel leaves
22:27:41<pokechu22>!tell whoom The only info on the WBM is https://web.archive.org/web/*/http://z6.invisionfree.com/Ponyville/* - and there probably aren't any other archives (or at least no other ones that could easily be found)
22:27:42<eggdrop>[tell] ok, I'll tell whoom when they join next
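The wildcard lookup pokechu22 links can also be expressed as a Wayback Machine CDX query. The sketch below only builds the query URL and makes no network request; `cdx_prefix_query` is a hypothetical helper name and the `limit` default is an arbitrary choice.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_prefix_query(prefix: str, limit: int = 50) -> str:
    """Build a CDX URL listing captures whose URL starts with `prefix`."""
    params = {
        "url": prefix,
        "matchType": "prefix",  # match every URL under the prefix
        "output": "json",       # JSON rows instead of plain text
        "collapse": "urlkey",   # one row per distinct URL
        "limit": str(limit),    # arbitrary cap for a first look
    }
    return CDX_ENDPOINT + "?" + urlencode(params)

if __name__ == "__main__":
    print(cdx_prefix_query("z6.invisionfree.com/Ponyville/"))
```

Fetching the printed URL would list the distinct captured URLs under the board's path, which is the same information as the `/web/*/...*` page in machine-readable form.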
22:29:07pedantic-darwin joins
22:29:48pixel (pixel) joins
22:36:56pixel leaves
22:36:56pixel (pixel) joins
22:45:02<mgrandi>Not sure if this has been posted before but nier reincarnation is shutting down on the 29th, they have a website and a few socials: https://nierreincarnation.com/news/story_completion_countdown_en/
22:45:13<mgrandi>I'm working on getting the actual game data
22:57:15zhongfu_ (zhongfu) joins
22:57:32zhongfu quits [Read error: Connection reset by peer]
23:01:59<@JAA>It's really annoying that browsers changed things such that the web chat thing is essentially broken since you must keep the tab in foreground to stay connected.
23:04:11<nulldata>JAA - good thing we have web IRC clients such as The Lounge!
23:04:16nulldata ducks
23:21:30<nicolas17>pop it into a separate *window* and it'll be fine :P
23:21:59<@JAA>Yeah, but the average user of that thing doesn't know that, see above.
23:26:34<fireonlive>i wonder if the lounge in temporary mode would work any better
23:33:11lunik1 joins
23:43:36<Vokun><nicolas17> "Vokun: looks like Samsung..." <- The last three things I downloaded, the https://data.nicolas17.xyz/samsung-grab/ site refreshed and the upload button was gone.
23:47:31<Vokun>I do still have the file, but the site broke a bit when the page autorefreshed after like an hour
23:48:06etnguyen03 (etnguyen03) joins