00:12:49 | | ats quits [Ping timeout: 255 seconds] |
00:13:43 | | ats (ats) joins |
00:19:34 | | Earendil7 quits [Ping timeout: 255 seconds] |
00:21:36 | | Notrealname1234 (Notrealname1234) joins |
00:22:08 | | Earendil7 (Earendil7) joins |
00:31:16 | | Earendil7 quits [Ping timeout: 255 seconds] |
00:38:41 | | Notrealname1234 quits [Client Quit] |
01:25:10 | | flotwig_ joins |
01:26:10 | | flotwig quits [Ping timeout: 255 seconds] |
01:26:10 | | flotwig_ is now known as flotwig |
01:39:49 | | Perk quits [Ping timeout: 272 seconds] |
01:44:18 | | Perk joins |
01:47:38 | | etnguyen03 (etnguyen03) joins |
02:02:35 | | etnguyen03 quits [Client Quit] |
02:21:37 | | midou quits [Ping timeout: 272 seconds] |
02:57:16 | | etnguyen03 (etnguyen03) joins |
03:13:20 | | midou joins |
04:43:50 | | etnguyen03 quits [Client Quit] |
04:54:38 | | midou quits [Ping timeout: 265 seconds] |
04:55:20 | | midou joins |
05:03:44 | | etnguyen03 (etnguyen03) joins |
05:13:50 | | etnguyen03 quits [Remote host closed the connection] |
05:15:28 | | Dango360_ (Dango360) joins |
05:18:04 | | _Dango360 (Dango360) joins |
05:18:49 | | pabs quits [Ping timeout: 255 seconds] |
05:19:16 | | Dango360 quits [Ping timeout: 255 seconds] |
05:21:31 | | Dango360_ quits [Ping timeout: 255 seconds] |
05:28:27 | | pabs (pabs) joins |
05:31:37 | | nulldata quits [Client Quit] |
05:32:03 | | nulldata (nulldata) joins |
05:50:45 | <@arkiver> | let's make a channel for post.news - does anyone have ideas? |
05:55:53 | <audrooku|m> | past-news? |
06:43:44 | | tapos joins |
06:45:22 | <tapos> | michaelblob: How is it going with the Kemono URL scraper, by the way? |
06:52:41 | <h2ibot> | JacksonChen666 edited Deathwatch (+283, generatorland.com): https://wiki.archiveteam.org/?diff=52071&oldid=52056 |
06:59:27 | | _Dango360 quits [Client Quit] |
06:59:42 | | Dango360 (Dango360) joins |
07:02:21 | | Arcorann (Arcorann) joins |
07:05:02 | | Unholy23619 quits [Remote host closed the connection] |
07:06:10 | | Unholy23619 (Unholy2361) joins |
07:06:25 | <fireonlive> | #past-news wfm |
07:15:09 | <michaelblob> | tapos: i already dropped the links here |
07:15:22 | <michaelblob> | i threw the blogger sites into #frogger |
07:15:32 | <michaelblob> | tapos: https://transfer.archivete.am/bhDE0/kemono-blogspot-com-fixed.txt |
07:15:33 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/bhDE0/kemono-blogspot-com-fixed.txt |
07:15:41 | <michaelblob> | tapos: https://transfer.archivete.am/CbXSO/kemono-sites-google-com-fixed.txt |
07:15:41 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/CbXSO/kemono-sites-google-com-fixed.txt |
07:47:55 | | Larsenv quits [Quit: The Lounge - https://thelounge.chat] |
07:57:52 | | Larsenv (Larsenv) joins |
08:21:42 | <c3manu> | JAA: got a little further with the help of a friend yesterday. didn't feel like finishing it, but it looks promising :) |
08:23:06 | <@arkiver> | made some changes to my znc settings that should hopefully prevent these cases in which i do not get a message (which are rare, but they happened ~1 day ago) |
09:00:01 | | Bleo182600 quits [Client Quit] |
09:01:17 | | Bleo182600 joins |
09:48:39 | | Wohlstand (Wohlstand) joins |
10:14:26 | | f_ (funderscore) joins |
10:54:31 | | Doranwen quits [Ping timeout: 255 seconds] |
10:55:07 | | Doranwen (Doranwen) joins |
10:58:47 | <nulldata> | #postalone ? |
11:15:35 | | Doranwen quits [Remote host closed the connection] |
11:16:18 | | f_ quits [Ping timeout: 255 seconds] |
11:19:01 | | Doranwen (Doranwen) joins |
11:25:20 | <datechnoman> | Haha I like that one |
11:31:51 | | Doranwen quits [Remote host closed the connection] |
11:32:18 | | Doranwen (Doranwen) joins |
11:36:08 | | jacksonchen666 (jacksonchen666) joins |
11:37:30 | | jacksonchen666 quits [Remote host closed the connection] |
11:38:01 | | jacksonchen666 (jacksonchen666) joins |
11:41:21 | | jacksonchen666 quits [Remote host closed the connection] |
11:50:44 | | Doranwen quits [Read error: Connection reset by peer] |
11:51:10 | | Doranwen (Doranwen) joins |
12:00:56 | | Doranwen quits [Ping timeout: 265 seconds] |
12:09:18 | | imer quits [Client Quit] |
12:11:58 | | Doranwen (Doranwen) joins |
12:17:51 | | kiryu leaves |
12:24:37 | | Doranwen quits [Ping timeout: 265 seconds] |
12:39:08 | | imer (imer) joins |
12:47:44 | | imer quits [Client Quit] |
12:49:20 | | Doranwen (Doranwen) joins |
12:55:55 | | imer (imer) joins |
12:56:01 | | Doranwen quits [Ping timeout: 255 seconds] |
13:26:32 | | etnguyen03 (etnguyen03) joins |
13:32:00 | <tapos> | michaelblob: Nice, I'll take a look at it in a bit when I'm back |
13:32:04 | | tapos quits [Client Quit] |
13:42:28 | | Doranwen (Doranwen) joins |
13:51:49 | | Doranwen quits [Ping timeout: 255 seconds] |
13:55:58 | | Arcorann quits [Ping timeout: 265 seconds] |
13:56:53 | | nimaje1 joins |
13:56:59 | | nimaje quits [Read error: Connection reset by peer] |
13:58:22 | | Doranwen (Doranwen) joins |
13:59:19 | | nimaje1 is now known as nimaje |
14:41:35 | | f_ (funderscore) joins |
14:45:09 | | BPCZ quits [Ping timeout: 272 seconds] |
14:49:54 | | etnguyen03 quits [Client Quit] |
15:01:06 | | booh joins |
15:33:19 | | f_ quits [Remote host closed the connection] |
15:34:15 | | f_ (funderscore) joins |
15:36:27 | | daxxy_ quits [Ping timeout: 272 seconds] |
16:03:36 | | booh quits [Client Quit] |
16:03:50 | | etnguyen03 (etnguyen03) joins |
16:23:30 | | tapos joins |
16:26:35 | <tapos> | We'd need to get the entire sites that the blog posts are on since some don't link to the main site, but to individual blog posts instead. I can do that manually today or tomorrow. |
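The idea above is to collapse post-level links into the sites that host them. This is a minimal Python sketch of that step, using hypothetical helper names (`site_root`, `unique_roots`). It assumes Blogger blogs live on `<name>.blogspot.com` subdomains and that Google Sites URLs start with `/site/<name>/` or `/view/<name>/`. It does not handle country-code Blogger domains, custom domains, or other Google Sites layouts.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit

def site_root(url: str) -> Optional[str]:
    """Map a post URL to the root of the site hosting it (hypothetical helper)."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.endswith(".blogspot.com"):
        # Assumption: each Blogger blog is its own subdomain, so the root is the host.
        return f"https://{host}/"
    if host == "sites.google.com":
        segments = [s for s in parts.path.split("/") if s]
        # Assumption: classic sites use /site/<name>/..., newer ones /view/<name>/...
        if len(segments) >= 2 and segments[0] in ("site", "view"):
            return f"https://sites.google.com/{segments[0]}/{segments[1]}/"
    return None  # unrecognised host or layout

def unique_roots(urls: Iterable[str]) -> List[str]:
    """Deduplicate site roots while keeping first-seen order."""
    seen = set()
    roots = []
    for url in urls:
        root = site_root(url)
        if root is not None and root not in seen:
            seen.add(root)
            roots.append(root)
    return roots
```

Each resulting root is one whole-site job instead of many single-post jobs.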
16:27:40 | <tapos> | michaelblob: Can you run the same thing for posts on Kemono that link to catbox.moe, wordpress.com, wixsite.com, and tumblr.com? |
16:27:42 | <tapos> | https://kemono.su/posts?q=catbox.moe |
16:27:48 | <tapos> | https://kemono.su/posts?q=wordpress.com |
16:27:54 | <tapos> | https://kemono.su/posts?q=wixsite.com |
16:28:03 | <tapos> | https://kemono.su/posts?q=tumblr.com |
16:28:40 | <tapos> | With WordPress, Wix, and Tumblr we'd cover the other most popular free site hosts |
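Once links have been scraped, sorting them into the four requested hosts is mostly a host-suffix match, because these services put each user or file under a subdomain (for example `files.catbox.moe` or `<user>.tumblr.com`). This is a hedged sketch with a hypothetical function name. It works on an existing link list and does not query Kemono itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence
from urllib.parse import urlsplit

TARGET_DOMAINS = ("catbox.moe", "wordpress.com", "wixsite.com", "tumblr.com")

def matching_domain(url: str, domains: Sequence[str] = TARGET_DOMAINS) -> Optional[str]:
    """Return the target domain a URL belongs to, counting subdomains as matches."""
    host = urlsplit(url).hostname or ""
    for domain in domains:
        # Requiring the leading "." stops e.g. "notcatbox.moe" from matching.
        if host == domain or host.endswith("." + domain):
            return domain
    return None

def bucket_by_domain(urls: Iterable[str],
                     domains: Sequence[str] = TARGET_DOMAINS) -> Dict[str, List[str]]:
    """Group URLs by target domain, dropping everything else."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        domain = matching_domain(url, domains)
        if domain is not None:
            buckets[domain].append(url)
    return dict(buckets)
```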
16:29:53 | | Notrealname1234 (Notrealname1234) joins |
16:29:57 | <tapos> | And Catbox.moe is the most popular direct link file host, allowing up to 200 MB/file |
16:30:05 | <tapos> | Seems to be popular among animators |
16:30:57 | <tapos> | With 19,522 posts using Catbox I'd assume that it's going to take some storage space though |
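The two figures quoted above give a rough ceiling, but only under a strong assumption: one Catbox file per post, with every file at the 200 MB cap. Real posts may link several files, or much smaller ones, so treat this as an order-of-magnitude sketch and not an estimate.

```python
CATBOX_POSTS = 19_522  # Kemono posts matching catbox.moe, as quoted above
MAX_FILE_MB = 200      # Catbox per-file limit, as quoted above

def ceiling_tb(posts: int, files_per_post: int = 1, max_file_mb: int = MAX_FILE_MB) -> float:
    """Storage ceiling in decimal TB for the given files_per_post, if every file hits the cap."""
    total_mb = posts * files_per_post * max_file_mb
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB (decimal units)

print(f"{ceiling_tb(CATBOX_POSTS):.2f} TB")  # 3.90 TB
```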
16:36:04 | | Notrealname1234 quits [Client Quit] |
16:45:06 | <@arkiver> | so, got serious connection problems, hopefully fixed soon |
16:48:31 | <h2ibot> | Ryz edited List of websites excluded from the Wayback Machine (+21, Added https://abload.de/): https://wiki.archiveteam.org/?diff=52072&oldid=52039 |
16:49:31 | <h2ibot> | Ryz edited Deathwatch (+166, /* 2024 */ Added Abload): https://wiki.archiveteam.org/?diff=52073&oldid=52071 |
16:50:44 | <Ryz> | So uhh, https://abload.de/ is excluded from WBM, so this is double-oof since the content would be much less likely to be seen if we decide to do an archiving project |
16:53:55 | <tapos> | Might be worth grabbing links for the major (mostly uncompressed) image hosts from Kemono as well: |
16:53:57 | <tapos> | https://kemono.su/posts?q=lensdump.com |
16:54:03 | <tapos> | https://kemono.su/posts?q=imgur.com |
16:54:08 | <tapos> | https://kemono.su/posts?q=postimg.cc |
16:54:13 | <tapos> | https://kemono.su/posts?q=postimg.org |
16:54:19 | <tapos> | https://kemono.su/posts?q=cubeupload.com |
16:55:51 | <tapos> | Found another video host that is used by animators: |
16:55:52 | <tapos> | https://kemono.su/posts?q=webmshare.com |
16:58:02 | | archivst joins |
16:58:03 | <tapos> | I think that should be it |
16:58:12 | <archivst> | Please archive this twitter account: https://twitter.com/dipshitsecrets |
16:58:24 | | archivst quits [Client Quit] |
16:59:01 | | BPCZ (BPCZ) joins |
16:59:47 | <tapos> | catbox.moe and webmshare.com are the most important ones of the bunch |
17:00:00 | <tapos> | So at least those two would be great to grab |
17:00:33 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=52074&oldid=52072 |
17:32:20 | | Island joins |
17:44:08 | <nicolas17> | arkiver: how do you stay online on IRC yet lose messages? :P |
17:52:53 | <nicolas17> | Vokun: looks like Samsung replaced SM-S928B_14_Opensource.zip since I grabbed the metadata |
17:53:15 | <nicolas17> | can you try uploading it again or did you lose the page by now? |
17:56:58 | | JaffaCakes118 quits [Remote host closed the connection] |
17:57:22 | | JaffaCakes118 (JaffaCakes118) joins |
18:09:36 | <tapos> | Twitter links might be worth getting as well: https://kemono.su/posts?q=twitter.com |
18:09:47 | <tapos> | That's 200,750 posts though |
18:10:28 | <tapos> | Kemono is NSFW for anyone unaware, by the way |
18:12:06 | | f_ quits [Ping timeout: 255 seconds] |
19:14:36 | | etnguyen03 quits [Client Quit] |
19:14:37 | <tapos> | https://kemono.su/posts?q=redgifs.com |
19:14:39 | <tapos> | https://kemono.su/posts?q=gfycat.com |
19:15:05 | <tapos> | ^ Two more video hosts that I remembered, these ones for short videos |
19:16:03 | <tapos> | Gfycat is dead, but the IDs are the same for the videos that migrated over to RedGifs, so it's easy to get the RedGifs URLs as long as the ID is in there |
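Because the IDs carry over, the mapping can be done offline. This is a minimal sketch under several assumptions. Gfycat IDs are taken to be purely alphabetic (adjective-adjective-animal style). The RedGifs watch page is taken to follow `https://www.redgifs.com/watch/<id>` with the ID lowercased. Only videos that actually migrated will resolve. The helper names and the example ID `AdorableFancyHound` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

def gfycat_id(url: str) -> Optional[str]:
    """Pull the alphabetic Gfycat ID out of page, embed or direct-media URLs."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "gfycat.com" and not host.endswith(".gfycat.com"):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    # The ID is the leading run of letters in the last path segment; this drops
    # suffixes such as "-mobile", title slugs and extensions like ".mp4".
    match = re.match(r"[A-Za-z]+", segments[-1])
    return match.group(0) if match else None

def redgifs_url(url: str) -> Optional[str]:
    """Assumed RedGifs watch-page URL for a migrated Gfycat video."""
    gid = gfycat_id(url)
    return f"https://www.redgifs.com/watch/{gid.lower()}" if gid else None

print(redgifs_url("https://thumbs.gfycat.com/AdorableFancyHound-mobile.mp4"))
```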
19:16:14 | | michaelblob_ (michaelblob) joins |
19:17:22 | <tapos> | Great work on the Kemono URL scraper by the way |
19:17:33 | <tapos> | I'm not sure if I said thanks, so I'll say it now |
19:19:48 | | michaelblob quits [Ping timeout: 265 seconds] |
19:20:17 | | Guest quits [Ping timeout: 265 seconds] |
19:24:36 | <fireonlive> | wickerz: I think JAA handled/is handling the Sims Forums |
19:25:32 | <michaelblob_> | tapos: working on the other sites now, catbox.moe is a hefty boi |
19:25:43 | <tapos> | Roger that |
19:25:46 | <wickerz> | fireonlive: alright :) |
19:25:59 | <tapos> | Can Wayback Machine handle Google Drive and Dropbox links? |
19:26:18 | <tapos> | And thanks |
19:26:38 | <fireonlive> | :) |
19:26:39 | | BornOn420 quits [Client Quit] |
19:38:20 | <fireonlive> | Apple has reportedly acquired Datakalab https://9to5mac.com/2024/04/22/apple-startup-acquire-ai-compression-and-computer-vision/ https://news.ycombinator.com/item?id=40114350 |
19:39:15 | <tapos> | Abload.de is shutting down: |
19:39:16 | <tapos> | https://www.resetera.com/threads/image-uploading-site-abload-to-shut-down-by-june-30th-2024-all-legacy-images-and-links-to-go-offline-as-well.850704/ |
19:39:23 | <tapos> | https://abload.de/blogpost.php?id=635 |
19:40:47 | <@JAA> | fireonlive: I may or may not have forgotten about it. So thanks. |
19:41:09 | <fireonlive> | ah :) welcome |
19:42:26 | | BornOn420 (BornOn420) joins |
19:44:00 | | Dango360_ (Dango360) joins |
19:44:38 | | Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] |
19:44:59 | | Terbium joins |
19:45:54 | | Dango360 quits [Ping timeout: 265 seconds] |
19:56:39 | | Guest joins |
20:05:41 | | Wohlstand quits [Client Quit] |
20:05:51 | | Wohlstand (Wohlstand) joins |
20:06:37 | <wickerz> | Potential dumb question inc… what’s the difference between using AB and the “Save Page” function of WBM? |
20:06:46 | | Wohlstand quits [Read error: Connection reset by peer] |
20:07:01 | | Wohlstand (Wohlstand) joins |
20:07:21 | <tapos> | ArchiveBot works on some pages the Save Page function errors on |
20:07:35 | <tapos> | ArchiveBot can also archive entire websites |
20:09:30 | <wickerz> | So WBM Save Page is just for a snapshot of, e.g., the front page or the specific URL given to it? |
20:10:59 | <wickerz> | Since you write “entire websites” that is |
20:12:38 | <tapos> | Yeah, Wayback Machine Save Page is only for individual webpages |
20:13:42 | <tapos> | ArchiveBot can do individual webpages via the !ao (archive only) command or entire websites and any external webpages that it links to via the !a (archive) command |
20:26:18 | <thuban> | wickerz: archivebot is based on wpull (https://github.com/ArchiveTeam/wpull), whereas 'save page now' is based on brozzler (https://github.com/internetarchive/brozzler). the most important difference (aside from being recursive) is that archivebot simply makes http requests, whereas spn emulates an actual browser; some pages which use javascript to load assets will not be |
20:26:20 | <thuban> | captured properly by archivebot (unless manually reverse-engineered), but will be by spn. |
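The difference thuban describes can be illustrated with Python's standard `html.parser`. This is only an analogy for a crawler that makes plain HTTP requests. It is not how wpull actually extracts links. A static parse sees URLs written directly into `src`/`href` attributes, but it never runs the inline script. So the image that the script would inject is never discovered, while a browser-based capture would request it.

```python
from html.parser import HTMLParser

class StaticAssetFinder(HTMLParser):
    """Collect URLs written directly in src/href attributes, as a non-browser crawler sees them."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)

PAGE = """<html><body>
<img src="/static/banner.png">
<div id="gallery"></div>
<script src="/app.js"></script>
<script>
  // Only a browser that executes this script would request the gallery image.
  document.getElementById("gallery").innerHTML = '<img src="/gallery/' + 1 + '.jpg">';
</script>
</body></html>"""

finder = StaticAssetFinder()
finder.feed(PAGE)
print(finder.urls)  # ['/static/banner.png', '/app.js']
```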
20:31:37 | <fireonlive> | ah i thought AB was on ludios_wpull but that's just grab-site |
20:33:35 | <fireonlive> | tapos: can also use !a with --no-offsite-links to augment as well |
20:33:48 | <tapos> | True |
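The three modes mentioned above can be applied to a URL list to get one job per URL. They are `!ao` for a single page, `!a` for a whole site plus the pages it links to, and `!a` with `--no-offsite-links` for the site alone. This is a hypothetical helper that only formats command text. It says nothing about ignoresets, job limits or who is allowed to queue jobs.

```python
from typing import Iterable, List

def archivebot_commands(urls: Iterable[str], recursive: bool = True,
                        offsite_links: bool = True) -> List[str]:
    """Turn a URL list into ArchiveBot command lines, one job per URL."""
    commands = []
    for url in urls:
        if not recursive:
            commands.append(f"!ao {url}")  # archive only this page
        elif offsite_links:
            commands.append(f"!a {url}")  # whole site plus pages it links to
        else:
            commands.append(f"!a {url} --no-offsite-links")  # whole site only
    return commands
```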
20:36:20 | | driib (driib) joins |
20:44:55 | | ymgve quits [Ping timeout: 255 seconds] |
20:48:02 | | ymgve joins |
20:48:08 | <@JAA> | The ArchiveBot WARC data is publicly accessible, SPN's is not. |
20:48:14 | | driib quits [Client Quit] |
20:48:46 | | driib (driib) joins |
20:51:11 | | zhongfu quits [Quit: cya losers] |
20:53:29 | | sec^nd quits [Remote host closed the connection] |
20:54:12 | | sec^nd (second) joins |
20:56:26 | | zhongfu (zhongfu) joins |
20:57:18 | | driib quits [Client Quit] |
21:00:39 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
21:00:46 | | ThetaDev joins |
21:15:41 | | driib (driib) joins |
21:16:00 | | driib quits [Client Quit] |
21:17:28 | | driib (driib) joins |
21:28:55 | | Dango360_ quits [Client Quit] |
21:29:11 | | Dango360 (Dango360) joins |
21:30:19 | <michaelblob_> | JAA: i have a list of site.google.com links, where can those get processed? could i start an archivebot job for each one? |
21:30:54 | <michaelblob_> | similarly working on a list of catbox.moe, wordpress.com, wixsite.com, and tumblr.com sites from kemono for tapos |
21:32:26 | <pokechu22> | Archivebot works for those, yeah. tumblr requires the singletumblr ignoreset and wix requires a specific ignore that's not currently in an ignoreset |
21:43:06 | <michaelblob_> | hm ok |
21:43:17 | <michaelblob_> | i'll just make the list and hopefully someone can do something with it |
21:50:01 | | BlueMaxima joins |
21:56:14 | | whoom joins |
21:56:28 | <whoom> | Sorry if this is off-topic, but could someone help me find potential archives of something? |
21:56:43 | <whoom> | I'm looking for archives of an old Invisionfree board |
21:56:53 | <whoom> | There are only a few captures directly accessible on the wayback machine |
21:57:01 | <whoom> | but I'm wondering if I could possibly find more elsewhere |
21:57:16 | <whoom> | here’s a link to an IA snapshot: https://web.archive.org/web/20091204051950/http://z6.invisionfree.com/Ponyville/index.php |
21:57:20 | | whoom quits [Client Quit] |
22:01:56 | <tapos> | Not sure how he expects to get help with something like that within 1 minute and 5 seconds |
22:02:13 | <tapos> | Must be from the Subway Surfers generation |
22:10:19 | | BlueMaxima quits [Read error: Connection reset by peer] |
22:10:33 | | BlueMaxima joins |
22:13:06 | | Notrealname1234 (Notrealname1234) joins |
22:14:55 | | lunik1 quits [Ping timeout: 255 seconds] |
22:20:54 | | Notrealname1234 quits [Client Quit] |
22:22:34 | | Wohlstand quits [Ping timeout: 255 seconds] |
22:26:00 | | pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat] |
22:26:29 | | pixel leaves |
22:27:41 | <pokechu22> | !tell whoom The only info on the WBM is https://web.archive.org/web/*/http://z6.invisionfree.com/Ponyville/* - and there probably aren't any other archives (or at least no other ones that could easily be found) |
22:27:42 | <eggdrop> | [tell] ok, I'll tell whoom when they join next |
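The wildcard listing pokechu22 linked also has a programmatic counterpart in the Wayback Machine's CDX server API. This sketch only builds the query URL. The endpoint and the `matchType`, `output`, `fl`, `collapse` and `limit` parameters are documented CDX parameters, while the helper name is hypothetical. With `output=json`, the response is a list of rows whose first row is the field header.

```python
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_prefix: str, collapse_digest: bool = True,
                  limit: Optional[int] = None) -> str:
    """Build a CDX query listing every capture under a URL prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",            # everything under this path
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    if collapse_digest:
        params["collapse"] = "digest"     # skip consecutive identical captures
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

print(cdx_query_url("z6.invisionfree.com/Ponyville/"))
```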
22:29:07 | | pedantic-darwin joins |
22:29:48 | | pixel (pixel) joins |
22:36:56 | | pixel leaves |
22:36:56 | | pixel (pixel) joins |
22:45:02 | <mgrandi> | Not sure if this has been posted before but nier reincarnation is shutting down on the 29th, they have a website and a few socials: https://nierreincarnation.com/news/story_completion_countdown_en/ |
22:45:13 | <mgrandi> | I'm working on getting the actual game data |
22:57:15 | | zhongfu_ (zhongfu) joins |
22:57:32 | | zhongfu quits [Read error: Connection reset by peer] |
23:01:59 | <@JAA> | It's really annoying that browsers changed things such that the web chat thing is essentially broken since you must keep the tab in foreground to stay connected. |
23:04:11 | <nulldata> | JAA - good thing we have web IRC clients such as The Lounge! |
23:04:16 | | nulldata ducks |
23:21:30 | <nicolas17> | pop it into a separate *window* and it'll be fine :P |
23:21:59 | <@JAA> | Yeah, but the average user of that thing doesn't know that, see above. |
23:26:34 | <fireonlive> | i wonder if the lounge in temporary mode would work any better |
23:33:11 | | lunik1 joins |
23:43:36 | <Vokun> | <nicolas17> "Vokun: looks like Samsung..." <- The last three things I downloaded, the https://data.nicolas17.xyz/samsung-grab/ site refreshed and the upload button was gone. |
23:47:31 | <Vokun> | I do still have the file, but the site broke a bit when the page autorefreshed after like an hour |
23:48:06 | | etnguyen03 (etnguyen03) joins |