| 00:00:07 | <hlgs|m> | would it be possible to save a bunch of image links via archivebot, but only if they're either a) not archived at all, or b) have been archived, but the latest archive has a specific title pattern? to get more specific: save page now has been breaking on tumblr images, and all the broken ones have a title that's "[something]: Image". i've got a list of about 60k image urls and i'd like to only save the ones that are broken, or that haven't been |
| 00:00:07 | <hlgs|m> | saved at all, just to not spend too many resources re-saving what's already saved fine |
| 00:03:45 | <nicolas17> | I'm not sure but it might take more resources to look up 60k images on the wayback machine than to just archive them again |
| 00:04:43 | <hlgs|m> | right, good to know. shame about the storage cost for the WBM itself but that might be the quickest/simplest option for me (i'd like to get these saved as soon as possible as images were being removed entirely lately) |
| 00:06:41 | <audrooku|m> | What about just listing all saved images using the cdx api? |
| 00:07:13 | <hlgs|m> | i don't have any experience with that, can you explain? |
| 00:08:22 | <hlgs|m> | the key thing would be identifying which images are broken by looking at the title they have in the wayback machine (as in, the title the tab/window shows when it's open in the WBM) |
| 00:08:49 | <hlgs|m> | that's the only consistent tell i've found in my research, other than it all being by save page now and recent, but i can't tell how recent |
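For triaging a list like this without scraping page titles, the Wayback Machine's CDX API reports each capture's mimetype, so a broken wrapper capture shows up as `text/html` while a good one is `image/png` or `image/jpeg`. A minimal sketch (the `broken`/`ok`/`unsaved` classification and the choice of fields are my own assumptions, not something verified in this conversation):

```python
import json
import urllib.parse
import urllib.request

CDX = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(target):
    # limit=-1 asks the CDX server for only the *last* capture of the URL.
    params = {
        "url": target,
        "output": "json",
        "fl": "timestamp,mimetype,statuscode",
        "limit": "-1",
    }
    return CDX + "?" + urllib.parse.urlencode(params)

def classify(rows):
    """rows is parsed CDX JSON output: a header row, then capture rows.
    'broken' here means the latest capture is an HTML wrapper page."""
    if len(rows) < 2:
        return "unsaved"
    _, mimetype, _ = rows[-1]
    return "broken" if mimetype.startswith("text/html") else "ok"

# Live usage (network, and 60k lookups would take a while):
#   rows = json.load(urllib.request.urlopen(cdx_query_url(url)))
#   if classify(rows) != "ok": ...queue url for re-saving...
```

As nicolas17 notes below, 60k CDX lookups may cost more than just re-archiving everything, so this is only worth it if WBM load is the concern.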
| 00:11:08 | <nicolas17> | do you have the URL of the image, or of the page-containing-the-image? |
| 00:11:12 | <@JAA> | Are they actual images when saved correctly or shitty page wrappers? |
| 00:12:04 | <nicolas17> | anyway send us the 60k list, you can use https://transfer.archivete.am/ |
| 00:14:12 | <hlgs|m> | the direct urls of all the images |
| 00:14:37 | <nicolas17> | so they got mis-saved as html pages? |
| 00:14:47 | <nicolas17> | a jpeg file doesn't have a "title" |
| 00:14:51 | <hlgs|m> | save page now has been saving the weird page wrapper things tumblr has been doing lately, but archivebot isn't having that issue, so i'm basically wanting to redo a ton of images i saved using a SPN script recently |
| 00:14:57 | <hlgs|m> | let me get an example |
| 00:15:11 | <hlgs|m> | https://64.media.tumblr.com/377948577d35abb1be9e2be2dc9f2897/tumblr_o98nqnSmqh1s4dx9ko4_r5_1280.png |
| 00:15:27 | <@JAA> | Yup, that's what Tumblr does. |
| 00:15:33 | <hlgs|m> | yeah |
| 00:15:42 | <@JAA> | I'm not sure ArchiveBot is able to archive it correctly when given a direct image URL. |
| 00:15:45 | <hlgs|m> | SPN doesn't save the actual image at the moment (i've reported it as a bug but it's still being worked on it seems) |
| 00:15:54 | <@JAA> | It works on the running Tumblr jobs because those send an appropriate Referer header. |
| 00:16:15 | <hlgs|m> | hmm, really? could test it. so far, i haven't noticed any broken images when saved by archivebot |
| 00:16:16 | <nicolas17> | might depend on user agent too, curl gives me a png |
| 00:16:26 | <hlgs|m> | oh interesting |
| 00:16:40 | <@JAA> | Ah right, yeah, and the Accept header might also matter. |
| 00:16:48 | <nicolas17> | yeah seems it's Accept, not UA |
| 00:16:49 | <hlgs|m> | what i find fascinating is that, with non-gifs, i can right click and open the image in a new tab and get the actual image, but the url stays the same |
| 00:17:29 | <@JAA> | Yes, the URL is not the only thing determining how something gets loaded. |
| 00:17:30 | <nicolas17> | my browser requests that URL with "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8" and gets a webpage |
| 00:17:46 | <nicolas17> | the webpage has an <img> pointing at the same URL |
| 00:17:58 | <hlgs|m> | interesting |
| 00:18:04 | <@JAA> | Even just refreshing after 'open image in new tab' loads the page again. |
| 00:18:14 | <nicolas17> | the browser then requests the same URL with "Accept: image/avif,image/webp,*/*" and gets an image |
| 00:18:14 | <hlgs|m> | yeah, i've noticed that too |
| 00:18:15 | <nicolas17> | seems tumblr only cares that html is *not* in the list? |
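That Accept-header behaviour is easy to probe. Here is the working hypothesis from the exchange above encoded as a predicate, together with the two headers nicolas17 quoted; the commented-out live check (my addition) just compares the returned Content-Type under each header:

```python
# The two Accept headers observed above, plus the hypothesis that
# tumblr serves the HTML wrapper iff "html" appears in the list.
BROWSER_NAV = ("text/html,application/xhtml+xml,application/xml;q=0.9,"
               "image/avif,image/webp,*/*;q=0.8")
BROWSER_IMG = "image/avif,image/webp,*/*"

def expect_wrapper(accept: str) -> bool:
    """True if, per the hypothesis, tumblr would answer with the
    HTML wrapper page rather than the raw image bytes."""
    return "html" in accept

# Live check (network required):
# import urllib.request
# req = urllib.request.Request(url, headers={"Accept": BROWSER_IMG})
# print(urllib.request.urlopen(req).headers.get("Content-Type"))
```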
| 00:18:37 | <hlgs|m> | (tumblr has been getting... really hard to archive properly lately. this and then the www blog urls and now the permalinks of the previous reblog just being gone in the www blog view... ugh) |
| 00:19:27 | <nicolas17> | hlgs|m: do you have an example of a tumblr image that *did* get archived properly? |
| 00:19:38 | <hlgs|m> | let me see, i can find one |
| 00:21:33 | <hlgs|m> | <https://web.archive.org/web/20230527004757/https://xenobotanist.tumblr.com/post/718394074083852288/since-data-has-spot-and-julian-has-kukalaka-i> |
| 00:21:33 | <hlgs|m> | <https://web.archive.org/web/20230527003724/https://64.media.tumblr.com/5b7a205a9cff48315c7b1d72e5ec6315/365671295079d02a-5f/s1280x1920/425ba3301aa42c89e9610c308b96960583b2b47c.jpg> |
| 00:21:33 | <hlgs|m> | oh, this is interesting, this one actually got saved by SPN outlinks, not archivebot. but i know i've seen archivebot ones that were saved, i'll find one |
| 00:21:51 | <hlgs|m> | ah, although, this one doesn't get the html wrapper thing when i open it separately |
| 00:22:01 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+139, /* 2023 */ Add Ragtag Archive): https://wiki.archiveteam.org/?diff=49876&oldid=49874 |
| 00:22:05 | <hlgs|m> | i'll try to find something |
| 00:23:20 | <@JAA> | Remember that images saved by SPN might have been saved because someone saved the page they're embedded on rather than the image URL directly. |
| 00:23:56 | <hlgs|m> | that's what i've been doing, but pretty much all the images within posts that i've tried saving via SPN have ended up broken |
| 00:24:29 | <hlgs|m> | okay, here's an archivebot one |
| 00:24:29 | <hlgs|m> | https://web.archive.org/web/20230526221101/https://unrestedjade.tumblr.com/post/667862452973780992/bluehedron-the-downside-of-having-a-host-friend |
| 00:24:29 | <hlgs|m> | https://web.archive.org/web/20230526221058/https://64.media.tumblr.com/d2d9086c62bb92fc78d0494a98addf2d/tumblr_paoqjnQrrp1wv21vuo1_r1_1280.jpg |
| 00:24:38 | <hlgs|m> | has the html wrapper thing when opened live |
| 00:24:43 | <@JAA> | Ah right |
| 00:25:02 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+76, /* 2023 */ Add kitsune tweet about Ragtag Archive): https://wiki.archiveteam.org/?diff=49877&oldid=49876 |
| 00:25:09 | <@JAA> | SPN does some weird things with images sometimes. If they're loaded by JS, they might not get archived on the initial SPN but only later when you access the snapshot. |
| 00:25:26 | <@JAA> | Which changes how they get accessed, which can cause this Tumblr nonsense. |
| 00:25:39 | <hlgs|m> | really? i think i've accessed snapshots several times without seeing the images update, but i can try again now |
| 00:26:06 | <nicolas17> | I think there's multiple snapshots of the same URL, some with the page, some with the image |
| 00:26:08 | <@JAA> | I've seen it happen on Imgur for example. SPN itself doesn't archive the actual image when you give it an image page or album. |
| 00:26:25 | <nicolas17> | which makes things harder |
| 00:26:25 | <@JAA> | That can happen, but each URL only gets saved once per 45 minutes. |
| 00:26:31 | <@JAA> | By SPN, anyway. |
| 00:26:33 | <hlgs|m> | just checked one i'd checked before and the images are still broken |
| 00:26:59 | <nicolas17> | https://web.archive.org/web/20230527003724/https://64.media.tumblr.com/5b7a205a9cff48315c7b1d72e5ec6315/365671295079d02a-5f/s1280x1920/425ba3301aa42c89e9610c308b96960583b2b47c.jpg |
| 00:27:00 | <nicolas17> | https://web.archive.org/web/20230604220454/https://64.media.tumblr.com/5b7a205a9cff48315c7b1d72e5ec6315/365671295079d02a-5f/s1280x1920/425ba3301aa42c89e9610c308b96960583b2b47c.jpg |
| 00:27:44 | <hlgs|m> | oh, interesting. let me see what saved those |
| 00:28:00 | <hlgs|m> | the working ones are archivebot |
| 00:28:10 | <hlgs|m> | the broken one is SPN, i did that one earlier today |
| 00:28:22 | <nicolas17> | the problem here is that if you archive an image "properly", then https://web.archive.org/https://64.media.tumblr.com/whatever/whatever.jpg will take you to the latest snapshot and will look correct, *until* something else causes it to get archived again x_x |
| 00:28:24 | <hlgs|m> | it says no collection info, i did it with the addon though i think |
| 00:28:33 | <hlgs|m> | yeahhh |
| 00:28:39 | <nicolas17> | and then the latest snapshot is the stupid wrapper page again |
| 00:29:20 | <hlgs|m> | the wrapper pages are useful because they have the URL of the original post, but they shouldn't be the last saved copy of an image because it then doesn't display, and it breaks embedding |
| 00:29:41 | <hlgs|m> | ideally i'd make sure the last saved copy is one that's not the broken wrapper, and then prevent further saving if it's going to save the wrapper version over it... |
| 00:29:52 | <hlgs|m> | not sure if that's possible though |
| 00:31:52 | <@JAA> | That still doesn't help, because when you load a page, it will embed the snapshot of the image that's closest in time to the page's. |
| 00:32:06 | <@JAA> | So you can still end up with broken pages everywhere. |
| 00:32:16 | <hlgs|m> | ah right, ughh |
| 00:33:23 | <@JAA> | It's fun when you SPN something, it looks fine, and then some days later it breaks. Because it was actually embedding an old working snapshot of something (like an image or stylesheet), and in the following days, that something got rearchived in a broken state. |
| 00:33:27 | <hlgs|m> | in which case, ideally i'd just keep one copy of the wrapper because of the usefulness of the original post being linked, and convert all the rest into proper images or just wipe them all (and somehow keep the wrapper copy further away from the post than any images...) |
| 00:33:43 | <hlgs|m> | damn. does archivebot not have this issue? |
| 00:34:11 | <@JAA> | It's purely a WBM issue on playback because it mixes various data sources and uses that 'closest timestamp' stuff. |
| 00:34:25 | <hlgs|m> | well, for me what matters is just that the images and other data is somewhere on the archive and that i can access it with some inspect element digging from the post |
| 00:34:30 | <@JAA> | Isolated archives from AB don't have this problem, but when they're in the WBM, it can still happen. |
| 00:34:36 | <alexshpilkin> | imer: in any case thank you :) |
| 00:34:37 | <hlgs|m> | makes sense |
| 00:35:38 | <@JAA> | Worth mentioning that AB does breadth-first recursion, so embedded images are sometimes archived *much* later than the page. |
| 00:35:41 | <hlgs|m> | so... i suppose i could just run my list of urls that may-or-may-not be broken through archivebot to make sure there's at least one working backup somewhere? and hope the WBM team figures out some solution for this later... |
| 00:35:43 | <@JAA> | Like, can be weeks later. |
| 00:35:53 | <hlgs|m> | good to know |
| 00:36:05 | <@JAA> | Again, not a problem in isolation, can be a problem in the WBM or if the embedded things vanish in the meantime. |
| 00:36:15 | | alexshpilkin just went to a NixOS channel for a second and ended up investigating *a bug in bash* of all things for two hours, sorry imer |
| 00:36:26 | <hlgs|m> | i took the time to get the direct urls so they'd be prioritised now as they're most at risk (aside from people just deleting posts before i can get to them, which is annoying) |
| 00:37:34 | <nicolas17> | JAA: for a moment I thought, wouldn't breadth first get the images before recursing deep into links? but that's assuming the page is the root of the tree... |
| 00:38:00 | <@JAA> | alexshpilkin: Heh, I've encountered a bunch of weird stuff in Bash that turned out to be intentional/correct behaviour, but I found my first bug a couple weeks ago that also made me bang my head against the wall for hours. (Still need to write an email to bug-bash though.) |
| 00:38:04 | <hlgs|m> | for the moment then... could i get some help running the url list through archivebot? not sure how long it'll take and how many resources for 60k direct image urls, hopefully not that much. i've got to leave town again tomorrow so i can't get started on setting anything up myself sadly |
| 00:38:38 | <@JAA> | nicolas17: There is no distinction between links and page requisites as far as the recursion is concerned. |
| 00:39:20 | <@JAA> | They both just get added to the end of the queue. |
| 00:39:25 | <nicolas17> | yeah |
| 00:39:32 | <nicolas17> | (maybe there should be a distinction) |
| 00:39:44 | <@JAA> | hlgs|m: Well, as nicolas17 said, upload a list. :-) |
| 00:39:48 | <hlgs|m> | https://transfer.archivete.am/gTKal/tumblr_media_urls.txt |
| 00:40:03 | <nicolas17> | I just meant, if you start on a page, you'd soon get its images (and links), before going into a rabbit hole following links |
| 00:40:04 | <imer> | alexshpilkin: no worries, still waiting on an ftp listing to finish (that had some random uiuc.edu mirrors) and then i'm out of leads unfortunately |
| 00:40:29 | <hlgs|m> | thank you all so much for being so helpful, by the way. been stressful doing so much emergency archival over the past month but you here have taken some weight off my shoulders |
| 00:40:48 | <@JAA> | nicolas17: That's true at the beginning, but when the queue is already in the millions, well, it'll take a while until it gets to those images. |
| 00:40:50 | <nicolas17> | but that assumes that page is where you *start*, if you're several levels deep it won't work that way, it has to get all the level n links from all sorts of unrelated pages before it even starts with n+1 where the image is :) |
| 00:41:20 | <@JAA> | But URL prioritisation is something I've partially implemented, and prioritising page requisites is high on the wishlist. |
| 00:41:30 | <@JAA> | Soon™ |
| 00:42:43 | <hlgs|m> | woo |
| 00:42:50 | <alexshpilkin> | imer: that’s honestly $leads leads more than I expected, so cheers |
| 00:43:15 | <hlgs|m> | okay, thanks for the help, going afk for a while now |
| 00:43:22 | <alexshpilkin> | the csrd.uiuc.edu FTP seems to have had different subdomains under that over the years fwiw |
| 00:44:28 | <alexshpilkin> | a note from 2000 mentions sp2.csrd.uiuc.edu for example |
| 01:24:38 | <fireonlive> | SketchCow: just a heads up the discord invite link expired, unsure if that's intentional tho |
| 01:41:02 | <alexshpilkin> | JAA: the secret ingredient is getting someone else to write and send the email for you :) |
| 01:41:26 | <alexshpilkin> | (to be fair, that person discovered the bug) |
| 01:42:25 | <@JAA> | And miss out on all that street cred‽ |
| 02:03:16 | <alexshpilkin> | ... send and cc you on it :P |
| 02:34:25 | | Rotietip joins |
| 02:35:05 | <Rotietip> | Hello all, a few weeks ago I uploaded https://archive.org/details/epsonianos which contains a WARC file from epsonianos.com, but when I checked https://web.archive.org/web/collections/20180000000000*/http://epsonianos.com/ it seems that its content has not been indexed yet. Why is that? I made sure to set "mediatype:web" when I created the item. |
| 02:38:30 | <nicolas17> | WARCs uploaded by regular users to regular collections don't appear in the WBM |
| 02:39:26 | <nicolas17> | as said earlier today here, "items need to go into special collections to be ingested into the WBM. Because allowing anyone to ingest WARCs allows anyone to fake WBM snapshots, you need special permissions for those collections." |
| 02:40:38 | <Rotietip> | Well, how do I make them appear or who do I have to contact for that? |
| 02:41:14 | <nicolas17> | how do people know your WARC is a legitimate and accurate archive of the website? :) |
| 02:42:44 | <Rotietip> | Perhaps by checking the file type and reviewing the first few lines of the file (in addition to the CDX file)? |
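To be clear, that kind of check is what anyone can do locally, not how the WBM decides trust. Peeking at the first record of a .warc.gz only needs `gzip` from the stdlib, since a valid WARC opens with a `WARC/1.0` (or `WARC/1.1`) version line, usually followed by a warcinfo record. A quick sanity-check sketch:

```python
import gzip
from itertools import islice

def warc_head(path, n=8):
    """Return the first n header lines of a gzipped WARC.
    A plausible file starts with 'WARC/1.0' or 'WARC/1.1'."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\r\n") for line in islice(f, n)]
```

Of course, as nicolas17's point implies, a forged WARC would pass this kind of surface inspection too, which is exactly why the WBM restricts ingestion.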
| 02:55:00 | <vokunal|m> | Ragtag actually sounds kind of fun to archive (as someone that doesn't have to code it). I've been liking the mediafire archive for a while since it's much heavier on filesize than other projects. Makes it feel like i'm contributing more |
| 03:00:27 | <TheTechRobo> | Rotietip: Unfortunately the WBM only allows certain people to put WARCs into Wayback Machine-ingesting collections. That's because there's no good way to tell if a WARC file has been modified. |
| 03:00:42 | <TheTechRobo> | If they let just anyone put stuff into the WBM, then someone could fake a snapshot. |
| 03:08:53 | <vokunal|m> | I'm not sure how hard it would be to set up something that could grab these quickly, but this spreadsheet has links to direct downloads for every mkv in their database apparently. https://ragtag.link/archive-videos |
| 03:14:24 | <vokunal|m> | Is that something that could get sent into urls in small batches, or to AB in batches? |
| 03:28:48 | <@JAA> | We're not going to archive 1.4 PB through AB. |
| 03:29:12 | <@JAA> | (Amazing that I actually have to type that out.) |
| 03:29:46 | <@JAA> | All of AB's 9-year crawls are only 3.1 PiB... |
| 03:29:48 | <vokunal|m> | yeah that was a dumb question |
| 03:29:56 | <nicolas17> | for starters 1.4PB is way into "formally ask Internet Archive for approval" range |
| 03:30:21 | <@JAA> | More into 'we need to filter this down to something that's actually reasonable' territory. |
| 03:30:33 | <nicolas17> | the imgur project is at 520TB |
| 03:30:35 | <SketchCow> | I'll probably make the discord link perm soon. |
| 03:31:00 | <@JAA> | Do we know the size of the videos that are no longer on YouTube? |
| 03:32:53 | <fireonlive> | thanks sketch |
| 03:44:55 | <vokunal|m> | At a glance, it seems every video in that list is either private or unlisted. It'll take me a minute, but I can try to narrow it down to only private or deleted videos |
| 03:50:34 | <vokunal|m> | The total size of all unlisted and private videos seems to be 49072 GB |
| 03:57:59 | <nicolas17> | aaaaugh how do I find someone's tweet after he disabled his twitter account because of Musk? >_< |
| 04:22:58 | <Rotietip> | TheTechRobo, nicolas17 had mentioned that an item must be in certain collections in order to be indexed in Wayback Machine. Is there a way to contact the owners of some of those collections to ask them to add an item? |
| 04:25:53 | <nicolas17> | JAA: we need help explaining this :p |
| 04:29:58 | <nicolas17> | "Accepting WARCs from random people would make the WBM useless because anyone could insert manipulated data. You can still upload them to IA, but they won't be in the WBM." |
| 04:35:33 | <Rotietip> | That's why I was asking if there is a way to request permission or something like that. |
| 04:38:09 | <vokunal|m> | That'd still be a random person asking verified person to upload it to IA for them. Same problem |
| 04:38:36 | <Rotietip> | Anyway another approach occurs to me. Do you know any online viewer for WARC files? I tried with https://replayweb.page/ but when I try to upload the file from Internet Archive I get this error: "An unexpected error occured: TypeError: Failed to fetch" |
| 04:39:17 | <nicolas17> | Rotietip: if someone could give you permission, how would they know they can trust you and your data? |
| 04:42:36 | <Rotietip> | In the case of epsonianos.com, just check the CDX: you can see it's a forum that I downloaded in 2018, while the live site currently shows only the hosting provider's default page. |
| 05:41:43 | <pabs> | Reddits organising a strike https://news.ycombinator.com/item?id=36187705 |
| 05:41:55 | <pabs> | https://old.reddit.com/r/LifeProTips/comments/140b6q6/rlifeprotips_will_be_going_dark_from_june_1214_in/ |
| 05:42:27 | <nicolas17> | it has been talked about in #shreddit |
| 05:42:38 | <nicolas17> | pabs: https://news.ycombinator.com/item?id=36192312 "My Reddit account was banned after adding my subs to the protest" |
| 05:43:02 | <pabs> | ah |
| 06:42:05 | <vokunal|m> | JAA: https://transfer.archivete.am/uK5k0/ragtag-non-200-videos.csv Here's a csv of every video that's privated or deleted from that list, its equivalent youtube id, and its filesize. Total is 38,574GB. And here's a plain txt of all the direct download urls, https://transfer.archivete.am/HA8XK/ragtag_deleted_videos.txt |
| 06:43:13 | <vokunal|m> | It took a little longer than a minute, but I don't actually know python. Just a chatgpt wizard apparently |
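The filtering step described here can be sketched in a few lines of stdlib Python. The column names (`status`, `filesize`) are guesses at what a Ragtag-style CSV might use and would need adjusting to the real header:

```python
import csv
import io

def deleted_rows(csv_text, status_col="status", size_col="filesize"):
    """Keep rows whose YouTube status is not 200 (private/deleted)
    and total their size in GB. Column names are assumptions."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [r for r in rows if r[status_col] != "200"]
    total_gb = sum(int(r[size_col]) for r in kept) / 10**9
    return kept, total_gb
```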
| 08:15:24 | <Rotietip> | Just out of curiosity, can ArchiveBot be given a referrer and a cookie when downloading something? |
| 08:15:59 | <pabs> | no for the cookie, I assume also no for the referrer. |
| 08:20:18 | <@rewby|backup> | Grabsite can do cookies |
| 08:21:00 | <@rewby|backup> | But if you wanna do huge archives like 1.4pb. that'd require talking to the IA first of all |
| 08:22:41 | <@rewby|backup> | And it's well within DPoS territory at that point |
| 08:22:48 | <@rewby|backup> | No way AB can do it |
| 08:26:51 | <flashfire42> | !a http://1945.melbourne/ --concurrency 1 |
| 08:27:08 | <@rewby|backup> | Wrong channel |
| 08:28:36 | <pabs> | does ArchiveTeam have any way to archive rsync servers? for eg these datasets: rsync sourceware.org:: |
| 08:39:12 | <Rotietip> | The thing is, I would like to archive some boards from https://8chan.moe/ so they're accessible from the Wayback Machine, but that only works if you send https://8chan.moe as the Referer and the cookie disclaimer2=1. How feasible would it be to do this? Otherwise it shows a fucking disclaimer like http://web.archive.org/web/20230413194626/https://8chan.moe/ and all the images and thumbnails come out the same as https://8chan.moe/.media/t_1e724d164bee05ea1d9c2c069172b916212f6742e07b6230194d3a4bb34f953a |
| 08:39:12 | <Rotietip> | According to my estimates, what I am interested in archiving would be between 70 and 80 GB (although if you want to archive the whole site I won't stop you). |
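For a one-off check outside ArchiveBot that those two request properties really are all the site needs, a stdlib sketch (the header values are the ones quoted in the message above; whether they suffice is untested here):

```python
import urllib.request

def chan_request(url):
    """Build a request carrying the two things the message above says
    the site checks: its own Referer and the disclaimer2=1 cookie."""
    return urllib.request.Request(url, headers={
        "Referer": "https://8chan.moe/",
        "Cookie": "disclaimer2=1",
    })

# Live usage (network):
#   body = urllib.request.urlopen(chan_request(url)).read()
```

Note that plain urllib won't manage cookies across redirects; for an actual crawl, grab-site's cookie support (mentioned below by rewby) is the right tool.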
| 09:19:34 | <masterx244|m> | those sites are the worst. and a WARC to download on archive.org is better than no archive at all. |
| 17:34:46 | <audrooku|m> | Would it be redundant to mention the ragtag archive in #archiveteam at this point? |
| 17:50:32 | <vokunal|m> | not sure. I pasted a list of every private or deleted url from their site |
| 17:51:03 | <vokunal|m> | I think Jaa'll be on it when they get the time |
| 17:52:12 | <@arkiver> | i didn't follow this |
| 17:54:27 | <@arkiver> | can someone please give me a tl;dr on ragtag? |
| 17:56:27 | <BigBrain> | vtuber archive, around 1.4PB, has a lot of lost media |
| 17:56:47 | <BigBrain> | all of it yt i think |
| 17:59:16 | <BigBrain> | shutting down on or before july 24, dumped full database with metadata and has "compiled a list of videos that are no longer available on YouTube, but are still available in Ragtag Archive" in a csv |
| 17:59:31 | <@arkiver> | thank you |
| 17:59:48 | <BigBrain> | np |
| 18:57:06 | <@JAA> | arkiver: And ~38 TB for the stuff on Ragtag Archive that's no longer on YouTube, which would be more feasible. |