| 00:06:10 | <nicolas17> | seems the stackexchange dump has been uploaded: https://meta.stackexchange.com/a/390200/285481 |
| 00:07:23 | <@arkiver> | are they crawling back because of what happens with Reddit? :P |
| 00:09:06 | | tzt (tzt) joins |
| 00:09:39 | <@JAA> | Great, mirroring it to a separate item. :-) |
| 00:09:54 | | driib quits [Remote host closed the connection] |
| 00:10:12 | | driib (driib) joins |
| 00:13:02 | <@arkiver> | haha awesome :P |
| 00:13:59 | <@JAA> | Running, will be at https://archive.org/details/stackexchange_20230614 eventually. |
| 00:18:55 | | quackifi quits [Client Quit] |
| 00:24:02 | | driib quits [Remote host closed the connection] |
| 00:24:21 | | driib (driib) joins |
| 00:39:25 | | driib quits [Remote host closed the connection] |
| 00:39:44 | | driib (driib) joins |
| 00:40:50 | | TTN quits [Remote host closed the connection] |
| 00:43:10 | | TNN joins |
| 00:45:47 | | AmAnd0A joins |
| 00:52:09 | | TNN is now authenticated as TNN |
| 00:53:13 | <fireonlive> | oooooh, that's interesting |
| 00:53:45 | | driib quits [Remote host closed the connection] |
| 00:54:03 | | driib (driib) joins |
| 00:57:40 | | TNN quits [Changing host] |
| 00:57:40 | | TNN (TNN) joins |
| 00:58:42 | | TNN quits [Remote host closed the connection] |
| 00:59:08 | | TTN joins |
| 01:05:36 | | driib quits [Remote host closed the connection] |
| 01:05:36 | | TTN quits [Remote host closed the connection] |
| 01:05:56 | | sarayalth (sarayalth) joins |
| 01:05:56 | | driib (driib) joins |
| 01:09:11 | | Ruthalas5 (Ruthalas) joins |
| 01:17:24 | | driib quits [Remote host closed the connection] |
| 01:17:42 | | driib (driib) joins |
| 01:19:18 | | skyrocket joins |
| 01:28:38 | <h2ibot> | PaulWise edited Mailman2 (+52, save the chiark lists): https://wiki.archiveteam.org/?diff=49969&oldid=49949 |
| 01:29:10 | | driib quits [Remote host closed the connection] |
| 01:29:30 | | driib (driib) joins |
| 01:40:41 | <h2ibot> | PaulWise edited Mailman2 (+111, dyne lists): https://wiki.archiveteam.org/?diff=49970&oldid=49969 |
| 01:40:42 | <h2ibot> | PaulWise edited Mailman2 (+0, woops, dyne not done yet): https://wiki.archiveteam.org/?diff=49971&oldid=49970 |
| 01:41:28 | | driib quits [Remote host closed the connection] |
| 01:41:47 | | driib (driib) joins |
| 01:55:24 | | driib quits [Remote host closed the connection] |
| 01:55:45 | | driib (driib) joins |
| 02:01:41 | | sonick quits [Client Quit] |
| 02:54:44 | | Hajdar (Hajdar) joins |
| 03:02:15 | | Hajdar quits [Client Quit] |
| 03:07:08 | | nicolas17 quits [Ping timeout: 265 seconds] |
| 03:10:26 | | nicolas17 joins |
| 03:10:31 | | dumbgoy__ quits [Ping timeout: 265 seconds] |
| 03:20:58 | | Megame (Megame) joins |
| 03:23:04 | | skyrock3t joins |
| 03:23:38 | | skyrocket quits [Ping timeout: 252 seconds] |
| 03:28:58 | | Hajdar (Hajdar) joins |
| 04:25:41 | | Ivan226 joins |
| 05:04:03 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:09:26 | <fireonlive> | https://meta.miraheze.org/wiki/Miraheze_is_Not_Shutting_Down |
| 05:15:49 | | hitgrr8 joins |
| 05:28:51 | | Megame quits [Client Quit] |
| 05:32:50 | | JohnnyJ joins |
| 05:48:16 | | Arcorann (Arcorann) joins |
| 05:50:22 | | AmAnd0A quits [Remote host closed the connection] |
| 05:50:22 | | yts98 leaves |
| 05:50:31 | | yts98 joins |
| 05:50:34 | | AmAnd0A joins |
| 06:01:20 | | AlbertLarsan68 (AlbertLarsan68) joins |
| 06:08:10 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 06:20:09 | | W7RFa6AbNFz joins |
| 06:21:14 | <tech234a> | https://meta.miraheze.org/wiki/Miraheze_is_Not_Shutting_Down |
| 06:21:31 | <tech234a> | oops |
| 06:21:43 | <tech234a> | didn't scroll all the way down |
| 06:33:36 | <W7RFa6AbNFz> | Is there any information about how many concurrent dockers and concurrency inside each docker should be run for each project? |
| 06:39:59 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
| 06:40:41 | | datechnoman (datechnoman) joins |
| 06:48:17 | <vokunal|m> | Depends on the project (rate limit of the site), and the reputation of your ip. Some can only run one, and others can run 10+ |
| 06:50:14 | <vokunal|m> | For me, I can run 80 concurrent on mediafire and be completely fine, but it caps my upload if a bunch of files get dropped there. I can run 20 on Imgur or Reddit and be fine, but some had trouble with 4 or 5. ip reputation is basically the key, and how far you are from the tracker can impact it too |
| 06:59:10 | | Island quits [Read error: Connection reset by peer] |
| 07:00:03 | | nfriedly quits [Remote host closed the connection] |
| 07:08:21 | | Ketchup901 quits [Ping timeout: 245 seconds] |
| 07:09:06 | | Ketchup901 (Ketchup901) joins |
| 07:11:30 | | Ivan226 quits [Remote host closed the connection] |
| 07:11:57 | <W7RFa6AbNFz> | does the process start giving obvious errors or does it just slow down and you have to monitor it to work out the right number? |
| 07:22:09 | <vokunal|m> | it'll throw out 429s if the site starts rate limiting you, increasing to everything being 429 if you get temp banned by the site. Usually doesn't last long, but if it keeps racking up requests during that time it'll probably extend it. Best not to test right when you leave somewhere or sleep |
| 07:26:58 | <W7RFa6AbNFz> | ok will look out for those, thanks for your help. |
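A minimal sketch of the 429 check described above, assuming you can collect the HTTP status codes from recent requests. The function names, the 5% threshold, and the halving step are illustrative assumptions, not anything the Warrior or any project script actually does.

```python
def rate_limited_fraction(status_codes):
    """Return the fraction of recent responses that were HTTP 429."""
    if not status_codes:
        return 0.0
    hits = sum(1 for code in status_codes if code == 429)
    return hits / len(status_codes)


def suggest_concurrency(current, status_codes, threshold=0.05):
    """Halve concurrency when 429s exceed the (assumed) threshold.

    Never drops below 1; otherwise leaves the current value unchanged.
    """
    if rate_limited_fraction(status_codes) > threshold:
        return max(1, current // 2)
    return current
```

Backing off as soon as 429s appear, rather than waiting until everything is a 429, follows the advice above that continuing to send requests during a temporary ban probably extends it.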
| 08:07:26 | | razul joins |
| 08:28:55 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 09:03:02 | | decky_e quits [Client Quit] |
| 09:11:31 | | decky_e (decky_e) joins |
| 09:21:41 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 09:22:40 | | Ruthalas5 (Ruthalas) joins |
| 09:26:53 | | nfriedly joins |
| 10:00:01 | | railen63 quits [Remote host closed the connection] |
| 10:00:18 | | railen63 joins |
| 10:16:59 | | icedice (icedice) joins |
| 11:02:37 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 11:02:37 | | JohnnyJ quits [Read error: Connection reset by peer] |
| 11:52:34 | | railen63 quits [Remote host closed the connection] |
| 11:53:41 | | railen63 joins |
| 11:54:29 | | decky_e quits [Remote host closed the connection] |
| 12:04:15 | | justmolamola joins |
| 12:24:19 | | justmolamola quits [Remote host closed the connection] |
| 13:18:44 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:21:04 | | BigBrain quits [Remote host closed the connection] |
| 13:21:25 | | BigBrain (bigbrain) joins |
| 13:24:03 | | diggan quits [Quit: Connection closed for inactivity] |
| 13:33:53 | <h2ibot> | SaveThEWhatNow edited Google (+0, updating amount deleted): https://wiki.archiveteam.org/?diff=49972&oldid=49789 |
| 13:33:54 | <h2ibot> | Rob Kam edited WikiTeam (+850, add about MediaWiki Scraper): https://wiki.archiveteam.org/?diff=49973&oldid=48790 |
| 14:05:10 | | sonick (sonick) joins |
| 14:08:53 | | PredatorIWD quits [Quit: Leaving] |
| 14:51:21 | | pabs quits [Ping timeout: 265 seconds] |
| 14:51:58 | | pabs (pabs) joins |
| 15:10:23 | | pabs quits [Ping timeout: 252 seconds] |
| 15:13:24 | | pabs (pabs) joins |
| 15:46:19 | | Island joins |
| 16:13:47 | | LeGoupil joins |
| 16:17:18 | | LeGoupil1 joins |
| 16:18:02 | | LeGoupil quits [Ping timeout: 252 seconds] |
| 16:18:02 | | LeGoupil1 is now known as LeGoupil |
| 16:24:56 | | Dango360 quits [Read error: Connection reset by peer] |
| 16:26:52 | | Dango360 (Dango360) joins |
| 16:36:11 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 16:37:00 | | AmAnd0A joins |
| 16:37:33 | | LeGoupil quits [Ping timeout: 258 seconds] |
| 16:40:48 | | sonick quits [Client Quit] |
| 16:57:05 | | threedeeitguy quits [Ping timeout: 252 seconds] |
| 16:57:48 | | Unholy2361 quits [Remote host closed the connection] |
| 16:59:24 | | Unholy2361 (Unholy2361) joins |
| 17:05:35 | | threedeeitguy (threedeeitguy) joins |
| 17:15:08 | | threedeeitguy quits [Read error: Connection reset by peer] |
| 17:15:24 | | threedeeitguy6 (threedeeitguy) joins |
| 17:22:13 | | nostalgebraist joins |
| 17:36:00 | <nostalgebraist> | i'm running warrior in docker, and sometimes there are long pauses between successive steps in the item logs, with nothing downloading or uploading. and no message about sleeping - often it comes between one successful request and the next. it doesn't seem to be isolated to one project. what would cause this? |
| 17:37:42 | <nostalgebraist> | there are long periods where all items are like this, and then it mysteriously stops and no items are like this for a while |
| 17:39:11 | <fireonlive> | see #warrior for help |
| 17:51:39 | <h2ibot> | Bzc6p edited Network.hu (+236, Downloading content finished.): https://wiki.archiveteam.org/?diff=49974&oldid=49449 |
| 17:54:40 | <h2ibot> | Bzc6p edited Indafotó (-64, Downloading content started.): https://wiki.archiveteam.org/?diff=49975&oldid=49638 |
| 17:56:41 | <BigBrain> | stackexchange listens more to reddit community than reddit does? |
| 18:00:41 | <h2ibot> | JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49976&oldid=49965 |
| 18:14:57 | <fireonlive> | at this point i think an inanimate object would listen better to the reddit community than reddit does |
| 18:19:33 | <BigBrain> | true |
| 18:19:55 | <BigBrain> | stone would not be so toxic |
| 18:27:36 | | tiger_millionaire joins |
| 18:28:31 | | Megame (Megame) joins |
| 18:32:10 | | Twisty joins |
| 18:41:02 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 18:47:42 | | andrew quits [Client Quit] |
| 18:51:38 | | andrew (andrew) joins |
| 18:53:02 | | nicolas17 joins |
| 18:55:07 | <masterx244|m> | sucks that we couldn't suckle reddit out the slow way and instead got an immediate fire burning there... |
| 18:57:03 | <BigBrain> | if burns down we have a good chunk |
| 19:01:08 | <masterx244|m> | once [a]rkiver processed the i.redd.it linklist we got everything in some form (text content is secured by pushshift as the fallback for the stuff that we can't yank out otherwise) |
| 19:05:11 | <fireonlive> | do what to reddit >_> |
| 19:05:37 | | Urgo quits [Remote host closed the connection] |
| 19:05:58 | <BigBrain> | fireonlive: suckle it dry of data |
| 19:06:05 | <BigBrain> | hmmm yummy data ;) |
| 19:06:12 | <fireonlive> | mmmmm yummy |
| 19:06:46 | <masterx244|m> | BigBrain: aka the usual business of archiveteam |
| 19:08:25 | <BigBrain> | masterx244|m: AT is more like vacuum than slow suckle |
| 19:10:07 | <masterx244|m> | yeah, high-speed pipelines that can even get amazon to wave a white flag if it ends there unexpectedly |
| 19:10:12 | <masterx244|m> | remember imgone and commoncrawl... |
| 19:10:52 | <JTL> | 3 |
| 19:10:58 | <JTL> | oops |
| 19:26:58 | | Urgo (Urgo) joins |
| 19:41:21 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 19:41:40 | | AmAnd0A joins |
| 19:46:26 | <fireonlive> | 2 |
| 19:47:02 | <BigBrain> | 1 |
| 19:50:02 | <masterx244|m> | BOOM! |
| 19:51:31 | <JTL> | :D |
| 19:52:33 | <fireonlive> | =3 |
| 19:57:56 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 19:58:13 | | AmAnd0A joins |
| 20:02:34 | | decky_e (decky_e) joins |
| 20:06:35 | | yts98 leaves |
| 20:06:38 | | yts98 joins |
| 20:10:40 | <BigBrain> | :) |
| 20:12:58 | | decky_e quits [Read error: Connection reset by peer] |
| 20:13:17 | | decky_e (decky_e) joins |
| 20:23:53 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 20:24:24 | | AmAnd0A joins |
| 20:37:44 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 20:38:00 | | AmAnd0A joins |
| 20:39:10 | | nostalgebraist quits [Client Quit] |
| 20:57:38 | | railen63 quits [Remote host closed the connection] |
| 21:00:39 | | railen63 joins |
| 21:04:35 | | @Sanqui quits [Ping timeout: 252 seconds] |
| 21:04:35 | | lunik173 quits [Ping timeout: 252 seconds] |
| 21:24:59 | <@arkiver> | hello all |
| 21:25:01 | <@arkiver> | on ragtag |
| 21:25:13 | <@arkiver> | an archive of "vtuber". assume this is meant? https://en.wikipedia.org/wiki/VTuber |
| 21:25:25 | | Sanqui joins |
| 21:25:35 | <@arkiver> | 38 TB for 14k videos that are actually not on IA |
| 21:25:37 | <@arkiver> | err |
| 21:25:40 | <@arkiver> | not on YouTube* |
| 21:26:15 | <@arkiver> | are all these videos from "vtubers"? |
| 21:26:38 | | Sanqui is now authenticated as Sanqui |
| 21:26:38 | | Sanqui quits [Changing host] |
| 21:26:38 | | Sanqui (Sanqui) joins |
| 21:26:38 | | @ChanServ sets mode: +o Sanqui |
| 21:26:45 | <@arkiver> | imer noted that these videos have been deleted from youtube when vtubers switch avatars? |
| 21:26:51 | <imer> | https://old.reddit.com/r/DataHoarder/comments/143zvuh/ragtag_archive_is_going_offline_138_pb_of_vtuber/ there's some context here, albeit quite opinionated comments, apparently |
| 21:27:28 | <imer> | Archivist (think that's one of the-eye people?): "/u/textfiles will tell me if I'm allowed put a copy on archive.org down the line. I think ArchiveTeam ripping from the site is just going to be a headache given it's served from google, if anything maybe they just rip the site and not the video and I deliver the video separately?" |
| 21:27:42 | <@arkiver> | yes |
| 21:29:18 | <@arkiver> | are copyright strikes not the reason for data deletion from youtube? |
| 21:30:41 | <imer> | https://old.reddit.com/r/DataHoarder/comments/143zvuh/ragtag_archive_is_going_offline_138_pb_of_vtuber/jnern6f/ think this is what I remembered with the person quitting = channel deletion |
| 21:31:08 | <imer> | got no involvement in vtuber stuff so thats all I know |
| 21:31:28 | <@arkiver> | thanks a lot |
| 21:31:36 | <nicolas17> | good god the URL misled me |
| 21:31:41 | <nicolas17> | "138PB?!?!?" |
| 21:31:46 | <@arkiver> | if this is correct about the 35 TB, we can put it on IA |
| 21:32:14 | <@arkiver> | it would be best to follow the identifier and metadata pattern that is followed by tubeup for this data to fit in |
| 21:32:34 | <@arkiver> | and i cannot make promises about the accessibility of the data, since there may be content problems found later on |
| 21:33:10 | <imer> | "content problems"? as in the copyright strikes you mentioned? |
| 21:33:19 | <@arkiver> | aybe |
| 21:33:21 | <@arkiver> | maybe |
| 21:34:02 | <@arkiver> | (disclaimer that i speak not for IA, this is personal opinions/thoughts/etc.) |
| 21:35:07 | <@arkiver> | vokunal|m: ^ |
| 21:35:42 | <@arkiver> | i won't have time to work on this, but if someone wants to do that for the 35 TB (or 38?) please do - with tubeup identifiers and similar metadata |
| 21:36:17 | <@arkiver> | there is also metadata for all other videos? that might be interesting to preserve. the outlinks we can find could go into the #// project |
| 21:37:31 | <nicolas17> | [21:38] <vokunal|m> Here's a link to all the metadata on the site. https://ragtag.link/archive-database |
| 21:37:33 | <nicolas17> | [21:38] <vokunal|m> I don't know how to sort through that, but if someone has the knowhow on how to grab the info, each line has a video id, and it would be easy to grab only the data you need from it. it's in NDJSON |
| 21:37:41 | <nicolas17> | do you need some data-processing of that metadata dump? |
| 21:40:02 | <@arkiver> | only 189 MB? |
| 21:40:06 | <@arkiver> | we can put that in ArchiveBot |
| 21:40:17 | <@arkiver> | (done) |
| 21:40:27 | <nicolas17> | that's (uncompressed) a 725MB ndjson with the metadata |
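A rough sketch of pulling video ids out of an NDJSON dump like the one described above, where each line is one JSON object. The key name `video_id` is a hypothetical placeholder; the real field name in the ragtag dump is not stated in this log and would need to be checked against the file.

```python
import json


def iter_records(lines):
    """Yield one parsed JSON object per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


def collect_video_ids(lines, id_key="video_id"):
    """Collect ids from records that have id_key (assumed field name)."""
    return [rec[id_key] for rec in iter_records(lines) if id_key in rec]
```

Because `iter_records` accepts any iterable of lines, an open file handle can be passed directly, so a 725 MB file never has to be loaded into memory at once.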
| 21:48:14 | <@JAA> | That one was already archived when they announced the shutdown, but I guess now we have two copies. :-) |
| 21:48:24 | <@arkiver> | yay two copies |
| 21:48:34 | <nicolas17> | for each video there's an "info" json file like this https://content.archive.ragtag.moe/gd:1X8hZ10XrxwI5X14F5QwWUR3_YKnwRDcP/0Ibafh7x_ow/0Ibafh7x_ow.info.json |
| 21:51:53 | <nicolas17> | for some there's a giant json with chat replay |
| 21:52:00 | <vokunal|m> | arkiver The 38TB varies. They're any video that doesn't show up anymore. Sifting through randomly, looks like some are banned channels, some are deleted channels, some deleted videos, some private videos. The original list ragtag gave out had 19,246 videos, ~5K were simply unlisted, total 49TB. The 14K list I sent was everything minus unlisted videos and videos that are now public |
| 21:53:20 | <nicolas17> | archive-database has 219259 videos listed |
| 21:53:54 | <nicolas17> | so you filtered out the ones still available on youtube, and the result was 49TB? |
| 21:54:57 | <vokunal|m> | They gave out this datadump which I assume has or had 0% public videos in it when they put it up. I don't know of a way to distinguish between public and unlisted. We can archive either, but I sorted it by all non unlisted in the files I gave https://ragtag.link/archive-videos |
| 21:55:47 | <nicolas17> | ah so that's their dump, okay |
| 21:56:11 | <vokunal|m> | Based on their announcement, it seems to be a complete list of everything they had that's no longer public |
| 21:56:40 | <nicolas17> | to make our own check of what's public/unlisted/private, we'd need to hit some YouTube API 221259 times :) |
| 21:58:18 | <imer> | I used thumbnails for checking if videos were inaccessible (so deleted or private), unlisted ones will have their thumbnail public still though |
| 21:58:51 | <imer> | https://i.ytimg.com/vi/ID_GOES_HERE/hqdefault.jpg |
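A minimal sketch of the thumbnail check imer describes, using the URL pattern above. Treating HTTP 404 as "deleted or private" is an assumption made for illustration, and as noted above unlisted videos still have a public thumbnail, so this cannot tell unlisted from public. The function names are made up for this example.

```python
import urllib.error
import urllib.request

THUMBNAIL_TEMPLATE = "https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"


def thumbnail_url(video_id):
    """Build the thumbnail URL for a YouTube video id."""
    return THUMBNAIL_TEMPLATE.format(video_id=video_id)


def http_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the code is what we want here.
        return error.code


def looks_inaccessible(video_id, fetch_status=http_status):
    """Guess that a video is deleted/private if its thumbnail is a 404.

    The 404 rule is an assumption; fetch_status is injectable so the
    logic can be exercised without network access.
    """
    return fetch_status(thumbnail_url(video_id)) == 404
```

For a list of roughly 220k ids this would still mean one request per video, so it would need throttling in practice.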
| 21:59:09 | <nicolas17> | (looking at some of the videos, I feel like a good video codec could compress this a *lot* :P but I know that's not the "preservation" way to go) |
| 22:00:04 | | Hajdar quits [Remote host closed the connection] |
| 22:00:20 | | Hajdar (Hajdar) joins |
| 22:01:31 | <vokunal|m> | A lot of the videos are just a static image with asmr going on, so I'd imagine those would be super compressible |
| 22:02:22 | <nicolas17> | even with movement, some vtubers seemed super cartoony with super flat colors which also compresses well |
| 22:02:56 | <nicolas17> | one had a video effect of leaves or flower petals or something constantly falling in the foreground, that's basically noise and it's *terrible* for compression :D |
| 22:03:22 | <imer> | recompressing 35tb of video is going to take *forever* though |
| 22:03:37 | <nicolas17> | for sure |
| 22:04:48 | <nicolas17> | anyway, this seemed to end in "if someone wants to archive it please do" but I'm unclear on *what* needs to be done |
| 22:06:20 | | dumbgoy__ joins |
| 22:09:47 | | za3k joins |
| 22:10:20 | <fireonlive> | i think arkiver meant anyone can do it (but please be sure to follow the identifier/metadata patterns that tubeup uses so it's easily findable) |
| 22:10:35 | | yawkat quits [Ping timeout: 252 seconds] |
| 22:10:46 | <fireonlive> | aka pls do it with that in mind but otherwise it's up for grabs for someone who feels like it |
| 22:11:47 | <nicolas17> | yes but I don't know what needs to be done exactly |
| 22:12:07 | <nicolas17> | would this be uploaded as an archive.org item or as WARCs for WBM? |
| 22:12:50 | <vokunal|m> | Does archivebot have a way to grab the site without grabbing the videos? Just to save the site itself. |
| 22:14:43 | <vokunal|m> | I don't have a clue where to start on trying to get the comments |
| 22:15:16 | <fireonlive> | hm i think 3rd party would preclude WBM inclusion |
| 22:15:27 | <fireonlive> | but ye let's see |
| 22:18:41 | <vokunal|m> | Oh. seems not as hard as I thought. All chat downloads have .chat.json at the end of the url |
| 22:18:55 | <nicolas17> | that's chat replay yes |
| 22:19:00 | <nicolas17> | the .info.json has video comments |
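A small sketch of building the per-video sidecar URLs, based on the one `.info.json` example URL pasted earlier in this log. That the `.chat.json` file sits in the same directory with the same naming is an assumption, and `drive_id` is just a label chosen here for the segment after `gd:`.

```python
CONTENT_BASE = "https://content.archive.ragtag.moe"


def sidecar_url(drive_id, video_id, kind):
    """Build the URL of a video's 'info' or 'chat' JSON file.

    The layout is inferred from a single example URL; the chat layout
    is assumed to mirror the info one.
    """
    if kind not in ("info", "chat"):
        raise ValueError(f"unknown sidecar kind: {kind!r}")
    return f"{CONTENT_BASE}/gd:{drive_id}/{video_id}/{video_id}.{kind}.json"
```

Not every video has a chat replay, so a missing `.chat.json` would be expected for some ids.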
| 22:19:48 | <fireonlive> | darn google >_> |
| 22:24:29 | | yawkat (yawkat) joins |
| 22:26:19 | | Island quits [Read error: Connection reset by peer] |
| 22:27:52 | | lennier2 joins |
| 22:28:14 | | Island joins |
| 22:30:36 | | lennier1 quits [Ping timeout: 258 seconds] |
| 22:30:44 | | lennier2 is now known as lennier1 |
| 22:35:49 | | hitgrr8 quits [Client Quit] |
| 22:46:41 | <fireonlive> | fuck sake |
| 22:46:46 | <fireonlive> | oops wrong window |
| 22:48:31 | <h2ibot> | KamafaDelgato edited Miraheze (+3): https://wiki.archiveteam.org/?diff=49977&oldid=49961 |
| 23:01:50 | | lunik173 joins |
| 23:06:16 | <@arkiver> | yeah let's not archive the videos into WARCs |
| 23:18:23 | | icedice quits [Client Quit] |
| 23:22:21 | | AmAnd0A quits [Ping timeout: 258 seconds] |
| 23:23:03 | | AmAnd0A joins |
| 23:23:55 | | G4te_Keep3r3492 quits [Quit: The Lounge - https://thelounge.chat] |
| 23:24:32 | | G4te_Keep3r3492 joins |
| 23:25:07 | | MactasticMendez quits [Quit: Connection closed for inactivity] |
| 23:34:37 | | AmAnd0A quits [Ping timeout: 258 seconds] |
| 23:34:42 | | AmAnd0A joins |
| 23:39:20 | | Steamy joins |
| 23:39:29 | | Steamy quits [Remote host closed the connection] |
| 23:57:14 | <Misty> | @nicolas17 @imer @arkiver r/datahoarder's -Archivist is now downloading and going to host ragtag's archive |
| 23:57:36 | <Misty> | also CC @vokunal|m |
| 23:59:07 | <Misty> | current status of ragtag's archive: 1. archivist has a full local backup 2. I have a full backup on GDrive 3. RagTag's own GSuite will still last, just turned read-only 4. some of the precious videos have been made into torrents and are being seeded |