00:06:10<nicolas17>seems the stackexchange dump has been uploaded: https://meta.stackexchange.com/a/390200/285481
00:07:23<@arkiver>are they crawling back because of what happens with Reddit? :P
00:09:06tzt (tzt) joins
00:09:39<@JAA>Great, mirroring it to a separate item. :-)
00:09:54driib quits [Remote host closed the connection]
00:10:12driib (driib) joins
00:13:02<@arkiver>haha awesome :P
00:13:59<@JAA>Running, will be at https://archive.org/details/stackexchange_20230614 eventually.
00:18:55quackifi quits [Client Quit]
00:24:02driib quits [Remote host closed the connection]
00:24:21driib (driib) joins
00:39:25driib quits [Remote host closed the connection]
00:39:44driib (driib) joins
00:40:50TTN quits [Remote host closed the connection]
00:43:10TNN joins
00:45:47AmAnd0A joins
00:53:13<fireonlive>oooooh, that's interesting
00:53:45driib quits [Remote host closed the connection]
00:54:03driib (driib) joins
00:57:40TNN quits [Changing host]
00:57:40TNN (TNN) joins
00:58:42TNN quits [Remote host closed the connection]
00:59:08TTN joins
01:05:36driib quits [Remote host closed the connection]
01:05:36TTN quits [Remote host closed the connection]
01:05:56sarayalth (sarayalth) joins
01:05:56driib (driib) joins
01:09:11Ruthalas5 (Ruthalas) joins
01:17:24driib quits [Remote host closed the connection]
01:17:42driib (driib) joins
01:19:18skyrocket joins
01:28:38<h2ibot>PaulWise edited Mailman2 (+52, save the chiark lists): https://wiki.archiveteam.org/?diff=49969&oldid=49949
01:29:10driib quits [Remote host closed the connection]
01:29:30driib (driib) joins
01:40:41<h2ibot>PaulWise edited Mailman2 (+111, dyne lists): https://wiki.archiveteam.org/?diff=49970&oldid=49969
01:40:42<h2ibot>PaulWise edited Mailman2 (+0, woops, dyne not done yet): https://wiki.archiveteam.org/?diff=49971&oldid=49970
01:41:28driib quits [Remote host closed the connection]
01:41:47driib (driib) joins
01:55:24driib quits [Remote host closed the connection]
01:55:45driib (driib) joins
02:01:41sonick quits [Client Quit]
02:54:44Hajdar (Hajdar) joins
03:02:15Hajdar quits [Client Quit]
03:07:08nicolas17 quits [Ping timeout: 265 seconds]
03:10:26nicolas17 joins
03:10:31dumbgoy__ quits [Ping timeout: 265 seconds]
03:20:58Megame (Megame) joins
03:23:04skyrock3t joins
03:23:38skyrocket quits [Ping timeout: 252 seconds]
03:28:58Hajdar (Hajdar) joins
04:25:41Ivan226 joins
05:04:03BlueMaxima quits [Read error: Connection reset by peer]
05:09:26<fireonlive>https://meta.miraheze.org/wiki/Miraheze_is_Not_Shutting_Down
05:15:49hitgrr8 joins
05:28:51Megame quits [Client Quit]
05:32:50JohnnyJ joins
05:48:16Arcorann (Arcorann) joins
05:50:22AmAnd0A quits [Remote host closed the connection]
05:50:22yts98 leaves
05:50:31yts98 joins
05:50:34AmAnd0A joins
06:01:20AlbertLarsan68 (AlbertLarsan68) joins
06:08:10qwertyasdfuiopghjkl quits [Remote host closed the connection]
06:20:09W7RFa6AbNFz joins
06:21:14<tech234a>https://meta.miraheze.org/wiki/Miraheze_is_Not_Shutting_Down
06:21:31<tech234a>oops
06:21:43<tech234a>didn't scroll all the way down
06:33:36<W7RFa6AbNFz>Is there any information about how many concurrent Docker containers, and what concurrency inside each container, should be run for each project?
06:39:59datechnoman quits [Quit: The Lounge - https://thelounge.chat]
06:40:41datechnoman (datechnoman) joins
06:48:17<vokunal|m>Depends on the project (rate limit of the site), and the reputation of your ip. Some can only run one, and others can run 10+
06:50:14<vokunal|m>For me, I can run 80 concurrent on mediafire and be completely fine, but it caps my upload if a bunch of files get dropped there. I can run 20 on Imgur or Reddit and be fine, but some had trouble with 4 or 5. ip reputation is basically the key, and how far you are from the tracker can impact it too
06:59:10Island quits [Read error: Connection reset by peer]
07:00:03nfriedly quits [Remote host closed the connection]
07:08:21Ketchup901 quits [Ping timeout: 245 seconds]
07:09:06Ketchup901 (Ketchup901) joins
07:11:30Ivan226 quits [Remote host closed the connection]
07:11:57<W7RFa6AbNFz>Does the process start giving obvious errors or does it just slow down and you have to monitor it to work out the right number?
07:22:09<vokunal|m>it'll throw out 429s if the site starts rate limiting you, increasing to everything being 429 if you get temp banned by the site. Usually doesn't last long, but if it keeps racking up requests during that time it'll probably extend it. Best not to test right when you leave somewhere or sleep
07:26:58<W7RFa6AbNFz>ok will look out for those, thanks for your help.
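The warrior pipeline handles rate limiting on its own, but the behaviour vokunal|m describes (429s escalating to a temp ban, backing off rather than racking up requests) can be sketched as a plain exponential backoff. The function names and the 5s/300s bounds here are illustrative choices, not anything the warrior actually uses:

```python
import time
import urllib.request
import urllib.error

def backoff_delay(attempt, base=5, cap=300):
    """Exponential backoff: 5s, 10s, 20s, ... capped at 5 minutes."""
    return min(base * (2 ** attempt), cap)

def fetch_with_backoff(url, max_attempts=5):
    """Fetch a URL, sleeping progressively longer on each HTTP 429.

    Hammering through a 429 tends to extend the ban, so each retry
    waits longer than the last instead of retrying immediately.
    """
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise  # a real error, not rate limiting
            time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```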
08:07:26razul joins
08:28:55qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:03:02decky_e quits [Client Quit]
09:11:31decky_e (decky_e) joins
09:21:41Ruthalas5 quits [Ping timeout: 252 seconds]
09:22:40Ruthalas5 (Ruthalas) joins
09:26:53nfriedly joins
10:00:01railen63 quits [Remote host closed the connection]
10:00:18railen63 joins
10:16:59icedice (icedice) joins
11:02:37Jonboy345 quits [Read error: Connection reset by peer]
11:02:37JohnnyJ quits [Read error: Connection reset by peer]
11:52:34railen63 quits [Remote host closed the connection]
11:53:41railen63 joins
11:54:29decky_e quits [Remote host closed the connection]
12:04:15justmolamola joins
12:24:19justmolamola quits [Remote host closed the connection]
13:18:44Arcorann quits [Ping timeout: 252 seconds]
13:21:04BigBrain quits [Remote host closed the connection]
13:21:25BigBrain (bigbrain) joins
13:24:03diggan quits [Quit: Connection closed for inactivity]
13:33:53<h2ibot>SaveThEWhatNow edited Google (+0, updating amount deleted): https://wiki.archiveteam.org/?diff=49972&oldid=49789
13:33:54<h2ibot>Rob Kam edited WikiTeam (+850, add about MediaWiki Scraper): https://wiki.archiveteam.org/?diff=49973&oldid=48790
14:05:10sonick (sonick) joins
14:08:53PredatorIWD quits [Quit: Leaving]
14:51:21pabs quits [Ping timeout: 265 seconds]
14:51:58pabs (pabs) joins
15:10:23pabs quits [Ping timeout: 252 seconds]
15:13:24pabs (pabs) joins
15:46:19Island joins
16:13:47LeGoupil joins
16:17:18LeGoupil1 joins
16:18:02LeGoupil quits [Ping timeout: 252 seconds]
16:18:02LeGoupil1 is now known as LeGoupil
16:24:56Dango360 quits [Read error: Connection reset by peer]
16:26:52Dango360 (Dango360) joins
16:36:11AmAnd0A quits [Ping timeout: 252 seconds]
16:37:00AmAnd0A joins
16:37:33LeGoupil quits [Ping timeout: 258 seconds]
16:40:48sonick quits [Client Quit]
16:57:05threedeeitguy quits [Ping timeout: 252 seconds]
16:57:48Unholy2361 quits [Remote host closed the connection]
16:59:24Unholy2361 (Unholy2361) joins
17:05:35threedeeitguy (threedeeitguy) joins
17:15:08threedeeitguy quits [Read error: Connection reset by peer]
17:15:24threedeeitguy6 (threedeeitguy) joins
17:22:13nostalgebraist joins
17:36:00<nostalgebraist>i'm running warrior in docker, and sometimes there are long pauses between successive steps in the item logs, with nothing downloading or uploading. and no message about sleeping - often it comes between one successful request and the next. it doesn't seem to be isolated to one project. what would cause this?
17:37:42<nostalgebraist>there are long periods where all items are like this, and then it mysteriously stops and no items are like this for a while
17:39:11<fireonlive>see #warrior for help
17:51:39<h2ibot>Bzc6p edited Network.hu (+236, Downloading content finished.): https://wiki.archiveteam.org/?diff=49974&oldid=49449
17:54:40<h2ibot>Bzc6p edited Indafotó (-64, Downloading content started.): https://wiki.archiveteam.org/?diff=49975&oldid=49638
17:56:41<BigBrain>stackexchange listens more to reddit community than reddit does?
18:00:41<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49976&oldid=49965
18:14:57<fireonlive>at this point i think an inanimate object would listen better to the reddit community than reddit does
18:19:33<BigBrain>true
18:19:55<BigBrain>stone would not be so toxic
18:27:36tiger_millionaire joins
18:28:31Megame (Megame) joins
18:32:10Twisty joins
18:41:02nicolas17 quits [Ping timeout: 252 seconds]
18:47:42andrew quits [Client Quit]
18:51:38andrew (andrew) joins
18:53:02nicolas17 joins
18:55:07<masterx244|m>sucks that we couldn't suckle reddit out the slow way and instead got an immediate fire burning there...
18:57:03<BigBrain>if it burns down we have a good chunk
19:01:08<masterx244|m>once [a]rkiver processed the i.redd.it linklist we got everything in some form (text content is secured by pushshift as the fallback for the stuff that we can't yank out otherwise)
19:05:11<fireonlive>do what to reddit >_>
19:05:37Urgo quits [Remote host closed the connection]
19:05:58<BigBrain>fireonlive: suckle it dry of data
19:06:05<BigBrain>hmmm yummy data ;)
19:06:12<fireonlive>mmmmm yummy
19:06:46<masterx244|m>BigBrain: aka the usual business of archiveteam
19:08:25<BigBrain>masterx244|m: AT is more like vacuum than slow suckle
19:10:07<masterx244|m>yeah, high-speed pipelines that can even get amazon to wave a white flag if it ends there unexpectedly
19:10:12<masterx244|m>remember imgone and commoncrawl...
19:10:52<JTL>3
19:10:58<JTL>oops
19:26:58Urgo (Urgo) joins
19:41:21AmAnd0A quits [Ping timeout: 265 seconds]
19:41:40AmAnd0A joins
19:46:26<fireonlive>2
19:47:02<BigBrain>1
19:50:02<masterx244|m>BOOM!
19:51:31<JTL>:D
19:52:33<fireonlive>=3
19:57:56AmAnd0A quits [Read error: Connection reset by peer]
19:58:13AmAnd0A joins
20:02:34decky_e (decky_e) joins
20:06:35yts98 leaves
20:06:38yts98 joins
20:10:40<BigBrain>:)
20:12:58decky_e quits [Read error: Connection reset by peer]
20:13:17decky_e (decky_e) joins
20:23:53AmAnd0A quits [Ping timeout: 252 seconds]
20:24:24AmAnd0A joins
20:37:44AmAnd0A quits [Read error: Connection reset by peer]
20:38:00AmAnd0A joins
20:39:10nostalgebraist quits [Client Quit]
20:57:38railen63 quits [Remote host closed the connection]
21:00:39railen63 joins
21:04:35@Sanqui quits [Ping timeout: 252 seconds]
21:04:35lunik173 quits [Ping timeout: 252 seconds]
21:24:59<@arkiver>hello all
21:25:01<@arkiver>on ragtag
21:25:13<@arkiver>an archive of "vtuber". assume this is meant? https://en.wikipedia.org/wiki/VTuber
21:25:25Sanqui joins
21:25:35<@arkiver>38 TB for 14k videos that are actually not on IA
21:25:37<@arkiver>err
21:25:40<@arkiver>not on YouTube*
21:26:15<@arkiver>are all these videos from "vtubers"?
21:26:38Sanqui quits [Changing host]
21:26:38Sanqui (Sanqui) joins
21:26:38@ChanServ sets mode: +o Sanqui
21:26:45<@arkiver>imer noted that these videos have been deleted from youtube when vtubers switch avatars?
21:26:51<imer>https://old.reddit.com/r/DataHoarder/comments/143zvuh/ragtag_archive_is_going_offline_138_pb_of_vtuber/ there's some context here, albeit quite opinionated comments, apparently
21:27:28<imer>Archivist (think that's one of the-eye people?): "/u/textfiles will tell me if I'm allowed put a copy on archive.org down the line. I think ArchiveTeam ripping from the site is just going to be a headache given it's served from google, if anything maybe they just rip the site and not the video and I deliver the video separately?"
21:27:42<@arkiver>yes
21:29:18<@arkiver>are copyright strikes not the reason for data deletion from youtube?
21:30:41<imer>https://old.reddit.com/r/DataHoarder/comments/143zvuh/ragtag_archive_is_going_offline_138_pb_of_vtuber/jnern6f/ think this is what I remembered with the person quitting = channel deletion
21:31:08<imer>got no involvement in vtuber stuff so thats all I know
21:31:28<@arkiver>thanks a lot
21:31:36<nicolas17>good god the URL misled me
21:31:41<nicolas17>"138PB?!?!?"
21:31:46<@arkiver>if this is correct about the 35 TB, we can put it on IA
21:32:14<@arkiver>it would be best to follow the identifier and metadata pattern that is followed by tubeup for this data to fit in
21:32:34<@arkiver>and i cannot make promises about the accessibility of the data, since there may be content problems found later on
21:33:10<imer>"content problems"? as in the copyright strikes you mentioned?
21:33:19<@arkiver>aybe
21:33:21<@arkiver>maybe
21:34:02<@arkiver>(disclaimer that i speak not for IA, this is personal opinions/thoughts/etc.)
21:35:07<@arkiver>vokunal|m: ^
21:35:42<@arkiver>i won't have time to work on this, but if someone wants to do that for the 35 TB (or 38?) please do - with tubeup identifiers and similar metadata
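Following the tubeup identifier pattern arkiver mentions can be sketched as below. The `youtube-<video id>` shape is an assumption based on how tubeup items commonly appear on archive.org; verify against an existing tubeup upload before using it:

```python
def tubeup_identifier(video_id: str) -> str:
    """Build an IA identifier in the style tubeup appears to use.

    Assumption: tubeup names its items "youtube-<video id>"; check an
    existing tubeup item on archive.org to confirm before uploading.
    """
    return f"youtube-{video_id}"
```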
21:36:17<@arkiver>there is also metadata for all other videos? that might be interesting to preserve. the outlinks we can find could go into the #// project
21:37:31<nicolas17>[21:38] <vokunal|m> Here's a link to all the metadata on the site. https://ragtag.link/archive-database
21:37:33<nicolas17>[21:38] <vokunal|m> I don't know how to sort through that, but if someone has the know-how to grab the info, each line has a video id, and it would be easy to grab only the data you need from it. it's in NDJSON
21:37:41<nicolas17>do you need some data-processing of that metadata dump?
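Pulling the per-line video ids out of an NDJSON dump like the one linked above is straightforward: parse each non-empty line as its own JSON object. The field name here (`video_id`, with `id` as a fallback) is an assumption; check one sample line of ragtag's actual dump first:

```python
import json

def iter_video_ids(lines):
    """Yield (video_id, record) pairs from NDJSON lines.

    NDJSON = one JSON object per line. The id field name is an
    assumption ("video_id" or "id"); inspect a sample line of the
    real dump to confirm which key it uses.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        yield rec.get("video_id") or rec.get("id"), rec
```

Because it takes any iterable of lines, the same function works on an open file handle over the 725 MB uncompressed dump without loading it all into memory.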
21:40:02<@arkiver>only 189 MB?
21:40:06<@arkiver>we can put that in ArchiveBot
21:40:17<@arkiver>(done)
21:40:27<nicolas17>that's (uncompressed) a 725MB ndjson with the metadata
21:48:14<@JAA>That one was already archived when they announced the shutdown, but I guess now we have two copies. :-)
21:48:24<@arkiver>yay two copies
21:48:34<nicolas17>for each video there's an "info" json file like this https://content.archive.ragtag.moe/gd:1X8hZ10XrxwI5X14F5QwWUR3_YKnwRDcP/0Ibafh7x_ow/0Ibafh7x_ow.info.json
21:51:53<nicolas17>for some there's a giant json with chat replay
21:52:00<vokunal|m>arkiver The 38TB varies. They're any video that doesn't show up anymore. Sifting through randomly, looks like some are banned channels, some are deleted channels, some deleted videos, some private videos. The original list ragtag gave out had 19,246 videos, ~5K were simply unlisted, total 49TB. The 14K list I sent was everything minus unlisted videos and videos that are now public
21:53:20<nicolas17>archive-database has 219259 videos listed
21:53:54<nicolas17>so you filtered out the ones still available on youtube, and the result was 49TB?
21:54:57<vokunal|m>They gave out this data dump which I assume has or had 0% public videos in it when they put it up. I don't know of a way to distinguish between public and unlisted. We can archive either, but I sorted it by all non unlisted in the files I gave https://ragtag.link/archive-videos
21:55:47<nicolas17>ah so that's their dump, okay
21:56:11<vokunal|m>Based on their announcement, it seems to be a complete list of everything they had that's no longer public
21:56:40<nicolas17>to make our own check of what's public/unlisted/private, we'd need to hit some YouTube API 221259 times :)
21:58:18<imer>I used thumbnails for checking if videos were inaccessible (so deleted or private), unlisted ones will have their thumbnail public still though
21:58:51<imer>https://i.ytimg.com/vi/ID_GOES_HERE/hqdefault.jpg
21:59:09<nicolas17>(looking at some of the videos, I feel like a good video codec could compress this a *lot* :P but I know that's not the "preservation" way to go)
22:00:04Hajdar quits [Remote host closed the connection]
22:00:20Hajdar (Hajdar) joins
22:01:31<vokunal|m>A lot of the videos are just a static image with asmr going on, so I'd imagine those would be super compressible
22:02:22<nicolas17>even with movement, some vtubers seemed super cartoony with super flat colors which also compresses well
22:02:56<nicolas17>one had a video effect of leaves or flower petals or something constantly falling in the foreground, that's basically noise and it's *terrible* for compression :D
22:03:22<imer>recompressing 35tb of video is going to take *forever* though
22:03:37<nicolas17>for sure
22:04:48<nicolas17>anyway, this seemed to end in "if someone wants to archive it please do" but I'm unclear on *what* needs to be done
22:06:20dumbgoy__ joins
22:09:47za3k joins
22:10:20<fireonlive>i think arkiver meant anyone can do it (but please be sure to follow the identifier/metadata patterns that tubeup uses so it's easily findable)
22:10:35yawkat quits [Ping timeout: 252 seconds]
22:10:46<fireonlive>aka pls do it with that in mind but otherwise it's up for grabs for someone who feels like it
22:11:47<nicolas17>yes but I don't know what needs to be done exactly
22:12:07<nicolas17>would this be uploaded as an archive.org item or as WARCs for WBM?
22:12:50<vokunal|m>Does archivebot have a way to grab the site without grabbing the videos? Just to save the site itself.
22:14:43<vokunal|m>I don't have a clue where to start on trying to get the comments
22:15:16<fireonlive>hm i think 3rd party would preclude WBM inclusion
22:15:27<fireonlive>but ye let's see
22:18:41<vokunal|m>Oh. seems not as hard as I thought. All chat downloads have .chat.json at the end of the url
22:18:55<nicolas17>that's chat replay yes
22:19:00<nicolas17>the .info.json has video comments
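If ragtag's `.info.json` files follow yt-dlp's layout (an assumption, though the sample linked above looks like yt-dlp output), video comments sit under a top-level `"comments"` key when they were downloaded at all:

```python
import json

def load_comments(info_json_path):
    """Extract the comments list from a yt-dlp-style .info.json.

    Assumption: ragtag's info files follow yt-dlp's layout, where
    comments (if downloaded) live under a top-level "comments" key;
    files archived without comments simply lack the key.
    """
    with open(info_json_path, encoding="utf-8") as f:
        info = json.load(f)
    return info.get("comments", [])
```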
22:19:48<fireonlive>darn google >_>
22:24:29yawkat (yawkat) joins
22:26:19Island quits [Read error: Connection reset by peer]
22:27:52lennier2 joins
22:28:14Island joins
22:30:36lennier1 quits [Ping timeout: 258 seconds]
22:30:44lennier2 is now known as lennier1
22:35:49hitgrr8 quits [Client Quit]
22:46:41<fireonlive>fuck sake
22:46:46<fireonlive>oops wrong window
22:48:31<h2ibot>KamafaDelgato edited Miraheze (+3): https://wiki.archiveteam.org/?diff=49977&oldid=49961
23:01:50lunik173 joins
23:06:16<@arkiver>yeah let's not archive the videos into WARCs
23:18:23icedice quits [Client Quit]
23:22:21AmAnd0A quits [Ping timeout: 258 seconds]
23:23:03AmAnd0A joins
23:23:55G4te_Keep3r3492 quits [Quit: The Lounge - https://thelounge.chat]
23:24:32G4te_Keep3r3492 joins
23:25:07MactasticMendez quits [Quit: Connection closed for inactivity]
23:34:37AmAnd0A quits [Ping timeout: 258 seconds]
23:34:42AmAnd0A joins
23:39:20Steamy joins
23:39:29Steamy quits [Remote host closed the connection]
23:57:14<Misty>@nicolas17 @imer @arkiver r/datahoarder's -Archivist is now downloading and going to host ragtag's archive
23:57:36<Misty>also CC @vokunal|m
23:59:07<Misty>currently ragtag's archive: 1. archivist has a full local backup 2. I have a full backup on GDrive 3. RagTag's own GSuite will still last, just turned into RO 4. some of the precious videos are made into torrents and seeded