00:00:04<nicolas17>mediafire has a whole project and tracker and dedicated channel
00:00:26<nicolas17>for mega I believe we have nothing
00:02:45<pokechu22>ArchiveBot records outlinks in the meta-warc, but they aren't collected in a particularly organized way, and for mega you need the data in the URL anchor (fragment), which is not logged. #// automatically sends mediafire links to the dedicated DPoS project but I'm not sure about anything else
00:04:08lennier2_ quits [Read error: Connection reset by peer]
00:04:23lennier2_ joins
00:16:44tekulvw (tekulvw) joins
00:17:31<ericgallager>right, #mediaonfire
00:18:22<ericgallager>and #googlecrash for Google Drive
00:19:04lunik1 quits [Quit: :x]
00:19:29lunik1 joins
00:20:38<h2ibot>Cooljeanius edited Mega (+28, /* Vital signs */ wikify): https://wiki.archiveteam.org/?diff=60559&oldid=60558
00:21:30tekulvw quits [Ping timeout: 268 seconds]
00:31:05colokoko joins
00:31:37BrokenStone joins
00:32:22<colokoko>https://myrient.erista.me/ announced closure on telegram.
00:34:23<nicolas17>I think we'll have to contact the owner of myrient before archival, it seems they've been a target of scraping and hotlinking
00:34:33<nicolas17>we'll likely fall into "Your download was detected as abusive and heavily restricted for that reason" very soon
00:34:41<nicolas17>if we just throw it into AB
00:34:44<ericgallager>link to the telegram post?
00:35:04<nicolas17>ericgallager: https://transfer.archivete.am/inline/QdGZO/myrient_message.txt
00:37:32<nicolas17>>Why do you not make the content available to download via torrents?
00:37:34<nicolas17>> The truth is that people are only willing to seed the content they are interested in and not obscure content that nobody has heard of. Direct downloads ensure full availability by allowing all content to be available for download.
00:37:39<nicolas17>isn't that solved with webseeds?
00:38:03<nicolas17>keep serving it via http so no file is lost
00:38:12<nicolas17>use the torrent so it doesn't kill your server bandwidth
00:38:40<BlankEclair>could you imagine web leeches
00:42:45Synbi quits [Client Quit]
00:44:41Webuser642322 joins
00:45:23Webuser642322 quits [Client Quit]
01:01:15BrokenStone quits [Client Quit]
01:06:27colokoko quits [Client Quit]
01:13:05kdy quits [Ping timeout: 272 seconds]
01:14:38kdy (kdy) joins
01:19:09hexa- quits [Killed (hexa- (kill me already))]
01:19:14quackifi joins
01:19:29hexa- (hexa-) joins
01:20:46tekulvw (tekulvw) joins
01:22:08<quackifi>hey guys, has there ever been an archive of themeworld.com? it's still up and has so many windows themes from the 2000s and 90s. considering it's still up and running could someone please put it into archivebot? thank you
01:24:56<pokechu22>Looks like we ran it in 2019, and it was 62GB then. Probably fine to run it again
01:25:27<quackifi>awesome thank you
01:28:21quackifi quits [Client Quit]
01:38:12tekulvw quits [Remote host closed the connection]
01:58:03lennier2_ quits [Ping timeout: 272 seconds]
01:59:41lennier2_ joins
01:59:56<nicolas17>94600 pages retrieved ugh
02:01:03<nicolas17>wonder if images need the akamai crap too
02:08:25tekulvw (tekulvw) joins
02:26:21sec^nd quits [Remote host closed the connection]
02:27:02sec^nd (second) joins
02:28:32tekulvw quits [Ping timeout: 268 seconds]
02:31:02tekulvw (tekulvw) joins
02:37:47v01d quits [Ping timeout: 268 seconds]
02:41:29tekulvw quits [Ping timeout: 268 seconds]
02:41:45ducky quits [Ping timeout: 272 seconds]
02:44:33<nicolas17>100xxx pages take 2 seconds, 105xxx pages take 6 seconds, wtf?
02:45:11tekulvw (tekulvw) joins
02:52:51ducky (ducky) joins
02:56:19tekulvw quits [Ping timeout: 272 seconds]
03:00:48pabs quits [Read error: Connection reset by peer]
03:05:37tekulvw (tekulvw) joins
03:06:29pabs (pabs) joins
03:10:15tekulvw quits [Ping timeout: 272 seconds]
03:11:30tekulvw (tekulvw) joins
03:22:17tekulvw quits [Ping timeout: 272 seconds]
03:33:19tekulvw (tekulvw) joins
03:33:52DogsRNice joins
03:34:57DogsRNice_ quits [Ping timeout: 272 seconds]
03:38:07tekulvw quits [Ping timeout: 272 seconds]
03:56:08Hackerpcs quits [Quit: Hackerpcs]
03:56:59Hackerpcs (Hackerpcs) joins
04:10:14kiska (kiska) joins
04:13:29etnguyen03 quits [Remote host closed the connection]
04:24:32cyanbox joins
04:35:18tekulvw joins
04:39:53tekulvw quits [Ping timeout: 268 seconds]
04:43:23chunkynutz60 quits [Quit: The Lounge - https://thelounge.chat]
04:43:37chunkynutz60 joins
04:48:40tekulvw (tekulvw) joins
04:51:57Island quits [Read error: Connection reset by peer]
04:53:27tekulvw quits [Ping timeout: 268 seconds]
05:04:33n9nes quits [Ping timeout: 268 seconds]
05:05:36n9nes joins
05:22:55object404 joins
05:24:17<object404>Hey all!
05:24:17<object404>Question: what's the best free way to archive social media posts to preserve as evidence in research papers and citations?
05:24:17<object404>like have citation links to the archived post, now that Archive Today seems to have turned into a bad actor that tampers with the contents of what they archive...
05:27:50tekulvw (tekulvw) joins
05:30:50SootBector quits [Remote host closed the connection]
05:31:58SootBector (SootBector) joins
05:35:23tekulvw quits [Ping timeout: 268 seconds]
05:38:08DogsRNice quits [Read error: Connection reset by peer]
05:40:47object404 quits [Client Quit]
05:58:05tekulvw (tekulvw) joins
05:58:54rover joins
06:00:37roverinexile quits [Ping timeout: 272 seconds]
06:01:18ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
06:01:27ArchivalEfforts joins
06:03:08tekulvw quits [Ping timeout: 268 seconds]
06:12:36Wohlstand (Wohlstand) joins
06:18:24nexussfan quits [Quit: Konversation terminated!]
06:45:52tekulvw (tekulvw) joins
06:50:37tekulvw quits [Ping timeout: 268 seconds]
06:51:53webuser joins
06:52:17<webuser>here's a website you may need to archive
06:52:27<webuser>this website is goimagine.com
06:53:26<webuser>goimagine said that their services will be shutting down March 23, 2026, along with mosaic websites https://goimagine.com/goodbye
06:54:31<webuser>ok cya
06:54:54webuser quits [Client Quit]
06:56:14tekulvw (tekulvw) joins
07:05:13tekulvw quits [Ping timeout: 272 seconds]
07:21:48atphoenix__ quits [Quit: Leaving]
07:53:10tekulvw (tekulvw) joins
08:13:28atphoenix (atphoenix) joins
08:24:03Zalgo joins
08:56:14AlsoHP_Archivist (HP_Archivist) joins
08:56:20HP_Archivist quits [Quit: Leaving]
08:57:26AlsoHP_Archivist quits [Client Quit]
08:57:38HP_Archivist (HP_Archivist) joins
09:00:00tertu quits [Quit: so long...]
09:03:01tertu (tertu) joins
09:04:57<pabs>PredatorIWD: re ArchiveBot monitoring for Mediafire etc links, https://wiki.archiveteam.org/index.php/ArchiveBot/Monitoring
09:06:53<pabs>#// does something similar
09:09:21tekulvw quits [Ping timeout: 272 seconds]
09:14:57<pabs>!tell object404 depends on the social media in question. most of them require JS and have heavy anti-bot stuff, so archive.today is usually best, or #jseater may work (not added to WBM) or SPN may work
09:14:58<eggdrop>[tell] ok, I'll tell object404 when they join next
09:16:57cyanbox quits [Ping timeout: 272 seconds]
09:19:19cyanbox joins
09:28:44Webuser842443 joins
09:28:59fea (fea) joins
09:34:04<BlankEclair>!tell object404 i've found that https://megalodon.jp works for twitter, mastodon, and misskey
09:34:04<eggdrop>[tell] ok, I'll tell object404 when they join next
10:01:52fea quits [Client Quit]
10:04:53Andres99 joins
10:26:56FiTheArchiver joins
10:27:21FiTheArchiver quits [Remote host closed the connection]
10:36:36<@arkiver>Andres99: hi, do you know if the game archive holds unique data?
10:37:23cyanbox quits [Read error: Connection reset by peer]
10:37:31<Andres99>arkiver hi there, yes many say that it does
10:38:17<@arkiver>Andres99: alright, but do you have more details? what parts/directories of it are unique?
10:38:26<@arkiver>or rare or not widely available normally
10:45:10<Andres99>arkiver it has Redump, No-Intro, TOSEC, raw isos and game assets (and more), most of which are not widely available normally and are pretty rare
10:46:28cipherrot quits [Quit: ZNC 1.10.1 - https://znc.in]
10:54:42cyanbox joins
10:55:03Dada joins
11:15:33corentin0 joins
11:19:29corentin quits [Ping timeout: 268 seconds]
11:19:35corentin9 joins
11:21:57corentin0 quits [Ping timeout: 268 seconds]
11:24:17<Andres99>arkiver So what do you say?
11:25:38<@arkiver>390 TB is significant, so we need to have a good look at it
11:25:48<@arkiver>if 200 TB of that is easily available elsewhere, we should not duplicate that
11:26:14<Andres99>Alright
11:28:01<Andres99>please inform me if anything new happens
11:55:19<@arkiver>medica project starting
11:55:22<@arkiver>in minutes
11:57:35<@arkiver>this would be shutting down tomorrow
12:00:02Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat]
12:02:42Bleo1826007227196234552220 joins
12:08:12Snivy quits [Ping timeout: 268 seconds]
12:19:38Andres99 quits [Client Quit]
12:20:52Andres99 joins
12:23:14Webuser761710 joins
12:23:20Webuser761710 quits [Client Quit]
13:04:44<@imer>arkiver: thoughts on increasing the rate limit? seems to be running well
13:15:21midou joins
13:19:43<@imer>bumped to 1k/min
13:21:13<kiska>websocket listening is up
13:28:14<@imer>bumped to 2k/min
13:30:09<kiska>I would think its ok as long as RTT doesn't explode
13:31:33Arcorann__ quits [Ping timeout: 272 seconds]
13:32:32<kiska>Any rate limits I should be aware of?
13:32:52<kiska>IRSR is still pretty high so I'll add some workers
13:37:44tekulvw (tekulvw) joins
13:42:33tekulvw quits [Ping timeout: 268 seconds]
13:45:29TastyWiener95 quits [Ping timeout: 272 seconds]
13:48:04<@imer>bumped to 4k/min
13:48:23<Sluggs>im running conc 10 per ip
13:48:30<@imer>no sign of limiting @ 20
13:52:17<@imer>bumped to 8k/min
13:52:35TastyWiener95 (TastyWiener95) joins
13:57:21midou quits [Ping timeout: 268 seconds]
13:57:26<kiska>I see
13:59:32<pabs>some ideas from OpenStreetMap folks about how to archive websites of shops/etc that closed: https://news.ycombinator.com/item?id=47177190
14:00:02Webuser120495 joins
14:08:44<@imer>bumped medica rate limit to 16k/min
14:10:24<@imer>also increasing multi item size to 20 (from 5)
14:13:47<@imer>it is nice to have a site that's not running on a potato for once, limit to 32k/min
14:14:21<BornOn420_>any advice for the concurrency for medica?
14:14:42<h2ibot>Cruller edited ArchiveBot/Monitoring (+62, /* Ideas */ Add a link to…): https://wiki.archiveteam.org/?diff=60560&oldid=60515
14:15:19<@imer>BornOn420_: I'm running fine at 20, might not be any limiting
14:31:48Andres99 quits [Client Quit]
14:37:46<h2ibot>Justauser edited FurAffinity (+13, Marked access-restricted): https://wiki.archiveteam.org/?diff=60561&oldid=59655
14:49:16midou joins
14:55:27Andres99 joins
15:04:44cyanbox quits [Read error: Connection reset by peer]
15:16:26Webuser399533 joins
15:23:52<h2ibot>Klea edited Phorge/uncategorized (+66, Add phabricator.services.mozilla.com): https://wiki.archiveteam.org/?diff=60562&oldid=60225
15:29:06<justauser>What do people think about The Powder Toy website https://powdertoy.co.uk/ ?
15:29:38<justauser>I can see an attempted AB job about a decade ago, aborted with no explanation next day, and nothing else.
15:32:46<klea>huh, abort reasons are saved in IA?
15:32:53<h2ibot>Manu edited Discourse/archived (+110, Queued discourse.writefreesoftware.org): https://wiki.archiveteam.org/?diff=60563&oldid=60551
15:32:54Mateon1 quits [Remote host closed the connection]
15:33:05Mateon1 joins
15:33:24<justauser>They are saved in IRC logs.
15:33:34<justauser>But this time, nothing useful is here.
15:33:51<klea>oh
15:35:54<h2ibot>Manu edited Discourse/archived (+97, Queued forum.yunohost.org): https://wiki.archiveteam.org/?diff=60564&oldid=60563
15:36:36<justauser>It may be large (forums, wiki, community data) but unlikely to be huge.
15:42:09Nekroschizofrenetyk joins
15:46:07Nekroschizofrenetyk quits [Client Quit]
15:52:36<justauser>Found a possible complication. On the wiki, some links point to http://, which is broken. Something something misconfigured Anubis, I guess?
15:53:12<justauser>Probably was not the reason back then.
15:56:19<klea>Does the usual -u curl not fix it?
15:57:16<justauser>It bypasses the Anubis on HTTPS, but HTTP breakage seems hopeless.
15:57:46<justauser>You can try poking it a bit.
15:58:33<justauser>For a Wikiteam dump that I'm running right now, I just used sed 's@http://@https://@' on image list.
15:58:48<justauser>But AB can't do that without a lot of pain.
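The sed substitution mentioned above (rewriting http:// to https:// across an image list) could be sketched in Python roughly as follows. This is only an illustration: the function name and the sample URLs are hypothetical, and unlike the unanchored sed expression it rewrites the scheme only when it appears at the very start of a line.

```python
def upgrade_scheme(urls):
    """Return the given URLs with a leading http:// rewritten to https://.

    Hypothetical helper, not part of any Wikiteam or ArchiveBot tooling.
    URLs that already use https:// (or any other scheme) pass through unchanged.
    """
    upgraded = []
    for url in urls:
        url = url.strip()
        if url.startswith("http://"):
            # Swap only the scheme prefix; the rest of the URL is untouched.
            url = "https://" + url[len("http://"):]
        upgraded.append(url)
    return upgraded


if __name__ == "__main__":
    # Placeholder entries standing in for lines of an image list.
    sample = [
        "http://example.org/images/a.png",
        "https://example.org/images/b.png",
    ]
    for rewritten in upgrade_scheme(sample):
        print(rewritten)
```

Doing the rewrite on the list before the download starts keeps the downloader itself unmodified, which is the same reason the one-off sed call was convenient for the dump.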
16:00:41midou quits [Ping timeout: 268 seconds]
16:02:33midou joins
16:13:48<klea>:(
16:14:12<klea>maybe having *AA run some automated script to modify db behind the scenes?
16:27:54<Andres99>Exorcism|irc hey, what do you mean?
16:31:03tekulvw (tekulvw) joins
16:35:50tekulvw quits [Ping timeout: 268 seconds]
16:44:47<kiska>I think we might be killing medica, I am getting a couple =0's from my workers
16:48:27<kiska>And RTT is starting to go up quite a bit
16:49:12<kiska>See https://grafana3.kiska.pw/goto/53NO8bdvR?orgId=1
16:50:39tekulvw (tekulvw) joins
16:51:42<kiska>And item completion is going down https://server8.kiska.pw/uploads/31fb844d04f16f81/image.png
16:51:49Webuser244410 joins
16:52:04<Webuser244410>Myrient is dying in a month. It's 600+ TB of data.
16:55:07Webuser244410 quits [Client Quit]
16:55:34tekulvw quits [Ping timeout: 268 seconds]
16:57:25<@imer>re Myrient: question is how much of that is not on IA yet - don't think IA would be particularly happy to have even more copies of roms..
16:59:30Island joins
17:01:58Webuser399533 quits [Client Quit]
17:02:46<@imer>arkiver: medica todo seems to have run out - are the zoom items good to run? I see you're moving them, but didn't pattern limit
17:06:17<Andres99>@webuser244410 arkiver said that they will take a good look at it, also it's 390 TBs not 600
17:06:53<Andres99>sorry didn't mean to ping
17:10:31<Andres99>imer i would say a pretty good chunk of it
17:11:02<kiska>Perhaps we should reduce the multi-item size or maybe the limit; medica seems to be struggling hard
17:11:38<@imer>Andres99: any specific subsets of their data? would be easier to justify then
17:11:44<@imer>kiska: checking
17:12:22<kiska>https://server8.kiska.pw/uploads/ff45da3448796752/image.png Seems to have cratered
17:21:54<Andres99>imer well there's raw isos, game assets, region exclusive dumps (and also a lot more) most of which are rare and obscure
17:25:22<Andres99>And those are the ones that I can remember off the top of my head
17:53:05kiska520 joins
17:54:23kiska52 quits [Ping timeout: 272 seconds]
17:54:24kiska520 is now known as kiska52
18:11:29hexagonwin quits [Ping timeout: 272 seconds]
18:12:19hexagonwin (hexagonwin) joins
18:26:01<nicolas17>considering the website says "Myrient sets the standard for video game preservation and takes a different approach by focusing on accessibility" it seems the owner likes preservation and wouldn't mind archival
18:26:57tekulvw (tekulvw) joins
18:27:01<nicolas17>but considering the website says "Why is my download speed limited to 10 KB/s? Your download was detected as abusive and heavily restricted for that reason." we'll probably get banned very soon if we just throw our scrapers at the entire site
18:31:46tekulvw quits [Ping timeout: 268 seconds]
18:33:51Andres99 quits [Quit: Ooops, wrong browser tab.]
18:37:46<nicolas17>I joined the Myrient discord
18:38:10<klea>Ask if the download speeds can be unlimited?
18:38:54<nicolas17>seems there's a hundred uncoordinated people individually trying to save this and making things harder for everyone else
18:39:39sg72 quits [Quit: Leaving]
18:39:49<klea>:(
18:40:19<nicolas17>also, considering there's a folder "Internet Archive: Various content that is at risk of being removed or was removed from the Internet Archive"
18:41:36<nicolas17>archiving it on IA could be a problem
18:44:26Sk1d joins
18:44:47Andres99 joins
18:46:48<masterx244|m>not sure how easily they would notice it being hidden inside a WARC and not a plain item in that case
18:46:48chunkynutz60 quits [Read error: Connection reset by peer]
18:46:55chunkynutz60 joins
18:49:27klea points out IA monitor this channel very closely.
18:50:01Sk1d quits [Client Quit]
18:50:05<klea>s/IA /part of IA's staff /
18:54:55Yakov quits [Quit: Ping timeout (120 seconds)]
18:55:12Yakov (Yakov) joins
18:55:25tekulvw (tekulvw) joins
19:00:08tekulvw quits [Ping timeout: 268 seconds]
19:02:32<nicolas17>arkiver: https://myrient.erista.me/files/Redump/BD-Video/?C=S&O=D these file sizes could be a problem I guess?
19:04:00sg72 joins
19:17:39tekulvw (tekulvw) joins
19:21:01<IDK>item_request_serve_rate: 6.4042844931251e-8
19:21:07<IDK>what's going on with medica project lol
19:22:20tekulvw quits [Ping timeout: 268 seconds]
19:25:50tekulvw (tekulvw) joins
19:33:12<IDK>ah, the 4.7m items are in todo:redo
19:33:40<IDK>arkiver, imer: should that be activated now?
19:34:26<IDK>project is at a standstill right now
19:41:50<@imer>IDK: arkiver put a limit on the urls that are in redo, so i'm not sure what to do with them
19:42:03tekulvw quits [Ping timeout: 272 seconds]
19:42:45<Andres99>nicolas17 for that directory, it comes to a total of 5.73 TBs
19:43:08<nicolas17>Andres99: afaik individual file sizes are a problem for a few of our tools
19:43:22<Andres99>Alright
19:49:34<multisn8>it contains no "new" content i think; from what i checked so far it seems to "just" be a well-sorted downloaded collection of torrents
19:49:50tekulvw (tekulvw) joins
19:51:39Webuser664432 joins
19:51:52Webuser664432 quits [Client Quit]
19:52:06Webuser972288 joins
19:53:21Webuser972288 quits [Client Quit]
19:54:24tekulvw quits [Ping timeout: 268 seconds]
19:55:13<multisn8>ref myrient
20:03:29<h2ibot>Manu edited Discourse/archived (+99, Queued community.icinga.com): https://wiki.archiveteam.org/?diff=60565&oldid=60564
20:05:03etnguyen03 (etnguyen03) joins
20:05:11archiveDrill quits [Quit: The Lounge - https://thelounge.chat]
20:06:14archiveDrill joins
20:07:51<IDK>now its moving!
20:09:29<nicolas17>what's the medica tracker?
20:10:19<nicolas17>ETA 6 days...
20:14:20<IDK>I mean we ~could~ increase the ratelimit
20:16:22<klea>2026-02-26 15:30:32 <@ark****> shutting down on the 28th
20:16:41<klea>Also, wtf are we grabbing, from what I saw it's just links to IA?
20:20:20tekulvw (tekulvw) joins
20:25:07tekulvw quits [Ping timeout: 272 seconds]
20:29:45tekulvw (tekulvw) joins
20:34:29tekulvw quits [Ping timeout: 268 seconds]
20:37:03grill (grill) joins
20:37:06Webuser566923 joins
20:44:42tekulvw (tekulvw) joins
20:49:37<nicolas17>ok so yesterday there were 3842558 titles in the classification.gov.au website
20:50:04<nicolas17>now there are 3843252
20:50:25Andres99 quits [Client Quit]
20:50:30<nicolas17>but I have seen other numbers including 3843196 and 3842350 (are titles *removed* too?)
20:52:21tekulvw quits [Ping timeout: 272 seconds]
20:53:11<nicolas17>page-130509.html:of 3842946 results</div>
20:53:13<nicolas17>page-130510.html:of 3842946 results</div>
20:53:14<nicolas17>page-130511.html:of 3843196 results</div>
20:53:16<nicolas17>page-130512.html:of 3843252 results</div>
20:53:17<nicolas17>page-130513.html:of 3843252 results</div>
20:53:19<nicolas17>this was like 50 minutes ago, isn't Australia asleep at this time? T_T
20:53:39<klea>Maybe some kind of automated system changing stuff?
20:54:41<nicolas17>if I only look at pages with the current number of results, assuming I have to redo all the others... I have 6597 out of 192046 T_T
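The check described above (keeping only pages whose reported total matches the current number of results and redoing the rest) could be sketched like this. It assumes input lines in the grep-style `page-N.html:of TOTAL results</div>` form pasted earlier; the function names are hypothetical and not taken from any script actually in use.

```python
import re
from collections import Counter

# Matches grep-style output such as: page-130509.html:of 3842946 results</div>
LINE_RE = re.compile(r"^(page-\d+\.html):of (\d+) results")


def tally_result_counts(grep_lines):
    """Count how many saved pages report each total number of results."""
    counts = Counter()
    for line in grep_lines:
        match = LINE_RE.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def pages_to_redo(grep_lines, current_total):
    """List the page files whose reported total differs from current_total."""
    stale = []
    for line in grep_lines:
        match = LINE_RE.match(line)
        if match and int(match.group(2)) != current_total:
            stale.append(match.group(1))
    return stale


if __name__ == "__main__":
    # The five lines quoted in the channel, used here as sample input.
    sample = [
        "page-130509.html:of 3842946 results</div>",
        "page-130510.html:of 3842946 results</div>",
        "page-130511.html:of 3843196 results</div>",
        "page-130512.html:of 3843252 results</div>",
        "page-130513.html:of 3843252 results</div>",
    ]
    print(tally_result_counts(sample))
    print(pages_to_redo(sample, 3843252))
```

Whether a mismatched total really means a page's contents shifted is itself an assumption; the tally only shows how many pages were fetched under each total.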
21:00:28Andres99 joins
21:07:28tekulvw (tekulvw) joins
21:12:18Webuser818649 joins
21:12:37tekulvw quits [Ping timeout: 272 seconds]
21:15:55max (max) joins
21:22:13Laggamer30xx joins
21:22:40<Laggamer30xx>Myrient will shut down in March. TIME TO ARCHIVE!
21:22:52Laggamer30xx quits [Client Quit]
21:27:46lumidify quits [Quit: leaving]
21:28:13<IDK>19=0 https://numerabilis.u-paris.fr/iiif/2/bibnum:pharma_prix_gobleyx1889x03:0147/1024,3072,1024,1024/512,/0/default.jpg
21:28:18<IDK>works in browser, am I banned?
21:28:52lumidify (lumidify) joins
21:42:13tekulvw (tekulvw) joins
21:42:23grill quits [Ping timeout: 272 seconds]
21:44:01grill (grill) joins
21:47:15tekulvw quits [Ping timeout: 268 seconds]
21:47:35<hexagonwin>https://myrient.erista.me/ oops
21:50:37Max_G quits [Ping timeout: 272 seconds]
21:53:31Max_G joins
22:01:36nexussfan (nexussfan) joins
22:01:37tekulvw (tekulvw) joins
22:10:15tekulvw quits [Ping timeout: 272 seconds]
22:14:59Webuser754812 joins
22:16:32Webuser754812 quits [Client Quit]
22:21:00tekulvw (tekulvw) joins
22:23:42scotrod joins
22:24:31Wohlstand quits [Remote host closed the connection]
22:26:06tekulvw quits [Ping timeout: 268 seconds]
22:30:13<@imer>!remindme 15min unpause medica (info.json seems to be timing out now)
22:30:14<eggdrop>[remind] ok, i'll remind you at 2026-02-27T22:45:13Z
22:30:40BennyOtt quits [Quit: ZNC 1.10.1 - https://znc.in]
22:32:23<nicolas17>we need a counter of how many times we've been warned about myrient
22:32:23tekulvw (tekulvw) joins
22:33:05BennyOtt (BennyOtt) joins
22:37:29tekulvw quits [Ping timeout: 272 seconds]
22:38:49<Guest>myrient++
22:38:49<eggdrop>[karma] 'myrient' now has 1 karma!
22:43:07tekulvw (tekulvw) joins
22:43:38Shard1115 (Shard) joins
22:45:15<eggdrop>[remind] imer: unpause medica (info.json seems to be timing out now)