00:04:36<thuban>ok, i'm currently scraping all the page images, and i'll feed them back into archivebot once i've got the list
00:04:45<thuban>(it'll take a little while, i'm being gentle)
00:07:32<h2ibot>JustAnotherArchivist edited Deathwatch (+159, /* 2023 */ Add OneHallyu): https://wiki.archiveteam.org/?diff=51153&oldid=51152
00:08:12<vokunal|m>It looks to be about 1.1M posts
00:08:21<vokunal|m>nvm
00:09:18<thuban>Peroniko: did you want any advice about converting to pdf and/or uploading to ia?
00:09:20<thuban>the former is easy, but the latter will probably require some manual work due to missing or inconsistently formatted metadata
00:10:14<vokunal|m>~11.7M posts on OneHallyu
00:10:33<@JAA>Yeah, 11.8 million is what the homepage says.
00:11:38<vokunal|m>someday i'll learn. The first time i helped with a forum, i didn't see any number; the second time, i counted it manually. Fool me twice, i'd better learn the third time
00:11:45<@JAA>:-)
00:11:46<Peroniko>thuban: I have uploaded some things to IA, but if there is a guide for better metadata it would be helpful.
00:11:55<Peroniko>I've uploaded this for example: https://archive.org/details/arhitektura-graficki-dio
00:13:11<thuban>Peroniko: the general metadata documentation is here: https://archive.org/developers/metadata-schema/index.html
00:19:01Peroniko quits [Client Quit]
00:19:25Peroniko (Peroniko) joins
00:19:26Peroniko quits [Max SendQ exceeded]
00:19:52Peroniko (Peroniko) joins
00:24:25<@JAA>OneHallyu is running through AB now. We'll see how that goes.
00:24:32Peroniko quits [Client Quit]
00:24:40<@JAA>Buttflare is involved.
00:25:05Peroniko (Peroniko) joins
00:35:17Peroniko quits [Ping timeout: 272 seconds]
00:35:52Peroniko (Peroniko) joins
00:39:57<kpcyrd>from -ot: how do I archive videos hosted on sharepoint? it's going to be deleted in a few days: https://kth-my.sharepoint.com/:v:/g/personal/longz_ug_kth_se/EesSEHqiHHtQabKFAAAx5PsB-5r8MVtnp5NECOtKN-YGsA?e=8s0kaj
00:42:55<nicolas17>it's probably some temporary signed URL that will change on every load and can't be archived in a way that lets the original link work in WBM
00:44:58<nicolas17>oof that seems to be DASH even
00:50:33<thuban>kpcyrd: i don't think there's anything reliably plug-and-play for sharepoint
00:50:41<nicolas17>transcoded on the fly from an original .mp4 that seems impossible to access
00:51:11<thuban>you could try https://github.com/kylon/Sharedown or some of the workarounds suggested in sharepoint-related issues at https://github.com/snobu/destreamer
00:51:26<thuban>(or punt and screen-record it)
00:52:19<nulldata>There's an open PR for yt-dlp to add SharePoint. https://github.com/yt-dlp/yt-dlp/pull/6531
00:52:52<nicolas17>I *could* grab the DASH but it sucks that I can't access the original .mp4 :/
00:54:11<kpcyrd>I'm trying to get it into the wayback machine specifically:
00:54:13<kpcyrd>https://web.archive.org/web/20231116002236/https://kth-my.sharepoint.com/personal/longz_ug_kth_se/_layouts/15/stream.aspx?id=%2Fpersonal%2Flongz%5Fug%5Fkth%5Fse%2FDocuments%2Fbox%5Ffiles%2FKTH%20SR%20Meetup%2F2020%2D11%2D24%2013%2E06%2E13%20Localization%20of%20Unreproducible%20Builds%2F2020%2D11%2D24%2013%2E06%2E13%20Localization%20of%20Unreproducible%20Builds%20%2D%20Jifeng%20Xuan%2Emp4&ga=1
00:54:19<nicolas17>that's not going to work
00:54:25<kpcyrd>rip
00:54:26etnguyen03 quits [Ping timeout: 265 seconds]
00:54:32<nicolas17>there's timestamped, signed URLs that change every time you load the page
00:55:04<thuban>ia item not adequate?
00:55:41<nicolas17>ffmpeg doesn't do parallel requests (in fact I'm not sure if it does proper HTTP keepalive) so this DASH remux is taking me forever
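A rough sketch of the parallel fetching ffmpeg lacks: request many segments concurrently instead of one at a time. The segment names and the fetch function here are placeholders, not actual SharePoint DASH URLs.

```python
# Sketch: fetch DASH segments concurrently, preserving input order.
# fake_fetch stands in for a real HTTP GET (e.g. requests.get(url).content).
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, workers=8):
    """Fetch every URL concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))

def fake_fetch(url):
    return f"data-from-{url}"

segments = [f"seg-{i}.m4s" for i in range(4)]
results = fetch_all(segments, fake_fetch)
```

The ordered results can then be concatenated or remuxed, so the parallelism never reorders the stream.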
01:04:37etnguyen03 (etnguyen03) joins
01:07:35igloo22225 quits [Ping timeout: 272 seconds]
01:09:30<nicolas17>oh great I got some 503 Service Unavailable too
01:12:49lunik173 quits [Client Quit]
01:18:45Peroniko quits [Remote host closed the connection]
01:19:49andrew (andrew) joins
01:22:22Peroniko (Peroniko) joins
01:28:25Peroniko quits [Client Quit]
01:49:41qwertyasdfuiopghjkl quits [Remote host closed the connection]
01:56:22rohvani5 joins
01:56:32andrew5 (andrew) joins
01:56:46TheTechRobo9 (TheTechRobo) joins
01:57:25monoxane quits [Quit: estoy fuera]
01:57:37CraftByte quits [Client Quit]
01:57:37andrew quits [Client Quit]
01:57:37rohvani quits [Client Quit]
01:57:37TheTechRobo quits [Client Quit]
01:57:37andrew5 is now known as andrew
01:57:38rohvani5 is now known as rohvani
01:57:38TheTechRobo9 is now known as TheTechRobo
01:57:38h3ndr1k quits [Client Quit]
01:57:52h3ndr1k (h3ndr1k) joins
01:58:04monoxane (monoxane) joins
01:58:09monoxane1 (monoxane) joins
01:58:09monoxane4 (monoxane) joins
01:58:14monoxane quits [Remote host closed the connection]
01:58:14monoxane4 quits [Remote host closed the connection]
01:58:14monoxane1 is now known as monoxane
02:12:55<nulldata>Gitlab is now requiring new users to verify using a phone number or credit card, or account will be deleted. So far only seems to apply to new accounts, but something to keep an eye on if they expand it to existing accounts. https://lemmy.world/post/8297909
02:23:49<flashfire42|m>https://www.androidpolice.com/ensuring-high-quality-apps-on-google-play/
02:26:41<flashfire42|m>https://www.animenewsnetwork.com/news/2023-11-11/crunchyroll-ends-digital-manga-app-on-mobile-web-on-december-11/
02:28:41wyatt8740 quits [Remote host closed the connection]
02:28:42<@JAA>Correct link for the latter: https://www.animenewsnetwork.com/news/2023-11-11/crunchyroll-ends-digital-manga-app-on-mobile-web-on-december-11/.204339
02:32:33wyatt8740 joins
02:40:06wyatt8740 quits [Remote host closed the connection]
02:45:12wyatt8740 joins
03:34:31yano quits [Ping timeout: 272 seconds]
03:48:35yano (yano) joins
04:10:49Wohlstand quits [Client Quit]
04:39:14DogsRNice_ quits [Read error: Connection reset by peer]
04:44:11dumbgoy__ quits [Ping timeout: 272 seconds]
04:48:05BlueMaxima quits [Read error: Connection reset by peer]
05:00:32<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=51154&oldid=51143
05:32:57etnguyen03 quits [Ping timeout: 272 seconds]
05:37:09etnguyen03 (etnguyen03) joins
05:58:48etnguyen03 quits [Client Quit]
06:09:03icedice2 quits [Ping timeout: 272 seconds]
06:13:26Hackerpcs quits [Ping timeout: 265 seconds]
06:14:19Hackerpcs (Hackerpcs) joins
06:18:10atphoenix__ (atphoenix) joins
06:21:05atphoenix_ quits [Ping timeout: 272 seconds]
06:21:15superkuh_ joins
06:22:12atphoenix_ (atphoenix) joins
06:23:42nicolas17 quits [Client Quit]
06:24:15atphoenix__ quits [Ping timeout: 272 seconds]
06:24:15superkuh quits [Ping timeout: 272 seconds]
06:38:46Island quits [Read error: Connection reset by peer]
06:40:27Arcorann (Arcorann) joins
06:45:48atphoenix_ quits [Remote host closed the connection]
06:45:48superkuh_ quits [Remote host closed the connection]
06:45:55superkuh_ joins
06:46:12atphoenix_ (atphoenix) joins
07:02:57mindstrut joins
07:03:01mindstrut quits [Remote host closed the connection]
07:53:10<that_lurker>https://bird.makeup would be a nice alternative way to grab twitter(x) stuff. They create a mastodon account where all the tweets are posted.
07:54:10<that_lurker>also means there is no rate limiting
07:54:26<that_lurker>other than on their end of course
09:25:34superkuh__ joins
09:25:42wyatt8740 quits [Client Quit]
09:25:42andrew quits [Client Quit]
09:25:42Pedrosso quits [Client Quit]
09:25:42TheTechRobo quits [Client Quit]
09:25:42superkuh_ quits [Remote host closed the connection]
09:25:48Pedrosso joins
09:25:54wyatt8740 joins
09:26:01andrew (andrew) joins
09:26:12TheTechRobo (TheTechRobo) joins
09:35:53jacksonchen666 (jacksonchen666) joins
09:37:59TheTechRobo quits [Client Quit]
09:38:25TheTechRobo (TheTechRobo) joins
09:40:11TheTechRobo quits [Excess Flood]
09:40:40TheTechRobo (TheTechRobo) joins
09:43:40monoxane3 (monoxane) joins
09:43:45TheTechRobo quits [Excess Flood]
09:43:45andrew quits [Client Quit]
09:43:45monoxane quits [Read error: Connection reset by peer]
09:43:45Pedrosso quits [Read error: Connection reset by peer]
09:43:45monoxane3 is now known as monoxane
09:43:50Pedrosso joins
09:43:53andrew (andrew) joins
09:44:13TheTechRobo (TheTechRobo) joins
09:44:21Peroniko (Peroniko) joins
09:51:28Pedrosso quits [Client Quit]
09:51:28Peroniko quits [Remote host closed the connection]
09:51:32Pedrosso joins
10:00:01Bleo1 quits [Client Quit]
10:01:24Bleo18 joins
10:14:27jacksonchen666 quits [Client Quit]
10:36:23icedice (icedice) joins
10:48:07icedice quits [Client Quit]
11:33:59Megame (Megame) joins
11:45:26sec^nd quits [Ping timeout: 245 seconds]
11:51:12Megame1_ (Megame) joins
11:51:44TheTechRobo quits [Client Quit]
11:51:44Pedrosso quits [Client Quit]
11:51:44Megame quits [Remote host closed the connection]
11:51:48Pedrosso joins
11:52:09TheTechRobo (TheTechRobo) joins
11:52:34sec^nd (second) joins
11:58:30Megame1_ is now known as Megame
12:15:05BornOn420_ (BornOn420) joins
12:18:55BornOn420 quits [Ping timeout: 272 seconds]
13:12:12jodizzle quits [Remote host closed the connection]
13:12:57jodizzle (jodizzle) joins
13:16:03ScenarioPlanet (ScenarioPlanet) joins
13:16:33Arcorann quits [Ping timeout: 272 seconds]
13:26:41benjinsm quits [Ping timeout: 272 seconds]
13:28:29<null>https://blog.opensubtitles.com/opensubtitles/saying-goodbye-to-opensubtitles-org-api-embrace-the-20-black-friday-treat
13:28:35null is now known as rawktucc
13:28:47rawktucc quits [Client Quit]
13:29:41rktk (rktk) joins
13:29:54<rktk>stupid sexy nickserv
13:29:56<rktk>https://blog.opensubtitles.com/opensubtitles/saying-goodbye-to-opensubtitles-org-api-embrace-the-20-black-friday-treat
13:30:03<rktk>Does anyone know of a full dump or half dump of open subtitles
13:30:06<rktk>this is a real slap in the face
13:34:18<h2ibot>MasterX244 edited List of websites excluded from the Wayback Machine (+28): https://wiki.archiveteam.org/?diff=51155&oldid=51036
13:42:27lunik173 joins
13:54:32lunik173 quits [Ping timeout: 265 seconds]
13:55:27lunik173 joins
14:00:23<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51156&oldid=51155
14:01:35etnguyen03 (etnguyen03) joins
14:08:33ScenarioPlanet quits [Client Quit]
14:36:10benjins joins
14:57:12lunik1731 joins
14:57:12benjins quits [Remote host closed the connection]
14:57:12TheTechRobo quits [Client Quit]
14:57:12lunik173 quits [Client Quit]
14:57:12Pedrosso quits [Client Quit]
14:57:12lunik1731 is now known as lunik173
14:57:16Pedrosso joins
14:57:17benjins joins
14:57:36TheTechRobo (TheTechRobo) joins
14:59:33TheTechRobo quits [Excess Flood]
14:59:33benjins quits [Remote host closed the connection]
14:59:36benjins joins
15:00:09TheTechRobo (TheTechRobo) joins
15:00:55TheTechRobo quits [Excess Flood]
15:01:37TheTechRobo (TheTechRobo) joins
15:12:24automato83 quits [Read error: Connection reset by peer]
15:19:33<@arkiver>opensubtitles closing themselves?
15:23:00null joins
15:23:21wyatt8750 joins
15:23:31rktk quits [Remote host closed the connection]
15:23:31aismallard quits [Remote host closed the connection]
15:23:31h3ndr1k quits [Remote host closed the connection]
15:23:31Pedrosso quits [Client Quit]
15:23:31TheTechRobo quits [Client Quit]
15:23:31JensRex quits [Remote host closed the connection]
15:23:31wyatt8740 quits [Client Quit]
15:23:35Pedrosso joins
15:23:57TheTechRobo (TheTechRobo) joins
15:24:26aismallard joins
15:24:29h3ndr1k (h3ndr1k) joins
15:24:44JensRex (JensRex) joins
15:31:23null quits [Client Quit]
15:40:41CraftByte (DragonSec|CraftByte) joins
15:41:02xkey quits [Remote host closed the connection]
15:41:12xkey (xkey) joins
15:44:11xkey quits [Remote host closed the connection]
15:44:18xkey (xkey) joins
15:45:03xkey quits [Remote host closed the connection]
15:45:10xkey (xkey) joins
16:08:34Island joins
16:18:07Wohlstand (Wohlstand) joins
16:19:11lader joins
16:19:33lader quits [Remote host closed the connection]
16:20:53Naruyoko5 quits [Quit: Leaving]
16:31:29<Hans5958>Has anyone backed this up yet? https://pabio.com/blog/company/bankruptcy/
16:43:55<murb>oh talking of which https://www.bleed-clothing.com/de/info # "Wir sind insolvent."
16:45:33etnguyen03 quits [Ping timeout: 272 seconds]
16:52:12dumbgoy__ joins
16:55:53dumbgoy joins
16:58:41dumbgoy__ quits [Ping timeout: 265 seconds]
17:06:49Dango360 (Dango360) joins
17:09:55Dango360 quits [Read error: Connection reset by peer]
17:10:42BearFortress quits [Client Quit]
17:12:09<@JAA>arkiver: 'Only' the API, as I understand it?
17:12:27icedice (icedice) joins
17:13:06Dango360 (Dango360) joins
17:20:39<Megame>Hans5958, murb I threw them in AB
17:21:06<murb>ta
17:44:55BearFortress joins
17:47:28TheTechRobo quits [Client Quit]
17:48:02TheTechRobo (TheTechRobo) joins
17:55:25CraftByte quits [Client Quit]
17:55:25icedice quits [Remote host closed the connection]
17:55:31CraftByte (DragonSec|CraftByte) joins
17:55:37icedice (icedice) joins
17:56:49TheTechRobo quits [Client Quit]
17:57:24Pedrosso4 joins
17:57:31Pedrosso quits [Client Quit]
17:57:31Pedrosso4 is now known as Pedrosso
17:57:45TheTechRobo (TheTechRobo) joins
17:59:43icedice quits [Remote host closed the connection]
17:59:56icedice (icedice) joins
18:10:37Megame quits [Client Quit]
18:16:40<fireonlive>here's the forum post about it: https://forum.opensubtitles.org/viewtopic.php?t=17930#p47873
18:17:16<fireonlive>looks like the 'new rest api' still has a free tier: https://opensubtitles.stoplight.io/docs/opensubtitles-api/a7d25b650b784-api-subscription-prices
18:17:42<fireonlive>though those prices are.. hm.
18:17:54<@JAA>XML-RPC... Ok, yeah, I agree that needs to die already.
18:19:02<fireonlive>https://blog.opensubtitles.com/opensubtitles/saying-goodbye-to-opensubtitles-org-api-embrace-the-20-black-friday-treat posted earlier says "This decision, initially disclosed in a forum post, will primarily affect non-VIP users, while VIP members will continue to enjoy access to the API." so i guess they're keeping it around for VIP people for a bit longer at least?
18:19:12<fireonlive>and yeah it does haha
18:55:53rktk (rktk) joins
19:04:15aninternettroll quits [Ping timeout: 272 seconds]
19:08:02icedice quits [Client Quit]
19:13:26Naruyoko joins
19:25:31parfait (kdqep) joins
19:36:42aninternettroll (aninternettroll) joins
19:49:53qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:52:15nicolas17 joins
19:58:19DogsRNice joins
20:15:40<anarchat>so uh
20:16:24<anarchat>apparently, i have a blogspot blog, two actually... and i learned this because google/blogger.com wrote me to tell me i haven't logged in since 2007 and so they will delete my shit... i wonder if we need to do something about this
20:16:36<anarchat>my two blogs are totally irrelevant and empty, but there might be others facing destruction out there
20:18:50<nicolas17>anarchat: it has been discussed before; how do we find "all blogs"?
20:20:17<anarchat>i have no idea
20:30:16<fireonlive>#frogger :)
20:31:00DogsRNice_ joins
20:31:10DogsRNice quits [Remote host closed the connection]
20:31:10parfait quits [Remote host closed the connection]
20:31:10Naruyoko quits [Remote host closed the connection]
20:31:10qwertyasdfuiopghjkl quits [Remote host closed the connection]
20:31:10Naruyoko5 joins
20:31:25parfait (kdqep) joins
20:33:33<nicolas17>kpcyrd: https://data.nicolas17.xyz/localization-unreproducible-builds.mp4 this is from the DASH stream on sharepoint
20:34:08<nicolas17>it wasn't easy to get because if I just fed the DASH manifest to ffmpeg, one or two segments would randomly give a "503 service unavailable" and ffmpeg doesn't retry
20:34:17<nicolas17>so I got a gap in the video
20:34:36<nicolas17>I had to download all segments and rewrite the manifest to use those local files
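A minimal sketch of the workaround nicolas17 describes: retry each segment fetch (since ffmpeg gives up after a single 503), then rewrite the MPD manifest to point at the local copies. The URL pattern and retry numbers are illustrative, not taken from the real manifest.

```python
# Sketch: retry flaky segment fetches, then localize the DASH manifest.
import re
import time

def fetch_with_retry(fetch, url, attempts=5, delay=1.0):
    """Retry a fetch that may intermittently fail (e.g. 503s)."""
    for i in range(attempts):
        try:
            return fetch(url)
        except IOError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

def localize_manifest(mpd_text):
    """Rewrite absolute .m4s segment URLs in an MPD to bare local filenames."""
    return re.sub(r'https?://[^"<]+/([^/"<]+\.m4s)', r'\1', mpd_text)

mpd = '<SegmentURL media="https://host.example/path/chunk-001.m4s"/>'
local = localize_manifest(mpd)  # → '<SegmentURL media="chunk-001.m4s"/>'
```

Feeding the rewritten manifest to ffmpeg then remuxes from complete local segments, with no gaps from mid-stream 503s.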
20:35:21<nicolas17>take it and figure out what to do with it; archive.org item or whatever :P
20:41:54qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
20:46:41<Pedrosso>https://transfer.archivete.am/J2GVQ/sporeforums1.txt does this list of largely unarchived spore forums have any URLs that the bot wouldn't be able to archive properly? In either case could the viable URLs be fed to AB?
20:48:17<@JAA>The ones that aren't entire domains could be problematic recursion-wise.
20:48:30<Pedrosso>Problematic in what way?
20:48:31<@JAA>Other than that, not sure.
20:49:18<@JAA>Not recursing properly. If you !a https://example.org/foo/ and it has a link to /bar, that won't be followed.
20:49:31<Pedrosso>not even with offsite links allowed?
20:49:46<@JAA>No, because they're not offsite.
20:50:21<@JAA>Offsite = different host
20:50:54<Pedrosso>I getcha, had hoped it was just offsite (named after different host) = outside recursion
20:51:40<@JAA>For example, it wouldn't recurse anywhere useful from https://www.mobygames.com/forum/game/36030/spore/ because those URLs aren't in .../spore/.
20:52:11<@JAA>In that case, !a https://www.mobygames.com/forum/game/36030/spore would work though.
20:52:22<@JAA>But yeah, each of those needs to be looked at individually.
20:52:31<@JAA>And some might simply not be possible.
20:52:42<Pedrosso>what does the / at the end do?
20:53:25<@JAA>It's a path segment delimiter. For the purpose of AB, the last slash in the path part of the URL determines where it'll recurse onsite.
20:53:43parfait quits [Remote host closed the connection]
20:53:43CraftByte quits [Client Quit]
20:53:48CraftByte (DragonSec|CraftByte) joins
20:53:50<@JAA>From https://www.mobygames.com/forum/game/36030/spore, it would recurse to any link starting with https://www.mobygames.com/forum/game/36030/ .
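The recursion rule JAA describes can be sketched as: everything up to (and including) the last slash in the URL's path is the onsite prefix ArchiveBot recurses under. This is a simplified illustration, ignoring query strings and other edge cases.

```python
# Sketch: compute ArchiveBot's onsite recursion prefix from a job URL.
from urllib.parse import urlsplit, urlunsplit

def onsite_prefix(url):
    """Keep the path up to its last slash; links under this prefix are followed."""
    parts = urlsplit(url)
    path = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

# With the trailing slash, recursion is confined to .../spore/;
# without it, anything under .../36030/ is followed.
a = onsite_prefix("https://www.mobygames.com/forum/game/36030/spore/")
b = onsite_prefix("https://www.mobygames.com/forum/game/36030/spore")
```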
20:53:57parfait (kdqep) joins
20:54:00<Pedrosso>I see
20:56:05<Pedrosso>Should I send a transfer.archivete.am link with just the full-domain ones then?
20:57:17<@JAA>No need, this one is fine.
20:58:28<Pedrosso>Alright. Thanks a lot then
21:02:41lennier2_ quits [Ping timeout: 272 seconds]
21:04:48<@JAA>I wonder whether https://blog.seamonkey-project.org/2023/11/14/migrating-off-archive-mozilla-org/ only applies to SeaMonkey or also to other projects or even the entire archive.mozilla.org.
21:04:56<@JAA>(It's already running through AB courtesy of arkiver.)
21:05:22<@JAA>Cc pabs ^
21:07:15lennier2_ joins
21:12:50<@JAA>I'm currently listing all of archive.mozilla.org. It's ... large.
21:13:09<@JAA>I'll have a size estimate later.
21:20:27<@arkiver>JAA: maybe it's too large for ArchiveBot, i wonder how large it is. hope we can archive it entirely
21:24:12<@JAA>I'm already up to over 1.2 million *directories* after only processing 17k.
21:24:15<@JAA>So yeah...
21:29:16<@JAA>To rephrase a bit more clearly: I've processed 17k directories and discovered over 1.2 million directories from those. I'm recursing through the dir tree, obviously.
21:30:07<@JAA>And those numbers are now at 32k done, 2.1M discovered.
21:30:13<@JAA>It'll be a while...
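The done/discovered counts above fall out of a breadth-first walk over the directory tree: each processed listing can discover many subdirectories, so the discovered count races far ahead. A toy sketch (the tree and `list_dir` callback are made up, not qwarc's actual code):

```python
# Sketch: breadth-first directory listing with done/discovered counters.
from collections import deque

def crawl(root, list_dir):
    """Walk the tree; list_dir(d) returns d's subdirectories."""
    queue = deque([root])
    done, discovered = 0, 1  # the root itself counts as discovered
    while queue:
        d = queue.popleft()
        subdirs = list_dir(d)
        discovered += len(subdirs)
        queue.extend(subdirs)
        done += 1
    return done, discovered

# Toy tree: root has 3 children, each child has 2 leaf directories.
tree = {"pub": ["a", "b", "c"], "a": ["a1", "a2"],
        "b": ["b1", "b2"], "c": ["c1", "c2"]}
done, discovered = crawl("pub", lambda d: tree.get(d, []))
```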
21:31:00<Pedrosso>How long will it stay up? Assuming it has any sort of shutdown date
21:31:55<@JAA>See link above
21:32:43<@JAA>Beware of https://archive.mozilla.org/pub/firefox/tinderbox-builds/ , those subdirs are *massive*. Like, 100 MB dir listings massive.
21:33:48<@JAA>There's also at least one which doesn't finish loading within a minute.
21:34:13<Pedrosso>does AB ignore something if it doesn't load within a minute?
21:34:14<project10>mod_autoindex like 😰
21:34:39<@JAA>It's complicated.
21:35:27<@JAA>AB expects the HTTP headers within 20 seconds and the complete response within 6 hours, but slow processing of parallel requests (such as link extraction or compressing for WARC) can break the retrieval.
21:35:45<@JAA>I bet most of the dirs in there were not listed correctly on the first attempt by AB.
21:36:54<@JAA>The 1 minute timeout is the default in qwarc, which I'm using for listing this more efficiently.
21:37:09<nicolas17>100MB *listings*?
21:38:29<@JAA>Running into a problem, will need to restart the listing.
21:40:52<Pedrosso>I bid you good luck with this, lookin' forward to seeing just how big the listing file will be.
21:41:50<@JAA>nicolas17: Yes, autoland-linux64 is that one, it contains 195k entries.
21:42:37<@JAA>autoland-macosx64-debug times out on the server side after a bit over a minute with a 502.
21:50:49<@JAA>Listing restarted, going much faster now.
21:51:27dumbgoy quits [Ping timeout: 272 seconds]
21:54:51<@JAA>(I hope there are no loops via symlinks.)
21:59:36<@JAA>Oh, this time autoland-linux64 repeatedly timed out as well, yay.
22:02:12<@JAA>I think I'm running into SQLite lock contention at this point. But processing 7-9k dirs per minute isn't bad.
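SQLite lock contention like this is a common pain point when several workers hit one database; the usual mitigation is WAL journaling plus a busy timeout. A generic sketch, not qwarc's actual setup (a real crawl would use a file path rather than `:memory:`):

```python
# Sketch: SQLite settings that reduce writer/reader lock contention.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s instead of erroring
conn.execute("CREATE TABLE dirs (url TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO dirs VALUES (?, ?)", ("/pub/firefox/", "todo"))
conn.commit()
```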
22:08:45qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:09:40<Pedrosso>As for what I sent of spore forums, here are a few archive-related comments about the domains of the few that weren't directly in the domain https://transfer.archivete.am/mawW8/sporeforums%20addendum.txt
22:14:22etnguyen03 (etnguyen03) joins
22:16:14nicolas17 quits [Ping timeout: 265 seconds]
22:17:30<Pedrosso>An addendum to that addendum; https://gamefaqs.gamespot.com/ has an archive but https://gamefaqs.gamespot.com/boards/926714-spore/72994456 (posted before the archive) is missing (https://gamefaqs.gamespot.com/boards/926714-spore has 1 archive from ArchiveTeam though)
22:19:54nicolas17 joins
22:20:32<nicolas17>my modem rebooted... maybe because of telegrab at high concurrency /o\
22:21:09<@JAA>I don't think we ever fully archived GameFAQs. I believe there were unsuccessful/incomplete attempts only.
22:21:19<nicolas17>(reddit is much more prone to doing that)
22:22:31<nicolas17>does wget-at use keepalive?
22:25:45<Pedrosso>Ah, I see.
22:32:34Arcorann (Arcorann) joins
22:50:33<@JAA>Now doing over 10k dirs per minute. Brrrrr
22:50:55<@JAA>Still going to take at least 3 hours to get through the remaining queue. lol
22:51:06<@JAA>So yes, it is marginally too big for AB. :-P
22:52:14<Pedrosso>What alternatives are there then?
22:52:46<@JAA>It does depend a bit on how many files there are and how large they are.
22:52:53<@JAA>DPoS would be an option.
22:53:13<@JAA>Or maybe it can be done with AB with a few !ao < jobs rather than one big recursive one.
22:53:42<@JAA>The listings I've retrieved so far are already over 1 GiB of WARC, i.e. after compression.
22:54:04<Pedrosso>o_o
22:54:10<Pedrosso>how many would "a few" be?
22:54:11<h2ibot>Arkiver uploaded File:Blogger-icon.png: https://wiki.archiveteam.org/?title=File%3ABlogger-icon.png
22:54:45<@JAA>And arkiver spoke: 'let there be an icon!', and there was an icon.
22:54:57<Pedrosso>That is how it be
22:55:17<fireonlive>and it was glorious
22:55:56<Flashfire42>And the administrators of those websites said "Did anybody hear that?, Must have been the wind"
22:56:25parfait quits [Remote host closed the connection]
22:56:25CraftByte quits [Client Quit]
22:56:30CraftByte (DragonSec|CraftByte) joins
22:56:41parfait (kdqep) joins
22:57:57<Flashfire42>Whenever a new project is about to start I always imagine some kind of eldritch abomination machine just slowly whirring to life. With eyes of red blink into existence and start a march towards their target
22:58:22parfait quits [Remote host closed the connection]
22:58:44parfait (kdqep) joins
22:59:13Wohlstand quits [Ping timeout: 272 seconds]
22:59:17Wohlstand1 (Wohlstand) joins
23:00:08<@JAA>Pedrosso: 'A few' would be more than 'a couple' but not 'many'. :-P I don't know, it depends on the output of the listing.
23:00:12<h2ibot>JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=51159&oldid=51154
23:01:42Wohlstand1 is now known as Wohlstand
23:01:44parfait quits [Remote host closed the connection]
23:01:44CraftByte quits [Client Quit]
23:01:49CraftByte (DragonSec|CraftByte) joins
23:02:57<@arkiver>yep :)
23:03:19<@arkiver>JAA: what is your opinion on already writing a WARC-TLS-Cipher-Suite field before it's standardised?
23:03:31<@arkiver>(related to that issue on the warc specs github repo)
23:04:49<@arkiver>or actually
23:05:05<@arkiver>WARC-Cipher-Suite (the value starting with TLS_ already makes it clear it's for TLS)
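Since the WARC spec permits unspecified extension fields, such a header could simply be emitted alongside the standard ones. A hand-rolled illustration of what the record header might look like; WARC-Cipher-Suite is the proposed (not yet standardised) field under discussion, and the field list here is trimmed for brevity:

```python
# Sketch: a WARC response record header carrying a proposed extension field.
def warc_response_header(target_uri, cipher_suite):
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Cipher-Suite", cipher_suite),  # proposed extension field
    ]
    lines = ["WARC/1.1"] + [f"{k}: {v}" for k, v in fields]
    return "\r\n".join(lines) + "\r\n\r\n"

header = warc_response_header("https://example.org/",
                              "TLS_AES_128_GCM_SHA256")
```

As arkiver notes, the `TLS_` prefix of the registered cipher-suite names already signals that the value is TLS-specific.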
23:05:21<fireonlive>(thank you for not calling it SSL)
23:06:01<@arkiver>i'm glad i made your day fireonlive :)
23:06:07<fireonlive>:D
23:11:28<Pedrosso>JAA: oh, well it's nice that there are such convenient solutions
23:26:09<pabs>JAA: I expect archive.mozilla.org has a lot of stuff that isn't that useful to archive, like millions of test results :)
23:27:50<@JAA>arkiver: Fine with me, it's not a violation of the spec to write fields that aren't specified. Might be worth leaving a comment about the intent on https://github.com/iipc/warc-specifications/issues/86 though and seeing if anyone else has concerns about that.
23:29:22<@JAA>pabs: Yeah, I'm sure there are more and less useful parts to it.
23:29:40<@JAA>Have you possibly seen another announcement from Mozilla themselves about it?
23:29:54<pabs>not yet, but I did just wake up :)
23:30:46<pabs>nothing on https://planet.mozilla.org/
23:31:22<pabs>nothing on https://blog.thunderbird.net/ either
23:31:32<@JAA>Ah yes, time zones. :-)
23:31:35<pabs>maybe it is only ex-Mozilla projects moving?
23:33:26dumbgoy joins
23:49:23<thuban>repeating some requests related to old.dlib.me here, since they got lost in #archivebot:
23:49:33<thuban>https://transfer.archivete.am/AGArb/www.old.dlib.me-document-viewers-nom - yet another slightly different viewer url
23:49:45<thuban>https://transfer.archivete.am/ej3GO/www.old.dlib.me-item-pdfs - a small number of items available as pdf rather than through the document viewer
23:50:04<thuban>https://transfer.archivete.am/y7EDo/www.old.dlib.me-item-info-byname - item info pages, as linked from the library index (extracted from post xhr--not that we can duplicate that, but it's what external links are likely to be). media items like photos and videos are included in page assets
23:50:19<thuban>https://transfer.archivete.am/157ht6/www.old.dlib.me-item-info-byid - item info pages, by document id (this is the only way to see metadata for some items, mostly newspapers)
23:50:39<thuban>i believe that's everything that will actually work
23:56:23<@JAA>I'll run them shortly.
23:57:12Wohlstand1 (Wohlstand) joins
23:58:29nicolas17 quits [Read error: Connection reset by peer]
23:59:03nicolas17 joins
23:59:23Wohlstand quits [Ping timeout: 272 seconds]
23:59:23Wohlstand1 is now known as Wohlstand