00:30:34etnguyen03 quits [Client Quit]
00:32:37loug4 quits [Client Quit]
00:38:45GNU_world quits [Ping timeout: 272 seconds]
01:08:46<@JAA>boet: Sounds fun, and comprehensive DNS datasets are hard to come by! So please release the raw data in bulk, too. I'd assume it won't be very large, and being textual, it'd compress very well, so an upload to IA would likely be a good idea.
01:37:50shgaqnyrjp_ (shgaqnyrjp) joins
01:40:19shgaqnyrjp quits [Ping timeout: 260 seconds]
01:46:12lemuria quits [Remote host closed the connection]
01:47:42lemuria (lemuria) joins
01:52:12etnguyen03 (etnguyen03) joins
02:07:44shgaqnyrjp_ is now known as shgaqnyrjp
02:09:05lennier2 quits [Read error: Connection reset by peer]
02:09:23lennier2 joins
02:31:13etnguyen03 quits [Remote host closed the connection]
02:58:34Wohlstand quits [Client Quit]
03:03:30Island quits [Read error: Connection reset by peer]
03:08:21Wohlstand (Wohlstand) joins
03:39:44sec^nd quits [Remote host closed the connection]
03:40:11sec^nd (second) joins
03:55:29JaffaCakes118 quits [Remote host closed the connection]
03:55:34JaffaCakes118_2 (JaffaCakes118) joins
04:32:46nicknamesarehard quits [Client Quit]
05:11:04<pabs>https://www.gamingonlinux.com/2024/07/humble-games-confirmed-a-restructuring-of-operations-with-reports-of-all-staff-gone/
05:24:55<that_lurker>off
05:25:33<that_lurker>s/off/oof
05:36:45JaffaCakes118_2 quits [Remote host closed the connection]
05:41:19JaffaCakes118 (JaffaCakes118) joins
06:05:10nepeat quits [Ping timeout: 255 seconds]
06:06:58Coderjo quits [Ping timeout: 255 seconds]
06:07:27lun4 quits [Ping timeout: 272 seconds]
06:07:27ave quits [Ping timeout: 272 seconds]
06:13:43wickedplayer494 quits [Ping timeout: 255 seconds]
06:13:45Coderjo_ joins
06:14:09wickedplayer494 joins
06:14:41lun4 (lun4) joins
06:14:46ave (ave) joins
06:17:06nepeat (nepeat) joins
06:20:23<@OrIdow6>!tell kerim What do you want to tell us about https://www.animekalesi.com ?
06:20:25<eggdrop>[tell] ok, I'll tell kerim when they join next
06:24:59BlueMaxima quits [Read error: Connection reset by peer]
06:56:47JaffaCakes118 quits [Remote host closed the connection]
06:57:14JaffaCakes118 (JaffaCakes118) joins
07:05:44Unholy236192464537713 quits [Remote host closed the connection]
07:06:14Unholy236192464537713 (Unholy2361) joins
07:07:49loug4 joins
07:28:52Dango360 quits [Ping timeout: 255 seconds]
08:13:18<h2ibot>Exorcism edited Mailman/2 (+0): https://wiki.archiveteam.org/?diff=53142&oldid=53141
08:27:32Dango360 (Dango360) joins
08:34:39efi joins
08:36:22<h2ibot>Exorcism edited MoinMoin (+0): https://wiki.archiveteam.org/?diff=53143&oldid=53140
08:48:49Dango360 quits [Client Quit]
09:00:01Bleo1826007227196 quits [Client Quit]
09:01:19Bleo1826007227196 joins
09:02:21Dango360 (Dango360) joins
09:06:49_Dango360 (Dango360) joins
09:07:01_Dango360 quits [Remote host closed the connection]
09:07:05Dango360 quits [Client Quit]
09:08:27Dango360 (Dango360) joins
09:08:29JaffaCakes118_2 (JaffaCakes118) joins
09:11:06JaffaCakes118 quits [Remote host closed the connection]
09:11:27michaelblob quits [Read error: Connection reset by peer]
09:14:24michaelblob (michaelblob) joins
09:15:37michaelblob quits [Read error: Connection reset by peer]
09:17:52andybak joins
09:40:55grid joins
10:23:27Dango360 quits [Read error: Connection reset by peer]
11:00:02Bleo1826007227196 quits [Client Quit]
11:01:25Bleo1826007227196 joins
11:03:07<andybak>I wonder if I could get some guidance. I'm trying to retrieve 150 x 50gb warc.gz files from archive.org and it's going very slowly. Also the extraction from the warcs is super slow (lots of small files). We're trying to make all of Google Poly available again and this is one of our road blocks.
11:03:08<andybak>I'm not entirely sure what I'm asking - but is there anything I could be doing differently?
11:04:55<andybak>(for clarification there are two related projects: https://polygone.art and https://poly.pizza who were involved in the initial scrape but they've chosen only to make a subset of the files available and we specifically want to make them available more comprehensively)
11:50:50grid quits [Client Quit]
11:52:19SkilledAlpaca quits [Client Quit]
11:53:32SkilledAlpaca joins
12:09:03FalconK quits [Quit: WeeChat 4.0.4]
12:09:45FalconK (FalconK) joins
12:22:16FalconK quits [Ping timeout: 255 seconds]
12:27:32FalconK (FalconK) joins
12:49:20icedice quits [Client Quit]
13:13:11<@arkiver>andybak: are you downloading this concurrently?
13:34:07lemuria quits [Remote host closed the connection]
13:34:29lemuria (lemuria) joins
14:10:06imer quits [Quit: Oh no]
14:10:43sludge_ quits [Ping timeout: 255 seconds]
14:10:55imer (imer) joins
14:18:57Jens leaves
14:19:11Jens (JensRex) joins
14:32:18<andybak>arkiver - no. I'm using aria2c which I think is using 16 separate connections
14:32:24<andybak>arkiver - I'm using aria2c which I think is using 16 separate connections
14:32:59<andybak>aria_cmd = "aria2c -c -s 16 -x 16 {0}"
14:38:15<andybak>hmmmm. i just switched to an SSD on USB 3.1 and it seems a lot better. Might just be a crappy USB port or spinning disk
14:41:49<nimaje>as far as I know, multiple connections is concurrently
14:44:38<@JAA>You may get better throughput if you download multiple files in parallel rather than a single file with multiple connections. I'm not sure if aria2c supports the former at all, but with the options above, it definitely does the latter.
14:45:29<andybak>I can always launch multiple instances of aria. i'll play around.
14:45:33<@JAA>Right, --max-concurrent-downloads aka -j.
14:46:08<andybak>Is grabbing the warcs themselves the right thing to do here? Instead of - I dunno - grabbing the file contents directly from wayback urls?
14:46:30<@JAA>IA has two copies of each item, and each copy is on a single HDD. So by going highly parallel, those two HDDs get very sad with seeking.
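A minimal sketch of the two aria2c strategies discussed above, written in Python to match the `aria_cmd` string. The `-c`, `-s`, `-x`, `-j` (`--max-concurrent-downloads`) and `-i` (`--input-file`) options are existing aria2c flags; the function names, the `warc_urls.txt` file name, the example URL, and the numbers chosen for the parallel case are assumptions for illustration, not tuned recommendations.

```python
import shlex


def single_file_cmd(url, connections=16):
    """One file at a time, split across several connections (the command quoted above)."""
    return ["aria2c", "-c", "-s", str(connections), "-x", str(connections), url]


def parallel_files_cmd(url_list_path, parallel_files=4, connections_per_file=1):
    """Several files at once, each with few connections.

    url_list_path is a text file with one URL per line, passed to aria2c via -i.
    The default numbers are illustrative assumptions.
    """
    return [
        "aria2c", "-c",
        "-j", str(parallel_files),        # how many files to fetch at the same time
        "-s", str(connections_per_file),  # pieces each file is split into
        "-x", str(connections_per_file),  # connections per server for each file
        "-i", url_list_path,
    ]


if __name__ == "__main__":
    # Hypothetical URL and list file, only to show the resulting command lines.
    print(shlex.join(single_file_cmd("https://example.org/file.warc.gz")))
    print(shlex.join(parallel_files_cmd("warc_urls.txt")))
```

Given the point about each copy of an item sitting on a single HDD, keeping the total number of connections modest is probably the safer choice when all the files come from the same item.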
14:47:04<@JAA>At that magnitude, downloading the WARCs is the right approach. Whether unpacking them makes sense depends on how you'll use them.
14:47:34<andybak>what are the alternatives to unpacking them? treating them like a virtual file system and grabbing files as needed?
14:47:49<andybak>I hadn't even thought of that. I guess I need to test the overhead of that approach.
14:48:13<@JAA>Yeah, a custom self-hosted Wayback Machine, if you will.
14:48:36<andybak>I'm usually iterating through 1000s of small files rapidly collating metadata.
14:48:55<@JAA>There's pywb and openwayback, but not sure they're appropriate for this use case.
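As a rough illustration of the "grab files as needed" idea, here is a sketch that reads records straight out of a `.warc.gz` using only the Python standard library, without unpacking anything to disk. It assumes a well-formed WARC whose records each carry a `Content-Length` header; `iter_warc_records` and `find_response` are hypothetical helper names, not part of pywb or openwayback, and a linear scan like this would be slow on a 50 GB file compared to a setup that indexes record offsets first.

```python
import gzip


def iter_warc_records(path):
    """Yield (headers, body) for each record in a .warc.gz file."""
    # gzip.open reads concatenated gzip members as one continuous stream.
    with gzip.open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return  # end of file
            if not line.strip():
                continue  # blank separator lines between records
            if not line.startswith(b"WARC/"):
                raise ValueError(f"unexpected line at record start: {line!r}")
            headers = {}
            while True:
                line = f.readline()
                if line in (b"\r\n", b"\n", b""):
                    break  # blank line ends the WARC header block
                name, _, value = line.decode("utf-8", "replace").partition(":")
                headers[name.strip().lower()] = value.strip()
            # The record body is exactly Content-Length bytes long.
            body = f.read(int(headers["content-length"]))
            yield headers, body


def find_response(path, target_uri):
    """Return the raw HTTP response (headers + payload) for target_uri, or None."""
    for headers, body in iter_warc_records(path):
        if (headers.get("warc-type") == "response"
                and headers.get("warc-target-uri") == target_uri):
            return body
    return None
```

For the metadata-collation case, iterating once with `iter_warc_records` and processing each small file in memory avoids writing thousands of tiny files at all.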
14:51:47icedice (icedice) joins
15:03:47<andybak>yeah. I think i'm ok now i've realised that the bottleneck isn't actually the download! i've never had broadband fast enough before that it wasn't the limiting factor
15:05:24Wohlstand quits [Client Quit]
15:49:18hexagonwin quits [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
16:56:37Doranwen quits [Remote host closed the connection]
16:57:00Doranwen (Doranwen) joins
17:00:42Island joins
17:03:37JaffaCakes118_2 quits [Remote host closed the connection]
17:04:04JaffaCakes118_2 (JaffaCakes118) joins
17:22:34loug4 quits [Client Quit]
17:24:43loug4 joins
17:52:12loug4 quits [Client Quit]
17:54:13loug4 joins
18:04:28loug45 joins
18:05:31loug4 quits [Read error: Connection reset by peer]
18:05:44loug4 joins
18:10:05loug45 quits [Ping timeout: 272 seconds]
18:19:29loug42 joins
18:23:23loug4 quits [Ping timeout: 272 seconds]
18:37:16loug423 joins
18:39:31loug42 quits [Read error: Connection reset by peer]
18:39:33loug426 joins
18:43:25loug423 quits [Ping timeout: 255 seconds]
18:50:07boet quits [Client Quit]
19:01:10michaelblob (michaelblob) joins
19:10:20Dango360 (Dango360) joins
19:17:13flotwig quits [Ping timeout: 272 seconds]
19:21:11<yarrow_alt>AppleVis feels like a particularly important closure: 1) underserved community that has relied heavily on this resource, 2) lack of any clear alternative, 3) site was influential to Apple employees and even management, and 4) since the community is blind and visually impaired users, special care may need to be taken to ensure the archive of the site works with screen readers or other accessibility tools.
19:21:31flotwig joins
19:31:09SkilledAlpaca quits [Ping timeout: 272 seconds]
19:57:23loug42 joins
20:01:16loug426 quits [Ping timeout: 255 seconds]
20:04:03SkilledAlpaca joins
20:24:13matoro quits [Ping timeout: 255 seconds]
20:34:53<pokechu22>Relating to that, I assume downloading a WARC from an item where other WARCs are currently being uploaded would run into that same seeking problem?
20:35:40<pokechu22>or derived, I guess
20:36:54<@JAA>Derives run on a separate machine. With archive.php tasks, it could happen, yeah.
20:41:37<fireonlive>https://x.com/bokieiey/status/1818506690826059827 hope someone archives that lol
20:41:37<eggdrop>nitter: https://nitter.lucabased.xyz/bokieiey/status/1818506690826059827
20:52:33systwi__ is now known as systwi
20:57:26<@JAA>Only one left: https://www.ebay.com/itm/266337902355
20:58:08<@JAA>Well, one lot of 10, I guess.
21:04:06etnguyen03 (etnguyen03) joins
21:05:31<fireonlive>ah damn, gone.
21:05:31Medowar quits [Ping timeout: 272 seconds]
21:10:16JaffaCakes118_2 quits [Remote host closed the connection]
21:10:43JaffaCakes118_2 (JaffaCakes118) joins
21:13:59Medowar joins
21:18:57Aoede quits [Quit: ZNC - https://znc.in]
21:19:38etnguyen03 quits [Client Quit]
21:46:17DogsRNice joins
21:49:58etnguyen03 (etnguyen03) joins
21:55:33matoro joins
22:32:36<fireonlive>JAA++
22:32:36<eggdrop>[karma] 'JAA' now has 87 karma!
22:32:40<fireonlive>i now have more channel space
22:44:03BlueMaxima joins
22:45:32<@OrIdow6>yarrow_alt: We're already archiving it
22:45:46<@OrIdow6>What do you mean that "special care may need to be taken to ensure the archive of the site works..."?
22:45:51<@OrIdow6>What specifically?
22:49:35<fireonlive>oh right, i should compile a list and remove them from eggdrop's channelfile and firebot's database..
22:49:51<fireonlive>(eggdrop will keep trying to join forever)
22:50:37<h2ibot>JustAnotherArchivist edited Gfycat (+23): https://wiki.archiveteam.org/?diff=53144&oldid=50940
22:51:37<h2ibot>JustAnotherArchivist edited Operation London Bridge (-1): https://wiki.archiveteam.org/?diff=53145&oldid=48983
22:51:38<h2ibot>JustAnotherArchivist edited V Live (+23): https://wiki.archiveteam.org/?diff=53146&oldid=50784
22:51:39<h2ibot>JustAnotherArchivist edited BuzzVideo (+23): https://wiki.archiveteam.org/?diff=53147&oldid=49421
22:51:40<h2ibot>JustAnotherArchivist edited Pandora.tv (+23): https://wiki.archiveteam.org/?diff=53148&oldid=49429
22:52:37<h2ibot>JustAnotherArchivist edited Revue (+23): https://wiki.archiveteam.org/?diff=53149&oldid=49496
22:52:38<h2ibot>JustAnotherArchivist edited Egloos (+23): https://wiki.archiveteam.org/?diff=53150&oldid=50983
22:52:39<h2ibot>JustAnotherArchivist edited Skyblog (+23): https://wiki.archiveteam.org/?diff=53151&oldid=50550
22:52:40<h2ibot>JustAnotherArchivist edited Tiki (+23): https://wiki.archiveteam.org/?diff=53152&oldid=50217
22:53:38<h2ibot>JustAnotherArchivist edited ЯRUS (+23): https://wiki.archiveteam.org/?diff=53153&oldid=50113
22:53:39<h2ibot>JustAnotherArchivist edited Wysp (+23): https://wiki.archiveteam.org/?diff=53154&oldid=50982
22:53:40<h2ibot>JustAnotherArchivist edited Xuite (+23): https://wiki.archiveteam.org/?diff=53155&oldid=50631
22:53:41<h2ibot>JustAnotherArchivist edited ZOWA (+13): https://wiki.archiveteam.org/?diff=53156&oldid=50923
23:10:18etnguyen03 quits [Client Quit]
23:21:41lflare quits [Ping timeout: 272 seconds]
23:27:09loug42 quits [Client Quit]