00:30:34 | | etnguyen03 quits [Client Quit] |
00:32:37 | | loug4 quits [Client Quit] |
00:38:45 | | GNU_world quits [Ping timeout: 272 seconds] |
01:08:46 | <@JAA> | boet: Sounds fun, and comprehensive DNS datasets are hard to come by! So please release the raw data in bulk, too. I'd assume it won't be very large, and being textual, it'd compress very well, so an upload to IA would likely be a good idea. |
01:37:50 | | shgaqnyrjp_ (shgaqnyrjp) joins |
01:40:19 | | shgaqnyrjp quits [Ping timeout: 260 seconds] |
01:46:12 | | lemuria quits [Remote host closed the connection] |
01:47:42 | | lemuria (lemuria) joins |
01:52:12 | | etnguyen03 (etnguyen03) joins |
02:07:44 | | shgaqnyrjp_ is now known as shgaqnyrjp |
02:09:05 | | lennier2 quits [Read error: Connection reset by peer] |
02:09:23 | | lennier2 joins |
02:31:13 | | etnguyen03 quits [Remote host closed the connection] |
02:58:34 | | Wohlstand quits [Client Quit] |
03:03:30 | | Island quits [Read error: Connection reset by peer] |
03:08:21 | | Wohlstand (Wohlstand) joins |
03:39:44 | | sec^nd quits [Remote host closed the connection] |
03:40:11 | | sec^nd (second) joins |
03:55:29 | | JaffaCakes118 quits [Remote host closed the connection] |
03:55:34 | | JaffaCakes118_2 (JaffaCakes118) joins |
04:32:46 | | nicknamesarehard quits [Client Quit] |
05:11:04 | <pabs> | https://www.gamingonlinux.com/2024/07/humble-games-confirmed-a-restructuring-of-operations-with-reports-of-all-staff-gone/ |
05:24:55 | <that_lurker> | off |
05:25:33 | <that_lurker> | s/off/oof |
05:36:45 | | JaffaCakes118_2 quits [Remote host closed the connection] |
05:41:19 | | JaffaCakes118 (JaffaCakes118) joins |
06:05:10 | | nepeat quits [Ping timeout: 255 seconds] |
06:06:58 | | Coderjo quits [Ping timeout: 255 seconds] |
06:07:27 | | lun4 quits [Ping timeout: 272 seconds] |
06:07:27 | | ave quits [Ping timeout: 272 seconds] |
06:13:43 | | wickedplayer494 quits [Ping timeout: 255 seconds] |
06:13:45 | | Coderjo_ joins |
06:14:09 | | wickedplayer494 joins |
06:14:41 | | lun4 (lun4) joins |
06:14:46 | | ave (ave) joins |
06:17:06 | | nepeat (nepeat) joins |
06:20:23 | <@OrIdow6> | !tell kerim What do you want to tell us about https://www.animekalesi.com ? |
06:20:25 | <eggdrop> | [tell] ok, I'll tell kerim when they join next |
06:24:59 | | BlueMaxima quits [Read error: Connection reset by peer] |
06:56:47 | | JaffaCakes118 quits [Remote host closed the connection] |
06:57:14 | | JaffaCakes118 (JaffaCakes118) joins |
07:05:44 | | Unholy236192464537713 quits [Remote host closed the connection] |
07:06:14 | | Unholy236192464537713 (Unholy2361) joins |
07:07:49 | | loug4 joins |
07:28:52 | | Dango360 quits [Ping timeout: 255 seconds] |
08:13:18 | <h2ibot> | Exorcism edited Mailman/2 (+0): https://wiki.archiveteam.org/?diff=53142&oldid=53141 |
08:27:32 | | Dango360 (Dango360) joins |
08:34:39 | | efi joins |
08:36:22 | <h2ibot> | Exorcism edited MoinMoin (+0): https://wiki.archiveteam.org/?diff=53143&oldid=53140 |
08:48:49 | | Dango360 quits [Client Quit] |
09:00:01 | | Bleo1826007227196 quits [Client Quit] |
09:01:19 | | Bleo1826007227196 joins |
09:02:21 | | Dango360 (Dango360) joins |
09:06:49 | | _Dango360 (Dango360) joins |
09:07:01 | | _Dango360 quits [Remote host closed the connection] |
09:07:05 | | Dango360 quits [Client Quit] |
09:08:27 | | Dango360 (Dango360) joins |
09:08:29 | | JaffaCakes118_2 (JaffaCakes118) joins |
09:11:06 | | JaffaCakes118 quits [Remote host closed the connection] |
09:11:27 | | michaelblob quits [Read error: Connection reset by peer] |
09:14:24 | | michaelblob (michaelblob) joins |
09:15:37 | | michaelblob quits [Read error: Connection reset by peer] |
09:17:52 | | andybak joins |
09:40:55 | | grid joins |
10:23:27 | | Dango360 quits [Read error: Connection reset by peer] |
11:00:02 | | Bleo1826007227196 quits [Client Quit] |
11:01:25 | | Bleo1826007227196 joins |
11:03:07 | <andybak> | I wonder if I could get some guidance. I'm trying to retrieve 150 x 50gb warc.gz files from archive.org and it's going very slowly. Also the extraction from the warcs is super slow (lots of small files). We're trying to make all of Google Poly available again and this is one of our road blocks. |
11:03:08 | <andybak> | I'm not entirely sure what I'm asking - but is there anything I could be doing differently? |
11:04:55 | <andybak> | (for clarification there are two related projects: https://polygone.art and https://poly.pizza who were involved in the initial scrape but they've chosen only to make a subset of the files available and we specifically want to make them available more comprehensively) |
11:50:50 | | grid quits [Client Quit] |
11:52:19 | | SkilledAlpaca quits [Client Quit] |
11:53:32 | | SkilledAlpaca joins |
12:09:03 | | FalconK quits [Quit: WeeChat 4.0.4] |
12:09:45 | | FalconK (FalconK) joins |
12:22:16 | | FalconK quits [Ping timeout: 255 seconds] |
12:27:32 | | FalconK (FalconK) joins |
12:49:20 | | icedice quits [Client Quit] |
13:13:11 | <@arkiver> | andybak: are you downloading this concurrently? |
13:34:07 | | lemuria quits [Remote host closed the connection] |
13:34:29 | | lemuria (lemuria) joins |
14:10:06 | | imer quits [Quit: Oh no] |
14:10:43 | | sludge_ quits [Ping timeout: 255 seconds] |
14:10:55 | | imer (imer) joins |
14:18:57 | | Jens leaves |
14:19:11 | | Jens (JensRex) joins |
14:32:18 | <andybak> | arkiver - no. I'm using aria2c which I think is using 16 separate connections |
14:32:24 | <andybak> | arkiver - I'm using aria2c which I think is using 16 separate connections |
14:32:59 | <andybak> | aria_cmd = "aria2c -c -s 16 -x 16 {0}" |
14:38:15 | <andybak> | hmmmm. i just switched to an SSD on USB 3.1 and it seems a lot better. Might just be a crappy USB port or spinning disk |
14:41:49 | <nimaje> | as far as I know, multiple connections is concurrently |
14:44:38 | <@JAA> | You may get better throughput if you download multiple files in parallel rather than a single file with multiple connections. I'm not sure if aria2c supports the former at all, but with the options above, it definitely does the latter. |
14:45:29 | <andybak> | I can always launch multiple instances of aria. i'll play around. |
14:45:33 | <@JAA> | Right, --max-concurrent-downloads aka -j. |
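JAA's suggestion in the exchange above is to download several files in parallel rather than splitting one file across many connections. The sketch below only builds the aria2c command; it does not run it. It uses only flags that appear in the discussion or are standard aria2c options:

- `-c` (continue)
- `-j` (`--max-concurrent-downloads`)
- `-s` (`--split`)
- `-x` (`--max-connection-per-server`)
- `-i` (`--input-file`, one URI per line)

Several things are assumptions, not values anyone in the channel recommended: the file name `warc-urls.txt`, the helper name `build_aria2c_cmd`, and the defaults of 4 parallel files with 1 connection each.

```python
import shlex

def build_aria2c_cmd(input_file, jobs=4, conns_per_file=1):
    """Build an aria2c argv that fetches several files at once (-j)
    with few connections per file (-s/-x).

    Hypothetical helper: the default values are illustrative guesses to tune,
    not recommendations from the discussion.
    """
    return [
        "aria2c",
        "-c",                       # resume partially downloaded files
        "-j", str(jobs),            # --max-concurrent-downloads: files in parallel
        "-s", str(conns_per_file),  # --split: connections used per download
        "-x", str(conns_per_file),  # --max-connection-per-server
        "-i", str(input_file),      # --input-file: one URI per line
    ]

if __name__ == "__main__":
    # "warc-urls.txt" is a hypothetical file listing the WARC download URLs.
    print(shlex.join(build_aria2c_cmd("warc-urls.txt")))
```

Per JAA's later point, each IA item lives on two single HDDs. Spreading parallelism across different files and items, instead of hammering one file with 16 connections, is likely what eases the seeking. Keep `jobs` modest and measure.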
14:46:08 | <andybak> | Is grabbing the warcs themselves the right thing to do here? Instead of - I dunno - grabbing the file contents directly from wayback urls? |
14:46:30 | <@JAA> | IA has two copies of each item, and each copy is on a single HDD. So by going highly parallel, those two HDDs get very sad with seeking. |
14:47:04 | <@JAA> | At that magnitude, downloading the WARCs is the right approach. Whether unpacking them makes sense depends on how you'll use them. |
14:47:34 | <andybak> | what are the alternatives to unpacking them? treating them like a virtual file system and grabbing files as needed? |
14:47:49 | <andybak> | I hadn't even thought of that. I guess I need to test the overhead of that approach. |
14:48:13 | <@JAA> | Yeah, a custom self-hosted Wayback Machine, if you will. |
14:48:36 | <andybak> | I'm usually iterating through 1000s of small files rapidly collating metadata. |
14:48:55 | <@JAA> | There's pywb and openwayback, but not sure they're appropriate for this use case. |
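The "virtual file system" idea above can be sketched without unpacking anything. Scan each .warc.gz once and record the byte offset and compressed length of every record, which is roughly what the CDX-style indexes used by tools like pywb capture. Later you can seek straight to any one record and decompress only that record.

This is a minimal, stdlib-only sketch that assumes each WARC record is its own gzip member, which is the usual .warc.gz convention. All function names are hypothetical. The sketch also assumes that the WARC headers fit in the first 8 KiB of each record and contain no folded continuation lines.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterator

HEAD_LIMIT = 8192  # decompressed bytes kept per record (assumed enough for WARC headers)

def iter_gzip_members(f: BinaryIO, chunk_size: int = 1 << 16) -> Iterator[tuple[int, int, bytes]]:
    """Yield (offset, compressed_length, head) for each gzip member in f."""
    pos = 0                      # absolute file offset of buf[0]
    buf = f.read(chunk_size)
    while buf:
        start = pos
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
        head = b""
        while True:
            out = d.decompress(buf)
            if len(head) < HEAD_LIMIT:
                head += out[: HEAD_LIMIT - len(head)]
            if d.eof:            # member finished; leftover bytes start the next one
                pos += len(buf) - len(d.unused_data)
                buf = d.unused_data
                break
            pos += len(buf)
            buf = f.read(chunk_size)
            if not buf:
                raise ValueError(f"truncated gzip member at offset {start}")
        yield start, pos - start, head
        if not buf:
            buf = f.read(chunk_size)

def warc_headers(head: bytes) -> dict[str, str]:
    """Parse the WARC header block (up to the first blank line) into a dict."""
    block = head.split(b"\r\n\r\n", 1)[0].decode("utf-8", "replace")
    fields = {}
    for line in block.split("\r\n")[1:]:  # first line is the version, e.g. "WARC/1.1"
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields

def build_index(f: BinaryIO) -> dict[str, tuple[int, int]]:
    """Map WARC-Target-URI -> (offset, length) for response records."""
    index = {}
    for offset, length, head in iter_gzip_members(f):
        h = warc_headers(head)
        if h.get("WARC-Type") == "response" and "WARC-Target-URI" in h:
            index[h["WARC-Target-URI"]] = (offset, length)
    return index

def read_record(f: BinaryIO, offset: int, length: int) -> bytes:
    """Random access: decompress only the one record at offset."""
    f.seek(offset)
    return gzip.decompress(f.read(length))

def fake_record(uri: str, body: str) -> bytes:
    """Hypothetical demo helper: one gzip-compressed WARC response record."""
    payload = body.encode()
    hdr = (f"WARC/1.1\r\nWARC-Type: response\r\nWARC-Target-URI: {uri}\r\n"
           f"Content-Length: {len(payload)}\r\n\r\n").encode()
    return gzip.compress(hdr + payload + b"\r\n\r\n")

if __name__ == "__main__":
    f = io.BytesIO(fake_record("https://example.com/a", "A") + fake_record("https://example.com/b", "B"))
    idx = build_index(f)
    print(idx)
    print(read_record(f, *idx["https://example.com/b"]))
```

For collating metadata across thousands of small files, you would persist the index, for example as a sorted text file or SQLite. After that, each lookup costs one seek plus one small decompression instead of a full extraction.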
14:51:47 | | icedice (icedice) joins |
15:03:47 | <andybak> | yeah. I think i'm ok now i've realised that the bottleneck isn't actually the download! i've never had broadband fast enough before that it wasn't the limiting factor |
15:05:24 | | Wohlstand quits [Client Quit] |
15:49:18 | | hexagonwin quits [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in] |
16:56:37 | | Doranwen quits [Remote host closed the connection] |
16:57:00 | | Doranwen (Doranwen) joins |
17:00:42 | | Island joins |
17:03:37 | | JaffaCakes118_2 quits [Remote host closed the connection] |
17:04:04 | | JaffaCakes118_2 (JaffaCakes118) joins |
17:22:34 | | loug4 quits [Client Quit] |
17:24:43 | | loug4 joins |
17:52:12 | | loug4 quits [Client Quit] |
17:54:13 | | loug4 joins |
18:04:28 | | loug45 joins |
18:05:31 | | loug4 quits [Read error: Connection reset by peer] |
18:05:44 | | loug4 joins |
18:10:05 | | loug45 quits [Ping timeout: 272 seconds] |
18:19:29 | | loug42 joins |
18:23:23 | | loug4 quits [Ping timeout: 272 seconds] |
18:37:16 | | loug423 joins |
18:39:31 | | loug42 quits [Read error: Connection reset by peer] |
18:39:33 | | loug426 joins |
18:43:25 | | loug423 quits [Ping timeout: 255 seconds] |
18:50:07 | | boet quits [Client Quit] |
19:01:10 | | michaelblob (michaelblob) joins |
19:10:20 | | Dango360 (Dango360) joins |
19:17:13 | | flotwig quits [Ping timeout: 272 seconds] |
19:21:11 | <yarrow_alt> | AppleVis feels like a particularly important closure: 1) underserved community that has relied heavily on this resource, 2) lack of any clear alternative, 3) site was influential to Apple employees and even management, and 4) since the community is blind and visually impaired users, special care may need to be taken to ensure the archive of the site works with screen readers or other accessibility tools. |
19:21:31 | | flotwig joins |
19:31:09 | | SkilledAlpaca quits [Ping timeout: 272 seconds] |
19:57:23 | | loug42 joins |
20:01:16 | | loug426 quits [Ping timeout: 255 seconds] |
20:04:03 | | SkilledAlpaca joins |
20:24:13 | | matoro quits [Ping timeout: 255 seconds] |
20:34:53 | <pokechu22> | Relating to that, I assume downloading a WARC from an item where other WARCs are currently being uploaded would run into that same seeking problem? |
20:35:40 | <pokechu22> | or derived, I guess |
20:36:54 | <@JAA> | Derives run on a separate machine. With archive.php tasks, it could happen, yeah. |
20:41:37 | <fireonlive> | https://x.com/bokieiey/status/1818506690826059827 hope someone archives that lol |
20:41:37 | <eggdrop> | nitter: https://nitter.lucabased.xyz/bokieiey/status/1818506690826059827 |
20:52:33 | | systwi__ is now known as systwi |
20:57:26 | <@JAA> | Only one left: https://www.ebay.com/itm/266337902355 |
20:58:08 | <@JAA> | Well, one lot of 10, I guess. |
21:04:06 | | etnguyen03 (etnguyen03) joins |
21:05:31 | <fireonlive> | ah damn, gone. |
21:05:31 | | Medowar quits [Ping timeout: 272 seconds] |
21:10:16 | | JaffaCakes118_2 quits [Remote host closed the connection] |
21:10:43 | | JaffaCakes118_2 (JaffaCakes118) joins |
21:13:59 | | Medowar joins |
21:18:57 | | Aoede quits [Quit: ZNC - https://znc.in] |
21:19:38 | | etnguyen03 quits [Client Quit] |
21:46:17 | | DogsRNice joins |
21:49:58 | | etnguyen03 (etnguyen03) joins |
21:55:33 | | matoro joins |
22:32:36 | <fireonlive> | JAA++ |
22:32:36 | <eggdrop> | [karma] 'JAA' now has 87 karma! |
22:32:40 | <fireonlive> | i now have more channel space |
22:44:03 | | BlueMaxima joins |
22:45:32 | <@OrIdow6> | yarrow_alt: We're already archiving it |
22:45:46 | <@OrIdow6> | What do you mean that "special care may need to be taken to ensure the archive of the site works..."? |
22:45:51 | <@OrIdow6> | What specifically? |
22:49:35 | <fireonlive> | oh right, i should compile a list and remove them from eggdrop's channelfile and firebot's database.. |
22:49:51 | <fireonlive> | (eggdrop will keep trying to join forever) |
22:50:37 | <h2ibot> | JustAnotherArchivist edited Gfycat (+23): https://wiki.archiveteam.org/?diff=53144&oldid=50940 |
22:51:37 | <h2ibot> | JustAnotherArchivist edited Operation London Bridge (-1): https://wiki.archiveteam.org/?diff=53145&oldid=48983 |
22:51:38 | <h2ibot> | JustAnotherArchivist edited V Live (+23): https://wiki.archiveteam.org/?diff=53146&oldid=50784 |
22:51:39 | <h2ibot> | JustAnotherArchivist edited BuzzVideo (+23): https://wiki.archiveteam.org/?diff=53147&oldid=49421 |
22:51:40 | <h2ibot> | JustAnotherArchivist edited Pandora.tv (+23): https://wiki.archiveteam.org/?diff=53148&oldid=49429 |
22:52:37 | <h2ibot> | JustAnotherArchivist edited Revue (+23): https://wiki.archiveteam.org/?diff=53149&oldid=49496 |
22:52:38 | <h2ibot> | JustAnotherArchivist edited Egloos (+23): https://wiki.archiveteam.org/?diff=53150&oldid=50983 |
22:52:39 | <h2ibot> | JustAnotherArchivist edited Skyblog (+23): https://wiki.archiveteam.org/?diff=53151&oldid=50550 |
22:52:40 | <h2ibot> | JustAnotherArchivist edited Tiki (+23): https://wiki.archiveteam.org/?diff=53152&oldid=50217 |
22:53:38 | <h2ibot> | JustAnotherArchivist edited ЯRUS (+23): https://wiki.archiveteam.org/?diff=53153&oldid=50113 |
22:53:39 | <h2ibot> | JustAnotherArchivist edited Wysp (+23): https://wiki.archiveteam.org/?diff=53154&oldid=50982 |
22:53:40 | <h2ibot> | JustAnotherArchivist edited Xuite (+23): https://wiki.archiveteam.org/?diff=53155&oldid=50631 |
22:53:41 | <h2ibot> | JustAnotherArchivist edited ZOWA (+13): https://wiki.archiveteam.org/?diff=53156&oldid=50923 |
23:10:18 | | etnguyen03 quits [Client Quit] |
23:10:55 | | wickedplayer494 is now authenticated as wickedplayer494 |
23:21:41 | | lflare quits [Ping timeout: 272 seconds] |
23:27:09 | | loug42 quits [Client Quit] |