00:01:10 | | etnguyen03 quits [Client Quit] |
00:01:12 | | Carnildo quits [Ping timeout: 265 seconds] |
00:01:52 | | etnguyen03 (etnguyen03) joins |
00:07:32 | | nertzy_ joins |
00:09:49 | | Carnildo joins |
00:11:39 | | etnguyen03 quits [Client Quit] |
00:12:40 | <nulldata> | !tell BenFranske Looks like there's at least some on IA but none recent. https://archive.org/details/twit-podcasts Was there an item of yours taken down? Probably should discuss in #internetarchive |
00:12:40 | <eggdrop> | [tell] ok, I'll tell BenFranske when they join next |
00:13:17 | | nertzy_ quits [Ping timeout: 265 seconds] |
00:16:11 | | Carnildo quits [Ping timeout: 265 seconds] |
00:17:32 | | Carnildo joins |
00:27:56 | | SootBector quits [Remote host closed the connection] |
00:28:21 | | SootBector (SootBector) joins |
00:32:48 | | nertzy_ joins |
00:53:23 | | DogsRNice joins |
01:08:31 | <fireonlive> | at least nulldata is useful |
01:09:25 | <nulldata> | Huh? |
01:40:44 | | michaelblob_ (michaelblob) joins |
01:44:38 | | michaelblob quits [Ping timeout: 265 seconds] |
01:45:03 | | michaelblob (michaelblob) joins |
01:47:19 | | michaelblob_ quits [Ping timeout: 255 seconds] |
01:53:52 | | tapos joins |
01:58:18 | | DogsRNice_ joins |
02:01:43 | | DogsRNice quits [Ping timeout: 255 seconds] |
02:07:16 | | Unholy23619246 (Unholy2361) joins |
02:10:44 | | Unholy2361924 quits [Ping timeout: 265 seconds] |
02:10:44 | | Unholy23619246 is now known as Unholy2361924 |
02:10:53 | | BenFranske joins |
02:10:54 | <eggdrop> | [tell] BenFranske: [2024-05-02T00:12:40Z] <nulldata> Looks like there's at least some on IA but none recent. https://archive.org/details/twit-podcasts Was there an item of yours taken down? Probably should discuss in #internetarchive |
02:13:59 | <BenFranske> | nulldata Yes, I was working on uploading everything from there and had about 7500 episodes uploaded of the 24500 episodes that they have published. Most of them have been pulled and the rest are probably going to get pulled soon I think. My account also got locked at IA just a bit ago. See my currently still available set at https://archive.org/details/@benfranske (currently 870 items rather than the 7500+ that were there earlier today) |
02:15:25 | | ell (ell) joins |
02:24:29 | | Carnildo_again joins |
02:24:34 | | Carnildo quits [Read error: Connection reset by peer] |
02:26:49 | | grid joins |
02:37:31 | | gaz joins |
02:44:09 | | DogsRNice_ quits [Read error: Connection reset by peer] |
02:51:20 | | Carnildo_again quits [Read error: Connection reset by peer] |
02:52:26 | | Carnildo joins |
02:57:58 | | Carnildo quits [Read error: Connection reset by peer] |
02:58:08 | | Carnildo joins |
03:05:52 | | etnguyen03 (etnguyen03) joins |
03:11:29 | | Carnildo quits [Read error: Connection reset by peer] |
03:13:22 | | Carnildo joins |
03:13:22 | | Carnildo quits [Read error: Connection reset by peer] |
03:36:48 | | BenFranske quits [Client Quit] |
03:55:49 | | Carnildo joins |
04:01:21 | <@hook54321> | does anyone know if https://dumps.wikimedia.org/other/shorturls/ is dumped anywhere on a regular basis? it looks like they used to be dumped to IA but haven't been for a few years https://archive.org/details/shorturls-20200907 |
04:02:27 | | Carnildo quits [Remote host closed the connection] |
04:02:39 | | Carnildo joins |
04:20:29 | | Carnildo quits [Read error: Connection reset by peer] |
04:20:44 | | Carnildo joins |
04:23:57 | | nertzy_ quits [Client Quit] |
04:24:38 | | Carnildo quits [Read error: Connection reset by peer] |
04:24:48 | | Carnildo joins |
04:29:12 | | lennier2 joins |
04:32:01 | | lennier2_ quits [Ping timeout: 255 seconds] |
04:36:43 | | grid quits [Client Quit] |
04:38:09 | | Carnildo quits [Read error: Connection reset by peer] |
04:38:33 | | Carnildo joins |
04:40:04 | | shgaqnyrjp_ (shgaqnyrjp) joins |
04:42:10 | | shgaqnyrjp quits [Remote host closed the connection] |
04:49:08 | | Carnildo quits [Read error: Connection reset by peer] |
04:49:19 | | Carnildo joins |
04:53:33 | | shgaqnyrjp_ is now known as shgaqnyrjp |
04:57:47 | | Carnildo quits [Read error: Connection reset by peer] |
04:58:11 | | Carnildo joins |
04:59:43 | | kiryu__ joins |
05:00:07 | | Church quits [Quit: WeeChat info:version] |
05:02:37 | | kiryu_ quits [Ping timeout: 255 seconds] |
05:05:03 | | kiryu_ joins |
05:07:09 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
05:08:01 | | kiryu__ quits [Ping timeout: 255 seconds] |
05:11:59 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:12:27 | | benjinsmi joins |
05:12:54 | | etnguyen03 quits [Client Quit] |
05:13:35 | | benjins2_ joins |
05:14:01 | | Carnildo quits [Remote host closed the connection] |
05:14:04 | | Carnildo_again joins |
05:15:51 | | benjinsm quits [Ping timeout: 265 seconds] |
05:15:51 | | benjins2 quits [Ping timeout: 265 seconds] |
05:16:58 | | etnguyen03 (etnguyen03) joins |
05:17:24 | | Church (Church) joins |
05:31:30 | | Carnildo_again quits [Read error: Connection reset by peer] |
05:31:34 | | Carnildo joins |
05:34:13 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
05:34:53 | | etnguyen03 quits [Remote host closed the connection] |
05:40:10 | | kiryu_ quits [Remote host closed the connection] |
05:41:31 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:41:40 | | kiryu joins |
05:41:40 | | kiryu is now authenticated as kiryu |
05:41:40 | | kiryu quits [Changing host] |
05:41:40 | | kiryu (kiryu) joins |
05:45:17 | | shgaqnyrjp quits [Remote host closed the connection] |
05:45:33 | | Carnildo quits [Read error: Connection reset by peer] |
05:45:35 | | Carnildo joins |
05:45:54 | | shgaqnyrjp (shgaqnyrjp) joins |
05:53:01 | | Carnildo quits [Read error: Connection reset by peer] |
05:53:18 | | Carnildo joins |
06:01:29 | | shgaqnyrjp quits [Remote host closed the connection] |
06:02:05 | | shgaqnyrjp (shgaqnyrjp) joins |
06:02:44 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
06:05:12 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
06:05:43 | | no-n0rth joins |
06:07:06 | <no-n0rth> | Hey folks! I was looking at the Blingee archive, I'm looking for a file that has some of the stamp swf - would anyone here be familiar with the project? |
06:14:43 | <pokechu22> | Hmm, I don't know too much about blingee, but based on the information on https://wiki.archiveteam.org/index.php/Blingee someone here would probably be able to find it. If you have a URL then it'd be on web.archive.org; if you have something else I think that page has enough information on how to figure out the URL? |
06:15:39 | | BlueMaxima quits [Read error: Connection reset by peer] |
06:17:22 | <no-n0rth> | Thanks for the link! I followed that to the internet archive backups, but the files are huge lol and so far it seems most of them just have comments and gifs. I suspect the AES key was rotated, but I might try running the scraper tomorrow if I don't find a cdx that has swf files |
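If the Blingee items ship `.cdx` index files next to the WARCs (no-n0rth's message suggests looking for one), those can be scanned for SWF captures without downloading the huge WARCs. A minimal sketch, assuming the common 11-field header ` CDX N b a m s k r M S V g`; check the first line of the real file, since the column letters drive the parsing. The function and script names are hypothetical. The `g`, `V` and `S` columns (WARC file name, offset, compressed size) are what you would need to pull a single record out with an HTTP range request.

```python
from typing import Iterable, Iterator


def iter_swf_entries(cdx_lines: Iterable[str]) -> Iterator[dict]:
    """Yield entries from a CDX index whose URL or MIME type points at a SWF file.

    The first line must be a header such as " CDX N b a m s k r M S V g";
    each letter names the field in that column (a = original URL,
    m = MIME type, S = compressed record size, V = offset, g = WARC file name).
    """
    lines = iter(cdx_lines)
    header = next(lines, "").split()
    if not header or header[0] != "CDX":
        raise ValueError("expected a CDX header line")
    fields = header[1:]
    for line in lines:
        values = line.split()
        if len(values) != len(fields):
            continue  # skip blank or malformed rows
        entry = dict(zip(fields, values))
        url = entry.get("a", "").lower()
        mime = entry.get("m", "").lower()
        # Check both: servers often label SWF files with a generic MIME type.
        if "shockwave-flash" in mime or url.split("?", 1)[0].endswith(".swf"):
            yield entry


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python find_swf.py some-item.cdx
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for e in iter_swf_entries(f):
            print(e.get("g"), e.get("V"), e.get("S"), e.get("a"))
```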
06:22:19 | | Carnildo quits [Read error: Connection reset by peer] |
06:22:27 | | Carnildo joins |
06:59:47 | | grid joins |
07:06:09 | | Unholy23619246 (Unholy2361) joins |
07:07:21 | | Doomaholic quits [Ping timeout: 272 seconds] |
07:07:34 | | Doomaholic (Doomaholic) joins |
07:09:55 | | Unholy2361924 quits [Ping timeout: 265 seconds] |
07:14:20 | | Arcorann_ joins |
07:19:26 | | Carnildo quits [Read error: Connection reset by peer] |
07:19:31 | | Carnildo joins |
07:23:33 | | Carnildo quits [Read error: Connection reset by peer] |
07:23:42 | | Carnildo joins |
07:31:32 | | Carnildo quits [Read error: Connection reset by peer] |
07:31:43 | | Carnildo joins |
07:36:15 | | Carnildo_again joins |
07:36:17 | | Carnildo quits [Remote host closed the connection] |
07:55:50 | | Carnildo_again quits [Read error: Connection reset by peer] |
07:56:01 | | Carnildo joins |
07:58:59 | | qwertyasdfuiopghjkl quits [Client Quit] |
08:05:32 | | shgaqnyrjp_ (shgaqnyrjp) joins |
08:07:51 | | shgaqnyrjp quits [Ping timeout: 250 seconds] |
08:08:17 | | SootBector quits [Ping timeout: 250 seconds] |
08:11:45 | | SootBector (SootBector) joins |
08:15:19 | | superkuh quits [Remote host closed the connection] |
08:20:57 | | superkuh joins |
08:28:28 | | Carnildo quits [Read error: Connection reset by peer] |
08:28:33 | | Carnildo joins |
08:39:26 | | Carnildo quits [Read error: Connection reset by peer] |
08:39:37 | | Carnildo joins |
08:39:50 | <lea> | say I want to archive a site that's behind a login wall. I could probably write a scraper for it. can I somehow upload the results to the web archive? |
08:40:44 | <lea> | site in question: https://usdb.animux.de/ (hosts synced song texts for karaoke apps) |
08:41:52 | | Carnildo quits [Read error: Connection reset by peer] |
08:41:55 | | Carnildo joins |
08:54:40 | | Carnildo quits [Read error: Connection reset by peer] |
08:54:44 | | Carnildo joins |
09:00:05 | | Bleo18260072 quits [Client Quit] |
09:00:44 | | Carnildo quits [Read error: Connection reset by peer] |
09:00:58 | | Carnildo joins |
09:01:23 | | Bleo18260072 joins |
09:05:47 | <katia> | lea, i think stuff that is behind login never goes to wayback machine |
09:05:55 | <katia> | but you/anyone can upload it to IA |
09:06:14 | <katia> | https://archive.org/developers/internetarchive/cli.html |
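For reference, the linked `ia` command-line tool uploads with `ia upload <identifier> <files>` and attaches item metadata through repeated `--metadata` options (run `ia configure` once first to store credentials). A small helper that assembles such a command, assuming Python 3.8+ for `shlex.join`; the identifier, title and file name below are made-up examples, not an agreed naming scheme.

```python
import shlex
from typing import Dict, List, Sequence


def build_ia_upload_command(identifier: str, files: Sequence[str],
                            metadata: Dict[str, str]) -> List[str]:
    """Return an argv list for `ia upload` from the internetarchive CLI."""
    if not identifier:
        raise ValueError("an item identifier is required")
    argv = ["ia", "upload", identifier, *files]
    for key, value in metadata.items():
        # Each metadata field is passed as its own --metadata=key:value option.
        argv.append(f"--metadata={key}:{value}")
    return argv


if __name__ == "__main__":
    cmd = build_ia_upload_command(
        "usdb-animux-scrape-2024-05",  # hypothetical identifier
        ["songs-2024-05.tar.gz"],      # hypothetical file name
        {"mediatype": "data", "title": "usdb.animux.de scrape (May 2024)"},
    )
    print(shlex.join(cmd))  # quoted, ready to paste into a shell
```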
09:06:56 | <lea> | katia: is there documentation on the preferred format for uploads? |
09:07:30 | <lea> | or should I just dump a zip file with all the current data of the site? what about new content? the site is still alive |
09:08:06 | <katia> | you probably want to design your scraper to be incremental then |
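One way to make the scraper incremental: remember which song IDs were already downloaded and fetch only IDs that are new since the last run. A minimal sketch, assuming the site exposes stable per-song IDs; the function names and the JSON state file are hypothetical. This only catches newly added songs, not edits to existing ones.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_seen(state_file: Path) -> Set[str]:
    """Read the IDs fetched on previous runs (empty set on the first run)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text(encoding="utf-8")))


def new_ids(listed: Iterable[str], seen: Set[str]) -> List[str]:
    """Return IDs in the current listing that were not fetched before, in listing order."""
    out, emitted = [], set()
    for item in listed:
        if item not in seen and item not in emitted:
            out.append(item)
            emitted.add(item)
    return out


def save_seen(state_file: Path, seen: Set[str]) -> None:
    """Write the seen set to a temp file, then rename it over the old state."""
    tmp = state_file.parent / (state_file.name + ".tmp")
    tmp.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    tmp.replace(state_file)  # so an interrupted run cannot leave a half-written file
```

A run would then be: load the seen set, list the site, download `new_ids(...)`, add them to the set, and save it.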
09:09:40 | | grid quits [Client Quit] |
09:11:42 | <lea> | yes |
09:12:03 | <lea> | since these are individual files, I guess I could just upload tens of thousands of individual files to the archive? |
09:13:24 | <katia> | maybe better for #internetarchive |
09:14:27 | <katia> | IA unpacks some .tar (and maybe other archive formats), so packing/compressing it might make more sense than uploading single files |
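Bundling the scraped files, as katia suggests, can be done with Python's standard `tarfile` module. A sketch, assuming the files of one scrape run sit in a single local directory; one archive per run would fit an incremental scraper. Whether IA lists the contents of a particular archive type in its file browser is worth confirming in #internetarchive.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: Path, archive_path: Path) -> int:
    """Pack every file under src_dir into a gzip-compressed tar; return the file count.

    Member names are relative to src_dir so the archive unpacks cleanly.
    """
    # Collect the file list first so an archive written inside src_dir is not included.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            tar.add(path, arcname=path.relative_to(src_dir).as_posix())
    return len(files)
```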
09:18:13 | | Doran quits [Ping timeout: 255 seconds] |
09:21:15 | | Carnildo quits [Read error: Connection reset by peer] |
09:21:18 | | Carnildo joins |
09:22:45 | | Doran (Doranwen) joins |
09:26:45 | | Doran quits [Remote host closed the connection] |
09:36:00 | <thuban> | lea: the best format for archival purposes is warc (you can upload warcs to the internet archive like any other item even though they don't go into the wayback machine). |
09:36:03 | <thuban> | i suggest using https://github.com/ArchiveTeam/grab-site/, which outputs warc and which you can configure to use your login cookies |
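Passing login cookies to a crawler usually means putting them in a file. wpull, like wget, has a `--load-cookies` option that reads the Netscape `cookies.txt` format, and Python's `http.cookiejar.MozillaCookieJar` writes that format. A sketch, assuming you copy the session cookie's name and value from your browser after logging in; the cookie name `PHPSESSID` in the test is a hypothetical example. The file could then be handed over with something like `grab-site --wpull-args=--load-cookies=cookies.txt ...`, but check that invocation against the grab-site README.

```python
import http.cookiejar
import time
from typing import Iterable


def make_cookie(name: str, value: str, domain: str, path: str = "/",
                secure: bool = True,
                lifetime_s: int = 7 * 24 * 3600) -> http.cookiejar.Cookie:
    """Build a cookie by hand, e.g. from values copied out of browser dev tools."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=secure, expires=int(time.time()) + lifetime_s,
        discard=False, comment=None, comment_url=None, rest={},
    )


def write_cookies_txt(cookies: Iterable[http.cookiejar.Cookie], filename: str) -> None:
    """Write cookies in the Netscape cookies.txt format read by wget-style tools."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    for cookie in cookies:
        jar.set_cookie(cookie)
    jar.save(ignore_discard=True, ignore_expires=True)
```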
09:38:22 | <lea> | the page needs a JS-initiated HTTP POST to give out the data. I can also initiate it without JS. does the tool support a use case like that? |
09:41:39 | <lea> | thanks for the pointer btw |
09:43:47 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
09:44:06 | | pabs quits [Ping timeout: 265 seconds] |
09:45:03 | <thuban> | lea: yes, you can use --wpull-args with wpull's --post options (see https://wpull.readthedocs.io/en/master/options.html) to send POST requests. that said, depending on the details this may become very inconvenient |
09:47:03 | | pabs (pabs) joins |
09:47:06 | | Ruthalas59 (Ruthalas) joins |
09:57:57 | | Doran (Doranwen) joins |
10:00:05 | | f_ (funderscore) joins |
10:00:56 | <thuban> | (since wpull uses the same post data for _all_ requests, worst-case, you may need to scrape the site once, process the output to determine what urls and post data you need for the txt downloads, and invoke grab-site on each individually in a loop. you can combine the results with eg warcat: https://github.com/chfoo/warcat) |
10:01:11 | <thuban> | (might still be quicker than writing your own scraper) |
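thuban's worst-case plan (one grab-site run per URL and POST body, merged afterwards with warcat) can be scripted. A sketch, assuming the (url, post_data) pairs were already extracted from a first scrape, and assuming grab-site splits the `--wpull-args` string shell-style, which is why each value goes through `shlex.quote`; verify that before relying on it with unusual characters. `--1` asks grab-site to fetch only the given URL and its page requisites rather than recursing. Function names and the example URL are hypothetical.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional, Tuple


def grab_site_commands(jobs: Iterable[Tuple[str, str]],
                       cookies_txt: Optional[str] = None) -> List[List[str]]:
    """Build one grab-site argv per (url, post_data) pair.

    wpull sends the same --post-data with every request of a crawl, so each
    distinct POST body gets its own single-URL run.
    """
    commands = []
    for url, post_data in jobs:
        wpull_args = ["--post-data=" + shlex.quote(post_data)]
        if cookies_txt:
            wpull_args.append("--load-cookies=" + shlex.quote(cookies_txt))
        commands.append(["grab-site", "--1", "--wpull-args=" + " ".join(wpull_args), url])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or run them one after another."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

Starting with `dry_run=True` makes it easy to eyeball the generated commands before launching thousands of crawls.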
10:14:55 | | Doran quits [Ping timeout: 255 seconds] |
10:50:11 | <Miori> | Did you guys see subscene closing down in 24 hours? https://forum.subscene.com/topic/subscene-is-closing-so-sorry |
10:56:16 | <joepie91|m> | well shit |
10:56:37 | <katia> | buttflare :| |
10:57:00 | | Doran (Doranwen) joins |
10:57:43 | <katia> | well not on subscene.com, just on forum? |
10:58:11 | <katia> | started an archivebot job for subscene.com |
11:07:43 | | f_ quits [Client Quit] |
11:07:49 | | SootBector quits [Remote host closed the connection] |
11:08:18 | | SootBector (SootBector) joins |
11:17:32 | <Miori> | https://www.reddit.com/r/DataHoarder/comments/1b5rxc2/subscenecom_full_dump/ and apparently https://subdl.com/ is mirroring data from subscene every hour |
11:24:37 | <katia> | nice |
11:44:44 | | thalia quits [Quit: Connection closed for inactivity] |
11:51:02 | | knecht4 quits [Client Quit] |
11:52:03 | | knecht4 joins |
12:03:57 | | nertzy_ joins |
12:09:27 | | Carnildo quits [Read error: Connection reset by peer] |
12:09:35 | | Carnildo joins |
12:23:24 | | Carnildo quits [Read error: Connection reset by peer] |
12:23:47 | | Carnildo joins |
12:42:49 | | jaxon joins |
12:44:15 | | jaxon quits [Client Quit] |
13:00:02 | | etnguyen03 (etnguyen03) joins |
13:14:55 | | Carnildo quits [Read error: Connection reset by peer] |
13:15:03 | | Carnildo joins |
13:20:59 | | Carnildo quits [Read error: Connection reset by peer] |
13:21:41 | | Carnildo joins |
13:27:04 | | Arcorann_ quits [Ping timeout: 255 seconds] |
13:28:00 | | nertzy_ quits [Client Quit] |
13:40:34 | | sonick quits [Client Quit] |
13:42:39 | | Carnildo quits [Read error: Connection reset by peer] |
13:42:43 | | Carnildo joins |
13:52:09 | | Carnildo quits [Read error: Connection reset by peer] |
13:52:15 | | Carnildo joins |
13:53:17 | | tapos quits [Client Quit] |
13:55:03 | | Wohlstand (Wohlstand) joins |
14:03:56 | | Carnildo quits [Read error: Connection reset by peer] |
14:04:02 | | Carnildo joins |
14:09:54 | | Carnildo quits [Read error: Connection reset by peer] |
14:10:02 | | Carnildo joins |
14:17:31 | | s-crypt quits [Quit: Ping timeout (120 seconds)] |
14:17:43 | | s-crypt (s-crypt) joins |
14:18:13 | | Carnildo quits [Read error: Connection reset by peer] |
14:18:28 | | Carnildo joins |
14:23:13 | | Carnildo_again joins |
14:23:17 | | Carnildo quits [Remote host closed the connection] |
14:26:44 | | Mateon1 quits [Quit: Mateon1] |
14:27:25 | | Mateon1 joins |
14:28:53 | | Carnildo_again quits [Read error: Connection reset by peer] |
14:29:07 | | Carnildo joins |
14:31:36 | | knecht4 quits [Client Quit] |
14:33:35 | | knecht4 joins |
15:16:28 | | f_ (funderscore) joins |
15:26:54 | | RealPerson joins |
15:28:15 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:37:28 | | etnguyen03 quits [Client Quit] |
15:38:09 | | etnguyen03 (etnguyen03) joins |
15:40:25 | | Carnildo quits [Read error: Connection reset by peer] |
15:40:29 | | Carnildo joins |
15:47:56 | | etnguyen03 quits [Client Quit] |
15:48:37 | | etnguyen03 (etnguyen03) joins |
15:53:04 | | Carnildo quits [Read error: Connection reset by peer] |
15:53:09 | | Carnildo joins |
15:56:28 | | Perk quits [Read error: Connection reset by peer] |
15:58:23 | | etnguyen03 quits [Client Quit] |
15:59:04 | | etnguyen03 (etnguyen03) joins |
16:08:50 | | etnguyen03 quits [Client Quit] |
16:10:20 | | Carnildo quits [Read error: Connection reset by peer] |
16:10:32 | | Carnildo joins |
16:20:43 | | Carnildo quits [Read error: Connection reset by peer] |
16:20:59 | | Carnildo joins |
16:27:41 | | JaffaCakes118 quits [Ping timeout: 265 seconds] |
16:30:01 | | Carnildo quits [Read error: Connection reset by peer] |
16:30:21 | | Carnildo joins |
16:36:20 | | Carnildo quits [Remote host closed the connection] |
16:36:51 | | Carnildo joins |
16:39:48 | <gaz> | hey peeps, i'm looking for some advice or tips: i want to download absolutely everything associated with a few domains from the wayback machine (all subdomains, images, js, css, etc etc etc). my initial investigations put what i want to grab at like 30 million urls, and would take like 6 months on one machine. i'm hoping you guys have info that could help :) |
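At gaz's numbers, 30 million URLs in about six months works out to 30,000,000 / (182 × 86,400 s) ≈ 1.9 requests per second, so fetching fewer URLs matters as much as fetching faster. One common approach, sketched here under assumptions rather than as a recommended tool: list captures through the CDX API with `fl=timestamp,original,digest`, skip captures of a URL whose content digest was already seen, and request the rest with the `id_` flag after the timestamp, which returns the archived bytes without the Wayback Machine's link rewriting. Function names are hypothetical; rate limiting, retries and the download loop itself are left out.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str]  # (timestamp, original, digest)


def parse_rows(cdx_text: str) -> List[Row]:
    """Parse CDX output requested with fl=timestamp,original,digest."""
    rows = []
    for line in cdx_text.splitlines():
        parts = line.split(" ")
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def raw_snapshot_urls(rows: Iterable[Row]) -> Iterator[str]:
    """Turn CDX rows into unmodified-content snapshot URLs.

    Rows whose (URL, digest) pair was already seen are skipped, since they
    are byte-identical captures of the same URL.
    """
    seen = set()
    for timestamp, original, digest in rows:
        key = (original, digest)
        if key in seen:
            continue
        seen.add(key)
        yield f"https://web.archive.org/web/{timestamp}id_/{original}"
```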
16:51:59 | | Carnildo quits [Read error: Connection reset by peer] |
16:52:12 | | Carnildo joins |
16:54:43 | <that_lurker> | gaz: Easiest might be to try and search for the domain in https://archive.fart.website/archivebot/viewer/?q=utu.fi and download the associated .warc.gz file |
16:54:59 | <that_lurker> | correction the link is https://archive.fart.website/archivebot/viewe |
16:55:34 | | that_lurker wonders how one can send the wrong link twice |
16:55:42 | <gaz> | ok i'll have a look |
16:55:43 | <gaz> | lol |
16:56:00 | <@JAA> | That will only work if it's an ArchiveBot crawl, of course. You wouldn't get snapshots from other sources etc. |
16:56:26 | <@JAA> | But yeah, if there is such a crawl, it's probably a good start. |
16:56:53 | <that_lurker> | yeah. forgot to mention that too.... The sudden summer heat in Finland is getting to me :-P |
16:59:48 | <Vokun> | Ah yes. With a high of just above freezing, i'd be sweating too |
17:00:37 | | Carnildo_again joins |
17:00:38 | | Carnildo quits [Read error: Connection reset by peer] |
17:02:14 | | f_ quits [Remote host closed the connection] |
17:03:00 | | f_ (funderscore) joins |
17:03:31 | | eightthree quits [Ping timeout: 255 seconds] |
17:03:33 | <Vokun> | Actually, sorry. It's too hot where I live too |
17:04:18 | | Carnildo_again quits [Remote host closed the connection] |
17:04:25 | | Carnildo joins |
17:04:52 | | eightthree joins |
17:09:01 | | f_ quits [Remote host closed the connection] |
17:10:07 | | RealPerson leaves |
17:10:32 | | RealPerson joins |
17:11:03 | | f_ (funderscore) joins |
17:11:42 | <that_lurker> | These are the first days when it's starting to go over 10 C here during the day. Nights are still around 5 C and now days are somewhere close to 20 C |
17:14:06 | | Carnildo quits [Read error: Connection reset by peer] |
17:14:13 | | Carnildo joins |
17:14:37 | <Larsenv> | it's https://archive.fart.website/archivebot/viewer/ |
17:18:34 | <Vokun> | It goes from about 12-28 here from night to day. I run a fan at night from the window while I sleep cause it doesn't cool down till really late at night, so when I wake up I'm frigid. |
17:19:55 | | f_ quits [Ping timeout: 250 seconds] |
17:20:55 | | that_lurker watches for the looming gaze of JAA as the conversation has gone offtopic and wonders whether to continue or not :P |
17:21:02 | | f_ (funderscore) joins |
17:21:21 | <@JAA> | :-) |
17:26:02 | | Carnildo quits [Remote host closed the connection] |
17:26:09 | | Carnildo joins |
17:43:30 | | Island quits [Read error: Connection reset by peer] |
18:11:00 | | shgaqnyrjp_ is now known as shgaqnyrjp |
18:11:11 | | Carnildo quits [Read error: Connection reset by peer] |
18:11:14 | | Carnildo joins |
18:18:27 | | Notrealname1234 (Notrealname1234) joins |
18:20:10 | | Carnildo quits [Read error: Connection reset by peer] |
18:20:34 | | Carnildo joins |
18:22:48 | | etnguyen03 (etnguyen03) joins |
18:27:44 | | Carnildo quits [Read error: Connection reset by peer] |
18:27:49 | | Carnildo joins |
18:28:15 | | Notrealname1234 quits [Client Quit] |
18:36:53 | | Carnildo_again joins |
18:37:13 | | Notrealname1234 (Notrealname1234) joins |
18:37:34 | | Carnildo quits [Ping timeout: 255 seconds] |
18:48:25 | | sd (sd) joins |
18:48:33 | | Carnildo_again quits [Read error: Connection reset by peer] |
18:48:38 | | Carnildo joins |
18:49:53 | | Notrealname1234 quits [Client Quit] |
19:00:55 | | benah joins |
19:03:45 | | etnguyen03 quits [Client Quit] |
19:04:05 | | benah quits [Client Quit] |
19:05:26 | | Carnildo quits [Read error: Connection reset by peer] |
19:05:44 | | Carnildo joins |
19:08:28 | | Carnildo quits [Read error: Connection reset by peer] |
19:08:30 | | Carnildo joins |
19:12:52 | | Carnildo_again joins |
19:12:52 | | Carnildo quits [Read error: Connection reset by peer] |
19:19:31 | | Wohlstand quits [Client Quit] |
19:20:02 | | Carnildo_again quits [Read error: Connection reset by peer] |
19:22:06 | | Carnildo joins |
19:22:07 | | f_ quits [Ping timeout: 250 seconds] |
19:27:04 | | Carnildo quits [Ping timeout: 255 seconds] |
19:27:30 | | Carnildo joins |
19:36:18 | <Ryz> | Heya folks, does anyone wanna help me extract subdomains of http://htmlplanet.com/ ? I found loads of it through https://www.subdomain.center/ and might've found 900 of 'em and I'm planning to run 'em all in AB (can't be HTTPS curiously, it's HTTP only!) |
19:36:31 | <Ryz> | I'm...I'm trying to recall if there's an IRC channel dedicated to this <#>; |
19:37:08 | <Ryz> | I was initially going to say #webroasting - but that's specifically for ISP hosting websites |
19:57:41 | | Wohlstand (Wohlstand) joins |
20:08:55 | | kiryu quits [Ping timeout: 255 seconds] |
20:11:10 | | jasons quits [Ping timeout: 255 seconds] |
20:16:10 | | jasons (jasons) joins |
20:26:01 | | jasons quits [Ping timeout: 255 seconds] |
20:37:15 | | nertzy_ joins |
20:41:11 | | Wohlstand quits [Client Quit] |
20:43:22 | <that_lurker> | Ryz: Quick scan found 610. Most likely the same you already got though https://transfer.archivete.am/inline/Gd4Ub/htmlplanetsubdomains.txt |
20:44:27 | | sec^nd quits [Ping timeout: 250 seconds] |
20:46:24 | | etnguyen03 (etnguyen03) joins |
20:50:44 | | sec^nd (second) joins |
20:55:36 | | Webuser536 joins |
20:56:32 | | no-n0rth quits [Client Quit] |
20:57:26 | <Webuser536> | If this is the chat to be talking about this, is there a way of properly using wget to download files from the Wayback Machine? |
21:06:45 | <Ryz> | that_lurker, this is from WBM CDX I assume? oo; |
21:07:22 | <that_lurker> | Got those by doing a scan with Sublist3r |
21:07:45 | <Ryz> | Hello Webuser536, please go to #internetarchive for a better chance of your question being answered |
21:08:48 | <Ryz> | that_lurker, go for a WBM CDX please if you can, there might be more subdomains there |
21:16:02 | | etnguyen03 quits [Client Quit] |
21:16:06 | <that_lurker> | Ryz, Not finding anything at least with https://web.archive.org/cdx/search/cdx?url=*.htmlplanet.com/&matchType=domain |
21:18:16 | <Ryz> | Hmm, there has to be more... :C |
21:24:46 | <that_lurker> | You could maybe do some hardcore bruteforcing, but that would take a while |
21:27:26 | | Island joins |
21:34:47 | | Notrealname1234 (Notrealname1234) joins |
21:40:43 | | pedantic-darwin quits [Ping timeout: 255 seconds] |
21:52:24 | <pokechu22> | try https://web.archive.org/cdx/search/cdx?url=htmlplanet.com&matchType=domain&collapse=urlkey&fl=original&limit=10000&showResumeKey=1&resumeKey= |
22:01:33 | | shgaqnyrjp quits [Remote host closed the connection] |
22:01:43 | <that_lurker> | oh that found a lot |
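pokechu22's query returns one `original` URL per line (`fl=original`, collapsed by `urlkey`), so the subdomain list Ryz asked for can be pulled out by hostname. A minimal sketch; the function name is hypothetical, `base` would be `htmlplanet.com` here, and the result is only as complete as the Wayback Machine's captures.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def extract_subdomains(urls: Iterable[str], base: str) -> Set[str]:
    """Collect distinct hostnames equal to or under `base` from archived URLs."""
    base = base.lower().rstrip(".")
    hosts = set()
    for raw in urls:
        raw = raw.strip()
        if not raw:
            continue
        if "://" not in raw:
            raw = "http://" + raw  # be lenient with scheme-less entries
        try:
            host = urlsplit(raw).hostname  # lowercased, port stripped
        except ValueError:
            continue  # malformed URL
        if host is None:
            continue
        host = host.rstrip(".")
        # The "." prefix check avoids matching look-alikes such as evilhtmlplanet.com.
        if host == base or host.endswith("." + base):
            hosts.add(host)
    return hosts
```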
22:01:59 | | shgaqnyrjp (shgaqnyrjp) joins |
22:03:38 | | pedantic-darwin joins |
22:09:14 | <pokechu22> | yeah, and you can copy the thing at the bottom and put it into the resumeKey parameter to get more |
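The copy-the-key-and-repeat step can be automated: request with `showResumeKey=1`, then feed the key printed at the bottom back in as `resumeKey` until none comes back. A minimal sketch, assuming (as the CDX server documentation's example suggests) that the key follows the results after one blank line and is already URL-encoded, so it is appended verbatim. `fetch` is any function that GETs a URL and returns the body as text; the other names are hypothetical. As JAA notes, this pagination is not always reliable, and little-things/ia-cdx-search is an existing script for the job.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_page_url(params: dict, resume_key: Optional[str] = None) -> str:
    """Build one CDX query URL with showResumeKey=1 (plus resumeKey on later pages)."""
    url = CDX_ENDPOINT + "?" + urlencode(dict(params, showResumeKey="1"))
    if resume_key:
        # The key comes back already URL-encoded, so append it verbatim.
        url += "&resumeKey=" + resume_key
    return url


def split_page(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a response body into result lines and the trailing resume key, if any."""
    lines = text.rstrip("\n").split("\n")
    if len(lines) >= 2 and lines[-2] == "":
        return lines[:-2], lines[-1]
    return [line for line in lines if line], None  # no key: last page


def iter_cdx(params: dict, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield every result line, following resume keys until none is returned."""
    resume_key = None
    while True:
        rows, resume_key = split_page(fetch(cdx_page_url(params, resume_key)))
        yield from rows
        if not resume_key:
            break
```

For a real run, `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode()` with params such as `{"url": "htmlplanet.com", "matchType": "domain", "collapse": "urlkey", "fl": "original", "limit": "10000"}`.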
22:11:08 | <@JAA> | 60% of the time, it works every time! |
22:13:08 | <fireonlive> | *JAA CDX api flashback horror stories* |
22:14:43 | <Notrealname1234> | "JAA" CDX api! |
22:15:01 | <@JAA> | Also, little-things/ia-cdx-search is a thing. :-) |
22:17:17 | <@JAA> | I guess it might work fine in this case. |
22:17:35 | <@JAA> | The resumeKey-based pagination, I mean. |
22:20:59 | <that_lurker|m> | I looked at, little-things/ia-cdx-search today and totally forgot about it when i needed it :-) |
22:25:33 | | Notrealname1234 quits [Client Quit] |
22:26:01 | | Notrealname1234 (Notrealname1234) joins |
22:27:30 | <that_lurker|m> | JAA thanks for making those amazing scripts available |
22:27:58 | <Notrealname1234> | Wonderful scripts |
22:28:08 | <@JAA> | :-) |
22:28:13 | <that_lurker|m> | JAA++ |
22:28:14 | <eggdrop> | [karma] 'JAA' now has 37 karma! |
23:03:31 | | Notrealname1234 quits [Client Quit] |
23:23:43 | | sec^nd quits [Remote host closed the connection] |
23:24:06 | | sec^nd (second) joins |
23:26:07 | | lunik1 quits [Client Quit] |
23:26:55 | | lunik1 joins |
23:39:21 | | Guest77 joins |
23:39:48 | <Guest77> | Hello! What is the best way to handle '.warc' files? I have tested the 'grab-site' program a bit, but I am clueless about how to treat a .warc file as an 'extractable' file. I would like to see and select which files to extract, as one usually does with .zip and other compressed files. zless shows the raw data, but it is not the best way |
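A WARC is a sequence of records (a `WARC/1.0` line, headers, `Content-Length` bytes of content, then two CRLFs), and a `.warc.gz` is usually one gzip member per record, which Python's `gzip` module reads through transparently. Dedicated tools (e.g. warcat, linked earlier in the channel, or the warcio library) are the better route for real work; this stdlib-only sketch just shows the structure well enough to list what is inside and pull out a payload. Function names are hypothetical, and `http_payload` does not undo chunked transfer-encoding or gzip content-encoding.

```python
import gzip
from typing import BinaryIO, Dict, Iterator, Tuple


def _read_headers(stream: BinaryIO) -> Dict[str, str]:
    """Read 'Name: value' lines up to the blank line that ends a header block."""
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            return headers
        name, _, value = line.decode("utf-8", "replace").partition(":")
        headers[name.strip()] = value.strip()


def iter_warc(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content block) for each WARC record in a decompressed stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # tolerate stray blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record: {version[:20]!r}")
        headers = _read_headers(stream)
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # each record ends with two CRLFs
        yield headers, block


def http_payload(block: bytes) -> bytes:
    """Strip the HTTP status line and headers from a 'response' record block."""
    _, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else block


def list_responses(path: str) -> None:
    """Print the URL and payload size of every HTTP response in a .warc or .warc.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        for headers, block in iter_warc(f):
            if headers.get("WARC-Type") == "response":
                print(headers.get("WARC-Target-URI"), len(http_payload(block)))
```

Listing first and then extracting only the URIs you pick gives roughly the "browse a zip" workflow asked about.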
23:50:24 | | Island quits [Read error: Connection reset by peer] |