07:47:23 | | Maturion joins |
08:36:59 | | Maturion quits [Remote host closed the connection] |
12:24:26 | | shreyasminocha quits [Remote host closed the connection] |
12:24:32 | | shreyasminocha (shreyasminocha) joins |
14:10:33 | | kiryu joins |
14:10:33 | | kiryu is now authenticated as kiryu |
14:10:33 | | kiryu quits [Changing host] |
14:10:33 | | kiryu (kiryu) joins |
17:28:30 | | Maturion joins |
18:06:58 | | pabs quits [Ping timeout: 255 seconds] |
18:08:10 | | pabs (pabs) joins |
18:31:43 | | systwi quits [Ping timeout: 255 seconds] |
18:44:59 | | systwi (systwi) joins |
19:23:28 | | tzt quits [Ping timeout: 255 seconds] |
21:10:22 | | Maturion quits [Remote host closed the connection] |
21:45:17 | <imer> | JAA: https://transfer.archivete.am/KEQpz/www.people.vcu.edu.txt this is what I got from common crawl cdx, might be dupes of IA cdx data |
21:45:17 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/KEQpz/www.people.vcu.edu.txt |
21:58:22 | <@JAA> | imer: Thanks, looks like that yielded one extra URL which is a 404. |
21:58:57 | <imer> | woo lol |
21:59:15 | <imer> | my guess was correct at least then |
21:59:48 | <@JAA> | I wonder why it didn't show up in my lists. I didn't filter the CDX results to 200s or similar. |
22:00:17 | <@JAA> | And I thought all CC data is in the WBM. Maybe not... |
22:12:27 | <imer> | maybe the "newest" crawl isn't just yet or something |
22:13:16 | <@JAA> | Yeah, could be. The CDX API also isn't a full reflection of the WBM IIRC. Something something layers of indices. |
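
The comparison discussed above (checking whether the Common Crawl CDX list contains URLs missing from the IA CDX lists) could be sketched roughly as follows. This is a minimal sketch, not the method actually used in the channel: the function name `extra_urls` and the sample URLs are hypothetical, and it assumes both lists are plain text with one URL per line and no normalization beyond stripping whitespace.

```python
def extra_urls(candidate_urls, known_urls):
    """Return URLs from candidate_urls that are absent from known_urls.

    Order of first appearance is kept and duplicates are dropped.
    """
    known = {u.strip() for u in known_urls if u.strip()}
    seen = set()
    extras = []
    for line in candidate_urls:
        url = line.strip()
        if url and url not in known and url not in seen:
            seen.add(url)
            extras.append(url)
    return extras


# Hypothetical sample data, not taken from the real lists.
cc_urls = [
    "http://www.people.vcu.edu/~example/a.html",
    "http://www.people.vcu.edu/~example/b.html",
    "http://www.people.vcu.edu/~example/a.html",  # duplicate within the list
]
ia_urls = ["http://www.people.vcu.edu/~example/a.html"]

print(extra_urls(cc_urls, ia_urls))
```

In practice the two inputs would be read from the downloaded list files. Exact string matching will miss near-duplicates such as scheme or trailing-slash variants, so some normalization may be needed first.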