00:10:39 | | Notrealname1234 (Notrealname1234) joins |
00:50:50 | | Notrealname1234 quits [Client Quit] |
01:01:27 | | eightthree quits [Ping timeout: 272 seconds] |
01:04:42 | | eightthree joins |
01:18:50 | | Jake quits [Quit: Leaving for a bit!] |
01:19:14 | | Jake (Jake) joins |
01:25:10 | <fireonlive> | pabs: denada :) |
01:27:31 | | qw3rty__ quits [Ping timeout: 255 seconds] |
01:33:34 | | qw3rty__ joins |
01:45:58 | | lemuria joins |
01:48:07 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
02:02:50 | | lemuria is now authenticated as lemuria |
02:04:14 | | lemuria quits [Client Quit] |
02:04:36 | | lemuria (lemuria) joins |
02:05:16 | | lemuria quits [Changing host] |
02:05:16 | | lemuria (lemuria) joins |
02:14:35 | | Guest54 quits [Client Quit] |
02:23:38 | <lemuria> | hi there, is --level 2 a good option for crawling a wordpress site from 2014 |
02:47:14 | | muklumsum quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
02:48:23 | | muklumsum joins |
03:21:10 | | Panasonic joins |
03:21:48 | | Ravenloft quits [Read error: Connection reset by peer] |
03:27:39 | | Wohlstand quits [Client Quit] |
03:45:33 | | etnguyen03 quits [Client Quit] |
04:54:01 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:57:56 | | DogsRNice quits [Read error: Connection reset by peer] |
06:41:11 | | loug4 joins |
06:47:49 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52976&oldid=52975 |
06:56:08 | <fireonlive> | lemuria: which program are you using? #archivebot might be a good choice instead to get it in the wayback machine (and you can grab the warcs after) |
07:10:03 | | Unholy236192464537713 quits [Ping timeout: 272 seconds] |
07:29:50 | | Unholy236192464537713 (Unholy2361) joins |
08:10:24 | <lemuria> | grab-site, i forgot to say, fireonlive |
08:11:08 | <lemuria> | the archive was kinda OK-ish but fonts were missing and one of the images too, is that normal when grabbing sites? |
08:23:05 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52977&oldid=52976 |
08:25:06 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52978&oldid=52977 |
08:42:09 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52979&oldid=52978 |
08:46:09 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52980&oldid=52979 |
08:47:10 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52981&oldid=52980 |
08:58:11 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */ aborted): https://wiki.archiveteam.org/?diff=52982&oldid=52981 |
09:00:03 | | Bleo1826007227196 quits [Client Quit] |
09:00:12 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52983&oldid=52982 |
09:01:25 | | Bleo1826007227196 joins |
09:02:55 | <thuban> | asie, nullpeta, c3manu: as expected, my bruteforce did not turn up any sites not found in https://asie.pl/files/hp_vector_urls_20161012_plus.txt (except 403s) |
09:03:12 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52984&oldid=52983 |
09:03:14 | <thuban> | i can produce a list of the 403s if that's wanted but idk how useful it would be |
09:04:32 | <thuban> | also, c3manu, how did you run that list? `!ao <` or `!a <`? |
09:08:20 | <c3manu> | !ao and !a < |
09:08:45 | <thuban> | ty! |
09:08:49 | <c3manu> | np :) |
09:09:10 | <c3manu> | depends on what causes the 403s. if AB can somehow get around that, it would be pretty useful ^^ |
09:10:20 | <thuban> | i doubt it. my (100% speculative) guess is that they're sites set to private in some way by their authors |
09:10:36 | <lemuria> | what site are we investigating the 403s for |
09:10:41 | <lemuria> | multiple domains? |
09:10:45 | <thuban> | hp.vector.co.jp |
09:14:12 | <thuban> | personally i'd really like to know more about the non-'VA\d{6}' authors. given that there were only two in the 2016 directory and that that directory was 99.9% complete, we're probably not missing much in that regard, but maybe somebody can grep CC/IA CDX and see if anything turns up? |
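A minimal sketch of the filtering thuban suggests, assuming author pages live under `http://hp.vector.co.jp/authors/<id>/` (that path layout is an assumption, not stated in the log). The function name `non_va_authors` and the sample URLs are hypothetical; in practice the input would be the original-URL column from a CC or IA CDX dump.

```python
import re
from urllib.parse import urlsplit

# Pattern quoted in the log: "VA" followed by exactly six digits.
AUTHOR_ID = re.compile(r"VA\d{6}")

def non_va_authors(urls):
    """Return the sorted author IDs that do NOT match VA\\d{6}.

    Assumes (hypothetically) that author sites live at
    http://hp.vector.co.jp/authors/<id>/...
    """
    found = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname != "hp.vector.co.jp":
            continue  # ignore other hosts
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) >= 2 and segments[0] == "authors":
            if not AUTHOR_ID.fullmatch(segments[1]):
                found.add(segments[1])
    return sorted(found)

# Made-up sample URLs, for illustration only.
sample_urls = [
    "http://hp.vector.co.jp/authors/VA012345/",
    "http://hp.vector.co.jp/authors/someone/index.html",
    "http://example.com/authors/other/",
]
unusual = non_va_authors(sample_urls)
```

`fullmatch` is used so that IDs such as `VA0123456` or `xVA012345` are also reported as unusual rather than silently accepted.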
09:14:14 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52985&oldid=52984 |
09:38:18 | <h2ibot> | Exorcism edited Bugzilla (+28, /* Status */): https://wiki.archiveteam.org/?diff=52986&oldid=52985 |
09:46:20 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52987&oldid=52986 |
09:46:21 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52988&oldid=52987 |
09:50:21 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52989&oldid=52988 |
09:52:21 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52990&oldid=52989 |
09:54:21 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52991&oldid=52990 |
10:00:22 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52992&oldid=52991 |
10:03:23 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52993&oldid=52992 |
10:03:24 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52994&oldid=52993 |
10:05:23 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52995&oldid=52994 |
10:21:26 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52996&oldid=52995 |
10:22:26 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52997&oldid=52996 |
10:22:27 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52998&oldid=52997 |
10:23:26 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52999&oldid=52998 |
10:31:27 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53000&oldid=52999 |
10:53:31 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53001&oldid=53000 |
11:00:00 | | Bleo1826007227196 quits [Client Quit] |
11:01:17 | | Bleo1826007227196 joins |
11:43:56 | | SkilledAlpaca quits [Client Quit] |
11:46:42 | | yarrow2 quits [Quit: Connection closed for inactivity] |
11:47:13 | | SkilledAlpaca joins |
12:06:29 | | etnguyen03 (etnguyen03) joins |
12:16:16 | | danwellby quits [Quit: Watch out For sysops carrying carpet and quicklime] |
12:37:20 | | Guest54 joins |
13:04:59 | <h2ibot> | Exorcism edited Bugzilla (+18, /* Status */): https://wiki.archiveteam.org/?diff=53002&oldid=53001 |
13:05:59 | <h2ibot> | PaulWise edited SmolNet (+345, add link to mercury protocl doc, mention…): https://wiki.archiveteam.org/?diff=53003&oldid=52708 |
13:17:01 | <h2ibot> | Exorcism edited Bugzilla (+27, /* Archived */): https://wiki.archiveteam.org/?diff=53004&oldid=53002 |
13:46:15 | <JaffaCakes118> | Could someone archive https://hlelo101.github.io/ with archivebot please (no coverage) |
13:54:07 | <h2ibot> | Exorcism edited Bugzilla (+38, /* Archived */): https://wiki.archiveteam.org/?diff=53005&oldid=53004 |
14:04:53 | <@JAA> | (That's been handled in #archivebot since.) |
14:10:10 | <h2ibot> | Exorcism edited Bugzilla (+34, /* Archived */): https://wiki.archiveteam.org/?diff=53006&oldid=53005 |
14:14:11 | <h2ibot> | Exorcism edited Bugzilla (+36, /* Archived */): https://wiki.archiveteam.org/?diff=53007&oldid=53006 |
14:18:11 | <h2ibot> | Exorcism edited Bugzilla (+39, /* Archived */): https://wiki.archiveteam.org/?diff=53008&oldid=53007 |
14:23:12 | <h2ibot> | Exorcism edited Bugzilla (+40, /* Archived */): https://wiki.archiveteam.org/?diff=53009&oldid=53008 |
14:26:01 | | Dango360 quits [Ping timeout: 255 seconds] |
14:33:14 | <h2ibot> | Exorcism edited Bugzilla (+45, /* Archived */): https://wiki.archiveteam.org/?diff=53010&oldid=53009 |
14:38:15 | <h2ibot> | Exorcism edited Bugzilla (+34, /* Archived */): https://wiki.archiveteam.org/?diff=53011&oldid=53010 |
14:49:00 | | Notrealname1234 (Notrealname1234) joins |
14:52:42 | | Dango360 (Dango360) joins |
14:58:34 | | Notrealname1234 quits [Remote host closed the connection] |
14:58:40 | | Notrealname1234 (Notrealname1234) joins |
14:59:17 | | Notrealname1234 quits [Client Quit] |
15:35:05 | | danwellby joins |
15:54:25 | | Wohlstand (Wohlstand) joins |
16:20:45 | <@arkiver> | JAA: what is sense even :P |
16:35:35 | <h2ibot> | Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53012&oldid=53011 |
17:01:40 | <h2ibot> | Bzc6p edited Demotivalo.net (-28, /* Sister sites */ kommenthuszar.com restored): https://wiki.archiveteam.org/?diff=53013&oldid=50595 |
17:44:27 | | DogsRNice joins |
18:03:39 | | ilnrja quits [Remote host closed the connection] |
18:03:56 | | ilnrja (ilnrja) joins |
18:16:52 | | danwellby quits [Read error: Connection reset by peer] |
18:24:01 | | Island joins |
18:29:29 | <lemuria> | HELP VER |
18:29:36 | <lemuria> | (sorry forgot the /) |
18:36:20 | | danwellby joins |
18:44:29 | | pseudorizer quits [Quit: ZNC 1.9.1 - https://znc.in] |
18:45:07 | <katia> | /) |
18:45:47 | | pseudorizer (pseudorizer) joins |
19:00:28 | | JaffaCakes118 quits [Remote host closed the connection] |
19:30:40 | | SkilledAlpaca quits [Ping timeout: 255 seconds] |
19:34:38 | | SkilledAlpaca joins |
19:48:40 | | SkilledAlpaca quits [Ping timeout: 255 seconds] |
19:49:18 | | ilnrja quits [Remote host closed the connection] |
19:49:56 | | ilnrja (ilnrja) joins |
20:33:04 | | SkilledAlpaca joins |
20:36:25 | | Exorcism quits [Remote host closed the connection] |
20:36:25 | | DigitalDragons quits [Read error: Connection reset by peer] |
20:36:54 | | DigitalDragons (DigitalDragons) joins |
20:36:57 | | Exorcism (exorcism) joins |
20:47:57 | | TheGamer2000 joins |
20:48:04 | | TheGamer2000 quits [Client Quit] |
20:51:07 | | icedice (icedice) joins |
21:28:03 | | pixel leaves |
21:28:04 | | pixel (pixel) joins |
21:39:24 | | Megame (Megame) joins |
21:52:29 | | yarrow2 joins |
22:29:37 | | BlueMaxima joins |
22:41:41 | | yarrow_irccloud (yarrow_irccloud) joins |
23:17:19 | | sec^nd quits [Remote host closed the connection] |
23:17:19 | | shgaqnyrjp quits [Remote host closed the connection] |
23:17:43 | | shgaqnyrjp (shgaqnyrjp) joins |
23:17:50 | | sec^nd (second) joins |
23:22:11 | | CookMePlox joins |
23:24:24 | <CookMePlox> | hi friends! i am wondering if anyone knows specifics about how the "length" field from wayback machine's cdx api is calculated |
23:24:53 | <CookMePlox> | specifically, I'm seeing a bunch of cases where the digest matches, but the length is different, for example |
23:24:57 | <CookMePlox> | it,tip)/zenit/natale-6.jpg 20010725190547 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785 |
23:24:58 | <CookMePlox> | it,tip)/zenit/natale-6.jpg 20011230210923 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773 |
23:26:30 | <CookMePlox> | it's not obvious to me why the length would vary if the hash is the same. is the length maybe including some http headers that varied between responses, even though the response payloads were otherwise identical? |
23:32:06 | <CookMePlox> | ah, I see! the headers are preserved under X-Archive-Orig, and they are indeed different. so I think the length must be the compressed (gzip maybe?) size of the original entire network request, including headers |
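A small sketch of the check CookMePlox describes: group CDX rows by URL key and digest, then report groups whose `length` values differ. It assumes the seven whitespace-separated fields appear in the order shown in the two lines above (urlkey, timestamp, original, mimetype, statuscode, digest, length); the helper name `lengths_by_digest` is hypothetical. That `length` is the compressed size of the whole stored record is CookMePlox's inference and is not confirmed anywhere in this log.

```python
from collections import defaultdict

# Assumed field order, matching the two lines quoted in the log.
FIELDS = ("urlkey", "timestamp", "original", "mimetype",
          "statuscode", "digest", "length")

def lengths_by_digest(cdx_lines):
    """Map (urlkey, digest) to sorted distinct lengths, keeping only
    groups where an identical payload digest has several lengths."""
    groups = defaultdict(set)
    for line in cdx_lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip malformed or differently shaped rows
        row = dict(zip(FIELDS, parts))
        groups[(row["urlkey"], row["digest"])].add(int(row["length"]))
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}

# The two rows pasted in the channel.
SAMPLE = [
    "it,tip)/zenit/natale-6.jpg 20010725190547 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785",
    "it,tip)/zenit/natale-6.jpg 20011230210923 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773",
]
mismatches = lengths_by_digest(SAMPLE)
```

For the sample rows, the single group maps to the lengths 29773 and 29785, a difference of 12 bytes for the same digest, which would fit the headers-vary explanation given above.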
23:33:46 | | CookMePlox quits [Client Quit] |
23:43:15 | <@OrIdow6> | Wish more people could just answer their questions by looking at the list of users in the channel |