00:05:06 | | BlueMaxima joins |
00:15:00 | | etnguyen03 quits [Client Quit] |
00:29:43 | | @Sanqui quits [Quit: .] |
00:34:21 | | qwertyasdfuiopghjkl2 joins |
00:34:58 | | qwertyasdfuiopghjkl2 leaves |
00:35:44 | | qwertyasdfuiopghjkl2 joins |
00:37:13 | | qwertyasdfuiopghjkl2 leaves |
00:37:42 | <immibis> | TheTechRobo: I think they are mostly concerned about layer 7 transparent proxies and similar. NAT is allowed, so if your fancy setup amounts to a NAT, my judgement says it's allowed... my judgement also doesn't count for anything. |
00:38:20 | <pabs> | katia: see the wiki page I'm maintaining https://wiki.archiveteam.org/index.php/SmolNet |
00:39:31 | <pabs> | katia: short answer: it's not possible in a standards-compliant way, but others have already done it without updating the WARC standard, and I have done it using HTTP-to-SmolNet proxies, but the AB jobs failed both times: one crashed, and the other was killed due to pipeline disappearance |
00:40:01 | <pabs> | kokos: ^ |
00:40:40 | <pabs> | see also the Gopher wiki page |
00:40:47 | | etnguyen03 (etnguyen03) joins |
00:41:55 | <h2ibot> | PaulWise created Gemini (+21, create page): https://wiki.archiveteam.org/?title=Gemini |
00:41:56 | <h2ibot> | PaulWise edited Regex Rodeo (-9, update redirect): https://wiki.archiveteam.org/?diff=53843&oldid=53827 |
00:42:55 | <h2ibot> | PaulWise edited ArchiveBot/Regex rodeo (-9, update redirect): https://wiki.archiveteam.org/?diff=53844&oldid=53825 |
00:44:29 | <immibis> | given the pedantry around WARCs when applied to HTTP i would have expected WARC should be updated to support Gopher/Gemini/HTTP2 exchanges and then the exchanges should be stored in the file without modification. anything else seems to defeat the point of the pedantry, no? |
00:53:07 | <pabs> | I've no idea about WARC. currently it doesn't support anything but HTTP 1.1 I thought |
00:53:30 | <pabs> | the wiki page includes some links to WARC standard stuff about Gopher/etc too |
00:53:57 | <h2ibot> | Switchnode edited CuriousCat (+44, add data link): https://wiki.archiveteam.org/?diff=53845&oldid=53572 |
00:55:31 | <pabs> | oh I forgot to add them |
00:55:46 | <pabs> | https://github.com/iipc/warc-specifications/issues/87 |
00:56:16 | <pabs> | ah no, third paragraph |
00:56:18 | <pabs> | https://github.com/iipc/warc-specifications/issues/85 |
00:56:26 | <pabs> | https://github.com/iipc/warc-specifications/issues/42 |
00:59:40 | | mls quits [Ping timeout: 260 seconds] |
01:16:38 | | mls (mls) joins |
01:25:55 | | tbc1887 quits [Ping timeout: 260 seconds] |
01:29:49 | <that_lurker> | https://img.kuhaon.fun/u/BfK6vP.mp4 |
01:33:20 | | that_lurker forgot to add context |
01:33:56 | <that_lurker> | CNN tiktok about researchers rushing to catalogue and save scientific work post-election |
01:34:13 | <@OrIdow6> | Namely about the IA (no surprise in the information given there) and "EDGI" |
01:35:18 | <@OrIdow6> | Also contains what I think is a really old photo of Brewster Kahle at 0:44? |
01:39:33 | <pabs> | heh "engineers" |
01:49:23 | | Sanqui joins |
01:49:25 | | Sanqui is now authenticated as Sanqui |
01:49:25 | | Sanqui quits [Changing host] |
01:49:25 | | Sanqui (Sanqui) joins |
01:49:25 | | @ChanServ sets mode: +o Sanqui |
02:11:25 | <@OrIdow6> | Parallel now thinks it'll be 4 days to do my (limited) scan |
02:21:49 | | Hackerpcs quits [Quit: Hackerpcs] |
02:23:32 | | Hackerpcs (Hackerpcs) joins |
02:23:44 | | DopefishJustin quits [Remote host closed the connection] |
02:28:20 | | pokechu22 quits [Ping timeout: 260 seconds] |
02:29:30 | | Hackerpcs quits [Ping timeout: 260 seconds] |
02:31:51 | | Hackerpcs (Hackerpcs) joins |
02:37:05 | | Hackerpcs quits [Ping timeout: 260 seconds] |
02:41:31 | | Hackerpcs (Hackerpcs) joins |
03:09:45 | | decky_e quits [Ping timeout: 260 seconds] |
03:26:56 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
03:27:40 | | datechnoman (datechnoman) joins |
03:29:34 | | BlueMaxima quits [Read error: Connection reset by peer] |
03:36:55 | | etnguyen03 quits [Remote host closed the connection] |
03:50:33 | | Matthww quits [Quit: Ping timeout (120 seconds)] |
03:52:05 | | Matthww joins |
04:09:26 | | Matthww quits [Client Quit] |
04:10:59 | | Matthww joins |
04:28:59 | | Island quits [Read error: Connection reset by peer] |
04:47:41 | | Island joins |
04:49:21 | <@OrIdow6> | Looking thru some of the CDXs in WARCs which are accessible to me I find that hashes consisting of all 'A's occur about 7 times as often as those consisting of e.g. 1 byte and then all As after that |
04:50:21 | <@JAA> | That sounds about right for the base32 strings. |
04:50:31 | <@JAA> | Should be a factor 8, actually. |
04:50:43 | <@OrIdow6> | Ahh forgot about base32 |
04:52:09 | <@OrIdow6> | Explains it |
04:52:21 | <@JAA> | Hashes with all As should be about as common as hashes with two arbitrary base32 chars and then all As. |
04:52:45 | <@JAA> | (Actually not quite arbitrary, but I'm too lazy to figure out the possible value for the second char.) |
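The factor of roughly 8 can be checked with a short sketch. It assumes a model that the log does not state outright: that an affected digest keeps its leading bytes and is zero from some byte onward. Under that model an all-zero digest (all As) needs only the first byte to be NUL, while "one base32 character, then all As" needs the second byte to be NUL and the first byte to be nonzero with its low 3 bits clear, because one base32 character carries 5 bits and a byte carries 8. The names `encode_truncated`, `prefix_lengths` and `second_chars` are illustrative only.

```python
import base64
from collections import Counter

DIGEST_LEN = 20  # SHA-1 digest size in bytes (32 base32 characters, no padding)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def encode_truncated(first_byte: int) -> str:
    """Base32 text of a digest whose bytes after the first are all zero."""
    digest = bytes([first_byte]) + bytes(DIGEST_LEN - 1)
    return base64.b32encode(digest).decode("ascii")


# For each possible first byte, how many leading characters are left
# once the trailing run of "A" characters is stripped.
prefix_lengths = Counter(
    len(encode_truncated(b).rstrip("A")) for b in range(256)
)

# Values the second base32 character can take when the second byte is NUL.
second_chars = "".join(
    sorted({encode_truncated(b)[1] for b in range(256)}, key=ALPHABET.index)
)

print("all As:", prefix_lengths[0])
print("one character, then As:", prefix_lengths[1])
print("two characters, then As:", prefix_lengths[2])
print("possible second characters:", second_chars)
```

Of the 256 first-byte values, 1 gives all As, 31 give a single character followed by As, and 224 need two characters. With the second byte NUL at probability 1/256, the single-character case is about 256/31 ≈ 8.3 times rarer than all As, and the one- and two-character cases together (255 values) are about as common as all As. The second character is limited to multiples of 4 in the alphabet, since its low 2 bits come from the zeroed second byte.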
04:53:00 | | Wohlstand quits [Ping timeout: 260 seconds] |
05:19:50 | | ArchivalEfforts quits [Ping timeout: 260 seconds] |
05:20:29 | | ArchivalEfforts joins |
05:21:15 | | Commander001 quits [Read error: Connection reset by peer] |
05:21:28 | | Commander001 joins |
05:21:59 | | AlsoHP_Archivist joins |
05:24:30 | | HP_Archivist quits [Ping timeout: 260 seconds] |
05:26:54 | | wickedplayer494 quits [Ping timeout: 252 seconds] |
05:27:45 | | wickedplayer494 joins |
05:28:02 | | wickedplayer494 is now authenticated as wickedplayer494 |
05:28:35 | | Commander001 quits [Ping timeout: 260 seconds] |
05:28:52 | <h2ibot> | JustAnotherArchivist edited ArchiveBot/Ignore (+560, /* Drupal */ Include basePath and fix profiles): https://wiki.archiveteam.org/?diff=53846&oldid=53832 |
05:28:57 | <@JAA> | c3manu: ^ |
05:35:54 | <h2ibot> | JustAnotherArchivist edited ArchiveBot/Ignore (+572, /* Drupal */ Add Backdrop CMS): https://wiki.archiveteam.org/?diff=53847&oldid=53846 |
05:38:42 | | Guest54 quits [Quit: My MacBook has gone to sleep. ZZZzzz…] |
05:38:49 | <@arkiver> | JAA: IA recalculates yes |
05:39:07 | | Guest54 joins |
05:48:05 | <@arkiver> | OrIdow6: on the SHA1 collision attacks, likely yes. (unrelated to the issue Wget-AT had of course) |
05:58:29 | | wessel15126 joins |
05:58:51 | | wessel1512 quits [Read error: Connection reset by peer] |
05:58:51 | | wessel15126 is now known as wessel1512 |
06:08:51 | | ArchivalEfforts quits [Client Quit] |
06:09:01 | | ArchivalEfforts joins |
06:49:29 | | pixel (pixel) joins |
06:50:01 | <@arkiver> | OrIdow6: JAA: doing a scan similar to what OrIdow6 is doing, but on all items |
06:50:14 | <@arkiver> | also those not reachable outside |
06:50:20 | <@arkiver> | using the CDX GZ files |
06:52:54 | <thuban> | JAA: the archivebot job for forum.pclab.pl (shuts down 29/30 november) definitely will not finish in time, partly because of size but also because of the enumeration issues (i thought user pages could get us around this but they can't). |
06:53:15 | <thuban> | can you get a qwarc job going / would it be helpful if i tried writing a spec file? |
07:05:28 | | qwertyasdfuiopghjkl quits [Ping timeout: 255 seconds] |
07:05:50 | | Unholy2361924645377131 (Unholy2361) joins |
07:07:40 | <@arkiver> | OrIdow6: JAA: looking into requeuing items to projects that are still up |
07:19:13 | <@JAA> | thuban: Right, will get that started this week. I've done Invision that way before, so can mostly copy the spec file from one of those. |
07:22:07 | <thuban> | ok, cool. unfortunately the php error pages have status 200, so there'll need to be a specific check for that (i don't _think_ it can happen on thread pages, but probably safest to do it everywhere) |
07:22:43 | <@JAA> | Right |
07:22:54 | <@JAA> | Do you have an example of a page that always fails? |
07:24:08 | <@JAA> | I do always have a check whether the expected content is in the response (in this case, whether there are posts on a thread page), but it normally just generates a warning rather than rejecting the response and retrying. |
07:24:44 | <@JAA> | If it's not something that fixes itself within a minute though, I guess it doesn't matter. |
07:25:03 | <thuban> | no, sorry. a lot of them are frequent but i don't think any are 100% consistent |
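A minimal sketch of the kind of check discussed above, not the qwarc spec file itself: it treats a status-200 response that contains a PHP error marker as a soft failure to retry, and a page that merely lacks the expected content as a warning. `ERROR_MARKERS`, `POST_MARKER` and `classify_response` are hypothetical names, and the marker strings are placeholders that would have to be taken from the real forum pages.

```python
# Placeholder strings; the real values must come from actual responses.
ERROR_MARKERS = ("Fatal error", "Parse error")
POST_MARKER = "<article"


def classify_response(status: int, body: str, expect_posts: bool) -> str:
    """Return 'retry', 'warn', 'ok', or 'other' for a fetched page."""
    if status != 200:
        # Real error statuses are left to the crawler's normal handling.
        return "other"
    if any(marker in body for marker in ERROR_MARKERS):
        # PHP error page served with status 200: reject and retry.
        return "retry"
    if expect_posts and POST_MARKER not in body:
        # Expected content is missing: log a warning but keep the response.
        return "warn"
    return "ok"
```

Applying the error-marker check to every response, not only thread pages, matches the "probably safest to do it everywhere" suggestion.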
07:28:14 | | loug8318142 joins |
07:39:10 | | pixel leaves |
07:51:41 | | pixel (pixel) joins |
07:57:14 | | Wohlstand (Wohlstand) joins |
08:02:03 | <@arkiver> | first doing a general scan for all SHA1s ending with a NUL byte, then doing a second scan over those results to get the items these revisit records belonged to |
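The first pass of such a scan could look roughly like the sketch below. It is not the actual tooling: it assumes the digest appears in each CDX line as a bare 32-character base32 token, and the names `ends_with_nul`, `suspicious_lines` and `scan_cdx_gz` are made up for illustration.

```python
import base64
import gzip
import re

# A SHA-1 digest is 20 bytes, which base32-encodes to exactly 32 characters.
B32_SHA1 = re.compile(r"\b[A-Z2-7]{32}\b")


def ends_with_nul(b32_digest: str) -> bool:
    """True if the decoded 20-byte digest has a zero last byte."""
    return base64.b32decode(b32_digest)[-1] == 0


def suspicious_lines(lines):
    """Yield every line containing a base32 SHA-1 that ends with a NUL byte."""
    for line in lines:
        if any(ends_with_nul(token) for token in B32_SHA1.findall(line)):
            yield line


def scan_cdx_gz(path):
    """Yield matching lines from one gzip-compressed CDX file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from suspicious_lines(handle)
```

About 1 in 256 genuine digests also ends with a NUL byte by chance, so the hits from this pass are only candidates, which is consistent with a second scan being needed to tie them back to items.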
08:05:02 | | pixel leaves |
08:13:49 | | Island quits [Read error: Connection reset by peer] |
08:16:51 | | wessel1512 quits [Ping timeout: 252 seconds] |
08:37:54 | | tek_dmn quits [Quit: ZNC - https://znc.in] |
08:38:11 | | tek_dmn (tek_dmn) joins |
08:44:43 | <@arkiver> | the queuing bot is back up! |
08:44:56 | <@arkiver> | feel free to queue whatever was not picked up |
08:45:14 | <@JAA> | qubert++ |
08:45:15 | <eggdrop> | [karma] 'qubert' now has 1 karma! |
08:45:23 | <@arkiver> | i will also take some time today or tomorrow to go through my logs and find !a commands that have not been run yet |
08:45:28 | <@arkiver> | first karma! |
08:45:51 | <@arkiver> | also let's release this |
08:46:09 | <@arkiver> | i like qubert |
08:46:16 | <@arkiver> | the long version is Quantum BERT |
09:04:11 | <@arkiver> | no opinions on Quantum BERT? :P |
09:04:45 | <@arkiver> | if it's a horrible idea, tell me... and if it's a great horrible idea, also tell me :P |
09:05:51 | <@JAA> | We can call it that once it runs on a quantum computer and processes items in parallel. :-P |
09:06:46 | <@arkiver> | it processes in parallel |
09:07:10 | <@JAA> | But not in the quantum computing sense. :-) |
09:07:42 | <@JAA> | (I don't know whether this makes any sense and should probably be in bed anyway.) |
09:07:47 | <@arkiver> | it will... we're just a bit early with the name |
09:52:09 | | Wohlstand quits [Client Quit] |
10:07:33 | | Wohlstand (Wohlstand) joins |
10:18:13 | <Vokun> | qubert++ |
10:18:13 | <eggdrop> | [karma] 'qubert' now has 2 karma! |
10:21:17 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
10:30:27 | | sralracer (sralracer) joins |
10:51:21 | | Wohlstand quits [Client Quit] |
11:05:49 | | ducky_ (ducky) joins |
11:06:13 | | ducky quits [Ping timeout: 260 seconds] |
11:06:24 | | ducky_ is now known as ducky |
11:10:00 | | LomanicOld|m joins |
11:11:28 | <pabs> | JAA: re AB monitoring for puu.sh, can you add it to the AB/Monitoring wiki in a new ideas section? |
11:14:58 | <h2ibot> | JustAnotherArchivist edited ArchiveBot/Monitoring (+74, Add puush as an idea): https://wiki.archiveteam.org/?diff=53848&oldid=53810 |
11:33:45 | | Naruyoko5 quits [Ping timeout: 252 seconds] |
12:00:06 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:37 | | qwertyasdfuiopghjkl2 joins |
12:02:48 | | Bleo182600722719623 joins |
12:04:42 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
12:05:17 | | decky_e joins |
12:05:52 | | qwertyasdfuiopghjkl2 joins |
12:28:27 | <imer> | arkiver++ |
12:28:28 | <eggdrop> | [karma] 'arkiver' now has 37 karma! |
12:33:07 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
12:33:50 | | f_ quits [Ping timeout: 260 seconds] |
12:34:15 | | qwertyasdfuiopghjkl2 joins |
12:34:24 | | qwertyasdfuiopghjkl2 leaves |
12:34:55 | | qwertyasdfuiopghjkl2 joins |
12:36:02 | | f_ (funderscore) joins |
12:38:29 | | SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896] |
12:39:58 | | SkilledAlpaca41896 joins |
12:44:01 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
12:45:15 | | Sluggs quits [Ping timeout: 252 seconds] |
12:48:34 | | Sluggs joins |
12:50:01 | | qwertyasdfuiopghjkl2 joins |
12:57:01 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
12:58:41 | | qwertyasdfuiopghjkl2 joins |
12:59:40 | | qwertyasdfuiopghjkl2 leaves |
13:03:23 | | qwertyasdfuiopghjkl2 joins |
13:11:39 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
13:12:15 | | qwertyasdfuiopghjkl2 joins |
13:12:27 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
13:14:20 | | qwertyasdfuiopghjkl2 joins |
13:14:23 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
13:14:37 | | qwertyasdfuiopghjkl2 joins |
13:14:37 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
13:16:54 | <@arkiver> | thanks imer :) |
13:18:04 | | qwertyasdfuiopghjkl2 joins |
13:18:04 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
13:20:31 | | qwertyasdfuiopghjkl2 joins |
13:20:34 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
13:22:41 | | qwertyasdfuiopghjkl2 joins |
13:28:06 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
13:45:39 | <f_> | arkiver++ |
13:45:39 | <eggdrop> | [karma] 'arkiver' now has 38 karma! |
13:46:07 | | qwertyasdfuiopghjkl2 joins |
13:46:07 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
13:47:34 | | qwertyasdfuiopghjkl2 joins |
13:47:34 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
14:00:34 | | f_ is now known as funderscore |
14:01:40 | | funderscore is now known as f_ |
14:05:16 | | qwertyasdfuiopghjkl2 joins |
14:09:20 | | qwertyasdfuiopghjkl2 leaves |
14:12:46 | | qwertyasdfuiopghjkl2 joins |
14:13:26 | | qwertyasdfuiopghjkl2 leaves |
14:14:08 | | qwertyasdfuiopghjkl2 joins |
14:21:23 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
14:24:37 | | qwertyasdfuiopghjkl2 joins |
14:26:06 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
14:28:24 | | qwertyasdfuiopghjkl2 joins |
14:30:02 | | vix5110_ joins |
14:34:55 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
14:58:20 | | qwertyasdfuiopghjkl2 joins |
15:03:14 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
15:03:22 | | qwertyasdfuiopghjkl2 joins |
15:03:41 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
15:03:49 | | qwertyasdfuiopghjkl2 joins |
15:06:39 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
15:06:57 | | qwertyasdfuiopghjkl2 joins |
15:15:57 | | katocala quits [Ping timeout: 252 seconds] |
15:16:43 | | katocala joins |
15:21:26 | | qwertyasdfuiopghjkl2 leaves |
15:21:43 | | AlsoHP_Archivist quits [Quit: Leaving] |
15:21:57 | | HP_Archivist (HP_Archivist) joins |
15:23:54 | | Muad-Dib quits [Quit: ZNC - http://znc.in] |
15:24:18 | | qwertyasdfuiopghjkl2 joins |
15:29:15 | | sludge quits [Remote host closed the connection] |
15:29:16 | | DopefishJustin joins |
15:29:16 | | DopefishJustin is now authenticated as DopefishJustin |
15:29:28 | | sludge joins |
15:46:47 | | qwertyasdfuiopghjkl2 quits [Client Quit] |
15:51:24 | | qwertyasdfuiopghjkl2 joins |
15:57:46 | | Webuser618972 joins |
15:58:54 | | Webuser618972 quits [Client Quit] |
16:07:20 | | katocala quits [Ping timeout: 260 seconds] |
16:08:07 | | katocala joins |
16:15:54 | | qwertyasdfuiopghjkl2 quits [Excess Flood] |
16:18:41 | | qwertyasdfuiopghjkl2 joins |
16:35:11 | | Muad-Dib joins |
16:45:15 | | Doranwen quits [Ping timeout: 260 seconds] |
16:46:18 | | Doranwen (Doranwen) joins |
16:51:05 | | Naruyoko5 joins |
17:00:48 | <c3manu> | JAA++ |
17:00:48 | <eggdrop> | [karma] 'JAA' now has 170 karma! |
17:03:16 | <that_lurker> | arkiver++ |
17:03:16 | <eggdrop> | [karma] 'arkiver' now has 39 karma! |
17:03:23 | <that_lurker> | JAA++ |
17:03:23 | <eggdrop> | [karma] 'JAA' now has 171 karma! |
17:03:28 | <that_lurker> | c3manu++ |
17:03:29 | <eggdrop> | [karma] 'c3manu' now has 49 karma! |
17:07:36 | <@OrIdow6> | arkiver: I'll just go with your scan then, because you have access to these things + are closer to the data |
17:07:48 | <@OrIdow6> | What are your plans for what you're going to do with the results? |
17:09:38 | | ducky quits [Ping timeout: 260 seconds] |
17:11:07 | | ducky (ducky) joins |
17:18:36 | | HP_Archivist quits [Ping timeout: 252 seconds] |
17:25:40 | <nicolas17> | h2ibot: wb |
17:28:23 | <nicolas17> | anyone queueing stuff from IRC logs? I don't think I have complete-enough logs to do that myself |
17:40:28 | | HP_Archivist (HP_Archivist) joins |
18:27:44 | | khaoohs joins |
18:27:47 | <kiska> | nicolas17: I have done #frogger #pastalavista #imgone and #mediaonfire but #down-the-tube has an error and I am waiting for arkiver to fix that before resuming queuing from my logs, and I don't have +v or +o in #telegrab |
18:28:34 | <kiska> | But #telegrab requires more... careful selection from the logs due to new restrictions |
18:51:28 | <h2ibot> | Manu edited Discourse/archived (+91, Grabbing https://discourse.joplinapp.org/): https://wiki.archiveteam.org/?diff=53849&oldid=53839 |
19:01:38 | | HP_Archivist quits [Read error: Connection reset by peer] |
19:02:02 | | HP_Archivist (HP_Archivist) joins |
19:22:00 | <@JAA> | kiska++ |
19:22:01 | <eggdrop> | [karma] 'kiska' now has 7 karma! |
19:30:46 | | pabs quits [Read error: Connection reset by peer] |
19:31:59 | | pabs (pabs) joins |
20:30:23 | | Commander001 joins |
21:00:18 | | matoro quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
21:00:38 | | matoro joins |
21:54:11 | | vix5110_ quits [Quit: Ooops, wrong browser tab.] |
21:56:37 | | Island joins |
22:27:14 | | simon8162 quits [Quit: ZNC 1.9.1 - https://znc.in] |
22:30:51 | | simon816 (simon816) joins |
22:32:37 | | etnguyen03 (etnguyen03) joins |
22:40:13 | | ducksauce joins |
22:40:34 | | ducksauce quits [Client Quit] |
22:56:21 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
23:00:35 | | Radzig quits [Remote host closed the connection] |
23:06:38 | | Radzig joins |
23:27:41 | | BornOn420 quits [Remote host closed the connection] |
23:28:03 | | BornOn420 (BornOn420) joins |