| 00:02:20 | <fireonlive> | is there a way to stop grab-site and have it resume where it left off on a different server? |
| 00:17:07 | | icedice quits [Client Quit] |
| 00:20:43 | | ymgve joins |
| 00:21:07 | | Dango360_ (Dango360) joins |
| 00:24:41 | | Dango360 quits [Ping timeout: 252 seconds] |
| 00:28:14 | <h2ibot> | Tomodachi94 edited Prnt.sc (+28, Add category): https://wiki.archiveteam.org/?diff=49853&oldid=49849 |
| 00:28:15 | <h2ibot> | Tomodachi94 edited Roblox (+425, /* Group sales */ New section): https://wiki.archiveteam.org/?diff=49854&oldid=48716 |
| 00:30:21 | | ymgve quits [Read error: Connection reset by peer] |
| 00:30:33 | | icedice (icedice) joins |
| 00:30:42 | | ymgve joins |
| 00:36:12 | | dumbgoy_ quits [Read error: Connection reset by peer] |
| 00:37:26 | | dumbgoy_ joins |
| 00:37:52 | | dumbgoy_ quits [Read error: Connection reset by peer] |
| 00:40:43 | | dumbgoy_ joins |
| 00:47:55 | | HP_Archivist (HP_Archivist) joins |
| 01:01:03 | | leo60228 quits [Read error: Connection reset by peer] |
| 01:01:23 | | leo60228 (leo60228) joins |
| 01:20:25 | | Barto quits [Ping timeout: 265 seconds] |
| 01:20:47 | | Barto (Barto) joins |
| 01:44:09 | | HiccupJul quits [Client Quit] |
| 01:57:38 | | tertu quits [Ping timeout: 252 seconds] |
| 01:59:08 | | tertu (tertu) joins |
| 01:59:21 | | zhongfu quits [Client Quit] |
| 02:09:13 | | icedice quits [Client Quit] |
| 02:20:17 | | icedice (icedice) joins |
| 03:07:08 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 04:00:41 | | _Dango360 (Dango360) joins |
| 04:04:08 | | Dango360_ quits [Ping timeout: 252 seconds] |
| 04:08:30 | | dumbgoy_ joins |
| 04:21:56 | | sonick (sonick) joins |
| 04:27:14 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 06:08:37 | | Island quits [Read error: Connection reset by peer] |
| 06:41:44 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:49:19 | | decky_e (decky_e) joins |
| 07:38:53 | | decky_e quits [Read error: Connection reset by peer] |
| 07:43:38 | | decky_e (decky_e) joins |
| 07:48:12 | | fullpwn joins |
| 07:48:23 | | atphoenix quits [Remote host closed the connection] |
| 07:48:23 | | yano1 quits [Remote host closed the connection] |
| 07:48:23 | | AnotherIki quits [Remote host closed the connection] |
| 07:48:23 | | fullpwnmedia quits [Remote host closed the connection] |
| 07:48:31 | | AnotherIki joins |
| 07:48:38 | | yano1 (yano) joins |
| 07:48:49 | | atphoenix (atphoenix) joins |
| 08:02:26 | | Arcorann (Arcorann) joins |
| 08:17:47 | | zhongfu (zhongfu) joins |
| 08:46:20 | | railen63 joins |
| 08:47:17 | | railen63 quits [Remote host closed the connection] |
| 08:47:31 | | railen63 joins |
| 09:30:40 | | icedice quits [Client Quit] |
| 09:56:43 | | icedice (icedice) joins |
| 10:03:58 | | Ruthalas5 quits [Client Quit] |
| 10:04:19 | | Ruthalas5 (Ruthalas) joins |
| 10:06:26 | | Ruthalas5 quits [Client Quit] |
| 10:06:45 | | Ruthalas5 (Ruthalas) joins |
| 10:09:02 | | TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night] |
| 10:09:34 | | TastyWiener95 (TastyWiener95) joins |
| 10:18:51 | | TastyWiener95 quits [Client Quit] |
| 10:25:38 | | sonick quits [Client Quit] |
| 10:44:57 | | x-56k-mo1 quits [Quit: WeeChat 3.8] |
| 11:02:03 | | JohnnyJ quits [Read error: Connection reset by peer] |
| 11:20:40 | | decky_e quits [Remote host closed the connection] |
| 11:31:25 | | TastyWiener95 (TastyWiener95) joins |
| 12:40:25 | | Megame (Megame) joins |
| 12:55:22 | | Unholy23614 (Unholy2361) joins |
| 12:55:26 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 12:55:31 | | AmAnd0A joins |
| 12:59:17 | | Unholy2361 quits [Ping timeout: 252 seconds] |
| 12:59:17 | | Unholy23614 is now known as Unholy2361 |
| 13:48:23 | | cultpony quits [Client Quit] |
| 13:49:09 | | cultpony (cultpony) joins |
| 14:12:26 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:19:02 | | thuban quits [Ping timeout: 252 seconds] |
| 14:20:50 | | thuban joins |
| 14:38:17 | | parfait (kdqep) joins |
| 14:42:07 | | hitgrr8 joins |
| 14:44:12 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 15:23:07 | | eroc19909 is now known as eroc1990 |
| 15:28:11 | | Island joins |
| 15:34:36 | | AmAnd0A joins |
| 15:36:24 | | Megame quits [Client Quit] |
| 15:46:15 | | decky_e (decky_e) joins |
| 15:53:19 | | decky_e quits [Ping timeout: 265 seconds] |
| 15:53:46 | | decky_e (decky_e) joins |
| 16:10:46 | | flashfire42 quits [Remote host closed the connection] |
| 16:10:46 | | kiska quits [Remote host closed the connection] |
| 16:10:46 | | s-crypt quits [Remote host closed the connection] |
| 16:10:56 | | Ryz2 (Ryz) joins |
| 16:10:57 | | s-crypt (s-crypt) joins |
| 16:11:02 | | flashfire42 (flashfire42) joins |
| 16:12:11 | | kiska (kiska) joins |
| 16:23:59 | <h2ibot> | Tomodachi94 created Technic Platform (+851, Create page): https://wiki.archiveteam.org/?title=Technic%20Platform |
| 16:24:00 | <h2ibot> | Tomodachi94 uploaded File:Technic Platform wordmark.webp (Wordmark of [[Technic Platform]]): https://wiki.archiveteam.org/?title=File%3ATechnic%20Platform%20wordmark.webp |
| 16:24:01 | <h2ibot> | Tomodachi94 uploaded File:Technic Platform 2023-05-29 homepage.png (Homepage of [[Technic Platform]] on 2023-05-29): https://wiki.archiveteam.org/?title=File%3ATechnic%20Platform%202023-05-29%20homepage.png |
| 16:27:51 | | flashfire42 quits [Client Quit] |
| 16:27:51 | | kiska quits [Client Quit] |
| 16:27:51 | | Ryz2 quits [Client Quit] |
| 16:27:51 | | s-crypt quits [Client Quit] |
| 16:35:14 | | icedice quits [Client Quit] |
| 16:35:51 | | parfait quits [Ping timeout: 265 seconds] |
| 16:38:10 | | Ryz2 (Ryz) joins |
| 16:38:11 | | s-crypt (s-crypt) joins |
| 16:38:16 | | flashfire42 (flashfire42) joins |
| 16:39:25 | | icedice (icedice) joins |
| 16:39:26 | | kiska (kiska) joins |
| 16:40:24 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 16:50:23 | | c3manu (c3manu) joins |
| 17:31:46 | <masterX244> | damnit... 302 to ignored content gets the 302 dropped from the WARC on grab-site, too... |
| 17:32:57 | | c3manu quits [Remote host closed the connection] |
| 17:42:15 | <pokechu22> | That's weird since archivebot doesn't check ignores on redirect targets at all |
| 17:42:33 | <pokechu22> | (it does apply the no-parent rule to redirect targets, though, but I'm not sure if the redirect is dropped from the WARC in that case) |
| 18:20:24 | | spirit joins |
| 18:20:25 | | dumbgoy_ joins |
| 18:22:29 | <spirit> | pokechu22: artdoxa has finished, right? :) |
| 18:23:42 | <pokechu22> | Yeah, it finished a while ago |
| 18:25:20 | <spirit> | awesome! |
| 18:25:21 | <pokechu22> | It did error on https://www.artdoxa.com/more?page=5753 (after successfully loading https://www.artdoxa.com/more?page=5752) and while it was going through those it was occasionally finding new users (but not new artworks I think). It seems like now https://www.artdoxa.com/more?page=5754 gives an error instead - this might just be it running out of pages or something, not sure. |
| 18:26:15 | <spirit> | it's totally fine if some pages were not archived. the contact initially did not think about archiving it at all so this is fantastic |
| 18:26:17 | <spirit> | thank you so much! |
| 18:26:34 | <spirit> | is there a way to see the list of warcs? |
| 18:26:47 | <pokechu22> | an example of a user it found is https://www.artdoxa.com/e-amsalk, who has no submitted artworks but did have favorites (which I guess it picked up from that list) |
| 18:27:09 | <pokechu22> | Yeah, at https://archive.fart.website/archivebot/viewer/job/eofoo - it came out to around 200GB |
| 18:29:00 | <spirit> | hm, i had estimated about 3 times that. is there a way to get a log of all urls contained in the warcs without downloading them? |
| 18:29:53 | <@JAA> | spirit: The -meta.warc.gz contains the complete log of the job, including ignored URLs. |
| 18:30:03 | <spirit> | excellent |
| 18:30:29 | <@JAA> | For more details, like MIME type, response size (in the general case), etc., you'd want each WARC's CDX. |
| 18:30:41 | <pokechu22> | The meta-warc is basically just a (gz-compressed) text file in that case; you can read it with zless |
| 18:32:49 | | decky_e quits [Ping timeout: 265 seconds] |
| 18:33:02 | <pokechu22> | Looks like amazon urls do have a length in the meta-warc but the artdoxa urls don't (probably because they don't set a content-length header) |
| 18:33:46 | <spirit> | i'll just take a brief look at the number of URLs later, if that matches expectations roughly, then this is good enough |
| 18:33:48 | <spirit> | thanks again! |
| 18:33:48 | <pokechu22> | JAA: is there a case where the meta-warc wouldn't contain the MIME type? I see `Length: unspecified [text/html; charset=utf-8]` in it currently which seems to imply that it has one even when it doesn't have the length |
| 18:33:56 | | decky_e (decky_e) joins |
| 18:34:31 | <pokechu22> | It might be worth checking if every thumbnail also had a full-sized image saved and vice versa; that'd be a good heuristic to make sure everything's saved |
| 18:35:19 | <@JAA> | pokechu22: Hmm, yeah, I guess the MIME type should always be there, true. |
| 18:35:43 | <@rewby|backup> | Heyo, I know people've been trying to get me to fix target stuff. Can anyone give me a summary? (arkiver, datechnoman, JAA) |
| 18:36:37 | <pokechu22> | I guess also, if you estimated the expected total size by using the average size of recently uploaded images, it's probably the case that older images are generally smaller |
| 18:36:56 | <pokechu22> | though I'm not sure if that'd make a 400GB difference |
| 18:37:02 | <spirit> | pokechu22: yeah, that was my thinking too with the corresponding images |
| 18:37:13 | <spirit> | it was a *really* rough estimate :) |
| 18:37:35 | <spirit> | iirc i sampled ~10 full size images, might have been huge ones by random chance |
| 18:37:50 | <@JAA> | rewby: The targets behind Reddit and #// are struggling to keep up with the increased load from historical Reddit data, and some targets on Imgur have been erroring consistently (though that might just be the SBs). |
| 18:38:40 | <@rewby|backup> | JAA: Ack. First prio is dealing with #// and reddit. I need to shift some stuff around. I was handed back one of the target servers a day or two ago. I need to put it back online. |
| 18:40:59 | <@JAA> | Lovely :-) |
| 18:41:02 | | imer quits [Client Quit] |
| 18:42:48 | | imer (imer) joins |
| 18:57:15 | | railen63 quits [Remote host closed the connection] |
| 18:57:28 | | railen63 joins |
| 19:08:51 | | HP_Archivist (HP_Archivist) joins |
| 20:05:10 | <spirit> | looks good. there should be ~130k artworks and grabbed were ~125k fullsize artworks. nice! |
| 20:16:15 | | icedice quits [Ping timeout: 265 seconds] |
| 20:16:48 | | icedice (icedice) joins |
| 21:00:51 | | spirit quits [Client Quit] |
| 21:05:08 | | decky_e quits [Ping timeout: 252 seconds] |
| 21:05:34 | | decky_e (decky_e) joins |
| 21:09:33 | | decky_e quits [Remote host closed the connection] |
| 21:10:55 | | hitgrr8 quits [Client Quit] |
| 21:23:04 | <icedice> | Is #// Imgur brute force? |
| 21:27:12 | <pokechu22> | #// is a general URLs project, I think? Not sure of the details. The imgur brute force is discussed in #imgone to my understanding |
| 21:32:26 | | bf_ quits [Ping timeout: 252 seconds] |
| 21:33:58 | <@JAA> | Correct, #// mostly handles external URLs discovered in other projects, e.g. Reddit and, yes, Imgur (image descriptions etc.). |
| 21:34:28 | <@JAA> | *Everything* Imgur-related is in #imgone and should stay there. |
| 22:05:02 | | fullpwn quits [Remote host closed the connection] |
| 22:06:15 | | fullpwn joins |
| 22:06:41 | | luckcolors_ quits [Quit: No Ping reply in 180 seconds.] |
| 22:08:15 | | luckcolors (luckcolors) joins |
| 22:15:44 | | TastyWiener95 quits [Client Quit] |
| 22:30:37 | | icedice quits [Ping timeout: 265 seconds] |
| 22:46:47 | | HP_Archivist quits [Client Quit] |
| 22:55:48 | | icedice (icedice) joins |
| 22:56:28 | <icedice> | All right, gotcha |
| 22:57:50 | | icedice quits [Client Quit] |
| 22:59:11 | | icedice (icedice) joins |
| 23:01:28 | | bf_ joins |
| 23:06:45 | | sonick (sonick) joins |
| 23:15:29 | | yts98 leaves |
| 23:15:31 | | yts98 joins |
| 23:19:41 | | bf_ quits [Ping timeout: 252 seconds] |
| 23:26:14 | | decky_e (decky_e) joins |
| 23:28:51 | | TastyWiener95 (TastyWiener95) joins |
| 23:32:12 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 23:32:28 | | AmAnd0A joins |
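
The 18:29–18:33 exchange describes the `-meta.warc.gz` as essentially a gz-compressed text log that can be read with `zless`, and spirit planned to check the number of URLs against expectations. A minimal sketch of that check in Python follows. It assumes only that URLs appear as plain `http(s)://` strings somewhere in the log lines; the exact log line format is not given in the discussion, and the function name `count_urls_by_host` and the regex are illustrative, not part of ArchiveBot or grab-site.

```python
import gzip
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: URLs show up as bare http(s):// strings in the log text.
URL_RE = re.compile(rb"https?://[^\s\"'<>]+")


def host_of(url):
    """Return the hostname of a URL, or an empty string if it cannot be parsed."""
    try:
        return urlsplit(url).hostname or ""
    except ValueError:
        return ""


def count_urls_by_host(path):
    """Stream a gzip-compressed log and count distinct URLs per hostname."""
    seen = set()
    # gzip.open reads concatenated gzip members transparently, so the file
    # is processed line by line without unpacking it to disk first.
    with gzip.open(path, "rb") as fh:
        for line in fh:
            for match in URL_RE.findall(line):
                seen.add(match.decode("utf-8", errors="replace"))
    return Counter(host_of(u) for u in seen)
```

Calling `count_urls_by_host("job-meta.warc.gz")` (a hypothetical filename) returns a `Counter` keyed by hostname. Because the log also lists ignored URLs, the counts are an upper bound on what was actually saved; per-response details such as MIME type and size would come from each WARC's CDX instead.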