00:02:20<fireonlive>is there a way to stop grab-site and have it resume where it left off on a different server?
00:17:07icedice quits [Client Quit]
00:20:43ymgve joins
00:21:07Dango360_ (Dango360) joins
00:24:41Dango360 quits [Ping timeout: 252 seconds]
00:28:14<h2ibot>Tomodachi94 edited Prnt.sc (+28, Add category): https://wiki.archiveteam.org/?diff=49853&oldid=49849
00:28:15<h2ibot>Tomodachi94 edited Roblox (+425, /* Group sales */ New section): https://wiki.archiveteam.org/?diff=49854&oldid=48716
00:30:21ymgve quits [Read error: Connection reset by peer]
00:30:33icedice (icedice) joins
00:30:42ymgve joins
00:36:12dumbgoy_ quits [Read error: Connection reset by peer]
00:37:26dumbgoy_ joins
00:37:52dumbgoy_ quits [Read error: Connection reset by peer]
00:40:43dumbgoy_ joins
00:47:55HP_Archivist (HP_Archivist) joins
01:01:03leo60228 quits [Read error: Connection reset by peer]
01:01:23leo60228 (leo60228) joins
01:20:25Barto quits [Ping timeout: 265 seconds]
01:20:47Barto (Barto) joins
01:44:09HiccupJul quits [Client Quit]
01:57:38tertu quits [Ping timeout: 252 seconds]
01:59:08tertu (tertu) joins
01:59:21zhongfu quits [Client Quit]
02:09:13icedice quits [Client Quit]
02:20:17icedice (icedice) joins
03:07:08dumbgoy_ quits [Ping timeout: 252 seconds]
04:00:41_Dango360 (Dango360) joins
04:04:08Dango360_ quits [Ping timeout: 252 seconds]
04:08:30dumbgoy_ joins
04:21:56sonick (sonick) joins
04:27:14dumbgoy_ quits [Ping timeout: 252 seconds]
06:08:37Island quits [Read error: Connection reset by peer]
06:41:44BlueMaxima quits [Read error: Connection reset by peer]
06:49:19decky_e (decky_e) joins
07:38:53decky_e quits [Read error: Connection reset by peer]
07:43:38decky_e (decky_e) joins
07:48:12fullpwn joins
07:48:23atphoenix quits [Remote host closed the connection]
07:48:23yano1 quits [Remote host closed the connection]
07:48:23AnotherIki quits [Remote host closed the connection]
07:48:23fullpwnmedia quits [Remote host closed the connection]
07:48:31AnotherIki joins
07:48:38yano1 (yano) joins
07:48:49atphoenix (atphoenix) joins
08:02:26Arcorann (Arcorann) joins
08:17:47zhongfu (zhongfu) joins
08:46:20railen63 joins
08:47:17railen63 quits [Remote host closed the connection]
08:47:31railen63 joins
09:30:40icedice quits [Client Quit]
09:56:43icedice (icedice) joins
10:03:58Ruthalas5 quits [Client Quit]
10:04:19Ruthalas5 (Ruthalas) joins
10:06:26Ruthalas5 quits [Client Quit]
10:06:45Ruthalas5 (Ruthalas) joins
10:09:02TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night]
10:09:34TastyWiener95 (TastyWiener95) joins
10:18:51TastyWiener95 quits [Client Quit]
10:25:38sonick quits [Client Quit]
10:44:57x-56k-mo1 quits [Quit: WeeChat 3.8]
11:02:03JohnnyJ quits [Read error: Connection reset by peer]
11:20:40decky_e quits [Remote host closed the connection]
11:31:25TastyWiener95 (TastyWiener95) joins
12:40:25Megame (Megame) joins
12:55:22Unholy23614 (Unholy2361) joins
12:55:26AmAnd0A quits [Ping timeout: 252 seconds]
12:55:31AmAnd0A joins
12:59:17Unholy2361 quits [Ping timeout: 252 seconds]
12:59:17Unholy23614 is now known as Unholy2361
13:48:23cultpony quits [Client Quit]
13:49:09cultpony (cultpony) joins
14:12:26Arcorann quits [Ping timeout: 252 seconds]
14:19:02thuban quits [Ping timeout: 252 seconds]
14:20:50thuban joins
14:38:17parfait (kdqep) joins
14:42:07hitgrr8 joins
14:44:12AmAnd0A quits [Ping timeout: 265 seconds]
15:23:07eroc19909 is now known as eroc1990
15:28:11Island joins
15:34:36AmAnd0A joins
15:36:24Megame quits [Client Quit]
15:46:15decky_e (decky_e) joins
15:53:19decky_e quits [Ping timeout: 265 seconds]
15:53:46decky_e (decky_e) joins
16:10:46flashfire42 quits [Remote host closed the connection]
16:10:46kiska quits [Remote host closed the connection]
16:10:46s-crypt quits [Remote host closed the connection]
16:10:56Ryz2 (Ryz) joins
16:10:57s-crypt (s-crypt) joins
16:11:02flashfire42 (flashfire42) joins
16:12:11kiska (kiska) joins
16:23:59<h2ibot>Tomodachi94 created Technic Platform (+851, Create page): https://wiki.archiveteam.org/?title=Technic%20Platform
16:24:00<h2ibot>Tomodachi94 uploaded File:Technic Platform wordmark.webp (Wordmark of [[Technic Platform]]): https://wiki.archiveteam.org/?title=File%3ATechnic%20Platform%20wordmark.webp
16:24:01<h2ibot>Tomodachi94 uploaded File:Technic Platform 2023-05-29 homepage.png (Homepage of [[Technic Platform]] on 2023-05-29): https://wiki.archiveteam.org/?title=File%3ATechnic%20Platform%202023-05-29%20homepage.png
16:27:51flashfire42 quits [Client Quit]
16:27:51kiska quits [Client Quit]
16:27:51Ryz2 quits [Client Quit]
16:27:51s-crypt quits [Client Quit]
16:35:14icedice quits [Client Quit]
16:35:51parfait quits [Ping timeout: 265 seconds]
16:38:10Ryz2 (Ryz) joins
16:38:11s-crypt (s-crypt) joins
16:38:16flashfire42 (flashfire42) joins
16:39:25icedice (icedice) joins
16:39:26kiska (kiska) joins
16:40:24HP_Archivist quits [Ping timeout: 252 seconds]
16:50:23c3manu (c3manu) joins
17:31:46<masterX244>dammit... a 302 to ignored content gets the 302 dropped from the WARC on grab-site, too...
17:32:57c3manu quits [Remote host closed the connection]
17:42:15<pokechu22>That's weird since archivebot doesn't check ignores on redirect targets at all
17:42:33<pokechu22>(it does apply the no-parent rule to redirect targets, though, but I'm not sure if the redirect is dropped from the WARC in that case)
18:20:24spirit joins
18:20:25dumbgoy_ joins
18:22:29<spirit>pokechu22: artdoxa has finished, right? :)
18:23:42<pokechu22>Yeah, it finished a while ago
18:25:20<spirit>awesome!
18:25:21<pokechu22>It did error on https://www.artdoxa.com/more?page=5753 (after successfully loading https://www.artdoxa.com/more?page=5752) and while it was going through those it was occasionally finding new users (but not new artworks I think). It seems like now https://www.artdoxa.com/more?page=5754 gives an error instead - this might just be it running out of pages or something, not sure.
18:26:15<spirit>it's totally fine if some pages were not archived. the contact initially did not think about archiving it at all so this is fantastic
18:26:17<spirit>thank you so much!
18:26:34<spirit>is there a way to see the list of warcs?
18:26:47<pokechu22>an example of a user it found is https://www.artdoxa.com/e-amsalk, who has no submitted artworks but did have favorites (which I guess it picked up from that list)
18:27:09<pokechu22>Yeah, at https://archive.fart.website/archivebot/viewer/job/eofoo - it came out to around 200GB
18:29:00<spirit>hm, i had estimated about 3 times that. is there a way to get a log of all urls contained in the warcs without downloading them?
18:29:53<@JAA>spirit: The -meta.warc.gz contains the complete log of the job, including ignored URLs.
18:30:03<spirit>excellent
18:30:29<@JAA>For more details, like MIME type, response size (in the general case), etc., you'd want each WARC's CDX.
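A minimal sketch of summarizing a CDX by MIME type. It assumes the CDX starts with the usual space-separated header line (e.g. ` CDX N b a m s k r M S V g`), where `m` is the MIME type and `S` is the compressed record size. `S` is only a rough proxy for response size, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def summarize_cdx(lines):
    """Count records and sum the S (compressed record size) column per MIME type.

    Assumes the first line is a header such as ' CDX N b a m s k r M S V g'
    that names the space-separated columns of every following line.
    """
    it = iter(lines)
    header = next(it).split()
    if not header or header[0] != "CDX":
        raise ValueError("expected a ' CDX ...' header line")
    fields = header[1:]
    m_idx = fields.index("m")  # MIME type column
    s_idx = fields.index("S")  # compressed record size column
    counts = Counter()
    sizes = defaultdict(int)
    for line in it:
        cols = line.split()
        if len(cols) != len(fields):
            continue  # skip blank or malformed lines
        mime = cols[m_idx]
        counts[mime] += 1
        if cols[s_idx].isdigit():  # '-' means the size is unknown
            sizes[mime] += int(cols[s_idx])
    return counts, dict(sizes)
```

Usage (the file name is a placeholder): `with open("job.cdx") as f: counts, sizes = summarize_cdx(f)`.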
18:30:41<pokechu22>The meta-warc is basically just a (gz-compressed) text file in that case; you can read it with zless
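A rough sketch of counting fetched URLs from the meta-WARC without zless, reading it as gzip-compressed text. The exact log line shape is an assumption modeled on the `Length: unspecified [text/html; charset=utf-8]` fragment quoted here. It expects lines like `Fetched ‘URL’: 200 OK. Length: N [mime]`, so the regex may need adjusting to what the real log shows. The function names are hypothetical.

```python
import gzip
import re

# ASSUMED shape of a "Fetched" log line inside the meta-WARC, e.g.
#   Fetched ‘https://example.com/’: 200 OK. Length: unspecified [text/html; charset=utf-8].
FETCHED = re.compile(
    r"Fetched [‘'](?P<url>\S+?)[’']: (?P<status>\d{3})\b.*?"
    r"Length: (?P<length>[\d,]+|unspecified) \[(?P<mime>[^\]]*)\]"
)

def parse_meta_warc(path):
    """Yield (url, status, length or None, mime) for each matching log line."""
    # gzip.open in text mode also reads multi-member .gz files such as WARCs.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = FETCHED.search(line)
            if not m:
                continue
            length = m["length"]
            yield (
                m["url"],
                int(m["status"]),
                None if length == "unspecified" else int(length.replace(",", "")),
                m["mime"],
            )

def summarize(path):
    """Return (distinct URLs fetched, fetches without a Content-Length)."""
    urls, no_length = set(), 0
    for url, _status, length, _mime in parse_meta_warc(path):
        urls.add(url)
        if length is None:
            no_length += 1
    return len(urls), no_length
```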
18:32:49decky_e quits [Ping timeout: 265 seconds]
18:33:02<pokechu22>Looks like amazon urls do have a length in the meta-warc but the artdoxa urls don't (probably because they don't set a content-length header)
18:33:46<spirit>i'll just take a brief look at the number of URLs later, if that matches expectations roughly, then this is good enough
18:33:48<spirit>thanks again!
18:33:48<pokechu22>JAA: is there a case where the meta-warc wouldn't contain the MIME type? I see `Length: unspecified [text/html; charset=utf-8]` in it currently which seems to imply that it has one even when it doesn't have the length
18:33:56decky_e (decky_e) joins
18:34:31<pokechu22>It might be worth checking if every thumbnail also had a full-sized image saved and vice versa; that'd be a good heuristic to make sure everything's saved
18:35:19<@JAA>pokechu22: Hmm, yeah, I guess the MIME type should always be there, true.
18:35:43<@rewby|backup>Heyo, I know people've been trying to get me to fix target stuff. Can anyone give me a summary? ( arkiver, datechnoman , JAA)
18:36:37<pokechu22>I guess also, if you estimated the expected total size by using the average size of recently uploaded images, it's probably the case that older images are generally smaller
18:36:56<pokechu22>though I'm not sure if that'd make a 400GB difference
18:37:02<spirit>pokechu22: yeah, that was my thinking too with the corresponding images
18:37:13<spirit>it was a *really* rough estimate :)
18:37:35<spirit>iirc i sampled ~10 full size images, might have been huge ones by random chance
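A sketch of pokechu22's heuristic: check that every thumbnail has a matching full-size image and vice versa. The `/thumb/` and `/full/` URL patterns are entirely hypothetical placeholders, since the real site's image URL layout isn't given here. The input is any list of successfully fetched URLs from the job.

```python
import re

# HYPOTHETICAL patterns: assume thumbnail and full-size URLs share an image ID
# and differ only by a path marker. Replace these with the site's real layout.
THUMB_RE = re.compile(r"/thumb/(?P<id>[^/?#]+)$")
FULL_RE = re.compile(r"/full/(?P<id>[^/?#]+)$")

def unmatched_images(urls):
    """Return (thumbnail IDs with no full-size copy, full-size IDs with no thumbnail)."""
    thumbs, fulls = set(), set()
    for url in urls:
        m = THUMB_RE.search(url)
        if m:
            thumbs.add(m["id"])
            continue
        m = FULL_RE.search(url)
        if m:
            fulls.add(m["id"])
    return sorted(thumbs - fulls), sorted(fulls - thumbs)
```

Non-image URLs match neither pattern and are ignored, so the whole fetched-URL list can be passed in as-is.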
18:37:50<@JAA>rewby: The targets behind Reddit and #// are struggling to keep up with the increased load from historical Reddit data, and some targets on Imgur have been erroring consistently (though that might just be the SBs).
18:38:40<@rewby|backup>JAA: Ack. First prio is dealing with #// and reddit. I need to shift some stuff around. I was handed back one of the target servers a day or two ago. I need to put it back online.
18:40:59<@JAA>Lovely :-)
18:41:02imer quits [Client Quit]
18:42:48imer (imer) joins
18:57:15railen63 quits [Remote host closed the connection]
18:57:28railen63 joins
19:08:51HP_Archivist (HP_Archivist) joins
20:05:10<spirit>looks good. there should be ~130k artworks and grabbed were ~125k fullsize artworks. nice!
20:16:15icedice quits [Ping timeout: 265 seconds]
20:16:48icedice (icedice) joins
21:00:51spirit quits [Client Quit]
21:05:08decky_e quits [Ping timeout: 252 seconds]
21:05:34decky_e (decky_e) joins
21:09:33decky_e quits [Remote host closed the connection]
21:10:55hitgrr8 quits [Client Quit]
21:23:04<icedice>Is #// Imgur brute force?
21:27:12<pokechu22>#// is a general URLs project, I think? Not sure of the details. The imgur brute force is discussed in #imgone to my understanding
21:32:26bf_ quits [Ping timeout: 252 seconds]
21:33:58<@JAA>Correct, #// mostly handles external URLs discovered in other projects, e.g. Reddit and, yes, Imgur (image descriptions etc.).
21:34:28<@JAA>*Everything* Imgur-related is in #imgone and should stay there.
22:05:02fullpwn quits [Remote host closed the connection]
22:06:15fullpwn joins
22:06:41luckcolors_ quits [Quit: No Ping reply in 180 seconds.]
22:08:15luckcolors (luckcolors) joins
22:15:44TastyWiener95 quits [Client Quit]
22:30:37icedice quits [Ping timeout: 265 seconds]
22:46:47HP_Archivist quits [Client Quit]
22:55:48icedice (icedice) joins
22:56:28<icedice>All right, gotcha
22:57:50icedice quits [Client Quit]
22:59:11icedice (icedice) joins
23:01:28bf_ joins
23:06:45sonick (sonick) joins
23:15:29yts98 leaves
23:15:31yts98 joins
23:19:41bf_ quits [Ping timeout: 252 seconds]
23:26:14decky_e (decky_e) joins
23:28:51TastyWiener95 (TastyWiener95) joins
23:32:12AmAnd0A quits [Read error: Connection reset by peer]
23:32:28AmAnd0A joins