00:00:25<thuban>arkiver, pokechu22: here's my list of 156864 raw orange.fr urls: https://transfer.archivete.am/bE5jI/orangefr_raw.txt.zst
00:01:23<pokechu22>Will look at this shortly, thanks
00:01:44Naruyoko5 quits [Read error: Connection reset by peer]
00:01:50<thuban>here's my list of 159650 'cleaned' urls (where i cleaned up whitespace, handled transformations like monsite.orange.fr/<slug> -> <slug>.monsite-orange.fr, and otherwise took my best guess at anything malformed): https://transfer.archivete.am/SB82D/orangefr_scrubbed.txt.zst
00:02:49<thuban>and here's a list of 61667 'bad' urls (which is just the raw list minus the cleaned list): https://transfer.archivete.am/vCBTZ/orangefr_badraw.txt.zst
00:03:02imer4 quits [Ping timeout: 252 seconds]
00:05:34<thuban>(the cleaned list is longer than the raw list because i (a) generated <site> if i only had <site>/path.ext, to avoid no-parent issues, and (b) generated multiple guesses for some malformed urls where i had only the username)
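
A minimal sketch of how the three lists described above could relate to each other. The actual cleaning script was not posted, so the function names `scrub` and `build_lists` are hypothetical and the rules are limited to the ones mentioned in the chat: whitespace cleanup, the monsite.orange.fr/<slug> rewrite, dropping a trailing slash directly on the domain, and emitting the bare <site> next to <site>/path.ext. The multiple-guess handling for malformed entries is left out.

```python
import re

# Hypothetical reconstruction of the cleaning rules mentioned in the chat;
# the real script was not shared.
MONSITE = re.compile(r"^monsite\.orange\.fr/([^/]+)(/.*)?$")

def scrub(raw_entry):
    """Return the cleaned candidates for one raw, scheme-less entry."""
    url = raw_entry.strip()
    # monsite.orange.fr/<slug>[/...] -> <slug>.monsite-orange.fr[/...]
    m = MONSITE.match(url)
    if m:
        url = f"{m.group(1)}.monsite-orange.fr{m.group(2) or ''}"
    host, _, path = url.partition("/")
    if not path:
        # A trailing slash directly on the domain is dropped for deduping.
        return [host]
    # Keep the deep link and also emit the bare site (no-parent issues).
    return [host, url]

def build_lists(raw):
    """Return (scrubbed, bad), where bad is simply raw minus scrubbed."""
    scrubbed = set()
    for entry in raw:
        scrubbed.update(scrub(entry))
    bad = set(raw) - scrubbed
    return scrubbed, bad
```

With these rules, an entry such as "acf.luis.pagesperso-orange.fr/" lands in the 'bad' list only because its trailing slash was removed, not because the site is invalid.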
00:06:36<pokechu22>And this is based on scraping a list that they provide, right? So most of the pages should exist?
00:07:06<thuban>yes; no
00:07:25<thuban>unfortunately a lot of the pages in the directory are down
00:07:57<pokechu22>I'm a bit worried because two of my !a < list jobs for monsite-orange.fr both seem to have resulted in the site banning it (possibly because of too many requests to nonexistent pages, but maybe just because it was running too fast) which is annoying...
00:08:19<thuban>the api had 'accessible' and 'status' parameters; i am not sure what the distinction is and chose the values that gave me the largest list
00:08:22<thuban>oof :/
00:09:29<thuban>i can change those params and get you a shorter list to prioritize, if that would help
00:09:31<pokechu22>An additional annoyance is that each page that doesn't exist redirects twice (https://yachtlink.pagesperso-orange.fr/ -> https://r.orange.fr/r/Oerreur_404 -> https://e.orange.fr/error404.html)
00:09:43<thuban>ye
00:09:58<pokechu22>Sure, that'd be helpful as it'd be pretty easy to run that list first and then run the remaining stuff not on that list
00:11:08<thuban>ok, will do. probably take a few hours
00:11:16<pokechu22>Alright
00:15:09<thuban>list is going to be about 1/3 the size of the big one
00:15:30imer1 (imer) joins
00:18:59imer quits [Ping timeout: 252 seconds]
00:18:59imer1 is now known as imer
00:31:36<pokechu22>thuban: how exactly did you make the badraw list? http://acf.luis.pagesperso-orange.fr/ is valid for instance (it just doesn't work with https)
00:35:09<thuban>literally just raw minus scrubbed. that site had a trailing slash in the raw list ("acf.luis.pagesperso-orange.fr/"); i removed those if they were directly on the domain (for deduping purposes)
00:36:48<pokechu22>Oh, not links that seemed like complete junk
00:38:24<thuban>yeah, the idea was mostly to have the originals for discoverability (esp for the changed domains)
00:42:22imer6 (imer) joins
00:45:32qwertyasdfuiopghjkl quits [Client Quit]
00:46:08imer quits [Ping timeout: 265 seconds]
00:46:09imer6 is now known as imer
00:48:43dumbgoy_ joins
00:51:59dumbgoy quits [Ping timeout: 252 seconds]
00:55:30fangfufu joins
00:55:32fangfufu quits [Remote host closed the connection]
00:56:40fangfufu joins
01:04:12imer0 (imer) joins
01:05:33nexusxe quits [Client Quit]
01:07:56imer quits [Ping timeout: 252 seconds]
01:07:56imer0 is now known as imer
01:21:20<pokechu22>thuban: some (as in several thousand?) of the ones you have aren't in my list at all, which means there's no archive.org coverage. Unfortunately my organization is a mess and I now have 2GB of lists of URLs so it'll be a bit before I can actually run stuff though... and make sure I'm actually looking at all of this correctly :|
01:29:52<thuban>that's ok, take your time! the priority list will probably be done in another 30-45 minutes, if that helps
01:30:58<nicolas17>my VPS has 659GB unused bandwidth for the rest of the month
01:51:54miki_57 quits [Client Quit]
01:59:02<fireonlive>DogsRNice: trouble with factorio? or just proactive?
02:00:01<DogsRNice>no idea i just noticed someone was doing the factorio sites and didnt do the forums
02:02:06sec^nd quits [Ping timeout: 245 seconds]
02:04:18<fireonlive>ah ok
02:13:20imer6 (imer) joins
02:16:41imer quits [Ping timeout: 252 seconds]
02:16:41imer6 is now known as imer
02:21:36imer1 (imer) joins
02:24:53nic quits [Quit: The Lounge - https://thelounge.chat]
02:25:12<pokechu22>I skipped the forums because they're somewhat large - it'd make sense to do them later but I'd rather not start a multi-day proactive thing just yet
02:25:29imer quits [Ping timeout: 252 seconds]
02:25:29imer1 is now known as imer
02:25:44<pokechu22>If we want to do one it's fine but eh
02:26:21<thuban>arkiver, pokechu22: here are my 'priority' lists (scraped with accessible=true and status=active; sites should all be online). these lists are a strict subset of those previously posted
02:26:44<thuban>48298 raw urls: https://transfer.archivete.am/eabTo/orangefr_online_raw.txt.zst
02:27:12<thuban>49007 cleaned urls: https://transfer.archivete.am/7QXLi/orangefr_online_scrubbed.txt.zst
02:29:09nic (nic) joins
02:30:38<thuban>the 'bad' urls all either had trailing slashes or were of the old *.(orange|wanadoo).fr format with quasi-redirects. trailing slashes are transparent for our purposes, so instead of the entire 'bad' list here are just the redirects
02:31:01<thuban>7440 redirect urls: https://transfer.archivete.am/LUa27/orangefr_online_redirect.txt.zst
02:32:03<pokechu22>I'm going to run this with entries like 08.pagesperso-orange.fr/odp/index.htm stripped out (leaving only 08.pagesperso-orange.fr) for now since having both is the kind of situation that can lead to really weird no-parent behavior
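
One way the stripping step described above could look, as a sketch only. It assumes scheme-less entries, one per list item, and the helper name `hosts_only` is made up for illustration.

```python
def hosts_only(entries):
    """Reduce each entry to its host and drop duplicates, keeping order."""
    seen = set()
    hosts = []
    for entry in entries:
        # Everything before the first slash is the host.
        host = entry.strip().split("/", 1)[0]
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts
```

For example, "08.pagesperso-orange.fr/odp/index.htm" and "08.pagesperso-orange.fr" collapse into the single entry "08.pagesperso-orange.fr".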
02:32:27<thuban>hmm, ok
02:32:28<pokechu22>AB also needs either http:// or https:// before each URL; I'll add http to ones with multiple dots and https to ones without
02:33:09<thuban>ah, i never remember that. do you want me to do that / any other processing?
02:33:11dumbgoy_ quits [Ping timeout: 252 seconds]
02:33:26<pokechu22>I can handle it - I've already built some jank regexes for it :)
02:33:46<thuban>ok!
02:34:25<pokechu22>first prefix everything with http:// and then replace ^http://([^/\.]+\.[^/\.]+-orange\.fr)$ with https://\1
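
The two-step rule above, written out as a small Python sketch. The pattern is the one quoted in the message (with the redundant escapes inside the character classes dropped); the function name `add_scheme` is an assumption for illustration.

```python
import re

# Matches only bare hosts of the form <name>.<something>-orange.fr,
# i.e. exactly one label in front of the *-orange.fr domain and no path.
BARE_ORANGE_HOST = re.compile(r"^http://([^/.]+\.[^/.]+-orange\.fr)$")

def add_scheme(entry):
    """Prefix http://, then upgrade single-label *-orange.fr hosts to https://."""
    url = "http://" + entry
    return BARE_ORANGE_HOST.sub(r"https://\1", url)
```

Hosts with an extra dot, such as acf.luis.pagesperso-orange.fr, stay on http://, which matches the earlier observation that that site does not work over https. Entries with a path also stay on http://.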
02:37:09sambo joins
02:38:52sambo quits [Remote host closed the connection]
02:43:58imer2 (imer) joins
02:47:56imer quits [Ping timeout: 265 seconds]
02:47:56imer2 is now known as imer
03:10:04Larsenv76 is now known as Larsenv
03:19:59sec^nd (second) joins
03:35:10sec^nd quits [Remote host closed the connection]
03:35:49sec^nd (second) joins
03:46:43sec^nd quits [Remote host closed the connection]
03:58:01sec^nd (second) joins
04:00:50dumbgoy_ joins
04:15:36lukash96 joins
04:15:54lukash9 quits [Ping timeout: 245 seconds]
04:15:54lukash96 is now known as lukash9
04:24:58DogsRNice quits [Read error: Connection reset by peer]
04:43:40<pokechu22>I don't think the orange stuff is going to finish on time - running at more than 1 page/second seemed to result in blocks, and after going through about 4.5K seed URLs of 45K URLs we're already at ~125K queued or a day and a half. So at that rate it'd be 15 days to finish, which we don't have. And that's just for this smaller list. Any ideas about how to handle that?
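
The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes the queue grows linearly with the number of seed URLs processed, which is a simplification and not something stated in the chat.

```python
SECONDS_PER_DAY = 86_400

def estimate_days(queued_urls, seeds_done, seeds_total, pages_per_second=1.0):
    """Scale the time needed for the current queue up to the full seed list."""
    days_for_current_queue = queued_urls / pages_per_second / SECONDS_PER_DAY
    return days_for_current_queue * (seeds_total / seeds_done)

# ~125K queued after 4.5K of 45K seeds, at 1 page/second
print(round(estimate_days(125_000, 4_500, 45_000), 1))  # 14.5
```

125K URLs at one request per second is about a day and a half, and a tenth of the seeds has been processed, hence roughly 15 days.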
04:49:09nic0 (nic) joins
04:52:20nic quits [Ping timeout: 252 seconds]
04:52:20nic0 is now known as nic
04:58:54<thuban>i guess i would suggest either seeing if you can reduce the delay (i know it's different infra, but i was able to do all my scraping with 0.5s delay and didn't get banned) or trying to parallelize the load across multiple pipelines
05:00:25<pokechu22>If .5s is fine I can do that - it was originally .25-.375 at con=1
05:00:42<pokechu22>I'm not sure how long they ban for though which makes me nervous about experimenting
05:02:24<thuban>as i said, different infra (and it involved a token which i just yoinked from the browser), so can't be sure based just on that. could you try testing with a sacrificial ip, like a home connection?
05:03:07<pokechu22>I guess I could - though I don't have quite the same infra either
05:03:14<thuban>i mean on their end
05:03:44<thuban>i.e., the directory api being different from the actual page servers
05:04:21<pokechu22>What host is the directory API on?
05:04:47<thuban>api.annuaire-pp.orange.fr
05:06:29<pokechu22>ah, yeah, might have different rate-limiting then :|
05:09:11<thuban>multiple pipelines is probably easiest/safest, but idk what wrangling them is like
05:09:40<thuban>(alas, this is really a job for #Y...)
05:11:49<pokechu22>Theoretically I could just run e.g. all of the pagespro-orange.fr jobs on one pipeline, pagesperso-orange.fr on a second, and monsite-orange.fr on a third (that's trivial by just using different lists), and that's what I originally planned on doing, but it's not easy to do that for in-progress jobs
05:12:41<pokechu22>I'm going to try running pagespro-orange.fr locally since there's no job for that yet (beyond the ones you have in your list)
05:29:21<pokechu22>The other thing that would help is if we could just skip the 2-step redirect chain, but there's no way to apply ignores onto redirect targets so it's going to redownload https://r.orange.fr/r/Oerreur_404 and https://e.orange.fr/error404.html every time it hits a 404 :|
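
This does not work around the limitation described above, but the fixed two-step chain does make dead sites easy to recognize when post-processing a crawl log or pre-checking a seed list. A sketch, assuming the chain is available as a list of visited URLs starting with the requested one; `is_error_chain` is a hypothetical helper, not an ArchiveBot feature.

```python
# The two redirect targets quoted in the chat for nonexistent pages.
ERROR_CHAIN = (
    "https://r.orange.fr/r/Oerreur_404",
    "https://e.orange.fr/error404.html",
)

def is_error_chain(visited_urls):
    """True if the request was redirected through the known 404 chain.

    visited_urls: the requested URL followed by each redirect target, in order.
    """
    return tuple(visited_urls[1:]) == ERROR_CHAIN

def requests_spent(visited_urls):
    """Number of HTTP requests used for one seed, including redirects."""
    return len(visited_urls)
```

Each dead page costs three requests instead of one, which matters at a rate limit of about one request per second.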
05:51:11katocala quits [Ping timeout: 252 seconds]
05:51:23katocala joins
05:54:16Island quits [Read error: Connection reset by peer]
05:58:53Exorcism (exorcism) joins
06:28:09BigBrain quits [Remote host closed the connection]
06:29:48BigBrain (bigbrain) joins
07:00:08nfriedly quits [Remote host closed the connection]
07:01:16qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:06:42Unholy2361316618085 (Unholy2361) joins
07:09:54nulldata quits [Ping timeout: 265 seconds]
07:12:51nulldata (nulldata) joins
07:13:37Krume (Krume) joins
07:32:45<AntoninDelFabbro|m>pokechu22: If I can help, I will!
07:54:06Arcorann (Arcorann) joins
07:54:51dazld quits [Ping timeout: 265 seconds]
08:21:20nulldata quits [Ping timeout: 252 seconds]
08:24:38nulldata (nulldata) joins
09:04:10Exorcism quits [Remote host closed the connection]
09:07:31Exorcism (exorcism) joins
09:22:42parfait quits [Client Quit]
09:23:29appledash quits [Ping timeout: 252 seconds]
09:25:31appledash joins
09:26:00BlueMaxima quits [Read error: Connection reset by peer]
09:47:48erkinalp joins
10:00:00nfriedly joins
10:00:01railen63 quits [Remote host closed the connection]
10:00:18railen63 joins
10:13:31miki_57 joins
10:14:02miki_57 quits [Max SendQ exceeded]
10:14:05miki_57 joins
10:14:36miki_57 quits [Max SendQ exceeded]
10:14:39miki_57 joins
10:15:10miki_57 quits [Max SendQ exceeded]
10:15:13miki_57 joins
10:15:44miki_57 quits [Max SendQ exceeded]
10:15:47miki_57 joins
10:16:19miki_57 quits [Max SendQ exceeded]
10:16:21miki_57 joins
10:16:52miki_57 quits [Max SendQ exceeded]
10:16:54miki_57 joins
10:17:26miki_57 quits [Max SendQ exceeded]
10:17:29miki_57 joins
10:18:00miki_57 quits [Max SendQ exceeded]
10:18:03miki_57 joins
10:18:34miki_57 quits [Max SendQ exceeded]
10:18:37miki_57 joins
10:19:08miki_57 quits [Max SendQ exceeded]
10:19:10miki_57 joins
10:19:42miki_57 quits [Max SendQ exceeded]
10:19:44miki_57 joins
10:20:16miki_57 quits [Max SendQ exceeded]
10:20:19miki_57 joins
10:20:50miki_57 quits [Max SendQ exceeded]
10:20:53miki_57 joins
10:21:24miki_57 quits [Max SendQ exceeded]
10:21:27miki_57 joins
10:21:58miki_57 quits [Max SendQ exceeded]
10:22:01miki_57 joins
10:22:32miki_57 quits [Max SendQ exceeded]
10:22:34miki_57 joins
10:23:06miki_57 quits [Max SendQ exceeded]
10:23:08miki_57 joins
10:23:40miki_57 quits [Max SendQ exceeded]
10:23:42miki_57 joins
10:24:14miki_57 quits [Max SendQ exceeded]
10:24:17miki_57 joins
10:24:48miki_57 quits [Max SendQ exceeded]
10:24:51miki_57 joins
10:25:22miki_57 quits [Max SendQ exceeded]
10:25:25miki_57 joins
10:25:56miki_57 quits [Max SendQ exceeded]
10:25:59miki_57 joins
10:26:30miki_57 quits [Max SendQ exceeded]
10:26:33miki_57 joins
10:26:34Earendil7 quits [Client Quit]
10:27:04miki_57 quits [Max SendQ exceeded]
10:27:07miki_57 joins
10:27:38miki_57 quits [Max SendQ exceeded]
10:27:41miki_57 joins
10:27:59Earendil7 (Earendil7) joins
10:28:12miki_57 quits [Max SendQ exceeded]
10:28:15miki_57 joins
10:28:46miki_57 quits [Max SendQ exceeded]
10:28:49miki_57 joins
10:29:20miki_57 quits [Max SendQ exceeded]
10:29:23miki_57 joins
10:29:29wickedplayer494 quits [Ping timeout: 252 seconds]
10:29:48wickedplayer494 joins
10:29:54miki_57 quits [Max SendQ exceeded]
10:29:57miki_57 joins
10:30:28miki_57 quits [Max SendQ exceeded]
10:30:31miki_57 joins
10:31:02miki_57 quits [Max SendQ exceeded]
10:31:04miki_57 joins
10:31:36miki_57 quits [Max SendQ exceeded]
10:31:39miki_57 joins
10:32:10miki_57 quits [Max SendQ exceeded]
10:32:13miki_57 joins
10:32:44miki_57 quits [Max SendQ exceeded]
10:32:47miki_57 joins
10:33:18miki_57 quits [Max SendQ exceeded]
10:33:21miki_57 joins
10:33:52miki_57 quits [Max SendQ exceeded]
10:33:55miki_57 joins
10:34:26miki_57 quits [Max SendQ exceeded]
10:34:29miki_57 joins
10:35:00miki_57 quits [Max SendQ exceeded]
10:35:03miki_57 joins
10:35:34miki_57 quits [Max SendQ exceeded]
10:35:36miki_57 joins
10:36:08miki_57 quits [Max SendQ exceeded]
10:36:11miki_57 joins
10:36:42miki_57 quits [Max SendQ exceeded]
10:36:45miki_57 joins
10:37:16miki_57 quits [Max SendQ exceeded]
10:37:18miki_57 joins
10:37:50miki_57 quits [Max SendQ exceeded]
10:37:53miki_57 joins
10:38:24miki_57 quits [Max SendQ exceeded]
10:38:27miki_57 joins
10:38:58miki_57 quits [Max SendQ exceeded]
10:39:01miki_57 joins
11:23:01<erkinalp>pokechu22: wowturkey still down
11:23:12railen69 joins
11:24:08wickedplayer494 quits [Ping timeout: 265 seconds]
11:24:29wickedplayer494 joins
11:25:36railen63 quits [Ping timeout: 265 seconds]
11:53:52Miki57 joins
11:56:46Earendil7 quits [Client Quit]
11:57:05Earendil7 (Earendil7) joins
12:04:54yo joins
12:05:20yo quits [Remote host closed the connection]
12:11:47Dango360 quits [Ping timeout: 252 seconds]
12:27:48icedice (icedice) joins
12:28:54le0n quits [Ping timeout: 265 seconds]
12:30:53ethan joins
12:31:31ethan quits [Remote host closed the connection]
12:32:20Exorcism quits [Client Quit]
12:33:26Exorcism (exorcism) joins
12:52:06Exorcism quits [Ping timeout: 245 seconds]
12:55:54Exorcism (exorcism) joins
13:08:45Icyelut|2 quits [Quit: bye]
13:29:37nic quits [Client Quit]
13:33:15nic (nic) joins
13:42:18benjins2 joins
14:02:23bf_ joins
14:07:50Arcorann quits [Ping timeout: 252 seconds]
14:17:51erkinalp quits [Remote host closed the connection]
14:18:21le0n (le0n) joins
14:46:36Island joins
14:51:48miki_57 quits [Client Quit]
15:07:35khaosfox quits [Quit: leaving]
15:10:51LeGoupil joins
15:18:03Core4657 joins
15:18:07Core4657 quits [Remote host closed the connection]
16:14:02kiryu quits [Read error: Connection reset by peer]
16:37:35<pokechu22>So unfortunately, a 500-500 delay results in a ban. Happened to me on my residential connection overnight and happened to one of the jobs (not the priority one) I changed yesterday too. I guess the 1-second delay is the only safe one :|
16:48:27<pokechu22>I did, however, build a list of stuff under pagespro-orange.fr that's valid
17:03:55<fireonlive>09:58:59 AM -+rss- Fig Has Joined AWS: https://fig.io/blog/post/fig-joins-aws https://news.ycombinator.com/item?id=37296401
17:07:27aninternettroll quits [Remote host closed the connection]
17:10:19aninternettroll (aninternettroll) joins
17:26:06railen69 quits [Remote host closed the connection]
17:29:21railen63 joins
17:42:21<@JAA>So, what channel do we use for ZOWA?
17:43:06<@JAA>The ideas from yesterday: zowch z-oww-a nowa zowwa zowaah zowie (plus one that shall not be named)
18:31:47DogsRNice joins
18:36:43AmAnd0A quits [Ping timeout: 265 seconds]
18:37:21AmAnd0A joins
18:46:48wyatt8740 quits [Remote host closed the connection]
18:56:26yts98 leaves
18:56:31yts98 joins
18:59:04Unholy2361316618085 quits [Remote host closed the connection]
19:01:29Unholy2361316618085 (Unholy2361) joins
19:04:34AmAnd0A quits [Read error: Connection reset by peer]
19:04:47AmAnd0A joins
19:16:06<fireonlive>ooh! ooh! the shall not be named one!
19:17:47<fireonlive>in absence of that, zowch
19:19:51<DigitalDragons>+1 zowch
19:20:01Exorcism quits [Ping timeout: 245 seconds]
19:20:26sec^nd quits [Ping timeout: 245 seconds]
19:20:51BigBrain quits [Ping timeout: 245 seconds]
19:21:46<h2ibot>FireonLive edited Current Projects (+121, add ZOWA): https://wiki.archiveteam.org/?diff=50608&oldid=50551
19:22:04Exorcism (exorcism) joins
19:22:44BigBrain (bigbrain) joins
19:24:08<fireonlive>one day i'll go through and make 300,000 edits with the https://www.mediawiki.org/wiki/Help:Magic_words#formatdate thing
19:24:21<fireonlive>too bad there doesn't seem to be one for time
19:25:05<fireonlive>hmmm
19:25:44sec^nd (second) joins
19:26:09<fireonlive>yeah sadly {{#formatdate:2023-09-29T03:00Z}} doesn't appear to work
19:27:02LeGoupil quits [Client Quit]
19:28:47<h2ibot>FireonLive edited Current Projects (+16, use formatdate for ZOWA, more to come): https://wiki.archiveteam.org/?diff=50609&oldid=50608
19:31:33<fireonlive>i found {{#time}} but what the fuck is this: 2023-09-29UTC03:000
19:32:10<fireonlive>i'll look more into it later :p
19:32:49<fireonlive>mediawiki is really something
19:39:50Exorcism quits [Client Quit]
19:47:47<@JAA>#time doesn't seem to account for user preferences.
19:49:50<h2ibot>Yts98 edited ZOWA (+24, Update project status): https://wiki.archiveteam.org/?diff=50610&oldid=50195
19:51:50leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
19:52:12leo60228 (leo60228) joins
19:54:25<fireonlive>ah, darn
19:54:39<fireonlive>thanks yts98 :)
20:01:01<@JAA>Perhaps we should just have a simple template to render datetimes in a consistent manner. {{datetime|2023-08-28|22:00|CEST|+2}} → {{#formatdate:2023-08-28}} 22:00 CEST (UTC+2) or similar
20:01:39<fireonlive>i'd be up for something that's consistent
20:01:56<@JAA>The last two parameters could be optional, and the default would be UTC.
20:02:23<fireonlive>people wildly get confused with named timezones though so perhaps we could leave that out
20:02:38<fireonlive>EST vs EDT, even big streamers scheduling things
20:03:05<fireonlive>'hey you know it's DT over there now.. so is happening at 7 or 8?'
20:03:15<fireonlive>seems to come up a lot lol
20:04:36<@JAA>'ET'
20:04:42<@JAA>(ノಥ益ಥ)ノ彡┻━┻
20:04:53<fireonlive>too bad we can't just link them all to something like (js-ridden) https://www.timeanddate.com/worldclock/converter.html?iso=20230831T030000&p1=1440
20:04:55<fireonlive>:P
20:05:06<fireonlive>'type where you are and see what it is'
20:05:29<fireonlive>https://mkx9delh5a.execute-api.ca-central-1.amazonaws.com/uploads/e5654758afc913ec/image.png (i added Ottawa in this example)
20:07:39<fireonlive>the frowny faces are because it's mainly used for figuring out when to meet i guess
20:08:42<fireonlive>JAA: can we pls kill DST everywhere tks
20:08:45<fireonlive>T_T
20:09:00<fireonlive>inb4 perma-dst everywhere because i guess that sounds nicer to politicians
20:16:54<@JAA>Yes please
20:19:40<fireonlive>as long as it's gone i'll accept it
20:19:49<fireonlive>:D
20:20:04<fireonlive>(the DST vs ST 'final time' debate)
20:20:30<@JAA>Same, I don't even care anymore which one is chosen, just get rid of the stupid transition twice per year.
20:21:33<fireonlive>for sure
20:32:09<thuban>pokechu22: that sucks. multiple pipelines, then? i know you can't really do that with the jobs already in progress, but i don't think duplicating some of the work would hurt
20:32:13<thuban>(i also don't see any reason it needs to be done by domain--seems better to just split evenly)
20:34:35<pokechu22>Yeah, there's no real reason to split by domain, other than how I was building up my own lists originally. If it were an !a < list job for example.com/foo example.com/bar example.org/baz example.org/quux it would make sense to split example.com and example.org into two jobs to fully avoid !a < list issues, but we've already got multiple subdomains and multiple domains doesn't
20:34:38<pokechu22>make much of a difference
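
Splitting a list evenly rather than by domain, as suggested above, can be done round-robin. A minimal sketch; `split_evenly` is an illustrative name and the number of jobs is whatever the available pipelines allow.

```python
def split_evenly(urls, n_jobs):
    """Deal URLs out round-robin so each job gets a near-equal share."""
    return [urls[i::n_jobs] for i in range(n_jobs)]
```

Round-robin also spreads any clusters of dead sites in the source list across jobs. It would not be a good fit for the example.com/foo and example.com/bar case mentioned above, where entries on the same host should stay in one job.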
20:35:08<pokechu22>Unfortunately there are only 6 different sets of pipelines with distinct IPs, of which 3 are banned and 2 currently have jobs running on them
20:35:21<thuban>oof
20:35:51<pokechu22>the remaining one is also basically always full since it effectively only has 4 slots at the moment and they're usually filled with long-running jobs :|
20:36:41<pokechu22>Hopefully the bans don't last too long and we can get the other ones back into use
20:36:46<thuban>:I
20:36:48<thuban>yeah
20:38:05<thuban>at least we'll definitely get through all the front pages from the priority list (and probably their assets as well)
20:40:11<pokechu22>Yeah
20:53:17<vokunal|m>+1 zowch
20:53:44Unholy2361316618085 quits [Ping timeout: 252 seconds]
21:03:47<nicolas17>what's ZOWA
21:04:03<@JAA>https://wiki.archiveteam.org/index.php/ZOWA
21:04:41<nicolas17>oh yikes, video... any idea of size?
21:05:51Unholy2361316618085 (Unholy2361) joins
21:06:50<@JAA>#zowch for ZOWA
21:07:28<nicolas17>anyone updating channel on wiki?
21:07:38<appledash>Does archiveteam accept donations? if so, I hope they all go to the guy responsible for coming up with the channel names
21:07:42<appledash>he's got a hard jo
21:07:43<appledash>b
21:07:43<flashfire42>is the telegram thing still going nuts?
21:08:07<flashfire42>Like is the redoing everything thing still active or is it back to normal?
21:09:19<fireonlive>so many OWASP channels
21:11:13<@JAA>appledash: https://wiki.archiveteam.org/index.php/Donate
21:12:22<appledash>wtf, the fact that someone who has only donated $40 is top 15 is a travesty
21:12:28<appledash>Remind me to contribute when I get paid
21:13:55<nstrom|m>Can someone fill me in on the owasp drama? Maybe in -ot
21:14:37<flashfire42>I have no fucking idea I just jumped on the bandwagon
21:14:50<@JAA>appledash: It has only been in use and publicised since a couple months ago during the Imgur project, although the page has existed for years.
21:14:59<appledash>Ahhh
21:16:30<h2ibot>Switchnode edited ZOWA (+5, add irc channel): https://wiki.archiveteam.org/?diff=50611&oldid=50610
21:16:53<pokechu22>I queued one more job for orange.fr URLs that aren't found on archive.org at all, though whether or not the pipeline slot will free up remains to be seen
21:33:10<h2ibot>JustAnotherArchivist edited ZOWA (+56, Reference for shutdown): https://wiki.archiveteam.org/?diff=50612&oldid=50611
21:44:40<nicolas17>rewby: how are the targets and IA doing? do you have a giant backlog in temporary storage again?
21:50:16<@rewby>nicolas17: I have about 31.2TiB in temp storage. And another 200 or so TiB left on it.
21:50:30<@rewby>Targets are fine at the moment
21:50:47<@rewby>It's just that all active projects managed to hit bugs all at once as far as I can tell
21:51:32<@rewby>Based on what I've read (and I'm not an authority here): shreddit is paused due to some concern around image capture maybe not working right
21:51:39<@rewby>deadcat is just mostly done
21:51:57<nicolas17>oh, I thought shreddit was still paused to give capacity to gfycat/xuite
21:51:59<@rewby>(and waiting for an update for the last few items)
21:52:03<@rewby>xuite is just slow
21:52:14<@rewby>(something something asia is a pain to get data in and out of)
21:52:29<@rewby>If you have ipv6, I think xuite could use your help
21:52:49<@rewby>telegram was provided offload capacity but I don't know if it's being used yet
21:53:10<nicolas17>telegram seems to have 0 in todo
21:53:25<@rewby>Actually, tg is slowly returning stuff
21:53:28<@rewby>So looks to be working
21:53:42<@rewby>Uh... what else... urls is still paused
21:54:06<nicolas17>I think a bunch of stuff in tg was stashed away, maybe it needs to be brought back, but idk status, I wasn't even in the channel the last few days
21:54:09<@rewby>Although that's been hooked up to offload too in case arkiver wants to have a go at it (although probably not at full speed to conserve space)
21:54:31<@rewby>And yeah... that's about it?
21:55:24<fireonlive>shreddit was paused while i.reddit.com's new javascript/etc fuckery is checked to ensure the data we save is good
21:55:28<fireonlive>AIUI
21:55:48<nicolas17>if there's "free" capacity we can slightly open the faucet on imgur (:
21:56:03<fireonlive>imgur is slowly deleting images off of the CDN now, per BigBrain
21:56:11<fireonlive>302s are rising from the canary list
21:56:19<@rewby>Ah
21:56:23<@rewby>I'll add it to offload I guess
21:56:28<@rewby>And then it's up to arkiver and JAA to turn that on and off
21:56:34<fireonlive>:) thanks
21:56:50<@rewby>Mind you, I've only got like a quarter of a PiB of space
21:56:58<@rewby>And that has to last us until the IA comes back
21:57:11<nicolas17>are you not uploading anything to IA right now?
21:57:16<@rewby>Not yet
21:57:18<@rewby>Code's not ready for it
21:57:51<vokunal|m>It's nice to see nearly 200M items in queue and realize for once it's only like ~75GiB
21:58:13<nicolas17>vokunal|m: lol, in what project?
21:58:22<fireonlive>xuite if i had to guess
21:58:58<vokunal|m>Imgur. Though is it probably the item size avg bugged after being offline so long?
21:59:10<@rewby>nicolas17: Getting code ready for uploading to IA is a lower prio than actually capturing data atm
21:59:11<thuban>telegram is still running (so items submitted to the bot are still processed), but its backlog was stashed and since other projects are paused it's not receiving items from outlinks (which were the majority of its volume)
21:59:38Megame (Megame) joins
22:00:38<fireonlive>ah
22:00:44<nicolas17>vokunal|m: that math doesn't look right :P
22:01:02<nicolas17>item size is 367 KB
22:01:20BlueMaxima joins
22:02:25BlueMaxima quits [Read error: Connection reset by peer]
22:02:31BlueMaxima joins
22:02:40<flashfire42>arkiver is the deduplication still turned off for telegram?
22:03:49<nicolas17>rewby: imgur has a lot of 'redo' that will probably have low success rate, so we can also regulate speed that way
22:03:59<nicolas17>move some stuff from redo to todo to slow down, ask me to add a bruteforced list to speed up :P
22:04:09<vokunal|m>73TB? I think i divided instead of multiplied
22:05:03<nicolas17>vokunal|m: yes that's the right multiplication, but note a lot of those 200M are retries and will fail
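
The corrected multiplication, as a quick check. It assumes decimal units (1 KB = 1000 bytes, 1 TB = 10^12 bytes) and treats every queued item as average-sized, which overstates the real volume since many of the items are retries that will fail.

```python
def queue_size_tb(items, avg_item_kb):
    """Estimated total size in decimal terabytes."""
    total_bytes = items * avg_item_kb * 1_000
    return total_bytes / 1e12

print(queue_size_tb(200_000_000, 367))  # 73.4
```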
22:12:19<h2ibot>FireonLive edited Current Projects (+27, add IRC channel for ZOWA): https://wiki.archiveteam.org/?diff=50613&oldid=50609
22:12:24ymgve_ joins
22:13:44ymgve quits [Ping timeout: 265 seconds]
22:22:00<@arkiver>flashfire42: yes, i'll turn that on shortly again
22:23:47<flashfire42|m>Probably a good idea
22:28:48<fireonlive>https://wiki.archiveteam.org/index.php/Template:@
22:28:51<fireonlive>interesting template
22:29:20<fireonlive>(it's an image!)
22:29:31<fireonlive>oh, for emails
22:30:31<fireonlive>(well one email :3)
22:31:04<flashfire42|m>I wonder if we will ever find out the reason behind the ingestion issues
22:32:07<flashfire42|m>And are we slowly pushing from the offload storage or is it just sitting quietly?
22:33:52<fireonlive>not uploading to IA from offload atm, code needs to be written (rewby mentioned it above)
22:34:35<@rewby>My plan is to spend some time later this week getting uploading going
22:35:24<h2ibot>FireonLive edited Template:IRC-Hackint (+22, +deleteme in favour of Template:IRC): https://wiki.archiveteam.org/?diff=50614&oldid=41452
22:35:29<fireonlive>i have no idea what i went to wiki.archiveteam.org for initially, but it ended in that
22:41:26<h2ibot>FireonLive edited YouTube (-2, #youtubearchive → on haitus): https://wiki.archiveteam.org/?diff=50615&oldid=50569
22:41:39<fireonlive>it wasn't that either
22:41:43<fireonlive>oh well :D
23:04:29fangfufu quits [Ping timeout: 265 seconds]
23:05:48<thuban>front pages of 'online' orange.fr sites are done :D
23:06:10fangfufu joins
23:07:25<thuban>~8 days' worth of requests remaining in queue, so front page assets at least should just finish before shutdown
23:09:00<fireonlive>awesome
23:09:08<fireonlive>^_^
23:46:51Darken (Darken) joins
23:56:46Darken quits [Remote host closed the connection]