#webroasting log for 2023-10-02


00:09:57		levomi quits [Ping timeout: 265 seconds]
00:26:05	<project10>	arkiver: do you mean, when was the service established?
00:32:58	<project10>	seems to be ~2001, based on the earliest last-modified dates I can find. The service may have existed in an earlier incarnation under France Télécom since 1996 (less certainty on that)
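project10's estimate rests on Last-Modified dates. Below is a minimal sketch of that idea. It assumes you already collected the raw `Last-Modified` header values from fetched pages, and the function name `earliest_last_modified` is hypothetical. The code parses each value with the standard library and keeps the oldest one. Servers can send wrong or missing headers, so the result is a hint about when the service started, not proof.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, Optional


def earliest_last_modified(values: Iterable[Optional[str]]) -> Optional[datetime]:
    """Return the oldest parseable Last-Modified date, or None if none parse."""
    earliest: Optional[datetime] = None
    for raw in values:
        if not raw:
            continue  # header absent on this response
        try:
            parsed = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            continue  # malformed header (3.10+ raises ValueError, older versions TypeError)
        if parsed.tzinfo is None:
            # a "-0000" zone yields a naive datetime; treat it as UTC so comparisons work
            parsed = parsed.replace(tzinfo=timezone.utc)
        if earliest is None or parsed < earliest:
            earliest = parsed
    return earliest
```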
00:33:44	<thuban>	there are definitely wanadoo.fr user pages from the late 90s, still trying to munge ia cdx data to find the earliest
00:39:16	<@arkiver>	thanks!
00:39:21	<@arkiver>	project10: btw, we do have an update
00:39:28	<@arkiver>	i didn't set the new version yet but will do soon
00:43:57	<project10>	Thanks arkiver, I should be updating within the next 2 hours or so
01:12:25	<thuban>	arkiver: oldest known wanadoo captures are late 1998 for both 'perso' and 'pro' (https://web.archive.org/web/19981201033556/http://perso.wanadoo.fr/florent.buttazzoni/urgences.htm; https://web.archive.org/web/19981201035723/http://mairie.wanadoo.fr/f6flv/index.html)
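thuban's search relies on the Wayback Machine CDX server. The sketch below builds a prefix query and picks the earliest capture from the returned lines. The helper names are hypothetical, and no request is made. It assumes the public CDX API's `url`, `matchType` and `limit` parameters and its default space-separated output (`urlkey timestamp original mimetype statuscode digest length`). Check the CDX server documentation before relying on either.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str, limit: int = 1000) -> str:
    """Build a CDX query for every capture under a URL prefix (no request is made)."""
    return CDX_ENDPOINT + "?" + urlencode({"url": prefix, "matchType": "prefix", "limit": limit})


def earliest_capture(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original URL) of the oldest capture in default CDX output."""
    best: Optional[Tuple[str, str]] = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        timestamp, original = fields[1], fields[2]
        # 14-digit YYYYMMDDhhmmss timestamps sort correctly as strings
        if best is None or timestamp < best[0]:
            best = (timestamp, original)
    return best
```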
01:23:29		sonick (sonick) joins
04:21:24		BornOn420 quits [Read error: Connection reset by peer]
04:21:55		BornOn420 (BornOn420) joins
05:24:38		threedeeitguy39 quits [Client Quit]
05:27:00		threedeeitguy39 (threedeeitguy) joins
06:33:25	<pokechu22>	https://ecole.pagespro-orange.fr/therese.eveilleau.mairie.assoc.mairie.assoc.ecole.mairie.mairie.mairie.ecole.ecole.assoc.assoc.mairie/ - this doesn't seem to have been cleaned up :|
06:35:39	<pokechu22>	Hmm, it also tried to retrieve http://orange.et.rose.assoc.assoc.ecole.mairie.assoc.ecole.ecole.ecole.assoc.assoc.assoc.ecole.mairie.ecole.assoc.pagespro-orange.fr/
06:35:47	<pokechu22>	so I don't think these are being cleaned up if they already exist :|
06:38:28	<@flashfire42>	pokechu22 the to do is going down not up at least
06:42:16		levomi joins
06:46:03	<thuban>	i believe they are getting cleaned up, it's just taking a while
06:46:57	<thuban>	that's to say, they'll get retrieved if they're already in the tracker, but neither of those should queue anything new
06:56:25	<thuban>	(if admins manually purged the tracker and those are new, then yes, problematic, but afaik that hasn't been done...? arkiver only said "this is somewhat annoying to filter out actually")
06:57:55		jacksonchen666 (jacksonchen666) joins
06:59:42	<fireonlive>	33=302 http://hansi.mairie.assoc.ecole.assoc.mairie.ecole.ecole.mairie.mairie.mairie.ecole.ecole.mairie.ecole.assoc.ecole.ecole.ecole.assoc.assoc.ecole.ecole.ecole.assoc.mairie.ecole.ecole.ecole.ecole.mairie.assoc.pagespro-orange.fr/toiles/un%20long%20dimanche%202.jpg
06:59:42	<fireonlive>	those french have interesting subdomains
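These runaway URLs repeat the same few labels (mairie, assoc, ecole), either in the hostname or in a dotted path segment. Below is one possible heuristic for flagging them. It is not the project's actual filter. The function name and the default `max_repeats=2` threshold are assumptions, and a real filter would need tuning to avoid false positives on legitimate hosts.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit


def looks_like_label_loop(url: str, max_repeats: int = 2) -> bool:
    """Flag URLs whose host labels, or the dot-separated parts of any path
    segment, repeat a single token more than max_repeats times."""
    parts = urlsplit(url)
    token_groups = [(parts.hostname or "").split(".")]
    for segment in unquote(parts.path).split("/"):
        token_groups.append(segment.split("."))
    for tokens in token_groups:
        counts = Counter(t for t in tokens if t)
        if counts and max(counts.values()) > max_repeats:
            return True
    return False
```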
07:07:29	<project10>	backfeed queue is dropping fast
07:21:22	<BornOn420>	Are there any valid URLs left?
07:25:08	<thuban>	a few
07:37:32		jacksonchen666 quits [Client Quit]
08:10:59	<thuban>	per logs and grafana, seems to be all legit again--not sure what that bolus was about
08:15:01		shinji257_ (shinji257) joins
09:16:07		imer quits [Ping timeout: 265 seconds]
09:16:12		yts98 leaves
09:16:29		yts98 joins
09:17:04		imer (imer) joins
09:21:55		imer quits [Ping timeout: 265 seconds]
09:25:20		imer (imer) joins
09:29:16		imer quits [Read error: Connection reset by peer]
09:48:19		imer (imer) joins
09:51:55		imer quits [Read error: Connection reset by peer]
09:52:45		imer (imer) joins
09:57:56		imer quits [Ping timeout: 252 seconds]
09:59:41		kallemarc joins
10:05:03		imer (imer) joins
10:18:11	<kallemarc>	Will new tasks be added, or can I switch to another project myself?
10:19:15	<thuban>	kallemarc: new tasks are being added as links are discovered, but we have enough workers that it's probably safe for you to switch
10:19:37		imer quits [Read error: Connection reset by peer]
10:20:26	<kallemarc>	(y)
10:20:30		imer (imer) joins
10:23:25		kallemarc leaves
10:31:13	<plcp>	\ô/
10:34:16	<thuban>	indeed
10:35:22	<thuban>	the ridiculous eta on the remaining claims makes me wonder whether they're held by workers that have been b&, but that's easily remedied
10:39:09	<BornOn420>	still 17.7M out? seems like a lot
10:45:47		imer quits [Ping timeout: 252 seconds]
10:54:01		imer (imer) joins
11:00:32	<nstrom|m>	Huh yeah we got through a lot last night. Not sure if auto reclaim is on or if we need someone to manually move out to redo
11:05:49		Exorcism quits [Remote host closed the connection]
11:06:58		Exorcism (exorcism) joins
11:35:39	<nstrom|m>	arkiver ^
12:24:47		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
12:35:43		Exorcism quits [Remote host closed the connection]
12:36:28		Exorcism (exorcism) joins
13:41:43	<@arkiver>	pokechu22: thuban: yes, this is likely because a ton of these URLs were queued previously and they were offloaded in lists "at the end"
13:42:02	<@arkiver>	which were queued in and quickly moved out, but not fast enough to keep some from ending up with warriors
13:42:29	<@arkiver>	to try to explain
13:43:02	<@arkiver>	URLs are queued to a queue. they are mixed into this set. the set has a maximum size, and we take items out of this set to store on disk in 'offloaded lists'
13:43:29	<@arkiver>	so, when we saw assoc.mairie.assoc.ecole... etc., it showed there had already been many rounds of queuing these; the loop had gone on pretty long
13:44:11	<@arkiver>	the 'tip of the iceberg' was only visible in what was currently in redis, while the majority was in offloaded lists that would only be loaded back in much later
13:44:24		shinji257_ leaves
13:44:34	<@arkiver>	when those were eventually loaded in, we got through them fast as most of the stuff was taken out again
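arkiver's description can be modeled as a toy, though this is not the tracker's code. A Python set stands in for the redis set, and a deque of lists stands in for the offloaded lists on disk. The class name and the `max_size`/`batch` parameters are hypothetical. The model shows why only part of a runaway loop was visible: once the set is full, the overflow sits in offloaded lists until the set empties.

```python
from collections import deque
from typing import Deque, Hashable, List, Optional, Set


class OffloadingQueue:
    """Toy model: a size-capped in-memory set whose overflow is moved, in batches,
    into 'offloaded lists' that are only loaded back once the set runs dry."""

    def __init__(self, max_size: int, batch: int) -> None:
        if batch < 1:
            raise ValueError("batch must be at least 1")
        self.max_size = max_size
        self.batch = batch
        self.in_memory: Set[Hashable] = set()            # stands in for the redis set
        self.offloaded: Deque[List[Hashable]] = deque()  # stands in for lists on disk

    def add(self, item: Hashable) -> None:
        self.in_memory.add(item)
        if len(self.in_memory) > self.max_size:
            # move a batch out of the set; it joins the end of the offload queue
            n = min(self.batch, len(self.in_memory))
            self.offloaded.append([self.in_memory.pop() for _ in range(n)])

    def visible(self) -> int:
        """Size of the part of the backlog that is visible in the set."""
        return len(self.in_memory)

    def backlog(self) -> int:
        return len(self.in_memory) + sum(len(lst) for lst in self.offloaded)

    def pop(self) -> Optional[Hashable]:
        while not self.in_memory and self.offloaded:
            # set ran dry: load the oldest offloaded list back in
            self.in_memory.update(self.offloaded.popleft())
        return self.in_memory.pop() if self.in_memory else None
```

With `max_size=3`, adding seven items leaves only three visible in the set, even though the full backlog is still seven.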
13:45:27	<thuban>	huh, interesting
13:45:50	<@arkiver>	i'll do some checks here and there to make sure all went well
13:45:56	<@arkiver>	claims are being requeued now
13:46:16	<thuban>	cool cool
13:46:59	<@arkiver>	paused as we requeue everything
13:48:54	<thuban>	any idea why things ground to a halt so completely? (naïvely i would have expected banned workers to fail their items due to connect timeouts even if the tracker didn't have a time-to-live configured)
13:52:26	<@arkiver>	not sure what you mean
13:52:34	<@arkiver>	because there are no items left to do
13:52:40	<@arkiver>	but you probably mean something else
13:52:55	<thuban>	i mean why weren't the out items being returned
13:54:03	<@arkiver>	because i never set for that to happen
13:54:09	<@arkiver>	you mean reclaimed?
13:54:12	<@arkiver>	or
13:54:20	<@arkiver>	those that were not returned failed for some reason
13:58:00	<thuban>	i thought that items with connection issues would be aborted by their workers (without the tracker needing to reclaim them)
13:59:52	<@arkiver>	thuban: no, items are in claims when they are claimed. we can enable some auto-reclaiming with a timeout, or not
13:59:56	<@arkiver>	that was not set in this case
14:01:04	<thuban>	huh, i see. so aborting an item doesn't report anything to the tracker, it just punts and assumes the tracker will figure it out later?
14:03:06	<@arkiver>	no
14:03:11	<@arkiver>	yes
14:03:18	<@arkiver>	the tracker doesn't get informed about aborts
14:03:54	<thuban>	makes sense i guess, since broken items may require other manual intervention like code changes
14:05:47	<thuban>	ty for explaining
14:06:30	<@arkiver>	thanks!
14:06:45	<@arkiver>	as for if it makes sense - i don't know, it's just how it currently is done, not with a very good reason in mind
14:07:16	<@arkiver>	we could probably think of a reason why we would want to do it the current way, but there is no good reason we do it this way - it's just being done this way
14:07:31	<thuban>	that also makes sense :P
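The behaviour arkiver describes can be illustrated with a toy model. An item stays claimed until it is finished, an abort sends nothing to the tracker, and auto-reclaiming after a timeout is optional. This is not the tracker's implementation, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTracker:
    todo: List[str] = field(default_factory=list)
    claims: Dict[str, float] = field(default_factory=dict)  # item -> time it was claimed
    done: List[str] = field(default_factory=list)

    def claim(self, now: float) -> Optional[str]:
        if not self.todo:
            return None
        item = self.todo.pop(0)
        self.claims[item] = now
        return item

    def finish(self, item: str) -> None:
        # in this model, finishing is the only report a worker sends; an abort sends nothing
        del self.claims[item]
        self.done.append(item)

    def reclaim_expired(self, now: float, timeout: float) -> List[str]:
        """Requeue claims held longer than `timeout`; returns the requeued items."""
        expired = [item for item, since in self.claims.items() if now - since > timeout]
        for item in expired:
            del self.claims[item]
            self.todo.append(item)
        return expired
```

If `reclaim_expired` is never called, a silently aborted item stays in `claims` indefinitely. That matches the stuck claims discussed earlier in the day.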
14:39:10	<project10>	tracker 500s, that's new
15:40:41		Flo99 joins
15:44:11		magmaus3 quits [Quit: Ping timeout (120 seconds)]
15:44:49		magmaus3 (magmaus3) joins
15:58:26		BornOn420 quits [Client Quit]
16:16:33		Exorcism quits [Remote host closed the connection]
16:17:19		Exorcism (exorcism) joins
16:27:15		sonick quits [Client Quit]
16:44:11	<pokechu22>	Archiving item url:http://leclairefontaine.pagesperso-orange.fr/1/'http://perso.orange.fr/leclairefontaine/cariboost1/'
16:45:06	<pokechu22>	doesn't seem to exist at all, not sure where that came from
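This item looks like a link whose href itself contained a quoted absolute URL and was then resolved relative to the page, though that origin is only a guess. The sketch below flags such items. The function `embedded_quoted_url` is hypothetical. It strips the `url:` item prefix seen in the log, percent-decodes the path, and looks for a quoted absolute URL inside it.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# a quoted absolute URL inside the path, e.g. /1/'http://host/...'
_QUOTED_URL = re.compile(r"""(['"])(https?://[^'"]+)\1""")


def embedded_quoted_url(item_name: str) -> Optional[str]:
    """Return the quoted absolute URL embedded in an item's path, if any."""
    url = item_name[len("url:"):] if item_name.startswith("url:") else item_name
    path = unquote(urlsplit(url).path)
    match = _QUOTED_URL.search(path)
    return match.group(2) if match else None
```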
17:06:53	<@arkiver>	pokechu22: yeah i think we'll eventually be left with this type of problematic URL
17:06:58	<@arkiver>	i'll go through them soon
17:43:23		BornOn420 (BornOn420) joins
17:49:02		thuban quits [Read error: Connection reset by peer]
17:49:35		thuban joins
18:21:44		@flashfire42 quits [Ping timeout: 252 seconds]
18:22:17		kiska quits [Ping timeout: 265 seconds]
18:29:17		flashfire42 joins
18:30:21		kiska (kiska) joins
18:36:05		fireonlive quits [Quit: Connection gently closed by peer]
18:37:00		fireonlive (fireonlive) joins
18:51:31		project10 quits [Remote host closed the connection]
18:53:19		project10 (project10) joins
19:01:31		fireonlive quits [Client Quit]
19:02:24		fireonlive (fireonlive) joins
19:31:53		Exorcism quits [Remote host closed the connection]
19:31:53		yts98 leaves
19:32:10		yts98 joins
19:32:41		Exorcism (exorcism) joins
20:02:31		flashfire42 is now authenticated as flashfire42
20:02:31		@ChanServ sets mode: +o flashfire42
20:02:38	<@flashfire42>	https://server8.kiska.pw/uploads/0d2d5e1feae4eede/image.png
20:02:46	<@flashfire42>	I am no expert but it's not meant to do that
20:33:59	<@arkiver>	flashfire42: ah
20:39:59	<@arkiver>	fixed
20:40:03	<@arkiver>	and forced the new version
21:14:11	<@arkiver>	DLoader: project10 FYI update is in
21:17:47		kalle joins
21:22:28	<fireonlive>	did they end up raising the rate limits?
21:25:12	<@flashfire42>	https://transfer.archivete.am/LNLvH/range.frsommaire.htmurlhttpscpa25.p.txt
21:25:41	<@flashfire42>	arkiver ton of errors just now coming through
21:29:18	<phaeton>	yeah, I'm seeing the same
21:30:00	<flashfire42|m>	I have to go do school drop off but I’m assuming someone can push a fix
21:33:16		yts98 leaves
21:33:34		yts98 joins
21:44:24		Flo99 quits [Remote host closed the connection]
21:50:19	<@flashfire42>	yeah ok looks like almost all of them are failing arkiver
22:05:29		Exorcism quits [Remote host closed the connection]
22:06:48		Exorcism (exorcism) joins
22:12:31		LukeMax joins
22:12:48	<LukeMax>	anyone else experiencing download problems from ppo?
22:18:18	<pokechu22>	If you're seeing "Exception: Unknown item" that's happening to me too
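The log does not say what triggered this error, so the following is only a plausible illustration. It assumes the worker script splits an item name like `url:http://...` into a type and a value and rejects types it does not recognise. Under that assumption, a new item type that the deployed script cannot handle yet would fail every item until a new version ships. The handler table and function names are hypothetical.

```python
from typing import Callable, Dict


def fetch_url(value: str) -> str:
    return f"would archive {value}"  # placeholder for the real work


# hypothetical handler table; only the "url" type appears in this log
HANDLERS: Dict[str, Callable[[str], str]] = {"url": fetch_url}


def process_item(item_name: str) -> str:
    """Split 'type:value' and dispatch; unsupported types fail loudly."""
    item_type, sep, value = item_name.partition(":")
    handler = HANDLERS.get(item_type)
    if not sep or handler is None:
        raise Exception("Unknown item")
    return handler(value)
```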
22:42:34		Exorcism quits [Remote host closed the connection]
22:43:21		Exorcism (exorcism) joins
22:47:28	<@flashfire42>	pokes arkiver
22:49:37	<@arkiver>	fix coming
22:49:50	<@flashfire42>	neat
22:56:06	<@arkiver>	fixed
23:03:41	<LukeMax>	yeah it's exception unknown item
23:04:17	<@arkiver>	it's fixed with latest version
23:04:39	<LukeMax>	ok ill try it
23:04:49	<LukeMax>	wait what latest version
23:07:15	<LukeMax>	nvm
23:07:47	<LukeMax>	thank you perso works
23:07:59		LukeMax quits [Remote host closed the connection]
23:26:53		Exorcism quits [Remote host closed the connection]
23:27:39		Exorcism (exorcism) joins
23:50:18		kalle quits [Remote host closed the connection]
