00:14:09<Pedrosso>The wiki states for the docker "--concurrent 1: Process 1 item at a time per container. Although this varies for each project, the maximum recommended value is 5, and the maximum allowed value is 20. Leave this at 1, or check with us on IRC if you are unsure". I don't really understand this. Does it vary between projects or is it always worse to
00:14:09<Pedrosso>have more or something?
00:16:08<@JAA>If you set it too high, the target site might ban you. Ideally, that just means you no longer do meaningful work. In bad cases, it can pollute the archive and lead to missed data if we don't catch it.
00:16:51<pokechu22>It varies between projects
00:17:05<@JAA>So the ideal value is whatever lets you grab as much as possible without getting banned. Rate limits vary by target site, so ideal concurrency varies by project.
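A minimal sketch of the idea above, not how the warrior or the Docker images actually implement `--concurrent`: cap the number of items in flight and stop taking new work once the target answers with HTTP 429. The names `run_items` and `FakeSite` and the 429-after-three-requests behaviour are hypothetical, chosen only for illustration.

```python
import asyncio


async def run_items(items, fetch, concurrent=1):
    """Process items with at most `concurrent` in flight.

    Stops starting new items once any fetch returns 429 (assumed here to be
    the "you are rate-limited or banned" signal).
    """
    sem = asyncio.Semaphore(concurrent)
    banned = asyncio.Event()
    results = {}

    async def worker(item):
        async with sem:
            if banned.is_set():
                return  # rate-limited: do not send further requests
            status = await fetch(item)
            results[item] = status
            if status == 429:
                banned.set()

    await asyncio.gather(*(worker(i) for i in items))
    return results


class FakeSite:
    """Hypothetical stand-in for a target site: 429 after `limit` requests."""

    def __init__(self, limit):
        self.limit = limit
        self.served = 0
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def fetch(self, item):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.served += 1
        self.in_flight -= 1
        return 200 if self.served <= self.limit else 429


site = FakeSite(limit=3)
results = asyncio.run(run_items(range(10), site.fetch, concurrent=2))
```

With a higher `concurrent` value the simulated limit is hit sooner in wall-clock terms, which is the trade-off described above: more throughput until the site starts refusing requests.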
00:20:37<Pedrosso>Is there anywhere to check the recommended values, or is asking in their respective chats the most efficient?
00:21:14<@JAA>We (often|usually|sometimes|occasionally) put it in the channel topic.
00:25:05<Pedrosso>I see, thanks
00:27:11<Pedrosso>With docker, is there any reason to not run multiple different projects at once?
00:27:56<@JAA>Nope, many people do exactly that. :-)
00:29:38<Pedrosso>(y) awesome. I (at least temporarily) want to maximize what I can do is all
00:30:45TastyWiener95 quits [Client Quit]
00:31:19<Pedrosso>Oh, as for when it comes to spore, I have found a tool that may assist with finding the items and thus not checking through tons of empty urls however I don't know any javascript. I'll just post the tool here if anyone wants to check it out: https://github.com/Spore-Community/SporeTools/blob/main/SporeDwrApiClient.ts
00:31:43<@JAA>Basically, there are three types of people running workers: a) casual users that just set up the warrior in auto mode once and forget about it; b) powerusers that run multiple projects in parallel to get the most out of existing machinery; c) insane people with large clusters.
00:32:12<@JAA>(Exceptions that don't fall into these categories confirm the rule.)
00:32:58TastyWiener95 (TastyWiener95) joins
00:32:59<Pedrosso>(I've no clue what that saying "exceptions confirm the rule" is supposed to mean but okay)
00:33:23<Pedrosso>idk what a cluster is, is it someone getting like a "cloud" service to do it for them?
00:34:16<Pedrosso>(also someone named idk just got tagged randomly lol)
00:35:09<@JAA>'Exceptions confirm the rule' is a lighthearted way of saying there are counterexamples to a rule but they're few and far between, so the rule's still valid.
00:36:48<@JAA>Cluster ~ significant amount of distributed computing resources. 'Cloud' stuff can be used for that, yeah, but it's not a requirement.
00:37:42<@JAA>It also implies some degree of coordination across the resources, e.g. orchestration.
00:40:19<@JAA>Not sure https://github.com/Spore-Community/SporeTools is of much use for reducing the 1.1 billion number. It's an implementation of Spore's API, but I don't see anything for checking many asset IDs at once.
00:59:40<fireonlive>omegle has shut down https://www.omegle.com/ (https://news.ycombinator.com/item?id=38199355)
01:00:48greckon487 joins
01:01:10<@JAA>Damn
01:01:22greckon487 quits [Remote host closed the connection]
01:02:49<fireonlive>yeah :c
01:09:47<Pedrosso>JAA: What it would do is considerably lower the amount of empty URLs checked, I'd believe. I've been told that all assets use the same IDs but different formats so there will be many, many gaps. Considering the server's stability it may be safer to do something slower with a list in mind. However I've no clue how to get a list of items out of that
01:11:19<@JAA>Pedrosso: If we make the same number of requests, it probably doesn't matter much. And API requests likely take more resources on the Spore server side than service static files.
01:11:39<@JAA>s/service/serving/
01:12:16<h2ibot>FireonLive edited Deathwatch (+307, add Omegle): https://wiki.archiveteam.org/?diff=51110&oldid=51097
01:12:38Wohlstand quits [Client Quit]
01:13:00<Pedrosso>I think I understand. I'll get to checking out the qwark software then. (Hopefully I can understand how to use it lol)
01:24:30etnguyen03 (etnguyen03) joins
01:42:21<@JAA>logs.omegle.com URLs collected from anywhere would be interesting. I already ran the ones from Reddit through AB.
01:46:44icedice2 (icedice) joins
01:47:01Island quits [Remote host closed the connection]
01:47:01TheTechRobo quits [Client Quit]
01:47:01_Dango360 quits [Remote host closed the connection]
01:47:01icedice quits [Remote host closed the connection]
01:47:01TastyWiener95 quits [Client Quit]
01:47:06Island joins
01:47:11_Dango360 joins
01:47:14TastyWiener95 (TastyWiener95) joins
01:47:34TheTechRobo (TheTechRobo) joins
01:49:36icedice2 quits [Client Quit]
01:49:56Island quits [Remote host closed the connection]
01:49:56TheTechRobo quits [Excess Flood]
01:49:56_Dango360 quits [Remote host closed the connection]
01:50:07Island joins
01:50:13_Dango360 joins
01:50:35TheTechRobo (TheTechRobo) joins
01:52:24ragra quits [Remote host closed the connection]
01:52:24TheTechRobo quits [Excess Flood]
01:52:27Pedrosso quits [Remote host closed the connection]
01:52:36Island_ joins
01:52:41Dango360_ joins
01:53:06TheTechRobo (TheTechRobo) joins
01:54:34Pedrosso joins
01:55:54Island quits [Ping timeout: 265 seconds]
01:55:54_Dango360 quits [Ping timeout: 265 seconds]
02:00:15Pedrosso quits [Ping timeout: 265 seconds]
02:02:33etnguyen03 quits [Ping timeout: 272 seconds]
02:05:46etnguyen03 (etnguyen03) joins
02:08:47leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in]
02:09:08leo60228 (leo60228) joins
02:21:01Pedrosso joins
02:21:31ThreeHM quits [Ping timeout: 265 seconds]
02:23:30<Pedrosso>I can't find any documentation for Qwarc, I hope I'm not expected to be big brained enough to figure it all out on my own
02:31:32<fireonlive>they seem to have one indexed (on google) redirect: http://waw1.omegle.com/redir/gj2016
02:31:55<fireonlive>to some youtube video?
02:33:09<fireonlive>log.omegle.com exists too and is indexed; though seems to serve the same content
02:34:58<fireonlive>NSFW: they seem to be running a whitelabel version of chaturbate too: https://lady.omegle.com/ (though the fact it says 'Whitelabel powered by Chaturbate.com' isn't very whitelabel.. but lol)
02:35:07katocala quits [Remote host closed the connection]
02:35:15<fireonlive>i assume (but didn't check) it's just chaturbate with a different logo
02:35:18<@JAA>Pedrosso: I mentioned yesterday that there is no documentation.
02:35:57<@JAA>fireonlive: Yeah, I noticed the same about lady.omegle.com. It's already running through AB to get a sample.
02:36:02<fireonlive>ah :)
02:36:34<Pedrosso>About omegle, nice that you've got something to run on
02:36:37<fireonlive>noticed https://chatserv.omegle.com as well; which redirects to omegle.. but if you append a ?from= parameter you get omegle?
02:36:49<fireonlive>https://chatserv.omegle.com/?from=archive.org
02:37:01<fireonlive>indexed url was https://chatserv.omegle.com/?from=www.xiaodiaomao.com
02:37:15<@JAA>Or even just an empty param works.
02:37:44<fireonlive>ah! :)
02:37:54<fireonlive>trying to use it just gives an error to reload though
02:38:58<fireonlive>antinudeservers: ["waw1.omegle.com", "waw2.omegle.com", "waw3.omegle.com", "waw4.omegle.com"]
02:38:59<fireonlive>haha
02:39:19<fireonlive>(in the start json response)
02:40:37<@JAA>Pedrosso: I can't really recommend trying to use qwarc currently. It works, and it's very powerful, but the lack of documentation just makes it a non-starter unless you enjoy reading through my code and figuring out all the quirks yourself.
02:42:22<fireonlive>it links to (NSFW) https://cameglelive.com/ (well a sub page of that) if you want an adult site instead, but not sure of the relation to omegle itself (and it doesn't seem to quickly say, it seems to be different than chaturbate)
02:42:38<fireonlive>i clicked men and now see a man jackin' it live so i guess it's unaffected for now
02:43:15<Pedrosso>JAA: Every programmer is a masochist, hah. I don't know of any other resources than just the code that was mentioned, and the one I wrote sure is slow
02:45:09ThreeHM (ThreeHeadedMonkey) joins
02:46:01<fireonlive>ah ok, it's a whitelabel version of https://www.streamate.com/ it seems
02:47:31BearFortress quits [Client Quit]
02:47:36<fireonlive>(omegle itself is Omegle.com LLC)
02:48:53BearFortress joins
02:49:23katocala joins
02:53:08coderobe2 is now known as coderobe
03:09:57<@arkiver>RIP omegle
03:14:37fireonlive pours one out
03:16:43<h2ibot>Petchea edited Tumblr (+430, /* History */): https://wiki.archiveteam.org/?diff=51111&oldid=51050
03:18:43<h2ibot>0KepOnline edited Spore (+633, Add tools (Spore PNG Downloader & sporeget) and…): https://wiki.archiveteam.org/?diff=51112&oldid=51106
03:26:45<h2ibot>Petchea edited Tumblr (-47, /* History */ reblog with more context): https://wiki.archiveteam.org/?diff=51113&oldid=51111
03:27:45<h2ibot>JustAnotherArchivist edited Deathwatch (+4, Fix order): https://wiki.archiveteam.org/?diff=51114&oldid=51110
03:28:45<fireonlive>am i... dumb?
03:29:44<fireonlive>i swear i stared at that for a minute
03:29:45<fireonlive>lol
03:53:05qwertyasdfuiopghjkl quits [Remote host closed the connection]
03:53:26DogsRNice quits [Read error: Connection reset by peer]
04:12:23kiryu_ quits [Ping timeout: 272 seconds]
04:18:14mcint quits [Remote host closed the connection]
04:43:25dumbgoy_ quits [Ping timeout: 272 seconds]
04:50:14<project10>Steve Wozniak hospitalized, possible stroke: https://www.cnbc.com/2023/11/08/apple-co-founder-hospitalized-in-mexico-due-to-possible-stroke-local-media-reports.html
04:50:37<fireonlive>..and not the fun kind :(
04:50:52BlueMaxima quits [Client Quit]
04:55:27DopefishJustin quits [Ping timeout: 272 seconds]
05:28:35kiryu joins
05:28:35kiryu quits [Changing host]
05:28:35kiryu (kiryu) joins
05:38:26parfait (kdqep) joins
05:42:29jwn joins
05:49:16DopefishJustin joins
05:58:33jwn quits [Remote host closed the connection]
06:13:10etnguyen03 quits [Client Quit]
06:39:25Arcorann (Arcorann) joins
06:57:00<fireonlive>https://x.com/dexerto/status/1722403634078457878?s=12
06:57:00<eggdrop>nitter: https://nitter.net/dexerto/status/1722403634078457878
07:27:07Hackerpcs quits [Quit: Hackerpcs]
07:30:02Hackerpcs (Hackerpcs) joins
07:35:07lennier1 (lennier1) joins
07:41:14BearFortress_ joins
07:41:23_Dango360 joins
07:41:25Island__ joins
07:41:32parfait_ joins
07:44:33project10 quits [Ping timeout: 272 seconds]
07:44:52parfait quits [Ping timeout: 265 seconds]
07:44:52Island_ quits [Ping timeout: 265 seconds]
07:45:50BearFortress quits [Ping timeout: 265 seconds]
07:46:33Pedrosso quits [Ping timeout: 265 seconds]
07:46:33Dango360_ quits [Ping timeout: 265 seconds]
07:58:39leo60228 quits [Client Quit]
07:58:57leo60228- (leo60228) joins
08:07:48project10 (project10) joins
08:11:16Dango360_ joins
08:11:29TheTechRobo quits [Client Quit]
08:11:29Bleo1 quits [Client Quit]
08:11:29parfait_ quits [Remote host closed the connection]
08:11:29_Dango360 quits [Remote host closed the connection]
08:11:43parfait_ joins
08:11:44Bleo1 joins
08:11:57TheTechRobo (TheTechRobo) joins
08:16:45<that_lurker>Omegle has some interesting subdomains like lady.omegle.com that you should not open at work like i did
08:17:33kdqep__ joins
08:22:05parfait_ quits [Ping timeout: 265 seconds]
08:35:33<Exorcism>LMAOOOO
09:22:24project10 quits [Remote host closed the connection]
09:22:24yts98 leaves
09:22:32yts98 joins
09:44:22parfait_ joins
09:48:36kdqep__ quits [Ping timeout: 265 seconds]
09:50:20bf_ quits [Remote host closed the connection]
10:00:02Bleo1 quits [Client Quit]
10:01:39Bleo1 joins
10:03:58Wohlstand (Wohlstand) joins
10:41:33qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:06:38kdqep__ joins
11:10:49parfait_ quits [Ping timeout: 265 seconds]
11:15:06sec^nd quits [Remote host closed the connection]
11:15:39sec^nd (second) joins
11:22:24T31M quits [Client Quit]
11:22:24Wohlstand quits [Remote host closed the connection]
11:22:24kdqep__ quits [Remote host closed the connection]
11:22:24TheTechRobo quits [Client Quit]
11:22:34kdqep__ joins
11:22:37T31M joins
11:22:42Wohlstand (Wohlstand) joins
11:22:56TheTechRobo (TheTechRobo) joins
11:33:39Megame (Megame) joins
11:47:31kdqep__ quits [Remote host closed the connection]
11:47:57kdqep__ joins
11:56:12kdqep__ quits [Ping timeout: 265 seconds]
12:16:30BornOn420 quits [Client Quit]
12:23:18BornOn420 (BornOn420) joins
12:29:28dumbgoy_ joins
12:42:51Arcorann quits [Ping timeout: 272 seconds]
12:44:45systwi quits [Ping timeout: 272 seconds]
12:54:37systwi (systwi) joins
13:57:44etnguyen03 (etnguyen03) joins
14:11:08project10 (project10) joins
14:28:30project10 quits [Remote host closed the connection]
14:29:20project10 (project10) joins
15:41:26Barto quits [Ping timeout: 265 seconds]
15:41:43Barto (Barto) joins
15:48:33<@JAA>that_lurker: Yeah, discussed above, it's a whitelabel version of Chaturbate.
15:51:12DogsRNice joins
16:00:57<thuban>tumblr reportedly going to skeleton crew status: https://spaceoperajay.tumblr.com/post/733460173913489408/screenshot-20231006-005401-2-1-hosted-at-imgbb
16:02:36<thuban>(best source i could find, sorry)
16:07:12BearFortress_ quits [Client Quit]
16:10:14nulldata quits [Read error: Connection reset by peer]
16:20:35<h2ibot>Megame edited Deathwatch (+167, /* 2023 */ https://apo.org.au/ - 15 Dec): https://wiki.archiveteam.org/?diff=51115&oldid=51114
16:21:32nulldata (nulldata) joins
16:25:02<thuban>jezebel shutting down: https://variety.com/2023/digital/news/jezebel-shutting-down-go-media-layoffs-1235785877/
16:30:02icedice (icedice) joins
16:30:37<thuban>no on-site announcement yet
16:30:48<thuban>although a lot of pages depend on js (even the articles, which have 'continue reading' buttons), i _think_ that the relevant data is in-source and playback from archivebot captures would work
16:33:32BearFortress joins
16:37:34<thuban>(except comments, which are an ugly-looking in-house js thing)
16:48:35etnguyen03 quits [Ping timeout: 272 seconds]
17:09:29Dango360_ quits [Ping timeout: 272 seconds]
17:31:03Island__ quits [Read error: Connection reset by peer]
17:34:14Island joins
17:38:01Dango360 (Dango360) joins
17:41:57ScenarioPlanet (ScenarioPlanet) joins
17:54:01nicolas17 joins
18:01:17Pedrosso joins
18:30:08Megame quits [Client Quit]
18:46:20etnguyen03 (etnguyen03) joins
19:09:07<balrog>better source for tumblr: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/
19:17:00etnguyen03 quits [Ping timeout: 265 seconds]
19:28:18<h2ibot>Manu edited Political parties/Germany/Hamburg (+4138): https://wiki.archiveteam.org/?diff=51116&oldid=51108
20:24:40BlueMaxima joins
20:34:45ScenarioPlanet quits [Remote host closed the connection]
20:34:48Island quits [Remote host closed the connection]
20:34:48icedice quits [Remote host closed the connection]
20:34:57Island joins
20:35:01ScenarioPlanet (ScenarioPlanet) joins
20:35:09icedice (icedice) joins
20:36:18<Pedrosso>Would https://sporepedia2.foroactivo.com/ be considered small enough a forum for ArchiveBot to deal with? If so, I've got a small list of similar forums that don't seem to have been archived
20:51:27<nicolas17>we got a filesystem dump of the preinstalled macOS 14.1 on M3 Pro but still looking for 13.5 on M3 ._.
20:51:34<nicolas17>gonna get increasingly hard to find people who didn't update yet
20:57:08<fireonlive>nicolas17: jason seems to be missing from here but maybe you could tweet at him and he might boost it? or email uh whatever fucking cutesy email he left here
20:57:21<fireonlive>ended in textfiles.com i think
20:57:42<fireonlive>jesuschristmorearchiveteamcrap@textfiles.com
20:58:11<fireonlive>he does have that big follower base ™
21:00:37<h2ibot>JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=51117&oldid=51109
21:16:30<@JAA>Jason is here, but he doesn't check IRC all that often.
21:24:01BlueMaxima quits [Read error: Connection reset by peer]
21:29:55<that_lurker>Tumblr is reportedly on life support as its latest owner reassigns staff: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/ https://news.ycombinator.com/item?id=38209312
21:30:28<flashfire42|m>sighs time to grab more tumblr
21:31:44<@JAA>→ #tumbledown
21:49:20<Pedrosso>Would anyone mind answering the question I sent above? They allegedly say they have 456,901 messages
21:51:46<@JAA>Pedrosso: That would be appropriate for AB, yeah.
21:54:08dumbgoy__ joins
21:57:28dumbgoy_ quits [Ping timeout: 265 seconds]
22:02:08<Pedrosso>(y) great
22:08:31etnguyen03 (etnguyen03) joins
22:13:42<fireonlive>JAA: oh maybe under another nick; i tried the two i remember
22:13:48<fireonlive>didn't check hostnames though :)
22:14:09<fireonlive>but yeah true i do remember him saying just use email
22:16:01<@JAA>fireonlive: He's S.ketchCow here normally, currently S.ketchCo1 due to a netsplit or similar.
22:23:36dumbgoy__ quits [Read error: Connection reset by peer]
22:24:02dumbgoy__ joins
22:27:28dumbgoy joins
22:29:26<fireonlive>oh!
22:31:47dumbgoy__ quits [Ping timeout: 265 seconds]
22:49:11etnguyen03 quits [Ping timeout: 265 seconds]
23:04:53mattx433 (mattx433) joins
23:10:26<Pedrosso>I noticed in the ArchiveBot for sporepedia2 that https://i.imgur.com/YHS5Omo.png returned a 429 code. Do those links get sent to #// or #imgone, or are they just logged somewhere?
23:11:38<pokechu22>They're just logged - it's possible to extract them from the meta-warc (which is a gz-compressed log file) and throw them into #imgone manually but nothing automatically happens currently
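A rough sketch of the manual extraction described above, assuming only that the meta-warc decompresses with gzip into text whose log lines contain both a URL and the status code. The exact log line format is not given here, so the sample lines below are made up for illustration, and `urls_with_status` is a hypothetical helper, not part of ArchiveBot.

```python
import gzip
import os
import re
import tempfile

URL_RE = re.compile(r"https?://\S+")


def urls_with_status(path, status="429"):
    """Yield unique URLs from lines of a gzip-compressed log mentioning `status`.

    Heuristic only: the status must appear as a standalone number outside of
    any URL on the line, so that e.g. ".../429/page" alone does not match.
    """
    status_re = re.compile(rf"(?<!\d){re.escape(status)}(?!\d)")
    seen = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            rest = URL_RE.sub("", line)  # ignore digits inside URLs
            if not status_re.search(rest):
                continue
            for url in URL_RE.findall(line):
                if url not in seen:
                    seen.add(url)
                    yield url


# Made-up sample log lines (assumed format, for demonstration only).
sample = (
    "fetched https://i.imgur.com/YHS5Omo.png status 429\n"
    "fetched https://sporepedia2.foroactivo.com/ status 200\n"
    "fetched https://example.com/429/page status 200\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample-meta.warc.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(sample)
    found = list(urls_with_status(path))
```

The resulting list could then be reviewed by hand before being passed along to #imgone; the pattern would need adjusting to whatever the real log lines look like.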
23:19:49icedice quits [Client Quit]
23:24:18nyany (nyany) joins
23:26:04<Pedrosso>Considering the nature of those images (them being spore creations) that may be a good thing to do
23:28:03Pedrosso quits [Remote host closed the connection]
23:34:48Wohlstand quits [Remote host closed the connection]
23:39:43Pedrosso joins
23:47:35Arcorann (Arcorann) joins
23:53:27kdqep__ joins
23:59:49<vokunal|m>he-man.org forums shutting down on November 14th https://www.he-man.org/forums/boards/showthread.php?285395-The-He-Man-Org-forums-will-close-on-Tuesday-November-14-2023