00:14:09 | <Pedrosso> | The wiki states for the docker "--concurrent 1: Process 1 item at a time per container. Although this varies for each project, the maximum recommended value is 5, and the maximum allowed value is 20. Leave this at 1, or check with us on IRC if you are unsure". I don't really understand this. Does it vary between projects or is it always worse to |
00:14:09 | <Pedrosso> | have more or something? |
00:16:08 | <@JAA> | If you set it too high, the target site might ban you. Ideally, that just means you no longer do meaningful work. In bad cases, it can pollute the archive and lead to missed data if we don't catch it. |
00:16:51 | <pokechu22> | It varies between projects |
00:17:05 | <@JAA> | So the ideal value is whatever lets you grab as much as possible without getting banned. Rate limits vary by target site, so ideal concurrency varies by project. |
00:20:37 | <Pedrosso> | Is there anywhere to check the recommended values, or is asking in their respective chats the most efficient? |
00:21:14 | <@JAA> | We (often|usually|sometimes|occasionally) put it in the channel topic. |
00:25:05 | <Pedrosso> | I see, thanks |
00:27:11 | <Pedrosso> | With docker, is there any reason to not run multiple different projects at once? |
00:27:56 | <@JAA> | Nope, many people do exactly that. :-) |
00:29:38 | <Pedrosso> | (y) awesome. I (at least temporarily) want to maximize what I can do is all |
00:30:45 | | TastyWiener95 quits [Client Quit] |
00:31:19 | <Pedrosso> | Oh, as for when it comes to spore, I have found a tool that may assist with finding the items and thus not checking through tons of empty urls however I don't know any javascript. I'll just post the tool here if anyone wants to check it out: https://github.com/Spore-Community/SporeTools/blob/main/SporeDwrApiClient.ts |
00:31:43 | <@JAA> | Basically, there are three types of people running workers: a) casual users that just set up the warrior in auto mode once and forget about it; b) powerusers that run multiple projects in parallel to get the most out of existing machinery; c) insane people with large clusters. |
00:32:12 | <@JAA> | (Exceptions that don't fall into these categories confirm the rule.) |
00:32:58 | | TastyWiener95 (TastyWiener95) joins |
00:32:59 | <Pedrosso> | (I've no clue what that saying "exceptions confirm the rule" is supposed to mean but okay) |
00:33:23 | <Pedrosso> | idk what a cluster is, is it someone getting like a "cloud" service to do it for them? |
00:34:16 | <Pedrosso> | (also someone named idk just got tagged randomly lol) |
00:35:09 | <@JAA> | 'Exceptions confirm the rule' is a lighthearted way of saying there are counterexamples to a rule but they're few and far between, so the rule's still valid. |
00:36:48 | <@JAA> | Cluster ~ significant amount of distributed computing resources. 'Cloud' stuff can be used for that, yeah, but it's not a requirement. |
00:37:42 | <@JAA> | It also implies some degree of coordination across the resources, e.g. orchestration. |
00:40:19 | <@JAA> | Not sure https://github.com/Spore-Community/SporeTools is of much use for reducing the 1.1 billion number. It's an implementation of Spore's API, but I don't see anything for checking many asset IDs at once. |
00:48:02 | | katocala is now authenticated as katocala |
00:59:40 | <fireonlive> | omegle has shut down https://www.omegle.com/ (https://news.ycombinator.com/item?id=38199355) |
01:00:48 | | greckon487 joins |
01:01:10 | <@JAA> | Damn |
01:01:22 | | greckon487 quits [Remote host closed the connection] |
01:02:49 | <fireonlive> | yeah :c |
01:09:47 | <Pedrosso> | JAA: What it would do is considerably lower the amount of empty URLs checked, I'd believe. I've been told that all assets use the same IDs but different formats so there will be many, many gaps. Considering the server's stability it may be safer to do something slower with a list in mind. However I've no clue how to get a list of items out of that |
01:11:19 | <@JAA> | Pedrosso: If we make the same number of requests, it probably doesn't matter much. And API requests likely take more resources on the Spore server side than service static files. |
01:11:39 | <@JAA> | s/service/serving/ |
01:12:16 | <h2ibot> | FireonLive edited Deathwatch (+307, add Omegle): https://wiki.archiveteam.org/?diff=51110&oldid=51097 |
01:12:38 | | Wohlstand quits [Client Quit] |
01:13:00 | <Pedrosso> | I think I understand. I'll get to checking out the qwarc software then. (Hopefully I can understand how to use it lol) |
01:24:30 | | etnguyen03 (etnguyen03) joins |
01:42:21 | <@JAA> | logs.omegle.com URLs collected from anywhere would be interesting. I already ran the ones from Reddit through AB. |
01:46:44 | | icedice2 (icedice) joins |
01:47:01 | | Island quits [Remote host closed the connection] |
01:47:01 | | TheTechRobo quits [Client Quit] |
01:47:01 | | _Dango360 quits [Remote host closed the connection] |
01:47:01 | | icedice quits [Remote host closed the connection] |
01:47:01 | | TastyWiener95 quits [Client Quit] |
01:47:06 | | Island joins |
01:47:11 | | _Dango360 joins |
01:47:14 | | TastyWiener95 (TastyWiener95) joins |
01:47:34 | | TheTechRobo (TheTechRobo) joins |
01:49:36 | | icedice2 quits [Client Quit] |
01:49:56 | | Island quits [Remote host closed the connection] |
01:49:56 | | TheTechRobo quits [Excess Flood] |
01:49:56 | | _Dango360 quits [Remote host closed the connection] |
01:50:07 | | Island joins |
01:50:13 | | _Dango360 joins |
01:50:35 | | TheTechRobo (TheTechRobo) joins |
01:52:24 | | ragra quits [Remote host closed the connection] |
01:52:24 | | TheTechRobo quits [Excess Flood] |
01:52:27 | | Pedrosso quits [Remote host closed the connection] |
01:52:36 | | Island_ joins |
01:52:41 | | Dango360_ joins |
01:53:06 | | TheTechRobo (TheTechRobo) joins |
01:54:34 | | Pedrosso joins |
01:55:54 | | Island quits [Ping timeout: 265 seconds] |
01:55:54 | | _Dango360 quits [Ping timeout: 265 seconds] |
02:00:15 | | Pedrosso quits [Ping timeout: 265 seconds] |
02:02:33 | | etnguyen03 quits [Ping timeout: 272 seconds] |
02:05:46 | | etnguyen03 (etnguyen03) joins |
02:08:47 | | leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in] |
02:09:08 | | leo60228 (leo60228) joins |
02:21:01 | | Pedrosso joins |
02:21:31 | | ThreeHM quits [Ping timeout: 265 seconds] |
02:23:30 | <Pedrosso> | I can't find any documentation for Qwarc, I hope I'm not expected to be big brained enough to figure it all out on my own |
02:31:32 | <fireonlive> | they seem to have one indexed (on google) redirect: http://waw1.omegle.com/redir/gj2016 |
02:31:55 | <fireonlive> | to some youtube video? |
02:33:09 | <fireonlive> | log.omegle.com exists too and is indexed; though seems to serve the same content |
02:34:58 | <fireonlive> | NSFW: they seem to be running a whitelabel version of chaturbate too: https://lady.omegle.com/ (though the fact it says 'Whitelabel powered by Chaturbate.com' isn't very whitelabel.. but lol) |
02:35:07 | | katocala quits [Remote host closed the connection] |
02:35:15 | <fireonlive> | i assume/but didn't check it's just chaturbate with a different logo |
02:35:18 | <@JAA> | Pedrosso: I mentioned yesterday that there is no documentation. |
02:35:57 | <@JAA> | fireonlive: Yeah, I noticed the same about lady.omegle.com. It's already running through AB to get a sample. |
02:36:02 | <fireonlive> | ah :) |
02:36:34 | <Pedrosso> | About omegle, nice that you've got something to run on |
02:36:37 | <fireonlive> | noticed https://chatserv.omegle.com as well; which redirects to omegle.. but if you append a ?from= parameter you get omegle? |
02:36:49 | <fireonlive> | https://chatserv.omegle.com/?from=archive.org |
02:37:01 | <fireonlive> | indexed url was https://chatserv.omegle.com/?from=www.xiaodiaomao.com |
02:37:15 | <@JAA> | Or even just an empty param works. |
02:37:44 | <fireonlive> | ah! :) |
02:37:54 | <fireonlive> | trying to use it just gives an error to reload though |
02:38:58 | <fireonlive> | antinudeservers: ["waw1.omegle.com", "waw2.omegle.com", "waw3.omegle.com", "waw4.omegle.com"] |
02:38:59 | <fireonlive> | haha |
02:39:19 | <fireonlive> | (in the start json response) |
02:40:37 | <@JAA> | Pedrosso: I can't really recommend trying to use qwarc currently. It works, and it's very powerful, but the lack of documentation just makes it a non-starter unless you enjoy reading through my code and figuring out all the quirks yourself. |
02:42:22 | <fireonlive> | it links to (NSFW) https://cameglelive.com/ (well a sub page of that) if you want an adult site instead, but not sure of the relation to omegle itself (and it doesn't seem to quickly say, it seems to be different than chaturbate) |
02:42:38 | <fireonlive> | i clicked men and now see a man jackin' it live so i guess it's unaffected for now |
02:43:15 | <Pedrosso> | JAA: Every programmer is a masochist, hah. I don't know of any other resources than just the code that was mentioned, and the one I wrote sure is slow |
02:45:09 | | ThreeHM (ThreeHeadedMonkey) joins |
02:46:01 | <fireonlive> | ah ok, it's a whitelabel version of https://www.streamate.com/ it seems |
02:47:31 | | BearFortress quits [Client Quit] |
02:47:36 | <fireonlive> | (omegle itself is Omegle.com LLC) |
02:48:53 | | BearFortress joins |
02:49:23 | | katocala joins |
02:49:39 | | katocala is now authenticated as katocala |
02:53:08 | | coderobe2 is now known as coderobe |
03:09:57 | <@arkiver> | RIP omegle |
03:14:37 | | fireonlive pours one out |
03:16:43 | <h2ibot> | Petchea edited Tumblr (+430, /* History */): https://wiki.archiveteam.org/?diff=51111&oldid=51050 |
03:18:43 | <h2ibot> | 0KepOnline edited Spore (+633, Add tools (Spore PNG Downloader & sporeget) and…): https://wiki.archiveteam.org/?diff=51112&oldid=51106 |
03:26:45 | <h2ibot> | Petchea edited Tumblr (-47, /* History */ reblog with more context): https://wiki.archiveteam.org/?diff=51113&oldid=51111 |
03:27:45 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+4, Fix order): https://wiki.archiveteam.org/?diff=51114&oldid=51110 |
03:28:45 | <fireonlive> | am i... dumb? |
03:29:44 | <fireonlive> | i swear i stared at that for a minute |
03:29:45 | <fireonlive> | lol |
03:53:05 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
03:53:26 | | DogsRNice quits [Read error: Connection reset by peer] |
04:12:23 | | kiryu_ quits [Ping timeout: 272 seconds] |
04:18:14 | | mcint quits [Remote host closed the connection] |
04:43:25 | | dumbgoy_ quits [Ping timeout: 272 seconds] |
04:50:14 | <project10> | Steve Wozniak hospitalized, possible stroke: https://www.cnbc.com/2023/11/08/apple-co-founder-hospitalized-in-mexico-due-to-possible-stroke-local-media-reports.html |
04:50:37 | <fireonlive> | ..and not the fun kind :( |
04:50:52 | | BlueMaxima quits [Client Quit] |
04:55:27 | | DopefishJustin quits [Ping timeout: 272 seconds] |
05:28:35 | | kiryu joins |
05:28:35 | | kiryu is now authenticated as kiryu |
05:28:35 | | kiryu quits [Changing host] |
05:28:35 | | kiryu (kiryu) joins |
05:38:26 | | parfait (kdqep) joins |
05:42:29 | | jwn joins |
05:49:16 | | DopefishJustin joins |
05:49:16 | | DopefishJustin is now authenticated as DopefishJustin |
05:58:33 | | jwn quits [Remote host closed the connection] |
06:13:10 | | etnguyen03 quits [Client Quit] |
06:39:25 | | Arcorann (Arcorann) joins |
06:57:00 | <fireonlive> | https://x.com/dexerto/status/1722403634078457878?s=12 |
06:57:00 | <eggdrop> | nitter: https://nitter.net/dexerto/status/1722403634078457878 |
07:27:07 | | Hackerpcs quits [Quit: Hackerpcs] |
07:30:02 | | Hackerpcs (Hackerpcs) joins |
07:35:07 | | lennier1 (lennier1) joins |
07:41:14 | | BearFortress_ joins |
07:41:23 | | _Dango360 joins |
07:41:25 | | Island__ joins |
07:41:32 | | parfait_ joins |
07:44:33 | | project10 quits [Ping timeout: 272 seconds] |
07:44:52 | | parfait quits [Ping timeout: 265 seconds] |
07:44:52 | | Island_ quits [Ping timeout: 265 seconds] |
07:45:50 | | BearFortress quits [Ping timeout: 265 seconds] |
07:46:33 | | Pedrosso quits [Ping timeout: 265 seconds] |
07:46:33 | | Dango360_ quits [Ping timeout: 265 seconds] |
07:58:39 | | leo60228 quits [Client Quit] |
07:58:57 | | leo60228- (leo60228) joins |
08:07:48 | | project10 (project10) joins |
08:11:16 | | Dango360_ joins |
08:11:29 | | TheTechRobo quits [Client Quit] |
08:11:29 | | Bleo1 quits [Client Quit] |
08:11:29 | | parfait_ quits [Remote host closed the connection] |
08:11:29 | | _Dango360 quits [Remote host closed the connection] |
08:11:43 | | parfait_ joins |
08:11:44 | | Bleo1 joins |
08:11:57 | | TheTechRobo (TheTechRobo) joins |
08:16:45 | <that_lurker> | Omegle has some interesting subdomains like lady.omegle.com that you should not open at work like i did |
08:17:33 | | kdqep__ joins |
08:22:05 | | parfait_ quits [Ping timeout: 265 seconds] |
08:35:33 | <Exorcism> | LMAOOOO |
09:22:24 | | project10 quits [Remote host closed the connection] |
09:22:24 | | yts98 leaves |
09:22:32 | | yts98 joins |
09:44:22 | | parfait_ joins |
09:48:36 | | kdqep__ quits [Ping timeout: 265 seconds] |
09:50:20 | | bf_ quits [Remote host closed the connection] |
10:00:02 | | Bleo1 quits [Client Quit] |
10:01:39 | | Bleo1 joins |
10:03:58 | | Wohlstand (Wohlstand) joins |
10:41:33 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
11:06:38 | | kdqep__ joins |
11:10:49 | | parfait_ quits [Ping timeout: 265 seconds] |
11:15:06 | | sec^nd quits [Remote host closed the connection] |
11:15:39 | | sec^nd (second) joins |
11:22:24 | | T31M quits [Client Quit] |
11:22:24 | | Wohlstand quits [Remote host closed the connection] |
11:22:24 | | kdqep__ quits [Remote host closed the connection] |
11:22:24 | | TheTechRobo quits [Client Quit] |
11:22:34 | | kdqep__ joins |
11:22:37 | | T31M joins |
11:22:42 | | Wohlstand (Wohlstand) joins |
11:22:56 | | TheTechRobo (TheTechRobo) joins |
11:33:39 | | Megame (Megame) joins |
11:47:31 | | kdqep__ quits [Remote host closed the connection] |
11:47:57 | | kdqep__ joins |
11:56:12 | | kdqep__ quits [Ping timeout: 265 seconds] |
12:16:30 | | BornOn420 quits [Client Quit] |
12:23:18 | | BornOn420 (BornOn420) joins |
12:29:28 | | dumbgoy_ joins |
12:42:51 | | Arcorann quits [Ping timeout: 272 seconds] |
12:44:45 | | systwi quits [Ping timeout: 272 seconds] |
12:54:37 | | systwi (systwi) joins |
13:57:44 | | etnguyen03 (etnguyen03) joins |
14:11:08 | | project10 (project10) joins |
14:28:30 | | project10 quits [Remote host closed the connection] |
14:29:20 | | project10 (project10) joins |
15:41:26 | | Barto quits [Ping timeout: 265 seconds] |
15:41:43 | | Barto (Barto) joins |
15:48:33 | <@JAA> | that_lurker: Yeah, discussed above, it's a whitelabel version of Chaturbate. |
15:51:12 | | DogsRNice joins |
16:00:57 | <thuban> | tumblr reportedly going to skeleton crew status: https://spaceoperajay.tumblr.com/post/733460173913489408/screenshot-20231006-005401-2-1-hosted-at-imgbb |
16:02:36 | <thuban> | (best source i could find, sorry) |
16:07:12 | | BearFortress_ quits [Client Quit] |
16:10:14 | | nulldata quits [Read error: Connection reset by peer] |
16:20:35 | <h2ibot> | Megame edited Deathwatch (+167, /* 2023 */ https://apo.org.au/ - 15 Dec): https://wiki.archiveteam.org/?diff=51115&oldid=51114 |
16:21:32 | | nulldata (nulldata) joins |
16:25:02 | <thuban> | jezebel shutting down: https://variety.com/2023/digital/news/jezebel-shutting-down-go-media-layoffs-1235785877/ |
16:30:02 | | icedice (icedice) joins |
16:30:37 | <thuban> | no on-site announcement yet |
16:30:48 | <thuban> | although a lot of pages depend on js (even the articles, which have 'continue reading' buttons), i _think_ that the relevant data is in-source and playback from archivebot captures would work |
16:33:32 | | BearFortress joins |
16:37:34 | <thuban> | (except comments, which are an ugly-looking in-house js thing) |
16:48:35 | | etnguyen03 quits [Ping timeout: 272 seconds] |
17:09:29 | | Dango360_ quits [Ping timeout: 272 seconds] |
17:31:03 | | Island__ quits [Read error: Connection reset by peer] |
17:34:14 | | Island joins |
17:38:01 | | Dango360 (Dango360) joins |
17:41:57 | | ScenarioPlanet (ScenarioPlanet) joins |
17:54:01 | | nicolas17 joins |
18:01:17 | | Pedrosso joins |
18:30:08 | | Megame quits [Client Quit] |
18:46:20 | | etnguyen03 (etnguyen03) joins |
19:09:07 | <balrog> | better source for tumblr: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/ |
19:17:00 | | etnguyen03 quits [Ping timeout: 265 seconds] |
19:28:18 | <h2ibot> | Manu edited Political parties/Germany/Hamburg (+4138): https://wiki.archiveteam.org/?diff=51116&oldid=51108 |
20:24:40 | | BlueMaxima joins |
20:34:45 | | ScenarioPlanet quits [Remote host closed the connection] |
20:34:48 | | Island quits [Remote host closed the connection] |
20:34:48 | | icedice quits [Remote host closed the connection] |
20:34:57 | | Island joins |
20:35:01 | | ScenarioPlanet (ScenarioPlanet) joins |
20:35:09 | | icedice (icedice) joins |
20:36:18 | <Pedrosso> | Would https://sporepedia2.foroactivo.com/ be considered small enough a forum for ArchiveBot to deal with? If so, I've got a small list of similar forums that don't seem to have been archived |
20:51:27 | <nicolas17> | we got a filesystem dump of the preinstalled macOS 14.1 on M3 Pro but still looking for 13.5 on M3 ._. |
20:51:34 | <nicolas17> | gonna get increasingly hard to find people who didn't update yet |
20:57:08 | <fireonlive> | nicolas17: jason seems to be missing from here but maybe you could tweet at him and he might boost it? or email uh whatever fucking cutesy email he left here |
20:57:21 | <fireonlive> | ended in textfiles.com i think |
20:57:42 | <fireonlive> | jesuschristmorearchiveteamcrap@textfiles.com |
20:58:11 | <fireonlive> | he does have that big follower base ™ |
21:00:37 | <h2ibot> | JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=51117&oldid=51109 |
21:16:30 | <@JAA> | Jason is here, but he doesn't check IRC all that often. |
21:24:01 | | BlueMaxima quits [Read error: Connection reset by peer] |
21:29:55 | <that_lurker> | Tumblr is reportedly on life support as its latest owner reassigns staff: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/ https://news.ycombinator.com/item?id=38209312 |
21:30:28 | <flashfire42|m> | sighs time to grab more tumblr |
21:31:44 | <@JAA> | → #tumbledown |
21:49:20 | <Pedrosso> | Would anyone mind answering the question I sent above? They allegedly say they have 456,901 messages |
21:51:46 | <@JAA> | Pedrosso: That would be appropriate for AB, yeah. |
21:54:08 | | dumbgoy__ joins |
21:57:28 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
22:02:08 | <Pedrosso> | (y) great |
22:08:31 | | etnguyen03 (etnguyen03) joins |
22:13:42 | <fireonlive> | JAA: oh maybe under another nick; i tried the two i remember |
22:13:48 | <fireonlive> | didn't check hostnames though :) |
22:14:09 | <fireonlive> | but yeah true i do remember him saying just use email |
22:16:01 | <@JAA> | fireonlive: He's S.ketchCow here normally, currently S.ketchCo1 due to a netsplit or similar. |
22:23:36 | | dumbgoy__ quits [Read error: Connection reset by peer] |
22:24:02 | | dumbgoy__ joins |
22:27:28 | | dumbgoy joins |
22:29:26 | <fireonlive> | oh! |
22:31:47 | | dumbgoy__ quits [Ping timeout: 265 seconds] |
22:49:11 | | etnguyen03 quits [Ping timeout: 265 seconds] |
23:04:53 | | mattx433 (mattx433) joins |
23:10:26 | <Pedrosso> | I noticed in the ArchiveBot for sporepedia2 that https://i.imgur.com/YHS5Omo.png returned a 429 code. Do those links get sent to #// or #imgone Or are they just logged somewhere? |
23:11:38 | <pokechu22> | They're just logged - it's possible to extract them from the meta-warc (which is a gz-compressed log file) and throw them into #imgone manually but nothing automatically happens currently |
23:19:49 | | icedice quits [Client Quit] |
23:24:18 | | nyany (nyany) joins |
23:26:04 | <Pedrosso> | Considering the nature of those images (them being spore creations) that may be a good thing to do |
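A minimal sketch of the manual extraction pokechu22 describes above, i.e. pulling the URLs that returned 429 out of a gz-compressed log so they can be queued elsewhere by hand. The log line layout is an assumption: the code only assumes that each relevant line contains the URL and the status code as plain text. The function names `urls_with_status` and `urls_from_gzip_log` are hypothetical and not part of ArchiveBot or any Archive Team tooling.

```python
import gzip
import re
from typing import Iterable, List

# Loose URL matcher; stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s'\"<>]+")


def urls_with_status(lines: Iterable[str], status: str = "429") -> List[str]:
    """Return unique URLs, in first-seen order, from lines mentioning `status`.

    Assumption: the status code appears on the same line as the URL,
    as a standalone number outside the URL itself.
    """
    status_re = re.compile(r"(?<!\d)" + re.escape(status) + r"(?!\d)")
    seen = set()
    found = []
    for line in lines:
        # Blank out URLs first so digits inside a URL are not
        # mistaken for the status code.
        rest = URL_RE.sub(" ", line)
        if not status_re.search(rest):
            continue
        for url in URL_RE.findall(line):
            if url not in seen:
                seen.add(url)
                found.append(url)
    return found


def urls_from_gzip_log(path: str, status: str = "429") -> List[str]:
    """Scan a gz-compressed text log line by line for URLs with `status`."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return urls_with_status(fh, status)
```

The resulting list could then be checked by hand before being passed along to #imgone; the matching is deliberately loose, so false positives are possible if the real log format differs from the assumption.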
23:28:03 | | Pedrosso quits [Remote host closed the connection] |
23:34:48 | | Wohlstand quits [Remote host closed the connection] |
23:39:43 | | Pedrosso joins |
23:47:35 | | Arcorann (Arcorann) joins |
23:53:27 | | kdqep__ joins |
23:59:49 | <vokunal|m> | he-man.org forums shutting down on November 14th https://www.he-man.org/forums/boards/showthread.php?285395-The-He-Man-Org-forums-will-close-on-Tuesday-November-14-2023 |