00:06:19etnguyen03 quits [Client Quit]
00:09:00Sk1d joins
00:18:06<kline>is there anyone who still has an interest in chaingang? I now have a text serialised version of the bitcoin (blergh) blockchain, and wondering what the best architecture for it would be to upload
00:19:05<nicolas17>what
00:19:42<nicolas17>what is chaingang?
00:19:57<kline>https://wiki.archiveteam.org/index.php/ArchiveTeam_Chain_Gang
00:20:16<nicolas17>oh
00:20:57<nicolas17>isn't that like impossible to get lost?
00:21:24<kline>an otherwise abandoned project to back up blockchains used for cryptocurrencies. i dont personally like them, but i do think bitcoin and probably ethereum could be considered internet-culturally-important and especially right now has taken a bit of a dive with the allocation of computing resources towards LLMs
00:22:42<kline>i dont think its impossible to lose - there are some subset of users of the network who maintain full copies of the chain, often service providers who want to sell analytics etc, but most users do not have a complete copy and only maintain the last-N blocks to allow a bit of history
00:23:10<kline>i think monero has already got some holes in its blockchain but i would need to find a source on that
00:23:16<nicolas17>ah
00:23:30<nicolas17>last time I used this there were two types of clients
00:24:01<nicolas17>full-history and thin-client-ish
00:24:16<nicolas17>guess they finally implemented history pruning
00:25:04wickedplayer494 quits [Ping timeout: 256 seconds]
00:25:17<kline>full history nodes (with indexes) are closing in on 1TB of storage space (about 650GB is binary data, the rest are fast indexes so you dont have to iterate back through the chain every time you want to find a given block)
00:25:50<kline>naturally most people aren't spending a week+ downloading this to get started
00:26:02wickedplayer494 (wickedplayer494) joins
00:31:23<kline>failing anyone specifically interested, a more generic question: i have 206 files of monthly data coming to a grand total of ~850GB. Would it be better to structure this as 206 single-file items under a collection, or 1 item with 206 files?
00:31:45<kline>the largest individual file is 11GB
00:33:21<pokechu22>I don't have an opinion, but it's worth considering how you'd handle adding new data next month
00:34:00<@JAA>Something in between like yearly or quarterly items might be worth considering, too.
00:34:32<kline>pokechu22, you can add new files to an item afterwards, no?
00:34:43<@JAA>How is this data distributed in the actual network? As in, when you set up a full history node, how does it obtain all the data?
00:35:03<pokechu22>Pretty sure you can, though I feel like that can be a bit weird in some situations
00:35:20<@JAA>You can add more files later, yes, but there is a hard item size limit of 1 TiB, so you'd probably run into that pretty soon.
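A minimal sketch of JAA's yearly-grouping suggestion. It assumes the monthly dumps are named with a `YYYY-MM` prefix (a hypothetical naming scheme) and uses the 1 TiB per-item cap JAA mentions. Files are grouped by year, and any group that would exceed the cap is flagged:

```python
import re
from collections import defaultdict

TIB = 1024 ** 4
ITEM_LIMIT = TIB  # hard per-item size limit mentioned above

# Hypothetical naming scheme: "2009-01.txt", "2009-02.txt", ...
MONTH_RE = re.compile(r"^(\d{4})-(\d{2})")


def plan_yearly_items(files):
    """Group (filename, size_in_bytes) pairs into one item per year.

    Returns a dict mapping year -> sorted list of filenames, and raises
    ValueError if a filename does not match or a year exceeds the limit.
    """
    groups = defaultdict(list)
    totals = defaultdict(int)
    for name, size in files:
        m = MONTH_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected filename: {name}")
        year = m.group(1)
        groups[year].append(name)
        totals[year] += size
    for year, total in totals.items():
        if total > ITEM_LIMIT:
            raise ValueError(f"{year} totals {total} bytes, over the item limit")
    return {year: sorted(names) for year, names in sorted(groups.items())}
```

With ~850GB spread over 206 monthly files and no single file above 11GB, one year holds at most 12 × 11GB = 132GB. Yearly items therefore stay well under the cap and leave room to append the current year's months as they arrive.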
00:35:23<nicolas17>JAA: from other nodes p2p via the custom bitcoin protocol
00:35:38<kline>JAA, it bootstraps itself into the p2p network finding other full nodes, then just iterates through every block in order until its up to date
00:35:49<@JAA>Ah
00:36:03<@JAA>And how does the bootstrapping work?
00:36:36<kline>thats a good question, let me check how it finds its first peer
00:36:38<nicolas17>peers can tell you about other peers, to find the first ones I think there's some hardcoded IPs or a DNS record?
00:37:53SootBector quits [Remote host closed the connection]
00:38:02<@JAA>Yeah, that's what I'd expect, I think. BitTorrent's DHT works like that.
00:39:03SootBector (SootBector) joins
00:39:17<kline>DNS and hardcoded IPs of promised long-term nodes as a backup, apparently
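A sketch of the bootstrap order described above: resolve DNS seeds first, and fall back to a hardcoded node list only if DNS yields nothing. The seed hostname and fallback addresses are placeholders, not Bitcoin Core's real lists. Port 8333 is Bitcoin's default mainnet P2P port. The resolver is passed in as a parameter so the logic can be exercised offline.

```python
import socket

DEFAULT_PORT = 8333  # Bitcoin mainnet P2P port

# Placeholders for illustration only, not real seed or node lists.
DNS_SEEDS = ["seed.example.org"]
FALLBACK_NODES = ["192.0.2.10", "198.51.100.7"]


def resolve_seed(host):
    """Return the addresses a DNS seed resolves to (raises OSError on failure)."""
    infos = socket.getaddrinfo(host, DEFAULT_PORT, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def bootstrap_peers(seeds=DNS_SEEDS, fallback=FALLBACK_NODES, resolve=resolve_seed):
    """Collect initial peers: DNS seeds first, hardcoded nodes as a backup."""
    peers = []
    for seed in seeds:
        try:
            for addr in resolve(seed):
                if addr not in peers:
                    peers.append(addr)
        except OSError:
            continue  # seed unreachable or not resolvable; try the next one
    if not peers:
        peers = list(fallback)
    return [(addr, DEFAULT_PORT) for addr in peers]
```

Once connected, a node learns further addresses from its peers, as nicolas17 notes. The seed and fallback lists only need to supply the first few connections.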
00:43:40nexussfan (nexussfan) joins
00:47:08useretail quits [Quit: Leaving]
00:48:02Sk1d quits [Read error: Connection reset by peer]
01:00:47wickedplayer494 quits [Ping timeout: 272 seconds]
01:01:50wickedplayer494 (wickedplayer494) joins
01:06:12Wohlstand quits [Quit: Wohlstand]
01:14:12etnguyen03 (etnguyen03) joins
01:25:12<h2ibot>PaulWise edited Obstacles (+44, Spigot poisoner): https://wiki.archiveteam.org/?diff=60473&oldid=60388
01:33:13<h2ibot>PaulWise edited Obstacles (+55, sethrawall): https://wiki.archiveteam.org/?diff=60474&oldid=60473
01:34:59wickedplayer494 quits [Ping timeout: 272 seconds]
01:39:29wickedplayer494 (wickedplayer494) joins
02:32:33LddPotato quits [Read error: Connection reset by peer]
02:34:49LddPotato (LddPotato) joins
02:44:39Bleo1826007227196234552220 quits [Ping timeout: 272 seconds]
02:44:39LddPotato quits [Read error: Connection reset by peer]
02:46:02LddPotato (LddPotato) joins
02:51:45<Yakov>Speaking of archiving archive.today captures, can't we make a system where users solve captchas towards a pool of browser sessions running at AT? Might not generate WBM-valid warcs but will at least be able to be preserved to some extent
02:52:49<Yakov>Thought of this for a while, where the captcha is streamed to the browser via novnc so people can contribute solving a captcha that's being operated on a remote server
02:53:47<Yakov>Then we can scrape popular sites (like wikis) which reference archive.today so we can build a nice priority list of urls
02:56:56<Yakov>Can it technically even be a valid warc, i.e. is it acceptable to omit the captcha request from the warc? Not really sure about the standards on that
03:03:14LddPotato quits [Read error: Connection reset by peer]
03:04:47LddPotato (LddPotato) joins
03:24:03LddPotato quits [Read error: Connection reset by peer]
03:26:00LddPotato (LddPotato) joins
03:39:07ducky quits [Ping timeout: 272 seconds]
04:04:16APOLLO03a quits [Read error: Connection reset by peer]
04:04:30APOLLO03 joins
04:11:33etnguyen03 quits [Client Quit]
04:12:02etnguyen03 (etnguyen03) joins
04:13:38etnguyen03 quits [Remote host closed the connection]
04:30:37Island quits [Read error: Connection reset by peer]
04:57:56Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ]
04:58:27Shjosan (Shjosan) joins
05:04:26n9nes quits [Ping timeout: 256 seconds]
05:05:18n9nes joins
05:13:36janos777 quits [Read error: Connection reset by peer]
05:13:36janos778 quits [Read error: Connection reset by peer]
05:24:57BlankEclair leaves
05:25:09BlankEclair (BlankEclair) joins
06:00:07Webuser146109 joins
06:02:12Webuser146109 quits [Client Quit]
06:05:28nexussfan quits [Quit: Konversation terminated!]
07:59:39benjins3_ joins
08:00:06benjins3 quits [Ping timeout: 256 seconds]
08:28:33ichdasich quits [Ping timeout: 272 seconds]
08:29:08ichdasich joins
09:22:04sec^nd quits [Remote host closed the connection]
09:22:27sec^nd (second) joins
09:27:22sec^nd quits [Remote host closed the connection]
09:27:45sec^nd (second) joins
09:28:05ichdasich quits [Ping timeout: 272 seconds]
09:33:15ichdasich joins
09:46:17Wohlstand (Wohlstand) joins
09:57:47<h2ibot>Manu edited Discourse/archived (+103, Queued community.roboticsys.com): https://wiki.archiveteam.org/?diff=60475&oldid=60469
10:22:54ducky (ducky) joins
10:29:22n9nes quits [Quit: ZNC 1.10.1 - https://znc.in]
10:33:02n9nes joins
11:25:02Webuser364520 joins
11:26:17ducky_ (ducky) joins
11:26:31ducky quits [Ping timeout: 272 seconds]
11:26:54ducky_ is now known as ducky
11:42:07Dada joins
11:51:55Webuser364520 quits [Client Quit]
12:02:48Bleo1826007227196234552220 joins
12:08:57Snivy quits [Ping timeout: 272 seconds]
12:13:05<katia>yeah i thought of doing this for my own archival of Difficult thugs Yakov
12:13:14<katia>browser in novnc + warcprox
12:13:56<katia>don't see why it wouldn't work
13:39:58pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
13:40:09Arcorann_ quits [Ping timeout: 272 seconds]
13:57:28cyanbox quits [Read error: Connection reset by peer]
15:14:22<justauser>thedude: For a single page, "wget -p" might be perfectly fine if it likes your IP.
15:16:07<justauser>Everybody interested in archive.today: CAPTCHA difficulty level is allegedly adaptive to server load, so we may want to run during some "quiet time".
15:17:51Island joins
15:48:07anarcat quits [Quit: rebooting]
15:49:00<justauser>https://comp.lain.la/notice/B3Fp2LQ98vYq06V9EG
15:49:53<justauser>Did we try talking yet?
15:50:47anarcat (anarcat) joins
15:51:22<klea>/cc pabs because I think they said about asking them.
16:16:06<@arkiver>i think we don't have historyhub.history.gov archived yet
16:18:56Dada quits [Remote host closed the connection]
16:21:47kansei (kansei) joins
16:22:10kansei- quits [Ping timeout: 256 seconds]
16:22:47<@arkiver>found a way to enumerate all discussions
16:26:33<@arkiver>imer: i may need a very urgent target for historyhub, shutting down on the 13th
16:26:34<@arkiver>today
16:26:43<@imer>ack
16:26:47<@arkiver>less than 100k threads, and they can be enumerated easily
16:27:50<@imer>give me a shout once the tracker is up and i'll set one up
16:27:59<@arkiver>imer: it's up at historyhub
16:28:13@imer checked 5s ago and it wasnt
16:28:24<@arkiver>can we get a target with historyhub_ archiveteam_historyhub_ and "Archive Team History Hub:"?
16:28:31<@arkiver>starting as soon as possible, hurrying
16:28:36<@arkiver>imer: yep i created it seconds ago
16:29:43<@imer>target's up
16:30:04<@arkiver>thanks
16:30:10<@arkiver>if needed we'll pause other projects, but i expect it's not needed
16:30:42<@imer>drone poked too
16:36:20<@arkiver>thank you as well for that
16:37:21<@arkiver>going to keep this simple i think
16:40:09<@arkiver>less than 60k threads
16:40:18<@arkiver>if this is not hosted on a potato we should be good
16:56:00Connection closed.
16:56:12atirclog (atirclog) joins
16:56:12Topic: Lengthy ArchiveTeam-related discussions, questions here | Offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867
16:56:12Topic set by AlsoJAA at 2025-09-09 18:44:47Z
16:56:12h2ibot (h2ibot) joins
16:56:12jinn6 (jinn6) joins
16:56:13ax (ax) joins
16:56:13inedia (inedia) joins
16:56:13igloo22225 (igloo22225) joins
16:56:14yasomi joins
16:56:14yasomi quits [Changing host]
16:56:14yasomi (yasomi) joins
16:56:14FalconK (FalconK) joins
16:56:14maxfan8_ (maxfan8) joins
16:56:14d10n joins
16:56:14TunaLobster joins
16:56:14lukash984 joins
16:56:14s-crypt (s-crypt) joins
16:56:14Lord_Nightmare (Lord_Nightmare) joins
16:56:15wyatt8740 joins
16:56:15notSokar joins
16:56:15PredatorIWD25 joins
16:56:15Barto (Barto) joins
16:56:16unknownsrc (unknownsrc) joins
16:56:16Pedrosso joins
16:56:17Fusl (Fusl) joins
16:56:17skankhunt42 joins
16:56:17monika (boom) joins
16:56:17sepro (sepro) joins
16:56:17michaelblob joins
16:56:17Soulflare joins
16:56:18benjins3_ joins
16:56:18kiska52 joins
16:56:18@ChanServ sets mode: +o Fusl
16:56:18fetcher joins
16:56:19phillipsjk joins
16:56:19archiveDrill joins
16:56:19xkey (xkey) joins
16:56:19Stagnant (Stagnant) joins
16:56:19khaoohs joins
16:56:20wessel1512 joins
16:56:20ThreeHM (ThreeHeadedMonkey) joins
16:56:20lexikiq joins
16:56:20h|ca2 (h) joins
16:56:20G4te_Keep3r34924156 joins
16:56:20nickofnicks (nickofnicks) joins
16:56:21skyrocket joins
16:56:21hexagonwin (hexagonwin) joins
16:56:21bilboed0 joins
16:56:21pokechu22 (pokechu22) joins
16:56:22Island joins
16:56:23TastyWiener95 (TastyWiener95) joins
16:56:24wickedplayer494 (wickedplayer494) joins
16:56:24dxrt joins
16:56:24nine joins
16:56:25Boppen (Boppen) joins
16:56:25BitByBit (BitByBit) joins
16:56:26chrismeller3 (chrismeller) joins
16:56:26kiska (kiska) joins
16:56:26dxrt quits [Changing host]
16:56:26dxrt (dxrt) joins
16:56:26@ChanServ sets mode: +o dxrt
16:56:27ramsey (ramsey) joins
16:56:30Flashfire42 (flashfire42) joins
16:56:31devkev0 joins
16:56:32JTL (JTL) joins
16:56:34yano (yano) joins
16:56:40Doranwen (Doranwen) joins
16:56:40adamus1red (adamus1red) joins
16:56:42kline (kline) joins
16:56:43valdikss joins
16:56:43kansei (kansei) joins
16:56:45nine quits [Changing host]
16:56:45nine (nine) joins
16:56:50mls (mls) joins
16:56:54TheTechRobo (TheTechRobo) joins
16:57:00lennier2_ joins
16:57:04Craigle (Craigle) joins
16:57:09sg72 joins
16:57:09Riku_V (riku) joins
16:57:24Sanqui joins
16:57:25Sluggs (Sluggs) joins
16:57:28fmeppo (fmeppo) joins
16:57:28Sanqui quits [Changing host]
16:57:28Sanqui (Sanqui) joins
16:57:28@ChanServ sets mode: +o Sanqui
16:57:28z0ar5 (z0ar) joins
16:57:29_null (_null) joins
16:57:41BearFortress joins
16:57:45HugsNotDrugs joins
16:57:48jacksonchen666 (jacksonchen666) joins
16:58:02TheEnbyperor_ joins
16:58:06Cronfox (Cronfox) joins
16:58:10Suika joins
16:58:13petrichor (petrichor) joins
16:59:40pie_ quits [Client Quit]
17:00:24Juest (Juest) joins
17:01:11bladem (bladem) joins
17:01:13murb (murb) joins
17:01:16chrismrtn (chrismrtn) joins
17:01:20danwellby joins
17:01:28celestial joins
17:01:34Ryz (Ryz) joins
17:04:17@Fusl quits [Client Quit]
17:04:21crullerIRC joins
17:04:25Fusl (Fusl) joins
17:04:25@ChanServ sets mode: +o Fusl
17:05:04linuxgemini (linuxgemini) joins
17:05:12sensitiveParrot (sensitiveParrot) joins
17:16:33pie_ (pie_) joins
17:43:40Suika_ joins
17:43:59Suika quits [Ping timeout: 272 seconds]
17:44:01IDK quits [Quit: Connection closed for inactivity]
17:44:17Dada joins
17:56:13<@arkiver>got the thread pagination in finally
18:19:55UwU quits [Quit: bye]
18:20:37UwU joins
18:22:50anarcat quits [Client Quit]
18:26:02<h2ibot>KleaBot edited List of websites excluded from the Wayback Machine/Partial exclusions (+0, Reordered websites): https://wiki.archiveteam.org/?diff=60478&oldid=60462
18:29:05IDK (IDK) joins
18:37:39UwU quits [Client Quit]
18:38:16UwU joins
18:43:04<@arkiver>historyhub-grab is up
18:49:30Shard111 (Shard) joins
18:51:57anarcat (anarcat) joins
18:55:33FiTheArchiver joins
18:55:53FiTheArchiver quits [Client Quit]
18:56:02<@arkiver>the historyhub project is up!
18:56:09<@arkiver>very high priority, can shut down any minute
18:56:34<@imer>arkiver: not getting any more items?
18:58:55<@arkiver>imer: not any?
18:58:56<@imer>1=200 https://historyhub.history.gov/f/discussions/46462/dummy Lua runtime error: historyhub.lua:452: bad argument #1 to 'match' (string expected, got nil)
18:59:06<@imer>arkiver: just got one^
18:59:17<@imer>discussion:10975 1=200 https://historyhub.history.gov/f/discussions/10975/dummy Lua runtime error: historyhub.lua:452: bad argument #1 to 'match' (string expected, got nil)
18:59:37<@arkiver>huh
19:01:35<@arkiver>imer: very odd, maybe banned? are you able to check? it should give a 301
19:02:27<@imer>yep "Request unsuccessful. Incapsula incident ID: ..."
19:02:56<@arkiver>urgh :/
19:02:58<@arkiver>in the HTML?
19:03:00<@arkiver>i'll add a check for it
19:03:15<pokechu22>incapsula--
19:03:15<eggdrop>[karma] 'incapsula' now has -45 karma!
19:03:28<@imer>there's a few variants apparently
19:04:58<@imer>(dm'd output)
19:06:10<@imer>it does work in browser, so maybe fingerprinting?
19:08:09UwU quits [Remote host closed the connection]
19:09:15<@arkiver>it could be
19:10:03UwU joins
19:10:47<@arkiver>added --secure-protocol=PFS
19:10:55<@arkiver>but it runs for me at least
19:11:22<@imer>some do work so it's probably not just tls fingerprinting? could just throw all the warriors at it and hope we get through
19:12:20<@arkiver>yeah i made it the default project
19:12:29<@arkiver>i see others finishing items now
19:13:41UwU quits [Client Quit]
19:13:43<pokechu22>I immediately got "https://historyhub.history.gov/f/discussions/6662/dummy" -> "You are banned. Sleeping 1800 seconds." (that one seems to redirect to a login page though, not sure if that's related or not)
19:15:30UwU joins
19:15:34<IDK>1=200 https://historyhub.history.gov/f/discussions/20930/dummy
19:15:34<IDK>You are banned. Sleeping 1800 seconds.
19:15:41<IDK>that was quick
19:15:53<unknownsrc>seems to be country based
19:16:01<unknownsrc>my US vpses work fine, elsewhere doesn't
19:16:16<unknownsrc>and i got banned
19:16:17<pokechu22>I'm in the US on a residential connection
19:17:30<@arkiver>pokechu22: does it work in the browser?
19:18:17<IDK>arkiver: for me, it does work in browser first try, even if you disable js
19:18:34<@arkiver>IDK: alright i'll try to find a banned IP and test with that
19:19:08<pokechu22>For me, view-source:https://historyhub.history.gov/f/discussions/6662/dummy in browser immediately gets a challenge unless cookies are set
19:19:59<pokechu22>... but https://historyhub.history.gov/f/discussions/6662/dummy in browser doesn't get a challenge?
19:20:13<unknownsrc>hetzner US seems banned
19:20:19<tmg1|michelson>1=200 https://historyhub.history.gov/f/discussions/22119/dummy
19:20:19<tmg1|michelson>You are banned. Sleeping 1800 seconds.
19:20:22<tmg1|michelson>boo
19:20:50<tmg1|michelson>(vps iceland)
19:20:51<pokechu22>hmm, seems like it works every other request?
19:21:35<tmg1|michelson>(same thing canada residential isp)
19:21:58<@arkiver>pokechu22: got more info on that?
19:22:36<@arkiver>found a banned IP, will do some tests
19:24:04<IDK>yall have a not banned IP? :-)
19:24:04cyanbox joins
19:24:08<pokechu22>Looks like firefox devtools don't log view-source requests (I think that worked before though?), but it seems like the first load without cookies gives `<html>\n<head>\n<META NAME="robots" CONTENT="noindex,nofollow">\n<script src="/_Incapsula_Resource?SWJIYLWA=[REMOVED]">\n</script>\n<body>\n</body></html>\n` and also has set-cookie headers for visid_incap_3185430 and
19:24:10<pokechu22>incap_ses_362_3185430, and those two cookies are sufficient for later requests to work (even without JS)
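This sketch extracts only those two cookies from a browser's full `Cookie` request header. The `visid_incap_` and `incap_ses_` prefixes come from pokechu22's observation. The numeric suffixes seem to vary by site and session, so matching on the prefix is an assumption.

```python
INCAP_PREFIXES = ("visid_incap_", "incap_ses_")


def incapsula_cookies(cookie_header):
    """Keep only the Incapsula cookies from a 'name=value; name2=value2' header."""
    kept = []
    for part in cookie_header.split(";"):
        # partition splits on the first '=', so values containing '=' survive intact
        name, sep, value = part.strip().partition("=")
        if sep and name.startswith(INCAP_PREFIXES):
            kept.append(f"{name}={value}")
    return "; ".join(kept)
```

Before handing a pasted header to a worker, you can run it through a filter like this to confirm the `visid_incap_` and `incap_ses_` entries are actually present.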
19:24:16<pokechu22>might be better to discuss details in #UncleSamsArchive though?
19:25:27<tmg1|michelson>tried to set concurrency=1 and that seemed to get one or two successful responses but then ...banned
19:25:37<tmg1|michelson>(on another machine, residential canada isp)
19:26:21<@arkiver>pokechu22: when viewing the website, a little something is POSTed back for the CAPTCHA, that allows one to access it on a simple next try
19:27:35<IDK>zenlayer hk straight up throws 403 :-)
19:27:36<IDK>1=403 https://historyhub.history.gov/f/discussions/32057/dummy
19:27:44<IDK>I guess thats the hard ban
19:28:00<pokechu22>I'm not seeing any POSTs, just 2 https://historyhub.history.gov/_Incapsula_Resource gets (plus a 3rd one as an image that gets rejected)
19:30:48<unknownsrc>seems like one item completes, and then ban
19:30:49pabs quits [Read error: Connection reset by peer]
19:31:03<pokechu22>... OK, and if I keep deleting cookies or maybe if I block https://historyhub.history.gov/_Incapsula_Resource then I get the page with the actual captcha (which also has the "Request unsuccessful. Incapsula incident ID" message and does a POST). But I think the initial one, which just loads /_Incapsula_Resource, happens even if not banned
19:31:40pabs (pabs) joins
19:35:53<phillipsjk>On History hub I get an immediate: "1=200 https://historyhub.history.gov/f/discussions/6409/dummy
19:35:54<phillipsjk>You are banned. Sleeping 1800 seconds.
19:35:54<phillipsjk>" message
19:36:23<phillipsjk>Looks like maybe a trap URL
19:37:47<@arkiver>not a trap URL
19:37:56<@arkiver>i'm working on a solution (maybe a crappy one, but it is what it is)
19:38:50<phillipsjk>Last day is apparently today (which I am sure you know)
19:40:31petrichor quits [Ping timeout: 272 seconds]
19:44:48<@arkiver>alright crappy solution incoming
19:46:53<@arkiver>good news is it seems to run well if cookies from browser are provided
19:46:59<@arkiver>also at concurrency 20
19:47:34<@arkiver>new version is out... this will require some manual work
19:47:49<@arkiver>if you're banned it will ask you to provide cookies through an environment variable
19:47:59<pokechu22>How do you do that with the VM?
19:48:41<tmg1|michelson>or docker?
19:48:52<@arkiver>actually maybe not 20
19:48:59<@arkiver>pokechu22: currently only docker :/
19:49:26petrichor (petrichor) joins
19:49:53<@arkiver>i need to see about making some interactive part on the warrior that will ask for information
19:50:09anarcat quits [Client Quit]
19:54:40<@arkiver>yeah looks like a single set of cookies can be used for nearly any concurrency
19:54:41<@imer>god the one time a site has v6 it's making it *more* difficult to bypass
19:55:18<@imer>(can't use browser cookie since the ipv6 of the container is different..)
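imer's point is that the Incapsula cookie appears to be tied to the client address. A container that reaches the site over a different IPv6 address than the browser therefore gains nothing from it, and IDK's workaround was to force IPv4. The sketch below restricts name resolution to IPv4 using only the standard library. It does not reflect how the grab container itself handles this.

```python
import socket


def ipv4_addresses(host, port=443):
    """Resolve host to IPv4 addresses only, de-duplicated, in resolver order."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

At the container level, Docker's `--add-host` option can pin a hostname to a chosen IPv4 address. This is roughly what IDK suggests a few lines later.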
20:00:06UwU quits [Remote host closed the connection]
20:00:09<IDK>ah thats why its not working
20:00:15<IDK>lemme force v4 real quick
20:00:43UwU joins
20:00:44<@arkiver>sg72: working?
20:04:01cipherrot (petrichor) joins
20:04:12anarcat (anarcat) joins
20:05:51petrichor quits [Ping timeout: 272 seconds]
20:05:56<@arkiver>requeuing for historyhub is enabled, i shall be off now
20:11:06<@arkiver>should be done in a few hours, let's hope they don't kill it before then
20:11:16<@arkiver>there's also some blog entries, we're not getting those at the moment
20:12:50<@imer>i'm not having any luck with the cookies unfortunately
20:13:00<IDK>what command did yall use? im using docker run -d --name ContainerName --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped -e HISTORYHUB_COOKIES='[Value]' atdr.meo.ws/archiveteam/historyhub-grab --concurrent 20 IDK
20:13:06<IDK>not working sadly
20:13:56<@imer>arkiver: that's "Cookie: $VALUE" from the browser request and $VALUE goes in the env, right?
20:14:32<@arkiver>imer: yes
20:14:39<@arkiver>but I'll go allow either
20:15:17<IDK>imer: try resolving historyhub.history.gov to 45.60.33.181 instead
20:15:27<IDK>or what your local quad9 resolves for you
20:15:32<tmg1|michelson>run-pipeline3: error: unrecognized arguments: -e HISTORYHUB_COOKIES
20:16:25<IDK>I had received my cookies from 45.60.37.181 before and it did not work
20:18:09<@arkiver>tmg1|michelson: it needs to go to the docker args, not the pipeline args
20:18:26<@arkiver>imer: update is out to allow both with "Cookie: " prepended and not
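arkiver's update accepts `HISTORYHUB_COOKIES` with or without a leading `Cookie: `. Below is a Python sketch of that normalization. The real grab script is Lua and may differ. This version also strips surrounding whitespace and any quotes left behind by nested shell quoting, such as `-e "HISTORYHUB_COOKIES='...'"`.

```python
import os


def cookie_header_from_env(var="HISTORYHUB_COOKIES", environ=None):
    """Return the value to send as the Cookie header, or None if unset/empty.

    Accepts either 'Cookie: a=b; c=d' (copied with the header name) or
    just 'a=b; c=d'.
    """
    env = os.environ if environ is None else environ
    raw = env.get(var, "").strip()
    # Drop matching quotes that survived shell quoting.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        raw = raw[1:-1].strip()
    # Header names are case-insensitive, so match the prefix case-insensitively.
    if raw.lower().startswith("cookie:"):
        raw = raw[len("cookie:"):].strip()
    return raw or None
```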
20:18:32<@arkiver>i'll really be off now though
20:18:34<@imer>no luck, also tried setting the same user agent. I need to head off for a bit
20:19:20UwU quits [Client Quit]
20:19:28<@arkiver>imer: thanks for trying though
20:19:34<@arkiver>looks like IDK figured it out too
20:19:48<IDK>yep, tho I also figured out running at 40 will get my cookie banned :-)
20:19:58UwU joins
20:20:00<IDK>20 as well
20:20:05<@arkiver>the 403?
20:20:24@arkiver runs at 100
20:20:41<IDK>yep
20:20:56<@arkiver>the 403 is unrelated to the cookie i think
20:21:00<@arkiver>a retry might work
20:21:37<tmg1|michelson>arkiver: that was with IDK 's docker command
20:21:44<tmg1|michelson>minus the label (what's the label do?)
20:22:12<tmg1|michelson>ie the -e literally went to docker ??
20:22:28<IDK>did you include the ' ?
20:22:33<tmg1|michelson>yes
20:22:50<IDK>hm its working fine for me
20:22:52<tmg1|michelson>-e HISTORYHUB_COOKIES='AWSALB=nzp...
20:22:58<@arkiver>tmg1|michelson: i'd recommend `-e "HISTORYHUB_COOKIES=blabla"` (note the ")
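The `unrecognized arguments: -e HISTORYHUB_COOKIES` error above comes from argument order. `docker run` options such as `-e` must come before the image name, because everything after the image is passed to the container's command (here `run-pipeline3`). This sketch builds the argument list in the correct order. Passing it as a list, with no shell involved, also avoids the quoting problem. The image name and the `--concurrent 20` value come from this discussion; the nickname is a placeholder.

```python
import shlex


def historyhub_docker_argv(cookies, nickname, concurrency=20,
                           image="atdr.meo.ws/archiveteam/historyhub-grab"):
    """Build a docker run argv with all docker options before the image name."""
    docker_opts = [
        "docker", "run", "-d",
        "--restart=unless-stopped",
        # A single argv element: no shell quoting needed even though cookies contain ';'.
        "-e", f"HISTORYHUB_COOKIES={cookies}",
    ]
    # Everything after the image name is handed to the container's entrypoint.
    pipeline_args = ["--concurrent", str(concurrency), nickname]
    return docker_opts + [image] + pipeline_args


if __name__ == "__main__":
    argv = historyhub_docker_argv("visid_incap_123=abc; incap_ses_1_123=def", "YOURNICK")
    # Print a correctly quoted shell command that can be copied and pasted.
    print(shlex.join(argv))
    # To launch it directly instead: subprocess.run(argv, check=True)
```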
20:23:37<IDK>https://blog.codinghorror.com/content/images/2025/05/works-on-my-machine-v2-2025-jon-galloway-1.png
20:23:41<IDK>as usual
20:23:47<tmg1|michelson>run-pipeline3: error: the following arguments are required: DOWNLOADER
20:23:55<@arkiver>fixed the message to be more clear
20:28:28<tmg1|michelson> -e "HISTORYHUB_COOKIES=AWSALB=nzpv... results in the same above message
20:30:58<tmg1|michelson>also tried -e "HISTORYHUB_COOKIES='AWSALB=nzpv...
20:33:36<tmg1|michelson>[looks like the centurylink watchtower thing is some kind of autoupdate functionality]
20:34:01<tmg1|michelson>IDK: i see hits on the tracker for you what did you change?
20:35:35<tmg1|michelson>nulldata: same for you? what command are you running?
20:35:45UwU quits [Client Quit]
20:36:23UwU joins
20:39:04<nulldata>I'm using a docker-compose file
20:41:34<nulldata>tmg1|michelson - Make sure you're using the entire "Cookie" value under Request Headers. It looks like you're using the "set-cookie" values from the response header.
20:41:58<nulldata>Should start with/contain "visid_incap_"
20:45:45<tmg1|michelson>under request headers i see accept: accept-encoding: accept-language: authorization code: connection: cookie:
20:45:51<tmg1|michelson>i am using the value from 'cookie'
20:46:02<tmg1|michelson>it doesn't start with visid_incap
20:49:01<tmg1|michelson>showing raw request headers shows a little more detail but fundamentally the same data (ie not visid_incap)
20:49:57<nulldata>https://tl.nulldata.foo/uploads/fdd7f36580ba5b5e/image.png
20:50:33<tmg1|michelson>wait, visid_incap is in there
20:51:24<tmg1|michelson>yeah my cookie doesn't start with visid_incap
20:54:30<nulldata>https://tl.nulldata.foo/uploads/3ceeab078bbac1f1/hhdockercompose.zip
20:55:24<nulldata>^ my docker-compose file. Just open .env, fill your cookie in "HISTORYHUB_COOKIES=", and then run docker-compose up -d on that folder
20:55:25<tmg1|michelson> https://shitposter.world/notice/B3I3dkCJum3ZfjtPYe
21:04:32<nulldata>Or actually with newer docker the command should be "docker compose up -d"
21:05:50Webuser452607 joins
21:06:13Webuser452607 quits [Client Quit]
21:06:40<tmg1|michelson>i was able to get that running but it still says
21:06:42<tmg1|michelson>1=200 https://historyhub.history.gov/f/discussions/19525/dummy
21:06:42<tmg1|michelson>You are banned. THE SOLUTION:
21:07:24<tmg1|michelson>it's showing that my environment variable is making its way in there: logs from inside the docker container show
21:07:27<tmg1|michelson>Using header Cookie: AWSALB=nzpv/VCFw
21:10:34<nulldata>I dunno. Maybe try getting to the site in a different browser/private tab and see if you get a different cookie to try.
21:12:52<@JAA>Shouldn't all of this be in #UncleSamsArchive?
21:16:17<Guest>is anyone else getting a default nginx page on archive.today?
21:17:56<nulldata>Guest - works here
21:18:07<nulldata>Try archive.is
21:18:09<Yakov>works for me as well
21:18:17<Yakov>.today redirects me to
21:18:21<Yakov>.ph
21:22:40<@JAA>Guest: IIRC, that's some sort of ban.
21:23:39<Guest>thats weird, this is on a residential ip (in the browser too)
21:23:46<@JAA>Also consider clearing your cookies for the domain.
21:24:43<Guest>and archive.is doesnt work either, it just hangs. requests for all of the subdomains hang for a while (until http timeout) after getting the nginx page.
21:24:55<Guest>and clearing cookies didnt work either :p
21:26:00<Guest>this is the url if anyone is interested: <http://archive.today/medium.com/@thequeryabhishk/the-json-performance-hack-that-every-go-developer-should-know-but-90-dont-b7de213c6d66> . archive.today switched to http when it shows the nginx page but i believe the site uses https.
21:29:10<@JAA>That's what I observed the last time as well: nginx page, then timeouts for a while. If you clear your cookies, it should work again after that ban expires.
21:35:43<Guest>thanks that worked
21:35:48<phillipsjk>I posted here because the wiki did not have a page telling me where to go.
21:36:42<Guest>for anyone else that might read from irclogs, you have to wait a little (after the timeouts), clear cookies for the site, and THEN visit the site. otherwise if you clear while on the site it could refresh and you are banned again.
21:42:21<Yakov>What does it take to get banned? because I've never got a default nginx page on archive.today before.
21:45:08ericgallager joins
22:03:57sec^nd quits [*.net *.split]
22:03:57SootBector quits [*.net *.split]
22:04:18sec^nd (second) joins
22:05:06SootBector (SootBector) joins
22:06:18nexussfan (nexussfan) joins
22:06:41<Guest>i had the same issue yesterday but im not sure. maybe someone else can answer. i didnt use archive.today for a whole week before this issue. after that (yesterday) i started getting the errors.
22:25:59etnguyen03 (etnguyen03) joins
22:35:57aninternettroll quits [Ping timeout: 272 seconds]
22:46:09aninternettroll (aninternettroll) joins
22:56:40<nicolas17>is historyhub the default project? if it needs manual intervention for cookies I think it shouldn't be...
23:17:06<nicolas17>https://bsky.app/profile/kendraserra.bsky.social/post/3merjyvt5322g
23:17:30<nicolas17>Ars Technica ran an article with a bunch of fake quotes and then took it down
23:18:08<nicolas17>it was archived via Save Page Now
23:18:23<nicolas17>but this could have been lost very easily
23:19:09<nicolas17>do we need something more systematic to grab news articles immediately on publication?
23:20:53<@JAA>#// might well have fetched it as well. We should be grabbing Ars Technica every 15 minutes there.
23:21:07<nicolas17>oh right that could be delayed
23:21:39<nicolas17>there are currently 3 captures, they are all from SPN, but I didn't take into account that // might have fetched it and it didn't get uploaded/indexed yet
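This is a rough illustration of grabbing articles shortly after publication: poll a site's RSS feed on a fixed interval and queue any link not seen before. It is not how #// works internally. It assumes a plain RSS 2.0 feed (`channel/item/link`) and leaves queueing to the caller.

```python
import xml.etree.ElementTree as ET


def new_links(rss_xml, seen):
    """Return links from an RSS 2.0 document that are not yet in `seen`.

    `seen` is updated in place, so repeated polls only report fresh URLs.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            link = link.strip()
            if link not in seen:
                seen.add(link)
                fresh.append(link)
    return fresh
```

With any polling interval, an article that is published and pulled between two polls can still be missed. That is the gap nicolas17 is pointing at.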
23:31:25Dada quits [Remote host closed the connection]
23:33:54iPwnedYourIOTSmartdog6 joins
23:36:07iPwnedYourIOTSmartdog quits [Ping timeout: 272 seconds]
23:36:08iPwnedYourIOTSmartdog6 is now known as iPwnedYourIOTSmartdog
23:38:03Snivy (Snivy) joins
23:52:51Arcorann_ (Arcorann) joins