00:10:19c3manu quits [Client Quit]
00:48:12hitgrr8 quits [Client Quit]
01:11:30Dango360 (Dango360) joins
01:23:50<h2ibot>Flashfire42 edited List of websites excluded from the Wayback Machine (+25): https://wiki.archiveteam.org/?diff=51396&oldid=51390
01:25:33ScenarioPlanet quits [Client Quit]
01:34:52<h2ibot>OrIdow6 edited Google Drive (+1594, Make some of my research useful for future…): https://wiki.archiveteam.org/?diff=51397&oldid=50420
01:35:44monoxane (monoxane) joins
01:55:57<fireonlive>OrIdow6++
01:55:58<eggdrop>[karma] 'OrIdow6' now has 1 karma!
01:56:34<fireonlive>sites do privately appear in folders at least but hm
01:57:09parfait (kdqep) joins
02:00:57<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51398&oldid=51396
02:15:55<@JAA>ETA for OneHallyu is 4 days, 3 hours. Probably not going to finish in time.
02:16:39RealPerson joins
02:17:39mcint (mcint) joins
02:22:01<h2ibot>OrIdow6 edited Google Drive (+86, New discoveries involving Sites in Drive): https://wiki.archiveteam.org/?diff=51399&oldid=51397
02:41:46Megame (Megame) joins
02:43:38monoxane quits [Client Quit]
02:47:46RealPerson leaves
02:53:35fireonlive waits
02:54:00<fireonlive>cron pls
02:54:06<h2ibot>FireonLive edited Current Projects (+95, attempt to clean up/make easier to read the…): https://wiki.archiveteam.org/?diff=51400&oldid=51243
02:54:10<fireonlive>cc JAA/arkiver
02:54:18monoxane (monoxane) joins
02:56:06<@arkiver>looks good fireonlive
02:56:19<fireonlive>=]
02:56:31<@arkiver>i'm not sure if we still need the ukraine/russian sites project there, it hasn't been running for a long time
02:56:56<fireonlive>ah good point
02:57:35<fireonlive>Long-term, perpetual projects?
02:57:40<fireonlive>did we have a word for 'basically forever'
02:57:49<fireonlive>an internal word that is
02:58:07<fireonlive>"occurring repeatedly; so frequent as to seem endless and uninterrupted."
02:58:09<fireonlive>that works
02:58:30<@arkiver>i'd just say long term
02:58:40<@arkiver>can't promise we'll keep them running forever
02:58:43<@JAA>I've used 'continuous' before, but doesn't really say much.
02:58:50<@JAA>Yeah, 'long term' is good.
02:59:46<fireonlive>ah ok
03:00:09<@JAA>Nothing will keep running forever. The heat death of the universe will consume it all.
03:00:17<fireonlive>yep :)
03:00:27<@arkiver>so we might as well fade out now?
03:00:28<fireonlive>50/50 on leaving a blank section, but will leave an empty medium for now, to show that it 'can' exist
03:00:34<fireonlive>arkiver: that's my dream
03:00:55<@arkiver>ouch
03:01:01<@JAA>'(none currently)'?
03:01:15<fireonlive>ah that works
03:01:27<@JAA>Rather than just an empty section, which may look weird.
03:01:56<@JAA>I plan on hanging out on this channel until the heat death. :-)
03:02:06<fireonlive>:)
03:02:15<fireonlive>have we had a scripts only project in the past N years
03:02:33<fireonlive>*removes commented out section*
03:04:11<h2ibot>FireonLive edited Current Projects (-133): https://wiki.archiveteam.org/?diff=51401&oldid=51400
03:05:07<fireonlive>arkiver: "2019-202? coronavirus outbreak: Documenting and preserving data, events, and impacts of the virus on society. IRC Channel #coronarchive (on hackint)" < would you call this one not running as well?
03:05:19<@arkiver>yes
03:05:22<fireonlive>kk
03:06:15<h2ibot>FireonLive edited Current Projects (-167, remove coronavirus): https://wiki.archiveteam.org/?diff=51402&oldid=51401
03:06:36<fireonlive>wow i cured the world i guess
03:06:37<fireonlive>:p
03:07:10<TheTechRobo>Photobucket did the purge *long* ago, right?
03:07:14<TheTechRobo>Should that be removed from upcoming?
03:07:47<TheTechRobo>also I feel like some of these hiatuses will never be unhiatused (is that a word?)
03:07:50<TheTechRobo>ex. Audit 2014
03:08:28<TheTechRobo>finally, https://wiki.archiveteam.org/index.php/NewsGrabber has been largely replaced with #//, right? should the wiki page be updated with that info?
03:08:29<fireonlive>there was a line on the audit 2014 hiatus bullet saying it would be done in 2016 that i removed a few months ago
03:08:47<TheTechRobo>oh, it does say under project status
03:08:50<fireonlive>re: NewsGrabber it does say "Archiving status Project superseded by URLs"
03:08:51<fireonlive>ye
03:09:03<fireonlive>lemme just...
03:09:57sec^nd quits [Remote host closed the connection]
03:10:18<h2ibot>TheTechRobo edited NewsGrabber (+51, replaced with #//): https://wiki.archiveteam.org/?diff=51403&oldid=50757
03:10:26sec^nd (second) joins
03:10:43<fireonlive>hey you
03:10:47<fireonlive>conflicting my edit
03:10:52<fireonlive>.
03:11:19<h2ibot>FireonLive edited NewsGrabber (+74): https://wiki.archiveteam.org/?diff=51404&oldid=51403
03:11:39<fireonlive>oops forgot a message lol
03:13:18<h2ibot>TheTechRobo edited NewsGrabber (+1, Add a period): https://wiki.archiveteam.org/?diff=51405&oldid=51404
03:13:19<h2ibot>TheTechRobo edited URLs (+222, Add urls-sources): https://wiki.archiveteam.org/?diff=51406&oldid=50427
03:14:19<h2ibot>FireonLive edited Current Projects (+0, alphabetize "on hiatus"): https://wiki.archiveteam.org/?diff=51407&oldid=51402
03:14:19<@arkiver>oh yeah newsgrabber was kind of our predecessor to #//
03:15:01<fireonlive>https://wiki.archiveteam.org/index.php/Project_Newsletter < neat idea
03:15:36<TheTechRobo>> yt-dlp can be used to download article URLs, making it possible to preserve news in video-form just as well as news in text-form.
03:15:36<TheTechRobo>I don't think we have that in URLs, do we? I suppose the storage would get unwieldy
03:15:52<TheTechRobo>Might be nice for high-value stuff, though
03:16:17<fireonlive>archivebot used to use youtube-dl (before the fork) but not any longer
03:16:42<TheTechRobo>Yeah
03:16:52<TheTechRobo>That integration was always jank IIRC though
03:16:59<@arkiver>we don't use yt-dlp in any project, except for the bot in #down-the-tube to discover videos of a channel for queuing
03:17:06<@arkiver>err
03:17:09<TheTechRobo>arkiver: I thought yt-dlp was replaced in the bot?
03:17:20<@arkiver>any Warrior project i should say
03:17:24<fireonlive>arkiver: is the bot on git :p
03:17:29<@arkiver>TheTechRobo: only partially replaced
03:17:40<@arkiver>fireonlive: no, it has keys that i didn't separate out yet
03:17:44<@arkiver>but yes i should get it on git
03:17:44<fireonlive>ahh np
03:17:47TheTechRobo asked about that before :P
03:17:52<fireonlive>no rushy
03:18:03<@JAA>Yes please :-)
03:18:03<TheTechRobo>arkiver: should Photobucket be removed from upcoming/proposed? or is it still planned?
03:18:09<@arkiver>just have to free up some time for that
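(Aside: channel-video discovery of the sort arkiver describes can be approximated with yt-dlp's Python API. The sketch below is an assumed illustration, not the actual #down-the-tube bot, and the channel URL is a placeholder.)

```python
# Hedged sketch of "discover videos of a channel for queuing" with yt-dlp.
# Not the actual bot's code; the channel URL is a placeholder.
from yt_dlp import YoutubeDL

opts = {
    "extract_flat": "in_playlist",  # list entries without downloading media
    "quiet": True,
}
with YoutubeDL(opts) as ydl:
    info = ydl.extract_info(
        "https://www.youtube.com/@example/videos", download=False
    )

for entry in info.get("entries", []):
    print(entry["id"])  # these IDs would then be queued elsewhere
```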
03:18:12<TheTechRobo>Can we have the tracker next?
03:18:19<@arkiver>TheTechRobo: i don't think it's planned at the moment
03:18:24<fireonlive>good luck for tracker :P
03:19:00<@arkiver>TheTechRobo: the current tracker is so duct-taped together (with sensitive stuff spread across it) that it will likely not be released publicly any time soon
03:19:01<TheTechRobo>I've been asking ever since I touched Seesaw. Universal-tracker is, despite the name, not very universal
03:19:01<fireonlive>arkiver: oh, one more q: are the IDs it generates stored in a database or something somewhere alongside the explanation provided/project/etc?
03:19:10<@arkiver>i believe the old tracker on github should still somewhat work?
03:19:16<fireonlive>or is it mainly for irc logs?
03:19:17<TheTechRobo>arkiver: Somewhat is right.
03:19:25<TheTechRobo>fireonlive: I also have the same question about `-e`
03:19:37<@arkiver>fireonlive: the bot for queuing you mean? they are currently only in the logs
03:19:43<@arkiver>together with the explanation, only in the logs
03:19:43<TheTechRobo>arkiver: No backfeed, slow, no offloader, etc
03:19:45<fireonlive>ye indeed
03:19:51<fireonlive>ah ok :)
03:19:57<@arkiver>TheTechRobo: yeah
03:20:14<fireonlive>eventually ™
03:20:17<fireonlive>:D
03:20:21<@arkiver>i guess :/
03:20:23<fireonlive>tracker is more understandable
03:20:32<fireonlive>so i don't hold that one against y'all lol
03:20:46<@arkiver>:)
03:20:49<fireonlive>:)
03:20:50<TheTechRobo>The lack of an offloader was the main reason I never archived very much of Strawpoll. Whenever the tracker was running, even idle, it used ~4GB of RAM because everything was kept in memory
03:21:39<@arkiver>i could have set up a project for that, if i was aware
03:21:41<TheTechRobo>Maybe a project for 2024. Building Universal-tracker 3 :P
03:21:52<TheTechRobo>arkiver: No shutdown notice, I just felt like archiving it
03:21:58<@arkiver>ah okey
03:22:03<@arkiver>it's still online?
03:22:05<TheTechRobo>No
03:22:07<fireonlive>i'm sure someone will set something big on fire in 2024
03:22:14<@arkiver>went offline without shutdown notice?
03:22:18<fireonlive>well a lot of somethings
03:22:23<TheTechRobo>arkiver: No idea
03:22:33<TheTechRobo>I thought "y'know maybe I should continue archiving strawpoll" and it was ded
03:22:47<@arkiver>fireonlive: maybe, i expected more to burn down with higher interest rates. maybe that will come next year still as the rates stay somewhat high and companies need to refinance
03:22:59<@arkiver>TheTechRobo: sad :/
03:23:03<TheTechRobo>Yeah
03:23:16<fireonlive>https://support.fandom.com/hc/en-us/articles/7951865547671-August-2022-StrawPoll-me-closure
03:23:16<fireonlive>updated 2023-02-09, closed 2022-08
03:23:18<TheTechRobo>I did get a bunch of polls, but nowhere near everything :/
03:25:07<@arkiver>are they on IA?
03:25:53<fireonlive>https://nitter.net/StrawPollme
03:25:58<TheTechRobo>arkiver: i think? this was when I was very new to AT
03:25:59<fireonlive>their twitter kinda died lol
03:26:10<TheTechRobo>ah: https://archive.org/details/strawpoll-my-grab
03:26:22<h2ibot>TheTechRobo edited Strawpoll.me (+2, Update info): https://wiki.archiveteam.org/?diff=51408&oldid=49804
03:26:24<fireonlive>apparently they were having technical issues? and i guess didn't want to spend resources on fixing it
03:26:35<TheTechRobo>fireonlive: lol
03:26:58<fireonlive>https://old.reddit.com/r/NoStupidQuestions/comments/rzwnpk/is_strawpollme_going_to_be_broken_forever/
03:27:19<fireonlive>i vaguely remember others saying not to use the '.me' version as well
03:28:44<fireonlive>ooh, dark IA items with poll data :3
03:29:26<TheTechRobo>fireonlive: Yeah, not sure what's up with that
03:41:49<@arkiver>TheTechRobo: how did you archive it?
03:41:57<@arkiver>meaning how was the WARC created
03:43:29<fireonlive>looks like v1.20.3-at of https://github.com/archiveteam/wget-lua
03:43:39<fireonlive>from https://git.thetechrobo.ca/TheTechRobo/strawpoll-grab/src/branch/master/get-wget-lua.sh anyhow
03:45:20<fireonlive>https://git.thetechrobo.ca/TheTechRobo/strawpoll-grab/src/branch/master/pipeline.py#L161
03:50:56<@arkiver>thanks
03:51:11<@arkiver>TheTechRobo: i've moved the strawpoll item to archiveteam-fire, it will soon be in the Wayback Machine
03:51:43<fireonlive>i have my own collection?
03:51:47<fireonlive>:D
03:52:07<fireonlive>also sweet news :)
03:53:21<@arkiver>hah i guess so :)
03:53:54<fireonlive>:3
04:00:55<TheTechRobo>arkiver: Holy shit lmao
04:04:21<@arkiver>TheTechRobo: ?
04:05:04<TheTechRobo>arkiver: My shitty code made it into the WBM! :P
04:05:56<@arkiver>well as long as the records are fine, it should be good :)
04:06:40<TheTechRobo>I don't even think Wget-AT *lets* you write invalid records :P
04:07:00<TheTechRobo>Well, I guess you could override DNS
04:07:02<@arkiver>yeah :)
04:07:09<@arkiver>for DNS yes i guess
04:07:11<TheTechRobo>But you could do that anyway
04:08:51<TheTechRobo>Wget-AT is amazing
04:09:00<TheTechRobo>Wget-AT++
04:09:01<eggdrop>[karma] 'Wget-AT' now has 2 karma!
04:09:16<@arkiver>thanks :)
04:09:21<@arkiver>many improvements coming up!
04:09:35<fireonlive>=]
04:09:41DogsRNice quits [Read error: Connection reset by peer]
@arkiver is preparing a response to the recent responses from the TLS working group on our proposed MIME types and URIs for SSL/TLS
04:10:58<fireonlive>good luck with those IETF types
04:11:25<TheTechRobo>I'd also suggest adding some sort of unit testing
04:12:02<fireonlive>i'm sure it's on his mind
04:12:11<TheTechRobo>Yeah
04:26:40<@arkiver>thanks...
04:44:08<fireonlive>🆕 !tell now supports hostmasks (nick!user@host) e.g. !tell *!*@balls.example hello
04:44:21<fireonlive>(with wildcards)
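(Aside: wildcard hostmask matching of the kind announced above can be sketched with Python's fnmatch; the mask and hostmask below are just the example from the announcement, and this is an illustration, not the bot's actual code.)

```python
# Minimal sketch of wildcard hostmask matching (nick!user@host, with * and ?).
# Illustrative only; not the actual !tell implementation.
from fnmatch import fnmatch

mask = "*!*@balls.example"            # the example mask from above
hostmask = "nick!user@balls.example"  # a concrete nick!user@host

print(fnmatch(hostmask, mask))  # True: the mask matches this hostmask
```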
04:53:58Megame quits [Client Quit]
05:20:26<Ryz>Mmm, welp, from a random check of links to ignore on some ArchiveBot jobs, sadly https://forum.mobilelegends.com/ shut down earlier this year, on April 30
05:20:50<Ryz>...I don't think we have many people in the mobile game area of things who feel strongly about it :c
06:12:19benjins joins
06:12:55benjins2_ joins
06:13:43benjins2 quits [Ping timeout: 272 seconds]
06:13:43benjinsm quits [Ping timeout: 272 seconds]
07:16:37Island quits [Read error: Connection reset by peer]
07:53:23Megame (Megame) joins
08:04:48hitgrr8 joins
08:48:01eggdrop is now known as eggdrop1930
08:48:44eggdrop1930 is now known as eggdrop
09:32:12c3manu (c3manu) joins
10:00:00Bleo18260 quits [Client Quit]
10:01:19Bleo18260 joins
10:21:18Arcorann quits [Ping timeout: 265 seconds]
10:22:45Arcorann (Arcorann) joins
10:38:27Kitty quits [Ping timeout: 272 seconds]
10:38:40Kitty (Kitty) joins
11:27:20icedice (icedice) joins
11:35:48icedice quits [Client Quit]
11:59:44qwertyasdfuiopghjkl quits [Remote host closed the connection]
12:01:40benjinsm joins
12:04:50benjins quits [Ping timeout: 240 seconds]
12:15:59parfait quits [Ping timeout: 272 seconds]
12:24:39ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
12:25:29ThetaDev joins
12:39:03kiryu_ joins
12:41:20kiryu quits [Ping timeout: 240 seconds]
12:55:03icedice (icedice) joins
13:04:20Arcorann quits [Ping timeout: 240 seconds]
13:28:17magmaus3 quits [Quit: :3]
13:30:59magmaus3 (magmaus3) joins
14:18:33Hackerpcs quits [Quit: Hackerpcs]
14:26:43Hackerpcs (Hackerpcs) joins
14:27:44atphoenix quits [Remote host closed the connection]
14:28:28atphoenix (atphoenix) joins
14:50:02Megame quits [Client Quit]
14:58:48yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/]
15:02:12BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
15:02:17yano (yano) joins
15:06:27RealPerson joins
15:24:07qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:28:03RealPerson leaves
16:00:14jacksonchen666 is now known as RJHacker56722
16:00:18jacksonchen666 (jacksonchen666) joins
16:03:29RJHacker56722 quits [Ping timeout: 250 seconds]
16:07:07jacksonchen666 quits [Remote host closed the connection]
16:09:02kiryu_ quits [Client Quit]
16:10:16kiryu joins
16:10:16kiryu quits [Changing host]
16:10:16kiryu (kiryu) joins
16:11:05jacksonchen666 (jacksonchen666) joins
16:16:39itachi1706 quits [Ping timeout: 272 seconds]
16:20:23jacksonchen666 quits [Ping timeout: 250 seconds]
16:21:14RealPerson joins
16:21:36itachi1706 (itachi1706) joins
16:36:43RealPerson leaves
16:42:00BearFortress joins
16:45:55jacksonchen666 (jacksonchen666) joins
16:47:02User joins
18:09:45<SketchCow>Hi Jason,
18:09:45<SketchCow>Sorry to bother you, but Joe Baugher has died. He wrote up so much about
18:09:45<SketchCow>aviation throughout the years and his articles are invaluable. Would you
18:09:45<SketchCow>mind asking the Archive team to archive his home page one last time?
18:09:45<SketchCow>https://www.joebaugher.com/
18:09:47<SketchCow>All the best,
18:09:50<SketchCow>Chris
18:41:13User5 joins
18:43:18RealPerson joins
18:43:29User quits [Ping timeout: 265 seconds]
18:43:35icedice quits [Client Quit]
18:44:25<c3manu>SketchCow: I’m not jason, but i think this is something we can do :)
18:45:45<c3manu>oh wait. you're jason >.<
18:46:51Lord_Nightmare quits [Client Quit]
18:47:08<c3manu>it’s queued :)
18:49:57<Nulo|m>is there something easy to quickly (multi-connection) download a list of urls (without following links) into a warc?
18:51:48<c3manu>Nulo|m: i’m not as experienced as other users here (who might have better answers for you), but you should just be able to use wget for that
18:52:04<Nulo|m>i guess i just have to make a script to run many wgets, right?
18:52:16<Nulo|m>also i can't find a wget flag to avoid saving files to disk when i'm already writing them into a warc
18:52:39Lord_Nightmare (Lord_Nightmare) joins
18:53:54<c3manu>Nulo|m: well, it has to download them, but there's the --delete-after flag which gets rid of them once they're in the warc
18:54:33<Nulo|m>thanks!
18:54:35<c3manu>wget has a --background mode, but i would assume they then cannot write into the same warc file
18:54:47<c3manu>if you have multiple running i mean
18:56:15<c3manu>there's also wpull, a wget fork, which archivebot also uses to download things. that one supports concurrency, but depending on your python version it might be a little fiddly to set up: https://github.com/ArchiveTeam/wpull
18:57:01<c3manu>correction: it's not a fork, just another tool. my bad
18:57:39<c3manu>what do you need the warc for, if i may ask?
18:58:00<Nulo|m>i'm downloading product pages to then scrape them offline
18:58:51<c3manu>why the warc then, and not just the pages themselves?
18:59:27<Nulo|m>because if i need to pull more info later that i wasn't scraping before, i can still just pull from the warc
18:59:46<Nulo|m>also my scraper is kind of hacky so if it's bad i can just re-run it on the WARCs
19:00:05<c3manu>i see, that makes sense.
19:00:07<Nulo|m>also i should be able to run the scraper on WARCs from archive.org or other sources :)
19:02:51<c3manu>ok. apart from wpull i am running out of ideas. hopefully someone else can give you better answers when they're back :)
19:02:58<c3manu>did you have a look at https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem already?
19:04:13<Nulo|m>no, thank you!
19:04:22<Nulo|m>i think i'll make a script based on wget though
19:04:49<fireonlive>there’s wget-at too :)
19:05:24<Nulo|m>yah but wget works fine for me and i believe wget-at doesn't have multi-connection, just improved warc stuff?
19:05:40<c3manu>ah, that would probably be the fork that i confused wpull with earlier
19:06:19<fireonlive>improved warc stuff sounds pretty paramount :3
19:07:05<Nulo|m>hehe but the warcs generated by gnu wget work fine with warcio.js which is what i'm using so 👍️
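(Aside: the "script to run many wgets" idea discussed above might look something like the Python sketch below. Each parallel wget writes its own WARC, since concurrent processes can't share one WARC file; urls.txt, the job count, and the output names are assumptions. wget doesn't follow links unless told to recurse, which matches the "without following links" requirement.)

```python
# Hedged sketch: fetch a URL list into WARCs with several parallel wgets.
# urls.txt, JOBS, and the output names are assumed, not from the log.
import subprocess
from pathlib import Path

URLS = Path("urls.txt").read_text().splitlines()
JOBS = 4  # number of concurrent wget processes (assumed)

procs = []
for n in range(JOBS):
    chunk = URLS[n::JOBS]  # round-robin split of the URL list
    listfile = Path(f"chunk_{n}.txt")
    listfile.write_text("\n".join(chunk) + "\n")
    procs.append(subprocess.Popen([
        "wget",
        "--input-file", str(listfile),
        f"--warc-file=products_{n}",  # wget appends .warc.gz itself
        "--delete-after",             # drop plain copies; WARC keeps the records
        "--no-directories",
        "--no-verbose",
    ]))

for p in procs:
    p.wait()
```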
19:29:54RealPerson leaves
19:53:16RealPerson joins
19:55:25c3manu quits [Remote host closed the connection]
20:06:49RealPerson leaves
20:08:50DogsRNice joins
20:24:54RealPerson joins
20:30:56RealPerson leaves
20:34:49BlueMaxima joins
20:37:39Ruthalas59 quits [Quit: END OF LINE]
20:38:00Ruthalas59 (Ruthalas) joins
20:42:43nicolas17 is now known as nicolas17_bot
20:53:19<@JAA>I'm averaging 5k OneHallyu topics per hour now. They went read-only at 2023-12-20T11:23Z or so (date of the last post by an admin). If they shut it down at the same time of day, I expect to have covered about 81% of the topics.
20:54:09<nicolas17_bot>more parallelism/IPs unlikely to help?
20:54:27<@JAA>Their potato is too slow.
20:54:39<@JAA>6 second average response time.
20:59:45<@JAA>Let's see what happens if I throw more at it...
21:00:19Barto observes an explosion in the horizon
21:00:41<nicolas17_bot>also try less, if there's resource contention on the server it could have weird effects
21:02:15<@JAA>Can't easily go to less, but yeah, I might if this makes it worse.
21:02:33<nicolas17_bot>("half the threads, 2 second response time" would be a net win, though unlikely)
21:02:34<@JAA>Average response time now: 8404 ms ._.
21:02:39Ruthalas59 quits [Client Quit]
21:02:40<fireonlive>x_x
21:02:42nicolas17_bot is now known as nicolas17
21:03:14<@JAA>Throughput still went up a bit though.
21:04:00<nicolas17>hm how's your network-layer latency to their server?
21:04:51<@JAA>They hide behind Buttflare, so no idea.
21:05:08<nicolas17>oh :|
21:05:33<nicolas17>that latency is also irrelevant if they're in CF
21:06:33<@JAA>Depends on what their backend looks like, but the point is rather that I can't measure it anyway.
21:07:11<nicolas17>if there wasn't CF, doing the crawl from somewhere closer could help
21:07:51<@JAA>Possibly, although it can usually be balanced by higher concurrency.
21:22:50User joins
21:25:24User5 quits [Ping timeout: 265 seconds]
21:54:22Ruthalas59 (Ruthalas) joins
21:54:30<@JAA>I'm back down to the same throughput from before I increased the concurrency.
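(Aside: the numbers in this exchange roughly fit Little's law, throughput ≈ concurrency / mean response time. The concurrency figures below are assumptions chosen to match the observed ~5k topics/hour, not values stated in the log.)

```python
# Back-of-the-envelope Little's law check: throughput = concurrency / latency.
# Concurrency values are assumed; the response times and ~5k/hour are from above.
def topics_per_hour(concurrency: float, mean_response_s: float) -> float:
    return concurrency / mean_response_s * 3600

print(topics_per_hour(8, 6.0))   # ~4800/hour at a 6 s mean response time
print(topics_per_hour(11, 8.4))  # ~4714/hour: more workers, slower server,
                                 # roughly the same throughput
```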
22:06:34Island joins
22:24:42RealPerson joins
22:31:41User quits [Client Quit]
22:34:38parfait (kdqep) joins
22:38:19RealPerson leaves
22:39:00User joins
23:21:34<tech234a>“Bluesky makes web view public, login no longer required to read posts” https://news.ycombinator.com/item?id=38739130
23:21:58Island quits [Read error: Connection reset by peer]
23:22:38<fireonlive>nicolas17: feel free to use #fire-spam for testing
23:23:03<fireonlive>everyone got to witness the bee movie so what's a bit more :p
23:24:37Island joins
23:32:37Arcorann (Arcorann) joins
23:45:50aninternettroll quits [Ping timeout: 240 seconds]
23:50:24tzt quits [Remote host closed the connection]
23:50:47tzt (tzt) joins