00:05:27<fireonlive>oof
00:08:52AmAnd0A quits [Read error: Connection reset by peer]
00:09:07AmAnd0A joins
00:39:10qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
00:47:07qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
00:55:56qwertyasdfuiopghjkl is now known as qwertyasdfuiopghjkl_
00:56:15qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
01:01:03qwertyasdfuiopghjkl quits [Client Quit]
01:01:47qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
01:03:40qwertyasdfuiopghjkl_ quits [Client Quit]
01:08:30<h2ibot>FireonLive edited Mailman2 (+44, Add CA/Browser Forum): https://wiki.archiveteam.org/?diff=50159&oldid=50149
01:32:00TheTechRobo quits [Client Quit]
01:38:37qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
01:41:37qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
01:51:22floridaexists joins
01:51:34floridaexists quits [Remote host closed the connection]
02:54:05TheTechRobo (TheTechRobo) joins
02:54:17icedice quits [Client Quit]
03:46:05dumbgoy quits [Ping timeout: 252 seconds]
04:02:39dumbgoy joins
04:05:32sec^nd quits [Remote host closed the connection]
04:05:50sec^nd (second) joins
04:11:04<h2ibot>PaulWise created Bugzilla (+3994, add project to archive bugzilla instances): https://wiki.archiveteam.org/?title=Bugzilla
04:11:39<pabs>JAA: ^
04:11:43<fireonlive>pabs: 👍
04:12:13pabs just airing out his todo/archive-* lists :)
04:12:26<pabs>hope other folks can/want to help with them :)
04:13:19<fireonlive>:)
04:16:04<nicolas17>that's always tricky wrt cooperative instances
04:17:13<pabs>hm?
04:17:14<nicolas17>like, I *could* give you a DB dump of the KDE forum and you avoid having to scrape it, but it would include private messages, so I would need to figure out what tables to exclude
04:17:52<pabs>scraping is probably better anyway so it ends up in the WBM?
04:18:13<nicolas17>same for bugzilla, there's private tickets sometimes
04:18:17<nicolas17>yeah true
04:18:20<pabs>there are similar issues with GitLab/etc instances too
04:18:33<nicolas17>guess the most helpful thing there is an admin providing IDs then
04:18:57<pabs>the buglist.cgi search on the page can handle that I think
04:19:09<nicolas17>and if I don't bother filtering out stuff and give you the ID of a private ticket, you can't fetch that anyway
04:19:34<nicolas17>pabs: I meant more broadly (IDs of forum posts, gitlab project list, etc)
04:19:45<pabs>ack yeah
04:21:02asda joins
04:21:41<nicolas17>and that reminds me I should update https://archive.org/details/kde-git-repositories
04:22:11<fireonlive>incoming shit
04:22:23dumbgoy quits [Ping timeout: 252 seconds]
04:22:42<pabs>are KDE git repos on SWH or the TODO for Codearchiver?
04:23:06<h2ibot>FireonLive edited Discourse (+360489, Add in uncategorized forums that don't require…): https://wiki.archiveteam.org/?diff=50161&oldid=50148
04:23:09<fireonlive>there it is
04:23:31<fireonlive>i don't love it but also don't want to lose it 🤷
04:23:44<pabs>hmm, only KDE phabricator on https://archive.softwareheritage.org/coverage/
04:23:48<fireonlive>i like pabs' layout more but not my page
04:24:28<nicolas17>"+360489" wow
04:24:43<fireonlive>i guess i coulda manually visited all 4k links myself :D
04:25:07<fireonlive>it'd have to be like right after a certain something in the day
04:26:07<h2ibot>Pokechu22 edited Bugzilla (+35, /* Archived */…): https://wiki.archiveteam.org/?diff=50162&oldid=50160
04:26:10<fireonlive>watch next, wherein fireonlive edits 4 TiB into the wiki to hold some personal backups
04:26:52<nicolas17>pabs: when I offered stuff to softwareheritage they were in "we're busy getting started and archiving stuff from big sites like github" mode and would get to custom stuff later
04:27:06icedice (icedice) joins
04:27:29<pabs>nicolas17: they now have a self-service(ish) thing for archiving gitlab and other forge types
04:27:31<fireonlive>pabs: should there be a section for dead bugzillas?
04:27:31<nicolas17>then it seems 7 years passed and they didn't bother contacting KDE? time flies
04:27:46<pabs>https://archive.softwareheritage.org/add-forge/request/
04:27:49<nicolas17>https://wiki.softwareheritage.org/wiki/Suggestion_box:_source_code_to_add/KDE
04:28:26<pabs>yeah, I sense they are not well organised or under-resourced technically
04:28:55<fireonlive>they still use svn :o
04:28:58<pabs>er better link https://archive.softwareheritage.org/add-forge/request/list/
04:29:07<h2ibot>FireonLive edited Bugzilla (+37, add The Document Foundation): https://wiki.archiveteam.org/?diff=50163&oldid=50162
04:29:17<nicolas17>I was almost expecting to find "freenode" mentioned in https://wiki.softwareheritage.org/wiki/IRC_channels :P
04:29:38<fireonlive>haha
04:29:46<pabs>ah, I already submitted https://invent.kde.org/ there, it is pending on them contacting the KDE folks though
04:30:42<fireonlive>i find it interesting they ask for random gitlab (gitea/etc) instances but not for users' github (or gitlab.com?) repos
04:30:53<pabs>they archive all of github
04:30:56<fireonlive>is it just because of potential costs i wonder or something else
04:31:06<pabs>and gitlab.com and many other gitlab sites
04:31:16<pabs>fireonlive: re dead bugzillas, yeah probably, for folks to look up old archives in the WBM?
04:31:19<fireonlive>ye but they stop to ask KDE 'can we' first
04:31:25<fireonlive>pabs: ye i was thinking so
04:31:34<fireonlive>versus just everyone on github
04:31:41<pabs>right
04:31:56<fireonlive>wonder why the difference
04:32:11<nicolas17>fireonlive: if they stop to ask KDE "can we svnmirror your entire SVN repository", we'll tell them "no, we can just send you a tarball!"
04:32:14<pabs>maybe in case they overload the sites?
04:32:19<fireonlive>ah perhaps
04:32:31<fireonlive>nicolas17: true in that case :)
04:33:27<fireonlive>https://wiki.softwareheritage.org/wiki/IRC#IRC_access_list pffft no groupserv
04:33:53<fireonlive>oh i guess it's channel-based
04:34:14<fireonlive>so i'll allow it lol
04:34:18<nicolas17>pabs: I originally created the kde-git-repositories item on archive.org when some Russian devs were worried about Internet blockages, or depeering strongly affecting their bandwidth, and this way they could use bittorrent
04:46:40Island quits [Read error: Connection reset by peer]
05:00:38nicolas17 quits [Client Quit]
05:06:23nicolas17 joins
05:07:01nicolas17 quits [Client Quit]
05:14:55<fireonlive>russians? in MY kde? it's more likely than you think!
05:18:03asda quits [Ping timeout: 265 seconds]
05:44:33<pabs>https://techcrunch.com/2023/07/10/vanmoof-the-e-bike-darling-skids-off-track-sales-paused-execs-depart/
05:58:14hitgrr8 joins
06:08:00<Barto>pabs: ab goes brr
06:16:16<fireonlive>brrrrrrrrr
06:50:12BigBrain (bigbrain) joins
06:53:46BigBrain_ quits [Ping timeout: 245 seconds]
06:54:24Miori quits [Quit: The Lounge - https://thelounge.chat]
07:03:22BlueMaxima quits [Read error: Connection reset by peer]
07:04:58yts98 leaves
07:05:00yts98 joins
07:09:02pabs quits [Ping timeout: 252 seconds]
07:09:35pabs (pabs) joins
07:34:45Arcorann (Arcorann) joins
08:02:41IDK (IDK) joins
08:07:17Miori joins
08:10:36qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
08:24:41VickoSaviour joins
08:24:56nulldata quits [Ping timeout: 252 seconds]
08:25:34<VickoSaviour>progaming.ba forum is up for a limited amount of time
08:26:39<VickoSaviour>even tho it's wall-locked, i have the username and password to get all of the files
08:27:11nulldata (nulldata) joins
08:27:46Doomahol1 quits [Read error: Connection reset by peer]
08:28:50<VickoSaviour>just send me pm on hackint
08:30:21Doomahol1 joins
09:24:41tbc1887 quits [Read error: Connection reset by peer]
09:40:39<@OrIdow6>Why would replayweb.page say that a URL is in the WARC when listing requests, but claim it wasn't found when I try to view it?
09:42:31<@OrIdow6>Whatever, record's in the file
09:47:47<pabs>Barto: another one for you https://simpleflying.com/wisk-aero-boeing-subsidiary/ :)
09:54:08Jake quits [Quit: Leaving for a bit!]
09:54:26Jake (Jake) joins
10:10:57VickoSaviour quits [Remote host closed the connection]
10:12:18<@rewby>tzt, fireonlive, JAA, arkiver: I got a (probably non-exhaustive) list of domains hosted by the (soon to shut down) FutureQuest: https://transfer.archivete.am/rgTXc/domains.txt
10:12:28dx joins
10:13:06IDK quits [Client Quit]
10:14:02<dx>hey! do you have any graceful ways to handle the thing where phpbb forums add &sid=hash to every link? archive.org seems to struggle with it, every thread link here goes nowhere: https://web.archive.org/web/20230402125320/https://www.freestompboxes.org/viewforum.php?f=1&sid=d29688f6831c923e7a7ec107ad150803
10:36:56TastyWiener952 quits [Ping timeout: 252 seconds]
10:37:23TastyWiener95 (TastyWiener95) joins
10:41:49monoxane quits [Ping timeout: 258 seconds]
10:43:04monoxane (monoxane) joins
11:30:48sec^nd quits [Remote host closed the connection]
11:31:07sec^nd (second) joins
11:48:35DigitalDragons quits [Read error: Connection reset by peer]
11:48:43DigitalDragons4 (DigitalDragons) joins
12:04:32asdf quits [Ping timeout: 265 seconds]
12:07:54<masterX244>I think the ?archiveteam urlfudgery on archivebot crawls is there to suppress that
12:32:15T31M quits [Quit: ZNC - https://znc.in]
12:32:36T31M joins
12:41:07<thuban>fireonlive: How to patch KDE2 under FreeBSD?
12:54:50AmAnd0A quits [Ping timeout: 258 seconds]
12:54:58AmAnd0A joins
13:04:34nicolas17 joins
13:06:25IDK (IDK) joins
13:10:28<@OrIdow6>Wysp will be delayed another day, got sidetracked
13:17:43<imer>alright, keep us posted :)
13:33:39<@arkiver>rewby: nice! checking it out
13:33:55<thuban>fyi all: VickoSaviour is offline, but i am grabbing progaming.ba per some previous discussion
13:34:06<@arkiver>OrIdow6: do you have a channel name idea? :) i believe as idea here was posted before too
13:34:33<@arkiver>rewby: how did you collect this list?
13:41:14<@OrIdow6>arkiver: Not really, may be able to do something with will-o-the-wisps or whispers
13:43:36<@OrIdow6>Part of the issue is that the obvious puns are so straightforward as to be uncreative
14:00:26Arcorann quits [Ping timeout: 252 seconds]
14:07:01<@rewby|backup>arkiver: It's the list from the forward dns section of https://bgp.tools/prefix/69.5.0.0/19#dns (which in turn is certificate transparency logs and other magic that I don't recall)
14:07:24<@rewby|backup>Worth noting I didn't write it out manually, I asked the developer of the site to run a DB query for me
14:13:38BPCZ quits [Ping timeout: 252 seconds]
14:14:11nicolas17 quits [Ping timeout: 258 seconds]
14:18:02beario quits [Client Quit]
14:24:43<@JAA>dx: To expand on masterX244's reply: What we do is start the crawl from https://example.org/?archiveteam. That request sets the cookies, and then pages loaded after that won't have the sid params in links. It's a separate URL so that when the homepage gets loaded later, the cookies are already in place and browsing will work naturally. Once it's got a few URLs, we ignore any URL with an sid param. The
14:24:49<@JAA>'?archiveteam' suffix has no special meaning; it just has to be a unique URL so the actual homepage is retrieved with cookies later.
14:25:35<@JAA>This isn't perfect though. Eventually, the session cookie might expire, and then the crawl gets another page with sid param links, which would get ignored, so coverage might be slightly incomplete. Unless the forums are very broken, that shouldn't be a significant fraction though.
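JAA's cookie-priming trick can be sketched in Python using only the standard library. This is an illustrative sketch, not how ArchiveBot is actually implemented: the forum URL is hypothetical, and the 32-hex-char sid pattern is assumed from the phpBB URL quoted above.

```python
import re
import urllib.request
from http.cookiejar import CookieJar

# phpBB session ids are 32 hex characters (assumption based on the
# example URL above, e.g. sid=d29688f6831c923e7a7ec107ad150803).
SID_RE = re.compile(r'[?&]sid=[0-9a-f]{32}')

def has_sid(url):
    """True if the URL carries a phpBB session-id parameter (such URLs
    are ignored once the crawl has collected a few starting URLs)."""
    return SID_RE.search(url) is not None

def primed_opener(base):
    """Return an opener whose cookie jar was primed by first fetching a
    unique throwaway URL ('?archiveteam' has no special meaning server-
    side); pages fetched afterwards link without &sid=... appended."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    opener.open(base + '/?archiveteam')  # phpBB sets the session cookie here
    return opener
```

The throwaway URL matters because it must differ from the real homepage: that way, when the homepage itself is fetched later in the crawl, the cookies already exist and its links are clean.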
14:32:33beario joins
14:35:40Miori quits [Client Quit]
14:39:45rageear joins
14:43:48Miori joins
14:52:27hexa- quits [Quit: WeeChat 3.8]
14:53:41hexa- (hexa-) joins
15:01:15nighthnh099_ joins
15:01:42<dx>JAA, masterX244: thank you!
15:02:53<nighthnh099_>I have a massive list of urls (90K urls) for a website that might shut down any day now; not all of these exist, so I got a script that checked which gave back a status 200 and then mirrored those
15:03:16<nighthnh099_>when I ran the script, my computer started lagging and explorer did some very strange things so I had to restart my computer
15:03:48<nighthnh099_>can someone else run the script for me? after running the script, you can run a dir command and do a find and replace to turn the files it mirrored into urls
15:04:12<nighthnh099_>then whoever runs the script can just put it into a spreadsheet and let ia save the urls
15:08:14<phaeton>if you're still looking for channel name ideas, i propose #wispaway....wisp away is semi-commonly misused instead of whisk away which means to take away suddenly
15:12:04<@JAA>nighthnh099_: We have our own tooling that can archive things much more efficiently and quickly than feeding to IA. I can take a look. Which site is it? And please upload the list to https://transfer.archivete.am/ .
15:14:49<nighthnh099_>https://transfer.archivete.am/2mctU/urls.txt the urls start at 4000 because that's as far as I got before I had to restart; basically the urls are a bunch of game scripts for an app, not all of the urls exist though; I might need help with finding the upper limit of the list because I forgot to do that
15:18:54<@JAA>Yeah, the upper limit is definitely higher.
15:21:26nicolas17 joins
15:32:11<@JAA>Quickly poked the APK but didn't see anything of relevance. Might need DEX decompiling.
15:35:00<nighthnh099_>I already did all of that
15:35:22<nighthnh099_>oh wait do you need the script I said? sorry I forgot to ask
15:36:41<@JAA>No need, I'll run the list through ArchiveBot. But need to find the upper bound first.
15:37:37<nighthnh099_>archivebot skips 404s?
15:39:30<@JAA>No, they'll just get archived as well.
15:39:59pabs quits [Ping timeout: 252 seconds]
15:40:11<nighthnh099_>oh, that's kinda messy haha
15:40:26<@JAA>Well, depends on how you look at it.
15:40:34<@JAA>Archiving them records that they didn't exist.
15:40:55<@JAA>Whereas if you only archive the ones that exist, a future archaeologist won't know whether they were simply missed.
15:41:26<nighthnh099_>oh, my reasons for not archiving them would be it's hard to filter through them when someone in the future decides to make a local server for the game
15:42:04<@JAA>It's trivial to filter that out.
15:42:29<nighthnh099_>oh how? I don't know haha
15:42:31<@JAA>Especially when you work with the WARC file ArchiveBot will produce.
15:42:49pabs (pabs) joins
15:43:22<@JAA>Well, the tooling for it is currently suboptimal, but it can be done with warcio and a 10-line Python script or so.
15:43:45<@JAA>It'll be easier once I finish the thing I've been working on for far too long now.
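JAA mentions warcio plus a ~10-line script; as a dependency-free illustration of the same filtering idea, here is a minimal sketch that walks an *uncompressed* WARC and drops 404 response records. The parsing is deliberately simplified (no gzip, no header continuation lines) — warcio handles all of that properly.

```python
def iter_warc_records(stream):
    """Yield (headers, block, raw_bytes) for each record in an
    uncompressed WARC stream. Simplified: no continuation lines."""
    while True:
        line = stream.readline()
        if not line:
            return
        if line.strip() == b'':
            continue  # blank separator lines between records
        header_lines = [line]  # the WARC/1.x version line
        headers = {}
        while True:
            line = stream.readline()
            header_lines.append(line)
            if line.strip() == b'':
                break
            name, _, value = line.decode('utf-8', 'replace').partition(':')
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers['content-length']))
        yield headers, block, b''.join(header_lines) + block + b'\r\n\r\n'

def is_404_response(headers, block):
    """True for HTTP response records whose status code is 404."""
    if headers.get('warc-type') != 'response':
        return False
    parts = block.split(b'\r\n', 1)[0].split()  # e.g. HTTP/1.1 404 Not Found
    return len(parts) >= 2 and parts[1] == b'404'

def filter_404s(src, dst):
    """Copy every record except 404 responses from src to dst."""
    for headers, block, raw in iter_warc_records(src):
        if not is_404_response(headers, block):
            dst.write(raw)
```

With warcio the same loop is ArchiveIterator in and WARCWriter out, and it copes with compressed WARCs like the ones ArchiveBot produces.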
15:43:59<@JAA>Anyway... so how do we find the upper limit?
15:44:56<@JAA>Actually, I just checked 100000 to 100099, no hits there, so I'll do up to 100k.
15:45:22IDK quits [Client Quit]
15:48:57Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ]
15:49:14<@JAA>It's running, current ETA is 5-6 hours.
15:49:34Shjosan (Shjosan) joins
15:52:09<nighthnh099_>JAA: I think 97017 is the upper limit
15:52:52<nighthnh099_>thanks for running it! also worth noting that it needs to be http, not https; everything is 404 on https for some reason
15:53:01<@JAA>Yeah, I noticed.
15:53:12<@JAA>Just another badly configured web server. :-)
15:54:57<nighthnh099_>oh wait a second, the mention on the site of a shut down is just the name of a story someone uploaded
15:55:09<nighthnh099_>well doesn't make anything less urgent I guess
15:55:26<nighthnh099_>the app itself has been gone since 2020 so the site could shut down any day now
15:56:25<@JAA>Yeah, given how small it is, no reason not to archive it anyway.
16:00:07dumbgoy joins
16:04:11<nighthnh099_>JAA: I have to log out now so I guess I'll just see those urls in the CDX at some point?
16:05:07<nighthnh099_>also maybe you can zip up the files it mirrored and send it to me? I want a copy myself haha
16:05:26<nighthnh099_>will probably just ping once I open irc again
16:05:29<fireonlive>thuban: :3
16:10:03<@JAA>nighthnh099_: Yes, they'll appear in the WBM eventually. The WARCs will be listed at https://archive.fart.website/archivebot/viewer/job/61ha7 eventually. We don't produce plain files, so I can't simply create a ZIP for you.
16:10:38<nighthnh099_>oh wait I wasn't joined to archivebot, oh okay
16:11:18<masterX244>warcat allows to "unpack" WARCs though if you need the plain files inside
16:11:28<nighthnh099_>thanks
16:11:48<@JAA>Yeah, not sure how that would handle the 404s though.
16:28:42DigitalDragons4 is now known as DigitalDragons
16:38:27nighthnh099 joins
16:42:08nighthnh099_ quits [Ping timeout: 252 seconds]
16:44:27nighthnh099 quits [Ping timeout: 258 seconds]
17:10:19<Barto>pabs: that thing definitely goes brr too
17:16:31<kiska>RIP LBRY? https://twitter.com/LBRYcom/status/1678866789407551489
17:18:24<kiska>Not sure how much content there is to save from this
17:20:38icedice quits [Ping timeout: 252 seconds]
17:28:19BPCZ (BPCZ) joins
17:28:19<fireonlive>“30,000,000 pieces of content” interesting… hm. it’s blockchain stuff so idk lol
17:34:22<nicolas17>why should we worry, it's decentralized right? :P
17:35:46<fireonlive>🤐
17:36:18<nicolas17>(it's probably centralized and only using blockchain for regulation evasion purposes)
17:38:11<fireonlive>i seem to recall public companies just changing their names to like include AI or blockchain and their stock prices just shooting up instantly
17:38:14<fireonlive>semi related lol
17:42:15chessnoob280 joins
17:49:14Icyelut|2 quits [Ping timeout: 258 seconds]
18:04:57chrismeller3 (chrismeller) joins
18:05:00chrismeller3 quits [Client Quit]
18:07:19chrismeller3 (chrismeller) joins
18:18:56eroc1990 quits [Ping timeout: 252 seconds]
18:26:02AmAnd0A quits [Ping timeout: 258 seconds]
18:26:25AmAnd0A joins
18:38:20eroc1990 (eroc1990) joins
19:22:34mtmustski quits [Quit: Ping timeout (120 seconds)]
19:22:51mtmustski joins
19:30:54mtmustski quits [Client Quit]
19:31:11mtmustski joins
19:42:15FavoritoHJS joins
19:42:29<FavoritoHJS>this is not a drill, we have a dying site https://forums.terraria.org/index.php?threads/gfycat-shutting-down-this-september.120070/
19:42:46pokechu22 quits [Quit: Updating weechat]
19:44:42<FavoritoHJS>also appears twitter no longer requires account login for seeing posts? if so, i guess a warrior project is once again possible
19:45:15<nicolas17>only individual posts afaik
19:45:28<nicolas17>you can't see replies to them or what it's replying to
19:45:39<FavoritoHJS>still better than the nothing that was there before
19:45:39<fireonlive>#deadcat for Gfycat
19:49:19pokechu22 (pokechu22) joins
19:49:24pokechu22 quits [Client Quit]
19:51:52pokechu22 (pokechu22) joins
20:14:24Naruyoko5 joins
20:16:05Naruyoko quits [Ping timeout: 252 seconds]
20:17:07Naruyoko joins
20:18:59Naruyoko5 quits [Ping timeout: 265 seconds]
20:50:46<@arkiver>JAA: those domains rewby|backup found - do you think AB is enough for that?
20:53:19<@JAA>1555 domains, might be feasible, but not sure.
20:53:31<@JAA>Ryz has been feeding domains from that platform in, I think?
20:53:37<@JAA>I haven't been paying a whole lot of attention.
20:54:59<@arkiver>looks like these sites may not be very large?
20:55:20<@arkiver>upcoming projects this month are:
20:55:27<@arkiver>Wysp ( OrIdow6 )
20:55:30<@arkiver>Skyblog
20:55:35<@arkiver>Stitcher
20:55:38<@arkiver>Xuite
20:55:40<@arkiver>and Gfycat
20:56:04<@arkiver>if stitcher is not huge we'll get in with AB
20:57:00<@arkiver>OrIdow6: #wyspedaway for wysp
20:58:31<anarcat>i'm not sure how we can help with this, but https://github.com/grossartig/vanmoof-encryption-key-exporter
20:58:48<anarcat>"The Bluetooth connection between your smartphone and your VanMoof is encrypted for security purposes. Each time you log into your VanMoof account, this encryption key is being downloaded from VanMoof’s server."
20:59:09<anarcat>https://kolektiva.social/@phill@mastodon.notsobig.co/110701490653058697 "Little birdies tell me VanMoof has officially collapsed. They'll be making a statement shortly.
20:59:10<anarcat>If you own one of their bikes now is the time to grab your encryption keys before their servers go offline"
20:59:22<anarcat>isn't the future great?
21:01:55<murb>oh i recognise the shape of the bike, so i've probably seen them. but wasn't aware of the brand until now.
21:03:19<@arkiver>hah
21:03:32<@arkiver>sounds like all 'smart' things about that bike will soon stop functioning
21:03:54<murb>i wonder what smart things you need on a bike...
21:03:56lk quits [Ping timeout: 252 seconds]
21:04:04<flashfire42>A helmet?
21:04:05<murb>predictive braking?
21:04:08<flashfire42>an icecream container?
21:04:09<@JAA>Ah yes, the internet of shit.
21:04:10<flashfire42>on your head
21:04:22<murb>flashfire42: why would you need one of those?
21:04:30<murb>cycling is really quite safe.
21:04:30<flashfire42>Magpies
21:04:36<murb>flashfire42: avoid .au then.
21:04:41lk (lk) joins
21:04:44<flashfire42>Bit hard when I live there
21:05:17<murb>how i avoid being swooped,.. i live on another continent.
21:05:24<h2ibot>Yts98 created Games/Engines, Platforms and Hostings (+2012, Created page with "== Engines == *…): https://wiki.archiveteam.org/?title=Games/Engines%2C%20Platforms%20and%20Hostings
21:06:23<h2ibot>Yts98 edited Games (+90): https://wiki.archiveteam.org/?diff=50165&oldid=46613
21:06:38<@JAA>> The VanMoof S5 & A5 will just keep getting better. And better. Via over-the-air updates, we can continuously improve your bike long after your first ride. From the Halo Ring Interface to Hi-Vis Lights, this bike has revolution, built in.
21:06:42<@JAA>...
21:06:52Island joins
21:06:57<@JAA>Off to -ot for that I guess.
21:06:59<murb>i hope they'll change the tyres etc.
21:07:48<masterX244>smart shit is a PITA; or anything with firmware. (it's rare to find a firmware updater that allows local files instead of only connecting straight to the server, and for those that also allow local files: always backup those files)
21:12:44<Barto>#stallmanwasright
21:17:41lk quits [Ping timeout: 252 seconds]
21:18:18lk (lk) joins
21:31:27<h2ibot>FireonLive edited Current Projects (+38, add IRC channel for Wysp): https://wiki.archiveteam.org/?diff=50166&oldid=50156
21:32:28<h2ibot>FireonLive edited Wysp (+19, add IRC channel): https://wiki.archiveteam.org/?diff=50167&oldid=50158
21:35:27<murb>Barto: a stopped clock etc.
21:48:49hitgrr8 quits [Client Quit]
21:49:32<h2ibot>Yts98 edited Games/Engines, Platforms and Hostings (+263): https://wiki.archiveteam.org/?diff=50168&oldid=50164
21:55:46W7RFa6AbNFz_ quits [Read error: Connection reset by peer]
21:56:07W7RFa6AbNFz_ joins
22:13:14lk quits [Ping timeout: 252 seconds]
22:13:33lk (lk) joins
22:20:37icedice (icedice) joins
22:26:55FavoritoHJS quits [Client Quit]
22:35:35lk quits [Ping timeout: 258 seconds]
22:36:03lk (lk) joins
22:42:29<Ryz>Hello JAA and arkiver, I'm basing the archiving regarding FutureQuest run domains on what flashfire42 has fed me with https://bgp.tools/prefix/69.5.0.0/19#dns
22:42:53<@JAA>There is a more complete list, also from bgp.tools, see above.
22:43:24<@JAA>I can throw it all into queueh2ibot if it's suitable for that.
22:43:33<@arkiver>the one that rewby posted
22:43:44<h2ibot>Arkiver edited YouTube (+120, Change YouTube rules): https://wiki.archiveteam.org/?diff=50169&oldid=49723
22:43:47AmAnd0A quits [Remote host closed the connection]
22:43:55AmAnd0A joins
22:44:25<fireonlive>here's rewb\y's list via bgp.tools (thanks rewb\y!): https://transfer.archivete.am/rgTXc/domains.txt
22:44:44<h2ibot>Arkiver edited YouTube (+95): https://wiki.archiveteam.org/?diff=50170&oldid=50169
22:44:48<Ryz>JAA, I'm not too certain on running it automatically via queueh2ibot because I've encountered some oddball websites where it would need to be treated with that particular pipeline, and others are just geo-restricted
22:45:44<h2ibot>Arkiver edited YouTube (+7): https://wiki.archiveteam.org/?diff=50171&oldid=50170
22:45:45<h2ibot>Arkiver edited YouTube (+9, Fix formatting): https://wiki.archiveteam.org/?diff=50172&oldid=50171
22:45:59<@JAA>I mean, if you'd like to run the 1555 domains manually, that's also fine with me, but it's a lot of work.
22:46:40<Ryz>That is unfortunately true... ><;
22:52:46<h2ibot>FireonLive edited YouTube (+209, Update infoboxes): https://wiki.archiveteam.org/?diff=50173&oldid=50172
22:58:29chessnoob280 quits [Ping timeout: 265 seconds]
23:00:48<h2ibot>FireonLive edited YouTube (+20, use 2=YouTube to make infobox not so weird): https://wiki.archiveteam.org/?diff=50174&oldid=50173
23:01:38<Ryz>Hmm, how about this JAA, while I work on the one that flashfire42 gave me for now, queueh2ibot can process https://transfer.archivete.am/rgTXc/domains.txt
23:03:27useretail joins
23:09:36Carnildo_again joins
23:09:36Carnildo quits [Read error: Connection reset by peer]
23:09:57<nicolas17>so, how do I archive 5-10GB files such that they appear on WBM, with inter-URL payload deduplication?
23:10:42<@JAA>Ryz: Sure, if you give me that list to filter out duplicates.
23:10:45<@JAA>nicolas17: wget-at or qwarc
23:10:58<flashfire42>such as they appear in wbm
23:11:02<flashfire42>is the core issue
23:11:07<nicolas17>archivebot doesn't deduplicate, qwarc would work but then I need Approval:tm: to make my uploaded WARCs appear in WBM
23:11:14<@JAA>Right
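For context on the inter-URL deduplication nicolas17 asks about: WARC tools key it on the payload's SHA-1 digest (base32-encoded, per the WARC-Payload-Digest convention), and repeats are written as small revisit records pointing back at the first capture. A toy sketch of the digest store — the class and method names are illustrative, not wget-at's or qwarc's actual API:

```python
import base64
import hashlib

def payload_digest(body: bytes) -> str:
    """WARC-Payload-Digest convention: base32-encoded SHA-1."""
    return 'sha1:' + base64.b32encode(hashlib.sha1(body).digest()).decode()

class DedupStore:
    """Toy digest -> (url, date) map; real tools back this with a DB or
    the CDX API. First capture of a payload is stored as a full response
    record; later identical payloads become 'revisit' records."""
    def __init__(self):
        self._seen = {}

    def check(self, body, url, date):
        """Return None for a new payload (write a full record), or the
        (url, date) of the original capture (write a revisit record)."""
        digest = payload_digest(body)
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = (url, date)
        return None
```

This is why dedup matters for 5-10 GB files: every repeated payload after the first costs only a tiny revisit record instead of another multi-gigabyte copy.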
23:11:20<Ryz>Here it is JAA, what flashfire42 has fed me: https://bgp.tools/prefix/69.5.0.0/19#dns
23:11:33<Ryz>Which includes the odd URLs that for some reason end with '.' oo;
23:11:37lk quits [Ping timeout: 258 seconds]
23:11:48<fireonlive>those are “fully qualified”
23:11:50<@JAA>Technically, all domains end with a dot.
23:11:53<fireonlive>ye
23:12:21lk (lk) joins
23:13:04<@JAA>That has 1999 domains...?
23:13:51<h2ibot>Arkiver edited YouTube (+59, Allow archiving ads that are actually used as…): https://wiki.archiveteam.org/?diff=50175&oldid=50174
23:15:34icedice quits [Client Quit]
23:18:51<h2ibot>Arkiver edited YouTube (+0): https://wiki.archiveteam.org/?diff=50176&oldid=50175
23:19:04<pabs>anarcat: dunno if you have any bandwidth for AB jobs, but you might be interested in these new projects https://wiki.archiveteam.org/?title=Bugzilla https://wiki.archiveteam.org/?title=IRC/Logs also https://wiki.archiveteam.org/?title=Mailman2
23:20:15<@JAA>rewby: See above, I'm getting 1999 results on https://bgp.tools/prefix/69.5.0.0/19#dns , i.e. more than the 1555 in your list that comes directly from the DB? Something's not right there.
23:20:55<@JAA>No dupes with the trailing dot either.
23:23:03<@JAA>There's little overlap, too.
23:23:34<@JAA>1238 domains appear in the DB list but not on the page. 1683 domains appear on the page but not in the DB list.
23:23:52<h2ibot>Arkiver edited YouTube (+9): https://wiki.archiveteam.org/?diff=50177&oldid=50176
23:23:53<@JAA>So only a bit over 300 overlap.
23:25:01thenes quits [Ping timeout: 245 seconds]
23:25:26sec^nd quits [Ping timeout: 245 seconds]
23:25:26Ketchup901 quits [Ping timeout: 245 seconds]
23:26:21Ketchup901 (Ketchup901) joins
23:26:51icedice (icedice) joins
23:26:56betamax quits [Ping timeout: 252 seconds]
23:27:08thenes (thenes) joins
23:30:31sec^nd (second) joins
23:36:32yts98 leaves
23:37:35betamax (betamax) joins
23:38:48ats quits [Quit: inviting sharks]
23:53:35AmAnd0A quits [Read error: Connection reset by peer]
23:53:52AmAnd0A joins