00:09:53riteo quits [Ping timeout: 260 seconds]
00:10:14sighsloth1090 joins
00:10:34HP_Archivist (HP_Archivist) joins
00:15:09scurvy_duck joins
00:16:20<pokechu22>Junie: I'm not sure if you're already aware of this, but the main format for things we download is https://en.wikipedia.org/wiki/WARC_(file_format) which includes metadata such as HTTP headers. Some projects also produce logs in various forms but there isn't anything standard for it
00:16:43<Junie>Thank you! That'll help!
00:18:17<pokechu22>oh, and archive.org generates CDX files for each WARC that act as an index to them (https://archive.org/web/researcher/cdx_file_format.php) and can also be bulk queried (https://archive.org/developers/wayback-cdx-server.html)
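A minimal sketch of a bulk lookup against the Wayback CDX server pokechu22 links above; the endpoint is the documented one, while example.com and the chosen parameters are placeholders for illustration:

    # Captures for a single URL, returned as JSON rows (documented parameters: url, output, limit)
    curl 'https://web.archive.org/cdx/search/cdx?url=example.com&output=json&limit=10'
    # Prefix query over a whole domain/path, restricted to a date range
    curl 'https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&from=2020&to=2024&limit=10'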
00:19:55eroc1990 quits [Quit: Ping timeout (120 seconds)]
00:20:14eroc1990 (eroc1990) joins
00:21:30etnguyen03 quits [Client Quit]
00:27:48scurvy_duck quits [Ping timeout: 250 seconds]
00:34:58sighsloth1090 quits [Client Quit]
00:46:08riteo (riteo) joins
00:52:58scurvy_duck joins
00:53:07sighsloth1090 (sighsloth1090) joins
00:55:55Junie quits [Quit: Ooops, wrong browser tab.]
01:20:45Webuser113856 joins
01:22:38xmagik joins
01:23:11Webuser113856 quits [Client Quit]
01:24:48scurvy_duck quits [Client Quit]
01:25:24<xmagik>Does anyone have any resources for dumping a DDS-1 tape using modern-ish equipment?
01:36:09Hackerpcs quits [Quit: Hackerpcs]
01:39:44Hackerpcs (Hackerpcs) joins
01:41:34xkey quits [Quit: WeeChat 4.4.3]
01:42:36xkey (xkey) joins
01:44:55pixel leaves [Error from remote client]
01:45:22gust quits [Ping timeout: 250 seconds]
02:11:01sighsloth1090 leaves
02:12:22NeonGlitch (NeonGlitch) joins
02:13:12etnguyen03 (etnguyen03) joins
02:15:56nulldata (nulldata) joins
02:18:04<pabs>from the #sdf channel: <kmc> Urban Dead is shutting down. Another UK Online Safety Act victim. RIP.
02:18:44<pabs>https://www.urbandead.com/shutdown.html
02:32:17NeonGlitch quits [Client Quit]
02:45:17<pabs>how do I tell if the wpull db for an AB job has been uploaded yet? and where would I find it? looking for the one for https://archive.fart.website/archivebot/viewer/job/202411210253059dr05
02:50:44BornOn420 quits [Remote host closed the connection]
02:51:21BornOn420 (BornOn420) joins
02:52:13<monoxane>rewby it's called traffic engineering :P
03:00:48notarobot1 joins
03:02:11Webuser149129 joins
03:06:51<balrog>can we make sure everything related to e.g. https://efabless.com is archived? There's also https://github.com/efabless and the site has a lot of links to stuff like google sheets from pages such as https://efabless.com/shuttle-status ...
03:06:56<balrog>we already lost some subdomains
03:07:04<balrog>(the company recently went out of business)
03:09:32etnguyen03 quits [Client Quit]
03:10:21<pabs>website got saved https://archive.fart.website/archivebot/viewer/?q=efabless
03:10:38<pabs>GitHub got saved on #gitgud
03:10:54<TheTechRobo>pabs: I don't think AB uploads the DBs on its own, does it?
03:10:56<TheTechRobo>Cf. https://github.com/ArchiveTeam/ArchiveBot/issues/465
03:11:30<pabs>TheTechRobo: right, J_A_A does that manually
03:13:09<@JAA>pabs: No good method. You can use https://archive.org/details/archivebot?tab=collection&query=format%3AZSTANDARD&sort=-publicdate to find items that have a DB (probably, occasionally there might be other .zst files).
03:13:13<pabs>ahh I remember now, you have to search on IA for ZSTANDARD items IIRC
03:14:40<pabs>ok I will add this to the wiki
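A sketch of the same ZSTANDARD search in scriptable form, assuming the archive.org advancedsearch API accepts the same collection/format filters as the web UI query JAA posted:

    # List ArchiveBot items that (probably) carry a wpull DB, newest first
    curl 'https://archive.org/advancedsearch.php?q=collection%3Aarchivebot+AND+format%3AZSTANDARD&fl%5B%5D=identifier&sort%5B%5D=publicdate+desc&rows=100&output=json'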
03:16:18loug83181422 quits [Quit: The Lounge - https://thelounge.chat]
03:17:29BlueMaxima quits [Read error: Connection reset by peer]
03:19:35<h2ibot>PaulWise edited ArchiveBot (+370, tips: ignores, wpull databases): https://wiki.archiveteam.org/?diff=54521&oldid=54417
03:20:35<h2ibot>PaulWise edited ArchiveBot (+41, uploads are manual/infrequent): https://wiki.archiveteam.org/?diff=54522&oldid=54521
03:26:34etnguyen03 (etnguyen03) joins
03:34:59NeonGlitch (NeonGlitch) joins
04:07:29etnguyen03 quits [Remote host closed the connection]
04:31:17<pabs>-feed/#hackernews-firehose- Local mirrors of long-defunct historical websites https://retromirror.org/ https://news.ycombinator.com/item?id=43242455
04:35:05caylin quits [Remote host closed the connection]
04:35:24caylin (caylin) joins
04:37:17caylin quits [Client Quit]
04:37:39caylin (caylin) joins
04:53:01Webuser149129 quits [Client Quit]
05:45:39StarletCharlotte joins
05:52:25<StarletCharlotte>Hello. I'm currently trying to enumerate all buckets on GRH's bucket API using curl requests (I have a premium account so I can use the API). The API's output is in JSON and I would like to download each JSON the API outputs to [start number].json. The API itself uses this format for its URLs: https://buckets.grayhatwarfare.com/api/v2/buckets?start=[start number]&limit=1000 where start is the result you want to start at, 0 being the first result.
05:52:54<StarletCharlotte>How would I go about doing that? What kind of script would I need?
05:54:56<StarletCharlotte>oh yeah, it'd need to increment by 1000 each time...
05:56:41<StarletCharlotte>https://stackoverflow.com/questions/9289015/how-do-i-make-parts-of-a-url-to-increment-at-the-same-time-in-curl This seems promising, though I'd have to figure out how to get it to increment by 1000 each time up to about 430000.
05:56:56<StarletCharlotte>And I have never written a bash script before so...
05:58:28NeonGlitch quits [Client Quit]
06:01:00<pokechu22>StarletCharlotte: If you don't need the data from the previous request and just want to generate a large list of URLs, there's a bunch of ways you can do that (`seq <start> <end>` in bash is one way, or something simple in python). If you need to download the data to get the next one you're probably going to want to use python and requests (or any other language you're familiar with)
06:01:42<pokechu22>basically, the way I usually approach that is to get the numbers I want into a text document, and then I edit that text document into the format I need. Or I write a script in a language I'm familiar with to output all the numbers I care about
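A rough sketch of the seq-based approach pokechu22 describes, using the GRH endpoint from the question above; the auth header name and value are placeholders, not the API's documented ones:

    # Walk start=0,1000,...,430000 and save each page of results to <start>.json
    seq 0 1000 430000 | while read -r start; do
        curl --header 'Authorization: Bearer YOUR-KEY' \
             --output "${start}.json" \
             "https://buckets.grayhatwarfare.com/api/v2/buckets?start=${start}&limit=1000"
    done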
06:02:34<StarletCharlotte>@pokechu22: I am trying to download all of the data, though I am not familiar with either bash or Python. I'm currently looking into bash but I don't know if that's the best.
06:06:53<StarletCharlotte>https://www.geeksforgeeks.org/bash-scripting-until-loop/
06:07:02<StarletCharlotte>Okay this should help as well...
06:15:27SootBector quits [Remote host closed the connection]
06:15:52SootBector (SootBector) joins
06:16:02sparky14925 (sparky1492) joins
06:17:56sparky1492 quits [Ping timeout: 250 seconds]
06:17:57sparky14925 is now known as sparky1492
06:32:12<StarletCharlotte>Alright @pokechu22, I managed to bodge together a bash script that should do it. https://pastecode.io/s/89cnng9t
06:33:20<StarletCharlotte>the header is set to "nice try" because I'm not giving people my auth code for obvious reasons
06:36:01<@JAA>IIRC, you can use --output multiple times interleaved with URLs to specify a filename for each download. That lets you reuse the connection rather than starting a new process for each request.
06:36:37<StarletCharlotte>I uh... I am not sure what that means, I'll be honest
06:36:53<StarletCharlotte>This is my first time doing this.
06:37:20<@JAA>`curl --header '...' --output 0.json 'https://buckets.grayhatwarfare.com/api/v2/buckets?start=0&limit=1000' --output 1000.json 'https://buckets.grayhatwarfare.com/api/v2/buckets?start=1000&limit=1000' ...`
06:37:40<pokechu22>Doesn't curl --verbose mention it leaves connections open? I never looked into how that actually works though
06:37:40<StarletCharlotte>oh thanks
06:37:54<@JAA>You could generate that with a Bash loop and an array.
06:38:19<StarletCharlotte>well, it doesn't seem like I'm getting rate limited and I don't have much left. I'll remember for next time though.
06:38:56<@JAA>`curl=(curl --header '...'); for ((i = 0; i <= 400000; i += 1000)); do curl+=(--output "${i}.json" "https://buckets.grayhatwarfare.com/api/v2/buckets?start=${i}&limit=1000"); done; "${curl[@]}"`
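The same one-liner unrolled for readability; identical logic, with the header value again a placeholder:

    # Build a single curl invocation with one --output/URL pair per offset,
    # then run it so one curl process (and connection) handles all requests.
    cmd=(curl --header 'Authorization: Bearer YOUR-KEY')
    for ((i = 0; i <= 400000; i += 1000)); do
        cmd+=(--output "${i}.json" "https://buckets.grayhatwarfare.com/api/v2/buckets?start=${i}&limit=1000")
    done
    "${cmd[@]}"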
06:39:26<StarletCharlotte>yeah it just got done
06:39:32<StarletCharlotte>thanks btw
06:40:17pixel (pixel) joins
06:40:25<StarletCharlotte>I have copied that down for later
06:57:31ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
06:57:41ArchivalEfforts joins
07:15:28<steering>JAA: xargs!
07:17:19<steering>(I'm testing if that would actually work because why not)
07:19:15<steering>ah, no, because -I implies -L 1
07:31:37<@JAA>Yeah
07:33:41<@JAA>`curl --header '...' $(printf -- '--output %d.json https://buckets.grayhatwarfare.com/api/v2/buckets?start=%d&limit=1000 ' {0..400000..1000}{,})` would work, but I can't recommend that.
07:35:45<@arkiver>so, does anyone know where we can contact "Junie"?
08:02:01xmagik quits [Quit: Ooops, wrong browser tab.]
08:03:42loug83181422 joins
08:10:59astrinaut leaves [][]
08:47:22dvb leaves
09:01:17Island quits [Read error: Connection reset by peer]
09:25:17pabs quits [Read error: Connection reset by peer]
09:26:39pabs (pabs) joins
10:32:22PAARCLiCKS quits [Quit: The Lounge - https://thelounge.chat]
10:32:23Wohlstand (Wohlstand) joins
10:35:35DopefishJustin quits [Read error: Connection reset by peer]
10:35:49DopefishJustin joins
10:36:55LunarianBunny1147 quits [Quit: Ping timeout (120 seconds)]
10:37:12LunarianBunny1147 (LunarianBunny1147) joins
10:38:43abirkill- (abirkill) joins
10:38:48abirkill quits [Ping timeout: 250 seconds]
10:39:06abirkill- is now known as abirkill
10:44:24PAARCLiCKS (s4n1ty) joins
10:45:08asie quits [Ping timeout: 260 seconds]
10:45:20yourfate1 (yourfate) joins
10:45:26sec^nd quits [Remote host closed the connection]
10:45:43yourfate quits [Ping timeout: 260 seconds]
10:45:49sec^nd (second) joins
10:46:18simon816 quits [Ping timeout: 260 seconds]
10:46:45asie joins
10:50:25simon816 (simon816) joins
10:53:09ducky quits [Ping timeout: 260 seconds]
12:00:05Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:00:58Wohlstand quits [Ping timeout: 260 seconds]
12:02:56Bleo18260072271962345 joins
12:24:37<@rewby>monoxane: What I did was much more than TE.
12:25:13<katia>:o
12:25:31<@rewby>O10 is needed for other purposes temporarily so I've shut off the targets on it
12:25:52<@rewby>No, there is nothing broken with it
12:27:23ducky (ducky) joins
12:34:22SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:35:08SkilledAlpaca418962 joins
12:40:11<monoxane>ah funky
12:49:23StarletCharlotte quits [Ping timeout: 260 seconds]
13:00:41StarletCharlotte joins
13:10:33Meroje quits [Quit: bye!]
13:10:58pokechu22 quits [Ping timeout: 260 seconds]
13:11:30Meroje joins
13:11:30Meroje quits [Changing host]
13:11:30Meroje (Meroje) joins
13:14:22StarletCharlotte quits [Ping timeout: 250 seconds]
13:58:30NeonGlitch (NeonGlitch) joins
13:59:08StarletCharlotte joins
13:59:27archiver1966 joins
14:04:10archiver1966 quits [Client Quit]
14:05:38eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
14:06:06eroc1990 (eroc1990) joins
14:18:56StarletCharlotte quits [Ping timeout: 250 seconds]
14:31:44StarletCharlotte joins
14:47:13StarletCharlotte quits [Ping timeout: 260 seconds]
14:47:42moth_ joins
15:06:36benjins3 quits [Ping timeout: 250 seconds]
15:13:10Wohlstand (Wohlstand) joins
15:21:04katocala quits [Ping timeout: 260 seconds]
15:23:58balrog quits [Ping timeout: 260 seconds]
15:24:27katocala joins
15:39:41StarletCharlotte joins
15:48:19zhongfu (zhongfu) joins
15:51:22zhongfu quits [Client Quit]
15:52:05zhongfu (zhongfu) joins
16:01:39Webuser083892 joins
16:03:33<Webuser083892>Nice, this channel is also publicly logged. As it should be :D
16:04:55HP_Archivist quits [Quit: Leaving]
16:06:14<katia>hi mom
16:12:52StarletCharlotte quits [Remote host closed the connection]
16:13:05StarletCharlotte joins
16:13:07<Webuser083892>Hi Katia :D
16:13:38<Webuser083892>Is there something wrong with the trackers? The upload seems... stuck?
16:13:41<katia>ur not my mom she is not in belgium
16:14:01<katia>yeah, things are slowed down temporarily
16:18:48StarletCharlotte quits [Ping timeout: 260 seconds]
16:20:37StarletCharlotte joins
16:23:09<Webuser083892>Alright, if it's expected that's good. :)
16:38:42Webuser083892 quits [Client Quit]
16:44:07<@arkiver>all should start getting in better shape in the coming days!
16:49:23<that_lurker>\o/
17:05:21i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat]
17:05:40i_have_n0_idea (i_have_n0_idea) joins
17:06:09NeonGlitch quits [Quit: My Mac Mini has gone to sleep. ZZZzzz…]
17:12:13ThreeHM quits [Quit: WeeChat 4.4.3]
17:15:27NeonGlitch (NeonGlitch) joins
17:16:20VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
17:16:59VerifiedJ (VerifiedJ) joins
17:17:29ThreeHM (ThreeHeadedMonkey) joins
17:18:34benjins3 joins
17:21:17unlobito (unlobito) joins
17:23:14lennier2 joins
17:26:08lennier2_ quits [Ping timeout: 250 seconds]
17:35:58sparky1492 quits [Quit: No Rain, No Rainbows]
17:37:55sparky1492 (sparky1492) joins
17:38:46pokechu22 (pokechu22) joins
17:39:14sparky14920 (sparky1492) joins
17:42:48sparky1492 quits [Ping timeout: 260 seconds]
17:42:49sparky14920 is now known as sparky1492
17:48:39sparky14929 (sparky1492) joins
17:52:08sparky1492 quits [Ping timeout: 250 seconds]
17:52:09sparky14929 is now known as sparky1492
17:52:14lflare quits [Quit: Ping timeout (120 seconds)]
17:53:11lflare (lflare) joins
17:54:54lflare quits [Client Quit]
18:00:11Webuser331643 joins
18:11:26Webuser331643 quits [Client Quit]
18:15:33Sluggs quits [Excess Flood]
18:23:44Sluggs joins
18:41:01Webuser201432 joins
18:41:06Webuser201432 quits [Client Quit]
18:42:27<Exorcism>arkiver: for zhubai, will a DPoS project be planned or should we archive with archivebot?
18:48:11flotwig quits [Quit: ZNC - http://znc.in]
18:53:40riteo quits [Ping timeout: 250 seconds]
19:07:05flotwig joins
19:13:42riteo (riteo) joins
19:26:16gust joins
19:33:38kansei quits [Quit: ZNC 1.9.1 - https://znc.in]
19:34:24kansei (kansei) joins
19:51:25Webuser990617 joins
19:52:02Webuser990617 quits [Client Quit]
20:06:42Megame (Megame) joins
20:08:33spirit joins
20:33:37gust quits [Remote host closed the connection]
20:33:56gust joins
20:35:20balrog (balrog) joins
20:51:06sparky1492 quits [Ping timeout: 250 seconds]
21:00:34SkilledAlpaca4189621 joins
21:02:18SkilledAlpaca418962 quits [Ping timeout: 260 seconds]
21:02:19SkilledAlpaca4189621 is now known as SkilledAlpaca418962
21:14:04spirit quits [Ping timeout: 250 seconds]
21:14:22spirit joins
21:19:52nicolas17 quits [Quit: Konversation terminated!]
21:37:53BearFortress quits [Ping timeout: 260 seconds]
21:38:32NeonGlitch quits [Quit: My Mac Mini has gone to sleep. ZZZzzz…]
21:48:38nicolas17 joins
21:52:30trc (trc) joins
22:06:28Sokar quits [Ping timeout: 260 seconds]
22:16:29spirit quits [Remote host closed the connection]
22:27:28BlueMaxima joins
22:59:54sighsloth1090 (sighsloth1090) joins
23:51:24utulien joins
23:53:55StarletCharlotte quits [Remote host closed the connection]
23:54:14StarletCharlotte joins