| 00:05:15 | <katocala> | !a https://southernafrica.iwmi.cgiar.org/ --useragent firefox --igset blogs |
| 00:05:30 | <katocala> | blah |
| 00:37:44 | <imer> | mh, is there a recommended way of handling "ia upload" of many files? had it give up earlier and restarting it, it seems to just reupload/overwrite files it already uploaded. |
| 00:37:50 | <imer> | --delete maybe if that deletes files as it's uploaded them? but that'd nuke the files on my end.. |
| 00:39:30 | | systwi_ (systwi) joins |
| 00:39:32 | | systwi quits [Ping timeout: 252 seconds] |
| 00:41:32 | <fireonlive> | you can use -c to checksum and not upload existing files |
| 00:41:49 | <fireonlive> | "-c, --checksum Skip based on checksum. [default: False]" |
| 00:42:15 | <@JAA> | Make sure there are no pending tasks on the item before using that though: https://github.com/jjjake/internetarchive/issues/289 |
| 00:42:29 | <fireonlive> | ah! |
| 00:42:43 | <imer> | that explains why it wasnt working JAA, thanks. ill give it time to catch up |
| 00:43:11 | <fireonlive> | is there an easy recursive upload? i had some problems with directories being skipped initially |
| 00:43:19 | <fireonlive> | 'dupes_db' for example |
| 00:44:12 | | systwi_ is now known as systwi |
| 00:50:20 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 00:50:51 | <@JAA> | The CLI can upload directories. But if dupes_db in grab-site means the same thing as on AB, ignore its existence and don't upload it. |
| 00:51:22 | <@JAA> | It's a 1 TiB file of NULs. |
| 01:04:50 | | etnguyen03 (etnguyen03) joins |
| 01:15:43 | <fireonlive> | ahh it was a few hundred MB yeah |
| 01:15:50 | | BlueMaxima joins |
| 01:15:54 | <fireonlive> | ..i think |
| 01:16:07 | <fireonlive> | ah no, KB |
| 01:16:45 | <@JAA> | I guess grab-site does that differently then. |
| 01:17:24 | <@JAA> | AB fallocates a 1 TiB file there for each job, but the dedupe plugin is broken anyway, so the file is never actually written to. |
| 01:19:17 | <fireonlive> | ahh ok :X yeah that would immediately out of space my little dedi at the moment |
| 01:20:02 | <fireonlive> | i had the links in internet archive.. somewhere |
| 01:20:10 | <fireonlive> | somehow i don't know how i have too many channels now |
| 01:20:13 | <fireonlive> | weird thing that xD |
| 01:20:48 | <@JAA> | It's just a sparse file. It takes 4 KiB or something like that on disk. |
| 01:21:32 | <@JAA> | Er right, not fallocate, something else. |
| 01:23:07 | <fireonlive> | ahh ok |
| 01:43:37 | | yasomi joins |
| 01:44:55 | | yasomi is now authenticated as yasomi |
| 01:45:44 | | yasomi quits [Changing host] |
| 01:45:44 | | yasomi (yasomi) joins |
| 01:48:11 | | yasomi quits [Client Quit] |
| 01:48:23 | | yasomi (yasomi) joins |
| 02:01:38 | <HP_Archivist> | JAA: I forget, but what's the average time delay for !ao links to show up in WBM these days? |
| 02:16:28 | | etnguyen03 quits [Ping timeout: 265 seconds] |
| 02:26:25 | | etnguyen03 (etnguyen03) joins |
| 03:40:50 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 04:59:05 | | BlueMaxima quits [Client Quit] |
| 05:26:37 | | hitgrr8 joins |
| 05:32:08 | | Barto quits [Ping timeout: 252 seconds] |
| 05:39:28 | | g0tmk quits [Ping timeout: 265 seconds] |
| 06:01:29 | | IDK (IDK) joins |
| 06:09:26 | | pabs quits [Ping timeout: 265 seconds] |
| 06:12:02 | | pabs (pabs) joins |
| 06:16:44 | <Doranwen> | The fun of setting up to scan a book - only to find out that one of the two Bluetooth clickers I have either needs a new battery or just isn't working, period. They're both the same age, so I have to wonder if it's not the clicker just gone wonky. I remember it had trouble connecting before, but now the phone doesn't see it at all. |
| 06:19:12 | <Doranwen> | Can't scan without it or my hands would be reflected in the pictures. Ah well, I guess that project is going to happen another evening. |
| 06:25:36 | <pabs> | https://www.anandtech.com/show/18901/big-leap-for-hdds-32-tb-hamr-drive-is-coming-40tb-on-horizon https://news.ycombinator.com/item?id=36253499 |
| 06:27:54 | | spirit quits [Client Quit] |
| 06:47:47 | | Ivan2261 joins |
| 06:49:04 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 06:49:10 | <fireonlive> | https://twitter.com/llm_sec/status/1667573374426701824?s=12 |
| 06:49:13 | <fireonlive> | lmao |
| 07:43:23 | | JackThompson3 quits [Ping timeout: 252 seconds] |
| 07:44:24 | | nito-kihk joins |
| 07:46:15 | | JackThompson3 joins |
| 08:07:50 | | TastyWiener95 quits [Quit: Ping timeout (120 seconds)] |
| 08:08:42 | | TastyWiener95 (TastyWiener95) joins |
| 08:39:07 | | nito-kihk is now authenticated as nito-kihk |
| 08:44:10 | | nito-kihk quits [Client Quit] |
| 08:44:22 | | nito-kihk (nito-kihk) joins |
| 10:45:49 | | etnguyen03 (etnguyen03) joins |
| 10:59:37 | | etnguyen03 quits [Client Quit] |
| 10:59:46 | | drin joins |
| 11:02:30 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 11:02:30 | | drin is now known as geezabiscuit |
| 11:05:33 | | drin joins |
| 11:09:13 | | geezabis- joins |
| 11:09:35 | | geezabiscuit quits [Ping timeout: 265 seconds] |
| 11:10:02 | | geezabis- is now known as geezabiscuit |
| 11:11:50 | | drin quits [Ping timeout: 252 seconds] |
| 11:19:10 | | drin joins |
| 11:21:44 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 11:21:44 | | drin is now known as geezabiscuit |
| 11:26:59 | | icedice (icedice) joins |
| 11:28:29 | | vegbrasil quits [Remote host closed the connection] |
| 11:38:14 | | Letur quits [Ping timeout: 252 seconds] |
| 11:42:04 | <pabs> | https://theconversation.com/succession-on-the-tibetan-plateau-whats-at-stake-in-the-battle-over-the-dalai-lamas-reincarnation-202353 |
| 11:49:24 | <masterX244> | The dupes_db should be a memory-mapped file; on grab-site the large allocation fails and it falls back to smaller sizes, and then dupes_db is used, but that file is just a helper file and holds no useful data once the crawl has finished |
| 11:49:42 | | vegbrasil joins |
| 11:59:41 | | c3manu (c3manu) joins |
| 12:24:11 | | etnguyen03 (etnguyen03) joins |
| 12:30:42 | | vegbrasi_ joins |
| 12:34:20 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 12:34:50 | | icedice quits [Client Quit] |
| 12:40:03 | | icedice (icedice) joins |
| 12:55:36 | | anityatva joins |
| 13:00:10 | | anityatva quits [Remote host closed the connection] |
| 13:00:20 | | vegbrasi_ quits [Remote host closed the connection] |
| 13:00:49 | | vegbrasil joins |
| 13:10:25 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 13:11:08 | | AmAnd0A joins |
| 13:18:23 | | diggan (diggan) joins |
| 13:54:54 | | Letur joins |
| 14:45:16 | | emberquill quits [Quit: The Lounge - https://thelounge.chat] |
| 14:46:09 | | emberquill (emberquill) joins |
| 15:03:17 | <nicolas17> | masterX244: I wonder how many dupes_db's were uploaded to archive.org... |
| 15:46:57 | <masterX244> | not sure. i usually only upload the *.warc.gz's (and sometimes data that i crunched as input for the crawl when i bruteforced/pre-checked something. had to bruteforce once via TOR to avoid losing an IP address for site archival after already burning 2 on an initial enumerate, there was a bug with high server load that must have tripped off monitoring) |
| 15:52:00 | <fireonlive> | someone told me the whole folder was good but gave me a couple cleanup instructions |
| 15:52:19 | <fireonlive> | like if there’s a -wal with wpull.db |
| 15:52:43 | <fireonlive> | notes are on laptop and i’m on phone atm but i think that was about it |
| 15:58:32 | <masterX244> | the WARCs are the juice. i sometimes got jobs where the wpull.db is mostly garbage (rabbit holes and other shit, grab-site finds urls that shouldnt exist for some odd reason); had some jobs where 9GB of a 10GB wpull db was garbage to ignore |
| 16:02:50 | | imer quits [Quit: Ping timeout (120 seconds)] |
| 16:03:16 | | imer (imer) joins |
| 16:09:24 | <nicolas17> | yeah |
| 16:09:33 | <nicolas17> | I'm just curious how many people *didn't* do that cleanup properly :D |
| 16:21:03 | <fireonlive> | ahh |
| 17:03:34 | | g0tmk joins |
| 17:13:48 | | yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/] |
| 17:16:44 | | yano (yano) joins |
| 17:55:00 | | vegbrasi_ joins |
| 17:58:00 | | etnguyen03 quits [Ping timeout: 265 seconds] |
| 17:58:29 | | vegbrasil quits [Ping timeout: 265 seconds] |
| 17:59:56 | | vegbrasi_ quits [Ping timeout: 265 seconds] |
| 18:09:22 | | vegbrasil joins |
| 18:15:53 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 18:23:20 | | vegbrasil joins |
| 18:51:08 | | Barto (Barto) joins |
| 19:10:31 | | etnguyen03 (etnguyen03) joins |
| 19:23:17 | | Ivan2261 is now known as Ivan226 |
| 19:30:39 | | pabs quits [Read error: Connection reset by peer] |
| 19:45:55 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 19:46:12 | | AmAnd0A joins |
| 20:08:02 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
| 20:08:59 | | driib (driib) joins |
| 20:29:10 | | katocala quits [Remote host closed the connection] |
| 20:35:01 | | c3manu quits [Client Quit] |
| 20:35:09 | | driib quits [Remote host closed the connection] |
| 20:35:27 | | jlwoodwa joins |
| 20:35:41 | | driib (driib) joins |
| 20:41:09 | | Catdurid joins |
| 20:49:55 | | driib quits [Remote host closed the connection] |
| 20:50:26 | | driib (driib) joins |
| 20:57:07 | <bigdata> | does anyone know where i can find datasets similar to what used to be on https://opendata.rapid7.com ? particularly the dns dumps, but the rest webpages and ports would also be nice to have |
| 21:02:02 | | katocala joins |
| 21:02:37 | | katocala is now authenticated as katocala |
| 21:08:02 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 21:09:49 | | geezabiscuit (geezabiscuit) joins |
| 21:40:20 | | etnguyen03 quits [Ping timeout: 265 seconds] |
| 21:41:18 | | Mateon1 quits [Ping timeout: 265 seconds] |
| 21:46:50 | | etnguyen03 (etnguyen03) joins |
| 21:51:00 | | Mateon1 joins |
| 22:02:24 | | hitgrr8 quits [Client Quit] |
| 22:11:32 | | BlueMaxima joins |
| 22:46:52 | | jlwoodwa quits [Ping timeout: 252 seconds] |
| 23:18:54 | | pabs (pabs) joins |
| 23:27:05 | | MactasticMendez (MactasticMendez) joins |
| 23:35:53 | | g0tmk quits [Remote host closed the connection] |
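
The sparse-file point from the 01:20 exchange (a file with a huge apparent size that occupies almost no disk) can be illustrated with a minimal Python sketch. This is not how ArchiveBot or grab-site actually create `dupes_db`; the helper names `make_sparse_file` and `sizes` and the file name `dupes_db_demo` are hypothetical, and the result assumes a Unix filesystem that supports sparse files.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose apparent size is `size` bytes without writing data."""
    with open(path, "wb") as f:
        # truncate() extends the file to `size`; on filesystems with sparse-file
        # support, the unwritten range is a hole and is not allocated on disk.
        f.truncate(size)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes); allocated is None if unknown."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on Unix and is absent on Windows.
    allocated = st.st_blocks * 512 if hasattr(st, "st_blocks") else None
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "dupes_db_demo")  # hypothetical name
        make_sparse_file(demo, 1 << 30)  # 1 GiB apparent size for the demo
        apparent, allocated = sizes(demo)
        print("apparent bytes:", apparent)
        print("allocated bytes:", allocated)
```

On a filesystem with sparse-file support the allocated figure should be far smaller than the apparent one, which is why such a file is harmless on disk but wasteful to upload.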