To fight AI, we need 'personhood credentials,' say AI firms

31337@sh.itjust.works · 10 days ago

This is more complicated than some corporate infrastructures I’ve worked on, lol.

31337@sh.itjust.works · 11 days ago

Production AI is highly tuned by training data selection and human feedback. Every model has its own style that many people helped tune. In the open model world there are thousands of different models targeting various styles. Waifu Diffusion and GPT-4chan, for example.

31337@sh.itjust.works · 12 days ago

I think you have your janitor example backwards. Spending my time revolutionizing energy production sounds much more enjoyable than sweeping floors. Same with designing an effective floor-sweeping robot.

31337@sh.itjust.works · 12 days ago

AI are people, my friend. /s

But, really, I think people should be able to run algorithms on whatever data they want. It’s whether the output is sufficiently different or “transformative” that matters (and other laws like using people’s likeness). Otherwise, I think the laws will get complex and nonsensical once you start adding special cases for “AI.” And I’d bet if new laws are written, they’d be written by lobbyists to further erode the threat of competition (from free software, for instance).

31337@sh.itjust.works · 13 days ago

The search engine LLMs suck. I’m guessing they use very small models to save compute. ChatGPT 4o and Claude 3.5 are much better.

31337@sh.itjust.works · 13 days ago

Yeah, the image bytes are random because they’re already compressed (unless they’re bitmaps, which is not likely).
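
A rough way to see this, as a minimal sketch rather than a test on real images: the snippet below measures the byte-level Shannon entropy of some data before and after compression. Everything in it is an assumption for illustration. The "pixel" data is synthetic (16 possible values per byte), and `zlib` stands in for whatever codec the image format actually uses.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8.0)."""
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical stand-in for uncompressed bitmap data: 200,000 "pixels"
# drawn from only 16 values, so each byte carries roughly 4 bits.
rng = random.Random(0)
raw = bytes(rng.choices(range(16), k=200_000))

# zlib (DEFLATE) is an assumption here, standing in for the image codec.
compressed = zlib.compress(raw, 9)

raw_entropy = entropy_bits_per_byte(raw)
compressed_entropy = entropy_bits_per_byte(compressed)

print(f"raw:        {len(raw)} bytes, {raw_entropy:.2f} bits/byte")
print(f"compressed: {len(compressed)} bytes, {compressed_entropy:.2f} bits/byte")
```

The compressed stream should come out much closer to 8 bits per byte than the raw data, which is what "looks random" means here: the compressor has already squeezed out most of the structure.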

31337@sh.itjust.works · 14 days ago

Donation, patronage, gift economy, mutual aid, or whatever you want to call it is fine by me. People can pirate a lot of proprietary software as well, yet people still pay.

31337@sh.itjust.works · 14 days ago

To fight AI, we need 'personhood credentials,' say AI firms

31337@sh.itjust.works · 14 days ago

Yet, people still pay for it.

31337@sh.itjust.works · 15 days ago

The problem is that HP writes drivers and software for those things for Windows, but not for Linux, so Linux depends on random people to write software for those things for free (which often involves complex reverse-engineering). With Linux you need to make sure you use widely-used hardware that someone has already written support for (this is mostly applicable to laptops and peripherals, which often use custom non-standard hardware). There may be a way to fix your problems, but you’ll have to search forums or issue trackers for the solutions, and they’re probably pretty involved to get working correctly. The router crashing thing is probably just a coincidence though, or the laptop is using a feature that’s broken on your router.

31337@sh.itjust.works · 15 days ago

There’s also Delecta Ltd, which is an Australian sex toy maker and a mining company.

31337@sh.itjust.works · 19 days ago

OSMC’s Vero V looks interesting. A Pi 4 with OSMC or LibreELEC could work. I’m probably going to do something like this pretty soon. I just set up an *arr stack last week, and I’m just using my smart TV with the Jellyfin app installed ATM.

My PC running the Jellyfin server can’t transcode some videos though; I’m probably going to put an Arc A310 in it.

31337@sh.itjust.works · 22 days ago

In the Texas counties I’m most familiar with, if you’re arrested and they don’t have a good case, they just keep resetting court dates for years instead of going ahead with the process. If you can’t afford a bond, you’ll be in jail that whole time (which pressures people to take plea deals). If you can secure a bond, you’re out, but with limited rights and a whole lot of hassles to deal with.

31337@sh.itjust.works · 28 days ago

I thought the tuning procedures, such as RLHF, kind of mess up the probabilities, so you can’t really tell how confident the model is in the output (and I’m not sure how accurate these probabilities were in the first place)?

Also, it seems, at a certain point, the more context the models are given, the less accurate the output. A few times, I asked ChatGPT something, and it used its browsing functionality to look it up, and it was still wrong even though the sources were correct. But, when I disabled “browsing” so it would just use its internal model, it was correct.

It doesn’t seem there are too many expert services tied to ChatGPT (I’m just using this as an example, because that’s the one I use). There’s obviously some kind of guardrail system for “safety,” there’s a search/browsing system (it shows you when it uses this), and there’s a Python interpreter. Of course, OpenAI is now very closed, so they may be hiding that it’s using expert services (beyond the “experts” in the MoE model they’re speculated to be using).

31337@sh.itjust.works · 28 days ago

I find Kagi results a little bit better than Google’s (for most things). I like that certain categories of results are put in their own sections (listicles, forums) so they’re easy to ignore if you want. I like that I can prioritize, deprioritize, block, or pin results from certain domains. I like that I can quickly switch “lenses” to one of the predefined or custom lenses.

31337@sh.itjust.works · 28 days ago

Their line goes up when they show they’re investing in AI, and it goes down when it looks like they’re falling behind or not investing enough in it.

TBH, a lot of times I find myself interacting with ChatGPT instead of searching. It’s overhyped, but it’s useful.

31337@sh.itjust.works · 1 month ago

I like the Turris Omnia and (highly configurable) Turris Mox. They come with OpenWrt installed.

31337@sh.itjust.works · 1 month ago

IDK, looks like 48GB cloud pricing would be $0.35/hr => $255/month. Used 3090s go for $700. Two 3090s would give you 48GB of VRAM, and cost $1400 (I’m assuming you can do “model-parallel” with Llama; never tried running an LLM, but it should be possible and work well). So, the break-even point would be <6 months. Hmm, but if Serverless works well, that could be pretty cheap. Would probably take a few minutes to process and load a ~48GB model every cold start though?
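
The arithmetic above, as a quick sketch. The hourly rate and GPU price are the figures from this comment. The 730 hours per month (running 24/7) is my assumption, and electricity, the rest of the PC, and resale value are all ignored.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12, i.e. running 24/7


def monthly_cloud_cost(rate_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of renting the cloud GPU for one month at the given hourly rate."""
    return rate_per_hour * hours


def break_even_months(hardware_cost: float, rate_per_hour: float,
                      hours: float = HOURS_PER_MONTH) -> float:
    """Months of cloud rental that add up to the one-time hardware cost."""
    return hardware_cost / monthly_cloud_cost(rate_per_hour, hours)


cloud = monthly_cloud_cost(0.35)            # 48GB cloud instance
months = break_even_months(2 * 700, 0.35)   # two used 3090s

print(f"cloud: ${cloud:.2f}/month, break-even: {months:.1f} months")
```

That works out to roughly $255/month and about 5.5 months. If the GPU is only busy a few hours a day, pass a smaller `hours` value and the break-even point moves out accordingly.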

31337@sh.itjust.works · 1 month ago

The EFF link I posted above provides evidence. Again, here’s a quote from part of it:

The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.

As I mentioned before, Copilot, at least, helps people avoid copyright infringement by notifying you if your code is similar to public code. The solution I’m proposing is no new laws, and just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.

31337@sh.itjust.works · 1 month ago

That we already have laws that protect against copyright infringement (which seem like they would still apply whether or not it was spit out by an LLM), and no more should be made. That training on public data is fine.

31337@sh.itjust.works · 1 month ago

I’m saying using code for training is a different issue than copyright infringement. I edited my post above to better lay out my position.