@ricecake

ricecake@sh.itjust.works · 10 days ago

Yeah, it’s definitely faster, but I’m not sure it’s going to make too much of a difference for a Minecraft server.

With setting it up being a bit annoying by hand, I’d still rank the router option higher even if it’s a worse VPN. Otherwise you risk ending up in that yak shaving situation where you’re fighting with routing tables and DNS when you wanted a Minecraft server.

ricecake@sh.itjust.works · 10 days ago

Oh for sure. What I meant was “check router for a built-in VPN and use it if it has one, otherwise use WireGuard because it’s the easiest”.

The specific VPN doesn’t really matter so much. The built-in one would be the easiest, so checking for a solution that took a few clicks is worth it. :)

ricecake@sh.itjust.works · 11 days ago

I would use something like WireGuard, or another VPN service you can host yourself if your router supports it natively.

From the looks of it Minecraft servers seem to have dogshit authentication, so using some form of private network setup is going to be your best move.

ricecake@sh.itjust.works · 12 days ago

Eeeh, I still think diving into the weeds of the technical is the wrong way to approach it. Their argument is that training isn’t copyright violation, not that sufficient training dilutes the violation.

Even if trained only on one source, it’s quite unlikely that it would generate copyright infringing output. It would be vastly less intelligible, likely to the point of overtly garbled words and sentences lacking much in the way of grammar.

Whether what they’re doing is technically an infringement, or how it works, is entirely aside from a discussion of whether it should be infringement or permitted.

ricecake@sh.itjust.works · edit-2 13 days ago

Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either direct references to things I’ve heard, or accidental copies of them, sometimes with errors.
Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey Mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regard to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

ricecake@sh.itjust.works · 16 days ago

So, at the time (1930) Ball actually would have qualified as big business in the sense that you mean.
Home canning was very popular and they consistently bought out smaller companies.
Since they were privately owned, it’s tricky to find specifics about value, but they were “found a university”, “own a company town or two”, “chairman of the federal reserve” levels of rich.

So actually a pretty good use of government.

ricecake@sh.itjust.works · 16 days ago

As written the headline is pretty bad, but it seems their argument is that they should be able to train from publicly available copyrighted information, like blog posts and social media, and not from private copyrighted information like movies or books.

You can certainly argue that “downloading public copyrighted information for the purposes of model training” should be treated differently from “downloading public copyrighted information for the intended use of the copyright holder”, but it feels disingenuous to put this comment itself, to which someone has a copyright, into the same category as something not shared publicly like a paid article or a book.

Personally, I think it’s a lot like search engines. If you make something public, someone can analyze it, link to it, or take other derivative actions, but they can’t copy it and share the copy with others.

ricecake@sh.itjust.works · 17 days ago

Aluminum cylinders only.

Not aluminum? Not interested. Not a cylinder? Not a chance.

Squared off glass cylinder? Legally prohibited.

ricecake@sh.itjust.works · 17 days ago

The weird thing is, they don’t actually sell the jars anymore. “Ball jars” are not made by the Ball corporation after their antitrust lawsuits for being a fucking jar monopoly. So they sold the “Ball jar” rights and now only do aluminum cans for food packaging, plus high-end satellites and satellite launch systems.

ricecake@sh.itjust.works · 22 days ago

So that’s what third party cookies are. What this does is make it so that when you go to example.com and you get a Google cookie, that cookie is only associated with example.com, and your random.org Google cookie will be specific to that site.

A site will be able to use Google to track how you use their site, which is a fine and valid thing, but they or Google don’t get to see how you use a different site. (Google doesn’t actually share specifics, but they can see stuff like “behavior on one site led to sale on the other”)
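If it helps to picture it, here is a rough sketch of the idea in Python. This is a toy model, not how any browser actually implements it: the `PartitionedCookieJar` class and its method names are made up for illustration. The only point is that the storage key includes the site you are visiting, not just the domain that set the cookie.

```python
from collections import defaultdict


class PartitionedCookieJar:
    """Toy cookie jar keyed by (top-level site, cookie domain).

    Hypothetical model for illustration only; real browsers are
    far more involved than this.
    """

    def __init__(self):
        # (top_level_site, cookie_domain) -> {cookie name: value}
        self._store = defaultdict(dict)

    def set_cookie(self, top_level_site, cookie_domain, name, value):
        self._store[(top_level_site, cookie_domain)][name] = value

    def get_cookies(self, top_level_site, cookie_domain):
        # .get() avoids creating an empty partition on lookup
        return dict(self._store.get((top_level_site, cookie_domain), {}))


jar = PartitionedCookieJar()
# The same third party (google.com) sets a cookie while embedded on two sites.
jar.set_cookie("example.com", "google.com", "id", "abc")
jar.set_cookie("random.org", "google.com", "id", "xyz")

# Each site only ever sees the cookie from its own partition.
print(jar.get_cookies("example.com", "google.com"))
print(jar.get_cookies("random.org", "google.com"))
```

With an unpartitioned jar the key would just be `google.com`, so both sites would read and overwrite the same `id`, which is what makes cross-site tracking possible.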

ricecake@sh.itjust.works · 1 month ago

Yeah, it definitely might still be a bad data source, and it’s shady either way, just pointing out that “not public data” has a few meanings, and not all of them are synonymous with “private data”.

ricecake@sh.itjust.works · 1 month ago

I feel like that might be bad phrasing on the part of the article. They mainly aggregate public records, like legal document style public records, and they also scraped data from not-(public record) data, which isn’t the same as (not-public) record data.

I feel like I would want more details to be sure though, but scraping usually refers to “generally available” data.

ricecake@sh.itjust.works · 2 months ago

Totally. It’s double weird, because it’s not a petitionable issue, it’s a form where you make your case and a committee decides, and they already have the symbol and they just seem to want it to be usable like 💲, which isn’t a thing.

ricecake@sh.itjust.works · 2 months ago

I am aware of the lists and guidelines, I’ve been linking and quoting them to you. :)

It’s their report on the standards that highlights that they don’t think there’s a clear distinction between “emoji” and “character”, and that it’s mostly a matter of user expectation.
Hence some pictograph characters having a default “text” presentation, and some having a default “emoji” presentation. They also clarify that some things with a default “emoji” presentation aren’t in the set of characters people would associate with emoji and shouldn’t be counted if you’re trying.

I understand what you’re saying, which is that the selection criteria are different for a “language symbol” as opposed to a “pictographic symbol”, so they’re different things.
I disagree and think that “default presentation” might be a better metric, but that ultimately it’s about user and platform expectations. The same character can be presented “emoji” style or “text” style depending on context.

In any case, I’d also agree that there’s no viability to the notion that people use the Bitcoin symbol in a way that’s independent of the one meaning that it has, so a colorful cartoony rendition becoming an option doesn’t really fit. “His Christmas gift was $$$” is a sentiment people might express. “The hotel is ₿₿₿” just … isn’t.

ricecake@sh.itjust.works · 2 months ago

Gotcha, so ⌚ (U+231A, Miscellaneous Technical block) isn’t an emoji, despite it clearly being a pictograph, and there are only 80 emoji?

I feel like this definition isn’t in line with either the lay definition of emoji or the technical definition:

Emoji are pictographs (pictorial symbols) that are typically presented in a colorful cartoon form and used inline in text. They represent things such as faces, weather, vehicles and buildings, food and drink, animals and plants, or icons that represent emotions, feelings, or activities.

People often ask how many emoji are in the Unicode Standard. This question does not have a simple answer, because there is no clear line separating which pictographic characters should be displayed with a typical emoji style.

Emoji are seriously just Unicode characters that sometimes get rendered as a fancy image. That’s it. There’s an entire bit about how different characters have different conventional presentations and a codified system of “default” for image or “text”.

The presentation of a given emoji character depends on the environment, whether or not there is an emoji or text presentation selector, and the default presentation style (emoji versus text). In informal environments like texting and chats, it is more appropriate for most emoji characters to appear with a colorful emoji presentation, and only get a text presentation with a text presentation selector. Conversely, in formal environments such as word processing, it is generally better for emoji characters to appear with a text presentation, and only get the colorful emoji presentation with the emoji presentation selector.

That’s why there’s things like ☣️ and ☣. Same codepoint, but different presentation hints. (I’m assuming that our various systems will do the right thing and capture the presentation hints, otherwise I’m going to look very odd putting the same symbol over and over :-) )
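In case the presentation hints do get eaten along the way, here is a small Python sketch of the same thing using escapes instead of literal characters. The `code_points` helper is just something I made up for the example; the selectors themselves are the standard U+FE0E (text) and U+FE0F (emoji) variation selectors.

```python
import unicodedata

BIOHAZARD = "\u2623"
TEXT_SELECTOR = "\ufe0e"   # VARIATION SELECTOR-15: request text presentation
EMOJI_SELECTOR = "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation


def code_points(s):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


text_style = BIOHAZARD + TEXT_SELECTOR
emoji_style = BIOHAZARD + EMOJI_SELECTOR

print(unicodedata.name(BIOHAZARD))  # BIOHAZARD SIGN
print(code_points(text_style))      # ['U+2623', 'U+FE0E']
print(code_points(emoji_style))     # ['U+2623', 'U+FE0F']
```

Both strings start with the same base character. Only the trailing selector differs, and whether it is honored is up to the platform and font.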

ricecake@sh.itjust.works · 2 months ago

I mean, we have a symbol for effectively any currency that anyone can or wants to fill out the paperwork for and can demonstrate the basics of “this is a meaningful symbol with more than transient relevance”.

They added ₿ in Unicode 10.0 (2017).

https://www.compart.com/en/unicode/category/Sc
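If you would rather check locally than click through, a quick sketch with Python's standard `unicodedata` module shows the same category information. The exact results depend on the Unicode version bundled with your Python, so treat this as an illustration and not a reference.

```python
import unicodedata

# Dollar sign, euro sign, bitcoin sign, heavy dollar sign (the emoji one)
symbols = ["\u0024", "\u20ac", "\u20bf", "\U0001F4B2"]

for ch in symbols:
    # category "Sc" is "Symbol, currency"; "So" is "Symbol, other"
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

On a reasonably recent Python the first three report `Sc`, while the emoji-style dollar sign reports `So`, so it is not classed as a currency symbol at all.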

ricecake@sh.itjust.works · 2 months ago

There really isn’t a difference between a character and an emoji beyond an emoji being a stylized rendering of a character, or a character whose use is intended as a pictograph.

https://www.unicode.org/reports/tr51/#Introduction

They’re all just Unicode code points, although I suppose there’s some distinction between the characters with more context specific meaning or the ones that are more apt to modification a la 🧑‍⚕️👩🏿‍⚕️. But you’ve also got 💲 and $, where “heavy dollar sign” is often represented as green, but “dollar sign” tends to be represented in contextual style. Is ☣ a character or an emoji? What about the thousands of “other symbols” as defined by the Unicode spec which may or may not have special character renderings depending on your platform and font?
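To make the “just code points” bit concrete, here is a rough Python sketch that takes apart one of those modified emoji. I’m writing the sequence with escapes, assuming the usual encoding of “woman health worker, dark skin tone”; your platform may render or store it slightly differently.

```python
import unicodedata

# Assumed sequence: woman + skin tone modifier + joiner + medical symbol + emoji selector
health_worker = "\U0001F469\U0001F3FF\u200D\u2695\uFE0F"

# One visible glyph on most platforms, but several code points underneath
print(len(health_worker))
for ch in health_worker:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The pieces are ordinary characters in their own right: a woman, a skin tone modifier, a zero width joiner, the staff of Aesculapius, and a presentation selector. The single-image rendering is something the font does with the sequence.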

And yeah, I didn’t know that character existed, so now it’s doubly confusing why anyone is asking for anything. The symbol has meaning, and it’s in the big book of meaningful symbols. Not sure what more they want.

ricecake@sh.itjust.works · 2 months ago

Bitcoin is stupid, but the point of Unicode is that we have a symbol for everything that has a commonly recognized symbol or representative value, or even uncommonly recognized.

If ⅌ gets a character, or all the symbols of the Byzantine musical notation system, I’m not sure why a typically recognized symbol for a cryptocurrency shouldn’t.

The weird bit is that they put together a petition. All you really need to do is submit a proposal and show that it’s a notable symbol and not owned by anyone in particular or a brand icon.

Here’s the proposal to add “goose” to Unicode. They even added a few joke-y bits, but they made a valid argument that “goose” is a symbol that people recognize. And now… 🪿

ricecake@sh.itjust.works · 2 months ago

The condensed version is that it creates a lot of avenues for a very loose definition of “keeping kids safe” that could easily include “information about dealing with bigoted family” being called “dangerous” at the discretion of an executive branch appointee who thinks that LGBTQ identity is “unsafe”.

It also provides more avenues for the government to remove otherwise legal speech from the Internet entirely on the grounds that they have asserted that it’s “bad for children”.
This is literally the long running joke about how you pass draconian laws, and would only be made more on the nose if it was “keeping patriotic kids online safe for the future tax cuts of American freedom”

In general, the government should not be able to silence speech that isn’t immediately and unambiguously harmful.

ricecake@sh.itjust.works · 2 months ago