Marvin

About

Username
Marvin
Joined
Visits
131
Last Active
Roles
moderator
Points
7,013
Badges
2
Posts
15,588
  • Windows XP can partially run on Vision Pro hardware in emulation

    jSnively said:
    Now do BeOS 
    Looks like it can run BeOS, or a modern variant anyway:

    [embedded video]

    and Mac OS 9 for the classic look:

    [embedded video]

    UTM works as both an emulator and a virtualizer, like Parallels/VMware. Emulated OSes run at roughly 1/10th native speed, but native ARM (AArch64) systems run very fast. Ubuntu Linux ARM64 is here:

    https://cdimage.ubuntu.com/jammy/daily-live/current/

    The native AArch64 port of Haiku (the open-source BeOS successor) is at an early stage:

    https://www.haiku-os.org/guides/building/compiling-arm64/
  • Copyright laws shouldn't apply to AI training, proposes Google

    Marvin said:
    I can't see any way that the language model itself (e.g., a 50 GB file) "contains a copy of the Internet", so the model by itself probably isn't violating anyone's copyright.

    If you want to argue that the Google search engine violates copyright because it actually requires 10 exabytes of data (stored on Google's server farms) obtained from crawling the Internet, I could probably agree with that. But I can't see how a puny 50 GB file, or anything that small, could be a violation of anyone's copyright. You can't compress the entire Internet into a file that size, so it can't violate anyone's copyright.

    The reason most people can't run large language models on their local computers is that the "small 50 GB file" has to fit in local memory (RAM), and most users don't have that much RAM. It needs to fit in memory because every byte of the file has to be accessed about 50 times to generate the answer to any question. If the file were stored on disk, it could take hours or weeks to calculate a single answer.

    10,000,000,000,000,000,000 = bytes of data on Google's servers (10 exabytes)
    00,000,000,050,000,000,000 = bytes an LLM file requires, which is practically nothing
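    A quick sanity check of the memory-bandwidth point quoted above, assuming a 50 GB model read about 50 times per answer, and typical RAM vs. SATA SSD throughput (all numbers are illustrative assumptions, not measurements):

```python
# Back-of-envelope: time to stream a model's weights for one answer.
model_bytes = 50e9            # 50 GB model file
passes = 50                   # each byte read ~50 times per answer (per the post)
total_read = model_bytes * passes

ram_bandwidth = 50e9          # ~50 GB/s for typical DDR memory
ssd_bandwidth = 0.5e9         # ~0.5 GB/s for a SATA SSD

print(f"From RAM: {total_read / ram_bandwidth:.0f} seconds")       # 50 seconds
print(f"From SSD: {total_read / ssd_bandwidth / 3600:.1f} hours")  # 1.4 hours
```

    Even a fast NVMe drive (~5 GB/s) would still be an order of magnitude slower than RAM, which is why the whole file has to sit in memory.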
    GPT-3 was trained on 45TB of uncompressed data; GPT-2's training data was 40GB:

    https://www.sciencedirect.com/science/article/pii/S2667325821002193

    All of Wikipedia (English) uncompressed is 86GB (19GB compressed):

    https://en.wikipedia.org/wiki/Wikipedia:Database_download

    It doesn't store the data directly; it stores patterns in the data. These patterns are much smaller than the source: the GPT-3 model seems to be around 300-800GB. With the right parameters it can reproduce the same output it scanned during training. It has to, or it wouldn't generate any correct answers.

    https://www.reddit.com/r/ChatGPT/comments/15aarp0/in_case_anybody_was_doubting_that_chatgpt_has/
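    As a toy illustration of storing patterns rather than text, here is a character-level Markov chain. It is nothing like a real transformer, and the training sentence is made up for the example, but it shows how a table of (context -> next character) patterns can replay its training data verbatim:

```python
import random

# Toy Markov model: it stores (context -> next character) patterns rather
# than the raw text, yet with enough context it replays the training data
# exactly. A crude analogy only; real LLMs work very differently.
def train(text, k=8):
    model = {}
    for i in range(len(text) - k):
        model.setdefault(text[i:i + k], []).append(text[i + k])
    return model

def generate(model, seed, k=8):
    out = seed
    while out[-k:] in model:
        out += random.choice(model[out[-k:]])
    return out

text = "with enough context the chain just replays its training data"
model = train(text)
print(generate(model, text[:8]) == text)  # True: every 8-char context is unique
```

    Shrink the context (smaller k) and the output becomes a remix of the source instead of a copy, which is roughly the "pattern is in there" point.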

    If it's asked directly to print copyrighted text, it says "I'm sorry, but I can't provide verbatim excerpts from copyrighted texts", but that's because some text has been flagged as copyrighted; it still knows what the text is. It can sometimes be forced by tricking it:

    https://www.reddit.com/r/ChatGPT/comments/12iwmfl/chat_gpt_copes_with_piracy/
    https://kotaku.com/chatgpt-ai-discord-clyde-chatbot-exploit-jailbreak-1850352678
    If Wikipedia is 19GB compressed, as you claim, then you believe that a 50 GB language model (barely twice that size) can contain the ENTIRE Internet, including every book ever written? That's what you seem to be saying.
    Text doesn't take up a lot of space. You can see examples of books here; they are a few hundred KB each, the same as a single image:

    https://www.gutenberg.org/ebooks/2701
    https://www.gutenberg.org/ebooks/11
    https://www.gutenberg.org/browse/scores/top

    The GPT-3 model is hundreds of GBs in size and it was trained on 45TB of data (in 2021), which is not the whole Internet. 45TB -> 500GB is roughly 90:1 compression.

    It also doesn't store the text directly; it's encoded, similar to how file compression like .zip works, which can losslessly compress text around 5:1. AV1 lossy compression can compress a 50GB Blu-ray video down to 1GB that looks close to the original.
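    The ratios quoted above are easy to verify with quick arithmetic (all figures taken from this post, not independent measurements):

```python
# Compression ratios implied by the figures in the post.
corpus, model = 45e12, 500e9         # GPT-3: 45 TB training data -> ~500 GB model
wiki_raw, wiki_zip = 86e9, 19e9      # English Wikipedia: uncompressed vs. compressed

print(f"Training data to model: {corpus / model:.0f}:1")       # 90:1
print(f"Wikipedia lossless zip:  {wiki_raw / wiki_zip:.1f}:1")  # 4.5:1
```

    So the model "compresses" its training data around 20x harder than lossless zip manages on Wikipedia, which is why it can only be storing patterns, not the text itself.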

    [embedded video]

    It stores data that allows it to generate a pattern similar to the data it was trained on. Modify the parameters and it will modify the pattern. With the right parameters, it can generate a pattern that is very close to the original material. There are billions of parameter variations, so it would be rare to get an outcome containing the original material, but the pattern is in there.

    Some content creators aren't happy that AI software uses their work this way. It allows people to copy the style of famous artists, musicians and writers. People have been doing it with AI voices:

    [embedded video]

    With text, someone could ask it to write a new Stephen King novel. Despite being a new original work, people would recognize the familiar style of the text. The technology is very powerful and there's not really a way to stop it now, but it's easy to see why original creators would be uneasy, especially given how capable it is at such an early stage. Server hardware manufacturers are talking about order-of-magnitude jumps in performance for next-gen AI processing hardware. Part of the recent writers' strike was about regulating the use of AI in writing:

    https://time.com/6277158/writers-strike-ai-wga-screenwriting/
    https://www.newscientist.com/article/2373382-why-use-of-ai-is-a-major-sticking-point-in-the-ongoing-writers-strike/

    It's not hard to imagine a near future where a studio exec could ask an AI to write a blockbuster screenplay and generate a sample movie with famous actors acting out the parts, all rendered in minutes on a studio render farm. Then they could greenlight the best movie they generate. This cuts out the writers even though the AI was trained on their original work.

    AI sits in a bit of a grey area with copyright just now. It's like a karaoke singer doing a cover of a famous song, which is legal. But it amounts to giving everyone the ability to be that karaoke singer and to start using that work commercially. Original artists can take legal action against another artist if they notice similarities:

    https://www.youtube.com/watch?v=R5sO9dhPK8g
    https://www.youtube.com/watch?v=0kt1DXu7dlo

    They couldn't take on a million people using AI tools trained on their work. That's why regulators are trying to put rules in place before it gets out of control. I think they are already too late, but there would be the possibility to retroactively tag trained data as being used without consent in an AI generator. Content creators have a right to say how their work is used.
  • M3 Max chips being tested for future MacBook Pro models

    nubus said:
    This is getting confusing.

    For iPhone we expect Apple to limit 3 nm to the iPhone Pro due to expensive technology and limited manufacturing capacity.
    For Mac we expect Apple to deliver 3 nm to low-end products. This makes no sense.
    iPhone unit sales are over 200 million a year, Macs are around 25 million, and Macs don't see anywhere near the same surge in sales for a new model.

    https://www.statista.com/statistics/299153/apple-smartphone-shipments-worldwide/

    Apple has to get nearly 80 million iPhone units ready for the Christmas period, but fewer than 5 million Macs. The number of high-end iPhones alone outnumbers all of their Mac shipments.

    TSMC has to reduce defects in the manufacturing process on the low-end chips to get the best yield by the time they produce the Pro chips. The same happened with Intel: the entry-level i3 chips used the most advanced manufacturing while the Xeons were based on years-old processes.
  • Elon Musk wants Apple to bend more App Store rules for X

    glennh said:
    And how would Tim Cook explain this transfer of wealth away from Apple’s shareholders to the X’s shareholders? 

    This ain’t gonna happen!!!
    One thing Apple could do is offer an API for resellers. Currently, if a business resells content from its users, it gets billed on the aggregate of all the sales, which makes some types of business harder to operate.

    Say an app lets users host artwork similar to ArtStation and allows other users to tip the artists. The app developer is the one who gets the aggregate revenue but they are then paying those tips out to each artist separately.

    If 10,000 artists each received $200 in tips, the $2m aggregate would cross Apple's threshold for the 30% rate, even though the app developer and each artist would individually be making far less than the threshold.

    If Apple had an API where a developer could assign a unique in-app purchase identifier to a purchase button (including proxy payments like in-app currency), they'd know how many separate recipients there were and could bill based on the amount each recipient was making.

    The app developer would have to provide accounting details to prove that they were making the payments they said they were.

    Then they could have consistent fees that didn't conflict with other app developers who are keeping 100% of their app revenue. If any developer is caught misusing the in-app purchase identifiers, they can be blocked from selling in the store.

    This could apply to apps like Spotify where the musicians are the recipients and would each get a unique identifier. They'd assign subscription payments based on streams to each artist and that can get billed separately. To make accounting easier, any unique user below a threshold like $300/month can get 0% fee. Then 15% up to $1m, 30% over $1m.

    The way it works now: if an app developer took in $2m in payments ($200 from each of 10,000 users) and took a 10% cut, Apple would take 30% of the total, the developer would take 10%, and each content creator would be left with 60% of their $200 ($120).

    If instead a reseller system tagged each user separately, the app developer would take 10% ($200k) and each creator would keep 90% of $200 = $180. Apple would then charge 0% on the $180 for each creator (below the $300/month threshold) and 15% on the app developer's $200k.
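    The difference in Apple's take under the two models can be sketched with quick arithmetic. The tier thresholds (0% under $300/month, 15% up to $1m, 30% above) are the ones proposed in this post, not Apple's actual schedule, and each artist's tips are assumed to stay under the $300/month threshold:

```python
# Current aggregate billing vs. the proposed per-recipient billing.
users, tip = 10_000, 200
gross = users * tip                 # $2,000,000 aggregate revenue

# Today: Apple bills 30% of the aggregate.
current_apple = gross * 0.30        # $600,000

# Proposed: each recipient is billed on their own earnings.
developer_take = gross * 0.10       # $200,000 -> 15% tier (under $1m)
artist_take = tip * 0.90            # $180 per artist -> under $300/month -> 0%
apple_proposed = developer_take * 0.15 + users * artist_take * 0.0

print(f"Apple today:    ${current_apple:,.0f}")   # $600,000
print(f"Apple proposed: ${apple_proposed:,.0f}")  # $30,000
```

    The content creators keep far more under the second model, while Apple still collects a fee from the business actually operating in the store.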