I think there are also going to be difficult new problems arising from training on copyrighted data. Can I file a DMCA takedown if your model contains/regurgitates too much of my book verbatim in a way that doesn’t meet fair use?
That depends on the country, but as for the US, since you mention the DMCA, ML using copyrighted work is probably legal. There just isn't much case law yet. Talk to a lawyer, though. Depending on how risk-averse you want to be, there are definitely ways to avoid issues, for example by ensuring you are allowed to use the copyrighted works in question. That said, building a very large curated dataset can be expensive.