Aiui, back-feeding uncurated slop is a real problem. But curated slop is fine. So they can either curate slop or scrape websites, which is almost free. So even though synthetic training data is fine, they still prefer to scrape websites because it’s easier / cheaper / free.
Aiui, back-feeding uncurated slop is a real problem. But curated slop is fine. So they can either curate slop or scrape websites, which is almost free. So even though synthetic training data is fine, they still prefer to scrape websites because it’s easier / cheaper / free.