Stack Overflow Will Charge AI Companies for Training Data

Artificial intelligence companies like OpenAI are scouring the web for data to make their technology as intelligent as it is.

Picture: Dennis Diatel (Shutterstock)

Stack Overflow is joining the AI resistance by making the companies behind the rapidly evolving technology pay. The go-to site for programmers joins Twitter and Reddit in requiring AI companies to pay for the data they use to train their technology.

As detailed in Wired, developing systems that run viral AI tools like ChatGPT and DALL-E can cost the companies behind them hundreds of millions of dollars, and Stack Overflow is about to make it even more expensive. Artificial intelligence companies like OpenAI scour the internet for data to make their technology as intelligent as possible, and so far they’ve been able to do it mostly for free. Stack Overflow CEO Prashanth Chandrasekar said the site plans to start charging AI developers for access to its data as early as mid-year, according to the outlet.

“Community platforms pushing LLMs should definitely be compensated for their contributions so companies like us can reinvest in our communities so they can continue to thrive,” Chandrasekar said, as quoted by Wired. “We’re very supportive of Reddit’s approach.”

An investigation by The Washington Post published this week revealed the millions of websites unwittingly used to train AI through Google’s massive C4 dataset, with Reddit and Stack Overflow making the cut. Other websites like Wikipedia, Medium, The New York Times and even Gizmodo have been used to train AIs like Facebook’s LLaMA and Google’s T5. Perhaps the most notable statistic was that the copyright symbol appeared more than 200 million times in the dataset.

The data from these sites is clearly valuable to AI developers, and Chandrasekar hopes that revenue from access fees will allow Stack Overflow to keep attracting users and quality information to the site.

The move comes as the conversation about the ethics of training AI gathers momentum. Universal Music Group, one of the largest record labels in the world, asked Spotify, Apple Music and other streaming platforms to limit AI’s access to its artists’ copyrighted material. The request was timely, as a fully AI-generated collaboration between The Weeknd and Drake had gone viral.

Want to learn more about AI, chatbots, and the future of machine learning? Check out our full coverage of artificial intelligence, or browse our guides to The Best Free AI Art Generators, The Best ChatGPT Alternatives, and Everything We Know About OpenAI’s ChatGPT.

Zack Zwiezen

Zack Zwiezen is a USTimesPost U.S. News Reporter based in London. His focus is on U.S. politics and the environment. He has covered climate change extensively, as well as healthcare and crime. Zack Zwiezen joined USTimesPost in 2023 from the Daily Express and previously worked for Chemist and Druggist and the Jewish Chronicle. He is a graduate of Cambridge University. Languages: English.
