Matt Rickard

StackOverflow/ChatGPT

May 12, 2023

Back in December 2022, Stack Overflow “banned” the use of ChatGPT on its site. Fast forward a few months: Similarweb reported that Stack Overflow traffic was down 14% in March 2023.

“Data is the new oil,” the saying goes, but here we have a company with very little proprietary data (OpenAI) creating a model that powers a product that beats Stack Exchange, a company with a large amount of proprietary data. More than that, the Stack Exchange data seems a perfect fit for the RLHF layer over these models: the site has been collecting human feedback on answers for over a decade.

A few thoughts.

They gave the data away for free. Stack Exchange (the overarching brand covering Stack Overflow, MathOverflow, etc.) makes up 5.13% (64GB) of The Pile, a dataset used to train many large language models. Stack Exchange has been publishing this data since 2014 (archive).

Stack Exchange was already in decline. The company has struggled to monetize its engaged user base for years, resulting in a sale to Prosus in 2021. Increasingly, question answering and knowledge sharing happen in GitHub repositories, around issues and pull requests.
