OpenAI has launched GPTBot, a new web crawler meant to improve its artificial intelligence models, from GPT-4 to a future GPT-5. The crawler scours the web for data that can enhance the accuracy, capabilities, and safety of AI technology. Reportedly, it filters out paywall-restricted sources, sources that violate OpenAI’s policies, and sources that gather personally identifiable information.
The announcement comes as the U.S. Federal Trade Commission investigates how OpenAI obtained the data used to build its AI models. A July civil investigative demand seeks detailed information on OpenAI’s datasets, including how much was obtained from publicly available websites.
According to OpenAI, the web crawler will not gather personally identifiable information (PII), such as a person’s full name, Social Security number, or bank account number. OpenAI said the tool will be used to “improve future models.” The company said allowing access can “help AI models become more accurate and improve their general capabilities and safety.”
If website owners do not want GPTBot to crawl their site, they need to disallow it in the site’s robots.txt file. Owners can also specify which parts of their site the bot can and cannot access.
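In practice, crawler permissions of this kind are usually expressed as robots.txt rules keyed to the crawler's user-agent token, which for this bot is "GPTBot". The sketch below is illustrative only: the rules, the /blog/ path, and the example.com URLs are hypothetical, and the helper `can_fetch` is a name introduced here. It uses Python's standard `urllib.robotparser` to check what a given rule set would permit, which can be a handy sanity check before publishing the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may read /blog/ but nothing else,
# while all other crawlers remain unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("GPTBot", "https://example.com/private/page"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, can_fetch(agent, url))
```

To block the crawler from an entire site, the GPTBot group would contain only `Disallow: /`. Note that robots.txt is a voluntary convention: it states the site owner's wishes, and it works only to the extent that a crawler chooses to honor it.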