Trying to ignore any hype, lofty sci-fi ideas, or potential philosophical questions for a moment: roughly speaking, this sounds like a search engine applied to a neat and thought-provoking use case.
There's an architecture diagram[1] alongside the source code, and my summary would be:
- The system has in-house web indexes built from Common Crawl[2] data
- The system receives snippets of text from Wikipedia and determines whether a citation already exists and whether it actually supports the claim
- If no valid citation exists, the system queries the indexes to find relevant URLs (rough sketch below)
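For concreteness, here's a minimal sketch of that flow as I read the diagram. The function names (verify_citation, search_index) are my own hypothetical stand-ins, not the project's actual API:

  # Sketch of the pipeline summarized above. verify_citation and
  # search_index are hypothetical stand-ins for SIDE's verification
  # model and its Common Crawl-derived retrieval indexes.
  def verify_citation(claim: str, url: str) -> bool:
      """Placeholder: fetch `url` and check whether it supports `claim`."""
      raise NotImplementedError

  def search_index(claim: str, top_k: int = 5) -> list[str]:
      """Placeholder: query the in-house web indexes for candidate URLs."""
      raise NotImplementedError

  def suggest_citations(claim: str, existing_urls: list[str]) -> list[str]:
      # Keep any existing citation that actually supports the claim.
      valid = [u for u in existing_urls if verify_citation(claim, u)]
      if valid:
          return valid
      # Otherwise, retrieve candidate URLs ranked by relevance.
      return search_index(claim, top_k=5)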
It'd be interesting to learn how this approach fares compared to simply pasting the relevant paragraphs into a search engine with -site:wikipedia.org appended, to exclude Wikipedia itself from the results.
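By that baseline I mean something along these lines (the paragraph text here is made up):

  # Strawman baseline: hand the paragraph to a general-purpose search
  # engine, excluding Wikipedia itself via the -site: operator.
  paragraph = "The bridge was closed for repairs in 1987."  # made-up example
  query = f"{paragraph} -site:wikipedia.org"
  print(query)  # paste into any engine that supports the -site: operator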
Something about feedback loops and data quality makes me wary that too much application of automated systems like this would degrade content quality over time, with each updated copy becoming an imperfect translation of, or reference to, an existing one.
>Building on Meta AI’s research and advancements, we’ve developed the first model capable of automatically scanning hundreds of thousands of citations at once to check whether they truly support the corresponding claims.
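For what it's worth, one natural way to frame that verification step is textual entailment: treat the cited passage as the premise and the Wikipedia claim as the hypothesis. Here's a sketch using the public roberta-large-mnli checkpoint; this framing is my assumption, not necessarily Meta's actual verifier:

  # Entailment-style check: does `evidence` support `claim`?
  # This is my own NLI framing, not the model the article describes.
  import torch
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
  model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

  def supports(evidence: str, claim: str) -> bool:
      inputs = tok(evidence, claim, return_tensors="pt", truncation=True)
      with torch.no_grad():
          logits = model(**inputs).logits
      # Label order for this checkpoint: 0=contradiction, 1=neutral, 2=entailment.
      return logits.softmax(dim=-1)[0, 2].item() > 0.5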
Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
[1] - https://github.com/facebookresearch/side/tree/a595fb09c85233...
[2] - https://commoncrawl.org/