For this research, the team tested how the three LLMs responded to questions from two datasets: TruthfulQA and SciQ. TruthfulQA is designed to measure a model’s truthfulness, using questions built around common misconceptions and literal truths about the real world, while SciQ contains science exam questions that test factual accuracy. The researchers prepended a short user biography to each question, varying three traits: education level, English proficiency, and country of origin.
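To make the setup concrete, here is a minimal sketch of how such prompts could be assembled. It is not the researchers' code: the trait values, the biography template, and the function names `make_bio` and `build_prompts` are all assumptions made up for illustration, and the sample question is only a placeholder.

```python
from itertools import product

# Hypothetical trait values; the study's actual categories and wording may differ.
TRAITS = {
    "education": ["high", "low"],
    "english": ["native", "non-native"],
    "country": ["Country A", "Country B", "Country C"],
}


def make_bio(education, english, country):
    """Build a short illustrative biography from one combination of traits."""
    return (
        f"I am from {country}. "
        f"My education level is {education}, "
        f"and I am a {english} English speaker."
    )


def build_prompts(question):
    """Prepend the biography for every trait combination to a single question."""
    prompts = []
    # dicts preserve insertion order, so the tuple is (education, english, country)
    for education, english, country in product(*TRAITS.values()):
        bio = make_bio(education, english, country)
        prompts.append(f"{bio}\n\n{question}")
    return prompts


# Placeholder question, not taken from either dataset.
question = "Is the following statement true: the Sun orbits the Earth?"
prompts = build_prompts(question)
print(len(prompts))
print(prompts[0])
```

Sending each variant to a model and scoring the answers against the dataset's reference answer would then show whether accuracy shifts with the stated biography while the question itself stays fixed.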
The relevant passage from the article is quoted below (emphasis mine):