[Paper] LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions

This study presents the first large-scale survey (N=816) of verified research article authors about their usage and perceptions of Large Language Models (LLMs) as research tools. The findings reveal that 81% of researchers have incorporated LLMs into their workflow, with Information Seeking and Editing reported most frequently. Notably, traditionally disadvantaged groups (non-White, junior, and non-native English speaking) reported higher usage and perceived benefits, potentially improving research equity. However, women, non-binary, and senior researchers expressed greater ethical concerns, highlighting potential barriers to broader adoption. Researchers overall preferred non-profit/open-source LLMs, citing transparency and ethical considerations. This study underscores the need for ethical guidelines and further research to understand the broader impacts of LLMs on research quality, creativity, and equity.

Metadata

Type of Content: Research Paper Preprint
Domain: arXiv.org
Date Published: October 30, 2024
URL: https://arxiv.org/abs/2411.05025

Summary

81% of surveyed researchers (N=816) use LLMs in their research process.
Information Seeking and Editing were the most frequent LLM uses.
Data Cleaning & Analysis and Data Generation were the least frequent LLM uses.
Non-White, junior, and non-native English-speaking researchers reported higher LLM usage and perceived benefits.
Women, non-binary, and senior researchers expressed greater ethical concerns about LLM usage.
Researchers generally prefer open-source/non-profit LLMs to those from for-profit entities.

What makes this novel or interesting

First large-scale study: This is the first large-scale quantitative study on LLM usage and perception among researchers across various disciplines. Previous studies were smaller or focused on specific fields.
Equity implications: The study highlights the potential of LLMs to improve research equity by benefiting traditionally disadvantaged groups. However, it also raises concerns about exacerbating existing inequities due to differing ethical concerns and adoption rates.
Revealing social norms: The research unveils evolving social norms around LLM usage in academia, including disclosure practices, which vary across disciplines.
Preference for open-source: The clear preference for non-profit/open-source LLMs emphasizes the importance of transparency, ethical considerations, and community-driven development in this space.

Verbatim Quotes

Benefits of LLMs:
- "For honest researchers in resource-constrained developing countries, with little to no research funding, availability and use of LLMs is a game-changer leveling the playing field with other researchers in more fortunate climes."
- "LLMs are a great tool to help you create hypotheses, as a way to brainstorm, where there really are no wrong ideas and therefore you cannot suffer with any potential misleading information, as you are expected to have domain expertise anyway."
- "I am not a native English speaker, so LLMs help me with the language barrier."
Risks and Ethical Concerns:
- "LLMs are tools for automated plagiarism and data fabrication that pose an existential threat to the network of trust essential for the integrity of academic work and the proper attribution of credit."
- "Putting more falsehoods into [the internet's] shared memory is a crime."
- "We need better judgment, slower science, and more thoughtful and ambitious work right now, not the opposite. Otherwise, we risk ridding science of its most special attributes just to crank out more papers."
Disclosure and Attribution:
- "I'm afraid other people will use models to write papers but I only report using it for editing."
- "[If researchers don't disclose using LLM-generated text], I fear that researchers can get lazy, and we start having a lot of 'repeated text' in articles... and eventually researchers may just ask LLMs to generate the whole paper."
- "...universities have totally different policies. It would be good if there was a generic system of how to indicate that editing or drafting tools were used."

How to report this in the news

A new survey of 816 researchers finds that 81% already use AI tools in their research, most often to look up information and to edit their writing. This is a major shift in how science is done, and it could be a game-changer for researchers in less advantaged positions, like those who don't speak English as their first language. However, there are also ethical concerns, like the possibility of AI generating false information or making it easier to plagiarize. It's like giving everyone a powerful new microscope: it can lead to amazing discoveries, but it also requires careful handling and ethical guidelines. The research community is actively discussing these challenges, and it's clear that we need to establish rules and norms for using AI in research.

Detailed Recap

Inferred LLM-related techniques used or discussed by researchers:

Core LLM Research Techniques:

Information Seeking:
- Paper Discovery: Using LLMs to find relevant research papers (e.g., through semantic search or question answering). Consider tools like Semantic Scholar, Elicit, and Consensus.
- Topic Discovery: Using LLMs to explore research areas and identify relevant subtopics.
- Summary Generation: Using LLMs to summarize research papers or create abstracts. Qlarify was mentioned as a tool for generating expandable summaries.
- Explanation Generation: Using LLMs to explain complex concepts or provide definitions. Consider tools that offer contextual definitions like those discussed in Head et al. (2021) and Head et al. (2024).
Editing and Writing:
- Grammar and Style Correction: Using LLMs to improve grammar, sentence structure, and writing style. Grammarly and similar tools are often mentioned.
- Rephrasing and Rewriting: Using LLMs to paraphrase text or rewrite sections for clarity or different audiences.
- Condensation and Summarization: Using LLMs to shorten text or create concise summaries.
- Formatting Assistance: Using LLMs to format research papers according to specific style guidelines.
Ideation and Framing:
- Brainstorming Research Questions: Using LLMs to generate research ideas and develop research questions. SciMON was mentioned as a tool for generating novel research ideas.
- Framing Research Arguments: Using LLMs to structure arguments and develop compelling narratives for research papers. Sparks was cited as a tool for generating ideas for science writing.
Direct Writing (with caveats):
- Drafting Paragraphs and Sections: Using LLMs to generate text for specific sections of a research paper. This should be approached with caution and requires careful review and editing. MetaWriter was mentioned as a tool exploring AI writing support in peer review.
- Rewriting for Different Styles: Adapting LLM-generated text to match different writing styles or target audiences.
Data Cleaning & Analysis (with strong caveats):
- Data Cleaning and Reformatting: Using LLMs to clean and structure datasets. Proceed cautiously and verify the results, as LLMs can introduce errors.
- Qualitative Data Analysis: Exploring the use of LLMs for tasks like coding qualitative data. This is an emerging area with potential but requires careful validation.
- Statistical Reporting: Using LLMs to generate descriptions or interpretations of statistical results. This should be approached with extreme caution due to the risk of inaccuracies.
Data Generation (with strong caveats):
- Synthetic Data Generation: Using LLMs to generate synthetic datasets for research purposes. Carefully consider potential biases and the need for realistic data representation. This was discussed in Veselovsky et al. (2023).
- Generating Training Labels and Examples: Using LLMs to create labeled data for training machine learning models.
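The "careful validation" called for above can be made concrete for qualitative coding and label generation. The sketch below is one possible check, not a method taken from the paper: it assumes a human has independently coded a sample of the same items and computes Cohen's kappa, a chance-corrected agreement score, between the human and LLM labels. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(human_labels, llm_labels):
    """Cohen's kappa between two equal-length sequences of categorical labels."""
    if len(human_labels) != len(llm_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(human_labels)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(h == m for h, m in zip(human_labels, llm_labels)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    human_counts = Counter(human_labels)
    llm_counts = Counter(llm_labels)
    expected = sum(human_counts[c] * llm_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six interview excerpts (illustrative data only).
human = ["benefit", "risk", "risk", "benefit", "neutral", "risk"]
llm = ["benefit", "risk", "benefit", "benefit", "neutral", "risk"]
kappa = cohens_kappa(human, llm)
print(round(kappa, 3))  # 0.739
```

A low kappa on the human-coded sample would suggest revising the prompt or codebook before trusting LLM labels on the remaining data. What counts as "high enough" is a judgment call that depends on the field and the task.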

Additional Techniques & Considerations (implied or from related work):

Literature Review Support: LLMs can be used for tasks like synthesizing information from multiple papers, identifying research gaps, and organizing literature reviews. Synergi and CHIME were mentioned as tools for literature review support.
Programming Assistance: LLMs can assist with writing and debugging code, generating code documentation, and explaining code functionality. GitHub Copilot is often cited as a tool for code generation.
Peer Review Support: LLMs can be used to generate feedback on research papers, identify potential flaws, and suggest improvements. MARG was mentioned as a tool for multi-agent review generation.
Prompt Engineering: Developing effective prompting strategies is crucial for obtaining high-quality results from LLMs.
Human-AI Collaboration: Treat LLMs as research assistants and develop effective workflows for collaboration.
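As a small illustration of the prompt engineering point above, a prompt can be assembled from explicit parts (task, constraints, source text) so that it is repeatable and easy to report later. This is a minimal sketch under assumptions: the `build_prompt` helper, its layout, and the example constraints are hypothetical and not drawn from the paper. It makes no call to any model.

```python
def build_prompt(task, source_text, constraints):
    """Assemble a structured prompt from a task, constraints, and a source passage."""
    lines = [
        "You are assisting a researcher. Follow every constraint below.",
        f"Task: {task}",
        "Constraints:",
    ]
    # One bullet per constraint keeps each requirement individually checkable.
    lines += [f"- {c}" for c in constraints]
    # Delimit the source so instructions and material are not confused.
    lines += ["Source text:", '"""', source_text.strip(), '"""']
    return "\n".join(lines)


prompt = build_prompt(
    task="Summarize the passage in two sentences.",
    source_text="  LLMs are increasingly used as research tools.  ",
    constraints=[
        "Use only information found in the source text.",
        "Say 'not stated' if the passage does not answer the task.",
    ],
)
print(prompt)
```

Keeping the template in code means the exact prompt wording can be versioned alongside the analysis and quoted in a disclosure statement.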

Inferred Advice for Advanced AI Users using LLMs for Research:

Usage Trends & Opportunities:
- Dominant Use Cases: Leverage LLMs for information seeking (literature review, paper discovery) and editing (grammar, phrasing, rewriting). Consider these as your starting point and the areas with the highest current adoption.
- Emerging Uses: Explore LLMs for idea generation and brainstorming, but carefully validate outputs. These areas are still underutilized and hold potential for novel workflows.
- Less Common but Promising: Experiment with data cleaning and analysis using LLMs, but prioritize verifying results due to potential inaccuracies. This represents an opportunity for advanced users to develop robust LLM-driven analysis workflows.
- Data Generation Potential: Investigate using LLMs for creating synthetic data or generating training labels, but be mindful of potential biases and the need for careful validation. This area could be impactful but requires careful consideration of data quality and ethics.
- Language Equity: If you are a non-native English speaker, use LLMs to improve the clarity and fluency of your writing. Respondents in this group reported higher perceived benefits, and better writing support may help them gain visibility and impact in the research community.
Risks and Mitigation Strategies:
- Hallucinations & Misinformation: Fact-check everything. Develop a rigorous validation process for any LLM-generated content, especially when used for data analysis or direct writing.
- Bias Detection & Mitigation: Be critically aware of potential biases in LLM outputs. Use diverse prompts and evaluate results across different models to identify and mitigate potential bias.
- Plagiarism Prevention: Always attribute any LLM-generated text properly. Use plagiarism detection tools and develop clear guidelines for distinguishing your original contributions from LLM-assisted content.
- Maintaining Research Integrity: Avoid over-reliance on LLMs, particularly for core research tasks. Use LLMs as tools to augment your workflow, not replace your critical thinking or domain expertise.
Ethical Considerations & Best Practices:
- Transparency & Disclosure: Clearly disclose LLM usage in your work, specifying the model, prompts, and degree of assistance. This promotes reproducibility and builds trust in your research.
- Data Privacy & Security: Be mindful of data privacy when using LLMs, especially with sensitive information. Consider using privacy-preserving techniques or opting for local LLM deployments when appropriate.
- Copyright & Attribution of Training Data: Acknowledge the ethical implications of using models trained on copyrighted data. Support initiatives that address copyright concerns and promote responsible data usage.
- Source Preference: Favor open-source/non-profit LLMs for increased transparency, control, and alignment with research values. Actively contribute to the development and evaluation of open-source models.
Advanced Strategies & Future Directions:
- Prompt Engineering: Invest in developing advanced prompt engineering skills. This is crucial for obtaining high-quality and relevant outputs from LLMs.
- Fine-tuning & Customization: Explore fine-tuning existing LLMs or training your own models on domain-specific datasets. This can significantly improve performance and address specific research needs.
- Human-AI Collaboration: Develop effective strategies for collaborating with LLMs, treating them as "research assistants" rather than replacements for human researchers.
- Community Engagement: Participate in discussions and contribute to the development of ethical guidelines and best practices for LLM usage in research.
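The disclosure advice above (state the model, the prompts, and the degree of assistance) can be captured as a small structured record kept alongside a manuscript. The sketch below is an assumption, not a standard: the `LLMDisclosure` class, its field names, and the model name `example-model-v1` are all hypothetical, and journals or institutions may require a different format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LLMDisclosure:
    """Hypothetical record of how an LLM was used in a piece of research."""
    model: str                 # model name and version as reported by the provider
    purpose: str               # e.g. "grammar and style editing"
    degree_of_assistance: str  # free-text description of how much was delegated
    prompts: list = field(default_factory=list)  # exact prompts, for reproducibility

    def to_json(self):
        """Serialize the record so it can be archived with the paper's materials."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def statement(self):
        """Render a one-sentence disclosure suitable for an acknowledgments section."""
        return (
            f"We used {self.model} for {self.purpose} "
            f"({self.degree_of_assistance}); "
            f"{len(self.prompts)} prompt(s) are archived with the supplementary materials."
        )


record = LLMDisclosure(
    model="example-model-v1",  # placeholder, not a real model name
    purpose="grammar and style editing",
    degree_of_assistance="authors reviewed and approved every change",
    prompts=["Correct grammar without changing the meaning."],
)
print(record.statement())
```

A shared structure of this kind is one way to approach the "generic system" for indicating tool use that a respondent asked for, though any real scheme would need agreement across venues.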

By understanding these key findings and adopting a responsible and ethical approach, you can use LLMs to accelerate your research while upholding the integrity of scientific work.