GotFunding: A grant recommendation system based on scientific articles

Tong Zeng; Daniel E Acuna

doi:10.1002/pra2.323

Abstract

Obtaining funding is an important part of becoming a successful scientist. Junior faculty spend a great deal of time finding the right agencies and programs that best match their research profile. But what are the factors that influence the best publication–grant matching? Some universities might employ preaward personnel to understand these factors, but not all institutions can afford to hire them. Historical records of publications funded by grants can help us understand the matching process and also help us develop recommendation systems to automate it. In this work, we present GOTFUNDING (Grant recOmmendaTion based on past FUNDING), a recommendation system trained on National Institutes of Healthʼs (NIH) grant–publication records. Our system achieves a high performance (NDCG@1 = 0.945) by casting the problem as learning to rank. By analyzing the features that make predictions effective, our results show that the ranking considers most important (a) the year difference between publication and grant, (b) the amount of information provided in the publication, and (c) the relevance of the publication to the grant. We further discuss the implications and future improvements to this work.


1 INTRODUCTION

The ability of scientists to secure funding plays an important role in their careers, sometimes propelling their productivity (Jacob & Lefgren, 2011). Scientists thus spend an enormous amount of time finding the right opportunities, writing proposals, and waiting for funding decisions (Herbert, Barnett, Clarke, & Graves, 2013). Past researchers have estimated that the effort of searching for and preparing a grant might not be worth its opportunity cost (Gross & Bergstrom, 2019). Proposed solutions to this problem include lowering the criteria for junior faculty (Van den Besselaar & Sandstrom, 2015), awarding grants by lottery (Gross & Bergstrom, 2019), or instituting peer-funding mechanisms (Bollen, Crandall, Junk, Ding, & Börner, 2014). Here we explore another alternative: using machine learning to suggest the best-matching grant for a scientist based on her publications. We show that we can cast the problem as a recommendation system trained on historical grant–publication data.

There are competing factors involved in finding the right grant. Scientists need to consider funding agencies (e.g., NSF or NIH), career stages (e.g., junior-oriented or senior/leader-oriented), award amounts (e.g., small NSF grant vs large DARPA grant), funding lengths (e.g., 1-year EAGER NSF grant or 5-year CAREER NSF grant), and call relevance (e.g., a particular program within NSF or institute in NIH) (Li & Marrongelle, 2012). Thousands of grant opportunities might be available at any given time, offering hundreds of millions of dollars combined (Boroush, 2016). These opportunities also have ramifications far beyond the recipientʼs career (Lane, 2009). Navigating these funding opportunities is therefore hard.

While submitting a grant is time-consuming and has a low probability of success (Gross & Bergstrom, 2019; Bollen et al., 2014), these low probabilities might be related to a mismatch between the grant submitted and the agency that receives it (Crow, 2020). Rather than changing the preparation and review process, we could improve the quality of the matching between scientists and opportunities. Recommendation systems are a natural way of improving how scientists find relevant information such as publications (e.g., Achakulvisut, Acuna, Ruangrong, & Kording, 2016). A similar process could be applied to grant recommendation. Some systems exist (e.g., Elsevierʼs Mendeley Funding), but they are closed source and difficult to evaluate.

In this publication, we propose to use historical data of past publication–grant relationships from NIH. We cast the problem as a learning-to-rank recommendation system and show that it can achieve high performance on validation (NDCG@1 = 0.945). We further explore the factors that maximize the quality of the match and describe potential improvements in the future.
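To make the reported metric concrete, the following is a minimal sketch of NDCG@k for a single ranked list. It assumes binary relevance labels (1 if the grant actually funded the publication, 0 otherwise) and the common exponential-gain form of DCG; the source does not state which variant or label scheme was used, and the function names and example lists are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the top item is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of candidate grants, in the order a model ranked them.
ranked_hit = [1, 0, 0, 0]   # the true funding grant is ranked first
ranked_miss = [0, 1, 0, 0]  # the true funding grant is ranked second

print(ndcg_at_k(ranked_hit, 1))   # 1.0
print(ndcg_at_k(ranked_miss, 1))  # 0.0
```

Under these assumptions, NDCG@1 for one publication is 1 when the top-ranked grant is a true funder and 0 otherwise, so the average over publications can be read as the share of publications whose first recommendation is correct.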

2 DATASETS

2.1 Federal RePORTER

Federal RePORTER is an open and automated data infrastructure that collects data on federally funded research projects and their outcomes. It includes 1.15 million projects from 2000 to 2019 across 18 agencies. NIH accounts for 77.3% of these projects, so we focus only on NIH projects. Each NIH project contains a list of the publications acknowledging the grant.
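The project-to-publication lists can be turned into ranking groups, one per publication, which is the input shape a learning-to-rank model usually expects. The sketch below is illustrative only: the identifiers are made up, and sampling non-acknowledged grants as negatives is an assumption, since this page does not describe how candidate grants were chosen.

```python
import random
from collections import defaultdict


def build_ranking_groups(links, all_grants, n_negatives=2, seed=0):
    """Group (publication_id, grant_id) acknowledgment pairs by publication.

    Returns {publication_id: [(grant_id, label), ...]} where label is 1 for a
    grant the publication acknowledged and 0 for a randomly sampled other grant.
    Negative sampling is an assumption of this sketch, not the paper's method.
    """
    rng = random.Random(seed)
    funded_by = defaultdict(set)
    for pub, grant in links:
        funded_by[pub].add(grant)

    groups = {}
    for pub, positives in funded_by.items():
        pool = sorted(set(all_grants) - positives)  # grants that did not fund this publication
        negatives = rng.sample(pool, min(n_negatives, len(pool)))
        groups[pub] = [(g, 1) for g in sorted(positives)] + [(g, 0) for g in negatives]
    return groups


# Hypothetical identifiers, for illustration only.
links = [("pub1", "G1"), ("pub1", "G2"), ("pub2", "G1")]
all_grants = ["G1", "G2", "G3", "G4"]
groups = build_ranking_groups(links, all_grants)
```

Each group then holds the candidates to be ranked for one publication, with the acknowledged grants as the items that should appear at the top.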

2.2 PubMed

PubMed is a search engine and publication repository maintained by the United States National Library of Medicine (NLM). It provides access to over 30 million publications in biomedical and health science. We use PubMed to retrieve the abstract of each publication.
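One public way to retrieve abstracts by PubMed ID is the NCBI E-utilities `efetch` endpoint. The sketch below shows how such a request could be built; this page does not say how the abstracts were actually obtained, so treat this route as an assumption. The helper names and the PMIDs are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_abstract_url(pmids):
    """Build an efetch URL requesting plain-text abstracts for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),  # efetch accepts a comma-separated ID list
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_abstracts(pmids, timeout=30):
    """Download the abstracts as one text blob (performs a network request)."""
    with urlopen(efetch_abstract_url(pmids), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder PMIDs, for illustration only; no request is sent here.
url = efetch_abstract_url([12345, 67890])
```

For large collections, requests would need to be batched and rate-limited in line with NCBI's usage guidelines.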

3 EXPERIMENTS AND RESULTS

Please read the paper for details.

4 DISCUSSION AND CONCLUSION

In our work, we aim to improve how scientists find relevant grants based on their research interests. We propose to solve this problem by building a recommendation system that learns from historical publication–grant relationships. Our results show that we can achieve a high performance of NDCG@1 = 0.945. Further, we explore various factors that affect the matching between a publication and a grant.
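The abstract names three influential factors: the year difference between publication and grant, the amount of information in the publication, and the relevance of the publication to the grant. A rough sketch of how such features could be computed for one publication–grant pair follows. The concrete definitions (token count as the amount of information, bag-of-words cosine similarity as relevance, and the availability of a grant abstract) are assumptions made for illustration, not the features used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_features(pub_year, pub_abstract, grant_year, grant_abstract):
    """Illustrative features for one publication-grant pair (hypothetical definitions)."""
    pub_tokens = tokenize(pub_abstract)
    grant_tokens = tokenize(grant_abstract)
    return {
        "year_difference": pub_year - grant_year,
        "publication_length": len(pub_tokens),  # crude proxy for amount of information
        "text_similarity": cosine_similarity(pub_tokens, grant_tokens),
    }


# Made-up texts and years, for illustration only.
features = pair_features(2018, "Neural coding in the visual cortex", 2015, "Visual cortex coding")
```

A ranking model would consume one such feature vector per candidate grant and order the candidates by its predicted score.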

One of the limitations of our work is that we only look at grants that were funded in the past. Funding mechanisms and publication topics might both change over time, so there is no guarantee that a correct prediction will actually yield a successful match for a future grant. Recommendation systems, however, benefit from large amounts of data, and unless we are able to interview scientists about their opinions on publication–grant matching, it is hard to build a recommendation system otherwise. Finally, even if our recommendations are off by topic, they can still serve as a narrowing step during the initial stages of a search.

Another limitation of our work is that we use publications that have already been funded by grants. Our recommendation system, however, is meant to solve the opposite problem, in which a scientist wants to use a publication to initiate funding. Research is still unclear on whether funding changes the direction of research, but even if it does, our recommendations could be useful for discovering people who have worked on similar problems. We hope to obtain new data in the future about grants that were not funded because they did not meet the criteria of a certain problem. Thus, with data that is available but potentially harder to obtain, some of these issues could be solved.

Our recommendation system is one of the first to offer scientists the ability to match their research to past grants. We think this research direction will especially benefit those who are starting their careers and might not have the human capital to help find relevant funding opportunities.

Citation

BibTeX citation:

@inproceedings{zeng2020,
  author = {Zeng, Tong and Acuna, Daniel E},
  title = {GotFunding: {A} Grant Recommendation System Based on
    Scientific Articles},
  booktitle = {Proceedings of the Association for Information Science
    and Technology},
  volume = {57},
  number = {1},
  pages = {e323},
  date = {2020-10-22},
  url = {https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/pra2.323},
  doi = {10.1002/pra2.323},
  langid = {en},
  abstract = {Obtaining funding is an important part of becoming a
    successful scientist. Junior faculty spend a great deal of time
    finding the right agencies and programs that best match their
    research profile. But what are the factors that influence the best
    publication–grant matching? Some universities might employ preaward
    personnel to understand these factors, but not all institutions can
    afford to hire them. Historical records of publications funded by
    grants can help us understand the matching process and also help us
    develop recommendation systems to automate it. In this work, we
    present GOTFUNDING (Grant recOmmendaTion based on past FUNDING), a
    recommendation system trained on National Institutes of Healthʼs
    (NIH) grant–publication records. Our system achieves a high
    performance (NDCG@1 = 0.945) by casting the problem as learning to
    rank. By analyzing the features that make predictions effective, our
    results show that the ranking considers most important (a) the year
    difference between publication and grant, (b) the amount of
    information provided in the publication, and (c) the relevance of
    the publication to the grant. We further discuss the implications
    and future improvements to this work.}
}

For attribution, please cite this work as:

Zeng, Tong, and Daniel E Acuna. 2020. “GotFunding: A Grant Recommendation System Based on Scientific Articles.” In Proceedings of the Association for Information Science and Technology, 57:e323. https://doi.org/10.1002/pra2.323.