Deep Analysis of the Geothermal Literature Using Natural Language Processing Techniques (2020-2021)
Investigator(s): Jabs Aljubran
As the volume of geothermal literature grows worldwide, data analysis has become a useful way to advance professional and academic research and development. It is also essential to apply state-of-the-art algorithms to build practical tools on top of existing databases. This work used statistical and deep learning techniques to draw insights from the geothermal literature. We retrieved papers from the International Geothermal Association (IGA) database using the Stanford University search engine. As of 23 December 2020, we had gathered and preprocessed all 18,873 publications archived in this database, with fields for publication title, authors, year, keywords, abstract, language, conference, and session type.
Analysis shows that the three geothermal events with the largest historical volume of publications are the Geothermal Resources Council Transactions, the World Geothermal Congress, and the Stanford Geothermal Workshop. Using natural language processing (NLP) techniques, we “geoparsed” each abstract to identify the geographic coordinates of the location it concerns. This allowed us to develop an interactive world heatmap showing where geothermal research efforts have focused historically. Latent Dirichlet Allocation (LDA) was used to cluster the geothermal literature into nine topics. We also developed an intelligent search engine for the geothermal literature using term frequency-inverse document frequency (TF-IDF) and cosine similarity. After preprocessing the “authors” data, we built a coauthorship network graph that encompasses researchers within the geothermal community and reflects the level of collaboration between them. Finally, we developed a deep learning model for text generation and auto-completion by fine-tuning the state-of-the-art Generative Pre-trained Transformer 2 (GPT-2) on the geothermal literature. We made these tools available through an open-source application programming interface (API) for public use. You may access this API at http://steaming-geothermal-analytics.info.
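The implementation of the search engine is not described here. As a minimal sketch of the general technique it names (TF-IDF weighting followed by cosine-similarity ranking), the Python example below indexes a few short abstracts and ranks them against a query. The three abstracts, the function names (`tokenize`, `tfidf_vector`, `build_index`, `search`), and the smoothed IDF formula are all assumptions made for illustration; they are not taken from the IGA database or from the project's code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vector(tokens, idf):
    """Return an L2-normalized sparse TF-IDF vector (dict: term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant, chosen so no weight is zero).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return idf, vectors


def search(query, docs, idf, vectors, top_k=3):
    """Rank documents by cosine similarity to the query.

    Both vectors are unit length, so the dot product is the cosine.
    """
    q = tfidf_vector(tokenize(query), idf)
    scored = [
        (sum(w * v.get(t, 0.0) for t, w in q.items()), i)
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(docs[i], score) for score, i in scored[:top_k] if score > 0]


# Hypothetical abstracts, invented for this example only.
abstracts = [
    "Tracer testing of fractured geothermal reservoirs",
    "Drilling cost reduction for enhanced geothermal systems",
    "Geochemistry of hot spring fluids",
]
idf, vectors = build_index(abstracts)
results = search("fractured tracer", abstracts, idf, vectors)
```

Here `results` holds `(abstract, score)` pairs in descending order of similarity, and abstracts that share no term with the query are omitted. A production system would more likely rely on an established library and add stemming and stop-word removal, which this sketch leaves out.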