This repository presents a deep learning approach to predicting citations of research articles from the Web of Science dataset. The data spans articles published between 2013 and 2021, including their citation counts and various other features.
- Data Preprocessing:
  - Data cleaned and preprocessed with Python libraries such as NLTK, pandas, NumPy, Matplotlib, and spaCy.
- Feature Engineering:
  - Top 10 features selected using three algorithms: Information Gain, Gini Index, and Gain Ratio.
- Models Used:
  - Machine learning and deep learning models trained to predict article citations, with a reported accuracy of 88%.
Feature selection was done using the following algorithms:
- Information Gain
- Gini Index
- Gain Ratio
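
The three criteria above can be sketched in plain Python for categorical features. This is a minimal illustration, not the repository's actual selection code: the feature names (`open_access`, `has_funding`), the toy labels, and the helper `top_k` are hypothetical, and continuous features would need to be binned first.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def partition(feature, labels):
    """Group the labels by the value of a categorical feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(feature, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in partition(feature, labels))
    return entropy(labels) - remainder


def gini_gain(feature, labels):
    """Reduction in Gini impurity obtained by splitting on the feature."""
    total = len(labels)
    remainder = sum(len(g) / total * gini(g) for g in partition(feature, labels))
    return gini(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def top_k(features, labels, score, k=10):
    """Return the names of the k highest-scoring features (hypothetical helper)."""
    ranked = sorted(features, key=lambda name: score(features[name], labels), reverse=True)
    return ranked[:k]


# Toy data: hypothetical feature names and citation classes, for illustration only.
labels = ["high", "high", "low", "low"]
features = {
    "open_access": ["yes", "yes", "no", "no"],
    "has_funding": ["yes", "no", "yes", "no"],
}

for score in (information_gain, gini_gain, gain_ratio):
    print(score.__name__, top_k(features, labels, score, k=1))
```

On this toy data, `open_access` separates the two classes perfectly while `has_funding` carries no information, so all three criteria rank `open_access` first.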
The following models were implemented:
- Random Forest
- Gradient Boosting
- Deep Neural Networks
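
As a rough sketch of how the three model families could be compared, the example below trains each on the same split and reports test accuracy. It makes several assumptions that are not taken from this repository: it uses scikit-learn, treats the task as classification (since the README reports accuracy), stands in `MLPClassifier` for the deep neural network, and uses synthetic data from `make_classification` in place of the Web of Science features. The hyperparameters are placeholders, and the printed scores say nothing about the 88% figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 selected features and a binary citation class.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder hyperparameters; the network is scaled because MLPs are
# sensitive to feature magnitudes, while the tree ensembles are not.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Deep Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.3f}")
```

Keeping the split and the metric identical across models is what makes the resulting accuracies comparable.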
The model achieved an accuracy of 88%, outperforming similar studies in the field.
The dataset used for this project is not yet public, but it is available upon request. Please contact the author if you would like access.
The dataset and the full dissertation title will be published here after the related research article is officially published. Stay tuned for updates!
This repository is licensed under the MIT License. See the LICENSE file for details.
- Arslan Tahir - GitHub