Prediction of Wordle Based on Machine Learning
Abstract
For problem one, we build a training dataset for an LSTM model by sliding-window processing and predict the number of reported results for March 1, 2023, obtaining a value of roughly 23,321. We then validate and normalize the model; the computational results show that our model has a small error and strong predictive power. To test whether word attributes affect the number of reported results in the different difficulty modes, we count all combinations of word letters and fit a linear regression model, obtaining Prob(F) = 0.587. Since this exceeds the usual significance threshold, the regression is not significant, indicating that the number of reported results is not affected by word attributes.
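The sliding-window step described above can be sketched as follows; the window length and the daily-count series here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sliding_window(series, window=7):
    """Turn a 1-D series into (samples, window) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])      # past `window` days as features
        y.append(series[i + window])        # the following day as the label
    return np.array(X), np.array(y)

# hypothetical daily report counts (placeholder data)
counts = np.arange(100, 120)
X, y = sliding_window(counts, window=7)
print(X.shape, y.shape)  # (13, 7) (13,)
```

Each row of `X` is then fed to the LSTM, which is trained to predict the corresponding entry of `y`.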
For problem two, to predict the distribution of results for a given word on a future day, a setting with multiple inputs and multiple outputs, we adopt a regressor chain model. We train a random forest regression algorithm within this model, dividing the samples into training and test sets, and obtain percentage data for the seven possible outcomes (one to six tries and failure): {0.2, 3.5, 18.3, 31.3, 27.5, 15.8, 2.9}, whose MAPE is within acceptable limits. We then construct a mapping from the attributes of the given word EERIE and derive the predicted distribution for that word. Moreover, we compare this result with one obtained from data processed by a neural network algorithm and find that the former model performs better[2].
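A minimal sketch of a regressor chain over a random forest base learner, using scikit-learn's `RegressorChain`; the word features, target distributions, and split sizes below are synthetic placeholders, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(0)
# hypothetical word features (e.g. letter frequencies, repeated letters)
X = rng.random((200, 5))
# hypothetical 7-way percentage distribution over tries 1..6 and failure
Y = rng.random((200, 7))
Y = Y / Y.sum(axis=1, keepdims=True) * 100  # each row sums to 100%

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# in a chain, each target's regressor also sees the predictions
# made for the targets earlier in the chain
chain = RegressorChain(RandomForestRegressor(n_estimators=100, random_state=0))
chain.fit(X_tr, Y_tr)
pred = chain.predict(X_te)

# mean absolute percentage error across all seven outputs
mape = np.mean(np.abs((Y_te - pred) / Y_te)) * 100
print(pred.shape, round(mape, 1))
```

The chained structure lets later outcome percentages condition on earlier ones, which suits targets that must jointly describe one distribution.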
For problem three, we divide difficulty into three levels by the rank-sum ratio (RSR) method and export the data after the evaluation process. We then train the dataset with three machine learning algorithms, namely logistic regression, decision tree, and XGBoost, and draw the corresponding learning curves. Underfitting and overfitting appear, and even the logistic regression model, the best of the three, fails to fit the test set well, with an F-score of 0.5. We therefore use a CNN for the classification prediction instead, and the final F-score on both the training and test sets is about 0.8, which we consider a good result. Finally, we analyze the difficulty of EERIE with this model; the difficulty factor we obtain is 1, meaning it is easy.
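The F-score used to compare the classifiers is the macro-averaged F1 over the three difficulty classes; a small sketch with hypothetical labels (0 = easy, 1 = medium, 2 = hard, not the paper's data):

```python
from sklearn.metrics import f1_score

# hypothetical true and predicted difficulty levels
y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 1, 1, 1, 2, 0, 0, 1, 2, 0]

# macro averaging computes F1 per class, then takes the unweighted mean,
# so each difficulty level counts equally regardless of class size
score = f1_score(y_true, y_pred, average="macro")
print(round(score, 3))  # 0.802
```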
For problem four, we present the data graphically and analyze its relevant features through correlation and descriptive analysis methods[3].
References
[1] Haihong Fan. Application of SVM classification algorithm based on convolutional neural network in image classification [J]. Science and Technology Bulletin, 2022, 38(08): 24-28. DOI: 10.13774/j.cnki.kjtb.2022.08.005.
[2] Xiaotong Hu, Chen Cheng. Time series prediction based on multi-dimensional and cross-scale LSTM model [J]. Computer Engineering and Design, 2023, 44(02): 440-446. DOI: 10.16208/j.issn1000-7024.2023.02.017.
[3] Shishi Dong, Zhexue Huang. Analysis of random forest theory [J]. Integrated Technology, 2013(1): 1-7.
[4] Lei Liu. Research on classification of breast cancer diagnostic data based on logistic regression algorithm [J]. Software Engineering, 2018, 21(02): 21-23+17. DOI: 10.19644/j.cnki.issn2096-1472.2018.02.007.
[5] Xun Wang, Jia Qiao, Yanping Yu. Risk assessment of gas pipeline based on decision tree classification algorithm [J]. Gas and Heat, 2022, 42(10): 41-43+46. DOI: 10.13608/j.cnki.1000-4416.2022.01.015.
[6] Jiqing Yan, Zhiyuan Shen, Jing Lv, et al. Automatic classification of bidding documents based on XGBoost and text focus model [J]. Journal of Wuhan University (Engineering Edition), 2022, 55(03): 310-318. DOI: 10.14188/j.1671-8844.2022-03-013.
[7] Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation [J]. PeerJ Computer Science, 2021, 7: e623.
DOI: http://dx.doi.org/10.18686/ahe.v7i31.11564