RFC: Scalability Of Current Fake News API
The main goal of this project is to make the Internet a safer and more productive place for its users.
An ML-based fake-news model can give users a prediction, where the decision is made by the model. But there is also a second way to help users judge whether a piece of news is fake.
I’ll be addressing two topics -
- Improving current Fake News API.
- Providing more information about the current context, so users have more to go on when forming their own judgment.
Improving current Fake News API.
Currently, the existing Fake News API performs stance detection against every post in the database.
A small code snippet of the above -
As seen above, it loops through every post in the database; the loop breaks only when a post agrees or disagrees, and continues when the stance is "unrelated" or "discuss." This linear scan over the whole database makes the API hard to scale.
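The loop described above can be sketched as follows. This is a stand-in, not the actual API code: the helper names and the stub stance detector are illustrative assumptions; a real stance model would replace `detect_stance`.

```python
def detect_stance(claim, post):
    """Stub stance detector (illustrative only; a real model goes here).
    Exact match -> agree; any shared word -> discuss; else unrelated."""
    if claim == post:
        return "agree"
    if set(claim.split()) & set(post.split()):
        return "discuss"
    return "unrelated"

def check_fake_news(claim, posts):
    """Linear scan: stop at the first decisive (agree/disagree) stance."""
    for post in posts:
        stance = detect_stance(claim, post)
        if stance in ("agree", "disagree"):
            return stance, post   # decisive stance found, break early
        # "unrelated" and "discuss" stances are skipped, loop continues
    return "unverified", None     # scanned everything without a decision
```

In the worst case (no decisive stance anywhere), every post is visited, which is where the scalability concern comes from.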
As of now, there are two possible approaches:
- Use the input text and perform stance detection as before, but instead of iterating over the unsorted dataset, iterate over the dataset sorted by similarity to the input (e.g., cosine similarity).
The motive for sorting the data before stance detection: completely unrelated posts will have the lowest similarity, while posts sharing the input's context will have the highest. So instead of looping over an unsorted dataset, sort first and then loop. This should reduce the number of iterations significantly (we may even get an agree/disagree stance from the very first instance).
- We can break our dataset into six fragments based on POS tags (namely NNPS, NNP, NNS, NN, FW, and others). The input text is POS-tagged as well. Instead of calculating the similarity of the whole dataset at once, calculate it only for the fragments sharing the input's POS tags, and then follow method 1. If no decisive stance is found, loop over the remaining fragments in the same way.
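The bucketing could be sketched like this. A toy dictionary tagger stands in for a real POS tagger (e.g., NLTK's `pos_tag`); the lexicon and helper names are illustrative assumptions, while the tag set follows the proposal above.

```python
from collections import defaultdict

TAGS = ("NNPS", "NNP", "NNS", "NN", "FW")

# Toy tagger standing in for a real one (e.g., nltk.pos_tag);
# this lexicon is purely illustrative.
TOY_LEXICON = {"nasa": "NNP", "rockets": "NNS", "moon": "NN"}

def tag_word(word):
    return TOY_LEXICON.get(word.lower(), "OTHER")

def bucket_posts(posts):
    """Group posts into fragments keyed by the POS tags they contain."""
    buckets = defaultdict(list)
    for post in posts:
        tags = {tag_word(w) for w in post.split()}
        for tag in (tags & set(TAGS)) or {"OTHER"}:
            buckets[tag].append(post)
    return buckets

def candidate_fragments(claim, buckets):
    """Yield fragments sharing a POS tag with the claim first,
    then the remaining fragments (method 1 runs inside each)."""
    claim_tags = {tag_word(w) for w in claim.split()} & set(TAGS)
    for tag in TAGS + ("OTHER",):
        if tag in claim_tags and tag in buckets:
            yield buckets[tag]
    for tag in TAGS + ("OTHER",):
        if tag not in claim_tags and tag in buckets:
            yield buckets[tag]
```

Similarity sorting and the stance loop then run per fragment, in the order `candidate_fragments` yields them, so matching-tag fragments are exhausted before the rest of the data is touched.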
Other NLP-based methods could also be explored for detecting fake news.
- [New Feature] Providing more information about the current context, so users have more to go on when forming their own judgment.
Models trained on older datasets become of little or no use, since news evolves with time (the context changes) and such models may not achieve good performance. Hence we should incorporate an additional solution that is not entirely ML-based.
Instead of only predicting fake/true, we could give users functionality to navigate and confirm a story's reliability themselves. For example, when the user clicks "check for fake news," show a popup whose contents are the details obtained from the first Google search result (functionality similar to news origin).
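As a minimal stdlib sketch, the feature could at least construct the Google search URL for the selected headline, which the popup (or a backend fetcher) would then load. Using the plain search URL rather than an official search API is an assumption here; a production version would likely use a proper search API.

```python
from urllib.parse import quote_plus

def google_search_url(headline):
    """Build a Google search URL for the selected headline,
    to be opened or fetched for the popup's contents."""
    return "https://www.google.com/search?q=" + quote_plus(headline)
```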
I’ll post more ideas in the comments. It would be helpful if we could discuss them and expand the scope of this idea.