Please find attached the input file ([url removed, login to view]) for all the sub-questions of the given question.
Also, please make sure the code can be run from a PuTTY session.
1. Write a Python program to process an input file called [url removed, login to view], which is attached in the dropbox. You can use NLTK data and functions for this assignment.
a Calculate the frequency distribution of the words in the file. Plot a histogram of the 20 most commonly used words in the text file.
b Open the file and read the contents to generate an output file which contains the lemmatized words of the original content. The output file must be created by replacing all the words with lemmatized words.
c Tokenize the text file into sentences and calculate, for each sentence, the number of words and the entropy. Output the results in the following format.
Sentence (the first 5 words with …)    # of Words    Entropy
At this second appearing to …          6             7.233
d For the text file [url removed, login to view], conduct part-of-speech tagging using one of the taggers and one of the tagged corpora in the NLTK toolkit. The program should write the tagged text to a text file named by adding “tagged_” before the original text filename.
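For parts a and c, a minimal standard-library sketch of the counting and entropy steps might look like the following. It is not the NLTK solution the task asks for: `tokenize_words` and `split_sentences` are hypothetical stand-ins for `nltk.word_tokenize` and `nltk.sent_tokenize`, and the `Counter` plays the role of `nltk.FreqDist`. The task does not define "entropy", so the sketch assumes one possible reading: the average of -log2 p(w) over the words of a sentence, with p(w) estimated from word counts over the whole text. Whether that reading reproduces the 7.233 in the sample row cannot be confirmed from the posting.

```python
import math
import re
from collections import Counter


def tokenize_words(text):
    # Lowercased alphabetic tokens (optionally with one inner apostrophe).
    # Hypothetical stand-in for nltk.word_tokenize.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def split_sentences(text):
    # Naive split on whitespace that follows ., ! or ?.
    # Hypothetical stand-in for nltk.sent_tokenize.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_report(text):
    """Return (word frequencies, rows of (preview, word count, entropy))."""
    freq = Counter(tokenize_words(text))
    total = sum(freq.values())
    rows = []
    for sentence in split_sentences(text):
        words = tokenize_words(sentence)
        if not words:
            continue
        # ASSUMED definition: mean of -log2 p(w), where p(w) is the
        # relative frequency of w in the whole text.
        entropy = -sum(math.log2(freq[w] / total) for w in words) / len(words)
        # First five whitespace-separated tokens, as in the requested format.
        preview = " ".join(sentence.split()[:5]) + " ..."
        rows.append((preview, len(words), entropy))
    return freq, rows


def print_report(text):
    freq, rows = sentence_report(text)
    # Part a: the 20 most common words (these counts would feed the histogram).
    print(freq.most_common(20))
    # Part c: one aligned row per sentence.
    print(f"{'Sentence':<40}{'# of Words':>12}{'Entropy':>10}")
    for preview, count, entropy in rows:
        print(f"{preview:<40}{count:>12}{entropy:>10.3f}")
```

Calling `print_report(open(path).read())` on the input file would print the top-20 counts followed by the sentence table; in an NLTK-based version the two stand-in functions would be swapped for the library tokenizers.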
7 freelancers are bidding an average of $27 for this job
I use NLTK in the context of a question answering system, so I have a good understanding of natural language processing tools, and I think I can help you.
This task will be a cakewalk for me, as I do this type of work regularly. I can also deliver a well-commented Python script that will help you understand how everything works.