Finding a string within books in a specific folder
Printing a string in its sentence across thousands of books in a folder
Imagine you have a folder somewhere on your computer, and in that
folder you have thousands of books as txt or docx files. One day,
you are researching a specific topic and would like to find, for
example, COVID or Corona in those books.
It’s just an idea, to see whether such a word was ever mentioned somewhere
before. What our code is going to do here is tell Python to go to a specific
folder, open every book in it, locate every occurrence of ‘Corona’, print it in its sentence,
and also print the name of the book in which that ‘Corona’ appears.
>>> import os
>>> os.chdir('')  # put the path to your book folder between the quotes; leaving it empty raises an error
Alternatively, you can store the path to your bookstore in a variable:
>>> bookstore = ''  # path to your book folder
>>> def finding_string_in_bookstore(x):
...     # Open the result file once; every match is appended to it
...     with open('result_of_my_research_in_book_store.txt', 'a+', encoding='utf-8') as out:
...         for root, dirs, files in os.walk(bookstore):
...             for filename in files:
...                 if filename.endswith('.txt'):  # only .txt books are read here
...                     with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
...                         text = f.read()
...                     sent_text = text.splitlines()  # one item per line of the book
...                     for w in sent_text:
...                         if x in w:
...                             # (text before x, x, text after x), then the book's name
...                             print(w.partition(x), filename, file=out)
Printing the output to a text file and reading it there is better than running it in the shell, where the output can be too much to read.
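One thing to keep in mind: splitlines() cuts the text at line breaks, so a "sentence" in the output is really one line of the file. As a rough sketch only, and not the code used above, the same search could split on sentence-ending punctuation instead. The names find_in_books, folder, word and out_path are made up for this illustration, and the splitter is deliberately naive (it will trip over abbreviations such as "Dr.").

```python
import os
import re

def find_in_books(folder, word, out_path):
    """Write each sentence containing `word`, plus its file name, to out_path.

    Hypothetical helper for illustration; returns the number of matches.
    Keep out_path outside `folder`, or the result file gets searched too.
    """
    hits = 0
    # Open the result file once for the whole run.
    with open(out_path, 'a', encoding='utf-8') as out:
        for root, dirs, files in os.walk(folder):
            for filename in files:
                if not filename.endswith('.txt'):
                    continue  # skip anything that is not a plain-text book
                path = os.path.join(root, filename)
                with open(path, 'r', encoding='utf-8') as f:
                    text = f.read()
                # Naive splitter: break after ., ! or ? followed by whitespace.
                for sentence in re.split(r'(?<=[.!?])\s+', text):
                    if word in sentence:
                        # Collapse line breaks inside the sentence before writing.
                        one_line = ' '.join(sentence.split())
                        print(one_line, filename, sep='\t', file=out)
                        hits += 1
    return hits
```

Calling find_in_books with the book folder, the word 'Corona' and a result file path would then write one tab-separated line per match: the sentence first, the book's file name second.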
With the code above, what if you wanted to find two strings that
appear together in the same sentence within those thousands of books in your
bookstore?
Let’s say we want to find sentences that mention both corona and virus, or
both corona and disease, within those books.
We replace this line of the code:
>>> if x in w:
with one of these:
>>> if ' corona ' in w and ' disease ' in w:
>>> if ' corona ' in w and ' virus ' in w:
It’s important to keep the spaces inside the quotes to avoid unexpected output, for example looking for corona and getting coronate, or corona joined to some other word.
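Padding the word with spaces has a blind spot: it misses the word at the very start or end of a line, right before punctuation such as "corona,", or when it is capitalized differently. One possible alternative, sketched here as an assumption and not as part of the code above, is a regular expression with the word boundary \b. The helper name has_all_words is hypothetical.

```python
import re

def has_all_words(sentence, words):
    """Return True if every word in `words` occurs in `sentence` as a whole word.

    Hypothetical helper for illustration; matching ignores upper/lower case.
    """
    return all(
        re.search(r'\b' + re.escape(word) + r'\b', sentence, re.IGNORECASE)
        for word in words
    )

print(has_all_words('Corona, the virus, spread quickly.', ['corona', 'virus']))  # True
print(has_all_words('They will coronate the king.', ['corona']))                 # False
```

Inside the loop, the test on each line w could then be written as if has_all_words(w, ['corona', 'virus']): with no spaces needed around the words.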
In my next blog post, I will share my experience with sentence analysis
using the numpy and pandas packages.