Extract the frequencies of a specific string or word in every chapter of one book, or across different books
Surprisingly, I have decided to leave the analysis of two different sentences,
within one text or across different texts, books, or other sources, for tomorrow.
Today, 23 May 2020, in the middle of the COVID-19 period, somewhere in downtown
Senegal, I will try to show you how to extract the frequencies of specific
strings or words in every chapter. Assume that you have thousands of books in a
folder called, for example, books, and that I would like to know the frequency of
the different derived forms of the verb think, or of any other word, in all
those books. We can assume that those books are mostly about the car business.
Let's stimulate Python here to get some reflexes. Here we go for the show; I
would rather call it luring Python:
>>> import os
I have a list of column names from ibmauto and will use it as my search
variable; let's call it columns:
>>> columns = ['symboling', 'normalized-losses', 'make', 'fuel-type',
...            'aspiration', 'num-of-doors', 'body-style', 'drive-wheels',
...            'engine-location', 'wheel-base', 'length', 'width', 'height',
...            'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size',
...            'fuel-system', 'bore', 'stroke', 'compression-ratio',
...            'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
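>>> source = 'books'  # folder that holds the .txt books
>>> def frequencies_of_string_in_the_bookstore(var):
...     # Open the output file once and append one row per matching line
...     with open('frequency of every string in columns.txt', 'a+', encoding='utf-8') as out:
...         # Walk through source and all of its sub-folders
...         for root, dirs, files in os.walk(source):
...             for filename in files:
...                 if filename.endswith('.txt'):
...                     with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
...                         text = f.read()
...                     linetext = text.splitlines()
...                     for k in var:
...                         for w in linetext:
...                             if k in w:
...                                 # length of the string, the string, the file
...                                 print(len(k), k, filename, file=out)
>>> frequencies_of_string_in_the_bookstore(columns)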
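Fantastic: when you run the code with columns as your search data, you get all
their frequencies. Each line of text that contains a string adds one row (the
length of the string, the string itself, and the file name) to the output file.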
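The function above reports a string once for each line that contains it, so a string that occurs twice on the same line still gives a single row. If you want the actual number of occurrences per file, a minimal sketch could look like this. It assumes the same layout as above (a folder of UTF-8 .txt files). The name count_strings_per_file and its parameters are made up for illustration and are not part of the original code. Note that str.count matches substrings, so 'price' is also counted inside 'prices'.

```python
import os
from collections import Counter


def count_strings_per_file(source, search_strings):
    """Return {relative path: Counter({string: occurrences})}.

    Hypothetical helper: the name and signature are illustrative only.
    """
    results = {}
    for root, dirs, files in os.walk(source):
        for filename in files:
            if not filename.endswith('.txt'):
                continue
            path = os.path.join(root, filename)
            with open(path, 'r', encoding='utf-8') as f:
                text = f.read()
            counts = Counter()
            for s in search_strings:
                n = text.count(s)  # non-overlapping substring matches
                if n:
                    counts[s] = n
            if counts:
                # Key by the path relative to source so that files with the
                # same name in different sub-folders do not overwrite each other
                results[os.path.relpath(path, source)] = counts
    return results
```

Calling count_strings_per_file('books', columns) would then give, for every book that contains at least one of the strings, a counter of how often each string occurs in it.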
I will try my code here on the Quran, which is in Arabic. I have a variable
called sarata (صرط), as in my first blog post, and I want to see its
frequencies in the book. Here we go:
>>> def frequencies_of_string_in_the_bookstore(var):
...     # var yields (string, value) pairs; only the string k is used here
...     for root, dirs, files in os.walk(source):
...         for filename in files:
...             if filename.endswith('.txt'):
...                 with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
...                     text = f.read()
...                 linetext = text.splitlines()
...                 for k, v in var:
...                     for w in linetext:
...                         if k in w:
...                             # length of the string, the string, the file
...                             print(len(k), k, filename)
6 صراط آل عمران.txt
6 صراط ابراهيم.txt
6 صراط الأعراف.txt
7 صراطك الأعراف.txt
6 صراط الأنعام.txt
7 صراطي الأنعام.txt
6 صراط البقرة.txt
6 صراط الحج.txt
6 صراط الحجر.txt
6 صراط الزخرف.txt
6 صراط الشوري.txt
6 صراط الصافات.txt
8 الصراط الصافات.txt
6 صراط الفاتحة.txt
8 الصراط الفاتحة.txt
7 صراطا الفتح.txt
6 صراط المؤمنون.txt
8 الصراط المؤمنون.txt
6 صراط المائدة.txt
6 صراط الملك.txt
6 صراط النحل.txt
7 صراطا النساء.txt
6 صراط النور.txt
6 صراط سبإ.txt
8 الصراط طه.txt
6 صراط مريم.txt
7 صراطا مريم.txt
6 صراط هود.txt
6 صراط يــس.txt
8 الصراط يــس.txt
6 صراط يونس.txt
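Each row above holds a length, a string, and a file name, so the listing can be summarised by counting how many rows each form has. The sketch below does that with collections.Counter on a few rows copied from the output. The names rows and rows_per_form are illustrative assumptions, and the code expects the row layout printed above.

```python
from collections import Counter

# A few rows copied from the listing above: "<length> <string> <file name>"
rows = [
    "6 صراط الفاتحة.txt",
    "8 الصراط الفاتحة.txt",
    "7 صراطا الفتح.txt",
    "6 صراط مريم.txt",
    "7 صراطا مريم.txt",
]

rows_per_form = Counter()
for row in rows:
    # maxsplit=2 keeps file names that contain spaces in one piece
    length, form, filename = row.split(maxsplit=2)
    rows_per_form[form] += 1

# Print each form with its number of rows, most frequent first
for form, n in rows_per_form.most_common():
    print(form, n)
```

Feeding in the whole listing instead of this sample would give the row count for every form of the word.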