Extract the frequencies of a specific string or word in every chapter of one book, or across different books


Surprisingly, I have decided to leave the analysis of two different sentences, within one text or across different texts (books or any other source), for tomorrow.

Today, 23 May 2020, during the COVID-19 period, somewhere downtown in Senegal, I will try to show you how to extract the frequencies of specific strings or words in every chapter. Assume that you have thousands of books in a folder called, for example, books, and that you would like to know the frequency of the different forms of the verb think, or of any other word, in all those books. We can assume that those books are mostly about the car business. Let's put Python to work here and see what comes back; I like to call it luring Python:

>>> source = ''  # put the path to your books folder here; I also often call it path = '' when working in a different directory

>>> import os
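Before counting anything, it can help to check which files the walk will actually visit. Here is a minimal sketch, assuming the books are plain .txt files somewhere under source; the helper name list_text_files is my own and only for illustration:

```python
import os

def list_text_files(source):
    """Return the sorted paths of all .txt files found under `source`."""
    paths = []
    # os.walk visits `source` and every sub-folder below it.
    for root, dirs, files in os.walk(source):
        for filename in files:
            if filename.endswith('.txt'):
                paths.append(os.path.join(root, filename))
    return sorted(paths)
```

If the returned list is empty, the path in source is probably wrong, since os.walk silently yields nothing for a folder it cannot open.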

I have a column index from the ibmauto data and will use its names as my search strings. Let's call it columns:

columns = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price']

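>>> def frequencies_of_string_in_the_bookstore(var):
    # Open the output file once, instead of once per match, so that it gets closed properly.
    with open('frequency of every string in columns.txt', 'a+', encoding='utf-8') as out:
        for root, dirs, files in os.walk(source):
            for filename in files:
                if filename.endswith('.txt'):
                    with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
                        linetext = f.read().splitlines()
                    for k in var:  # var is the list of search strings, e.g. columns
                        for w in linetext:
                            if k in w:
                                # One line is written per line of text that contains k.
                                # len(k) is only the length of the search string, so the
                                # frequency of k in a file is the number of lines written
                                # for that (k, filename) pair.
                                print(len(k), k, filename, file=out)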

 

Fantastic: when you run the code with columns as your search data, you will get all their frequencies, written to the file frequency of every string in columns.txt.
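The function above writes one line per match, so the lines still have to be tallied to get a number per string and per file. Here is a minimal sketch that does the tally directly. It is not the original code: the name count_strings_in_folder and the choice of a Counter keyed by (filename, term) are my own assumptions, and it counts substring matches, so 'make' would also be counted inside 'makes'.

```python
import os
from collections import Counter

def count_strings_in_folder(source, terms):
    """Count occurrences of each term in every .txt file under `source`.

    Returns a Counter keyed by (filename, term).
    """
    counts = Counter()
    for root, dirs, files in os.walk(source):
        for filename in files:
            if not filename.endswith('.txt'):
                continue
            with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
                text = f.read()
            for term in terms:
                n = text.count(term)  # non-overlapping substring matches
                if n:
                    # += so that files sharing a name in different folders add up
                    counts[(filename, term)] += n
    return counts
```

Calling count_strings_in_folder(source, columns) would then give one count per file and per column name, and counts.most_common(10) lists the ten largest.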

 

I will now try my code on the Quran, which is in Arabic. I have a variable here called sarata, or صرط, as in my first blog post, and I want to see its frequencies in the book. Here we go:

>>> def frequencies_of_string_in_the_bookstore(var):
    for root, dirs, files in os.walk(source):
        for filename in files:
            if filename.endswith('.txt'):
                with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
                    linetext = f.read().splitlines()
                for k, v in var:  # var is iterated as (string, value) pairs; only the string k is used
                    for w in linetext:
                        if k in w:
                            # prints the length of k, k itself and the chapter file it was found in
                            print(len(k), k, filename)

 

 

 

6  صراط  آل عمران.txt

6  صراط  ابراهيم.txt

6  صراط  الأعراف.txt

7  صراطك  الأعراف.txt

6  صراط  الأنعام.txt

7  صراطي  الأنعام.txt

6  صراط  البقرة.txt

6  صراط  الحج.txt

6  صراط  الحجر.txt

6  صراط  الزخرف.txt

6  صراط  الشوري.txt

6  صراط  الصافات.txt

8  الصراط  الصافات.txt

6  صراط  الفاتحة.txt

8  الصراط  الفاتحة.txt

7  صراطا  الفتح.txt

6  صراط  المؤمنون.txt

8  الصراط  المؤمنون.txt

6  صراط  المائدة.txt

6  صراط  الملك.txt

6  صراط  النحل.txt

7  صراطا  النساء.txt

6  صراط  النور.txt

6  صراط  سبإ.txt

8  الصراط  طه.txt

6  صراط  مريم.txt

7  صراطا  مريم.txt

6  صراط  هود.txt

6  صراط  يــس.txt

8  الصراط  يــس.txt

6  صراط  يونس.txt
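The listing above shows several written forms of the same string in each chapter file. Here is a minimal sketch of how such forms could be collected and counted from the text of one chapter with a regular expression. The name count_word_forms is hypothetical, and the sketch assumes the text carries no diacritics, because \w does not match combining marks and a diacritized word would be cut into pieces.

```python
import re
from collections import Counter

def count_word_forms(text, stem):
    """Count every whole word in `text` that contains `stem`."""
    # \w* on both sides extends the match to the full word around the stem.
    pattern = re.compile(r'\w*' + re.escape(stem) + r'\w*')
    return Counter(pattern.findall(text))
```

For example, count_word_forms('I think she thinks we are rethinking it', 'think') counts think, thinks and rethinking once each. The same call works on Arabic text, since Python 3 regular expressions treat Arabic letters as word characters.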

