knowmine module

The knowmine app extracts potentially relevant sentences from a collection of scientific articles. Currently, the user provides a path to a collection of texts in PDF format, a list of main keywords, and a list of connection words for the extraction. The user can also choose the format of the output file (default: Excel file) and the number of cores used for parallel processing (default: 2).

knowmine.extract_relevant_sentences(folder_path, main_terms, connection_words, outputfile_format='xls', cores_number=2)
knowmine.get_sentences(file)
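
A call to the entry point might look like the sketch below. Only the parameter names and defaults are taken from the signature above; the main terms and connection words are hypothetical placeholders, and the helper names build_arguments and run_extraction are introduced here for illustration. The import is deferred so that the argument helper can be inspected without knowmine installed.

```python
def build_arguments(folder_path):
    """Collect keyword arguments for knowmine.extract_relevant_sentences.

    The terms below are hypothetical examples, not values shipped with knowmine.
    """
    return {
        "folder_path": folder_path,             # folder holding the PDF articles
        "main_terms": ["deforestation", "biodiversity"],
        "connection_words": ["affect", "reduce"],
        "outputfile_format": "xls",             # documented default
        "cores_number": 2,                      # documented default
    }


def run_extraction(folder_path):
    """Run the extraction; requires the knowmine package to be installed."""
    import knowmine  # deferred import, see the note above

    return knowmine.extract_relevant_sentences(**build_arguments(folder_path))
```

According to the OutputfileGenerator description, the result file is expected to appear in the same folder as the articles.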

knowmine.FilesReader module

The module contains a function that accesses the files in a user-provided folder and returns the list of file names.

knowmine.FilesReader.get_file_names(folder)
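
The idea can be illustrated with a standalone stand-in. This is not the knowmine implementation: whether get_file_names filters by extension or returns full paths is not documented here, so the sketch assumes it returns bare names of PDF files only. The function name list_pdf_names is hypothetical.

```python
import os


def list_pdf_names(folder):
    """Return the sorted names of the PDF files directly inside `folder`.

    Assumption: only regular files ending in .pdf (any letter case) count.
    """
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(folder, name))
    )
```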

knowmine.TextExtractor module

This module contains the TextExtraction class, which extracts and cleans text from PDF articles.

class knowmine.TextExtractor.TextExtraction(filepath)

Bases: object

getText()

Method that extracts the text of an article.
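
The cleaning steps applied by TextExtraction are not listed here. As a rough illustration of what cleaning PDF-extracted text often involves, the sketch below rejoins words split across line breaks and collapses whitespace. Both steps are assumptions, and clean_extracted_text is a hypothetical name, not part of knowmine.

```python
import re


def clean_extracted_text(raw):
    """Normalize text pulled out of a PDF (illustrative assumptions only)."""
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse every remaining run of whitespace (including newlines) to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Note that the first rule also merges genuine hyphenated compounds that happen to fall at a line break, so a real pipeline may need a dictionary check.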

knowmine.AllSentencesExtractor module

The module contains the SentencesExtraction class, which extracts sentences from the given articles.

class knowmine.AllSentencesExtractor.SentencesExtraction(filetext)

Bases: object

This class provides methods for recognizing the individual sentences of a given text.

get_sentences()

Method that extracts all sentences from the article text.
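
How SentencesExtraction detects sentence boundaries is not specified. A minimal regex-based sketch of the task is shown below, assuming a sentence ends with ".", "!" or "?" followed by whitespace and a capital letter. The name split_sentences is hypothetical.

```python
import re

# Split at whitespace that follows ., ! or ? and precedes an uppercase letter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Return the sentences of `text` as a list of stripped strings."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]
```

This simple rule mis-splits after abbreviations such as "et al." when a capitalized word follows, which is common in scientific text, so a trained sentence tokenizer is usually a better choice in practice.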

knowmine.KeywordsExtractor module

The module provides functionality to extract the keywords of sentences. It uses the pke package: https://boudinfl.github.io/pke/build/html/index.html

knowmine.KeywordsExtractor.ExtractKeywords(sentences)

knowmine.RelevantSentencesExtractor module

This module contains the RelevantSentences class, which selects, from all sentences of an article, only those that contain the provided main terms and relation words among the keywords of the sentence.

class knowmine.RelevantSentencesExtractor.RelevantSentences(file_name, main_terms, relation_words)

Bases: object

get_relevant_sentences()

Method that extracts the relevant sentences from all sentences of the text.
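
The selection rule can be sketched as follows. The description does not say whether one matching term is enough or all must match; this sketch assumes a sentence is relevant when at least one main term and at least one relation word appear among its keywords, compared case-insensitively. The name is_relevant is hypothetical.

```python
def is_relevant(sentence_keywords, main_terms, relation_words):
    """Check whether a sentence's keywords include a main term and a relation word.

    Assumption: one match from each list is sufficient.
    """
    keywords = {keyword.lower() for keyword in sentence_keywords}
    has_main_term = any(term.lower() in keywords for term in main_terms)
    has_relation_word = any(word.lower() in keywords for word in relation_words)
    return has_main_term and has_relation_word
```

For example, a sentence with the keywords "Deforestation", "reduce", and "habitat" passes for the main term "deforestation" and the relation word "reduce", while a sentence whose only keyword is "deforestation" does not.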

knowmine.OutputfileGenerator module

This module contains the Output class, which generates the output file in the desired format (SQLite database or Excel). The file contains the extracted sentences, the number of sentences in the original text (after cleaning), and the number of extracted sentences. The result file is written to the folder containing the articles being mined.

class knowmine.OutputfileGenerator.Output(folder, output)

Bases: object

add_result_to_database()
add_result_to_excel()
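
The database schema used by add_result_to_database() is not documented here. The sketch below shows one way to store the three pieces of information mentioned above with the standard sqlite3 module. The table name, column names, and the function save_results are all hypothetical.

```python
import sqlite3


def save_results(db_path, file_name, total_sentences, relevant_sentences):
    """Store one row per extracted sentence, with per-article counts (assumed schema)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "file_name TEXT, total_sentences INTEGER, "
                "extracted_sentences INTEGER, sentence TEXT)"
            )
            conn.executemany(
                "INSERT INTO results VALUES (?, ?, ?, ?)",
                [
                    (file_name, total_sentences, len(relevant_sentences), sentence)
                    for sentence in relevant_sentences
                ],
            )
    finally:
        conn.close()
```

Repeating the counts on every row keeps the sketch to a single table; a normalized design would keep per-article counts in a separate table.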