knowmine module
The knowmine app extracts potentially relevant sentences from a collection of scientific articles. Currently, the user must provide the path to a folder of articles in PDF format, a list of main keywords, and a list of connection words used for the extraction. Optionally, the user can choose the format of the output file (default: Excel file) and set the number of CPU cores used for parallel processing (default: 2).
knowmine.extract_relevant_sentences(folder_path, main_terms, connection_words, outputfile_format='xls', cores_number=2)
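As a rough illustration of how `cores_number` could drive the parallel work, here is a minimal sketch. It assumes the article texts have already been read into a `{file_name: text}` dictionary rather than taking `folder_path`. `process_article` and `extract_relevant_sentences_sketch` are hypothetical names, not part of knowmine. To stay self-contained, the sketch uses a thread pool and plain substring matching; the real pipeline matches sentence keywords and may well use processes instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def process_article(text, main_terms, connection_words):
    """Hypothetical per-article step: keep sentences that mention at least
    one main term and at least one connection word (case-insensitive)."""
    # Naive split on "." purely for illustration.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    relevant = []
    for sentence in sentences:
        lowered = sentence.lower()
        has_term = any(term.lower() in lowered for term in main_terms)
        has_link = any(word.lower() in lowered for word in connection_words)
        if has_term and has_link:
            relevant.append(sentence)
    return relevant


def extract_relevant_sentences_sketch(articles, main_terms, connection_words,
                                      cores_number=2):
    """Hypothetical driver: run process_article over {name: text} using
    cores_number worker threads and return {name: [relevant sentences]}."""
    names = list(articles)
    with ThreadPoolExecutor(max_workers=cores_number) as pool:
        results = pool.map(
            lambda name: process_article(articles[name], main_terms,
                                         connection_words),
            names,
        )
        return dict(zip(names, results))
```

A thread pool keeps the example free of pickling concerns; for CPU-bound PDF parsing, a process pool (for example `concurrent.futures.ProcessPoolExecutor`) is the more usual choice.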
knowmine.get_sentences(file)
knowmine.FilesReader module
This module contains a function that accesses the files in a user-provided folder and returns the list of their file names.
knowmine.FilesReader.get_file_names(folder)
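A minimal sketch of such a folder listing, assuming only regular files ending in `.pdf` (case-insensitive) should be returned; the original description does not say whether non-PDF files are filtered out, so that filter is an assumption. `get_pdf_file_names` is a hypothetical name, not the knowmine function.

```python
import os


def get_pdf_file_names(folder):
    """Hypothetical sketch: return the sorted names of the PDF files in folder.

    Subdirectories are skipped even if their names end in ".pdf".
    """
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(folder, name))
    )
```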
knowmine.TextExtractor module
This module contains the TextExtraction class, which extracts and cleans the text of PDF articles.
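The cleaning half of this step might look roughly like the sketch below. It assumes the raw text has already been pulled out of the PDF by some extraction library, and it shows only two common clean-ups: re-joining words hyphenated across line breaks and collapsing whitespace. `clean_text` is a hypothetical name; the actual cleaning rules of TextExtraction are not documented here.

```python
import re


def clean_text(raw):
    """Hypothetical cleaning step for text already extracted from a PDF."""
    # Re-join words split by a hyphen at the end of a line ("aro-\nmatic").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines, tabs and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

The hyphen rule is a trade-off: it also merges genuine compounds that happen to break at a line end (for example "dose-\nresponse" becomes "doseresponse").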
knowmine.AllSentencesExtractor module
This module contains the SentencesExtraction class, which extracts the sentences from the given articles.
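To illustrate what splitting cleaned text into sentences involves, here is a minimal regex-based sketch. It is a stand-in, not the tokenizer SentencesExtraction actually uses: it splits after `.`, `!` or `?` only when the next word starts with an uppercase letter or a digit. `split_sentences` is a hypothetical name.

```python
import re

# Boundary: ".", "!" or "?", then whitespace, then an uppercase letter or digit.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Hypothetical regex-based sentence splitter."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]
```

This heuristic leaves "e.g. in" intact but wrongly splits cases such as "et al. Smith" or "Fig. 2"; a trained sentence tokenizer handles such abbreviations better.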
knowmine.KeywordsExtractor module
This module extracts the keywords of each sentence using the pke module: https://boudinfl.github.io/pke/build/html/index.html
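pke's unsupervised extractors follow a load, select candidates, weight candidates, take the n best workflow. The sketch below is a dependency-free stand-in for that idea, not pke itself: it ranks non-stopword tokens of each sentence by frequency, with ties kept in order of first appearance. `top_keywords`, `keywords_per_sentence` and the tiny stoplist are hypothetical and for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stoplist; a real extractor would use a complete one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "is", "are",
             "was", "were", "to", "for", "with", "by", "that", "this", "it"}


def top_keywords(sentence, n=5):
    """Hypothetical stand-in: rank non-stopword tokens by frequency."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # most_common keeps first-seen order among equal counts.
    return [word for word, _ in counts.most_common(n)]


def keywords_per_sentence(sentences, n=5):
    """Return one keyword list per input sentence, in the same order."""
    return [top_keywords(s, n) for s in sentences]
```

A graph-based ranker such as pke's TopicRank also produces multi-word key phrases, which single-token frequency counting cannot.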
knowmine.KeywordsExtractor.ExtractKeywords(sentences)
knowmine.RelevantSentencesExtractor module
This module contains the RelevantSentences class, which, from all the sentences of the articles, selects only those that contain the provided main terms and relation words, and only when those words are keywords of the sentence.
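The selection rule can be sketched as a set intersection, assuming each sentence comes with its list of keywords and that a sentence must have at least one main term and at least one connection word among those keywords. Matching is exact and case-insensitive here, which is an assumption; `select_relevant` is a hypothetical name.

```python
def select_relevant(sentences, keywords, main_terms, connection_words):
    """Hypothetical sketch: keep a sentence only if its keywords include at
    least one main term and at least one connection word."""
    terms = {t.lower() for t in main_terms}
    links = {w.lower() for w in connection_words}
    relevant = []
    # keywords[i] holds the keyword list of sentences[i].
    for sentence, kws in zip(sentences, keywords):
        kw_set = {k.lower() for k in kws}
        if kw_set & terms and kw_set & links:
            relevant.append(sentence)
    return relevant
```

Requiring the words to be keywords, not mere substrings, filters out sentences where a term only appears in passing.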
knowmine.OutputfileGenerator module
This module contains the Output class, which generates the output file in the desired format (SQLite database or Excel) containing the extracted sentences, the number of sentences in the original text (after cleaning), and the number of extracted sentences. The result file is written to the folder containing the articles being mined.
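For the SQLite variant, a minimal sketch using Python's standard `sqlite3` module might look like this. The file name `knowmine_output.sqlite` and the two-table schema (`articles` for per-article counts, `sentences` for the extracted text) are assumptions, not knowmine's actual layout, and `write_sqlite_output` is a hypothetical name. The Excel variant is not shown.

```python
import os
import sqlite3


def write_sqlite_output(folder, results, db_name="knowmine_output.sqlite"):
    """Hypothetical sketch: store results in an SQLite file inside folder.

    results maps an article name to (total_sentence_count, extracted_sentences).
    Returns the path of the database file.
    """
    path = os.path.join(folder, db_name)
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS articles ("
                "article TEXT, total_sentences INTEGER, extracted_count INTEGER)"
            )
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sentences (article TEXT, sentence TEXT)"
            )
            for article, (total, extracted) in results.items():
                conn.execute(
                    "INSERT INTO articles VALUES (?, ?, ?)",
                    (article, total, len(extracted)),
                )
                conn.executemany(
                    "INSERT INTO sentences VALUES (?, ?)",
                    [(article, s) for s in extracted],
                )
    finally:
        conn.close()
    return path
```

Keeping the counts in a separate table preserves the sentence total for articles where nothing was extracted. Running the sketch twice on the same folder appends duplicate rows.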