API Reference
-
class SimpleAudioIndexer.SimpleAudioIndexer(src_dir, mode, username_ibm=None, password_ibm=None, ibm_api_limit_bytes=100000000, verbose=False, needed_directories=set(['filtered', 'staging']))
Indexes audio and searches for a string within it or matches a regex pattern.
Audio files that are intended to be indexed should be in wav format and placed in the same directory. The absolute path to that directory should be passed as src_dir upon initialization.
Call the method index_audio (which calls _index_audio_ibm or _index_audio_cmu based on the given mode) prior to searching or accessing timestamps, unless you have saved the data for your previously indexed audio (in that case, the load_indexed_audio method must be used).
You may see the timestamps of the words that have been indexed so far, sorted by audio file and the time of their occurrence, by calling the method get_timestamps.
You may save the indexed audio data (which is basically just the time-regularized timestamps) via the save_indexed_audio method and load it back via load_indexed_audio.
Do an exhaustive search with the method search_all, an iterative search with the method search_gen, or a regex-based search with the method search_regexp.
For more information see the docs and read usage guide.
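The workflow described above can be sketched as a small helper. This is a minimal sketch, not part of the library: `index_and_search` is a hypothetical name, and it assumes it is handed an already constructed indexer that exposes index_audio, save_indexed_audio and search_all as documented in this reference.

```python
def index_and_search(indexer, queries, saved_path):
    """Run the documented workflow on an already constructed indexer.

    `indexer` is assumed to expose index_audio, save_indexed_audio and
    search_all as described in this reference. This helper is
    hypothetical and not part of the library.
    """
    # Index first: searching requires timestamps to exist.
    indexer.index_audio()
    # Persist the timestamps so that a later run can call
    # load_indexed_audio instead of indexing again.
    indexer.save_indexed_audio(saved_path)
    # Exhaustive search over every indexed file.
    return indexer.search_all(queries)
```

Going by the class signature above, the indexer itself would presumably be created with something like SimpleAudioIndexer(src_dir, mode), where src_dir is an absolute path and mode is "ibm" or "cmu".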
Attributes: - mode : {“ibm”, “cmu”}
Specifies whether the speech-to-text engine is IBM’s Watson or Pocketsphinx.
- src_dir : str
Absolute path to the source directory of audio files such that the absolute path of the audio that’ll be indexed would be src_dir/audio_file.wav
- verbose : bool, optional
True if progress needs to be printed. Default is False.
- ibm_api_limit_bytes : int, optional
The size limit of the Watson Speech to Text sessionless HTTP API, which is 100 MB. Default is 100000000.
Methods
- get_mode()
- get_username_ibm()
- set_username_ibm()
- get_password_ibm()
- set_password_ibm()
- get_verbosity()
- set_verbosity()
- get_timestamps()
Returns a corrected dictionary whose key is the original file name and whose value is a list of words and their beginning and ending time. It accounts for large files and does the timing calculations to return the correct result.
- get_errors()
Returns a dictionary that has all the errors that have occurred while processing the audio files. The dictionary contains the time of the error, the file that had the error and the actual error.
- _index_audio_ibm(name=None, continuous=True, model="en-US_BroadbandModel", word_confidence=True, word_alternatives_threshold=0.9, profanity_filter_for_US_results=False)
Implements a searching-suitable interface for the Watson API.
- _index_audio_cmu(name=None)
Implements an experimental interface for CMU Pocketsphinx.
- index_audio(*args, **kwargs)
Returns a corrected dictionary whose key is the original file name and whose value is a list of words and their beginning and ending time. It accounts for large files and does the timing calculations to return the correct result.
- save_indexed_audio(indexed_audio_file_abs_path)
- load_indexed_audio(indexed_audio_file_abs_path)
- search_gen(query, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)
A generator which returns a valid search result at each iteration.
- search_all(queries, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)
Returns a dictionary of all results of all of the queries for either all of the audio files or the audio_basename.
- search_regexp(pattern, audio_basename=None)
Returns a dictionary of all results which matched pattern for either all of the audio files or the audio_basename.
-
get_mode(self)
Returns whether the instance is initialized with ibm or cmu mode.
Returns: - str
-
get_username_ibm(self)
Returns: - str, None
Returns str if mode is ibm, else None
-
set_username_ibm(self, username_ibm)
Parameters: - username_ibm : str
Raises: - Exception
If mode is not ibm
-
get_password_ibm(self)
Returns: - str, None
Returns str if mode is ibm, else None
-
set_password_ibm(self, password_ibm)
Parameters: - password_ibm : str
Raises: - Exception
If mode is not ibm
-
get_verbosity(self)
Returns whether the instance is initialized to be quiet or loud while processing audio files.
Returns: - bool
True for being verbose.
-
set_verbosity(self, pred)
Parameters: - pred : bool
-
get_timestamps(self)
Returns a dictionary whose keys are audio file basenames and whose values are lists of word blocks. If the audio file was large enough to be split, seconds are added to correct the timing; if the timestamps were manually loaded, they are left alone.
Returns: - {str: [[str, float, float]]}
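To make the return shape and the timing correction concrete, here is a minimal sketch. It assumes that a large file was split into chunks with known start offsets; `merge_chunk_timestamps`, the chunk layout and the sample data are all hypothetical and only illustrate the idea of adding seconds back to chunk-relative times.

```python
def merge_chunk_timestamps(chunks):
    """Merge per-chunk word blocks into one list with corrected times.

    `chunks` is a list of (offset_seconds, word_blocks) pairs, where each
    word block is [word, start, end] measured from the start of its chunk.
    Both this function and the chunk layout are hypothetical.
    """
    merged = []
    for offset, word_blocks in chunks:
        for word, start, end in word_blocks:
            # Shift chunk-relative times so they are relative to the
            # original, unsplit file.
            merged.append([word, start + offset, end + offset])
    return merged


# Hypothetical data: one file split in two, the second chunk starting at 60 s.
chunks = [
    (0.0, [["hello", 0.5, 1.0]]),
    (60.0, [["world", 2.0, 2.5]]),
]
timestamps = {"speech.wav": merge_chunk_timestamps(chunks)}

# Walk the {basename: [[word, start, end], ...]} structure.
for basename, word_blocks in timestamps.items():
    for word, start, end in word_blocks:
        print(basename, word, start, end)
```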
-
get_errors(self)
Returns a dictionary containing any errors that occurred while processing the audio files. Works for either mode.
Returns: - {(float, str): any}
The return value is a dictionary whose keys are tuples whose first elements are the time of the error and whose second elements are the audio file’s name. The values of the dictionary are the actual errors.
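Since the keys are (time, file name) tuples, it is often handy to regroup the errors per file. The following is a minimal sketch under that documented shape; `errors_by_file` and the sample values are hypothetical.

```python
from collections import defaultdict


def errors_by_file(errors):
    """Group an errors dictionary by audio file name.

    `errors` maps (time_of_error, file_name) tuples to the error itself,
    matching the documented return shape. This helper is hypothetical.
    """
    grouped = defaultdict(list)
    # Sort by the time component so each file's errors stay chronological.
    for (when, file_name), error in sorted(errors.items(), key=lambda kv: kv[0][0]):
        grouped[file_name].append((when, error))
    return dict(grouped)


# Hypothetical sample in the documented shape.
sample = {
    (1500000020.0, "b.wav"): "timeout",
    (1500000010.0, "a.wav"): "bad header",
    (1500000030.0, "a.wav"): "timeout",
}
print(errors_by_file(sample))
```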
-
index_audio(self, *args, **kwargs)
Calls the correct indexer function based on the mode.
If mode is ibm, _index_audio_ibm is called, which is an interface for Watson. Note that some of the explanation of _index_audio_ibm’s arguments is from [1].
If mode is cmu, _index_audio_cmu is called, which is an interface for PocketSphinx. Beware that the output would not be sufficiently accurate. Use this only if you don’t want to upload your files to IBM.
Parameters: - mode : {“ibm”, “cmu”}
- basename : str, optional
A specific basename to be indexed, located in src_dir, e.g. audio.wav.
If None is selected, all the valid audio files would be indexed. Default is None.
- replace_already_indexed : bool
True to reindex an audio file that’s already in the timestamps.
Default is False.
- continuous : bool
Valid Only if mode is ibm
Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default in the Watson API itself), recognition ends after the first end-of-speech (EOS) incident is detected.
Default is True.
- model : {
‘ar-AR_BroadbandModel’, ‘en-UK_BroadbandModel’, ‘en-UK_NarrowbandModel’, ‘en-US_BroadbandModel’ (the default), ‘en-US_NarrowbandModel’, ‘es-ES_BroadbandModel’, ‘es-ES_NarrowbandModel’, ‘fr-FR_BroadbandModel’, ‘ja-JP_BroadbandModel’, ‘ja-JP_NarrowbandModel’, ‘pt-BR_BroadbandModel’, ‘pt-BR_NarrowbandModel’, ‘zh-CN_BroadbandModel’, ‘zh-CN_NarrowbandModel’
}
Valid Only if mode is ibm
The identifier of the model to be used for the recognition
Default is ‘en-US_BroadbandModel’
- word_confidence : bool
Valid Only if mode is ibm
Indicates whether a confidence measure in the range of 0 to 1 is returned for each word.
The default is True. (It’s False in the original Watson API.)
- word_alternatives_threshold : numeric
Valid Only if mode is ibm
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive.
Default is 0.9.
- profanity_filter_for_US_results : bool
Valid Only if mode is ibm
Indicates whether profanity filtering is performed on the transcript. If true, the service filters profanity from all output by replacing inappropriate words with a series of asterisks.
If false, the service returns results with no censoring. Applies to US English transcription only.
Default is False.
Raises: - OSError
Valid only if mode is cmu.
If the output of pocketsphinx command results in an error.
References
[1] : https://ibm.com/watson/developercloud/speech-to-text/api/v1/
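The ibm-only parameters above can be gathered and sanity-checked before they are passed on. This is a minimal sketch: `ibm_index_kwargs` and its validation are hypothetical and are not something the library is known to do itself; the model identifiers and defaults are copied from the parameter list above.

```python
# Model identifiers copied from the parameter list above.
IBM_MODELS = {
    "ar-AR_BroadbandModel", "en-UK_BroadbandModel", "en-UK_NarrowbandModel",
    "en-US_BroadbandModel", "en-US_NarrowbandModel", "es-ES_BroadbandModel",
    "es-ES_NarrowbandModel", "fr-FR_BroadbandModel", "ja-JP_BroadbandModel",
    "ja-JP_NarrowbandModel", "pt-BR_BroadbandModel", "pt-BR_NarrowbandModel",
    "zh-CN_BroadbandModel", "zh-CN_NarrowbandModel",
}


def ibm_index_kwargs(model="en-US_BroadbandModel", continuous=True,
                     word_confidence=True, word_alternatives_threshold=0.9,
                     profanity_filter_for_US_results=False):
    """Build keyword arguments for index_audio in ibm mode.

    Hypothetical helper: the checks below are this sketch's own.
    """
    if model not in IBM_MODELS:
        raise ValueError("unknown model: %r" % (model,))
    # The threshold is documented as a probability between 0 and 1 inclusive.
    if not 0 <= word_alternatives_threshold <= 1:
        raise ValueError("word_alternatives_threshold must be within [0, 1]")
    return {
        "model": model,
        "continuous": continuous,
        "word_confidence": word_confidence,
        "word_alternatives_threshold": word_alternatives_threshold,
        "profanity_filter_for_US_results": profanity_filter_for_US_results,
    }
```

Assuming index_audio forwards its keyword arguments as described, a call would then look like indexer.index_audio(**ibm_index_kwargs(model="es-ES_BroadbandModel")).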
-
save_indexed_audio(self, indexed_audio_file_abs_path)
Writes the corrected timestamps to a file. Timestamps are a Python dictionary.
Parameters: - indexed_audio_file_abs_path : str
-
load_indexed_audio(self, indexed_audio_file_abs_path)
Parameters: - indexed_audio_file_abs_path : str
-
search_gen(self, query, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)
A generator that searches for the query within the audio files of the src_dir.
Parameters: - query : str
A string that’ll be searched. It’ll be split on spaces and then each word gets sequentially searched.
- audio_basename : str, optional
Search only within the given audio_basename.
Default is None
- case_sensitive : bool, optional
Default is False
- subsequence : bool, optional
True if an exact match of the word is not required and larger strings that contain the given one are acceptable.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- supersequence : bool, optional
True if an exact match of the word is not required and smaller strings that are contained within the given one are acceptable.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- anagram : bool, optional
True if it’s acceptable for a complete permutation of the word to be found, e.g. “abcde” would be acceptable for “edbac”.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- timing_error : None or float, optional
Sometimes other words (almost always very small) would be detected between the words of the query. This parameter defines the timing difference/tolerance of the search.
Default is 0.0 i.e. No timing error is tolerated.
- missing_word_tolerance : int, optional
The number of words that can be missed within the result. For example, if the query is “Some random text” and the tolerance value is 1, then “Some text” would be a valid response. Note that the first and last words cannot be missed. Also, there’ll be an error if the value is more than the number of available words. For the example above, any value more than 1 would have given an error (since there’s only one word i.e. “random” that can be missed)
Default is 0.
Yields: - {“File Name”: str, “Query”: query, “Result”: (float, float)}
The result of the search is returned as a tuple, which is the value of the “Result” key. The first element of the tuple is the starting second of the query and the last element is its ending second.
Raises: - AssertionError
If missing_word_tolerance value is more than the total number of words in the query minus 2 (since the first and the last word cannot be removed)
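The per-word matching flags and the tolerance limit can be illustrated with two small functions. This is a minimal sketch of one reading of the parameter descriptions above, not the library's actual matching code; `word_matches` and `check_missing_word_tolerance` are hypothetical names.

```python
def word_matches(query_word, indexed_word, case_sensitive=False,
                 subsequence=False, supersequence=False, anagram=False):
    """Return True if indexed_word is an acceptable hit for query_word."""
    if not case_sensitive:
        query_word, indexed_word = query_word.lower(), indexed_word.lower()
    if query_word == indexed_word:
        return True
    # subsequence: a larger indexed string that contains the query word.
    if subsequence and query_word in indexed_word:
        return True
    # supersequence: a smaller indexed string contained in the query word.
    if supersequence and indexed_word in query_word:
        return True
    # anagram: a complete permutation of the query word's letters.
    if anagram and sorted(query_word) == sorted(indexed_word):
        return True
    return False


def check_missing_word_tolerance(query, missing_word_tolerance):
    """Mirror the documented AssertionError condition for a query."""
    words = query.split(" ")
    # The first and last words can never be dropped, so at most
    # len(words) - 2 words may be missing.
    assert missing_word_tolerance <= max(len(words) - 2, 0), (
        "tolerance exceeds the number of droppable words")


print(word_matches("edbac", "abcde", anagram=True))
check_missing_word_tolerance("Some random text", 1)
```

Each flag is checked per word, which matches the note that multi-word queries are considered word by word rather than as a whole sentence.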
-
search_all(self, queries, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)
Returns a dictionary of all results of all of the queries for all of the audio files. All the specified parameters work per query.
Parameters: - queries : [str] or str
A list of the strings that’ll be searched. If the type of queries is str, it’ll be inserted into a list within the body of the method.
- audio_basename : str, optional
Search only within the given audio_basename.
Default is None.
- case_sensitive : bool, optional
Default is False.
- subsequence : bool, optional
True if an exact match of the word is not required and larger strings that contain the given one are acceptable.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- supersequence : bool, optional
True if an exact match of the word is not required and smaller strings that are contained within the given one are acceptable.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- anagram : bool, optional
True if it’s acceptable for a complete permutation of the word to be found, e.g. “abcde” would be acceptable for “edbac”.
If the query is a sentence with multiple words, this applies to each word, not to the whole sentence.
Default is False.
- timing_error : None or float, optional
Sometimes other words (almost always very small) would be detected between the words of the query. This parameter defines the timing difference/tolerance of the search.
Default is 0.0 i.e. No timing error is tolerated.
- missing_word_tolerance : int, optional
The number of words that can be missed within the result. For example, if the query is “Some random text” and the tolerance value is 1, then “Some text” would be a valid response. Note that the first and last words cannot be missed. Also, there’ll be an error if the value is more than the number of available words. For the example above, any value more than 1 would have given an error (since there’s only one word i.e. “random” that can be missed)
Default is 0.
Returns: - search_results : {str: {str: [(float, float)]}}
A dictionary whose keys are queries and whose values are dictionaries. The keys of each inner dictionary are the audio files in which the query is present, and its values are lists of 2-tuples whose first element is the starting second of the query and whose second element is the ending second, e.g. {“apple”: {“fruits.wav” : [(1.1, 1.12)]}}
Raises: - TypeError
if queries is neither a list nor a str
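The relationship between what search_gen yields and what search_all returns can be sketched as a simple fold. This is a minimal sketch; whether search_all is actually implemented this way is not stated in this reference, and `collect_results` and the sample hits are hypothetical.

```python
def collect_results(hits):
    """Fold search_gen-style dictionaries into the search_all result shape."""
    results = {}
    for hit in hits:
        # Outer key: the query. Inner key: the audio file name.
        per_file = results.setdefault(hit["Query"], {})
        per_file.setdefault(hit["File Name"], []).append(hit["Result"])
    return results


# Hypothetical hits in the shape that search_gen is documented to yield.
hits = [
    {"File Name": "fruits.wav", "Query": "apple", "Result": (1.1, 1.12)},
    {"File Name": "fruits.wav", "Query": "apple", "Result": (7.4, 7.9)},
    {"File Name": "pies.wav", "Query": "apple", "Result": (0.3, 0.8)},
]
print(collect_results(hits))
```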
-
search_regexp(self, pattern, audio_basename=None)
First joins the words of the word blocks of the timestamps with spaces, per audio_basename. Then matches pattern and calculates the indices of the word blocks in which the first and last words of the matched result appear. Then presents the output like the search_all method.
Note that the leading and trailing spaces from the matched results would be removed while determining which word_block they belong to.
Parameters: - pattern : str
A regex pattern.
- audio_basename : str, optional
Search only within the given audio_basename.
Default is None.
Returns: - search_results : {str: {str: [(float, float)]}}
A dictionary whose keys are queries and whose values are dictionaries. The keys of each inner dictionary are the audio files in which the query is present, and its values are lists of 2-tuples whose first element is the starting second of the query and whose second element is the ending second, e.g. {“apple”: {“fruits.wav” : [(1.1, 1.12)]}}
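The join, match and map-back steps described for search_regexp can be sketched for a single audio file as follows. This is a minimal sketch of that description and not the library's code: `regexp_over_word_blocks` is a hypothetical name, the sample word blocks are made up, and the result covers one file only rather than the nested per-file dictionary.

```python
import re


def regexp_over_word_blocks(pattern, word_blocks):
    """Match pattern against the space-joined words and map hits to times.

    `word_blocks` is a list of [word, start, end]. Hypothetical helper.
    """
    words = [block[0] for block in word_blocks]
    text = " ".join(words)
    # Character offset at which each word starts in the joined text.
    starts, position = [], 0
    for word in words:
        starts.append(position)
        position += len(word) + 1  # +1 for the joining space

    def block_index(char_index):
        # Index of the last word that starts at or before char_index.
        return max(i for i, s in enumerate(starts) if s <= char_index)

    results = {}
    for match in re.finditer(pattern, text):
        matched = match.group(0)
        stripped = matched.strip(" ")
        if not stripped:
            continue  # empty or spaces only: no word block to report
        # Drop leading and trailing spaces before locating the word blocks.
        first_char = match.start() + (len(matched) - len(matched.lstrip(" ")))
        last_char = first_char + len(stripped) - 1
        first, last = block_index(first_char), block_index(last_char)
        # Start time of the first block, end time of the last block.
        results.setdefault(stripped, []).append(
            (word_blocks[first][1], word_blocks[last][2]))
    return results


blocks = [["I", 0.0, 0.2], ["like", 0.3, 0.6],
          ["green", 0.7, 1.0], ["apples", 1.1, 1.6]]
print(regexp_over_word_blocks(r" gr\w+ app\w+", blocks))
```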