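One approach is to predict the boundaries of each speech segment and then classify each segment. However, if we miss a segment, that single error makes the whole segment wrong. To address this, we can use a GMM smoother. The audio detector produces time-range segments together with a label for each segment. The GMM smoother takes these segments as input and helps reduce the noise in the final predictions.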
GMM-based smoother
Our goal is to solve a temporal concept localization problem, that is, to produce output such as [[StartTime1, EndTime1, Class1], [StartTime2, EndTime2, Class2], …]. Visualized, it can look like the figure below:
from copy import deepcopy
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.mixture import GaussianMixture
import logging
logger = logging.getLogger()
class GMMSmoother:
    """
    This class is the main class of the Smoother. It smooths a list of segments
    and joins consecutive segments that share the same class.
    """

    def __init__(self, min_samples=10):
        # The minimum number of samples required for applying GMM
        self.min_samples = min_samples
        # Logger instance
        self.logger = logger
    def smooth_segments_gmm(self, segments, gmm_segment_class='background', bg_segment_class='foreground'):
        """
        This method performs the smoothing using a Gaussian Mixture Model (GMM) (for more information about GMM,
        please visit: https://scikit-learn.org/stable/modules/mixture.html). It fits two GMMs: the first with one
        Gaussian component and the second with two components. Then it selects the better model using the average
        of the AIC and BIC metrics. After choosing the model, we cluster the segments into two clusters: real or fake.
        Please note that the GMMs don't use the first and last segments because, in our case,
        the stream's time limit is an hour and we don't have complete statistics on
        the lengths of the first and last segments.
        :param segments: a list of dictionaries, each dict represents a segment
        :param gmm_segment_class: the segment class of the "reals"
        :param bg_segment_class: the segment class of the "fakes"
        :return: segments_copy: the smoothed version of segments
        """
        self.logger.info("Begin smoothing using Gaussian Mixture Model (GMM)")
        # Some instancing
        preds_map = {0: bg_segment_class, 1: gmm_segment_class}
        gmms_results_dict = {}
        # Copy segments to a new variable
        segments_copy = deepcopy(segments)
        self.logger.info("Create input data for GMM")
        # Keep the gmm_segment_class data points and perform GMM on them.
        # For example: gmm_segment_class = 'background'
        segments_filtered = {i: s for i, s in enumerate(segments_copy) if
                             s['segment'] == gmm_segment_class and 0 < i < len(segments_copy) - 1}
        # Calculate the length (in seconds) of each segment
        X = np.array([[(s['endTime'] - s['startTime']).total_seconds()] for s in segments_filtered.values()])
        # Check if the number of data points is at most the minimum.
        # If it is, do not apply GMM!
        if len(X) <= self.min_samples:
            self.logger.warning(f"Size of input ({len(X)}) is not larger than min samples ({self.min_samples}). "
                                "Do not perform smoothing.")
            return segments
        # Go over 1 and 2 components and calculate statistics
        best_fitting_score = np.inf
        best_model = None
        self.logger.info("Begin to fit GMMs with 1 and 2 components.")
        for i in [1, 2]:
            # For each number of components (1 or 2), fit a GMM
            gmm = GaussianMixture(n_components=i, random_state=0, tol=10 ** -6).fit(X)
            # Calculate AIC and BIC and the average between them
            aic, bic = gmm.aic(X), gmm.bic(X)
            fitting_score = (aic + bic) / 2
            # If the average is less than the best score, replace the best model
            if fitting_score < best_fitting_score:
                best_model = gmm
                best_fitting_score = fitting_score
            gmms_results_dict[i] = {"model": gmm, "fitting_score": fitting_score, "aic": aic, "bic": bic}
        self.logger.info(f"GMM with {best_model.n_components} components was selected")
        if best_model.n_components == 1:
            # One component: relabel the points whose length is more than
            # 2*STD below the mean (unusually short segments)
            mean = best_model.means_[0, 0]
            # Default 'full' covariance has shape (1, 1, 1) here
            std = np.sqrt(best_model.covariances_[0, 0, 0])
            model_preds = [0 if x < mean - 2 * std else 1 for x in X[:, 0]]
        else:
            # Two components: assign a label to each data point, and relabel
            # the points that are assigned to the low-mean Gaussian
            if np.linalg.norm(best_model.means_[0]) > np.linalg.norm(best_model.means_[1]):
                preds_map = {1: bg_segment_class, 0: gmm_segment_class}
            model_preds = best_model.predict(X)
        self.logger.info("Replace previous predictions with GMM predictions")
        # Perform smoothing
        for i, (k, s) in enumerate(segments_filtered.items()):
            if s['segment'] != preds_map[model_preds[i]]:
                s['segment'] = preds_map[model_preds[i]]
                segments_copy[k] = s
        self.logger.info("Merge segments")
        # Join consecutive segments after the processing
        segments_copy = join_consecutive_segments(segments_copy)
        return segments_copy
def plot_bars(res_dict_objs, color_dict={"foreground": "#DADDFC", "background": '#FC997C', "null": "#808080"}, channel="",
              start_time="", end_time="", snrs=None, titles=['orig', 'smoothed'],
              save=False, save_path="", show=True):
    """
    Inspired by https://stackoverflow.com/questions/70142098/stacked-horizontal-bar-showing-datetime-areas
    This function visualizes the smoothing results
    of multiple segment lists
    :param res_dict_objs: a list of lists. Each list is a segments list to plot
    :param color_dict: dictionary which maps each class to its color in the plot
    :param channel: channel number
    :param start_time: absolute start time
    :param end_time: absolute end time
    :param snrs: list of snrs to display in the title
    :param titles: title of each subplot
    :param save: flag to save the figure into a png file
    :param save_path: save path of the figure
    :param show: flag to show the figure
    """
    if not isinstance(res_dict_objs, list):
        res_dict_objs = [res_dict_objs]
    if snrs is None:
        snrs = [''] * len(res_dict_objs)
    # squeeze=False keeps ax two-dimensional, so ax[idx, 0] also works for a single subplot
    fig, ax = plt.subplots(len(res_dict_objs), 1, figsize=(20, 10), squeeze=False)
    fig.suptitle(f"Channel {channel}, {start_time}-{end_time}\n" + "\n".join(str(snr) for snr in snrs))
    for dict_idx, res_dict in enumerate(res_dict_objs):
        date_from = [a['startTime'] for a in res_dict]
        date_to = [a['endTime'] for a in res_dict]
        segment = [a['segment'] for a in res_dict]
        df = pd.DataFrame({'date_from': date_from, 'date_to': date_to,
                           'segment': segment})
        # Draw each segment as a thick horizontal line in its class color
        for i in range(df.shape[0]):
            ax[dict_idx, 0].plot([df['date_from'][i], df['date_to'][i]], [1, 1],
                                 linewidth=50, c=color_dict[df['segment'][i]])
        if dict_idx < len(titles):
            ax[dict_idx, 0].set_title(titles[dict_idx])
    # Save before showing: plt.show() may close the figure, so saving afterwards could write an empty image
    if save:
        plt.savefig(save_path)
    if show:
        plt.show()
def join_consecutive_segments(seg_list):
    """
    This function merges consecutive segments that have
    the same segment class into one segment. It also updates the
    start and end times with respect to the joined segments
    :param seg_list: a list of dictionaries. Each dict represents a segment
    :return: joined_segments: a list of dictionaries, where the segments are merged
    """
    joined_segments = list()
    if not seg_list:
        return joined_segments
    # Start collecting from a copy of the first segment
    collector = {
        'startTime': seg_list[0]['startTime'],
        'endTime': seg_list[0]['endTime'],
        'segment': seg_list[0]['segment']
    }
    last_segment = collector['segment']
    for seg in seg_list:
        segment = seg['segment']
        start_dt = seg['startTime']
        end_dt = seg['endTime']
        if segment == last_segment:
            # Same class as the running segment: extend it
            collector['endTime'] = end_dt
        else:
            # Class changed: store the running segment and start a new one
            joined_segments.append(collector)
            collector = {
                'startTime': start_dt,
                'endTime': end_dt,
                'segment': segment
            }
            last_segment = segment
    # Store the last running segment
    joined_segments.append(collector)
    return joined_segments
def main(seg_list):
    # Create GMMSmoother instance
    gmm_smoother = GMMSmoother()
    # Join consecutive segments that have the same segment label
    seg_list_joined = join_consecutive_segments(seg_list)
    # Perform smoothing on background class
    smoothed_segs_tmp = gmm_smoother.smooth_segments_gmm(seg_list_joined)
    # Perform smoothing on foreground class, only if the first pass changed the segments
    if len(smoothed_segs_tmp) != len(seg_list_joined):
        smoothed_segs_final = gmm_smoother.smooth_segments_gmm(
            smoothed_segs_tmp, gmm_segment_class='foreground', bg_segment_class='background')
    else:
        smoothed_segs_final = smoothed_segs_tmp
    return smoothed_segs_final
if __name__ == "__main__":
    # The read_data_func should be implemented by the user,
    # depending on their needs.
    seg_list = read_data_func()
    res = main(seg_list)
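The data structure is a list of dictionaries. Each dictionary represents one segment prediction and has the following keys: "startTime", "endTime" and "segment". Here is an example: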
{"startTime": ISODate("%Y-%m-%dT%H:%M:%S%z"), "endTime": ISODate("%Y-%m-%dT%H:%M:%S%z"), "segment": "background/foreground"}
from datetime import datetime, timezone

# Input segments list (ISODate values written as timezone-aware Python datetimes)
seg_list = [
    {"startTime": datetime(2022, 11, 19, 0, tzinfo=timezone.utc),
     "endTime": datetime(2022, 11, 19, 1, tzinfo=timezone.utc), "segment": "background"},
    {"startTime": datetime(2022, 11, 19, 1, tzinfo=timezone.utc),
     "endTime": datetime(2022, 11, 19, 2, tzinfo=timezone.utc), "segment": "background"},
]
# Apply join_consecutive_segments on seg_list to join consecutive segments
seg_list_joined = join_consecutive_segments(seg_list)
# After applying the function, the new list should look like the following:
# [{"startTime": datetime(2022, 11, 19, 0, tzinfo=timezone.utc),
#   "endTime": datetime(2022, 11, 19, 2, tzinfo=timezone.utc), "segment": "background"}]
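The list-of-dictionaries structure maps directly onto the [[StartTime, EndTime, Class], ...] output format described at the start of this section. The sketch below assumes, hypothetically, that the segments arrive as JSON with ISO-8601 timestamp strings (for example, exported from the database); `parse_iso`, `load_segments` and `to_triples` are illustrative helper names, not part of the code above.

```python
import json
from datetime import datetime


def parse_iso(ts):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def load_segments(json_text):
    """Hypothetical loader: JSON list of segments -> list of dicts with datetime values."""
    return [
        {"startTime": parse_iso(s["startTime"]),
         "endTime": parse_iso(s["endTime"]),
         "segment": s["segment"]}
        for s in json.loads(json_text)
    ]


def to_triples(segments):
    """Convert the list of dicts into [[StartTime, EndTime, Class], ...]."""
    return [[s["startTime"], s["endTime"], s["segment"]] for s in segments]


raw = """[
  {"startTime": "2022-11-19T00:00:00Z", "endTime": "2022-11-19T01:00:00Z", "segment": "background"},
  {"startTime": "2022-11-19T01:00:00Z", "endTime": "2022-11-19T01:05:00Z", "segment": "foreground"}
]"""

segments = load_segments(raw)
for start, end, cls in to_triples(segments):
    print(start.isoformat(), end.isoformat(), cls)
```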
def join_consecutive_segments(seg_list):
    """
    This function merges consecutive segments that have
    the same segment class into one segment. It also updates the
    start and end times with respect to the joined segments
    :param seg_list: a list of dictionaries. Each dict represents a segment
    :return: joined_segments: a list of dictionaries, where the segments are merged
    """
    joined_segments = list()
    if not seg_list:
        return joined_segments
    # Start collecting from a copy of the first segment
    collector = {
        'startTime': seg_list[0]['startTime'],
        'endTime': seg_list[0]['endTime'],
        'segment': seg_list[0]['segment']
    }
    last_segment = collector['segment']
    for seg in seg_list:
        segment = seg['segment']
        start_dt = seg['startTime']
        end_dt = seg['endTime']
        if segment == last_segment:
            # Same class as the running segment: extend it
            collector['endTime'] = end_dt
        else:
            # Class changed: store the running segment and start a new one
            joined_segments.append(collector)
            collector = {
                'startTime': start_dt,
                'endTime': end_dt,
                'segment': segment
            }
            last_segment = segment
    # Store the last running segment
    joined_segments.append(collector)
    return joined_segments
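The same merge can also be written more compactly with `itertools.groupby`, which groups adjacent items that share a key. This is an alternative sketch, not the article's implementation; like the function above, it assumes the segments are already sorted by time, and each merged segment takes the start time of the first segment in its run and the end time of the last one. The function name is hypothetical.

```python
from itertools import groupby


def join_consecutive_segments_groupby(seg_list):
    """Merge runs of consecutive segments that share the same class."""
    joined = []
    # groupby only groups *adjacent* items, which is exactly the behaviour we want
    for label, run in groupby(seg_list, key=lambda s: s["segment"]):
        run = list(run)
        joined.append({
            "startTime": run[0]["startTime"],
            "endTime": run[-1]["endTime"],
            "segment": label,
        })
    return joined


# Integer "times" are used here only to keep the example short
segs = [
    {"startTime": 0, "endTime": 10, "segment": "background"},
    {"startTime": 10, "endTime": 20, "segment": "background"},
    {"startTime": 20, "endTime": 25, "segment": "foreground"},
    {"startTime": 25, "endTime": 40, "segment": "background"},
]
print(join_consecutive_segments_groupby(segs))
# Result: background 0-20, foreground 20-25, background 25-40
```

Because the background run at the end is separated from the first one by a foreground segment, it stays a separate segment; only adjacent segments are merged.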
# Copy segments to a new variable
segments_copy = deepcopy(segments)
# Keep the gmm_segment_class data points and perform GMM on them.
# For example: gmm_segment_class = 'background'
segments_filtered = {i: s for i, s in enumerate(segments_copy) if s['segment'] == gmm_segment_class and (i > 0 and i < len(segments_copy) - 1)}
# Calculate the length (in seconds) of each segment
X = np.array([[(s['endTime'] - s['startTime']).total_seconds()] for _, s in segments_filtered.items()])
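We take only the lengths of the background segments and apply GMM to that length data. If there are enough data points (a predefined number, which is a hyperparameter), we fit two GMMs: a one-component model and a two-component model. We then select the better-fitting GMM using the average of the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC); the lower the average, the better the fit.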
# Check if the number of data points is at most the minimum.
# If it is, do not apply GMM!
if len(X) <= self.min_samples:
    self.logger.warning(f"Size of input ({len(X)}) is not larger than min samples ({self.min_samples}). "
                        "Do not perform smoothing.")
    return segments
# Go over 1 and 2 components and calculate statistics
best_fitting_score = np.inf
self.logger.info("Begin to fit GMMs with 1 and 2 components.")
for i in range(1, 3):
    # For each number of components (1 or 2), fit a GMM
    gmm = GaussianMixture(n_components=i, random_state=0, tol=10 ** -6).fit(X)
    # Calculate AIC and BIC and the average between them
    aic, bic = gmm.aic(X), gmm.bic(X)
    fitting_score = (aic + bic) / 2
    # If the average is less than the best score, replace the best model
    if fitting_score < best_fitting_score:
        best_model = gmm
        best_fitting_score = fitting_score
    gmms_results_dict[i] = {"model": gmm, "fitting_score": fitting_score, "aic": aic, "bic": bic}
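To make the selection rule concrete, the sketch below fits one- and two-component 1-D Gaussian mixtures with a small, dependency-free EM loop and scores each with the same (aic + bic) / 2 average. It is an illustration, not a replacement for scikit-learn's `GaussianMixture`, whose initialisation and stopping rules differ; the helper names, the min/max initialisation, the fixed iteration count and the sample durations are assumptions. For a 1-D mixture with k components there are p = 3k - 1 free parameters (k means, k variances and k - 1 weights), with AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L, where L is the likelihood and n the number of segments.

```python
import math


def log_normal_pdf(x, mean, var):
    # log N(x; mean, var)
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(vals):
    # Numerically stable log(sum(exp(v) for v in vals))
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(xs, k, n_iter=200, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with k in {1, 2} components using EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, var_floor)
    # Deterministic initialisation: one component at the mean, or two at min/max
    means = [mu] if k == 1 else [min(xs), max(xs)]
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities r[i][j] = P(component j | x_i)
        resp = []
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            lse = log_sum_exp(logs)
            resp.append([math.exp(lp - lse) for lp in logs])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
    log_lik = sum(
        log_sum_exp([math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)])
        for x in xs
    )
    return weights, means, variances, log_lik


def selection_score(xs, k):
    """Average of AIC and BIC, as in the smoother above (lower is better)."""
    n = len(xs)
    *_, log_lik = fit_gmm_1d(xs, k)
    p = 3 * k - 1  # k means + k variances + (k - 1) free weights
    aic = 2 * p - 2 * log_lik
    bic = p * math.log(n) - 2 * log_lik
    return (aic + bic) / 2


# Hypothetical background-segment lengths in seconds: short gaps plus long pauses
durations = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 1.2, 1.8, 60.0, 65.0, 70.0, 62.0, 68.0, 75.0]
scores = {k: selection_score(durations, k) for k in (1, 2)}
best_k = min(scores, key=scores.get)
print(scores, "-> selected components:", best_k)
```

On these made-up durations (a cluster of short gaps and a cluster of long pauses), the two-component model wins by a wide margin, because its much higher likelihood outweighs the penalty for its three extra parameters.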
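If one component is selected, the data points that lie more than 2 standard deviations below the mean (that is, unusually short background segments) are labeled as foreground, and the remaining data points stay background. If two components are selected, the points assigned to the Gaussian with the lower mean are relabeled as foreground.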
if best_model.n_components == 1:
    # One component: relabel the points whose length is more than
    # 2*STD below the mean (unusually short segments)
    mean = best_model.means_[0, 0]
    # Default 'full' covariance has shape (1, 1, 1) here
    std = np.sqrt(best_model.covariances_[0, 0, 0])
    model_preds = [0 if x < mean - 2 * std else 1 for x in X[:, 0]]
else:
    # Two components: assign a label to each data point, and relabel
    # the points that are assigned to the low-mean Gaussian
    if np.linalg.norm(best_model.means_[0]) > np.linalg.norm(best_model.means_[1]):
        preds_map = {1: bg_segment_class, 0: gmm_segment_class}
    model_preds = best_model.predict(X)
self.logger.info("Replace previous predictions with GMM predictions")
# Perform smoothing
for i, (k, s) in enumerate(segments_filtered.items()):
    if s['segment'] != preds_map[model_preds[i]]:
        s['segment'] = preds_map[model_preds[i]]
        segments_copy[k] = s
self.logger.info("Merge segments")
# Join consecutive segments after the processing
segments_copy = join_consecutive_segments(segments_copy)
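The one-component branch and the relabel-then-merge step can be traced on a toy example without scikit-learn. The sketch below is a simplified stand-in, not the article's `smooth_segments_gmm`: it computes the mean and population standard deviation of the interior background lengths directly (what a single-component GMM estimates, up to scikit-learn's small `reg_covar` term), relabels background segments shorter than mean - 2*STD as foreground, and then merges consecutive segments. The helper names `merge_runs` and `smooth_one_component` and the integer-second timeline are assumptions made for brevity.

```python
from statistics import mean, pstdev


def merge_runs(segments):
    """Merge consecutive segments with the same label (the idea behind join_consecutive_segments)."""
    merged = []
    for s in segments:
        if merged and merged[-1]["segment"] == s["segment"]:
            merged[-1]["endTime"] = s["endTime"]  # extend the running segment
        else:
            merged.append(dict(s))  # copy so the input is not modified
    return merged


def smooth_one_component(segments, gmm_class="background", other_class="foreground", n_std=2.0):
    """Relabel unusually short interior `gmm_class` segments, then merge runs."""
    segs = [dict(s) for s in segments]
    # Interior segments only: the first and last ones are skipped, as in the article
    idx = [i for i in range(1, len(segs) - 1) if segs[i]["segment"] == gmm_class]
    lengths = [segs[i]["endTime"] - segs[i]["startTime"] for i in idx]
    if len(lengths) < 2:
        # Too few points to estimate a spread; the article uses a min_samples threshold instead
        return merge_runs(segs)
    # Single Gaussian fitted by maximum likelihood: sample mean and population std
    mu, sigma = mean(lengths), pstdev(lengths)
    for i, length in zip(idx, lengths):
        if length < mu - n_std * sigma:
            segs[i]["segment"] = other_class
    return merge_runs(segs)


# Toy timeline in integer seconds: 10 s of foreground between background gaps,
# one of which is a suspiciously short 1-second gap
gaps = [30, 32, 28, 31, 29, 30, 33, 27, 30, 1]
segments, t = [], 0
for gap in gaps:
    segments.append({"startTime": t, "endTime": t + 10, "segment": "foreground"})
    segments.append({"startTime": t + 10, "endTime": t + 10 + gap, "segment": "background"})
    t += 10 + gap
segments.append({"startTime": t, "endTime": t + 10, "segment": "foreground"})

smoothed = smooth_one_component(segments)
print(len(segments), "->", len(smoothed), smoothed[-1])
```

Here the interior gaps average 27.1 s with a standard deviation of about 8.9 s, so the threshold is roughly 9.4 s. Only the 1-second gap falls below it; it becomes foreground and merges with its neighbours, leaving 19 segments instead of 21.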
def plot_bars(res_dict_objs, color_dict={"foreground": "#DADDFC", "background": '#FC997C', "null": "#808080"}, channel="",
              start_time="", end_time="", snrs=None, titles=['orig', 'smoothed'],
              save=False, save_path="", show=True):
    """
    This function visualizes the smoothing results of multiple segment lists
    :param res_dict_objs: a list of lists. Each list is a segments list to plot
    :param color_dict: dictionary which maps each class to its color in the plot
    :param channel: channel number
    :param start_time: absolute start time
    :param end_time: absolute end time
    :param snrs: list of snrs to display in the title
    :param titles: title of each subplot
    :param save: flag to save the figure into a png file
    :param save_path: save path of the figure
    :param show: flag to show the figure
    """
    if not isinstance(res_dict_objs, list):
        res_dict_objs = [res_dict_objs]
    if snrs is None:
        snrs = [''] * len(res_dict_objs)
    # squeeze=False keeps ax two-dimensional, so ax[idx, 0] also works for a single subplot
    fig, ax = plt.subplots(len(res_dict_objs), 1, figsize=(20, 10), squeeze=False)
    fig.suptitle(f"Channel {channel}, {start_time}-{end_time}\n" + "\n".join(str(snr) for snr in snrs))
    for dict_idx, res_dict in enumerate(res_dict_objs):
        date_from = [a['startTime'] for a in res_dict]
        date_to = [a['endTime'] for a in res_dict]
        segment = [a['segment'] for a in res_dict]
        df = pd.DataFrame({'date_from': date_from, 'date_to': date_to,
                           'segment': segment})
        # Draw each segment as a thick horizontal line in its class color
        for i in range(df.shape[0]):
            ax[dict_idx, 0].plot([df['date_from'][i], df['date_to'][i]], [1, 1],
                                 linewidth=50, c=color_dict[df['segment'][i]])
        if dict_idx < len(titles):
            ax[dict_idx, 0].set_title(titles[dict_idx])
    # Save before showing: plt.show() may close the figure, so saving afterwards could write an empty image
    if save:
        plt.savefig(save_path)
    if show:
        plt.show()
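As an alternative to drawing one thick line per segment, matplotlib's `Axes.broken_barh` can draw all bars of one class in a single call; it takes a sequence of (xmin, xwidth) pairs plus a (ymin, yheight) band. The hypothetical helper below only prepares those pairs, as seconds from the first segment's start, so it runs without matplotlib; it assumes the segments are sorted by time and use datetime values.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_broken_barh_ranges(segments):
    """Group segments by class as (start_offset_s, width_s) pairs for Axes.broken_barh."""
    if not segments:
        return {}
    origin = segments[0]["startTime"]
    ranges = defaultdict(list)
    for s in segments:
        start = (s["startTime"] - origin).total_seconds()
        width = (s["endTime"] - s["startTime"]).total_seconds()
        ranges[s["segment"]].append((start, width))
    return dict(ranges)


t0 = datetime(2022, 11, 19)
segs = [
    {"startTime": t0, "endTime": t0 + timedelta(seconds=40), "segment": "background"},
    {"startTime": t0 + timedelta(seconds=40), "endTime": t0 + timedelta(seconds=55), "segment": "foreground"},
    {"startTime": t0 + timedelta(seconds=55), "endTime": t0 + timedelta(seconds=90), "segment": "background"},
]
print(to_broken_barh_ranges(segs))
# {'background': [(0.0, 40.0), (55.0, 35.0)], 'foreground': [(40.0, 15.0)]}
```

With matplotlib, each class could then be drawn as `ax.broken_barh(pairs, (0, 1), facecolors=color_dict[cls])`, one call per class.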
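In this article, we explored using GMM as a smoothing algorithm for temporal data. A GMM (Gaussian Mixture Model) is a statistical model commonly used for data clustering and density estimation. Although it is mainly used for clustering and was not designed specifically for smoothing temporal data, for this kind of class-dependent smoothing the GMM performed very well at reducing noise and improving the results (as measured by the SNR).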