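Generate a danmaku (bullet-comment) word cloud for a video from its Bilibili link. Only Bilibili is supported for now; other platforms such as iQIYI and Tencent Video will be added later.
It can be imported and called from code, run from the command line, or used by editing the script directly.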
Import and call:

    import wc_builder as ww
    ww.main('https://www.bilibili.com/video/BV1EK411o7iV')

Command line:

    python .\wc_builder.py https://www.bilibili.com/video/BV1EK411o7iV
    Building prefix dict from the default dictionary ...
    Loading model from cache C:\Users\V\AppData\Local\Temp\jieba.cache
    Loading model cost 0.532 seconds.
    Prefix dict has been built successfully.

Edit the code and run it directly:

    if __name__ == '__main__':
        main('https://www.bilibili.com/video/BV1EK411o7iV')
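The initial code is shown below.
The code is fairly rough; when I have time I will look into the number of danmaku and deduplication. (Bilibili danmaku contain many near-duplicates: for example, “哈哈哈” and “哈哈哈哈” count as two distinct words in wordcloud, so the generated image shows many repeated strings that differ only in length.)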
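As a possible starting point for the deduplication mentioned above, here is a minimal sketch that collapses runs of a repeated character before counting, so that “哈哈哈” and “哈哈哈哈” fall into the same bucket. The helper names `normalize_danmaku` and `count_danmaku`, and the choice of capping runs at three characters, are assumptions made for illustration; they are not part of the script.

```python
import re
from collections import Counter

# Matches any single character repeated three or more times in a row.
_REPEAT = re.compile(r'(.)\1{2,}')


def normalize_danmaku(text: str) -> str:
    """Collapse runs of 3+ identical characters down to exactly 3 (hypothetical rule)."""
    return _REPEAT.sub(lambda m: m.group(1) * 3, text.strip())


def count_danmaku(comments):
    """Count comments after normalization, skipping blank ones."""
    return Counter(normalize_danmaku(c) for c in comments if c.strip())


if __name__ == '__main__':
    sample = ['哈哈哈', '哈哈哈哈', '哈哈哈哈哈哈', '2333333', '好听']
    print(count_danmaku(sample).most_common())
```

The resulting counter maps each normalized comment to its frequency, so it could be handed to `WordCloud.generate_from_frequencies` instead of building one long string for `generate`.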
    import re
    import sys

    import jieba
    import matplotlib.pyplot as plt
    import requests
    from wordcloud import WordCloud


    class wc_builder:
        """Base class; each platform gets its own subclass."""

        def __init__(self) -> None:
            pass

        def wordcloud_builder(self, param):
            raise NotImplementedError('Wordcloud_builder Not Implemented')


    class bl_builder(wc_builder):
        """Word cloud builder for Bilibili."""

        def wordcloud_builder(self, param):
            # Reject non-Bilibili links before making any request.
            if 'https://www.bilibili.com/video/' not in param:
                raise Exception('The video link is wrong')
            page = requests.get(param)
            if page.status_code != 200:
                raise Exception('The video link is wrong')
            video_context = page.text

            # The page source embeds the cid used to look up the danmaku list.
            cid = re.compile(r'"cid":(\d{9}),"dimension"').findall(video_context)[0]
            new_links = 'https://api.bilibili.com/x/v1/dm/list.so?oid=' + cid
            dm = requests.get(new_links)
            dm.encoding = 'utf-8'

            # Each danmaku is the text content of a <d> element.
            match = re.compile(r'>(.*?)</d>').findall(dm.text)
            mstr = '.'.join(match)
            words = jieba.lcut(mstr)
            # Joined back without a separator, as in the original;
            # use ' '.join(words) to count jieba's tokens individually.
            newtext = ''.join(words)
            # Strip control characters (\x01 through \x1a).
            newtext = re.sub(r'[\x01-\x1a]+', '', newtext)

            # A font with CJK glyphs is required to render Chinese text.
            wordcloud = WordCloud(font_path='../tools/files/SimHei.ttf').generate(newtext)
            wordcloud.to_file('弹幕词云图.png')

            plt.imshow(wordcloud, interpolation='bilinear')
            plt.axis('off')
            plt.show()


    def main(parm):
        ss = bl_builder()
        ss.wordcloud_builder(parm)


    if __name__ == '__main__':
        main(sys.argv[1])
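The script pulls comment text out of the response with the regex `>(.*?)</d>`. As an alternative sketch, assuming the response is well-formed XML with one `<d>` element per comment (which is what that regex implies), the standard library's XML parser can do the same job and also decodes entities such as `&amp;`. The `extract_danmaku` helper and the `SAMPLE` string are made up for illustration and are not real API output.

```python
import xml.etree.ElementTree as ET


def extract_danmaku(xml_text):
    """Return the text of every non-empty <d> element in the document."""
    root = ET.fromstring(xml_text)
    return [d.text for d in root.iter('d') if d.text]


# Hand-written sample in the assumed shape; not captured from the real endpoint.
SAMPLE = (
    '<i><chatserver>chat.example.invalid</chatserver>'
    '<d p="1.0,1,25">哈哈哈</d>'
    '<d p="2.5,1,25">好听 &amp; 好看</d>'
    '<d p="3.0,1,25"></d></i>'
)

if __name__ == '__main__':
    print(extract_danmaku(SAMPLE))
```

Unlike the regex, this only ever returns the contents of `<d>` elements, so text from other elements in the document cannot leak into the first match. With requests, passing the raw bytes (`dm.content`) rather than `dm.text` lets the parser use the encoding declared in the document.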