
Guangzhou Smart Device Translation Company | Artificial Intelligence Breaks the Sound Barrier | Smart Device Translation Services

Date: 2016-06-15 9:55:09

 Algorithm produces ‘real’ sounds from silent videos

Artificial intelligence breaks through the sound barrier

 

Summary: MIT has developed an artificial intelligence system that can “watch” a silent video and generate the sounds the video should contain.

 

Artificial intelligence has broken through a sound barrier. Researchers from the Massachusetts Institute of Technology have developed an AI system that “watches” a silent video clip and generates a sound so convincing that most human viewers cannot tell whether it is computer-generated.

The MIT Computer Science and Artificial Intelligence Lab says its “deep learning algorithm” is the first to pass a “Turing test for sound” — making noises indistinguishable from the real thing.

The Visually Indicated Sounds system, or Vis, was trained to analyse the sounds made when a stick hits, scrapes or prods a variety of objects, from leaves and water to soil and steel.

The Vis repertoire could be extended to many other settings, the researchers say. Future versions could, for example, produce more realistic sound effects for film and television than traditional methods such as dropping salt on aluminium foil to imitate rainfall.

A more significant application might be to help robots understand objects’ physical properties and interact better with their surroundings, said Andrew Owens, leader of the project, which will be presented later this month at the annual Computer Vision and Pattern Recognition conference in Las Vegas.

“A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if it stepped on either of them,” he said. “Being able to predict sound is an important first step towards being able to predict the consequences of physical interactions with the world.”

The MIT team “trained” Vis by feeding in 1,000 videos including 46,000 sounds made by a drumstick hitting or moving through objects of different consistencies. Then a “deep learning” algorithm, which enables a computer to find patterns within vast quantities of data, deconstructed the sounds.

To predict a new sound from a silent film clip, Vis looks at the audio properties most likely to be associated with each video frame and knits them together into a coherent sound. Vis can simulate the subtleties of quick and slow sounds, from staccato taps on a rock to gentle rustling through ivy. It manages low-pitched thuds against a cushion and high-pitched clicks against a railing.
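
One way to picture this “knitting” step is the toy sketch below. It is not the MIT system: it assumes, purely for illustration, that some earlier stage has already guessed a pitch and a loudness for every video frame, and the function name `knit_frames` and its parameters are hypothetical. The sketch only shows how per-frame guesses might be blended into one continuous waveform rather than a string of disconnected beeps.

```python
import math


def knit_frames(frame_features, sample_rate=8000, fps=30):
    """Blend per-frame (pitch_hz, loudness) guesses into one waveform.

    Hypothetical illustration only; the article does not describe
    the audio features or the synthesis method Vis actually uses.
    """
    samples_per_frame = sample_rate // fps
    waveform = []
    phase = 0.0  # carried across frames so the tone never jumps
    last = len(frame_features) - 1
    for i, (pitch, loudness) in enumerate(frame_features):
        # Look ahead one frame so pitch and loudness can ramp smoothly.
        next_pitch, next_loudness = frame_features[min(i + 1, last)]
        for n in range(samples_per_frame):
            t = n / samples_per_frame  # position inside the frame, 0..1
            amp = (1 - t) * loudness + t * next_loudness
            freq = (1 - t) * pitch + t * next_pitch
            phase += 2 * math.pi * freq / sample_rate
            waveform.append(amp * math.sin(phase))
    return waveform


# Three frames: silence, a loud low tap, then a fade-out at a higher pitch.
frames = [(220.0, 0.0), (220.0, 1.0), (440.0, 0.0)]
wave = knit_frames(frames, sample_rate=3000, fps=30)
print(len(wave))
```

Carrying the phase from one frame into the next and ramping loudness between neighbouring frames is what keeps the result coherent in this sketch; generating each frame's sound independently would leave an audible click at every frame boundary.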

To test the realism of the sounds for human listeners, the researchers surveyed 400 viewers who saw video clips twice, once with the real sound and once with Vis’s version. They had to say which was real.

If Vis made sounds that were indistinguishable from reality, they would be chosen half the time. In fact it achieved a very creditable 40 per cent score.
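
The arithmetic behind the 50 per cent benchmark can be made explicit with a deliberately simple guessing model. The model is an assumption for illustration, not something the researchers describe: on each trial a viewer either genuinely spots the synthetic clip, with probability `detect_prob`, or cannot tell and picks one of the two clips at random. Both function names are hypothetical.

```python
def fooled_rate(detect_prob):
    """Share of trials in which the synthetic clip is picked as real.

    Assumed model: the viewer detects the fake with probability
    detect_prob; otherwise the viewer guesses, and is wrong half the time.
    """
    return (1.0 - detect_prob) / 2.0


def implied_detect_prob(fooled):
    """Invert the model: detection probability implied by a fooled rate."""
    return 1.0 - 2.0 * fooled


# Perfectly indistinguishable sounds: nobody can detect the fake.
print(fooled_rate(0.0))                      # 0.5
# The reported 40 per cent score, read through the same model.
print(round(implied_detect_prob(0.40), 2))   # 0.2
```

Under this assumed model, a 40 per cent score would mean viewers genuinely caught the synthetic sound in roughly one trial in five and were guessing the rest of the time.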

The system is least successful when the sounds are clean and sharp, such as hitting wood or metal, and best at reproducing softer and more drawn-out sounds made by leaves or dirt. It sometimes also “hallucinates” a false hit if the stick stops just short of its target.

Mr Owens dreams of simulating sounds where there is no clear visual clue. “From the gentle blowing of the wind to the buzzing of laptops, at any given moment there are so many ambient sounds that aren’t related to what we’re actually looking at,” he said.

 

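Published free of charge by 译联 Guangzhou Translation Company for study and reference only; commercial use and reproduction are prohibited.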