CSAIL develops deep-learning vision system that predicts human actions

MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) has developed a deep-learning vision system that predicts human interactions after being trained on videos, including TV shows. Humans are quite good at anticipating other people’s actions, but a machine knows nothing about human behavior unless it is trained to anticipate it. Machines with this kind of deep-learning vision capability could transform the future of robotics.

If machines are given a similar sense of prediction, they could be useful to people in a number of ways. In a future where robots surround us and assist with a variety of tasks, their help would be more reliable if they could predict our actions, and they could be entrusted with more responsibility if they could anticipate what people are about to do.

Robots with a sense of prediction could better navigate human environments, and the same capability could power emergency response systems that predict falls or Google Glass-style headsets that suggest what to do in different situations. This is made possible by recent work in predictive vision by MIT researchers, who developed an algorithm that can anticipate interactions with unprecedented accuracy.

The new system was trained on YouTube videos and TV shows such as ‘The Office’ and ‘Desperate Housewives’. This enabled it to predict whether two individuals will hug, kiss, shake hands or high-five when they meet.

“We wanted to show that just by watching large amounts of video, computers can gain enough knowledge to consistently make predictions about their surroundings,” said CSAIL PhD student Carl Vondrick, who is first author on a related paper that he will present this week at the International Conference on Computer Vision and Pattern Recognition (CVPR).

According to a report in Popsci by Mary Beth Griggs, “In a paper that will be presented this week at the International Conference on Computer Vision and Pattern Recognition, researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) created an algorithm that can predict how humans will behave in certain situations.”

Eventually, this could lead to artificial intelligence that is better able to react to humans, or even to security cameras that could alert authorities when people are in need of help. (In an alternative, more dystopian scenario, computers that can predict human behavior could lead to an AI version of Minority Report.)

“I’m excited to see how much better the algorithms get if we can feed them a lifetime’s worth of videos,” says lead author Carl Vondrick. “We might see some significant improvements that would get us closer to using predictive-vision in real-world situations.”

“Researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL) in Cambridge set out to train a computer to be able to predict how people will greet each other. And their algorithm can do just that,” according to a news report published by CS Monitor.

The system studied 600 hours of raw footage from YouTube videos and television shows like “The Office” and “Desperate Housewives.” Then, when shown previously unseen footage, the algorithm correctly predicted how people would greet each other more than 43 percent of the time when the video was one second away from the greeting.

“Humans automatically learn to anticipate actions through experience, which is what made us interested in trying to imbue computers with the same sort of common sense,” Vondrick said in a press release.
