AI Weekly: CVPR 2019 showcased AI that can visualize hidden objects and see around corners

[Image: Microsoft's ObjGAN in action. Credit: Microsoft]



Yesterday marked the end of the Conference on Computer Vision and Pattern Recognition (CVPR) 2019, an academic convention cosponsored by the Institute of Electrical and Electronics Engineers’ Computer Society and the Computer Vision Foundation. The conference has grown substantially since its debut in 1983, and this year was one for the books. A total of 1,300 papers were accepted out of a pool of 5,165 from 14,104 authors (a 25.2% acceptance rate), and organizers report a 56% year-over-year increase in submissions. Over 9,200 people from 68 countries registered to attend, nearly 300 of whom gave oral presentations. And CVPR received a record high of over $3.1 million in sponsorships.

So what were the week’s highlights? Well, chipmaker Intel shared four of its contributions early on, one of which — acoustic non-line-of-sight imaging — describes a machine learning system capable of constructing images of unseen objects. Nvidia researchers presented a method for precisely detecting and predicting where an object begins and ends — knowledge that might improve inference for existing computer vision frameworks and training data sets for future architectures. Microsoft detailed ObjGAN, an AI model that can understand captions, sketch layouts, and refine details based on the caption’s exact wording, and StoryGAN, which can generate comic-like storyboards from multi-sentence paragraphs.
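
To make the two-stage idea behind ObjGAN concrete, here is a minimal sketch, with entirely hypothetical module names and shapes rather than Microsoft's implementation: a caption embedding is first decoded into an object layout (boxes plus class scores), and a generator then renders an image conditioned on both the caption and that layout, with a discriminator driving refinement during training.

```python
# Minimal sketch of a caption -> layout -> image pipeline in the spirit of
# ObjGAN. All module names and sizes are hypothetical stand-ins.
import torch
import torch.nn as nn

class LayoutGenerator(nn.Module):
    """Maps a caption embedding to N object boxes (x, y, w, h) plus class logits."""
    def __init__(self, text_dim=256, max_objects=8, num_classes=80):
        super().__init__()
        self.boxes = nn.Linear(text_dim, max_objects * 4)
        self.classes = nn.Linear(text_dim, max_objects * num_classes)
        self.max_objects, self.num_classes = max_objects, num_classes

    def forward(self, text_emb):
        b = torch.sigmoid(self.boxes(text_emb)).view(-1, self.max_objects, 4)
        c = self.classes(text_emb).view(-1, self.max_objects, self.num_classes)
        return b, c

class ImageGenerator(nn.Module):
    """Renders a coarse image conditioned on the caption and predicted layout."""
    def __init__(self, text_dim=256, layout_dim=8 * 4, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + layout_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * img_size * img_size), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, text_emb, boxes):
        z = torch.cat([text_emb, boxes.flatten(1)], dim=1)
        return self.net(z).view(-1, 3, self.img_size, self.img_size)

# Stage 1: caption embedding -> object layout; Stage 2: layout -> image.
text_emb = torch.randn(1, 256)            # stand-in for a real text encoder
layout_gen, image_gen = LayoutGenerator(), ImageGenerator()
boxes, class_logits = layout_gen(text_emb)
fake_image = image_gen(text_emb, boxes)   # a discriminator would refine this in training
print(fake_image.shape)                   # torch.Size([1, 3, 64, 64])
```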

Not to be outdone, a CVPR-accepted paper from IBM proposes Label-Set Operations (LaSO) networks, which are designed to combine pairs of labeled image examples to create new examples that incorporate the seed images’ labels. And Facebook showcased AI Habitat, an open source simulator that can train AI agents to operate in environments meant to mimic common real-world settings, like apartments or offices.
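
The core trick in LaSO, as described, is doing set algebra on labels in feature space. Below is a toy sketch of the union case under assumed architecture choices (the encoder, combiner, and classifier here are illustrative stand-ins, not IBM's published model): two images are encoded, their features are fused, and the fused feature is trained so that a multi-label classifier recovers the union of the two label sets.

```python
# Toy sketch of a LaSO-style "label union" network. The architecture and
# dimensions here are illustrative assumptions, not IBM's published model.
import torch
import torch.nn as nn

feat_dim, num_labels = 128, 20

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
union_net = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, feat_dim))
classifier = nn.Linear(feat_dim, num_labels)  # multi-label head, one logit per label

img_a, img_b = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
labels_a = torch.randint(0, 2, (4, num_labels)).float()  # multi-hot label sets
labels_b = torch.randint(0, 2, (4, num_labels)).float()

# Fuse the two feature vectors into one that should carry both label sets.
f_union = union_net(torch.cat([encoder(img_a), encoder(img_b)], dim=1))
target = torch.clamp(labels_a + labels_b, max=1.0)        # set union of labels
loss = nn.BCEWithLogitsLoss()(classifier(f_union), target)
loss.backward()  # pushes the fused feature toward encoding the union
```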

None of this year’s corporate-backed researchers managed to snag CVPR’s Best Paper Award — it went instead to scientists from Carnegie Mellon University, the University of Toronto, and University College London, who presented a technique that uses sources of light, sensors, and computer vision algorithms to infer the shape of objects concealed around corners. As for the Best Student Paper Award, it went to a team hailing from the University of California, Santa Barbara, Microsoft Research in Redmond, and Duke University, which demonstrated a method combining reinforcement learning and self-supervised learning to let a robot navigate to a specified position by following instructions that reference environmental landmarks.

Two papers received honorable mentions at CVPR 2019: Nvidia’s “A Style-Based Generator Architecture for Generative Adversarial Networks” and Google’s “Learning the Depths of Moving People by Watching Frozen People.” The former proposes an alternative generator architecture for GANs that affords greater control over image synthesis and improves on the state of the art in distribution quality and interpolation properties. The latter describes a method that predicts depth in scenes where both an ordinary camera and the people it captures are moving freely.

Several CVPR participants made new data sets publicly available, particularly in the driverless vehicle domain. Waymo, Google parent company Alphabet’s autonomous driving unit, said it would release a multimodal sensor corpus — the Waymo Open Dataset — later this year, beginning with a 1,000-video batch in July. It contains 3,000 driving scenes totaling 16.7 hours of video data, 600,000 frames, and roughly 25 million 3D and 22 million 2D bounding boxes. Ford-backed Argo AI also debuted a curated data collection — Argoverse — along with high-definition maps, including 3D tracking annotations for 113 scenes, more than 300,000 vehicle trajectories, testing benchmarks, 290 kilometers of mapped road lanes, and an API that connects sensor data with map data.
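
For a flavor of what an API that connects sensor data with map data might do, here is a hedged sketch of the underlying idea: matching a tracked vehicle position against lane centerlines. Every type and function below is a hypothetical placeholder, not the actual Argoverse API, which the team documents in its own repository.

```python
# Hypothetical sketch of joining a tracked trajectory to HD-map lanes.
# Nothing here is the real Argoverse API; names are illustrative only.
import math
from dataclasses import dataclass

@dataclass
class Lane:
    lane_id: int
    centerline: list[tuple[float, float]]  # (x, y) points in map coordinates

def nearest_lane(position: tuple[float, float], lanes: list[Lane]) -> Lane:
    """Return the lane whose centerline passes closest to a tracked object."""
    def dist_to_lane(lane: Lane) -> float:
        return min(math.hypot(position[0] - x, position[1] - y)
                   for x, y in lane.centerline)
    return min(lanes, key=dist_to_lane)

# A toy map with two lanes and one observed vehicle position from tracking.
lanes = [
    Lane(1, [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]),
    Lane(2, [(0.0, 5.0), (10.0, 5.0), (20.0, 5.0)]),
]
vehicle_xy = (12.0, 4.2)
print(nearest_lane(vehicle_xy, lanes).lane_id)  # -> 2
```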

All of this is to say that the week was jam-packed with exciting computer vision research. And more is on the way. The International Conference on Computer Vision kicks off on October 27 in Seoul, South Korea, and of course, NeurIPS — perhaps the biggest AI summit of the year — will take place this December in Vancouver.

For AI coverage, send news tips to Khari Johnson and Kyle Wiggers — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI Channel.

Thanks for reading,

Kyle Wiggers

AI Staff Writer

P.S. Please enjoy this (fake) video of a robot undergoing “testing” at the hands of overzealous engineers.