
Microsoft releases Windows Vision Skills preview to streamline computer vision development

Microsoft Windows Vision Skills
The segmentation skill in Microsoft's Windows Vision Skills kit.
Image Credit: Microsoft

Computer vision is an exceedingly useful subfield of machine learning that’s been applied to everything from facial recognition to tuberculosis diagnosis, and Microsoft wants to streamline its deployment on Windows. The company today released a preview of Windows Vision Skills, a set of packages that enable a range of AI-driven photo and video analysis tasks.

Three prebuilt skills are available at launch: Object Detector, Skeletal Detector, and Emotion Recognizer.

“Implementing and integrating efficient machine learning and computer vision solutions is a hard task for developers. The industry is moving at a fast pace, and the amount of custom-tailored solutions coming out makes it strenuous for application developers to keep up,” wrote Microsoft developer writer Eliot Cowley in an article. “The Windows Vision Skills framework is meant to make it easier to utilize computer vision. It standardizes the way computer vision modules are put to use within a Windows application, running on the local device.”

Developers can add the skills (modular bits of code that process inputs and produce outputs) to any .NET, Win32, or UWP application, courtesy of out-of-the-box WinRT APIs that don’t require prior machine learning or computer vision knowledge to use. Meanwhile, computer vision developers can take advantage of hardware acceleration frameworks like DirectX and DirectML on Windows devices by packaging their solutions as skills.

Microsoft says that the Windows Vision Skills framework can be extended to work with existing machine learning frameworks and libraries such as OpenCV. Skills can also be pieced together within an application to address a complex scenario, or bundled together in a single package.
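To make the idea of piecing skills together concrete, here is a minimal conceptual sketch in Python. It is not the Windows Vision Skills API: the `Skill` class, the `compose` helper, and the two stand-in detector functions are hypothetical names invented for illustration, and the "detections" are hard-coded rather than computed from pixels. It only shows the general pattern the article describes, in which each skill takes an input, produces an output, and can be chained with others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from the
# Windows Vision Skills framework. A "frame" here is just a dict that
# carries the image data plus whatever results earlier skills attached.
Frame = Dict[str, object]


@dataclass
class Skill:
    """A modular unit that takes an input frame and returns an enriched one."""
    name: str
    version: str  # skills are described as strongly versioned
    run: Callable[[Frame], Frame]


def compose(skills: List[Skill]) -> Callable[[Frame], Frame]:
    """Chain skills so the output of each one feeds the next."""
    def pipeline(frame: Frame) -> Frame:
        for skill in skills:
            frame = skill.run(frame)
        return frame
    return pipeline


def detect_objects(frame: Frame) -> Frame:
    # Stand-in for a real detector: always reports one person.
    return {**frame, "objects": ["person"]}


def detect_skeletons(frame: Frame) -> Frame:
    # Stand-in: counts one skeleton per person found by the previous skill.
    people = [o for o in frame.get("objects", []) if o == "person"]
    return {**frame, "skeleton_count": len(people)}


pipeline = compose([
    Skill("ObjectDetector", "1.0", detect_objects),
    Skill("SkeletalDetector", "1.0", detect_skeletons),
])

result = pipeline({"pixels": b""})
```

In this sketch the application only calls `pipeline`, so an individual skill could be swapped or updated without touching the calling code, which is the kind of modularity the framework is said to aim for.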

Windows Vision Skills complements Windows’ existing support for ONNX model inference by using WinML to run models locally on the device. The framework lets developers build intelligent applications that benefit from platform-level optimizations.

“Skills are strongly versioned to ease iteration without breaking existing applications,” said Cowley, “[and they’re] easy to ingest, easy to update, and they preserve intellectual property through licensing.”

Microsoft isn’t the only company that’s recently made computer vision tools available in open source. Last week, Google debuted AI image segmentation models optimized for its Cloud TPU hardware platform, and in March, Intel made CVAT, a toolkit for image data labeling, generally available. Last March saw the launch of Intel’s OpenVINO, a computer vision toolkit for edge computing that’s compatible with open source frameworks like Facebook’s Caffe2 and Google’s TensorFlow. And two years ago, Facebook rolled out a trio of tools for segmenting objects within images.