Image Processing Neural Network Systems and Methods with Scene Understanding

Using ScenarioNet to find high-level similarities and differences between two images



The ability of computational agents to reason about the high-level content of real-world scene images is important for many applications, including robotics, human-machine teaming, surveillance, and autonomous vehicles. In these settings, an agent must reason about scene content in order to make rational, grounded decisions that humans can trust. Models that humans can interpret are often also necessary, both to encourage trust and to let humans understand the failure modes of an autonomous agent. For example, if a self-driving car makes an error, it is important to know what caused the error so that similar errors can be prevented in the future.



Researchers at Rutgers University have introduced “Scenarios” as a new way of representing scenes in images. Useful for a wide range of scene understanding tasks, a scenario is an easy-to-interpret, low-dimensional, data-driven representation consisting of a set of frequently co-occurring objects. Scenarios are learned from data using a novel matrix factorization method that is integrated into a new neural network architecture, the “ScenarioNet”.
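To make the idea of scenarios as sets of co-occurring objects more concrete, the following is a minimal sketch that factorizes a small object-by-image presence matrix. It uses ordinary non-negative matrix factorization with multiplicative updates as a stand-in; it is not the novel factorization method used in ScenarioNet, and the object names, the toy matrix, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np

def factorize_objects(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (objects x images) as W @ H.

    Plain NMF with multiplicative updates, used here only as a
    stand-in for the (different) factorization inside ScenarioNet.
    """
    rng = np.random.default_rng(seed)
    n_objects, n_images = A.shape
    W = rng.random((n_objects, k)) + eps   # objects x scenarios
    H = rng.random((k, n_images)) + eps    # scenarios x images
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def scenario_objects(W, names, threshold=0.5):
    """List, per scenario, the objects whose weight is at least
    `threshold` times the largest weight in that scenario."""
    scenarios = []
    for j in range(W.shape[1]):
        col = W[:, j]
        scale = col.max()
        members = {n for n, w in zip(names, col)
                   if scale > 0 and w / scale >= threshold}
        scenarios.append(members)
    return scenarios

# Hypothetical toy data: rows are objects, columns are images,
# and 1 means the object appears in the image.
OBJECTS = ["bed", "lamp", "pillow", "sink", "toilet", "towel"]
A = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

W, H = factorize_objects(A, k=2)
for i, members in enumerate(scenario_objects(W, OBJECTS)):
    print(f"scenario {i}: {sorted(members)}")
```

In this sketch each column of `W` plays the role of one scenario (a weighted group of objects), and each column of `H` says how strongly each scenario is present in an image.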

Using ScenarioNets, semantic information about real-world scene images is recovered at three levels of granularity: 1) scene categories, 2) scenarios, and 3) objects. Training a single ScenarioNet model enables scene classification, scenario recognition, multi-object recognition, content-based scene image retrieval, and content-based image comparison. This scene understanding technology makes it possible to recognize scenes (e.g., in images and videos) and to explain the reasons for the recognition in a human-understandable form.
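As an illustration of content-based image comparison at the scenario level, the sketch below assumes that each image has already been reduced to a set of recognized scenario labels. The labels, the function names, and the use of Jaccard overlap as a similarity score are hypothetical choices for this example, not the output format or scoring used by ScenarioNet.

```python
def compare_images(scenarios_a, scenarios_b):
    """Return the scenarios two images share and those unique to each."""
    a, b = set(scenarios_a), set(scenarios_b)
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

def jaccard_similarity(scenarios_a, scenarios_b):
    """Overlap of two scenario sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(scenarios_a), set(scenarios_b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical scenario labels for two scene images.
image_1 = {"bed+pillow+lamp", "desk+chair+monitor"}
image_2 = {"bed+pillow+lamp", "sink+toilet+towel"}

report = compare_images(image_1, image_2)
score = jaccard_similarity(image_1, image_2)
print(report)
print(score)
```

Because the comparison is expressed in terms of named scenarios rather than raw feature vectors, the similarities and differences can be read directly by a person.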



  • Provides the ability to explain decisions and actions made by artificial intelligence (AI) processes.
  • Able to support safety-critical tasks and tasks involving human-machine teaming (e.g., understanding what caused an error in order to prevent similar errors in the future).
  • The ScenarioNet neural network structure is efficient, requiring significantly fewer parameters than other convolutional neural networks while achieving similar performance on benchmark tasks. It is also interpretable, because it can produce human-understandable explanations for every decision.


Market Applications:

The technology enabled by this invention is applicable to areas including human-machine teaming, robotics, medical image diagnostics, surveillance, and autonomous vehicles.


Intellectual Property & Development Status:

The technology is patent pending and is available for licensing and/or research collaboration with industry partners.

For Information, Contact:
Andrea Dick
Associate Director, Licensing
Rutgers University