computer vision research papers

The application of deep learning in computer vision


Editor’s Note: Special Issue on Robust Vision

Editorial Notes
Published: 24 May 2024


The International Journal of Computer Vision gratefully acknowledges the editorial work of the Guest Editors listed below. The special issue included an open call for papers on the topic, as well as extended papers invited from the associated workshop at ECCV 2022. The special issue features 32 papers which have been rigorously reviewed according to journal standards. The papers are available via the “Collections” link on the International Journal of Computer Vision homepage.

Oliver Zendel, Austrian Institute of Technology, Austria.

Adam Kortylewski, Max Planck Institute for Informatics, Germany.

Bernhard Egger, University of Erlangen-Nuremberg, Germany.


About this article

Editor’s Note: Special Issue on Robust Vision. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02122-7


Two big computer vision papers boost prospect of safer self-driving vehicles

New chip and camera technology bring closer potential of hands-free road time.

Like nuclear fusion and jet-packs, the self-driving car is a long-promised technology that has stalled for years - yet armed with research, boffins think they have created potential improvements.

Citizens of Phoenix, San Francisco, and Los Angeles are able to take one of Waymo's self-driving taxis, first introduced to the public in December 2020. But they have not been without their glitches. Just last month in San Francisco, for example, one of the taxi service's autonomous vehicles drove down the wrong side of the street to pass a unicycle. In December last year, a Waymo vehicle hit a backwards-facing pickup truck, resulting in a report with the US National Highway Traffic Safety Administration (NHTSA) and a software update.

But this week, not one but two groups of researchers bidding to improve the performance of self-driving cars and other autonomous vehicles have published papers in the international science journal Nature.

A design for a new chip geared towards autonomous vehicles has arrived from China. Tsinghua University's Luping Shi and colleagues have taken inspiration from the human visual system by combining low-accuracy, fast event-based detection with more accurate but slower visualization of an image.

The researchers were able to show the chip — dubbed Tianmouc — could process pixel arrays quickly and robustly in an automotive driving perception system.

In a paper published today, the authors said: "We demonstrate the integration of a Tianmouc chip into an autonomous driving system, showcasing its abilities to enable accurate, fast and robust perception, even in challenging corner cases on open roads. The primitive-based complementary sensing paradigm helps in overcoming fundamental limitations in developing vision systems for diverse open-world applications."

In a separate paper, Davide Scaramuzza, University of Zurich robotics and perception professor, and his colleagues adopt a similar hybrid approach but apply it to camera technologies.


Cameras for self-driving vehicles navigate a trade-off between bandwidth and latency. While high-res color cameras have good resolution, they require high bandwidth to detect rapid changes. Conversely, reducing the bandwidth increases latency, affecting the timely processing of data for potentially life-saving decision making.

To get out of this bind, the Swiss-based researchers developed a hybrid camera combining event processing with high-bandwidth image processing. Event cameras record only intensity changes and report them as sparse measurements, meaning the system does not suffer from the bandwidth/latency trade-off.
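As a rough illustration of what recording only intensity changes as sparse measurements can look like, the sketch below derives events from two grayscale frames. It assumes a common simplified event-camera model, in which a pixel fires when its log intensity changes by more than a contrast threshold. The function name `frames_to_events`, the threshold value, and the toy frames are hypothetical and are not taken from the paper.

```python
import numpy as np

def frames_to_events(prev, curr, threshold=0.2, eps=1e-3):
    """Return sparse (row, col, polarity) events between two grayscale frames.

    Assumed idealized model: an event fires wherever the change in log
    intensity is at least `threshold`; polarity is +1 (brighter) or -1 (darker).
    """
    diff = np.log(curr + eps) - np.log(prev + eps)
    rows, cols = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[rows, cols]).astype(int)
    return list(zip(rows.tolist(), cols.tolist(), polarity.tolist()))

# Toy 4x4 scene in which only two pixels change between frames.
prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 1.0  # this pixel gets brighter
curr[3, 0] = 0.1  # this pixel gets darker

events = frames_to_events(prev, curr)
print(events)  # two events for a 16-pixel frame
```

Only the two changed pixels produce output, which is why the data volume of such a sensor depends on scene activity and not on a fixed frame rate.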

The event camera detects changes during the blind time between image frames. The event data is converted into a graph, which changes over time and connects nearby points, and this graph is computed locally. The resulting hybrid object detector reduces the detection time in dangerous high-speed situations, according to an explanatory video.
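The idea of a graph that connects nearby points can be sketched as follows. This is a minimal static example, assuming each event is an (x, y, t) point and that two events are linked when their distance in this space-time volume is within a radius. The function name `build_event_graph`, the radius, and the time scaling are hypothetical; the article does not describe how the authors build or update their graph.

```python
import numpy as np

def build_event_graph(events, radius=2.0, time_scale=1.0):
    """Link events that lie close together in (x, y, t) space.

    events: sequence of (x, y, t) tuples.
    Returns a list of undirected edges (i, j) with i < j.
    """
    pts = np.asarray(events, dtype=float).copy()
    pts[:, 2] *= time_scale  # assumed factor that makes time comparable to pixels
    # Pairwise Euclidean distances between all events (fine for small N).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Keep each pair once by looking only above the diagonal.
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Two small clusters of events that are far apart in the image.
events = [(0, 0, 0.0), (1, 0, 0.5), (10, 10, 0.6), (10, 11, 0.7)]
edges = build_event_graph(events)
print(edges)
```

A streaming system would add nodes and edges incrementally as events arrive instead of recomputing all pairwise distances; the brute-force version here is only meant to show the neighborhood rule.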

In their paper, the authors say: "Our method exploits the high temporal resolution and sparsity of events and the rich but low temporal resolution information in standard images to generate efficient, high-rate object detections, reducing perceptual and computational latency."

They argue their use of a 20 frames per second RGB camera plus an event camera can achieve the same latency as a 5,000-fps camera with the bandwidth of a 45-fps camera without compromising accuracy.
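The figures quoted above can be put in perspective with some back-of-the-envelope arithmetic. The sketch assumes that the worst-case perceptual latency of a frame camera is roughly the time between frames, and that bandwidth scales linearly with frame rate at a fixed resolution. Both are simplifications made here for illustration, not claims from the paper.

```python
def frame_interval_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps

rgb_blind_time_ms = frame_interval_ms(20)     # gap between frames at 20 fps
fast_interval_ms = frame_interval_ms(5000)    # gap between frames at 5,000 fps
latency_gain = rgb_blind_time_ms / fast_interval_ms

# Under the linear-bandwidth assumption, a 5,000-fps camera would need
# this many times the bandwidth of a 45-fps camera.
bandwidth_ratio = 5000 / 45

print(f"20 fps blind time:    {rgb_blind_time_ms:.1f} ms")
print(f"5,000 fps interval:   {fast_interval_ms:.1f} ms")
print(f"latency reduction:    {latency_gain:.0f}x")
print(f"bandwidth difference: {bandwidth_ratio:.0f}x")
```

Under these assumptions, the hybrid setup closes a 50 ms gap between frames down to about 0.2 ms while using on the order of a hundredth of the bandwidth a 5,000-fps camera would need.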

"Our approach paves the way for efficient and robust perception in edge-case scenarios by uncovering the potential of event cameras," the authors write.

With a hybrid approach to both cameras and data processing in the offing, more widespread adoption of self-driving vehicles may be just around the corner. ®



Title: Benchmarking and Improving Detail Image Caption

Abstract: Image captioning has long been regarded as a fundamental task in visual understanding. Recently, however, few large vision-language model (LVLM) research discusses model's image captioning performance because of the outdated short-caption benchmarks and unreliable evaluation metrics. In this work, we propose to benchmark detail image caption task by curating high-quality evaluation datasets annotated by human experts, GPT-4V and Gemini-1.5-Pro. We also design a more reliable caption evaluation metric called CAPTURE (CAPtion evaluation by exTracting and coUpling coRE information). CAPTURE extracts visual elements, e.g., objects, attributes and relations from captions, and then matches these elements through three stages, achieving the highest consistency with expert judgements over other rule-based or model-based caption metrics. The proposed benchmark and metric provide reliable evaluation for LVLM's detailed image captioning ability. Guided by this evaluation, we further explore to unleash LVLM's detail caption capabilities by synthesizing high-quality data through a five-stage data construction pipeline. Our pipeline only uses a given LVLM itself and other open-source tools, without any human or GPT-4V annotation in the loop. Experiments show that the proposed data construction strategy significantly improves model-generated detail caption data quality for LVLMs with leading performance, and the data quality can be further improved in a self-looping paradigm. All code and dataset will be publicly available at this https URL .
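To make the idea of comparing captions through extracted visual elements more concrete, here is a minimal sketch. It is not the CAPTURE metric: it assumes the objects, attributes, and relations have already been extracted, it uses exact matching only (the abstract does not describe the three matching stages), and it averages an F1 score over the three element types with equal weight. The names `f1` and `element_score` and the example captions are hypothetical.

```python
def f1(candidate, reference):
    """F1 overlap between two collections of elements, using exact matches."""
    cand, ref = set(candidate), set(reference)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

def element_score(candidate, reference):
    """Equal-weight mean of per-type F1 scores (an assumption of this sketch)."""
    kinds = ("objects", "attributes", "relations")
    return sum(f1(candidate[k], reference[k]) for k in kinds) / len(kinds)

# Hypothetical elements, as if already extracted from two captions.
reference = {
    "objects": {"dog", "ball", "grass"},
    "attributes": {("dog", "brown"), ("ball", "red")},
    "relations": {("dog", "chases", "ball")},
}
candidate = {
    "objects": {"dog", "ball"},
    "attributes": {("dog", "brown")},
    "relations": {("dog", "chases", "ball")},
}

score = element_score(candidate, reference)
print(round(score, 3))
```

Exact matching penalizes harmless paraphrases such as "puppy" for "dog", which is presumably one reason a metric of this kind would need further matching stages beyond the one shown here.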



COMMENTS

The application of deep learning in computer vision - IEEE Xplore
This paper first reviews the main ideas of deep learning, and displays several related frequently-used algorithms for computer vision. Afterwards, the current research status of computer vision field is demonstrated in this paper, particularly the main applications of deep learning in the research field.
(PDF) ARTIFICIAL INTELLIGENCE IN COMPUTER VISION - ResearchGate
This study focuses on employing artificial intelligence and computer vision analysis to automatically identify aggregates quarries from satellite images within continental Spain.
Computer Vision and Image Processing: A Paper Review - ResearchGate
This paper provides a survey of recent technologies and theoretical concepts explaining the development of computer vision, especially related to image processing using different areas of...
IET Computer Vision: List of Issues - Wiley Online Library
Volume 16, Issue 1. Pages: 1-97. February 2022. IET Computer Vision is an open access journal that introduces new horizons and sets the agenda for future avenues of research in a wide range of areas of computer vision.
[2212.05153] Algorithmic progress in computer vision - arXiv.org
We investigate algorithmic progress in image classification on ImageNet, perhaps the most well-known test bed for computer vision. We estimate a model, informed by work on neural scaling laws, and infer a decomposition of progress into the scaling of compute, data, and algorithms.
Industry and Academic Research in Computer Vision - arXiv.org
Industry and Academic Research in Computer Vision. Iuliia Kotseruba, Manos Papagelis, John K. Tsotsos. York University, Toronto, ON. Abstract: This work aims to study the dynamic between research in the industry and academia in computer vision.
Computers | Free Full-Text | Object Tracking Using Computer ...