
Visual odometry reference papers and dataset download links (Baidu Netdisk)

Category: Industry Research | Size: 11.42 MB | 15 | Points required: 1

Resource description:

Visual odometry reference papers (PDF) compiled while writing a thesis, together with Baidu Netdisk links for downloading the KITTI dataset; for reference only.
PL-SLAM: a Stereo SLAM System through the Combination
of Points and Line Segments
Ruben Gomez-Ojeda, David Zuñiga-Noël, Francisco-Angel Moreno,
Davide Scaramuzza, and Javier Gonzalez-Jimenez
Abstract—Traditional approaches to stereo visual SLAM rely on point features to estimate the camera trajectory and build a map of the environment. In low-textured environments, though, it is often difficult to find a sufficient number of reliable point features and, as a consequence, the performance of such algorithms degrades. This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image. PL-SLAM leverages both points and segments at all the instances of the process: visual odometry, keyframe selection, bundle adjustment, etc. We also contribute a loop closure procedure through a novel bag-of-words approach that exploits the combined descriptive power of the two kinds of features. Additionally, the resulting map is richer and more diverse in 3D elements, which can be exploited to infer valuable, high-level scene structures like planes, empty spaces, ground plane, etc. (not addressed in this work). Our proposal has been tested with several popular datasets (such as KITTI and EuRoC), and is compared to state-of-the-art methods like ORB-SLAM, revealing a more robust performance in most of the experiments, while still running in real time. An open-source version of the PL-SLAM C++ code will be released for the benefit of the community.
Index Terms—Stereo Visual SLAM, line segment features, bundle adjustment, loop closure
I. Introduction
In recent years, visual Simultaneous Localization And Mapping (SLAM) has been firmly progressing towards the degree of reliability required for fully autonomous vehicles: mobile robots, self-driving cars, or Unmanned Aerial Vehicles (UAVs). In a nutshell, the SLAM problem consists of the estimation of the vehicle trajectory, given as a set of poses (position and orientation), while simultaneously building a map of the environment. Apart from self-localization, a map becomes useful for obstacle avoidance, object recognition, task planning, etc. [1].
As a first-level classification, SLAM systems can be di-
vided into topological (e.g. [2]–[5]) and metric approaches.
This work has been supported by the Spanish Government (project DPI2017-84827-R and grant BES-2015-071606) and the Andalusian Government (project TEP2012-530). R. Gomez-Ojeda, F.A. Moreno, D. Zuñiga-Noël, and J. Gonzalez-Jimenez are with the Machine Perception and Intelligent Robotics (MAPIR) Group, University of Malaga (email: rubengooj@gmail.com). D. Scaramuzza is with the Robotics and Perception Group, Dep. of Informatics, University of Zurich, and Dep. of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland.
Figure 1. (a) lt-easy, (b) euroc/V2-01-easy, (c) euroc/V1-01-easy, (d) map built from (c). Low-textured environments are challenging for feature-based SLAM systems based on traditional keypoints. In contrast, line segments are usually common in human-made environments, and apart from an improved camera localization, the built maps are richer as they are populated with more meaningful elements (3D line segments).
In this paper, we focus on the latter, which take into
account the geometric information of the environment and
build a physically meaningful map of it [6], [7]. These
approaches can be further classified into direct and feature-
based systems. The first group, i.e. direct methods, esti-
mates the camera motion by minimizing the photometric
errors between consecutive frames under the assumption of
constant brightness along the local parts of the sequences
(examples of this approach can be found elsewhere [8]–
[10]). While these techniques have the advantage of working directly with the input images, without any intermediate representation, they are very sensitive to brightness changes (a phenomenon addressed in [11]) and constrained to narrow-baseline motions. In contrast,
feature-based methods employ an indirect representation
of the images, typically in the form of point features, that
are tracked along the successive frames and then employed
for recovering the pose by minimizing the projection errors
[12], [13].
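To make the distinction concrete, the following minimal C++ sketch (our own illustration, not code from any of the cited systems; the pinhole model and all names are assumptions) contrasts the two kinds of residuals: a photometric error evaluated on image intensities, and a reprojection error evaluated on feature coordinates.

```cpp
#include <functional>
#include <Eigen/Core>
#include <Eigen/Geometry>

// Pinhole projection of a 3D point expressed in the camera frame
// (assumed intrinsics fx, fy, cx, cy).
Eigen::Vector2d project(const Eigen::Vector3d& Xc,
                        double fx, double fy, double cx, double cy) {
  return Eigen::Vector2d(fx * Xc.x() / Xc.z() + cx, fy * Xc.y() / Xc.z() + cy);
}

// Direct methods: intensity difference between a reference pixel and the pixel
// the warped 3D point falls on, valid under the constant-brightness assumption.
double photometric_residual(double intensity_ref,
                            const std::function<double(const Eigen::Vector2d&)>& intensity_cur,
                            const Eigen::Isometry3d& T_cur_ref, const Eigen::Vector3d& X_ref,
                            double fx, double fy, double cx, double cy) {
  const Eigen::Vector3d X_cur = T_cur_ref * X_ref;  // point warped into the current frame
  return intensity_ref - intensity_cur(project(X_cur, fx, fy, cx, cy));
}

// Feature-based methods: difference between a tracked keypoint and the
// projection of its associated 3D landmark.
Eigen::Vector2d reprojection_residual(const Eigen::Vector2d& u_observed,
                                      const Eigen::Isometry3d& T_cam_world,
                                      const Eigen::Vector3d& X_world,
                                      double fx, double fy, double cx, double cy) {
  return u_observed - project(T_cam_world * X_world, fx, fy, cx, cy);
}
```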
It is noticeable that the performance of any of the above-
mentioned approaches usually decreases in low-textured
environments in which it is typically difficult to find a
large set of keypoint features. The effect in such cases is
an accuracy impoverishment and, occasionally, the com-
plete failure of the system. Many such low-textured
environments, however, contain planar elements that are
rich in linear shapes, so it would be possible to extract
line segments from them. We claim that these two types of
features (keypoints and segments) complement each other
and their combination leads to a more versatile, robust, and
stable SLAM system. Furthermore, the resulting maps
comprising both 3D points and segments provide more
structural information about the environment than point-
only maps, as can be seen in the example shown in Figure
1(d). Thus, applications that perform high-level tasks such
as place recognition, semantic mapping or task planning,
among others, can significantly benefit from the richer
information that can be inferred from them.
These benefits, though, come at the expense of a higher
computational burden in both detecting and matching
line-segments in images [14], and also in dealing effectively
with segment-specific problems like partial occlusions, line
disconnection, etc., which complicate feature tracking and
matching as well as the residual computation for the map
and pose optimization. Such hurdles are the reason why
the number of solutions that have been proposed in the
literature for SLAM or Structure from Motion (SfM) with
line features (e.g. [15]–[19]) is so limited. Besides, the
few solutions we have found only perform robustly in
highly structured environments while showing unreliable
results when applied to more realistic ones such as those
recorded in the KITTI or EuRoC datasets. In this work,
we address the segment-specific tracking and matching
issues by discarding outliers through the comparison of
the length and the orientation of the line features, while,
for the residual computation, we represent segments in the
map with their endpoint coordinates. Thus, the residuals between the observed segments and their corresponding lines in the map are computed as the distance between the projections of those endpoints on the image plane and the infinite lines associated with the observed ones. This way, we
are able to build a consistent cost function that seamlessly
encompasses both point and line features.
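The residual definition above can be sketched directly: the 3D endpoints of a map segment are projected into the image, and their distances to the infinite line through the observed segment's endpoints are stacked into a two-dimensional residual. The following C++/Eigen snippet is a minimal illustration under an assumed pinhole model; it is not the released PL-SLAM implementation and all names are ours.

```cpp
#include <Eigen/Core>
#include <Eigen/Geometry>

// Homogeneous line through the two observed endpoints, normalized so that
// dotting it with a homogeneous pixel gives the Euclidean point-to-line distance.
Eigen::Vector3d observed_infinite_line(const Eigen::Vector2d& a, const Eigen::Vector2d& b) {
  const Eigen::Vector3d ah(a.x(), a.y(), 1.0);
  const Eigen::Vector3d bh(b.x(), b.y(), 1.0);
  Eigen::Vector3d l = ah.cross(bh);
  return l / l.head<2>().norm();
}

// Pinhole projection of a 3D point expressed in the camera frame (assumed intrinsics).
Eigen::Vector2d project(const Eigen::Vector3d& Xc, double fx, double fy, double cx, double cy) {
  return Eigen::Vector2d(fx * Xc.x() / Xc.z() + cx, fy * Xc.y() / Xc.z() + cy);
}

// Two-dimensional residual for one line correspondence: distances of the
// projected map endpoints P and Q to the observed infinite line.
Eigen::Vector2d segment_residual(const Eigen::Vector3d& P_world, const Eigen::Vector3d& Q_world,
                                 const Eigen::Vector2d& obs_start, const Eigen::Vector2d& obs_end,
                                 const Eigen::Isometry3d& T_cam_world,
                                 double fx, double fy, double cx, double cy) {
  const Eigen::Vector3d l = observed_infinite_line(obs_start, obs_end);
  const Eigen::Vector2d p = project(T_cam_world * P_world, fx, fy, cx, cy);
  const Eigen::Vector2d q = project(T_cam_world * Q_world, fx, fy, cx, cy);
  return Eigen::Vector2d(l.dot(Eigen::Vector3d(p.x(), p.y(), 1.0)),
                         l.dot(Eigen::Vector3d(q.x(), q.y(), 1.0)));
}
```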
These two kinds of features are also employed to ro-
bustly detect loop closures during robot navigation, fol-
lowing a new bag-of-words approach that combines the
advantages of using each of them to perform place recog-
nition. In summary, we propose a novel and versatile stereo
visual SLAM system, coined PL-SLAM, which builds upon
our previous Visual Odometry approach presented in [20],
and combines both point and line segment features to
perform real-time robot localization and mapping. The
main contributions of this work are:
• The first open-source stereo SLAM system that employs point and line segment features in real time, hence being capable of operating robustly in low-textured environments where traditional point-only approaches tend to fail, while obtaining similar accuracy in the rest of the scenarios. Because of the consideration of both kinds of features, our proposal also produces rich geometrical maps.
• A new implementation of the bundle adjustment process that seamlessly accounts for both kinds of features while refining the poses of the keyframes.
• An extension of the bag-of-words approach presented in [21] that takes into account the description of both points and line segments to improve the loop-closure process.
• A set of illustrative videos showing the performance of the proposed system and an open-source version of the developed C++ PL-SLAM library, publicly available at http://mapir.uma.es and https://github.com/rubengooj/pl-slam.
II. Related Work
Feature-based SLAM is traditionally addressed by
tracking keypoints along successive frames and then
minimizing some error function (typically based on re-
projection errors) to estimate the robot poses [22]. Among
the most successful proposals we can highlight FastSLAM
[23], PTAM [24] [25], SVO [26] [10], and, more recently,
ORB-SLAM [13], which relies on a fast and continuous
tracking of ORB features [27], and a local bundle adjust-
ment step with the continuous observations of the point
features. However, all of the previous approaches tend
to fail or reduce their accuracy in low-textured scenarios
where the lack of repeatable and reliable features usually
hinders the feature tracking process. In the following, we
review the state of the art of SLAM systems based on
image features alternative to keypoints, i.e. edgelets, lines,
or line segments.
One of the remarkable approaches that employs line
features is the one in [28], where the authors propose an
algorithm to integrate them into a monocular Extended
Kalman Filter SLAM system (EKF-SLAM). In the referred paper, the line detection relies on a hypothesize-and-test method that connects several nearby keypoints to achieve real-time performance. Other works employ edge landmarks as features in monocular SLAM, such as the one reported in [29], which not only includes the information of the local planar patch, as in the case of keypoints, but also considers local edge segments, hence introducing valuable new information such as the orientation of the so-called edgelets. In that work the authors derive suitable models for those
kinds of features and use them within a particle-filter
SLAM system, achieving nearly real-time performance.
More recently, the authors of [10] also introduced edgelets in
combination with intensity corners in order to improve
robustness in environments with little or high-frequency
texture.
A different approach, known as model-based, incor-
porates prior information about the orientation of the
landmarks derived from line segments. Particularly, the
method in [30] presents a monocular 2D SLAM system
that employs vertical and horizontal lines on the floor as
features for both motion and map estimation. For that,
they propose two different parameterizations for the verti-
cal and the horizontal lines: vertical lines are represented
as 2D points on the floor plane (placed at the intersection point between the line and that plane), while horizontal lines are represented by their two endpoints placed on the floor. Finally, the proposed models are incorporated into
an EKF-SLAM system. Another model-based approach is
reported in [31], where the authors introduce structural
lines in an extension of a standard EKF-SLAM system.
The dominant directions of the lines are estimated by com-
puting their vanishing points under the assumption of a
Manhattan world [32]. All these model-based approaches,
though, are limited to very structured scenarios and/or
planar motions, as they rely solely on line features.
The works in [16], [33] address a generic approach
that compares the impact of eight different landmark
parametrizations for monocular EKF-SLAM, including the use of point and line features. Nevertheless, such systems are only validated through analytic and statistical tools that assume already-known data association and, unlike our proposal, do not implement a complete front-end that detects and tracks the line segments. Another
technique for building a 3D line-based SLAM system has
been proposed in the recent work [34]. For that, the
authors employ two different representations for the line
segments: the Plücker line coordinates for the initialization
and 3D projections, and an orthonormal representation
for the back-end optimization. Unfortunately, neither the source code is available nor does the employed dataset contain any ground truth; therefore, it has not been possible to carry out a comparison against our proposal.
Recently, line segment features have also been employed
for monocular pose estimation in combination with points,
due to the ill-conditioned nature of this problem. For
that, in [35] the authors extended the semi-direct approach
in [26] with line segments. Thanks to this pipeline, line
segments can be propagated efficiently throughout the
image sequence, while refining the position of the end-
points under the assumptions of a high frame rate and very narrow baselines.
Finally, by the time of the first submission of this
paper, a work with the same name (PL-SLAM, [36]) was
published, extending the monocular ORB-SLAM algorithm to include line segment features computed with the LSD detector [37]. Apart from being a monoc-
ular system (unlike our stereo approach), their proposal
deals with line tracking and matching in an essentially
different way: they propagate the line segments by their
endpoints and then perform descriptor-based tracking,
which increases the computational burden of ORB-SLAM.
Besides this computational drawback, when working with
features detected with the LSD detector, the variance
of the endpoints becomes quite pronounced, especially in challenging illumination conditions or very low-textured scenes, which makes wide-baseline tracking and matching of line features in non-consecutive frames more difficult. Our PL-SLAM approach, in contrast, does not make any assumption regarding the position of the line endpoints, so our tracking front-end is able to handle partially occluded line segments, endpoint variance, etc., for both the stereo and frame-to-frame tracking, hence becoming a more robust approach to point-and-line SLAM.
Figure 2. Scheme of the stereo PL-SLAM system. Each new frame is processed by the stereo visual odometry module (feature extraction, stereo matching, frame-to-frame tracking, motion estimation); when a frame is selected as a keyframe, the local mapping module (search for new matches with other KFs in the map, insertion of the new KF, local BA) and the loop closing module (visual descriptor, bag-of-words query, pose-change computation, loop correction) update the map, which stores the keyframes, the covisibility graph, the spanning tree, and the landmarks (3D points and 3D line segments).
III. PL-SLAM Overview
The general structure of the PL-SLAM system proposed
here is depicted in Figure 2, and its main modules are
described in the following sections. As is common in other SLAM systems (ORB-SLAM [13] being the most popular method nowadays), our proposal is based on three different threads: visual odometry, local mapping, and loop closure. This distribution allows the VO module to track continuously, while local mapping and loop closure are processed in the background only when a new keyframe is inserted.
Map. The map consists of i) a set of keyframes (KFs),
ii) the detected 3D landmarks (both keypoints and line
segments), iii) a covisibility graph and iv) a spanning tree.
The keyframes contain the observed stereo features and
their descriptors, a visual descriptor of the corresponding
left image computed through a visual vocabulary as ex-
plained later in Section VI-A, and the information of the
3D camera pose.
Regarding the landmarks, we store the list of obser-
vations and the most representative descriptor for each
detected landmark. Besides, specifically for points, we also keep their estimated 3D position, while for line segments we keep both their direction and the estimated 3D coordinates of their endpoints.
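As a rough illustration of the map contents just described, the following C++ sketch lists plausible per-keyframe and per-landmark fields; the field names and types are our own assumptions, not those of the released library.

```cpp
#include <cstdint>
#include <vector>
#include <Eigen/Core>
#include <Eigen/Geometry>

// Illustrative sketch of the per-keyframe and per-landmark data described above.
struct KeyFrame {
  int id = -1;
  Eigen::Isometry3d T_world_cam = Eigen::Isometry3d::Identity();  // 3D camera pose
  std::vector<Eigen::Vector2d> stereo_features;        // observed stereo features (left image)
  std::vector<std::vector<std::uint8_t>> descriptors;  // binary ORB / LBD descriptors
  std::vector<float> bow_vector;                       // visual (bag-of-words) descriptor of the left image
};

struct PointLandmark {
  Eigen::Vector3d position;                   // estimated 3D position
  std::vector<int> observations;              // ids of the keyframes observing it
  std::vector<std::uint8_t> best_descriptor;  // most representative descriptor
};

struct LineLandmark {
  Eigen::Vector3d start_point, end_point;     // estimated 3D endpoints
  Eigen::Vector3d direction;                  // line direction
  std::vector<int> observations;
  std::vector<std::uint8_t> best_descriptor;
};
```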
Finally, the covisibility information, as in [38], is mod-
eled by a graph: each node represents a KF, and edges
between KFs are created only if they share a minimum
number of landmarks, which in this work is set to 20
landmarks (see Figure 3 for an example), allowing for real-
time bundle adjustment along the local map.
Similarly, in order to perform a faster loop closure
optimization, we also form the so-called essential graph,
which is less dense than the covisibility graph because an
edge between two KFs is created when they share more
than 100 landmark observations. Finally, the map also
contains a spanning tree, which is the minimum connected
representation of a graph that includes all the KFs.
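Using the thresholds stated above (at least 20 shared landmarks for a covisibility edge, more than 100 for an essential-graph edge), the graph construction could be sketched as follows; representing each keyframe by the set of landmark ids it observes is an assumption made for illustration.

```cpp
#include <set>
#include <utility>
#include <vector>

// Illustrative construction of the covisibility and essential graphs from the
// landmark observations: an edge is added to the covisibility graph when two
// keyframes share at least 20 landmarks, and to the essential graph when they
// share more than 100.
struct GraphEdges {
  std::vector<std::pair<int, int>> covisibility;
  std::vector<std::pair<int, int>> essential;
};

GraphEdges build_graphs(const std::vector<std::set<int>>& landmarks_per_kf) {
  GraphEdges g;
  const int n = static_cast<int>(landmarks_per_kf.size());
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      int shared = 0;
      for (int lm : landmarks_per_kf[i]) shared += landmarks_per_kf[j].count(lm);
      if (shared >= 20) g.covisibility.emplace_back(i, j);
      if (shared > 100) g.essential.emplace_back(i, j);
    }
  }
  return g;
}
```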
Feature Tracking. We perform feature tracking
through the stereo visual odometry algorithm from our
previous work [20]. In a nutshell, we track image fea-
tures (points and segments) from a sequence of stereo
frames and compute their 3D position and their associated
uncertainty represented by covariance matrices. The 3D
landmarks are then projected to the new camera pose, and
the projection errors are minimized in order to obtain both
the camera pose increment and the covariance associated with that estimation. This process is repeated for every new frame, performing simple frame-to-frame VO, until a new KF is inserted into the map. This feature tracking procedure will be formally addressed in Section IV. Once a KF is inserted into the map, two
procedures are run in parallel: local mapping and loop
closure detection.
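Abstracting away the details deferred to Section IV, each frame-to-frame motion estimate amounts to an (iterated) Gauss–Newton update over the stacked point and line residuals, whose inverse Hessian also provides the covariance of the increment. The snippet below sketches only that solve step, assuming the per-feature residuals and 2x6 Jacobians are given; the actual solver in [20] additionally applies robust weighting.

```cpp
#include <cstddef>
#include <vector>
#include <Eigen/Dense>

// One Gauss-Newton step of the frame-to-frame motion estimation: given the
// stacked residuals r_i (point and line terms alike, both 2-dimensional) and
// their 2x6 Jacobians J_i with respect to a small se(3) pose increment, solve
// the normal equations for the increment and approximate its covariance by
// the inverse Hessian.
struct PoseIncrement {
  Eigen::Matrix<double, 6, 1> dx;          // [translation; rotation] increment
  Eigen::Matrix<double, 6, 6> covariance;  // uncertainty of the estimate
};

PoseIncrement gauss_newton_step(const std::vector<Eigen::Vector2d>& residuals,
                                const std::vector<Eigen::Matrix<double, 2, 6>>& jacobians) {
  Eigen::Matrix<double, 6, 6> H = Eigen::Matrix<double, 6, 6>::Zero();
  Eigen::Matrix<double, 6, 1> b = Eigen::Matrix<double, 6, 1>::Zero();
  for (std::size_t i = 0; i < residuals.size(); ++i) {
    H += jacobians[i].transpose() * jacobians[i];  // accumulate the approximate Hessian
    b -= jacobians[i].transpose() * residuals[i];  // and the gradient term
  }
  PoseIncrement out;
  out.dx = H.ldlt().solve(b);    // pose increment applied to the previous camera pose
  out.covariance = H.inverse();  // covariance associated with the estimate
  return out;
}
```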
Local Mapping. The local mapping procedure looks
for new feature correspondences between the new KF, the
last one and those connected to the last one in the cov-
isibility graph. This way, we build the so-called local map
of the current KF, which includes all the KFs that share
at least 20 landmark observations with the current one as
well as all the landmarks observed by them. Finally, an
optimization of all the elements within the local map (KF
poses and landmarks positions) is performed. A detailed
description of this procedure will be presented in Section
V.
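A possible sketch of the local-map selection just described, using the 20-shared-observations criterion from the text (the containers and names are illustrative only):

```cpp
#include <set>
#include <vector>

// Illustrative local-map selection: the local map of the current keyframe
// contains every keyframe sharing at least 20 landmark observations with it,
// plus all landmarks observed by those keyframes; these elements are then
// jointly refined by the local bundle adjustment.
struct LocalMap {
  std::set<int> keyframe_ids;
  std::set<int> landmark_ids;
};

LocalMap build_local_map(int current_kf,
                         const std::vector<std::set<int>>& landmarks_per_kf,
                         int min_shared = 20) {
  LocalMap local;
  local.keyframe_ids.insert(current_kf);
  const std::set<int>& current = landmarks_per_kf[current_kf];
  for (int kf = 0; kf < static_cast<int>(landmarks_per_kf.size()); ++kf) {
    if (kf == current_kf) continue;
    int shared = 0;
    for (int lm : landmarks_per_kf[kf]) shared += current.count(lm);
    if (shared >= min_shared) local.keyframe_ids.insert(kf);
  }
  for (int kf : local.keyframe_ids)
    local.landmark_ids.insert(landmarks_per_kf[kf].begin(), landmarks_per_kf[kf].end());
  return local;
}
```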
Loop Closure. In parallel to local mapping, a loop
closure detection is carried out by extracting a visual
descriptor for each image, based on a bag-of-words ap-
proach, as will be described in Section VI. All the visual
descriptors of the captured frames during camera motion
are stored in a database, which is later employed to find
similar frames to the current one. The best match will
be considered a loop closure candidate only if the local
sequence surrounding this KF is also similar. Finally, the
relative SE(3) transformation between the current KF and
the loop closure candidate is estimated so that, if a proper
estimation is found, all the KF poses involved in the loop
are corrected through a pose-graph optimization (PGO)
process.
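The candidate test described above (database query followed by a consistency check on the surrounding keyframes) can be sketched as follows; the similarity function, window size, and threshold are placeholders rather than the values used in the actual system.

```cpp
#include <functional>
#include <optional>

// Illustrative loop-closure candidate test: the best-scoring keyframe in the
// database is accepted only if the keyframes surrounding it are also similar
// to the corresponding neighbours of the current keyframe.
std::optional<int> find_loop_candidate(
    int current_kf,
    const std::function<double(int, int)>& bow_similarity,  // similarity between two keyframes
    int window = 3, double min_score = 0.3) {
  // 1) Database query: best-scoring keyframe outside the most recent window.
  int best = -1;
  double best_score = min_score;
  for (int kf = 0; kf + window < current_kf; ++kf) {
    const double s = bow_similarity(current_kf, kf);
    if (s > best_score) { best_score = s; best = kf; }
  }
  if (best < 0) return std::nullopt;

  // 2) Consistency of the local sequence around both keyframes.
  for (int k = 1; k <= window; ++k) {
    if (best - k < 0 || current_kf - k < 0) return std::nullopt;
    if (bow_similarity(current_kf - k, best - k) < min_score) return std::nullopt;
  }
  // A relative SE(3) estimate and, if verified, a pose-graph optimization follow.
  return best;
}
```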
It is important to remark that the stereo visual odom-
etry system runs continuously at every frame while both
the local mapping and loop closure detection procedures
are launched in the background (in separate threads) only when a new KF is inserted, thus allowing our system to reach real-time performance. If a new keyframe is inserted while the local mapping thread is still running, the keyframe is temporarily stored until the map is updated, and then a new local mapping process is launched.
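A minimal sketch of this keyframe hand-off between threads, assuming a simple producer-consumer queue (this is our own simplification, not the released code):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Illustrative keyframe hand-off between the VO thread (producer) and the
// local-mapping thread (consumer): keyframes inserted while local mapping is
// still running are temporarily stored in the queue.
class KeyFrameQueue {
 public:
  void push(int kf_id) {
    { std::lock_guard<std::mutex> lock(mutex_); queue_.push(kf_id); }
    cv_.notify_one();
  }
  int pop() {  // blocks until a keyframe is available
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    const int kf = queue_.front();
    queue_.pop();
    return kf;
  }

 private:
  std::queue<int> queue_;
  std::mutex mutex_;
  std::condition_variable cv_;
};
// The VO thread calls push() whenever a new keyframe is inserted; the
// local-mapping thread loops on pop(), updates the map, and only then starts
// the next local mapping process.
```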
These mapping and loop closure approaches are identi-
cal to the ones followed in ORB-SLAM, aimed at reducing the high computational burden that general BA involves (along with the incorporation of recent sparse algebra techniques).

Figure 3. Covisibility graph in the sequence lt-first; the edges connecting the keyframes are shown as green lines.

Within the BA framework, our proposal
belongs to the so-called relative techniques (e.g. [39]–[41]),
which have gained great popularity in the last years as an
alternative to the more costly global approaches (e.g. [24],
[42]).
IV. Feature Tracking
This section reviews the most important aspects of our
previous work [20], which deals with the visual odometry
estimation between consecutive frames, and also with
the KF decision policy. Basically, both points and line
segments are tracked along a sequence of stereo frames
(see Figure 1), and then the 3D motion of the camera
(and also its uncertainty) is computed by minimizing the
projection errors.
A. Point Features
In this work we use the well-known ORB method [27]
due to its great performance for keypoint detection, and
the binary nature of the descriptor it provides, which
allows for fast, efficient keypoint matching. In order to reduce the number of outliers, we only consider correspondences for which the best match in the left image corresponds to the best match in the right one, i.e. they are mutual best matches. Finally, we also filter out those matches whose descriptor-space distance to the second-best match is less than twice the distance to the best match, to ensure that the correspondences are meaningful enough.
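These two filters can be sketched with OpenCV's brute-force Hamming matcher on ORB descriptors; the function below is an illustration of the described policy, not the authors' implementation.

```cpp
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>

// Illustrative stereo keypoint matching with the two filters described above:
// keep a left-right ORB correspondence only if (i) it is a mutual best match
// and (ii) the second-best candidate is at least twice as far in descriptor
// space as the best one.
std::vector<cv::DMatch> match_stereo_points(const cv::Mat& desc_left, const cv::Mat& desc_right) {
  cv::BFMatcher matcher(cv::NORM_HAMMING);
  std::vector<std::vector<cv::DMatch>> lr, rl;
  matcher.knnMatch(desc_left, desc_right, lr, 2);  // two best right candidates per left feature
  matcher.knnMatch(desc_right, desc_left, rl, 1);  // best left candidate per right feature

  std::vector<cv::DMatch> good;
  for (const auto& candidates : lr) {
    if (candidates.size() < 2) continue;
    const cv::DMatch& best = candidates[0];
    // Distance filter: discard ambiguous correspondences.
    if (candidates[1].distance < 2.0f * best.distance) continue;
    // Mutual best match: the right feature must point back to the same left feature.
    if (rl[best.trainIdx].empty() || rl[best.trainIdx][0].trainIdx != best.queryIdx) continue;
    good.push_back(best);
  }
  return good;
}
```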
B. Line Segment Features
The Line Segment Detector (LSD) method [37] has been
employed to extract line segments, providing high preci-
sion and repeatability. For stereo matching and frame-to-
frame tracking we augment line segments with a binary
descriptor provided by the Line Band Descriptor (LBD)
method [43], which allows us to find correspondences
between lines based on their local appearance. Similarly to
the case of points, we check that both candidate features
are mutual best matches, and also that the feature is
meaningful enough. Finally, we take advantage of the
useful geometrical information that line segments provide
in order to filter out those line matches with different orientations and lengths, and those with a large difference in the disparities of their endpoints. Notice that this filter helps the system to retain a larger amount of structural
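The geometric consistency checks mentioned above might be sketched as follows; the segment representation, the thresholds, and the reading of the endpoint-disparity test are our own assumptions for illustration.

```cpp
#include <cmath>

// Illustrative geometric consistency test for a pair of matched line segments
// (endpoint ordering is assumed consistent between the two views); the
// thresholds are placeholders, not the values used by the authors.
struct Segment2D {
  double x1, y1, x2, y2;  // endpoint pixel coordinates
  double angle()  const { return std::atan2(y2 - y1, x2 - x1); }
  double length() const { return std::hypot(x2 - x1, y2 - y1); }
};

bool geometrically_consistent(const Segment2D& left, const Segment2D& right,
                              double max_angle_diff = 0.1,        // radians
                              double max_length_ratio = 1.5,
                              double max_disparity_diff = 30.0) {  // pixels
  constexpr double kTwoPi = 6.283185307179586;
  // Orientation check: wrap the angular difference to [-pi, pi].
  const double dtheta = std::remainder(left.angle() - right.angle(), kTwoPi);
  if (std::abs(dtheta) > max_angle_diff) return false;
  // Length check: reject pairs whose lengths differ too much.
  const double ratio = left.length() / right.length();
  if (ratio < 1.0 / max_length_ratio || ratio > max_length_ratio) return false;
  // Endpoint-disparity check (one possible reading of the criterion above):
  // for a rectified stereo pair, the disparities of the two endpoints should
  // not differ too much between the matched segments.
  const double d_start = left.x1 - right.x1;
  const double d_end   = left.x2 - right.x2;
  return std::abs(d_start - d_end) <= max_disparity_diff;
}
```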

File list:

视觉里程计参考论文及数据集下载百度网盘地址.zip (approximately 9 files)
  1. 基于双目视觉的视觉里程计相关论文/
  2. 基于双目视觉的视觉里程计相关论文/AV1FeaturefromAcceleratedSegmentTest.pdf 453.27KB
  3. 基于双目视觉的视觉里程计相关论文/BARSAN-IoanAndrei-RobustDenseMapping-ICRA-2018-CameraReady.pdf 3.12MB
  4. 基于双目视觉的视觉里程计相关论文/KITTI数据集最全网盘地址.txt 108B
  5. 基于双目视觉的视觉里程计相关论文/LK特征点跟踪算法.pdf 341.88KB
  6. 基于双目视觉的视觉里程计相关论文/LSD A Fast Line Segment Detectorwith a False Detection Control.pdf 2.21MB
  7. 基于双目视觉的视觉里程计相关论文/PL-SLAM:a Stereo SLAM System through the Combination of Points and Line Segments.pdf 3.27MB
  8. 基于双目视觉的视觉里程计相关论文/Robust Stereo Visual Odometry through a ProbabilisticCombination of Points and Line Segments.pdf 2.73MB
  9. 基于双目视觉的视觉里程计相关论文/金字塔L-K光流.pdf 206.53KB