environments, however, contain planar elements that are
rich in linear shapes, so it would be possible to extract
line segments from them. We claim that these two types of
features (keypoints and segments) complement each other
and that their combination leads to a more versatile, robust and
stable SLAM system. Furthermore, the resulting maps
comprising both 3D points and segments provide more
structural information about the environment than point-only
maps, as can be seen in the example shown in Figure
1(d). Thus, applications that perform high-level tasks such
as place recognition, semantic mapping or task planning,
among others, can significantly benefit from the richer
information that can be inferred from them.
These benefits, though, come at the expense of a higher
computational burden in both detecting and matching
line segments in images [14], and also in dealing effectively
with segment-specific problems, such as partial occlusions or
line disconnection, which complicate feature tracking and
matching as well as the residual computation for the map
and pose optimization. Such hurdles are the reason why
the number of solutions that have been proposed in the
literature to SLAM or Structure from Motion (SfM) with
line features (e.g. [15]–[19]) is so limited. Besides, the
few solutions we have found only perform robustly in
highly structured environments while showing unreliable
results when applied to more realistic ones such as those
recorded in the KITTI or EuRoC datasets. In this work,
we address the segment-specific tracking and matching
issues by discarding outliers through a comparison of
the length and orientation of the line features, while,
for the residual computation, we represent map segments
by their endpoint coordinates. Thus, the residual between
an observed segment and its corresponding line in the map
is computed from the distances between the projections of
the map endpoints onto the image plane and the infinite
line associated with the observed segment. This way, we
are able to build a consistent cost function that seamlessly
encompasses both point and line features.
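To make the two mechanisms above more concrete, the following NumPy sketch shows (i) an outlier test that keeps a tentative segment match only when orientation and length agree, and (ii) an endpoint-to-line residual of the kind described. It assumes a pinhole camera with intrinsic matrix `K` and a world-to-camera pose `(R, t)`; the thresholds `max_angle_deg` and `min_length_ratio` and all function names are hypothetical choices for illustration, not the PL-SLAM implementation.

```python
import numpy as np

def segment_match_is_inlier(a1, a2, b1, b2, max_angle_deg=10.0, min_length_ratio=0.75):
    """Keep a tentative match between 2D segments a = (a1, a2) and b = (b1, b2)
    only if their orientations and lengths are similar (hypothetical thresholds)."""
    da = np.asarray(a2, float) - np.asarray(a1, float)
    db = np.asarray(b2, float) - np.asarray(b1, float)
    len_a, len_b = np.linalg.norm(da), np.linalg.norm(db)
    if len_a == 0.0 or len_b == 0.0:
        return False  # degenerate segment, cannot be compared
    # Angle between the undirected supporting lines, in [0, 90] degrees
    cos_angle = abs(float(da @ db)) / (len_a * len_b)
    angle_deg = np.degrees(np.arccos(min(cos_angle, 1.0)))
    length_ratio = min(len_a, len_b) / max(len_a, len_b)
    return bool(angle_deg <= max_angle_deg and length_ratio >= min_length_ratio)

def project(K, R, t, X):
    """Pinhole projection of world point X, with (R, t) mapping world to camera."""
    uvw = K @ (R @ np.asarray(X, float) + t)
    return uvw[:2] / uvw[2]

def line_residual(K, R, t, P, Q, p_obs, q_obs):
    """Signed pixel distances from the projections of map endpoints P, Q to the
    infinite line through the observed segment endpoints p_obs, q_obs."""
    l = np.cross(np.append(np.asarray(p_obs, float), 1.0),
                 np.append(np.asarray(q_obs, float), 1.0))  # homogeneous line
    l = l / np.linalg.norm(l[:2])  # scale so that l . (u, v, 1) is a pixel distance
    return np.array([float(l @ np.append(project(K, R, t, X), 1.0)) for X in (P, Q)])

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(segment_match_is_inlier((0, 0), (10, 0), (0, 0), (9, 1)))  # True
# Observed line is y = 250 and both endpoints project onto y = 240,
# so each residual has magnitude 10 px.
print(line_residual(K, R, t, [0, 0, 5], [1, 0, 5], (300, 250), (400, 250)))
```

Because each residual is a point-to-line distance in pixels, it has the same units as a point reprojection error, which is what allows both feature types to be stacked into a single least-squares cost.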
These two kinds of features are also employed to robustly
detect loop closures during robot navigation, following a
new bag-of-words approach that combines the advantages
of each of them for place recognition. In summary, we
propose a novel and versatile stereo
visual SLAM system, coined PL-SLAM, which builds upon
our previous Visual Odometry approach presented in [20],
and combines both point and line segment features to
perform real-time robot localization and mapping. The
main contributions of this work are:
◦ The first open source stereo SLAM system that employs
point and line segment features in real time, hence
being capable of operating robustly in low-textured
environments where traditional point-only approaches
tend to fail, while achieving similar accuracy in the
remaining scenarios. Since it considers both kinds of
features, our proposal also produces geometrically rich
maps.
◦ A new implementation of the bundle adjustment process
that seamlessly accounts for both kinds of features while
refining the poses of the keyframes.
◦ An extension of the bag-of-words approach presented
in [21] that takes into account the description of both
points and line segments to improve the loop-closure
process.
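As a rough illustration of how a bag-of-words similarity could account for both feature types, the sketch below computes the L1-based score commonly used with DBoW-style vocabularies, s(v1, v2) = 1 - 0.5 * |v1/|v1| - v2/|v2||_1, separately for the point and line-segment word histograms and then blends the two. The linear blend and the weight `w_points` are hypothetical; the exact combination used in the proposed extension of [21] is not given here.

```python
import numpy as np

def l1_bow_score(v1, v2):
    """L1-based similarity of two BoW vectors with non-negative word weights:
    1 for identical word distributions, 0 for disjoint ones."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    n1, n2 = np.abs(v1).sum(), np.abs(v2).sum()
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # an empty histogram carries no evidence
    return float(1.0 - 0.5 * np.abs(v1 / n1 - v2 / n2).sum())

def combined_score(points_a, points_b, lines_a, lines_b, w_points=0.5):
    """Hypothetical linear blend of point and line-segment BoW similarities."""
    s_pts = l1_bow_score(points_a, points_b)
    s_lns = l1_bow_score(lines_a, lines_b)
    return w_points * s_pts + (1.0 - w_points) * s_lns

# Query keyframe vs. a candidate: same point words, disjoint line words.
print(combined_score([2, 1, 0], [2, 1, 0], [1, 0], [0, 1]))  # 0.5
```

Keeping separate histograms per feature type lets a candidate keyframe still score highly in low-textured scenes where few keypoints are detected but line segments are plentiful, which is the motivation the text gives for combining both.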
A set of illustrative videos showing the performance
of the proposed system and an open source version
of the developed C++ PL-SLAM library are
publicly available at http://mapir.uma.es and
https://github.com/rubengooj/pl-slam.
II. Related Work
Feature-based SLAM is traditionally addressed by
tracking keypoints along successive frames and then
minimizing some error function (typically based on re-
projection errors) to estimate the robot poses [22]. Among
the most successful proposals we can highlight FastSLAM
[23], PTAM [24] [25], SVO [26] [10], and, more recently,
ORB-SLAM [13], which relies on fast, continuous
tracking of ORB features [27] and a local bundle adjustment
step that exploits the continuous observations of the point
features. However, all of the previous approaches tend
to fail or lose accuracy in low-textured scenarios,
where the lack of repeatable and reliable features usually
hinders the feature tracking process. In the following, we
review the state of the art of SLAM systems based on
image features other than keypoints, i.e. edgelets, lines,
or line segments.
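For reference, the reprojection error that keypoint-based methods typically minimize can be sketched as follows, assuming a pinhole camera with intrinsic matrix `K` and a world-to-camera pose `(R, t)`. The function names are illustrative, and practical systems usually wrap each squared term in a robust kernel and optimize over many poses and points jointly rather than evaluating a single pose.

```python
import numpy as np

def reprojection_error(K, R, t, X, x_obs):
    """2D residual between observed pixel x_obs and the projection of 3D point X."""
    Xc = R @ np.asarray(X, float) + t  # world -> camera frame
    uvw = K @ Xc                       # homogeneous pixel coordinates
    return np.asarray(x_obs, float) - uvw[:2] / uvw[2]

def total_cost(K, R, t, points, observations):
    """Sum of squared reprojection errors over matched 3D point / pixel pairs."""
    cost = 0.0
    for X, x in zip(points, observations):
        e = reprojection_error(K, R, t, X, x)
        cost += float(e @ e)
    return cost

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
# First observation is off by (3, 4) px, second is exact.
print(total_cost(K, R, t, [[0, 0, 5], [1, 0, 5]], [[323, 244], [420, 240]]))  # 25.0
```

When too few reliable keypoints are tracked, as in the low-textured scenarios mentioned above, this sum contains few terms and the pose estimate becomes poorly constrained.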
One of the most notable approaches employing line
features is that of [28], where the authors propose an
algorithm to integrate them into a monocular Extended
Kalman Filter SLAM system (EKF-SLAM). In that paper,
line detection relies on a hypothesize-and-test method
that connects several nearby keypoints to achieve
real-time performance. Other works employ edge landmarks
as features in monocular SLAM, such as the one reported
in [29], which not only includes the information of the
local planar patch, as in the case of keypoints, but
also considers local edge segments, hence introducing
valuable new information such as the orientation of the
so-called edgelets. In that work, the authors derive suitable
models for these kinds of features and use them within a
particle-filter SLAM system, achieving nearly real-time
performance.
More recently, the authors of [10] also introduced edgelets
in combination with intensity corners in order to improve
robustness in environments with little or high-frequency
texture.
A different approach, known as model-based, incorporates
prior information about the orientation of the
landmarks derived from line segments. In particular, the
method in [30] presents a monocular 2D SLAM system
that employs vertical and horizontal lines on the floor as
features for both motion and map estimation. To that end,
the authors propose two different parameterizations for the
vertical and the horizontal lines: vertical lines are represented
as 2D points on the floor plane (placed at the intersection
point between the line and that plane), while horizontal