Deep Semantic Segmentation in Autonomous Driving
Published in Mahmoud Hassaballah, Ali Ismail Awad, Deep Learning in Computer Vision, 2020
Hazem Rashed, Senthil Yogamani, Ahmad El-Sallab, Mohamed Elhelw, Mahmoud Hassaballah
On the other hand, deep learning has been used to formulate optical flow estimation as a learning problem in which temporally sequential images are fed to the network and the output is a dense optical flow map. FlowNet [77] used a large-scale synthetic dataset, “Flying Chairs”, as ground truth for dense optical flow. Two architectures were proposed. The first accepts a six-channel input formed by stacking two temporally sequential images into a single frame. The second is a mid-fusion implementation in which each image is processed in a separate stream and a 2D correlation combines the two streams to output a dense optical flow (DOF) map. This work was later extended to FlowNet v2 [78], which focuses on small-displacement motion: an additional specialized subnetwork is added to learn small displacements, and its results are fused with the original architecture by another network that learns the best fusion. In [79], the estimated flow is used to warp one of the two temporally sequential images toward the other, and the loss function minimizes the error between the warped image and the original image. Hur and Roth [80] jointly estimate optical flow and semantic segmentation so that each task benefits from the other.
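As a concrete illustration of the warping-based loss in [79], the following is a minimal sketch (assumed tensor shapes and function names, not the authors' implementation) that backward-warps the second frame toward the first using the estimated flow and penalizes the photometric difference:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (B, C, H, W) with `flow` (B, 2, H, W), flow given in pixels."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(image.device)   # (2, H, W) pixel coordinates
    coords = grid.unsqueeze(0) + flow                               # sampling locations per pixel
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)         # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)

def photometric_loss(frame1, frame2, flow):
    """L1 error between frame1 and frame2 warped back toward frame1 by the estimated flow."""
    return (frame1 - warp(frame2, flow)).abs().mean()
```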
A review on camera ego-motion estimation methods based on optical flow for robotics
Published in Lin Liu, Automotive, Mechanical and Electrical Engineering, 2017
These optical flow-based methods can be used to estimate ego-motion parameters, for example the velocity of a quadrocopter. However, because errors occur in every step of ego-motion estimation, these methods are not currently suited to position or pose estimation. The errors fall into three categories. First, errors occur in the estimation of optical flow itself, because the assumption of temporal intensity constancy does not hold in all situations; this kind of error is often caused by moving light sources or additive Gaussian noise. Second, errors occur in the estimation of ego-motion from optical flow, because sources other than ego-motion also contribute to the optical flow; these additional sources, e.g. an independently moving object in view, are usually called outlier noise. Third, when integrating ego-motion into position or pose, the errors accumulate over time.
Applications of Computer Vision
Published in Manas Kamal Bhuyan, Computer Vision and Image Processing, 2019
Optical flow is the motion of the brightness pattern in a sequence of images: the apparent motion of objects as perceived by an observer or a camera. It describes the change in the image due to motion during a time interval δt, and it is essentially a velocity field. The image velocity of a point moving in the scene is called the “motion field”; ideally, optical flow is equal to the motion field. The velocity field represents the 3D motion of object points projected onto the 2D image, so optical flow describes how fast and in which direction an image pixel is moving [213]. It employs flow vectors to detect moving regions. The following points are important for estimating optical flow: optical flow should not depend on illumination changes in the scene, and the motion of unwanted objects such as shadows should not affect it. A smooth sphere rotating under constant illumination gives no optical flow; in this case a motion field exists, but there is no optical flow, i.e., the optical flow is not equal to the motion field. Conversely, non-zero optical flow is detected if a stationary sphere is illuminated by a moving light source; in this case only the shading changes, while the motion field is zero.
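The usual formalization behind these points (not spelled out in the excerpt above, but standard) is the brightness constancy assumption and the optical flow constraint equation that follows from it:

```latex
% Brightness constancy: a moving point keeps its intensity over a small interval \delta t
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)
% A first-order Taylor expansion yields the optical flow constraint equation,
% relating the flow (u, v) to the spatial and temporal image derivatives:
I_x u + I_y v + I_t = 0
```

This is a single equation in the two unknowns (u, v), so additional constraints (e.g., smoothness of the flow field) are needed; and because it is purely an intensity assumption, shading changes such as the moving light source above produce flow even when nothing in the scene moves.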
A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets
Published in Applied Artificial Intelligence, 2022
Vijeta Sharma, Manjari Gupta, Anil Kumar Pandey, Deepti Mishra, Ajai Kumar
Varol et al. (Varol, Laptev, and Schmid 2018) present a new two-stream convolutional neural network architecture with long-short-term spatiotemporal features (LSF CNN). This network aims to recognize human actions from video data quickly and efficiently compared with previous networks. The complete network is a fusion of two subnetworks. The first subnetwork is a long-term spatiotemporal feature extraction network (LT-Net), which receives RGB frames as input. The other subnetwork is a short-term spatiotemporal feature extraction network (STNet) that accepts optical flow data as input. The two streams are fused at the fully connected layer of the CNN, and the fully connected layer's output is fed to a simple support vector machine (SVM) classifier. This model includes a novel approach for better exploiting the optical flow field, and it outperforms earlier CNN-based deep learning models (Feichtenhofer, Pinz, and Zisserman 2016) that followed conventional ways of using optical flow in action recognition problems. This fusion-based two-stream network can learn very deep features in both the spatial and temporal domains.
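A minimal sketch of the final classification stage described above, assuming the fused fully connected features have already been extracted into arrays (feature dimensions, variable names, and the use of random placeholder data are illustrative, not from the paper):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical per-clip features: concatenated LT-Net and STNet fully connected outputs.
rgb_features = np.random.rand(200, 2048)     # placeholder for LT-Net (RGB stream) features
flow_features = np.random.rand(200, 2048)    # placeholder for STNet (optical flow stream) features
labels = np.random.randint(0, 10, size=200)  # placeholder action labels

fused = np.concatenate([rgb_features, flow_features], axis=1)

# Simple SVM classifier on the fused representation, as in the final stage of the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(fused, labels)
print("Training accuracy:", clf.score(fused, labels))
```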
Digital twin for human-machine interaction with convolutional neural network
Published in International Journal of Computer Integrated Manufacturing, 2021
Tian Wang, Jiakun Li, Yingjun Deng, Chuang Wang, Hichem Snoussi, Fei Tao
For video processing tasks, traditional CNNs are of limited use because a 2D convolution kernel cannot extract the temporal features of an action. Optical flow or three-dimensional convolution is usually used to extract temporal information. The calculation of optical flow takes two adjacent frames as input, and its output is the displacement distance and direction of each pixel position in the image; the optical flow therefore contains information about the temporal dimension. Simonyan and Zisserman proposed a two-stream network (Simonyan and Zisserman 2014a) that takes optical flow as part of the network input. The two-stream network separates feature extraction for the spatial and temporal dimensions: the original RGB image is used to extract spatial features, the optical flow image is used to extract temporal features, and the two sets of features are fused at the last layer of the network. Because optical flow takes a considerable amount of time to compute, the two-stream neural network is not suitable for HMI, which is a practical problem in manufacturing: the demand for real-time response is critical for the HMI problem, and the DT data service should take the efficiency of the algorithm into account.
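A minimal two-stream sketch in PyTorch, fusing RGB and stacked optical flow features only at the final layer as described above (layer sizes, input resolution, and the number of stacked flow frames are illustrative assumptions, not the original architecture):

```python
import torch
import torch.nn as nn

class TwoStreamNet(nn.Module):
    """Toy two-stream network: spatial (RGB) and temporal (optical flow) streams
    are processed separately and fused only at the final classification layer."""
    def __init__(self, num_classes=10, flow_stack=10):
        super().__init__()
        def stream(in_channels):
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial = stream(3)                # a single RGB frame
        self.temporal = stream(2 * flow_stack)  # stacked horizontal/vertical flow fields
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb, flow):
        fused = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)  # late fusion
        return self.classifier(fused)

# Example usage with dummy tensors.
net = TwoStreamNet()
scores = net(torch.randn(4, 3, 112, 112), torch.randn(4, 20, 112, 112))
print(scores.shape)  # torch.Size([4, 10])
```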
Automatic recognition of work phases in cable yarding supported by sensor fusion
Published in International Journal of Forest Engineering, 2018
Marek Pierzchała, Knut Kvaal, Karl Stampfer, Bruce Talbot
The IMU provided orientation, angular velocity, and linear acceleration data. Orientation data were converted from quaternions to Euler angles (roll, pitch, and yaw), while GPS messages (time-stamped geospatial coordinates) were converted to velocity, with the elevation variable treated separately. Frames from the downward-facing camera were used to extract motion information. For this purpose, the popular optical flow method proposed by Farnebäck (2003) was used. Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or the camera. It is a 2D vector field in which each vector is a displacement vector showing the movement of points between the first frame and the second. In this study, the vectors were distributed spatially over the whole frame in a regular grid of 30 by 40 points. This provided an array of 1200 vectors, representing the optical flow as velocity vectors in the x and y dimensions, computed for each frame. From this array, a single value representative of the whole frame was extracted. Because the motion might not be uniform, for example due to occlusion by swinging logs in the field of view, the most prevalent values were selected automatically. This yielded a mean vector orientation Θ in the range 0 to π (0–180 degrees). The mean vector orientation can be explicitly interpreted as one of three states: inhaul, outhaul, and stop (Table 2). A similar approach using such feature descriptors is used by Saeedi et al. (2014).
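A sketch of this pipeline using OpenCV's Farnebäck implementation: dense flow is computed between two consecutive grayscale frames, sampled on a regular 30 by 40 grid, and summarized by the most prevalent orientation folded into 0–π (the grid size follows the description above; the sampling and histogram-based summarization details are assumptions, not the authors' exact code):

```python
import cv2
import numpy as np

def frame_motion_orientation(prev_gray, curr_gray, rows=30, cols=40):
    """Dense Farnebäck optical flow, sampled on a regular grid, summarized as a
    single dominant orientation in [0, pi) for the whole frame."""
    # Positional arguments: prev, next, flow, pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = prev_gray.shape
    ys = np.linspace(0, h - 1, rows).astype(int)
    xs = np.linspace(0, w - 1, cols).astype(int)
    grid_flow = flow[np.ix_(ys, xs)]          # (rows, cols, 2) velocity vectors

    u = grid_flow[..., 0].ravel()
    v = grid_flow[..., 1].ravel()
    angles = np.arctan2(v, u) % np.pi         # fold orientations into [0, pi)

    # Pick the most prevalent orientation via a histogram, so that non-uniform
    # motion (e.g. a swinging log occluding part of the view) is down-weighted.
    hist, edges = np.histogram(angles, bins=18, range=(0, np.pi))
    dominant_bin = np.argmax(hist)
    return 0.5 * (edges[dominant_bin] + edges[dominant_bin + 1])
```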