Construction Image Captioning
Image captioning is the automatic generation of one or more sentences that describe the contents of an image. It is an interdisciplinary technology rooted in computer vision and natural language processing, and it can potentially be used for construction scene analysis. Deep learning methods have recently enjoyed considerable success because they can automatically extract high-level features from images for applications in computer vision, natural language processing, reinforcement learning, and other fields. Applications of construction image captioning include automated detection of safety rule violations, roadway asset evaluation, and construction hazard identification.
Annotations
Annotation examples
Ground Truth: A dozer is riding on the road.
Ground Truth: An excavator and a truck are working on site.
Ground Truth: A compactor is riding on the road.
Ground Truth: A truck and a dozer are working on a road construction site.
Linguistic schema for annotation
The figure on the right shows the linguistic schema proposed by the ACID team for annotating construction machine images. First, the following elements are identified in the construction image according to the linguistic schema: (1) the primary machine object; (2) the machine object cooperating with the primary object; (3) the working contents (e.g., dirt, stone, and construction materials) of the primary object; (4) the activities of the primary machine; and (5) supplementary information, such as color, count, and weather conditions. Then, each identified element is matched with the correct term. Finally, these terms are combined into a logical and correct sentence that describes what is occurring in the construction image.
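The three annotation steps above can be illustrated with a minimal Python sketch. This is not the ACID team's annotation tooling: the `SchemaElements` class, its field names, and the `compose_caption` template are hypothetical, and the sentence template is a deliberately simple assumption (the working content is appended directly after the activity, and supplementary information is treated as a trailing phrase).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaElements:
    """Hypothetical container for the five elements of the linguistic schema."""
    primary_machine: str                       # (1) primary machine object
    activity: str                              # (4) activity, e.g. "working on site"
    cooperating_machine: Optional[str] = None  # (2) cooperating machine object
    working_content: Optional[str] = None      # (3) e.g. "dirt", "stone"
    supplementary: Optional[str] = None        # (5) trailing phrase, e.g. "on a sunny day"


def with_article(noun: str) -> str:
    """Prefix a noun with 'a' or 'an' using a simple first-letter rule."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def compose_caption(e: SchemaElements) -> str:
    """Combine the matched terms into one sentence (assumed template)."""
    subject = with_article(e.primary_machine)
    verb = "is"
    if e.cooperating_machine:
        # Two machines form a plural subject, so the verb must agree.
        subject += f" and {with_article(e.cooperating_machine)}"
        verb = "are"
    parts = [subject, verb, e.activity]
    if e.working_content:
        parts.append(e.working_content)
    if e.supplementary:
        parts.append(e.supplementary)
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    elements = SchemaElements(
        primary_machine="excavator",
        cooperating_machine="truck",
        activity="working on site",
    )
    print(compose_caption(elements))
    # prints: An excavator and a truck are working on site.
```

Keeping the elements in a structured record before forming the sentence makes it easier to check that each caption covers the schema and that subject and verb agree when two machines appear.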
Image Captioning Algorithm Analysis
Six deep learning image captioning algorithms have been tested on a subset of ACID (4,000 images with around 8,000 captions). The test results are shown below. Note that more image captioning results on the entire ACID will be released soon.
Image Captioning Demo Images
The images below show example image captioning results from the model trained on the ACID subset.