Construction Image Captioning

Image captioning is the automatic generation of one or more sentences describing the contents of an image. It is an interdisciplinary technology rooted in computer vision and natural language processing, and it could potentially be used for construction scene analysis. Deep learning methods have recently achieved considerable success because they can automatically extract high-level features from raw data such as images, which has benefited applications in computer vision, natural language processing, reinforcement learning, and other fields. Applications of construction image captioning include automated detection of safety rule violations, roadway asset evaluation, and construction hazard identification.

Annotations

Annotation examples

Linguistic schema for annotation

The figure below shows the linguistic schema proposed by the ACID team for annotating construction machine images. First, the following elements are deconstructed from the construction image according to the schema: (1) the primary machine object; (2) the machine object cooperating with the primary object; (3) the working contents of the primary object (e.g., dirt, stone, and construction materials); (4) the activities of the primary machine; and (5) supplementary information, such as color, count, and weather conditions. Next, the correct term is matched to each element identified in the previous step. Finally, these terms are combined into a logical, grammatically correct sentence that describes what is occurring in the construction image.

[Fig. 3: Linguistic schema for annotating construction machine images]
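The three annotation steps (deconstruct the elements, match terms, form a sentence) can be illustrated with a minimal Python sketch. It assumes a small, hypothetical controlled vocabulary (`TERMS`), an illustrative record type (`SceneElements`), and a simple sentence template with an assumed preposition field (`relation`). None of these names come from the ACID team; this is an illustration of the schema's logic, not their annotation procedure or tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabulary: raw labels -> canonical schema terms (step 2).
TERMS = {
    "digger": "excavator", "excavator": "excavator",
    "truck": "dump truck", "dump truck": "dump truck",
    "dozer": "bulldozer", "bulldozer": "bulldozer",
    "soil": "dirt", "dirt": "dirt",
    "rocks": "stone", "stone": "stone",
}
NUMBERS = {2: "two", 3: "three", 4: "four", 5: "five"}


@dataclass
class SceneElements:
    """Step 1: elements deconstructed from one image (illustrative field names)."""
    primary_machine: str                       # (1) primary machine object
    activity: str                              # (4) activity, e.g. "loading"
    cooperating_machine: Optional[str] = None  # (2) cooperating machine
    working_content: Optional[str] = None      # (3) working contents, e.g. "dirt"
    relation: str = "with"                     # assumed preposition linking (2) to the clause
    color: Optional[str] = None                # (5) supplementary: color
    count: int = 1                             # (5) supplementary: count
    setting: Optional[str] = None              # (5) supplementary: e.g. weather phrase


def term(label: str) -> str:
    """Step 2: map a raw label to its canonical term; unknown labels pass through."""
    return TERMS.get(label.lower().strip(), label)


def _with_article(phrase: str) -> str:
    """Prefix 'a' or 'an' based on the first letter (a simple heuristic)."""
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def compose_caption(e: SceneElements) -> str:
    """Step 3: combine the matched terms into one declarative sentence."""
    machine = term(e.primary_machine)
    noun = f"{e.color} {machine}" if e.color else machine
    if e.count == 1:
        subject, verb = _with_article(noun), "is"
    else:
        # Naive pluralization: append "s" to the last word.
        subject, verb = f"{NUMBERS.get(e.count, str(e.count))} {noun}s", "are"
    parts = [subject, verb, e.activity]
    if e.working_content:
        parts.append(term(e.working_content))
    if e.cooperating_machine:
        parts += [e.relation, _with_article(term(e.cooperating_machine))]
    if e.setting:
        parts.append(e.setting)
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    scene = SceneElements("digger", "loading", cooperating_machine="truck",
                          working_content="soil", relation="into", color="yellow")
    print(compose_caption(scene))
```

For the example scene, the sketch prints "A yellow excavator is loading dirt into a dump truck." A fixed template keeps every caption in the same element order, which mirrors the schema's intent of consistent, term-controlled sentences. Real annotations would need richer grammar (irregular plurals, varied sentence structures) than this sketch handles.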

Image Captioning Algorithm Analysis

Six deep learning image captioning algorithms have been tested on a subset of ACID (4,000 images with around 8,000 captions). The test results are shown below. Note that further image captioning results on the entire ACID will be released soon.


[Table 5: Test results of six image captioning algorithms on the ACID subset]

Image Captioning Demo Images

The images below show example image captioning results produced by the model trained on the ACID subset.

[Fig. 8: Example captions generated by the model trained on the ACID subset]

Download

Click the button below to return to the dataset download page.