This off-the-shelf OCR dataset covers natural scene images, handwriting, bills and documents, test papers, and more. The data covers 20 languages, multiple natural scenes, and multiple photographic angles.

1. Specifications

Data size: 500,000 images
Collecting environment: shop plaques, stop boards, posters, tickets, road signs, comics, cover pictures, prompts/reminders, warnings, packing instructions, menus, building signs, etc.
Diversity: 20 languages, multiple natural scenes, multiple photographic angles (looking up, looking down, eye level)
Device: cellphone, camera
Image parameter: images are in .jpg format; annotation files are in .json format
Annotation content: line-level quadrilateral bounding box annotation and transcription of the text
Accuracy: a bounding box annotation is qualified when the error of each vertex of the quadrilateral is within 5 pixels; the accuracy of bounding boxes is not less than 97%; the text transcription accuracy is not less than 97%

2. About Nexdata

Nexdata offers 200,000 hours of off-the-shelf speech recognition data, 800 TB of image/video data, and about 2 billion pieces of NLP data. These ready-to-go datasets support instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/?source=Datarade or contact us via info@nexdata.ai.
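
For readers who want to check the bounding box accuracy criterion from the specifications on their own sample, the following is a minimal Python sketch. It is not Nexdata's verification procedure. It assumes that the vertex error is measured as Euclidean distance, that both quadrilaterals list their four vertices in the same order, and that a trusted reference box is available for each annotation. The function names are hypothetical, and the layout of the .json annotation files is not described in this listing, so the sketch works on plain lists of (x, y) points.

```python
import math

# Tolerance from the specification: each vertex within 5 pixels.
MAX_VERTEX_ERROR_PX = 5.0


def vertex_errors(annotated, reference):
    """Return the distance between corresponding vertices of two quadrilaterals.

    Assumption: Euclidean distance, vertices given in the same order.
    """
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("a quadrilateral needs exactly four (x, y) vertices")
    return [math.dist(a, r) for a, r in zip(annotated, reference)]


def is_qualified(annotated, reference, tol=MAX_VERTEX_ERROR_PX):
    """A box is qualified when every vertex error is within the tolerance."""
    return all(err <= tol for err in vertex_errors(annotated, reference))


def box_accuracy(pairs, tol=MAX_VERTEX_ERROR_PX):
    """Fraction of (annotated, reference) pairs that are qualified."""
    if not pairs:
        raise ValueError("need at least one (annotated, reference) pair")
    qualified = sum(1 for a, r in pairs if is_qualified(a, r, tol))
    return qualified / len(pairs)
```

Under these assumptions, a sample would meet the stated bound when box_accuracy returns a value of at least 0.97.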