Explore chapters and articles related to this topic
Map Reproduction
Published in Terry A. Slocum, Robert B. McMaster, Fritz C. Kessler, Hugh H. Howard, Thematic Cartography and Geovisualization, 2022
Terry A. Slocum, Robert B. McMaster, Fritz C. Kessler, Hugh H. Howard
A more attractive option is to deliver the digital map in a portable document format such as Encapsulated PostScript (EPS) or Portable Document Format (PDF). The EPS format is a subset of the PostScript page description language (PDL) that allows digital maps and other documents to be transported between software applications and between different types of computers. An EPS file consists of PostScript code for high-resolution printing and an optional low-resolution raster image for on-screen display. Because they are written in PostScript, EPS files can be very large, but they do not require the specific printing device information required by PostScript page description files. EPS files can also embed related data and fonts. A PDF file is similar to an EPS file in that it is related to PostScript, but it is more flexible and more efficient. In addition to embedding related data and fonts, PDF files can embed, or encapsulate, features such as hyperlinks, movies, and keywords for searching and indexing. Zooming and panning capabilities for on-screen viewing are also provided. PDF files can be viewed using Adobe Acrobat Reader (a free download) and can be created from virtually any application via Adobe's Acrobat and Acrobat Distiller. Acrobat compresses and optimizes digital maps for printing, Web display, and other uses. Although the PDF format was originally intended for on-screen display, it has become a format of choice for the delivery of maps and other documents to service bureaus for high-end printing (Joyce 2015).
CyberTax™: Add Intelligence to Tax Forms with Prolog
Published in Don Potter, Manton Matthews, Moonis Ali, Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 2020
Adobe’s Portable Document Format (PDF) is a standard means of communicating sophisticated documents across heterogeneous networks, computers, and printers. Many organizations are encoding their documents in PDF, including the Internal Revenue Service (I.R.S.), which has made all the federal tax forms for personal filers available on the Internet. Each year more taxpayers download these forms, and many file their tax returns electronically.
Automatic extraction of materials and properties from superconductors scientific literature
Published in Science and Technology of Advanced Materials: Methods, 2023
Luca Foppiano, Pedro Baptista Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii
We developed Grobid-superconductors as a Grobid module following principles (multi-step, sentence-based, full-text-based) discussed in a previous preliminary study [19]. Grobid has several advantages: 1) it can be integrated with pdfalto (https://github.com/kermitt2/pdfalto), a specialised tool for converting PDF to XML, which mitigates extraction issues such as the resolution of embedded fonts, invalid character encoding, and the reconstruction of the correct reading order; 2) it allows access to PDF document layout information (e.g. coordinates in the PDF document) for both machine learning and document decoration; and 3) it provides access to a set of high-quality, pre-trained machine learning models for structuring documents. Grobid-superconductors is structured as a three-step process, illustrated in Figure 1 and described in Sections 2.1, 2.2, and 2.3.
ASHRAE URP-1883: Development and Analysis of the ASHRAE Global Occupant Behavior Database
Published in Science and Technology for the Built Environment, 2023
Yapan Liu, Bing Dong, Tianzhen Hong, Bjarne Olesen, Thomas Lawrence, Zheng O’Neill
One of the main features of the ASHRAE Global Occupant Behavior Database is the query builder. It allows users to select and download data from different studies, filtered by behavior type and multiple other criteria. Figure 4 shows the web interface of this query builder, with all 5 steps combined in one figure. Step 1 shows a list of all behaviors in the database; one or more behavior types can be selected. Step 2 returns a list of countries and locations associated with the available studies, based on the previous selections. Step 3 presents the building types available for the selections made in step 2. Step 4 returns a list of publications for the available studies. Once the user clicks the “FINISH” button in step 5, the query builder passes all selected parameters to the server, and the server returns a compressed file for download. For the survey and mixed types of dataset, as Figure 5 step 1 shows, the output file includes processed data and a dictionary, either in a separate file or within the data file; the dictionary provides detailed information on the different types of data collected in the study. For the in-situ type of dataset, as Figure 5 step 2 shows, the output file includes static information about the study, measurement data for the different behavior types, processed data, and the Brick model. Among those files, the static information describes the location, building types, and room numbers; the Brick model includes a Turtle file and a PDF document that visually summarizes the metadata of the study.
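The cascading selection described above, in which each step offers only the options still consistent with the earlier choices, can be sketched as follows. This is a minimal illustration, not the database's actual implementation: the `Study` record, its field names, the helper functions `options` and `narrow`, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical study record; field names are assumptions."""
    behavior: str
    country: str
    building_type: str
    publication: str


# Placeholder records for illustration only; not taken from the database.
STUDIES = [
    Study("window opening", "Country A", "office", "Publication 1"),
    Study("window opening", "Country B", "residential", "Publication 2"),
    Study("light switching", "Country A", "office", "Publication 3"),
]


def options(studies, field):
    """List the distinct values of `field` among the remaining studies."""
    return sorted({getattr(s, field) for s in studies})


def narrow(studies, field, selected):
    """Keep only the studies whose `field` value was selected."""
    chosen = set(selected)
    return [s for s in studies if getattr(s, field) in chosen]


# Step 1: the user picks one or more behavior types.
step1 = narrow(STUDIES, "behavior", ["window opening"])
# Step 2: only countries that still have matching studies are offered.
countries = options(step1, "country")
step2 = narrow(step1, "country", ["Country A"])
# Step 3: building types available for the step 2 selection.
building_types = options(step2, "building_type")
step3 = narrow(step2, "building_type", building_types)
# Step 4: publications of the studies that remain.
publications = options(step3, "publication")
```

In this sketch, each step filters the result of the previous one, so a later list can never offer an option that would yield an empty download.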
Survey of Mathematical Expression Recognition for Printed and Handwritten Documents
Published in IETE Technical Review, 2022
Ridhi Aggarwal, Shilpa Pandey, Anil Kumar Tiwari, Gaurav Harit
End-to-end systems need to map an input expression, given as an image/PDF or as a stroke/ink trajectory, into a structured or sequential output such as LaTeX or MathML markup. Encoder–decoder architectures (Figure 2) are commonly used to learn this cross-domain mapping from 2D image data or a stroke sequence to a sequence of text tokens. Such systems do not require an explicit grammar of math expressions for learning; instead, they learn it implicitly from the training data. End-to-end systems are commonly trained using a loss computed over the final output, that is, the markup sequence. The most common loss function is the cross-entropy loss (XEnt), which compares the probability distribution over the predicted symbols with the ground-truth probability distribution. Such systems can also be augmented with losses computed over related tasks, an approach referred to as multi-task learning. Works addressing the translation of printed math expressions to LaTeX that have used the IM2LaTeX-100K dataset [43] include [44,46,47,50]. Systems that recognize math in PDF also exploit the font and glyph information available in the metadata [15,45]. Works addressing the translation of handwritten math expressions to LaTeX have used the ink trajectory dataset of the CROHME competitions, rendered as images for input. Example works are [40–42,48,49].
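As a rough illustration of the cross-entropy loss over a markup sequence, the sketch below averages the negative log-probability of each ground-truth token. It assumes one-hot ground-truth distributions, in which case the cross-entropy at each step reduces to minus the log of the probability assigned to the correct token. The function names, the toy vocabulary, and the logits are hypothetical and are not taken from any of the cited systems.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_cross_entropy(step_logits, target_ids):
    """Mean cross-entropy over a token sequence with one-hot targets.

    step_logits: one list of vocabulary scores per output position.
    target_ids:  index of the ground-truth token at each position.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need one target per output position")
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        # With a one-hot target, only the true token's probability contributes.
        total += -math.log(probs[target])
    return total / len(target_ids)


# Toy vocabulary (an assumption) for the markup sequence "x ^ 2 <eos>".
VOCAB = ["x", "^", "2", "<eos>"]
targets = [0, 1, 2, 3]

# A decoder that is undecided at every step (all scores equal).
uniform_logits = [[0.0] * len(VOCAB) for _ in targets]
# A decoder that scores the correct token highest at every step.
confident_logits = [
    [5.0 if i == t else 0.0 for i in range(len(VOCAB))] for t in targets
]

uniform_loss = sequence_cross_entropy(uniform_logits, targets)
confident_loss = sequence_cross_entropy(confident_logits, targets)
```

An undecided decoder incurs a loss of log |V| per token, where |V| is the vocabulary size, while a decoder that concentrates probability on the correct tokens drives the loss toward zero.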