The Informatics team supports informatics projects that enrich or create digital records, with guidance, funding, and in-kind resources from the DPO. Our approach focuses on scalability, automation, interconnectivity, innovation, machine learning, open-source software, and reusable solutions. We prioritize workflows that can handle large volumes of records, automate tedious tasks, and transfer data between systems without manual steps. We research new tools and technologies, including AI, to enhance images and records. We promote transparency by publishing open-source software, and we build adaptable solutions that can be reused across projects and institutions.
Unit | Title | Description | Status | Repository | Dates | Records Created or Enhanced | More Info |
---|---|---|---|---|---|---|---|
CHSDM | Update CIS from Data in Catalog Card | Using the data from the transcription of the catalog cards, we are filling in the empty fields in the TMS records. | ongoing | NA | May 2024 - | NA | NA |
NMNH | Replace Image EXIF Metadata | The metadata in a subset of the images from the HSFA Mass Digitization project contained non-ASCII characters. We replaced the data with the correct values, regenerated the MD5 file, and delivered the files to DAMS. | ongoing | NA | Apr 2024 - | 11,307 | NA |
NMAA | Image Deduplication between DAMS and Network Share | We will match images in DAMS against those on a network share to keep only the highest-resolution images. In addition, we will help the unit ingest the rest of the images into DAMS and ASpace. | ongoing | NA | Feb 2024 - | NA | NA |
SI | SI Thesaurus Reconciliation Service | An SI-wide reconciliation service for OpenRefine that allows users to reconcile against terms in the SI Thesaurus as well as other data sources. This includes data sources that do not have reconciliation services, like SI Open Access, LoC, GBIF, and others. | ongoing | Repository | Dec 2023 - | NA | NA |
OCIO | Osprey on Hydra | Running the Osprey Worker script on the Hydra High Performance Cluster. This allows us to scale processing as needed. | ongoing | Repository | Mar 2023 - | 48,980 | NA |
NMAAHC | Mass Digitization Pilot Project of the Johnson Publishing Company Archive | After the 25,050 images passed QC, we delivered them to DAMS via a hotfolder. We then created IDs for all archival items and used those IDs to create 9,409 stub records in Arches and save 50,100 links between IDs in Getty's ID Manager. | ongoing | Repository | Aug 2022 - | 9,409 | NA |
OCIO | Osprey Dashboard | System that receives images from vendors, checks that they meet the project requirements, and displays the results on a dashboard of Collections Digitization projects in DPO. Coded in Python. | ongoing | Repository | Jul 2022 - | 182,008 | NA |
SI | SI Thesaurus | An SI-wide system to host thesauri, controlled vocabularies, taxonomies, and other lists generated by the SI units. The record count is the number of terms in the database. | ongoing | NA | Sep 2021 - | 1,481 | SharePoint |
CHSDM | Cooper Hewitt Card Catalog Transcription | Digitization and transcription of the catalog cards for the museum's collection. The digitization vendor is using Virtual Barcodes to link the item ID to the database ID. | ongoing | Repository | Apr 2021 - | 56,396 | NA |
NMNH | Tracking Scientific Names in Digitization of Bees | We used Virtual Barcodes to encode the IRN of the scientific name (from EMu's taxonomy) in the image metadata. The IRN was extracted to CSV files to populate the database, which avoided hard-coding the species name or IRN into the image. | completed | Repository | Dec 2019 - Apr 2020 | 30,020 | NA |
NMAH | Virtual Barcodes for Digitization of Numismatics Collections | Each object has a record, so we used Virtual Barcodes to name the files with the unique database key value (MKEY), which makes it easy to match images to records. | completed | Repository | Oct 2019 - Feb 2020 | 25,204 | NA |
NMAfA | Archives Stephen Grant Postcard Collection | Both sides of each postcard were stitched together into a single image. | completed | Repository | Oct 2019 - Mar 2020 | 7,410 | NA |
OCIO | Mass Digi Dashboard (ver. up to 1.6) | Original dashboard used to track Mass Digitization projects in DPO. Coded in R/Shiny. Replaced by Osprey. | completed | NA | Jan 2019 - Jul 2023 | 240,000* | NA |
NMAH | Princeton Posters Mass Digitization | The mass digitization project needed to assign the unique database key value (MKEY) to the captured images since the objects already had item-level records. The Virtual Barcodes system allowed the vendor to search for the item and assign the filename if the item was found in the database. | completed | Repository | Dec 2018 - May 2019 | 17,976 | |
OCIO | Shiny Application Servers | We manage the internal and external R/Shiny servers. These allow the publication of web applications written entirely in R using the Shiny package. | ongoing | Repository | Jun 2018 - | NA | Confluence |
* Value was estimated
Some projects may touch the same records, so the total above will be less than the sum of all projects.
Small projects (e.g. simple file edits, data transfer, small data fixes) are not included in the table above.
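The EXIF metadata project in the table above started from images whose metadata contained non-ASCII characters. The table does not describe how those images were found or how the values were corrected, so the following is only a minimal sketch of one way to flag suspect values. It assumes the metadata has already been read into a plain dictionary; the function name and the field names in the usage example are hypothetical.

```python
def find_non_ascii_fields(metadata):
    """Return {field: [offending characters]} for values containing non-ASCII characters.

    `metadata` is assumed to be a dict of already-extracted metadata values;
    reading the values from the image file is not shown here.
    """
    flagged = {}
    for field, value in metadata.items():
        # Collect each distinct non-ASCII character once, in sorted order.
        bad_chars = sorted({ch for ch in str(value) if not ch.isascii()})
        if bad_chars:
            flagged[field] = bad_chars
    return flagged


# Hypothetical field names and values, for illustration only.
example = {"Artist": "Smithsonian", "Copyright": "\u00a9 HSFA"}
print(find_non_ascii_fields(example))
```

A report like this only identifies which fields need attention; choosing the correct replacement values still depends on the project's own records.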
Software | Description | Repository | Details |
---|---|---|---|
SIT Reconcile | Customized reconciliation service for OpenRefine to allow reconciliation against sources that do not support it. | More Info | |
Osprey | A verification system for digital files and associated dashboard to display the results. | More Info | |
MD5 Tool | A command line and graphical tool to generate text files with the MD5 hash of files in a folder. Used for verification in DAMS ingestion and other processes. | More Info | |
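To illustrate what an MD5 manifest of a folder looks like, here is a minimal sketch using only the Python standard library. It is not the MD5 Tool's actual code: the function names, the manifest filename `md5.txt`, and the `hash  filename` line layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_manifest(folder, manifest_name="md5.txt"):
    """Write one 'hash  filename' line per file in `folder` and return the manifest path.

    The manifest name and line layout are assumptions, not the MD5 Tool's format.
    Subfolders are ignored and the manifest itself is skipped.
    """
    folder = Path(folder)
    files = sorted(
        p for p in folder.iterdir() if p.is_file() and p.name != manifest_name
    )
    manifest = folder / manifest_name
    text = "".join(f"{md5_of_file(p)}  {p.name}\n" for p in files)
    manifest.write_text(text, encoding="utf-8")
    return manifest
```

The receiving system can recompute each hash after transfer and compare it with the manifest to confirm that no file changed in transit.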