Skip to content
On this page

Gallica

The class for fetching metadata and images from Gallica with its search API and metadata API.

Usage

Create a querier for Gallica:

python
from libquery import Gallica

directory = "./output/gallica"
querier = Gallica(
    metadata_dir=f"{directory}/metadata",
    img_dir=f"{directory}/imgs",
)

Query metadata:

python
base_url = "https://gallica.bnf.fr/SRU?operation=searchRetrieve&version=1.2&maximumRecords={maximumRecords}&startRecord={startRecord}"
queries = [
    f"{base_url}&query=dc.title all %22cartes figurative%22",
    f"{base_url}&query=dc.title all %22tableau graphique%22",
]
querier.fetch_metadata(queries=queries)

Query images:

python
querier.fetch_image()

Metadata Schema

Each metadata entry is stored as:

typescript
interface TextWithLang {
    '@xml:lang': string
    '#text': string
}

interface Record {
    '@xmlns:dc': string
    '@xmlns:oai_dc': string
    '@xmlns:xsi': string
    '@xsi:schemaLocation': string
    'dc:identifier': string | string[]
    'dc:relation': string | string[]
    'dc:source': string
    'dc:title': string | (string | TextWithLang)[]
    /** The author(s). */
    'dc:creator'?: string | string[]
    'dc:date'?: string | string[]
    'dc:subject'?: string | TextWithLang | (string | TextWithLang)[] | null
    'dc:coverage'?: string | string[] | null
    'dc:format'?: string | TextWithLang | (string | TextWithLang)[]
    /** For collections within the BnF, the language code has 3 characters. */
    /** For collections from outside, the language code can be arbitrary. */
    'dc:language'?: string | string[]
    /** Type of the document, e.g., monograph, map, image, */
    /** fascicle, manuscript, score, sound, object and video. */
    'dc:type'?: (string | TextWithLang)[]
    'dc:rights'?: TextWithLang[]
    'dc:publisher'?: string | string[]
    'dc:description'?: string | string[]
    'dc:contributor'?: string | string[]
    '#text'?: string
}

interface Page {
    numero: string | null
    ordre: string
    pagination_type: string | null
    image_width: string
    image_height: string
    legend?: string
}

interface MetadataEntry {
    uuid: string
    url: string
    source: 'Gallica'
    idInSource: string
    accessDate: string
    /** The query return from API. */
    sourceData: {
        identifier: string
        record: Record
        pages: Page[]
    }
}