
Telefact

The class for scraping metadata and images from Telefact: 1938-1945 by Pictograph Corporation.

Usage

Create a querier for Telefact:

python
from libquery_extensions import Telefact

directory = "./output/telefact"
querier = Telefact(
    metadata_path=f"{directory}/metadata/metadata.jsonl",
    img_dir=f"{directory}/imgs",
)

Query metadata:

python
base_url = "https://modley-telefact-1939-1945.tumblr.com/post/"
queries = [
    f"{base_url}616877845505064960/rudolf-modley-pictograph",
    f"{base_url}614489066195632128/rudolf-modley-pictograph-corporation",
    f"{base_url}614476957421338624/fridec231938",
]
querier.fetch_metadata(queries=queries)

Query images:

python
querier.fetch_image()
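
To inspect what was fetched, the metadata file can be read back directly. The following is a minimal sketch, not part of the library: it assumes the file is in JSON Lines format (one JSON object per line, as the `.jsonl` extension suggests) and that each object follows the schema in the Metadata Schema section. The helper name `load_metadata` is hypothetical.

```python
import json
from pathlib import Path


def load_metadata(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines.

    Assumption: each non-empty line holds exactly one JSON object.
    """
    entries = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


if __name__ == "__main__":
    # Same path as the one passed to Telefact(metadata_path=...).
    metadata_path = Path("./output/telefact/metadata/metadata.jsonl")
    if metadata_path.exists():
        for entry in load_metadata(metadata_path):
            # Field names are taken from the documented schema.
            print(entry["idInSource"], entry["sourceData"]["downloadUrl"])
```

Reading line by line keeps the sketch usable on large files, and a malformed line raises `json.JSONDecodeError` rather than being skipped silently.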

Metadata Schema

Each metadata entry is stored as:

typescript
interface MetadataEntry {
    uuid: string
    /** The URL of the webpage from which the entry is generated. */
    url: string
    source: 'Telefact'
    /**
     * The id of the entry in the data source.
     * Parsed from the url.
     * Generated with url.split('/')[-2] (Python indexing).
     */
    idInSource: string
    accessDate: string
    sourceData: {
        /** The authors of the visualization. */
        authors: ['Modley, Rudolf']
        /**
         * The date the visualization was published.
         * Parsed from the webpage.
         */
        publishDate: {
            year: number
            month: number
            day: number
        }
        /**
         * The URL for downloading the image with a GET request.
         * Parsed from the webpage.
         */
        downloadUrl: string
        /** The languages used in the visualization. */
        languages: ['eng']
    }
}
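
Two of the fields above are simple to work with in Python. The sketch below shows how `idInSource` follows from the URL using the `url.split('/')[-2]` rule stated in the schema, and how `publishDate` can be turned into a `datetime.date`. The helper names `id_in_source` and `publish_date` are hypothetical and not part of the library.

```python
from datetime import date


def id_in_source(url: str) -> str:
    """Return the second-to-last '/'-separated segment of the URL."""
    return url.split("/")[-2]


def publish_date(entry: dict) -> date:
    """Convert sourceData.publishDate (year, month, day) into a date."""
    d = entry["sourceData"]["publishDate"]
    return date(d["year"], d["month"], d["day"])


# One of the query URLs from the Usage section.
url = (
    "https://modley-telefact-1939-1945.tumblr.com/post/"
    "614476957421338624/fridec231938"
)
print(id_in_source(url))  # 614476957421338624
```

The split rule relies on the URL ending with the post slug and having no trailing slash. With a trailing slash, the second-to-last segment would be the slug rather than the numeric id.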