Manifest Details
A Manifest is a list of Sources to be ingested. It couples each Source with
a Pipeline, i.e., each Source on the list can specify which Pipeline to apply.
A Manifest can also set arguments to Transformers, such as first_page and
last_page, as in the following example:
{
    "sources": [
        {
            "type": "pdf",
            "id": "c124d591-eebd-4796-8f3b-1fed90a5ebe8",
            "pipeline": "pipeline_pdf",
            "location": {
                "type": "local",
                "path": "tests/data/test.pdf"
            },
            "destination": {
                "type": "local",
                "directory": "/tmp/ingestum/destinations/"
            },
            "first_page": 1,
            "last_page": 3
        }
    ]
}
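If you prefer to produce such a file programmatically, the structure above can be assembled with nothing but the Python standard library. The following is a minimal sketch, not part of Ingestum itself: the helper name build_pdf_manifest and its defaults are hypothetical, and the field names are copied from the example manifest above.

```python
import json
import uuid


def build_pdf_manifest(path, destination_dir, pipeline="pipeline_pdf",
                       first_page=1, last_page=3, source_id=None):
    """Return a manifest dict with a single local PDF source.

    Hypothetical helper: it only mirrors the fields of the example manifest.
    """
    source = {
        "type": "pdf",
        # Reuse a given id, or generate a fresh UUID like the one in the example.
        "id": source_id or str(uuid.uuid4()),
        "pipeline": pipeline,
        "location": {"type": "local", "path": path},
        "destination": {"type": "local", "directory": destination_dir},
        # Transformer arguments set at the Source level.
        "first_page": first_page,
        "last_page": last_page,
    }
    return {"sources": [source]}


manifest = build_pdf_manifest(
    "tests/data/test.pdf",
    "/tmp/ingestum/destinations/",
    source_id="c124d591-eebd-4796-8f3b-1fed90a5ebe8",
)
print(json.dumps(manifest, indent=4))
```

The printed JSON can be saved to a file and passed to ingestum-manifest like any hand-written manifest.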
Running a manifest from the command line
In the tests/pipelines directory, you’ll find numerous example manifests
to explore and use as the basis for your own projects:
$ ingestum-manifest tests/pipelines/manifest_annotation.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_audio.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_csv.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_docx.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_xls.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_html.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_image.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_pdf.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_text.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_twitter_form.json --pipelines=tests/pipelines --workspace=workspace
$ ingestum-manifest tests/pipelines/manifest_xml.json --pipelines=tests/pipelines --workspace=workspace
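The commands above differ only in the manifest file, so they can be scripted. The sketch below is one possible way to do that, assuming ingestum-manifest is on your PATH and that the example manifests follow the manifest_*.json naming seen above; build_command and run_all are hypothetical helper names, and only the --pipelines and --workspace options shown above are used.

```python
import subprocess
from pathlib import Path


def build_command(manifest, pipelines_dir="tests/pipelines", workspace="workspace"):
    """Build the argument list for one ingestum-manifest invocation."""
    return [
        "ingestum-manifest",
        str(manifest),
        f"--pipelines={pipelines_dir}",
        f"--workspace={workspace}",
    ]


def run_all(pipelines_dir="tests/pipelines", workspace="workspace"):
    """Run every manifest_*.json in pipelines_dir.

    Returns a dict mapping each manifest file name to the exit code
    of its ingestum-manifest run.
    """
    results = {}
    for manifest in sorted(Path(pipelines_dir).glob("manifest_*.json")):
        completed = subprocess.run(build_command(manifest, pipelines_dir, workspace))
        results[manifest.name] = completed.returncode
    return results
```

Calling run_all() from the repository root runs each example in turn; a non-zero exit code in the returned dict points to a manifest worth investigating.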
Inspecting the results
Use ingestum-inspect to examine a document created by the pipeline:
$ ingestum-inspect workspace/c124d591-eebd-4796-8f3b-1fed90a5ebe8/output/document.json
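The path in this command contains the Source id from the manifest. If you want to locate or load the output document from a script, a sketch like the following may help. It assumes the workspace/<id>/output/document.json layout inferred from the single example above, and it makes no assumption about the document's fields beyond it being JSON; both helper names are hypothetical.

```python
import json
from pathlib import Path


def output_document_path(workspace, source_id):
    """Return the assumed location of the output document for a Source id."""
    return Path(workspace) / source_id / "output" / "document.json"


def load_document(workspace, source_id):
    """Load the output document as plain Python data (dicts and lists)."""
    with open(output_document_path(workspace, source_id), encoding="utf-8") as handle:
        return json.load(handle)
```

For the manifest shown earlier, output_document_path("workspace", "c124d591-eebd-4796-8f3b-1fed90a5ebe8") yields the same path that is passed to ingestum-inspect above.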
Generating a manifest from the command line
Use ingestum-generate-manifest to create a manifest:
$ ingestum-generate-manifest --pipeline pipeline_pdf_example --first_page 1 --last_page 5
You can also extend an existing manifest by adding the --manifest parameter
followed by the name of that manifest.