Ingestum
Transformers Reference
This is the reference page for transformer implementations and their format.
Transformer Base Class
AudioSourceCreateTextDocument
BiorxivSourceCreatePublicationCollectionDocument
CollectionDocumentAdd
CollectionDocumentJoin
CollectionDocumentMerge
CollectionDocumentRemoveOnConditional
CollectionDocumentTransform
CollectionDocumentTransformOnConditional
CSVSourceCreateTabularDocument
DocumentExtract
DocumentSourceCreateDocument
DOCXSourceCreateImage
DOCXSourceCreateTextDocument
EmailSourceCreateHTMLCollectionDocument
EmailSourceCreateTextCollectionDocument
EuropePMCSourceCreatePublicationCollectionDocument
FormDocumentSet
HTMLDocumentImagesExtract
HTMLDocumentSubReplaceForUnicode
HTMLDocumentSupReplaceForUnicode
HTMLSourceCreateDocument
HTMLSourceCreateImageSource
ImageSourceCreateReferenceTextDocument
ImageSourceCreateTabularDocument
ImageSourceCreateTextDocument
LitCovidSourceCreatePublicationCollectionDocument
PassageDocumentAddMetadataFromMetadata
PassageDocumentAddMetadata
PassageDocumentAddMetadataOnAttribute
PassageDocumentStringSplit
PassageDocumentTransformOnConditional
PDFSourceCreateFormDocument
PDFSourceCreatePublicationDocument
PDFSourceCreateTabularCollectionDocument
PDFSourceCreateTabularCollectionDocumentHybrid
PDFSourceCreateTabularCollectionDocumentWithDividers
PDFSourceCreateTabularCollectionDocumentWithRegexp
PDFSourceCreateTextDocument
PDFSourceCreateTextDocumentHybrid
PDFSourceCreateTextDocumentHybridReplacedExtractables
PDFSourceCreateTextDocumentOCR
PDFSourceCreateTextDocumentReplacedExtractables
PDFSourceCropCreateImageSource
PDFSourceCropExtract
PDFSourceImagesCreateResourceCollectionDocument
PDFSourceImagesExtract
PDFSourceShapesCreateResourceCollectionDocument
PDFSourceShapesExtract
PDFSourceTablesExtract
PDFSourceTextCreateTextCollectionDocument
PDFSourceTextExtract
PPTXSourceCreateTextDocument
ProquestSourceCreatePublicationCollectionDocument
ProquestSourceCreateXMLCollectionDocument
PubmedSourceCreatePublicationCollectionDocument
PubmedSourceCreateTextCollectionDocument
PubmedSourceCreateXMLCollectionDocument
RedditSourceCreateFormCollectionDocument
RedditSourceCreatePublicationCollectionDocument
ResourceCreateTextDocument
TabularDocumentCellTransposeOnConditional
TabularDocumentColumnsInsert
TabularDocumentColumnsStringReplace
TabularDocumentColumnsUpdateWithExtractables
TabularDocumentCreateFormCollection
TabularDocumentCreateFormCollectionWithHeaders
TabularDocumentCreateMDPassage
TabularDocumentFit
TabularDocumentJoin
TabularDocumentRowMergeOnConditional
TabularDocumentRowRemoveOnConditional
TabularDocumentStripUntilConditional
TextCreatePassageDocument
TextCreateXMLDocument
TextDocumentAddPassageMarker
TextDocumentHyphensRemove
TextDocumentJoin
TextDocumentStringReplace
TextSourceCreateDocument
TextSplitIntoCollectionDocument
TwitterSourceCreateFormCollectionDocument
TwitterSourceCreatePublicationCollectionDocument
XLSSourceCreateImage
XLSSourceCreateTabularCollectionDocument
XLSSourceCreateTabularDocument
XMLCreateTextDocument
XMLDocumentTagReplace
XMLSourceCreateDocument