ChunkExtractor

Overview

The ChunkExtractor class extracts chunks from a document. It implements the default chunking strategies as static methods: by block, by level, and by section number.
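
All three methods take the same inputs: a Container/Block tree, the document title and a tokenizer. The sketch below is a minimal, hypothetical model of those inputs. It reuses the names Container, Block, Chunk and ITokenizer from this page, but their fields, the tokenize() method and the make_chunk() helper are assumptions, not the library's actual definitions. It also assumes the tokenizer is used to measure chunk length in tokens.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Union


class ITokenizer(Protocol):
    """Hypothetical interface: only the method these sketches rely on."""

    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Toy tokenizer that splits on whitespace; a stand-in for a real ITokenizer."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


@dataclass
class Block:
    """A leaf holding one piece of document text (hypothetical fields)."""
    text: str


@dataclass
class Container:
    """A node grouping blocks and nested containers, e.g. a section (hypothetical fields)."""
    title: str = ""
    children: List[Union["Container", Block]] = field(default_factory=list)


@dataclass
class Chunk:
    """One extracted chunk, tagged with the document title and its token count."""
    document_title: str
    text: str
    token_count: int


def make_chunk(document_title: str, text: str, tokenizer: ITokenizer) -> Chunk:
    """Hypothetical helper: builds a Chunk, using the tokenizer only to measure length."""
    return Chunk(document_title, text, len(tokenizer.tokenize(text)))


root = Container("Root", [Block("Install the package first."),
                          Container("Sub", [Block("Then run it.")])])
print(make_chunk("Guide", "Install the package first.", WhitespaceTokenizer()).token_count)  # 4
```

Swapping WhitespaceTokenizer for a subword tokenizer would change the counts but not the structure.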

Methods

@staticmethod
def extract_by_block(
    container: Container, 
    document_title: str,
    tokenizer: ITokenizer
    ) -> List[Chunk]
Extracts chunks from the Container/Block structure, using each block as a chunk.
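
A minimal sketch of the by-block strategy, assuming it walks the Container tree depth first and turns each block into one chunk. The fields on Block, Container and Chunk, the tokenize() method and the iter_blocks() helper are hypothetical stand-ins, and skipping whitespace-only blocks is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Protocol, Union


class ITokenizer(Protocol):  # hypothetical: only tokenize() is assumed
    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


@dataclass
class Block:
    text: str


@dataclass
class Container:
    children: List[Union["Container", Block]] = field(default_factory=list)


@dataclass
class Chunk:
    document_title: str
    text: str
    token_count: int


def iter_blocks(container: Container) -> Iterator[Block]:
    """Yields every Block under `container`, depth first, in document order."""
    for child in container.children:
        if isinstance(child, Block):
            yield child
        else:
            yield from iter_blocks(child)


def extract_by_block(
    container: Container, document_title: str, tokenizer: ITokenizer
) -> List[Chunk]:
    """By-block sketch: every non-empty block becomes its own chunk."""
    return [
        Chunk(document_title, block.text, len(tokenizer.tokenize(block.text)))
        for block in iter_blocks(container)
        if block.text.strip()  # assumption: whitespace-only blocks are skipped
    ]


doc = Container([Block("Intro text."),
                 Container([Block("Nested block here."), Block("  ")])])
chunks = extract_by_block(doc, "Guide", WhitespaceTokenizer())
print([c.text for c in chunks])  # ['Intro text.', 'Nested block here.']
```

Nesting depth is ignored here: a block inside a subsection yields a chunk just like a top-level one.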

@staticmethod
def extract_by_level(
    container: Container, 
    document_title: str, 
    tokenizer: ITokenizer
    ) -> List[Chunk]
Extracts chunks from the Container/Block structure, grouping content by level.
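
The page does not say exactly what "level" means. One plausible reading, sketched below as an assumption, is that each container in the hierarchy yields one chunk built from the blocks directly under it, tagged with its depth. The fields, the tokenize() method, and the choice to prepend the container title to the chunk text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Union


class ITokenizer(Protocol):  # hypothetical: only tokenize() is assumed
    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


@dataclass
class Block:
    text: str


@dataclass
class Container:
    title: str
    children: List[Union["Container", Block]] = field(default_factory=list)


@dataclass
class Chunk:
    document_title: str
    level: int  # depth of the source container; the root is level 0
    text: str
    token_count: int


def extract_by_level(
    container: Container, document_title: str, tokenizer: ITokenizer
) -> List[Chunk]:
    """By-level sketch: one chunk per container, from the blocks directly under it."""
    chunks: List[Chunk] = []

    def visit(node: Container, level: int) -> None:
        # Only this node's own blocks; nested containers produce their own chunks.
        own = [child.text for child in node.children if isinstance(child, Block)]
        if own:
            # Assumption: the container title (if any) leads the chunk text.
            body = "\n".join(([node.title] if node.title else []) + own)
            chunks.append(Chunk(document_title, level, body, len(tokenizer.tokenize(body))))
        for child in node.children:
            if isinstance(child, Container):
                visit(child, level + 1)

    visit(container, 0)
    return chunks


doc = Container("", [
    Block("Preface."),
    Container("Install", [Block("Run pip."), Container("Linux", [Block("Use apt.")])]),
])
for chunk in extract_by_level(doc, "Guide", WhitespaceTokenizer()):
    print(chunk.level, repr(chunk.text))
# 0 'Preface.'
# 1 'Install\nRun pip.'
# 2 'Linux\nUse apt.'
```

Under this reading, a container that holds only nested containers produces no chunk of its own.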

@staticmethod
def extract_by_section_number(
    container: Container, 
    document_title: str, 
    tokenizer: ITokenizer
    ) -> List[Chunk]
Extracts chunks from the Container/Block structure, grouping content by section number.
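
A minimal sketch of grouping by section number, assuming each block carries a section number string such as "2.1" and that blocks sharing the same number are merged into one chunk in order of first appearance. The section_number field, the tokenize() method and the iter_blocks() helper are hypothetical; the real method may instead split on section boundaries or group by top-level number.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Protocol, Union


class ITokenizer(Protocol):  # hypothetical: only tokenize() is assumed
    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


@dataclass
class Block:
    text: str
    section_number: str  # hypothetical field, e.g. "2.1"


@dataclass
class Container:
    children: List[Union["Container", Block]] = field(default_factory=list)


@dataclass
class Chunk:
    document_title: str
    section_number: str
    text: str
    token_count: int


def iter_blocks(container: Container) -> Iterator[Block]:
    """Yields every Block under `container`, depth first, in document order."""
    for child in container.children:
        if isinstance(child, Block):
            yield child
        else:
            yield from iter_blocks(child)


def extract_by_section_number(
    container: Container, document_title: str, tokenizer: ITokenizer
) -> List[Chunk]:
    """Sketch: one chunk per distinct section number, in order of first appearance."""
    groups: Dict[str, List[str]] = {}  # dicts preserve insertion order (Python 3.7+)
    for block in iter_blocks(container):
        groups.setdefault(block.section_number, []).append(block.text)
    chunks = []
    for number, texts in groups.items():
        body = "\n".join(texts)
        chunks.append(Chunk(document_title, number, body, len(tokenizer.tokenize(body))))
    return chunks


doc = Container([
    Block("Overview.", "1"),
    Container([Block("Step one.", "2.1"), Block("Step two.", "2.1")]),
    Block("More overview.", "1"),
])
for chunk in extract_by_section_number(doc, "Guide", WhitespaceTokenizer()):
    print(chunk.section_number, repr(chunk.text))
# 1 'Overview.\nMore overview.'
# 2.1 'Step one.\nStep two.'
```

Note that this grouping merges non-adjacent blocks with the same number (both "1" blocks above), which differs from a purely sequential split.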

Usage Example

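# Called from a method of an object that holds the parsed document's
# root Container, its title and an ITokenizer.
chunks = ChunkExtractor.extract_by_block(self.root_container, self.title, self.tokenizer)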