Document Fragment.

#include <gdocfragment.h>

Classes

struct  Search
 

Public Member Functions

 GDocFragment (GDoc *doc, const GConceptRecord *root, size_t pos, size_t spos, size_t begin, size_t end, const R::RDate &proposed=R::RDate::Null)
 
 GDocFragment (GDoc *doc, size_t begin, size_t end, const R::RDate &proposed=R::RDate::Null)
 
int Compare (const GDocFragment &d) const
 
int Compare (const Search &search) const
 
GDoc * GetDoc (void) const
 
bool IsFlat (void) const
 
const GConceptRecord * GetRoot (void) const
 
size_t GetNbChildren (void) const
 
R::RCursor< const GConceptRecord > GetChildren (void) const
 
R::RDate GetProposed (void) const
 
size_t GetPos (void) const
 
size_t GetSyntacticPos (void) const
 
size_t GetBegin (void) const
 
size_t GetEnd (void) const
 
R::RString GetFragment (size_t max=0)
 
void AddChild (const GConceptRecord *rec)
 
bool Overlap (const GDocFragment *fragment) const
 
void Merge (const GDocFragment *fragment)
 
void Print (void) const
 
virtual ~GDocFragment (void)
 

Private Attributes

GDoc * Doc
 
const GConceptRecord * Root
 
R::RString Fragment
 
size_t Pos
 
size_t SyntacticPos
 
size_t Begin
 
size_t End
 
R::RDate Proposed
 
bool WholeDoc
 
R::RContainer< const GConceptRecord, false, false > Children
 

Detailed Description

Document Fragment.

The GDocFragment class provides a representation for a document fragment. In practice, a fragment is anchored at a position and is defined by a text window.

Each fragment is associated with the (concept) nodes that are responsible for its selection in a query:

  • a root node (Root).
  • a set of child nodes (Children).

There are three kinds of fragments:

  1. A fragment that represents a whole document. The root node is always null, and the child nodes are those responsible for the selection.
  2. A fragment that represents a single node (the root node). This is the case of a fragment in a flat document selected by a given word.
  3. A fragment that is rooted in a node (the root one) and was selected by a set of child nodes. This can be the case of an XML fragment selected by two tags. The root node is then the deepest common parent of those child nodes.
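
For the third kind, the root has to be derived from the child nodes. The following is a minimal sketch of how a deepest common parent could be computed, assuming each concept record exposes a pointer to its parent. `Node` and `DeepestCommonParent` are hypothetical names, not part of the GALILEI API, and "parent" is read strictly here, so a child node is never its own root.

```cpp
#include <vector>

// Hypothetical stand-in for GConceptRecord: only the parent link matters here.
struct Node {
    const Node* parent;  // null for a top-level node
};

// Returns the deepest node that is a strict ancestor of every child,
// or nullptr if there are no children or no common ancestor.
const Node* DeepestCommonParent(const std::vector<const Node*>& children) {
    if (children.empty()) return nullptr;
    // Candidates: the ancestor chain of the first child, deepest first.
    std::vector<const Node*> chain;
    for (const Node* p = children.front()->parent; p != nullptr; p = p->parent)
        chain.push_back(p);
    for (const Node* candidate : chain) {
        bool common = true;
        for (const Node* c : children) {
            bool found = false;
            for (const Node* p = c->parent; p != nullptr; p = p->parent)
                if (p == candidate) { found = true; break; }
            if (!found) { common = false; break; }
        }
        if (common) return candidate;  // first hit is the deepest one
    }
    return nullptr;
}
```

Because candidates are tested from the deepest ancestor upwards, the first one shared by all children is the deepest common parent; the quadratic walk is acceptable for the shallow trees of typical documents.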

Each search engine defines, possibly based on the document type, what a window is. The corresponding filter (GFilter class) is used to extract the text fragment from the document.

Two document fragments are considered identical if they belong to the same document and start at the same position.
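
The identity rule maps naturally onto a strcmp-style three-way comparison, which is the shape of the int-returning Compare methods. The sketch below assumes a fragment signature made of a document identifier and a start position; `FragmentKey` and `CompareKeys` are hypothetical, and whether GALILEI orders on the window beginning or on the centre position is not stated here.

```cpp
#include <cstddef>

// Hypothetical fragment signature: document identifier and start position.
struct FragmentKey {
    std::size_t docId;
    std::size_t begin;
};

// Negative, zero or positive, like strcmp. Ordering by document first and
// then by start position makes the result zero exactly for identical fragments.
int CompareKeys(const FragmentKey& a, const FragmentKey& b) {
    if (a.docId != b.docId) return a.docId < b.docId ? -1 : 1;
    if (a.begin != b.begin) return a.begin < b.begin ? -1 : 1;
    return 0;
}
```

The explicit comparisons avoid subtracting unsigned values, which would wrap around instead of producing a negative result.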

Warning
The GDocFragment class manages pointers to GConceptRecord objects. It is never responsible for their deallocation.
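
In practice, the concept records must outlive every fragment that refers to them. A minimal sketch of this non-owning relationship, with hypothetical `Record` and `FragmentSketch` types standing in for GConceptRecord and GDocFragment:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a concept record owned elsewhere.
struct Record {
    int id;
};

// The fragment only observes records; it never deletes them.
class FragmentSketch {
public:
    void AddChild(const Record* rec) { children_.push_back(rec); }
    std::size_t GetNbChildren() const { return children_.size(); }
    // No custom destructor: destroying the fragment leaves the records alive.
private:
    std::vector<const Record*> children_;  // non-owning pointers
};
```

Whoever owns the records (here a plain std::vector in the caller) stays responsible for releasing them, and must not do so while a fragment still points to them.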

Constructor & Destructor Documentation

GDocFragment ( GDoc * doc,
const GConceptRecord * root,
size_t pos,
size_t spos,
size_t begin,
size_t end,
const R::RDate & proposed = R::RDate::Null
)

Constructor of a document fragment.

Parameters
doc: Document.
root: Root concept record.
pos: Position of the fragment centre.
spos: Syntactic position of the fragment centre.
begin: Beginning position of the window.
end: End position of the window.
proposed: Date when the fragment was proposed.

GDocFragment ( GDoc * doc,
size_t begin,
size_t end,
const R::RDate & proposed = R::RDate::Null
)

Constructor of a document fragment representing the whole document. A window must be specified (but it can be an empty one).

Parameters
doc: Document.
begin: Beginning position of the window.
end: End position of the window.
proposed: Date when the fragment was proposed.

virtual ~GDocFragment ( void )
virtual

Destructor.

Member Function Documentation

int Compare ( const GDocFragment & d) const

Method to compare document fragments.

Parameters
d: Document fragment to compare with.

int Compare ( const Search & search) const

Method to compare a document fragment and a document fragment signature.

Parameters
search: Search structure (the fragment signature) to compare with.

GDoc* GetDoc ( void ) const

Get the document. A null pointer means that the URI is unknown in the session.

Returns
the pointer to the document.

bool IsFlat ( void ) const

Check whether the document fragment is flat. A fragment is considered flat in the following cases:

  1. It has no selected concept node.
  2. The selected concept node has no parent.
  3. The fragment represents a whole document.

Returns
true if it is flat, false otherwise.
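
The three rules translate directly into a predicate. A minimal sketch, assuming the selected concept node is the fragment's root and that a record knows its parent; `Node` and `IsFlatSketch` are hypothetical names.

```cpp
// Hypothetical stand-in for GConceptRecord: only the parent link matters here.
struct Node {
    const Node* parent;  // null for a top-level node
};

// Mirrors the rules above; `selected` may be null.
bool IsFlatSketch(const Node* selected, bool wholeDoc) {
    if (wholeDoc) return true;              // rule 3: whole document
    if (selected == nullptr) return true;   // rule 1: no selected node
    return selected->parent == nullptr;     // rule 2: node without parent
}
```

Testing the whole-document flag first matches the fact that such fragments always have a null root.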

const GConceptRecord* GetRoot ( void ) const

Get the root concept node corresponding to the fragment.

Returns
a pointer to a GConceptRecord.
Warning
The pointer may be null if the fragment corresponds to the whole document or if the structure trees were not built during the analysis.

size_t GetNbChildren ( void ) const
Returns
the number of children.

R::RCursor<const GConceptRecord> GetChildren ( void ) const
Returns
a cursor over the children.

R::RDate GetProposed ( void ) const
Returns
the date of the suggestion.

size_t GetPos ( void ) const

Get the position of the fragment centre.

Returns
the position of the fragment centre.

size_t GetSyntacticPos ( void ) const

Get the syntactic position of the fragment centre.

Returns
the syntactic position of the fragment centre.

size_t GetBegin ( void ) const

Get the beginning position of the fragment window.

Returns
the beginning position of the fragment window.

size_t GetEnd ( void ) const

Get the end position of the fragment window.

Returns
the end position of the fragment window.

R::RString GetFragment ( size_t max = 0)

Get the text fragment. If necessary, it is extracted from the file.

Parameters
max: Maximum number of characters to extract. If zero, the whole fragment is extracted.
Returns
the text fragment as an R::RString.
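
"If necessary, it is extracted from the file" suggests lazy extraction with a cache. The sketch below shows that pattern with standard types only: a caller-supplied extractor (a hypothetical stand-in for the document filter) is called at most once, and `max` truncates the cached text. It assumes a half-open window [begin, end) and counts bytes rather than Unicode characters, which is a simplification compared with R::RString.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

class TextFragmentSketch {
public:
    // Hypothetical extractor: returns the document text in [begin, end).
    using Extractor = std::function<std::string(std::size_t, std::size_t)>;

    TextFragmentSketch(std::size_t begin, std::size_t end, Extractor extract)
        : begin_(begin), end_(end), extract_(std::move(extract)) {}

    // max == 0 means "the whole fragment".
    std::string GetFragment(std::size_t max = 0) {
        if (!cached_) {                    // extract only on first use
            text_ = extract_(begin_, end_);
            cached_ = true;                // an empty text is a valid cache
        }
        if (max == 0 || max >= text_.size()) return text_;
        return text_.substr(0, max);
    }

private:
    std::size_t begin_, end_;
    Extractor extract_;
    std::string text_;
    bool cached_ = false;
};
```

A separate flag is used instead of testing for an empty string, so that a genuinely empty window is not re-extracted on every call.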

void AddChild ( const GConceptRecord * rec)

Add a child record to the document fragment. The interval of the fragment is adjusted if necessary in order to contain the child (except if the fragment represents the whole document).

Parameters
rec: Concept record to add.
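
The interval adjustment amounts to widening [begin, end] with the child's position unless the fragment covers the whole document. A minimal sketch, assuming each record occupies a single position `pos` (real records may span several characters); `ChildRecord` and `WindowSketch` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical child record: only its position in the document matters here.
struct ChildRecord {
    std::size_t pos;
};

struct WindowSketch {
    std::size_t begin, end;
    bool wholeDoc;
    std::vector<const ChildRecord*> children;  // non-owning

    void AddChild(const ChildRecord* rec) {
        children.push_back(rec);
        if (wholeDoc) return;               // whole-document windows stay fixed
        begin = std::min(begin, rec->pos);  // extend to the left if needed
        end = std::max(end, rec->pos);      // extend to the right if needed
    }
};
```

A child already inside the window leaves the interval unchanged.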

bool Overlap ( const GDocFragment * fragment) const

Check whether two fragments overlap. In practice, the method proceeds in the following steps:

  1. It checks whether at least one fragment represents the whole document.
  2. It checks whether both fragments have the same selected node or have no parent nodes (flat documents).
  3. It checks whether the two intervals overlap.

Parameters
fragment: Fragment to compare with.
Returns
true if the fragments overlap.
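
The three steps can be sketched as below. This follows one possible reading of step 2, which is an assumption: it acts as a gate, so the intervals are only compared when the fragments share the same selected node or are both flat, and are otherwise treated as disjoint. Both fragments are assumed to belong to the same document, and windows are taken as closed intervals. `Node`, `FragSketch` and `OverlapSketch` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical stand-in for GConceptRecord.
struct Node {
    const Node* parent;  // null for a top-level node
};

struct FragSketch {
    bool wholeDoc;
    const Node* root;        // selected node, may be null
    std::size_t begin, end;  // closed window [begin, end]
};

bool OverlapSketch(const FragSketch& a, const FragSketch& b) {
    // Step 1: a whole-document fragment overlaps any fragment of that document.
    if (a.wholeDoc || b.wholeDoc) return true;
    // Step 2 (assumed gate): same selected node, or both flat.
    auto flat = [](const Node* n) { return n == nullptr || n->parent == nullptr; };
    if (a.root != b.root && !(flat(a.root) && flat(b.root))) return false;
    // Step 3: two closed intervals overlap iff each starts before the other ends.
    return a.begin <= b.end && b.begin <= a.end;
}
```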

void Merge ( const GDocFragment * fragment)

Merge the children of another fragment into this one. The interval of the fragment is adjusted if necessary to contain all the children (except if the fragment represents the whole document).

Parameters
fragment: Fragment whose children are merged.
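
Merging can reuse the same interval logic as adding a single child. A minimal sketch, assuming single-position records as before; skipping records that are already present is a design choice of this sketch, not something the description above specifies. `ChildRecord` and `MergeSketch` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical child record: only its position in the document matters here.
struct ChildRecord {
    std::size_t pos;
};

struct MergeSketch {
    std::size_t begin, end;
    bool wholeDoc;
    std::vector<const ChildRecord*> children;  // non-owning

    void Merge(const MergeSketch& other) {
        for (const ChildRecord* rec : other.children) {
            // Assumed policy: do not list the same record twice.
            if (std::find(children.begin(), children.end(), rec) != children.end())
                continue;
            children.push_back(rec);
            if (!wholeDoc) {  // whole-document windows stay fixed
                begin = std::min(begin, rec->pos);
                end = std::max(end, rec->pos);
            }
        }
    }
};
```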

void Print ( void ) const

Print some information related to the document fragment.

Member Data Documentation

GDoc* Doc
private

Reference to the document.

const GConceptRecord* Root
private

Root concept record.

R::RString Fragment
private

The fragment.

size_t Pos
private

Position of the fragment.

size_t SyntacticPos
private

Syntactic position of the fragment.

size_t Begin
private

Beginning position of the fragment window.

size_t End
private

End position of the fragment window.

R::RDate Proposed
private

Date when the fragment was proposed.

bool WholeDoc
private

Does the fragment correspond to the whole document?

R::RContainer<const GConceptRecord,false,false> Children
private

Child concept records used by the query to select the node.