
/usr/man2/cat.3/pcrepartial.3.Z(/usr/man2/cat.3/pcrepartial.3.Z)





NAME

       PCRE - Perl-compatible regular expressions


PARTIAL MATCHING IN PCRE


       In  normal  use  of  PCRE,  if  the  subject  string  that is passed to
       pcre_exec() or pcre_dfa_exec() matches as far as it goes,  but  is  too
       short  to  match  the  entire  pattern, PCRE_ERROR_NOMATCH is returned.
       There are circumstances where it might be helpful to  distinguish  this
       case from other cases in which there is no match.

       Consider, for example, an application where a human is required to type
       in data for a field with specific formatting requirements.  An  example
       might be a date in the form ddmmmyy, defined by this pattern:

         ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

       If the application sees the user's keystrokes one by one, and can check
       that what has been typed so far is potentially valid,  it  is  able  to
       raise  an  error as soon as a mistake is made, possibly beeping and not
       reflecting the character that has been typed. This  immediate  feedback
       is  likely  to  be a better user interface than a check that is delayed
       until the entire string has been entered.
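
       The keystroke-by-keystroke check described above can be sketched in
       plain C without any regular expression library. The following is only
       an illustration of the three outcomes such an application needs to
       distinguish (complete, valid so far, invalid). It hand-codes the
       ddmmmyy format; the names field_state and classify_date() are
       hypothetical and are not part of PCRE. Treating an empty field as
       "valid so far" is a choice made by this sketch.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they are not PCRE constants.
   The order matters: a smaller value is a "better" outcome. */
enum field_state { FIELD_COMPLETE, FIELD_PARTIAL, FIELD_INVALID };

static const char *const months[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Classify the text that follows the day: a month name, then two digits. */
static enum field_state classify_tail(const char *rest)
{
    size_t len = strlen(rest);
    size_t i;

    if (len < 3) {
        /* Month not fully typed: acceptable if it starts some month name. */
        for (i = 0; i < 12; i++)
            if (strncmp(rest, months[i], len) == 0)
                return FIELD_PARTIAL;
        return FIELD_INVALID;
    }

    for (i = 0; i < 12; i++)
        if (strncmp(rest, months[i], 3) == 0)
            break;
    if (i == 12)
        return FIELD_INVALID;           /* three letters, but no such month */

    rest += 3;
    len -= 3;
    if (len > 2)
        return FIELD_INVALID;           /* too many characters for the year */
    for (i = 0; i < len; i++)
        if (!isdigit((unsigned char)rest[i]))
            return FIELD_INVALID;
    return len == 2 ? FIELD_COMPLETE : FIELD_PARTIAL;
}

/* Classify everything typed so far against the form ddmmmyy, where the
   day may have one or two digits. */
enum field_state classify_date(const char *typed)
{
    enum field_state one_digit_day, two_digit_day;

    if (typed[0] == '\0')
        return FIELD_PARTIAL;           /* nothing typed yet (sketch's choice) */
    if (!isdigit((unsigned char)typed[0]))
        return FIELD_INVALID;

    /* Try both readings of the day and keep the better outcome. */
    one_digit_day = classify_tail(typed + 1);
    two_digit_day = isdigit((unsigned char)typed[1])
                        ? classify_tail(typed + 2)
                        : FIELD_INVALID;
    return one_digit_day < two_digit_day ? one_digit_day : two_digit_day;
}
```

       With this sketch, classify_date("25dec3") yields FIELD_PARTIAL and
       classify_date("3juj") yields FIELD_INVALID, so an application could
       reject the second "j" as soon as it is typed.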

       PCRE supports the concept of partial matching by means of the
       PCRE_PARTIAL option, which can be set when calling pcre_exec() or
       pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
       PCRE_ERROR_NOMATCH  is converted into PCRE_ERROR_PARTIAL if at any time
       during the matching process the last part of the subject string matched
       part  of  the  pattern. Unfortunately, for non-anchored matching, it is
       not possible to obtain the position of the start of the partial  match.
       No captured data is set when PCRE_ERROR_PARTIAL is returned.

       When   PCRE_PARTIAL   is  set  for  pcre_dfa_exec(),  the  return  code
       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the  end  of
       the  subject is reached, there have been no complete matches, but there
       is still at least one matching possibility. The portion of  the  string
       that provided the partial match is set as the first matching string.

       Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
       the last literal byte in a pattern, and abandons  matching  immediately
       if  such a byte is not present in the subject string. This optimization
       cannot be used for a subject string that might match only partially.


RESTRICTED PATTERNS FOR PCRE_PARTIAL


       Because of the way certain internal optimizations  are  implemented  in
       the  pcre_exec()  function, the PCRE_PARTIAL option cannot be used with
       all patterns. These restrictions do not apply when  pcre_dfa_exec()  is
       used.  For pcre_exec(), repeated single characters such as

         a{2,4}

       and repeated single metasequences such as

         \d+

       are  not permitted if the maximum number of occurrences is greater than
       one.  Optional items such as \d? (where the maximum is one) are permit-
       ted.   Quantifiers  with any values are permitted after parentheses, so
       the invalid examples above can be coded thus:

         (a){2,4}
         (\d)+

       These constructions run more slowly, but for the kinds  of  application
       that  are  envisaged  for this facility, this is not felt to be a major
       restriction.

       If PCRE_PARTIAL is set for a pattern  that  does  not  conform  to  the
       restrictions,  pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
       (-13).


EXAMPLE OF PARTIAL MATCHING USING PCRETEST


       If the escape sequence \P is present  in  a  pcretest  data  line,  the
       PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
       uses the date example quoted above:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 25jun04\P
          0: 25jun04
          1: jun
         data> 25dec3\P
         Partial match
         data> 3ju\P
         Partial match
         data> 3juj\P
         No match
         data> j\P
         No match

       The first data string is matched  completely,  so  pcretest  shows  the
       matched  substrings.  The  remaining four strings do not match the com-
       plete pattern, but the first two are partial matches.  The  same  test,
       using  DFA  matching (by means of the \D escape sequence), produces the
       following output:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 25jun04\P\D
          0: 25jun04
         data> 23dec3\P\D
         Partial match: 23dec3
         data> 3ju\P\D
         Partial match: 3ju
         data> 3juj\P\D
         No match
         data> j\P\D
         No match

       Notice that in this case the portion of the string that was matched  is
       made available.


MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()


       When a partial match has been found using pcre_dfa_exec(), it is possi-
       ble to continue the match by  providing  additional  subject  data  and
       calling  pcre_dfa_exec() again with the PCRE_DFA_RESTART option and the
       same working space (where details of the  previous  partial  match  are
       stored).  Here  is  an  example  using  pcretest,  where  the \R escape
       sequence sets the PCRE_DFA_RESTART option and the  \D  escape  sequence
       requests the use of pcre_dfa_exec():

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\P\D
         Partial match: 23ja
         data> n05\R\D
          0: n05

       The  first  call has "23ja" as the subject, and requests partial match-
       ing; the second call  has  "n05"  as  the  subject  for  the  continued
       (restarted)  match.   Notice  that when the match is complete, only the
       last part is shown; PCRE does  not  retain  the  previously  partially-
       matched  string. It is up to the calling program to do that if it needs
       to.
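
       Since PCRE does not keep the earlier text, a calling program that
       wants the whole matched string has to collect the pieces itself. The
       following is a minimal sketch of such bookkeeping, assuming the caller
       copies the text reported for each partial match and then the text of
       the final match. The type match_text and its functions are
       hypothetical helpers, not part of the PCRE API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator owned by the calling program. */
struct match_text {
    char  *data;   /* NUL-terminated text collected so far, or NULL */
    size_t len;    /* number of bytes in data, excluding the NUL */
};

/* Append n bytes of matched text.
   Returns 0 on success, or -1 if memory could not be obtained (in which
   case the text collected so far is left unchanged). */
int match_text_append(struct match_text *m, const char *piece, size_t n)
{
    char *grown = realloc(m->data, m->len + n + 1);

    if (grown == NULL)
        return -1;
    memcpy(grown + m->len, piece, n);
    m->len += n;
    grown[m->len] = '\0';
    m->data = grown;
    return 0;
}

/* Release the collected text and reset the accumulator. */
void match_text_free(struct match_text *m)
{
    free(m->data);
    m->data = NULL;
    m->len = 0;
}
```

       In the pcretest run above, appending "23ja" after the first call and
       "n05" after the second leaves "23jan05" in the accumulator.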

       This facility can  be  used  to  pass  very  long  subject  strings  to
       pcre_dfa_exec(). However, some care is needed for certain types of pat-
       tern.

       1. If the pattern contains tests for the beginning or end  of  a  line,
       you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-
       ate, when the subject string for any call does not contain  the  begin-
       ning or end of a line.

       2.  If  the  pattern contains backward assertions (including \b or \B),
       you need to arrange for some overlap in the subject  strings  to  allow
       for  this.  For example, you could pass the subject in chunks that were
       500 bytes long, but in a buffer of 700 bytes, with the starting  offset
       set to 200 and the previous 200 bytes at the start of the buffer.

       3.  Matching a subject string that is split into multiple segments does
       not always produce exactly the same result as matching over one  single
       long  string.   The  difference arises when there are multiple matching
       possibilities, because a partial match result is given only when  there
       are  no  completed  matches  in a call to pcre_dfa_exec(). This means
       that as soon as the shortest match has been found,  continuation  to  a
       new  subject  segment  is  no  longer possible.  Consider this pcretest
       example:

           re> /dog(sbody)?/
         data> do\P\D
         Partial match: do
         data> gsb\R\P\D
          0: g
         data> dogsbody\D
          0: dogsbody
          1: dog

       The pattern matches the words "dog" or "dogsbody". When the subject  is
       presented  in  several  parts  ("do" and "gsb" being the first two) the
       match stops when "dog" has been found, and it is not possible  to  con-
       tinue.  On  the  other  hand,  if  "dogsbody"  is presented as a single
       string, both matches are found.

       Because of this phenomenon, it does not usually make  sense  to  end  a
       pattern that is going to be matched in this way with a variable repeat.
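
       The overlapping buffer described in point 2 above can be sketched as
       follows. This is only an illustration of the buffer handling, using
       the 500-byte chunk and 200-byte overlap from the text; the struct
       window and window_load() are hypothetical names, and the sketch does
       not call PCRE itself.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK   500   /* new subject bytes per call, as in the text */
#define OVERLAP 200   /* bytes retained from the end of the previous data */

/* Hypothetical sliding window over a long subject. */
struct window {
    char   buf[OVERLAP + CHUNK];  /* 700 bytes in total */
    size_t length;                /* number of valid bytes in buf */
    size_t start_offset;          /* where matching of new data begins */
};

/* Load the next chunk of n bytes (n must not exceed CHUNK).
   Up to OVERLAP bytes from the end of the previous contents are moved to
   the front of the buffer, and the new chunk is placed after them.
   Returns 0 on success, or -1 if the chunk is too large. */
int window_load(struct window *w, const char *chunk, size_t n)
{
    size_t keep = w->length < OVERLAP ? w->length : OVERLAP;

    if (n > CHUNK)
        return -1;
    /* The source and destination regions may overlap, hence memmove(). */
    memmove(w->buf, w->buf + w->length - keep, keep);
    memcpy(w->buf + keep, chunk, n);
    w->length = keep + n;
    w->start_offset = keep;
    return 0;
}
```

       Starting from a zero-initialized window, the first load gives a start
       offset of 0, and each later load of a full chunk gives a length of 700
       and a start offset of 200. A caller would then pass buf, length and
       start_offset as the subject, length and startoffset arguments of
       pcre_dfa_exec().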

Last updated: 28 February 2005
Copyright (c) 1997-2005 University of Cambridge.

                                                                PCREPARTIAL(3)
