16/09/2010

REGEX SAMPLE


REPORT ZRR_SAMPLE.

TYPE-POOLS: abap.

DATA: lt_html       TYPE TABLE OF string,
      lo_pattern    TYPE REF TO cl_abap_regex,
      lo_matcher    TYPE REF TO cl_abap_matcher,
      ls_match      TYPE match_result,
      lv_header     TYPE string,
      lv_header_txt TYPE string.

FIELD-SYMBOLS: <lfs_html> TYPE string,
               <lfs_sub>  TYPE submatch_result.

START-OF-SELECTION.

* Build the HTML document sample:
APPEND '<html><head></head><body>' TO lt_html.
APPEND '<H1>String Processing Techniques</H1>' TO lt_html.
APPEND '<h2>ABAP Character Types</h2>' TO lt_html.
APPEND '<H2>Developing a String Library</h2>' TO lt_html.
APPEND '<h3>Designing the API</h3>' TO lt_html.
APPEND '<h3>...</h3>' TO lt_html.
APPEND '</body></html>' TO lt_html.

* Extract a table of contents from the HTML document:
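TRY.
*   Compile the regex pattern. Group 1 captures the tag name (h1-h6),
*   [^>]* skips any attributes without running past the end of the tag,
*   and group 2 captures the heading text:
    CREATE OBJECT lo_pattern
      EXPORTING
        pattern     = '<(h[1-6])[^>]*>(.*)</\1>'
        ignore_case = abap_true.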

*   Create a matcher to search the example HTML document:
    lo_matcher = lo_pattern->create_matcher( table = lt_html ).

*   Add each match to the table of contents:
    WHILE lo_matcher->find_next( ) EQ abap_true.

*     Retrieve the next match found in the HTML document:
      ls_match = lo_matcher->get_match( ).

      READ TABLE lt_html INDEX ls_match-line ASSIGNING <lfs_html>.

*     Each parenthesized group in the pattern yields one submatch entry:
*     submatch 1 is the tag name (e.g. H1), which the backreference \1
*     reuses to find the closing tag, and submatch 2 is the heading text:
      LOOP AT ls_match-submatches ASSIGNING <lfs_sub>.

        IF sy-tabix EQ 1.
          lv_header = <lfs_html>+<lfs_sub>-offset(<lfs_sub>-length).
        ELSEIF sy-tabix EQ 2.
          lv_header_txt = <lfs_html>+<lfs_sub>-offset(<lfs_sub>-length).
        ENDIF.

      ENDLOOP.

*     Output the table of contents record:
      CASE lv_header.
        WHEN 'H1' OR 'h1'.
          WRITE: / lv_header_txt.
        WHEN 'H2' OR 'h2'.
          WRITE: / '##', lv_header_txt.
        WHEN 'H3' OR 'h3'.
          WRITE: / '####', lv_header_txt.
      ENDCASE.

    ENDWHILE.

  CATCH cx_sy_regex.
    " Invalid regular expression pattern.
  CATCH cx_sy_matcher.
    " Problem creating the matcher instance.

ENDTRY.


Source: SAP
