Journal: Regex gurus, please read
I need some help. I'm trying to write a regular expression to parse a chunk of text, but I'm not having much luck, mainly because a lot of the text may be optional.
The text may look like this:
Message . . . . : Object Q000006003 in COLOSYS02 type *USRQ deleted.
It could look like this:
From user . . . . . . . . . : QSYS
Message . . . . : Subsystem is ending controlled.
It could even look like this:
Message . . . . : Job ended abnormally.
Cause . . . . . : A SIGTERM signal was received for the job. The action for the signal was to terminate the job.
Or like this:
Message . . . . : Job 818753/ONEWORLD/JDENET_K ended on 06/22/05 at 15:53:39; 18 seconds used; end code 30.
Cause . . . . . : Job 818753/ONEWORLD/JDENET_K completed on 06/22/05 at 15:53:39 after it used 18 seconds processing unit time. The job had ending code 30. The job ended after 1 routing steps with a secondary ending code of 0. The job ending codes and their meanings are as follows: 0 - The job completed normally. 10 - The job completed normally during controlled ending or controlled subsystem ending. 20 - The job exceeded end severity (ENDSEV job attribute). 30 - The job ended abnormally. 40 - The job ended before becoming active. 50 - The job ended while the job was active. 60 - The subsystem ended abnormally while the job was active. 70 - The system ended abnormally while the job was active. 80 - The job ended (ENDJOBABN command). 90 - The job was forced to end after the time limit ended (ENDJOBABN command).
Recovery . . . : For more information, see the Work Management topic in the Information Center, http://www.ibm.com/eserver/iseries/infocenter.
The formatting is not exactly as shown, because Slashdot is !helpfully reformatting some parts of the ecode tag. The main issue is that I need to parse out each of the fields (From user, Message, Cause, Recovery, etc.), any of which may or may not show up in a particular message. This message type seems to have the fewest fields to parse; the others I need to handle will have many more, with all sorts of gotchas to look out for.
I'd really like to use a regex for this, since it seems to make things a lot simpler. The other option, which seems to involve lots of strpos and substr calls, is much uglier (uglier than regexes, ha).
I've been playing with optional groups, non-greedy matching, etc., but I'm not having a lot of luck. Any help would be much appreciated.
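One direction that might work, as a rough sketch only: rather than one big pattern with an optional group per field, match every "Label . . . :" marker and capture non-greedily up to the next marker or the end of the text. The function name parseFields is made up for illustration, and the sketch assumes every label starts with a capital letter, contains only letters and spaces, and is followed by a run of spaced dots and a colon. A value that happens to contain text shaped like that would be split in the wrong place.

```php
<?php
// Hypothetical helper (not part of PHP): returns an array of label => value.
// Assumption: each field starts with a label such as "Cause . . . . . :".
function parseFields(string $text): array
{
    $label = '[A-Z][A-Za-z ]*?';   // capitalized label, matched lazily
    $dots  = '(?:\s\.)+\s?:';      // " . . . :" separator after the label

    // Capture the label, then lazily capture the value up to the next
    // label marker (lookahead) or the end of the text. The s modifier
    // lets "." cross line breaks so wrapped values stay together.
    $pattern = '/(' . $label . ')' . $dots . '\s*(.*?)'
             . '(?=\s*' . $label . $dots . '|\z)/s';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $fields = [];
    foreach ($matches as $m) {
        // Collapse line wraps and repeated whitespace inside the value.
        $fields[trim($m[1])] = trim(preg_replace('/\s+/', ' ', $m[2]));
    }
    return $fields;
}

$sample = "Message . . . . : Job ended abnormally.\n"
        . "Cause . . . . . : A SIGTERM signal was received for the job. "
        . "The action for the signal was to terminate the job.";
$fields = parseFields($sample);
// $fields['Message'] is "Job ended abnormally."
// Fields that are missing from the text are simply absent from the array.
```

Since absent fields just never show up as keys, the optional-ness is handled by isset() checks on the result instead of by optional groups in the pattern.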
Oh yeah, this is PHP, so I don't have sed or other Perl stuff available.
Thank you.