Regular Expression Hell

Posted: Mon Jan 26, 2009 9:58 am
by M_D_K
OK, I really hope someone knows regular expressions. I need to read the output of ps (the Linux program) and extract the fields. The output looks like this:

Code: Select all

//PID, USER, ARGS
 6461 mdk      /bin/sh /usr/bin/x-session-manager
 6560 root     start_kdeinit --new-startup +kcminit_startup
How the hell do I represent that as a regular expression? I have only dabbled in regular expressions once before, and this is out of my league.
So far I can only tell that it's formatted correctly using:

Code: Select all

"^[ ]*[[:digit:]]"
And even that is only a partial expression; it pretty much just checks for the spaces at the start and then for the PID.
EDIT: Got it cracked for the most part

Code: Select all

^\\s*([0-9]*)\\s*([A-Za-z0-9_]*)\\s*([A-Za-z0-9_]*)
gotta use the double back slashes to stop gcc bitching about "unknown escape sequence". The only thing left to do is make it so apps that are enclosed in brackets ([]) are read too.
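
For anyone following along, here is a minimal sketch of how the extraction could look with POSIX regcomp()/regexec() in C. It is not the code from this post: the function name parse_ps_line and the shared buffer size are made up for illustration, and the pattern uses [[:space:]] and [^[:space:]] instead of \\s because those are the spellings POSIX extended syntax guarantees. Since the third group accepts any run of non-blank characters, names in brackets such as [kdeinit] should be picked up as well.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: splits one line of `ps` output into PID, USER and
 * the first word of ARGS. Each output buffer must hold `cap` bytes.
 * Returns 0 on success, -1 if the line does not match or on error. */
int parse_ps_line(const char *line, char *pid, char *user, char *app, size_t cap)
{
    /* POSIX extended regex: optional leading blanks, digits, blanks,
     * a non-blank word, blanks, another non-blank word. */
    static const char *pattern =
        "^[[:space:]]*([0-9]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)";
    regex_t re;
    regmatch_t m[4];            /* m[0] = whole match, m[1..3] = groups */
    char *out[3];
    int i;

    if (cap == 0)
        return -1;
    out[0] = pid;
    out[1] = user;
    out[2] = app;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, line, 4, m, 0) != 0) {
        regfree(&re);
        return -1;
    }
    for (i = 0; i < 3; i++) {
        size_t len = (size_t)(m[i + 1].rm_eo - m[i + 1].rm_so);
        if (len >= cap)
            len = cap - 1;      /* truncate rather than overflow */
        memcpy(out[i], line + m[i + 1].rm_so, len);
        out[i][len] = '\0';
    }
    regfree(&re);
    return 0;
}
```

Fed the first sample line above, this should fill in "6461", "mdk" and "/bin/sh"; the "//PID, USER, ARGS" header line has no leading digits, so it is rejected with -1.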

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 1:25 pm
by trufun202
Sorry dude, I'd love to help you, but there are few things in this world that I despise more than regular expressions... :evil:

I usually go to regexlib.com, find something close to what I need, then hack it. However, in my case, I usually need regex for input validation.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 2:09 pm
by dandymcgee
What in god's name is that used for?!

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 2:10 pm
by MarauderIIC
M_D_K wrote:OK, I really hope someone knows regular expressions. I need to read the output of ps (the Linux program) and extract the fields. The output looks like this:

Code: Select all

//PID, USER, ARGS
 6461 mdk      /bin/sh /usr/bin/x-session-manager
 6560 root     start_kdeinit --new-startup +kcminit_startup
I'm not sure, but what about
^\s*(\d+)\s+(\w+)\s+([^\s]*)\s*$

Start of line, zero or more whitespace characters, one or more digits (PID), one or more whitespace, one word (USER), one or more whitespace, zero or more of everything that's not whitespace (APP), zero or more whitespace to end of string? That's what I intended anyway.

"\s" is whitespace so it includes a tab, too. But if it's really spaces then \s+, I think, right?

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 2:27 pm
by M_D_K
Damn, yours is close to what I did. I did it the crazy way:

Code: Select all

^\\s*([0-9]*)\\s*([A-Za-z0-9_]*)\\s*([A-Za-z0-9_\\./\\-\\s]*)
Didn't put $ at the end because I only wanted the app name and not all the crap that got passed to it. Oh, BTW, \w wouldn't work: I'm not using advanced regex, just extended, hence the spelled-out second backreference.

And yeah \s+ is one or more whitespaces.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 2:52 pm
by LeonBlade
Oh god not these things...

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 2:52 pm
by MarauderIIC
^\s for app name won't work for you? ... duh yeah, of course it won't, they can have spaces. So I would say first alphanumeric char and everything after that to newline or end?

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:04 pm
by M_D_K
OK, I'm confused. My pattern works: I can extract pid, user, and appname (which is part of args). For example:

Code: Select all

16373 mdk      kio_file [kdeinit] file /tmp/ksocket-mdk/klauncherizZalb.s
[...after extraction]
16373 kio_file
The extracted stuff then gets put into a list control. Also, you'll be hard-pressed to find a Linux app with spaces in its name. There is a long-standing tradition of using underscores in place of spaces.

But I added it anyway:

Code: Select all

//third back reference
([A-Za-z0-9_\\./\\-\\s]*) //that \\s is the whitespace double slash because well I already said in my first post.
I wrote: gotta use the double back slashes to stop gcc bitching about "unknown escape sequence".
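
One caveat worth checking here: under POSIX rules a backslash inside a bracket expression is an ordinary character, so [\\s] in an extended regex may match a literal backslash or the letter s instead of whitespace. The portable spelling inside brackets is [:space:]. A small sketch to try this on your own system follows; the helper name ere_matches is made up for illustration, and behaviour outside POSIX-conforming regcomp() implementations is not something this sketch can promise.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX extended regex `pattern`
 * matches somewhere in `text`, 0 if it does not, and -1 if the pattern
 * fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no group offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Comparing ere_matches("^[\\s]$", " ") with ere_matches("^[[:space:]]$", " ") shows whether the backslash form really covers spaces on a given libc.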

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:30 pm
by MarauderIIC
"//that \\s is the whitespace double slash because well I already said in my first post."

Yeah I know. Did I forget to change it? Sorry, my bad.


Confused about what? Is the problem that it's not grabbing everything? Looks like we just need another \\s+(whatever) ? I only had three because I missed the single space between appname and path.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:37 pm
by M_D_K
MarauderIIC wrote:^\s for app name won't work for you? ... duh yeah, of course it won't, they can have spaces. So I would say first alphanumeric char and everything after that to newline or end?
That's what confused me.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:38 pm
by LeonBlade
What exactly are you making?

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:41 pm
by M_D_K
It's a secret. Marauder knows, and I'm trusting him not to tell. All will be revealed when it's done.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 3:44 pm
by LeonBlade
Ahh, I understand...

I always do things the hard way, not using expressions to get data like this...
I guess if you took the time to learn from scratch, it would be easy.

Re: Regular Expression Hell

Posted: Mon Jan 26, 2009 4:15 pm
by MarauderIIC
M_D_K wrote:
MarauderIIC wrote:So I would say first alphanumeric char and everything after that to newline or end?
That's what confused me.
First alphanumeric char is [A-Za-z0-9], or whatever; add all the chars that can possibly be the first character in a filename.
([A-Za-z0-9]+.*)
Everything after that is .*
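
A rough sketch of that "first character, then everything to the end" idea in C with POSIX extended regex is below. The helper name ps_args_start is invented for illustration, and it assumes each line has the "PID USER ARGS" layout from the first post. Instead of listing every legal first character, it takes "any non-blank character" as the first one, which also covers paths like /bin/sh.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to where the ARGS column starts
 * inside `line`, or NULL if the line does not look like "PID USER ARGS". */
const char *ps_args_start(const char *line)
{
    /* Group 1 = first non-blank character after USER, then the rest
     * of the line (the ".*" part). */
    static const char *pattern =
        "^[[:space:]]*[0-9]+[[:space:]]+[^[:space:]]+[[:space:]]+([^[:space:]].*)$";
    regex_t re;
    regmatch_t m[2];            /* m[0] = whole match, m[1] = ARGS */
    const char *start = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, line, 2, m, 0) == 0)
        start = line + m[1].rm_so;
    regfree(&re);
    return start;
}
```

The returned pointer refers into the original line, so nothing is copied; the caller can cut it at the first space to get just the app name.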