Summary
The old regex.h method of using regular expressions with regcomp() and regexec() is not a great solution:
- It isn't usually available on Windows.
- It requires that users remember to call regfree(); skipping it leaks memory.
- It is overly complex.
I'd much prefer to use std::regex from C++11, or the Boost equivalent boost::regex.
Of all the C++11 features, regex support was possibly the last to arrive in GNU GCC: it didn't make it in until GCC 4.9, released in April 2014.
For this reason, and because I routinely need regular expressions both on Windows with an older version of Microsoft Visual Studio and with GCC on Linux, I've been using boost::regex. Apart from the namespace (boost:: versus std::) and the default pattern syntax (Perl for Boost, ECMAScript for std::regex), this example code should behave the same with C++11's std::regex.
The following example functions perform much the same job as the regex.h code I posted nearly 5 years ago:
#include <string>
#include <vector>
#include <boost/regex.hpp>
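typedef std::vector<std::string> VStr;
// Forward declaration: the 2-parameter overload below calls the 3-parameter one, which is defined after it.
bool my_regex_find( const std::string &str, const std::string &pattern, VStr &groups );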
/** Find out if a pattern exists in a string.
* @return @p true if the pattern matches
* @return @p false if the pattern doesn't match
* @throw regex_error -> std::runtime_error -> std::exception
*/
bool my_regex_find( const std::string &str, const std::string &pattern )
{
VStr groups;
return my_regex_find( str, pattern, groups );
}
/** Find a pattern in a string, and remember the groupings (if any).
* @return @p true if the pattern matches
* @return @p false if the pattern doesn't match
* @throw regex_error -> std::runtime_error -> std::exception
*/
bool my_regex_find( const std::string &str, const std::string &pattern, VStr &groups )
{
groups.clear();
boost::regex exp( pattern ); // default syntax is Perl, but this can be changed with a 2nd parameter (the syntax flags)
boost::smatch what; // string matches
bool result = boost::regex_search( str, what, exp ); // also see boost::regex_match()
// remember the groupings (if any)
for ( size_t idx = 0; result && idx < what.size(); idx ++ )
{
groups.push_back( std::string( what[idx].first, what[idx].second ) );
}
return result;
}
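Because std::regex should be a near drop-in replacement, here is a minimal sketch of the same two functions ported to C++11's <regex>, assuming a compiler with working regex support (GCC 4.9 or later). The changes are the std:: namespace, what[idx].str() instead of building the string from iterators, and defining the 3-parameter overload first so no forward declaration is needed. Keep in mind that std::regex defaults to the ECMAScript grammar rather than Perl.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

typedef std::vector<std::string> VStr;

// Find a pattern in a string, and remember the groupings (if any).
// Throws std::regex_error if the pattern is not a valid regular expression.
bool my_regex_find( const std::string &str, const std::string &pattern, VStr &groups )
{
    groups.clear();
    std::regex exp( pattern );   // default grammar is std::regex::ECMAScript
    std::smatch what;            // string matches (iterators into str)
    bool result = std::regex_search( str, what, exp );
    // remember the groupings (if any); an unmatched optional group yields ""
    for ( std::size_t idx = 0; result && idx < what.size(); idx ++ )
    {
        groups.push_back( what[idx].str() );
    }
    return result;
}

// Find out if a pattern exists in a string, discarding the groupings.
bool my_regex_find( const std::string &str, const std::string &pattern )
{
    VStr groups;
    return my_regex_find( str, pattern, groups );
}
```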
Several things worth pointing out:
- boost::regex_search() will find the first match in the given string, while boost::regex_match() must match the entire string for it to be successful.
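- If you want boost::regex_search() to behave like boost::regex_match(), anchor your pattern with "^...$" so that it must match from the start to the end of the string. (Boost's default Perl syntax also lets ^ and $ match at embedded line breaks, so for multi-line input \A and \z are the stricter anchors.)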
- I've left out error handling. If your pattern is not a valid regular expression, the boost::regex and std::regex constructors will throw an exception of type boost::regex_error or std::regex_error. A try/catch block where you log e.what() can be quite useful.
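To make the search-versus-match distinction and the error handling concrete, here is a small sketch using std::regex (the boost:: calls have the same shape). The helpers search_digits(), match_digits() and try_compile() are hypothetical names made up for this illustration.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: succeeds if one or more digits appear anywhere in s.
bool search_digits( const std::string &s )
{
    return std::regex_search( s, std::regex( "[0-9]+" ) );
}

// Hypothetical helper: succeeds only if the whole of s is digits.
bool match_digits( const std::string &s )
{
    return std::regex_match( s, std::regex( "[0-9]+" ) );
}

// Hypothetical helper: compile a pattern into out, logging e.what() on failure
// instead of letting std::regex_error escape to the caller.
bool try_compile( const std::string &pattern, std::regex &out )
{
    try
    {
        out.assign( pattern );
        return true;
    }
    catch ( const std::regex_error &e )
    {
        std::cerr << "invalid pattern \"" << pattern << "\": " << e.what() << std::endl;
        return false;
    }
}
```

With "abc123", search_digits() succeeds while match_digits() fails; a pattern such as "[0-9" (unclosed bracket) makes try_compile() log the error and return false.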
Using these example functions is quite simple:
if ( my_regex_find( "abc123xyz", "[0-9]" ) ... // this returns "true" (it matches the number "1")
if ( my_regex_find( "abc123xyz", "[a-z]+" ) ... // this returns "true" (it matches "abc")
if ( my_regex_find( "abc123xyz", "^[a-z]+$" ) ... // this returns "false" (it fails to match the entire string)
VStr results;
my_regex_find( "this is a test", "\\s([a-z]+)", results ); // this returns "true" and results[1] == "is"
// results.size() will == 2
// results[0] is the entire match
// results[1] is the first (and only) group
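For reference, the same groups example can be written against std::smatch directly. This sketch uses a hypothetical first_group() helper and MatchInfo struct (not part of the code above) to show what index 0 and index 1 hold, along with the position, prefix and suffix that the match object also records.

```cpp
#include <regex>
#include <string>

// Hypothetical result type, for illustration only.
struct MatchInfo
{
    bool found;
    std::string whole;   // what.str(0): the entire match
    std::string group1;  // what.str(1): the first parenthesised group
    long position;       // offset of the entire match within the input
    std::string prefix;  // text before the match
    std::string suffix;  // text after the match
};

// Hypothetical helper: search str for pattern and report its first group.
MatchInfo first_group( const std::string &str, const std::string &pattern )
{
    MatchInfo info = { false, "", "", -1, "", "" };
    std::regex exp( pattern );
    std::smatch what;
    // what.size() is 1 + the number of groups, so > 1 means a group exists
    if ( std::regex_search( str, what, exp ) && what.size() > 1 )
    {
        info.found = true;
        info.whole = what.str( 0 );
        info.group1 = what.str( 1 );
        info.position = static_cast<long>( what.position( 0 ) );
        info.prefix = what.prefix().str();
        info.suffix = what.suffix().str();
    }
    return info;
}
```

With "this is a test" and "\\s([a-z]+)", whole is " is" (the leading space comes from \s, which sits outside the group), group1 is "is", and position is 4.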
Adding boost::regex to an existing CMake file is also relatively easy. For example:
...
SET ( Boost_DEBUG 0 )
SET ( Boost_USE_STATIC_LIBS ON )
SET ( Boost_USE_MULTITHREADED ON )
SET ( Boost_USE_STATIC_RUNTIME OFF )
FIND_PACKAGE ( Boost REQUIRED COMPONENTS regex system )
FIND_PACKAGE ( Threads REQUIRED )
INCLUDE_DIRECTORIES ( AFTER ${Boost_INCLUDE_DIR} )
...
ADD_EXECUTABLE ( test test.cpp )
TARGET_LINK_LIBRARIES ( test ${CMAKE_THREAD_LIBS_INIT} ${Boost_LIBRARIES} )