Regex to parse a line and capture a string and a comma-separated number























I am trying to parse a file with lines similar to:



       John David James (DEM) .  .  .  .  .  .     7,808   10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29


I need to capture the name and the number in the first column. The end result would be



John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188


I've tried



\s*\b(.*)\b(\s*\.\s*.*)(\d+,\d+|\d+)\b
\s*\b(.*)\b(\.|\.\s)+\b(\d+,\d+|\d+)\b


Any suggestions?


































  • Is the data always column aligned?
    – Salman A
    Nov 9 at 21:26










  • @SalmanA yes. They use periods and spaces to separate the names from the numbers
    – sho
    Nov 9 at 21:27










  • Then use substr. Not regex.
    – Salman A
    Nov 9 at 21:28










  • @SalmanA the length of the name varies and the value could be 1 - 5 digits.
    – sho
    Nov 9 at 21:32

















php regex string parsing delimited-text














edited Nov 10 at 15:27









Salman A











asked Nov 9 at 21:20









sho



























3 Answers




















accepted










This pattern captures the name by matching lazily up to the run of dots that follows it, then captures the first run of digits and commas as the number.

A loop then builds the new array, removing the comma from each number.



$str = '       John David James (DEM) .  .  .  .  .  .     7,808   10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29';
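// \.\s+\. tolerates one or more spaces between the leader dots
preg_match_all('/^\s*(.*?)\s*\.\s+\..*?([\d,]+)/m', $str, $matches);

$new = [];
foreach ($matches[1] as $key => $name) {
    // Append each "name,count" pair, dropping the thousands separator
    $new[] = $name . "," . str_replace(",", "", $matches[2][$key]);
}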


var_dump($new);


Output:



array(5) {
[0]=>
string(27) "John David James (DEM),7808"
[1]=>
string(26) "Marvin D. Scott (DEM),6548"
[2]=>
string(32) "Maria "Mary" Williams (DEM),4551"
[3]=>
string(22) "Dwayne R. Johnson,4322"
[4]=>
string(12) "WRITE-IN,188"
}


https://3v4l.org/SdqoZ
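
If the name and the count are needed as separate values, as the asker mentions in the comment on this answer, the same idea can be sketched with named capture groups. This is only an illustration: the function name parseTallyLine and the array keys are assumptions, not part of the answer, and the pattern is a variant that allows any amount of whitespace between the leader dots.

```php
<?php
// Hypothetical helper (not from the answer): parse one tally line into
// a name and an integer count using named capture groups.
function parseTallyLine(string $line): ?array
{
    // name: lazy match up to the first "dot, whitespace, dot" leader;
    // count: the first run of digits and commas after the leader
    $pattern = '/^\s*(?<name>.*?)\s*\.\s+\..*?(?<count>[\d,]+)/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // the line does not look like "name . . . count"
    }
    return [
        'name'  => $m['name'],
        'count' => (int) str_replace(',', '', $m['count']),
    ];
}

$rows = [];
$sample = [
    'Marvin D. Scott (DEM) . . . . . . 6,548 9.55',
    'WRITE-IN. . . . . . . . . . . 188 .29',
];
foreach ($sample as $line) {
    $row = parseTallyLine($line);
    if ($row !== null) {
        $rows[] = $row;
    }
}

// The rows can then be serialised, for example as JSON.
echo json_encode($rows), "\n";
```

Returning null for lines that do not match keeps headers or blank lines out of the result without extra checks in the loop.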



























  • Thanks @Andreas. This works great. This version actually simplifies my work even more since I can work with the name and count separately.
    – sho
    Nov 9 at 23:20































You can achieve it with an ungreedy (lazy) regexp.



To catch the name, we want "a sequence of any characters followed by a sequence of dots and spaces". The equivalent regexp is (.+)[. ]*



But quantifiers are greedy by default. What happens? The first part, (.+), does not stop at the first dot or the first space it meets. Why? Because a greedy quantifier first consumes as much as it can, up to the end of the line, and gives characters back only as far as needed for the rest of the pattern to match.



The same goes for the full regexp in the working code below: in greedy mode, the first capturing group would capture beyond the name field.



We need to tell the engine to match as little as possible.



<?php
$lines = '
John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29
';
$lines = explode("\n", $lines);

// Here, the U flag sets the ungreedy mode
$pattern = '/^\s*(\S.+\S)[. ]+([0-9]+)(?:,([0-9]+))?\s.*$/U';
echo "<pre>";
foreach ($lines as $line) {
    // Here: - ${1} captures the name,
    //       - ${2} the digits before the thousands separator (or the whole number),
    //       - ${3} the digits after the separator, if any
    echo preg_replace($pattern, '${1},${2}${3}', $line) . "\n";
}
echo "</pre>";
?>


Result:



John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188
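
The greedy versus ungreedy behaviour described in this answer can be seen on a single line. The sketch below uses a shortened pattern, chosen here only for illustration, and compares a greedy group, a lazy group written with +?, and the same pattern under the U flag.

```php
<?php
$line = 'Dwayne R. Johnson. . . . . . . . 4,322 8.22';

// Greedy: (.+) first grabs the whole line, then backtracks only as far as
// needed for "[. ]+ followed by a digit" to match, so the capture runs
// far past the name.
preg_match('/^(.+)[. ]+\d/', $line, $greedy);

// Lazy: (.+?) grows one character at a time and stops at the first point
// where a run of dots/spaces followed by a digit can match.
preg_match('/^(.+?)[. ]+\d/', $line, $lazy);

// The U flag inverts the default, so the plain (.+) behaves lazily here.
preg_match('/^(.+)[. ]+\d/U', $line, $flagged);

echo "greedy:  ", $greedy[1], "\n";
echo "lazy:    ", $lazy[1], "\n";
echo "U flag:  ", $flagged[1], "\n";
```

Writing +? on the one quantifier that needs it is an alternative to the U flag, which changes every quantifier in the pattern at once.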
























  • Split()? From manual: "This function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0." Just to be clear, I did not downvote. I just wrote this as why use a deprecated function.
    – Andreas
    Nov 9 at 22:06












  • Yes, I saw your comment and I fixed my code. I was busy adding more explanations. Thanks.
    – Amessihel
    Nov 9 at 22:19












  • Just another heads up, OP does not want the comma in the number.
    – Andreas
    Nov 9 at 22:25










  • Thanks for the extremely detailed description!
    – sho
    Nov 9 at 23:12










  • Thanks Amessihel. Your response was great but I picked @Andreas version since the code he provided gave me the name and count as variables that I could work with individually. I converted the names and numbers into a json array to use elsewhere.
    – sho
    Nov 9 at 23:23































If the data is column aligned (all columns have known, fixed width) then use string functions such as substr:



<?php
$lines = '
John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29
';

foreach (preg_split('/(\r|\n)+/', $lines) as $line) {
    if ($line === '') continue;
    // Offsets assume the original fixed-width layout:
    // the name field is 46 characters wide, the count field the next 10
    $name = substr($line, 0, 46);
    $amount = substr($line, 46, 10);
    $name = rtrim(ltrim($name), " .");
    $amount = (float) str_replace(",", "", $amount);
    echo $name . "," . $amount . "\n";
}
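
As a minimal sketch of the same fixed-width idea wrapped in a function: sliceTallyLine and its parameters are hypothetical names, and the widths 46 and 10 are simply the values used above, which fit the first sample line of the question with its original spacing. They would need adjusting to the real file.

```php
<?php
// Hypothetical helper: slice a fixed-width line at assumed column widths.
function sliceTallyLine(string $line, int $nameWidth = 46, int $countWidth = 10): array
{
    // Name field: strip the padding spaces and leader dots from both ends
    $name = trim(substr($line, 0, $nameWidth), " .");
    // Count field: strip padding, drop the thousands separator, cast to int
    $count = (int) str_replace(',', '', trim(substr($line, $nameWidth, $countWidth)));
    return [$name, $count];
}

// The first sample line from the question, with its original spacing.
$line = '       John David James (DEM) .  .  .  .  .  .     7,808   10.51';
[$name, $count] = sliceTallyLine($line);
echo $name, ',', $count, "\n";
```

Because the leader dots are trimmed from the name field, a name that itself ends in a period would lose it, which is the same trade-off as in the loop above.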




























  • Thanks. This works well.
    – sho
    Nov 9 at 23:10











answered Nov 9 at 22:04









Andreas





















edited Nov 9 at 23:15

























answered Nov 9 at 21:54









Amessihel

















edited Nov 9 at 23:45

























answered Nov 9 at 21:36









Salman A












