// This regexp tries to grep a price from a string.
// 1. The number must make sense. It may contain "." or ",", e.g. 1, 1.000,99, 10,0, etc.
// 2. The string must contain a currency identifier such as EUR, USD, € or $.
// 2a) The currency identifier may be at the beginning or at the end of the matching string.
// 2b) There may be a space between the value and the currency identifier.
// This regexp is based upon http://stackoverflow.com/questions/1547574/regex-for-prices
(USD|EUR|€|\$|£)\s?(\d{1,}(?:[.,]\d{3})*(?:[.,]\d{2}))|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR)
Updated version (also matches numbers without delimiters in between, like $2 or $34; thanks to PepsiX for pointing out this issue):
(USD|EUR|€|\$|£)\s?(\d{1,}(?:[.,]*\d{3})*(?:[.,]*\d*))|(\d{1,3}(?:[.,]*\d*)*(?:[.,]*\d*)?)\s?(USD|EUR)
// Here is the breakdown:
// Price number: \d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?
// Currency symbol: \s?(USD|EUR|€|\$) (with optional leading space)
Update: I kept studying regex and came up with the following (probably very imperfect) solutions:
(\+\$[0-9]+\.[0-9]+).*(\+\$[0-9]+\.[0-9]+)
Returns two prices in the format +$00.00, ignoring the characters between them. I suppose a space in between could be captured fairly easily.
(\d+"[?\s][xX][?\s]\d+").*(\d+"[?\s][xX][?\s]\d+")
Pulls two sizes in the format 00" x 00" 00" x 00", ignoring the characters in between.
And finally I did this, which returns all the necessary text, even though it doesn't look pretty:
(\d+"[?\s][xX][?\s]\d+").*(?:\s(\+\$[0-9]+\.[0-9]+).*).*>(\d+"[?\s][xX][?\s]\d+").*(?:\s(\+\$[0-9]+\.[0-9]+).*)
Returns 10" x 8"+$10.0012" x 8"+$15.00
These are solutions specific to this bit of text and my goals for it. I will be trying to find more versatile solutions.
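A more reusable variant of the same idea, as a minimal Python sketch: rather than hard-coding two sizes and two prices in one pattern, match a single size with an optional price and let the engine repeat it over the text. The sample string and the name `size_price_pairs` are made up for illustration, and the sub-patterns are loosened slightly from the ones above (`\s?` instead of `[?\s]`).

```python
import re

# One size such as 10" x 8", optionally followed by a surcharge such as +$10.00.
SIZE_PRICE = re.compile(r'(\d+"\s?[xX]\s?\d+")\s*(\+\$\d+\.\d+)?')


def size_price_pairs(text):
    """Return (size, price) tuples; price is None when a size has no surcharge."""
    return [(m.group(1), m.group(2)) for m in SIZE_PRICE.finditer(text)]


# Hypothetical sample, loosely modelled on the option labels discussed here.
SAMPLE = '8" x 6" | 10" x 8" +$10.00 | 12" x 8" +$15.00'

for size, price in size_price_pairs(SAMPLE):
    print(size, price)
```

Because each match is independent, the same pattern should work whether the text holds two options or twenty.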
There is a bug in the first post. A valid regexp is:
(USD|EUR|€|\$|£)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$|£)
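To see what the fix changes, here is a small Python sketch that runs both patterns over a few sample strings. The sample strings and the helper `first_match` are my own. Judging from the patterns alone, the corrected version makes the decimal part optional after a leading currency identifier and also accepts €, $ and £ after the number.

```python
import re

# Pattern from the first post, copied verbatim (split over two string literals).
ORIGINAL = re.compile(
    r"(USD|EUR|€|\$|£)\s?(\d{1,}(?:[.,]\d{3})*(?:[.,]\d{2}))"
    r"|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR)"
)

# Corrected pattern from this comment, copied verbatim.
CORRECTED = re.compile(
    r"(USD|EUR|€|\$|£)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)"
    r"|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$|£)"
)


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Illustrative inputs only; they are not taken from the original post.
SAMPLES = ["$9.99", "1.000,99 EUR", "$2", "10 €"]

for s in SAMPLES:
    print(f"{s!r}: original={first_match(ORIGINAL, s)!r}, "
          f"corrected={first_match(CORRECTED, s)!r}")
```

In this sketch `$2` and `10 €` are only picked up by the corrected pattern, while `$9.99` and `1.000,99 EUR` match with both.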
My elymbmx2.htm file has prices in it, like this:
$9.99
I get this error:
cat elymbmx2.htm | grep -o (USD|EUR|€|$|£)\s?(\d{1,3}(?:[.,]\d{3})(?:[.,]\d{2})?)|(\d{1,3}(?:[.,]\d{3})(?:[.,]\d{2})?)\s?(USD|EUR|€|$|£)
*(?:[. was unexpected at this time.
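The "was unexpected at this time" message looks like it comes from the Windows command prompt rather than from grep: the pattern is unquoted, so the shell tries to interpret `(`, `|` and `)` itself. Even with quoting, `\d` and `(?:...)` are Perl-style syntax that grep only understands in its PCRE mode (`-P` in GNU grep), if that is available in your build. One way to sidestep both issues is a small script. The sketch below is a hypothetical Python stand-in for `grep -o`, using the corrected pattern from the comment above; the name `grep_o` is an assumption for illustration.

```python
import re
import sys

# Corrected price pattern from the comment above, copied verbatim.
PRICE = re.compile(
    r"(USD|EUR|€|\$|£)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)"
    r"|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$|£)"
)


def grep_o(text):
    """Return the text of every match, similar to what `grep -o` prints."""
    return [m.group(0) for m in PRICE.finditer(text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The file name is passed as the first argument, e.g. elymbmx2.htm.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for hit in grep_o(f.read()):
            print(hit)
```

Saved under a name of your choice (say `prices.py`, a made-up file name), it would be run as `python prices.py elymbmx2.htm`, with no regex on the command line for the shell to trip over.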
Hi there, I stumbled across your code and it works so well! I am a novice at regex and have been bashing my head against isolating prices for days. There is one final thing I cannot figure out: When I use your regex on the following html, only the first price in the HTML is returned ($10.00). I have tried a lot of things and am at a loss. Is there a way to make it return all possible matches within the HTML? Some HTML will have more than two prices...
`select name="options[1176]" id="select_1176" class=" required product-custom-option admin__control-select" title="" data-selector="options[1176]" aria-required="true" style=" ">option value="">-- Please Select --/option>option value="1257" price="0">8" x 6" /option>option value="1255" price="10">10" x 8" +
/option>option value="1256" price="15">12" x 8" +
/option>/select>`
(I deleted all the opening < so the code would appear, not sure how to do it.)
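Whether you get one match or all of them usually depends on which function of the regex engine is called, and the comment does not say which tool is in use. As an illustration in Python: `re.search` stops at the first match, while `re.finditer` walks over every non-overlapping match. The HTML below is a shortened reconstruction of the snippet above, with the leading `<` restored and the `$10.00` / `$15.00` prices filled in as an assumption; `all_prices` is a made-up helper name.

```python
import re

# Corrected price pattern from earlier in the thread, copied verbatim.
PRICE = re.compile(
    r"(USD|EUR|€|\$|£)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)"
    r"|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$|£)"
)

# Assumed reconstruction of the posted HTML (prices and "<" are filled in).
HTML = (
    '<option value="1257" price="0">8" x 6" </option>'
    '<option value="1255" price="10">10" x 8" +$10.00 </option>'
    '<option value="1256" price="15">12" x 8" +$15.00 </option>'
)


def all_prices(text):
    """Return a (currency, amount) tuple for every price found in text."""
    prices = []
    for m in PRICE.finditer(text):
        if m.group(1) is not None:
            # Currency-first alternative: groups 1 (currency) and 2 (amount).
            prices.append((m.group(1), m.group(2)))
        else:
            # Amount-first alternative: groups 3 (amount) and 4 (currency).
            prices.append((m.group(4), m.group(3)))
    return prices


print(PRICE.search(HTML).group(0))  # first match only
print(all_prices(HTML))             # every match
```

Most other engines have an equivalent switch, often a "global" flag or a "find all" call, so it is worth checking the documentation of whatever tool runs the pattern.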