A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Wednesday, October 13, 2010

Stripping non-ASCII characters from text in Ruby

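I need to get rid of occasional non-ASCII characters in otherwise plain ASCII text, such as 'curly quotes' like “ and ”. I don't know the real encoding of my source text, but I can tell that the offending characters are stored as single bytes above the ASCII range, which show up as hexadecimal escapes such as \x94. (That would be consistent with Windows-1252, where \x93 and \x94 are the curly double quotes, but I haven't confirmed it.)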

Here is the regular expression I use to remove them:
str.gsub!(/[\x80-\xff]/, '')

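This is a blunt instrument: it deletes every byte from 0x80 to 0xFF, so any legitimate accented or multibyte character in the text is thrown away too. I'm sure it won't work in many cases, but with my text it does the job just fine.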
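
On Ruby 1.9 and later, strings carry an encoding, and as far as I know the regular expression above can be rejected in a UTF-8 source file or raise an error on bytes that are invalid in the string's encoding. Here is a minimal sketch of the same idea that treats the text as raw bytes. The method name strip_non_ascii is my own invention for illustration, and the sketch assumes you really do want to discard everything outside 7-bit ASCII:

```ruby
# Hypothetical helper (the name is not part of any library).
# Removes every byte in the range 0x80-0xFF, whatever the original encoding.
def strip_non_ascii(text)
  # Work on a binary copy so bytes that are invalid in the string's
  # declared encoding cannot raise during matching. The caller's
  # string is left untouched.
  bytes = text.dup.force_encoding(Encoding::BINARY)

  # The /n flag makes this a binary regular expression, so \x80-\xff
  # is read as a range of raw bytes rather than as characters.
  cleaned = bytes.gsub(/[\x80-\xff]/n, '')

  # Only bytes 0x00-0x7F remain, so labelling the result US-ASCII is safe.
  cleaned.force_encoding(Encoding::US_ASCII)
end

puts strip_non_ascii("He said \x93hello\x94")   # prints: He said hello
```

Because it works byte by byte, this also removes all the bytes of any multibyte UTF-8 character, just like the original one-liner.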
