Updated URL regex for hashtags to disregard hashes used mid-sentence. #2361

Closed Ben requested to merge fix/hashtag-pipes-2361 into master

Summary

Closes #2361 (closed)

Put simply, users would have one of their 5 hashtags used up when posting a URL containing a hash (fragment), for example the invalid URL https://www.minds.com/#minds

The changed regex matches when:

  1. The first character is a space, or the start of a line.
  2. Followed by a hash.
  3. Followed by at least one word character.
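
As a rough illustration of the three rules above, here is a minimal TypeScript sketch. The changed regex itself is not quoted in this description, so the pattern below is an assumption reconstructed from the rules, and `extractHashtags` is a hypothetical helper name rather than anything from the codebase.

```typescript
// ASSUMPTION: the merge request does not quote the new regex; this pattern is
// reconstructed from the three rules listed above.
const HASHTAG_PATTERN = /(^|\s)#(\w+)/gim;

// Hypothetical helper: returns the tag names (without the leading '#').
function extractHashtags(text: string): string[] {
  // Use a fresh RegExp so the stateful lastIndex of a global regex
  // never leaks between calls.
  const regex = new RegExp(HASHTAG_PATTERN.source, "gim");
  const tags: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(text)) !== null) {
    tags.push(match[2]); // group 2 is the run of word characters
  }
  return tags;
}

console.log(extractHashtags("hello#world"));                  // []
console.log(extractHashtags("@#hello"));                      // []
console.log(extractHashtags("https://www.minds.com/#minds")); // []
console.log(extractHashtags("#dolor.sit amet\n#nostrud"));    // ["dolor", "nostrud"]
```

Under this assumed pattern, a hash only starts a tag when it opens a line or follows whitespace, which is why the URL fragment and the mid-word hash are ignored.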

https://regexr.com/4tjg7

#Lorem#ipsum #dolor.sit amet, #consectetur adipiscing e#lit, https://loremipsum.io/#123#sed #do #eiusmod #tempo1=incididunt #132323 #@ #w #[ #] ' #word \ #labore@et dolore #magna aliqua. #Ut #enim #ad .minim #veniam, ^ #quis 
#nostrud exercitation ullamco #laboris nisi 

ut #aliquip ex ea commodo#consequat. Duis ()'#aute sad'#irure dolor 

#in reprehen;' 
#derit in@#voluptate velit essehttp: ci#https://loremipsum.io/#123llum dolore https://loremipsum.io/#123 https://loremipsum.io/#eu https://loremipsum.io/#fugiat https://loremipsum.io/#123 nulla pariatur. #Excepteur #sint #occaecat cupidatat 

#non 
#proident,
a #sunt in culpa qui officia deserunt mollit anim id est laborum.

Steps to test

This one is a bit more "get in there and try your best to break it" in terms of testing, but that said, to get a foot in the door:

  1. Go on the site and make a post with 5 hashtags; try to trip the regex up and make some invalid tags to see how it responds, e.g. hello#world, @#hello.
  2. Try a URL with a hash in it.
  3. Check the hashtag links go through to the correct place.
  4. Try to use the hashtag selector to bypass the limit.
  5. Ensure that when you add a tag it shows up in the selector.
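
The limit checks in steps 1 and 2 could also be approximated in code. The sketch below is only an illustration: the pattern is assumed (the new regex is not quoted here), and `MAX_HASHTAGS`, `countHashtags` and `exceedsLimit` are hypothetical names, not the site's actual implementation.

```typescript
// ASSUMPTION: pattern reconstructed from the rules in the summary.
const TAG_PATTERN_SOURCE = "(^|\\s)#(\\w+)";

// The summary mentions a limit of 5 hashtags; the constant name is hypothetical.
const MAX_HASHTAGS = 5;

// Hypothetical helper: counts the hashtags the assumed pattern finds in a post.
function countHashtags(text: string): number {
  const regex = new RegExp(TAG_PATTERN_SOURCE, "gim");
  let count = 0;
  while (regex.exec(text) !== null) {
    count++;
  }
  return count;
}

// Hypothetical helper: true when a post uses more tags than the limit allows.
function exceedsLimit(text: string): boolean {
  return countHashtags(text) > MAX_HASHTAGS;
}

// Five real tags plus a URL containing a hash: the URL should not use up a tag.
const post = "#one #two #three #four #five https://www.minds.com/#minds";
console.log(countHashtags(post)); // 5
console.log(exceedsLimit(post));  // false
console.log(exceedsLimit(post + " #six")); // true
```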

Estimated Regression Scope

The only thing giving me pause is the old regex, which reads: /(^|\s||)#(\w+)/gim. It could be the source of a bug, or it could be a regex trick I'm missing.

Take the first group, (^|\s||): to me that looks like a bug. I'm not aware of any double OR operator in regex, nor can I find a reference for it. In effect it appears to make the leading \s, and I presume also the ^, unnecessary for a match. Thus on production #hello#world counts as 2 hashtags.

This change will alter that behavior on production: #hello#world will match the hashtag #hello and ignore #world.
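
To make the difference concrete, the sketch below runs the old regex quoted above next to an assumed version of the new one. In JavaScript regex, consecutive | characters create empty alternatives that match the empty string, so the old group can match nothing at all before the hash. The new pattern and the `tagsMatching` helper are assumptions for illustration only.

```typescript
// Old regex, as quoted in this merge request.
const OLD_PATTERN = /(^|\s||)#(\w+)/gim;

// ASSUMPTION: the new regex is not quoted here; this is the old pattern
// with the empty alternatives removed.
const NEW_PATTERN = /(^|\s)#(\w+)/gim;

// Hypothetical helper: lists the tag names a given pattern finds in `text`.
function tagsMatching(pattern: RegExp, text: string): string[] {
  // Fresh RegExp so lastIndex on the shared constants is never mutated.
  const regex = new RegExp(pattern.source, "gim");
  const tags: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(text)) !== null) {
    tags.push(match[2]);
  }
  return tags;
}

console.log(tagsMatching(OLD_PATTERN, "#hello#world")); // ["hello", "world"]
console.log(tagsMatching(NEW_PATTERN, "#hello#world")); // ["hello"]

console.log(tagsMatching(OLD_PATTERN, "https://www.minds.com/#minds")); // ["minds"]
console.log(tagsMatching(NEW_PATTERN, "https://www.minds.com/#minds")); // []
```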

I think this is the correct behavior, but changing it needs discussion, because it may invalidate tags in posts by people who, for whatever reason, like to #combine#their#tags#like#this

Closed by Ben 5 years ago (Feb 7, 2020 9:48pm UTC)