URLs gone wild, or, what are all those extraneous characters added on?

5S8Zh5

Flashlight Enthusiast
Joined
Jul 20, 2014
Messages
1,745
Location
U.S.A.
Here's one example of my question. I was saving the URL of a used book I just bought on Abebooks. When I was saving it, this is what showed:

[Screenshot: qpj2aPS.jpg]


^ I had to take a screenshot because the forum truncates the true URL when I post it.

What I've found is that a large share of the URL is junk tacked on at the end (tracking? cookie information? [ shrugs ]), and that you can cut it all down to the true URL and the link still works, e.g.

https://www.abebooks.com/servlet/BookDetailsPL?bi=30049577153

^ it ends right after what looks to me to be the item number: 30049577153. This works for a lot of sale ad URLs.
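A minimal sketch of that trimming in Python, assuming the only parameter the page needs is bi (the apparent item number). The extra parameters in the example (tracker, source) are made-up placeholders, since the real ones only appear in the screenshot, and keep_only is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def keep_only(url, allowed):
    """Return url with every query parameter dropped except those in `allowed`."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in allowed]
    # Rebuild the URL with the filtered query and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical long URL: "tracker" and "source" are invented for illustration.
long_url = ("https://www.abebooks.com/servlet/BookDetailsPL"
            "?bi=30049577153&tracker=abc123&source=email")
short_url = keep_only(long_url, {"bi"})
print(short_url)
```

Whether a given site still serves the page with everything else removed has to be checked by loading the shortened link.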
 

idleprocess

Flashaholic
Joined
Feb 29, 2004
Messages
7,197
Location
decamped
In the case of the OP, it's possible that the URL itself is being used to populate fields in the corresponding page.

URLs can have all sorts of additional, and generally non-critical, bits added to them. As a general principle, URLs contain the following:
  • Domain - e.g. https://subdomain.site.com - coarsely where the site is located on the internet (strictly, https:// is the scheme and subdomain.site.com is the host)
  • Path - e.g. /pages/content/page.html - finely where the resource is located on the server
  • Parameters - e.g. ?pageid=123456789&browser=desktop&len=en - granularly make calls within the resource to populate the page or make calls to other sites/resources
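As a rough illustration of that breakdown, Python's standard urllib.parse module can split a URL into these pieces. The URL below is just the three example fragments from the list glued together; it is not a real page.

```python
from urllib.parse import urlsplit, parse_qsl

# Example URL assembled from the three fragments above (not a real address).
url = ("https://subdomain.site.com/pages/content/page.html"
       "?pageid=123456789&browser=desktop&len=en")

parts = urlsplit(url)
params = parse_qsl(parts.query)  # list of (name, value) pairs

print("scheme:", parts.scheme)   # the protocol prefix
print("domain:", parts.netloc)   # the host portion
print("path:  ", parts.path)     # where the resource lives on the server
for name, value in params:
    print("param: ", name, "=", value)
```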

There are countless server schemes and site frameworks that determine how parameters are passed, e.g. sometimes they're seemingly built into the path.

Let's take a well-known e-commerce site whose domain will be replaced with rainforest.com :
https://rainforest.com/Energizer-Lithium-Batteries-Ultimate-Battery/dp/B01C4PP8FK/ref=sr_1_4?dchild=1&keywords=energizer+lithium+aa&qid=1598109667&sr=8-4

Built into that URL is a bunch of tracking information that they likely use for analytics and possibly for dynamic page content generation. The astute will notice that the search terms used to locate the product are encoded into the URL:
&keywords=energizer+lithium+aa
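For anyone who wants to see what is riding along in the query string, here is a small sketch that decodes it with Python's standard library. It only reads the parameters; what the site does with dchild, qid and sr is not something the URL itself reveals.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://rainforest.com/Energizer-Lithium-Batteries-Ultimate-Battery"
       "/dp/B01C4PP8FK/ref=sr_1_4"
       "?dchild=1&keywords=energizer+lithium+aa&qid=1598109667&sr=8-4")

# parse_qs maps each parameter name to a list of values and
# decodes "+" back into spaces.
query = parse_qs(urlsplit(url).query)

for name, values in query.items():
    print(name, "->", values)
```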

Knowing a bit about how rainforest.com structures their URLs, one can strip this down to the absolute minimum...
https://rainforest.com/Energizer-Lithium-Batteries-Ultimate-Battery/dp/B01C4PP8FK/
... which will also load up the same product page. In my case, doing a stare-and-compare, there doesn't appear to be a difference in the dynamic content (i.e. "other shoppers also bought" and "related products"), but I'm also logged into the site, so rainforest.com has far better details on how to pitch to me.
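The same reduction can be sketched in Python. This assumes, based only on the example above, that the query string and any path segment starting with ref= can be dropped; strip_to_product_path is a hypothetical helper, not anything the site documents.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_to_product_path(url):
    """Drop the query string, the fragment, and any 'ref=' path segment."""
    parts = urlsplit(url)
    # Assumption: path segments beginning with "ref=" are tracking only.
    segments = [s for s in parts.path.split("/") if not s.startswith("ref=")]
    path = "/".join(segments)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

full = ("https://rainforest.com/Energizer-Lithium-Batteries-Ultimate-Battery"
        "/dp/B01C4PP8FK/ref=sr_1_4"
        "?dchild=1&keywords=energizer+lithium+aa&qid=1598109667&sr=8-4")
minimal = strip_to_product_path(full)
print(minimal)
```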

Let's take something that ZuckBook likes to spam my feed with:
https://www.caranddriver.com/features/g28249372/best-vehicles-apocalypse-2019/?utm_source=facebook_dda&utm_medium=cpm&utm_campaign=dda_fb_cd_d_i_g28249372&fbclid=IwAR0USpu-B1urcfokFRvopDDZB36q85lfUDrcOQMpY6-m_4g-D81sT3skCVo

... which can also be reduced to ...
https://www.caranddriver.com/features/g28249372/best-vehicles-apocalypse-2019/

ZuckBook along with C&D do us the favor of making their parameters mostly human readable:
  • ? : Generally used to indicate parameters rather than file path information
  • & : Used to join parameters to one another
  • utm_source=facebook_dda : Traffic source, here a ZuckBook link
  • utm_medium=cpm : Medium "cpm", most likely cost per mille, i.e. advertising priced per thousand impressions
  • utm_campaign=dda_fb_cd_d_i_g28249372 : Likely an identifier for the ad campaign itself, i.e. the article ID on ZuckBook as part of an ad buy or maybe ZuckBook paying C&D to provide content
  • fbclid=IwAR0USpu-B1urcfokFRvopDDZB36q85lfUDrcOQMpY6-m_4g-D81sT3skCVo : It's possible that my entire life history as ZuckBook understands it is neatly encoded within this string, but it most likely just identifies the impression uniquely in a number of ways, including some means of identifying me, and might also be used as proof of impression in whatever financial relationship exists between C&D and ZuckBook

As a general courtesy, I pull the extraneous parameters, campaign trackers, user trackers, and other potentially user-identifying info out of URLs before sharing them.
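For those who would rather not do this by hand, a minimal Python sketch follows. The block list (anything starting with utm_, plus fbclid) covers only the trackers seen in the example above and is an assumption; other sites use other names, and strip_tracking is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed block list, taken from the example URL above; extend as needed.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid"}

def strip_tracking(url):
    """Remove known tracking parameters, keeping everything else intact."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

shared = ("https://www.caranddriver.com/features/g28249372/"
          "best-vehicles-apocalypse-2019/"
          "?utm_source=facebook_dda&utm_medium=cpm"
          "&utm_campaign=dda_fb_cd_d_i_g28249372"
          "&fbclid=IwAR0USpu-B1urcfokFRvopDDZB36q85lfUDrcOQMpY6-m_4g-D81sT3skCVo")
clean = strip_tracking(shared)
print(clean)
```

A block list like this leaves unknown parameters in place, which is safer for links that genuinely need them but will miss trackers it has never seen.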
 

archimedes

Flashaholic
Joined
Nov 12, 2010
Messages
15,780
Location
CONUS, top left
....

As a general courtesy, I pull the extraneous parameters, campaign trackers, user trackers, and other potentially user-identifying info out of URLs before sharing them.

Thank you for the excellent and detailed explanation above. I agree that it is nice to simplify links when possible.
 