Something strange going on here.....

Since the search functions of the new forum mostly suck, I now turn more often to Google for my searches, and I noticed something really unsettling. Normally, when you pick a random extract of any post, say half a dozen words, and put it in quotes, Google instantly finds the message, even when it has been posted recently.
You can try this for yourself: copy any random part of a message, paste it in Google with quotes, and you find the message, generally alone because chances of a random combination of matching words are slim.
That's the rule.
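
For anyone who wants to repeat this check more systematically, here is a minimal Python sketch of the procedure described above: take a short run of consecutive words from a post and wrap it in quotes. The function names are hypothetical and only for illustration, and the code does not contact any search engine; it just prepares the text to paste into the search box.

```python
import random

def pick_extract(post_text, n_words=6, rng=None):
    """Return n_words consecutive words taken from a random spot in post_text."""
    words = post_text.split()
    if len(words) <= n_words:
        # Post is too short to sample from: use all of it.
        return " ".join(words)
    if rng is None:
        rng = random.Random()
    start = rng.randrange(len(words) - n_words + 1)
    return " ".join(words[start:start + n_words])

def exact_phrase_query(extract):
    """Wrap the extract in double quotes, dropping any quotes it already contains."""
    return '"' + extract.replace('"', "") + '"'
```

Passing a seeded `random.Random` as `rng` makes the choice repeatable, which helps when comparing results on different days.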
It seems to have exceptions; one at least.
I have posted a "legacy thread" containing a lot of personal IP made public, and I want to be sure that it really goes public, is indexed by search engines, and can be retrieved from the Wayback Machine if need be.
So far so good, but when I try to find the sensitive sections, they don't exist for Google. The non-sensitive, introductory part works alright, but as soon as something valuable appears, it becomes invisible to Google.
Here is an example: this is the target: https://www.diyaudio.com/community/...here-is-the-legacy-thread.387391/post-7052907
And this is what Google says:

Scrsh2.png


These are the only exceptions I could find.... I don't imply that it means foul play of some sort, but even for someone not especially suspicious, it looks strange.
I need to check with other search engines to see if the behaviour remains the same, but there is an obvious anomaly, and Google is one of the most popular search engines.
Why are the sensitive posts not indexed? I would welcome any rational explanation.
 
If I search for "The cross-coupled Q2 & Q3 synthesize the series resistor of the PI attenuator" Google doesn't find it.
But pop that phrase into the search window here on the forum and it shows your post with that phrase, and only your post.

Search works here, as far as I can see.
 
No, I was talking about the forum search; it works to find phrases in your posts.

However Google does NOT seem to be finding words in the "Everything Else" forum. At least not any new posts. Maybe that subforum is restricted from search bots?
See if you can find anything in the "Everything Else" with Google. So far I have not, but maybe there is a date threshold.
 
in Google with quotes
Google recently changed the action of quotes, to be better for most users(???). I could try to find that article but, you know....

However, I agree: on Google, searching with or without quotes, or with site:diyaudio.com, returns many different things but not yours.

On Bing, the search with quotes finds just your thread; without quotes it finds your attic first and then other semi-similar words.

DuckDuckGo.com seems exactly the same as Bing. (Why not? They are closer than they admit. If not bedmates, they at least use the same database.)

So I think your complaint is at Google.
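
To make the comparison repeatable, here is a rough Python sketch that builds the same query for all three engines, with or without quotes and with an optional site: restriction. The base URLs are the public search pages as I know them and could change at any time; the function names are hypothetical. Nothing is fetched; it only produces links to open in a browser.

```python
from urllib.parse import urlencode

# Public search pages (assumed stable; the engines may change them).
ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}

def build_query(phrase, quoted=True, site=None):
    """Return the text typed into the search box."""
    q = '"' + phrase + '"' if quoted else phrase
    if site:
        q = "site:" + site + " " + q
    return q

def search_urls(phrase, quoted=True, site=None):
    """Return one ready-to-open search URL per engine for the same query."""
    q = build_query(phrase, quoted, site)
    return {name: base + "?" + urlencode({"q": q}) for name, base in ENGINES.items()}
```

Running `search_urls(...)` once with `quoted=True` and once with `quoted=False` gives the two sets of links compared above.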
 
Try this search
Code:
site:www.diyaudio.com Crown XTi 6002
in Google.

I used a custom date range of July 14 2022 to Aug 10 2022 and it returned a result in the "Everything Else" section.
So Google is indexing at least part of it and recently.
 
Maybe the quotes function has changed, but it still retrieves correctly any random phrase of the forum, and returns nothing for specific parts of my thread.
I'll try DuckDuckGo, Bing and others tomorrow. The issue might be addressed in the meantime.
 
and can be retrieved by the wayback machine if it needs to.
You can initiate saves yourself by entering the thread URL at the Archive and hitting "Save Page". I just submitted pages 1 & 2 and they are there now, but missing a few images. (This seems to be a common problem on the Archive.) They are giving a "service currently overloaded" notice right now, but I will try a re-save later today to see if it will grab those missing images.

Archive Save
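
If you would rather script this than use the web form, the sketch below shows the two URLs involved, as far as I understand the Archive's public endpoints: a "save" address that requests a capture, and the availability lookup that reports the closest stored snapshot. Treat both URL patterns and the JSON layout as assumptions to check against the Archive's own documentation; the function names are hypothetical, and the code only builds URLs and reads an already-downloaded response.

```python
from urllib.parse import urlencode

def save_request_url(page_url):
    """Address that asks the Wayback Machine to capture page_url (assumed pattern)."""
    return "https://web.archive.org/save/" + page_url

def availability_url(page_url):
    """Address of the availability lookup for page_url (assumed pattern)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})

def closest_snapshot(response):
    """Return the snapshot URL from a decoded availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

An empty `archived_snapshots` entry makes `closest_snapshot` return `None`, which would mean the page has not been captured yet.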
 

Attachments

  • Screen Shot 2022-08-10 at 2.21.15 PM.png
We can only guess, but I'd say it suggests Google is prioritizing newer content for its index, possibly through a change to how exact matching works. My own take is that they have probably implemented some kind of cost-saving algorithm or process to cut down on the electricity or storage expense of exact-match searches on older forum content. Only Google would know for sure; you could perhaps ask in a Google support forum. It could be related to the PPP (posts per page) change, or any number of things. If you have any specific complaints about the XF elastic search, please post them in an appropriate thread in the new forum issues forum, and they will get reviewed when we review the search performance in the coming months, as time allows.
 
The thread is now saved in the Archive. I believe all the .png images are there, although in my browser at least, they don't all load every time the page is opened. I also get an "Oops We ran into some problems . . . . ." notice at the top of the page, but I don't see any indication of what exactly is going wrong.

Something I didn't know before is that the attached .asc files are not automatically saved along with the rest of the page. I had to click on each one and save it as a separate entity. Each one saved fairly quickly, but with the Archive-imposed bandwidth limits I could only do four or five in sequence and then had to wait a few minutes before starting again. This is important to remember if there are any more additions to the thread.
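
For future additions, the "four or five, then wait" routine could be scripted along these lines. This is only a sketch: the batch size and pause length are guesses based on what I saw, not documented limits, and `save_one` is a hypothetical placeholder for whatever actually submits one attachment URL to the Archive.

```python
import time

def batches(items, size=4):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def save_in_batches(urls, save_one, size=4, pause_s=180, sleep=time.sleep):
    """Call save_one(url) for every URL, pausing between batches."""
    saved = []
    chunks = list(batches(urls, size))
    for n, batch in enumerate(chunks):
        for url in batch:
            save_one(url)
            saved.append(url)
        if n < len(chunks) - 1:
            # Back off before the next batch; no pause after the last one.
            sleep(pause_s)
    return saved
```

The `sleep` argument is there so the pacing can be tried out with a dummy function before waiting for real.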

Thread at Archive
 
Thanks, that's reassuring. What is puzzling is the behaviour of Google's algorithm/AI: it managed to index only the mundane, uninteresting parts of the thread, leaving out all of the tastier bits.
Completely nonsensical. In addition, they certainly have the physical means to store and index absolutely everything -and more-.
If we are going to rely on an AI for all aspects of our life, it doesn't bode well.....
 
This particular post you reference https://www.diyaudio.com/community/...ere-is-the-legacy-thread.387391/#post-7052907 uses some odd font changes. Google might consider that suspect as potential "black hat" or spam-type behaviour, since changing the font style is highly irregular on most forums (i.e. what percentage of posts do that?) and is/was commonly used to manipulate SERPs (white text on white backgrounds, tiny fonts too hard to read but picked up by crawlers, etc.). Google may be overly cautious, or the post may be tripping a neural-network-style flag or protection.
 
I am going to run a test of that hypothesis (after all, we are ~scientists): I will post fractional voltage multipliers, then stochastic rms to dc converters and next, the gainuator.
Normally, these should be indexed within the next few hours, or days at the very worst, since there is no formatting at all.
If it isn't, it means Google is definitely prejudiced against my content......