Since the search functions of the new forum mostly suck, I now turn to Google more often for my searches, and I noticed something really unsettling: normally, when you pick a random extract of any post, say half a dozen words, and put it in quotes, Google instantly finds the message, even when it has been posted recently.
You can try this for yourself: copy any random part of a message, paste it into Google with quotes, and you will find the message, usually alone, because the chances of the same random combination of words appearing elsewhere are slim.
That's the rule.
It seems to have exceptions; one at least.
I have posted a "legacy thread" containing a lot of personal IP made public, and I want to be sure that it really goes public, is indexed by search engines and can be retrieved by the Wayback Machine if need be.
So far so good, but when I try to find the sensitive sections, they don't exist for Google. The non-sensitive, introductory part works alright, but as soon as something valuable appears, it becomes invisible to Google.
Here is an example: this is the target: https://www.diyaudio.com/community/...here-is-the-legacy-thread.387391/post-7052907
And this is what Google says:
These are the only exceptions I could find.... I don't imply that it means foul play of some sort, but even for someone not especially suspicious, it looks strange.
I still need to check other search engines to see whether the behaviour is the same, but there is an obvious anomaly, and Google is one of the most popular search engines.
Why are the sensitive posts not indexed? I would welcome any rational explanation.
If I search for "The cross-coupled Q2 & Q3 synthesize the series resistor of the PI attenuator" Google doesn't find it.
But pop that phrase into the search window here on the forum and it shows your post with that phrase, and only your post.
Search works here, as far as I can see.
Not really: I have made a number of tests and they all converge.
This is one of the subtitles:
It returns one result, totally unrelated.
No, I was talking about the forum search, it works to find phrases in your posts.
However Google does NOT seem to be finding words in the "Everything Else" forum. At least not any new posts. Maybe that subforum is restricted from search bots?
See if you can find anything in the "Everything Else" with Google. So far I have not, but maybe there is a date threshold.
Google IS finding words in the "Everything Else" subforum, but not within the past year. That might well indicate a setting since the move.
Google recently changed how quotes work, supposedly to be better for most users (???). I could try to find that article but, you know.... in Google with quotes.
However, I agree: on Google, searching with or without quotes, or with site:diyaudio.com, returns many different things but not yours.
On Bing, searching with quotes finds just your thread; without quotes it finds your attic first and then other semi-similar words.
DuckDuckGo.com seems exactly the same as Bing. (Why not? They are closer than they admit; if not bedmates, they at least use the same database.)
So I think your complaint is at Google.
I am talking about Google, and stuff in Everything Else is indexed like all the rest: the examples I took are weeks old, and this post is only 2 days old, yet it is found easily:
https://www.diyaudio.com/community/threads/measure-voltage-gain-of-amplifiers.389077/post-7091741
https://bgr.com/tech/a-super-useful-google-search-hack-just-got-a-smart-update/
A super-useful Google Search hack just got a smart update
Being able to search using quotes in Google Search has been one of the best ways to find specific phrases or words in a webpage or document for years.
That's what they say they did but it is not what I have seen this week. Must be my hot wires, because Google code can't have bugs.
Try this search in Google:
site:www.diyaudio.com Crown XTi 6002
I used a custom date range of July 14 2022 to Aug 10 2022 and it returned a result in the "Everything Else" section.
So Google is indexing at least part of it, and recently.
Maybe the quotes function has changed, but it still correctly retrieves any random phrase from the forum, and returns nothing for specific parts of my thread.
I'll try DuckDuckGo, Bing and others tomorrow. The issue might be addressed in the meantime.
You can initiate saves yourself by entering the thread URL at the Archive and hitting "Save Page". I just saved pages 1 & 2 and they are there now, but missing a few images (this seems to be a common problem on the Archive). They are giving a "service currently overloaded" notice right now, but I will try a re-save later today to see if it grabs those missing images.
Archive Save
We can only guess, but I'd say it suggests Google is favouring newer content for inclusion in their search engine, possibly through a change to their exact-match handling. My own take would be that they have probably implemented some kind of cost-saving algorithm or process to cut down on the electricity or storage expense of exact-match searches on older forum content. Only Google would know for sure; you could perhaps ask in a Google support forum. It could also be related to the PPP (posts per page) change, or any number of things. If you have any specific complaints about the XF Elasticsearch, please post them in an appropriate thread in the new forum issues forum, and they will get reviewed when we review search performance in the coming months, as time allows.
Other search engines work as they should: only Google seems to be prejudiced against my subjects. I must have offended them in some way...
The thread is now saved in the Archive. I believe all the .png images are there, although in my browser at least, they don't all load every time the page is opened. I also get an "Oops We ran into some problems . . . . ." notice at the top of the page, but I don't see any indication of what exactly is going wrong.
Something I didn't know before was that the attached .asc files are not automatically saved along with the rest of the page. I had to click on each one and save it as a separate entity. Each one saved fairly quickly, but with the Archive's bandwidth limits I could only do four or five in sequence and then had to wait a few minutes before starting again. Important to remember if there are any more additions to the thread.
Thread at Archive
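Saving the thread page and then each .asc attachment one by one, as described above, could be scripted. A minimal sketch, assuming the Archive's Save Page Now endpoint accepts a plain GET of https://web.archive.org/save/<url>, and assuming that batches of four followed by a pause of a few minutes (the 180 s default is a guess) stay under the bandwidth limit; both come from the experience described above, not from documented limits. The helper names and example URLs are hypothetical.

```python
import time
import urllib.request

# Assumption: prefixing a page URL with this triggers a Wayback Machine capture.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url):
    """Return the Save Page Now URL for one page or attachment (hypothetical helper)."""
    return SAVE_PREFIX + page_url


def batches(urls, batch_size=4):
    """Split the URLs into groups of batch_size (4 = what worked by hand in the thread)."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


def _get(url):
    """Default fetcher: a plain GET, response discarded."""
    with urllib.request.urlopen(url, timeout=120):
        pass


def save_all(urls, batch_size=4, pause_s=180, fetch=_get):
    """Request a capture for every URL, pausing between batches.

    Pass a stub as `fetch` to try it without network access.
    Returns the list of save URLs requested, in order.
    """
    requested = []
    groups = batches(urls, batch_size)
    for n, group in enumerate(groups):
        for url in group:
            target = save_url(url)
            fetch(target)
            requested.append(target)
        if n < len(groups) - 1:
            # Wait out the assumed rate limit before the next batch.
            time.sleep(pause_s)
    return requested
```

Usage would be something like save_all([thread_url] + attachment_urls). Images embedded in the page may still need a separate re-save, as noted earlier in the thread.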
Thanks, that's reassuring. What is puzzling is the behaviour of Google's algorithm/AI: it managed to index only the mundane, uninteresting parts of the thread, leaving out all of the tastier bits.
Completely nonsensical. In addition, they certainly have the physical means to store and index absolutely everything (and more).
If we are going to rely on AI for all aspects of our lives, it doesn't bode well.....
Could it be that the relevance algorithm using TF-IDF influenced the results? Just a guess.
https://en.wikipedia.org/wiki/Tf–idf
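TF-IDF weights a term by how often it appears in a document, discounted by how many documents contain it, so words common to every page count for little. A minimal sketch, assuming whitespace tokenization and the plain tf × log(N/df) variant (one of several in use). It only illustrates the idea; nothing here says how Google actually ranks or filters pages.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in it
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With docs = ["the amplifier uses a pi attenuator", "the amplifier has a volume control", "the forum search works"], "the" gets weight 0 in every document because it appears in all three, while "attenuator", found in only one, gets log(3)/6 (about 0.18) in the first document.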
This particular post you reference https://www.diyaudio.com/community/...ere-is-the-legacy-thread.387391/#post-7052907 uses some odd font changes. Google might consider that suspect, a potential "black hat" or spam-type behaviour, since changing the font style is highly irregular on most forums (i.e. what percentage of posts do that?) and is/was commonly used to manipulate SERPs (white text on white backgrounds, tiny fonts too hard to read but picked up by crawlers, etc.). It may be overly cautious, or tripping a neural-network-style flag or protection.
I am going to run a test (after all, we are more or less scientists) to check the hypothesis: I will post fractional voltage multipliers, then stochastic rms to dc converters, and next the gainuator.
Normally, this should be indexed in the next hours, or days at the very worst, since there is no formatting at all.
If it isn't, it means Google is definitely prejudiced against my content......