Apple switches from Bing to Google as default search platform in Siri, iOS Search, and Mac...

melgross · September 27, 2017 5:55PM

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

gatorguy · September 27, 2017 5:58PM

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Apple, Twitter, and PayPal among others. It's simply the responsible thing to do IMO, especially considering the current climate.

lorin schultz · September 27, 2017 6:30PM

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

gatorguy · September 27, 2017 6:51PM

lorin schultz said:

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

You are absolutely correct. In effect we're trusting social sites and web services providers to use proper judgement, and they may not always do so. Yes there's a danger that "free speech" may not be quite as free on certain web properties nor with various supporting services.

At some point we've all wanted to scream "ENOUGH" when pictures of cars crashing thru crowds killing and maiming all in their path flash across our screens. Some of us want to cry when we see innocents getting the attention of some hate-monger in a political rally, come across images of bodies in the street, or some grandmother who was simply passing by now bloody and battered by someone in a rampaging angry crowd.

So I don't know the answer either. Should Apple and Google and Facebook turn away, it's not their business? Be active in social commentary but take no action that might stifle "free speech"? Be proactive, make it harder for the worst of the worst to spread messages of hate and encourage destruction?

I dunno...

(End of political rant)

SpamSandwich · September 27, 2017 7:45PM

lorin schultz said:

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

To simplify matters, there's no such thing as "hate speech". It all falls under the free speech banner when referring to speech by Americans in America. Private businesses can and do restrict or ban speech of all kinds, whereas it's unconstitutional for the government to do so.

melgross · September 27, 2017 8:53PM

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Apple, Twitter, and PayPal among others. It's simply the responsible thing to do IMO, especially considering the current climate.

They don’t have to. People can claim free speech and take them to court. We’ll just have to see how this all plays out.

melgross · September 27, 2017 9:00PM

lorin schultz said:

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

No one has “the” answer. But there have been limits set by the courts. Generally, those limits depend on whether the speech directly, and sometimes indirectly, calls for violence. That’s true particularly if violent actions can be attributed to the call for it from the speaker.

its why the President has been called out for this when he stated a number of times that he approved beating people at rallies who protested, and said that he loved the good old days when that was more common. But he hasn’t gotten in trouble for it, unfortunately. And even though he’s made posts on Twitter favoring violence, Twitter said that it didn’t meet its rules for banashment, even though obviously it did. But banning the president would be a precedent that, so far, at least, no one want to be the first at.

it’s a fine line indeed.

melgross · September 27, 2017 9:04PM

gatorguy said:

lorin schultz said:

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

You are absolutely correct. In effect we're trusting social sites and web services providers to use proper judgement, and they may not always do so. Yes there's a danger that "free speech" may not be quite as free on certain web properties nor with various supporting services.

At some point we've all wanted to scream "ENOUGH" when pictures of cars crashing thru crowds killing and maiming all in their path flash across our screens. Some of us want to cry when we see innocents getting the attention of some hate-monger in a political rally, come across images of bodies in the street, or some grandmother who was simply passing by now bloody and battered by someone in a rampaging angry crowd.

So I don't know the answer either. Should Apple and Google and Facebook turn away, it's not their business? Be active in social commentary but take no action that might stifle "free speech"? Be proactive, make it harder for the worst of the worst to spread messages of hate and encourage destruction?

I dunno...

(End of political rant)

I would like to make the point that it isn’t censorship when companies decide to carry, or not carry any material that they feel isn't within their own comfort zone.

governments censor, corporations don’t. It’s a matter of overall coverage. A company may not sell a book by a Nazi, but others will. If a government states that that author, or his/her works are banned, then no one can sell it.

melgross · September 27, 2017 9:12PM

SpamSandwich said:

lorin schultz said:

gatorguy said:

melgross said:

gatorguy said:

melgross said:

Marvin said:

lorin schultz said:

MacPro said:

It's about time Apple created their own search engine IMHO. Apple could do it for the good of their users not to exploit them.

I dunno. Have you compared Apple's maps app to Google's lately? It would be fun to see what Apple could do with it, but I wouldn't be particularly confident in Apple's ability to make their own search engine better than Google's.

Geographical maps are a little different because that's mapping the real world into digital. Google's search engine will actually help make their maps more accurate because new addresses will be posted to websites. Internet search is digital to digital. To map out the internet, you just visit every web page and download the text on the site. A typical HTML page is about 60KB (images, CSS, Javascript etc are separate):

https://gigaom.com/2014/12/29/the-overweight-web-average-web-page-size-is-up-15-in-2014/

There are ~1 billion websites, say they average 50 pages per site (forums have way more pages but most sites are just information) = 1b * 50 * 60KB = 3 petabyte of data uncompressed. This is just acquisition data, they need to keep versioned data, which will run into exabytes (1EB = 1 million 1TB HDDs).

That's with all the HTML tags and full content. The indexer would strip that out and look for unique terms and links and just associate that with the URL.

If a web page takes under 5 seconds to load, visiting 50 billion web pages would take 250b seconds = 2.8 million days. That's if you visit them sequentially. They would be loaded in parallel. If Apple had 1,000 servers indexing 1,000 pages each at a time (~480Gbit/s bandwidth), they could index the entire internet in under 3 days. It would need a data center to do it because of the bandwidth requirements, Google has at least 1 petabit/s bandwidth:

https://techcrunch.com/2015/08/18/how-googles-networking-infrastructure-has-evolved-over-the-last-10-years/
https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html

"this kind of speed allows 100,000 servers to read all of the scanned data in the Library of Congress in less than a tenth of a second."

DuckDuckGo uses Amazon's servers in addition to their own:

http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html

The difficult parts are the connection between what the search engine user tells it they want and what the search engine thinks they actually want and managing the scale of the data index and search volume.

They'd need to have an efficient mapping system. Google uses PageRank and other algorithms to determine site popularity:

https://en.wikipedia.org/wiki/PageRank

For every page indexed, the indexer notes links on the page and increments the popularity of that link in the index. The more inbound links a page has, the more popular it is and that gives it a high reputation. There are other factors like whether the page content matches the title, whether it's selling products or it's just information.

You can see an example what a search engine sees using the terminal on the Mac. The following will download Wikipedia's homepage to the desktop as text:
curl "https://www.wikipedia.org" -o ~/Desktop/wikipedia.txt
From that, the indexer would parse all the text and get the links, then visit the links and so on until all the pages online are done.
Once the indexing is complete, they'd have a ranking of most popular sites and associations with each page's terms.
When a user submits a search, the engine determines the meaning in their search like how Siri works. Those terms are used to look up the index and the most popular sites that fit the search get delivered.
This is where search engines fail badly in certain contexts because something being popular doesn't mean it's right. When you do searches for technical questions, the results are sometimes not the answer but popular sites showing the same unanswered question.
There needs to be intelligent analysis of the content to determine the nature of it, whether it's an opinion, a question, an answer, a historical fact, a peer-reviewed scientific paper and so on. This works better knowing the people who wrote the content too, which is a privacy issue.
Apple has been running Siri for a while and although they feed the data into 3rd party search engines, they wouldn't be starting from nothing. To match or exceed Google they can build up an index and run queries from Siri through both their own engine and Google and compare the results. If their results are worse then they need to improve their index.
It's obviously not just a matter of having enough resources as Microsoft has the resources and hasn't made a better engine than Google. When asked about Apple's engine switch, Microsoft said:
https://techcrunch.com/2017/09/25/apple-switches-from-bing-to-google-for-siri-web-search-results-on-ios-and-spotlight-on-mac/
"We value our relationship with Apple and look forward to continuing to partner with them in many ways, including on Bing Image Search in Siri, to provide the best experience possible for our customers. Bing has grown every year since its launch, now powering over a third of all the PC search volume in the U.S., and continues to grow worldwide. It also powers the search experiences of many other partners, including Yahoo (Verizon), AOL and Amazon, as well as the multi-lingual abilities of Twitter. As we move forward, given our work to advance the field of AI, we’re confident that Bing will be at the forefront of providing a more intelligent search experience for our customers and partners."
Microsoft has managed to gain a decent marketshare and they see AI making their search better than competitors in future. The more that the indexer can understand the content, the better the results are. This is where engines like Yandex (Russia) and Baidu (China) sometimes work better as they have a better understanding of their own language and culture (although Google gets blocked to varying degrees so that doesn't help):
https://www.semrush.com/blog/5-advantages-yandex-google-russia/
Having a high volume of users helps search engines so unpopular ones are already at a disadvantage because they can't see what lots of people are looking for. They need to get the users on the platform first and Apple has an advantage with their install base. Getting the data for the index is fairly trivial. Building an index intelligently is hard. I don't think Google has this part done as well as it could be even if it's better than anyone else, their main strength seems to be more from having the marketshare and reacting to user data.
Apple doesn't need to go all in with a search engine, they can build it up. They can be returning some results from their own engine in much the same way DuckDuckGo gets data from multiple sources and how Siri uses partner services like Yelp. It's not something that just gets switched on, it has to evolve.
Whether Apple wants to build and maintain a search engine is another matter. They'd have to take down millions of copyright links all the time and handling exabytes of private data is tough (per-user encryption is easier than pooling data together). Ultimately their motive would be to improve on what's already there but there aren't many people dissatisfied with Google search so it's not even so much that they couldn't improve on it, at least in some areas, I don't think there's much incentive for them.

Of course, as we know, ranking isn’t that straightforward. They bias the search towards paid results. They bias the search towards their own properties. They also manually bias the search towards what they think is trending, and they cut links to sites they don’t “like”.

What do you mean by "they don't like"?

Well, it’s just a term. They’re cutting hate groups out. While I agree with that, there’s no law requiring them to do so, so I say that they don’t like them.

I'd say it's more by public and government demand. Facebook is attempting to do the same as is Twitter. It's simply the responsible thing to do, especially considering the current climate.

Is it "responsible?" Issues like this leave me feeling conflicted. On one hand it's really, really, really, difficult to find either a reason or the will to defend hateful and socially-regressive content. On the other hand, the idea that government pressure or popular public ideology might influence what can or can not be published/seen/read is terrifying. In some cultures, government policy and public sentiment may favour things you and I think are abhorrent, and oppose things we consider fundamental human rights. Others, particularly those of certain religious views, might feel that your beliefs or mine are socially destructive. Are governments and social majorities good arbiters of what constitutes acceptable content?

Government and the masses have a long history of endorsing beliefs and actions that wind up being condemned in hindsight. Do we want to risk accidentally throwing out the baby with the bath water despite good intentions, or do we believe strongly enough in free speech that we accept that some of that speech is going to be utterly and egregiously contrary to our own values?

It's a tough question, and I don't pretend to know the answer.

To simplify matters, there's no such thing as "hate speech". It all falls under the free speech banner when referring to speech by Americans in America. Private businesses can and do restrict or ban speech of all kinds, whereas it's unconstitutional for the government to do so.

No, there certainly is hate speech. I don’t understand how you could think otherwise. What I’ve found over the decades is that groups that are in the majority often don’t understand hate speech, while minorities are very sensitive to it.

a few years ago, and I know who it was, but can’t remember her name right now (which is nothing unusual for me with names), a female black rapper said that there was no such thing as a black racist. Meaning black against white. It was very controversial. While I kind of disagree for several reasons, she was making a very good point.

her point was that you weren’t a racist if you didn’t like racists. White Protestants make jokes about blacks, Jews, Catholics and others all the time, and don’t think there’s anything wrong with it, because they’re the majority. It just works that way.

Apple switches from Bing to Google as default search platform in Siri, iOS Search, and Mac...

Comments