Year: 2017

While tidying up my files, I came across an old screenshot of a Google result page, and I could not resist comparing it to a contemporary one. Wow, those were the times when you at least had a chance to get on the screen with SEO for popular keywords!

These two screenshots, showing the results for the very lucrative keyword combination "hotel budapest" (albeit in Hungarian), clearly illustrate how Google's search engine has evolved over the years. While you might first spot some web-design-related changes, it is more interesting to look behind the surface and discover the substantial changes and tendencies that have a great impact on our daily online marketing activities.

No organic results above the fold

While the old screenshot shows four non-paid search results, occupying almost half of the screen real estate devoted to displaying results, the more recent screen capture has no ordinary search results above the fold, not even on bigger screens with full-HD resolution.

During those good old days, if you were clever enough, you could get considerable exposure in Google search results; nowadays you rather have to rely on paid online campaigns, because otherwise you cannot bring your message before the eyes of your audience on the most popular web pages. In the screenshot above, you can observe how the paid results, a knowledge panel on the side, and the map-based hotel search interface push the first organic search results way below the fold. Meanwhile, in the old screenshot below, you can spot non-paid result snippets of an old-school hotel booking/aggregating site and the website of a local hotel aptly named "Hotel Budapest".

By the way, the webmasters of these first-generation hotel listing/booking pages were among the first in Hungary to actively do SEO. These sites were quite often optimized by the owner of the property, heavily relying on automated mass link exchange schemes and other now obsolete, if not counterproductive, techniques. They were wiped out a couple of years ago by a Google algorithm update, way before booking.com and Airbnb started new chapters in the history of online booking.

Google wants to keep users in its walled garden

Although Google's robots still do not create content, the tendency is clear, just as it was obvious ten years ago: more and more web content is scraped, processed and displayed in a structured format on Google's own web pages. More and more answers are thus given right on the search result pages themselves, and content creators are rewarded with fewer and fewer clickthroughs by Google.

You can observe how the hotel offers are listed on a map, how the different prices from the booking portals and the hotel websites are blended into the search results, and how the whole thing is enriched with user reviews and other additional information hosted by the Google My Business service.

It is easy to see that this "movement" provides a more seamless experience for search engine users, but it also means a lot of drawbacks for webmasters aiming to get organic, that is, free traffic from one of the most popular websites of our time.

Conclusion

As big online platforms such as Google and Facebook are trying to increase their revenue year by year, fewer and fewer free slots will be available for webmasters, and more and more will have to be paid if someone wants to get traffic from these huge sites.

The other phenomenon, also related to the constant need to generate more profit, is that these online platforms are increasingly capturing others' content and incorporating it into their services. Instead of conveying traffic to the content sources, they try everything to keep their users close to their ad placements while the third-party content is consumed.

Some more tidbits

It's also funny to see that while a couple of years ago only 948 000 results were reported, nowadays approximately 8 210 000 results are estimated – almost ten times more.

This might be the reason for the "slower" result generation time: instead of 0.26 seconds, it took almost three times as long, a whopping 0.69 seconds. 🙂

The new results page simply does not fit on a 1024-pixel-wide screen anymore, a resolution commonly used by web designers as the minimum width ten years ago.

The knowledge panel among the contemporary results, providing in-depth information about the movie "The Grand Budapest Hotel", indicates that Google is not quite sure about your search intent when you just type "budapest hotel". If the big G is not sure what you are looking for, then there might be a considerable amount of search volume for the movie too, compared with the volume of ordinary hotel searches.

As nowadays you can buy so many different types of LED filament bulbs and all sorts of cord sets for pendant lamps, there is quite a big temptation for DIY-minded folks to hang their own designs instead of buying a ready-made pendant light or chandelier. At least, I could not resist, so I ended up creating lamps with non-conventional lampshades too.

Ikeahack pendant light

Take one — Just cords and bulbs

First, I just bought a triple cord set and three 400 lm LED filament bulbs. As we needed a quick solution, I simply started to play around with weaving the cords. It is better than nothing, but I guess I will eventually change it for something slightly more sophisticated. While you don't need a lampshade for this kind of bulb, 1200 lumens isn't too much for a bigger room like ours.

Ikea Hemma triple pendant light with Ikea Lunnom LED filament bulbs

Take two — Non-conventional lampshade

Next time I got a couple of vintage-looking colanders from IKEA and all the other parts from a local DIY store. All I had to do was drill a bigger hole in the colanders for the fittings, plus cut and connect the cables. First I opted for conventional-size 1000 lm LED filament bulbs, but these were just too bright and glaring, so I replaced them with three 400 lm bulbs – the smallest size you could buy.

Ikeahack pendant light with Ikea Gemak colander lampshades

Take three – Abstract lampshade

Thinking about the next lamp I needed to install, I was quite sure that I would use the surplus dry-cleaner wire hangers we had accumulated over the last couple of years, so I started to tinker with them. As it was intended for our low-ceilinged bedroom, I needed something small and simple with just a single light bulb, so I bought an IKEA Hemma cord set and a big Lunnom bulb again, and started to sculpt something out of my wire hangers, fixing the pieces with the small wire ties that come with dustbin bags. Although it ended up as a rather abstract (from some perspectives quite chaotic) wireframe lampshade, it projects quite interesting shadows onto the ceiling.

Dry cleaner wire hanger lamp with Ikea Lunnom filament LED bulb

Some more photos

I hope these photos will give you some inspiration to build your own lamps one day. 🙂

Ikeahack pendant lamp

LinkedIn is a really cool platform when it comes to job search, or looking for the next step in your career, whatever you call it. But what can you do when the site lacks a couple of vital features and therefore wastes your time unnecessarily? And how can you make sure that you have seen all the potentially interesting job offers when there are a couple of hundred positions available in your region?

Reading through hundreds of job offers?

Yes, I might be in a unique situation where many circumstances just do not matter that much, so perhaps I keep more on my radar and browse through more adverts than an ordinary or casual job hunter would. And this is where my problem is rooted: it is just too cumbersome to regularly check new job adverts on LinkedIn. Imagine that you have to enter many keywords one by one, then enter the location, set the desired distance radius, sort the results by date and start scrolling through the list, clicking each job offer which might seem interesting, judged by the search result page excerpt. Usually, you end up checking many ads you have already seen, again and again.

A Save button for job offers is just not enough

The whole process could be much easier with a simple feature like a button next to the Save/Unsave button with the text "That's not for me" or simply "Hide". Luckily, I can quickly write a script with web automation software to implement some of the missing functions of LinkedIn, or of any other website. The vast majority of ZennoPoster or Ubot Studio users use these tools to build bots which scrape a considerable amount of information from LinkedIn (something that LinkedIn hates and tries to prevent as much as possible), but you can use them for legitimate purposes too: to hack together something which provides you with the missing features.

It will not work with Ubot Studio alone

Ubot Studio has a nice feature I needed: it allows you to combine a web browser window with an additional user interface for data input, so I initially started to implement this simple script in Ubot. Unfortunately, I again ran into some very basic obstacles which prevented me from building anything usable. <rant begins> To be honest, Ubot is the least stable piece of software I have ever seen. If you take its price into consideration too, I think Ubot could be nominated for the title of the most time- and money-wasting application ever. I have already wasted so much time because of its frequent crashes, inability to deal with certain types of websites, unpredictable behavior, etc., that I always regret it when I pay for an upgrade again. As I mentioned, there are only two features for which I keep struggling with this tool: the additional user interface and the ability to easily compile standalone .exe files for the bots. All in all, once I logged in to LinkedIn, Ubot simply could not detect anything on the loaded web page, and tech support had no solution for that either, which is a shame and gives me the feeling that something is screwed up with this tool from the ground up. <rant ends>

Let’s write two scripts then!

This is why I fired up ZennoPoster and put together a simple script, without any problems. It logs in to my LinkedIn profile, sets the location and enters the keywords for the job search, like online marketing, digital marketing, social media, google, facebook, adwords, seo, hongarije, hongaars, archicad, spanish, hungarian, hungary, marketing, marketeer, etc. Then it goes through the search results by clicking on the pagination links and scrapes all the job advert links into a plain text list, making sure that a link is not already among those which have been checked previously. It also checks for the presence of the keywords on the exclusion list, such as recruit, stage, stagiair, intern, php(\ |-), javascript, frontend, backend, front-end, back-end, webdeveloper, \.net\, etc., and omits the job offers that are obviously not made for me.
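
For illustration, here is a minimal Python sketch of that filtering step rather than the actual ZennoPoster project: it assumes the job links and their result-page excerpts have already been scraped into a list of tuples, and the file names and the filter_new_jobs helper are made up for this example.

 import re
 from pathlib import Path

 # Keywords that disqualify a job advert (mirrors the exclusion list above).
 EXCLUDE_RE = re.compile(
     "|".join([
         r"recruit", r"stage", r"stagiair", r"intern", r"php( |-)",
         r"javascript", r"frontend", r"backend", r"front-end", r"back-end",
         r"webdeveloper", r"\.net",
     ]),
     re.IGNORECASE,
 )

 CHECKED_FILE = Path("checked-urls.txt")      # hypothetical list of URLs already reviewed
 TO_REVIEW_FILE = Path("urls-to-review.txt")  # hypothetical output for the next step

 def filter_new_jobs(scraped):
     """Keep only job adverts that are new and not excluded by keyword.

     `scraped` is a list of (url, excerpt) tuples collected from the
     search result pages by whatever scraper you use.
     """
     checked = set(CHECKED_FILE.read_text().splitlines()) if CHECKED_FILE.exists() else set()
     kept = []
     for url, excerpt in scraped:
         if url in checked:
             continue                 # already seen in an earlier run
         if EXCLUDE_RE.search(excerpt):
             continue                 # obviously not for me
         kept.append(url)
     if kept:
         with TO_REVIEW_FILE.open("a", encoding="utf-8") as f:
             f.write("\n".join(kept) + "\n")
     return kept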

Step #2 – back to Ubot Studio!

Now I have a URL list of all the job offers which have a certain chance of being promising for me. The next step is to read through them as quickly as possible; we are talking about hundreds of job offers. Sometimes there are many false positives: for instance, if a recruitment company includes phrases like "visit our Facebook page" in each of their job descriptions, then the fact that I am looking for Facebook-related jobs with the "facebook" keyword just makes everything much more difficult. In addition to that, I found a couple of interesting job offers which did not match any of the specific keywords like "online marketing", only the very generic ones like "marketing".

I already had similar, very simple bots written in Ubot for going through a long list of URLs, adding feedback to every loaded web page and saving back the URLs together with the manually added data. (Think about quickly tagging or writing titles for hundreds of products in a web shop while understanding what those products are really about.) So I just quickly modified an older script. It loads one job offer page, waits until I click the appropriate check box, either "Interesting" or "Delete", and then automatically loads the next web page, while administering the process by adding the URL to the list of URLs already checked, plus to the list of interesting job adverts if I clicked the corresponding box. With this method I could go through hundreds of job offers so quickly that at one point LinkedIn thought I was a bot (as I was logged in, since I also wanted to save some of the interesting jobs) and practically did not let me do anything without solving those silly captchas about cars, roads and road signs again, so I had to give up using LinkedIn for a day or so.
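
For those without Ubot, a rough Python equivalent of that triage loop is sketched below: it opens each URL in the default browser and takes the Interesting/Delete decision from the console instead of check boxes; the file names are placeholders carried over from the previous sketch, not the actual bot. Going through the list at a human pace probably also helps to avoid the captcha wall mentioned above.

 import webbrowser
 from pathlib import Path

 TO_REVIEW = Path("urls-to-review.txt")      # produced by the scraper step
 CHECKED = Path("checked-urls.txt")          # everything ever reviewed
 INTERESTING = Path("interesting-jobs.txt")  # adverts worth a second look

 def review_jobs():
     urls = [u for u in TO_REVIEW.read_text().splitlines() if u.strip()]
     for url in urls:
         webbrowser.open(url)                # load the job offer page
         answer = ""
         while answer not in ("i", "d"):
             answer = input("Interesting (i) or Delete (d)? ").strip().lower()
         with CHECKED.open("a", encoding="utf-8") as f:
             f.write(url + "\n")             # administer: never show this one again
         if answer == "i":
             with INTERESTING.open("a", encoding="utf-8") as f:
                 f.write(url + "\n")

 if __name__ == "__main__":
     review_jobs()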

For someone who cannot write web automation scripts that quickly, it might not be worth setting up an automation hack for a case like this, but for me, I think it was worth it. On the one hand, I guess the fact that I don't speak Dutch (yet) will make my job hunting take a little longer than average, so the scripts will save me a lot of time. On the other hand, by reading through so many job descriptions, I now have a better understanding of what kind of jobs are available in The Hague area.

In this article I will show you how to migrate a site created with an old, nowadays deprecated content management system to a contemporary CMS using web automation tools – that is, grabbing the site's content by walking through it with bots imitating human visitors, and uploading it to the new site similarly, by acting just like a human editor.

More than a decade ago I started to build a very successful website with a simple yet powerful Zope-based wiki engine called Zwiki. As both wikis (with the one notable exception of Wikipedia) and Zope have been in decline for many years, I have not actively developed that site any more, but as I did not want to lose its content, I decided to migrate it to WordPress.

Step #1: Scraping a Zwiki site

Getting structured content from a wiki

When moving a site, the first challenge is to download the website's content in a structured format. As we are talking about a wiki-type site, there is a strong emphasis on the word structured: the basic philosophy of wikis is to add content to one single page body field and to use certain formatting notations inside that one big content field to display the information in a way that resembles some structure. Zwiki, for instance, has a handy feature which allows commenting on and following others' comments on any wiki page, but all the comments are appended to the very same field where the actual page content is stored, therefore I had to find where each comment begins in the pages' text and store the comments separately from the content.
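
To illustrate what that separation means in practice: in the rendered HTML, each Zwiki comment starts with an anchor whose name begins with "msg", and that is exactly what the scraper script further below keys on. A minimal Python sketch of the same split could look like this (the split_page helper is made up for this example):

 import re

 # Each comment in the rendered page body begins with an anchor like
 # <p><a ... name="msg<date>@..."> (the pattern mirrors the one in the scraper script below).
 COMMENT_START = re.compile(r'<p><a[^>]+name="msg')

 def split_page(html):
     """Return (page_content, [comment_html, ...]) for one rendered wiki page."""
     matches = list(COMMENT_START.finditer(html))
     if not matches:
         return html, []
     content = html[:matches[0].start()]
     comments = [
         html[m.start():(matches[i + 1].start() if i + 1 < len(matches) else len(html))]
         for i, m in enumerate(matches)
     ]
     return content, comments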

Dealing with the special content markup

Yet another challenge was that Zwiki, just like many of its counterparts, uses a specific, simpler-than-HTML markup code which no contemporary content management system recognises, so I could not rely on the content I would get by opening all the pages for editing; instead, I had to scrape the public web pages, where the special markup is already interpreted and translated into ordinary HTML.

Imitating human visitors with web automation tools

As I have experience working with a few web scraping/automation tools, my obvious choice was to scrape the Zwiki site as if a human visitor clicked through each and every link on the page and downloaded its content. This way you are not limited by the export/import formats a certain CMS offers when it comes to acquiring and uploading content: you can get whatever part of the content you want, process it with whatever regular expressions you want, and log the results in any format. If a human visitor can walk through the entire site, you can grab all the information.

The logic behind the scraper script

Wiki-based content management systems tend to have a feature which greatly simplifies the content scraping process: they usually have a wiki contents page where all the pages of the wiki are listed. Therefore it seemed to be a very easy task to get all the content I needed to move: just open the contents page, scrape all the links, then go through the list and visit, download and post-process each page. As an output, I generated a .csv file where the page hierarchy, that is, all the parent pages, was logged, and another .csv file where the actual content of each page was logged along with a few pieces of key information such as the title, URL and last modified date. This last piece of information could be obtained by visiting each wiki page's history sub-page and reading the dates of the previous changes listed there. A third file contained every comment in a separate row, extracted by regular expressions from the page content. I also generated one more file with the raw content for debugging purposes: it records the page content plus the comments in their original format, so that if anything went wrong with the processing of the comments, the original source would be at hand.

Putting it all together with UBot Studio

As the whole process didn't seem to be too difficult, I opted for Ubot Studio for downloading and structuring the site's content. It is marketed as an automation tool for internet marketers, but to be honest its main purpose was once to scrape and spam websites with link submissions, comments, etc. Nevertheless, it can be used for various web automation purposes, and one of its key functions is that the bots I create can be compiled into a .exe format which can be run on any Windows computer without having to buy the software itself. I will not publish this executable, as I don't want anyone to play around with scraping Zwiki sites and putting an unnecessary load on their servers, but feel free to contact me by commenting on this page or dropping me a mail (kedves /at/ oldalgazda /dot/ hu) if you need that .exe file to migrate your own Zwiki site.

Another interesting feature of Ubot is that although its primary interface is a visual programming UI, you can still switch to code view, where you can edit the script as if it were written in an "ordinary" programming language. The Zwiki scraper script, for instance, looks like this in code view. If you have some patience, you can go through the script, understand what each step does, and see which regular expressions I used when structuring the data:

 ui text box("Domain to scrape (without http(s)://):",#domain)
 allow javascript("No")
 navigate("{#domain}/FrontPage/contents","Wait")
 wait for browser event("Everything Loaded","")
 wait(5)
 set(#scraped,$scrape attribute(<class="formcontent">,"innerhtml"),"Global")
 add list to list(%pageurls,$find regular expression(#scraped,"(?<=href=\")[^\"]+"),"Delete","Global")
 loop($list total(%pageurls)) {
     set(#pageurl,$list item(%pageurls,1),"Global")
     navigate(#pageurl,"Wait")
     wait for browser event("Everything Loaded","")
     wait(5)
     set(#content,$scrape attribute(<class="content">,"innerhtml"),"Global")
     set(#content,$replace regular expression(#content,"<a\\ class=\"new\\ .+?(?=</a>)</a>","<!-- no wikipage yet -->"),"Global")
     set(#content,$replace(#content,$new line,$nothing),"Global")
     set(#content,$replace regular expression(#content,"\\t"," "),"Global")
     set(#contentonly,$replace regular expression(#content,"<p><div\\ class=\"subtopics\"><a\\ name=\"subtopics\">.+",$nothing),"Global")
     set(#contentonly,$replace regular expression(#contentonly,"<p><a name=\"comments\">.+",$nothing),"Global")
     set(#contentonly,$replace regular expression(#contentonly,"<a name=\"bottom\">.+",$nothing),"Global")
     add list to list(%parents,$scrape attribute(<class="outline expandable">,"innertext"),"Delete","Global")
     set(#parentlist,$list item(%parents,0),"Global")
     clear list(%parents)
     add list to list(%parents,$list from text(#parentlist,$new line),"Delete","Global")
     set(#parentlist,$replace(#parentlist,$new line,";"),"Global")
     set(#posttitle,$list item(%parents,$eval($subtract($list total(%parents),1))),"Global")
     set(#posttitle,$replace(#posttitle," ...",$nothing),"Global")
     if($comparison($list total(%parents),"> Greater than",1)) {
         then {
             set(#parent,$list item(%parents,$eval($subtract($list total(%parents),2))),"Global")
         }
         else {
             set(#parent,$nothing,"Global")
         }
     }
     append to file("{$special folder("Desktop")}\\{#domain}-page-hierarchy.csv","{#pageurl}    {#posttitle}    {#parent}    {#parentlist}    {$new line}","End")
     clear list(%parents)
     add list to list(%comments,$find regular expression(#content,"<p><a[^>]+name=\"msg.+?(?=<p><a[^>]+name=\"msg.+)"),"Delete","Global")
     loop($list total(%comments)) {
         set(#comment,$list item(%comments,0),"Global")
         set(#date,$find regular expression(#comment,"(?<=name=\"msg)[^@]+"),"Global")
         set(#title,$find regular expression(#comment,"(?<=<b>).+?(?=</b>\\ --)"),"Global")
         set(#title,$replace regular expression(#title,"<[^>]+>",$nothing),"Global")
         set(#author,$find regular expression(#comment,"(?<=</b>\\ --).+?(?=<a\\ href=\"{#pageurl})"),"Global")
         set(#author,$replace regular expression(#author,"<[^>]+>",$nothing),"Global")
         set(#author,$replace regular expression(#author,",\\ *$",$nothing),"Global")
         set(#comment,$find regular expression(#comment,"(?<=<br(|\\ /)>).+"),"Global")
         set(#comment,"<p>{#comment}","Global")
         set(#comment,$replace regular expression(#comment,"\\t"," "),"Global")
         append to file("{$special folder("Desktop")}\\{#domain}-page-comments.csv","    {#pageurl}    {#date}    {#title}    {#author}    {#comment}    {$new line}","End")
         remove from list(%comments,0)
     }
     navigate("{#pageurl}/history","Wait")
     wait for browser event("Everything Loaded","")
     wait(5)
     scrape table(<outerhtml=w"<table>*">,&edithistory)
     set(#lastedited,$table cell(&edithistory,0,4),"Global")
     clear table(&edithistory)
     append to file("{$special folder("Desktop")}\\{#domain}-page-content-raw.csv","{#pageurl}    {#lastedited}    {#content}    {$new line}","End")
     append to file("{$special folder("Desktop")}\\{#domain}-page-content-only.csv","{#pageurl}    {#posttitle}    {#lastedited}    {#contentonly}    {$new line}","End")
     remove from list(%pageurls,0)
 }

Step #2: Uploading the content to WordPress

Now that I had all the necessary data downloaded to .csv files in a structured format, I needed to create other scripts to upload the content to a WordPress site. Here I opted for the same technique, that is, imitating a human visitor who hits the "Create a new page" button each and every time and fills all the edit fields with the data grabbed from the downloaded .csv files. More details about this part can be read here: From Plone to WordPress — Migrating a site with web automation tools
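
For completeness, here is what the other obvious route would look like: instead of imitating a human editor in the browser, the same rows could in principle be pushed through the WordPress REST API. This is only a hedged sketch, not what I used: the endpoint URL and the credentials are placeholders, the tab-separated column order written by the scraper above is assumed, and the page hierarchy and the comments are left out for brevity.

 import csv
 import requests  # third-party: pip install requests

 # Hypothetical target site and credentials (e.g. a WordPress application password).
 WP_PAGES_ENDPOINT = "https://example.com/wp-json/wp/v2/pages"
 AUTH = ("editor-user", "application-password")

 def upload_pages(csv_path):
     """Create a draft WordPress page for each row of the *-page-content-only.csv file.

     The scraper writes four tab-separated columns per row:
     page URL, title, last-edited date and the cleaned HTML content.
     """
     with open(csv_path, newline="", encoding="utf-8") as f:
         for row in csv.reader(f, delimiter="\t"):
             if len(row) < 4:
                 continue  # skip malformed rows
             page_url, title, last_edited, content = row[:4]
             payload = {"title": title, "content": content, "status": "draft"}
             # last_edited could be mapped to the "date" field once converted to ISO 8601
             response = requests.post(WP_PAGES_ENDPOINT, json=payload, auth=AUTH, timeout=30)
             response.raise_for_status()
             print(f"Uploaded {title!r} (source: {page_url})")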