Do you like it when ads such as “Melania Trump is leaving the White House!” or “Televangelist Joel Osteen is leaving his wife!” are shown on your website by AdSense? I don’t think so.
Such ads make our sites look awful. Another reason to block them: the content of these eye-catching ads does not match the content of the site you reach by clicking them.
You can use the Ad Review Center (ARC) in the AdSense interface to block these bad ads, but that takes a lot of time.
So here is a PHP-based solution that blocks bad ads automatically according to your settings: Bad ads Blocker.
Below is a Google-translated user manual. If something is unclear, you can ask me in the comments or by email: badadsblocker at the site domain.
This is software written in PHP that searches for ads from fraudulent advertisers on its own, blocks them, and displays a report on the work done. If an ad is blocked by mistake or a bad ad is missed, the program is easy to tune, and ads can be blocked or unblocked with one click.
Why PHP? Most site owners can set aside a separate folder or subdomain to host a small PHP program. The source code is open, so you can verify that the software does not steal your password or other personal data. Put it in any folder on any site: just unpack it and open the address where it lives. If necessary, change the user that owns the files (the default is root). If the program finds that it lacks rights to the directories it needs, it will refuse to work and state the reason.
Short description: set the checkboxes as you wish and save them with the button at the bottom; the list of stop words is saved with a separate button.
Filtering can be started in three ways:
Manually;
Automatically, at the interval specified in the settings, using JavaScript (the window with the program must stay open in Chrome or whichever browser you prefer). You can try it here.
Through a scheduler, for example cron.
Each launch is a single search cycle.
System requirements (as a rule everything is already there, but not always)
PHP 5.3 to PHP 5.6. PHP 7 does not work yet because of many incompatibilities; support is planned, but with no specific dates so far.
cURL.
The DOM extension. A little about installing DOM.
The mbstring package for working with multibyte encodings.
JSON support enabled.
The temporary folder (tempdata by default) and the settings folder (settings by default) must be writable.
The server should not send the header “x-frame-options: DENY” (otherwise authorization and search results have to be opened in a new tab). If you cannot remove this header, check “Frames do not work”.
Processing ads takes quite a long time (about 10 seconds per 15 ads), but with no noticeable CPU or memory consumption; I watched top while it was working. For this reason the application tries to raise the maximum execution time to 10 minutes and to slightly increase the memory limit (just in case).
Description (instructions for use)
Installation
Download the archive and upload it to your site.
Unpack it, for example into the folder bad_ads_blocker (I recommend choosing a different name, for example ad_cleaner).
Go to mysite.com/bad_ads_blocker/, or use whatever folder name you chose (there is no need to enter a specific .php file).
If the program complains about missing folder permissions, go to your hosting / server and set permissions on the settings and tempdata folders, for example 775.
It can be run under Windows on XAMPP with PHP 7. It will probably also work with PHP 5 under XAMPP (not checked).
On Windows x64, cURL may not work correctly.
You can also use Open Server.
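The folder-permission requirement above can be checked before the first run. Below is a minimal sketch in Python (it is not part of the tool, which is written in PHP). The function name is made up for illustration; the folder names are the defaults mentioned in this manual.

```python
import os

def missing_write_access(install_dir, folders=("settings", "tempdata")):
    """Return the folders under install_dir that are missing or not writable."""
    problems = []
    for name in folders:
        path = os.path.join(install_dir, name)
        # A folder is usable only if it exists and the current user may write to it.
        if not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(name)
    return problems
```

An empty list means both folders are writable for the user running the check. For a meaningful answer, run it as the same user the web server runs under.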
Preparation
Tick the necessary checkboxes and enter the numbers (the checkboxes and numbers are described a little further down).
At the bottom, click “Debug and other …” and enter your e-mail there (your AdSense login).
Be sure to save: click the “Update settings” button at the bottom left (under the settings).
At the top right, click “Google Auth” to reveal the password field and the button that starts authorization. They are hidden because they are rarely needed, so that they do not get in the way and are not clicked by accident.
Enter your password and click the “Login to Google” button. If nothing appears in the bottom window, check the headers of your server: https://bertal.ru. If “X-Frame-Options: DENY” is present, either reconfigure the server so that DENY is not sent, or enable the “Frames do not work” checkbox.
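The header check described above can also be done by hand. This is a small sketch in Python, not the tool's own code. The function name is hypothetical, and it looks only at the DENY value that this manual mentions.

```python
def frames_blocked(headers):
    """Return True if the response headers contain X-Frame-Options: DENY."""
    for name, value in headers.items():
        # Header names are case-insensitive; the value is compared the same way.
        if name.lower() == "x-frame-options" and value.strip().lower() == "deny":
            return True
    return False
```

The headers can be any mapping of names to values, for example the `headers` attribute of the response returned by `urllib.request.urlopen(url)` for the address where the program is installed. If the function returns True, enable “Frames do not work”.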
Authorization
If the e-mail is entered incorrectly, the system will report it.
If the password is entered incorrectly, the system will report it.
If two-step verification is enabled, the software will ask for the code sent by SMS and prompt you to enter it. If not, it simply logs in.
If your server, or the server hosting the site where you installed the software, is geographically close to you, authorization goes through without extra hassle.
It is quite possible that the software runs in another city (for example, on a hosting server in Irkutsk) or another country. Google will then get alarmed and send a warning e-mail, and the login will not go through. Open that e-mail and confirm that it was you who tried to log in; after that you can try again. Google will then send an SMS with a confirmation code (even if two-step verification is turned off).
If, in that case, no phone is linked to the account, no SMS will arrive; Google will instead ask you to type the letters from an image (up to and including version 3.0, such requests are not supported).
Once the software is authorized, you can start filtering.
Authorization (backup option)
Install any browser extension that exports cookies; whichever browser you use, search the extensions for the words “export cookies”. Make sure the exported format is compatible with the “Netscape standard”, or is explicitly stated to be compatible with cURL; either suits us. To avoid exporting all of your cookies, you can create a new Chrome user, sign in to your Google account there, and export the cookies from that profile. Put them in the tempdata folder under the name cookie.txt, and in the same folder create a file pub_id.txt containing your publisher ID in the format “pub-1908597777777777”: pub- followed by 16 digits. I have checked this personally; it works.
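A typo in pub_id.txt is easy to make, so the format can be checked before saving the file. This is a minimal sketch in Python; the function name is made up, and the only rule it encodes is the one stated above (pub- followed by 16 digits).

```python
import re

# "pub-" followed by exactly 16 ASCII digits, as described in this manual.
PUB_ID_RE = re.compile(r"pub-[0-9]{16}")

def is_valid_pub_id(text):
    """Return True if text (ignoring surrounding whitespace) looks like a publisher ID."""
    return PUB_ID_RE.fullmatch(text.strip()) is not None
```

For example, `is_valid_pub_id("pub-1908597777777777")` returns True, while a value with a missing digit or without the pub- prefix is rejected.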
Run through the scheduler
You can simply specify the search script “search_bad_ads.php” (with its full path, of course); every time it starts, it searches according to the last saved settings.
Since version 2.3, you can override the settings when running from cron.
The simple variant: “search_bad_ads.php 1 50 5”, with three optional parameters.
The first turns checking of already reviewed ads on or off, the equivalent of the “Check reviewed ads” checkbox. Set it to zero to force that check off for this launch. By the way, when you override the check of reviewed ads this way, saving of checked ads is disabled; otherwise thousands of identical ones would pile up in a day.
The second is the number of ads per page, the equivalent of “Ads per page”.
The third is the number of pages, the equivalent of “Number of pages”.
All parameters are optional, but their order cannot be changed. The software always expects the reviewed on / off flag first, then the second and third as described above.
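The fixed order of the three parameters can be pictured as follows. This is a sketch in Python of the rule described above, not the tool's PHP code. The key names are borrowed from the JSON string shown in this manual; how the program stores them internally is an assumption.

```python
def parse_cli_overrides(args):
    """Map up to three positional arguments to setting overrides.

    The order is fixed: reviewed flag, ads per page, number of pages.
    Arguments that are left out produce no override at all.
    """
    names = ("reviewed", "num_of_ads_per_page", "num_of_pages")
    overrides = {}
    # zip stops at the shorter sequence, so trailing parameters stay optional.
    for name, raw in zip(names, args):
        value = int(raw)
        # The first parameter is an on/off switch; zero turns it off.
        overrides[name] = (value != 0) if name == "reviewed" else value
    return overrides
```

With the arguments from the example, `parse_cli_overrides(["1", "50", "5"])` turns the reviewed check on and sets 50 ads per page and 5 pages. Passing only `["0"]` switches the reviewed check off and leaves the other two settings as saved.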
Extended version:
'{"num_of_pages": "5", "num_of_ads_per_page": "50", "lat2cyr": "checked", "reviewed": "checked"}'
Be sure to enclose in single quotes!
You can pass a single parameter: a JSON string with settings, which you can get by clicking the “Show json-string” button hidden under “Debug and other …”. First all settings are read from the file, then everything present in the string is overridden (this is like ticking checkboxes that were not ticked; however, anything already enabled will not be disabled). So if you need to override only 3 parameters, you do not need to specify all of them.
There is also a variant that redefines all parameters:
'!!! {"num_of_pages": "5", "num_of_ads_per_page": "50", "lat2cyr": "checked", "reviewed": "checked"}'
In this case, you need to specify all meaningful parameters.
However, the override in cron will not work if register_argc_argv is disabled.
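The difference between the two JSON variants can be sketched like this. It is an illustration in Python of the behaviour described above, not the program's actual PHP code; the function name is hypothetical.

```python
import json

REPLACE_ALL_PREFIX = "!!!"

def apply_override(saved, arg):
    """Combine saved settings with a JSON override string from the command line."""
    arg = arg.strip()
    if arg.startswith(REPLACE_ALL_PREFIX):
        # Full redefinition: the saved settings are ignored entirely.
        return json.loads(arg[len(REPLACE_ALL_PREFIX):])
    # Partial override: start from the saved settings, then replace
    # only the keys that are present in the string.
    merged = dict(saved)
    merged.update(json.loads(arg))
    return merged
```

With the partial form, a key that is absent from the string keeps its saved value, which matches the remark that anything already enabled is not disabled. With the `!!!` form, anything not listed is simply gone, which is why all meaningful parameters must be given.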
Frequency of launch
If all your text ads have already been checked, then to catch unreviewed ads it makes sense to run around the turn of the hour (for example, 5 times between minute 53 and minute 6), since unreviewed ads appear in the ARC in exactly this time interval.
Description of settings
Number of cycles – the number of consecutive runs, each checking ads according to the settings below. For example, with 10 pages of 10 ads below and 10 cycles here, 1000 ads are checked in total. Does not work through cron yet. This function is useful for checking already reviewed ads, especially for those who cannot increase the execution time.
Number of pages – the number of pages to go through in each search cycle. This corresponds to the pages in the ARC.
Ads per page – the number of ads on each of those pages (depends on the screen size).
Ad types – choose which ad types to check (multimedia is unavailable for now, because the handler has not been written).
Check by stopwords list – look in the text and title of the ad for any word from the list in the middle of the page.
Check by bad ad text list – look in the text and title of the ad for any phrase from the linked list.
Check by Searchwords – use the ARC filter to search all ads (within the limits set by Number of pages and Ads per page) for each entry of the corresponding list (List of search words).
Use Whitelist – use the list of good ads (the algorithm is described a little further down).
Replace lat2cyr – lets the keyword lists match text that mixes Cyrillic and Latin characters. Before checking, Latin characters and other junk that look like Russian letters are replaced with the Russian ones.
Check for redirects – check (in order to block) whether the ad redirects to a domain different from the one given in its destination URL. This lets you screen out junk even without a word list, though not all of it.
Check reviewed ads – the equivalent of the checkbox in the ARC. If ticked, all ads are checked; if not, only unreviewed ones.
Block AdWords account – when another masterpiece from scam advertisers is found, the ad can be blocked along with its AdWords account.
Check “blogspot” – check for “blogspot.com” in the destination URL. For some time this address was actively used.
Check disguised – check for Latin characters disguised as Russian ones, in order to catch things like: “Malakhov yshel co ckandalom – Stala povchinna pcichina.” It is enabled separately for text and display ads.
Check target URL – check for “bad” words in the destination URL (not to be confused with the display URL). The display URL is how we see the link; the target is the final address of the page. That is how it is done in AdWords, so it is done the same way here.
Check only predicted blocks – predicted blocking.
Mark reviewed as reviewed – mark scanned ads as reviewed. You can load unreviewed ads without marking them as reviewed; this is usually needed to debug a word list. If it is unchecked while the “Check reviewed ads” checkbox is ticked, we will get the same list of ads every time: the one that comes first.
Get ad stats – everyone is probably curious how many impressions an ad about the alleged death of Yakubovich or the alleged funeral of Bilan managed to get. If ticked, statistics are collected for all blocked ads, summed up, and the total is shown in the bottom line of the ad.
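The ideas behind “Replace lat2cyr” and “Check disguised” can be illustrated with a short sketch. It is written in Python and is not the program's code. The table of lookalike letters is a small illustrative subset chosen for this example, and the function names are hypothetical.

```python
# Latin letters that look like Cyrillic ones (an illustrative subset).
LOOKALIKES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
}
_TABLE = str.maketrans(LOOKALIKES)

def lat2cyr(text):
    """Replace Latin lookalikes with the Cyrillic letters they imitate."""
    return text.translate(_TABLE)

def is_cyrillic(ch):
    """True for characters in the basic Cyrillic Unicode block."""
    return "\u0400" <= ch <= "\u04ff"

def has_disguised_word(text):
    """True if some word mixes Cyrillic letters with Latin lookalikes."""
    for word in text.split():
        has_cyr = any(is_cyrillic(ch) for ch in word)
        has_fake = any(ch in LOOKALIKES for ch in word)
        if has_cyr and has_fake:
            return True
    return False
```

After `lat2cyr`, a stop word written in plain Cyrillic matches an ad in which some letters were swapped for Latin ones. `has_disguised_word` flags such mixing on its own, without any word list. Note that this simple table would also rewrite ordinary English words, so a real filter would need to be more careful.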
The following items are hidden under the “Debug and other …” button.
Show json-string – a button that prints a JSON string for overriding parameters when running through cron. The parameters shown depend on the settings currently on the screen.
Enable logs – enables storing responses from Google servers. Needed for finding errors and debugging. If everything works, do not turn it on, otherwise the disk will fill up with useless data.
Disable utf8_decode – disables conversion of ad text for the report. If all ads are shown as empty or as garbled characters (mojibake), tick or untick this checkbox. It depends on the DOM extension, which behaves differently from system to system: on some it returns Russian text in UTF-8, on others in some ISO encoding. On the author’s PHP 5.6 the text is displayed correctly without this checkbox.
Show block / unblock buttons – the box of each ad has 4 buttons: block and unblock for the ad and for the AdWords account. If the checkbox is not ticked, blocked ads show only the unblock buttons and unblocked ads show only the block buttons.
Do not save clear ads – literally that. If you have many ads, the filters are tuned well, and you do not want to see normal ads again, tick it. The right column, “Clear”, will then not be filled.
Login (e-mail) – the full e-mail address you registered with AdSense.
Especially important checkboxes are marked with two red arrows when enabled. Just in case.
Run every – for automatic start via JavaScript, in minutes. A countdown even appears. You can try it here.
Everything described above is saved by clicking the “Update settings” button!!!
If you click anything else, the changes are not saved; AJAX and similar miracles are not used.
List of stop words, bad ad text list – the lists used for screening out ads. If at least one line from these lists occurs in an ad, the ad is blocked, and the word that triggered the block is shown with it, which is convenient for debugging when there are many words. Separate lists are kept for text and display ads, because some words needed to block junk text ads would also block good media ads.
List of search words – the list used to pull ads out (like typing each word from the list into the text field to show only ads containing that word).
Whitelist – a list of words or phrases that occur only in good ads. If the checkbox is on, this list is checked first; if there is a match, the ad is considered good, is not checked further, and does not appear in the report. Each line is searched for separately, as a substring of the ad text. The name (if any), the advertiser’s account (a line of the form adv-0000000000000000), and the destination URL are checked as well. When an ad is added to the whitelist, three lines are added: the 2 headings and the text.
The list is saved by the corresponding button below it.
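The order of the checks described above (whitelist first, then the stop lists) can be sketched as follows. This is an illustration in Python, not the program's PHP code. The function name and the result labels are made up, and case-insensitive matching is an assumption that this manual does not confirm.

```python
def classify_ad(fields, whitelist, stopwords, use_whitelist=True):
    """Return (verdict, matched_line) for one ad.

    fields is a list of the ad's text fields (headings, text, URLs).
    """
    # Join with newlines so that a list line cannot match across two fields.
    text = "\n".join(fields).lower()
    if use_whitelist:
        for line in whitelist:
            if line and line.lower() in text:
                # A whitelist hit ends the check: no further tests, no report entry.
                return ("whitelisted", None)
    for line in stopwords:
        if line and line.lower() in text:
            # Keep the matched line so the report can show why the ad was blocked.
            return ("blocked", line)
    return ("clear", None)
```

Because matching is by substring, a single stop-list line also catches longer words that contain it. That is convenient for catching word variations, but it is also a reason to keep whitelist lines specific.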
Google auth – authorization in Google. Reveals the password field and the button that starts authorization. The e-mail must already be specified and saved. Enter the password, start, and wait a few seconds. If all is well, you are either logged in or receive an SMS with a confirmation code. After logging in, you can start searching for ads.
Search results are displayed in the window under the “Start Searching” button, and so are the results of deleting ads. Ads are removed with the trash can icon: the one to the right of a column heading deletes everything in that list, the one on an ad deletes only that ad.
All settings are stored in a file in the program folder on the server (as a JSON string). Cookies are used only for logging in to the control panel when password protection is enabled, so that no problems arise when running through cron.
Get blocked ads (in new tab) – a tool for extracting lines from the current list of blocked ads. It takes its settings from the same source as the main search tool. If the settings specify 2 pages of 40 ads, it collects the first 80 ads and keeps only the unique lines. It does not split them into words; it keeps whole lines. It breaks a line of ad text in half if it is long, displays the resulting list on the screen, and saves it to a file.
Access here password – afraid? Worried? Set a strong password and stop worrying: no one will get to the search and blocking tool. There is no user name, just a password. Reliable and simple. If the password turns out to be too strong, you can always log in to the server, delete the pass file in the temporary folder, and set a slightly less secure password.
Frames do not work – disables output into iframe tags and opens results in new tabs instead. This does not apply to blocking / unblocking ads: there you will not see the response, but no error occurs either.
Unblock accs blocked – while searching for ads, the software automatically unblocks AdWords accounts that were blocked a certain number of days ago. This avoids keeping hundreds of blocked accounts in the list: they are disposable and get blocked quickly, so they will never be reused.
Advertisers account list – a link to a page listing all AdWords accounts that you have blocked. It is pulled directly from AdSense, so you will see the same list in the ARC on the Settings tab (a somewhat illogical name, since it contains only the list of blocked accounts).
Setting separate version – a version of the control panel where the settings are on one page and the reports on found ads on another. The menu is at the top.
Also, a number of settings whose meaning I found non-obvious have comments (in English) right in the control panel, shown as tooltips when you hover the cursor over them.
The user agent is the one you used to log in to the control panel. Perhaps this was a mistake and will be changed, because with IE8 authorization did not work for some reason. I did not investigate.
Output of ads (progress report)
Depending on the result of the check, an ad lands in the corresponding list. Everything that got into the report (the columns of ads at the bottom) stays there until you remove it. These are just copies of the ads taken at a particular time. If you delete one or all of the ads, they are deleted only from the report. As far as the author knows, ads cannot be deleted from AdSense.
The advertiser is shown at the top. You can unblock it manually if needed, and likewise block it manually in the “Clear” list.
For blocked ads, the word that triggered the block is also shown at the top right. If the labels overlap, just hover the cursor or tap (on mobile) and everything becomes visible.
Then, as usual: heading 1, heading 2 (if any), the text, and the display URL (media ads may look a little different). The link itself leads to the final URL through a service that strips the referer field.
In the bottom panel, from left to right: the block / unblock buttons, the total number of views (only for blocked ads, if the option is enabled), the date and time the ad was added to this list, and the button that deletes the current ad from the list.
Search Scheme
A perfectly reasonable question: “At first I thought it filtered by word and removed everything it found.
But it goes through the ads in order and looks for occurrences of words. How many ads would it have to look through to block all the bad words?
Could you make it filter by word (phrase) and block everything that matches?”
Yes, the process is described correctly. The software effectively goes into the ARC and looks through all the ads in order (this process comes to an end only if “Mark reviewed as reviewed” is ticked and “Check reviewed ads” is unticked).
Each search cycle uses the parameters saved on disk. If something was changed but not saved, the old parameters are used.
It would be possible to run a search for each word in the list and block the results, but this approach has drawbacks:
The most important: Google searches only for the exact word form specified and returns only ads with an exact match; a query for “Bilan” will not return “Today, the Deputies of Dima Bilan”.
If there are, say, 50 or more words, it will probably take too long and the process will be cut off at some stage because of the time limit. Then you would have to store the last checked word, run it again... in short, complications.
Advantages of the current implementation:
Reviewed ads are marked, so Google itself serves up the next batch of unreviewed ones. Simple.
The number of ads checked per launch is configurable. And any junk, if it has not been cleaned out, is usually on the first page of the ARC. So within a few launches most of the ads, if not all, will be blocked. In addition, Google always blocks all other ads that are copies of the first blocked one. There are dozens, if not hundreds, of Bilan ads running, for example. After working with this tool, I saw several pages of identical blocked ads in the ARC, although there were far fewer of them in the report.
Which fields are searched for words from our list? Heading 1, heading 2, ad text, display URL and, if the checkbox is ticked, the destination URL.
Recommendations
- If possible, install in a separate directory and access it through a domain that you override in your hosts file. This makes it very hard for any intruder to find it.
- Make sure that ads are actually being caught. It has been observed that with Disable utf8_decode ticked the texts were displayed normally but ads were not caught; they started being caught once the checkbox was unticked.
- Do not leave the folder with the software under its default name.
- Do not check more than a hundred ads per run (the product of the number of pages and the number of ads per page). Choose the optimal settings for your resources and limits. If the execution time cannot be raised, check fewer ads, but more often.
- With 1 page of 100 ads, the process runs 2 to 3 times faster than with 10 pages of 10 ads. However, the second option requires less memory.
- Do not run several cleaning processes from the same folder at the same time, as this can lead to malfunctions and possibly the need to re-authorize. If you want to run two or more processes simultaneously, for example to clean for yourself, a friend, and a friend’s neighbor, simply place two (or more) copies side by side. They will not interfere with each other.
- If saving the settings does not work and the server returns some error instead, edit the .htaccess file in the settings folder, or simply move it out of the folder if you are not afraid to.
Here is a demo. It is cut down, outdated, and non-functional, but it imitates the working version. Authorization is enabled; password: hg.
You can download Bad ads Blocker from the developer’s page or by direct link from the developer’s site: v.3.3.2.
To update the program while keeping all your settings, replace everything except the tempdata folder and the .ini and .txt files in the settings folder.
In other words, replace everything except the tempdata and settings folders, then replace all the .php files in the settings folder.
FAQ
Question: how can I update the blocker to a new version without erasing the words / expressions I have already added?
Answer: all the settings are in the settings folder.
Left column: settings.ini.
Lists: stopwords_text.txt, stopwords_media.txt, bad_ads_text.txt, whitelist.txt.
Question: How do I update the version so I do not have to log in again?
Answer: move cookie.txt and pub_id.txt from the old tempdata folder into the new version (that is, keep the existing ones).
To keep your control panel password, also keep the pass file.
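The update procedure from the answers above can be scripted. Below is a minimal sketch in Python, not something shipped with the program. The function name is hypothetical, the file names are the ones listed in this FAQ, and it assumes the new version has been unpacked into a separate folder next to the old one.

```python
import shutil
from pathlib import Path

# Per-user files named in this manual, grouped by folder.
KEEP = {
    "settings": ["settings.ini", "stopwords_text.txt", "stopwords_media.txt",
                 "bad_ads_text.txt", "whitelist.txt"],
    "tempdata": ["cookie.txt", "pub_id.txt", "pass"],
}

def carry_over(old_dir, new_dir):
    """Copy the per-user files from an old install into a fresh one."""
    copied = []
    for folder, names in KEEP.items():
        for name in names:
            src = Path(old_dir) / folder / name
            if src.is_file():
                dst = Path(new_dir) / folder / name
                # Create the target folder if the fresh install lacks it.
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
                copied.append(f"{folder}/{name}")
    return copied
```

The function returns the list of files it actually copied, so a missing cookie.txt or pass file is easy to spot. Files that do not exist in the old install are skipped silently.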