Introduction
Creating Link Extractor and Filter: Part 1
This article completes my previous article, Creating Link Extractor and Filter. There we covered how to extract all the links from a page and show them in a list box. In this article we will create the filters, which narrow the links down by a supplied parameter (here, a file extension) and make it much easier to find the link you want. So let’s start creating the filter for our links.
Creating the Link Filter
As I always say, before coding anything, understand clearly what you intend to build. So let’s first define the requirements for the filter.
Logic for Link Filter
Coding the filter
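Check Box 1
if (filterParameters.Contains(".pdf"))
{
    // Already selected, so unchecking the box removes the filter.
    filterParameters.Remove(".pdf");
}
else
{
    // Not selected yet, so checking the box adds the filter.
    filterParameters.Add(".pdf");
}
sortLinks();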
Check Box 2
if (filterParameters.Contains(".zip"))
{
    filterParameters.Remove(".zip");
}
else
{
    filterParameters.Add(".zip");
}
sortLinks();
sortLinks() is the method that filters our links (despite its name, it filters rather than sorts). Each handler adds its extension to filterParameters when the user checks the box and removes it when the user unchecks the box. Initially all the check boxes are unchecked and the list is empty. Whenever the user checks or unchecks a box, the CheckedChanged handler runs and updates the list.
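Because both handlers repeat the same add-or-remove pattern, the toggle could be factored into one helper. The following is only a minimal sketch and is not part of the original project: the names FilterToggle and Toggle are hypothetical, and the helper works on a plain List<string> so it can be tried outside the form.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not in the original project): removes the extension
// from the filter list if it is present, otherwise adds it.
public static class FilterToggle
{
    // Returns true if the extension is active after the call.
    public static bool Toggle(List<string> filterParameters, string extension)
    {
        // List<T>.Remove returns true only when the item was found and removed.
        if (filterParameters.Remove(extension))
        {
            return false; // it was present, so it is now switched off
        }

        filterParameters.Add(extension);
        return true; // it was absent, so it is now switched on
    }
}
```

With such a helper, each check box handler would shrink to a call like FilterToggle.Toggle(filterParameters, ".pdf") followed by sortLinks().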
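Before we implement sortLinks() we need to keep a copy of all the links we grabbed, because filtering clears the list box. Declare allLinks as a field of the form so that sortLinks() can reach it, and fill it right after the links have been added to the list box (in the complete code below this happens at the end of extractLink()):
// Field of the form
List<string> allLinks = new List<string>();

// After the links have been added to the list box
foreach (var item in checkedListBox1.Items)
{
    allLinks.Add(item.ToString());
}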
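Now it’s time to implement the sortLinks() method. The logic is simple: a link is shown when its extension is among the selected filter parameters, or when no parameter is selected at all. The code will look like this:
private void sortLinks()
{
    List<string> temp = new List<string>();
    foreach (var item in allLinks)
    {
        // Keep the link if its extension is selected, or if no filter is active.
        if (filterParameters.Contains(Path.GetExtension(item)) || filterParameters.Count == 0)
        {
            temp.Add(item);
        }
    }
    updateList(temp);
}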
The second condition restores the full list when no filter is selected. The updateList() method refreshes the checked list box with the filtered links.
To call sortLinks() from the sort links button, double-click that button in the designer and add the call to its click handler:
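One limitation worth keeping in mind: Path.GetExtension() works on the raw string, so for a link such as http://example.com/file.zip?id=1 the returned extension includes the query string and will not match the ".zip" filter, and an upper-case ".ZIP" will not match either. The sketch below shows one possible way around this, assuming the links are absolute URLs and the filter entries are lower-case. The class LinkFilter and its methods are hypothetical names, not part of the original project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not in the original project).
public static class LinkFilter
{
    // Returns the lower-case extension of the URL's path part, ignoring any
    // query string or fragment. Returns "" when the link is not an absolute URL.
    public static string GetUrlExtension(string link)
    {
        Uri uri;
        if (!Uri.TryCreate(link, UriKind.Absolute, out uri))
        {
            return string.Empty;
        }

        return Path.GetExtension(uri.AbsolutePath).ToLowerInvariant();
    }

    // Same rule as sortLinks(): keep a link when its extension is selected,
    // or keep everything when no filter is selected.
    public static List<string> Apply(IEnumerable<string> allLinks,
                                     ICollection<string> filterParameters)
    {
        var result = new List<string>();
        foreach (var link in allLinks)
        {
            if (filterParameters.Count == 0 ||
                filterParameters.Contains(GetUrlExtension(link)))
            {
                result.Add(link);
            }
        }

        return result;
    }
}
```

Inside the form, sortLinks() could then pass LinkFilter.Apply(allLinks, filterParameters) to updateList(). Whether that extra handling is worth it depends on the sites you grab links from.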
sortLinks();
The update function contains only this loop:
checkedListBox1.Items.Clear();
foreach (var item in temp)
{
    checkedListBox1.Items.Add(item);
}
As a finishing touch, add a Click handler to the checked list box and put the following line in it:
Clipboard.SetText(checkedListBox1.SelectedItem.ToString());
This line will set the clipboard text equal to the URL you have clicked.
Output
You can find the complete code below:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace linkGrabber
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Grab button: load the page and extract its links once it is ready.
        private void button1_Click(object sender, EventArgs e)
        {
            checkedListBox1.Items.Clear();
            allLinks.Clear();
            WebBrowser wb = new WebBrowser();
            // Subscribe before starting the navigation.
            wb.DocumentCompleted += wb_DocumentCompleted;
            wb.Url = new Uri(textBox1.Text);
        }

        void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            HtmlDocument source = ((WebBrowser)sender).Document;
            extractLink(source);
            ((WebBrowser)sender).DocumentCompleted -= wb_DocumentCompleted;
        }

        // Backup of every link that was grabbed; the filter always starts from it.
        List<string> allLinks = new List<string>();

        private void extractLink(HtmlDocument source)
        {
            HtmlElementCollection anchorList = source.GetElementsByTagName("a");
            foreach (HtmlElement item in anchorList)
            {
                string href = item.GetAttribute("href");
                // Skip anchors without an href and links that are already listed.
                if (string.IsNullOrEmpty(href) || checkedListBox1.Items.Contains(href))
                {
                    continue;
                }
                checkedListBox1.Items.Add(href);
            }
            foreach (var item in checkedListBox1.Items)
            {
                allLinks.Add(item.ToString());
            }
        }

        // Extensions that are currently selected through the check boxes.
        List<string> filterParameters = new List<string>();

        private void checkBox1_CheckedChanged(object sender, EventArgs e)
        {
            if (filterParameters.Contains(".pdf"))
            {
                filterParameters.Remove(".pdf");
            }
            else
            {
                filterParameters.Add(".pdf");
            }
            sortLinks();
        }

        private void checkBox2_CheckedChanged(object sender, EventArgs e)
        {
            if (filterParameters.Contains(".zip"))
            {
                filterParameters.Remove(".zip");
            }
            else
            {
                filterParameters.Add(".zip");
            }
            sortLinks();
        }

        private void sortLinks()
        {
            List<string> temp = new List<string>();
            foreach (var item in allLinks)
            {
                // Keep the link if its extension is selected, or if no filter is active.
                if (filterParameters.Contains(Path.GetExtension(item)) || filterParameters.Count == 0)
                {
                    temp.Add(item);
                }
            }
            updateList(temp);
        }

        private void updateList(List<string> temp)
        {
            checkedListBox1.Items.Clear();
            foreach (var item in temp)
            {
                checkedListBox1.Items.Add(item);
            }
        }

        private void checkedListBox1_Click(object sender, EventArgs e)
        {
            // SelectedItem is null when the click does not land on an item.
            if (checkedListBox1.SelectedItem != null)
            {
                Clipboard.SetText(checkedListBox1.SelectedItem.ToString());
            }
        }
    }
}
Summary
Our link extractor and filter application is complete. You can now grab the download links from any site that provides direct file links. You can extend it by adding more filters, or by adding a file downloader so that you can download the files from within the application. You can also handle links that do not point directly to a file but make server requests instead. If you extend this project, don’t forget to share it in the comments. Thank you for reading this article; if you liked it, feel free to share it, and don’t forget to comment.