Introduction to Deep Learning

Sebastian Raschka made 170 videos on deep learning, and you can watch all of the lessons now:

I just sat down this morning and organized all deep learning related videos I recorded in 2021. I am sure this will be a useful reference for my future self, but I am also hoping it might be useful for one or the other person out there.

It’s split into 19 lessons over five parts: introduction, mathematical foundations, neural networks, deep learning for computer vision, and generative models. Might be useful, even if you just want to learn more about what machine learning is.

Spatula, a Python library for maintainable web scraping

This looks promising:

While it is often easy, and tempting, to write a scraper as a dirty one-off script, spatula makes an attempt to provide an easy framework that most scrapers fit within without additional overhead.

This reflects the reality that many scraper projects start small but grow quickly, so reaching for a heavyweight tool from the start often does not seem practical.

The initial overhead imposed by the framework should be as light as possible, providing benefits even for authors that do not wish to use every feature available to them.

Although, without my dirty one-off scripts, what will I put in my tmp data folder?

Generate a color analysis by uploading an image

Mel Dollison and Liza Daly made a fun interactive that lets you upload an image, and it spits out a vintage-looking color analysis a la Vanderpoel:

This generator is based on the works of Emily Noyes Vanderpoel (1842-1939), who hoped her original color analyses would inspire others to study “whatever originals may be at hand in books, shops, private houses, or museums.” We hope you are similarly inspired by her abstract, modernist style employed in the context of everyday objects and photos.

The project was originally conceived as a Twitter bot, and you can find the Python code behind it on GitHub.

✚ How to Make Line Charts in Python, with Pandas and Matplotlib

Line charts can be used to show patterns over time and relationships between variables. This is a comprehensive introduction to making them using two common libraries, pandas and Matplotlib. Read More
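
To give a sense of the basic pattern before you dig into the full tutorial, here's a minimal sketch with made-up data (not the tutorial's own example):

import pandas as pd
import matplotlib.pyplot as plt

# Small made-up dataset, just to show the pattern.
df = pd.DataFrame({
    'year': [2010, 2011, 2012, 2013, 2014],
    'value': [3.1, 3.4, 2.9, 4.2, 4.8],
})

# The pandas plotting shortcut wraps Matplotlib's line plot.
df.plot(x='year', y='value', kind='line')
plt.xlabel('Year')
plt.ylabel('Value')
plt.show()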

Altair for visualization in Python

Vega-Lite is a grammar for interactive graphics primarily used for analysis. Altair is a visualization library in Python that is based on this grammar.

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.
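
To give a sense of what that API looks like, here's a minimal sketch with made-up data (not an example from the Altair docs or the introduction linked below):

import altair as alt
import pandas as pd

# Made-up data, just to show the declarative style.
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 3, 6, 7, 2],
    'category': ['a', 'b', 'a', 'b', 'a'],
})

# Map columns to visual channels; Altair compiles this to a Vega-Lite spec.
chart = alt.Chart(df).mark_point().encode(
    x='x',
    y='y',
    color='category',
)

chart.save('scatter.html')  # or display the chart directly in a Jupyter notebook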

Jim Vallandingham just put up a useful introduction to the library if you’re looking to get your feet wet.

I do very little visualization-wise with Python since my current toolset typically covers my bases, but this has me curious.

Extracting NBA player movement data

Player movements

NBA basketball teams have tracking systems installed in their arenas called SportVU, essentially a system of cameras pointed at the court to track player movements. Some of that data is browsable through the NBA site, but there's understandably no direct download link. However, there is an API. Savvas Tjortjoglou wrote a thorough tutorial on how to grab data via the API and plot it in Python.
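
His tutorial has the real endpoint and parameters; the sketch below only shows the general shape of the task, with a hypothetical URL and an assumed JSON layout:

import requests
import matplotlib.pyplot as plt

# Hypothetical endpoint and response layout; see the tutorial for the real API.
url = 'https://example.com/movements?gameid=0021500001'
moments = requests.get(url).json()['moments']  # assumed structure

# Each moment is assumed to carry a list of (team_id, player_id, x, y, z) entries.
xs = [p[2] for m in moments for p in m['players']]
ys = [p[3] for m in moments for p in m['players']]

plt.scatter(xs, ys, s=2, alpha=0.3)
plt.xlim(0, 94)  # an NBA court is 94 feet long
plt.ylim(0, 50)  # and 50 feet wide
plt.show()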

This will be fun.

Ditch Excel and format your data with csvkit

I thought I linked to csvkit a while ago, but apparently not. If you deal with CSV data at all, you should know about this suite of utilities that helps you format and re-format it in various ways. Christopher Groskopf posted a list of quick things you can do with csvkit.

Over the last several months there have been two major releases of csvkit. These releases have brought long-awaited features such as Python 3 support, a csvformat utility and a new csvkit tutorial—not to mention a slew of bug fixes. To celebrate the latest release, here are eleven of my favorite awesome things you can do with csvkit. If you aren't using it yet, hopefully this will convince you.

Fun things include a quick one-liner to convert an Excel file to CSV, switching to JSON, and easy CSV export from a database.

PlotDevice: Draw with Python

PlotDevice

You've been able to visualize data with Python for a while, but Mac application PlotDevice from Christian Swinehart couples code and graphics more tightly. Write code on the left. Watch graphics change on the right.

The application gives you everything you need to start writing programs that draw to a virtual canvas. It features a text editor with syntax highlighting and tab completion plus a zoomable graphics viewer and a variety of export options.

PlotDevice's simple but comprehensive set of graphics commands will be familiar to users of similar graphics tools like NodeBox or Processing. And if you're new to programming, you'll find there's nothing better than being able to see the results of your code as you learn to think like a computer.

Looks promising. Although when I downloaded it and tried to run it, nothing happened. I'm guessing there are still compatibility issues to iron out at version 0.9.4. Hopefully that clears up soon. [via Waxy]

Downloading Your Email Metadata

Downloading Email Metadata

We pay a lot of attention to how we interact with social networks, because so many people use Twitter, Facebook, etc. every day. It's fun for developers to play with this stuff. However, if you want to look at a history of your own interactions, there isn't a much better place to look (digitally) than your own email inbox.

Before you can explore though, you have to download the data. That's what you'll learn here, or more specifically, how to download your email metadata as a ready-to-use, tab-delimited file.

Setup

I'm using a dated Python 2.5, but I think the code in this tutorial should work with newer versions. Please let me know if something breaks though.

There are various ways to access your email. In this tutorial, you use Python, which provides libraries to handle email access and some useful functions to parse data. If you're on a Mac, you probably already have Python installed. For Windows computers, you can download and install Python if you haven't already.

The other thing you need: An email inbox accessible via IMAP and the server information. That's most modern email services, I think. Here's the information for Gmail and Yahoo Mail. I use Fastmail, and you can get their IMAP server information here.

Connect to the IMAP server

You can of course just follow along with the code in the tutorial's download. For smaller scripts like this, I like to type things out to make sure I get everything.

All set up? Good. Open your favorite text editor to get started.

The first thing to do is import the necessary libraries. In this case that's imaplib, email, getpass, and getaddresses from email.utils.

import imaplib, email, getpass
from email.utils import getaddresses

It'll be clear what these are for soon.

Like right now. Enter your email settings for your server and username. You use getpass() for the password, so that you don't have to store your password in plaintext. Instead, when you run this script, you'll be prompted for your password.

# Email settings
imap_server = 'YOUR_SERVER_HERE'
imap_user = 'YOUR_USERNAME_HERE'
imap_password = getpass.getpass()

Now connect and log in.

# Connection
conn = imaplib.IMAP4_SSL(imap_server)
(retcode, capabilities) = conn.login(imap_user, imap_password)

If everything went well, the variable retcode should be 'OK'. Otherwise, you might want to check your server and login information.

Next up: Select the folder you want to fetch your email from. The name of the folder depends on what you want and what service you use. To get the actual names (and they need to be exact), enter conn.list() and run the script that you have so far.

It might also be useful to do all of this in the Python interpreter, so that you get instant feedback. Open your terminal or equivalent, start Python (by typing 'python'), and you should be able to enter the code covered above.

Anyway, let's say the folder is called "INBOX.Archive". Here's how you select it. I've included the commented-out lines for reference.

# Specify email folder
# print conn.list()
# conn.select("INBOX.Sent Items")
conn.select("INBOX.Archive")

Search for your email

Now that you're connected, you can search your inbox to fetch the email that you want. For example, you might only want email from 2013.

# Search for email ids between dates specified
result, data = conn.uid('search', None, '(SINCE "01-Jan-2013" BEFORE "01-Jan-2014")')

Or you might have email aliases setup and you only want email sent to a specific address, since the start of 2014.

result, data = conn.uid('search', None, '(TO "user@example.org" SINCE "01-Jan-2014")')

Or maybe you want all of it.

result, data = conn.uid('search', None, 'ALL')

Note that the only thing that changes is the query in the last argument of conn.uid().

A search yields a list of unique id numbers for each email that matches your search criteria. Split them, and then fetch the headers of the matching emails.

uids = data[0].split()

# Download headers
result, data = conn.uid('fetch', ','.join(uids), '(BODY[HEADER.FIELDS (MESSAGE-ID FROM TO CC DATE)])')

For the sake of simplicity you only fetch five header fields here, but if you want others, go wild.

In the fetch line, you essentially pass that command to the server with the unique ids as a comma-separated string, and you specify which header fields you want. The IMAP syntax isn't incredibly intuitive, but this mini manual is helpful. Or, if you're daring, you can look at the IMAP specifications direct from the source.

In any case, the fetch is the actual downloading of your email headers. This takes up the most time when you run the full script. Parsing takes less than a second. It took about 15 seconds for me to download 9,000 headers on a standard home cable internet connection, and the resulting file was 1.2 megabytes. Obviously, the more header fields and the more email you have, the longer it will take, but it's not too bad.

I came across some examples that took way longer. As in minutes instead of seconds. The key is getting all the headers at once with one call to the IMAP server.

Parse the returned data

So you have the data now. But, it's not in a nice readable tab-delimited file yet. You have to iterate through each item stored in the data variable (from when you fetched the headers), parse, and then spit out the format you want.

Start by creating the file. We'll call it raw-email-rec.tsv.

# Where data will be stored
raw_file = open('raw-email-rec.tsv', 'w')

And write the header of the TSV to the newly created file.

# Header for TSV file
raw_file.write("Message-ID\tDate\tFrom\tTo\tCc\n")

Time to iterate and parse. The code below is a big chunk, but here's what you're doing:

  1. Start a for loop.
  2. Check if the length of the current item is 2. Items of length 2 are email headers, and those that aren't are something else.
  3. If it is a message, use message_from_string() to parse. Use get_all() to get each header field (message id, date, etc.).
  4. Put together a tab-delimited row of data.
  5. Write the row to raw_file.

And here's the same logic in code.

# Parse data and spit out info
for i in range(0, len(data)):
    
    # If the current item is _not_ an email header
    if len(data[i]) != 2:
        continue
    
    # Okay, it's an email header. Parse it.
    msg = email.message_from_string(data[i][1])
    mids = msg.get_all('message-id', None)
    mdates = msg.get_all('date', None)
    senders = msg.get_all('from', [])
    receivers = msg.get_all('to', [])
    ccs = msg.get_all('cc', [])
    
    row = "\t" if not mids else mids[0] + "\t"
    row += "\t" if not mdates else mdates[0] + "\t"
    
    # Only one person sends an email, but just in case
    for name, addr in getaddresses(senders):
        row += addr + " "
    row += "\t"
    
    # Space-delimited list of those the email was addressed to
    for name, addr in getaddresses(receivers):
        row += addr + " "
    row += "\t"
    
    # Space-delimited list of those who were CC'd
    for name, addr in getaddresses(ccs):
        row += addr + " "
    
    row += "\n"
    
    # Just going to output tab-delimited, raw data.
    raw_file.write(row)

You finished iterating, so close the file.

# Done with file, so close it
raw_file.close()

Script done. Run the script (by typing "python fetch-raw.py" in the command line) and you should get a tab-delimited file called raw-email-rec.tsv in the same directory as your script.

Wrapping up

The email download can be broken into three parts.

  1. Connect to the IMAP server
  2. Search and download your email
  3. Parse and format

If you want to get headers for multiple folders, you can run the script multiple times changing the folder name each time. Don't forget to change the file name too, or you'll just be overwriting your data each time.
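
If you'd rather not run the script by hand for each folder, a small loop can handle it. A minimal sketch, assuming the folder names below exist on your server and reusing the parsing loop from above:

# Loop over folders, writing one TSV per folder. Replace these names with
# the exact ones returned by conn.list().
folders = ["INBOX.Archive", "INBOX.Sent Items"]

for folder in folders:
    conn.select(folder)
    result, data = conn.uid('search', None, 'ALL')
    uids = data[0].split()
    result, data = conn.uid('fetch', ','.join(uids), '(BODY[HEADER.FIELDS (MESSAGE-ID FROM TO CC DATE)])')
    
    out_file = open(folder.replace(' ', '-') + '.tsv', 'w')
    out_file.write("Message-ID\tDate\tFrom\tTo\tCc\n")
    # ...same parsing loop as above, writing rows to out_file...
    out_file.close()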

Finally, if you just want your email metadata and don't care about how to do it, download the code for this tutorial. Change the values for imap_server and imap_user to your own information. You might also need to change the value for the folder and the search. Once you have that in order, you should be able to run the script and get your data.