Google Hummingbird

Google Hummingbird is a search algorithm used by Google. To celebrate its 15th birthday, Google launched[1] the new “Hummingbird” algorithm[2] on September 27, 2013, claiming that it lets Google search interact with users in a more human way and provide more direct answers.[3]

Google said it started using Hummingbird around August 30, 2013,[4] but only announced the change on September 26.

 

What type of “new” search activity does Hummingbird help?

“Conversational search” is one of the biggest examples Google gave. People who speak their searches may find it more natural to phrase them as a conversation.

I thought Google did this conversational search stuff already!

It does (see Google’s Impressive “Conversational Search” Goes Live On Chrome), but it had only been doing it really within its Knowledge Graph answers. Hummingbird is designed to apply the meaning technology to billions of pages from across the web, in addition to Knowledge Graph facts, which may bring back better results.

How do you know all this stuff?

Google shared some of it at its press event today, and then I talked with two of Google’s top search execs, Amit Singhal and Ben Gomes, after the event for more details. I also hope to do a more formal look at the changes from those conversations in the near future. But for now, hopefully you’ve found this quick FAQ based on those conversations to be helpful.

By the way, another term for the “meaning” connections that Hummingbird does is “entity search,” and we have an entire panel on that at our SMX East search marketing show in New York City, next week. The Coming “Entity Search” Revolution session is part of an entire “Semantic Search” track that also gets into ways search engines are discovering meanings behind words. Learn more about the track and the entire show on the agenda page.

cracking password hashes

Forgot your Windows admin password?

Reinstall? Oh no… But not any more…


  • This is a utility to reset the password of any user that has a valid local account on your Windows system.
  • Supports all Windows from NT3.5 to Win7, also 64 bit and also the Server versions (like 2003 and 2008)
  • You do not need to know the old password to set a new one.
  • It works offline, that is, you have to shutdown your computer and boot off a CD or USB disk to do the password reset.
  • Will detect and offer to unlock locked or disabled user accounts!
  • There is also a registry editor and other registry utilities that work under Linux/Unix, and can be used for things other than password editing.

Windows stores its user information, including encrypted versions of the passwords, in a file called ‘sam’, usually found in \windows\system32\config. This file is a part of the registry, in a binary format previously undocumented, and not easily accessible. But thanks to a German(?) named B.D, I’ve now made a program that understands the registry.

This site provides CD and floppy images for end users to easily edit their forgotten passwords. But it also provides full source code and binary builds of the tools to allow others to use as they like for other purposes. Registry format documentation also available.

Latest release is 110511 (2011-05-11)

The following is available for download and information:

2011-05-11

  • Some major new features for people using the registry utilities, but not many changes to password reset.

2009-12-01

  • New site, official URL is now: http://pogostick.net/~pnh/ntpasswd/
  • All releases still contain the old mail address; please note the NEW mail address is pnh@pogostick.net. The old mail address will be invalid after January 1st 2010.
  • No new release, 2008-08-02 is still newest. Hope to release new early 2010.

A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a plaintext password up to a certain length, consisting of a limited set of characters. It is a practical example of a space/time trade-off: it uses less processing time and more storage than a brute-force attack that calculates a hash on every attempt, but more processing time and less storage than a simple lookup table with one entry per hash. Use of a key derivation function that employs a salt makes this attack infeasible.

Rainbow tables are an application of an earlier, simpler algorithm by Martin Hellman.[1]

Hash Sets are used in a data analysis technique called Hash Analysis, which uses the MD5, SHA1 and SHA256 hash of files to verify the files on a storage device. A hash uniquely identifies the contents of a file, regardless of filename and can be used to identify the presence of malicious, contraband, or incriminating files such as bootleg software, pornography and viruses. See this video of hash sets in use in OSForensics.
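As a rough illustration of hash analysis, here is a minimal Python sketch. The function names and the one-entry hash set are made up for the example; this is not how OSForensics does it internally, just the general idea of matching file contents against a set of known digests regardless of filename.

```python
import hashlib

def file_digests(data: bytes) -> dict:
    """Return the MD5, SHA-1 and SHA-256 hex digests of a file's contents."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def in_hash_set(data: bytes, known_sha256: set) -> bool:
    """True if the contents match an entry in the (hypothetical) hash set."""
    return hashlib.sha256(data).hexdigest() in known_sha256

# Hypothetical hash set with a single known file in it.
known = {hashlib.sha256(b"known bad file contents").hexdigest()}

# The filename never enters the calculation, only the bytes do.
print(in_hash_set(b"known bad file contents", known))
print(in_hash_set(b"some other file", known))
```

In practice you would read the bytes from disk in chunks and feed them to the hash object with update(), but the matching step stays the same.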

Rainbow tables are available for free from http://www.freerainbowtables.com/, approximately a 2.5TB (2500 GB) download.

The hash sets are available for free from the National Software Reference Library, approximately a 1.7GB download, and there is a OSForensics tutorial on how to convert them for use within OSForensics. Please note that conversion may take several days.

The hash sets and rainbow tables created by PassMark are also available from the OSForensics Download page.  We are not selling the tables, only the service of copying them onto a 3TB hard drive and shipping.

Any computer system that requires password authentication must contain a database of passwords, either hashed or in plaintext, and various methods of password storage exist. Because the tables are vulnerable to theft, storing the plaintext password is dangerous. Most databases therefore store a cryptographic hash of a user’s password in the database. In such a system, no one — including the authentication system — can determine what a user’s password is, simply by looking at the value stored in the database. Instead, when a user enters his or her password for authentication, it is hashed and that output is compared to the stored entry for that user (which was hashed before being stored). If the two hashes match, access is granted.

A thief who steals the (hashed) password table cannot merely enter the user’s (hashed) database entry to gain access since the authentication system would hash that a second time, producing a result which does not match the stored value, which was hashed only once. In order to learn a user’s password, the thief must reverse the hash to find a password which produces the hashed value. A good authentication system will make this process as difficult as possible by using a one-way hash function, that has a high ratio for the time to invert the function compared to the time to compute the function.

Rainbow tables are one tool that has been developed in an effort to derive a password by looking only at a hashed value.

Rainbow tables are not always needed, for there are simpler methods of hash reversal available. Brute-force attacks and dictionary attacks are the simplest methods available; however, these are not adequate for systems that use large passwords, because of the difficulty of storing all the options available and searching through such a large database to perform a reverse-lookup of a hash.

To address this issue of scale, reverse lookup tables were generated that stored only a smaller selection of hashes that when reversed could generate long chains of passwords. Although the reverse lookup of a hash in a chained table takes more computational time, the lookup table itself can be much smaller, so hashes of longer passwords can be stored. Rainbow tables are a refinement of this chaining technique and provide a solution to a problem called chain collisions.
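The chaining idea can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: 3-letter lowercase passwords, MD5 as the hash, a chain length of 50, and a made-up reduction function that varies per column (the rainbow refinement). Only the start and end of each chain are stored; a lookup walks forward from the target hash until it hits a stored endpoint, then replays that chain from its start.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
LENGTH = 3        # toy password length (assumption)
CHAIN_LEN = 50    # hash/reduce steps per chain (assumption)

def h(password: str) -> bytes:
    return hashlib.md5(password.encode()).digest()

def reduce_to_password(digest: bytes, column: int) -> str:
    """Map a digest back into the password space; differs for each column."""
    n = int.from_bytes(digest, "big") + column
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def chain_end(start: str) -> str:
    p = start
    for col in range(CHAIN_LEN):
        p = reduce_to_password(h(p), col)
    return p

def build_table(starts):
    """Keep only endpoint -> start for every chain."""
    return {chain_end(s): s for s in starts}

def lookup(table, target: bytes):
    # Guess which column the target hash sits in, from last to first.
    for col in range(CHAIN_LEN - 1, -1, -1):
        p = reduce_to_password(target, col)
        for c in range(col + 1, CHAIN_LEN):
            p = reduce_to_password(h(p), c)
        if p in table:
            # Replay the chain from its start, looking for the preimage.
            q = table[p]
            for c in range(CHAIN_LEN):
                if h(q) == target:
                    return q
                q = reduce_to_password(h(q), c)
    return None  # not covered by this table (or only false alarms)

table = build_table(["abc"])   # one chain, so one stored pair
```

One stored pair covers up to 50 passwords here, which is the storage saving; the price is the extra hashing done at lookup time.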

Ophcrack is a free Windows password cracker based on rainbow tables. It is a very efficient implementation of rainbow tables done by the inventors of the method. It comes with a Graphical User Interface and runs on multiple platforms.

The multi-platform password cracker Ophcrack is incredibly fast. How fast? It can crack the password “Fgpyyih804423” in 160 seconds. Most people would consider that password fairly secure. The Microsoft password strength checker rates it “strong”. The Geekwisdom password strength meter rates it “mediocre”.

Why is Ophcrack so fast? Because it uses Rainbow Tables.

Features:

  • » Runs on Windows, Linux/Unix, Mac OS X, …
  • » Cracks LM and NTLM hashes.
  • » Free tables available for Windows XP and Vista/7.
  • » Brute-force module for simple passwords.
  • » Audit mode and CSV export.
  • » Real-time graphs to analyze the passwords.
  • » LiveCD available to simplify the cracking.
  • » Dumps and loads hashes from encrypted SAM recovered from a Windows partition.
  • » Free and open source software (GPL).

Note that all rainbow tables have specific lengths and character sets they work in. Passwords that are too long, or contain a character not in the table’s character set, are completely immune to attack from that rainbow table.

Unfortunately, Windows servers are particularly vulnerable to rainbow table attack, due to unforgivably weak legacy Lan Manager hashes. I’m stunned that the legacy Lan Manager support “feature” is still enabled by default in Windows Server 2003. It’s highly advisable that you disable Lan Manager hashes, particularly on Windows servers which happen to store domain credentials for every single user. It’d be an awful shame to inconvenience all your Windows 98 users, but I think the increase in security is worth it.

I read that Windows Server 2008 will finally kill off LM hashes when it’s released next year. Windows Vista already removed support for these obsolete hashes on the desktop.

The Ophcrack tool isn’t very flexible. It doesn’t allow you to generate your own rainbow tables. For that, you’ll need to use the Project Rainbow Crack tools, which can be used to attack almost any character set and any hashing algorithm. But beware. There’s a reason rainbow table attacks have only emerged recently, as the price of 2 to 4 gigabytes of memory in a desktop machine has approached realistic levels. When I said massive, I meant it. Here are some generated rainbow table sizes for the more secure NT hash:

Character Set                                                         Length  Table Size
ABCDEFGHIJKLMNOPQRSTUVWXYZ                                            14      0.6 GB
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789                                  14      3 GB
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-_+=                    14      24 GB
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-_+=~`[]{}|:;"'<>,.?/   14      64 GB

A rainbow table attack is usually overkill for a desktop machine. If hackers have physical access to the machine, security is irrelevant. That’s rule number 3 in the 10 Immutable Laws of Computer Security. There are any number of tools that can reset passwords given physical access to the machine.

But when a remote hacker obtains a large list of hashed passwords from a server or database, we’re in trouble. There’s significant risk from a rainbow table attack. That’s why you should never rely on hashes alone– always add some salt to your hash so the resulting hash values are unique. Salting a hash sounds complicated (and vaguely delicious), but it’s quite simple. You prefix a unique value to the password before hashing it:

hash = md5('deliciously-salty-' + password)

If you’ve salted your password hashes, an attacker can’t use a rainbow table attack against you– the hash results from “password” and “deliciously-salty-password” won’t match. Unless your hacker somehow knows that all your hashes are “deliciously-salty-” ones. Even then, he or she would have to generate a custom rainbow table specifically for you.

To begin, password storage 101: servers don’t usually store actual passwords. Instead, they hash the password, store the hash, and discard the password. The hash can verify a password from a login page, but can’t be reversed back to the text of the password. So when you inevitably lose your SQL password table, you haven’t exposed all the passwords; just the crappy ones.

Now let’s re-explain rainbow tables:

  1. take a “dictionary” (say, of all combinations of alphanumerics less than 15 characters)
  2. hash all of them
  3. burn the results onto a DVD.

You now have several hundred billion hash values that you can reverse back to text: a “rainbow table”. To use it,

  1. take your stolen table of hashes
  2. for each hash
  3. find it in the rainbow table.

If it’s there, you cracked it.
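The two recipes above fit in a dozen lines of Python. This is a toy sketch: the six-character alphabet, the 4-character limit and the “stolen” hashes are invented for the example, and what it builds is strictly a full lookup table with one entry per hash (real rainbow tables compress this with chains).

```python
import hashlib
from itertools import product

def build_lookup(alphabet: str, max_len: int) -> dict:
    """Hash every combination up to max_len and remember hash -> text."""
    table = {}
    for n in range(1, max_len + 1):
        for combo in product(alphabet, repeat=n):
            word = "".join(combo)
            table[hashlib.md5(word.encode()).hexdigest()] = word
    return table

table = build_lookup("abc123", 4)   # toy "dictionary"

# A made-up stolen hash table: one weak password, one outside the dictionary.
stolen = [
    hashlib.md5(b"a1b2").hexdigest(),
    hashlib.md5(b"hunter2").hexdigest(),
]

# For each hash, find it in the table. If it's there, you cracked it.
cracked = {hsh: table.get(hsh) for hsh in stolen}
print(cracked)
```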

 

.

Here’s what you need to know about rainbow tables: no modern password scheme is vulnerable to them.

Rainbow tables are easy to beat. For each password, generate a random number (a nonce). Hash the password with the nonce, and store both the hash and the nonce. The server has enough information to verify passwords (the nonce is stored in the clear). But even with a small random value, say, 16 bits, rainbow tables are infeasible: there are now 65,536 “variants” of each hash, and instead of 300 billion rainbow table entries, you need quadrillions. The nonce in this scheme is called a “salt”.
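Here is roughly what that looks like in Python. It is only a sketch of the salt itself: the function names are made up, the 16-byte salt is my choice rather than the 16 bits mentioned above, and the single SHA-256 call is there purely to keep the example short, not as a recommended password hash.

```python
import hashlib
import hmac
import os

def make_record(password: str) -> tuple:
    """Generate a random nonce and store it next to the hash."""
    salt = os.urandom(16)   # stored in the clear
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def check(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, digest)

# Two users with the same password end up with different stored hashes,
# so one precomputed table no longer covers both.
salt_a, hash_a = make_record("hunter2")
salt_b, hash_b = make_record("hunter2")
print(hash_a != hash_b)
print(check("hunter2", salt_a, hash_a))
```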

Cool, huh? Yeah, and Unix crypt (almost the lowest common denominator in security systems) has had this feature since 1976. If this is news to you, you shouldn’t be designing password systems. Use someone else’s good one.

 

.

No, really. Use someone else’s password system. Don’t build your own.

Most of the industry’s worst security problems (like the famously bad LANMAN hash) happened because smart developers approached security code the same way they did the rest of their code. The difference between security code and application code is, when application code fails, you find out right away. When security code fails, you find out 4 years from now, when a DVD with all your customers’ credit card and CVV2 information starts circulating in Estonia.

 

.

Here’s a “state of the art” scheme from a recent blog post on rainbow tables and salts:

hash = md5('deliciously-salty-' + password)

There are at least two problems with this code. Yeah, the author doesn’t know what a salt is; “deliciously-salty-” is not a nonce (also, Jeff, your computer really doesn’t care if you separate the password from the nonce with a dash; it’s a computer, not a 2nd grade teacher).

But there’s a much bigger problem with this code: the letters “md5”.

Two reasons.

1.

You’re expecting me to go off on a rant about how there is no redeeming quality to justify using MD5 in 2007. That’s true (MD5 is broken; it’s too slow to use as a general purpose hash; etc). But that’s not the problem.

2.

The problem is that MD5 is fast. So are its modern competitors, like SHA1 and SHA256. Speed is a design goal of a modern secure hash, because hashes are a building block of almost every cryptosystem, and usually get demand-executed on a per-packet or per-message basis.

Speed is exactly what you don’t want in a password hash function.

Modern password schemes are attacked with incremental password crackers.

Incremental crackers don’t precalculate all possible cracked passwords. They consider each password hash individually, and they feed their dictionary through the password hash function the same way your PHP login page would. Rainbow table crackers like Ophcrack use space to attack passwords; incremental crackers like John the Ripper, Crack, and LC5 work with time: statistics and compute.

The password attack game is scored in time taken to crack password X. With rainbow tables, that time depends on how big your table needs to be and how fast you can search it. With incremental crackers, the time depends on how fast you can make the password hash function run.
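A bare-bones incremental cracker, as a Python sketch. The wordlist, the salt and the salted-SHA-256 scheme are all assumptions for illustration; real tools like John the Ripper add mangling rules and statistics on top. The point is that each guess costs exactly one run of the password hash function, so the speed of that function is the whole game.

```python
import hashlib

def crack(salt: bytes, target: bytes, candidates):
    """Feed each candidate through the hash the same way a login page would."""
    tries = 0
    for word in candidates:
        tries += 1
        if hashlib.sha256(salt + word.encode()).digest() == target:
            return word, tries
    return None, tries

# Hypothetical wordlist and stolen (salt, hash) pair.
wordlist = ["123456", "password", "letmein", "hunter2"]
salt = b"\x01\x02"
target = hashlib.sha256(salt + b"letmein").digest()

print(crack(salt, target, wordlist))
```

The salt does nothing to slow this down: it is stored next to the hash, so the attacker just includes it in every guess.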

The better you can optimize your password hash function, the faster your password hash function gets, the weaker your scheme is. MD5 and SHA1, even conventional block ciphers like DES, are designed to be fast. MD5, SHA1, and DES are weak password hashes. On modern CPUs, raw crypto building blocks like DES and MD5 can be bitsliced, vectorized, and parallelized to make password searches lightning fast. Game-over FPGA implementations cost only hundreds of dollars.

Using raw hash functions to authenticate passwords is as naive as using unsalted hash functions. Don’t.

 

.

What is the state of the art here?

1.

First, what your operating system already gives you: a password scheme “optimized” to be computationally expensive. The most famous of these is PHK’s FreeBSD MD5 scheme.

The difference between PHK’s scheme and the one you were about to use for your social shopping cart 2.0 application is simple. You were just going to run MD5 on a salt and a password and store the hash. PHK runs MD5 for thousands of iterations. That’s called “stretching”.
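To make “stretching” concrete, here is a generic Python sketch. It is not PHK’s actual algorithm (which mixes its inputs in a more involved way and is built on MD5); the function name, the use of SHA-256 and the default of 1000 rounds are assumptions for the example.

```python
import hashlib

def stretched_hash(password: str, salt: bytes, rounds: int = 1000) -> bytes:
    """Hash once, then keep re-hashing the result together with the inputs."""
    pw = password.encode()
    digest = hashlib.sha256(salt + pw).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest + salt + pw).digest()
    return digest

# Every guess an attacker makes now costs `rounds` hash computations
# instead of one; a legitimate login pays the same cost exactly once.
print(stretched_hash("hunter2", b"per-user-salt").hex())
```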

PHK’s MD5 scheme is straightforward to code and comes with Linux and BSD operating systems. If you have to choose between the PHP code you have now and PHK’s scheme, you choose PHK’s scheme or you fail your PCI audit.

2.

The best simple answer is “adaptive hashing”, which Niels Provos and David Mazières invented for OpenBSD in 1999. Their original scheme is called “bcrypt”, but the idea is more important than the algorithm.

There are three big differences between Provos-Mazieres and PHK’s scheme:

  1. Bcrypt was invented by two smart guys and PHK’s was only invented by one smart guy. That’s literally twice the smart.
  2. Bcrypt uses Blowfish instead of MD5. Blowfish is a block cipher with a notoriously expensive setup time. To optimize Blowfish to run much faster, you’d have to contribute a major advance to cryptography. We security practitioners are all “betting people”, and we usually like to place our bets on the side that “demands major advances in cryptography”.
  3. Provos and Mazieres extended Blowfish. They call theirs “Eksblowfish”. Eksblowfish is pessimized: the setup time takes even longer than Blowfish. How long? Your call. You can make a single password trial take milliseconds, or you can make it take hours.

Why is bcrypt such a huge win? Think of the problem from two perspectives: the server, and the attacker.

First, the server: you get tens of thousands of logins per hour, or tens per second. Compared to the database hits and page refreshes and IO, the password check is negligible. You don’t care if password tests take twice as long, or even ten times as long, because password hashes aren’t in the 80/20 hot spot.

Now the attacker. This is easy. The attacker cares a lot if password tests take twice as long. If one password test takes twice as long, the total password cracking time takes twice as long.

Get it?

The major advantage of adaptive hashing is that you get to tune it. As computers get faster, the same block of code continues to produce passwords that are hard to crack.
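The tuning knob is easy to show in Python. Bcrypt isn’t in the standard library, so this sketch swaps in PBKDF2 (hashlib.pbkdf2_hmac), which is a different algorithm with the same kind of adjustable work factor. The record layout, function names and iteration counts are assumptions for the example.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int) -> tuple:
    """Store the work factor and salt alongside the derived key."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return iterations, salt, dk

def verify(password: str, record: tuple) -> bool:
    iterations, salt, dk = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, dk)

def needs_rehash(record: tuple, current_iterations: int) -> bool:
    """True if the record was made with a weaker work factor than today's."""
    return record[0] < current_iterations

# A record created years ago with a low work factor still verifies,
# and can be upgraded the next time the user logs in successfully.
old = hash_password("hunter2", 1_000)
if verify("hunter2", old) and needs_rehash(old, 100_000):
    new = hash_password("hunter2", 100_000)
```

Because the work factor travels with each record, you can raise it as hardware gets faster without invalidating existing passwords.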

3.

Finally, as your attorney in this matter, I am required to inform you about SRP.

SRP is the Stanford Secure Remote Password protocol. It is a public key cryptosystem designed to securely store and validate passwords without storing them in the clear or transmitting them in the clear.

That design goal is cooler than it sounds, because there’s usually a tradeoff in designing password systems:

  1. You can store a hash of the password. Now if you lose the password database, you haven’t exposed the good passwords. However, you also don’t know the password cleartext, which means that to validate passwords, your customers need to send them to you in the clear.
  2. You can use a challenge-response scheme, where both sides use a math problem to prove to each other that they know the password, but neither side sends the password over the wire. These schemes are great, but they don’t work unless both sides have access to the cleartext password; in other words, the server has to store them in the clear.

Most practitioners will select the hashing scheme. Both attacks (stolen databases and phished passwords) happen all the time. But stolen databases compromise more passwords.

SRP resolves the tradeoff. It’s an extension of Diffie-Hellman. The salient detail for this post: instead of storing a salted password hash, you store a “verifier”, which is a number raised to the (obviously very large) power of the password hash modulo N.
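For the curious, the verifier calculation looks something like this in Python. This is a toy sketch of the stored value only, not the SRP protocol: the modulus N and base g below are placeholder numbers far too small for real use (real deployments use standardized large groups), and the way x is derived from the salt and password is simplified.

```python
import hashlib

N = 2**127 - 1   # toy prime modulus (assumption, NOT a real SRP group)
g = 3            # toy base (assumption)

def make_verifier(password: str, salt: bytes) -> int:
    """v = g^x mod N, where x is a salted hash of the password."""
    x = int.from_bytes(hashlib.sha256(salt + password.encode()).digest(), "big")
    return pow(g, x, N)

salt = bytes(16)   # fixed salt so the example is repeatable
v = make_verifier("hunter2", salt)
print(hex(v))
```

The server stores the salt and v. Getting from v back to x means solving a discrete logarithm, which is where the “significant advancement to cryptography” requirement comes from.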

If you understand DH, SRP is just going to make sense to you. If you don’t, the Wikipedia will do a better job explaining it than I will. For the test next Wednesday, you need to know:

  • SRP is related to Diffie-Hellman.
  • SRP is a challenge-response protocol that lets a server prove you know your password without your password ever hitting the wire.
  • SRP doesn’t require you to store plaintext passwords; you store non-reversible cryptographic verifiers.
  • “Cracking” SRP verifiers quickly would involve a significant advancement to cryptography.
  • SRP is simple enough to run out of browser Javascript.

Awesome! Why aren’t you using SRP right now? I’ll give you three reasons:

  • SRP is patented.
  • To make it work securely in a browser, you have to feed the login page over SSL; otherwise, like Meebo, you wind up with a scheme that can be beaten by anyone who can phish a web page.
  • SRP is easy to fuck up, so the first N mainstream Rails or PHP or Pylons SRP implementations are going to be trivially bypassable for at least the first year after they’re deployed.

 

.

What have we learned?
We learned that if it’s 1975, you can set the ARPANet on fire with rainbow table attacks. If it’s 2007, and rainbow table attacks set you on fire, we learned that you should go back to 1975 and wait 30 years before trying to design a password hashing scheme.

We learned that if we had learned anything from this blog post, we should be consulting our friends and neighbors in the security field for help with our password schemes, because nobody is going to find the game-over bugs in our MD5 schemes until after my Mom’s credit card number is being traded out of a curbside stall in Tallinn, Estonia.

We learned that in a password hashing scheme, speed is the enemy. We learned that MD5 was designed for speed. So, we learned that MD5 is the enemy. Also Jeff Atwood and Richard Skrenta.

Finally, we learned that if we want to store passwords securely we have three reasonable options: PHK’s MD5 scheme, Provos-Mazières’ Bcrypt scheme, and SRP. We learned that the correct choice is Bcrypt.

The Rainbow Table Is Dead

Well ok, not really.  But you should not be securing hashes against rainbow tables anymore, you need to secure them against brute forcing.  Rainbow tables are still very effective for simple hashes (md5($password)), but just because an algorithm is hard to use for a rainbow table doesn’t mean that it is safe, because the rainbow table is dead…

What Is A Rainbow Table?

Generically, a rainbow table is nothing more than a time-storage trade-off.  Instead of recomputing a function every time you want to attack it, a rainbow table is generated by pre-computing a large number of input permutations to that function.  Then, given a result, it should be easy to look-up the result in a table to determine which input(s) generate it.  That way, you can effectively reverse a non-reversible function…

Applied to hashing (and in this particular context, password hashing), a rainbow table is generated by generating a large number of candidate passwords (typically random, but may be dictionary based as well), and storing the password->hash mapping in a database or data file.  Then simply look-up the hash that you have to get the plain text password that may have generated it.

The First Problem: Storage Space

For a rainbow table to be effective, it must have a lot of candidate passwords in it.  Let’s take a look at an MD5 rainbow table, and see how much storage space it will require.  Let’s also assume that it will be stored in MySQL with a char(10) column for the password, and binary(16) column for the hash (storing it in a binary format).  So each row will have approximately 26 bytes of data (not including any overhead).  And let’s look at source passwords of all printable non-control ASCII characters (there are 77 of them).

Length Of Password  Number Of Possibilities  Size Of Table
4 characters        35,153,041               913 MB
5 characters        2,706,784,157            70 GB
6 characters        208,422,380,089          5.4 TB
7 characters        16,048,523,266,853       417 TB
8 characters        1,235,736,291,547,681    32 PB (petabytes, 10^15 bytes)
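The figures in the table can be reproduced with a few lines of Python, using the 77-character set and 26 bytes per row assumed above (the function name is mine):

```python
CHARSET = 77      # characters assumed in the text
ROW_BYTES = 26    # char(10) + binary(16), no overhead

def table_size(length: int) -> tuple:
    """Return (number of candidate passwords, bytes needed to store them)."""
    candidates = CHARSET ** length
    return candidates, candidates * ROW_BYTES

for length in range(4, 9):
    candidates, size = table_size(length)
    print(f"{length} characters: {candidates:,} candidates, {size / 1e9:,.1f} GB")
```

Each extra character multiplies both columns by 77, which is why the table goes from under a gigabyte to petabytes in four steps.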

As you can see, the number of possibilities goes up quite fast as you support longer passwords. So that means for a rainbow table to be effective, it must actually reduce the number of possible candidates that it stores.  After all, who would want to download 32 Petabytes to crack a hash?  Sure, you could use a dictionary and permutations on the words to try to reduce the search space significantly without cutting down on effectiveness much (statistically speaking).  But that also means a much greater resistance to strong-but-short passwords.

The Second Problem: Hash Algorithms

Hash algorithms are designed with two things in mind: security and speed.  Their typical role is to create a MAC (message authentication code) for a document.  So by hashing the document, you can tell if the original document is the same as long as the generated hashes match.  So since they need to process a lot of data (potentially gigabytes or more), a key requirement is speed.  In fact, most modern “secure” algorithms are even faster than their predecessors on modern hardware (for example, sha256 is several times faster than md5 which is much older).

The faster the hash function is, the less reason there is to use a rainbow table.  After all, the rainbow table is just a time-storage trade-off (you’re reducing time by using more storage).  So since hash functions are only getting faster, the benefit of a rainbow table is diminished.

The Third Problem: Salts

Salts are a random token (usually used only once) that is combined with the password before hashing.  They are specifically used to prevent the use of a rainbow table.  Note that using a salt doesn’t directly prevent a rainbow table from being used, it just reduces its effectiveness.  It artificially increases the length of a password in the rainbow table (so to crack a 4 character password with a 4 character salt, you’d need to generate an 8 character rainbow table).  In practice, most usual lengths of salts are too big to generate a universal rainbow table (for a 32 character salt and 8 character password, the rainbow table would need to be 2.8*10^75 bytes).  So another method that attackers use is to steal the salt along with the hash, and then generate a new rainbow table for each salt.  That’s why it’s so important to use a unique salt for each stored password (it reduces the return on investment that the new rainbow table will provide).

Why Were They Popular?

Rainbow tables were popular for one key reason: Up until very recently, disk was significantly cheaper than CPU time.  It was easier to pre-compute the rainbow table (which can take a very long time) than to do hashes as needed.

The Reality Today

I know what you’re thinking…  “Isn’t disk space even cheaper today than it was a few years ago?”…  Yes it is.  But compute has gotten cheaper even faster.  In 2000, the cost of a hard drive was about $13 per gigabyte.  Today, the cost of a hard drive is about $0.10 per gigabyte.  That’s 2 orders of magnitude!  But if we look at a Pentium 3, it could achieve about 300 mflops (millions of floating point operations per second) for $825, for an average of $2.75 per mflop.  A modern Intel i7 can do about 107,000 mflops for $999, averaging about $0.0093 per mflop.  That’s a difference of roughly 300x, or about two and a half orders of magnitude!

But wait; we have a reasonably new contender!  Enter, the GPU.  A single Radeon HD 6990M can achieve approximately 1,600,000 mflops for about $700.  Computed down, that’s a whopping $0.00043 per mflop.  That’s about an order of magnitude less than the Intel i7, and nearly 4 orders of magnitude less than the P3 (roughly 6,400x).  Not to mention the raw performance is nearly 4 orders of magnitude greater!

How Many Hashes Per Second?

Well, there’s a password cracking tool called John the Ripper.  Currently, it can hash up to 514 million (DES crypt()) hashes per second (abbreviated mhps from here out) on a modern 4 core CPU (Intel x7550).  When using a more modern algorithm such as sha256, John the Ripper can do a rather measly 200,000 hashes per second.  At that rate it would take 3 minutes to generate a 4 character rainbow table.  Fast, but not fast enough for our purposes.

Now, let’s look at what a GPU can do.  Bitcoin currently uses 2 internal sha256 rounds to compute a single “hash”.  So when we look at the performance numbers they are reporting, we need to realize that’s for 2 sha256 hashes.  If we look at the fastest single card setup (an ATI 5970), it does over 860 million bitcoin hashes per second.  That’s over 1.72 billion sha256 hashes per second!  And a 3 card setup can hit almost 4.2 billion sha256 hashes per second.  So let’s take a look at our chart again, this time for a salted sha256 password:

Length Of Password  Number Of Possibilities  CPU         GPU
4 characters        35,153,041               3 minutes   0.0083 seconds
5 characters        2,706,784,157            3.75 hours  0.64 seconds
6 characters        208,422,380,089          12 days     49 seconds
7 characters        16,048,523,266,853       2.5 years   1.06 hours
8 characters        1,235,736,291,547,681    195 years   3.4 days
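The chart is just the number of possibilities divided by the hash rate. A quick Python check, using the rates quoted above (200,000 hashes per second for the CPU, 4.2 billion for the 3 card GPU setup); the names are mine:

```python
CHARSET = 77
CPU_RATE = 200_000           # sha256 hashes per second (John the Ripper figure above)
GPU_RATE = 4_200_000_000     # sha256 hashes per second (3 card setup above)

def seconds_to_exhaust(length: int, rate: int) -> float:
    """Worst case: time to try every password of the given length."""
    return CHARSET ** length / rate

for length in range(4, 9):
    cpu = seconds_to_exhaust(length, CPU_RATE)
    gpu = seconds_to_exhaust(length, GPU_RATE)
    print(f"{length} characters: CPU {cpu:,.0f} s, GPU {gpu:,.4f} s")
```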

So, for about $2100, we can have a set of 3 GPUs that can brute force any printable 8 character password possible in about 3.4 days. And that’s at the absolute worst case possible.  If we started to do intelligent things such as using a dictionary as the base for our search, we could likely find that password much, much faster.

The Other Benefit To Brute Forcing

The other benefit to brute forcing is that you invest practically nothing in the algorithm.  For a rainbow table you need to provide both CPU time to generate (a lot of it) and storage space (a lot of it). Not to mention things like disk seek time.  An average high end hard drive has a seek time of around 4ms.  So to merely read the data stored in a rainbow table for a 4 character password, you’re spending about 1/2 the time taken by the GPU just seeking in the database file.  Then, the computer needs to do a full scan of all of the data to search for the hash value.  So in the end, for a 4 character password, it’s likely cheaper in all accounts just to brute force it on a GPU than it is to generate a rainbow table.

A Word On Entropy

All of the numbers that I’ve used in this article are based off the assumption that password choice is fully random.  That’s the worst case situation.  That means that given n bits of data, it would take on average 2^(n-1) tries to have a 50% chance of guessing it.  So for a pure random 8 character password (printable characters), you’d need on average about 1.7 days on a GPU to brute force it.  Each character in our pure random password has about 6.26 bits of entropy (due to the 77 possible characters, instead of 256).  So an 8 character password has about 50 bits of entropy (and this is true, since 2^50 is about 10^15, which is what we calculated above).

But that’s not the way of the world.  The vast majority of passwords are user generated.  And user generated passwords tend to have significantly less entropy.  In fact, according to NIST (Appendix A), an 8 character password with symbols and numbers would only have about 18 bits of entropy.  It could be 24 bits if there existed both upper-case and lower-case characters.  But 2^24 is only about 16 million.  So notice that our 4 character random password is actually on average twice as strong as a user-selected 8 character password.  In the worst case, it would take the full 2^50 tries to guess a user selected 8 character password, so that’s the same.  But the 50% chance occurs much sooner at 2^23 than the random password at 2^49.
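The entropy arithmetic is short enough to check in Python. The function names are mine; the 77-character set and the 24-bit NIST estimate are the figures used above.

```python
import math

def random_entropy_bits(charset: int, length: int) -> float:
    """Entropy of a password chosen uniformly at random."""
    return length * math.log2(charset)

def average_guesses(bits: float) -> float:
    """Tries needed for a 50% chance of guessing an n-bit secret."""
    return 2 ** (bits - 1)

random_4 = random_entropy_bits(77, 4)   # a little over 25 bits
random_8 = random_entropy_bits(77, 8)   # a little over 50 bits
user_8 = 24                             # NIST estimate quoted above

print(random_4, random_8)
print(average_guesses(random_4) / average_guesses(user_8))
```

The last line is the “twice as strong” comparison: the random 4 character password takes about twice as many guesses on average as the user-chosen 8 character one.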

Speaking of entropy, we’re going to revisit the concept in another post soon (specifically about what a recent web-comic pontificated)…

Finally

The overall point is simple.  A rainbow table is a useful tool.  But it’s also an outdated tool that doesn’t mean nearly as much as it used to.  In the era of the cheap GPU, brute forcing is more than a possibility, it’s a fact.  Using an algorithm because it’s resistant to a rainbow table is not only obsolete, it bypasses the bigger problem.  You need to hash your passwords so that they are hard to brute force.  If they are hard to brute force, they will be hard to rainbow table as well.

Presently, there are about three algorithms for PHP that will provide adequate defense against brute forcing: BCrypt (called Blowfish in PHP's docs), PBKDF2 and PHPASS's internal function (in order from strongest to weakest).  It's worth noting that projects such as Drupal, PHPBB and WordPress have all implemented either PHPASS or a derivative thereof.  All of the algorithms accept a "work factor" which controls how much CPU time the algorithm takes.  By artificially slowing down the hash, brute forcing is made significantly harder (but not impossible).
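To make the "work factor" idea concrete, here is a minimal sketch using PBKDF2. It is written in Python rather than PHP purely for illustration, using the standard library's `hashlib.pbkdf2_hmac`; the helper names, the 16-byte salt and the iteration count of 100,000 are assumptions for the example, not recommendations from the article.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Derive a key with PBKDF2-HMAC-SHA256; `iterations` is the work factor."""
    if salt is None:
        salt = os.urandom(16)           # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def verify_password(password, salt, iterations, expected_key):
    """Recompute the key with the stored salt and work factor, then compare."""
    _, _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)   # constant-time comparison

salt, iterations, key = hash_password("correct horse")
print(verify_password("correct horse", salt, iterations, key))
print(verify_password("wrong guess", salt, iterations, key))
```

Raising `iterations` makes every guess proportionally more expensive for an attacker, at the cost of a slower login check.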

Use an algorithm that has protections against brute forcing, as protecting against rainbow tables alone is a lost battle…

Posted by Anthony Ferrara at 8/16/2011 10:00:00 AM

 

LAME

LAME is a free software codec used to encode/compress audio into the lossy MP3 file format.

The name LAME is a recursive acronym for “LAME Ain’t an MP3 Encoder”.[1] Around mid-1998, Mike Cheng created LAME 1.0 as a set of modifications against the “8Hz-MP3” encoder source code. After others raised quality concerns, he decided to start again from scratch based on the “dist10” MPEG reference software sources. His goal was only to speed up the dist10 sources and leave their quality untouched. That branch (a patch against the reference sources) became LAME 2.0. The project quickly became a team project. Mike Cheng eventually left leadership and started working on tooLAME (an MP2 encoder).

Mark Taylor then started pursuing increased quality in addition to better speed, and released version 3.0 featuring gpsycho, a new psychoacoustic model he developed.

http://sourceforge.net/projects/lame/files/lame/3.99/

http://www.majorgeeks.com/files/details/lame.html

http://gabriel.mp3-tech.org/lame/

http://www.rarewares.org/mp3-lame-libraries.php

A few key improvements, in chronological order:

  • May 1999: a new psychoacoustic model (gpsycho) is released along with LAME 3.0.
  • June 1999: The first variable bitrate implementation is released. Soon after this, LAME also became able to target lower sampling frequencies from MPEG-2.
  • November 1999: LAME switches from a GPL license to an LGPL license, which allows using it with closed-source applications.
  • May 2000: the last pieces of the original ISO demonstration code are removed. LAME is not a patch anymore, but a full encoder.
  • December 2003: substantial improvement to default settings, along with improved speed. LAME no longer requires the user to use complicated parameters to produce good results.
  • May 2007: default variable bitrate encoding speed is vastly improved.

Like all MP3 encoders, LAME implements some technology covered by patents owned by the Fraunhofer Society and other entities.[2] The developers of LAME do not themselves license the technology described by these patents. Distributing compiled binaries of LAME, its libraries, or programs that derive from LAME in countries that recognize those patents may infringe them.

The LAME developers state that, since their code is only released in source code form, it should only be considered as an educational description of an MP3 encoder, and thus does not infringe any patent by itself when released as source code only. At the same time, they advise users to obtain a patent license for any relevant technologies that LAME may implement before including a compiled version of the encoder in a product.[3] Some software is released using this strategy: companies use the LAME library, but obtain patent licenses.

In November 2005, there were reports that the Extended Copy Protection rootkit included on some Sony Compact Discs included portions of the LAME library without complying with the terms of the LGPL.[4]

Audacity is a free digital audio editor and recording application, available for Windows, Mac OS X, Linux and other operating systems.[5][4] Audacity was started by Dominic Mazzoni and Roger Dannenberg at Carnegie Mellon University.[1] As of 10 October 2011, it was the 11th most popular download from SourceForge, with 76.5 million downloads.[7] Audacity won the SourceForge 2007 and 2009 Community Choice Award for Best Project for Multimedia.[8][9]

Trachtenberg arithmetic

Just as Viktor Emil Frankl developed logotherapy to overcome the rigors of the Nazi concentration camps, Jakow Trachtenberg occupied his mind with developing a system of mental arithmetic when he found himself in the same situation.

The Trachtenberg system of rapid mental calculation, similar to Vedic mathematics, consists of a set of patterns for performing arithmetic operations. The most important algorithms are multiplication, division, and addition. The method also includes specialized algorithms for multiplying by numbers between 5 and 13.

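Multiplication by 11

Abusing the notation (the bracketed coefficients may exceed 9, so carries are left implicit), and writing $a = \sum_{i=0}^{n} a_i 10^i$:

$$11a = 11\sum_{i=0}^{n} a_i 10^i = a_n 10^{n+1} + \left[\sum_{j=1}^{n} (a_j + a_{j-1})\,10^j\right] + a_0$$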
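The article gives no code for this case. The following is a sketch written in the style of the later functions; the name `x11` is an assumption made for this example. Each output position receives the current digit plus its right-hand neighbor, and Python's integer arithmetic absorbs the carries.

```python
def x11(number):
    """Multiply by 11: each position gets its digit plus the digit to its right."""
    previous = 0        # digit to the right (a_{j-1}); 0 before the units digit
    result = 0
    power_of_10 = 1
    while number:
        digit = number % 10
        result += (digit + previous) * power_of_10
        previous = digit
        power_of_10 *= 10
        number //= 10
    # leading term a_n * 10^(n+1)
    result += previous * power_of_10
    return result

print(x11(12), x11(98765))
```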

Multiplication by 12

$$12a = 12\sum_{i=0}^{n} a_i 10^i = a_n 10^{n+1} + \left[\sum_{j=1}^{n} (2a_j + a_{j-1})\,10^j\right] + 2a_0$$
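As with 11, the article has no code for this case. Here is a sketch under the same conventions as the later functions (the name `x12` is hypothetical): double each digit and add its right-hand neighbor.

```python
def x12(number):
    """Multiply by 12: double each digit and add the digit to its right."""
    previous = 0        # digit to the right (a_{j-1}); 0 before the units digit
    result = 0
    power_of_10 = 1
    while number:
        digit = number % 10
        result += (2 * digit + previous) * power_of_10
        previous = digit
        power_of_10 *= 10
        number //= 10
    # leading term a_n * 10^(n+1)
    result += previous * power_of_10
    return result

print(x12(12), x12(98765))
```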

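Multiplication by 6

Defining

$$b_j = \lfloor a_j / 2 \rfloor \ \text{(integer division)}, \qquad c_j = a_j \bmod 2$$

we have

$$a_j = 2b_j + c_j$$

$$6a = \frac{10}{2}\sum_{i=0}^{n} a_i 10^i + \sum_{i=0}^{n} a_i 10^i = \sum_{i=0}^{n} b_i 10^{i+1} + \sum_{i=0}^{n} (a_i + 5c_i)\,10^i$$

$$= b_n 10^{n+1} + \left[\sum_{j=1}^{n} (a_j + 5c_j + b_{j-1})\,10^j\right] + (a_0 + 5c_0)$$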

Expressing the algorithm in Python:

def x6(number):
    previous = 0        # half of the digit to the right, b_{j-1}
    result = 0
    power_of_10 = 1
    while number:
        digit = number % 10
        odd_term = 5 if digit % 2 else 0    # 5 * c_j
        result += (digit + odd_term + previous) * power_of_10
        previous = digit // 2
        power_of_10 *= 10
        number //= 10
    # leading term b_n * 10^(n+1)
    result += previous * power_of_10
    return result

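Multiplication by 7

Similarly to the previous case:

$$a_j = 2b_j + c_j$$

$$7a = \frac{10}{2}\sum_{i=0}^{n} a_i 10^i + \sum_{i=0}^{n} 2a_i 10^i = \sum_{i=0}^{n} b_i 10^{i+1} + \sum_{i=0}^{n} (2a_i + 5c_i)\,10^i$$

$$= b_n 10^{n+1} + \left[\sum_{j=1}^{n} (2a_j + 5c_j + b_{j-1})\,10^j\right] + (2a_0 + 5c_0)$$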

Expressing the algorithm in Python:

def x7(number):
    previous = 0        # half of the digit to the right, b_{j-1}
    result = 0
    power_of_10 = 1
    while number:
        digit = number % 10
        odd_term = 5 if digit % 2 else 0    # 5 * c_j
        result += (2 * digit + odd_term + previous) * power_of_10
        previous = digit // 2
        power_of_10 *= 10
        number //= 10
    # leading term b_n * 10^(n+1)
    result += previous * power_of_10
    return result

Multiplication by 5

Similarly to the previous case:

$$a_j = 2b_j + c_j$$

$$5a = \frac{10}{2}\sum_{i=0}^{n} a_i 10^i = \sum_{i=0}^{n} b_i 10^{i+1} + \sum_{i=0}^{n} 5c_i\,10^i$$

$$= b_n 10^{n+1} + \left[\sum_{j=1}^{n} (5c_j + b_{j-1})\,10^j\right] + 5c_0$$

Expressing the algorithm in Python:

def x5(number):
    previous = 0        # half of the digit to the right, b_{j-1}
    result = 0
    power_of_10 = 1
    while number:
        digit = number % 10
        odd_term = 5 if digit % 2 else 0    # 5 * c_j
        result += (odd_term + previous) * power_of_10
        previous = digit // 2
        power_of_10 *= 10
        number //= 10
    # leading term b_n * 10^(n+1)
    result += previous * power_of_10
    return result

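Multiplication by 9

Defining

$$b = 10^{n+1} - \sum_{j=0}^{n} a_j 10^j,$$

that is, the ten's complement of $a$, with $b_0 = 10 - a_0$ and $b_j = 9 - a_j$ for $j \ge 1$, we have

$$9a = 10a - a = 10a - a + b - b = 10a + b - 10^{n+1}$$

$$= (a_n - 1)\,10^{n+1} + \left[\sum_{j=1}^{n} (b_j + a_{j-1})\,10^j\right] + b_0$$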
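As a quick worked check of the identity above, take $a = 12$, so that $n = 1$ and $10^{n+1} = 100$ (the number is chosen only for illustration). The complement is $b = 88$, with $b_1 = 8$ and $b_0 = 8$:

$$9 \cdot 12 = 10 \cdot 12 + 88 - 100 = 108$$

$$(a_1 - 1)\,10^{2} + (b_1 + a_0)\,10 + b_0 = 0 \cdot 100 + (8 + 2) \cdot 10 + 8 = 108$$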

Expressing the algorithm in Python:

def x9(number):
    previous = number % 10
    result = 10 - previous              # b_0: units digit subtracted from 10
    power_of_10 = 10
    number //= 10
    while number:
        digit = number % 10
        # b_j + a_{j-1}: digit subtracted from 9, plus the digit to the right
        result += (9 - digit + previous) * power_of_10
        previous = digit
        power_of_10 *= 10
        number //= 10
    # leading term (a_n - 1) * 10^(n+1)
    result += (previous - 1) * power_of_10
    return result

Multiplication by 8

Defining

$$b = 10^{n+1} - \sum_{j=0}^{n} a_j 10^j,$$

that is, the ten's complement of $a$, we have

$$8a = 10a - 2a = 10a - 2a + 2b - 2b = 10a + 2b - 2 \cdot 10^{n+1}$$

$$= (a_n - 2)\,10^{n+1} + \left[\sum_{j=1}^{n} (2b_j + a_{j-1})\,10^j\right] + 2b_0$$

Expressing the algorithm in Python:

def x8(number):
    previous = number % 10
    result = 2 * (10 - previous)        # 2 * b_0
    power_of_10 = 10
    number //= 10
    while number:
        digit = number % 10
        # 2 * b_j + a_{j-1}
        result += (2 * (9 - digit) + previous) * power_of_10
        previous = digit
        power_of_10 *= 10
        number //= 10
    # leading term (a_n - 2) * 10^(n+1)
    result += (previous - 2) * power_of_10
    return result

Multiplication by 3 and by 4

The algorithms for multiplying by 3 and by 4 combine the ideas used in multiplication by 5 and by 9.

Defining

$$b = 10^{n+1} - \sum_{j=0}^{n} a_j 10^j,$$

that is, the ten's complement of $a$, and

$$a_i = 2c_i + d_i, \qquad c_i = \lfloor a_i / 2 \rfloor, \qquad d_i = a_i \bmod 2,$$

with $c = \sum_{i=0}^{n} c_i 10^i$ and $d = \sum_{i=0}^{n} d_i 10^i$, we have

$$4a = 5a - a = 10c + 5d + b - 10^{n+1}$$

$$3a = 5a - 2a = 10c + 5d + 2b - 2 \cdot 10^{n+1}$$
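The two identities can be checked numerically before turning them into digit-by-digit algorithms. The sketch below uses a hypothetical helper, `decompose`, that builds c, d and b from the decimal digits exactly as defined above.

```python
def decompose(a):
    """Return (c, d, b, top): halved digits, parity digits, complement, 10^(n+1)."""
    digits = [int(ch) for ch in str(a)]                 # most significant first
    c = int("".join(str(x // 2) for x in digits))       # c_i = a_i // 2
    d = int("".join(str(x % 2) for x in digits))        # d_i = a_i mod 2
    top = 10 ** len(digits)                             # 10^(n+1)
    b = top - a                                         # ten's complement of a
    return c, d, b, top

c, d, b, top = decompose(12)
print(10 * c + 5 * d + b - top)            # 4 * 12
print(10 * c + 5 * d + 2 * b - 2 * top)    # 3 * 12
```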

Expressing the algorithms in Python:

def x3(number):
    digit = number % 10
    result = 2 * (10 - digit)           # 2 * b_0
    if digit % 2:
        result += 5                     # 5 * d_0
    previous = digit // 2               # c_0, carried into the next position
    power_of_10 = 10
    number //= 10
    while number:
        digit = number % 10
        odd_term = 5 if digit % 2 else 0
        result += (2 * (9 - digit) + odd_term + previous) * power_of_10
        previous = digit // 2
        power_of_10 *= 10
        number //= 10
    # leading term (c_n - 2) * 10^(n+1)
    result += (previous - 2) * power_of_10
    return result

def x4(number):
    digit = number % 10
    result = 10 - digit                 # b_0
    if digit % 2:
        result += 5                     # 5 * d_0
    previous = digit // 2               # c_0, carried into the next position
    power_of_10 = 10
    number //= 10
    while number:
        digit = number % 10
        odd_term = 5 if digit % 2 else 0
        result += ((9 - digit) + odd_term + previous) * power_of_10
        previous = digit // 2
        power_of_10 *= 10
        number //= 10
    # leading term (c_n - 1) * 10^(n+1)
    result += (previous - 1) * power_of_10
    return result


The history of π (pi)

One of the most satisfying experiences for me has been reading A History of Pi by Petr Beckmann.

Although the specific achievements of human knowledge come about through individuals, when we look at history the social context seems to be decisive for technological development and scientific understanding. As someone said about the atomic bomb:

 The secret is knowing that it can be done.

On the other hand, geniuses are rare. We consider the twentieth century and what has passed of the twenty-first to be superior to the rest of human history in terms of scientific understanding and technological progress, but perhaps we still have not fully grasped what Newton perceived and set down in his work 300 years ago.

Pi is interesting because the circle is interesting. The circle is an ideal abstract form that does not exist in reality, yet it is also the shape of many everyday objects.

Some people can understand that a round object is approximately circular but that, if we measure with enough precision, there are no perfect circles in the world. For others, making this leap of abstraction is not a possibility, and they manage to prove that pi equals 20612/6561, which in practical terms, in terms of measuring a table or slicing a cake, is more than good enough, but in terms of the capacity to develop technology, so to speak, is a dead end. Practicality is a mischievous imp that lets us get ahead in the face of life's challenges but that, if we are careless, leads us down the paths of stagnation and corruption. Let us begin by valuing those who can; let us try to understand. The first step, according to Alcoholics Anonymous, is to accept the problem. Education is the way: let us work so that our children learn to observe, think, discuss, and do, not so that they become scientists, but so that we all live better.

A reader of Beckmann's book shares his frustration on the Internet:

I think my main problem with the book is that I was looking for an interesting narrative that explores the impact of pi from a cultural and personal point of view. What I got was a mathematical primer on pi, heavy on formulas, charts and graphs, peppered with bland historical facts easily obtained from general knowledge history books and encyclopedias.

The comment is curious because the book is enjoyable, as its sales attest, and mathematics is a language for talking about things like pi; that is, it is part of the narrative. It seems that communication between the left and right hemispheres of the brain is not so easy. For some, like Pythagoras, numbers are magical and their manipulation a way to control destiny.

Returning to pi, let us see how to express it as a fraction.

Let us start with an approximation:

$$\pi = 3.141592653589793\ldots = 3 + 0.14159\ldots$$

Taking the reciprocal of the fractional part:

$$\pi = 3 + \cfrac{1}{7.06251\ldots}$$

Iterating the procedure:

$$\pi = 3 + \cfrac{1}{7 + 0.06251\ldots} = 3 + \cfrac{1}{7 + \cfrac{1}{15.99659\ldots}}$$

If the fractional part is greater than 0.5 we can approach from above:

$$\pi = 3 + \cfrac{1}{7 + \cfrac{1}{16 - \cfrac{1}{293.63459\ldots}}}$$

Simplifying, after dropping the last term:

$$3 + \cfrac{1}{7 + \cfrac{1}{16}} = 3 + \cfrac{1}{\;\frac{113}{16}\;} = 3 + \frac{16}{113} = \frac{355}{113}$$

The first four approximations to pi correspond to historical values:

3/1             3.000000000000000
22/7            3.142857142857143
355/113         3.141592920353983
104348/33215    3.141592653921421
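The procedure above (take the nearest integer, invert what is left, repeat) can be sketched in Python with exact fractions. The function names are made up for this example, and because it starts from the floating-point value `math.pi`, only the first few terms should be trusted.

```python
import math
from fractions import Fraction

def nearest_integer_terms(x, count):
    """First `count` terms of the nearest-integer continued fraction of x."""
    terms = []
    for _ in range(count):
        a = round(x)            # approach from above when the fraction exceeds 0.5
        terms.append(a)
        remainder = x - a
        if remainder == 0:
            break
        x = 1 / remainder       # negative when we rounded up
    return terms

def evaluate(terms):
    """Collapse [a0, a1, a2, ...] into a0 + 1/(a1 + 1/(a2 + ...)) exactly."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

terms = nearest_integer_terms(math.pi, 4)
convergents = [evaluate(terms[:k]) for k in range(1, len(terms) + 1)]
for fraction in convergents:
    print(fraction, float(fraction))
```

A negative term such as -294 encodes the "16 minus" step, since 16 + 1/(-294) = 16 - 1/294.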

References: