perl

Perl is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6.[4]

Though Perl is not officially an acronym,[5] there are various backronyms in use, such as: Practical Extraction and Reporting Language.[6] Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.[7] Since then, it has undergone many changes and revisions. The latest major stable revision of Perl 5 is 5.18, released in May 2013. Perl 6, which began as a redesign of Perl 5 in 2000, eventually evolved into a separate language. Both languages continue to be developed independently by different development teams and liberally borrow ideas from one another.

The Perl languages borrow features from other programming languages including C, shell scripting (sh), AWK, and sed.[8] They provide powerful text processing facilities without the arbitrary data-length limits of many contemporary Unix tools,[9] facilitating easy manipulation of text files. Perl 5 gained widespread popularity in the late 1990s as a CGI scripting language, in part due to its parsing abilities.[10]

In addition to CGI, Perl 5 is used for graphics programming, system administration, network programming, finance, bioinformatics, and other applications. It’s nicknamed “the Swiss Army chainsaw of scripting languages” because of its flexibility and power,[11] and possibly also because of its perceived “ugliness”.[12] In 1998, it was also referred to as the “duct tape that holds the Internet together”, in reference to its ubiquity and perceived inelegance.[13]

Perl was originally named “Pearl”. Wall wanted to give the language a short name with positive connotations; he claims that he considered (and rejected) every three- and four-letter word in the dictionary. He also considered naming it after his wife Gloria. Wall discovered the existing PEARL programming language before Perl’s official release and changed the spelling of the name.[36]

When referring to the language, the name is normally capitalized (Perl) as a proper noun. When referring to the interpreter program itself, the name is often uncapitalized (perl) because most Unix-like file systems are case-sensitive. Before the release of the first edition of Programming Perl, it was common to refer to the language as perl; Randal L. Schwartz, however, capitalized the language’s name in the book to make it stand out better when typeset. This case distinction was subsequently documented as canonical.[37]

There is some contention about the all-caps spelling “PERL”, which the documentation declares incorrect[37] and which some core community members consider a sign of outsiders.[38] The name is occasionally expanded as Practical Extraction and Report Language, but this is a backronym.[39] Other expansions have been suggested as equally canonical, including Wall’s own humorous Pathologically Eclectic Rubbish Lister.[40] Indeed, Wall claims that the name was intended to inspire many different expansions.[41]

The Comprehensive Perl Archive Network (CPAN) currently has 121,260 Perl modules in 27,769 distributions, written by 10,733 authors, mirrored on 270 servers.

The archive has been online since October 1995 and is constantly growing.

CPAN, the Comprehensive Perl Archive Network, is an archive of over 114,000 modules of software written in the Perl programming language, as well as documentation for them.[1] It has a presence on the World Wide Web at www.cpan.org and is mirrored worldwide at more than 200 locations.[2] CPAN can denote either the archive network itself, or the Perl program that acts as an interface to the network and as an automated software installer (somewhat like a package manager). Most software on CPAN is free and open source software.[3] CPAN was conceived in 1993, and the first web-accessible mirror was launched in January 1997.[4]

Like many programming languages, Perl has mechanisms to use external libraries of code, making one file contain common routines used by several programs. Perl calls these modules. Perl modules are typically installed in one of several directories whose paths are placed in the Perl interpreter when it is first compiled; on Unix-like operating systems, common paths include /usr/lib/perl5, /usr/local/lib/perl5, and several of their subdirectories.
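The search path described above is visible from inside any program through the built-in @INC array. The following is a minimal sketch; module_search_paths is a hypothetical helper name, and the directories printed depend on how the local interpreter was built.

```perl
use strict;
use warnings;

# @INC holds the directories perl searches for modules; most entries
# are fixed when the interpreter is built.
# module_search_paths() is a hypothetical helper for this illustration.
sub module_search_paths {
    return grep { !ref($_) } @INC;   # skip any code-ref hooks stored in @INC
}

print "$_\n" for module_search_paths();

# %INC maps each loaded file to the path it was actually loaded from.
require List::Util;
print "List::Util loaded from $INC{'List/Util.pm'}\n";
```

Running this on a Unix-like system typically lists paths similar to the ones named above, though the exact set varies between installations.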

Perl comes with a small set of core modules. Some of these perform bootstrapping tasks, such as ExtUtils::MakeMaker, which is used for building and installing other extension modules; others, like CGI.pm, are merely commonly used. The authors of Perl do not expect this limited group to meet every need, however.

CPAN’s main purpose is to help programmers locate modules and programs not included in the Perl standard distribution. Its structure is decentralized: authors maintain and improve their own modules, and forking or creating competing modules for the same task is common. CPAN has no formal bug tracking system of its own, but it designates a third-party bug tracker as the suggested official way to report issues with modules. Continuous development on modules is rare; many are abandoned by their authors or go years between releases. Sometimes a new maintainer is appointed to an abandoned module; that maintainer can release new versions and accept patches from the community as time permits. CPAN has no revision control system, although the source for modules is often stored on GitHub. In addition, the complete history of CPAN and all its modules is available through the GitPAN project, which makes it easy to inspect the history of any module and to maintain forks. CPAN is also used to distribute new versions of Perl, as well as related projects such as Parrot.

The CPAN is an important resource for the professional Perl programmer. With over 23,000 modules (containing 20,000,000 lines of code) as of July 2011, the CPAN can save programmers weeks of time, and large Perl programs often make use of dozens of modules. Some of them, such as the DBI family of modules used for interfacing with SQL databases, are nearly irreplaceable in their area of function; others, such as the List::Util module, are simply handy resources containing a few common functions.
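As an illustration of the "handy resources" mentioned above, here is a small sketch using a few List::Util functions (sum, max, min, first). The summarize helper and the sample numbers are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum max min first);

# summarize() is a hypothetical helper that bundles a few List::Util calls.
sub summarize {
    my @values = @_;
    return (
        total     => sum(@values),
        largest   => max(@values),
        smallest  => min(@values),
        first_big => first { $_ > 10 } @values,   # first element above 10
    );
}

my %s = summarize(12, 7, 42, 3);
print "total=$s{total} largest=$s{largest} smallest=$s{smallest} first_big=$s{first_big}\n";
# total=64 largest=42 smallest=3 first_big=12
```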

Files on the CPAN are referred to as distributions. A distribution may consist of one or more modules, documentation files, or programs packaged in a common archiving format, such as a gzipped tar archive or a ZIP file. Distributions will often contain installation scripts (usually called Makefile.PL or Build.PL) and test scripts which can be run to verify the contents of the distribution are functioning properly. New distributions are uploaded to the Perl Authors Upload Server, or PAUSE (see the section Uploading distributions with PAUSE).

In 2003, distributions started to include metadata files, called META.yml, indicating the distribution’s name, version, dependencies, and other useful information; however, not all distributions contain metadata. When a distribution has no metadata, PAUSE’s software usually tries to extract the same information by analyzing the distribution’s code, which is not always reliable.

With thousands of distributions, CPAN needs to be structured to be useful. Distributions on the CPAN are divided into 24 broad chapters based on their purpose, such as Internationalization and Locale; Archiving, Compression, And Conversion; and Mail and Usenet News. Distributions can also be browsed by author. Finally, the natural hierarchy of Perl module names (such as “Apache::DBI” or “Lingua::EN::Inflect”) can sometimes be used to browse modules in the CPAN.

CPAN module distributions usually have names in the form of CGI-Application-3.1 (where the :: used in the module’s name has been replaced with a dash, and the version number has been appended to the name), but this is only a convention; many prominent distributions break the convention, especially those that contain multiple modules. Security restrictions prevent a distribution from ever being replaced, so virtually all distribution names do include a version number.
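The naming convention can be sketched in a few lines. dist_name is a hypothetical helper written for this illustration; it is not part of CPAN.pm or PAUSE, and real distributions do not always follow the convention.

```perl
use strict;
use warnings;

# Build a conventional distribution name from a module name and a version.
# dist_name() is a hypothetical helper, not part of CPAN.pm or PAUSE.
sub dist_name {
    my ($module, $version) = @_;
    (my $dist = $module) =~ s/::/-/g;   # CGI::Application -> CGI-Application
    return "$dist-$version";
}

print dist_name('CGI::Application', '3.1'), "\n";   # CGI-Application-3.1
```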

There is also a Perl core module named CPAN; it is usually differentiated from the repository itself by using the name CPAN.pm. CPAN.pm is mainly an interactive shell which can be used to search for, download, and install distributions. An interactive shell called cpan is also provided in the Perl core, and is the usual way of running CPAN.pm. After a short configuration process and mirror selection, it uses tools available on the user’s computer to automatically download, unpack, compile, test, and install modules. It is also capable of updating itself.

More recently, an effort to replace CPAN.pm with something cleaner and more modern has resulted in the CPANPLUS (or CPAN++) set of modules. CPANPLUS separates the back-end work of downloading, compiling, and installing modules from the interactive shell used to issue commands. It also supports several advanced features, such as cryptographic signature checking and test result reporting. Finally, CPANPLUS can uninstall a distribution. CPANPLUS was added to the Perl core in version 5.10.0.

Both modules can check a distribution’s dependencies and can be set to recursively install any prerequisites, either automatically or with individual user approval. Both support FTP and HTTP and can work through firewalls and proxies.

Install the packages needed to build CPAN modules:

    sudo apt-get install build-essential

Invoke the cpan command as a normal user:

    cpan

After you press Enter to run cpan, you will be asked a few questions. To keep things simple, answer “no” to the first question so that the remaining ones are answered for you automatically.

Then enter the commands below:

    make install
    install Bundle::CPAN

Everything is now set up, and you can install any Perl module you want.

Type o conf init to reconfigure cpan.

The Best Perl Programmers Use Modern Perl

by chromatic

In 1987, Perl 1.0 changed the world. In the decades since then, the language has grown from a simple tool for system administration somewhere between shell scripting and C programming to a powerful, general purpose language steeped in a rich heritage.

Even so, most Perl 5 programs in the world take far too little advantage of the language. You can write Perl 5 programs as if they were Perl 4 programs (or Perl 3 or 2 or 1), but programs written to take advantage of everything amazing the worldwide Perl 5 community has invented, polished, and discovered are shorter, faster, more powerful, and easier to maintain than their alternatives.

They solve difficult problems with speed and elegance. They take advantage of the CPAN and its unparalleled library of reusable code. They get things done.

This productivity can be yours, whether you’ve dabbled with Perl for a decade or someone just handed you this book and said “Fix this code by Friday.”

Modern Perl is suitable for programmers of every level. It’s more than a Perl tutorial—only Modern Perl focuses on Perl 5.12 and 5.14, to demonstrate the latest and most effective time-saving features. Only Modern Perl explains how and why the language works, to let you unlock the full power of Perl.

Hone your skills. Sharpen your knowledge of the tools and techniques that make Perl so effective. Master everything Perl has to offer.

When you have to solve a problem now, reach for Perl. When you have to solve a problem right, reach for Modern Perl.

Visit the companion website at Modern Perl Books or read Modern Perl: the Book online.

Modern Perl installations include two clients to connect to, search, download, build, test, and install CPAN distributions, CPAN.pm and CPANPLUS. For the most part, each of these clients is equivalent for basic installation. This book recommends the use of CPAN.pm solely due to its ubiquity. With a recent version (as of this writing, 1.9800 is the latest stable release), module installation is reasonably easy. Start the client with:

    $ cpan

To install a distribution within the client:

    $ cpan
    cpan[1]> install Modern::Perl

… or to install directly from the command line:

    $ cpan Modern::Perl

Eric Wilhelm’s tutorial on configuring CPAN.pm http://learnperl.scratchcomputing.com/tutorials/configuration/ includes a great troubleshooting section.

Introduction to Perl

Practical Extraction and Reporting Language.

There is more than one way to do it

Points in favor of Perl:

  • Perl is a high-level language
  • Perl is free
  • Perl can read and write binary files
  • Perl can have multiple input and output files open at the same time
  • It has a report generator
  • It supports regular expressions
  • It supports linear arrays and associative arrays (hashes)
  • It is powerful and simplifies programming
  • It can process very large files with no limit on record size
  • Perl includes a broad and powerful set of functions for handling strings and arrays
  • Anything can be done in more than one way

Example Perl program:

# This simple program copies records from a file
# and prefixes each line with a sequence number
while (<>) {
# while () {} creates a loop that continues as long as the
# expression in parentheses is true.
# The statements in the loop go inside the braces {}
# <> is a special symbol.
# It tells Perl to look at the command line and check whether
# any files were specified.
# If so, each one is read in turn.
# If no file is specified, input is read from
# standard input.
# In either case, the line that was read is stored
# in the special variable $_
# When <> reaches end-of-file it returns a false value,
# which ends the loop.
print STDOUT ++$i, $_; # print is a simple, unformatted output function
# STDOUT is the standard filehandle
# for standard output.
# Filehandles are written in UPPERCASE in Perl (by convention).
# ++$i increments the value of $i and makes the new value available
# to the print statement
# All scalar values (that is, anything other than a statement,
# a linear array, an associative array, a filehandle, or a subroutine name)
# start with $ in Perl
# $_ is the default operand of many statements;
# here, $_ holds the last record read by <>
# ; ends every statement in Perl
}

Brief review of Perl syntax

  • Perl is case-sensitive: it distinguishes between uppercase and lowercase letters
  • Do not use names that start with a digit, since these are usually special symbols in Perl, for example $1, $2, etc.
  • Every statement in Perl ends with a semicolon ;
  • Comments are introduced with the # symbol; everything after # up to the end of the line is ignored
  • Perl identifies each kind of variable or data name with a prefix character. These characters are:

    Type               Character   Comment
    Scalar             $           A number or a string
    Linear array       @           An array indexed by number.
                                   Subscripts go in square brackets [].
                                   @cosa refers to the whole array.
                                   $cosa[1] refers to the scalar in the second position of the array
    Associative array  %           An array indexed by a text key, not necessarily a number.
                                   Subscripts go in braces {}.
                                   %cosa refers to the whole array.
                                   $cosa{"x"} refers to the scalar stored under the key "x"
    Filehandle         (none)      Filehandles are conventionally written in uppercase
    Subroutine         &           A subroutine
    Label              xx:         Target of goto

  • Values in parentheses () are lists. Lists are often used as arguments to a subroutine or function call. Parentheses are not needed if only one argument is used or if Perl can tell where the list ends.
  • The variables $x, @x, %x, and &x need not be related to one another, let alone to $X, @X, %X, and &X.
  • There are special variables; the most important are:
    $_
    The default scalar value. If no variable name is given to a function that expects a scalar, $_ is used. This is very common in Perl
    @_
    The list of arguments passed to a subroutine
    @ARGV
    The list of arguments given on the command line when the program was run
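The following sketch puts the prefix characters and the special variables $_, @_ and @ARGV side by side. The variable name cosa is taken from the table; total_length is a hypothetical helper added for the illustration.

```perl
use strict;
use warnings;

# @_ holds the arguments passed to a subroutine.
# total_length() is a hypothetical helper for this illustration.
sub total_length {
    my $sum = 0;
    $sum += length($_) for @_;       # "for" sets $_ to each argument in turn
    return $sum;
}

my $cosa = 'scalar';                 # $cosa, @cosa and %cosa are three separate variables
my @cosa = ('first', 'second');
my %cosa = (x => 'value for x');

print "$cosa[1]\n";                  # second
print "$cosa{x}\n";                  # value for x
print total_length(@cosa), "\n";     # 11
print scalar(@ARGV), " command-line argument(s)\n";
```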

Basic statements and control

Braces {} enclose a block of statements. A block can have its own local variables. Blocks serve as the bodies of most control statements.

Simple assignment:

  • Scalar assignment
  • Lists of scalars
  • List to array
  • Array to list
  • Associative arrays need a key, but otherwise behave as you would expect of an array
  • Assigning an array to a scalar yields the number of elements in the array
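Each of the assignment forms listed above can be sketched in one line. The variable names and values below are made up for the illustration.

```perl
use strict;
use warnings;

# assignment_demo() is a hypothetical helper showing each assignment form.
sub assignment_demo {
    my $x = 5;                          # scalar assignment
    my ($p, $q) = (1, 2);               # list of scalars
    my @list = (10, 20, 30);            # list to array
    my ($first, @rest) = @list;         # array to list
    my %age = (ana => 31);              # associative arrays need a key
    $age{luis} = 27;
    my $count = @list;                  # array in scalar context: element count
    return ($x, $p + $q, $first, scalar(@rest), $age{luis}, $count);
}

print join(', ', assignment_demo()), "\n";   # 5, 3, 10, 2, 27, 3
```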

Arithmetic operations

if-then-else

  • if (condition) {  true branch  }  else  {  false branch  }

  • if (condition) {statements}  elsif (condition) {statements}

    elsif (condition) {statements}

  • unless (condition)  {  branch executed when the condition is false  }

  • The condition can use a wide range of comparison operators. It is important to note the difference between the numeric and the string operators.

    Numeric   String   Meaning
    ==        eq       equal
    !=        ne       not equal
    >         gt       greater than
    <         lt       less than

  • Strings that do not begin with a number evaluate to zero in numeric context.
  • Perl has an extensive set of file tests:

    • -T true if the file is a text file
    • -B true if the file is a binary file
    • -M returns the number of days since the file was last modified
    • -A returns the number of days since the file was last accessed
    • -C returns the number of days since the file’s inode was last changed (not its creation time)
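The difference between numeric and string comparison is easiest to see with values such as 10 and 9, which order differently under the two sets of operators. This is a minimal sketch; compare_both is a hypothetical helper and /no/such/file is an assumed nonexistent path used only to exercise unless.

```perl
use strict;
use warnings;

# Returns how two values compare numerically and as strings.
# compare_both() is a hypothetical helper for this illustration.
sub compare_both {
    my ($left, $right) = @_;
    my $numeric = $left <  $right ? 'less' : $left == $right ? 'equal' : 'greater';
    my $string  = $left lt $right ? 'less' : $left eq $right ? 'equal' : 'greater';
    return ($numeric, $string);
}

my ($n, $s) = compare_both(10, 9);
print "numeric: $n, string: $s\n";    # numeric: greater, string: less

unless (-e '/no/such/file') {         # unless runs its block when the test is false
    print "file does not exist\n";
}

{
    no warnings 'numeric';            # silence the "isn't numeric" warning for the demo
    print "'abc' == 0\n" if 'abc' == 0;
}
```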

Loops
The most common loops are for and while

  • for ($i = 0; $i < 10; $i++) { statements }
  • foreach $i (@items) { statements }
  • foreach $i ($first .. $last) { statements }
  • while (condition) { statements }
  • until (condition) { statements }
  • The statements next, last, and redo change the flow of a loop: next skips to the next iteration, last exits the loop, and redo restarts the current iteration. A continue block, if present, runs at the end of each iteration.
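A short sketch of the loop controls in action. sum_odds_until is a hypothetical helper, and the numbers are chosen only to make the result easy to check by hand.

```perl
use strict;
use warnings;

# Sum the odd numbers in a list, stopping at the first value above a limit.
# sum_odds_until() is a hypothetical helper for this illustration.
sub sum_odds_until {
    my ($limit, @numbers) = @_;
    my $sum = 0;
    foreach my $n (@numbers) {
        last if $n > $limit;      # leave the loop entirely
        next if $n % 2 == 0;      # skip even numbers and move to the next iteration
        $sum += $n;
    }
    return $sum;
}

print sum_odds_until(8, 1 .. 20), "\n";   # 1 + 3 + 5 + 7 = 16

my $i = 0;
while ($i < 3) {
    print "while pass $i\n";
} continue {
    $i++;                         # the continue block runs after each pass
}
```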

Input/Output
Opening
As in Unix, the first three filehandles are opened automatically: STDIN, STDOUT, and STDERR. Other files must be opened explicitly. The open statement has the following form:
open (FILEHANDLE,XFY);
where X and Y are optional characters

X = <
Opens file F for reading only
X = >
Opens file F for writing only
X = >>
Appends data to the end of file F
X = |
Writes to a pipe into program F
Y = |
Reads from a pipe out of program F

If only the name F is given, the file is opened for reading

Reading
The most basic way to read is to put the filehandle inside <>. When this is the condition of a while loop and no scalar variable is given to receive the record, the record is stored in $_.
Writing
Most writing is done with the print or printf statements. They are used even when the output is not actually going to be printed.
Closing
Perl automatically closes any open file on exit. A file can also be closed earlier with an explicit close.
Error messages:

  • die prints an error message and terminates execution
  • warn prints an error message but continues execution
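The open, read, write, close, and error-reporting steps can be sketched together. Note that this sketch uses the three-argument form of open with lexical filehandles, which is the commonly recommended style today, rather than the two-argument form described above. round_trip is a hypothetical helper, and the scratch file comes from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write lines to a file, append one more, then read everything back.
# round_trip() is a hypothetical helper for this illustration.
sub round_trip {
    my ($path, @lines) = @_;

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} "$_\n" for @lines;
    close($out) or die "Cannot close $path: $!";

    open(my $app, '>>', $path) or die "Cannot append to $path: $!";
    printf {$app} "%s\n", 'appended';
    close($app) or die "Cannot close $path: $!";

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my @read;
    while (<$in>) {               # each record lands in $_
        chomp;
        push @read, $_;
    }
    close($in);
    return @read;
}

my ($fh, $filename) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close($fh);
print join('|', round_trip($filename, 'one', 'two')), "\n";   # one|two|appended
warn "done\n";                                  # warn writes to STDERR and continues
```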

String handling:

  • split extracts tokens or fields from a string into an array.
  • sort sorts a list or array.
  • study was meant to optimize repeated pattern matches against a string; it does nothing in Perl 5.16 and later.
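A brief sketch of split and sort working together. sorted_fields is a hypothetical helper and the colon-separated record is made-up sample data.

```perl
use strict;
use warnings;

# Split a colon-separated record into fields and return them sorted.
# sorted_fields() is a hypothetical helper for this illustration.
sub sorted_fields {
    my ($record) = @_;
    my @fields = split /:/, $record;
    my @sorted = sort @fields;       # the default sort is string order
    return @sorted;
}

print join(',', sorted_fields('pear:apple:fig')), "\n";   # apple,fig,pear

my @numbers = sort { $a <=> $b } (10, 9, 100);   # numeric order needs <=>
print "@numbers\n";                               # 9 10 100
```

The default sort compares strings, so numbers need an explicit comparison block with <=> to come out in numeric order.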

Binary encoding:

  • pack packs data into a string using a format template
  • unpack recovers data from a string using a format template
  • There is a long list of format characters that can be used
  • More than one format character can be used at a time

    • l long 32-bit signed integer
    • L long 32-bit unsigned integer
    • s short 16-bit signed integer
    • S short 16-bit unsigned integer
    • f float 32-bit floating point
    • d double 64-bit floating point
    • A ASCII string (space-padded)
    • c char a single byte (signed character value)
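A round trip through pack and unpack shows how a template combines several format characters. The record layout below (one l, one S, and a five-byte A field) is an assumption made up for the example, as are the helper names.

```perl
use strict;
use warnings;

# Hypothetical record layout: a 32-bit signed integer, a 16-bit unsigned
# integer and a 5-byte space-padded string.
use constant TEMPLATE => 'l S A5';

sub encode_record { return pack(TEMPLATE, @_) }
sub decode_record { return unpack(TEMPLATE, $_[0]) }

my $record = encode_record(-70000, 65535, 'hi');
print length($record), " bytes\n";                 # 11 bytes (4 + 2 + 5)
print join(',', decode_record($record)), "\n";     # -70000,65535,hi
```

The A format pads with spaces when packing and strips trailing spaces when unpacking, which is why the string comes back as just hi.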

Regular expressions:

Perl adds its own set of metacharacters to the standard set. One important use of regular expressions (RE) is the use of () to select parts of the match, and Perl makes the () operator easy to use. There are two ways to use regular expressions in Perl: match and substitute.

A regular expression is enclosed in slashes, and the =~ operator applies it to a string.

Regular expressions are case-sensitive

The !~ operator tests for a non-match: it is true when the string does not match the pattern.

Some special characters:

.
Any character except newline
^
The beginning of the line or string
$
The end of the line or string
*
Zero or more of the preceding character
?
Zero or one of the preceding character
+
One or more of the preceding character
[]
Any one of the characters inside the brackets []
|
Alternation (either the pattern on the left or the one on the right)
()
Grouping
Special characters such as $, |, [, ), \ and / must be preceded by a backslash to be matched literally in a regular expression
$` $& $'
$`, $& and $' hold the text found before, during, and after a match
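The two uses of regular expressions, match and substitute, can be sketched together with capture groups and the special match variables. parse_date and squeeze_spaces are hypothetical helpers, and the input strings are sample data.

```perl
use strict;
use warnings;

# Match: capture groups () populate $1, $2, ...
# parse_date() is a hypothetical helper for this illustration.
sub parse_date {
    my ($text) = @_;
    return $text =~ /^(\d{4})-(\d\d)-(\d\d)$/ ? ($1, $2, $3) : ();
}

# Substitute: s/// rewrites the matched text; the g flag replaces every match.
sub squeeze_spaces {
    my ($text) = @_;
    $text =~ s/ +/ /g;
    return $text;
}

print join('/', parse_date('2013-05-18')), "\n";   # 2013/05/18
print squeeze_spaces('a   b    c'), "\n";          # a b c
print "no digits\n" if 'perl' !~ /\d/;             # !~ is true when there is no match

if ('hello world' =~ /o w/) {
    # $` is the text before the match, $& the match itself, $' the text after it
    printf "before=[%s] match=[%s] after=[%s]\n", $`, $&, $';
    # before=[hell] match=[o w] after=[orld]
}
```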
