I need to hash strings to shorter checksums in a high-throughput "BigData" project. The common choice would be SHA (probably SHA1 for speed reasons) or CRC32, as the checksums will only be used internally and don't need to be cryptographically secure. A StackExchange answer suggested MurmurHash3, but how does it play with Perl?
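Python's standard library ships CRC32 (`zlib.crc32`) and SHA1 (`hashlib.sha1`) but no MurmurHash3; the third-party `mmh3` package provides one. To show what MurmurHash3 actually computes, here is a pure-Python sketch of the 32-bit x86 variant next to the two stdlib checksums. It is Python rather than Perl and says nothing about speed. The `checksums` helper is hypothetical and only exists for the comparison.

```python
import hashlib
import zlib

MASK = 0xFFFFFFFF  # keep every intermediate value in 32 bits


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32: a fast, non-cryptographic 32-bit hash."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n_full = len(data) & ~3  # bytes covered by complete 4-byte blocks

    # Body: mix in one little-endian 4-byte block at a time.
    for i in range(0, n_full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl((k * c1) & MASK, 15) * c2 & MASK
        h = _rotl(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK

    # Tail: the remaining 1 to 3 bytes.
    tail = data[n_full:]
    if tail:
        k = 0
        for shift, byte in enumerate(tail):
            k |= byte << (8 * shift)
        h ^= _rotl((k * c1) & MASK, 15) * c2 & MASK

    # Finalization (fmix32) spreads the bits over the whole word.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def checksums(text: str) -> dict:
    """Hypothetical helper: the three candidate checksums for one string."""
    data = text.encode("utf-8")
    return {
        "crc32": zlib.crc32(data) & MASK,
        "murmur3_32": murmur3_32(data),
        "sha1": hashlib.sha1(data).hexdigest(),  # 40 hex chars, far longer than 32 bits
    }
```

CRC32 and MurmurHash3 both produce 32-bit integers, while SHA1 produces a 160-bit digest. For internal, non-cryptographic checksums, the short ones are usually the more attractive choice.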
Databases (and search engines like Elasticsearch) typically store the date of birth instead of the current age. It's a static date value rather than a calculated one that would have to be updated every day. But statistics often need to show the age, which is much easier for humans to grasp than the date (or year) of birth. This post shows an easy way to use the Elasticsearch date_histogram aggregation to output age buckets instead of counting users by their year of birth.
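Counting by year of birth and counting by age are not the same thing. A person's exact age depends on whether this year's birthday has already passed. Below is a small client-side sketch in Python, not an Elasticsearch query. It assumes the reference date is passed in explicitly, and `age_on` and `age_buckets` are hypothetical helpers.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def age_on(birth: date, today: date) -> int:
    """Completed years on `today`: one less if this year's birthday is still ahead."""
    birthday_passed = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if birthday_passed else 1)


def age_buckets(births: Iterable[date], today: date) -> Counter:
    """Count people per age, the figure a statistics page would show."""
    return Counter(age_on(birth, today) for birth in births)
```

For example, with `today = date(2015, 6, 1)`, the birth dates 1980-05-31 and 1980-06-02 land in the 35 and 34 buckets respectively. A year-of-birth histogram would count both under 1980.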
Most projects grow. New features get added, old ones get deprecated. But as life goes on, ancient parts of the source code stay alive even if they're no longer used. My current cleanup challenge has 581k lines of code in 1,500 files that have grown over about 15 years. Part one: finding subs that are defined but never used.
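A naive first pass could collect every `sub NAME` definition across all files and report names that never occur anywhere except in their own definition. Here it is sketched in Python, assuming subs are declared as `sub name` at the start of a line. This is only a heuristic and not the post's tool: names built at runtime (e.g. `"handle_$type"` dispatch) are reported as unused, while any mention in a comment or POD counts as a use. `unused_subs` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Mapping, Set

# Assumption: subs are declared as "sub name" at the start of a line.
SUB_DEF = re.compile(r"^\s*sub\s+([A-Za-z_]\w*)", re.MULTILINE)
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def unused_subs(sources: Mapping[str, str]) -> Set[str]:
    """Return sub names that appear nowhere except in their own definition."""
    definitions = Counter()
    occurrences = Counter()
    for code in sources.values():
        definitions.update(SUB_DEF.findall(code))
        occurrences.update(IDENTIFIER.findall(code))
    # Each "sub name" line itself contributes one occurrence of the name.
    return {name for name, count in definitions.items() if occurrences[name] <= count}
```

In practice, `sources` would map each `.pm` and `.pl` file path of the project to its content.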
When it comes to application-level caching, only two options seem to exist: Memcached and Redis. I've been using Memcached for years but wanted to re-check my choice just before adding a caching layer to another project.
Regular expressions are powerful and typically fast. A recent script uses a set of about 1,800 expressions (loaded from a database) on roughly five million strings per day, each typically 1 to 2 kB long. The RegEx matches take a lot of time, so I tried to speed them up. Tuning the regular expressions themselves would be one option, but I also wanted to test whether a methodical approach would help.
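One methodical starting point is to compile each expression once and time it separately against a sample of real strings, so the slowest ones can be targeted first. This is not necessarily the approach the post takes. It is sketched in Python with a hypothetical `profile_patterns` helper, and the timings depend entirely on the machine and the sample data.

```python
import re
import time
from typing import List, Tuple


def profile_patterns(patterns: List[str], samples: List[str]) -> List[Tuple[float, str]]:
    """Time every pattern against all samples; return (seconds, pattern), slowest first."""
    results = []
    for pattern in patterns:
        compiled = re.compile(pattern)  # compile once, not once per string
        start = time.perf_counter()
        for text in samples:
            compiled.search(text)
        results.append((time.perf_counter() - start, pattern))
    results.sort(reverse=True)
    return results
```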
Many people keep asking me how to work from home. Let me take you on a little tour to show you why I love working from home. We'll start with my desk. I work everywhere: on a train, in the garden, on a plane, in a hotel lobby while little Robyn is sleeping in our room or (stereotypically) in a coffee shop, but my favorite place is my desk at home.
What do the Bee Gees have to do with the Mafia? And why is FAST so important when a smile is lopsided? Or, in typical Facebook-speak: She only wanted to take a few rock star photos, but nobody saw coming what happened next...
I started working my way through a long list of unfixed warnings today and encountered one I hadn't seen before: Reference found where even-sized list expected at project/Something.pm:573. The message seems clear, but can you spot the problem in line 573?
Elasticsearch is a search engine built for extremely fast searches in big data volumes. But sometimes one needs to fetch documents with known IDs. I found five different ways to do the job. Let's see which one is the best.
I bought a Cherry G84-4700PUCDE-2 keypad about three years ago to have some "special multimedia keys" on the left side of my keyboard. After some trial and error it worked, until I upgraded to Ubuntu 14.04. The "trusty" release removed support for /lib/udev/findkeys and /lib/udev/keymap and replaced both with something called "hwdb". Converting turned out to be hard, because a lot of wrong advice is spread across the internet.
Software often needs to transform values from A to B. Such transformations (given they're static) might be done using a database table, if/elsif blocks or a mapping table. Such tables are easy to create, maintain and understand. A database is always the slowest solution for a limited number of items, because the overhead of the client, the network and the database server is huge compared to processing within the source code. Source-code-based solutions are faster, but which one is the best?
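Here are the two source-code variants side by side in a small Python sketch. One is a dict lookup, Python's counterpart of a Perl hash used as a mapping table. The other is an `if`/`elif` chain, Python's counterpart of Perl's `if`/`elsif`. The mapping values are made up for illustration. The timing loop only shows how one might measure, and the resulting numbers depend on the machine and interpreter.

```python
import timeit

# Hypothetical static mapping, for illustration only.
COLORS = {"r": "red", "g": "green", "b": "blue"}


def map_with_table(code: str) -> str:
    """One hash lookup, no matter how many entries the table has."""
    return COLORS.get(code, "unknown")


def map_with_if(code: str) -> str:
    """Compares one branch after another until one matches."""
    if code == "r":
        return "red"
    elif code == "g":
        return "green"
    elif code == "b":
        return "blue"
    return "unknown"


if __name__ == "__main__":
    for fn in (map_with_table, map_with_if):
        seconds = timeit.timeit(lambda: fn("b"), number=200_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

A chain's cost grows with the position of the matching branch, while a table lookup stays roughly constant. So the answer may depend on the number of items and on which keys are most frequent.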
Perl's regular expression engine is one of the most flexible and powerful tools for pattern matching and text manipulation. "Easy" and "powerful" often behave like magnetic poles of the same kind: they repel each other. But the "s" and "m" suffix modifiers supported by the Perl RegEx engine are not that complicated to understand and still very powerful.
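Python's `re.DOTALL` and `re.MULTILINE` flags correspond to Perl's "s" and "m" modifiers, so their effect can be shown in a short Python sketch. The sample strings are arbitrary.

```python
import re

TEXT = "first line\nsecond line"

# Default: "." does not match "\n"; "^" anchors only at the start of the string.
plain_dot = re.search(r"line.second", TEXT)              # None
plain_caret = re.findall(r"^\w+", TEXT)                  # ['first']

# re.DOTALL corresponds to Perl's "s": "." also matches "\n".
dotall_dot = re.search(r"line.second", TEXT, re.DOTALL)  # matches 'line\nsecond'

# re.MULTILINE corresponds to Perl's "m": "^" and "$" match at every line.
multi_caret = re.findall(r"^\w+", TEXT, re.MULTILINE)    # ['first', 'second']

# Both combined, like Perl's /sm: start at a line start, then run across newlines.
both = re.findall(r"^second.*", "a\nsecond x\ny", re.DOTALL | re.MULTILINE)  # ['second x\ny']
```

The two modifiers are independent: "s" changes only what "." matches, and "m" changes only where "^" and "$" anchor.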
Error messages should be simple, clear and easy to understand. But "easy to understand" means something different to a developer writing the source code than to a user who doesn't know the source or its internals. MongoDB reports "DBClientCursor::init call() failed" on connection errors. Do you know what this message means?
Some errors are really hard to find: they appear only occasionally, only on live systems, or within complex code that can't be run manually in a debugger. Adding debug output may help, but it may also be confusing, as DBI error code 4, "statement contains no result", shows.