Reading random lines from a file with PHP


While developing a testing framework, I decided it would be nice to use a random sample of records from Alexa’s Top 1 million domains list. Here is the function I wrote to read a given number of random lines from the file.

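function random_lines($filename, $numlines, $unique = true) {
    if (!file_exists($filename) || !is_readable($filename))
        return null;

    $filesize = filesize($filename);
    $lines = array();
    if ($filesize === 0 || $numlines <= 0)
        return $lines;

    $handle = fopen($filename, 'r');
    if (!$handle)
        return null;

    // Cap the number of tries so that asking for more unique lines than
    // the file contains cannot loop forever.
    $attempts = 0;
    $max_attempts = max(1000, $numlines * 100);

    while (count($lines) < $numlines && $attempts++ < $max_attempts) {
        // Jump to a random byte; valid offsets are 0 .. $filesize - 1.
        $offset = mt_rand(0, $filesize - 1);
        fseek($handle, $offset);

        // Unless we landed on the first byte of the file, we are probably
        // in the middle of a line, so throw away the rest of it.
        if ($offset > 0)
            fgets($handle);

        // Read the next complete line. false means we started inside the
        // last line and nothing follows it, so try again.
        $line = fgets($handle);
        if ($line === false)
            continue;

        // Strip "\n" or "\r\n" line endings and skip blank lines.
        $line = rtrim($line, "\r\n");
        if ($line === '')
            continue;

        // Skip lines we already have. in_array() with strict comparison is
        // safe even when the match is the element at index 0.
        if ($unique && in_array($line, $lines, true))
            continue;

        $lines[] = $line;
    }

    fclose($handle);
    return $lines;
}

// Example usage
$lines = random_lines('top-1m.csv', 10);
echo json_encode($lines) . PHP_EOL;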

Sample output (the selection differs on every run):

["804254,2z2z.info","298052,taronga.org.au","601192,bnsi.net","211144,best.sk","506296,bridge9.com","767784,zibashahr.com","294162,mrbookmarking.com","894095,youtube.com\/user\/Gaja2A","781514,hochschober.at","133134,global.gr"]
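Seeking to random byte offsets is fast because it never reads the whole file, but the sample is not uniform: a line is picked when the random offset lands in the line before it, so a line's chance of being chosen is roughly proportional to the length of the preceding line. If an unbiased sample matters more than speed, one alternative (a different technique from the function above, not what I used) is reservoir sampling, "Algorithm R", which reads the file once and keeps only k lines in memory. This is a minimal sketch that assumes PHP 7.1 or later (for `random_int()` and the `?array` return type); the name `reservoir_sample_lines()` is hypothetical.

```php
<?php
// Hypothetical helper, not part of the original post.
// Reservoir sampling (Algorithm R): returns up to $k non-empty lines chosen
// uniformly at random from $filename in one pass, or null if unreadable.
function reservoir_sample_lines(string $filename, int $k): ?array {
    if (!is_readable($filename)) {
        return null;
    }
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return null;
    }

    $reservoir = [];
    $seen = 0; // number of non-empty lines read so far

    while (($line = fgets($handle)) !== false) {
        // Strip "\n" or "\r\n" and ignore blank lines.
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;
        }
        $seen++;

        if ($seen <= $k) {
            // The first $k lines fill the reservoir.
            $reservoir[] = $line;
        } else {
            // Keep line number $seen with probability $k / $seen by
            // overwriting a random slot.
            $j = random_int(0, $seen - 1);
            if ($j < $k) {
                $reservoir[$j] = $line;
            }
        }
    }

    fclose($handle);
    return $reservoir;
}
```

It is called the same way, e.g. `reservoir_sample_lines('top-1m.csv', 10)`. Each line position is chosen at most once, so the result has no repeats as long as the file itself has no duplicate lines; the trade-off is that every call reads all one million rows.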
