grep::metacpan is an open source search engine for the Comprehensive Perl Archive
Network (CPAN), an ever-growing archive of code and
documentation for the Perl programming language. This includes a
comfortable web-based view and a first-class mirror of the canonical
CPAN content.

This is Yet Another project for searching the CPAN.

This is not the first attempt to provide a grep through all CPAN distributions,
but we hope you'll find it helpful. You should also try the original
grep.cpan.me, which uses a Redis database as its backend.

This project is purely experimental. The goal is to see how 'git grep' can compete with a more
traditional database approach. One of its main advantages is that it is easy to deploy on a standalone server or workstation.
The drawback is that it can be *slow*... but you might be surprised by how fast it can be for some searches. :-)
The more frequently a term is searched for, the faster the grep for that term gets, because the list of matching files is cached.

Using grep::metacpan

You can consume grep::metacpan in two different ways:

Read more about how the code is organized in our GitHub repositories introduction.

(grep::)?MetaCPAN is a community effort. The original idea came about when Todd R. was working on the 'dot removal from @INC' (and grep.cpan.me wasn't available at that particular time).

Help wanted

We are always in need of more contributors, so feel free to submit a merge request to one of our source code repositories.

Where did you get that great logo?

The logo was stolen from metacpan, which got it from Raul Matei, who won the MetaCPAN logo competition
(sponsored by the Enlightened Perl Organization) with his entry.

Babs V. kindly provided an altered version for this grep::metacpan website :-)

--------------------------------------------------------------------------------
/src/views/api.tt:
--------------------------------------------------------------------------------
The API is currently in its first version and is very naive and limited, but it exists...
The API is not versioned yet, but if a number had to be picked, '0.0' would be the best match. This gives you an idea of how far you can go with it.

Basic concepts

The API is built around these simple concepts:

- simple HTTP GET queries
- the output format is JSON
- just add the '/api' prefix to any of your searches, and you should be good to go!

Here is a simple example:

If you are browsing grep.metacpan.org, just copy and paste your URL, which should look like this:

    https://grep.metacpan.org/search?q=test&qd=&qft=

Then add the /api/ prefix to the path of your URI:

    curl -X GET 'https://grep.metacpan.org/api/search?q=test&qd=&qft=' | json

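The prefix rule can also be applied mechanically. The following is a minimal Perl sketch; `to_api_url` is a hypothetical helper name that is not part of the project, and it only rewrites the URL without sending any request.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a grep.metacpan.org search URL into its API
# equivalent by inserting the '/api' prefix in front of the path.
sub to_api_url {
    my ($url) = @_;

    # Already an API URL: leave it untouched.
    return $url if $url =~ m{^https?://[^/]+/api/};

    ( my $api = $url ) =~ s{^(https?://[^/]+)/}{$1/api/};
    return $api;
}

print to_api_url('https://grep.metacpan.org/search?q=test&qd=&qft='), "\n";
```

The early return keeps the helper idempotent, so it is safe to call on a URL that already carries the prefix.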
The Output Format

The format is not definitive and is still at an early stage of development, but here is what it should look like.

Keys used at the main level of the output format

- results: array which contains all the matching information
- is_incomplete: boolean which tells you whether the results were truncated (after 2000 files) or contain all the results from CPAN
- search_in_progress: boolean which tells you whether you need to query again a little later, as the query might still be running in the background
- time_elapsed: time in seconds taken to render the request
- is_a_known_distro: boolean which tells you whether the distro filter matches a known CPAN distribution
- match: quick summary metrics for your query

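A client can use the search_in_progress flag to decide whether to ask again. The sketch below shows one possible retry loop; `wait_for_results`, its `max_tries` and `delay` options, and their default values are assumptions made for illustration, and the fetcher is a stand-in rather than a real HTTP call.

```perl
use strict;
use warnings;

# Hypothetical helper. $fetch is any code ref returning the decoded API
# response as a hash ref (for example an HTTP GET followed by JSON decoding).
sub wait_for_results {
    my ( $fetch, %opt ) = @_;
    my $max_tries = $opt{max_tries} // 5;
    my $delay     = $opt{delay}     // 2;    # seconds between attempts

    my $response;
    for my $try ( 1 .. $max_tries ) {
        $response = $fetch->();
        last unless $response->{search_in_progress};
        sleep $delay if $delay && $try < $max_tries;
    }

    # May still be flagged as in progress if every attempt was used up.
    return $response;
}

# Usage with a stand-in fetcher that finishes on the third call.
my $calls = 0;
my $fake  = sub {
    $calls++;
    return { search_in_progress => ( $calls < 3 ? 1 : 0 ), results => [] };
};
my $final = wait_for_results( $fake, delay => 0 );
print "finished after $calls call(s)\n";
```

Keeping the number of attempts small and the delay generous is one way to stay a good citizen of a shared service.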
The results values
The matches for your query are stored in the results array at the main level.
This is a list of hashes, where each hash describes the results
for a specific distribution.

The format of a distribution result is the following:

- files: list of all files in this distribution matching your query
- matches: list of hashes, one per matching file, each giving the file name (file) and its matching codeblocks (blocks)
- prefix: distribution filepath on disk
- distro: distribution name

Codeblock structure
A codeblock represents the output of git grep with some context,
so we need to indicate which line is the first one in the code extract
and which lines are the ones matching your query.

- matchlines: array of integers listing the line numbers in the file that match the query
- code: code extract for the match with some context
- start_at: integer indicating the line number of the first line in the code extract

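To make the relationship between start_at, code and matchlines concrete, here is a small Perl sketch that numbers each line of a codeblock and flags the matching ones. `annotate_block` and the output layout are illustrative choices, not something the API provides.

```perl
use strict;
use warnings;

# Hypothetical helper: number every line of a codeblock, starting at
# 'start_at', and prefix the lines listed in 'matchlines' with '>'.
sub annotate_block {
    my ($block) = @_;

    my %is_match = map { $_ => 1 } @{ $block->{matchlines} };
    my $lineno   = 0 + $block->{start_at};    # values may arrive as strings

    my @out;
    for my $text ( split /\n/, $block->{code} ) {
        push @out,
          sprintf( '%s %4d: %s', $is_match{$lineno} ? '>' : ' ', $lineno, $text );
        $lineno++;
    }
    return @out;
}

my @lines = annotate_block(
    {
        start_at   => '10',
        matchlines => ['11'],
        code       => "use strict;\nuse Test::More;\nuse warnings;\n",
    }
);
print "$_\n" for @lines;
```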
The summary statistics
Basic metrics about your query, stored under the match key: how many distributions and files match it.

- distros: integer value with the total number of distributions matching your query
- files: integer value with the total number of files matching your query

Sample output format

    {
        "results": [
            {
                "files": [
                    "MANIFEST",
                    "t/test.t"
                ],
                "matches": [
                    {
                        "blocks": [
                            {
                                "matchlines": [
                                    "6",
                                    "7",
                                    "8",
                                    "9",
                                    "10"
                                ],
                                "code": "Changes\nMANIFEST\nMakefile.PL\nREADME\nlib/abbreviation.pm\nt/test.t\ntestlib/CAPS/On.pm\ntestlib/Foo.pm\ntestlib/FooBar/Baz.pm\ntestlib/FooBar/Baz/Doh.pm\n",
                                "start_at": "1"
                            }
                        ],
                        "file": "MANIFEST"
                    }
                ],
                "prefix": "distros/a/abbreviation",
                "distro": "abbreviation"
            },
            ... --- additional results were cut from here ---
        ],
        "is_incomplete": 0,
        "search_in_progress": 0,
        "time_elapsed": 0.016502,
        "is_a_known_distro": "",
        "match": {
            "files": "64",
            "distros": "7"
        }
    }

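As an illustration of how a client might walk this structure, the Perl sketch below decodes a trimmed-down response with the core JSON::PP module and lists every matching line as distro:file:line. The embedded JSON is a shortened stand-in written for this example, not real API output.

```perl
use strict;
use warnings;
use JSON::PP ();

# Trimmed-down response following the sample output format.
my $json = <<'JSON';
{
  "results": [
    {
      "files": ["MANIFEST", "t/test.t"],
      "matches": [
        {
          "blocks": [
            { "matchlines": ["6"], "code": "t/test.t\n", "start_at": "6" }
          ],
          "file": "MANIFEST"
        }
      ],
      "prefix": "distros/a/abbreviation",
      "distro": "abbreviation"
    }
  ],
  "is_incomplete": 0,
  "search_in_progress": 0,
  "match": { "files": "64", "distros": "7" }
}
JSON

my $data = JSON::PP->new->decode($json);

# Collect "distro:file:line" for every matching line.
my @hits;
for my $result ( @{ $data->{results} } ) {
    for my $match ( @{ $result->{matches} } ) {
        for my $block ( @{ $match->{blocks} } ) {
            push @hits, map { join ':', $result->{distro}, $match->{file}, $_ }
              @{ $block->{matchlines} };
        }
    }
}

print "$_\n" for @hits;
printf "%d file(s) in %d distribution(s)\n",
  $data->{match}{files}, $data->{match}{distros};
```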
Adding a filter to your query

Here are the valid parameters for your HTTP query:

- q: string for the query pattern (required)
- qci: boolean 0 or 1 for a case-insensitive search (optional - default 0)
- qd: string for the distribution filter pattern (optional)
- qft: string for the file type filter pattern (optional)
- f: string for a specific file name filter pattern (optional)
- p: integer for the page number; like the web UI, the API uses pagination (optional - default 1)

Some sample API queries:

- search for 'test' among all CPAN distributions:

    /api/search?q=test

- case-insensitive search for 'test' among all CPAN distributions:

    /api/search?q=test&qci=1

- search for 'test' among all distributions matching '*snap*':

    /api/search?q=test&qd=*snap*

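Queries like these can be assembled from a parameter list. The Perl sketch below is one way to do it, assuming the server decodes standard percent-encoding; `build_search_url` and `escape` are hypothetical helpers, and the host is the one used in the curl example.

```perl
use strict;
use warnings;

# Parameters documented for the search endpoint.
my %KNOWN = map { $_ => 1 } qw( q qci qd qft f p );

# Percent-encode everything except RFC 3986 unreserved characters.
sub escape {
    my ($value) = @_;
    utf8::encode($value);
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $value;
}

# Hypothetical helper: build an API search URL from named parameters.
sub build_search_url {
    my (%param) = @_;

    die "the 'q' parameter is required\n"
      unless defined $param{q} && length $param{q};

    my @unknown = grep { !$KNOWN{$_} } sort keys %param;
    die "unknown parameter(s): @unknown\n" if @unknown;

    my $query = join '&',
      map { $_ . '=' . escape( $param{$_} ) } sort keys %param;
    return "https://grep.metacpan.org/api/search?$query";
}

print build_search_url( q => 'test', qd => '*snap*' ), "\n";
```

Rejecting unknown parameter names early catches typos such as `qdi` before a request is ever sent.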
Rules of Thumb

Use it, but do not abuse it for now, or do so at your own risk :-) Try to be a good citizen.
This is not built on the same architecture as fastapi.metacpan.org...

--------------------------------------------------------------------------------
/src/views/faq.tt:
--------------------------------------------------------------------------------
Why is grepcpan slow?

Short answer: this is a *young* experimental project using a 17 GB git repository with an index of about 1.5 GB... this is pushing git to its limits...

Read the source code description page to get a better idea of the internal implementation.

Can I use regular expressions in my query?

Yes, you can! We are using 'git grep -P', which means you can use the power of Perl-compatible regular expressions in your queries.

What about Unicode characters?

No, you cannot use Unicode characters directly in your query, but you can use code points.

More details are available in the perlunicode manual.

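As a rough illustration, the Perl sketch below rewrites the non-ASCII characters of a query as \x{...} code point escapes. It assumes the service accepts this Perl-style notation in patterns, which the mention of code points suggests but which is not confirmed here; `to_code_points` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every non-ASCII character of a (decoded)
# query string as a Perl-style \x{...} code point escape.
sub to_code_points {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf '\\x{%X}', ord $1/ge;
    return $text;
}

print to_code_points("caf\x{E9} \x{263A}"), "\n";
```

The input must be a character string, not raw UTF-8 bytes, or each byte would be escaped separately.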
Any other questions?

Join us on IRC at irc.perl.org, channel #metacpan.

New to IRC? Go to www.irc.perl.org

--------------------------------------------------------------------------------
/src/views/index.tt:
--------------------------------------------------------------------------------
<% USE Math; %>

grep::metacpan is an open source experimental project developed by the Perl community.

The source code is divided into three git repositories:

<%

    SET gh_metacpan_grep_front_end = 'metacpan-grep-front-end';
    SET gh_metacpan_grep_builder   = 'metacpan-grep-builder';
    SET gh_metacpan_cpan_extracted = 'metacpan-cpan-extracted';

%>

- <% gh_metacpan_grep_front_end %>, the front-end website, which is this website...
- <% gh_metacpan_grep_builder %>, an experiment on building a git grep service of the current CPAN.
- <% gh_metacpan_cpan_extracted %>, the extracted CPAN: all the latest files extracted into one single *big* repository. Thanks GitHub!

22 |
The concept is very basic: extract all CPAN distribution (performed by <% gh_metacpan_grep_builder %>) in one single git directory (which lives in <% gh_metacpan_cpan_extracted %>).
23 |
24 |
25 | Then from there, cross fingers and use a pure 'git grep' implementation with a frontend on top of it: <% gh_metacpan_grep_front_end %>.
26 |
27 |
28 | The git grep is divided in two stages: 'git grep -l' to get the list
29 | of files matching the pattern (this is cached for future queries), then use the list of files to perform the actual 'git grep'.
30 |
31 |
32 | You can also browse other
metacpan projects by visiting the community
github homepage.
33 |
34 |
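The two stages can be sketched in a few lines of Perl. This is a minimal illustration under stated assumptions, not the front end's actual code: `two_stage_grep`, `run_git` and the in-memory `%files_for` cache are hypothetical names, the `-n` option for the second stage is an assumption, and the runner is injectable so the flow can be exercised without a repository.

```perl
use strict;
use warnings;

# In-memory stand-in for the cache of "files matching a pattern".
my %files_for;

# Default runner: run git inside $repo and return its output lines.
sub run_git {
    my ( $repo, @args ) = @_;
    open( my $fh, '-|', 'git', '-C', $repo, @args )
      or die "cannot run git: $!";
    chomp( my @lines = <$fh> );
    close $fh;    # 'git grep' exits with 1 when nothing matches
    return @lines;
}

sub two_stage_grep {
    my ( $repo, $pattern, $run ) = @_;
    $run //= \&run_git;

    # Stage 1: list of matching files, computed once per pattern.
    $files_for{$pattern}
      //= [ $run->( $repo, 'grep', '-l', '-P', '-e', $pattern ) ];
    my @files = @{ $files_for{$pattern} };
    return () unless @files;

    # Stage 2: the actual grep, restricted to those files.
    return $run->( $repo, 'grep', '-n', '-P', '-e', $pattern, '--', @files );
}

# Usage with a stand-in runner that records which stage was executed.
my @log;
my $fake = sub {
    my ( $repo, @args ) = @_;
    push @log, $args[1];    # '-l' for stage 1, '-n' for stage 2
    return $args[1] eq '-l'
      ? ('lib/Foo.pm')
      : ('lib/Foo.pm:3:use Test::More;');
};
two_stage_grep( '/tmp/repo', 'Test::More', $fake ) for 1 .. 2;
print "@log\n";
```

The second call skips stage 1 entirely, which is why repeated searches for the same term get faster. A real implementation would also have to cope with file lists too long for a single command line.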
--------------------------------------------------------------------------------
/tools/pre-commit.pl:
--------------------------------------------------------------------------------
#!/usr/bin/env perl

use strict;
use warnings;

# Hack to use carton's local::lib.
use lib 'local/lib/perl5';

use Test::More;

exit( run() // 0 ) unless caller;

sub run {

    if ( -e '/tmp/nohook' ) {
        note "skipping pre-commit hook: /tmp/nohook file exists";
        return 0;
    }

    note "1 - Trimming spaces";
    trim_spaces();
    note "done";
    note "";

    note "2 - tidy code";
    tidyall_do();
    note "done";
    note "";

    note "3 - tidy/perlcritic check";
    tidyall_check();
    note "done";

    # Explicit exit status: without it, run() would return note()'s value.
    return 0;
}

# rules
sub trim_spaces {
    my $files = list_txt_files();

    # Locate a checksum tool, used to detect whether a file was modified.
    my $mksum;
    foreach (qw{ md5sum shasum }) {
        $mksum = qx{which $_ 2>/dev/null};
        if ( $? == 0 ) {
            chomp $mksum if $mksum;
            last;
        }
    }

    unless ( $mksum && -x $mksum ) {
        note "skipping trim spaces... no mksum";
        return;
    }

    foreach my $file (@$files) {
        my $md5a = qx[$mksum $file];
        qx[$^X -pi -e 's{ +\$}{}' $file];    # strip trailing spaces in place
        my $md5b = qx[$mksum $file];
        if ( $md5a ne $md5b ) {
            note "Removed trailing spaces from '$file'";
            qx{git update-index --add $file};
        }
    }

    return;
}

sub tidyall_do {
    my $tidyall = qx{which tidyall 2>/dev/null};
    my $status  = $?;
    chomp $tidyall if defined $tidyall;

    # Skip when 'which' failed or did not return an executable path.
    if ( $status != 0 || !$tidyall || !-x $tidyall ) {
        warn "Missing tidyall binary... skipping tidyall_do";
        return;
    }

    my $files = list_perl_files();
    push @$files, '.gitignore';    # need to sort it
    foreach my $file (@$files) {
        my $out = qx{tidyall $file};
        note $out;
        if ( $out && $out =~ qr{tidied} ) {
            qx{git update-index --add $file};
        }
    }

    return;
}

sub tidyall_check {    # this is only a check
    eval { require Code::TidyAll::Git::Precommit; } or do {
        warn
          "Missing module Code::TidyAll::Git::Precommit - cannot run tidyall_check\n";
        return 1;
    };

    note "starting Git::Precommit";
    Code::TidyAll::Git::Precommit->check();

    return;
}

# helper
sub list_perl_files {
    return list_txt_files('perl');
}

sub list_txt_files {
    my ($filter) = @_;

    my @list;

    my @files = qx[ git diff --cached --name-status];
    chomp @files;
    my $ok;
    foreach my $file ( sort @files ) {

        # Reset first: the continue block also runs after every 'next',
        # so a stale value would add skipped entries to the list.
        $ok = 0;

        # Only consider added or modified files.
        next unless $file =~ s{^[AM]\s+}{};

        if ( $file =~ qr{\.( t | pm | pl | psgi )$}xi ) {
            $ok = 1;
            next;
        }
        next if $filter && $filter eq 'perl';    # only perl files for perl
        my $type = qx{file $file};
        next if $? != 0;
        $ok = 1 if $type =~ qr{text};
    }
    continue {
        push @list, $file if $ok;
    }

    #note explain \@list;

    return \@list;
}

--------------------------------------------------------------------------------
/tools/update-assets.pl:
--------------------------------------------------------------------------------
#!/usr/bin/env perl

use v5.036;

use FindBin;
use Git::Repository ();
use Pod::Usage;
use Test::More;

=head1 NAME

update-assets.pl - Asset versioning tool for the MetaCPAN Grep Frontend

=head1 SYNOPSIS

    # Run the script to update and timestamp all assets
    ./tools/update-assets.pl

=head1 DESCRIPTION

This script implements a cache-busting strategy for web assets by:

1. Finding all CSS/JS assets referenced in HTML and view files
2. Creating new copies with timestamp-based filenames (YYYYMMDDHHMMSS-filename.js)
3. Updating all references in source files to point to the new timestamped versions
4. Removing the old asset files
5. Committing all changes to Git

The script helps ensure that users always receive the latest version of assets
when changes are deployed, preventing browsers from using cached outdated versions.

=head1 PROCESS

The script:

1. Uses Git to find all references to assets in /_assets/ directory
2. For each asset found:
   - Creates a new copy with a timestamp prefix
   - Uses sed to replace all references in source files
   - Adds new files to Git and removes old ones
3. Commits all changes with a message "Bump assets to TIMESTAMP"

=head1 REQUIREMENTS

- Git::Repository Perl module
- Command-line access to Git repository
- Unix-like environment (uses sed)

=head1 AUTHOR

MetaCPAN Team

=cut

exit( run(@ARGV) // 0 ) unless caller();

sub run(@args) {

    if ( grep { $_ eq '--help' || $_ eq '-h' } @args ) {
        return pod2usage( -verbose => 2, -exitval => 0 );
    }

    my $root = $FindBin::Bin . q{/..};

    my $git = Git::Repository->new( work_tree => $root );

    # Every line referencing /_assets/ in the views and static HTML pages.
    my @out = $git->run( 'grep', q{/_assets/}, map {"src/$_"} 'views',
        'public/*.html' );

    # asset name => { referencing file => 1 }, or 0 when the asset is missing
    my %assets;

    foreach my $match (@out) {
        my ( $file, $line ) = split( q{:}, $match, 2 );
        if ( $line =~ qr{/_assets/([a-zA-Z0-9-]+\.(css|js))} ) {
            my $asset = $1;
            if ( !exists $assets{$asset} ) {
                $assets{$asset}
                  = -e qq{$root/src/public/_assets/$asset} ? {} : 0;
            }
            next if !$assets{$asset};
            $assets{$asset}->{$file} = 1;    # only tag the file once
        }
    }

    chdir($root) or die "Cannot chdir to $root: $!";

    my $ts = qx{date "+%Y%m%d%H%M%S"};
    chomp $ts;

    die "Cannot compute a timestamp" unless $ts;
    my $ok;

    foreach my $asset ( sort keys %assets ) {
        next unless ref $assets{$asset};

        # Drop the part before the first '-' (normally a previous timestamp).
        my ( $prefix, $base ) = split( '-', $asset, 2 );
        $base = $prefix if !defined $base;
        my $new            = qq{/_assets/$ts-$base};
        my $new_asset_file = qq{$root/src/public/$new};
        my $old_asset      = "$root/src/public/_assets/$asset";
        system( 'cp', $old_asset, $new_asset_file ) == 0
          or die "Cannot copy $old_asset to $new_asset_file";

        my $cmd = qq{sed -i -e "s|/_assets/$asset|$new|g" }
          . join( ' ', sort keys %{ $assets{$asset} } );
        qx{$cmd};
        warn "Error: $cmd - exit status " . ( $? >> 8 ) if $? != 0;

        $git->run( 'add', $new_asset_file, sort keys %{ $assets{$asset} } );
        $git->run( 'rm', $old_asset );

        # Some sed implementations leave '<file>-e' backups behind.
        foreach my $f ( sort keys %{ $assets{$asset} } ) {
            my $_e = $f . q{-e};
            unlink $_e if -e $_e;
        }
        $ok = 1;
    }

    if ($ok) {
        $git->run( 'commit', '-m', "Bump assets to $ts" )
          or die "Error while committing";
        note qq{Assets updated to: $ts};
        note scalar $git->run( 'show', '--stat', '-n1' );
    }
    else {
        note q{Nothing to do};
    }

    return;
}

--------------------------------------------------------------------------------